[Online] Scaling CUDA-Accelerated Applications

Schedule & Format
- Date: 2026, September 7-9
- Times:
  - Sep 7: 9:00 - 15:00 CEST
  - Sep 8: 9:00 - 15:00 CEST
  - Sep 9: 9:00 - 15:00 CEST
- Format: Three-day
- Location: Online via Zoom
- Language: English
Registered participants will receive the video conferencing link via email on the day before the course.
From Zero to Multi-Node GPU Programming
This event is part of the From Zero to Multi-Node GPU Programming series. Registration is done individually for each part of the series.
- Part 1 - Introduction to CUDA C/C++ (2026, September 3-4) (Register)
- Part 2 - Scaling CUDA-Accelerated Applications (this course) (2026, September 7-9) (Register)
Instructors
- Dr. Sebastian Kuckuk, NHR@FAU, certified NVIDIA DLI Ambassador
- Aditya Ujeniya, NHR@FAU
- Markus Velten, NHR@TUD, certified NVIDIA DLI Ambassador
This course is organized by Erlangen National High Performance Computing Center (NHR@FAU) in collaboration with NHR@TUD.
Course Description
Scaling a GPU application beyond a single accelerator requires both intra-node and inter-node parallelism, and this course covers both. The first part of the course covers CUDA streams, multi-GPU execution within a node, and direct peer-to-peer GPU memory access. The second part extends that foundation to multi-node deployments using CUDA-aware MPI and NVSHMEM, including domain decomposition and halo exchange patterns. The material progresses from a CPU baseline through managed memory and algorithmic work partitioning to fully distributed execution.
This course was developed to replace two formerly separate NVIDIA DLI courses, Accelerating CUDA C++ Applications with Multiple GPUs and Scaling CUDA C++ Applications to Multiple Nodes, which were first put on hold and then discontinued in 2025 and 2026.
Prerequisites
Knowledge
- Experience with CUDA C++ GPU programming, including memory allocation, kernel launches, grid-stride loops, and error handling (equivalent to the Introduction to CUDA C/C++ course)
- Familiarity with the Linux command line as well as compiling and running CUDA applications
Technical
- A modern web browser (for JupyterHub access to NHR@FAU's HPC clusters)
- A local installation of NVIDIA Nsight Systems
Course Structure
- CPU baseline and GPU porting: managed memory, algorithmic work partitioning
- CUDA streams and copy/compute overlap: concurrent execution and Nsight Systems profiling
- Multi-GPU programming: device management, workload indexing, and peer-to-peer communication
- Multi-node parallelism: MPI fundamentals, CUDA-aware MPI, and halo exchanges
- NVSHMEM: symmetric memory model, GPU-initiated transfers, and distributed solvers
Learning Outcomes
After completing this course, you will be able to:
- Use concurrent CUDA streams to overlap memory transfers with GPU computation
- Scale CUDA C++ workloads across multiple GPUs within a single compute node
- Enable and exploit direct peer-to-peer GPU memory access for efficient intra-node communication
- Write portable, scalable SPMD code that uses CUDA-aware MPI for inter-node GPU communication
- Apply NVSHMEM for GPU-initiated data transfers using the symmetric memory model
- Implement domain decomposition and halo exchange patterns for distributed GPU workloads
- Profile multi-GPU execution and identify performance bottlenecks with NVIDIA Nsight Systems
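To give a rough idea of what domain decomposition and halo exchange mean in practice, the following is a minimal host-only C++ sketch and not taken from the course material. It splits a 1D array across a number of simulated ranks inside a single process and then fills each rank's halo cells from its neighbors. All names (Range, owned_range, scatter, exchange_halos) are hypothetical, and the plain copies in exchange_halos are an assumption standing in for what would be MPI or NVSHMEM transfers between GPUs in a real multi-node code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open index range [begin, end) of the global cells owned by one rank.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split n cells across num_ranks as evenly as possible.
// The first (n % num_ranks) ranks receive one extra cell.
Range owned_range(std::size_t n, int num_ranks, int rank) {
    assert(num_ranks > 0 && rank >= 0 && rank < num_ranks);
    const std::size_t ranks = static_cast<std::size_t>(num_ranks);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = n / ranks;
    const std::size_t rem = n % ranks;
    const std::size_t begin = r * base + (r < rem ? r : rem);
    const std::size_t size = base + (r < rem ? 1 : 0);
    return {begin, begin + size};
}

// Build one local buffer per simulated rank.
// Layout of each buffer: [left halo | owned cells | right halo].
std::vector<std::vector<double>> scatter(const std::vector<double>& global,
                                         int num_ranks) {
    std::vector<std::vector<double>> local(static_cast<std::size_t>(num_ranks));
    for (int r = 0; r < num_ranks; ++r) {
        const Range range = owned_range(global.size(), num_ranks, r);
        auto& buf = local[static_cast<std::size_t>(r)];
        buf.assign(range.end - range.begin + 2, 0.0);
        for (std::size_t i = range.begin; i < range.end; ++i) {
            buf[i - range.begin + 1] = global[i];
        }
    }
    return local;
}

// Halo exchange between neighboring ranks.
// Assumption: these in-process copies stand in for inter-GPU transfers.
// The outermost halos (left of rank 0, right of the last rank) stay untouched.
void exchange_halos(std::vector<std::vector<double>>& local) {
    for (std::size_t r = 0; r + 1 < local.size(); ++r) {
        auto& left = local[r];
        auto& right = local[r + 1];
        // Each rank must own at least one cell (buffer size >= 3).
        assert(left.size() >= 3 && right.size() >= 3);
        right.front() = left[left.size() - 2];  // last owned cell of left rank
        left.back() = right[1];                 // first owned cell of right rank
    }
}
```

For example, scattering the ten values 0 to 9 over three ranks gives the owned ranges [0, 4), [4, 7) and [7, 10). After calling exchange_halos, the middle rank's buffer holds its own cells 4, 5, 6 framed by the neighbor values 3 and 7, which is what a stencil update on that rank would need.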
Registration, Wait List and Withdrawal Policy
Registration
Please register at the bottom of this page. Registration is open until a few days before the course starts, or until the course is fully booked.
Prices and Eligibility
This course is open and free of charge for participants affiliated with academic institutions in European Union (EU) member states and Horizon 2020-associated countries.
Wait List
If the course reaches its maximum capacity, you can request to join the wait list by sending an email to nhr-training@fau.de. Please include your name and university affiliation in the message.
Withdrawal Policy
Please only register if you are committed to attending the course. No-shows will be blacklisted and excluded from future events.
If you need to withdraw your registration, please either cancel it directly through the registration system or send an email to nhr-training@fau.de.
Additional Courses
You can find an up-to-date list of all courses offered by NHR@FAU at https://hpc.fau.de/teaching/tutorials-and-courses/.