[Online] Fundamentals of Accelerated Computing with Modern CUDA C++

Schedule & Format
- Date: October 27-29, 2026
- Times:
  - Oct 27: 9:00 - 13:00 CE(S)T
  - Oct 28: 9:00 - 13:00 CE(S)T
  - Oct 29: 9:00 - 13:00 CE(S)T
- Format: Three half-days
- Location: Online via Zoom
- Language: English
Registered participants will receive the video conferencing link via email on the day before the course.
Instructor
- Dr. Sebastian Kuckuk, NHR@FAU, certified NVIDIA DLI Ambassador
This course is organized by the Erlangen National High Performance Computing Center (NHR@FAU) in collaboration with the NVIDIA Deep Learning Institute (DLI).
Course Description
This course teaches GPU acceleration of C++ applications using CUDA, with an emphasis on modern C++ idioms rather than low-level GPU APIs. Starting from library-provided parallel algorithms that execute transparently on the GPU, it progresses through custom CUDA kernels, thread hierarchies, shared memory, and concurrent streams - covering the full range from high-level abstractions to fine-grained GPU control. No prior CUDA or GPU programming experience is required.
Further information about this tutorial can be found on the NVIDIA DLI course page.
Prerequisites
Knowledge
- C++ programming experience, including lambda expressions and standard library algorithms
Technical
- A free NVIDIA developer account
- A local installation of NVIDIA Nsight Systems is recommended
Course Structure
- GPU programming fundamentals: writing and launching CUDA-accelerated C++ code; applying parallel algorithms on the GPU
- Concurrency and profiling: CUDA streams, asynchronous data transfers, and code analysis with NVIDIA Nsight Systems
- Custom kernel development: thread hierarchies, shared memory, and cooperative parallel algorithms
Learning Outcomes
After completing this course, you will be able to:
- Accelerate C++ applications by writing, compiling, and running GPU code with CUDA
- Apply parallel algorithms to GPU workloads without writing custom kernels
- Manage CPU-GPU data movement and optimize memory access patterns
- Write custom CUDA kernels and manage thread hierarchies and shared memory
- Overlap computation with data transfers using concurrent CUDA streams
- Profile GPU code and identify performance bottlenecks with NVIDIA Nsight Systems
Registration, Wait List and Withdrawal Policy
Registration
Please register at the bottom of this page. Registration is open until a few days before the course starts, or until the course is fully booked.
Prices and Eligibility
The course is free of charge for participants affiliated with academic institutions in EU member states and Horizon 2020-associated countries.
Wait List
To be placed on the wait list, email nhr-training@fau.de with your name and university affiliation.
Withdrawal Policy
If you need to withdraw your registration, please either cancel it directly through the registration system or send an email to nhr-training@fau.de. No-shows will be excluded from future events.
Additional Courses
You can find an up-to-date list of all courses offered by NHR@FAU at https://hpc.fau.de/teaching/tutorials-and-courses/.