[Online] Choosing GPU Programming Approaches

Schedule & Format
- Date: November 9-10, 2026
- Times:
- Nov 9: 9:00 - 13:00 CET
- Nov 10: 9:00 - 13:00 CET
- Format: Two half-days
- Location: Online via Zoom
- Language: English
Registered participants will receive the video conferencing link via email on the day before the course.
Instructor
- Dr. Sebastian Kuckuk, NHR@FAU, certified NVIDIA DLI Ambassador
This course is organized by the Erlangen National High Performance Computing Center (NHR@FAU).
Course Description
The GPU programming landscape has grown from a single dominant framework (CUDA) into a diverse ecosystem of vendor-neutral and performance-portable alternatives. Choosing the right approach for a given application - considering hardware targets, portability requirements, team expertise, and performance goals - is a non-trivial decision. This course surveys the most widely used GPU programming models: CUDA/HIP, SYCL, standard C++ parallel algorithms, Thrust, OpenACC, OpenMP offloading, and Kokkos. For each approach, participants see representative code, learn the key abstractions, and assess the trade-offs in portability, expressiveness, and performance.
Prerequisites
Knowledge
- Familiarity with modern C++ programming (templates, lambdas, and the STL)
- Prior experience with at least one GPU programming approach is recommended but not required
Technical
- A modern web browser (exercises run on NHR@FAU's HPC clusters via JupyterHub - no local installation required)
Course Structure
- GPU programming landscape: hardware diversity, portability challenges, and framework taxonomy
- Low-level, vendor-specific approaches: CUDA and HIP
- Directive-based open standards: OpenACC and OpenMP target offloading
- Performance portability libraries: Kokkos
- C++ abstraction layers: SYCL, Thrust, and standard library parallel algorithms
- Performance analysis: profiling with Nsight Systems and Nsight Compute, and common optimization patterns
- Hands-on programming challenge: porting STREAM, a 2D stencil, and a conjugate-gradient solver across multiple approaches
- Comparative evaluation: portability, performance, and practical considerations for framework selection
Learning Outcomes
After completing this course, you will be able to:
- Describe the key abstractions and execution model of each major GPU programming framework: CUDA/HIP, SYCL, OpenACC, OpenMP offloading, Kokkos, Thrust, and standard C++ parallel algorithms
- Implement a representative kernel in multiple frameworks and compare the resulting code
- Evaluate each approach across the dimensions of portability, performance, and programming effort
- Select the most appropriate GPU programming model for a given combination of hardware targets and application requirements
- Profile a GPU application with Nsight Systems and Nsight Compute and relate observed performance to the choice of approach
- Identify NHR@FAU courses and resources for deepening expertise in any specific framework
Registration, Wait List and Withdrawal Policy
Registration
Please register at the bottom of this page. Registration is open until a few days before the course starts, or until the course is fully booked.
Prices and Eligibility
The course is free of charge for participants affiliated with academic institutions in EU member states and Horizon 2020-associated countries.
Wait List
If the course is fully booked, you can join the wait list by emailing nhr-training@fau.de with your name and university affiliation.
Withdrawal Policy
If you need to withdraw your registration, please either cancel it directly through the registration system or send an email to nhr-training@fau.de. No-shows will be excluded from future events.
Additional Courses
You can find an up-to-date list of all courses offered by NHR@FAU at https://hpc.fau.de/teaching/tutorials-and-courses/.