[Online] GPU Performance Engineering

Schedule & Format
- Date: September 28-30, 2026
- Times:
  - Sep 28: 9:00 - 13:00 CE(S)T
  - Sep 29: 9:00 - 13:00 CE(S)T
  - Sep 30: 9:00 - 13:00 CE(S)T
- Format: Three half-days
- Location: Online via Zoom
- Language: English
Registered participants will receive the video conferencing link via email on the day before the course.
Instructor
- Dr. Sebastian Kuckuk, NHR@FAU, certified NVIDIA DLI Ambassador
This course is organized by the Erlangen National High Performance Computing Center (NHR@FAU).
Course Description
Porting code to the GPU can yield significant speedups, but achieving good GPU utilization requires understanding where and why performance falls short. This advanced course addresses exactly that challenge: it introduces NVIDIA's profiling ecosystem - Nsight Systems for application-level timeline analysis and Nsight Compute for individual kernel assessment - and pairs them with resource-based performance models that let developers judge how close their code comes to the hardware's theoretical limits. Instrumentation with NVTX markers is also covered to improve profiler output legibility.
This course is the significantly extended successor to the earlier GPU Performance Analysis course, which was offered as a standalone half-day course until 2025. NHR@FAU also offers a condensed two-hour GPU Performance Analysis Module for integration into summer schools and other larger events.
Prerequisites
Knowledge
- Experience with GPU programming in CUDA or OpenMP offloading using C/C++
Technical
- A modern web browser (for JupyterHub access to NHR@FAU's HPC clusters)
- A local installation of NVIDIA Nsight Systems and Nsight Compute (no local GPU required)
Course Structure
- GPU architecture fundamentals and the roofline model for GPUs
- Application instrumentation with NVTX and timeline analysis with Nsight Systems
- Kernel-level profiling with Nsight Compute: metrics, roofline, and memory analysis
- Interpreting bottleneck indicators and guiding optimization decisions
Learning Outcomes
After completing this course, you will be able to:
- Instrument GPU applications with NVTX markers to produce interpretable profiler output
- Use the Nsight Systems CLI and GUI to capture and analyze application-level timelines
- Use the Nsight Compute CLI and GUI to assess individual CUDA kernel performance
- Apply resource-based performance models to determine theoretical performance limits
- Identify the dominant bottleneck of a GPU kernel and quantify the gap to peak performance
- Prioritize optimization effort based on profiling data and performance model predictions
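As a rough illustration of what a resource-based performance model looks like, the following is a minimal roofline-style sketch in C++. It is not taken from the course material. The struct and function names are hypothetical, and the device limits must be supplied by the user for the actual target GPU. The sketch bounds a kernel's attainable throughput by the smaller of peak compute and arithmetic intensity times peak memory bandwidth, then relates a measured runtime to that bound.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical device limits (assumption: supplied by the user from
// vendor documentation or microbenchmarks of the actual target GPU).
struct DeviceLimits {
    double peak_gflops;     // peak arithmetic throughput in GFLOP/s
    double peak_bandwidth;  // peak memory bandwidth in GB/s
};

// Work and data traffic of a single kernel launch.
struct KernelCost {
    double flops;  // floating-point operations executed
    double bytes;  // bytes moved to and from device memory
};

// Arithmetic intensity in FLOP/byte.
double arithmetic_intensity(const KernelCost& k) {
    assert(k.bytes > 0.0);
    return k.flops / k.bytes;
}

// Roofline bound in GFLOP/s: min(peak compute, intensity * peak bandwidth).
double attainable_gflops(const DeviceLimits& d, const KernelCost& k) {
    return std::min(d.peak_gflops, arithmetic_intensity(k) * d.peak_bandwidth);
}

// Resource that limits the kernel under this simple two-resource model.
std::string limiting_resource(const DeviceLimits& d, const KernelCost& k) {
    return arithmetic_intensity(k) * d.peak_bandwidth < d.peak_gflops
               ? "memory"
               : "compute";
}

// Fraction of the roofline bound reached by a measured runtime in seconds.
double fraction_of_bound(const DeviceLimits& d, const KernelCost& k,
                         double seconds) {
    assert(seconds > 0.0);
    const double achieved_gflops = k.flops / seconds / 1e9;
    return achieved_gflops / attainable_gflops(d, k);
}
```

For example, a triad-like kernel performing 2 FLOP and moving 24 bytes per element has an arithmetic intensity of 1/12 FLOP/byte. On an assumed device with 1000 GB/s bandwidth and 10000 GFLOP/s peak compute, the model classifies it as memory-bound with a limit of about 83 GFLOP/s. A measured runtime that yields half of that throughput would give a fraction of 0.5.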
Registration, Wait List and Withdrawal Policy
Registration
Please register at the bottom of this page. Registration is open until a few days before the course starts, or until the course is fully booked.
Prices and Eligibility
Free for participants affiliated with academic institutions in EU member states and Horizon 2020-associated countries.
Wait List
To join the wait list, email nhr-training@fau.de with your name and university affiliation.
Withdrawal Policy
If you need to withdraw your registration, please either cancel it directly through the registration system or send an email to nhr-training@fau.de. No-shows will be excluded from future events.
Additional Courses
You can find an up-to-date list of all courses offered by NHR@FAU at https://hpc.fau.de/teaching/tutorials-and-courses/.