[Online] GPU Performance Engineering

Schedule & Format

  • Date: September 28-30, 2026
  • Times:
    • Sep 28: 9:00 - 13:00 CEST
    • Sep 29: 9:00 - 13:00 CEST
    • Sep 30: 9:00 - 13:00 CEST
  • Format: Three half-days
  • Location: Online via Zoom
  • Language: English

Registered participants will receive the video conferencing link by email the day before the course starts.

Instructor

This course is organized by the Erlangen National High Performance Computing Center (NHR@FAU).

Course Description

Porting code to the GPU can yield significant speedups, but achieving good GPU utilization requires understanding where and why performance falls short. This advanced course addresses exactly that challenge: it introduces NVIDIA's profiling ecosystem - Nsight Systems for application-level timeline analysis and Nsight Compute for individual kernel assessment - and pairs them with resource-based performance models that let developers judge how close their code comes to the hardware's theoretical limits. Instrumentation with NVTX markers is also covered to improve profiler output legibility.

This course is the significantly extended successor to the earlier GPU Performance Analysis course, which was offered as a standalone half-day course until 2025. NHR@FAU also offers a condensed two-hour GPU Performance Analysis Module for integration into summer schools and other larger events.

Prerequisites

Knowledge

  • Experience with GPU programming in C/C++ using CUDA or OpenMP offloading

Technical

  • A modern web browser (for JupyterHub access to NHR@FAU's HPC clusters)
  • A local installation of NVIDIA Nsight Systems and Nsight Compute (no local GPU required)

Course Structure

  • GPU architecture fundamentals and the roofline model for GPUs
  • Application instrumentation with NVTX and timeline analysis with Nsight Systems
  • Kernel-level profiling with Nsight Compute: metrics, roofline, and memory analysis
  • Interpreting bottleneck indicators and guiding optimization decisions

Learning Outcomes

After completing this course, you will be able to:

  • Instrument GPU applications with NVTX markers to produce interpretable profiler output
  • Use the Nsight Systems CLI and GUI to capture and analyze application-level timelines
  • Use the Nsight Compute CLI and GUI to assess individual CUDA kernel performance
  • Apply resource-based performance models to determine theoretical performance limits
  • Identify the dominant bottleneck of a GPU kernel and quantify the gap to peak performance
  • Prioritize optimization effort based on profiling data and performance model predictions
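
As a rough illustration of the model-based outcomes above, the roofline model named in the course structure bounds attainable performance by the smaller of peak compute throughput and arithmetic intensity times peak memory bandwidth. The sketch below is not course material: the function name `roofline_limit`, the device peaks, and the "measured" value are all hypothetical numbers chosen for illustration.

```python
def roofline_limit(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Return (attainable FLOP/s, limiting resource) under the simple roofline model."""
    intensity = flops / bytes_moved              # arithmetic intensity in FLOP/byte
    bandwidth_limit = intensity * peak_bandwidth  # ceiling imposed by memory traffic
    if bandwidth_limit < peak_flops:
        return bandwidth_limit, "memory bandwidth"
    return peak_flops, "compute"


# Hypothetical device peaks (assumptions, not a specific GPU).
PEAK_FLOPS = 10e12      # 10 TFLOP/s double precision
PEAK_BANDWIDTH = 1e12   # 1 TB/s main memory bandwidth

# Triad-like kernel a[i] = b[i] + s * c[i] in double precision:
# 2 FLOP and 24 bytes (two loads, one store) per element.
limit, resource = roofline_limit(2, 24, PEAK_FLOPS, PEAK_BANDWIDTH)

# Hypothetical measured performance, e.g. derived from profiler counters.
measured = 60e9
fraction = measured / limit

print(f"limit: {limit / 1e9:.1f} GFLOP/s, bound by {resource}")
print(f"measured kernel reaches {fraction:.0%} of the model limit")
```

With these assumed numbers the kernel is limited by memory bandwidth, so further tuning of its arithmetic would not be expected to help; the remaining gap to the limit indicates how much headroom data-access optimizations could still recover.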

Registration, Wait List and Withdrawal Policy

Registration

Please register at the bottom of this page. Registration is open until a few days before the course starts, or until the course is fully booked.

Prices and Eligibility

The course is free for participants affiliated with academic institutions in EU member states and Horizon 2020-associated countries.

Wait List

To be placed on the wait list, send an email to nhr-training@fau.de with your name and university affiliation.

Withdrawal Policy

If you need to withdraw your registration, please either cancel it directly through the registration system or send an email to nhr-training@fau.de. No-shows will be excluded from future events.

Additional Courses

You can find an up-to-date list of all courses offered by NHR@FAU at https://hpc.fau.de/teaching/tutorials-and-courses/.

Registration
Participants: 0 / 40