[Online] Accelerating CUDA C++ Applications with Multiple GPUs

Description

Date and Time

The course will be held online on March 19 from 9:00 a.m. to 5:00 p.m. (CET).

Registered participants will receive the Zoom participation link via email the day before the course begins.

 

This course is part two of the three-event series, "From Zero to Multi-Node GPU Programming". Please register individually for each day you wish to attend.

Prerequisites

A free NVIDIA developer account is required to access the course material. Please register before the training at https://learn.nvidia.com/join.

Participants should additionally meet the following requirements:

  • Successful completion of Part 1: Fundamentals of Accelerated Computing with CUDA C/C++, or equivalent experience in implementing CUDA C/C++ applications (a minimal sketch of these skills follows this list), including:
    • Memory allocation
    • Host-to-device and device-to-host memory transfers
    • Kernel launches
    • Grid-stride loops
    • CUDA error handling
  • Familiarity with the Linux command line
  • Experience using Makefiles
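
For orientation, the sketch below illustrates the prerequisite CUDA C/C++ skills listed above: device memory allocation, host-to-device and device-to-host transfers, a kernel launch, a grid-stride loop, and basic error handling. The kernel, array size, and launch configuration are arbitrary choices for illustration and are not material from the course.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Grid-stride loop: each thread processes multiple elements of the array.
    __global__ void scale(float *data, int n, float factor) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;
        float *h = new float[n];
        for (int i = 0; i < n; ++i) h[i] = 1.0f;

        float *d;
        cudaMalloc(&d, n * sizeof(float));                            // device memory allocation
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // host-to-device transfer

        scale<<<256, 256>>>(d, n, 2.0f);                              // kernel launch

        cudaError_t err = cudaGetLastError();                         // CUDA error handling
        if (err != cudaSuccess)
            printf("Kernel launch failed: %s\n", cudaGetErrorString(err));

        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);  // device-to-host transfer
        printf("h[0] = %f\n", h[0]);

        cudaFree(d);
        delete[] h;
        return 0;
    }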

 

Learning Objectives

At the conclusion of the workshop, you will be able to:

  • Use concurrent CUDA streams to overlap memory transfers with GPU computation
  • Scale workloads across available GPUs on a single node
  • Combine memory copy/compute overlap with multiple GPUs
  • Utilize the NVIDIA Nsight Systems timeline to identify improvement opportunities and assess the impact of the techniques covered in the workshop

 

Course Structure

Introduction to CUDA Streams

  • Get familiar with your GPU-accelerated interactive JupyterLab environment.
  • Familiarize yourself with the single-GPU CUDA C++ application that will serve as the starting point for the course.
  • Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems.
  • Learn the rules that govern concurrent CUDA stream behavior.
  • Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers.
  • Utilize multiple CUDA streams for launching GPU kernels.
  • Observe multiple streams in the Nsight Systems timeline view (a minimal multi-stream sketch follows this list).
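
As a rough illustration only (the stream count, kernel, and data sizes below are assumptions, not the course's actual starting application), the following sketch shows the basic pattern of issuing asynchronous transfers and kernel launches into separate, non-default CUDA streams so that independent work can overlap:

    #include <cuda_runtime.h>

    __global__ void increment(float *data, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            data[i] += 1.0f;
    }

    int main() {
        const int numStreams = 4;      // number of streams (an arbitrary choice here)
        const int n = 1 << 22;         // elements per stream's independent workload

        float *h[numStreams], *d[numStreams];
        cudaStream_t streams[numStreams];

        for (int s = 0; s < numStreams; ++s) {
            cudaStreamCreate(&streams[s]);
            cudaMallocHost(&h[s], n * sizeof(float));  // pinned host memory enables async copies
            cudaMalloc(&d[s], n * sizeof(float));
        }

        for (int s = 0; s < numStreams; ++s) {
            // Copies and the kernel are ordered within a stream, but work issued
            // into different streams is free to overlap.
            cudaMemcpyAsync(d[s], h[s], n * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            increment<<<128, 256, 0, streams[s]>>>(d[s], n);
            cudaMemcpyAsync(h[s], d[s], n * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[s]);
        }

        cudaDeviceSynchronize();       // wait for all streams to finish

        for (int s = 0; s < numStreams; ++s) {
            cudaStreamDestroy(streams[s]);
            cudaFreeHost(h[s]);
            cudaFree(d[s]);
        }
        return 0;
    }

In an Nsight Systems timeline (for example, collected with nsys profile), the per-stream copies and kernels of such a program would appear interleaved rather than strictly back to back.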

Multiple GPUs with CUDA C++

  • Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++.
  • Explore robust indexing strategies for the flexible use of multiple GPUs in applications.
  • Refactor the single-GPU CUDA C++ application to utilize multiple GPUs.
  • See multiple-GPU utilization in the Nsight Systems timeline (a minimal multi-GPU sketch follows this list).
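
One common indexing pattern for a single node is sketched below (with an assumed array size and a placeholder kernel, not the course's application): the host array is split into per-GPU chunks, and cudaSetDevice selects which GPU subsequent allocations, copies, and launches target.

    #include <cuda_runtime.h>

    __global__ void increment(float *data, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            data[i] += 1.0f;
    }

    int main() {
        int numGpus = 0;
        cudaGetDeviceCount(&numGpus);
        if (numGpus == 0) return 1;

        const int n = 1 << 24;
        const int chunk = n / numGpus;         // assumes n divides evenly across GPUs

        float *h;
        cudaMallocHost(&h, n * sizeof(float)); // pinned host memory

        // One device buffer per GPU; each GPU works on its own slice of the host array.
        float **d = new float *[numGpus];
        for (int g = 0; g < numGpus; ++g) {
            cudaSetDevice(g);                  // subsequent CUDA calls target GPU g
            cudaMalloc(&d[g], chunk * sizeof(float));
            cudaMemcpyAsync(d[g], h + g * chunk, chunk * sizeof(float),
                            cudaMemcpyHostToDevice);
            increment<<<128, 256>>>(d[g], chunk);
            cudaMemcpyAsync(h + g * chunk, d[g], chunk * sizeof(float),
                            cudaMemcpyDeviceToHost);
        }

        // Wait for every GPU to finish, then clean up.
        for (int g = 0; g < numGpus; ++g) {
            cudaSetDevice(g);
            cudaDeviceSynchronize();
            cudaFree(d[g]);
        }
        cudaFreeHost(h);
        delete[] d;
        return 0;
    }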

Copy/Compute Overlap with CUDA Streams

  • Learn the key concepts for effectively performing copy/compute overlap.
  • Explore robust indexing strategies for the flexible use of copy/compute overlap in applications.
  • Refactor the single-GPU CUDA C++ application to perform copy/compute overlap.
  • See copy/compute overlap in the Nsight Systems timeline (a minimal single-GPU overlap sketch follows this list).
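
For illustration (the chunk count, sizes, and kernel are assumptions), copy/compute overlap on a single GPU typically means slicing one large array into chunks and issuing each chunk's host-to-device copy, kernel, and device-to-host copy into its own stream, so that the copies of one chunk can overlap with the computation of another:

    #include <cuda_runtime.h>

    __global__ void increment(float *data, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            data[i] += 1.0f;
    }

    int main() {
        const int numStreams = 4;               // number of chunks/streams (arbitrary here)
        const int n = 1 << 24;
        const int chunk = n / numStreams;       // assumes n divides evenly

        float *h, *d;
        cudaMallocHost(&h, n * sizeof(float));  // pinned host memory enables async copies
        cudaMalloc(&d, n * sizeof(float));

        cudaStream_t streams[numStreams];
        for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

        for (int s = 0; s < numStreams; ++s) {
            const int offset = s * chunk;       // per-chunk offset into the full array
            cudaMemcpyAsync(d + offset, h + offset, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            increment<<<128, 256, 0, streams[s]>>>(d + offset, chunk);
            cudaMemcpyAsync(h + offset, d + offset, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[s]);
        }

        cudaDeviceSynchronize();
        for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
        cudaFree(d);
        cudaFreeHost(h);
        return 0;
    }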

Copy/Compute Overlap with Multiple GPUs

  • Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs.
  • Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs.
  • Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs.
  • Observe the performance benefits of copy/compute overlap on multiple GPUs.
  • See copy/compute overlap on multiple GPUs in the Nsight Systems timeline (a combined sketch follows this list).
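
Combining the two previous patterns (again a sketch with assumed sizes, stream counts, and a placeholder kernel) leads to a two-level decomposition: the host array is first split across GPUs, and each GPU's slice is then chunked across that GPU's own streams.

    #include <cuda_runtime.h>

    __global__ void increment(float *data, int n) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            data[i] += 1.0f;
    }

    int main() {
        int numGpus = 0;
        cudaGetDeviceCount(&numGpus);
        if (numGpus == 0) return 1;

        const int numStreams = 4;                   // streams per GPU (arbitrary here)
        const int n = 1 << 24;
        const int gpuChunk = n / numGpus;           // assumes even division
        const int streamChunk = gpuChunk / numStreams;

        float *h;
        cudaMallocHost(&h, n * sizeof(float));      // pinned host memory

        float **d = new float *[numGpus];
        for (int g = 0; g < numGpus; ++g) {
            cudaSetDevice(g);
            cudaMalloc(&d[g], gpuChunk * sizeof(float));

            cudaStream_t streams[numStreams];
            for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

            for (int s = 0; s < numStreams; ++s) {
                const int hOff = g * gpuChunk + s * streamChunk;  // offset into the host array
                const int dOff = s * streamChunk;                 // offset into this GPU's buffer
                cudaMemcpyAsync(d[g] + dOff, h + hOff, streamChunk * sizeof(float),
                                cudaMemcpyHostToDevice, streams[s]);
                increment<<<128, 256, 0, streams[s]>>>(d[g] + dOff, streamChunk);
                cudaMemcpyAsync(h + hOff, d[g] + dOff, streamChunk * sizeof(float),
                                cudaMemcpyDeviceToHost, streams[s]);
            }
            // Stream destruction is omitted for brevity; streams are released at process exit.
        }

        // Wait for all GPUs, then release memory.
        for (int g = 0; g < numGpus; ++g) {
            cudaSetDevice(g);
            cudaDeviceSynchronize();
            cudaFree(d[g]);
        }
        cudaFreeHost(h);
        delete[] d;
        return 0;
    }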

 

Certification

Upon successfully completing the course assessments, participants will receive an NVIDIA DLI Certificate, recognizing their subject matter expertise and supporting their professional career growth.

 

Language

The course will be conducted in English.

 

Instructors

Dr. Sebastian Kuckuk and Markus Velten, both certified NVIDIA DLI Ambassadors.

The course is co-organised by NHR@FAU, NHR@TUD, and the NVIDIA Deep Learning Institute (DLI).

 

Prices and Eligibility

This course is open and free of charge for participants affiliated with academic institutions in European Union (EU) member states and Horizon 2020-associated countries.

 

Withdrawal Policy

We kindly ask participants to register only if they are committed to attending the course. No-shows will be blacklisted and excluded from future events.

If you need to withdraw your registration, please either cancel it directly through the registration system or send an email to sebastian.kuckuk@fau.de.

 

Wait List

If the course reaches its maximum capacity, you can request to join the waitlist by sending an email to sebastian.kuckuk@fau.de. Please include your name and university affiliation in the message.

 

Additional Courses

You can find an up-to-date list of all courses offered by NHR@FAU at https://hpc.fau.de/teaching/tutorials-and-courses/ and by NHR@TUD at https://tu-dresden.de/zih/hochleistungsrechnen/nhr-training .
