This folder contains the material for an advanced GPU computing course taught in 2023 at the Swiss National Supercomputing Centre (CSCS), ETH Zurich.
The first part of the course, taught by Tim Besard (JuliaHub), focuses on advanced usage of CUDA.jl and on analyzing and optimizing GPU applications written in Julia. It covers:
- Advanced usage of CUDA.jl
  - library integrations and wrappers (CUDA driver API, CUBLAS, etc.)
  - programming models (array abstractions, kernels)
  - memory management
  - task-based concurrent GPU computing
- Performance deep-dive
  - application analysis and optimization (using Nsight Systems)
  - kernel analysis and optimization (using Nsight Compute)
A YouTube recording is available, with the following key timestamps:
- 00:00: Introduction to the course
- 03:23: Introduction to part 1
- 04:59: Presentation of notebook 1-0: Introduction
- 24:19: Presentation of notebook 1-1: Array programming
- 43:18: Presentation of notebook 1-2: Application analysis and optimization
- 1:33:22: Presentation of notebook 1-3: Kernel programming
- 2:25:23: Presentation of notebook 1-4: Kernel analysis and optimization
- 3:19:16: Presentation of notebook 2-1: CUDA libraries
- 3:41:08: Presentation of notebook 2-2: Memory management
- 4:03:44: Presentation of notebook 2-3: Concurrent computing
The second part of the course, taught by Samuel Omlin (CSCS), deals with more concrete examples that matter to the HPC community. A YouTube recording is also available, with the following key timestamps:
- 00:51: High-speed introduction/thoughts on GPU supercomputing
- 08:38: Overview of the course notebooks of part 1
- 11:08: Presentation of notebook 1: Memory copy and performance evaluation
- 43:59: Walkthrough of the solutions to notebook 2: Application performance evaluation and optimization
- 58:29: Presentation on sustainable HPC building block development in Julia
- 1:27:56: Walkthrough of the solutions to notebook 3: Using shared memory
- 1:37:35: Walkthrough of the solutions to notebook 4: Steering registers and using warp level functions
- 1:57:02: Walkthrough of the solutions to notebook 5: Distributed parallelization