Author: Jeng Bai-Cheng
A code example is worth a thousand words. This repository hosts fundamental but useful examples, each only a few dozen lines of code. Most of them come from my past experience on HPC projects, but readers do not need an HPC background to understand them.
- acc_async - a faster way to enqueue GPU routines (kernels)
- access_efficiency - a faster way to access a GPU array
- alternative_nested_parallelism - an alternative to nested parallelism on the GPU
- array_setting - a faster way to initialize a GPU array
- atomic_op - use atomic operations to maximize parallelism
- cuda_c_binding - call CUDA C from Fortran
- cuda_graph - a faster way to launch GPU kernels
- device_routine - using GPU routines: call other routines from within a GPU kernel
- device_variable - using GPU variables: access a global variable from another module within a GPU kernel
- hybrid_omp_acc - using OpenMP and OpenACC together
- cuda_mpi_sendrecv - CUDA-aware MPI, a faster way to use MPI on the GPU
- cuda_unified_memory_mpi_bcast - using CUDA Unified Memory with MPI, a more convenient way to use MPI on the GPU
- nccl_alltoall - a faster Alltoall on the GPU
- nccl_alltoallv - a faster Alltoallv on the GPU
- auto_nvtx - have the compiler insert CPU profiling routines automatically
- profiling_range - demonstrates focused profiling with a profiling tool
Requirements:
- NVIDIA HPC SDK 21.3
To install the HPC SDK via Docker, visit NVIDIA GPU Cloud: https://ngc.nvidia.com/catalog/containers/nvidia:nvhpc/tags
Alternatively, download the HPC SDK from the official website: https://developer.nvidia.com/hpc-sdk
To build an example:
$ cd <folder>
$ make
To run it:
$ cd <folder>
$ ./<executable>