rshipley160 edited this page Nov 8, 2021 · 52 revisions

NVIDIA CUDA Knowledge Base

Welcome to the knowledge base. To get started, you need a CUDA-enabled device on which to run the example programs, a basic understanding of C/C++ (more complex topics such as memory management and pointers are reviewed as needed), and the desire to learn.

Fundamental CUDA Concepts

  1. What is Parallel Computing?
  2. What is a GPU?
  3. Basic CUDA Syntax
  4. Memory Management on the GPU
    a. CUDA Memory Types
    b. Using CUDA Memory
  5. Performance Experiment: On-GPU vs Off-GPU Bandwidth
  6. Thread and Block Scheduling
  7. Common Parallel Applications
    a. Reduction
    b. Matrix Multiplication

Asynchronous Computing Using CUDA

  1. Intro to Asynchronous Computing
  2. CUDA Streams
  3. Asynchronous Memory Transfers
  4. Performance Experiment: Multi-stream Parallelism
  5. Thread, Stream, and Device Synchronization
  6. Event-Based Synchronization and Dependencies
  7. Performance Experiment: Event-Based Synchronization vs Explicit Synchronization
  8. The Graph Model
  9. Creating a CUDA Graph using Stream Capture
  10. Performance Experiment: Graphs vs Streams vs Synchronous Kernels
  11. Performance Experiment: Increasing the Amount of Graph Nodes
  12. CUDA Graph API
  13. Synchronization & Dependencies Inside CUDA Graphs
  14. Using Host Functions in Graphs & Streams
  15. Graph API Node Glossary & Usage Examples