Home
rshipley160 edited this page Nov 8, 2021 · 52 revisions
Welcome to the knowledge base. To get started, all you need is a CUDA-enabled device on which to run the example programs, a basic understanding of C/C++ (more complex topics such as memory management and pointers are reviewed as needed), and the desire to learn.
- What is Parallel Computing?
- What is a GPU?
- Basic CUDA Syntax
- Memory Management on the GPU
a. CUDA Memory Types
b. Using CUDA Memory
- Performance Experiment: On-GPU vs Off-GPU Bandwidth
- Thread and Block Scheduling
- Common Parallel Applications
a. Reduction
b. Matrix Multiplication
- Introduction to Asynchronous Computing
- CUDA Streams
- Asynchronous Memory Transfers
- Performance Experiment: Multi-stream Parallelism
- Thread, Stream, and Device Synchronization
- Event-Based Synchronization and Dependencies
- Performance Experiment: Event-Based Synchronization vs Explicit Synchronization
- The Graph Model
- Creating a CUDA Graph using Stream Capture
- Performance Experiment: Graphs vs Streams vs Synchronous Kernels
- Performance Experiment: Increasing the Amount of Graph Nodes
- CUDA Graph API
- Synchronization & Dependencies Inside CUDA Graphs
- Using Host Functions in Graphs & Streams
- Graph API Node Glossary & Usage Examples