
Week 1: Intro to GPUs and writing your first kernel!

[Figure: the GPU devotes more transistors to data processing than the CPU]

Can you guess which architecture more closely resembles a CPU? What about a GPU?

Recommended Readings:

Motivation for GPUs in Deep Learning
A gentle introduction to CUDA

Further resources/references to use:

PMPP Book Access
NVIDIA GPU Glossary
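
As a concrete starting point, here is a minimal sketch of a first kernel: element-wise vector addition with one thread per element. The kernel name, launch configuration, and use of unified memory are illustrative choices, not part of any assignment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element: a classic "hello world" CUDA kernel.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against extra threads
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory keeps the example short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover every element
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);               // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```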

Week 2: Learning to optimize your kernels!

[Figure: matrix multiplication (GEMM)]

From the image, how many FLOPs (floating-point operations) does a matrix multiplication perform?

Recommended Readings:

Aalto University's Course on GPU Programming
Simon's Blog on SGEMM (Kernels 1-5 are the most relevant for the assignment)
How to use NCU profiler
Roofline Models

Further references to use:

NCU Documentation
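
As a reference point for the question above: multiplying an M×K matrix by a K×N matrix takes M·N·K multiply-adds, i.e. roughly 2·M·N·K FLOPs, which is the number you compare against hardware peak in a roofline analysis. Below is an unoptimized, one-thread-per-output SGEMM sketch of the kind Simon's Kernel 1 starts from; the names and the commented launch configuration are illustrative.

```cuda
#include <cuda_runtime.h>

// Naive SGEMM: C = A * B, with A (M x K), B (K x N), C (M x N), row-major.
// Each thread computes one output element, so the total work is
// M * N * K multiply-adds -> about 2 * M * N * K FLOPs.
__global__ void sgemm_naive(int M, int N, int K,
                            const float* A, const float* B, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];  // one FMA per iteration
        C[row * N + col] = acc;
    }
}

// Illustrative launch: 32x32 threads per block, grid rounded up to cover C.
// dim3 block(32, 32);
// dim3 grid((N + 31) / 32, (M + 31) / 32);
// sgemm_naive<<<grid, block>>>(M, N, K, dA, dB, dC);
```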

Week 3 and 4: Learning to optimize with Tensor Cores!

[Figure: Tensor Core matrix multiply]

How much faster are Tensor Core operations compared to FP32 operations on CUDA cores?

Recommended Readings:

A sequel to Simon's Blog, on HGEMM
Bruce's Blog on HGEMM
Spatter's Blog on HGEMM
NVIDIA's Presentation on A100 Tensor Cores

Further references to use:

Primer on Inline PTX Assembly
CUTLASS GEMM Documentation
NVIDIA PTX ISA Documentation (Chapter 9.7 is most relevant)
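
For a first taste of Tensor Cores before dropping down to inline PTX, here is a minimal sketch using the nvcuda::wmma API, where one warp computes a single 16×16×16 FP16 tile with FP32 accumulation. A real HGEMM loops over K tiles and adds shared-memory staging and pipelining, as the blogs above describe; this fragment only shows the fragment/load/mma/store pattern (compile for sm_70 or newer).

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of C = A * B on Tensor Cores.
// A and B are FP16, accumulation is FP32. lda/ldb/ldc are leading dimensions.
__global__ void wmma_tile_gemm(const half* A, const half* B, float* C,
                               int lda, int ldb, int ldc) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);        // start the accumulator at zero
    wmma::load_matrix_sync(a_frag, A, lda);   // the warp cooperatively loads the tile
    wmma::load_matrix_sync(b_frag, B, ldb);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B on Tensor Cores
    wmma::store_matrix_sync(C, c_frag, ldc, wmma::mem_row_major);
}
```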

Week 6: Exploring other parallel optimization techniques!


How could we compute the sum of all the elements of a vector with one million entries?

Recommended Readings:

Primer on Parallel Reduction
Warp level Primitives
Vectorization
Efficient Softmax Kernel
Online Softmax Paper
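
One way to answer the question above (see the readings for details) is a tree reduction: each block reduces its chunk with warp shuffles, and the per-block partial sums are combined with an atomic add. The sketch below assumes the block size is a multiple of 32; names and sizes are illustrative.

```cuda
#include <cuda_runtime.h>

// Sum-reduce a large vector. Each block produces one partial sum via a
// warp-shuffle tree reduction; block partials are combined with atomicAdd.
// A grid-stride loop lets a fixed-size grid cover a million (or more) elements.
__global__ void reduce_sum(const float* in, float* out, int n) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        sum += in[i];

    // Reduce within each warp using shuffle-down (no shared memory needed).
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffff, sum, offset);

    // Lane 0 of each warp writes its result to shared memory,
    // then the first warp reduces those per-warp partials.
    __shared__ float warp_sums[32];
    int lane = threadIdx.x % 32;
    int warp = threadIdx.x / 32;
    if (lane == 0) warp_sums[warp] = sum;
    __syncthreads();

    if (warp == 0) {
        sum = (lane < blockDim.x / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        if (lane == 0) atomicAdd(out, sum);  // combine block partial sums
    }
}
```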

Week 7 & 8: Putting it all together in Flash Attention!


Is the self-attention layer in LLMs compute-bound or memory-bound?

Recommended Readings:

Flash Attention V1 Paper
Aleksa Gordic's Flash Attention Blog
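
One way to reason about the compute- vs memory-bound question is a back-of-the-envelope arithmetic-intensity estimate for standard, unfused attention, in the spirit of the Week 2 roofline reading. The sketch below is a rough model under stated simplifying assumptions (single head, FP16 in HBM, only the dominant terms counted), not an exact accounting; compare the printed FLOPs/byte against your GPU's peak-FLOPs-to-bandwidth ratio.

```cuda
#include <cstdio>

// Rough arithmetic-intensity estimate for standard (unfused) self-attention
// on a single head: S = Q K^T, P = softmax(S), O = P V.
// Counts only the dominant terms; assumes FP16 (2 bytes/element) in HBM.
int main() {
    double N = 4096;   // sequence length (illustrative)
    double d = 64;     // head dimension (illustrative)

    // FLOPs: about 2*N*N*d for Q K^T plus 2*N*N*d for P V.
    double flops = 4.0 * N * N * d;

    // Unfused HBM traffic is dominated by writing and re-reading the
    // N x N score/probability matrices (~4 N^2 element accesses);
    // Q, K, V, O contribute only ~4 N*d accesses. 2 bytes per element.
    double bytes = 2.0 * (4.0 * N * N + 4.0 * N * d);

    printf("arithmetic intensity ~ %.1f FLOPs/byte\n", flops / bytes);
    // A value well below the GPU's FLOPs-per-byte ratio suggests the unfused
    // layer is memory-bound -- the problem Flash Attention attacks by never
    // materializing the N x N matrix in HBM.
    return 0;
}
```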
