
Week 1: Intro to GPUs and writing your first kernel!

[Figure: the GPU devotes more transistors to data processing than the CPU]

Can you guess which architecture more closely resembles a CPU? What about a GPU?

Recommended Readings:

Motivation for GPUs in Deep Learning
A gentle introduction to CUDA

Further resources/references to use:

PMPP Book Access
NVIDIA GPU Glossary
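
As a concrete starting point, here is a minimal sketch of a first kernel: element-wise vector addition with one thread per element. The kernel name, launch configuration, and use of unified memory are illustrative choices, not part of any assignment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element: a classic "hello world" CUDA kernel.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against extra threads
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory keeps the example short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover every element
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);               // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```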

Week 2: Learning to optimize your kernels!

[Figure: matrix multiplication (GEMM)]

From the image, how many FLOPs (floating-point operations) does a matrix multiplication perform?

Recommended Readings:

Aalto University's Course on GPU Programming
Simon's Blog on SGEMM (Kernels 1-5 are the most relevant for the assignment)
How to use NCU profiler
Roofline Models

Further references to use:

NCU Documentation
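
As a reference point for the question above: multiplying an M×K matrix by a K×N matrix takes M·N·K multiply-adds, i.e. roughly 2·M·N·K FLOPs, which is the number you compare against hardware peak in a roofline analysis. Below is an unoptimized, one-thread-per-output SGEMM sketch of the kind Simon's Kernel 1 starts from; the names and the commented launch configuration are illustrative.

```cuda
#include <cuda_runtime.h>

// Naive SGEMM: C = A * B, with A (M x K), B (K x N), C (M x N), row-major.
// Each thread computes one output element, so the total work is
// M * N * K multiply-adds -> about 2 * M * N * K FLOPs.
__global__ void sgemm_naive(int M, int N, int K,
                            const float* A, const float* B, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];  // one FMA per iteration
        C[row * N + col] = acc;
    }
}

// Illustrative launch: 32x32 threads per block, grid rounded up to cover C.
// dim3 block(32, 32);
// dim3 grid((N + 31) / 32, (M + 31) / 32);
// sgemm_naive<<<grid, block>>>(M, N, K, dA, dB, dC);
```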

Week 3 and 4: Learning to optimize with Tensor Cores!

[Figure: Tensor Core matrix multiply]

How much faster are Tensor Core operations compared to FP32 operations on CUDA cores?

Recommended Readings:

A sequel to Simon's Blog, on HGEMM
Bruce's Blog on HGEMM
Spatter's Blog on HGEMM
NVIDIA's Presentation on A100 Tensor Cores

Further references to use:

Primer on Inline PTX Assembly
CUTLASS GEMM Documentation
NVIDIA PTX ISA Documentation (Chapter 9.7 is most relevant)
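
For a first taste of Tensor Cores before dropping down to inline PTX, here is a minimal sketch using the nvcuda::wmma API, where one warp computes a single 16×16×16 FP16 tile with FP32 accumulation. A real HGEMM loops over K tiles and adds shared-memory staging and pipelining, as the blogs above describe; this fragment only shows the fragment/load/mma/store pattern (compile for sm_70 or newer).

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of C = A * B on Tensor Cores.
// A and B are FP16, accumulation is FP32. lda/ldb/ldc are leading dimensions.
__global__ void wmma_tile_gemm(const half* A, const half* B, float* C,
                               int lda, int ldb, int ldc) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);        // start the accumulator at zero
    wmma::load_matrix_sync(a_frag, A, lda);   // the warp cooperatively loads the tile
    wmma::load_matrix_sync(b_frag, B, ldb);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B on Tensor Cores
    wmma::store_matrix_sync(C, c_frag, ldc, wmma::mem_row_major);
}
```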

Week 6: Exploring other parallel optimization techniques!


How could we compute the sum of all the elements of a vector with one million entries?

Recommended Readings:

Primer on Parallel Reduction
Warp level Primitives
Vectorization
Efficient Softmax Kernel
Online Softmax Paper
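
One way to answer the question above (see the readings for details) is a tree reduction: each block reduces its chunk with warp shuffles, and the per-block partial sums are combined with an atomic add. The sketch below assumes the block size is a multiple of 32; names and sizes are illustrative.

```cuda
#include <cuda_runtime.h>

// Sum-reduce a large vector. Each block produces one partial sum via a
// warp-shuffle tree reduction; block partials are combined with atomicAdd.
// A grid-stride loop lets a fixed-size grid cover a million (or more) elements.
__global__ void reduce_sum(const float* in, float* out, int n) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        sum += in[i];

    // Reduce within each warp using shuffle-down (no shared memory needed).
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffff, sum, offset);

    // Lane 0 of each warp writes its result to shared memory,
    // then the first warp reduces those per-warp partials.
    __shared__ float warp_sums[32];
    int lane = threadIdx.x % 32;
    int warp = threadIdx.x / 32;
    if (lane == 0) warp_sums[warp] = sum;
    __syncthreads();

    if (warp == 0) {
        sum = (lane < blockDim.x / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        if (lane == 0) atomicAdd(out, sum);  // combine block partial sums
    }
}
```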

Week 7 & 8: Putting it all together in Flash Attention!


Is the self-attention layer in LLMs compute-bound or memory-bound?

Recommended Readings:

Flash Attention V1 Paper
Aleksa Gordic's Flash Attention Blog
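
One way to reason about the compute- vs memory-bound question is a back-of-the-envelope arithmetic-intensity estimate for standard, unfused attention, in the spirit of the Week 2 roofline reading. The sketch below is a rough model under stated simplifying assumptions (single head, FP16 in HBM, only the dominant terms counted), not an exact accounting; compare the printed FLOPs/byte against your GPU's peak-FLOPs-to-bandwidth ratio.

```cuda
#include <cstdio>

// Rough arithmetic-intensity estimate for standard (unfused) self-attention
// on a single head: S = Q K^T, P = softmax(S), O = P V.
// Counts only the dominant terms; assumes FP16 (2 bytes/element) in HBM.
int main() {
    double N = 4096;   // sequence length (illustrative)
    double d = 64;     // head dimension (illustrative)

    // FLOPs: about 2*N*N*d for Q K^T plus 2*N*N*d for P V.
    double flops = 4.0 * N * N * d;

    // Unfused HBM traffic is dominated by writing and re-reading the
    // N x N score/probability matrices (~4 N^2 element accesses);
    // Q, K, V, O contribute only ~4 N*d accesses. 2 bytes per element.
    double bytes = 2.0 * (4.0 * N * N + 4.0 * N * d);

    printf("arithmetic intensity ~ %.1f FLOPs/byte\n", flops / bytes);
    // A value well below the GPU's FLOPs-per-byte ratio suggests the unfused
    // layer is memory-bound -- the problem Flash Attention attacks by never
    // materializing the N x N matrix in HBM.
    return 0;
}
```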
