For self-study (all students):
- Attention Is All You Need: https://arxiv.org/abs/1706.03762
- Attention Is All You Need video: https://www.youtube.com/watch?v=iDulhoQ2pro
- Must watch: Neural Networks video series (chapters 5-7)
Supplementary materials:
- Multi-Headed Attention implementation (see the sketch after this list)
- Transformer Encoder and Decoder Models implementation
- Fixed Positional Encodings implementation (see the sketch after this list)
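For orientation before reading the linked implementations, here is a minimal PyTorch sketch of the first and third supplementary items (multi-headed attention and fixed sinusoidal positional encodings). It is an illustrative version only, assuming standard hyperparameter names such as `d_model` and `num_heads`; it is not the course's reference code.

```python
# Minimal sketch of fixed positional encodings and multi-head attention.
# Names (d_model, num_heads, max_len) are illustrative assumptions.
import math
import torch
import torch.nn as nn


class FixedPositionalEncoding(nn.Module):
    """Sinusoidal encodings as described in "Attention Is All You Need" (Sec. 3.5)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)    # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)    # odd dimensions
        self.register_buffer("pe", pe)                  # not a learned parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the first seq_len encodings
        return x + self.pe[: x.size(1)]


class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention computed in parallel over several heads."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch, q_len, _ = q.shape

        # Project, then reshape to (batch, heads, seq_len, d_head)
        def split(x, proj):
            return proj(x).view(batch, -1, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q, self.q_proj), split(k, self.k_proj), split(v, self.v_proj)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)   # (b, h, q_len, k_len)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(batch, q_len, -1)  # concatenate heads
        return self.out_proj(out)
```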
Advanced (for students who want to learn more):
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT): paper, code, video
- BERT: paper, code, video
- GPT-2: paper, code, video
Transformer and ViT implementations:
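As a starting point before diving into full implementations, the sketch below shows only the ViT patch-embedding step from the "Image is Worth 16x16 Words" paper listed above (splitting the image into fixed-size patches, projecting them, and prepending a classification token). Class and parameter names are illustrative assumptions, not taken from any linked repository.

```python
# Hedged sketch of ViT patch embedding; names and defaults are illustrative.
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, d_model=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution flattens and projects one patch per kernel application.
        self.proj = nn.Conv2d(in_chans, d_model, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, d_model))

    def forward(self, x):                               # x: (batch, 3, 224, 224)
        x = self.proj(x).flatten(2).transpose(1, 2)     # (batch, num_patches, d_model)
        cls = self.cls_token.expand(x.size(0), -1, -1)  # one [CLS] token per image
        x = torch.cat([cls, x], dim=1)                  # prepend classification token
        return x + self.pos_embed                       # learned positional embeddings (ViT)
```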