Here is the adventure of generating the next tokens from start to finish, documented step by step:
0. THE JOURNEY
1. INITIALIZATION
2. LOADING TORCH MODEL
3. LOADING TORCH MODEL (DETAILS)
4. LOADING MODEL ARGS
5. LOADING TOKENIZER MODEL
6. OBSOLETE - LOADING LLAMA 2 TOKENIZER MODEL
7. BFLOAT16 DATA TYPE
8. TENSOR
9. IMPLEMENTING LLAMA MODEL ARCHITECTURE
10. RoPE (ROTARY POSITIONAL EMBEDDINGS)
10. BONUS: PRECOMPUTING FREQUENCY TENSOR (Python Notebook)
11. ASKING FOR USER INPUT
12. TOKENIZATION
13. GENERATING NEXT TOKENS
14. MAKING PREDICTION with LLAMA MODEL - 1
15. MAKING PREDICTION with LLAMA MODEL - 2
16. MAKING PREDICTION with LLAMA MODEL - 3
17. UNICODE, UTF-8 and EMOJIS
18. CONCLUSION
19. REFERENCES
20. DIAGRAMS