Commit

Merge remote-tracking branch 'upstream/main' into dev
hbaghramyan committed Oct 20, 2024
2 parents 8ab2406 + 1f61aeb commit b4aeada
Showing 22 changed files with 10,335 additions and 153 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/check-links.yml
@@ -29,6 +29,6 @@ jobs:
- name: Check links
run: |
pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://openai.com/*" --check-links-ignore "https://arena.lmsys.org" --check-links-ignore "https://www.reddit.com/r/*"
pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://openai.com/*" --check-links-ignore "https://arena.lmsys.org" --check-links-ignore "https://www.reddit.com/r/*" --check-links-ignore "https://code.visualstudio.com/*" --check-links-ignore https://arxiv.org/* --check-links-ignore "https://ai.stanford.edu/~amaas/data/sentiment/"
# pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://arena.lmsys.org" --retries 2 --retry-delay 5
3 changes: 3 additions & 0 deletions .gitignore
@@ -38,6 +38,9 @@ ch05/06_user_interface/gpt2
ch05/07_gpt_to_llama/Llama-2-7b
ch05/07_gpt_to_llama/Llama-2-7b-chat
ch05/07_gpt_to_llama/.cache
ch05/07_gpt_to_llama/llama3-files
ch05/07_gpt_to_llama/llama31-files
ch05/07_gpt_to_llama/llama32-files

ch06/01_main-chapter-code/gpt2
ch06/02_bonus_additional-experiments/gpt2
2 changes: 2 additions & 0 deletions README.md
@@ -117,6 +117,8 @@ Several folders contain optional materials as a bonus for interested readers:
- [Optimizing Hyperparameters for Pretraining](ch05/05_bonus_hparam_tuning)
- [Building a User Interface to Interact With the Pretrained LLM](ch05/06_user_interface)
- [Converting GPT to Llama](ch05/07_gpt_to_llama)
- [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
- [Memory-efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
- **Chapter 6:**
- [Additional experiments finetuning different layers and using larger models](ch06/02_bonus_additional-experiments)
- [Finetuning different models on 50k IMDB movie review dataset](ch06/03_bonus_imdb-classification)
9 changes: 8 additions & 1 deletion ch01/README.md
@@ -1,8 +1,15 @@
# Chapter 1: Understanding Large Language Models


 
## Main Chapter Code

There is no code in this chapter.

<br>

&nbsp;
## Bonus Materials

As optional bonus material, below is a video tutorial where I explain the LLM development lifecycle covered in this book:

<br>
8 changes: 8 additions & 0 deletions ch02/01_main-chapter-code/ch02.ipynb
@@ -174,6 +174,14 @@
" urllib.request.urlretrieve(url, file_path)"
]
},
{
"cell_type": "markdown",
"id": "56488f2c-a2b8-49f1-aaeb-461faad08dce",
"metadata": {},
"source": [
"- (If you encounter an `ssl.SSLCertVerificationError` when executing the previous code cell, it might be due to using an outdated Python version; you can find [more information here on GitHub](https://github.com/rasbt/LLMs-from-scratch/pull/403))"
]
},
{
"cell_type": "code",
"execution_count": 3,
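
If the `ssl.SSLCertVerificationError` mentioned above persists after updating Python, a minimal workaround sketch is shown below (an illustration, not taken from the linked PR; the URL and file name are placeholders for whatever the notebook cell downloads). It switches from `urllib.request.urlretrieve` to `urllib.request.urlopen` so that an SSL context backed by certifi's CA bundle can be passed explicitly:

import ssl
import urllib.request

import certifi  # third-party package that ships an up-to-date CA bundle

url = "https://example.com/some-text-file.txt"  # placeholder URL
file_path = "some-text-file.txt"                # placeholder file name

# Build an SSL context from certifi's certificates and use it for the request
ctx = ssl.create_default_context(cafile=certifi.where())
with urllib.request.urlopen(url, context=ctx) as response, open(file_path, "wb") as f:
    f.write(response.read())
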
3 changes: 2 additions & 1 deletion ch02/README.md
@@ -1,10 +1,11 @@
# Chapter 2: Working with Text Data


&nbsp;
## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions

&nbsp;
## Bonus Materials

- [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional code to benchmark different byte pair encoder implementations
2 changes: 2 additions & 0 deletions ch03/README.md
@@ -1,9 +1,11 @@
# Chapter 3: Coding Attention Mechanisms

&nbsp;
## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.

&nbsp;
## Bonus Materials

- [02_bonus_efficient-multihead-attention](02_bonus_efficient-multihead-attention) implements and compares different implementation variants of multihead-attention
85 changes: 0 additions & 85 deletions ch04/02_performance-analysis/previous_chapters.py
@@ -6,52 +6,8 @@
# This file collects all the relevant code that we covered thus far
# throughout Chapters 2-4.
# This file can be run as a standalone script.

import tiktoken
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

#####################################
# Chapter 2
#####################################


class GPTDatasetV1(Dataset):
    def __init__(self, txt, tokenizer, max_length, stride):
        self.input_ids = []
        self.target_ids = []

        # Tokenize the entire text
        token_ids = tokenizer.encode(txt, allowed_special={"<|endoftext|>"})

        # Use a sliding window to chunk the book into overlapping sequences of max_length
        for i in range(0, len(token_ids) - max_length, stride):
            input_chunk = token_ids[i:i + max_length]
            target_chunk = token_ids[i + 1: i + max_length + 1]
            self.input_ids.append(torch.tensor(input_chunk))
            self.target_ids.append(torch.tensor(target_chunk))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]


def create_dataloader_v1(txt, batch_size=4, max_length=256,
                         stride=128, shuffle=True, drop_last=True, num_workers=0):
    # Initialize the tokenizer
    tokenizer = tiktoken.get_encoding("gpt2")

    # Create dataset
    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)

    # Create dataloader
    dataloader = DataLoader(
        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last, num_workers=num_workers)

    return dataloader
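
A brief usage sketch of the removed create_dataloader_v1 follows (an illustration, not part of the original previous_chapters.py; raw_text stands in for any sufficiently long string):

raw_text = "Every effort moves you forward, one token at a time. " * 100
dataloader = create_dataloader_v1(
    raw_text, batch_size=2, max_length=8, stride=4, shuffle=False)
inputs, targets = next(iter(dataloader))
print(inputs.shape, targets.shape)  # torch.Size([2, 8]) torch.Size([2, 8])
# each row of targets is the corresponding row of inputs shifted one token ahead
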


#####################################
@@ -236,44 +192,3 @@ def generate_text_simple(model, idx, max_new_tokens, context_size):
        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)

    return idx


if __name__ == "__main__":

    GPT_CONFIG_124M = {
        "vocab_size": 50257,     # Vocabulary size
        "context_length": 1024,  # Context length
        "emb_dim": 768,          # Embedding dimension
        "n_heads": 12,           # Number of attention heads
        "n_layers": 12,          # Number of layers
        "drop_rate": 0.1,        # Dropout rate
        "qkv_bias": False        # Query-Key-Value bias
    }

    torch.manual_seed(123)
    model = GPTModel(GPT_CONFIG_124M)
    model.eval()  # disable dropout

    start_context = "Hello, I am"

    tokenizer = tiktoken.get_encoding("gpt2")
    encoded = tokenizer.encode(start_context)
    encoded_tensor = torch.tensor(encoded).unsqueeze(0)

    print(f"\n{50*'='}\n{22*' '}IN\n{50*'='}")
    print("\nInput text:", start_context)
    print("Encoded input text:", encoded)
    print("encoded_tensor.shape:", encoded_tensor.shape)

    out = generate_text_simple(
        model=model,
        idx=encoded_tensor,
        max_new_tokens=10,
        context_size=GPT_CONFIG_124M["context_length"]
    )
    decoded_text = tokenizer.decode(out.squeeze(0).tolist())

    print(f"\n\n{50*'='}\n{22*' '}OUT\n{50*'='}")
    print("\nOutput:", out)
    print("Output length:", len(out[0]))
    print("Output text:", decoded_text)
7 changes: 5 additions & 2 deletions ch04/README.md
@@ -1,10 +1,13 @@
# Chapter 4: Implementing a GPT Model from Scratch to Generate Text

&nbsp;
## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.

## Optional Code
&nbsp;
## Bonus Materials

- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter.
- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter
- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2, loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)

8 changes: 6 additions & 2 deletions ch05/07_gpt_to_llama/README.md
@@ -2,6 +2,10 @@



This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture:
This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture in the following recommended reading order:

- [converting-gpt-to-llama2.ipynb](converting-gpt-to-llama2.ipynb): contains code to convert GPT to Llama 2 7B step by step, loading pretrained weights from Meta AI
- [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb): contains code to convert the Llama 2 model to Llama 3, Llama 3.1, and Llama 3.2
- [standalone-llama32.ipynb](standalone-llama32.ipynb): a standalone notebook implementing Llama 3.2

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">
