Commit

Merge remote-tracking branch 'upstream/main' into dev
hbaghramyan committed Oct 20, 2024
2 parents 8ab2406 + 1f61aeb commit b4aeada
Showing 22 changed files with 10,335 additions and 153 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/check-links.yml
@@ -29,6 +29,6 @@ jobs:
- name: Check links
run: |
pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://openai.com/*" --check-links-ignore "https://arena.lmsys.org" --check-links-ignore "https://www.reddit.com/r/*"
pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://openai.com/*" --check-links-ignore "https://arena.lmsys.org" --check-links-ignore "https://www.reddit.com/r/*" --check-links-ignore "https://code.visualstudio.com/*" --check-links-ignore https://arxiv.org/* --check-links-ignore "https://ai.stanford.edu/~amaas/data/sentiment/"
# pytest --check-links ./ --check-links-ignore "https://platform.openai.com/*" --check-links-ignore "https://arena.lmsys.org" --retries 2 --retry-delay 5
3 changes: 3 additions & 0 deletions .gitignore
@@ -38,6 +38,9 @@ ch05/06_user_interface/gpt2
ch05/07_gpt_to_llama/Llama-2-7b
ch05/07_gpt_to_llama/Llama-2-7b-chat
ch05/07_gpt_to_llama/.cache
ch05/07_gpt_to_llama/llama3-files
ch05/07_gpt_to_llama/llama31-files
ch05/07_gpt_to_llama/llama32-files

ch06/01_main-chapter-code/gpt2
ch06/02_bonus_additional-experiments/gpt2
2 changes: 2 additions & 0 deletions README.md
@@ -117,6 +117,8 @@ Several folders contain optional materials as a bonus for interested readers:
- [Optimizing Hyperparameters for Pretraining](ch05/05_bonus_hparam_tuning)
- [Building a User Interface to Interact With the Pretrained LLM](ch05/06_user_interface)
- [Converting GPT to Llama](ch05/07_gpt_to_llama)
- [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
- [Memory-efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
- **Chapter 6:**
- [Additional experiments finetuning different layers and using larger models](ch06/02_bonus_additional-experiments)
- [Finetuning different models on 50k IMDB movie review dataset](ch06/03_bonus_imdb-classification)
9 changes: 8 additions & 1 deletion ch01/README.md
@@ -1,8 +1,15 @@
# Chapter 1: Understanding Large Language Models


 
## Main Chapter Code

There is no code in this chapter.

<br>

&nbsp;
## Bonus Materials

As optional bonus material, below is a video tutorial where I explain the LLM development lifecycle covered in this book:

<br>
8 changes: 8 additions & 0 deletions ch02/01_main-chapter-code/ch02.ipynb
@@ -174,6 +174,14 @@
" urllib.request.urlretrieve(url, file_path)"
]
},
{
"cell_type": "markdown",
"id": "56488f2c-a2b8-49f1-aaeb-461faad08dce",
"metadata": {},
"source": [
"- (If you encounter an `ssl.SSLCertVerificationError` when executing the previous code cell, it might be due to using an outdated Python version; you can find [more information here on GitHub](https://github.com/rasbt/LLMs-from-scratch/pull/403))"
]
},
{
"cell_type": "code",
"execution_count": 3,
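
If the `ssl.SSLCertVerificationError` mentioned above persists after updating Python, a minimal workaround sketch is shown below (an illustration, not taken from the linked PR; the URL and file name are placeholders for whatever the notebook cell downloads). It switches from `urllib.request.urlretrieve` to `urllib.request.urlopen` so that an SSL context backed by certifi's CA bundle can be passed explicitly:

import ssl
import urllib.request

import certifi  # third-party package that ships an up-to-date CA bundle

url = "https://example.com/some-text-file.txt"  # placeholder URL
file_path = "some-text-file.txt"                # placeholder file name

# Build an SSL context from certifi's certificates and use it for the request
ctx = ssl.create_default_context(cafile=certifi.where())
with urllib.request.urlopen(url, context=ctx) as response, open(file_path, "wb") as f:
    f.write(response.read())
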
3 changes: 2 additions & 1 deletion ch02/README.md
@@ -1,10 +1,11 @@
# Chapter 2: Working with Text Data


&nbsp;
## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code and exercise solutions

&nbsp;
## Bonus Materials

- [02_bonus_bytepair-encoder](02_bonus_bytepair-encoder) contains optional code to benchmark different byte pair encoder implementations
2 changes: 2 additions & 0 deletions ch03/README.md
@@ -1,9 +1,11 @@
# Chapter 3: Coding Attention Mechanisms

&nbsp;
## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.

&nbsp;
## Bonus Materials

- [02_bonus_efficient-multihead-attention](02_bonus_efficient-multihead-attention) implements and compares different implementation variants of multihead-attention
85 changes: 0 additions & 85 deletions ch04/02_performance-analysis/previous_chapters.py
@@ -6,52 +6,8 @@
# This file collects all the relevant code that we covered thus far
# throughout Chapters 2-4.
# This file can be run as a standalone script.

import tiktoken
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

#####################################
# Chapter 2
#####################################


class GPTDatasetV1(Dataset):
    def __init__(self, txt, tokenizer, max_length, stride):
        self.input_ids = []
        self.target_ids = []

        # Tokenize the entire text
        token_ids = tokenizer.encode(txt, allowed_special={"<|endoftext|>"})

        # Use a sliding window to chunk the book into overlapping sequences of max_length
        for i in range(0, len(token_ids) - max_length, stride):
            input_chunk = token_ids[i:i + max_length]
            target_chunk = token_ids[i + 1: i + max_length + 1]
            self.input_ids.append(torch.tensor(input_chunk))
            self.target_ids.append(torch.tensor(target_chunk))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]


def create_dataloader_v1(txt, batch_size=4, max_length=256,
                         stride=128, shuffle=True, drop_last=True, num_workers=0):
    # Initialize the tokenizer
    tokenizer = tiktoken.get_encoding("gpt2")

    # Create dataset
    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)

    # Create dataloader
    dataloader = DataLoader(
        dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last, num_workers=num_workers)

    return dataloader
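
A brief usage sketch of the removed create_dataloader_v1 follows (an illustration, not part of the original previous_chapters.py; raw_text stands in for any sufficiently long string):

raw_text = "Every effort moves you forward, one token at a time. " * 100
dataloader = create_dataloader_v1(
    raw_text, batch_size=2, max_length=8, stride=4, shuffle=False)
inputs, targets = next(iter(dataloader))
print(inputs.shape, targets.shape)  # torch.Size([2, 8]) torch.Size([2, 8])
# each row of targets is the corresponding row of inputs shifted one token ahead
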


#####################################
@@ -236,44 +192,3 @@ def generate_text_simple(model, idx, max_new_tokens, context_size):
        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)

    return idx


if __name__ == "__main__":

    GPT_CONFIG_124M = {
        "vocab_size": 50257,     # Vocabulary size
        "context_length": 1024,  # Context length
        "emb_dim": 768,          # Embedding dimension
        "n_heads": 12,           # Number of attention heads
        "n_layers": 12,          # Number of layers
        "drop_rate": 0.1,        # Dropout rate
        "qkv_bias": False        # Query-Key-Value bias
    }

    torch.manual_seed(123)
    model = GPTModel(GPT_CONFIG_124M)
    model.eval()  # disable dropout

    start_context = "Hello, I am"

    tokenizer = tiktoken.get_encoding("gpt2")
    encoded = tokenizer.encode(start_context)
    encoded_tensor = torch.tensor(encoded).unsqueeze(0)

    print(f"\n{50*'='}\n{22*' '}IN\n{50*'='}")
    print("\nInput text:", start_context)
    print("Encoded input text:", encoded)
    print("encoded_tensor.shape:", encoded_tensor.shape)

    out = generate_text_simple(
        model=model,
        idx=encoded_tensor,
        max_new_tokens=10,
        context_size=GPT_CONFIG_124M["context_length"]
    )
    decoded_text = tokenizer.decode(out.squeeze(0).tolist())

    print(f"\n\n{50*'='}\n{22*' '}OUT\n{50*'='}")
    print("\nOutput:", out)
    print("Output length:", len(out[0]))
    print("Output text:", decoded_text)
7 changes: 5 additions & 2 deletions ch04/README.md
@@ -1,10 +1,13 @@
# Chapter 4: Implementing a GPT Model from Scratch to Generate Text

&nbsp;
## Main Chapter Code

- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.

## Optional Code
&nbsp;
## Bonus Materials

- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter.
- [02_performance-analysis](02_performance-analysis) contains optional code analyzing the performance of the GPT model(s) implemented in the main chapter
- [ch05/07_gpt_to_llama](../ch05/07_gpt_to_llama) contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2, loading pretrained weights from Meta AI (it might be interesting to look at alternative architectures after completing chapter 4, but you can also save that for after reading chapter 5)

8 changes: 6 additions & 2 deletions ch05/07_gpt_to_llama/README.md
@@ -2,6 +2,10 @@



This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture:
This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture in the following recommended reading order:

- [converting-gpt-to-llama2.ipynb](converting-gpt-to-llama2.ipynb): contains code to convert GPT to Llama 2 7B step by step, loading pretrained weights from Meta AI
- [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb): contains code to convert the Llama 2 model to Llama 3, Llama 3.1, and Llama 3.2
- [standalone-llama32.ipynb](standalone-llama32.ipynb): a standalone notebook implementing Llama 3.2

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">
