Commit

Merge branch 'dev'
hbaghramyan committed Oct 25, 2024
2 parents 33bae2a + 4926a16 commit ad946db
Showing 9 changed files with 274 additions and 98 deletions.
11 changes: 7 additions & 4 deletions .gitignore
Expand Up @@ -35,12 +35,15 @@ ch05/01_main-chapter-code/model.pth
ch05/01_main-chapter-code/model_and_optimizer.pth
ch05/03_bonus_pretraining_on_gutenberg/model_checkpoints
ch05/06_user_interface/gpt2
ch05/07_gpt_to_llama/.cache
ch05/07_gpt_to_llama/Llama-2-7b
ch05/07_gpt_to_llama/Llama-2-7b-chat
ch05/07_gpt_to_llama/.cache
ch05/07_gpt_to_llama/llama3-files
ch05/07_gpt_to_llama/llama31-files
ch05/07_gpt_to_llama/llama32-files
ch05/07_gpt_to_llama/Llama-3-8B
ch05/07_gpt_to_llama/Llama-3-8B-Instruct
ch05/07_gpt_to_llama/Llama-3.1-8B
ch05/07_gpt_to_llama/Llama-3.1-8B-Instruct
ch05/07_gpt_to_llama/Llama-3.2-1B
ch05/07_gpt_to_llama/Llama-3.2-1B-Instruct

ch06/01_main-chapter-code/gpt2
ch06/02_bonus_additional-experiments/gpt2
Expand Down
Expand Up @@ -22,50 +22,6 @@
"</table>"
]
},
{
"cell_type": "markdown",
"id": "1HABx0Hr3PDD",
"metadata": {
"id": "1HABx0Hr3PDD"
},
"source": [
"Uncomment and execute the following code cell to install the dependencies:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "qPnVNAOxwy5s",
"metadata": {
"id": "qPnVNAOxwy5s"
},
"outputs": [],
"source": [
"# pip install -r https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/requirements.txt"
]
},
{
"cell_type": "markdown",
"id": "LYLcq3403Yq6",
"metadata": {
"id": "LYLcq3403Yq6"
},
"source": [
"Uncomment and execute the following code cell to install the PyTorch nightly dependency if you want to run the FlexAttention benchmarks (this is required because FlexAttention is not yet included in the latest PyTorch release):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "gAgYvxm_xVct",
"metadata": {
"id": "gAgYvxm_xVct"
},
"outputs": [],
"source": [
"# pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 -U"
]
},
{
"cell_type": "markdown",
"id": "6f678e62-7bcb-4405-86ae-dce94f494303",
Expand Down Expand Up @@ -119,6 +75,28 @@
"embeddings = torch.randn((batch_size, context_len, embed_dim), device=device)"
]
},
{
"cell_type": "markdown",
"id": "LYLcq3403Yq6",
"metadata": {
"id": "LYLcq3403Yq6"
},
"source": [
"- To run all the code in this notebook, please ensure you update to at least PyTorch 2.5 (FlexAttention is not included in earlier PyTorch releases)\n",
"- If the code cell above shows a PyTorch version lower than 2.5, you can upgrade your PyTorch installation by uncommenting and running the following code cell (Please note that PyTorch 2.5 requires Python 3.9 or later)\n",
"- For more specific instructions and CUDA versions, please refer to the official installation guide at https://pytorch.org"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1db27f43-86f4-478f-89df-fbc2182a129b",
"metadata": {},
"outputs": [],
"source": [
"# pip install --upgrade torch torchvision torchaudio"
]
},
{
"cell_type": "markdown",
"id": "2f9bb1b6-a1e5-4e0a-884d-0f31b374a8d6",
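Note on the hunk above: the new markdown cell asks the reader to confirm that the installed PyTorch is at least 2.5 before running the FlexAttention cells. A minimal sketch of such a check follows. The helper name `meets_min_version` is hypothetical and not part of the notebook; in the notebook one would presumably pass `torch.__version__` to it.

```python
import re


def meets_min_version(version: str, minimum: tuple = (2, 5)) -> bool:
    """Return True if a version string such as '2.5.0+cu124' is >= minimum.

    Hypothetical helper for illustration; not part of the notebook.
    """
    # Keep only the leading "major.minor" part, ignoring suffixes
    # such as "+cu124" or ".dev20241001"
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison avoids the string-comparison trap where "2.10" < "2.5"
    return (major, minor) >= minimum


# Example version strings (assumed formats, for illustration only)
for v in ("2.4.1", "2.5.0+cu124", "2.10.0"):
    print(v, meets_min_version(v))
```

Comparing integer tuples rather than raw strings is the main design point here: a plain string comparison would rank "2.10" below "2.5".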
Expand Down Expand Up @@ -908,12 +886,14 @@
"id": "d2164859-31a0-4537-b4fb-27d57675ba77"
},
"source": [
"- Set `need_weights` (default `True`) to need_weights=False so that `MultiheadAttention` uses `scaled_dot_product_attention` [according to the documentation](https://github.com/pytorch/pytorch/blob/71d020262793542974cf13b30f2a9099773f015c/torch/nn/modules/activation.py#L1096)\n",
"- Set `need_weights` (default `True`) to `False` so that `MultiheadAttention` uses `scaled_dot_product_attention` [according to the documentation](https://github.com/pytorch/pytorch/blob/71d020262793542974cf13b30f2a9099773f015c/torch/nn/modules/activation.py#L1096)\n",
"\n",
"> need_weights: If specified, returns ``attn_output_weights`` in addition to ``attn_outputs``.\n",
" Set ``need_weights=False`` to use the optimized ``scaled_dot_product_attention``\n",
" and achieve the best performance for MHA.\n",
" Default: ``True``."
"```markdown\n",
"need_weights: If specified, returns `attn_output_weights` in addition to `attn_outputs`.\n",
" Set `need_weights=False` to use the optimized `scaled_dot_product_attention`\n",
" and achieve the best performance for MHA.\n",
" Default: `True`\n",
"```"
]
},
{
Expand Down Expand Up @@ -964,16 +944,16 @@
"## 9) Using PyTorch's FlexAttention\n",
"\n",
"- See [FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention](https://pytorch.org/blog/flexattention/) to learn more about FlexAttention\n",
"- This is currently only supported in PyTorch 2.5 (nightly), which you can install on a CPU machine via\n",
"- This is supported starting from PyTorch 2.5, which you can install on a CPU machine via\n",
"\n",
" ```bash\n",
" pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu -U\n",
" pip install torch torchvision torchaudio\n",
" ```\n",
"\n",
"- To install PyTorch nighly on a GPU machine, use the following (for more information, also see the installation menu on [pytorch.org](https://pytorch.org/))\n",
"- To install PyTorch on a GPU machine, use the following (for more information, also see the installation menu on [pytorch.org](https://pytorch.org/))\n",
"\n",
" ```bash\n",
" pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 -U\n",
" pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124\n",
" ```"
]
},
Expand Down Expand Up @@ -1987,7 +1967,7 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "pt",
"language": "python",
"name": "python3"
},
Expand Down
10 changes: 5 additions & 5 deletions ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
Expand Up @@ -426,7 +426,7 @@
" assert head_dim % 2 == 0, \"Embedding dimension must be even\"\n",
"\n",
" # Compute the inverse frequencies\n",
" inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))\n",
" inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2)[: (head_dim // 2)].float() / head_dim))\n",
"\n",
" # Generate position indices\n",
" positions = torch.arange(context_length)\n",
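Note on the hunk above: the old and new `inv_freq` expressions appear to be algebraically the same, since `i / (head_dim // 2)` for `i = 0, 1, 2, ...` equals `j / head_dim` for `j = 0, 2, 4, ...`. The rewrite therefore looks like a change of form (plus an explicit `.float()` cast) rather than a change of values. The sketch below checks this in plain Python without PyTorch. The function names and the `theta_base=10_000` default are assumptions for illustration.

```python
import math


def inv_freq_old(head_dim, theta_base=10_000):
    # Old form: exponent i / (head_dim // 2) for i = 0 .. head_dim // 2 - 1
    half = head_dim // 2
    return [1.0 / (theta_base ** (i / half)) for i in range(half)]


def inv_freq_new(head_dim, theta_base=10_000):
    # New form: exponent j / head_dim over the even indices j = 0, 2, 4, ...
    # The trailing slice mirrors the [: (head_dim // 2)] in the notebook line
    values = [1.0 / (theta_base ** (j / head_dim)) for j in range(0, head_dim, 2)]
    return values[: head_dim // 2]


old = inv_freq_old(16)
new = inv_freq_new(16)
same = all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(old, new))
print(len(old), len(new), same)
```

If the two forms agree for the head dimensions in use, downstream `cos` and `sin` tables should be unaffected by this part of the commit.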
Expand Down Expand Up @@ -493,8 +493,8 @@
"\n",
"# Dummy query and key tensors\n",
"torch.manual_seed(123)\n",
"queries = torch.randn(batch_size, context_len, num_heads, head_dim)\n",
"keys = torch.randn(batch_size, context_len, num_heads, head_dim)\n",
"queries = torch.randn(batch_size, num_heads, context_len, head_dim)\n",
"keys = torch.randn(batch_size, num_heads, context_len, head_dim)\n",
"\n",
"# Apply rotary position embeddings\n",
"queries_rot = compute_rope(queries, cos, sin)\n",
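Note on the hunk above: the dummy `queries` and `keys` tensors move from a `(batch, context_len, num_heads, head_dim)` layout to `(batch, num_heads, context_len, head_dim)`. Assuming `cos` and `sin` have shape `(context_len, head_dim)`, which this diff does not show, only the new layout lines up the position axis with the rows of `cos`. The sketch below illustrates this with a small, hypothetical `broadcast_shape` helper that follows the usual NumPy/PyTorch broadcasting rules. The concrete sizes are made up for the example.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper following NumPy/PyTorch-style broadcasting rules.
    """
    result = []
    # Align the shapes from the trailing dimension, padding the shorter with 1s
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        result.append(max(da, db))
    return tuple(reversed(result))


# Illustrative sizes only; the notebook's actual values are not shown in this diff
batch_size, num_heads, context_len, head_dim = 2, 4, 5, 16
cos_shape = (context_len, head_dim)  # assumed: one row of angles per position

# New layout: the sequence axis sits next to head_dim, so cos lines up per position
print(broadcast_shape((batch_size, num_heads, context_len, head_dim), cos_shape))

# Old layout: the axis next to head_dim is num_heads, which does not match context_len
try:
    broadcast_shape((batch_size, context_len, num_heads, head_dim), cos_shape)
except ValueError as err:
    print("mismatch:", err)
```

The real `compute_rope` may slice `cos` to a sequence length it reads from the input shape. In that case the old layout might run without an error while applying positions along the heads axis, which would be harder to notice than the explicit mismatch shown here.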
Expand Down Expand Up @@ -1189,7 +1189,7 @@
"tokenizer_file = hf_hub_download(\n",
" repo_id=\"meta-llama/Llama-2-7b\",\n",
" filename=\"tokenizer.model\",\n",
" local_dir=\"Llama-2-7B\"\n",
" local_dir=\"Llama-2-7b\"\n",
")"
]
},
Expand Down Expand Up @@ -1691,7 +1691,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.6"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
Expand Down
38 changes: 20 additions & 18 deletions ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb
Expand Up @@ -278,7 +278,7 @@
" assert head_dim % 2 == 0, \"Embedding dimension must be even\"\n",
"\n",
" # Compute the inverse frequencies\n",
" inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))\n",
" inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2)[: (head_dim // 2)].float() / head_dim))\n",
"\n",
" ################################ NEW ###############################################\n",
" # Frequency adjustments\n",
Expand Down Expand Up @@ -383,8 +383,8 @@
"\n",
"# Dummy query and key tensors\n",
"torch.manual_seed(123)\n",
"queries = torch.randn(batch_size, llama_3_context_len, num_heads, head_dim)\n",
"keys = torch.randn(batch_size, llama_3_context_len, num_heads, head_dim)\n",
"queries = torch.randn(batch_size, num_heads, llama_3_context_len, head_dim)\n",
"keys = torch.randn(batch_size, num_heads, llama_3_context_len, head_dim)\n",
"\n",
"# Apply rotary position embeddings\n",
"queries_rot = compute_rope(queries, cos, sin)\n",
Expand Down Expand Up @@ -1252,7 +1252,7 @@
"tokenizer_file_path = hf_hub_download(\n",
" repo_id=\"meta-llama/Meta-Llama-3-8B\",\n",
" filename=\"original/tokenizer.model\",\n",
" local_dir=\"llama3-files\"\n",
" local_dir=\"Llama-3-8B\"\n",
")"
]
},
Expand Down Expand Up @@ -1458,7 +1458,7 @@
" weights_file = hf_hub_download(\n",
" repo_id=\"meta-llama/Meta-Llama-3-8B\",\n",
" filename=f\"model-0000{i}-of-00004.safetensors\",\n",
" local_dir=\"llama3-files\"\n",
" local_dir=\"Llama-3-8B\"\n",
" )\n",
" current_weights = load_file(weights_file)\n",
" combined_weights.update(current_weights)"
Expand Down Expand Up @@ -1677,7 +1677,7 @@
"id": "akyo7WNyF_YL"
},
"source": [
"- Above, we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-3-8b-Instruct\"` model instead, as shown below"
"- Above, we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-3-8B-Instruct\"` model instead, as shown below"
]
},
{
Expand Down Expand Up @@ -1824,7 +1824,7 @@
" weights_file = hf_hub_download(\n",
" repo_id=\"meta-llama/Meta-Llama-3-8B-Instruct\",\n",
" filename=f\"model-0000{i}-of-00004.safetensors\",\n",
" local_dir=\"llama3-files\"\n",
" local_dir=\"Llama-3-8B-Instruct\"\n",
" )\n",
" current_weights = load_file(weights_file)\n",
" combined_weights.update(current_weights)\n",
Expand All @@ -1843,7 +1843,7 @@
"id": "VlH7qYVdDKQr"
},
"source": [
"- Note that the Llama 3 model should ideally used with the correct prompt template that was used during finetuning (as discussed in chapter 7)\n",
"- Note that the Llama 3 model should ideally be used with the correct prompt template that was used during finetuning (as discussed in chapter 7)\n",
"- Below is a wrapper class around the tokenizer based on Meta AI's Llama 3-specific [ChatFormat code](https://github.com/meta-llama/llama3/blob/11817d47e1ba7a4959b025eb1ca308572e0e3963/llama/tokenizer.py#L202) that constructs the prompt template"
]
},
Expand Down Expand Up @@ -2099,7 +2099,7 @@
"metadata": {},
"outputs": [],
"source": [
"LLAMA32_CONFIG[\"context_length\"] = 8192"
"LLAMA31_CONFIG_8B[\"context_length\"] = 8192"
]
},
{
Expand Down Expand Up @@ -2157,7 +2157,7 @@
"tokenizer_file_path = hf_hub_download(\n",
" repo_id=\"meta-llama/Llama-3.1-8B\",\n",
" filename=\"original/tokenizer.model\",\n",
" local_dir=\"llama31-files\"\n",
" local_dir=\"Llama-3.1-8B\"\n",
")\n",
"\n",
"tokenizer = Tokenizer(tokenizer_file_path)"
Expand Down Expand Up @@ -2313,13 +2313,14 @@
" weights_file = hf_hub_download(\n",
" repo_id=\"meta-llama/Llama-3.1-8B\",\n",
" filename=f\"model-0000{i}-of-00004.safetensors\",\n",
" local_dir=\"llama31-files\"\n",
" local_dir=\"Llama-3.1-8B\"\n",
" )\n",
" current_weights = load_file(weights_file)\n",
" combined_weights.update(current_weights)\n",
"\n",
"load_weights_into_llama(model, LLAMA31_CONFIG_8B, combined_weights)\n",
"model.to(device);"
"model.to(device);\n",
"del combined_weights # free up memory"
]
},
{
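Note on the hunk above: the added `del combined_weights  # free up memory` line drops the notebook's reference to the loaded weight dictionary once the weights are in the model. The sketch below shows the underlying mechanism in plain Python: an object can be reclaimed once its last strong reference is gone. The `Weights` class is a hypothetical stand-in for the loaded dictionary, and the immediate reclamation relies on CPython's reference counting.

```python
import gc
import weakref


class Weights(dict):
    """Stand-in for a dict of loaded tensors (plain dicts cannot be weak-referenced)."""


combined_weights = Weights(layer0=[0.0] * 1000)
probe = weakref.ref(combined_weights)  # observes the object without keeping it alive

del combined_weights  # drop the last strong reference
gc.collect()          # usually unnecessary on CPython here, but harmless

print(probe() is None)
```

Memory is only reclaimed if nothing else still refers to the same objects. If the model kept references to the original tensors, `del` alone would remove the name without freeing them.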
Expand Down Expand Up @@ -2466,7 +2467,7 @@
"metadata": {},
"outputs": [],
"source": [
"LLAMA32_CONFIG[\"context_length\"] = 8192"
"LLAMA32_CONFIG_1B[\"context_length\"] = 8192"
]
},
{
Expand Down Expand Up @@ -2512,7 +2513,7 @@
"tokenizer_file_path = hf_hub_download(\n",
" repo_id=\"meta-llama/Llama-3.2-1B\",\n",
" filename=\"original/tokenizer.model\",\n",
" local_dir=\"llama32-files\"\n",
" local_dir=\"Llama-3.2-1B\"\n",
")\n",
"\n",
"tokenizer = Tokenizer(tokenizer_file_path)"
Expand Down Expand Up @@ -2589,12 +2590,13 @@
"weights_file = hf_hub_download(\n",
" repo_id=\"meta-llama/Llama-3.2-1B\",\n",
" filename=f\"model.safetensors\",\n",
" local_dir=\"llama32-files\"\n",
" local_dir=\"Llama-3.2-1B\"\n",
")\n",
"current_weights = load_file(weights_file)\n",
"\n",
"load_weights_into_llama(model, LLAMA32_CONFIG_1B, current_weights)\n",
"model.to(device);"
"model.to(device);\n",
"del current_weights # free up memory"
]
},
{
Expand Down Expand Up @@ -2687,7 +2689,7 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "pt",
"language": "python",
"name": "python3"
},
Expand All @@ -2701,7 +2703,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.11.9"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
Expand Down
12 changes: 6 additions & 6 deletions ch05/07_gpt_to_llama/standalone-llama32.ipynb
Expand Up @@ -133,7 +133,7 @@
" assert head_dim % 2 == 0, \"Embedding dimension must be even\"\n",
"\n",
" # Compute the inverse frequencies\n",
" inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))\n",
" inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2)[: (head_dim // 2)].float() / head_dim))\n",
"\n",
" # Frequency adjustments\n",
" if freq_config is not None:\n",
Expand Down Expand Up @@ -733,7 +733,7 @@
"tokenizer_file_path = hf_hub_download(\n",
" repo_id=f\"meta-llama/Llama-3.2-{LLAMA_SIZE_STR}-Instruct\",\n",
" filename=\"original/tokenizer.model\",\n",
" local_dir=\"llama32-files\"\n",
" local_dir=\"Llama-3.2-1B-Instruct\"\n",
")"
]
},
Expand Down Expand Up @@ -860,7 +860,7 @@
" weights_file = hf_hub_download(\n",
" repo_id=f\"meta-llama/Llama-3.2-{LLAMA_SIZE_STR}-Instruct\",\n",
" filename=f\"model.safetensors\",\n",
" local_dir=\"llama32-files\"\n",
" local_dir=\"Llama-3.2-1B-Instruct\"\n",
" )\n",
" combined_weights = load_file(weights_file)\n",
"\n",
Expand All @@ -871,7 +871,7 @@
" weights_file = hf_hub_download(\n",
" repo_id=f\"meta-llama/Llama-3.2-{LLAMA_SIZE_STR}-Instruct\",\n",
" filename=f\"model-0000{i}-of-00002.safetensors\",\n",
" local_dir=\"llama32-files\"\n",
" local_dir=\"Llama-3.2-1B-Instruct\"\n",
" )\n",
" current_weights = load_file(weights_file)\n",
" combined_weights.update(current_weights)\n",
Expand Down Expand Up @@ -1047,7 +1047,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "pt",
"language": "python",
"name": "python3"
},
Expand All @@ -1061,7 +1061,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.11.9"
}
},
"nbformat": 4,
Expand Down