Merge branch 'dev'

hbaghramyan · Oct 25, 2024 · ad946db
2 parents 33bae2a + 4926a16
commit ad946db
Showing 9 changed files with 274 additions and 98 deletions.
diff --git a/.gitignore b/.gitignore
@@ -35,12 +35,15 @@ ch05/01_main-chapter-code/model.pth
 ch05/01_main-chapter-code/model_and_optimizer.pth
 ch05/03_bonus_pretraining_on_gutenberg/model_checkpoints
 ch05/06_user_interface/gpt2
+ch05/07_gpt_to_llama/.cache
 ch05/07_gpt_to_llama/Llama-2-7b
 ch05/07_gpt_to_llama/Llama-2-7b-chat
-ch05/07_gpt_to_llama/.cache
-ch05/07_gpt_to_llama/llama3-files
-ch05/07_gpt_to_llama/llama31-files
-ch05/07_gpt_to_llama/llama32-files
+ch05/07_gpt_to_llama/Llama-3-8B
+ch05/07_gpt_to_llama/Llama-3-8B-Instruct
+ch05/07_gpt_to_llama/Llama-3.1-8B
+ch05/07_gpt_to_llama/Llama-3.1-8B-Instruct
+ch05/07_gpt_to_llama/Llama-3.2-1B
+ch05/07_gpt_to_llama/Llama-3.2-1B-Instruct
 
 ch06/01_main-chapter-code/gpt2
 ch06/02_bonus_additional-experiments/gpt2

diff --git a/ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb b/ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb
@@ -22,50 +22,6 @@
     "</table>"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "1HABx0Hr3PDD",
-   "metadata": {
-    "id": "1HABx0Hr3PDD"
-   },
-   "source": [
-    "Uncomment and execute the following code cell to install the dependencies:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "qPnVNAOxwy5s",
-   "metadata": {
-    "id": "qPnVNAOxwy5s"
-   },
-   "outputs": [],
-   "source": [
-    "# pip install -r https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/requirements.txt"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "LYLcq3403Yq6",
-   "metadata": {
-    "id": "LYLcq3403Yq6"
-   },
-   "source": [
-    "Uncomment and execute the following code cell to install the PyTorch nightly dependency if you want to run the FlexAttention benchmarks (this is required because FlexAttention is not yet included in the latest PyTorch release):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "id": "gAgYvxm_xVct",
-   "metadata": {
-    "id": "gAgYvxm_xVct"
-   },
-   "outputs": [],
-   "source": [
-    "# pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 -U"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "6f678e62-7bcb-4405-86ae-dce94f494303",
@@ -119,6 +75,28 @@
     "embeddings = torch.randn((batch_size, context_len, embed_dim), device=device)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "LYLcq3403Yq6",
+   "metadata": {
+    "id": "LYLcq3403Yq6"
+   },
+   "source": [
+    "- To run all the code in this notebook, please ensure you update to at least PyTorch 2.5 (FlexAttention is not included in earlier PyTorch releases)\n",
+    "- If the code cell above shows a PyTorch version lower than 2.5, you can upgrade your PyTorch installation by uncommenting and running the following code cell (Please note that PyTorch 2.5 requires Python 3.9 or later)\n",
+    "- For more specific instructions and CUDA versions, please refer to the official installation guide at https://pytorch.org"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1db27f43-86f4-478f-89df-fbc2182a129b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# pip install --upgrade torch torchvision torchaudio"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "2f9bb1b6-a1e5-4e0a-884d-0f31b374a8d6",
@@ -908,12 +886,14 @@
     "id": "d2164859-31a0-4537-b4fb-27d57675ba77"
    },
    "source": [
-    "- Set `need_weights` (default `True`) to need_weights=False so that `MultiheadAttention` uses `scaled_dot_product_attention` [according to the documentation](https://github.com/pytorch/pytorch/blob/71d020262793542974cf13b30f2a9099773f015c/torch/nn/modules/activation.py#L1096)\n",
+    "- Set `need_weights` (default `True`) to `False` so that `MultiheadAttention` uses `scaled_dot_product_attention` [according to the documentation](https://github.com/pytorch/pytorch/blob/71d020262793542974cf13b30f2a9099773f015c/torch/nn/modules/activation.py#L1096)\n",
     "\n",
-    ">  need_weights: If specified, returns ``attn_output_weights`` in addition to ``attn_outputs``.\n",
-    "            Set ``need_weights=False`` to use the optimized ``scaled_dot_product_attention``\n",
-    "            and achieve the best performance for MHA.\n",
-    "            Default: ``True``."
+    "```markdown\n",
+    "need_weights: If specified, returns `attn_output_weights` in addition to `attn_outputs`.\n",
+    "           Set `need_weights=False` to use the optimized `scaled_dot_product_attention`\n",
+    "           and achieve the best performance for MHA.\n",
+    "           Default: `True`\n",
+    "```"
    ]
   },
   {
@@ -964,16 +944,16 @@
     "## 9) Using PyTorch's FlexAttention\n",
     "\n",
     "- See [FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention](https://pytorch.org/blog/flexattention/) to learn more about FlexAttention\n",
-    "- This is currently only supported in PyTorch 2.5 (nightly), which you can install on a CPU machine via\n",
+    "- This is supported starting from PyTorch 2.5, which you can install on a CPU machine via\n",
     "\n",
     "    ```bash\n",
-    "    pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu -U\n",
+    "    pip install torch torchvision torchaudio\n",
     "    ```\n",
     "\n",
-    "- To install PyTorch nighly on a GPU machine, use the following (for more information, also see the installation menu on [pytorch.org](https://pytorch.org/))\n",
+    "- To install PyTorch on a GPU machine, use the following (for more information, also see the installation menu on [pytorch.org](https://pytorch.org/))\n",
     "\n",
     "    ```bash\n",
-    "    pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 -U\n",
+    "    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124\n",
     "    ```"
    ]
   },
@@ -1987,7 +1967,7 @@
    "provenance": []
   },
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "pt",
    "language": "python",
    "name": "python3"
   },
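
Reviewer note: the notebook text added above asks readers to confirm they are on PyTorch 2.5 or later before running the FlexAttention cells. A minimal sketch of such a check follows, assuming the version string has the usual `major.minor...` shape; the helper name `supports_flex_attention` is hypothetical and not part of the notebook or of PyTorch.

```python
import re


def supports_flex_attention(version: str, minimum=(2, 5)) -> bool:
    """Return True if a PyTorch-style version string is at least `minimum`.

    Hypothetical helper for illustration only.
    """
    # Keep only the leading "major.minor" digits; suffixes such as
    # "+cu124" or ".dev20240901" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison avoids the string-comparison trap where "2.10" < "2.5".
    return (major, minor) >= minimum


for v in ("2.4.1", "2.5.0+cu124", "2.10.0"):
    print(v, supports_flex_attention(v))
```

In the notebook itself, the argument would be `torch.__version__`; the sketch takes a plain string so it runs without PyTorch installed.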

diff --git a/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb b/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
@@ -426,7 +426,7 @@
     "    assert head_dim % 2 == 0, \"Embedding dimension must be even\"\n",
     "\n",
     "    # Compute the inverse frequencies\n",
-    "    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))\n",
+    "    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2)[: (head_dim // 2)].float() / head_dim))\n",
     "\n",
     "    # Generate position indices\n",
     "    positions = torch.arange(context_length)\n",
@@ -493,8 +493,8 @@
     "\n",
     "# Dummy query and key tensors\n",
     "torch.manual_seed(123)\n",
-    "queries = torch.randn(batch_size, context_len, num_heads, head_dim)\n",
-    "keys = torch.randn(batch_size, context_len, num_heads, head_dim)\n",
+    "queries = torch.randn(batch_size, num_heads, context_len, head_dim)\n",
+    "keys = torch.randn(batch_size, num_heads, context_len, head_dim)\n",
     "\n",
     "# Apply rotary position embeddings\n",
     "queries_rot = compute_rope(queries, cos, sin)\n",
@@ -1189,7 +1189,7 @@
     "tokenizer_file = hf_hub_download(\n",
     "    repo_id=\"meta-llama/Llama-2-7b\",\n",
     "    filename=\"tokenizer.model\",\n",
-    "    local_dir=\"Llama-2-7B\"\n",
+    "    local_dir=\"Llama-2-7b\"\n",
     ")"
    ]
   },
@@ -1691,7 +1691,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.6"
   },
   "widgets": {
    "application/vnd.jupyter.widget-state+json": {
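
Reviewer note: the `inv_freq` change above rewrites the exponent from `arange(0, head_dim // 2) / (head_dim // 2)` to `arange(0, head_dim, 2)[: head_dim // 2] / head_dim`. Both forms appear to describe the same exponents, `2i / head_dim`. The sketch below checks that with exact fractions in plain Python. It does not test dtype or rounding behavior in PyTorch, which may be the actual motivation for the change. The function names and the `theta_base` default of 10,000 are assumptions for illustration.

```python
from fractions import Fraction


def exponents_old(head_dim):
    # Old form: arange(0, head_dim // 2) / (head_dim // 2)
    half = head_dim // 2
    return [Fraction(i, half) for i in range(half)]


def exponents_new(head_dim):
    # New form: arange(0, head_dim, 2)[: head_dim // 2] / head_dim
    half = head_dim // 2
    return [Fraction(i, head_dim) for i in range(0, head_dim, 2)][:half]


def inv_freq(head_dim, theta_base=10_000.0):
    # Inverse frequencies 1 / theta_base**(2i / head_dim); theta_base is assumed.
    return [1.0 / theta_base ** float(e) for e in exponents_new(head_dim)]


head_dim = 16
same = exponents_old(head_dim) == exponents_new(head_dim)
print(same)
print(inv_freq(head_dim)[:3])
```

If the two exponent lists are equal for every even `head_dim`, the commit changes how the values are computed rather than which values are intended.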

diff --git a/ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb b/ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb
@@ -278,7 +278,7 @@
     "    assert head_dim % 2 == 0, \"Embedding dimension must be even\"\n",
     "\n",
     "    # Compute the inverse frequencies\n",
-    "    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))\n",
+    "    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2)[: (head_dim // 2)].float() / head_dim))\n",
     "\n",
     "    ################################ NEW ###############################################\n",
     "    # Frequency adjustments\n",
@@ -383,8 +383,8 @@
     "\n",
     "# Dummy query and key tensors\n",
     "torch.manual_seed(123)\n",
-    "queries = torch.randn(batch_size, llama_3_context_len, num_heads, head_dim)\n",
-    "keys = torch.randn(batch_size, llama_3_context_len, num_heads, head_dim)\n",
+    "queries = torch.randn(batch_size, num_heads, llama_3_context_len, head_dim)\n",
+    "keys = torch.randn(batch_size, num_heads, llama_3_context_len, head_dim)\n",
     "\n",
     "# Apply rotary position embeddings\n",
     "queries_rot = compute_rope(queries, cos, sin)\n",
@@ -1252,7 +1252,7 @@
     "tokenizer_file_path = hf_hub_download(\n",
     "    repo_id=\"meta-llama/Meta-Llama-3-8B\",\n",
     "    filename=\"original/tokenizer.model\",\n",
-    "    local_dir=\"llama3-files\"\n",
+    "    local_dir=\"Llama-3-8B\"\n",
     ")"
    ]
   },
@@ -1458,7 +1458,7 @@
     "    weights_file = hf_hub_download(\n",
     "        repo_id=\"meta-llama/Meta-Llama-3-8B\",\n",
     "        filename=f\"model-0000{i}-of-00004.safetensors\",\n",
-    "        local_dir=\"llama3-files\"\n",
+    "        local_dir=\"Llama-3-8B\"\n",
     "    )\n",
     "    current_weights = load_file(weights_file)\n",
     "    combined_weights.update(current_weights)"
@@ -1677,7 +1677,7 @@
     "id": "akyo7WNyF_YL"
    },
    "source": [
-    "- Above, we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-3-8b-Instruct\"` model instead, as shown below"
+    "- Above, we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-3-8B-Instruct\"` model instead, as shown below"
    ]
   },
   {
@@ -1824,7 +1824,7 @@
     "    weights_file = hf_hub_download(\n",
     "        repo_id=\"meta-llama/Meta-Llama-3-8B-Instruct\",\n",
     "        filename=f\"model-0000{i}-of-00004.safetensors\",\n",
-    "        local_dir=\"llama3-files\"\n",
+    "        local_dir=\"Llama-3-8B-Instruct\"\n",
     "    )\n",
     "    current_weights = load_file(weights_file)\n",
     "    combined_weights.update(current_weights)\n",
@@ -1843,7 +1843,7 @@
     "id": "VlH7qYVdDKQr"
    },
    "source": [
-    "- Note that the Llama 3 model should ideally used with the correct prompt template that was used during finetuning (as discussed in chapter 7)\n",
+    "- Note that the Llama 3 model should ideally be used with the correct prompt template that was used during finetuning (as discussed in chapter 7)\n",
     "- Below is a wrapper class around the tokenizer based on Meta AI's Llama 3-specific [ChatFormat code](https://github.com/meta-llama/llama3/blob/11817d47e1ba7a4959b025eb1ca308572e0e3963/llama/tokenizer.py#L202) that constructs the prompt template"
    ]
   },
@@ -2099,7 +2099,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "LLAMA32_CONFIG[\"context_length\"] = 8192"
+    "LLAMA31_CONFIG_8B[\"context_length\"] = 8192"
    ]
   },
   {
@@ -2157,7 +2157,7 @@
     "tokenizer_file_path = hf_hub_download(\n",
     "    repo_id=\"meta-llama/Llama-3.1-8B\",\n",
     "    filename=\"original/tokenizer.model\",\n",
-    "    local_dir=\"llama31-files\"\n",
+    "    local_dir=\"Llama-3.1-8B\"\n",
     ")\n",
     "\n",
     "tokenizer = Tokenizer(tokenizer_file_path)"
@@ -2313,13 +2313,14 @@
     "    weights_file = hf_hub_download(\n",
     "        repo_id=\"meta-llama/Llama-3.1-8B\",\n",
     "        filename=f\"model-0000{i}-of-00004.safetensors\",\n",
-    "        local_dir=\"llama31-files\"\n",
+    "        local_dir=\"Llama-3.1-8B\"\n",
     "    )\n",
     "    current_weights = load_file(weights_file)\n",
     "    combined_weights.update(current_weights)\n",
     "\n",
     "load_weights_into_llama(model, LLAMA31_CONFIG_8B, combined_weights)\n",
-    "model.to(device);"
+    "model.to(device);\n",
+    "del combined_weights  # free up memory"
    ]
   },
   {
@@ -2466,7 +2467,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "LLAMA32_CONFIG[\"context_length\"] = 8192"
+    "LLAMA32_CONFIG_1B[\"context_length\"] = 8192"
    ]
   },
   {
@@ -2512,7 +2513,7 @@
     "tokenizer_file_path = hf_hub_download(\n",
     "    repo_id=\"meta-llama/Llama-3.2-1B\",\n",
     "    filename=\"original/tokenizer.model\",\n",
-    "    local_dir=\"llama32-files\"\n",
+    "    local_dir=\"Llama-3.2-1B\"\n",
     ")\n",
     "\n",
     "tokenizer = Tokenizer(tokenizer_file_path)"
@@ -2589,12 +2590,13 @@
     "weights_file = hf_hub_download(\n",
     "    repo_id=\"meta-llama/Llama-3.2-1B\",\n",
     "    filename=f\"model.safetensors\",\n",
-    "    local_dir=\"llama32-files\"\n",
+    "    local_dir=\"Llama-3.2-1B\"\n",
     ")\n",
     "current_weights = load_file(weights_file)\n",
     "\n",
     "load_weights_into_llama(model, LLAMA32_CONFIG_1B, current_weights)\n",
-    "model.to(device);"
+    "model.to(device);\n",
+    "del current_weights  # free up memory"
    ]
   },
   {
@@ -2687,7 +2689,7 @@
    "provenance": []
   },
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "pt",
    "language": "python",
    "name": "python3"
   },
@@ -2701,7 +2703,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.11.9"
   },
   "widgets": {
    "application/vnd.jupyter.widget-state+json": {
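
Reviewer note: the `del combined_weights` and `del current_weights` lines added above only free memory if nothing else still references the same objects. The diff does not show whether `load_weights_into_llama` copies the tensors. A minimal sketch of the mechanism, using a hypothetical stand-in class instead of real weights:

```python
import gc
import weakref


class FakeWeights(dict):
    """Hypothetical stand-in for the dict returned by load_file."""


combined_weights = FakeWeights(layer=bytearray(1024))
probe = weakref.ref(combined_weights)  # observes the dict without keeping it alive

alive_before = probe() is not None  # the dict exists while the name is bound
del combined_weights                # drop the last strong reference
gc.collect()                        # not needed on CPython, harmless elsewhere
freed = probe() is None

print(alive_before, freed)
```

The container is released once its last strong reference is gone. Tensors that the model still points to would stay allocated regardless of the `del`.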

diff --git a/ch05/07_gpt_to_llama/standalone-llama32.ipynb b/ch05/07_gpt_to_llama/standalone-llama32.ipynb
@@ -133,7 +133,7 @@
     "    assert head_dim % 2 == 0, \"Embedding dimension must be even\"\n",
     "\n",
     "    # Compute the inverse frequencies\n",
-    "    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim // 2) / (head_dim // 2)))\n",
+    "    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2)[: (head_dim // 2)].float() / head_dim))\n",
     "\n",
     "    # Frequency adjustments\n",
     "    if freq_config is not None:\n",
@@ -733,7 +733,7 @@
     "tokenizer_file_path = hf_hub_download(\n",
     "    repo_id=f\"meta-llama/Llama-3.2-{LLAMA_SIZE_STR}-Instruct\",\n",
     "    filename=\"original/tokenizer.model\",\n",
-    "    local_dir=\"llama32-files\"\n",
+    "    local_dir=\"Llama-3.2-1B-Instruct\"\n",
     ")"
    ]
   },
@@ -860,7 +860,7 @@
     "    weights_file = hf_hub_download(\n",
     "        repo_id=f\"meta-llama/Llama-3.2-{LLAMA_SIZE_STR}-Instruct\",\n",
     "        filename=f\"model.safetensors\",\n",
-    "        local_dir=\"llama32-files\"\n",
+    "        local_dir=\"Llama-3.2-1B-Instruct\"\n",
     "    )\n",
     "    combined_weights = load_file(weights_file)\n",
     "\n",
@@ -871,7 +871,7 @@
     "        weights_file = hf_hub_download(\n",
     "            repo_id=f\"meta-llama/Llama-3.2-{LLAMA_SIZE_STR}-Instruct\",\n",
     "            filename=f\"model-0000{i}-of-00002.safetensors\",\n",
-    "            local_dir=\"llama32-files\"\n",
+    "            local_dir=\"Llama-3.2-1B-Instruct\"\n",
     "        )\n",
     "        current_weights = load_file(weights_file)\n",
     "        combined_weights.update(current_weights)\n",
@@ -1047,7 +1047,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "pt",
    "language": "python",
    "name": "python3"
   },
@@ -1061,7 +1061,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.11.9"
   }
  },
  "nbformat": 4,