
Commit: Merge branch 'dev'
hbaghramyan committed Aug 3, 2024
2 parents 5d57927 + 801238b commit 882ad29
Showing 6 changed files with 57 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -129,7 +129,7 @@ I welcome all sorts of feedback, best shared via the [Discussions](https://githu

If you notice any problems or issues, please do not hesitate to file an [Issue](https://github.com/rasbt/LLMs-from-scratch/issues).

- However, since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book.
+ However, since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.


 
9 changes: 9 additions & 0 deletions ch01/README.md
@@ -1,3 +1,12 @@
# Chapter 1: Understanding Large Language Models

There is no code in this chapter.

<br>
As optional bonus material, below is a video tutorial where I explain the LLM development lifecycle covered in this book:

<br>
<br>

[![Link to the video](https://img.youtube.com/vi/kPGTx4wcm_w/0.jpg)](https://www.youtube.com/watch?v=kPGTx4wcm_w)

2 changes: 1 addition & 1 deletion ch03/03_understanding-buffers/README.md
@@ -4,7 +4,7 @@


<br>
- Below is a video tutorial of me explaining walking through the code:
+ Below is a hands-on video tutorial I recorded to explain the code:

<br>
<br>
23 changes: 23 additions & 0 deletions ch04/01_main-chapter-code/ch04.py
@@ -62,6 +62,20 @@ def forward(self, x):
        return x


class LayerNorm(nn.Module):
    def __init__(self, emb_dim):
        super().__init__()
        self.eps = 1e-5  # small constant to avoid division by zero
        self.scale = nn.Parameter(torch.ones(emb_dim))   # learnable gain
        self.shift = nn.Parameter(torch.zeros(emb_dim))  # learnable offset

    def forward(self, x):
        # Normalize each token's features to zero mean and unit variance
        # along the embedding dimension, then apply the learnable scale and shift.
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)  # biased variance (no Bessel's correction)
        norm_x = (x - mean) / torch.sqrt(var + self.eps)
        return self.scale * norm_x + self.shift


tokenizer = tiktoken.get_encoding("gpt2")
batch = []
txt1 = "Every effort moves you"
@@ -102,3 +116,12 @@ def forward(self, x):
torch.set_printoptions(sci_mode=False)
print("Mean\n:", mean)
print("Variance:\n", var)


ln = LayerNorm(emb_dim=5)
out_ln = ln(batch_example)

mean = out_ln.mean(dim=-1, keepdim=True)
var = out_ln.var(dim=-1, unbiased=False, keepdim=True)

print("Here")
15 changes: 14 additions & 1 deletion setup/README.md
@@ -15,6 +15,11 @@ pip install -r requirements.txt

&nbsp;

# Local Setup

This section provides recommendations for running the code in this book locally. Note that the code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. I tested all main chapters on an M3 MacBook Air laptop. Additionally, if your laptop or desktop computer has an NVIDIA GPU, the code will automatically take advantage of it.
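
For illustration, here is a minimal sketch of the kind of automatic device selection this relies on in PyTorch (a hypothetical snippet, not the book's exact code):

```python
import torch

# Pick the best available device: an NVIDIA GPU (CUDA), Apple Silicon (MPS),
# or fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
# Tensors and models are then moved to this device, e.g., model.to(device)
```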

&nbsp;
## Setting up Python

If you don't have Python set up on your machine yet, I have written about my personal Python setup preferences in the following directories:
@@ -46,6 +51,14 @@ If you are using Visual Studio Code (VSCode) as your primary code editor, you ca

&nbsp;

# Cloud Resources

This section describes cloud alternatives for running the code presented in this book.

While the code can run on conventional laptops and desktop computers without a dedicated GPU, cloud platforms with NVIDIA GPUs can substantially improve the runtime of the code, especially in chapters 5 to 7.

&nbsp;

## Using Lightning Studio

For a smooth development experience in the cloud, I recommend the [Lightning AI Studio](https://lightning.ai/) platform, which allows users to set up a persistent environment and use both VSCode and Jupyter Lab on cloud CPUs and GPUs.
@@ -85,6 +98,6 @@ You can optionally run the code on a GPU by changing the *Runtime* as illustrate

&nbsp;

- ## Questions?
+ # Questions?

If you have any questions, please don't hesitate to reach out via the [Discussions](https://github.com/rasbt/LLMs-from-scratch/discussions) forum in this GitHub repository.
9 changes: 9 additions & 0 deletions todo.md
@@ -295,6 +295,15 @@ to discuss

To ensure that the positional embeddings are on the same device as the input indices and token embeddings, you specify `device=in_idx.device` when creating the positional indices tensor. This guarantees that the positional indices tensor and, consequently, the output of `pos_emb` end up on the correct device (see the sketch after this list).

3.

https://en.wikipedia.org/wiki/Rectifier_(neural_networks)

4.

https://en.wikipedia.org/wiki/Bessel%27s_correction
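
A quick numeric illustration of Bessel's correction (a sketch; it is relevant to the `unbiased=False` argument in the `LayerNorm` variance computation above):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Biased (population) variance divides by n; this is what
# var(..., unbiased=False) computes and what the LayerNorm above uses.
print(x.var(unbiased=False))  # tensor(1.2500)

# Unbiased (sample) variance applies Bessel's correction,
# dividing by n - 1 instead of n.
print(x.var(unbiased=True))   # tensor(1.6667)
```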

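Returning to the device note above, here is a minimal sketch of that device-aware pattern (the vocabulary size, context length, and embedding dimension below are illustrative assumptions):

```python
import torch
import torch.nn as nn

vocab_size, context_length, emb_dim = 50257, 1024, 768

tok_emb = nn.Embedding(vocab_size, emb_dim)
pos_emb = nn.Embedding(context_length, emb_dim)

in_idx = torch.randint(0, vocab_size, (2, 8))  # hypothetical batch of token IDs

# Create the positional indices on the same device as the input indices,
# so the embedding lookup works whether in_idx lives on the CPU or a GPU.
seq_len = in_idx.shape[1]
pos_indices = torch.arange(seq_len, device=in_idx.device)

# Token and positional embeddings end up on the same device and broadcast-add.
x = tok_emb(in_idx) + pos_emb(pos_indices)
print(x.shape)  # torch.Size([2, 8, 768])
```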

### 05/08/2024 -

1. mha-implementations.ipynb from the 02_bonus_efficient-multihead-attention
