Merge pull request #4 from djliden/dl/reorganize
Dl/reorganize
djliden authored Sep 12, 2024
2 parents a570178 + a08cbe4 commit 7ecfc8b
Showing 20 changed files with 1,364 additions and 27 deletions.
README.md (18 changes: 11 additions & 7 deletions)
@@ -1,20 +1,24 @@
-# Fine-Tuning LLMs
+# Jupyter Notebook Examples

-*View these notebooks in a more readable format at [danliden.com/fine-tuning](https://danliden.com/fine-tuning).*
+*View these notebooks in a more readable format at [danliden.com/notebooks](https://danliden.com/notebooks).*

-This series of notebooks is intended to show how to fine-tune language models, starting from smaller models on single-node single-GPU setups and gradually scaling up to multi-GPU and multi-node configurations.
+This repository contains a collection of Jupyter notebooks demonstrating various concepts and techniques across different fields. Currently, it includes a series on fine-tuning language models, but it will expand to cover other topics in the future.
+
+## Fine-Tuning LLMs
+
+The fine-tuning section shows how to fine-tune language models, starting from smaller models on single-node single-GPU setups and gradually scaling up to multi-GPU and multi-node configurations.

Existing examples and learning resources generally do not bridge the practical gap between single-node single-GPU training when all parameters fit in VRAM, and the various forms of distributed training. These examples, when complete, are intended to show how to train smaller models given sufficient compute resources and then scale the models up until we encounter compute and/or memory constraints. We will then introduce various distributed training approaches aimed at overcoming these issues.

This will, hopefully, serve as a practical and conceptual bridge from single-node single-GPU training to distributed training with tools such as deepspeed and FSDP.

## How to use this repository

-The examples in this repository are intended to be read sequentially. Later examples build on earlier examples and gradually add scale and complexity.
+The examples in this repository are organized by topic. Within each topic, the notebooks are intended to be read sequentially. Later examples often build on earlier examples and gradually add complexity.

## Contributing

Contributions are welcome, and there are a few different ways to get involved.
-- If you see an error or bug, please [open an issue](https://github.com/djliden/fine-tuning/issues/new) or open a PR.
-- If you have a question about this repository, or you want to request a specific example, please [open an issue](https://github.com/djliden/fine-tuning/issues/new).
-- If you're interested in contributing an example, I encourage you to get in touch. You can [open an issue](https://github.com/djliden/fine-tuning/issues/new) or reach out by email or social media.
+- If you see an error or bug, please [open an issue](https://github.com/djliden/notebooks/issues/new) or open a PR.
+- If you have a question about this repository, or you want to request a specific example, please [open an issue](https://github.com/djliden/notebooks/issues/new).
+- If you're interested in contributing an example, I encourage you to get in touch. You can [open an issue](https://github.com/djliden/notebooks/issues/new) or reach out by email or social media.
notebooks/_config.yml (8 changes: 4 additions & 4 deletions)
@@ -1,6 +1,6 @@
-title: LLM Fine-Tuning
-author: Dan Liden
-logo: logo_draft.png
+title: Notebooks
+author: Daniel Liden
+logo: logo.jpg
execute:
  execute_notebooks: 'off'

@@ -15,7 +15,7 @@ repository:
html:
  use_issues_button: true
  use_repository_button: true
-  home_page_in_navbar: false
+  home_page_in_navbar: true
sphinx:
  config:
    html_show_copyright: false
notebooks/_toc.yml (20 changes: 11 additions & 9 deletions)
@@ -4,13 +4,15 @@
format: jb-book
root: index
parts:
-- caption: Smaller Models (Single GPU)
+- caption: AI Training
  chapters:
-  - file: 1_t5_small_single_gpu/1_T5-Small_on_Single_GPU.ipynb
-  - file: 2_gpt2_single_gpu/2_GPT2_on_a_single_GPU.ipynb
-  - file: 3_tinyllama_instruction_tune/3_instruction_tuning_tinyllama_on_a_single_GPU.ipynb
-  - file: 4_olmo_1b_instruction_tune/4_olmo_instruction_tune.ipynb
-- caption: Other topics of interest
-  chapters:
-  - file: 3_tinyllama_instruction_tune/data_preprocessing.ipynb
-  - file: 5_gemma_2b_axolotl/gemma_2b_axolotl.ipynb
+  - file: ai_training/intro
+    sections:
+    - file: ai_training/fine_tuning/1_t5_small_single_gpu/1_T5-Small_on_Single_GPU
+    - file: ai_training/fine_tuning/2_gpt2_single_gpu/2_GPT2_on_a_single_GPU
+    - file: ai_training/fine_tuning/3_tinyllama_instruction_tune/3_instruction_tuning_tinyllama_on_a_single_GPU.ipynb
+    - file: ai_training/fine_tuning/4_olmo_1b_instruction_tune/4_olmo_instruction_tune
+    - file: ai_training/fine_tuning/5_gemma_2b_axolotl/gemma_2b_axolotl
+  - file: ai_training/appendix
+    sections:
+    - file: ai_training/fine_tuning/3_tinyllama_instruction_tune/data_preprocessing
notebooks/ai_training/appendix.md (6 changes: 6 additions & 0 deletions)
@@ -0,0 +1,6 @@
+# Appendix
+
+This appendix contains a collection of miscellaneous resources and supplementary materials that complement the main sections of the AI training materials.
+
+```{tableofcontents}
+```
@@ -19,7 +19,7 @@
"\n",
"Fine-tuning large language models (LLMs) almost always requires multiple GPUs to be practical (or possible at all). But if you're relatively new to deep learning, or you've only trained models on single GPUs before, making the jump to distributed training on multiple GPUs and multiple nodes can be extremely challenging and more than a little frustrating.\n",
"\n",
"As noted in the [readme](../README.md), the goal of this project is to start small and gradually add complexity. So we're not going to start with a \"large language model\" at all. We're starting with a very small model called [t5-small](https://huggingface.co/t5-small). Why start with a small model if we want to train considerably larger models?\n",
"The goal of this project is to start small and gradually add complexity. So we're not going to start with a \"large language model\" at all. We're starting with a very small model called [t5-small](https://huggingface.co/t5-small). Why start with a small model if we want to train considerably larger models?\n",
"- Learning about model fine-tuning is a lot less frustrating if you start from a place of less complexity and are able to get results quickly!\n",
"- When we get to the point of training larger models on distributed systems, we're going to spend a lot of time and energy on *how* to distribute the model, data, etc., across that system. Starting smaller lets us spend some time at the beginning focusing on the training metrics that directly relate to model performance rather than the complexity involved with distributed training. Eventually we will need both, but there's no reason to try to digest all of it all at once!\n",
"- Starting small and then scaling up will give us a solid intuition of how, when, and why to use the various tools and techniques for training larger models or for using more compute resources to train models faster.\n",
@@ -32,7 +32,7 @@
"t5-small is a 60 million parameter model. This is *small*: the smallest version of GPT2 has more than twice as many parameters (124M); llama2-7b, one of the most commonly-used models at the time of writing, has more than 116 times as many parameters (7B, hence the name). What does this mean for us? Parameter count strongly impacts the amount of memory required to train a model. Eleuther's [Transformer Math blog post](https://blog.eleuther.ai/transformer-math/#training) has a great overview of the memory costs associated with training models of different sizes. We'll get into this in more detail in a later notebook.\n",
"\n",
"## A few things to keep in mind\n",
"Check out the [Readme](README.md) if you haven't already, as it provides important context for this whole project. If you're looking for a set of absolute best practices for how to train particular models, this isn't the place to find them (though I will link them when I come across them, and will try to make improvements where I can, as long as they don't come at the cost of extra complexity!). The goal is to develop a high-level understanding and intuition on model training and fine-tuning, so you can fairly quickly get to something that *works* and then iterate to make it work *better*.\n",
"If you're looking for a set of absolute best practices for how to train particular models, this isn't the place to find them (though I will link them when I come across them, and will try to make improvements where I can, as long as they don't come at the cost of extra complexity!). The goal is to develop a high-level understanding and intuition on model training and fine-tuning, so you can fairly quickly get to something that *works* and then iterate to make it work *better*.\n",
"\n",
"## Compute used in this example\n",
"I am using a `g4dn.4xlarge` AWS ec2 instance, which has a single T4 GPU with 16GB VRAM.\n",
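The notebook text above leans on the link between parameter count and training memory. As a rough, illustrative aside (not part of this commit), the sketch below loads t5-small, counts its parameters, and applies the approximate rule of thumb from the Transformer Math post of roughly 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and optimizer states, before activations). It assumes the `transformers` and `torch` packages are installed.

```python
# Illustrative sketch only (not part of this commit); assumes `transformers` and
# `torch` are installed and that t5-small can be downloaded from the Hugging Face Hub.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Count parameters (~60M for t5-small).
n_params = sum(p.numel() for p in model.parameters())

# Rough rule of thumb for mixed-precision Adam training: on the order of 16 bytes
# per parameter for weights, gradients, and optimizer states, not counting activations.
bytes_per_param = 16
estimated_gb = n_params * bytes_per_param / 1e9

print(f"{n_params / 1e6:.0f}M parameters")
print(f"~{estimated_gb:.1f} GB estimated for weights, gradients, and optimizer states")
```

By this estimate, t5-small's training state needs on the order of 1 GB, which fits comfortably in the 16 GB of VRAM on the T4 instance mentioned in the notebook; that headroom is part of why it makes a good starting point before scaling up.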
notebooks/ai_training/intro.md (12 changes: 12 additions & 0 deletions)
@@ -0,0 +1,12 @@
+# LLM Training/Fine-tuning
+
+Welcome to the AI Training module! This notebook serves as an introduction to the fundamental concepts and techniques used in training artificial intelligence models.
+
+In this module, we'll explore various aspects of AI training, including:
+- Data preparation and preprocessing
+- Model selection and architecture
+- Training algorithms and optimization techniques
+- Evaluation metrics and performance assessment
+
+```{tableofcontents}
+```
notebooks/index.md (15 changes: 10 additions & 5 deletions)
@@ -1,14 +1,19 @@
-# Fine-Tuning LLMs
+# Guides and Examples

-This series of notebooks is intended to show how to fine-tune language models, starting from smaller models on single-node single-GPU setups and gradually scaling up to multi-GPU and multi-node configurations.
+```{attention} This site was previously dedicated solely to fine-tuning LLMs. I have since expanded the scope to include other topics. The process of converting the site is still in progress, so you might encounter broken links or other issues. Let me know if you do! You can submit an issue with the GitHub button on the top right.
+```
+
+This repository contains a collection of Jupyter notebooks demonstrating various concepts and techniques across different fields. Currently, it includes a series on fine-tuning language models, but it will expand to cover other topics in the future.
+
+## AI Training: Fine-Tuning LLMs

-Existing examples and learning resources generally do not bridge the practical gap between single-node single-GPU training when all parameters fit in VRAM, and the various forms of distributed training. These examples, when complete, are intended to show how to train smaller models given sufficient compute resources and then scale the models up until we encounter compute and/or memory constraints. We will then introduce various distributed training approaches aimed at overcoming these issues.
+The AI Training section currently focuses on fine-tuning language models. It shows how to fine-tune models starting from smaller, single-GPU setups and gradually scaling up to multi-GPU and multi-node configurations.

-This will, hopefully, serve as a practical and conceptual bridge from single-node single-GPU training to distributed training with tools such as deepspeed and FSDP.
+These examples aim to bridge the gap between single-node single-GPU training and various forms of distributed training, serving as a practical and conceptual guide for scaling up model training.

## How to use this repository

-The examples in this repository are intended to be read sequentially. Later examples build on earlier examples and gradually add scale and complexity.
+The examples in this repository are organized by topic. Within each topic, the notebooks are intended to be read sequentially. Later examples often build on earlier examples and gradually add complexity.

```{tableofcontents}
```
Binary file added notebooks/logo.jpg
Binary file removed notebooks/logo_draft.png
pyproject.toml (9 changes: 9 additions & 0 deletions)
@@ -0,0 +1,9 @@
+[project]
+name = "fine-tuning"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.12"
+dependencies = [
+    "jupyter-book>=1.0.2",
+]
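For context (not shown in this diff), the `jupyter-book` dependency declared above is what builds the site from `notebooks/_config.yml` and `notebooks/_toc.yml`. A minimal usage sketch, assuming the standard `jupyter-book build` command and the `notebooks/` source directory used in this repository:

```python
# Illustrative sketch (not part of this commit): build the Jupyter Book site from the
# `notebooks/` source directory, equivalent to running `jupyter-book build notebooks`.
import subprocess

subprocess.run(["jupyter-book", "build", "notebooks"], check=True)
```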