From ce49ccd495e379518fb59f33cea252d903781733 Mon Sep 17 00:00:00 2001
From: Alonso Silva Allende
Date: Tue, 13 Aug 2024 23:03:01 +0200
Subject: [PATCH] Change cookbook examples: Download model weights in the hub cache folder (#1097)

Change cookbook examples: Download model weights in the hub cache folder
---
 docs/cookbook/chain_of_thought.md           | 56 +++++++++++++-------
 docs/cookbook/knowledge_graph_extraction.md | 56 +++++++++++++-------
 docs/cookbook/qa-with-citations.md          | 55 +++++++++++++-------
 docs/cookbook/react_agent.md                | 57 +++++++++++++--------
 4 files changed, 148 insertions(+), 76 deletions(-)

diff --git a/docs/cookbook/chain_of_thought.md b/docs/cookbook/chain_of_thought.md
index cc079a7ff..17c362696 100644
--- a/docs/cookbook/chain_of_thought.md
+++ b/docs/cookbook/chain_of_thought.md
@@ -11,30 +11,48 @@ We use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-
 pip install llama-cpp-python
 ```

-We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):
+We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern):
+```python
+import llama_cpp
+from outlines import generate, models

-```bash
-wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
+model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",
+                        "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
+                        tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
+                            "NousResearch/Hermes-2-Pro-Llama-3-8B"
+                        ),
+                        n_gpu_layers=-1,
+                        flash_attn=True,
+                        n_ctx=8192,
+                        verbose=False)
 ```

-We initialize the model:
+??? note "(Optional) Store the model weights in a custom folder"
note "(Optional) Store the model weights in a custom folder" -```python -from llama_cpp import Llama -from outlines import generate, models + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -model = models.LlamaCpp(llm) -``` + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` ## Chain of thought diff --git a/docs/cookbook/knowledge_graph_extraction.md b/docs/cookbook/knowledge_graph_extraction.md index c4c1dc75c..e25166bca 100644 --- a/docs/cookbook/knowledge_graph_extraction.md +++ b/docs/cookbook/knowledge_graph_extraction.md @@ -8,30 +8,48 @@ We will use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama pip install llama-cpp-python ``` -We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): +We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern): +```python +import llama_cpp +from outlines import generate, models -```bash -wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf +model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF", + "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False) ``` -We initialize the model: +??? 
note "(Optional) Store the model weights in a custom folder" -```python -from llama_cpp import Llama -from outlines import generate, models + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -model = models.LlamaCpp(llm) -``` + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` ## Knowledge Graph Extraction diff --git a/docs/cookbook/qa-with-citations.md b/docs/cookbook/qa-with-citations.md index c2111617f..79a2214c3 100644 --- a/docs/cookbook/qa-with-citations.md +++ b/docs/cookbook/qa-with-citations.md @@ -8,29 +8,48 @@ We will use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama pip install llama-cpp-python ``` -We pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): +We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern): +```python +import llama_cpp +from outlines import generate, models -```bash -wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf +model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF", + "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False) ``` -We initialize the model: +??? 
note "(Optional) Store the model weights in a custom folder" -```python -from llama_cpp import Llama -from outlines import generate, models + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -``` + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` ## Generate Synthetic Data diff --git a/docs/cookbook/react_agent.md b/docs/cookbook/react_agent.md index 15fb964a0..ca4829d5f 100644 --- a/docs/cookbook/react_agent.md +++ b/docs/cookbook/react_agent.md @@ -12,32 +12,49 @@ We use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp- pip install llama-cpp-python ``` -We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): - -```bash -wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf -``` - -We initialize the model: - +We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern): ```python import llama_cpp -from llama_cpp import Llama from outlines import generate, models -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -model = models.LlamaCpp(llm) +model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF", + "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False) ``` +??? 
note "(Optional) Store the model weights in a custom folder" + + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): + + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` + ## Build a ReAct agent In this example, we use two tools: