# Example

This example shows a retrieval augmented generation (RAG) application that uses `chromem-go` as the knowledge base for finding information relevant to a question.

We run both the embedding model and the LLM in [Ollama](https://github.com/ollama/ollama) to showcase how a RAG application can run entirely offline, without relying on OpenAI or other third-party APIs. It doesn't require a GPU; a CPU like an 11th Gen Intel i5-1135G7 (as found in the first-generation Framework Laptop 13) is fast enough.

As the LLM we use Google's [Gemma (2B)](https://huggingface.co/google/gemma-2b), a very small model that needs few resources and is fast, but has little built-in knowledge, which makes it a prime example for combining LLMs with vector databases. We found Gemma 2B to be superior to [TinyLlama (1.1B)](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), [Stable LM 2 (1.6B)](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b) and [Phi-2 (2.7B)](https://huggingface.co/microsoft/phi-2) for the RAG use case.

As the embedding model we use Nomic's [nomic-embed-text v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5).

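The core of the example is only a handful of `chromem-go` calls. The following is a compressed sketch of the flow in `main.go`; the real example loads its documents from a JSON-lines file, and the signatures shown here are the ones this example uses, so they may differ slightly in other `chromem-go` versions:

```go
package main

import (
	"context"
	"log"
	"runtime"

	"github.com/philippgille/chromem-go"
)

func main() {
	ctx := context.Background()

	// In-memory vector database.
	db := chromem.NewDB()

	// The collection's embeddings are created by the nomic-embed-text model served by Ollama.
	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama("nomic-embed-text"))
	if err != nil {
		panic(err)
	}

	// Add documents; their embeddings are created via the Ollama API.
	docs := []chromem.Document{
		{ID: "1", Content: "Malleable Iron Range Company was a company that existed from 1896 to 1985 ..."},
	}
	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
	if err != nil {
		panic(err)
	}

	// Retrieve the document most similar to the question.
	results, err := collection.Query(ctx, "When did the Monarch Company exist?", 1, nil, nil)
	if err != nil {
		panic(err)
	}
	for _, res := range results {
		log.Printf("similarity %f: %q", res.Similarity, res.Content)
	}
}
```

The similarity scores and document contents returned by the query are what you see in the example's log output below; they are then fed into the LLM prompt.
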
## How to run

1. Install Ollama: <https://ollama.com/download>
2. Download the two models:
   - `ollama pull gemma:2b`
   - `ollama pull nomic-embed-text`
3. Run the example: `go run .`

## Output

The output can differ slightly on each run, but it's along the lines of:

```log
2024/03/02 20:02:30 Warming up Ollama...
2024/03/02 20:02:33 Question: When did the Monarch Company exist?
2024/03/02 20:02:33 Asking LLM...
2024/03/02 20:02:34 Initial reply from the LLM: "I cannot provide information on the Monarch Company, as I am unable to access real-time or comprehensive knowledge sources."
2024/03/02 20:02:34 Setting up chromem-go...
2024/03/02 20:02:34 Reading JSON lines...
2024/03/02 20:02:34 Adding documents to chromem-go, including creating their embeddings via Ollama API...
2024/03/02 20:03:11 Querying chromem-go...
2024/03/02 20:03:11 Document 1 (similarity: 0.723627): "Malleable Iron Range Company was a company that existed from 1896 to 1985 and primarily produced kitchen ranges made of malleable iron but also produced a variety of other related products. The company's primary trademark was 'Monarch' and was colloquially often referred to as the Monarch Company or just Monarch."
2024/03/02 20:03:11 Document 2 (similarity: 0.550584): "The American Motor Car Company was a short-lived company in the automotive industry founded in 1906 lasting until 1913. It was based in Indianapolis Indiana United States. The American Motor Car Company pioneered the underslung design."
2024/03/02 20:03:11 Asking LLM with augmented question...
2024/03/02 20:03:32 Reply after augmenting the question with knowledge: "The Monarch Company existed from 1896 to 1985."
```

Most of the time is spent creating the embeddings and talking to the LLM, neither of which is part of `chromem-go` itself.

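The last two log lines are the augmentation step: the documents returned by `chromem-go` are rendered into the system prompt, and the question is then sent to Gemma 2B through Ollama's OpenAI-compatible endpoint. Below is a condensed sketch of what `llm.go` does; the system prompt template here is a shortened stand-in for the real one, while the names `systemPromptTpl` and `askLLM` mirror the example code, and it assumes Ollama is serving at `http://localhost:11434`:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"strings"
	"text/template"

	"github.com/sashabaranov/go-openai"
)

// Shortened stand-in for the example's real system prompt template.
var systemPromptTpl = template.Must(template.New("system_prompt").Parse(`
Answer the question using only this knowledge:
{{- range . }}
- {{ . }}
{{- end }}
`))

func askLLM(ctx context.Context, contexts []string, question string) string {
	// Ollama serves an OpenAI-compatible API, so the regular OpenAI client works.
	openAIClient := openai.NewClientWithConfig(openai.ClientConfig{
		BaseURL:    "http://localhost:11434/v1",
		HTTPClient: http.DefaultClient,
	})

	// Render the retrieved documents into the system prompt.
	sb := &strings.Builder{}
	if err := systemPromptTpl.Execute(sb, contexts); err != nil {
		panic(err)
	}

	res, err := openAIClient.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: "gemma:2b",
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleSystem, Content: sb.String()},
			{Role: openai.ChatMessageRoleUser, Content: "Question: " + question},
		},
	})
	if err != nil {
		panic(err)
	}
	return res.Choices[0].Message.Content
}

func main() {
	ctx := context.Background()
	contexts := []string{"Malleable Iron Range Company existed from 1896 to 1985 and was colloquially referred to as the Monarch Company."}
	log.Println(askLLM(ctx, contexts, "When did the Monarch Company exist?"))
}
```

Because the Ollama and OpenAI setups only differ in the client configuration and the model name, switching to OpenAI is a small change, which is what the patches in the next section do.
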
## OpenAI

You can easily adapt the code to use OpenAI instead of running everything locally in Ollama.

Set your OpenAI API key in the environment as `OPENAI_API_KEY`.

Then, if you want to create the embeddings via OpenAI but still use Gemma 2B as the LLM:

<details><summary>Apply this patch</summary>

```diff
diff --git a/example/main.go b/example/main.go
index 55b3076..cee9561 100644
--- a/example/main.go
+++ b/example/main.go
@@ -14,8 +14,6 @@ import (
 
 const (
 	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
-	embeddingModel = "nomic-embed-text"
 )
 
 func main() {
@@ -48,7 +46,7 @@ func main() {
 	// variable to be set.
 	// For this example we choose to use a locally running embedding model though.
 	// It requires Ollama to serve its API at "http://localhost:11434/api".
-	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama(embeddingModel))
+	collection, err := db.GetOrCreateCollection("Wikipedia", nil, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -82,7 +80,7 @@ func main() {
 			Content: article.Text,
 		})
 	}
-	log.Println("Adding documents to chromem-go, including creating their embeddings via Ollama API...")
+	log.Println("Adding documents to chromem-go, including creating their embeddings via OpenAI API...")
 	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
 	if err != nil {
 		panic(err)
```

</details>

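With this first patch, the embeddings are created via the OpenAI API (so the run needs network access and `OPENAI_API_KEY`), while the chat completion still runs locally against Gemma 2B in Ollama.
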
Alternatively, if you want to use OpenAI for everything (embedding creation and the LLM):

<details><summary>Apply this patch</summary>

```diff
diff --git a/example/llm.go b/example/llm.go
index 1fde4ec..7cb81cc 100644
--- a/example/llm.go
+++ b/example/llm.go
@@ -2,23 +2,13 @@ package main
 
 import (
 	"context"
-	"net/http"
+	"os"
 	"strings"
 	"text/template"
 
 	"github.com/sashabaranov/go-openai"
 )
 
-const (
-	// We use a local LLM running in Ollama for asking the question: https://github.com/ollama/ollama
-	ollamaBaseURL = "http://localhost:11434/v1"
-	// We use Google's Gemma (2B), a very small model that doesn't need much resources
-	// and is fast, but doesn't have much knowledge: https://huggingface.co/google/gemma-2b
-	// We found Gemma 2B to be superior to TinyLlama (1.1B), Stable LM 2 (1.6B)
-	// and Phi-2 (2.7B) for the retrieval augmented generation (RAG) use case.
-	llmModel = "gemma:2b"
-)
-
 // There are many different ways to provide the context to the LLM.
 // You can pass each context as user message, or the list as one user message,
 // or pass it in the system prompt. The system prompt itself also has a big impact
@@ -47,10 +37,7 @@ Don't mention the knowledge base, context or search results in your answer.
 
 func askLLM(ctx context.Context, contexts []string, question string) string {
 	// We can use the OpenAI client because Ollama is compatible with OpenAI's API.
-	openAIClient := openai.NewClientWithConfig(openai.ClientConfig{
-		BaseURL:    ollamaBaseURL,
-		HTTPClient: http.DefaultClient,
-	})
+	openAIClient := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
 	sb := &strings.Builder{}
 	err := systemPromptTpl.Execute(sb, contexts)
 	if err != nil {
@@ -66,7 +53,7 @@ func askLLM(ctx context.Context, contexts []string, question string) string {
 		},
 	}
 	res, err := openAIClient.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
-		Model:    llmModel,
+		Model:    openai.GPT3Dot5Turbo,
 		Messages: messages,
 	})
 	if err != nil {
diff --git a/example/main.go b/example/main.go
index 55b3076..044a246 100644
--- a/example/main.go
+++ b/example/main.go
@@ -12,19 +12,11 @@ import (
 	"github.com/philippgille/chromem-go"
 )
 
-const (
-	question = "When did the Monarch Company exist?"
-	// We use a local LLM running in Ollama for the embedding: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
-	embeddingModel = "nomic-embed-text"
-)
+const question = "When did the Monarch Company exist?"
 
 func main() {
 	ctx := context.Background()
 
-	// Warm up Ollama, in case the model isn't loaded yet
-	log.Println("Warming up Ollama...")
-	_ = askLLM(ctx, nil, "Hello!")
-
 	// First we ask an LLM a fairly specific question that it likely won't know
 	// the answer to.
 	log.Println("Question: " + question)
@@ -48,7 +40,7 @@ func main() {
 	// variable to be set.
 	// For this example we choose to use a locally running embedding model though.
 	// It requires Ollama to serve its API at "http://localhost:11434/api".
-	collection, err := db.GetOrCreateCollection("Wikipedia", nil, chromem.NewEmbeddingFuncOllama(embeddingModel))
+	collection, err := db.GetOrCreateCollection("Wikipedia", nil, nil)
 	if err != nil {
 		panic(err)
 	}
@@ -82,7 +74,7 @@ func main() {
 			Content: article.Text,
 		})
 	}
-	log.Println("Adding documents to chromem-go, including creating their embeddings via Ollama API...")
+	log.Println("Adding documents to chromem-go, including creating their embeddings via OpenAI API...")
 	err = collection.AddDocuments(ctx, docs, runtime.NumCPU())
 	if err != nil {
 		panic(err)
```

</details>

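In both variants, passing `nil` as the embedding function makes `chromem-go` fall back to its OpenAI default. If you prefer to be explicit about the embedding model, you could pass an OpenAI embedding function yourself. This is a minimal sketch, assuming the `chromem-go` version you use exposes `NewEmbeddingFuncOpenAI` and the `EmbeddingModelOpenAI3Small` constant; check the package documentation for the exact names:

```go
// Hypothetical variant of the collection setup in main.go, with an explicit
// OpenAI embedding function instead of the nil default. Assumes chromem-go
// exposes NewEmbeddingFuncOpenAI and the EmbeddingModelOpenAI3Small constant.
package main

import (
	"os"

	"github.com/philippgille/chromem-go"
)

func main() {
	db := chromem.NewDB()
	embeddingFunc := chromem.NewEmbeddingFuncOpenAI(os.Getenv("OPENAI_API_KEY"), chromem.EmbeddingModelOpenAI3Small)
	collection, err := db.GetOrCreateCollection("Wikipedia", nil, embeddingFunc)
	if err != nil {
		panic(err)
	}
	_ = collection // Use the collection as in main.go (AddDocuments, Query, ...).
}
```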