Skip to content

Commit

Permalink
Add langchain tutorial (#2777)
Browse files Browse the repository at this point in the history
  • Loading branch information
CaroFG authored Apr 8, 2024
1 parent 0d952f3 commit 2577ab0
Show file tree
Hide file tree
Showing 3 changed files with 177 additions and 2 deletions.
7 changes: 6 additions & 1 deletion config/sidebar-learn.json
Original file line number Diff line number Diff line change
Expand Up @@ -445,7 +445,12 @@
"source": "learn/cookbooks/strapi_v4.mdx",
"label": "Strapi v4 guide",
"slug": "strapi_v4"
}
},
{
"source": "learn/cookbooks/langchain.mdx",
"label": "Building semantic search with LangChain",
"slug": "langchain"
}
]
},
{
Expand Down
170 changes: 170 additions & 0 deletions learn/cookbooks/langchain.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
---
title: Implementing semantic search with LangChain
description: This guide shows you how to implement semantic search using LangChain and similarity search.
---

# Implementing semantic search with LangChain

In this guide, you’ll use OpenAI’s text embeddings to measure the similarity between document properties. Then, you’ll use the LangChain framework to seamlessly integrate Meilisearch and create an application with semantic search.

## Requirements

This guide assumes a basic understanding of Python and LangChain. Beginners to LangChain will still find the tutorial accessible.

- Python (LangChain requires >= 3.8.1 and < 4.0) and the pip CLI
- A [Meilisearch >= 1.6 project](/learn/getting_started/cloud_quick_start)
- An [OpenAI API key](https://platform.openai.com/account/api-keys)


On Meilisearch Cloud, enable the vector store feature via your project’s Settings page.

![A section of the project overview interface titled "Experimental features". There are two options: "Score details" and "Vector store". "Vector store" is turned on.](https://raw.githubusercontent.com/meilisearch/documentation/main/assets/images/vector-search/01-cloud-vector-store.png)

If you are working with a self-hosted Meilisearch instance, activate the vector store with the [API route](/learn/experimental/vector_search#activate-vector-search).

## Creating the application

Create a folder for your application with an empty `setup.py` file.

Before writing any code, install the necessary dependencies:

```bash
pip install langchain openai meilisearch python-dotenv
```

First create a .env to store our credentials:

```
# .env
MEILI_HTTP_ADDR="your Meilisearch host"
MEILI_API_KEY="your Meilisearch API key"
OPENAI_API_KEY="your OpenAI API key"
```

Now that you have your environment variables available, create a `setup.py` file with some boilerplate code:

```python
# setup.py

import os
from dotenv import load_dotenv # remove if not using dotenv
from langchain.vectorstores import Meilisearch
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import JSONLoader

load_dotenv() # remove if not using dotenv

# exit if missing env vars
if "MEILI_HTTP_ADDR" not in os.environ:
raise Exception("Missing MEILI_HTTP_ADDR env var")
if "MEILI_API_KEY" not in os.environ:
raise Exception("Missing MEILI_API_KEY env var")
if "OPENAI_API_KEY" not in os.environ:
raise Exception("Missing OPENAI_API_KEY env var")

# Setup code will go here 👇
```

## Importing documents and embeddings

Now that the project is ready, import some documents in Meilisearch. First, download this small movies dataset:

<ButtonLink as="a" id="downloadMoviesLite" href="https://gist.github.com/Strift/1524ab5e2015e50bbcb215fb4d950a38" download="movies-lite.json">movies-lite.json</ButtonLink>

Then, update the setup.py file to load the JSON and store it in Meilisearch. You will also use the OpenAI text search models to generate vector embeddings.

To use vector search, we need to set the embedders index setting. In this case, you are using an `userProvided` source which requires to specify the size of the vectors in a `dimensions` field. The default model used by `OpenAIEmbeddings()` is `text-embedding-ada-002`, which has 1,536 dimensions.

```python
# setup.py

# previous code

# Load documents
loader = JSONLoader(
file_path="./movies-lite.json",
jq_schema=".[] | {id: .id, overview: .overview, title: .title}",
text_content=False,
)
documents = loader.load()
print("Loaded {} documents".format(len(documents)))

# Store documents in Meilisearch
embeddings = OpenAIEmbeddings()
embedders = {
"custom": {
"source": "userProvided",
"dimensions": 1536
}
}
embedder_name = "custom"
vector_store = Meilisearch.from_documents(documents=documents, embedding=embeddings, embedders=embedders, embedder_name=embedder_name)

print("Started importing documents")
```

Your Meilisearch instance will now contain your documents. Meilisearch runs tasks like document import asynchronously, so you might need to wait a bit for documents to be available. Consult [the asynchronous operations explanation](/learn/async/asynchronous_operations) for more information on how tasks work.

## Performing similarity search

Your database is now populated with the data from the movies dataset. Create a new `search.py` file to make a semantic search query: searching for documents using similarity search.

```python
# search.py

import os
from dotenv import load_dotenv
from langchain.vectorstores import Meilisearch
from langchain.embeddings.openai import OpenAIEmbeddings
import meilisearch

load_dotenv()

# You can use the same code as `setup.py` to check for missing env vars

# Create the vector store
client = meilisearch.Client(
url=os.environ.get("MEILI_HTTP_ADDR"),
api_key=os.environ.get("MEILI_API_KEY"),
)
embeddings = OpenAIEmbeddings()
vector_store = Meilisearch(client=client, embedding=embeddings)

# Make similarity search
embedder_name = "custom"
query = "superhero fighting evil in a city at night"
results = vector_store.similarity_search(
query=query,
embedder_name=embedder_name,
k=3,
)

# Display results
for result in results:
print(result.page_content)
```

Run `search.py`. If everything is working correctly, you should see an output like this:

```
{"id": 155, "title": "The Dark Knight", "overview": "Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker."}
{"id": 314, "title": "Catwoman", "overview": "Liquidated after discovering a corporate conspiracy, mild-mannered graphic artist Patience Phillips washes up on an island, where she's resurrected and endowed with the prowess of a cat -- and she's eager to use her new skills ... as a vigilante. Before you can say \"cat and mouse,\" handsome gumshoe Tom Lone is on her tail."}
{"id": 268, "title": "Batman", "overview": "Batman must face his most ruthless nemesis when a deformed madman calling himself \"The Joker\" seizes control of Gotham's criminal underworld."}
```

Congrats 🎉 You managed to make a similarity search using Meilisearch as a LangChain vector store.

## Going further

Using Meilisearch as a LangChain vector store allows you to load documents and search for them in different ways:

- [Import documents from text](https://python.langchain.com/docs/integrations/vectorstores/meilisearch#adding-text-and-embeddings)
- [Similarity search with score](https://python.langchain.com/docs/integrations/vectorstores/meilisearch#similarity-search-with-score)
- [Similarity search by vector](https://python.langchain.com/docs/integrations/vectorstores/meilisearch#similarity-search-by-vector)

For additional information, consult:

[Meilisearch Python SDK docs](https://python-sdk.meilisearch.com/)

Finally, should you want to use Meilisearch's vector search capabilities without LangChain or its hybrid search feature, refer to the [dedicated guide](/learn/experimental/vector_search).
2 changes: 1 addition & 1 deletion learn/experimental/vector_search.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -201,7 +201,7 @@ curl -X POST -H 'content-type: application/json' \
```

<Capsule intent="tip" title="Other resources">
Check out the Meilisearch blog post for a [tutorial on implementing semantic search with LangChain](https://blog.meilisearch.com/langchain-semantic-search-tutorial/?utm_campaign=vector-search&utm_source=docs).
Check out the Meilisearch blog post for a [guide on implementing semantic search with LangChain](/learn/cookbooks/langchain).
</Capsule>

## Deactivate vector search
Expand Down

0 comments on commit 2577ab0

Please sign in to comment.