Skip to content

Commit

Permalink
Merge branch 'run-llama:main' into grounding
Browse files Browse the repository at this point in the history
  • Loading branch information
jmugicagonz authored Mar 1, 2024
2 parents 972a84c + 0e39c36 commit c792317
Show file tree
Hide file tree
Showing 4 changed files with 291 additions and 2 deletions.
280 changes: 280 additions & 0 deletions docs/cookbooks/mixedbread_reranker.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "964030f7-40e4-4398-a5ab-668aabcf3bad",
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/cookbooks/mixedbread_reranker.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "360313ab-9393-430e-9647-e0d5545809b9",
"metadata": {},
"source": [
"# mixedbread Rerank Cookbook\n",
"\n",
"mixedbread.ai has released three fully open-source reranker models under the Apache 2.0 license. For more in-depth information, you can check out their detailed [blog post](https://www.mixedbread.ai/blog/mxbai-rerank-v1). The following are the three models:\n",
"\n",
"1. `mxbai-rerank-xsmall-v1`\n",
"2. `mxbai-rerank-base-v1`\n",
"3. `mxbai-rerank-large-v1`\n",
"\n",
"In this notebook, we'll demonstrate how to use the `mxbai-rerank-base-v1` model with the `SentenceTransformerRerank` module in LlamaIndex. This setup allows you to seamlessly swap in any reranker model of your choice using the `SentenceTransformerRerank` module to enhance your RAG pipeline."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "856ecfdc-04fa-4fe9-a81c-9a5858cd4a6d",
"metadata": {},
"source": [
"### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bfb5314f-e6c7-409c-86df-8e1a5ca59adb",
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-index\n",
"!pip install sentence-transformers"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5f5393fb-b410-4769-9380-0ef90a33b82e",
"metadata": {},
"source": [
"### Set API Keys"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9782acf-b0ab-4933-bb41-27cd2a02b5dd",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"YOUR OPENAI API KEY\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7596ddf-e1de-4098-81f3-fce504d2da94",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import (\n",
" VectorStoreIndex,\n",
" SimpleDirectoryReader,\n",
")\n",
"\n",
"from llama_index.core.postprocessor import SentenceTransformerRerank"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8011ff9c-2b82-47b4-983f-4fafc29e3127",
"metadata": {},
"source": [
"### Download Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dd335cb-900b-462f-987a-d4af2aac88fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-03-01 09:52:09-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 75042 (73K) [text/plain]\n",
"Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
"\n",
"data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.007s \n",
"\n",
"2024-03-01 09:52:09 (9.86 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
"\n"
]
}
],
"source": [
"!mkdir -p 'data/paul_graham/'\n",
"!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e482b09c-a0df-4788-a75b-a33ade7001d1",
"metadata": {},
"source": [
"### Load Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "342c91b8-301f-40ed-9d09-9acdb1bbdc44",
"metadata": {},
"outputs": [],
"source": [
"documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8afdfeb1-57ae-4d2b-ae73-683db205be32",
"metadata": {},
"source": [
"### Build Index"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "47c335e9-dd4d-475c-bade-e2a588e33294",
"metadata": {},
"outputs": [],
"source": [
"index = VectorStoreIndex.from_documents(documents=documents)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f1ab8157-dbcb-4588-9b3c-5bd2fc4a721e",
"metadata": {},
"source": [
"### Define postprocessor for `mxbai-rerank-base-v1` reranker"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fcc5590-2e58-4a7e-8b18-a7153c06d1ff",
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.postprocessor import SentenceTransformerRerank\n",
"\n",
"postprocessor = SentenceTransformerRerank(\n",
" model=\"mixedbread-ai/mxbai-rerank-base-v1\", top_n=2\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c7c81b0d-0449-4092-80cb-88080e69f980",
"metadata": {},
"source": [
"### Create Query Engine\n",
"\n",
"We will first retrieve 10 relevant nodes and pick top-2 nodes using the defined postprocessor."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1b23700-15ae-4f1a-9443-43eb1eecab5f",
"metadata": {},
"outputs": [],
"source": [
"query_engine = index.as_query_engine(\n",
" similarity_top_k=10,\n",
" node_postprocessors=[postprocessor],\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "93871f9c-8871-4f43-8ee9-b3ca4e403d86",
"metadata": {},
"source": [
"### Test Queries"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "658d3092-7d86-4520-83a2-c3e630dc02b6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sam Altman initially declined the offer of becoming president of Y Combinator because he wanted to start a startup focused on making nuclear reactors.\n"
]
}
],
"source": [
"response = query_engine.query(\n",
" \"Why did Sam Altman decline the offer of becoming president of Y Combinator?\",\n",
")\n",
"\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "497e715e-3f7a-4140-a3ba-34356e473702",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Paul Graham started YC because he and his partners wanted to create an investment firm where they could implement their own ideas and provide the kind of support to startups that they felt was lacking when they were founders themselves. They aimed to not only make seed investments but also assist startups with various aspects of setting up a company, similar to the help they had received from others in the past.\n"
]
}
],
"source": [
"response = query_engine.query(\n",
" \"Why did Paul Graham start YC?\",\n",
")\n",
"\n",
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Original file line number Diff line number Diff line change
Expand Up @@ -332,4 +332,5 @@ maxdepth: 1
/examples/node_postprocessor/rankGPT.ipynb
/examples/node_postprocessor/ColbertRerank.ipynb
/examples/node_postprocessor/JinaRerank.ipynb
/cookbooks/mixedbread_reranker.ipynb
```
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,11 @@ def _simple_fusion(
for nodes_with_scores in results.values():
for node_with_score in nodes_with_scores:
text = node_with_score.node.get_content()
all_nodes[text] = node_with_score
if text in all_nodes:
max_score = max(node_with_score.score, all_nodes[text].score)
all_nodes[text].score = max_score
else:
all_nodes[text] = node_with_score

return sorted(all_nodes.values(), key=lambda x: x.score or 0.0, reverse=True)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,11 @@ def _simple_fusion(
for nodes_with_scores in results.values():
for node_with_score in nodes_with_scores:
text = node_with_score.node.get_content()
all_nodes[text] = node_with_score
if text in all_nodes:
score = max(node_with_score.score, all_nodes[text].score)
all_nodes[text].score = score
else:
all_nodes[text] = node_with_score

return sorted(all_nodes.values(), key=lambda x: x.score or 0.0, reverse=True)

Expand Down

0 comments on commit c792317

Please sign in to comment.