-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1319 from PrashantDixit0/lancedb-integration
LanceDB Integration
- Loading branch information
Showing
6 changed files
with
4,732 additions
and
3,158 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,158 +1,161 @@ | ||
--- | ||
title: "Train PandasAI" | ||
--- | ||
|
||
You can train PandasAI to understand your data better and to improve its performance. Training is as easy as calling the `train` method on the `Agent`. | ||
|
||
There are two kinds of training: | ||
|
||
- instructions training | ||
- q/a training | ||
|
||
<iframe | ||
width="560" | ||
height="315" | ||
src="https://www.youtube.com/embed/wd7abNyv8vk?si=1W_Kwg1i3ecfoPGv" | ||
title="Train on PandasAI" | ||
frameborder="0" | ||
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" | ||
allowfullscreen | ||
></iframe> | ||
<br /> | ||
|
||
## Prerequisites | ||
|
||
Before you start training PandasAI, you need to set your PandasAI API key. You can generate your API key by signing up at [https://pandabi.ai](https://pandabi.ai). | ||
|
||
Then you can set your API key as an environment variable: | ||
|
||
```python | ||
import os | ||
|
||
os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAIAPI_KEY" | ||
``` | ||
|
||
It is important that you set the API key, or it will fail with the following error: `No vector store provided. Please provide a vector store to train the agent`. | ||
|
||
## Instructions training | ||
|
||
Instructions training is used to teach PandasAI how you expect it to respond to certain queries. You can provide generic instructions about how you expect the model to approach certain types of queries, and PandasAI will use these instructions to generate responses to similar queries. | ||
|
||
For example, you might want the LLM to be aware that your company's fiscal year starts in April, or about specific ways you want to handle missing data. Or you might want to teach it about specific business rules or data analysis best practices that are specific to your organization. | ||
|
||
To train PandasAI with instructions, you can use the `train` method on the `Agent`, as it follows: | ||
|
||
The training uses by default the `BambooVectorStore` to store the training data, and it's accessible with the API key. | ||
|
||
As an alternative, if you want to use a local vector store (enterprise only for production use cases), you can use the `ChromaDB`, `Qdrant` or `Pinecone` vector stores (see examples below). | ||
|
||
```python | ||
from pandasai import Agent | ||
|
||
# Set your PandasAI API key (you can generate one signing up at https://pandabi.ai) | ||
os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY" | ||
|
||
agent = Agent("data.csv") | ||
agent.train(docs="The fiscal year starts in April") | ||
|
||
response = agent.chat("What is the total sales for the fiscal year?") | ||
print(response) | ||
# The model will use the information provided in the training to generate a response | ||
``` | ||
|
||
Your training data is persisted, so you only need to train the model once. | ||
|
||
## Q/A training | ||
|
||
Q/A training is used to teach PandasAI the desired process to answer specific questions, enhancing the model's performance and determinism. One of the biggest challenges with LLMs is that they are not deterministic, meaning that the same question can produce different answers at different times. Q/A training can help to mitigate this issue. | ||
|
||
To train PandasAI with Q/A, you can use the `train` method on the `Agent`, as it follows: | ||
|
||
```python | ||
from pandasai import Agent | ||
|
||
agent = Agent("data.csv") | ||
|
||
# Train the model | ||
query = "What is the total sales for the current fiscal year?" | ||
response = """ | ||
import pandas as pd | ||
df = dfs[0] | ||
# Calculate the total sales for the current fiscal year | ||
total_sales = df[df['date'] >= pd.to_datetime('today').replace(month=4, day=1)]['sales'].sum() | ||
result = { "type": "number", "value": total_sales } | ||
""" | ||
agent.train(queries=[query], codes=[response]) | ||
|
||
response = agent.chat("What is the total sales for the last fiscal year?") | ||
print(response) | ||
# The model will use the information provided in the training to generate a response | ||
``` | ||
|
||
Also in this case, your training data is persisted, so you only need to train the model once. | ||
|
||
## Training with local Vector stores | ||
|
||
If you want to train the model with a local vector store, you can use the local `ChromaDB`, `Qdrant` or `Pinecone` vector stores. Here's how to do it: | ||
An enterprise license is required for using the vector stores locally, ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). | ||
If you plan to use it in production, [contact us](https://tally.so/r/wzZNWg). | ||
|
||
```python | ||
from pandasai import Agent | ||
from pandasai.ee.vectorstores import ChromaDB | ||
from pandasai.ee.vectorstores import Qdrant | ||
from pandasai.ee.vectorstores import Pinecone | ||
|
||
# Instantiate the vector store | ||
vector_store = ChromaDB() | ||
# or with Qdrant | ||
# vector_store = Qdrant() | ||
# or with Pinecone | ||
# vector_store = Pinecone( | ||
# api_key="*****", | ||
# embedding_function=embedding_function, | ||
# dimensions=384, # dimension of your embedding model | ||
# ) | ||
|
||
# Instantiate the agent with the custom vector store | ||
agent = Agent("data.csv", vectorstore=vector_store) | ||
|
||
# Train the model | ||
query = "What is the total sales for the current fiscal year?" | ||
response = """ | ||
import pandas as pd | ||
df = dfs[0] | ||
# Calculate the total sales for the current fiscal year | ||
total_sales = df[df['date'] >= pd.to_datetime('today').replace(month=4, day=1)]['sales'].sum() | ||
result = { "type": "number", "value": total_sales } | ||
""" | ||
agent.train(queries=[query], codes=[response]) | ||
|
||
response = agent.chat("What is the total sales for the last fiscal year?") | ||
print(response) | ||
# The model will use the information provided in the training to generate a response | ||
``` | ||
|
||
## Troubleshooting | ||
|
||
In some cases, you might get an error like this: `No vector store provided. Please provide a vector store to train the agent`. It means no API key has been generated to use the `BambooVectorStore`. | ||
|
||
Here's how to fix it: | ||
|
||
First of all, you'll need to generated an API key (check the prerequisites paragraph above). | ||
Once you have generated the API key, you have 2 options: | ||
|
||
1. Override the env variable (`os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY"`) | ||
2. Instantiate the vector store and pass the API key: | ||
|
||
```python | ||
# Instantiate the vector store with the API keys | ||
vector_store = BambooVectorStor(api_key="YOUR_PANDASAI_API_KEY") | ||
|
||
# Instantiate the agent with the custom vector store | ||
agent = Agent(connector, config={...} vectorstore=vector_store) | ||
``` | ||
--- | ||
title: "Train PandasAI" | ||
--- | ||
|
||
You can train PandasAI to understand your data better and to improve its performance. Training is as easy as calling the `train` method on the `Agent`. | ||
|
||
There are two kinds of training: | ||
|
||
- instructions training | ||
- q/a training | ||
|
||
<iframe | ||
width="560" | ||
height="315" | ||
src="https://www.youtube.com/embed/wd7abNyv8vk?si=1W_Kwg1i3ecfoPGv" | ||
title="Train on PandasAI" | ||
frameborder="0" | ||
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" | ||
allowfullscreen | ||
></iframe> | ||
<br /> | ||
|
||
## Prerequisites | ||
|
||
Before you start training PandasAI, you need to set your PandasAI API key. You can generate your API key by signing up at [https://pandabi.ai](https://pandabi.ai). | ||
|
||
Then you can set your API key as an environment variable: | ||
|
||
```python | ||
import os | ||
|
||
os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAIAPI_KEY" | ||
``` | ||
|
||
It is important that you set the API key, or it will fail with the following error: `No vector store provided. Please provide a vector store to train the agent`. | ||
|
||
## Instructions training | ||
|
||
Instructions training is used to teach PandasAI how you expect it to respond to certain queries. You can provide generic instructions about how you expect the model to approach certain types of queries, and PandasAI will use these instructions to generate responses to similar queries. | ||
|
||
For example, you might want the LLM to be aware that your company's fiscal year starts in April, or about specific ways you want to handle missing data. Or you might want to teach it about specific business rules or data analysis best practices that are specific to your organization. | ||
|
||
To train PandasAI with instructions, you can use the `train` method on the `Agent`, as it follows: | ||
|
||
The training uses by default the `BambooVectorStore` to store the training data, and it's accessible with the API key. | ||
|
||
As an alternative, if you want to use a local vector store (enterprise only for production use cases), you can use the `ChromaDB`, `Qdrant` or `Pinecone` vector stores (see examples below). | ||
|
||
```python | ||
from pandasai import Agent | ||
|
||
# Set your PandasAI API key (you can generate one signing up at https://pandabi.ai) | ||
os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY" | ||
|
||
agent = Agent("data.csv") | ||
agent.train(docs="The fiscal year starts in April") | ||
|
||
response = agent.chat("What is the total sales for the fiscal year?") | ||
print(response) | ||
# The model will use the information provided in the training to generate a response | ||
``` | ||
|
||
Your training data is persisted, so you only need to train the model once. | ||
|
||
## Q/A training | ||
|
||
Q/A training is used to teach PandasAI the desired process to answer specific questions, enhancing the model's performance and determinism. One of the biggest challenges with LLMs is that they are not deterministic, meaning that the same question can produce different answers at different times. Q/A training can help to mitigate this issue. | ||
|
||
To train PandasAI with Q/A, you can use the `train` method on the `Agent`, as it follows: | ||
|
||
```python | ||
from pandasai import Agent | ||
|
||
agent = Agent("data.csv") | ||
|
||
# Train the model | ||
query = "What is the total sales for the current fiscal year?" | ||
response = """ | ||
import pandas as pd | ||
df = dfs[0] | ||
# Calculate the total sales for the current fiscal year | ||
total_sales = df[df['date'] >= pd.to_datetime('today').replace(month=4, day=1)]['sales'].sum() | ||
result = { "type": "number", "value": total_sales } | ||
""" | ||
agent.train(queries=[query], codes=[response]) | ||
|
||
response = agent.chat("What is the total sales for the last fiscal year?") | ||
print(response) | ||
# The model will use the information provided in the training to generate a response | ||
``` | ||
|
||
Also in this case, your training data is persisted, so you only need to train the model once. | ||
|
||
## Training with local Vector stores | ||
|
||
If you want to train the model with a local vector store, you can use the local `ChromaDB`, `Qdrant` or `Pinecone` vector stores. Here's how to do it: | ||
An enterprise license is required for using the vector stores locally, ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). | ||
If you plan to use it in production, [contact us](https://forms.gle/JEUqkwuTqFZjhP7h8). | ||
|
||
```python | ||
from pandasai import Agent | ||
from pandasai.ee.vectorstores import ChromaDB | ||
from pandasai.ee.vectorstores import Qdrant | ||
from pandasai.ee.vectorstores import Pinecone | ||
from pandasai.ee.vector_stores import LanceDB | ||
|
||
# Instantiate the vector store | ||
vector_store = ChromaDB() | ||
# or with Qdrant | ||
# vector_store = Qdrant() | ||
# or with LanceDB | ||
vector_store = LanceDB() | ||
# or with Pinecone | ||
# vector_store = Pinecone( | ||
# api_key="*****", | ||
# embedding_function=embedding_function, | ||
# dimensions=384, # dimension of your embedding model | ||
# ) | ||
|
||
# Instantiate the agent with the custom vector store | ||
agent = Agent("data.csv", vectorstore=vector_store) | ||
|
||
# Train the model | ||
query = "What is the total sales for the current fiscal year?" | ||
response = """ | ||
import pandas as pd | ||
df = dfs[0] | ||
# Calculate the total sales for the current fiscal year | ||
total_sales = df[df['date'] >= pd.to_datetime('today').replace(month=4, day=1)]['sales'].sum() | ||
result = { "type": "number", "value": total_sales } | ||
""" | ||
agent.train(queries=[query], codes=[response]) | ||
|
||
response = agent.chat("What is the total sales for the last fiscal year?") | ||
print(response) | ||
# The model will use the information provided in the training to generate a response | ||
``` | ||
|
||
## Troubleshooting | ||
|
||
In some cases, you might get an error like this: `No vector store provided. Please provide a vector store to train the agent`. It means no API key has been generated to use the `BambooVectorStore`. | ||
|
||
Here's how to fix it: | ||
|
||
First of all, you'll need to generated an API key (check the prerequisites paragraph above). | ||
Once you have generated the API key, you have 2 options: | ||
|
||
1. Override the env variable (`os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY"`) | ||
2. Instantiate the vector store and pass the API key: | ||
|
||
```python | ||
# Instantiate the vector store with the API keys | ||
vector_store = BambooVectorStor(api_key="YOUR_PANDASAI_API_KEY") | ||
|
||
# Instantiate the agent with the custom vector store | ||
agent = Agent(connector, config={...} vectorstore=vector_store) | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.