This project enables efficient querying of CSV files using the LangChain library and the OpenAI GPT-3.5 turbo model. The source code loads and indexes the CSV files, then retrieves the matching data to answer queries about their content.
- Loading CSV Files
  - Utilizes LangChain's `DirectoryLoader` class to load all CSV files in a specified directory.
- Creating Vector Indexes
  - Uses the `VectorIndexStore` class of LangChain to create vector indexes with OpenAI embeddings.
- Retrieval and Query Execution
  - Implements the `RetrievalQA` chain, integrating OpenAI GPT-3.5 turbo model and prompt templates for efficient and accurate query responses.
- Query Execution
  - Executes queries on the uploaded CSV files, ensuring responses are relevant to the content of the CSV files.
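The load, index, and retrieve flow described above can be illustrated with a minimal standard-library sketch. This is not the project's code: it swaps OpenAI embeddings for plain term-frequency vectors and skips the GPT-3.5 answer step, so it only shows how rows are turned into documents, indexed, and ranked against a query. The function names (`load_csv_rows`, `embed`, `build_index`, `retrieve`) are hypothetical.

```python
import csv
import math
import re
from collections import Counter
from pathlib import Path


def load_csv_rows(directory):
    """Read every *.csv file in `directory`; return one text document per row."""
    docs = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                text = ", ".join(f"{key}: {value}" for key, value in row.items())
                docs.append({"source": path.name, "text": text})
    return docs


def embed(text):
    """Stand-in embedding: a term-frequency vector (an assumption, NOT an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Pair each document with its vector so queries can be ranked against it."""
    return [(embed(doc["text"]), doc) for doc in docs]


def retrieve(index, query, k=3):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]
```

In the real pipeline the retrieved rows would be passed to the language model as context; here `retrieve(build_index(load_csv_rows("data")), "some question")` simply returns the best-matching rows, where `data` is a placeholder directory name.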
- Root Directory
  - Contains the `src` folder, Jupyter notebook, `requirements.txt` file, and `main.py` file.
- Jupyter Notebook
  - Functionality can be run by executing commands in the notebook.
- Main.py
  - Imports functions from `config.py` in the `src` directory and calls them through the Gradio interface.
- Config.py
  - Functions can be used for a backend API or modified as needed in the notebook.
- Only answers queries related to the uploaded CSV files.
- Does not respond to irrelevant queries outside the context of the CSV files.
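One common way to keep answers tied to the uploaded data is to constrain the prompt itself. The sketch below is an assumption about how such a prompt template might look, not the template this project ships: the wording, the `FALLBACK` message, and the `build_prompt` helper are all hypothetical.

```python
from string import Template

# Hypothetical fallback reply for questions the CSV rows cannot answer.
FALLBACK = "I can only answer questions about the uploaded CSV files."

# Hypothetical prompt template; the project's actual wording is not shown in this README.
PROMPT = Template(
    "Answer the question using only the CSV rows in the context below.\n"
    "If the answer is not in the context, reply exactly: $fallback\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(rows, question):
    """Fill the template with retrieved rows (a list of strings) and the user question."""
    context = "\n".join(rows) if rows else "(no matching rows)"
    return PROMPT.substitute(context=context, question=question, fallback=FALLBACK)
```

The resulting string would be sent to the model in place of the raw question, so an off-topic query has no supporting context and the model is instructed to decline.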
These instructions will help you set up and run the project on your local machine.
- Python (version 3.10 or later)
- OpenAI API key
- Clone the Repo
  Clone the repository and open its folder.
- Create a Python virtual environment
  Open your terminal and create a Python virtual environment with the following command:
  `python -m venv venv`
- Activate the virtual environment
  On Windows (PowerShell): `.\venv\Scripts\Activate.ps1`
  On macOS or Linux: `source venv/bin/activate`
- Install required packages
  `pip install -r requirements.txt`
- Set up OpenAI API key
  On macOS or Linux: `export OPENAI_KEY=InsertYourKeyHere`
  On Windows (PowerShell): `$env:OPENAI_KEY = "InsertYourKeyHere"`
- Run the main application
  `python main.py`
- Upload your CSV files using the Gradio interface.
- Start executing queries. The application builds an index of the CSV data, and the model retrieves query results from that index.
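If the application starts without the key set, the failure usually surfaces later as an authentication error. A small fail-fast check, sketched below, makes the problem obvious at startup. How the project actually reads the key is not shown here; the variable name `OPENAI_KEY` comes from the setup step above, and `get_openai_key` is a hypothetical helper.

```python
import os


def get_openai_key(env=os.environ, name="OPENAI_KEY"):
    """Return the API key from the environment, or raise a clear error if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before running main.py")
    return key
```

Calling `get_openai_key()` once at the top of the entry point is enough; the `env` parameter exists only so the check can be exercised with a plain dictionary.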
If you want to contribute to this project, please fork the repository and create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.