Retrieval Augmented Generation (RAG) with Mistral-7B-Instruct-v0.1 and LlamaIndex

[Screenshot: Web interface]

This repository contains a complete implementation of Retrieval Augmented Generation (RAG) using Mistral-7B-Instruct-v0.1 for generating responses from a custom dataset. The main file app.py sets up a Flask web server to provide an interface for querying the RAG model.

The implementation has two key phases: indexing, and retrieval & generation. During indexing, documents are split into text chunks and their embeddings are stored in a vector database. During retrieval & generation, a user query is matched against these embeddings, and the retrieved context is passed to the LLM to generate a grounded response.
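
For concreteness, here is a minimal sketch of those two phases with LlamaIndex. It assumes the legacy llama_index (0.9.x) API that exposes ServiceContext; the directory paths and top-k value are illustrative, not values taken from this repository's code.

    from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

    # Phase 1, indexing: split documents into chunks, embed them, store the vectors.
    documents = SimpleDirectoryReader("./data").load_data()
    service_context = ServiceContext.from_defaults(
        embed_model="local:BAAI/bge-small-en-v1.5",  # embedding model (see Usage below)
        llm=None,  # no LLM is needed to build the index; plug in LlamaCPP for querying
    )
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    index.storage_context.persist(persist_dir="./storage")

    # Phase 2, retrieval & generation: match the query against the stored
    # embeddings and let the LLM answer from the retrieved context.
    query_engine = index.as_query_engine(similarity_top_k=3)
    print(query_engine.query("your question about the dataset"))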

Summary of the pipeline

Why Mistral-7B? Mistral-7B, especially in its 4-bit quantized form, offers impressive performance with a modest memory footprint (other open-source LLMs such as Llama 2, Mindy-7B, or MoMo-70B can also be used). Here, ‘Mistral-7B-Instruct-v0.2-Q4_K_M.gguf’ is used for the retrieval and generation tasks.
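
Loading such a quantized GGUF file through LlamaIndex's LlamaCPP wrapper could look like the sketch below; the model path and generation parameters are assumptions, not values from app.py.

    from llama_index.llms import LlamaCPP
    from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

    llm = LlamaCPP(
        model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # local GGUF file
        temperature=0.1,
        max_new_tokens=256,
        context_window=3900,  # leave headroom below the model's context limit
        model_kwargs={"n_gpu_layers": -1},  # offload all layers to the GPU if available
        messages_to_prompt=messages_to_prompt,  # format prompts in Llama/Mistral chat style
        completion_to_prompt=completion_to_prompt,
        verbose=True,
    )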

The stack is built on the LlamaIndex framework, which provides SentenceWindowNodeParser, VectorStoreIndex, ServiceContext, and SentenceTransformerRerank for powerful, effortless querying of domain-specific data; in this setup it outperformed alternatives such as LangChain.
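
One plausible way to wire those components together for sentence-window retrieval with reranking is sketched below, reusing the llm from the previous sketch; the window size, top-k, and reranker model are assumptions.

    from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
    from llama_index.node_parser import SentenceWindowNodeParser
    from llama_index.postprocessor import (
        MetadataReplacementPostProcessor,
        SentenceTransformerRerank,
    )

    # Parse each sentence into its own node, keeping neighbouring sentences as metadata.
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    service_context = ServiceContext.from_defaults(llm=llm, node_parser=node_parser)

    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)

    query_engine = index.as_query_engine(
        similarity_top_k=6,
        node_postprocessors=[
            # Replace each retrieved sentence with its surrounding window of context.
            MetadataReplacementPostProcessor(target_metadata_key="window"),
            # Rerank candidates with a cross-encoder and keep only the best two.
            SentenceTransformerRerank(top_n=2, model="BAAI/bge-reranker-base"),
        ],
    )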

Instead of fine-tuning an LLM, the embedding-based retrieval keeps the system scalable and avoids issues such as model drift, cost, and complexity.

How do you assess a RAG system? Check out the evaluation process and benchmarks in detail here; they offer further insight into the methodology and performance metrics.

(Note: a machine with at least 12 GB of GPU memory and 16 GB of RAM is recommended for optimal performance.)

Setup

Set up the environment and install the necessary dependencies. You can do this using the following steps:

  1. Clone this repository to your local machine:

    git clone https://github.com/ChanukaRavishan/MistralRAG-LlamaIndex.git
  2. Navigate to the repository directory:

    cd MistralRAG-LlamaIndex
  3. Install the required Python packages. You may use a virtual environment to manage dependencies:

    pip install -r requirements.txt

Usage

1. Creating the vector dataset

To create the vector storage, follow the instructions in the notebook LLaMaCPP_python_creating_vector_storage.ipynb. Here, the Hugging Face bge-small-en-v1.5 model is used to generate the embeddings, which are stored in the VectorStoreIndex provided by LlamaIndex.
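
A minimal sketch of that step follows: build the index once, persist it, and reload it later without re-embedding the documents. It assumes the legacy llama_index API; the data and persist directories are illustrative.

    from llama_index import (
        ServiceContext,
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )
    from llama_index.embeddings import HuggingFaceEmbedding

    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None)

    # Build the index once and persist the embeddings to disk...
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    index.storage_context.persist(persist_dir="./storage")

    # ...then reload it (e.g. in app.py) without re-embedding the documents.
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)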

2. Starting the web server

To start the Flask web server and interact with the RAG model, run the following command:

nohup python app.py &

This starts the server locally at http://localhost:3000/. You can now visit this URL in your web browser to access the interface; if you are running the implementation on a remote machine, visit http://<your-remote-ip-address>:3000/ instead.

API Endpoints

  • GET /query: This endpoint accepts a query string as a parameter (message) and returns the response generated by the RAG model. Example usage:

    http://localhost:3000/query?message=your_query_here
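
Equivalently, the endpoint can be queried from Python; the exact response format depends on app.py, so plain text is assumed here.

    import requests

    # Send the query string as the `message` parameter, exactly as in the URL above.
    resp = requests.get(
        "http://localhost:3000/query",
        params={"message": "your_query_here"},
    )
    print(resp.text)  # the RAG model's response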

Main File Explanation

The app.py file contains the main implementation for setting up the Flask web server and integrating the RAG model. Here's a breakdown of its components; a sketch of how they might fit together follows the list:

  • Initialization Functions:

    • initialize_llm: Initializes the LlamaCPP model with specified parameters.
    • initialize_query_engine: Initializes the query engine for executing queries with the RAG model.
  • Flask App Routes:

    • /: Renders the index.html template in the templates folder.
    • /query: Accepts a query string and returns the response generated by the RAG model.
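
A skeleton along those lines is shown below. The function bodies, paths, and parameters are assumptions inferred from the breakdown above, not a verbatim copy of the repository's code.

    from flask import Flask, render_template, request
    from llama_index import ServiceContext, StorageContext, load_index_from_storage
    from llama_index.llms import LlamaCPP

    def initialize_llm():
        # Load the quantized GGUF model (path and parameters are assumptions).
        return LlamaCPP(
            model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
            temperature=0.1,
            max_new_tokens=256,
            context_window=3900,
            model_kwargs={"n_gpu_layers": -1},
        )

    def initialize_query_engine(llm):
        # Rebuild the query engine from the persisted vector storage.
        service_context = ServiceContext.from_defaults(
            llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
        )
        storage_context = StorageContext.from_defaults(persist_dir="./storage")
        index = load_index_from_storage(storage_context, service_context=service_context)
        return index.as_query_engine()

    app = Flask(__name__)
    query_engine = initialize_query_engine(initialize_llm())

    @app.route("/")
    def home():
        return render_template("index.html")  # served from the templates folder

    @app.route("/query")
    def query():
        message = request.args.get("message", "")
        return str(query_engine.query(message))  # RAG response as plain text

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=3000)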

Additional Notes

  • Ensure that you have the necessary models and resources available in the specified paths.
  • Customize the configuration and parameters according to your requirements.

Feel free to explore and modify the code to suit your needs. If you encounter any issues or have suggestions for improvement, please don't hesitate to open an issue or contribute to the repository.

Thanks!
