This Streamlit application implements a Multimodal Retrieval-Augmented Generation (RAG) system. It processes various types of documents including text files, PDFs, PowerPoint presentations, and images. The app leverages Large Language Models and Vision Language Models to extract and index information from these documents, allowing users to query the processed data through an interactive chat interface.
The system uses LlamaIndex for indexing and retrieval, NIM microservices for high-performance inference, and Milvus as the vector database that stores and searches embedding vectors. Together, these technologies let the application handle complex multimodal data, perform advanced queries, and return fast, context-aware responses to user questions.
- Multi-format Document Processing: Handles text files, PDFs, PowerPoint presentations, and images.
- Advanced Text Extraction: Extracts text from PDFs and PowerPoint slides, including tables and embedded images.
- Image Analysis: Uses a VLM (NeVA) to describe images and Google's DePlot for processing graphs/charts on NIM microservices.
- Vector Store Indexing: Creates a searchable index of processed documents using Milvus vector store.
- Interactive Chat Interface: Allows users to query the processed information through a chat-like interface.
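As a rough illustration of multi-format handling, the sketch below routes files to a processing path based on their extension. It is not the code in document_processors.py: the extension list, the route names, and the functions `route_for` and `group_by_route` are assumptions made for this example.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a processing route.
# The route names are placeholders, not functions from this project.
ROUTES = {
    ".txt": "text",
    ".pdf": "pdf",
    ".ppt": "powerpoint",
    ".pptx": "powerpoint",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def route_for(path: str) -> str:
    """Return the processing route for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in ROUTES:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")
    return ROUTES[suffix]


def group_by_route(paths):
    """Group file paths by route so each group can be processed together."""
    groups = {}
    for path in paths:
        groups.setdefault(route_for(path), []).append(path)
    return groups
```

Grouping files up front makes it easy to reject unsupported uploads before any model call is made.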
- Clone the repository:
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/multimodal_rag
- (Optional) Create a conda environment or a virtual environment:
  - Using conda:
  conda create --name multimodal-rag python=3.10
  conda activate multimodal-rag
  - Using venv:
  python -m venv venv
  source venv/bin/activate
- Install the required packages:
pip install -r requirements.txt
- Set up your NVIDIA API key as an environment variable:
export NVIDIA_API_KEY="your-api-key-here"
- Refer to this tutorial to install and start the GPU-accelerated Milvus container:
sudo docker compose up -d
- Ensure the Milvus container is running:
docker ps
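Before launching the app, the two prerequisites above (the API key and the running Milvus container) can be checked from Python. This is a minimal sketch that uses only the standard library. The function names are made up for this example, and the host and port are assumed to be the 127.0.0.1 and 19530 values used elsewhere in this README.

```python
import os
import socket


def api_key_is_set(var_name: str = "NVIDIA_API_KEY") -> bool:
    """Return True if the environment variable exists and is not blank."""
    return bool(os.environ.get(var_name, "").strip())


def port_is_open(host: str = "127.0.0.1", port: int = 19530, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("NVIDIA_API_KEY set:", api_key_is_set())
    print("Milvus port reachable:", port_is_open())
```

An open port only shows that something is listening there. It does not prove that Milvus itself is healthy, so `docker ps` remains the primary check.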
- Run the Streamlit app:
streamlit run app.py
- Open the provided URL in your web browser.
- Choose between uploading files or specifying a directory path containing your documents.
- Process the files by clicking the "Process Files" or "Process Directory" button.
- Once processing is complete, use the chat interface to query your documents.
- app.py: Main Streamlit application
- utils.py: Utility functions for image processing and API interactions
- document_processors.py: Functions for processing various document types
- requirements.txt: List of Python dependencies
- vectorstore/: Directory that stores information extracted from PDFs and PowerPoint files
To utilize GPU acceleration in the vector database, ensure that:
- Your system has a compatible NVIDIA GPU.
- You're using the GPU-enabled version of Milvus (as shown in the setup instructions).
- There are enough concurrent requests to justify GPU usage. GPU acceleration typically shows significant benefits under high load conditions.
Note that GPU acceleration is used only when the volume of incoming requests is very high. For more detail on GPU indexing and search in Milvus, refer to the official Milvus GPU Index documentation.
To connect the GPU-accelerated Milvus with LlamaIndex, update the MilvusVectorStore configuration in app.py:
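from llama_index.vector_stores.milvus import MilvusVectorStore

vector_store = MilvusVectorStore(
    host="127.0.0.1",                        # Address of the Milvus server
    port=19530,                              # Milvus port
    dim=1024,                                # Embedding vector dimension
    collection_name="your_collection_name",
    gpu_id=0,                                # Specify the GPU ID to use
)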
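The connection values above could also be read from the environment instead of being hard-coded. The sketch below is one way to do that, not part of app.py: the variable names `MILVUS_HOST`, `MILVUS_PORT`, `MILVUS_DIM`, and `MILVUS_COLLECTION` and both helper functions are assumptions made for this example. It also checks that an embedding has the configured length, since a Milvus collection only accepts vectors that match its dimension.

```python
import os


def milvus_settings(env=os.environ) -> dict:
    """Build vector store keyword arguments from environment variables.

    The variable names are hypothetical; the defaults mirror the values
    shown in this README.
    """
    settings = {
        "host": env.get("MILVUS_HOST", "127.0.0.1"),
        "port": int(env.get("MILVUS_PORT", "19530")),
        "dim": int(env.get("MILVUS_DIM", "1024")),
        "collection_name": env.get("MILVUS_COLLECTION", "your_collection_name"),
    }
    if settings["dim"] <= 0:
        raise ValueError("dim must be a positive integer")
    return settings


def check_embedding_dim(vector, dim: int) -> None:
    """Raise ValueError if the embedding length differs from the collection dimension."""
    if len(vector) != dim:
        raise ValueError(
            f"Embedding has {len(vector)} values, but the collection expects {dim}"
        )
```

The returned dictionary could then be unpacked into the constructor, for example `MilvusVectorStore(**milvus_settings(), gpu_id=0)`.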
Contributions to this project are welcome! Please follow these steps:
- Fork the NVIDIA/GenerativeAIExamples repository.
- Create a new branch for your feature or bug fix.
- Make your changes in the community/multimodal_rag/ directory.
- Submit a pull request to the main repository.