An AI-powered chat application that uses text, audio, and images to produce context-aware responses. It combines language models with a vector database for retrieval-augmented generation (RAG), making it a versatile tool for intelligent conversations.
- Text to Speech: Convert text responses to speech using gTTS.
- Speech to Text: Process and transcribe audio files using `speech_recognition` and `Wav2Vec2` (see the audio sketch after this list).
- Visual Question Answering: Answer questions based on uploaded images using BLIP (see the VQA sketch after this list).
- PDF Knowledge Base: Upload PDF files to enhance the knowledge base for more accurate responses.
- Context-Aware Responses: Use conversation history to provide more relevant answers.
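
A minimal sketch of the text-to-speech and speech-to-text round trip, assuming the gTTS and `speech_recognition` packages listed above. The function names are hypothetical, the actual logic lives in `audio_processor.py`, and the `Wav2Vec2` transcription path is not shown here.

```python
# Hypothetical audio helpers; the repository's audio_processor.py may differ.
from gtts import gTTS
import speech_recognition as sr

def text_to_speech(text: str, out_path: str = "reply.mp3") -> str:
    # gTTS synthesizes speech and writes it out as an MP3 file.
    gTTS(text=text, lang="en").save(out_path)
    return out_path

def speech_to_text(wav_path: str) -> str:
    # speech_recognition reads a WAV file and sends it to a recognizer backend.
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)
```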
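A hedged sketch of BLIP-based visual question answering, assuming the Hugging Face `transformers` implementation and the `Salesforce/blip-vqa-base` checkpoint; `vqa.py` may load a different model or pipeline.

```python
# Hypothetical BLIP VQA sketch; checkpoint and decoding choices are assumptions.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def answer_question(image_path: str, question: str) -> str:
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    output_ids = model.generate(**inputs)  # greedy decode of the answer tokens
    return processor.decode(output_ids[0], skip_special_tokens=True)
```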
- Clone the repository:

  ```bash
  git clone https://github.com/Ahmed-AI-01/Multimodal-RAG.git
  cd Multimodal-RAG
  ```

- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:
  - Create a `.env` file in the root directory and add your Pinecone API key (a loading sketch follows these steps):

    ```
    PINECONE_API_KEY=your_pinecone_api_key
    ```
- Run the Streamlit application:

  ```bash
  streamlit run app.py
  ```

- Open your web browser and navigate to `http://localhost:8501`.

- Interact with the chat application by uploading PDFs, images, or audio files and typing your questions.
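
A minimal sketch of how the `PINECONE_API_KEY` from `.env` might be loaded, assuming `python-dotenv` and the current `pinecone` client; `utils.py` and `vectorstore.py` may wrap this differently.

```python
# Hypothetical environment setup; only PINECONE_API_KEY comes from the README.
import os
from dotenv import load_dotenv
from pinecone import Pinecone

load_dotenv()  # reads .env in the working directory into os.environ
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
print(pc.list_indexes().names())  # sanity check: list available indexes
```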
- `app.py`: Main application file for Streamlit.
- `audio_processor.py`: Handles audio processing for speech-to-text and text-to-speech.
- `llama_cpp_chains.py`: Implements llama.cpp-based language model chains.
- `ollama_chain.py`: Implements Ollama-based language model chains and RAG chains (see the sketch after this list).
- `pdf_handler.py`: Handles PDF loading and splitting (see the sketch after this list).
- `utils.py`: Utility functions, including configuration loading.
- `vectorstore.py`: Manages vector database setup and indexing.
- `vqa.py`: Handles visual question answering and audio transcription.
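
A hedged sketch of the PDF loading and splitting step described for `pdf_handler.py`, assuming LangChain's community loader and text splitter; the chunk size and class choices are assumptions, not the repository's actual settings.

```python
# Hypothetical PDF-to-chunks helper; pdf_handler.py may use different tooling.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def load_and_split(pdf_path: str):
    pages = PyPDFLoader(pdf_path).load()  # one Document per PDF page
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return splitter.split_documents(pages)  # overlapping chunks ready for indexing
```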
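A rough sketch of the retrieve-then-generate flow suggested by `ollama_chain.py`, using the `ollama` Python package; the model name, prompt template, and retriever interface are all assumptions.

```python
# Hypothetical RAG query flow; the repository's actual chain may differ.
import ollama

def rag_answer(question: str, retriever, model: str = "llama3") -> str:
    # retriever is assumed to return the top-k chunks from the vector store.
    context = "\n\n".join(doc.page_content for doc in retriever(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```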
This project is licensed under the Apache License 2.0. See the LICENSE file for details.