Multimodal-RAG

An AI-powered chat application that accepts text, audio, and images and returns context-aware responses. It combines language models with a vector database (Pinecone) to support retrieval-augmented generation (RAG).

Features

Text to Speech: Convert text responses to speech using gTTS.
Speech to Text: Process and transcribe audio files using speech_recognition and Wav2Vec2.
Visual Question Answering: Answer questions based on uploaded images using BLIP.
PDF Knowledge Base: Upload PDF files to enhance the knowledge base for more accurate responses.
Context-Aware Responses: Use conversation history to provide more relevant answers.
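
As a rough illustration of the context-aware behavior listed above, the sketch below folds recent conversation turns into a single prompt. The function name `build_prompt`, the `max_turns` parameter, and the prompt layout are assumptions made for illustration; they are not taken from this repository.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str, max_turns: int = 3) -> str:
    """Combine the most recent (user, assistant) turns with a new question."""
    # Keep only the last `max_turns` exchanges so the prompt stays short.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for user_msg, assistant_msg in recent:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {assistant_msg}")
    # End with the new question and an open assistant turn for the model to complete.
    lines.append(f"User: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)
```

Capping the number of turns is one simple way to keep the prompt within a model's context window; the actual chains in this project may manage history differently.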

Installation

Clone the repository:

git clone https://github.com/Ahmed-AI-01/Multimodal-RAG.git
cd Multimodal-RAG

Create a virtual environment and activate it:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required dependencies:
```
pip install -r requirements.txt
```
Set up environment variables:
- Create a .env file in the root directory and add your Pinecone API key:
```
PINECONE_API_KEY=your_pinecone_api_key
```
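
The application has to read this key at startup. How the repository loads it is not shown here (many projects use python-dotenv for this); the sketch below is a minimal standard-library alternative. `load_env_file` is a hypothetical helper, not part of this codebase.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        # setdefault keeps a value that was already exported in the shell.
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()`, the key would be available as `os.environ["PINECONE_API_KEY"]`.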

Usage

Run the Streamlit application:
```
streamlit run app.py
```
Open your web browser and navigate to http://localhost:8501.
Interact with the chat application by uploading PDFs, images, or audio files and typing your questions.
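
One way to picture how uploads could be dispatched: the sketch below maps a file extension to the kind of processing described under Features. The extension groups and the `route_upload` name are assumptions for illustration; the actual logic in app.py may differ.

```python
from pathlib import Path

# Hypothetical extension groups; adjust to what the app actually accepts.
HANDLERS = {
    "pdf": {".pdf"},
    "image": {".jpg", ".jpeg", ".png"},
    "audio": {".wav", ".mp3"},
}


def route_upload(filename: str) -> str:
    """Return which kind of handler ("pdf", "image", "audio") fits a file name."""
    suffix = Path(filename).suffix.lower()
    for kind, extensions in HANDLERS.items():
        if suffix in extensions:
            return kind
    raise ValueError(f"Unsupported file type: {suffix or filename}")
```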

Project Structure

app.py: Main application file for Streamlit.
audio_processor.py: Handles audio processing for speech-to-text and text-to-speech.
llama_cpp_chains.py: Implements Llama-based language model chains.
ollama_chain.py: Implements Ollama-based language model chains and RAG chains.
pdf_handler.py: Handles PDF loading and splitting.
utils.py: Utility functions, including configuration loading.
vectorstore.py: Manages vector database setup and indexing.
vqa.py: Handles visual question answering and audio transcription.
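
To illustrate the splitting step that pdf_handler.py is described as performing, here is a minimal character-based splitter with overlap. It is a generic sketch: the chunk size, the overlap, and the function name `split_text` are assumptions, and the repository may rely on a library splitter instead.

```python
from typing import List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlapping chunks reduce the chance that a sentence relevant to a query is cut in half at a chunk boundary before the chunks are embedded and indexed.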

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
