This project demonstrates a scalable Retrieval-Augmented Generation (RAG) system designed for medical question answering using a combination of a vector database and a local Large Language Model (LLM). It showcases modern machine learning and software engineering practices with the potential to integrate cloud-based solutions such as Azure Cognitive Search for production-ready applications.
The system retrieves relevant research findings from the PubMedQA dataset, embeds the content using Sentence Transformers, and stores the embeddings in Qdrant, a high-performance vector database. Queries are answered by a local LLM (e.g., Llama), augmented with the retrieved context.
- Context-Enhanced Responses: Combines vector-based search with LLMs for accurate and context-aware answers.
- Modular Design: Supports local deployment with Qdrant and extensibility for cloud integration with Azure Cognitive Search.
- FastAPI Framework: Provides an intuitive and scalable API interface.
- Dockerized Environment: Simplifies deployment with separate configurations for development and production.
- Data Preparation:
  - The PubMedQA dataset is loaded, and relevant entries are embedded using Sentence Transformers (`all-MiniLM-L6-v2`); see the sketch after this list.
- Vector Storage:
  - Qdrant stores embeddings for efficient vector-based retrieval.
- Query Pipeline:
  - User queries are vectorized and matched with relevant embeddings in Qdrant.
  - Retrieved context is passed to the local LLM (e.g., Llama) for an enriched response.
- API Interaction:
  - Exposes endpoints for submitting queries (`/ask`) via FastAPI.
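The first three steps can be sketched end to end in a few lines. This is a minimal sketch, assuming the `pqa_labeled` config of the dataset, a `pubmedqa` collection name, and a `text` payload field (all illustrative; the project's actual logic lives in `webapp/embeddings.py` and `webapp/vectorstore.py`):

```python
# Illustrative pipeline sketch; the collection and payload field names are
# assumptions, not necessarily what webapp/ uses.
from datasets import load_dataset
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
client = QdrantClient(host="localhost", port=6333)

# 1. Data preparation: load PubMedQA and embed each entry's context passages.
dataset = load_dataset("pubmed_qa", "pqa_labeled", split="train")
texts = [" ".join(row["context"]["contexts"]) for row in dataset]
vectors = model.encode(texts, show_progress_bar=True)

# 2. Vector storage: size the collection to the model's embedding dimension.
client.recreate_collection(
    collection_name="pubmedqa",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="pubmedqa",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": txt})
        for i, (vec, txt) in enumerate(zip(vectors, texts))
    ],
)

# 3. Query pipeline: vectorize the question and retrieve the closest contexts.
hits = client.search(
    collection_name="pubmedqa",
    query_vector=model.encode("What are the outcomes of hiatal hernia repair?").tolist(),
    limit=3,
)
retrieved_context = "\n".join(hit.payload["text"] for hit in hits)
```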
- Language & Framework:
  - Python 3.10
  - FastAPI
- Machine Learning:
  - Hugging Face Datasets (PubMedQA)
  - Sentence Transformers
- Vector Database:
  - Qdrant
- Local LLM:
  - Llama (via LlamaFile)
- DevOps & Deployment:
  - Docker, Docker Compose
  - Multi-stage Dockerfile for optimized builds
- Cloud (Optional):
  - Azure Cognitive Search (ready for integration)
- Docker and Docker Compose (v2+)
- Python 3.10+ (optional, for local testing)
- GPU (optional, for accelerated embeddings)
Clone the repository and install the Python dependencies:

```bash
git clone https://github.com/tanle8/MediRAG.git
cd MediRAG
pip install -r requirements.txt
```
Build the Docker image with all dependencies:
```bash
docker compose up --build
```
Use the development configuration to enable live code reloading:
```bash
make dev-build
```
Access the FastAPI interface at http://localhost:80/docs.
For optimized builds and deployment:
```bash
make prod-build
```
- `GET /`: Redirects to the Swagger UI at `/docs`.
- `POST /ask`: Submit a query and receive an enriched response.

Example request:
```bash
curl -X POST "http://localhost:80/ask" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the long-term outcomes of laparoscopic surgery for hiatal hernia repair?"}'
```
Example response:

```json
{
  "response": "Laparoscopic surgery for hiatal hernia repair has shown positive long-term outcomes, with reduced recurrence rates compared to traditional open surgeries."
}
```
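Server-side, the `/ask` handler boils down to embed, search, generate. Below is a self-contained sketch under the same assumptions as the pipeline sketch above (the `pubmedqa` collection, and llamafile's OpenAI-compatible `/v1/chat/completions` endpoint); the real route lives in `webapp/main.py`, with helpers factored into `vectorstore.py` and `llm.py`:

```python
# Hypothetical, condensed version of the /ask route.
import os

import requests
from fastapi import FastAPI
from pydantic import BaseModel
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(host="localhost", port=6333)
LLM_URL = os.getenv("LOCAL_LLM_SERVICE_URL", "http://host.docker.internal:8080/")

class AskRequest(BaseModel):
    query: str

@app.post("/ask")
def ask(request: AskRequest):
    # 1. Vectorize the query and retrieve the closest PubMedQA contexts.
    hits = qdrant.search(
        collection_name="pubmedqa",
        query_vector=model.encode(request.query).tolist(),
        limit=3,
    )
    context = "\n".join(hit.payload["text"] for hit in hits)

    # 2. Ask the local LLM, grounding it in the retrieved context. llamafile
    #    exposes an OpenAI-compatible chat endpoint; adjust if yours differs.
    reply = requests.post(
        f"{LLM_URL.rstrip('/')}/v1/chat/completions",
        json={
            "model": "local",
            "messages": [
                {"role": "system", "content": f"Answer using this context:\n{context}"},
                {"role": "user", "content": request.query},
            ],
        },
        timeout=120,
    )
    answer = reply.json()["choices"][0]["message"]["content"]
    return {"response": answer}
```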
Configure via `.env`:
```env
LLM_TYPE=local
LOCAL_LLM_SERVICE_URL=http://host.docker.internal:8080/
AZURE_API_KEY=your-api-key  # For optional Azure integration
```
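At startup the app can read these variables and pick an LLM backend accordingly; here is a minimal sketch of that switch, assuming `LLM_TYPE` is the selector (as the `.env` above suggests; the exact logic in `webapp/llm.py` may differ):

```python
# Sketch of reading the .env-driven settings at startup.
import os

LLM_TYPE = os.getenv("LLM_TYPE", "local")
LOCAL_LLM_SERVICE_URL = os.getenv("LOCAL_LLM_SERVICE_URL", "http://host.docker.internal:8080/")
AZURE_API_KEY = os.getenv("AZURE_API_KEY")  # only needed for non-local backends

if LLM_TYPE == "local":
    base_url = LOCAL_LLM_SERVICE_URL
elif not AZURE_API_KEY:
    raise RuntimeError("AZURE_API_KEY must be set for non-local LLM backends")
```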
```
.
├── webapp
│   ├── main.py                 # FastAPI app
│   ├── vectorstore.py          # Qdrant operations
│   ├── embeddings.py           # Embedding generation
│   ├── llm.py                  # Interaction with local LLM
│   └── __init__.py
├── requirements.txt            # Python dependencies
├── Dockerfile                  # Multi-stage Docker build
├── docker-compose.dev.yml      # Docker Compose for development
├── docker-compose.prod.yml     # Docker Compose for production
├── Makefile                    # Automation for builds and deployment
└── README.md                   # Project documentation
```
- Cloud Integration:
  - Azure Cognitive Search for scalable vector-based retrieval.
  - Azure OpenAI for hosted GPT models.
- Advanced Indexing:
  - Experiment with hybrid search (e.g., dense + sparse retrieval); see the fusion sketch after this list.
- Scalability:
  - Kubernetes support for deploying across clusters.
- Improved UI:
  - Build a web-based frontend for user-friendly interactions.
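For the hybrid-search item, a common fusion recipe is reciprocal rank fusion (RRF), which merges the dense and sparse result lists by rank alone; this is a backend-agnostic sketch, not tied to Qdrant's API:

```python
# Reciprocal rank fusion: merge dense and sparse rankings of document ids.
# k=60 is the conventional damping constant from the original RRF paper.
def rrf(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: doc "b" ranks high in both lists, so it wins the fused ranking.
print(rrf(["a", "b", "c"], ["b", "d", "a"]))  # ['b', 'a', 'd', 'c']
```

RRF needs no score normalization across the two retrievers, which makes it a simple first step toward hybrid search.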
We welcome contributions from the community! Please follow these steps:
- Fork the repository.
- Create a new branch (`feature-xyz`).
- Commit your changes.
- Open a pull request.
- Hugging Face for the PubMedQA dataset.
- Qdrant for the vector database.
- Sentence Transformers for efficient embedding generation.
- Uvicorn & FastAPI for the API framework.
This project is licensed under the MIT License. See LICENSE for more details.
For questions, feedback, or collaborations, please reach out:
- Email: [email protected]
- LinkedIn: Tan (David) LE
- GitHub: @tanle8