Custom RAG Chatbot

This project implements a custom Retrieval-Augmented Generation (RAG) chatbot using FastAPI, OpenAI, and Qdrant, with enhanced capabilities to extract and process data from both web URLs and PDF documents.

What is RAG?

Retrieval-Augmented Generation (RAG) is a hybrid approach that combines retrieval with a generative model to produce more accurate, contextually relevant answers. In this chatbot, the RAG pipeline first retrieves relevant passages from a knowledge base (built from web URLs and PDFs), then passes them to OpenAI's GPT model to generate a natural-language response.
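
The retrieve-then-generate flow described above can be sketched in plain Python. This is a toy illustration, not the code in main.py: the three-dimensional vectors are hand-written stand-ins for OpenAI embeddings, the in-memory list stands in for Qdrant, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, knowledge_base, top_k=2):
    """Return the texts of the top_k entries most similar to the query."""
    ranked = sorted(
        knowledge_base,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]


def build_prompt(question, passages):
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy knowledge base: made-up vectors, assumed for illustration only.
knowledge_base = [
    {"text": "Qdrant stores vector embeddings.", "vector": [1.0, 0.0, 0.0]},
    {"text": "PyMuPDF extracts text from PDFs.", "vector": [0.0, 1.0, 0.0]},
    {"text": "FastAPI serves the HTTP endpoints.", "vector": [0.0, 0.0, 1.0]},
]

query_vector = [0.9, 0.1, 0.0]  # pretend embedding of the question
passages = retrieve(query_vector, knowledge_base, top_k=1)
prompt = build_prompt("Where are embeddings stored?", passages)
print(prompt)
```

In the real pipeline the prompt would then be sent to the GPT model; that call is left out here so the sketch runs without an API key.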

Features

Web Scraping and Text Processing: Extracts meaningful text data from web pages for use in the chatbot's knowledge base.
PDF Text Extraction: Supports extracting text from PDF documents to enrich the chatbot's knowledge base.
Vector Embedding Generation: Uses OpenAI's text embeddings to convert extracted text into vector representations for efficient storage and retrieval.
Vector Storage and Retrieval: Stores and retrieves vector embeddings using Qdrant, allowing the chatbot to find the most relevant information quickly.
Enhanced Question Answering: Combines retrieved data with OpenAI's GPT model to generate contextually relevant answers that are grounded in the imported information.
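
Before text can be embedded, long pages and PDFs are usually split into smaller chunks. The following is a minimal sketch of fixed-size chunking with overlap, assuming character-based windows; the function name `chunk_text` and its default sizes are hypothetical, and the project itself may rely on a Langchain text splitter instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters so that a sentence
    cut at a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(example_chunks)
```

Each chunk would then be embedded and stored as one point in the vector store, which keeps retrieved passages short enough to fit in the model's prompt.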

How Importing URLs and PDFs Helps

Expands Knowledge Base: By allowing the import of both web URLs and PDFs, the chatbot can build a rich, diverse, and up-to-date knowledge base.
Improves Accuracy: More relevant data leads to better retrieval, which in turn helps the generative model produce more precise and informative answers.
Supports Diverse Data Sources: The ability to process information from both online and offline sources makes the chatbot adaptable to different use cases, including research, customer support, and more.

Setup

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```
Set up environment variables:
- OPENAI_API_KEY
- QDRANT_URL
- QDRANT_API_KEY
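
A quick way to confirm that the three variables are set before starting the server is a small check like the one below. It is a sketch only: the helper name `missing_env_vars` is hypothetical and is not part of main.py.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```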

Usage

Start the server: uvicorn main:app --reload
Process a URL: Send a POST request to /process_url/ with a JSON body containing the URL.
Process a PDF: Send a POST request to /process_pdf/ with the PDF file uploaded as a form field.
Ask a question: Send a POST request to /ask_question/ with a JSON body containing the question.
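
The JSON requests above might be built as follows using only the standard library. The field names `url` and `question` are assumptions, since this README does not specify the request schema; check the Pydantic models in main.py for the actual names. The base URL assumes uvicorn's default host and port. The PDF upload is omitted because it requires a multipart form body.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def json_post(path, payload):
    """Build (but do not send) a POST request with a JSON body."""
    data = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names "url" and "question" are assumed, not confirmed by the source.
url_request = json_post("/process_url/", {"url": "https://example.com/article"})
question_request = json_post(
    "/ask_question/", {"question": "What is the article about?"}
)

print(question_request.get_method(), question_request.full_url)
```

With the server running, each request can be sent with `urllib.request.urlopen(question_request)` and the response body read from the returned object.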

API Endpoints

GET /: Welcome message
POST /process_url/: Process and store data from a given URL
POST /process_pdf/: Process and store data from a PDF file
POST /ask_question/: Ask a question and get an AI-generated answer

Dependencies

FastAPI
Pydantic
Requests
BeautifulSoup
PyMuPDF
LangChain
OpenAI
Qdrant

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
