PDFChatAI is a tool for interacting with PDF documents using AI. It extracts text from PDF files, generates embeddings for that text, and answers context-aware queries through a web interface, running tasks asynchronously in the background.
GitHub Repository: https://github.com/PierrunoYT/PDF-Chat-AI
Features:
- Extract text from single or multiple PDF files
- Clean and preprocess extracted text
- Generate embeddings for text chunks using OpenAI's models or local models
- Store extracted text, metadata, and embeddings in a SQLite database
- Use FAISS for efficient similarity search
- Perform context-aware querying with conversation history
- Web interface for uploading PDFs, indexing, and querying
- Asynchronous task processing using Python's threading module
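The retrieval steps in the feature list (cleaning, chunking, embedding, similarity search) can be sketched with the standard library alone. This is a hypothetical illustration, not code from the repository: `clean_text`, `chunk_text`, `embed`, `cosine`, and `top_k` are made-up names, the default chunk size and overlap are arbitrary, and a sparse bag-of-words vector stands in for sentence-transformers or OpenAI embeddings. `top_k` performs exact brute-force search, the same kind of search a flat FAISS index does, only without FAISS's speed.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def clean_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks, then collapse all whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words, each overlapping the previous one by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy sparse bag-of-words vector (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Exact brute-force search: score every chunk against the query and keep the best k."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


doc = clean_text("FAISS performs simi-\nlarity search.\n\nSQLite stores   metadata.")
chunks = chunk_text(doc, size=4, overlap=1)
print(top_k("Where is metadata stored?", chunks, k=1))
# [(0.25, 'search. SQLite stores metadata.')]
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk; the real project may chunk by sentences (it lists NLTK as a dependency) rather than by word count.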
Requirements:
- Python 3.7+
- Flask
- PyPDF2
- NLTK
- sentence-transformers
- FAISS
- OpenAI API (optional)
Installation:
- Clone this repository:
git clone https://github.com/PierrunoYT/PDF-Chat-AI.git
cd PDF-Chat-AI
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required packages:
pip install -r requirements.txt
- Set up environment variables: create a .env file in the project root and add the necessary variables, such as API keys for OpenAI or OpenRouter if you use those services.
- Run the Flask application:
python app.py
- Open a web browser and navigate to http://localhost:5000 to access the web interface.
- Use the web interface to:
  - Upload PDF files
  - Index PDF files
  - Perform context-aware queries
  - View conversation history
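Indexing is the kind of long-running job that the feature list says runs asynchronously with Python's threading module, so the web page stays responsive while PDFs are processed. Below is a minimal sketch of that pattern. `submit`, `status`, and `wait` are hypothetical helpers, not functions from the repository, and keeping task state in an in-memory dictionary is an assumption (it is lost when the app restarts).

```python
import threading
import uuid
from typing import Callable, Dict, Optional

_status: Dict[str, str] = {}                # task id -> "pending" | "running" | "done" | "failed: ..."
_threads: Dict[str, threading.Thread] = {}  # task id -> worker thread
_lock = threading.Lock()                    # guards both dicts, which several threads touch


def _set_status(task_id: str, value: str) -> None:
    with _lock:
        _status[task_id] = value


def submit(job: Callable[[], None]) -> str:
    """Start `job` in a background thread and return an id the web page can poll."""
    task_id = uuid.uuid4().hex
    _set_status(task_id, "pending")

    def runner() -> None:
        _set_status(task_id, "running")
        try:
            job()
        except Exception as exc:  # record the failure instead of losing it inside the thread
            _set_status(task_id, f"failed: {exc}")
        else:
            _set_status(task_id, "done")

    thread = threading.Thread(target=runner, daemon=True)
    with _lock:
        _threads[task_id] = thread
    thread.start()
    return task_id


def status(task_id: str) -> str:
    """Return the task's current state, or "unknown" for an id that was never submitted."""
    with _lock:
        return _status.get(task_id, "unknown")


def wait(task_id: str, timeout: Optional[float] = None) -> None:
    """Block until the task's thread finishes (mainly useful in scripts and tests)."""
    with _lock:
        thread = _threads[task_id]
    thread.join(timeout)


task_id = submit(lambda: None)  # in practice, a function that indexes the uploaded PDFs
wait(task_id)
print(status(task_id))  # done
```

A request handler would call `submit` and return the id immediately, and the page would poll `status` until the task reports `done` or `failed`.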
Project structure:
- app.py: Flask application for the web interface
- indexing_pipeline.py: Main pipeline for processing and indexing PDFs
- pdf_processor.py: Functions for extracting text from PDFs
- database_manager.py: Manages the SQLite database
- embedding_model.py: Handles embedding generation
- faiss_manager.py: Manages the FAISS index for similarity search
- query_processor.py: Processes and expands queries
- prompt_engineer.py: Generates prompts for context-aware responses
- openrouter_client.py: Client for interacting with the OpenRouter API
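To show how a database layer and a prompt builder could fit together, here is a hypothetical sketch in the spirit of database_manager.py and prompt_engineer.py. Chunks, their source metadata, and JSON-encoded embeddings go into SQLite; rows are fetched by the ids a similarity search returns; and a prompt is assembled from those chunks plus the last few conversation turns. The table layout, function names, and prompt wording are all assumptions, not taken from the repository.

```python
import json
import sqlite3
from typing import List, Sequence, Tuple

Chunk = Tuple[str, int, str]  # (source file, page number, text)


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               id        INTEGER PRIMARY KEY,
               source    TEXT NOT NULL,
               page      INTEGER,
               text      TEXT NOT NULL,
               embedding TEXT NOT NULL  -- JSON-encoded list of floats
           )"""
    )


def add_chunk(conn: sqlite3.Connection, source: str, page: int, text: str,
              embedding: List[float]) -> int:
    """Insert one chunk and return its row id (which could double as the vector's id in a FAISS index)."""
    cur = conn.execute(
        "INSERT INTO chunks (source, page, text, embedding) VALUES (?, ?, ?, ?)",
        (source, page, text, json.dumps(embedding)),
    )
    conn.commit()
    return cur.lastrowid


def get_chunks(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Chunk]:
    """Fetch chunks by row id, preserving the order of `ids` (e.g. ranked search hits)."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, source, page, text FROM chunks WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    by_id = {row[0]: tuple(row[1:]) for row in rows}
    return [by_id[i] for i in ids if i in by_id]


def build_prompt(question: str, context: Sequence[Chunk],
                 history: Sequence[Tuple[str, str]], max_turns: int = 3) -> str:
    """Combine retrieved chunks, the last `max_turns` (user, assistant) exchanges, and the new question."""
    lines = ["Answer the question using the context below.", "", "Context:"]
    lines += [f"[{source} p.{page}] {text}" for source, page, text in context]
    if history and max_turns > 0:
        lines += ["", "Conversation so far:"]
        for user_msg, assistant_msg in list(history)[-max_turns:]:
            lines += [f"User: {user_msg}", f"Assistant: {assistant_msg}"]
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
create_schema(conn)
row_id = add_chunk(conn, "manual.pdf", 3, "The warranty lasts two years.", [0.12, -0.08])
print(build_prompt("How long is the warranty?", get_chunks(conn, [row_id]),
                   [("Hi", "Hello! Ask me about your PDFs.")]))
```

Capping the history at a few turns keeps the prompt within the model's context window while still letting follow-up questions refer to earlier answers.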
PDFChatAI is licensed under the MIT License - see the LICENSE file for details.