RAG-QABot

This repository contains a system designed to process financial documents (e.g., P&L statements) and answer queries using a Retrieval-Augmented Generation (RAG) model. The project is divided into two main parts:

Part 1: RAG Model-based QA Bot:
- Located in the RAGModel folder.
- Implements a Retrieval-Augmented Generation pipeline in RAGModel-QABot.ipynb.
Part 2: Interactive QA Bot Interface:
- Located in the QA-Bot folder.
- Provides an interactive user interface using Streamlit for uploading financial PDFs and querying the extracted data.
Example Queries
- Located in Example-Demonstarting-Images folder
- It includes demonstration of QA bot along with the examples of financial queries.

Part 1: RAG Model QA Bot

Approach
The financial data is first loaded and preprocessed to extract meaningful information, such as tabular data from P&L statements. This preprocessing ensures that the data is structured and ready for embedding. Using a transformer-based Sentence Transformer model, the data is encoded into dense vector representations, capturing the semantic meaning of the financial content. These embeddings are then indexed in Pinecone, a high-performance vector database, allowing efficient storage and retrieval. Queries entered by users are similarly encoded into vectors and matched against the indexed data in Pinecone. The most relevant results are retrieved based on similarity scores, providing the context needed for generating accurate answers to user queries.
Usage Instructions
- Open RAGModel-QABot.ipynb in google collab.
- Follow the steps to:
  - Upload the pdf file containing P&L data
  - Run all the parts.
  - Test the model by running example queries.

Part 2: Interactive QA Bot Interface

Approach
- Frontend:
  Built with Streamlit for uploading financial PDFs and querying.
  Displays the extracted P&L data in a table format.
- Backend:
  Extracts data from uploaded PDFs (e.g., P&L tables) using tabula and pdfplumber.
  Embeds data using Sentence Transformers and stores it in Pinecone.
  Queries Pinecone and retrieves the most relevant rows from the document.
- Deployment:
  Deployed locally using Streamlit and exposed via localtunnel.
Usage Instructions
- Open streamlit_run.ipynb in google collab.
- Upload frontend_qabot.py and backend_qabot.py
- Install the libraries
- Run the command - !wget -q -O - ipv4.icanhazip.com
- copy the IP address
- Run the command - !streamlit run frontend_qabot.py & npx localtunnel --port 8501
- Click on 'your url'
- Enter the copied IP address
- Hence you will get the Interface of the QA BOT.
Overall Workflow
- Upload a PDF containing a P&L table.
- Click "Process PDF" to preprocess and index the data in Pinecone.
- Enter financial queries (e.g., "What is the gross profit?") in the input box.
- View the retrieved answers alongside the relevant P&L segments in an interactive table.

Key Features

Scalable Retrieval: Uses Pinecone to handle large datasets efficiently.
Interactive Frontend: User-friendly interface for real-time queries.
Dynamic Metadata Display: Extracted data is presented in a structured tabular format.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
Example-Demonstarting-Images		Example-Demonstarting-Images
QA-Bot		QA-Bot
RAGModel		RAGModel
RAG-QABot-Documentation.pdf		RAG-QABot-Documentation.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAG-QABot

Part 1: RAG Model QA Bot

Part 2: Interactive QA Bot Interface

Key Features

About

Releases

Packages

Languages

bhatt-j/RAG-QABot

Folders and files

Latest commit

History

Repository files navigation

RAG-QABot

Part 1: RAG Model QA Bot

Part 2: Interactive QA Bot Interface

Key Features

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages