Financial Analyst Chatbot is a RAG-based solution that lets users ask questions about financial documents (Excel files in this case). It provides a web-based UI (Streamlit) for user interaction and uses OpenAI to generate embeddings and responses. Pinecone is used to store and search vector embeddings.
- Upload & parse Excel spreadsheets
- Smart document chunking for optimal retrieval
- Embedding generation using OpenAI's text-embedding-3-large
- Vector storage and similarity search using Pinecone
- Chatbot interface built using Streamlit
- Embeddings - OpenAI text-embedding-3-large
- LLM - OpenAI gpt-4o
- Vector Store - Pinecone
- UI - Streamlit
- File Parsing - pandas + OpenPyXL
- Chunking - LangChain
First, the documents are ingested with a parser and split into chunks using RecursiveCharacterTextSplitter from LangChain. The chunks are then embedded with OpenAI's text-embedding-3-large (via OpenAIEmbeddings) and stored in the Pinecone vector database.
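The indexing step can be sketched roughly as below. The chunk size, overlap, and index name are illustrative assumptions (the real values come from config.yaml), and the project's actual code is organised across the modules under src/:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from pinecone import Pinecone

def index_documents(text: str, pinecone_api_key: str) -> None:
    # Split the parsed spreadsheet text into overlapping chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(text)

    # Embed each chunk with OpenAI's text-embedding-3-large
    embedder = OpenAIEmbeddings(model="text-embedding-3-large")
    vectors = embedder.embed_documents(chunks)

    # Upsert (id, vector, original text) triples into Pinecone
    index = Pinecone(api_key=pinecone_api_key).Index("financial-rag-index")
    index.upsert(vectors=[
        {"id": f"chunk-{i}", "values": vec, "metadata": {"text": chunk}}
        for i, (chunk, vec) in enumerate(zip(chunks, vectors))
    ])
```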
When a user asks a question, the chatbot creates a query embedding and retrieves similar chunks using cosine similarity. top_k is set to 8 and the OpenAI GPT temperature to 0.2 to limit randomness and produce reliable answers, since we're dealing with financial documents.
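A condensed sketch of this query path (the prompt wording and helper name are illustrative, not the project's actual code):

```python
from langchain_openai import OpenAIEmbeddings
from openai import OpenAI
from pinecone import Pinecone

def answer_question(question: str, pinecone_api_key: str) -> str:
    # Embed the question with the same model used at indexing time
    query_vec = OpenAIEmbeddings(model="text-embedding-3-large").embed_query(question)

    # Cosine-similarity search in Pinecone, keeping the 8 closest chunks
    index = Pinecone(api_key=pinecone_api_key).Index("financial-rag-index")
    result = index.query(vector=query_vec, top_k=8, include_metadata=True)
    context = "\n\n".join(m.metadata["text"] for m in result.matches)

    # Low temperature (0.2) keeps the model close to the retrieved context
    response = OpenAI().chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```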
financial-rag-chatbot/
│
├── data/RAWDATA/ # Raw Excel files
├── src/
│ ├── ingestion/ # File parsing, chunking
│ ├── embeddings/ # Embedding generation
│ ├── vector_store/ # Pinecone connection and upsert
│ ├── chat/ # RAG: retriever, prompt, LLM
│ ├── ui/ # Streamlit chatbot
│ └── utils/ # Config loader, token counter
├── config.yaml # API keys and config
├── requirements.txt # Dependencies
├── main.py # Indexing pipeline
├── README.md # Documentation
git clone https://github.com/prashantcodelab/financial-rag-chatbot.git
cd financial-rag-chatbot
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
open_ai: sk-<your-openai-key>
pinecone: <your-pinecone-api-key>
pinecone_index: financial-rag-index
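A minimal sketch of how these values might be read (assuming config_loader.py uses PyYAML; the actual loader under src/utils/ may differ):

```python
import yaml

def load_config(path: str = "config.yaml") -> dict:
    # Read API keys and the Pinecone index name from config.yaml
    with open(path, "r") as f:
        return yaml.safe_load(f)

config = load_config()
openai_key = config["open_ai"]
pinecone_key = config["pinecone"]
index_name = config["pinecone_index"]
```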
- A sample Excel file is provided. You can also use your own Excel file; place it in data/RAWDATA/
python main.py
This step parses and chunks the Excel files, embeds the content, and adds the vectors to Pinecone.
streamlit run src/ui/streamlit_app.py
You can ask questions based on your Excel file, e.g. "What is the total expense?", "What are the top 5 vendors by payment amount?"
- PDF & CSV support
- Allow file upload directly from chatbot UI
- Role-based chat access
- Latency improvements - Caching
python main.py
- main.py - Runs the document ingestion pipeline. After the first run, run it again only when new files are added (a rough code sketch follows the dependency diagram below).
- Loads config -> config_loader.py
- Parses file -> file_parser.py
- Document chunking -> chunking.py
- Embed chunks -> embedder.py
- Connects to Pinecone -> pinecone_client.py
- Upserts vectors to Pinecone
main.py
├──→ config_loader.py # loads API keys
├──→ file_parser.py # reads Excel files
├──→ chunking.py # splits into text chunks
├──→ embedder.py # embeds chunks into vectors
└──→ pinecone_client.py # connects and upserts to Pinecone
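Conceptually, main.py is a thin orchestrator over these modules. A hedged sketch of that wiring (the function names below are illustrative assumptions, not the project's actual API):

```python
# Illustrative orchestration only; the real main.py lives at the repo root
from src.utils.config_loader import load_config              # assumed helper name
from src.ingestion.file_parser import parse_excel            # assumed helper name
from src.ingestion.chunking import chunk_text                # assumed helper name
from src.embeddings.embedder import embed_chunks             # assumed helper name
from src.vector_store.pinecone_client import upsert_vectors  # assumed helper name

def run_indexing_pipeline() -> None:
    config = load_config("config.yaml")        # 1. load API keys and index name
    text = parse_excel("data/RAWDATA/")        # 2. read raw Excel files with pandas/OpenPyXL
    chunks = chunk_text(text)                  # 3. split into retrieval-sized chunks
    vectors = embed_chunks(chunks, config)     # 4. embed with text-embedding-3-large
    upsert_vectors(chunks, vectors, config)    # 5. push vectors + metadata to Pinecone

if __name__ == "__main__":
    run_indexing_pipeline()
```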
streamlit run src/ui/streamlit_app.py
- streamlit_app.py - Chatbot UI (see the sketch after the diagram below)
- Loads config - config_loader.py
- Connects to Pinecone - pinecone_client.py
- Query embedding - embedder.py
- Retrieves context - context_retriever.py
- Builds prompt - prompt_builder.py
- LLM invocation - llm_interface.py
- Responses in chatbot UI
streamlit_app.py
├──→ config_loader.py # loads API keys
├──→ pinecone_client.py # connects to Pinecone
├──→ embedder.py # embeds the user's query
├──→ context_retriever.py # finds top-k matching chunks
├──→ prompt_builder.py # builds final LLM prompt
└──→ llm_interface.py # gets response from OpenAI
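For the UI layer itself, a minimal Streamlit chat loop looks roughly like the sketch below. The answer_question helper is an assumption standing in for the retrieval + LLM steps shown earlier; the real streamlit_app.py has more structure:

```python
import streamlit as st

st.title("Financial Analyst Chatbot")

# Keep the conversation across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay previous turns
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

# Read a new question, answer it via the RAG pipeline, and display both turns
if question := st.chat_input("Ask a question about your Excel data"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.write(question)

    reply = answer_question(question)  # assumed helper wrapping retrieval + GPT call
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```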
Source for Sample Data File - https://www.nasdaq.com/market-activity/stocks/saia/sec-filings