Vedant Raikar RAG using ai Planet GenAI stack #12

Open
wants to merge 1 commit into
base: main


vedantRaikar

Nodes:

PyPDFLoader (id: PyPDFLoader-SLNUq): This node loads a PDF document using the pypdf library.
RecursiveCharacterTextSplitter (id: RecursiveCharacterTextSplitter-Nzmv0): This node splits text into chunks of a specified length.
Chroma (id: Chroma-ihv1X): This node represents a Chroma vector store. It includes configuration options for the collection name, persistence, and embedding.
RetrievalQA (id: RetrievalQA-GwM0k): This node implements a question-answering chain against the Chroma vector store.
CombineDocsChain (id: CombineDocsChain-XGMGK): This node combines the retrieved document chunks into a single input for the language model.
ConversationBufferMemory (id: ConversationBufferMemory-A4PN1): This node stores conversation history for a chatbot.
HuggingFaceHub (id: HuggingFaceHub-E4Iou): This node interacts with a Hugging Face Hub model.

@vedantRaikar
Author

Project Components
The workflow consists of several key nodes, each playing a specific role:

PyPDFLoader: This node uses the pypdf library to load a PDF document and extract its text content for downstream processing.

RecursiveCharacterTextSplitter (id: RecursiveCharacterTextSplitter-Nzmv0): This node splits the input text into smaller chunks of up to a configured character length. Smaller chunks keep each piece of a large document within the size limits of the embedding and language models.
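To illustrate the idea of recursive splitting, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `recursive_split` and the separator order are assumptions made for illustration, and it leaves out features such as chunk overlap. It tries coarse separators first and falls back to finer ones only when a piece is still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Toy recursive splitter (hypothetical helper, not the LangChain class)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # Separator not present in the text: try the next, finer one.
        return recursive_split(text, chunk_size, rest)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits: keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta gamma\n\ndelta epsilon\n\nzeta"
chunks = recursive_split(sample, chunk_size=16)
```

With this sample, paragraph breaks are enough to keep every chunk within 16 characters, so the finer separators are never needed.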

Chroma: This node represents a Chroma vector store. Chroma is an open-source vector database for storing and retrieving dense vector representations (embeddings) of data. The configuration options within this node specify the collection name, persistence settings, and embedding configuration.
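As a rough illustration of what a vector store does, the sketch below keeps texts with their vectors in memory and ranks them by cosine similarity. Everything in it is hypothetical: `ToyVectorStore` is not Chroma's API, and the bag-of-words `embed` function only stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory illustration only; a real store adds persistence and indexing."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add_texts(self, texts):
        for text in texts:
            self._items.append((text, embed(text)))

    def similarity_search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "cats purr when happy",
    "dogs bark at strangers",
    "stock prices fell today",
])
top = store.similarity_search("why do cats purr", k=1)
```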

RetrievalQA: This node answers questions by retrieving the chunks most relevant to the user's question from the Chroma vector store and passing them, together with the question, to the language model.

CombineDocsChain: This node combines multiple documents into a single input. In a retrieval workflow like this one, that typically means merging the retrieved chunks into one prompt for the language model. The exact combining strategy configured here is not stated.
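One simple combining strategy is to place every retrieved chunk into a single prompt (often called "stuffing"). The sketch below shows that idea only; the function name `stuff_documents` and the prompt wording are assumptions, and the strategy actually configured in this flow is not stated.

```python
def stuff_documents(question, documents):
    """Join all retrieved chunks into one prompt (hypothetical template)."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = stuff_documents(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Items must be unused."],
)
```

This approach is easy to reason about, but the combined prompt must still fit within the model's context window, which is one reason the text is split into chunks first.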

ConversationBufferMemory: This node acts as a memory buffer for chatbot interactions. It stores the conversation history so that the model can take previous questions and answers into account when responding to a new one.
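A buffer memory can be pictured as an append-only list of turns that is rendered back into text for the next prompt. The class below is a minimal sketch of that idea; `ToyBufferMemory` and its method names are hypothetical and are not the node's actual interface.

```python
class ToyBufferMemory:
    """Keeps every turn verbatim and renders them as a transcript."""

    def __init__(self):
        self._turns = []  # list of (user_input, ai_output) pairs

    def save_context(self, user_input, ai_output):
        self._turns.append((user_input, ai_output))

    def load_history(self):
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self._turns)


memory = ToyBufferMemory()
memory.save_context("What is in the PDF?", "A product manual.")
memory.save_context("How long is it?", "About 40 pages.")
history = memory.load_history()
```

Because nothing is ever trimmed, the transcript grows with every turn, so a long conversation can eventually exceed the model's input limit.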

HuggingFaceHub: This node interacts with a model hosted on Hugging Face Hub, a platform for sharing and accessing pre-trained machine learning models. The specific model is not named here; in this workflow it serves as the language model that generates the answer.

Workflow Execution
While the specific connections between these nodes are not explicitly provided, we can infer the general workflow:

Document Processing: The PyPDFLoader ingests a PDF document and extracts its text content.
Text Preprocessing: The RecursiveCharacterTextSplitter might further process the extracted text by splitting it into smaller chunks.
Document Embedding: The text chunks are converted into vector representations using the embedding configured on the Chroma node.
Chroma Storage: The generated vector representations are stored within the Chroma collection.
Question Analysis: When a user asks a question, the RetrievalQA node passes it to the retriever, which embeds the question in the same vector space as the stored chunks.
Information Retrieval: The RetrievalQA node retrieves from the Chroma store the chunks whose vectors are most similar to the question.
Answer Generation: The Hugging Face Hub model formulates a response to the user's query from the retrieved chunks and, potentially, the conversation history stored in the ConversationBufferMemory.
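The inferred workflow can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the flow's actual wiring: `score`, `retrieve`, `answer`, and `first_context_line_llm` are hypothetical names, word overlap stands in for vector similarity, and the stub function stands in for the hosted model.

```python
def score(query, chunk):
    """Hypothetical relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def retrieve(query, chunks, k=2):
    """Return the k chunks that share the most words with the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]


def answer(question, chunks, llm, history=""):
    """Retrieve context, build a prompt, and hand it to the model callable."""
    context = "\n".join(retrieve(question, chunks))
    prompt = f"{history}\nContext:\n{context}\nQuestion: {question}\nAnswer:"
    return llm(prompt)


def first_context_line_llm(prompt):
    """Stub in place of the hosted model: echoes the top-ranked chunk."""
    lines = prompt.splitlines()
    return lines[lines.index("Context:") + 1]


chunks = [
    "The warranty lasts two years.",
    "Shipping takes five days.",
    "Returns are free.",
]
reply = answer("How long does the warranty last?", chunks, first_context_line_llm)
```

Passing the model in as a plain callable keeps the retrieval and prompt-building steps testable without network access; a real model client could be swapped in behind the same one-argument interface.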

1 participant