Skip to content

deepjijhuvadia/NexaSync--AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NexaSync AI - A Conversational PDF Chatbot

This Streamlit application, NexaSync AI, empowers you to interact with the content of your PDF files in a natural, conversational way. By leveraging AI technology, it enables you to ask questions and receive comprehensive responses derived from the information within the PDFs you upload.

How it Works

NexaSync AI operates in a multi-step process to facilitate a conversational interaction with your PDFs:

1. PDF Processing and Vectorization:

  • When you upload one or more PDFs using the file uploader, the application goes through these steps:
    • Reading PDFs: It utilizes the PyPDF2 library to extract text content from each uploaded PDF file.
    • Text Chunking: The extracted text is then broken down into manageable chunks using the langchain.text_splitter.RecursiveCharacterTextSplitter. This helps optimize the processing and retrieval process.
    • Vectorization: Each text chunk is converted into a numerical representation (vector) using the SpacyEmbeddings class from the langchain_community.embeddings module. This vector captures the semantic meaning of the text chunk.
    • FAISS Database Creation: Finally, these vectors are stored in a FAISS (Fast Approximate Nearest Neighbor Search) database using the FAISS.from_texts function. This database serves as an efficient index for retrieving relevant information later.

2. User Interaction and Conversational Response Generation:

  • Once processing is complete, a text input field appears where you can ask your questions.
  • When you type a question, the following happens:
    • Retrieval: The application uses the FAISS database to find the text chunks within the uploaded PDFs that are most semantically similar to the words and phrases in your question. This retrieval is facilitated by the invoke method within FAISS.
    • Context Building: The retrieved text chunks are then combined to create a contextual understanding of the information relevant to your question.
    • AI21 API Interaction: NexaSync AI utilizes the AI21 Studios API to generate a response to your question. It constructs a prompt that incorporates the context built from retrieved text chunks and your question itself. The AI21 J2 Jumbo Instruct model then processes this prompt and generates a human-like response that addresses your query.
    • Response Display: The generated conversational response is presented back to you in the application. I'd be glad to create an improved README.md file incorporating the strengths of both Response A and Response B, addressing their shortcomings, and considering the insights from the ratings:

Installation

  1. Prerequisites:

  2. Create a virtual environment (recommended):

    • This helps isolate project dependencies and avoid conflicts:
      python3 -m venv venv
      source venv/bin/activate  # Windows: venv\Scripts\activate.bat
  3. Install required packages:

    • Activate your virtual environment (if created).
    • Install the necessary libraries using pip:
      pip install streamlit spacy langchain langchain-community faiss-cpu dotenv requests
    • Install spacy en_core_web_sm
      python -m spacy download en_core_web_sm

Usage

  1. Obtain an AI21 API Key:

    • Create an account on the AI21 Studios platform (https://www.ai21.com/studio) and acquire an API key for the J2 Jumbo Instruct model.
    • Replace the placeholder value ("S39YXKUEt3xEbGIOdYnwisHEjfqfcTo0") in the code with your actual API key.
  2. Run the application:

    • Navigate to the project directory in your terminal.

    • Execute the following command to start the Streamlit app:

      streamlit run main.py
    • This will open NexaSync AI in your web browser, typically at http://localhost:8501.

Functionality

  1. Upload PDF Files:

    • In the sidebar, use the file uploader to select one or more PDF documents you want to interact with. Click the "Submit & Process" button to initiate the processing step.

    • Note: The application will create a FAISS database to store vector representations of text chunks extracted from the uploaded PDFs. This database is essential for retrieving relevant information later. The processing step might take some time depending on the size and number of your PDFs.

  2. Ask Questions:

    • Once processing is complete, a text input field will appear in the main area. This is where you can pose your questions about the content within the uploaded PDFs.

    • Example: You could ask something like: "What are the key findings of this research paper?" or "What are the company's financial performance metrics for the past year?"

  3. Get Conversational Responses:

    • As you type your query, NexaSync AI will analyze the PDFs and retrieve the most pertinent sections or passages using the FAISS database.

    • It will then leverage the AI21 API to formulate a response that addresses your question in a conversational and informative manner, drawing upon the retrieved information.

Additional Considerations

  • Accuracy: The accuracy and relevance of the responses depend on the quality of the uploaded PDFs, the clarity of your questions, and the effectiveness of the underlying AI models.

  • Error Handling: The code incorporates error handling mechanisms to gracefully address potential issues with the AI21 API or the processing pipeline. However, unexpected errors might still occur. In such cases, the application will provide an error message or a generic response.

  • Performance Optimization: For very large PDFs or numerous documents, consider techniques like chunking the text into smaller parts for processing or exploring alternative vector storage methods.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages