Query PDF is a voice-powered AI RAG (Retrieval-Augmented Generation) application designed to simplify working with PDFs. Users can upload documents and interact via voice commands, receiving accurate summaries and real-time responses.
We built this solution to address common challenges people face with large, complex documents. Traditional search tools can be limiting and are often inaccessible for individuals with disabilities. By integrating RAG and voice technology, we aimed to create an app that lets users interact with documents naturally, using conversation.
- Individuals with Visual Impairments or Learning Disabilities: Benefit from having documents read aloud and interacting using voice commands, promoting accessibility.
- Business Professionals: Work with lengthy contracts, proposals, or reports and require a fast, accessible way to review documents.
- Multitaskers: Engage with documents hands-free, listening to summaries or searching documents while focusing on other tasks.
- Students and Researchers: Need to extract and interact with large volumes of information from academic PDFs, reports, or textbooks quickly.
RAG helps the app give accurate, relevant answers by retrieving specific passages from the uploaded PDF and generating spoken summaries in real time. Grounding responses in retrieved text reduces errors, making the app a more trustworthy tool for users who need precise document-based information.
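The retrieval step described above can be illustrated with a minimal sketch. This is not the app's actual code: the function names (`embed`, `cosineSimilarity`, `retrieve`) and the sample chunks are hypothetical, and the bag-of-words "embedding" is only a stand-in for a real embedding model and vector store. The idea it shows is ranking PDF chunks by similarity to the user's question and keeping the best matches.

```javascript
// Toy bag-of-words "embedding": a stand-in for a real embedding model (assumption).
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) ?? 0);
  }
  const norm = (v) =>
    Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the question and return the topK best matches.
function retrieve(question, chunks, topK = 2) {
  const q = embed(question);
  return chunks
    .map((text) => ({ text, score: cosineSimilarity(q, embed(text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical chunks extracted from an uploaded PDF.
const sampleChunks = [
  "The hackathon submission is due on the final day of the event.",
  "Teams may have up to four members.",
  "Judging criteria include innovation and accessibility.",
];

const hits = retrieve("When is the submission due?", sampleChunks, 1);
console.log(hits[0].text);
```

In a production setup the similarity search would typically be delegated to a vector database and a learned embedding model; the ranking logic stays conceptually the same.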
The app combines voice interaction with RAG technology to offer an easy, hands-free way to explore PDFs. It's particularly helpful for users who may find traditional document navigation challenging, such as those with visual impairments or those who prefer voice over reading.
The app is set to transform how people engage with digital documents. By providing voice-driven summaries and search, students, professionals, and individuals with accessibility needs can easily access key information without manually scrolling through long PDFs.
The app is designed to be simple and accessible. Users upload a PDF, use voice commands to interact with content, and receive voice-based responses. It's intuitive and user-friendly, with no technical skills required.
- JavaScript
- Java
- .NET
- Python
- AI Studio
- AI Search
- PostgreSQL
- Cosmos DB
- Azure SQL
- Next.js
- React
- JavaScript
- Hugging Face
- Pinecone
- OpenAI API Key
The app begins with a Landing Page that welcomes users. To start using the app, users click the "Start PDF Chat Now" button, which takes them to the page where they can upload a PDF document.
On the homepage, users can explore the app's three main features by clicking the "Features" tab:
- PDF Summary: Automatically generates a summary of the uploaded PDF.
- Ask Questions: Allows users to ask specific questions about the PDF content.
- Voice Chat: Engage in a voice-based conversation to send messages and interact with the PDF content.
By clicking on the "Meet the Team" section from the homepage, users can view the GitHub repositories of the contributors involved in building the app.
Here's an example of user interaction:
- After clicking "Start PDF Chat Now", the user uploads a PDF file.
- The app generates a summary of the uploaded document (e.g., a hackathon PDF).
- The user can then prompt the chatbot (e.g., "When is submission due?"), and the bot will scan the document to respond accordingly.
By integrating RAG, this app ensures high-quality, context-aware interactions with PDF documents, enhancing the overall user experience.
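The interaction above (upload, then question, then grounded answer) can be sketched as two small helpers. This is a hedged illustration rather than the app's implementation: `chunkText`, `buildPrompt`, the chunk sizes, and the sample text are all hypothetical, and the call to a language model is left out entirely. The sketch shows how extracted PDF text might be split into overlapping chunks and how retrieved chunks could be combined with the user's question into one prompt.

```javascript
// Split extracted PDF text into overlapping windows of words.
// chunkSize and overlap are illustrative defaults, not values from the app.
function chunkText(text, chunkSize = 50, overlap = 10) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Combine retrieved chunks and the user's question into a grounded prompt.
function buildPrompt(question, contextChunks) {
  const context = contextChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return [
    "Answer using only the context below. If the answer is not there, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Hypothetical text standing in for the contents of an uploaded PDF.
const pdfText =
  "Submissions are due at the end of the hackathon. Each team uploads one PDF.";
const pieces = chunkText(pdfText, 8, 2);

// A real pipeline would pick chunks by similarity search; here the first
// chunk is used only as a placeholder.
const prompt = buildPrompt("When is submission due?", pieces.slice(0, 1));
console.log(prompt);
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk.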
- Ayesha Adna Abdullah (GitHub: mahmoodayesha)
- Abdullah K. (GitHub: abdullah-k18)
To enhance the app's effectiveness and inclusivity, additional research and development can focus on the following areas:
Conduct studies to assess and refine the voice interaction feature for users with various disabilities, including:
- Speech Impairments: Tailor voice recognition and response features to better accommodate users with speech disabilities.
- Hearing Impairments: Ensure that voice commands and responses are accessible and clear, possibly integrating text-to-speech and speech-to-text functionalities.
Perform detailed usability studies to evaluate how individuals with cognitive, visual, or physical impairments interact with the app. This can include:
- Cognitive Impairments: Simplify interactions and improve the clarity of instructions and feedback.
- Visual Impairments: Enhance compatibility with screen readers and ensure that visual elements are accessible.
Improve natural language processing (NLP) capabilities to handle diverse speech patterns, accents, and speeds. Research could focus on:
- Accent and Dialect Recognition: Adapt the app to accurately understand and respond to various accents and dialects.
- Contextual Understanding: Enhance the app's ability to comprehend and generate relevant responses based on contextual nuances in user queries.
By addressing these research areas, the app can become more inclusive, user-friendly, and effective for a broader range of users.