Add PDF Reading Capability #92

Open
gromdimon opened this issue Jan 19, 2025 · 0 comments
gromdimon commented Jan 19, 2025

Problem Statement

Is your feature request related to a problem? Please describe.

Currently, the Nevron framework lacks the ability to extract and analyze content from PDFs. This limitation prevents the agent from processing important documents, research papers, and other PDF-based sources, which are common in many workflows.


Describe the solution you'd like

Add functionality to the framework to read and extract content from PDF files. The feature should enable the agent to process PDF documents, analyze the text, and integrate the extracted data into workflows such as memory storage, action planning, or contextual analysis.

Proposed Solution

Proposed Implementation Steps:

  1. Add a PDF Processing Utility:

    • Use a Python library like pypdf (the maintained successor to PyPDF2), pdfplumber, or PyMuPDF for PDF text extraction.
    • Create a new Execution tool:
      • Extract text from single-page and multi-page PDFs.
      • Handle PDFs with complex layouts (e.g., multi-column, images).
      • Reject encrypted PDFs with a clear error message stating that the PDF must be unencrypted.
  2. Configuration Options:

    • Allow users to configure PDF processing settings in settings.py, such as:
      • Maximum file size for PDFs.
      • Page range selection.
      • Enable/disable image-based OCR for non-text PDFs.
  3. Error Handling:

    • Gracefully handle errors like:
      • Corrupted or unsupported PDF files.
      • Failed text extraction due to complex layouts or encryption.
    • Log detailed error messages for debugging.
  4. Unit Tests:

    • Write unit tests to validate PDF extraction functionality using sample PDFs:
      • Text-only PDFs.
      • PDFs with images and text.
      • Encrypted PDFs.
      • PDFs with complex layouts.
  5. Security:

    • We need to scan the PDF for malware before processing it.
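
The configuration and error-handling steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `PDFSettings`, `PDFValidationError`, `validate_pdf_file`, and `resolve_page_range` are hypothetical names, not existing Nevron APIs, and the defaults are placeholders. The header check is a cheap sanity check for non-PDF uploads; it is not a malware scan and does not detect encryption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


class PDFValidationError(ValueError):
    """Raised when a file fails the pre-flight checks (hypothetical name)."""


@dataclass(frozen=True)
class PDFSettings:
    # All field names and default values here are illustrative assumptions.
    max_file_size_bytes: int = 20 * 1024 * 1024
    # 1-based, inclusive (first, last); None means "all pages".
    page_range: Optional[Tuple[int, int]] = None
    enable_ocr: bool = False


def validate_pdf_file(file_path: str, settings: PDFSettings) -> Path:
    """Run cheap checks before handing the file to a PDF library."""
    path = Path(file_path)
    if not path.is_file():
        raise PDFValidationError(f"Not a file: {path}")
    size = path.stat().st_size
    if size > settings.max_file_size_bytes:
        raise PDFValidationError(
            f"File is {size} bytes; limit is {settings.max_file_size_bytes}"
        )
    # PDF files are expected to start with the "%PDF-" marker.
    with path.open("rb") as fh:
        header = fh.read(5)
    if header != b"%PDF-":
        raise PDFValidationError(f"Missing %PDF- header: {path}")
    return path


def resolve_page_range(num_pages: int, settings: PDFSettings) -> range:
    """Translate the configured page range into zero-based page indices."""
    if settings.page_range is None:
        return range(num_pages)
    first, last = settings.page_range
    if first < 1 or last < first:
        raise PDFValidationError(f"Invalid page range: {settings.page_range}")
    # Clamp the upper bound to the actual number of pages.
    return range(first - 1, min(last, num_pages))
```

A caller would run `validate_pdf_file` first, open the document with the chosen library, and then iterate only over the indices returned by `resolve_page_range`.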

Additional Context

  • Suggested utility function for extracting text:
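    import pdfplumber
    
    def extract_text_from_pdf(file_path: str) -> str:
        """
        Extract text from a PDF file.
    
        Args:
            file_path (str): Path to the PDF file.
    
        Returns:
            str: Extracted text, with pages separated by newlines.
    
        Raises:
            RuntimeError: If the PDF cannot be opened or parsed.
        """
        try:
            with pdfplumber.open(file_path) as pdf:
                # extract_text() may yield None or "" for pages without a
                # text layer (e.g. scanned images), so fall back to "".
                return "\n".join(page.extract_text() or "" for page in pdf.pages)
        except Exception as e:
            # Chain the original exception so the root cause stays visible.
            raise RuntimeError(f"Failed to extract text from PDF: {e}") from e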
  • Example use case:
    • A user uploads a research paper PDF. The framework extracts the content and uses it to update the agent's memory or plan actions based on the insights.
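
For the use case above, the extracted text of a long paper would probably need to be split before it is written to memory. The sketch below shows one simple way to do that with overlapping character windows. `chunk_text` and its parameters are hypothetical, and how the chunks are stored depends on Nevron's memory interface, which is not assumed here.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most max_chars, sharing `overlap` characters.

    Hypothetical helper: the sizes are placeholders, not tuned values.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    text = text.strip()
    chunks: List[str] = []
    step = max_chars - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks
```

Each chunk could then be passed to whatever memory-write call the agent uses. Splitting on characters is the simplest option; splitting on paragraphs or sentences may preserve context better.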
@gromdimon gromdimon added the feature New feature or request label Jan 19, 2025
@gromdimon gromdimon added this to the v0.2.0 milestone Jan 19, 2025
@gromdimon gromdimon modified the milestones: v0.2.0, v0.3.0 Jan 24, 2025
@gromdimon gromdimon modified the milestones: v0.3.0, v0.2.2 Feb 11, 2025