Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. WebApp implemented in Flask/Jinja2 with infer and train pipelines managed by FlorDB

ucbepic/pdf_parser

PDF Parser

This project is a Flask-based web application for parsing PDFs, with a focus on the user interface and optional AI integration. The primary command, make run, starts the web server and provides access to the core functionality. Advanced users can optionally train a model or update the repository with the best model.

Running the Web Application

Prerequisites

  • Python 3.x
  • Flask
  • Other dependencies in requirements.txt

Quick Start

To quickly start the web application:

git clone git@github.com:ucbepic/pdf_parser.git
cd pdf_parser
make install
make run

These commands clone the repository, set up the environment, and launch the Flask web server, ready for use.

Storing PDFs for Processing

For privacy and organization, this application processes PDFs stored in a specific directory: app/static/private/pdfs. This directory is excluded from version control via .gitignore to ensure privacy and data security.
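As a minimal sketch of getting files into that directory, the snippet below copies PDFs from a local folder into app/static/private/pdfs. It assumes you run it from the repository root; SRC_DIR and its default path are hypothetical placeholders, not something the project defines.

```shell
# Assumption: run from the repository root.
# SRC_DIR is a hypothetical folder holding your PDFs; override it as needed.
SRC_DIR="${SRC_DIR:-$HOME/Documents/papers}"
PDF_DIR="app/static/private/pdfs"

# Create the target directory if it does not exist yet.
mkdir -p "$PDF_DIR"

# Copy only top-level *.pdf files; skip quietly if the source folder is absent.
if [ -d "$SRC_DIR" ]; then
  find "$SRC_DIR" -maxdepth 1 -type f -name '*.pdf' -exec cp {} "$PDF_DIR"/ \;
fi

# List what the application will be able to see.
ls "$PDF_DIR"
```

To confirm that a copied file is excluded from version control, git check-ignore -v followed by the file's path prints the matching .gitignore rule, if one applies.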

Optional AI Integration

Training the Model

For users interested in AI functionalities:

  • Train the model with:
    make train

Updating the Model

  • Update the repository with the best model using:
    make model.pth
    This command produces model.pth so that the application uses the best-performing model.
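Because the AI steps are optional, it can help to check whether a model file is present before starting the server. The sketch below assumes model.pth sits next to the Makefile, as the name of the make target suggests; how the application itself loads the model is not covered here.

```shell
# Assumption: model.pth lives in the repository root, next to the Makefile.
MODEL_FILE="model.pth"

if [ -f "$MODEL_FILE" ]; then
  status="model found: $MODEL_FILE"
else
  # The AI steps are optional, so a missing model is reported, not treated as an error.
  status="no model found; run 'make train' and then 'make model.pth' to add one"
fi

echo "$status"
```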

Cleaning Up

Remove generated files and clean up:

make clean

Project Structure

  • run.py: Flask application entry point.
  • get_best_ckpt.py: Script to generate model.pth.
  • Makefile: Manages the build, run, and AI integration process.

Contributing

Contributions are welcome. Please use the standard fork-and-pull-request workflow for any contributions.

License

This project is licensed under the Apache License, Version 2.0.
