Intelligent-Document-Processing

This project extracts data from invoices using the YOLOv8 object detection algorithm together with two OCR libraries, Tesseract OCR and PaddleOCR. It takes a user-supplied invoice image as input and displays the OCR-generated text as output.

Step 1:

First, our pre-trained object detection model predicts the regions of interest in the invoice image, and we extract those regions.
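The detection step can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks or the app: it assumes the `best.pt` weights from this repository, and the function names `clamp_box` and `detect_regions`, the confidence threshold, and the example file name are hypothetical.

```python
def clamp_box(box, width, height):
    """Round an (x1, y1, x2, y2) box to integers and clip it to the image bounds."""
    x1, y1, x2, y2 = (int(round(float(v))) for v in box)
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    return x1, y1, x2, y2


def detect_regions(image_path, weights="best.pt", conf=0.25):
    """Return (label, crop) pairs for every region the YOLO model detects.

    Hypothetical helper: the weights path and threshold are assumptions.
    """
    # Imported lazily so clamp_box can be used without these packages installed.
    import cv2
    from ultralytics import YOLO

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    height, width = image.shape[:2]

    model = YOLO(weights)
    result = model.predict(image, conf=conf, verbose=False)[0]

    crops = []
    for box, cls_id in zip(result.boxes.xyxy.tolist(), result.boxes.cls.tolist()):
        x1, y1, x2, y2 = clamp_box(box, width, height)
        if x2 > x1 and y2 > y1:  # skip boxes that collapse after clipping
            crops.append((result.names[int(cls_id)], image[y1:y2, x1:x2]))
    return crops
```

Calling `detect_regions("invoice.jpg")` on a sample image would then yield one cropped array per detected field, labelled with the class name stored in the model.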

Step 2:

Once the regions of interest are extracted, we run OCR on only those regions using Tesseract OCR and PaddleOCR. This restricts extraction to the useful information in the invoice.
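A possible shape for the OCR step is sketched below. It is an assumption-laden example rather than the project's implementation: `paddle_text` and `ocr_region` are hypothetical names, the 0.5 confidence cut-off is arbitrary, and the result layout handled here is the one returned by PaddleOCR 2.x (`ocr.ocr(img, cls=True)`); newer releases may differ.

```python
def paddle_text(result, min_score=0.5):
    """Join the text of PaddleOCR 2.x-style detections above a confidence threshold.

    `result` is expected to look like [[ [box, (text, score)], ... ]];
    pages with no detections may be None and are skipped.
    """
    lines = []
    for page in result or []:
        for _box, (text, score) in page or []:
            if score >= min_score:
                lines.append(text.strip())
    return "\n".join(lines)


def ocr_region(crop, paddle_engine=None):
    """Run Tesseract (and optionally PaddleOCR) on one cropped BGR region."""
    # Imported lazily so paddle_text can be used without these packages installed.
    import cv2
    import pytesseract

    rgb = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
    output = {"tesseract": pytesseract.image_to_string(rgb).strip()}
    if paddle_engine is not None:
        # Assumes a PaddleOCR 2.x engine created with use_angle_cls=True.
        output["paddle"] = paddle_text(paddle_engine.ocr(crop, cls=True))
    return output
```

Returning both engines' output side by side makes it easy to compare them per region and pick whichever reads a given field more reliably.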

Step 3:

Display the results.
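In a Streamlit app, the display step might look something like the sketch below. The helper names `ordered_fields` and `render`, and the assumption that results arrive as a `{label: text}` dictionary, are illustrative only and are not taken from `app.py`.

```python
def ordered_fields(results):
    """Return (label, text) pairs sorted by label, dropping empty OCR output."""
    cleaned = ((label, text.strip()) for label, text in results.items())
    return sorted((label, text) for label, text in cleaned if text)


def render(image, results):
    """Show the uploaded invoice and the recognised text for each region."""
    # Imported lazily so ordered_fields can be used outside a Streamlit session.
    import streamlit as st

    st.image(image, caption="Uploaded invoice")
    fields = ordered_fields(results)
    if not fields:
        st.warning("No text was recognised.")
    for label, text in fields:
        st.subheader(label)
        st.text(text)
```

Here `image` is assumed to be something `st.image` accepts directly, such as a PIL image or an RGB array.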

Jupyter Notebooks

The repository also includes the ipynb notebooks containing the code used to train the YOLO object detection model.

Demo

Invoice.Process.-.Brave.2023-07-07.00-52-59.mp4

Output

Requirements

Make sure you have the following dependencies installed:

opencv-python (imported as cv2)
streamlit
pytesseract
paddleocr
ultralytics
numpy
Pillow (imported as PIL)

You can install these using the requirements.txt file: pip install -r requirements.txt
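To confirm that the installation worked, a quick check along these lines can help. This is a small sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` name are hypothetical, and the mapping simply pairs each import name with the package name usually passed to pip.

```python
import importlib.util

# Import name -> package name typically used with pip.
REQUIRED = {
    "cv2": "opencv-python",
    "streamlit": "streamlit",
    "pytesseract": "pytesseract",
    "paddleocr": "paddleocr",
    "ultralytics": "ultralytics",
    "numpy": "numpy",
    "PIL": "Pillow",
}


def missing_packages(modules=REQUIRED):
    """Return the pip names of modules that cannot be found in this environment."""
    return sorted(
        package
        for module, package in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Because `find_spec` only locates modules without importing them, the check stays fast even for heavy packages.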

Usage

Clone the repository: git clone https://github.com/varunpusarla/invoice-processing.git
Change into the UI directory of the cloned repository: cd invoice-processing/UI
Create a virtual environment: python -m venv env
Activate the virtual environment: .\env\Scripts\activate (Windows) or source env/bin/activate (macOS/Linux)
Install the required dependencies: pip install -r requirements.txt
Run the Streamlit application: streamlit run app.py

Tesseract OCR also requires the Tesseract engine itself to be installed separately, since pytesseract is only a wrapper around it. A Windows installer is available at: https://github.com/UB-Mannheim/tesseract/wiki
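If pytesseract cannot find the engine after installation, pointing it at the executable explicitly usually helps. The sketch below assumes the installer's usual default location on Windows; `find_tesseract`, `configure_pytesseract`, and the candidate path are illustrative and should be adjusted to your own setup.

```python
import os
import shutil

# Assumed default install location of the Windows installer; adjust if needed.
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to the tesseract executable, or None if it cannot be found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract():
    """Tell pytesseract where the engine lives and fail early if it is missing."""
    import pytesseract  # imported lazily; requires the pytesseract package

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract executable not found; install it first.")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at application start-up avoids a confusing error later, when the first image is processed.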

Acknowledgements

This project utilizes the YOLO object detection model from the Ultralytics repository. For more information, please refer to the Ultralytics GitHub page.
The PaddleOCR library is used for OCR processing. For more information, please refer to the PaddleOCR GitHub page.
The Streamlit library is used to create the web application. For more information, please refer to the Streamlit documentation.
