Skip to content

Effortlessly process invoices with AI! This project uses the Llama3.2 Vision Model for OCR, converting invoice images into structured, machine-readable tables. Designed for accountants, it automates data extraction and outputs in tabular format ready for ERP integration, improving efficiency and accuracy in invoice management.

Notifications You must be signed in to change notification settings

yYorky/LlamaOCR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OCR to Tabular Data with Llama3.2 Vision Model

Overview

This project leverages the powerful capabilities of the Llama3.2 Vision Model to extract textual data from images, specifically tailored for invoice processing. Designed with accountants in mind, this tool converts invoices into structured, machine-readable tabular data, enabling seamless integration with accounting ERP systems.

As a former accountant, I deeply understand the challenges of manually processing invoices—it's time-consuming, error-prone, and repetitive. This project is my effort to bridge the gap between manual invoice entry and efficient automation, improving productivity and accuracy for accountants and finance professionals.

Features

  • Highly Accurate OCR: Optimized for extracting text from invoices, including handwritten notes.
  • Table Formatting: Automatically formats extracted data into structured tables.
  • ERP Integration Ready: Outputs data in tabular format, ideal for integration with ERP systems.
  • Interactive Interface: Simple and intuitive Streamlit-based web application with a sidebar for image uploads.
  • Progress Updates: Real-time progress updates during OCR processing.
  • Supports Overlapping Image Stripes: Splits images into overlapping sections for improved accuracy on large invoices.

Sources and Inspiration

This project draws inspiration and insights from:

  • Llama-OCR: A tool that processes documents such as receipts and PDFs containing tables and charts, converting them into Markdown while preserving structure and formatting.
  • LlamaOCR - Building your Own Private OCR System by Sam Witteveen:
    A comprehensive video demonstrating the use of LlamaOCR and its capabilities. It highlights how the Llama 3.2 visual model can convert images and scanned documents into structured Markdown, retaining the formatting of elements like tables, lists, and spreadsheets. Practical tutorials and code snippets are provided in JavaScript and Python, including hands-on examples in a Colab environment.

How It Works

  1. Upload Invoice: Users upload an invoice (JPEG or PNG format) via the sidebar.
  2. Image Preprocessing: The tool splits the image into overlapping horizontal stripes for better accuracy.
  3. Text Extraction: Leverages Llama3.2 Vision Model to process each stripe and extract text.
  4. Tabular Conversion: Aggregates and formats the extracted text into a structured table.
  5. Downloadable Output: Users can download the table in CSV format for easy integration with ERP systems.

Motivation

During my career as an accountant, I spent countless hours manually processing invoices, reconciling numbers, and ensuring data accuracy in ERP systems. While these tasks were critical, they were also tedious and repetitive. This experience inspired me to create a solution that could automate these processes, enabling accountants to focus on more strategic work. It is truly amazing in the open-source world we live in today that I can have the opportunity to work on this directly to solve for a problem I use to face!

Setup and Installation

Prerequisites

Installation Steps

  1. Clone the Repository:

    git clone https://github.com/yYorky/LlamaOCR.git
    cd llamaocr
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Set Up API Key: Create a .env file in the root directory and add your GROQ API key:

    GROQ_API_KEY=your_api_key_here
  4. Run the Application:

    streamlit run app_v6.py
  5. Access the App: Open http://localhost:8501 in your browser.

Usage

  1. Upload Invoice:

    • Use the sidebar to upload an image of an invoice (JPEG or PNG).
    • The uploaded image will be displayed in the sidebar for review.
  2. OCR Processing:

    • The main screen will show the processing progress with real-time updates for each section of the invoice.
  3. Download Results:

    • Once processing is complete, the extracted data will be displayed as a table.
    • Use the "Download Table Output" button to download the data file.

Testing

To test the application:

  1. Use sample invoice images, including those with:
    • Printed text
    • Handwritten notes
    • Complex layouts (e.g., multi-column or detailed itemizations)
  2. Verify that the output table matches the original invoice details.
  3. Test the download functionality.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for suggestions or improvements.

About

Effortlessly process invoices with AI! This project uses the Llama3.2 Vision Model for OCR, converting invoice images into structured, machine-readable tables. Designed for accountants, it automates data extraction and outputs in tabular format ready for ERP integration, improving efficiency and accuracy in invoice management.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages