ScannerAI is a Python application that processes retail receipts using computer vision and AI to extract, classify and analyze receipt data. It features a graphical user interface for viewing and editing receipt information, with support for COICOP (Classification of Individual Consumption According to Purpose) code classification.
- Receipt Processing: Extract text and structured data from receipt images and PDFs
- Item Classification: Automatically classify items using COICOP codes
- Interactive GUI: View, edit, and manage receipt data
- Multiple OCR Options: Support for multiple OCR models including Gemini and OpenAI
- Batch Processing: Process multiple receipts from a folder
- Export Options: Save processed data in JSON or CSV formats
- Progress Tracking: Visual progress tracking for batch operations
1. Clone the repository
git clone https://github.com/datasciencecampus/receipt_scanner.git
cd receipt_scanner
2. Set up virtual environment
We strongly recommend installing the package into a virtual environment.
python -m venv scanner-venv
source scanner-venv/bin/activate # source scanner-venv/Scripts/activate on Windows
python -m pip install --upgrade pip
python -m pip install .
> [!NOTE]
> If you intend to do any development work, please install the package as editable and with the `dev` optional dependencies:
>
> ```
> python -m pip install -e ".[dev]"
> ```
Moreover, once you have installed the package, please install the pre-commit hooks. These hooks help us to ensure repository security and a consistent code style.
3. Install the required dependencies
pip install -r requirements.txt
4. Set up configuration
Create a configuration file (e.g., `config.txt`) in the folder 'scannerai/_config/' with the following parameters:
DEBUG_MODE=False
ENABLE_PREPROCESSING=False
SAVE_PROCESSED_IMAGE=False
ENABLE_PRICE_COUNT=False
OCR_MODEL=3
CLASSIFIER_MODEL_PATH=/path/to/your/trained/model
LABEL_ENCODER_PATH=/path/to/your/label/encoder
GEMINI_API_KEY_PATH=/path/to/your/gemini/api/key
OPENAI_API_KEY_PATH=/path/to/your/openai/api/key
GOOGLE_CREDENTIALS_PATH=/path/to/your/google/credentials
Remember to update the path in config = Config("/path/to/your/config.txt") in 'scannerai/_config/config.py'.
An example file, 'scannerai_config.txt', is provided for reference.
Here is a description of the configuration parameters:
- DEBUG_MODE: Enable/disable debug logging
- ENABLE_PREPROCESSING: Enable/disable image preprocessing (not implemented at the moment; please set to False)
- SAVE_PROCESSED_IMAGE: Save processed images
- ENABLE_PRICE_COUNT: Enable token counting for API pricing
- OCR_MODEL: Select OCR model
- CLASSIFIER_MODEL_PATH: Path to trained classification model (if no trained model is available, please remove this parameter; the COICOP and confidence columns will display None)
- LABEL_ENCODER_PATH: Path to label encoder (this should be generated together with the trained model; if not available, please remove this parameter)
- GEMINI_API_KEY_PATH: Path to Gemini API key file (only required if OCR_MODEL is set to 3)
- OPENAI_API_KEY_PATH: Path to OpenAI API key file (only required if OCR_MODEL is set to 1 or 2)
- GOOGLE_CREDENTIALS_PATH: Path to Google Cloud credentials (only required if OCR_MODEL is set to 3)
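As an illustration of the key=value format above, a minimal parsing sketch is shown below. This is only an assumption of how such a file can be read; the actual parsing is done by the Config class in 'scannerai/_config/config.py'.

```python
# Minimal sketch of parsing the KEY=VALUE config format shown above.
# Illustrative only; the real logic lives in scannerai/_config/config.py.
def parse_config_text(text):
    """Return a dict of settings, skipping blank lines and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

sample = """\
DEBUG_MODE=False
OCR_MODEL=3
GEMINI_API_KEY_PATH=/path/to/your/gemini/api/key
"""
settings = parse_config_text(sample)
```

Note that every value is read as a string; whatever loads the file is responsible for interpreting booleans and integers.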
The application requires API keys for the OCR services you use:
- Gemini API key (required only if Google's Gemini is selected as OCR_MODEL)
- OpenAI API key (required only if an OpenAI model is selected as OCR_MODEL)
- Google Cloud credentials (required only if Google's Gemini is selected as OCR_MODEL)
Store API keys in separate files and update paths in the configuration.
An example trained model is provided in receipt_scanner/src/scannerai/classifiers/trainedModels/: set CLASSIFIER_MODEL_PATH to LRCountVectorizer.sav and LABEL_ENCODER_PATH to encoder.pkl.
This model is a Logistic Regression (LR) classifier trained on features extracted with CountVectorizer, a popular feature extraction method implemented in the scikit-learn Python package.
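The training recipe can be sketched as follows. This is not the script that produced the bundled model; the toy item descriptions and COICOP-style labels below are invented for illustration, but the CountVectorizer + Logistic Regression pipeline and the pickled model/encoder pair mirror what the configuration expects.

```python
# Sketch: train a CountVectorizer + Logistic Regression classifier and
# save the model and label encoder as pickle files. Toy data only.
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder

# Invented item descriptions with COICOP-like labels.
items = ["whole milk 2l", "white bread loaf", "bus ticket", "milk semi skimmed"]
labels = ["01.1.4", "01.1.1", "07.3.2", "01.1.4"]

# Encode string labels as integers for the classifier.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Bag-of-words features feeding a Logistic Regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(items, y)

# Save in the same spirit as LRCountVectorizer.sav / encoder.pkl.
with open("LRCountVectorizer.sav", "wb") as f:
    pickle.dump(model, f)
with open("encoder.pkl", "wb") as f:
    pickle.dump(encoder, f)

# Classify a new item and map the integer prediction back to a label.
pred = encoder.inverse_transform(model.predict(["skimmed milk 1l"]))[0]
```

Saving the encoder alongside the model is what makes the LABEL_ENCODER_PATH parameter necessary: without it, integer predictions cannot be mapped back to COICOP codes.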
Run the GUI with:
streamlit run scripts/lcf_receipt_entry_streamlit.py
- Click "Browse files" to select receipt images/PDFs
- Click "Process Uploaded Files" to process the files
- Edit shop name, total amount, payment mode and items including item name, price, or COICOP if needed
- Add/delete items to the receipt if needed
- Navigate between receipts using Previous/Next buttons
- Save processed data in JSON or CSV format
Here is an example using Google's Gemini model to take an image or PDF as input and output a dictionary of shop name, items and their prices, total amount, and payment method.
```
import json

from scannerai.ocr.lcf_receipt_process_gemini import LCFReceiptProcessGemini

processor = LCFReceiptProcessGemini()
image_pathfile = '/path/to/your/image.jpg'
result = processor.process_receipt(image_pathfile)
print(json.dumps(result, indent=2))
```
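The Batch Processing feature listed above works along the same lines. This is a sketch built around the process_receipt call: the process_folder helper and its file-extension filter are our own illustration, not part of the package API.

```python
# Sketch: process every receipt image/PDF in a folder and write the
# combined results to one JSON file. process_folder is illustrative only.
import json
import os

def process_folder(folder, processor, out_path):
    """Run processor.process_receipt on each receipt file in folder."""
    results = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith((".jpg", ".jpeg", ".png", ".pdf")):
            results.append(processor.process_receipt(os.path.join(folder, name)))
    with open(out_path, "w") as f:
        json.dump(results, f, indent=2)
    return results
```

Used with the Gemini processor, this would look like process_folder('/path/to/receipts', LCFReceiptProcessGemini(), 'receipts.json'); any object with a process_receipt method will do.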
- lcf_receipt_entry_streamlit.py: Main GUI application using Streamlit
- lcf_receipt_process_gemini.py: Receipt processing using the Gemini Vision API
- lcf_receipt_process_gpt4vision.py: Receipt processing using the OpenAI model gpt-4o-mini
- lcf_receipt_process_openai.py: Receipt processing using Tesseract to extract text from an image, which is then passed to the OpenAI model gpt-3.5-turbo
- lcf_classify.py: COICOP classification implementation
- scanner_utils.py: Utility functions
- config.py: Configuration management
See requirements.txt for detailed dependencies.
This repository contains a configuration of pre-commit hooks. These are language agnostic and focussed on repository security (such as detection of passwords and API keys). If approaching this project as a developer, you are encouraged to install and enable pre-commits by running the following in your shell:

1. Install pre-commit:

```
pip install pre-commit
```

2. Enable pre-commit:

```
pre-commit install
```
Once pre-commits are activated, whenever you commit to this repository a series of checks will be executed. The pre-commits include checking for security keys, large files and unresolved merge conflict headers. The use of active pre-commits is highly encouraged, and the given hooks can be expanded with Python- or R-specific hooks that automate code style and linting. For example, the flake8 and black hooks are useful for maintaining consistent Python code formatting.
NOTE: Pre-commit hooks execute Python, so they expect a working Python build.
At the Data Science Campus we apply data science, and build skills, for public good across the UK and internationally. Get in touch with the Campus at [email protected].
The code, unless otherwise stated, is released under the MIT Licence.
The documentation for this work is subject to © Crown copyright and is available under the terms of the Open Government Licence v3.0.