Attempting to read portions of a screen or a video and exporting its data in a CSV file.
Blazingly slow, but it kinda works
- Real time recognitions of numbers of the screen.
- Offline number recognition in a video file.
- Works on easily configurable areas, as many as one wants.
- Integrated Tesseract OCR and EasyOCR.
- Easy to add new OCR methods (see
src/ocr.py
). - Dead simple to use!
This has been mainly developed and tested on Ubuntu 22.04, with Python 3.10.
# Install Python version >= 3.10 (necessary on lower Ubuntu versions)
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.13 python3.13-venv python3.13-tk
# Install dependencies
sudo apt install tesseract-ocr libtesseract-dev
git clone ... && cd ocr_motor
python3.13 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Optional additional OCR methods
pip install easyocr
source .venv/bin/activate
python gui.py
- Known OCR issues:
- Characters confusion, depending on the font:
0
and8
,1
and7
,5
and9
. - Missing dot (e.g.
42.42
turned into4242
).
- Characters confusion, depending on the font:
- Tips to improve OCR reliability:
- Upscale the detected area, to get a better characters resolution.
- Use min and max bounds to filter outliers out.
- Don't trust the OCR output too much. Potentially implement post-filtering based on knowledge of the recorded data. For example if measuring a variable that can only evolve slowly, big jumps in the output value can be marked as outliers and discarded.
- When processing a video, enabling the preview can induce up to 20% overhead.
- EasyOCR requires PyTorch and Scipy, so isn't lightweight. The first time the program is started, it will download necessary model weights (stored in
~/.EasyOCR/model
). See more details on the EasyOCR GitHub (link). With this application, it seems that EasyOCR is slower than Tesseract.
This work is merely a wrapper and a graphical interface for some already existing OCR implementations. It heavily uses Tkinter and CustomTkinter for the interface.
- OpenCV (MIT license): image processing.
- CustomTkinter (MIT license): beautiful interface and GUI.
- Tesseract (Apache 2.0): OCR API.
- Tesserocr (MIT license): Python wrapper for Tesseract.
- EasyOCR (Apache 2.0): another OCR API.
- Loading and processing videos
- Saving/loading configuration
- Multi threading, for less blazing slowness
- Make it easier to add new OCR methods, and documenting it
- Logging not only in the Python terminal, but also in the logging text box
- Real time graphing