Author: Nathan Aday / [email protected]
https://github.com/nathanaday/RealTime-OCR
Perform text detection in a variety of languages with your computer webcam using Google Tesseract OCR and OpenCV. This script achieves a real-time OCR effect by incorporating multi-threading.
The CV2 video stream runs in a dedicated class, in a dedicated thread, so it is always reading live frames from the webcam and storing the most recent frame as an instance attribute. The video display reads these frames and shows them in real time. Meanwhile, pytesseract OCR has its own dedicated class, running in its own thread; it simply grabs the most recent frame from the video stream class, processes it, and outputs the data. The result is a capable real-time OCR. The bounding boxes may lag if the text is moved around quickly, and detection sometimes takes a moment, but this is a large improvement over the slow, bottlenecked video stream you get from processing everything in a single thread.
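The two-thread arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the code in OCR.py: the class names `LatestFrameStream` and `OCRWorker` and the callables `read_frame` and `recognize` are hypothetical stand-ins. With a webcam, `read_frame` could wrap `cv2.VideoCapture(0).read()`, and `recognize` could wrap a pytesseract call.

```python
import threading
import time


class LatestFrameStream:
    """Reads frames in a background thread, keeping only the newest one."""

    def __init__(self, read_frame):
        # read_frame: hypothetical zero-argument callable returning one frame.
        self._read_frame = read_frame
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._frame = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stopped.is_set():
            frame = self._read_frame()
            with self._lock:
                self._frame = frame  # overwrite: older frames are dropped

    def latest(self):
        with self._lock:
            return self._frame

    def stop(self):
        self._stopped.set()
        self._thread.join()


class OCRWorker:
    """Runs a slow `recognize` call on whatever frame is newest."""

    def __init__(self, stream, recognize):
        # recognize: hypothetical callable taking a frame and returning OCR data.
        self._stream = stream
        self._recognize = recognize
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._result = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stopped.is_set():
            frame = self._stream.latest()
            if frame is None:
                time.sleep(0.001)  # the stream has not produced a frame yet
                continue
            result = self._recognize(frame)  # the slow step, off the display thread
            with self._lock:
                self._result = result

    def latest(self):
        with self._lock:
            return self._result

    def stop(self):
        self._stopped.set()
        self._thread.join()
```

In this sketch the display loop would call `stream.latest()` and `worker.latest()` on every iteration and draw whatever is available, so a slow OCR pass never blocks the video.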
This is a command-line script. The only required argument is the full path to the Tesseract executable from the Tesseract install (see DEPENDENCIES below for more info).
python Main.py -t '<full_path_to_your_tesseract_executable>' [-c ] [-v] [-sv] [-l] [-sl] [-s]
optional arguments:
-h, --help command-line argument help message
-c , --crop crop OCR area in pixels (two vals required): width height
-v , --view_mode view mode for OCR boxes display (default=1)
-sv, --show_views show the available view modes and descriptions
-l , --language code for tesseract language, use + to add multiple (ex: chi_sim+chi_tra)
-sl, --show_langs show list of tesseract (4.0+) supported langs
-s , --src video source for video capture
required named arguments:
-t , --tess_path path to the cmd root of tesseract install (see docs for further help)
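For readers who want to see how such an interface is typically declared, here is a sketch of an `argparse` parser that mirrors the flags listed above. It is an illustration under assumptions, not a copy of Main.py: the function name `build_parser`, the argument types, and the help strings are guesses based only on the usage text.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags in the usage text (hypothetical)."""
    parser = argparse.ArgumentParser(description="Real-time OCR (sketch)")

    required = parser.add_argument_group("required named arguments")
    required.add_argument("-t", "--tess_path", required=True,
                          help="path to the Tesseract executable")

    parser.add_argument("-c", "--crop", type=int, nargs=2,
                        metavar=("WIDTH", "HEIGHT"),
                        help="crop OCR area in pixels (two values)")
    parser.add_argument("-v", "--view_mode", type=int, default=1,
                        help="view mode for OCR boxes display (default=1)")
    parser.add_argument("-sv", "--show_views", action="store_true",
                        help="show the available view modes")
    parser.add_argument("-l", "--language", default=None,
                        help="Tesseract language code, use + to add multiple")
    parser.add_argument("-sl", "--show_langs", action="store_true",
                        help="show the supported language codes")
    parser.add_argument("-s", "--src", type=int, default=0,
                        help="video source for video capture (default=0)")
    return parser
```

A call such as `build_parser().parse_args(["-t", "/path/to/tesseract", "-c", "200", "100"])` would then yield a namespace with `crop == [200, 100]` and the remaining options at their defaults.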
The crop area allows OCR to be performed on a smaller frame and therefore improves speed. A box is automatically drawn around the crop so it's clear where to position text for detection.
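The arithmetic behind the crop can be sketched as below. This assumes the two crop values are margins trimmed from the left/right and top/bottom edges of the frame; if the script instead treats them as the size of the OCR region, the bounds would be computed differently. The function names are hypothetical, and the frame is modelled as a plain list of pixel rows so the example runs without OpenCV.

```python
def crop_region(frame_width, frame_height, crop_x, crop_y):
    """Return (x0, y0, x1, y1) of the OCR region after trimming margins.

    Assumption: crop_x is removed from the left and right edges and
    crop_y from the top and bottom edges.
    """
    if crop_x < 0 or crop_y < 0:
        raise ValueError("crop values must be non-negative")
    if 2 * crop_x >= frame_width or 2 * crop_y >= frame_height:
        raise ValueError("crop leaves no pixels to process")
    return crop_x, crop_y, frame_width - crop_x, frame_height - crop_y


def crop_frame(frame, crop_x, crop_y):
    """Crop a frame given as a list of pixel rows."""
    height, width = len(frame), len(frame[0])
    x0, y0, x1, y1 = crop_region(width, height, crop_x, crop_y)
    # With a NumPy image the equivalent slice is frame[y0:y1, x0:x1].
    return [row[x0:x1] for row in frame[y0:y1]]
```

Under this assumption, trimming 100 and 50 pixels from a 640x480 frame leaves a 440x380 region, a little over half the original pixel count, which is where the speed gain comes from.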
This script implements four view modes, which stylize the way detected text is displayed. To specify a view mode, use -v followed by the mode number after the Main.py call.
View mode 1: Draws boxes on text with >75 confidence level
View mode 2: Draws red boxes on low-confidence text and green on high-confidence text
View mode 3: Color changes according to each word's confidence; brighter indicates higher confidence
View mode 4: Draws a box around detected text regardless of confidence
If no view mode is specified, the OCR will run with mode 1.
To see the view options and their descriptions in the command line, use -sv or --show_views.
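The four view modes amount to a rule that maps a word's confidence to a box color, or to no box at all. The sketch below shows one way to express that rule. Only the >75 threshold for mode 1 comes from the descriptions above; the function name `box_color`, the exact colors, the reuse of 75 as the red/green cutoff in mode 2, and the brightness formula in mode 3 are assumptions for illustration.

```python
GREEN = (0, 255, 0)  # OpenCV colors are (blue, green, red) tuples
RED = (0, 0, 255)


def box_color(view_mode, conf, threshold=75):
    """Return a BGR color for a word's box, or None to draw nothing."""
    conf = float(conf)  # confidences may arrive as strings or numbers
    if conf < 0:
        return None  # pytesseract reports -1 for entries that are not words
    if view_mode == 1:
        return GREEN if conf > threshold else None
    if view_mode == 2:
        # Assumed cutoff: the same threshold separates red from green.
        return GREEN if conf > threshold else RED
    if view_mode == 3:
        # Assumed formula: green channel brightness scales with confidence.
        level = int(255 * min(conf, 100) / 100)
        return (0, level, 0)
    if view_mode == 4:
        return GREEN
    raise ValueError("unknown view mode: %r" % (view_mode,))
```

A custom view mode could be added by extending this kind of function with another branch.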
In the case of multiple camera ports, the src for the desired video input can be specified with the -s command-line argument. Without specification, the src defaults to 0, which for most users is a built-in webcam.
Using SRC source 0:
python Main.py -t '<full_path_to_your_tesseract_executable>' -s 0
Using SRC source 1:
python Main.py -t '<full_path_to_your_tesseract_executable>' -s 1
Tesseract 4.0 and later can detect a variety of languages. To select a language for the OCR, append "-l" followed by the language code to the Main.py call.
For example, to detect simplified Chinese use:
python Main.py -t '<full_path_to_your_tesseract_executable>' -l chi_sim
Multiple languages can be detected simultaneously by joining the codes with '+'. To detect both simplified Chinese and traditional Chinese, use:
python Main.py -t '<full_path_to_your_tesseract_executable>' -l chi_sim+chi_tra
A list of all language codes can be printed in the command line by using '-sl'.
python Main.py -t '<full_path_to_your_tesseract_executable>' -sl
Note that the printed list of available languages comes from the Tesseract supported languages, which should be included in an up-to-date install. However, passing a language code at runtime will have no effect if the .traineddata file for that language is missing from your Tesseract files.
If no language code is specified, the OCR defaults to English.
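Because a language code only works when its data file is installed, it can help to check the requested codes before starting the stream. The sketch below splits a `-l` value on '+' and compares it against the `.traineddata` files found in a tessdata directory. Both function names are hypothetical, and the location of the tessdata directory depends on your install.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the set of language codes with a .traineddata file in the directory."""
    return {path.stem for path in Path(tessdata_dir).glob("*.traineddata")}


def missing_languages(lang_arg, available):
    """Return the requested codes (joined by '+') that are not available."""
    requested = [code for code in lang_arg.split("+") if code]
    return [code for code in requested if code not in available]
```

For example, with only `eng` and `chi_sim` installed, `missing_languages("chi_sim+chi_tra", {"eng", "chi_sim"})` reports `chi_tra` as missing.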
While running an OCR stream, press "c" to capture the current frame and save it as a .jpg file in the working directory. A capture will also print the currently detected text to the command line:
RealTime-OCR user$ REAL TIME OCR with pytesseract and CV2 “Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated.” OCR 2021-04-09 at 13:06:35-5.jpg
RealTime-OCR user$ 实时 OCR 跟 pytesseract, CV2 优美 胜 于 丑陋 , 显 明 胜 于 隐 含 。 简单 胜 于 复杂 , 复 杂 胜 于 繁复 。 扁平 胜 于 , 稀 胜 于 密集 。 可 读 性 会 起 作用 。 OCR 2021-04-09 at 12:59:54-10.jpg
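The file names in the output above follow a date, time, and number pattern. A sketch of how such a name could be built is shown below; the function name `capture_filename` is hypothetical, and reading the trailing number as a running capture counter is an assumption.

```python
from datetime import datetime


def capture_filename(capture_number, now=None):
    """Build a name like 'OCR 2021-04-09 at 13:06:35-5.jpg'."""
    now = now or datetime.now()
    # Note: colons are not allowed in file names on Windows, so a portable
    # variant would pick a different time separator.
    return "OCR {} at {}-{}.jpg".format(
        now.strftime("%Y-%m-%d"), now.strftime("%H:%M:%S"), capture_number)


# In an OpenCV display loop, a key press is commonly detected with:
#     if cv2.waitKey(1) & 0xFF == ord("c"):
#         cv2.imwrite(capture_filename(n), frame)
```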
This script requires:
- a Tesseract install
- the Python wrapper for Tesseract (pytesseract)
- OpenCV for python (CV2)
- Numpy
Instructions on how to install Tesseract on your OS are located here:
https://tesseract-ocr.github.io/tessdoc/Installation.html
To use the script, you will need the path to the exec file included in the Tesseract install, so note the install's location.
Example (on a Homebrew install): '/usr/local/Cellar/tesseract/4.1.1/bin/tesseract'
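If you are unsure where the executable lives, a small helper can look it up on your PATH. The sketch below is not part of the script; the function name `resolve_tesseract` is hypothetical. It returns the given path when that file exists, and otherwise falls back to searching PATH.

```python
import os
import shutil


def resolve_tesseract(tess_path=None):
    """Return a usable path to the Tesseract executable, or None if not found."""
    if tess_path and os.path.isfile(tess_path):
        return tess_path
    # Fall back to whatever "tesseract" resolves to on PATH, if anything.
    return shutil.which("tesseract")


# pytesseract reads the executable location from this module attribute:
#     pytesseract.pytesseract.tesseract_cmd = resolve_tesseract(user_path)
```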
To install the Python wrapper for Tesseract, use:
pip install pytesseract
See the pytesseract docs for further help.
To install OpenCV for Python, use:
pip install opencv-python
See the opencv-python docs for further help.
For numpy, use:
pip install numpy
See the numpy docs for further help.
Tesseract should come with .traineddata files that support a wide variety of languages. In any case, the repo for language files can be found here:
https://github.com/tesseract-ocr/tessdata
Main.py
- Command-line argparser and call to the OCR stream
OCR.py
- All classes and functions for multi-threaded text detection and webcam display
Linguist.py
- A few functions for handling the language codes and converting them to full language names (reads from Tesseract_Langs.txt)
Tesseract_Langs.txt
- Text file for every supported language code and the language's full name
README.md
requirements.txt
This script was written with customizability in mind. It's easy to add custom view modes, edit the pre-processing frames for the OCR, or customize the output displayed in the video capture. These changes can be made in OCR.py. To add custom command line arguments, see Main.py.
OpenCV is an incredibly robust computer vision package. Because this script already imports CV2, the OCR core could be swapped for CV2's facial recognition features, boundary detection, etc. and still achieve the seamless video display from multi-threading.
Tesseract has two additional data sets that can be configured: a fast dataset, and a best dataset.
The fast data will speed up the OCR process, but at the cost of accuracy.
The best data is trained to produce more accurate detection, but at the cost of speed.
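One way to try an alternative dataset without replacing your default files is to point Tesseract at a different data directory through its `--tessdata-dir` option, which pytesseract accepts in its `config` string. The sketch below only builds that string; the function name `tessdata_config` and the example directory are hypothetical, and this script does not expose such an option itself.

```python
def tessdata_config(tessdata_dir):
    """Build a pytesseract config string that selects a tessdata directory."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)


# Example use with pytesseract (directory path is a placeholder):
#     text = pytesseract.image_to_string(
#         image, lang="eng", config=tessdata_config("/path/to/tessdata_fast"))
```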
Questions and bugs can be posted on the project's GitHub page or emailed to [email protected]