telumletiferum/clci

CLCI (Count Latin Characters in Image)

This is a simple Python project that uses Google's Tesseract OCR engine and its Python wrapper (pytesseract) to count the Latin characters in a given image that also contains Arabic and/or Chinese characters.
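The idea can be sketched roughly as follows. This is an illustration only, not the code in main.py: the function names `count_latin_characters` and `ocr_text`, the use of Unicode character names to decide what counts as Latin, and the language codes `eng+ara+chi_sim` are all assumptions made for this example.

```python
import unicodedata


def count_latin_characters(text):
    """Count characters whose Unicode name starts with 'LATIN'.

    Assumption: "Latin character" means a Latin-script letter, so digits,
    punctuation and whitespace are not counted.
    """
    return sum(1 for ch in text if unicodedata.name(ch, "").startswith("LATIN"))


def ocr_text(image_path, languages="eng+ara+chi_sim"):
    """Run Tesseract on an image and return the recognized text.

    The imports are local so that the counting helper above can be used
    without pytesseract or Pillow installed. The language codes are
    assumptions and require the matching Tesseract language data.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path), lang=languages)


if __name__ == "__main__":
    # Works on plain strings, so it can be tried without an image.
    print(count_latin_characters("Hello, 世界 مرحبا"))  # prints 5
```

With Tesseract installed, something like `count_latin_characters(ocr_text("some_image.png"))` would combine the two steps; the file name here is hypothetical.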

Requirements

This script uses pytesseract, so Tesseract OCR must be installed on your system. It also assumes that the Tesseract executable is on your PATH. If it is not, uncomment the following line in main.py, adjusting the path if your installation differs:

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

For more information, see the Tesseract documentation.
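As an alternative to editing the line by hand, the choice could be made at runtime. The sketch below is not part of main.py; the helper name `resolve_tesseract_cmd` and the constant `WINDOWS_DEFAULT` are hypothetical, and the fallback path is simply the one shown above.

```python
import shutil

# Fallback taken from the line shown in this README; adjust if needed.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(fallback=WINDOWS_DEFAULT):
    """Return the tesseract executable found on PATH, else the fallback."""
    return shutil.which("tesseract") or fallback


if __name__ == "__main__":
    import pytesseract

    # Point pytesseract at whichever executable was resolved.
    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    print(pytesseract.get_tesseract_version())
```

This keeps the PATH-based behavior as the default and only uses the hard-coded path when nothing is found on PATH.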

The pip dependencies can be installed with:

pip install -r requirements.txt

Creating a Python virtual environment first is recommended.

Caveats and limitations

The results of this script depend heavily on the source image. In its current version, the script works well with document-like text, as seen in the test images. It should also work with somewhat more complex images, but results may vary from case to case. Unfortunately, that is the reality of using OCR technology.


"Dunărea de Jos" University of Galați

Faculty of Automation, Computers, Electrical Engineering and Electronics

Author: Naval Cristian

Group: 22C22B

Coordinating professor: Simona Moldovanu
