Documentation
Material posted here is copyrighted and may not be sold or distributed without permission of the respective copyright holder.
The links below lead to PDF downloads.
The following materials appeared in IEEE publications, and each carries an IEEE copyright designation. Papers may not be sold or distributed further without written permission of the IEEE.
An Overview of the Tesseract OCR Engine
Hybrid Page Layout Analysis via Tab-Stop Detection
Adapting the Tesseract Open Source OCR Engine for Multilingual OCR
©ACM, 2009. This is the authors’ version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the International Workshop on Multilingual OCR 2009, Barcelona, Spain July 25, 2009. https://dl.acm.org/citation.cfm?id=1577804
- Ray Smith Publications
- The extraction and recognition of text from multimedia document images by Smith, R.W. (Ph.D. thesis), 1987
- Slides from Tutorial on Tesseract presented at DAS2014
There are manual pages for tesseract tools available in svn:
- ambiguous_words
- cntraining
- combine_tessdata
- dawg2wordlist
- mftraining
- shapeclustering
- tesseract
- unicharset_extractor
- wordlist2dawg
plus a description of unicharambigs files
Documentation of Tesseract generated from the source code by Doxygen can be found on tesseract-ocr.github.io
- Video PhotoTechEDU Day 11: Document Image Analysis with Leptonica
- Training Tesseract for Ancient Greek OCR by Nick White
- Shirorekha Chopping Integrated Tesseract OCR Engine for Enhanced Hindi Language Recognition by Nitin Mishra, C. Patvardhan, C. Vasantha Lakshmi, Sarika Singh
- Report on the comparison of Tesseract and ABBYY FineReader OCR engines by Heliński, Kmieciak, and Parkoła
- The hOCR Embedded OCR Workflow and Output Format (hOCR specification)
- Text Detection on Nokia N900 Using Stroke Width Transform (with source code)