2. Optical Character Recognition
Saumya Shah edited this page Aug 14, 2018
Consider the image mentioned above and the entries cropped from it.
The raw OCR text generated for the GARRETT Peggy entry is:
GARRETT Peggy.
Effects under £600.
22 August. The Will of Peggy Garrett late of 33 Addison-
road-North Notting Hill in the County of Middlesex Widow
who died 30 July 1873 at 33 Addison-road-North was proved
at the Principal Registry by Robert Henry Hoar of
3 Campden-hill-gardens Notting Hill Tobacconist the sole
Executor.
As you can see, the raw text keeps the line breaks of the scanned entry and can be distorted by the quality of the image. This phase runs OCR on each cropped entry and cleans the output so that the subsequent phase can learn from these text entries.
The cleaned OCR text is:
GARRETT Peggy. Effects under £600. 22 August. The Will of Peggy Garrett late of 33 Addison-road-North Notting Hill in the County of Middlesex Widow who died 30 July 1873 at 33 Addison-road-North was proved at the Principal Registry by Robert Henry Hoar of 3 Campden-hill-gardens Notting Hill Tobacconist the sole Executor.
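For this entry, cleaning mostly means joining the line-broken OCR output into a single line. The sketch below shows one way to do that step in Python; the function name `clean_ocr_text` is hypothetical, and it assumes that a line ending in a hyphen continues a hyphenated compound (as with `Addison-` followed by `road-North`), so the hyphen is kept and no space is inserted.

```python
def clean_ocr_text(raw: str) -> str:
    """Join the lines of a raw OCR entry into one line (hypothetical helper).

    Assumption: a line ending in '-' continues a hyphenated compound such as
    'Addison-road-North', so the hyphen is kept and the next line is appended
    without a space. Any other line break becomes a single space.
    """
    # Strip each line and drop blank lines left by the OCR engine.
    lines = [line.strip() for line in raw.splitlines()]
    lines = [line for line in lines if line]

    cleaned = ""
    for line in lines:
        if not cleaned:
            cleaned = line
        elif cleaned.endswith("-"):
            cleaned += line  # keep the hyphen and join directly
        else:
            cleaned += " " + line  # an ordinary line break becomes a space

    # Collapse any remaining runs of whitespace into single spaces.
    return " ".join(cleaned.split())


raw_entry = """GARRETT Peggy.
Effects under £600.
22 August. The Will of Peggy Garrett late of 33 Addison-
road-North Notting Hill in the County of Middlesex Widow
who died 30 July 1873 at 33 Addison-road-North was proved
at the Principal Registry by Robert Henry Hoar of
3 Campden-hill-gardens Notting Hill Tobacconist the sole
Executor."""

print(clean_ocr_text(raw_entry))
```

Keeping the line-end hyphen matches the street names in these entries; a word split purely for line wrapping would instead need its hyphen dropped, for example by checking the joined word against a dictionary.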
- OCR quality varies from image to image. Because the images are not fully denoised before OCR, the output may contain a few stray characters, and some characters may be misrecognized.
To see the implementation code, usage, and a sample output, click here.