Motivations are as follows:
- A long word can be regarded as a combination of several sub-elements (subwords). This is beneficial for the lexicon-free case, which is more flexible in real life since word composition is not constrained by a fixed lexicon.
- It relieves the memory burden on the LSTM: the LSTM only needs to remember roots and affixes rather than a huge number of whole words (which would require many training samples), and the shorter dependencies are also friendlier to gradient propagation.
Some examples showing the weakness of word-based methods such as CRNN, which suffer from memorizing the lexicon of the training set (raw per-frame CTC output => decoded word; a decoding sketch follows the examples):
aa------u----t--o---l--d----e---n---t--ii-f--y---- => autoldentify
i---d----ee--n---t--ii-f--y---- => identify
m------a--t-h--l--y--p-e---- => mathlype
cc-------a----r---e----y---o-----v----v----e---r---k---d---ll-llo----w-----hh--f-- => careyovverkdllowhf
h------o----o---o----c----o---o----o----c--k----- => hooocooock
0--------0-----0-----0-----0-----0-----0-----0-----0-----0------ => 0000000000
g-----o---o---o---c---o---o---o---o--g---l--e--- => gooocoooogle
l----u--u---u--d---a--g---a--y--u---c--k--- => luuudagayuck
y-----o--u---d----d---a---d--- => youddad
g------i--v--e--l-l---e---f---i--v--e--- => givellefive
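The raw sequences above can be read as per-frame CTC outputs, where `-` is the blank symbol: a greedy decoder drops blanks and collapses consecutive repeated characters. Below is a minimal sketch of that decoding rule; the function name `ctc_greedy_decode` is illustrative and not taken from the original code.

```python
def ctc_greedy_decode(raw: str, blank: str = "-") -> str:
    """Collapse a per-frame CTC output string into the decoded word.

    Drops blank symbols and merges consecutive repeats, e.g.
    'i---d----ee--n---t--ii-f--y----' -> 'identify'.
    """
    decoded = []
    prev = None
    for ch in raw:
        if ch != blank and ch != prev:   # skip blanks, collapse repeats
            decoded.append(ch)
        prev = ch                        # repeats across a blank are kept
    return "".join(decoded)

print(ctc_greedy_decode("i---d----ee--n---t--ii-f--y----"))  # identify
```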
On the one hand, the memory learned by the LSTM helps recognize ambiguous characters (e.g., 'l', 'o') from context; on the other hand, it limits the ability to recognize out-of-vocabulary words whose compositional styles differ from the words in the training set.
So we decide to use a subword-based method, making the RNN attend only to sub-regions of a whole word, to address the problem above. It can probably reduce the amount of training data needed, since in our method a word is composed of subwords, which makes processing of text lines more flexible.
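To make the idea concrete, here is a minimal sketch of subword labeling under the assumption of a hand-picked subword vocabulary and greedy longest-match splitting; the vocabulary, the `segment` function, and the matching rule are illustrative placeholders, not the actual segmentation used by the method.

```python
# Illustrative subword vocabulary (roots and affixes); not the real one.
SUBWORDS = {"auto", "ident", "ify", "math", "type", "give", "me", "five"}

def segment(word: str, vocab=SUBWORDS) -> list[str]:
    """Greedy longest-match segmentation; unknown spans fall back to chars."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):        # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                                    # no subword matched here
            pieces.append(word[i])               # fall back to a single char
            i += 1
    return pieces

print(segment("autoidentify"))  # -> ['auto', 'ident', 'ify']
```

With such a segmentation, the RNN's target becomes a short sequence of subword labels instead of one whole-word class, so an unseen word can still be covered by subwords that already appear in the training set.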