financial numbers reading #120

Open. Wants to merge 2 commits into base: main


kpanuragh

This language data file supports financial numbers such as (5,555) and 12,555.

@Shreeshrii
Contributor

Please open your PR against https://github.com/tesseract-ocr/tessdata_contrib instead of this repository.

Also give comparative data showing how and why this is better than the existing data.

@kpanuragh
Author

kpanuragh commented Apr 11, 2019

Reading financial numbers is not very accurate with the standard eng.traineddata. With the standard eng.traineddata, a number like (4) is read as D. I found the issue in a personal project, and it affects not only (4) but also other numbers such as (16), 1, etc. Sometimes the issue also shows up with large numbers. So I trained a model starting from eng.traineddata. It includes only the financial number characters (0123456789(),.).
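To make the target format concrete, here is a minimal sketch that checks whether an OCR result fits the financial number format built from the characters 0123456789(),. and converts it to a number. It assumes the common accounting convention that a parenthesized amount is negative; the function name `parse_financial` is hypothetical and is not part of this PR or of Tesseract.

```python
import re
from decimal import Decimal
from typing import Optional

# Optional "(", digits (plain or with thousands separators),
# optional decimal fraction, optional ")".
_PATTERN = re.compile(r"(\()?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?(\))?")


def parse_financial(text: str) -> Optional[Decimal]:
    """Return the numeric value of an OCR result, or None if it does not
    look like a financial number (hypothetical helper)."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    open_paren, integer, fraction, close_paren = match.groups()
    # Parentheses must be balanced: "(16" or "16)" is rejected.
    if bool(open_paren) != bool(close_paren):
        return None
    value = Decimal(integer.replace(",", "") + (fraction or ""))
    # Assumption: parentheses mark a negative amount (accounting style).
    return -value if open_paren else value


print(parse_financial("(5,555)"))  # -5555
print(parse_financial("12,555"))   # 12555
print(parse_financial("D"))        # None
```

A check like this can flag misreads such as D for (4) when comparing models, since anything outside the expected format returns None.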

@zdenop
Contributor

zdenop commented Apr 13, 2019

@anuraghkp1: as @Shreeshrii pointed out, can you please make a pull request to the tessdata_contrib repository?
And provide some details for it (e.g. regarding training, something like khmLimon.md for khmLimon.traineddata).

@Shreeshrii
Contributor

@stweil can the PR be applied to tessdata_contrib directly? It will be useful to other users too.

E.g. see request on Shreeshrii/tessdata_shreetest#9

@stweil
Contributor

stweil commented Jun 4, 2019

can the PR be applied to tessdata_contrib directly

Yes, that is possible. @anuraghkp1, could you please describe how you generated the new traineddata? Ideally it should be possible to reproduce your training process.

@stweil
Contributor

stweil commented Jun 9, 2019

@anuraghkp1, ping.

@stweil
Contributor

stweil commented Jun 13, 2019

@anuraghkp1, do you have some examples where the new model is superior to existing models? Would a white list of expected characters with an existing model achieve the same results as your model?
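For anyone who wants to try the whitelist approach asked about here, the following is a rough sketch of running the stock `eng` model restricted to the financial characters. It assumes the `tesseract` command-line tool is installed and on the PATH; the helper names and the choice of `--psm 7` (treat the image as a single text line) are illustrative assumptions, not something specified in this thread. How well `tessedit_char_whitelist` is honored by the LSTM engine has varied between Tesseract versions, so results should be checked against the version in use.

```python
import subprocess
from typing import List

# Character set taken from the PR author's description.
FINANCIAL_CHARS = "0123456789(),."


def build_tesseract_command(image_path: str, lang: str = "eng", psm: int = 7) -> List[str]:
    """Build a tesseract CLI call that restricts output to FINANCIAL_CHARS
    (hypothetical helper)."""
    return [
        "tesseract", image_path, "stdout",  # "stdout" prints the text instead of writing a file
        "-l", lang,
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={FINANCIAL_CHARS}",
    ]


def ocr_with_whitelist(image_path: str) -> str:
    """Run tesseract and return the recognized text (requires tesseract installed)."""
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Running `ocr_with_whitelist` on the same images with `lang="eng"` and with the new traineddata (without the whitelist) would give the side-by-side comparison requested above.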


4 participants