financial numbers reading #120
This language data file supports financial numbers like (5,555) 12,555
Please make your PR against https://github.com/tesseract-ocr/tessdata_contrib instead of this repository. Give comparative data showing how and why this is better than the existing data.
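One possible way to produce the comparative data requested above is to compute a character error rate (CER) for each model against hand-typed ground truth. The snippet below is only a sketch of that idea, not a procedure used anywhere in this thread: the function names are hypothetical, and the strings in the usage example are made up for illustration, not real output from either model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalised by the ground-truth length."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)


if __name__ == "__main__":
    # Hypothetical strings, for illustration only.
    truth = "(5,555)"
    hypothetical_ocr = "5,555"
    print(f"CER: {character_error_rate(truth, hypothetical_ocr):.3f}")
```

Running the same ground-truth set through both the existing model and the proposed one, then reporting the two CER values side by side, would give a reproducible comparison.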
Reading financial numbers is not very accurate with the normal eng.traineddata. When reading data with the normal eng.traineddata, it reads numbers like [image] as [image]. I found the issue in a personal project, and it affects not only (4) but also other numbers such as (16), 1, etc. Sometimes the issue also occurs with large numbers, so I tried training a model starting from eng.traineddata. It only includes the financial number format characters (0123456789(),.).
@anuraghkp1: as @Shreeshrii pointed out, can you please make a pull request to the tessdata_contrib repository?
@stweil, can the PR be applied to tessdata_contrib directly? It will be useful to other users too. E.g. see the request on Shreeshrii/tessdata_shreetest#9
Yes, that is possible. @anuraghkp1, could you please describe how you generated the new traineddata? Ideally it should be possible to reproduce your training process.
@anuraghkp1, ping.
@anuraghkp1, do you have some examples where the new model is superior to existing models? Would a whitelist of expected characters with an existing model achieve the same results as your model?
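For reference, the whitelist approach asked about above could be tried roughly as follows. This is a minimal sketch that only builds the Tesseract command line, using the `tessedit_char_whitelist` variable with the character set mentioned in this thread. The helper name, the image path `cell.png`, and the choice of page segmentation mode 7 (single text line) are assumptions for illustration. Whether the whitelist is honoured may depend on the Tesseract version and engine in use.

```python
import shlex

# Character set taken from the PR description: digits, parentheses, comma, period.
FINANCIAL_CHARS = "0123456789(),."


def build_tesseract_command(image_path: str, lang: str = "eng", psm: int = 7) -> list[str]:
    """Build a tesseract CLI invocation restricted to financial characters.

    Hypothetical helper: it only assembles the argument list and does not run OCR.
    """
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={FINANCIAL_CHARS}",
    ]


if __name__ == "__main__":
    # "cell.png" is a placeholder path, not a file from this PR.
    print(shlex.join(build_tesseract_command("cell.png")))
```

Passing the resulting list to `subprocess.run` on the same test images used for the new model would show whether the whitelist alone closes the gap.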