Wrong bounding box coordinates reported (regression from 4.0.0) #2240

Closed
lorenzob opened this issue Feb 14, 2019 · 8 comments

@lorenzob

  • Tesseract Version:

tesseract 4.0.0-279-gec8f
leptonica-1.77.0
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7
Found AVX2
Found AVX
Found SSE

  • Commit Number:

b67ff53

  • Platform:

Linux ip-xxx-xx-xx-xxx.eu-west-1.compute.internal 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

Current Behavior:

Space len is reported as -470

Space len is computed as the distance between the right side of previous box and the left side of the one containing the symbol.
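The computation described above can be sketched as follows. This is a minimal illustration, not the reporter's actual code: the `Box` layout (left, top, right, bottom in pixels), the function names, and the sample coordinates are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int


def space_lengths(boxes: List[Box]) -> List[int]:
    """Gap between each box and the box before it, in the given order.

    The gap is the left edge of the current box minus the right edge
    of the previous box, so it is negative whenever the current box
    starts to the left of where the previous box ends.
    """
    return [cur.left - prev.right for prev, cur in zip(boxes, boxes[1:])]


def negative_gap_indices(boxes: List[Box]) -> List[int]:
    """Indices (into boxes) of boxes whose gap to the previous box is negative."""
    return [i + 1 for i, gap in enumerate(space_lengths(boxes)) if gap < 0]


# Made-up coordinates for illustration only, not taken from the test image.
sample = [Box(10, 0, 50, 20), Box(67, 0, 100, 20), Box(20, 30, 60, 50)]
gaps = space_lengths(sample)
suspicious = negative_gap_indices(sample)
```

Listing the indices with a negative gap makes it easier to check whether the boxes themselves are off or whether the pairing of "previous" and "current" box is at fault.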

Expected Behavior:

It should be about 17, as in version 4.0.0 and as measured by pixel inspection.

Test img:

napoleone_ocr_13

This is a comparison of the app log:

tess-diff.txt

"DEBUG:root:# Spazio troppo breve -470 lo ignoro"

This means "space too short, ignoring it" and the computed distance is -470.

Many other "space len" values are also reported as negative.

I think this is a new issue, not the same as #2024 and the two issues referenced there.

@stweil
Member

stweil commented Feb 14, 2019

Could you please try to find out when this regression occurred first, maybe by using git bisect? If 4.0.0 was fine, are there other later revisions which still are fine?

@amitdo
Collaborator

amitdo commented Feb 15, 2019

Does this also occur with commit ce88adb?

@lorenzob
Author

Same problem with ce88adb.

Later I'll try git bisect to find the commit.

@amitdo
Collaborator

amitdo commented Feb 15, 2019

Please try commit 7249571

@lorenzob
Author

lorenzob commented Feb 15, 2019

Commit 7249571 behaves like 4.0.0 (tesseract 4.0.0-29-g7249).

Here is a comparison of the bounding boxes.

4.0.0:

4 0 0

Others:

latest

The boxes look fine visually, even better than in the old version. Please let me check my code before anyone spends further time on this; maybe the problem is on my side.

@zdenop
Contributor

zdenop commented Feb 25, 2019

Did you have a chance to check your code?

@lorenzob
Author

Not yet. If you want, we can close this and I will reopen it as soon as I have more details.

@zdenop
Contributor

zdenop commented Feb 27, 2019

With Tesseract-ocr version 4.1.0-rc1-21-g8e83 I got this result:
image
so I expect the problem is in your code.
