Repair boundingbox of individual characters of textangle 90 text #3599
Please remove the two unused code lines and fix the indentation of the remaining line.
Is that code only used when making boxes, or is it also used in recognition?
Done
I see this second question has been struck through. I don't know. I ran make check afterward, and once gigabytes of possibly dependent languages had been downloaded and put in place, the checks all ran fine. It might be helpful to explicitly require only the needed set of languages before running make check.
@stweil @zdenop Is there anything that could be done to get this PR merged? I can create a separate PR rebased on top of latest main branch tip, add tests, etc. I have looked into all call paths that could potentially be affected by the code change. There aren't too many of them in the first place and all of them are within the auxiliary functionality of getting the results of tesseract out, not the recognition itself. As a consequence, the risk of a regression is probably relatively small. The following is the list of APIs affected:
I am sorry, but I have very little spare time for tesseract. The PR seems interesting, but as it affects the API, it should be well tested, including its effect on training.
Thanks for the response. I will do extensive testing and present the results in a way that requires as little time as possible to review.
@p12tic I'm interested in solving the bounding box problem. I will try to write regression tests with automatic measures covering more scripts, languages and fonts. It will take some time, provided I can find time in the next weeks/months. The complicated part is to create ground truth with correct bounding boxes.
@wollmers This is great to hear. Is there any way to help? I could translate very high-level directions into working code :-) For you, answering a small number of questions should take much less time than doing the implementation. To me it seems that annotating ground truth images with correct bounding boxes is work that is not complicated in principle, but just needs a lot of effort for automation and reviewing. This would be a perfect task for an external developer like me to accomplish. I'm assuming that you don't want to go the route of rendering text and OCRing the resulting images back, as is done for LSTM training in certain cases. In that case the character positions are essentially already known. Well, at least that's my understanding, which could be completely wrong.
Sorry, I confused this PR with PR 3787. For 3787 (normal text without rotation) I wrote an approach at the weekend. See ocr-bbox-gt in prototypish Perl (without the dependencies, not published yet). If you can read it you can port it to your favourite language, which is maybe Python. For text angles other than 0 degrees, the text image can be rotated before OCR and the bounding boxes geometrically transformed back. For angles other than a multiple of 90 degrees a polygon notation is needed, something like
Just use a clean image of text which has no recognition errors (CER 0.0). That's the case for the sample image in 3787. Then use a legacy model with
Now we can check the quality as follows:
Of course this works only with clean, generated images in one and the same font, style and size. But we want to isolate the problem and reduce it to bbox errors only, and thus want to exclude all other reasons for errors. As text, one page of the Human Rights Declaration (available in ~500 languages) can be used. Format it with a popular font, export as PDF, pdftoimage, tesseract. That's the work to get ground truth. Then measure the errors of the output before the patch and after the patch against the ground truth. With legacy only a few characters deviate:
One of the 3 errors (deviation 3 pixels): It would be easy to correct these few remaining errors in a website (import the bboxes as JSON and write the corrections back). Then the resulting bbox file is the ground truth. With CTC/LSTM Tesseract release 5.1.0 it looks like this:
The same part of the image with CTC/LSTM:
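The measurement step described in this comment (comparing box output against ground truth, before and after the patch) could be sketched roughly as below. This is only an illustration, not the ocr-bbox-gt script: it assumes box files in the `glyph left bottom right top page` layout, assumes both files contain the same characters in the same order (CER 0.0), and all function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_text(text: str) -> List[CharBox]:
    """Parse lines of the assumed form 'glyph left bottom right top page'."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so the glyph field is kept intact.
        glyph, left, bottom, right, top, _page = line.rsplit(" ", 5)
        boxes.append(CharBox(glyph, int(left), int(bottom), int(right), int(top)))
    return boxes


def box_deviations(truth: List[CharBox], candidate: List[CharBox],
                   tolerance: int = 0) -> List[Tuple[int, str, int]]:
    """Return (index, char, max pixel deviation) for boxes beyond tolerance."""
    if len(truth) != len(candidate):
        raise ValueError("box counts differ; recognition errors are present")
    errors = []
    for index, (t, c) in enumerate(zip(truth, candidate)):
        if t.char != c.char:
            raise ValueError(f"character mismatch at index {index}")
        # Largest absolute difference over the four box edges.
        deviation = max(abs(a - b) for a, b in zip(t[1:], c[1:]))
        if deviation > tolerance:
            errors.append((index, t.char, deviation))
    return errors
```

Running `box_deviations` once with the pre-patch output and once with the post-patch output against the same ground truth gives two error lists that can be compared directly.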
@wollmers wrote
Yes, at first I wasn't able to understand what my rotation fix had to do with your response, but as I have now also run into a bounding-box issue, I'll be glad to try your PR to see if that fixes it. I'll first see whether I can get it running satisfactorily with LSTM before reverting to OEM 0 for my bounding boxes. My fix doesn't address bounding boxes of upright (unrotated) text.
Solution to issue #3590 (makebox doesn't output horizontal coordinates of textangle 90 content).
I traced these lines back to 2010; no one has touched them since. However, they were the most likely cause of RIL_SYMBOL being excluded from the matrix transformation at textangle 90.
TBOX rotate operations don't seem expensive, so it's unclear why the exclusion for RIL_SYMBOL was ever introduced.
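To illustrate why a quarter-turn box rotation is cheap, here is a minimal standalone sketch in Python. It is not Tesseract's TBOX code; the `Rect` type and function names are hypothetical, and it assumes a bottom-left origin with the y axis pointing up and rotation about the origin. Rotating two opposite corners and re-sorting the coordinates is all that is needed, and the same helper maps boxes back after OCR on a rotated image.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: int
    bottom: int
    right: int
    top: int


def rotate_point_ccw(x: int, y: int, quarter_turns: int) -> Tuple[int, int]:
    """Rotate a point about the origin by quarter_turns * 90 degrees CCW."""
    for _ in range(quarter_turns % 4):
        x, y = -y, x
    return x, y


def rotate_rect_ccw(rect: Rect, quarter_turns: int) -> Rect:
    """Rotate two opposite corners, then re-sort into a normalized box."""
    x0, y0 = rotate_point_ccw(rect.left, rect.bottom, quarter_turns)
    x1, y1 = rotate_point_ccw(rect.right, rect.top, quarter_turns)
    return Rect(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def unrotate_rect(rect: Rect, quarter_turns: int) -> Rect:
    """Undo rotate_rect_ccw by applying the complementary number of turns."""
    return rotate_rect_ccw(rect, -quarter_turns)
```

For example, `rotate_rect_ccw(Rect(10, 20, 30, 60), 1)` swaps the box's width and height, and `unrotate_rect` on the result restores the original box. A real implementation would also translate the result by the page dimensions so coordinates stay non-negative.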