-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for LSTM Diplopia issue #3476
base: main
Are you sure you want to change the base?
Conversation
src/lstm/recodebeam.cpp
Outdated
if (!in_possible_diplopia_) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!in_possible_diplopia_) | |
return true; | |
if (!in_possible_diplopia_) { | |
return true; | |
} |
Stefan, I just committed the changes you suggested to branch JDWDIPLOPIA Thanks, Dave |
I still don't see that latest changes. Did you push them to GitHub? |
Hi Stefan, OK, apologies, newbie inexperience at this end. I have no previous Git experience. I am using GitHub Desktop for source control. I had committed the changes at my end, but had not done the 'push to origin'. Please check again now and hopefully you will find the changes there. Sorry about that. Dave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just few pedantic suggestions about code style consistency.
I'm sorry for making this kind of comment, but what's the status on this? |
Suggested-by: Robert Pösel Signed-off-by: Stefan Weil <[email protected]> Co-authored-by: Robert Pösel <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Signed-off-by: Stefan Weil <[email protected]>
Meanwhile there are some merge conflicts which must be resolved. But the most important thing is that we need to test the changes that they improve the diplopia issue and that they don't introduce a regression. So it would help if whoever has done OCR before and after those changes and can confirm that it improves the OCR result without making some recognition worse could report it here. |
Hello all: Sorry, but I have no previous experience with open source development mechanisms, so not sure how to move things forward. The changes I made in this pull request have been successfully tested by me, but admittedly only on a limited set of data and the fairly specific requirements I have, namely English text from computer screen contents with specific limited fonts being used. So indeed there is much more testing needed, but I have no mechanism for doing this. For the test results reported by Juligreen, I would suggest that you investigate different values for the variable kMinDiplopiaKey. This value is critical to the identification of potential diplopia, and probably should be made into a configuration setting. But again, how to do that is beyond my level of experience. OK I have fixed the merging issue I think and committed them to my branch. Hopefully I did it properly. @stweil please let me know. |
With a dataset of around 4000 images that produced 103 "diplopia" affected results, this branch reduced it down to only 38. |
Thanks @ohk2kt3t4 that is good to know. Could you please attach a couple of the images you used where the diplopia was not successfully eliminated? I would like to take a look and see if I can figure out why. |
Unfortunately I cannot share the images, but I can try help you investigate if it is trivial enough. |
@ohk2kt3t4
Can you describe how you classified diplopia? Automatically with some heuristic rules (which ones?) or manually? Then others can setup regression tests with license-free images. |
I am also seeing this problem in some cases, were we try to automatically OCR a lot of text, store it in spreadsheets, and then manually verify the results. I can pull in this pull request and re-run that OCR, and check if the diplopia problems are reduced or gone, if that's helpful. It'd be a narrow/specific set of mostly images like this: Please let me know if that's helpful, and I'll try to do the evaluation. The images should be public, so I could share results where diplopia is (potentially) not eliminated. |
@MerlijnWajer it would be very helpful if you could test my diplopia fix as much as possible, and report your findings here. Also, if there are any diplopia examples which remain, and if you are able to share your images, please attach them here after your testing so I can take a further look. I do know that the fix I posted was limited in scope, but it seemed to correct the diplopia cases we have generally encountered in our own use of Tesseract. Typically those cases involved situations where one particular character is fairly closely matched at the beginning of the given image segment, but then a second character turns out to be a better match as that image segment is further traversed. This can result in one or more beams which contain both characters, with the likelihood that such a beam will get a higher score and therefore be considered as the best match. The fix as it currently stands does not address cases where there are more than 2 potential matching characters. |
@ohk2kt3t4 I understand that you are unable to share your images on this site, but perhaps you could just extract a few specific part numbers where diplopia is still occurring using my fix, and just attach partial images of those. |
As mentioned earlier on, I have no previous experience participating in an open source community. However, I am a very experienced developer and am interested in working on Tesseract issues, particularly those that are related to OCR of non-textual data, like part numbers, codes, etc. It is particularly in those cases that diplopia is a problem when it occurs. I have asked earlier but need to ask again. How is it decided that things like the change that I have put forward in this pull request actually get included in the primary Tesseract release? What should I be doing to move this forward? As indicated earlier in this thread, there have been some fairly large test runs done by @ohk2kt3t4 which seem to indicate that the fix in this pull request does fix a large percentage of diplopia cases (although not all) and does not seem to have negative side effects. So I am not sure why it hasn't moved forward, even though it is not a full solution to the diplopia issue. In the meanwhile, I am continuing to work with the code, and trying to see if I can come up with a more universal solution. |
@woodjohndavid Hello, I myself have come here to see if this fixes my wrongly positioned boxes issues, but it doesn't, I wrote a lot here: #3477 I am interested in this:
Can you point me to the code? I have a hard time navigating it. |
Yes, the code is hard to follow. And I agree with you that the ultimate solution should be that the LSTM engine returns the match coordinates in some form. However, it does not at this point, and I do not understand the LSTM engine operation sufficiently well to figure out how to get it to return those coordinates. So what I have been working on is the code that runs after the LSTM engine does its thing, to see if there is a way to solve the diplopia problem at that stage. This pull request is my initial attempt on this, which is partially but not entirely successful. It does NOT make any attempt to correct the inaccurate box dimensions. One thing you can try (which I did also) is to reduce the "timestep" size. That does improve the box dimensions with the code as it is otherwise, although when diplopia occurs they are still messed up. However, it seems that reducing the "timestep" would require full re-training of the LSTM model. |
Yeah, that pretty much summarizes my experiences so far. There is a tonne of comments in the code, but not really any explanation of the operation. |
@woodjohndavid I am not ever sure where are the character positions calculated. |
@MerlijnWajer Thanks for guiding me to this PR! Could you please take a look if it is possible to improve on this case? |
@woodjohndavid Sorry to trouble you. From all your comments in the pull request, I think you should be definitely interested in some cases where your fixes can be further improved. Could you take a look at my example which meets the conditions of your fixes, i.e., diplopia for 2 characters (0 and O). |
Hello @liuyl07 Sorry, have been busy with other things and don't get back to Tesseract very often. Anyway, it is not surprising that there are some diplopia cases that don't work with this fix, and some that do. If you notice posts by @ohk2kt3t4 earlier in this thread, he has found with a fairly big sample that this fix seems to work for about 70% of the diplopia examples, so it is an improvement, but not a cure. However, I have been looking at this in more detail and have some additional ideas that might help further. So while I can't promise any particular timing, I will let you know if/when I have another attempt. I will use your sample as one test case. |
Another update @liuyl07 I did run your sample on my system, and did NOT have the same result as you did. I did not get the diplopia '0O' I just got the 'O' |
Is this code merged? |
Sounds like this is superseded by #4211. Should it be closed? |
This is a proposed fix for the LSTM diplopia problem where 2 characters are included in the LSTM output stream for the same physical position in the original image. According to my review of trace output for a limited number of test cases, the issue occurs when there are 2 possible characters essentially 'competing' for the same spot, where one of those characters is a better match in the earlier timesteps but the second character (usually the better eventual match) becomes the better choice in later timesteps. In this scenario, it is possible that there will be a beam which includes the first character choice and then adds the second character choice in the same beam after the first, once the first choice score has been reduced and it no longer appears in the TopN list.
This solution is limited to solving diplopia for 2 characters, but could be expanded to deal with a multiple character scenario.
This solution is also dependent upon assigning a value to variable kMinDiplopiaKey which is the minimum score (key value) for an output entry coming from the matrix which would be considered a likely valid character. See the code for further details.
I have tested this only on a very small set of diplopia problems. I am assuming that you folks have a much more extensive set of test cases to run this proposed change on to ensure that it has no unexpected results.