Fix for LSTM Diplopia issue #3476

Open. woodjohndavid wants to merge 9 commits into base: main

@woodjohndavid woodjohndavid commented Jun 29, 2021

This is a proposed fix for the LSTM diplopia problem where 2 characters are included in the LSTM output stream for the same physical position in the original image. According to my review of trace output for a limited number of test cases, the issue occurs when there are 2 possible characters essentially 'competing' for the same spot, where one of those characters is a better match in the earlier timesteps but the second character (usually the better eventual match) becomes the better choice in later timesteps. In this scenario, it is possible that there will be a beam which includes the first character choice and then adds the second character choice in the same beam after the first, once the first choice score has been reduced and it no longer appears in the TopN list.
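The mechanism described above can be illustrated with a toy decoder. This is only a minimal sketch and not Tesseract's actual beam search: `Timestep`, `GreedyDecode` and `DiplopiaExample` are hypothetical names, the scores are invented, and the decoder simply takes the best-scoring character at each timestep and merges consecutive repeats. When the winner changes partway through one glyph, both characters end up in the output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One timestep of (hypothetical) network output: a score per candidate character.
struct Timestep {
  std::vector<char> chars;    // candidate characters
  std::vector<float> scores;  // matching scores, same order as chars
};

// Toy decoder: take the best-scoring character at each timestep and merge
// consecutive repeats. If the winner changes while the same glyph is still
// being traversed, two characters are emitted for one physical position.
std::string GreedyDecode(const std::vector<Timestep>& steps) {
  std::string out;
  for (const Timestep& t : steps) {
    if (t.chars.empty()) continue;
    std::size_t best = 0;
    for (std::size_t i = 1; i < t.scores.size(); ++i) {
      if (t.scores[i] > t.scores[best]) best = i;
    }
    if (out.empty() || out.back() != t.chars[best]) out.push_back(t.chars[best]);
  }
  return out;
}

// Invented scores for a single glyph spanning four timesteps: '0' matches
// better at first, 'O' becomes the better match later.
std::vector<Timestep> DiplopiaExample() {
  return {
      {{'0', 'O'}, {0.60f, 0.30f}},
      {{'0', 'O'}, {0.45f, 0.50f}},
      {{'0', 'O'}, {0.20f, 0.75f}},
      {{'0', 'O'}, {0.05f, 0.90f}},
  };
}
```

Calling `GreedyDecode(DiplopiaExample())` yields the two-character string `0O` for what is a single glyph, which is the kind of doubled output discussed in this thread.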

This solution is limited to solving diplopia for 2 characters, but could be expanded to deal with a multiple character scenario.

This solution is also dependent upon assigning a value to variable kMinDiplopiaKey which is the minimum score (key value) for an output entry coming from the matrix which would be considered a likely valid character. See the code for further details.
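To make the role of such a threshold concrete, here is a rough sketch of how a minimum score could gate a two-character diplopia check. It is not the code in this pull request: `Emitted`, `SuppressPairs` and the rule "two different characters on adjacent timesteps, both at or above the threshold" are assumptions chosen for illustration, and the threshold is passed in as `min_key` rather than given a default.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A character on the decoded path (hypothetical structure for illustration).
struct Emitted {
  char ch;
  int timestep;  // timestep at which the character was emitted
  float score;   // matching score of the character
};

// Assumed rule: two different characters emitted on adjacent timesteps, both
// scoring at least min_key, are treated as competing for the same position,
// and only the higher-scoring one is kept.
std::string SuppressPairs(const std::vector<Emitted>& path, float min_key) {
  std::string out;
  std::size_t i = 0;
  while (i < path.size()) {
    if (i + 1 < path.size()) {
      const Emitted& a = path[i];
      const Emitted& b = path[i + 1];
      const bool adjacent = (b.timestep - a.timestep) == 1;
      const bool both_likely = a.score >= min_key && b.score >= min_key;
      if (adjacent && a.ch != b.ch && both_likely) {
        out.push_back(a.score >= b.score ? a.ch : b.ch);
        i += 2;  // both characters of the pair are consumed
        continue;
      }
    }
    out.push_back(path[i].ch);
    ++i;
  }
  return out;
}
```

With a path of `D` (timestep 0, score 0.9), `0` (5, 0.6), `O` (6, 0.9) and `4` (10, 0.8), a threshold of 0.25 collapses the pair and gives `DO4`, while a threshold of 0.75 leaves `D0O4` untouched because the `0` no longer counts as a likely character. In this sketch, at least, the outcome depends directly on the chosen value.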

I have tested this only on a very small set of diplopia problems. I am assuming that you folks have a much more extensive set of test cases to run this proposed change on to ensure that it has no unexpected results.

src/lstm/recodebeam.cpp (outdated)
Comment on lines 1234 to 1235
if (!in_possible_diplopia_)
return true;
Member

Suggested change
- if (!in_possible_diplopia_)
-   return true;
+ if (!in_possible_diplopia_) {
+   return true;
+ }

@woodjohndavid
Author

Stefan, I just committed the changes you suggested to branch JDWDIPLOPIA

Thanks,

Dave

@stweil
Member

stweil commented Jun 30, 2021

Stefan, I just committed the changes you suggested to branch JDWDIPLOPIA

I still don't see those latest changes. Did you push them to GitHub?

@woodjohndavid
Author

Hi Stefan,

OK, apologies, newbie inexperience at this end. I have no previous Git experience. I am using GitHub Desktop for source control. I had committed the changes at my end, but had not done the 'push to origin'. Please check again now and hopefully you will find the changes there.

Sorry about that.

Dave

Contributor

@Robyer Robyer left a comment

Just a few pedantic suggestions about code style consistency.

@MinmoTech

I'm sorry for making this kind of comment, but what's the status on this?

Suggested-by: Robert Pösel
Signed-off-by: Stefan Weil <[email protected]>

Co-authored-by: Robert Pösel <[email protected]>
@stweil
Member

stweil commented Jan 22, 2022

what's the status on this

Meanwhile there are some merge conflicts which must be resolved.

But the most important thing is that we need to test that the changes improve the diplopia issue and don't introduce a regression. So it would help if anyone who has run OCR before and after those changes, and can confirm that they improve the OCR result without making other recognition worse, could report it here.

@MinmoTech

MinmoTech commented Jan 23, 2022

It doesn't seem to resolve the issue for me:

Recognized text: けげんこうちょう…?知らない町だな。
(The first character is duplicated)

Original Image:
screenshot(575)

After applying my optimizations:
image

(I.e. tesseract only sees the second image)

To make sure I'm actually using the right build, here are my steps to compile:

  • checking out the repo
  • checking out the PR with gh pr checkout 3476
  • ./autogen.sh
  • ./configure
  • make

@woodjohndavid
Author

woodjohndavid commented Jan 30, 2022

Hello all:

Sorry, but I have no previous experience with open source development mechanisms, so not sure how to move things forward.

The changes I made in this pull request have been successfully tested by me, but admittedly only on a limited set of data and the fairly specific requirements I have, namely English text from computer screen contents with specific limited fonts being used. So indeed there is much more testing needed, but I have no mechanism for doing this.

For the test results reported by Juligreen, I would suggest that you investigate different values for the variable kMinDiplopiaKey. This value is critical to the identification of potential diplopia, and probably should be made into a configuration setting. But again, how to do that is beyond my level of experience.

OK, I think I have fixed the merge conflicts and committed the fixes to my branch. Hopefully I did it properly. @stweil please let me know.

@6A61736F6E206E61646572

6A61736F6E206E61646572 commented Feb 17, 2022

With a dataset of around 4000 images that produced 103 "diplopia" affected results, this branch reduced it down to only 38.

@woodjohndavid
Author

Thanks @ohk2kt3t4 that is good to know. Could you please attach a couple of the images you used where the diplopia was not successfully eliminated? I would like to take a look and see if I can figure out why.

@6A61736F6E206E61646572

Unfortunately I cannot share the images, but I can try to help you investigate if it is trivial enough.

@wollmers

@ohk2kt3t4

With a dataset of around 4000 images that produced 103 "diplopia" affected results, this branch reduced it down to only 38.

Can you describe how you classified diplopia? Automatically with some heuristic rules (which ones?) or manually?

Then others can set up regression tests with license-free images.

@6A61736F6E206E61646572

6A61736F6E206E61646572 commented Feb 21, 2022

All my images are of part numbers which have the same number of characters and follow the same character placement pattern (e.g. the first two characters are always digits): image.

When "diplopia" occurs, the resulting OCR output is always longer because tesseract has output two characters in place of one (I see a lot of "0O", "B8" etc. pairs). So in that sense I am able to detect them automatically, but it might not be useful for other use cases. Perhaps I could try generating some images programmatically and see if I can end up with some that trigger diplopia; those could then be shared and used for testing.
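A detection rule of this kind might look roughly like the sketch below. It assumes ASCII part numbers with a known fixed length; `LooksLikeDiplopia` and the list of confusable pairs are hypothetical, seeded only with the "0O" and "B8" pairs mentioned above plus their reversed forms.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Flags an OCR result as a likely diplopia case when it is longer than the
// expected fixed length AND contains a known confusable pair.
// The pair list is an assumption and would need tuning per data set.
bool LooksLikeDiplopia(const std::string& text, std::size_t expected_len) {
  static const std::vector<std::string> kPairs = {"0O", "O0", "B8", "8B"};
  if (text.size() <= expected_len) return false;
  for (const std::string& pair : kPairs) {
    if (text.find(pair) != std::string::npos) return true;
  }
  return false;
}
```

For an expected length of 6, `LooksLikeDiplopia("120OAB7", 6)` returns true, while a result of the correct length, or a longer one without any listed pair, is not flagged. Requiring both conditions keeps ordinary misreads out of the count, at the cost of missing pairs that are not in the list.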

@MerlijnWajer
Contributor

I am also seeing this problem in some cases, where we try to automatically OCR a lot of text, store it in spreadsheets, and then manually verify the results. I can pull in this pull request and re-run that OCR, and check if the diplopia problems are reduced or gone, if that's helpful. It'd be a narrow/specific set of mostly images like this:

image

Please let me know if that's helpful, and I'll try to do the evaluation. The images should be public, so I could share results where diplopia is (potentially) not eliminated.

@woodjohndavid
Author

@MerlijnWajer it would be very helpful if you could test my diplopia fix as much as possible, and report your findings here. Also, if there are any diplopia examples which remain, and if you are able to share your images, please attach them here after your testing so I can take a further look.

I do know that the fix I posted was limited in scope, but it seemed to correct the diplopia cases we have generally encountered in our own use of Tesseract. Typically those cases involved situations where one particular character is fairly closely matched at the beginning of the given image segment, but then a second character turns out to be a better match as that image segment is further traversed. This can result in one or more beams which contain both characters, with the likelihood that such a beam will get a higher score and therefore be considered as the best match.

The fix as it currently stands does not address cases where there are more than 2 potential matching characters.

@woodjohndavid
Author

@ohk2kt3t4 I understand that you are unable to share your images on this site, but perhaps you could just extract a few specific part numbers where diplopia is still occurring using my fix, and just attach partial images of those.

@woodjohndavid
Author

As mentioned earlier on, I have no previous experience participating in an open source community. However, I am a very experienced developer and am interested in working on Tesseract issues, particularly those that are related to OCR of non-textual data, like part numbers, codes, etc. It is particularly in those cases that diplopia is a problem when it occurs.

I have asked earlier but need to ask again. How is it decided that things like the change that I have put forward in this pull request actually get included in the primary Tesseract release? What should I be doing to move this forward?

As indicated earlier in this thread, there have been some fairly large test runs done by @ohk2kt3t4 which seem to indicate that the fix in this pull request does fix a large percentage of diplopia cases (although not all) and does not seem to have negative side effects. So I am not sure why it hasn't moved forward, even though it is not a full solution to the diplopia issue.

In the meanwhile, I am continuing to work with the code, and trying to see if I can come up with a more universal solution.

@exander77

@woodjohndavid Hello, I came here to see whether this fixes my issue with wrongly positioned boxes, but it doesn't. I wrote a lot about it here: #3477

I am interested in this:

As I see it, therefore the LSTM matrix processing using the NetworkIO interface needs to add to its return values (in addition to the possible character and the likelihood score) the starting pixel location of the possible match, and the horizontal size of the potential match image from the train data. Once that is done, the rest should be relatively straightforward.

Can you point me to the code? I have a hard time navigating it.

@woodjohndavid
Author

Yes, the code is hard to follow. And I agree with you that the ultimate solution should be that the LSTM engine returns the match coordinates in some form. However, it does not at this point, and I do not understand the LSTM engine operation sufficiently well to figure out how to get it to return those coordinates.

So what I have been working on is the code that runs after the LSTM engine does its thing, to see if there is a way to solve the diplopia problem at that stage. This pull request is my initial attempt on this, which is partially but not entirely successful. It does NOT make any attempt to correct the inaccurate box dimensions.

One thing you can try (which I did also) is to reduce the "timestep" size. That does improve the box dimensions with the code as it is otherwise, although when diplopia occurs they are still messed up. However, it seems that reducing the "timestep" would require full re-training of the LSTM model.

@exander77

However, it does not at this point, and I do not understand the LSTM engine operation sufficiently well to figure out how to get it to return those coordinates.

Yeah, that pretty much summarizes my experiences so far. There is a tonne of comments in the code, but not really any explanation of the operation.

@exander77

@woodjohndavid I am not even sure where the character positions are calculated.

@liuyl07

liuyl07 commented Jun 13, 2022

@MerlijnWajer Thanks for guiding me to this PR!
@woodjohndavid Hello, I tested the image below with the fixes in this PR using different kMinDiplopiaKey values (0.25, 0.5, 0.75), but the result (DOT 0O4N 6VHPPC) remains unchanged, with an extra zero before the O.
TesseractInputImageSingle

Could you please take a look if it is possible to improve on this case?

@liuyl07

liuyl07 commented Jun 27, 2022

Updates for my question above.

By printing the below log in the function RecodeBeamSearch::ExtractBestPathAsWords(), it is obvious that there are 2 possible characters (zero 0 and big O) essentially 'competing' for the same spot. Both of them eventually are shown in the final recognition result.
image

However, it seems that various kMinDiplopiaKey values don't help for this particular case...

Another interesting clue that may help us resolve the diplopia issue to some extent: before the recode beam search, the number of extracted blobs is exactly 12; after it, we get 13 letters in the recognition result, including the one dummy zero shown in my last comment.
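That mismatch could serve as a cheap sanity check. The sketch below is only an illustration under the assumption that the blob count is available to the caller as a plain number; `CountGlyphs` and `HasExtraGlyphs` are hypothetical helpers, not Tesseract functions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Counts the non-whitespace characters in an ASCII recognition result.
std::size_t CountGlyphs(const std::string& text) {
  std::size_t n = 0;
  for (unsigned char c : text) {
    if (!std::isspace(c)) ++n;
  }
  return n;
}

// True when the result holds more characters than there were blobs,
// which hints at a duplicated character somewhere in the line.
bool HasExtraGlyphs(const std::string& text, std::size_t blob_count) {
  return CountGlyphs(text) > blob_count;
}
```

For the result above, `CountGlyphs("DOT 0O4N 6VHPPC")` is 13, so `HasExtraGlyphs` with a blob count of 12 returns true. The check only says that something is duplicated, not where.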

@liuyl07

liuyl07 commented Jul 16, 2022

@woodjohndavid Sorry to trouble you. From all your comments in the pull request, I think you would definitely be interested in cases where your fixes can be further improved. Could you take a look at my example, which meets the conditions of your fixes, i.e., diplopia for 2 characters (0 and O)?

@woodjohndavid
Author

Hello @liuyl07

Sorry, I have been busy with other things and don't get back to Tesseract very often. Anyway, it is not surprising that there are some diplopia cases that don't work with this fix, and some that do. If you look at the posts by @ohk2kt3t4 earlier in this thread, he found with a fairly big sample that this fix seems to work for roughly two-thirds of the diplopia examples (103 affected results reduced to 38), so it is an improvement, but not a cure.

However, I have been looking at this in more detail and have some additional ideas that might help further. So while I can't promise any particular timing, I will let you know if/when I have another attempt. I will use your sample as one test case.

@woodjohndavid
Author

Another update @liuyl07

I did run your sample on my system, and did NOT get the same result as you did. I did not get the diplopia '0O'; I just got the 'O'.

@DesBw

DesBw commented Sep 8, 2023

Is this code merged?
I am getting a lot of diplopia in my projects lately. I was wondering if it can reduce it.

@tfmorris
Contributor

Sounds like this is superseded by #4211. Should it be closed?
