feature request: dump only failed images. #34
Signed-off-by: Rüdiger Sonderfeld <[email protected]>
I implemented this option in a branch for now. I'm not sure it is really needed. I think opening the images in an external image editor should be done in a GUI, which could also use … Eventually, I want to figure out how to extract more information from Tesseract about the OCR process. It should provide some kind of confidence or error estimate. That would probably be even more useful than simply looking for images where OCR failed completely.
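To make the confidence idea concrete, here is a minimal sketch of flagging entries for review. It assumes each OCR'd subtitle already carries a 0 to 100 confidence score; Tesseract's C++ API does expose a page-level score through `TessBaseAPI::MeanTextConf()`, but the `OcrResult` type, the `needs_review` helper, and the threshold of 60 are hypothetical, not part of vobsub2srt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OcrResult:
    index: int            # subtitle number (hypothetical field)
    text: Optional[str]   # None when OCR failed outright
    confidence: int       # 0-100, e.g. a value like MeanTextConf() returns


def needs_review(results: List[OcrResult], threshold: int = 60) -> List[int]:
    """Return subtitle numbers that failed OCR, came back empty, or scored low."""
    flagged = []
    for r in results:
        # A hard failure, an empty result, or a low score all warrant a look.
        if r.text is None or not r.text.strip() or r.confidence < threshold:
            flagged.append(r.index)
    return flagged
```

With this approach a complete OCR failure is just the extreme case of a low score, so one list covers both kinds of problem.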
Cloning into 'VobSub2SRT'...
Maybe I did something wrong, but:
[greg@greg-desktop test]$ vobsub2srt --dump-error-images vobsub
As you can see, there are no images. I also wanted a prompt because vobsub2srt deletes any line it can't OCR and then shifts all the timecodes. So not only do I have to figure out the OCR myself, I also have to open the .idx manually, get the timecodes, and then edit the line back in. It's kinda annoying :P
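Pulling timecodes out of the .idx by hand can be scripted. The sketch below assumes the usual VobSub .idx line layout `timestamp: HH:MM:SS:mmm, filepos: ...` and converts each start time into SRT notation (`HH:MM:SS,mmm`); the `idx_timestamps` helper is hypothetical.

```python
import re
from typing import List

# Assumed .idx line layout: "timestamp: 00:01:02:345, filepos: 000001a00"
TIMESTAMP_RE = re.compile(r"^timestamp:\s*(\d{2}):(\d{2}):(\d{2}):(\d{3})")


def idx_timestamps(idx_text: str) -> List[str]:
    """Return every subtitle start time from an .idx file in SRT notation."""
    stamps = []
    for line in idx_text.splitlines():
        m = TIMESTAMP_RE.match(line.strip())
        if m:
            hh, mm, ss, ms = m.groups()
            # SRT separates milliseconds with a comma instead of a colon.
            stamps.append(f"{hh}:{mm}:{ss},{ms}")
    return stamps
```

The .idx lines carry only a start time, so the end time of a re-inserted line still has to come from the decoded subtitle itself.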
Could you provide me with a sample file (e.g., via e-mail to [email protected])? I have several VobSub samples, but none for which OCR fails.
Shifts all the timecodes? That's strange. I'll have to test it. I guess the best way would be to write the error message to the SRT as well. That way, a GUI tool could easily point to the part of the SRT that needs fixing.
I sent them. By "shifts the timecodes" I mean that it completely deletes the empty entry, i.e., if entry 21 failed, entry 22 would become entry 21.
Hmm, works for me.
Maybe you are calling an old version of vobsub2srt or haven't rebuilt it properly.
b70b6f5 should fix the shifting problem and write an error message to the SRT in case of an OCR error. Thanks for reporting the issue and providing me with the sample subtitles.
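The idea behind the fix, keeping a failed entry's slot and number instead of dropping it, can be sketched as an SRT writer. This is an illustration only, not the code in b70b6f5: the `write_srt` helper, the entry tuple layout `(start_ms, end_ms, text_or_None)`, and the `[OCR ERROR]` marker text are all assumptions.

```python
from typing import List, Optional, Tuple


def format_timecode(ms: int) -> str:
    """Format milliseconds as an SRT timecode HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def write_srt(entries: List[Tuple[int, int, Optional[str]]],
              error_marker: str = "[OCR ERROR]") -> str:
    """Build SRT text; a failed entry (text None) keeps its number and timing."""
    blocks = []
    for number, (start, end, text) in enumerate(entries, start=1):
        # Write a searchable marker instead of skipping the entry, so the
        # numbering of every later entry stays unchanged.
        body = text if text else f"{error_marker} subtitle {number}"
        blocks.append(f"{number}\n{format_timecode(start)} --> "
                      f"{format_timecode(end)}\n{body}\n")
    return "\n".join(blocks)
```

Because the marker is plain text inside a normal entry, any editor or GUI tool can jump to it with a search.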
I got it, but I can't for the life of me figure out what's needed to open a PGM. Nothing I try can view it.
PGM is a rather simple format. What operating system are you using? On Linux you should enter …
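To show how simple the format is: a binary PGM (magic number `P5`) is a short ASCII header with the width, height, and maximum gray value, followed by one byte per pixel when that maximum is at most 255. Common viewers such as GIMP or ImageMagick open such files. The `encode_pgm` helper below is a hypothetical sketch, not vobsub2srt's actual dump code; it rejects a zero width or height, since such an image has no pixel rows to show.

```python
def encode_pgm(width: int, height: int, pixels: bytes) -> bytes:
    """Encode 8-bit grayscale pixels (row-major) as a binary PGM (P5)."""
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid PGM size {width}x{height}")
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    # Header: magic, dimensions, max gray value, each followed by whitespace.
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels
```

Writing one to disk is then `open("sub_0023.pgm", "wb").write(encode_pgm(w, h, data))`, where the file name is only an example.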
Hmm, it appears the images it fails to OCR are corrupt? I can open the rest just fine :/
Ah, OK. I was surprised that Tesseract would simply return NULL for an OCR error, but in fact it seems to be a problem with the bitmap data: the subtitle appears to have a height of 0. Are those subtitles displayed when you watch them with MPlayer? Do they contain actual text?
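Since the failures here look like bitmap problems rather than OCR problems, it could help to classify each decoded bitmap before calling Tesseract, so the error message says which kind of failure occurred. A minimal sketch, assuming 8-bit pixels where the value `background` (0 here, an assumption) means "no ink"; `check_bitmap` and `BitmapStatus` are hypothetical names.

```python
from enum import Enum


class BitmapStatus(Enum):
    OK = "ok"
    ZERO_SIZE = "zero-size bitmap"    # decoding produced no image at all
    BLANK = "no visible pixels"       # image exists but contains no ink


def check_bitmap(width: int, height: int, pixels: bytes,
                 background: int = 0) -> BitmapStatus:
    """Classify a decoded subtitle bitmap before handing it to OCR."""
    if width <= 0 or height <= 0 or not pixels:
        return BitmapStatus.ZERO_SIZE
    if all(p == background for p in pixels):
        return BitmapStatus.BLANK
    return BitmapStatus.OK
```

A `ZERO_SIZE` result points at the decoding step, while `BLANK` suggests a subtitle that really is empty.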
Most of them are nothing, but occasionally it's a line :/. Watching in MPlayer, everything displays fine.
Ah, that's bad, because it means the problem is not in the MPlayer code but in how I call the MPlayer code. This will probably take me a while to figure out. Are these the only subtitles you have with errors? Only 6 frames have errors, so I guess you can work around that for now. Sorry about that.
Hi, thanks for the work. Amazing tool to get rid of VobSub. I had two or three missing lines in the subtitles I processed. I spotted them while watching the movie and compared against the VobSub to make sure a line really was missing. My problem is that these mistakes are not detected or signaled during processing, even with the --dump-error-images option. The missing lines leave no clue in the SRT file, so there is apparently no way to detect these errors except by watching the whole movie. Am I missing something? If one day you want to tackle this issue, here are my files: You'll need the French Tesseract data (the tesseract-ocr-fra package in Ubuntu). One miss is between entries 753 and 754, at 01:04:42.
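One way to find silently dropped lines without watching the whole movie is to compare the start times listed in the .idx against the start times in the generated SRT. The sketch below assumes a single-language .idx with `timestamp: HH:MM:SS:mmm, filepos: ...` lines and that vobsub2srt keeps the .idx start times; the `missing_starts` helper and the 50 ms tolerance are hypothetical.

```python
import re
from typing import List

IDX_RE = re.compile(r"^timestamp:\s*(\d+):(\d+):(\d+):(\d+)", re.MULTILINE)
SRT_RE = re.compile(r"^(\d+):(\d+):(\d+),(\d+)\s+-->", re.MULTILINE)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def missing_starts(idx_text: str, srt_text: str,
                   tolerance_ms: int = 50) -> List[int]:
    """Start times (ms) present in the .idx with no SRT entry close by."""
    srt_starts = [_to_ms(*g) for g in SRT_RE.findall(srt_text)]
    missing = []
    for g in IDX_RE.findall(idx_text):
        start = _to_ms(*g)
        # Linear scan is fine for a few thousand entries.
        if not any(abs(start - s) <= tolerance_ms for s in srt_starts):
            missing.append(start)
    return missing
```

For the example above, an .idx entry at 01:04:42 with no matching SRT entry would be reported as 3882000 ms.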
ERROR: OCR failed for 1
ERROR: OCR failed for 23
ERROR: OCR failed for 133
ERROR: OCR failed for 367
ERROR: OCR failed for 367
ERROR: OCR failed for 386
Can you add an argument to dump only the images that failed OCR? And, if possible, allow them to be opened in an external image editor so I can be prompted on the CLI for a fix?
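Until such an option exists, the requested workflow can be approximated from vobsub2srt's own log output. The sketch below collects the unique numbers from `ERROR: OCR failed for N` lines (367 appears twice above) and prompts for replacement text for each one. The `failed_indices` and `prompt_fixes` helpers are hypothetical, the prompt function is injectable so it can be tested, and whether these numbers match SRT entry numbers one-to-one is an assumption.

```python
import re
from typing import Callable, Dict, List

FAIL_RE = re.compile(r"ERROR: OCR failed for (\d+)")


def failed_indices(log_text: str) -> List[int]:
    """Unique subtitle numbers reported as failed, in first-seen order."""
    seen: List[int] = []
    for match in FAIL_RE.finditer(log_text):
        n = int(match.group(1))
        if n not in seen:
            seen.append(n)
    return seen


def prompt_fixes(indices: List[int],
                 ask: Callable[[str], str] = input) -> Dict[int, str]:
    """Ask for replacement text for every failed subtitle; empty skips it."""
    fixes = {}
    for n in indices:
        text = ask(f"Text for subtitle {n} (empty to skip): ").strip()
        if text:
            fixes[n] = text
    return fixes
```

A fuller tool would open the dumped image for each number in a viewer (for example via `xdg-open` on Linux) just before asking.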