implemented text recognition (ocr) #272

Merged
merged 29 commits into from
Jun 16, 2024

Conversation

ston1th

@ston1th ston1th commented Jan 17, 2024

I implemented another comparison method based on OCR.

This could be a useful addition in cases where modern game rendering and visual effects (clutter) make it difficult to find good comparison images.

It currently depends on pytesseract and Tesseract-OCR, but I also ran tests with EasyOCR. Both seem to give similarly good recognition results. EasyOCR appears to cause higher CPU load than Tesseract. Tesseract, on the other hand, is an external dependency that needs to be installed separately.

The comparison between the expected and the recognized string has two modes: a perfect 1:1 match or the Levenshtein ratio.
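
As a rough illustration of the two modes, here is a minimal sketch in plain Python. The function names are made up for this example, and the normalization (one minus the edit distance divided by the longer string's length) is only one common way to turn a Levenshtein distance into a ratio; the library used in the PR may normalize differently.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(expected: str, recognized: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(expected), len(recognized))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(expected, recognized) / longest


def exact_match(expected: str, recognized: str) -> float:
    """Perfect 1:1 mode: either a full match or nothing."""
    return 1.0 if expected == recognized else 0.0
```

The ratio mode tolerates a few misread characters, while the exact mode only passes when OCR reads every character correctly.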

I also introduced two new file config options:

  • Rectangle position (only used for text files)
  • FPS limit per text or image file

Please let me know what you think of this feature.

ston1th and others added 7 commits January 17, 2024 21:46
@Avasam
Collaborator

Avasam commented Jan 18, 2024

I'll look into the code changes after I come back from GDQ, but I love the idea of a comparison method that specializes in text comparison/recognition.

Is the per-file FPS limit necessary to the implementation? Could you split it into a different PR?

Idk about the rectangle position option, but maybe it'll make sense once I give the implementation a proper look.

@ston1th
Author

ston1th commented Jan 18, 2024

Hi @Avasam, first of all, have fun and good luck at GDQ.

To your question: yes, I find the FPS limit necessary to avoid maxing out CPU usage.
I included a note in the README regarding this. A quick FYI:

Note: This method can cause high CPU usage at the standard comparison FPS. You should therefore limit the comparison FPS to 1 or 2 when you use this method, using the limit option !1! in the file name.
The size of the selected rectangle can also impact the CPU load (bigger = more CPU load).
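
To make the effect of such a limit concrete, here is a minimal sketch of a throttled comparison loop. It is not the code from this PR; `wait_for_match`, `min_interval` and the `compare` callback are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def min_interval(fps_limit: float) -> float:
    """Minimum number of seconds between two comparisons for a given FPS limit."""
    if fps_limit <= 0:
        raise ValueError("fps_limit must be positive")
    return 1.0 / fps_limit


def wait_for_match(
    compare: Callable[[], bool],
    fps_limit: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run compare() at most fps_limit times per second.

    Returns the 1-based number of the check that matched, or 0 if none did.
    """
    interval = min_interval(fps_limit)
    for check in range(1, max_checks + 1):
        started = time.monotonic()
        if compare():
            return check
        # Sleep only for the part of the interval the comparison did not use.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0:
            sleep(remaining)
    return 0
```

With a limit of 1 FPS the loop idles for most of each second, so an expensive OCR call only costs CPU time once per second instead of on every captured frame.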

@realRammbob

Heya! Div2 Content Creator here.
I may or may not have inspired this OCR-method implementation after finding and falling in love with this Autosplitter. :D

As for the user-perspective regarding Div2:
Div2 has a short mission description for most checkpoints in missions, and when doing activities and other stuff it shows a 5-8 second pop-up with text. The text is completely white, and using that with the current methods causes a false positive when blinded by a flashbang (white screen).
If one uses the shadow of the text (or puts black pixels where the shadow is supposed to be), there is a somewhat workable threshold difference that avoids triggering on a flashbang, but depending on the weather in the open world it still gets false positives here and there (and lowering the threshold below 96 will sometimes not split within those 5-8 seconds).

With OCR this issue would probably be solved, and better yet: there are different activities, and I would love to split on "complete 5 activities", for example. With text pop-ups like "Broadcast Restored", "All Hostages Saved" and "Perimeter Secured", one could scan for the "ed" at the end and properly split on those, while keeping "Watch Level up" and other false positives away. :)

Hopefully info from the user-perspective is helpful here as well, if not ignore my comment :D

Enjoy GDQ and looking forward to testing the new method if it gets approved =)

@realRammbob

Hello again - not meaning to stress, but I'm really, really looking forward to using this method for a variety of auto-split scenarios.
Any update on looking at the code? :)

Avasam

This comment was marked as resolved.

ston1th and others added 2 commits February 3, 2024 16:11
* rewrite text files to contain the rectangle position
* switch to easyocr since there was no way to use pytesseract or
  tesserocr reliably without PIL
* display text that is searched for
* set default FPS limit for OCR to 1
* minor fixes
Avasam

This comment was marked as resolved.

ston1th and others added 2 commits February 3, 2024 19:49
* switch back to tesseract
* ditch all python binding libraries to not include Pillow
* call tesseract ourselves
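
For readers wondering what calling Tesseract without a Python binding can look like, here is a minimal sketch using `subprocess`. It is not the implementation from this PR: the function names, the page segmentation mode and the timeout are assumptions, and `image_bytes` is assumed to hold an already encoded image (for example a PNG).

```python
import shutil
import subprocess


def build_tesseract_command(
    executable: str = "tesseract", psm: int = 6, language: str = "eng"
) -> list[str]:
    """Build a CLI call that reads an image from stdin and prints text to stdout."""
    # "stdin" and "stdout" are special names understood by the tesseract CLI.
    return [executable, "stdin", "stdout", "--psm", str(psm), "-l", language]


def recognize(image_bytes: bytes, timeout: float = 5.0) -> str:
    """Send an encoded image to tesseract and return the recognized text."""
    executable = shutil.which("tesseract")
    if executable is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(executable),
        input=image_bytes,
        capture_output=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace").strip()
```

Spawning one process per comparison has a noticeable start-up cost, which is one more reason a low comparison rate makes sense for this method.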
Collaborator

@Avasam Avasam left a comment

Before I forget, just writing down some ideas:

  • We could validate that user-provided characters are all supported by tesseract
  • We could let the user provide a set of "characters that could appear on screen", to give tesseract as an allow-list, improving consistency and speed.
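
A minimal sketch of the allow-list idea, assuming the expected texts are available as a list of strings. `tessedit_char_whitelist` is an existing Tesseract configuration variable, but whether it is honored depends on the Tesseract version and OCR engine in use; the helper names here are hypothetical.

```python
def build_allow_list(expected_texts: list[str]) -> str:
    """Collect every non-whitespace character that appears in the expected texts."""
    chars = {char for text in expected_texts for char in text if not char.isspace()}
    return "".join(sorted(chars))


def allow_list_args(expected_texts: list[str]) -> list[str]:
    """Extra tesseract CLI arguments restricting recognition to those characters."""
    return ["-c", f"tessedit_char_whitelist={build_allow_list(expected_texts)}"]
```

Deriving the list from the expected texts alone would hide characters that appear on screen but not in the target string, so a user-provided set, as suggested above, may still be the better input.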

src/compare.py (outdated, resolved)
src/compare.py (outdated, resolved)
src/compare.py (outdated, resolved)
@ston1th
Author

ston1th commented Feb 4, 2024

@Avasam the README.md file is missing in the allowed paths in the lint-and-build.yml action and thus blocking the build. Could you add it please?

@Avasam
Collaborator

Avasam commented Feb 4, 2024

lint-and-build.yml

It's not an "allowed path", it's a "trigger the build when these files change" list. The README doesn't need to trigger lint/type/build checks.

It's just that the workflow requires approval.
You can run the checks locally using the scripts/lint.ps1 script (or by running the commands found inside it individually).

Collaborator

@Avasam Avasam left a comment

We're getting close! Mostly polishing documentation

README.md (outdated, resolved)
README.md (outdated, resolved)
README.md (outdated, resolved)
README.md (outdated, resolved)
src/compare.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
src/utils.py (outdated, resolved)
src/utils.py (outdated, resolved)
src/AutoSplitImage.py (resolved)
src/AutoSplitImage.py (outdated, resolved)
@realRammbob

Hello again :)
Just wanted to drop in with a huge thank you for this!
I went ahead and tested this version today, and it went great for the most part!

For Division 2 it was way easier to set up the splits for autosplitting, though I made some human errors along the way.
Most notably, I could now autosplit events that could either succeed or fail (the only difference being in the text), as well as have one file to basically split all the random activities that I run during the randomizer thing I'm doing :)
(If you want to take a look at how the Autosplitter did during a Countdown run with the new method, you can see that here )

Regarding the FPS-limit and Tesseract:
Even if the FPS limit is set high, the actual rate won't go very high because of how the method works. If the box I choose in the file is huge (like 1080p fullscreen), it takes way longer than 1 second until the next image is processed.
Optimizing this was much easier for me as a user than getting optimal image data with paint.net, but it also raised a question:

During missions, a new objective pops up briefly in the middle of the screen, travels to the top (still centered), stays still for a moment, then travels to the left where the other objectives are listed and stays there until it's done.
To have it auto-split properly, one would need it to split when centered. I will test later whether it does that 100% of the time.

There is a trade-off i think:

  • Only check where it pops up in center: Best FPS-performance, best split-accuracy, highest risk of no split
  • Only check the vertical part: FPS-sacrifice, long time-frame for low risk of no split, somewhat good split-accuracy still
  • Check the whole rectangle where the text could be at any given time: Worst FPS, worst split-accuracy, almost impossible to not split
    I tried to visualize it here:
    example

Is there any interest or does it even make sense for me to test stuff like this and report on it here?

Collaborator

@Avasam Avasam left a comment

A few typos and small improvements to new code.

src/AutoSplitImage.py (outdated, resolved)
src/AutoSplitImage.py (resolved)
src/error_messages.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
src/compare.py (outdated, resolved)
src/utils.py (resolved)
src/compare.py (outdated, resolved)
src/compare.py (outdated, resolved)
src/utils.py (outdated, resolved)
src/utils.py (outdated, resolved)
scripts/requirements.txt (outdated, resolved)
@realRammbob

realRammbob commented Feb 12, 2024

Here's some more user-feedback and testing notes:

(1) False-Positive issue with OCR on few characters
I did a Level 31-40 run trying to use OCR on the two Level-digits. The results are:
Duration 2 hours 40 minutes (~9600 OCR-checks)
8 out of 8 Level-ups detected
9 False-Positives

Basically when OCR checks again and again, in some images it misreads the number and as there is no hold-variable yet, it simply splits right then.
If a user is depending on very few characters with this method, the false-positive likelihood is kinda huge (even with a 100% threshold).
(Sidenote: For the run i can use a "Level Up" pop up at center of screen. It will split later, but not have false-positives. So there is a solution, but would've been nice if OCR was able to for example split by just checking the last digit.)
Here's a screen showing the digits at top right that i tried OCR on (while it also had a false-positive)

example

(2) Capture-Device out of bounds Crash on Start-up
I configured my Autosplit to use a capture device, so it now uses my capture card which should work best as it just contains the footage of the game.
However, when starting it up and loading a settings.toml, it chooses the device via device ID, which sometimes gets swapped with my Logitech Capture or OBS Virtual Camera device. When it tries to use Logitech Capture, it doesn't have 1080p coordinates, so if there's also a Start_Auto_Splitter image that tries to read out of bounds, it crashes immediately upon start-up.
To fix it, i can delete the settings.toml and create a new one, or edit the existing one with notepad for example and fix either the device-ID (by guessing) or choose a different split folder so it doesn't start OCR out of bounds.

(3) AutoSplit Integration can't split
I tried using the (AutoSplit Integration) to have it start automatically when starting LiveSplit with splits and a layout containing the Integration. While it starts up and works fine starting a run, when I tried it with OCR it never split, even though it got beyond the threshold and also showed going on pause. So basically, when AutoSplit was externally controlled, it wasn't able to progress splits in LiveSplit. (I didn't test further, so I'm not sure if it's an OCR bug or if it doesn't work in general.)

@Avasam
Collaborator

Avasam commented Feb 12, 2024

False-Positive issue with OCR on few characters

I think that's just gonna be a limitation of the technology. A hold flag (#120) is still the best solution I can think of.
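
One possible reading of a hold flag, sketched below, is to require the match to persist for several consecutive checks before splitting. This is an assumption made for illustration and not necessarily what #120 specifies; `ConsecutiveMatchGate` is a hypothetical name.

```python
class ConsecutiveMatchGate:
    """Only report a split after `required` matching checks in a row."""

    def __init__(self, required: int) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self.streak = 0

    def update(self, matched: bool) -> bool:
        """Feed one comparison result; returns True when the split should fire."""
        # A single miss resets the streak, so isolated misreads never add up.
        self.streak = self.streak + 1 if matched else 0
        if self.streak >= self.required:
            self.streak = 0
            return True
        return False
```

With `required=3`, a one-off misread of the level digits would be ignored, at the cost of splitting a couple of checks later on a real match.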

out of bounds Crash

If I understand correctly, this should be easy to replicate by just setting the OCR crop outside of the capture area. Will need to be fixed first. Not certain if I wanna send an error popup (and reset, otherwise you'd be stuck in an error loop) or gracefully handle it.

I guess there's no valid reason to change the capture size mid-run, unless you're testing, so we could include that as part of the initial checks, and if it happens mid-run, then reset AutoSplit.
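
Such an initial check could be as small as the sketch below. The `Rect` type and its field names are hypothetical and do not reflect the rectangle format used in the OCR text file.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Crop rectangle in pixel coordinates (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int


def rect_fits(rect: Rect, capture_width: int, capture_height: int) -> bool:
    """True if the rectangle is non-empty and lies fully inside the capture."""
    return (
        0 <= rect.left < rect.right <= capture_width
        and 0 <= rect.top < rect.bottom <= capture_height
    )
```

Running this once when the split images are loaded, and again whenever the capture size changes, would turn the crash into a condition that can be reported or handled.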

AutoSplit Integration can't split

I can't immediately think of a reason why. From the main logic's PoV, there should be no difference between OCR and regular images (other than for displaying the current split). Will have to test.

@realRammbob

Unsure if this is relevant or i made a mistake, but for testing i used this version until now. It worked great, especially in RDO to figure out a mission start via

texts = ["go to", "search", "capture", "find", "deliver"]

It always had a 100% match with strings like "Go to the shack", "Kill or capture Gustavo", "Help deliver the goods to Wallace Station" etc.
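
The behaviour described here is consistent with a case-insensitive substring check, sketched below. Whether the tested build actually matched this way is an assumption; `find_expected_text` is a hypothetical helper.

```python
from typing import Optional


def find_expected_text(expected_texts: list[str], recognized: str) -> Optional[str]:
    """Return the first expected text contained in the OCR result, ignoring case."""
    haystack = recognized.casefold()
    for text in expected_texts:
        if text.casefold() in haystack:
            return text
    return None


texts = ["go to", "search", "capture", "find", "deliver"]
print(find_expected_text(texts, "Kill or capture Gustavo"))
```

A ratio computed over the whole recognized string, by contrast, would score "go to" against "Go to the shack." well below 100%, which is one way a change in matching logic could produce the drop reported below.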

Yesterday I tried going down this list and downloaded newer versions. Two versions (I didn't document which ones, sorry D: ) crashed when I tried to start them, and two other versions worked. However, those newer versions didn't work on the first two missions I tested, so I checked...

In the next mission, the text "Go to the shack." only got a 50% match (and I use a 98% or 100% threshold, because it worked amazingly with the first version I tested). Did something crucial about the OCR matching change? I went back to the earliest version I used for all the tests and will keep it that way for now... :D

@realRammbob

realRammbob commented Feb 16, 2024

Probably related to the out-of-bounds coordinate crash: when I'm using my capture card while also having a Start_Auto_Splitter image (i.e. AutoSplit is running) and then open the settings, it also crashes.
Here's a video showcasing it:

2024-02-16.14-14-04.mp4

@Avasam Avasam force-pushed the dev branch 2 times, most recently from f14aafc to a1a26ab Compare March 9, 2024 20:52
@Avasam
Collaborator

Avasam commented Mar 9, 2024

@ston1th There is now a merge conflict due to moving out the tutorial/user guide into its own file.

@ston1th
Author

ston1th commented Mar 11, 2024

@Avasam Noted. I'll fix this along with the rest once I find some free time again.

ston1th added 3 commits March 21, 2024 20:45
this commit improves the handling of the rectangle coordinates.
the new scheme uses the top_left and bottom_right (X/Y) coordinates.

the migration from the old scheme works as follows:

```
top_left = [<top_left>, <bottom_left>]
bottom_right = [<top_right>, <bottom_right>]

old:
top_left = 275
top_right = 540
bottom_left = 70
bottom_right = 95

new:
top_left = [275, 70]
bottom_right = [540, 95]
```
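
As a sketch, the mapping above can be expressed as a small Python function. The name `migrate_rectangle` and the use of plain dicts are assumptions for illustration, not code from this commit.

```python
def migrate_rectangle(old: dict[str, int]) -> dict[str, list[int]]:
    """Convert the old four-value scheme into two [X, Y] points."""
    return {
        # In the old scheme, top_left/top_right held the X values and
        # bottom_left/bottom_right held the Y values of the two points.
        "top_left": [old["top_left"], old["bottom_left"]],
        "bottom_right": [old["top_right"], old["bottom_right"]],
    }


old = {"top_left": 275, "top_right": 540, "bottom_left": 70, "bottom_right": 95}
print(migrate_rectangle(old))
```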

you can now specify multiple matching methods and look for the best
`text : method` match:

```
old:
method = 0

new:
methods = [0]
or:
methods = [2, 1, 0]
```
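
A minimal sketch of picking the best `text : method` pair is shown below. The meaning of the method numbers is not spelled out in this thread, so the two scoring functions are hypothetical stand-ins and `best_match` is an illustrative name.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def exact(expected: str, recognized: str) -> float:
    """Hypothetical method: full 1:1 match."""
    return 1.0 if expected == recognized else 0.0


def contains(expected: str, recognized: str) -> float:
    """Hypothetical method: expected text appears somewhere in the OCR result."""
    return 1.0 if expected in recognized else 0.0


def best_match(
    texts: list[str], scorers: list[Scorer], recognized: str
) -> tuple[float, str, int]:
    """Try every text with every method and keep the highest score.

    Returns (score, text, method index); (0.0, "", -1) if nothing scored.
    """
    best = (0.0, "", -1)
    for text in texts:
        for index, scorer in enumerate(scorers):
            score = scorer(text, recognized)
            if score > best[0]:
                best = (score, text, index)
    return best
```

Taking the maximum over all pairs means adding a method can only raise the reported similarity, never lower it.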
@ston1th
Author

ston1th commented Jun 3, 2024

Hey @Avasam when you have time, could you please review the latest changes?

@Avasam
Collaborator

Avasam commented Jun 3, 2024

Oh sorry I completely forgot about this!!

Thanks for the ping. I'll test the latest changes when I have time (not today), and I think as long as it doesn't break any existing feature, I'll get it in and publish a new release where it's clearly marked as experimental (so I'm allowed to introduce a breaking change for this feature if I wanna change something)

src/compare.py (outdated, resolved)
src/compare.py (outdated, resolved)
if is_valid_image(self.split_image.byte_array):
if self.split_image.ocr:
text = "\nor\n".join(self.split_image.texts)
self.current_split_image.setText(f"Looking for OCR text:\n{text}")
Collaborator

I won't block for this with the TODO comment. Just bumping as a reminder we should test it to confirm

src/error_messages.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
src/AutoSplitImage.py (outdated, resolved)
Collaborator

@Avasam Avasam left a comment

Other than the variable names in the OCR text file and the removal of a potentially redundant 1:1 comparison, I don't think any of my requests include functional changes. I might just make the changes directly in your PR and handle linting / type checking fixes to move this along.

I could actually use this to split on Shaman Shop in Pitfall 100%, so I can properly test by dogfooding.

@Avasam Avasam merged commit 4f47abb into Toufool:dev Jun 16, 2024
16 checks passed
@ston1th
Author

ston1th commented Jun 17, 2024

@Avasam I just looked at your changes. May I ask why you changed the rectangle format back to the old one?

I think the new one was more self-explanatory and easier to understand, since it uses just the X/Y coordinates of two points in the image.
I wanted to make this fix before people adopt this feature, even though it is experimental.

@Avasam
Collaborator

Avasam commented Jun 17, 2024

@ston1th This had been stalling for too long since I forgot about it, and didn't want to keep you waiting any longer.

I brought the PR to a state I was happy to merge, and it doesn't affect existing functionality, so I did.

Feel free to open a follow-up PR for any fix or improvement! Any follow-up should be much easier and faster to review at this point.

As for the coordinates, I found it really odd that this was the only place using two points, especially since we effectively just immediately split them up again in code.

If you still disagree, I can always put it up to a vote with the users on Discord to see what they think.


And once again, thanks a lot for implementing this awesome feature!

3 participants