Text recognition #9

Open
test2a opened this issue Oct 25, 2023 · 3 comments

Comments

@test2a

test2a commented Oct 25, 2023

Memes and photos often have text overlays, and the file name is usually not enough to find the right photo.

Would it be possible to recognize text and index that?

@slavabarkov
Owner

@test2a In theory it should already work reasonably well with digitally rendered text, and in my experience it works quite well on my device for that task. Since the CLIP model used here is trained on a large set of image/text pairs scraped from the internet, it already has some semantic OCR-like capabilities and responds to the presence of text pretty well. A separate OCR model would obviously be even better, so I'll consider adding one in the future.

@sowa705

sowa705 commented Oct 25, 2023

Hi, CLIP works well for short text snippets but fails for other languages or longer text. I think it might be a good idea to introduce multiple "sources" for similarity, such as CLIP, OCR, and potentially others. Might be good to keep in mind for the future.
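One way to picture the multiple-sources idea is a weighted sum of per-source scores. The following is a minimal Python sketch under that assumption, not the app's implementation: the source names (`"clip"`, `"ocr"`), the weights, and the functions `combine_scores` and `rank` are all hypothetical, and each source is assumed to have already produced a score per image.

```python
from typing import Dict, List


def combine_scores(per_source: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> Dict[str, float]:
    """Return a weighted sum of per-source similarity scores for each image.

    per_source maps a source name (hypothetical: "clip", "ocr") to a mapping
    of image path -> score. An image missing from a source contributes 0
    for that source. Sources without a weight are ignored.
    """
    combined: Dict[str, float] = {}
    for source, scores in per_source.items():
        weight = weights.get(source, 0.0)
        for image, score in scores.items():
            combined[image] = combined.get(image, 0.0) + weight * score
    return combined


def rank(combined: Dict[str, float]) -> List[str]:
    """Return image paths sorted from highest to lowest combined score."""
    return sorted(combined, key=lambda image: combined[image], reverse=True)


if __name__ == "__main__":
    # Made-up scores purely for illustration.
    scores = {
        "clip": {"a.jpg": 0.2, "b.jpg": 0.6},
        "ocr": {"a.jpg": 1.0},
    }
    print(rank(combine_scores(scores, {"clip": 0.5, "ocr": 0.5})))
```

A weighted sum is only one possible design. It assumes the sources return scores on comparable scales, which would likely need normalization in practice.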

@waqqas31

Hello @slavabarkov

I was directed to your app when I reported that my Samsung Gallery app was no longer performing text searches on images taken and/or stored on the phone.

My primary use case is searching for text, and I had some feedback for you.

  1. When searching for terms that seem to have no exact matches, the result set is "scattered" with lots of blank results in between actual pictures.
  2. Multidigit numbers are treated as separate single-digit numbers. E.g. "786" will return all results that include a "7", "8" and a "6", but not necessarily together.
  3. Results do not seem to be sorted from best match to worst. Exact matches are scattered between partial matches.
  4. It would be really helpful to support exact-match-only searches (using quotation marks).
  5. It would be helpful to have a "Refresh index" option within the app, instead of having to kill the app and relaunch it.
  6. If you can implement an OCR function to scan all text in all pictures, that would be EXTREMELY valuable.
  7. When we open a picture, if we can see the path and filename, that will help us understand if the search term matched the picture or part of the metadata.
  8. If next to the "Share" button you can add a button to open the picture with the default gallery app, that would be very useful, too.
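Items 2 to 4 above (multidigit numbers, best-first ordering, and quoted exact matches) can be illustrated with a small Python sketch. It assumes OCR text has already been extracted for each image, which the app does not currently do. The function names and the scoring scheme are hypothetical, and only ASCII double quotes are recognized.

```python
import re
from typing import Dict, List, Tuple


def parse_query(query: str) -> Tuple[str, bool]:
    """Return (phrase, exact). A query wrapped in double quotes is exact."""
    q = query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return q[1:-1], True
    return q, False


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens, so "786" stays one token."""
    return re.findall(r"\w+", text.lower())


def search(query: str, ocr_text: Dict[str, str]) -> List[str]:
    """Return image paths sorted from best match to worst.

    Images containing the whole phrase rank first. Without quotes, images
    containing only some of the terms follow, ordered by the fraction of
    terms matched. Images with no matching term are dropped.
    """
    phrase, exact = parse_query(query)
    terms = tokenize(phrase)
    if not terms:
        return []
    scored = []
    for path, text in ocr_text.items():
        tokens = tokenize(text)
        n = len(terms)
        has_phrase = any(tokens[i:i + n] == terms
                         for i in range(len(tokens) - n + 1))
        if has_phrase:
            score = 2.0  # always above any partial match (at most 1.0)
        elif exact:
            continue
        else:
            token_set = set(tokens)
            score = sum(t in token_set for t in terms) / n
            if score == 0:
                continue
        scored.append((score, path))
    # Highest score first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored]
```

With this tokenization, a query for `786` matches an image whose text contains "786" but not one whose text contains separate "7", "8", and "6" tokens.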

That's all my feedback for now.

Thanks for all your hard work!
