[REQUEST] Support Google Cloud Vision API for OCR #116

SnowySailor · 2024-12-19T00:27:41Z

The manga_ocr model works decently well, but in my experience it still makes mistakes fairly often, even on high-resolution manga panels. Specifically, it sometimes fails to recognize obscure or dense kanji, often omits dakuten on kana, and has trouble distinguishing small from full-size kana in words like しょっちゅう. This could be an artifact of the comic text extractor, but it's unclear to me what exactly is causing it.

I use the Google Cloud Vision API for a personal project of mine, and over the last few months of heavy use I can count on one hand the number of times it has gotten something wrong.

Someone who wants to use the Vision API would just have to provide their API credentials as an argument, and mokuro would use the Vision API instead of manga_ocr. The Vision API should return blocks with the dimensions/locations of text, so it should be possible to convert that into the mokuro JSON format.

kha-white added the enhancement label on Jan 1, 2025
