LLaVA can read? #61

mudomau · 2023-04-25T19:01:37Z


I was wondering if it's capable of reading text in the images. It seems to do ok!

haotian-liu · 2023-04-25T21:33:49Z

haotian-liu
Apr 25, 2023
Maintainer

Hi, great observation and nice example!

This is one of the interesting emergent properties that we see in LLaVA, although it has not been explicitly trained or instructed to perform text recognition in images (OCR). Such data is also scarce in our training set. One possible explanation is that these capabilities were learned during CLIP pretraining (CLIP is our vision encoder), and some of them are transferred to our model during the feature alignment process.

We are exploring this and looking for ways to improve these interesting capabilities, to make LLaVA even better!

Looking forward to more discussions :)

3 replies

mudomau Apr 25, 2023
Author

I've tried some more complex examples (more text, and smaller), and it struggles a bit more. It can pick out letters and words (and is likely using the language model to complete the text from the letters it picks out), but it can't quite "transcribe". Still, I'm amazed by how well this works, even after I quantized it. Congrats!

haotian-liu Apr 25, 2023
Maintainer

So glad to hear that you are happy playing with it! Failure to recognize those small characters is currently a weakness of LLaVA, as the input image resolution is quite low (224x224). We are trying to improve recognition at higher resolutions :)
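To put the 224x224 limit in numbers, here is a rough back-of-the-envelope sketch. It assumes the preprocessing resizes the shorter image side to 224 pixels and that the vision encoder uses 14-pixel patches. Neither detail is stated in this thread, and the function names are made up for illustration.

```python
def glyph_height_after_resize(image_width, image_height, glyph_height_px, target_side=224):
    """Approximate glyph height (in pixels) after the shorter image side
    is resized to target_side. Assumption: aspect-preserving resize."""
    scale = target_side / min(image_width, image_height)
    return glyph_height_px * scale


def patches_per_side(target_side=224, patch_size=14):
    """Number of patches along one side of the resized image.
    Assumption: a ViT-style encoder with square, non-overlapping patches."""
    return target_side // patch_size


if __name__ == "__main__":
    # Hypothetical example: a 1920x1080 screenshot with 20-pixel-tall text.
    height = glyph_height_after_resize(1920, 1080, 20)
    print(f"glyph height after resize: {height:.1f} px")  # prints 4.1 px
    print(f"patches per side: {patches_per_side()}")      # prints 16
```

Under these assumptions, text that is 20 pixels tall in a 1080p screenshot shrinks to about 4 pixels, so several lines of small text can land inside a single patch. That would be consistent with the model picking out large words but failing to transcribe small ones.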

YXTR Apr 27, 2023

Can it read Chinese?
