Poor Image Recognition Capabilities #3

LaxmanSinghTomar · 2024-06-17T07:11:31Z

Hello Team,

Thanks for releasing the weights. I tested the model with some of these examples, but the quality seems very poor and nowhere near the examples quoted in the paper. Is this level of performance expected from the latest weights release, or am I doing something wrong here?

zhiyuanyou · 2024-06-18T03:00:49Z

Hello, thanks for your feedback.

We have tested some similar cases.

For the second case, the background is entirely black, which does not appear in our training data (our training data contains natural scenes), so the model judges that this image has brightness issues. The confusing part is that the model claims brightening rather than darkening. Could you share this image with me via [email protected]?

For the first and third cases, we think the distortion judgement is reasonable (i.e., the saturation in the first image is not very good, and the color in the third image is great). However, our model is not trained on cartoon images, so high-level recognition may be somewhat difficult. We will also consider adding cartoon images in our next release.

LaxmanSinghTomar · 2024-06-18T04:09:47Z

Sure, I'll mail these to you. The issue can be explained by the fact that similar images were not part of the training data. Is this also the reason for the poor text recognition? For example, in both the first and second cases, the model doesn't seem to recognize the text elements and consistently mistakes them for something else. If so, including such images in the training data should also improve the model's performance on images with text elements.

zhiyuanyou · 2024-06-18T04:26:01Z

Yes. You are right. Our training data currently focuses on natural scenes, and does not contain text contents.

If we find that text is important for many users, we will construct corresponding datasets to address this in the next release, which is scheduled for around September/October.

zhiyuanyou · 2024-06-19T03:10:11Z

We have tested the second case with a screenshot of your image, using the same input question. However, the response is different: it does not mention brightening. May I ask whether you changed the temperature or top_p parameters? Below is the response.

The image depicts a man with the text "Don't Blame Me" above him, set against a dark background with a blurred background image.

The evaluated image exhibits a slight darkening, reducing visibility, especially in shadow areas. Additionally, there is a slight compression artifact present, which can be observed as a subtle blockiness in the image, affecting the smoothness of color transitions.

Overall, the image maintains a reasonable level of quality. The darkening does not severely impact the readability of the text, and the compression does not significantly degrade the image's integrity. The image is still clear and recognizable, though with a minor loss in fidelity.

LaxmanSinghTomar · 2024-06-19T04:17:30Z

I have mailed you the images. I didn't tweak any parameters; I used the Gradio app directly, with the prompt shown in the image above, for testing.
