Poor Image Recognition Capabilities #3

Open
LaxmanSinghTomar opened this issue Jun 17, 2024 · 5 comments

Comments

@LaxmanSinghTomar

Hello Team,

Thanks for releasing the weights. I tested the model with some of these examples but the quality seems to be very bad and nowhere near the quoted examples in the paper. Is this level of performance expected from the latest weights release or am I doing something wrong here?

CleanShot 2024-06-17 at 11 56 35@2x
CleanShot 2024-06-17 at 12 03 24@2x
CleanShot 2024-06-17 at 12 37 38@2x

@zhiyuanyou
Collaborator

zhiyuanyou commented Jun 18, 2024

Hello, thanks for your feedback.

We have tested some similar cases.

For the second case, the background is entirely black, which does not appear in our training data (our training data contains natural scenes), so the model thinks the image has a brightness issue. The confusing part is that the model reports brightening instead of darkening. Could you share this image with me through [email protected]?

For the first and third cases, we think the distortion judgment is reasonable (i.e., the saturation in the first image is not very good, and the color in the third image is great). However, our model is not trained on cartoon images, so high-level recognition may be somewhat difficult. We will also consider adding cartoon images in our next release.

@LaxmanSinghTomar
Author

Sure, I'll be mailing these to you. The issue can be explained by the fact that similar images were not part of the training data. Is this also the reason for poor text recognition? For example, in both the first and second cases, the model doesn't seem to recognize the text elements and consistently mistakes them for something else. If so, then including such images in the training data will also enhance the model's performance on images with text elements.

@zhiyuanyou
Collaborator

Yes, you are right. Our training data currently focuses on natural scenes and does not contain text content.

Following this release, if we find that text is important to many users, we will construct corresponding datasets to address this problem in the next release, which is scheduled for around Sept / Oct.

@zhiyuanyou
Collaborator

zhiyuanyou commented Jun 19, 2024

We tested the second case using a screenshot of your image, with the same input question. However, the response is different and does not mention brightening. May I ask whether you changed the temperature or top_p parameters? Below is the response.

The image depicts a man with the text "Don't Blame Me" above him, set against a dark background with a blurred background image.

The evaluated image exhibits a slight darkening, reducing visibility, especially in shadow areas. Additionally, there is a slight compression artifact present, which can be observed as a subtle blockiness in the image, affecting the smoothness of color transitions.

Overall, the image maintains a reasonable level of quality. The darkening does not severely impact the readability of the text, and the compression does not significantly degrade the image's integrity. The image is still clear and recognizable, though with a minor loss in fidelity.
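
For readers wondering why the temperature and top_p settings matter here: the same image and question can yield different wording when decoding is stochastic. The snippet below is only a minimal, generic sketch of temperature plus nucleus (top-p) sampling, not this project's actual decoding code. The function name `sample_token`, the toy vocabulary, and the logits are all hypothetical and chosen purely for illustration.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Pick one token index using temperature scaling and top-p filtering.

    Generic illustration only; names and values are hypothetical.
    """
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Hypothetical toy vocabulary and logits (not taken from the model).
vocab = ["darkening", "brightening", "blur"]
logits = [2.0, 1.5, 0.1]

rng = random.Random(0)
greedy = vocab[sample_token(logits, temperature=0.0)]
narrow = {vocab[sample_token(logits, top_p=0.5, rng=rng)] for _ in range(200)}
wide = {vocab[sample_token(logits, top_p=1.0, rng=rng)] for _ in range(200)}
print(greedy, narrow, wide)
```

With a low temperature or a small top_p the most likely token dominates, while looser settings let a less likely word such as "brightening" appear in some runs. Differences in these settings are one possible explanation for two runs on the same input disagreeing.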

@LaxmanSinghTomar
Author

I have mailed you the images. I didn't tweak any parameters; I used the Gradio app directly with the prompt shown in the image above.

2 participants