-
-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: LLama 3.2 vision focuses only on first image #10983
Comments
Do you get similar behavior for the HF implementation of the model? It might just be a limitation of the model itself. |
Hello, thank you for the reply. Yes, I do get similar behavior for the HF implementation. Should that limitation be documented somewhere? Everywhere I looked, from the "supported models" docs page to the PRs, led me to believe the model was capable of that. Also, it doesn't make sense to do all the hard work in, e.g., PR #9393 if the model itself does not support multi-image. I'll attempt running the 90b version, depending on the compute required, and report back. EDIT: I can't run the 90b version on my machines |
@heheda12345 do you have more context regarding this? |
This is a known problem. Personally, I feel that this model's multi-image ability is a little limited. A simpler way to reproduce it is to run |
OK, I think then the issue can be closed. @heheda12345, if I may, what other models would you recommend for multi-image? |
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
No matter what I do, llama3.2 vision focuses only on the first image, despite #9095. The tests at
vllm/tests/models/encoder_decoder/vision_language/test_mllama.py
nominally pass, but, if you print the model reponses, you do not get intended behavior.
The most direct way to reproduce this bug is by following the example in PR #9393
Serve the model
Then attempt a conversation
What I get are two descriptions of cereal, despite the second image being of a doll.
I tried a myriad of other methods, such as sending both images in one message, to no avail. Would really appreciate help digging to the core of this bug.
Thank you
Before submitting a new issue...
The text was updated successfully, but these errors were encountered: