[Model] Composite weight loading for multimodal Qwen2 #10944
Conversation
LGTM!
I'm unable to get the PP test to pass for Qwen2-VL, but (after wasting quite a bit of time) I realized that the failure occurs on the main branch as well.
Hmmm, that's odd. The Qwen2-VL PP test passed on both the main branch and this branch on my device...
I'm referring to the test in
prefix=maybe_prefix(
    prefix, "lm_head"))
I worry about whether this prefix is still correct, since in the model checkpoint on HF the weights live at just lm_head, and we do the same when specifying the ignored module in compressed-tensors: https://huggingface.co/nm-testing/Qwen2-VL-2B-Instruct-FP8-dynamic/blob/8a9ad03741a56273d91cf71afbe9b5baa9509e17/config.json#L186
We could add this model to vllm/tests/models/decoder_only/vision_language/test_models.py to verify.
It should be handled by the weight mapper inside the Qwen2-VL weight loading logic.
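A minimal sketch of the kind of remapping such a weight mapper performs (names and structure are illustrative, not the exact vLLM implementation): checkpoint names like `lm_head.weight` are rewritten to the prefixed module path used inside the composite model before being loaded.

```python
from typing import Dict, Iterable, Iterator, Tuple

import torch


def remap_checkpoint_names(
    weights: Iterable[Tuple[str, torch.Tensor]],
    prefix_map: Dict[str, str],
) -> Iterator[Tuple[str, torch.Tensor]]:
    """Rewrite checkpoint prefixes to the module paths of the composite model.

    e.g. prefix_map = {"lm_head.": "language_model.lm_head.",
                       "model.": "language_model.model."}
    """
    for name, tensor in weights:
        for old, new in prefix_map.items():
            if name.startswith(old):
                name = new + name[len(old):]
                break
        yield name, tensor
```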
Qwen2 (language-only) is already covered by the language model tests.
I tried running the model with PP in online inference and it seems to work fine; maybe it's just some device-specific floating-point error?
@mgoin can you try it on your end as well? Just to be sure.
I tested and it works.
However, since compressed-tensors doesn't support a quantized lm_head yet, we aren't truly able to test the ignore-prefix case. See vllm/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py, lines 67 to 77 at b26b4cd.
I don't think there is anything you can do about this, so we will likely deal with it in the future when adding quantized lm_head support. Thanks!
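To illustrate the concern, here is a hedged sketch of how the quantization config's ignore list is assumed to interact with module names: the checkpoint lists the unprefixed `lm_head`, so any prefixing on the vLLM side has to be matched against that unprefixed name when deciding whether to skip quantization for a layer. The helper below is illustrative, not the actual compressed-tensors code.

```python
def should_ignore_layer(layer_name: str, ignore: list[str]) -> bool:
    """Return True if a layer should be skipped for quantization.

    The config lists unprefixed names such as "lm_head", while a composite
    model may register the module as "language_model.lm_head", so we match
    the unprefixed suffix as well as the exact name.
    """
    for pattern in ignore:
        if layer_name == pattern or layer_name.endswith("." + pattern):
            return True
    return False


# e.g. should_ignore_layer("language_model.lm_head", ["lm_head"]) -> True
```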
This PR removes some redundant code in Qwen2-VL and Qwen2-Audio by reusing the weight loading logic defined by their submodules.
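As a rough illustration of the composite-loading idea (simplified, not the literal vLLM code): the multimodal wrapper partitions incoming checkpoint weights by prefix and hands each group to the corresponding submodule's own `load_weights`, instead of reimplementing that logic at the top level.

```python
from typing import Iterable, Tuple

import torch
import torch.nn as nn


class CompositeModel(nn.Module):
    """Toy composite model: a vision tower plus a language model, each of
    which already knows how to load its own weights."""

    def __init__(self, visual: nn.Module, language_model: nn.Module):
        super().__init__()
        self.visual = visual
        self.language_model = language_model

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> None:
        visual_weights, lm_weights = [], []
        for name, tensor in weights:
            if name.startswith("visual."):
                # Strip the wrapper prefix before delegating to the submodule.
                visual_weights.append((name[len("visual."):], tensor))
            else:
                lm_weights.append((name, tensor))
        # Delegate to the submodules' existing loaders (assumed submodule API)
        # rather than duplicating their weight loading logic here.
        self.visual.load_weights(visual_weights)
        self.language_model.load_weights(lm_weights)
```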