[Model] Composite weight loading for multimodal Qwen2 #10944

DarkLight1337 · 2024-12-06T06:29:36Z

This PR removes some redundant code in Qwen2-VL and Qwen2-Audio by reusing logic defined by the submodules.
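
As a rough illustration of the idea (not the code in this PR), composite weight loading means the parent model only routes checkpoint entries to its submodules and lets each submodule's own loader do the work. Every class and weight name below is hypothetical, and floats stand in for tensors so the sketch runs on its own:

```python
from typing import Dict, Iterable, List, Tuple

# A float stands in for a tensor so the sketch runs without any ML library.
Weight = Tuple[str, float]


class ToySubmodule:
    """Hypothetical submodule that already knows how to load its own weights."""

    def __init__(self) -> None:
        self.params: Dict[str, float] = {}

    def load_weights(self, weights: Iterable[Weight]) -> None:
        for name, value in weights:
            self.params[name] = value


class ToyMultimodalModel:
    """Hypothetical composite model: routes weights by prefix, loads nothing itself."""

    def __init__(self) -> None:
        self.submodules: Dict[str, ToySubmodule] = {
            "visual": ToySubmodule(),
            "language_model": ToySubmodule(),
        }

    def load_weights(self, weights: Iterable[Weight]) -> None:
        # Group checkpoint entries by their first name component.
        grouped: Dict[str, List[Weight]] = {name: [] for name in self.submodules}
        for full_name, value in weights:
            prefix, _, rest = full_name.partition(".")
            if prefix not in grouped:
                raise KeyError(f"no submodule owns weight {full_name!r}")
            grouped[prefix].append((rest, value))
        # Delegate each group to the submodule's own loading logic.
        for prefix, submodule in self.submodules.items():
            submodule.load_weights(grouped[prefix])


model = ToyMultimodalModel()
model.load_weights([
    ("visual.patch_embed.weight", 1.0),
    ("language_model.lm_head.weight", 2.0),
])
```

The parent never duplicates per-layer loading rules, which is the kind of redundancy the description refers to.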

Signed-off-by: DarkLight1337 <[email protected]>

github-actions · 2024-12-06T06:29:48Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Isotr0py

LGTM!

DarkLight1337 · 2024-12-06T07:47:45Z

I'm unable to get PP test to pass for Qwen2-VL, but (after wasting quite a bit of time) I realized that it occurs on main branch as well.

Isotr0py · 2024-12-06T08:02:35Z

Hmmm, that's odd. The Qwen2-VL PP test on main branch and this branch all passed on my device...

DarkLight1337 · 2024-12-06T08:54:43Z

> Hmmm, that's odd. The Qwen2-VL PP test on main branch and this branch all passed on my device...

I'm referring to the test in test_pipeline_parallel.py.

vllm/model_executor/models/qwen2_audio.py

mgoin · 2024-12-06T17:57:38Z

vllm/model_executor/models/qwen2.py

+                                              prefix=maybe_prefix(
+                                                  prefix, "lm_head"))


I worry about this prefix being correct now since in the model checkpoint on HF the weights are just at lm_head, and so we do the same when specifying the ignored module in compressed tensors https://huggingface.co/nm-testing/Qwen2-VL-2B-Instruct-FP8-dynamic/blob/8a9ad03741a56273d91cf71afbe9b5baa9509e17/config.json#L186

We could add this model to vllm/tests/models/decoder_only/vision_language/test_models.py to verify

DarkLight1337 · 2024-12-07

It should be handled by the weight mapper inside the Qwen2-VL weight loading logic.

DarkLight1337 · 2024-12-07

Qwen2 (language-only) is already being tested in the language model tests.
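
To make the prefix question concrete, here is a minimal sketch under stated assumptions. `maybe_prefix` below is a stand-in written for this example (it joins a prefix and a name with a dot), and `map_checkpoint_name` together with the `language_model` prefix map are hypothetical illustrations of what a weight mapper could do, not the mapper used in Qwen2-VL:

```python
from typing import Dict


def maybe_prefix(prefix: str, name: str) -> str:
    """Stand-in helper: return `name` alone when `prefix` is empty."""
    return name if not prefix else f"{prefix}.{name}"


def map_checkpoint_name(name: str, prefix_map: Dict[str, str]) -> str:
    """Rewrite a checkpoint weight name using the first matching prefix rule."""
    for old, new in prefix_map.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


# Assumed mapping from checkpoint names to names nested under a
# hypothetical "language_model" submodule.
HYPOTHETICAL_PREFIX_MAP = {
    "lm_head.": "language_model.lm_head.",
    "model.": "language_model.model.",
}

standalone_name = maybe_prefix("", "lm_head")
nested_name = maybe_prefix("language_model", "lm_head")
mapped = map_checkpoint_name("lm_head.weight", HYPOTHETICAL_PREFIX_MAP)
untouched = map_checkpoint_name("visual.blocks.0.weight", HYPOTHETICAL_PREFIX_MAP)
```

With an empty prefix the layer keeps the checkpoint's plain `lm_head` name; once it is nested, the mapper has to bridge the two naming schemes.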

DarkLight1337 · 2024-12-07T13:54:45Z

> I'm unable to get PP test to pass for Qwen2-VL, but (after wasting quite a bit of time) I realized that it occurs on main branch as well.

I tried running the model with PP in online inference and it seems to work fine, maybe it's just some device-specific floating point error?

DarkLight1337 · 2024-12-07T13:55:31Z

@mgoin can you try it on your end as well? Just to be sure.

mgoin · 2024-12-07T14:22:24Z

I tested and it works

lm_eval --model vllm --model_args pretrained=nm-testing/Qwen2-VL-2B-Instruct-FP8-dynamic --tasks gsm8k --num_fewshot 5 --batch_size auto
vllm (pretrained=nm-testing/Qwen2-VL-2B-Instruct-FP8-dynamic), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.5125|±  |0.0138|
|     |       |strict-match    |     5|exact_match|↑  |0.4693|±  |0.0137|

However, since compressed-tensors (ct) doesn't support a quantized lm_head yet, we are not truly able to test the ignore-prefix case.

vllm/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py

Lines 67 to 77 in b26b4cd

    
        if should_ignore_layer(prefix, ignore=self.ignore):
            return UnquantizedLinearMethod()
        if isinstance(layer, LinearBase):
            scheme = self.get_scheme(layer=layer, layer_name=prefix)
            layer.scheme = scheme
            return CompressedTensorsLinearMethod(self)
        if isinstance(layer, Attention):
            return CompressedTensorsKVCacheMethod(self)
        if isinstance(layer, FusedMoE):
            return CompressedTensorsMoEMethod.get_moe_method(self)
        return None

I don't think there is anything you can do about this, so we will possibly deal with this in the future when adding quantized lm head support. Thanks!
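
The concern can be sketched with a deliberately simplified ignore check. `toy_should_ignore` is a hypothetical exact-match stand-in, not the real `should_ignore_layer` (which is more involved), and the `language_model.lm_head` prefix is an assumption used only for illustration:

```python
from typing import Iterable


def toy_should_ignore(layer_name: str, ignore: Iterable[str]) -> bool:
    """Simplified check: a layer is ignored only on an exact name match."""
    return any(target == layer_name for target in ignore)


# The checkpoint config lists the head under its plain name.
ignore_list = ["lm_head"]

# Layer constructed with an empty prefix: names line up.
matches_standalone = toy_should_ignore("lm_head", ignore_list)

# Layer constructed under a parent prefix: the exact-match check misses it.
matches_nested = toy_should_ignore("language_model.lm_head", ignore_list)
```

Under this simplified rule, a nested prefix would slip past the ignore list, which is why the case becomes relevant once a quantized lm_head is supported.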

Composite weight loading for multimodal Qwen2

32b7e08

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 added the ready label Dec 6, 2024

DarkLight1337 requested review from Isotr0py and ywang96 December 6, 2024 06:30

Fix HF config not defining architectures even when it's overridden

b66e4b3

Signed-off-by: DarkLight1337 <[email protected]>

Isotr0py approved these changes Dec 6, 2024

DarkLight1337 added 2 commits December 6, 2024 07:35

Fix PP

b80777e

Signed-off-by: DarkLight1337 <[email protected]>

Revert

787138b

Signed-off-by: DarkLight1337 <[email protected]>

Avoid warning spam

4a72d29

Signed-off-by: DarkLight1337 <[email protected]>

mgoin reviewed Dec 6, 2024

vllm/model_executor/models/qwen2_audio.py

mgoin reviewed Dec 6, 2024

mgoin approved these changes Dec 7, 2024

mgoin merged commit bf0e382 into vllm-project:main Dec 7, 2024
52 checks passed

DarkLight1337 deleted the composite-qwen2-mm branch December 7, 2024 14:26

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[Model] Composite weight loading for multimodal Qwen2 (vllm-project#1…

0b1345f

…0944) Signed-off-by: DarkLight1337 <[email protected]>

BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024

[Model] Composite weight loading for multimodal Qwen2 (vllm-project#1…

40b588e

…0944) Signed-off-by: DarkLight1337 <[email protected]>
