Cannot use fp8 quantization when deploying qwen2.5vl with vllm #983

Open
lessmore991 opened this issue Mar 21, 2025 · 0 comments

When deploying the qwen2.5-vl-7B model with vllm 0.8.1, fp8 quantization cannot be used. How can this be resolved?
The deployment command is as follows:
vllm serve Qwen2.5-VL/Qwen2.5-VL-7B-Instruct --port 8083 --quantization fp8

The error is as follows:

......
Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:03<00:00,  1.51it/s]

INFO 03-21 02:21:53 [loader.py:429] Loading weights took 3.47 seconds
INFO 03-21 02:21:53 [gpu_model_runner.py:1176] Model loading took 8.9031 GB and 3.891568 seconds
INFO 03-21 02:21:53 [gpu_model_runner.py:1421] Encoder cache will be initialized with a budget of 98304 tokens, and profiled with 1 video items of the maximum feature size.
ERROR 03-21 02:21:57 [core.py:340] EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 332, in run_engine_core
ERROR 03-21 02:21:57 [core.py:340]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 287, in __init__
ERROR 03-21 02:21:57 [core.py:340]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 62, in __init__
ERROR 03-21 02:21:57 [core.py:340]     num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 03-21 02:21:57 [core.py:340]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 121, in _initialize_kv_caches
ERROR 03-21 02:21:57 [core.py:340]     available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 03-21 02:21:57 [core.py:340]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 66, in determine_available_memory
ERROR 03-21 02:21:57 [core.py:340]     output = self.collective_rpc("determine_available_memory")
ERROR 03-21 02:21:57 [core.py:340]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 03-21 02:21:57 [core.py:340]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-21 02:21:57 [core.py:340]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/utils.py", line 2216, in run_method
ERROR 03-21 02:21:57 [core.py:340]     return func(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 03-21 02:21:57 [core.py:340]     return func(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
ERROR 03-21 02:21:57 [core.py:340]     self.model_runner.profile_run()
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1452, in profile_run
ERROR 03-21 02:21:57 [core.py:340]     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
ERROR 03-21 02:21:57 [core.py:340]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 975, in get_multimodal_embeddings
ERROR 03-21 02:21:57 [core.py:340]     video_embeddings = self._process_video_input(video_input)
ERROR 03-21 02:21:57 [core.py:340]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 931, in _process_video_input
ERROR 03-21 02:21:57 [core.py:340]     video_embeds = self.visual(pixel_values_videos, grid_thw=grid_thw)
ERROR 03-21 02:21:57 [core.py:340]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 03-21 02:21:57 [core.py:340]     return self._call_impl(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 03-21 02:21:57 [core.py:340]     return forward_call(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 659, in forward
ERROR 03-21 02:21:57 [core.py:340]     hidden_states = blk(
ERROR 03-21 02:21:57 [core.py:340]                     ^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 03-21 02:21:57 [core.py:340]     return self._call_impl(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 03-21 02:21:57 [core.py:340]     return forward_call(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 382, in forward
ERROR 03-21 02:21:57 [core.py:340]     x = x + self.mlp(self.norm2(x))
ERROR 03-21 02:21:57 [core.py:340]             ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 03-21 02:21:57 [core.py:340]     return self._call_impl(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 03-21 02:21:57 [core.py:340]     return forward_call(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 191, in forward
ERROR 03-21 02:21:57 [core.py:340]     x_gate, _ = self.gate_proj(x)
ERROR 03-21 02:21:57 [core.py:340]                 ^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 03-21 02:21:57 [core.py:340]     return self._call_impl(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 03-21 02:21:57 [core.py:340]     return forward_call(*args, **kwargs)
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 474, in forward
ERROR 03-21 02:21:57 [core.py:340]     output_parallel = self.quant_method.apply(self, input_, bias)
ERROR 03-21 02:21:57 [core.py:340]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 386, in apply
ERROR 03-21 02:21:57 [core.py:340]     return self.fp8_linear.apply(input=x,
ERROR 03-21 02:21:57 [core.py:340]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 184, in apply
ERROR 03-21 02:21:57 [core.py:340]     output = ops.cutlass_scaled_mm(qinput,
ERROR 03-21 02:21:57 [core.py:340]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340]   File "/opt/venv/lib/python3.12/site-packages/vllm/_custom_ops.py", line 523, in cutlass_scaled_mm
ERROR 03-21 02:21:57 [core.py:340]     assert (b.shape[0] % 16 == 0 and b.shape[1] % 16 == 0)
ERROR 03-21 02:21:57 [core.py:340]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-21 02:21:57 [core.py:340] AssertionError
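
For reference, the traceback ends at the assertion in `cutlass_scaled_mm`, reached through `gate_proj` in the vision block's MLP, which requires both dimensions of the weight operand `b` to be multiples of 16. Below is a minimal sketch of that same divisibility check in plain Python, which may help when inspecting layer shapes. The helper names and the example shapes are hypothetical and are not read from the checkpoint or from vllm.

```python
# Sketch of the divisibility check from the failing assertion:
#   assert (b.shape[0] % 16 == 0 and b.shape[1] % 16 == 0)
# Helper names and example shapes are hypothetical, for illustration only.

def cutlass_shape_ok(b_shape, multiple=16):
    """Return True if both dimensions of `b` are multiples of `multiple`."""
    rows, cols = b_shape
    return rows % multiple == 0 and cols % multiple == 0


def offending_dims(b_shape, multiple=16):
    """List (axis, size) pairs whose size is not a multiple of `multiple`."""
    return [(axis, size) for axis, size in enumerate(b_shape)
            if size % multiple != 0]


# Illustrative shapes only (assumed, not taken from the model config).
example_shapes = {
    "aligned": (1280, 3424),
    "misaligned": (1280, 3420),
}

for name, shape in example_shapes.items():
    print(name, shape, cutlass_shape_ok(shape), offending_dims(shape))
```

A weight whose shape fails this check would trip the same `AssertionError` shown above, which suggests checking the dimensions of the layer named in the traceback.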
