
Conversation

@CUHKSZzxy (Collaborator) commented Oct 31, 2025

Related

TODO

  • Qwen3-VL-MOE
  • Add documentation
  • Video input support?

@lvhan028 added the enhancement (New feature or request) label on Nov 1, 2025
@CUHKSZzxy (Collaborator, Author) commented Nov 4, 2025

Improved the config check part. Tested with internvl / intern-s1 / qwen3vl / qwen3 / qwen2.5vl / glm4.1v; results look good.
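
For illustration, a rough sketch of the kind of config check involved (hypothetical and simplified, not the actual lmdeploy implementation; the matched field values are assumptions based on typical Hugging Face configs):

```python
# Hypothetical sketch: inspect a Hugging Face config.json and pick a model
# family from its `architectures` / `model_type` fields. Not lmdeploy code;
# the matched strings are assumptions.
import json
from pathlib import Path


def detect_model_family(model_dir: str) -> str:
    cfg = json.loads((Path(model_dir) / 'config.json').read_text())
    arch = (cfg.get('architectures') or [''])[0]
    model_type = cfg.get('model_type', '')
    if 'qwen3_vl' in model_type or arch.startswith('Qwen3VL'):
        return 'qwen3-vl'
    if 'internvl' in model_type.lower() or 'InternVL' in arch:
        return 'internvl'
    return model_type or 'unknown'


# Example (placeholder path):
# print(detect_model_family('/path/to/Qwen3-VL-4B-Instruct'))
```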

@lvhan028 previously approved these changes Nov 4, 2025
@lvhan028 requested a review from grimoire November 4, 2025 12:38
@lvhan028 (Collaborator) commented Nov 4, 2025

Could you share the evaluation test results?

@lvhan028 dismissed their stale review November 5, 2025 06:17

evaluation test failed

@lvhan028 (Collaborator) commented Nov 5, 2025

The LLM evaluation test failed when following #4094.

@CUHKSZzxy (Collaborator, Author) commented Nov 6, 2025

> The LLM evaluation test failed when following #4094.

I can reproduce the "handler does not exist" bug when benchmarking qwen3vl on AIME25 after running for a while (around 10-20 min).
The actual error traceback is as follows:

2025-11-06 16:40:12,650 - lmdeploy - ERROR - engine.py:1235 - exception happened: <class 'AssertionError'> 
Traceback (most recent call last):
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 1230, in async_loop
    await self._async_loop_main(resp_que=resp_que,
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 1106, in _async_loop_main
    forward_inputs, next_running = await inputs_maker.send_next_inputs()
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 302, in send_next_inputs
    return await self._send_next_inputs_impl(prefill)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 286, in _send_next_inputs_impl
    forward_inputs = self._make_forward_inputs(prefill, enable_empty)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 228, in _make_forward_inputs
    return self.engine._make_forward_inputs(*args, **kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 907, in _make_forward_inputs
    scheduler_output = scheduler.schedule(is_prefill=prefill, prealloc_size=prealloc_size)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/paging/scheduler.py", line 305, in schedule
    output = self._schedule_decoding(prealloc_size)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/utils.py", line 271, in __func_warpper
    return func(*args, **kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/paging/scheduler.py", line 254, in _schedule_decoding
    assert len(running) != 0
AssertionError

which is the same error as the one mentioned in the issue referenced above.

Therefore, I would conclude that this is a scheduling bug rather than an issue with the current qwen3vl code.
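
For reference, a simplified sketch of the failure mode described above (hypothetical code, not lmdeploy's scheduler; only the names mirror the traceback frames):

```python
# Hypothetical, simplified sketch of the assertion failure seen in the logs.
# This is NOT lmdeploy's scheduler; names only mirror the traceback frames.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulerSketch:
    running: List[str] = field(default_factory=list)  # sequences currently decoding

    def _schedule_decoding(self) -> List[str]:
        # Behavior implied by the traceback: a decoding step assumes at least
        # one running sequence exists, otherwise the assert fires and the
        # whole async loop goes down.
        assert len(self.running) != 0
        return list(self.running)

    def _schedule_decoding_defensive(self) -> List[str]:
        # One possible defensive variant: return an empty batch and let the
        # caller fall back to prefill instead of crashing.
        return list(self.running)


if __name__ == '__main__':
    sched = SchedulerSketch(running=[])
    print(sched._schedule_decoding_defensive())  # -> []
    try:
        sched._schedule_decoding()
    except AssertionError:
        print('AssertionError, matching the traceback above')
```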

@CUHKSZzxy (Collaborator, Author)

> Could you share the evaluation test results?

Tested with VLMEvalKit, dataset: OCRBench, temperature: 0.7, max_new_tokens: 16384

| Model | Acc | Official Acc |
| --- | --- | --- |
| Qwen3-VL-4B-Instruct | 86.9 | 88.1 |
| Qwen3-VL-30B-A3B-Instruct | 89.6 | 90.3 |

Official Acc refers to the results reported at:
https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct
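
For context, a minimal local sanity-check sketch using the lmdeploy pipeline with the same sampling settings; the image path is a placeholder, and the numbers in the table come from VLMEvalKit run against the served API, not from this snippet:

```python
# Minimal sanity-check sketch (not the VLMEvalKit harness): query Qwen3-VL
# through the lmdeploy pipeline with temperature 0.7 and max_new_tokens 16384.
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline
from lmdeploy.vl import load_image

if __name__ == '__main__':
    pipe = pipeline('Qwen/Qwen3-VL-4B-Instruct',
                    backend_config=PytorchEngineConfig())
    image = load_image('/path/to/ocr_sample.png')  # placeholder image path
    gen_config = GenerationConfig(temperature=0.7, max_new_tokens=16384)
    response = pipe(('Read all the text in the image.', image),
                    gen_config=gen_config)
    print(response.text)
```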

@grimoire (Collaborator) left a comment

LGTM

@lvhan028 (Collaborator) commented Nov 7, 2025

After merging main, benchmarking serving with profile_restful_api.py produced errors:

Traceback (most recent call last):
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 1230, in async_loop
    await self._async_loop_main(resp_que=resp_que,
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 1106, in _async_loop_main
    forward_inputs, next_running = await inputs_maker.send_next_inputs()
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 302, in send_next_inputs
    return await self._send_next_inputs_impl(prefill)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 286, in _send_next_inputs_impl
    forward_inputs = self._make_forward_inputs(prefill, enable_empty)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 228, in _make_forward_inputs
    return self.engine._make_forward_inputs(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 907, in _make_forward_inputs
    scheduler_output = scheduler.schedule(is_prefill=prefill, prealloc_size=prealloc_size)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 316, in schedule
    output = self._schedule_decoding(prealloc_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/utils.py", line 271, in __func_warpper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 265, in _schedule_decoding
    assert len(running) != 0
           ^^^^^^^^^^^^^^^^^
AssertionError

@lvhan028 (Collaborator) commented Nov 7, 2025

I have also deployed two additional models: Qwen/Qwen3-8B and OpenGVLab/InternVL3_5-8B. After benchmarking both services, I confirmed that they functioned properly. Therefore, I suspect that this PR may contain potential issues.

@CUHKSZzxy (Collaborator, Author) commented Nov 7, 2025

> I have also deployed two additional models: Qwen/Qwen3-8B and OpenGVLab/InternVL3_5-8B. After benchmarking both services, I confirmed that they functioned properly. Therefore, I suspect that this PR may contain potential issues.

@lvhan028 Tested Qwen/Qwen3-8B with the following benchmark settings; it appears that the main branch code still triggers the "handler does not exist" bug.

num_prompts=4000
backend="lmdeploy"
dataset_name="random"
dataset_path="/nvme1/shared/ShareGPT_V3_unfiltered_cleaned_split.json"

echo ">>> num_prompts: ${num_prompts}, dataset: ${dataset_name}"

for in_len in 1024
do
    echo "input len: ${in_len}"

    for out_len in 1024 # 2048 4096 8192 16384 32768
    do
        echo "output len: ${out_len}"

        range_ratio=1
        python3 benchmark/profile_restful_api.py \
            --backend ${backend} \
            --dataset-name ${dataset_name} \
            --dataset-path ${dataset_path} \
            --num-prompts ${num_prompts} \
            --random-input-len ${in_len} \
            --random-output-len ${out_len} \
            --random-range-ratio ${range_ratio} \
            --host 0.0.0.0 --port 23334
    done

done

Therefore, I think the fix referenced above does not really resolve the scheduling bug. Additionally, for pure text inputs, the qwen3vl text part works as a class inherited from qwen3 and is unlikely to cause scheduling errors. I maintain the view that this is not a bug caused by the current qwen3vl code.
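
To illustrate the last point, a hypothetical sketch (not lmdeploy's actual model code) of how a text-only request simply falls through to the inherited qwen3 path and never touches the vision-specific logic:

```python
# Hypothetical illustration: a VL wrapper that reuses the text backbone via
# inheritance behaves exactly like the base model for pure-text requests.
class Qwen3Sketch:
    def forward(self, input_ids, vision_embeds=None):
        return f'decode({len(input_ids)} text tokens)'


class Qwen3VLSketch(Qwen3Sketch):
    def forward(self, input_ids, vision_embeds=None):
        if vision_embeds is None:
            # No image in the request: use the inherited text-only path.
            return super().forward(input_ids)
        # Otherwise merge vision embeddings before decoding (omitted here).
        return f'decode({len(input_ids)} text tokens + vision)'


if __name__ == '__main__':
    # Pure-text input goes through the same code path as the base model.
    print(Qwen3VLSketch().forward(list(range(8))))
```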

@lvhan028 (Collaborator) commented Nov 7, 2025


cc @grimoire

@lvhan028 lvhan028 merged commit bbc4369 into InternLM:main Nov 7, 2025
5 checks passed
@CUHKSZzxy CUHKSZzxy deleted the qwen3-vl branch November 7, 2025 10:39
