
Conversation

@CUHKSZzxy (Collaborator) commented Oct 31, 2025

Related

TODO

  • Qwen3-VL-MOE
  • Add documentation
  • Video input support?

@lvhan028 added the enhancement (New feature or request) label on Nov 1, 2025
@CUHKSZzxy (Collaborator, Author) commented Nov 4, 2025

Improved the config check part. Tested with internvl / intern-s1 / qwen3vl / qwen3 / qwen2.5vl / glm4.1v; results look good.
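
For illustration, a rough sketch of the kind of config check involved (hypothetical and simplified, not the actual lmdeploy implementation; the matched field values are assumptions based on typical Hugging Face configs):

```python
# Hypothetical sketch: inspect a Hugging Face config.json and pick a model
# family from its `architectures` / `model_type` fields. Not lmdeploy code;
# the matched strings are assumptions.
import json
from pathlib import Path


def detect_model_family(model_dir: str) -> str:
    cfg = json.loads((Path(model_dir) / 'config.json').read_text())
    arch = (cfg.get('architectures') or [''])[0]
    model_type = cfg.get('model_type', '')
    if 'qwen3_vl' in model_type or arch.startswith('Qwen3VL'):
        return 'qwen3-vl'
    if 'internvl' in model_type.lower() or 'InternVL' in arch:
        return 'internvl'
    return model_type or 'unknown'


# Example (placeholder path):
# print(detect_model_family('/path/to/Qwen3-VL-4B-Instruct'))
```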

@lvhan028 previously approved these changes Nov 4, 2025
@lvhan028 requested a review from grimoire November 4, 2025 12:38
@lvhan028 (Collaborator) commented Nov 4, 2025

Could you share the evaluation test results?

@lvhan028 dismissed their stale review November 5, 2025 06:17

evaluation test failed

@lvhan028 (Collaborator) commented Nov 5, 2025

The LLM evaluation test failed when following #4094.

@CUHKSZzxy (Collaborator, Author) commented Nov 6, 2025

> The LLM evaluation test failed when following #4094.

I can reproduce the "handler does not exist" bug when benchmarking qwen3vl on AIME25 after running for a while (around 10-20 min).
The actual error traceback is as follows:

2025-11-06 16:40:12,650 - lmdeploy - ERROR - engine.py:1235 - exception happened: <class 'AssertionError'> 
Traceback (most recent call last):
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 1230, in async_loop
    await self._async_loop_main(resp_que=resp_que,
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 1106, in _async_loop_main
    forward_inputs, next_running = await inputs_maker.send_next_inputs()
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 302, in send_next_inputs
    return await self._send_next_inputs_impl(prefill)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 286, in _send_next_inputs_impl
    forward_inputs = self._make_forward_inputs(prefill, enable_empty)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 228, in _make_forward_inputs
    return self.engine._make_forward_inputs(*args, **kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/engine/engine.py", line 907, in _make_forward_inputs
    scheduler_output = scheduler.schedule(is_prefill=prefill, prealloc_size=prealloc_size)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/paging/scheduler.py", line 305, in schedule
    output = self._schedule_decoding(prealloc_size)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/utils.py", line 271, in __func_warpper
    return func(*args, **kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_glm4/lmdeploy/pytorch/paging/scheduler.py", line 254, in _schedule_decoding
    assert len(running) != 0
AssertionError

which is the same error as the one mentioned in the issue referenced above.

Therefore, I would conclude that this is a scheduling bug rather than an issue with the current qwen3vl code.
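
For reference, a simplified sketch of the failure mode described above (hypothetical code, not lmdeploy's scheduler; only the names mirror the traceback frames):

```python
# Hypothetical, simplified sketch of the assertion failure seen in the logs.
# This is NOT lmdeploy's scheduler; names only mirror the traceback frames.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulerSketch:
    running: List[str] = field(default_factory=list)  # sequences currently decoding

    def _schedule_decoding(self) -> List[str]:
        # Behavior implied by the traceback: a decoding step assumes at least
        # one running sequence exists, otherwise the assert fires and the
        # whole async loop goes down.
        assert len(self.running) != 0
        return list(self.running)

    def _schedule_decoding_defensive(self) -> List[str]:
        # One possible defensive variant: return an empty batch and let the
        # caller fall back to prefill instead of crashing.
        return list(self.running)


if __name__ == '__main__':
    sched = SchedulerSketch(running=[])
    print(sched._schedule_decoding_defensive())  # -> []
    try:
        sched._schedule_decoding()
    except AssertionError:
        print('AssertionError, matching the traceback above')
```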

@CUHKSZzxy (Collaborator, Author)

> Could you share the evaluation test results?

Tested with VLMEvalKit, dataset: OCRBench, temperature: 0.7, max_new_tokens: 16384

| Model | Acc | Official Acc |
| --- | --- | --- |
| Qwen3-VL-4B-Instruct | 86.9 | 88.1 |
| Qwen3-VL-30B-A3B-Instruct | 89.6 | 90.3 |

Official Acc refers to the results reported at:
https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct
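
For context, a minimal local sanity-check sketch using the lmdeploy pipeline with the same sampling settings; the image path is a placeholder, and the numbers in the table come from VLMEvalKit run against the served API, not from this snippet:

```python
# Minimal sanity-check sketch (not the VLMEvalKit harness): query Qwen3-VL
# through the lmdeploy pipeline with temperature 0.7 and max_new_tokens 16384.
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline
from lmdeploy.vl import load_image

if __name__ == '__main__':
    pipe = pipeline('Qwen/Qwen3-VL-4B-Instruct',
                    backend_config=PytorchEngineConfig())
    image = load_image('/path/to/ocr_sample.png')  # placeholder image path
    gen_config = GenerationConfig(temperature=0.7, max_new_tokens=16384)
    response = pipe(('Read all the text in the image.', image),
                    gen_config=gen_config)
    print(response.text)
```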

@grimoire (Collaborator) left a comment

LGTM

@lvhan028 (Collaborator) commented Nov 7, 2025

After merging main, benchmarking serving with profile_restful_api.py produced errors:

Traceback (most recent call last):
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 1230, in async_loop
    await self._async_loop_main(resp_que=resp_que,
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 1106, in _async_loop_main
    forward_inputs, next_running = await inputs_maker.send_next_inputs()
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 302, in send_next_inputs
    return await self._send_next_inputs_impl(prefill)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 286, in _send_next_inputs_impl
    forward_inputs = self._make_forward_inputs(prefill, enable_empty)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 228, in _make_forward_inputs
    return self.engine._make_forward_inputs(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 907, in _make_forward_inputs
    scheduler_output = scheduler.schedule(is_prefill=prefill, prealloc_size=prealloc_size)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 316, in schedule
    output = self._schedule_decoding(prealloc_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/utils.py", line 271, in __func_warpper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/nvme1/lvhan/lmdeploy/lmdeploy/pytorch/paging/scheduler.py", line 265, in _schedule_decoding
    assert len(running) != 0
           ^^^^^^^^^^^^^^^^^
AssertionError

@lvhan028 (Collaborator) commented Nov 7, 2025

I have also deployed two additional models: Qwen/Qwen3-8B and OpenGVLab/InternVL3_5-8B. After benchmarking both services, I confirmed that they functioned properly. Therefore, I suspect that this PR may contain potential issues.

@CUHKSZzxy (Collaborator, Author) commented Nov 7, 2025

> I have also deployed two additional models: Qwen/Qwen3-8B and OpenGVLab/InternVL3_5-8B. After benchmarking both services, I confirmed that they functioned properly. Therefore, I suspect that this PR may contain potential issues.

@lvhan028 Tested Qwen/Qwen3-8B with the following benchmark settings; it appears that the main branch code still triggers the "handler does not exist" bug.

num_prompts=4000
backend="lmdeploy"
dataset_name="random"
dataset_path="/nvme1/shared/ShareGPT_V3_unfiltered_cleaned_split.json"

echo ">>> num_prompts: ${num_prompts}, dataset: ${dataset_name}"

for in_len in 1024
do
    echo "input len: ${in_len}"

    for out_len in 1024 # 2048 4096 8192 16384 32768
    do
        echo "output len: ${out_len}"

        range_ratio=1
        python3 benchmark/profile_restful_api.py \
            --backend ${backend} \
            --dataset-name ${dataset_name} \
            --dataset-path ${dataset_path} \
            --num-prompts ${num_prompts} \
            --random-input-len ${in_len} \
            --random-output-len ${out_len} \
            --random-range-ratio ${range_ratio} \
            --host 0.0.0.0 --port 23334
    done

done

Therefore, I think the fix referenced above does not really resolve the scheduling bug. Additionally, for pure text inputs, the qwen3vl text part works as a class inherited from qwen3 and is unlikely to cause scheduling errors. I maintain the view that this is not a bug caused by the current qwen3vl code.
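
To illustrate the last point, a hypothetical sketch (not lmdeploy's actual model code) of how a text-only request simply falls through to the inherited qwen3 path and never touches the vision-specific logic:

```python
# Hypothetical illustration: a VL wrapper that reuses the text backbone via
# inheritance behaves exactly like the base model for pure-text requests.
class Qwen3Sketch:
    def forward(self, input_ids, vision_embeds=None):
        return f'decode({len(input_ids)} text tokens)'


class Qwen3VLSketch(Qwen3Sketch):
    def forward(self, input_ids, vision_embeds=None):
        if vision_embeds is None:
            # No image in the request: use the inherited text-only path.
            return super().forward(input_ids)
        # Otherwise merge vision embeddings before decoding (omitted here).
        return f'decode({len(input_ids)} text tokens + vision)'


if __name__ == '__main__':
    # Pure-text input goes through the same code path as the base model.
    print(Qwen3VLSketch().forward(list(range(8))))
```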

@lvhan028 (Collaborator) commented Nov 7, 2025


cc @grimoire

@lvhan028 lvhan028 merged commit bbc4369 into InternLM:main Nov 7, 2025
5 checks passed
@CUHKSZzxy CUHKSZzxy deleted the qwen3-vl branch November 7, 2025 10:39
