
Conversation

@nancyjlau

Added processor support that is tested to work for multimodal VL models like Qwen2.5-VL and Qwen3-VL. Changes include updating the verl submodule to the latest main (which includes multimodal support from PRs #2146 and #2398), adding `hf_processor` loading in the rLLM trainers (`train_workflow_pipeline.py` and `train_agent_ppo.py`), and bumping `transformers` to >=4.57.0 for Qwen3-VL.
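For context, here is a minimal sketch of what the processor loading in the trainers looks like; the helper name below is illustrative, and the actual diff in `train_agent_ppo.py` / `train_workflow_pipeline.py` may differ in details:

```python
# Illustrative sketch, not the exact PR code: load a multimodal processor
# alongside the tokenizer so VL models get image preprocessing support.
from verl.utils import hf_processor, hf_tokenizer

def load_tokenizer_and_processor(model_path: str, trust_remote_code: bool = True):
    tokenizer = hf_tokenizer(model_path, trust_remote_code=trust_remote_code)
    # hf_processor returns None for text-only models, so existing text-only
    # training paths keep working unchanged.
    processor = hf_processor(model_path, trust_remote_code=trust_remote_code)
    return tokenizer, processor
```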


I have tested `Qwen/Qwen2.5-VL-3B-Instruct` and successfully loaded `Qwen2_5_VLProcessor`. Qwen3-VL models are supported with `transformers >=4.57.0`.
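For anyone who wants to reproduce that check quickly, a sanity test (assuming `transformers` is installed and the model weights are reachable) is:

```python
# Verify that the multimodal processor resolves for Qwen2.5-VL.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
print(type(processor).__name__)  # expected: Qwen2_5_VLProcessor
```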

Closes #242

…o multimodal models can work with rllm. That way, for issue rllm-org#242, multimodal ReAct agents can process and generate images for RL training.
@jeffreysijuntan
Contributor

Thanks for your PR! This feature would be really helpful. Can you write a simple training example that can verify it?

@yuzeng0-0


Could you please provide a test startup script as well as the training dataset used for testing? Thanks!

@huaiyizhao

I think we need vllm>=0.11.0 or sglang>=0.5.3 to support Qwen3-VL rollout.

@nancyjlau
Author

nancyjlau commented Oct 28, 2025

I'm working on getting an environment that works again for the examples, but yes, this needs an updated sglang version to work. I'm currently in dependency hell trying to recreate a working environment for Qwen3-VL.

@jeffreysijuntan
Contributor

Does merging this into the nightly version help? It is currently using verl==0.6.0, so the sglang/vllm versions are newer.

@huaiyizhao

For vllm==0.11.0, this works for me: volcengine/verl#3934.

@nancyjlau
Author

You were able to get it working on vllm==0.11.0? Could you share the details on how you did that?

I'm currently facing these issues:
- torch 2.6.0+cu124 is the only version that works with flash-attn 2.7.4.post1
- flash-attn 2.7.4.post1 is required for Qwen3-VL
- vllm 0.8.5 is the only version compatible with torch 2.6.0+cu124, but it is incompatible with the current verl version because of v1 API issues

For SGLang, I was able to get it training until I hit an OOM error on a single GPU, and I haven't been able to replicate the setup with multiple H100s.
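For reference, a quick way to dump the exact versions installed in an environment while debugging these conflicts (generic Python, nothing specific to this PR):

```python
# Print the installed versions of the packages discussed above.
from importlib.metadata import PackageNotFoundError, version

for pkg in ["torch", "flash-attn", "vllm", "sglang", "transformers", "verl"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```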

@Osilly

Osilly commented Oct 28, 2025

@nancyjlau I would like to know whether you were able to start rLLM training on the latest version of verl. I tried to replace verl 0.5.0 with the latest one, but the run hangs: the task stays pending after printing the following logs:

(vLLMHttpServer pid=1793, ip=10.144.204.96) INFO:2025-10-28 04:50:43,457:replica_rank=1, node_rank=0, nnodes=1, get worker zmq addresses: ['ipc:///tmp/verl_vllm_zmq_2096_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2097_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2098_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2099_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2100_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2101_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2102_root.ipc', 'ipc:///tmp/verl_vllm_zmq_2103_root.ipc']
(vLLMHttpServer pid=27028) INFO:2025-10-28 04:50:49,269:vLLMHttpServer, replica_rank: 0, master address: 10.144.201.219, master port: 38905, data parallel master port: 38093
(vLLMHttpServer pid=27028) INFO:2025-10-28 04:50:49,273:override_generation_config: {'temperature': 0.6, 'top_k': -1, 'top_p': 1, 'repetition_penalty': 1.0, 'max_new_tokens': 2048}
(vLLMHttpServer pid=27028) INFO:2025-10-28 04:50:49,308:replica_rank=0, node_rank=0, nnodes=1, get worker zmq addresses: ['ipc:///tmp/verl_vllm_zmq_23757_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23758_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23759_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23760_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23761_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23762_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23763_root.ipc', 'ipc:///tmp/verl_vllm_zmq_23764_root.ipc']
(vLLMHttpServer pid=1793, ip=10.144.204.96) `torch_dtype` is deprecated! Use `dtype` instead!
(vLLMHttpServer pid=27028) `torch_dtype` is deprecated! Use `dtype` instead!

Furthermore, my environment is:

torch==2.8.0 
vllm==0.11.0
verl==0.6.0.dev
rllm==0.2.0

@huaiyizhao

@nancyjlau cu128 + torch 2.8.0 is compatible with flash-attn 2.7.4.post1 in my env.
Check out volcengine/verl#3906 and volcengine/verl#3934.

Btw, which sglang version are you using? I encountered some memory leak problems with sglang.
