Enable multimodal model support for RL training #264
base: main
Conversation
…o multimodal models can work with rllm. That way, for issue rllm-org#242, multimodal ReAct agents can process and generate images for RL training
…3VLProcessor requires transformers 4.57.0+
Thanks for your PR! This feature would be really helpful. Can you write a simple training example that can verify it?
Could you please provide a test startup script as well as the training dataset used for testing? Thanks!
I think we need vllm>=0.11.0 or sglang>=0.5.3 to support Qwen3-VL rollout.
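Pulling the version floors mentioned in this thread together, one plausible way to pin the stack is the install fragment below. This is an untested combination assembled from the comments, not an official requirements file for this PR:

```shell
# Qwen3-VL processor support lands in transformers 4.57.0+ (per this PR);
# rollout needs a recent inference backend (per the comments above).
pip install "transformers>=4.57.0" "vllm>=0.11.0"

# Or, for the SGLang rollout path instead of vLLM:
pip install "transformers>=4.57.0" "sglang>=0.5.3"
```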
Working on getting an environment that works again for the examples, but yes, this needs an updated sglang version in order to work. Currently in dependency hell trying to recreate a working environment for Qwen3-VL.
Does merging this into the nightly version help? It is currently using
For vllm==0.11.0, this works for me: volcengine/verl#3934
You were able to get it working on  I'm currently facing these issues: For SGLang, I was able to get it training until I hit an OOM error on a single GPU, and I am unable to replicate the setup with multiple H100s.
@nancyjlau I would like to know if you can start the rLLM training on the latest version of verl. I tried to replace verl-0.5.0 with the latest one, but it hung. The task pends after outputting the logs: Furthermore, my environment is:
@nancyjlau cu128+torch2.8.0 is compatible with flash-attn 2.7.4.post1 in my env. Btw, which sglang version are you using? I encountered some memory leak problems with sglang.


Added processor support that is tested to work for multimodal VL models like Qwen2.5-VL and Qwen3-VL. Changes include updating the verl submodule to latest main (which includes multimodal support from PRs #2146 and #2398), adding `hf_processor` loading in the rLLM trainers (`train_workflow_pipeline.py` and `train_agent_ppo.py`), and bumping transformers to >=4.57.0 for Qwen3-VL. I have tested `Qwen/Qwen2.5-VL-3B-Instruct` and successfully loaded `Qwen2_5_VLProcessor`. Qwen3-VL models are supported with `transformers >= 4.57.0`. Closes #242
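The description above ties each model family to a transformers version floor. A small hypothetical helper (not part of this PR) can make that gate explicit before attempting to load a processor; the 4.57.0 floor for Qwen3-VL comes from this PR, while the 4.49.0 floor for Qwen2.5-VL is an assumption about when its processor class landed in transformers:

```python
from packaging.version import Version

# Assumed version floors: Qwen2.5-VL support is assumed to need
# transformers >= 4.49.0; Qwen3VLProcessor needs >= 4.57.0 (per this PR).
QWEN25_VL_MIN = Version("4.49.0")
QWEN3_VL_MIN = Version("4.57.0")

def supported_vl_families(transformers_version: str) -> list[str]:
    """Return which Qwen VL families the given transformers version can load."""
    v = Version(transformers_version)
    families = []
    if v >= QWEN25_VL_MIN:
        families.append("Qwen2.5-VL")
    if v >= QWEN3_VL_MIN:
        families.append("Qwen3-VL")
    return families
```

In a trainer, this check could run once at startup (e.g. against `transformers.__version__`) to fail fast with a clear message instead of a late `AutoProcessor.from_pretrained` error.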