Enable multimodal model support for RL training #264
base: main
Conversation
…o multimodal models can work with rllm. That way, for issue rllm-org#242, multimodal ReAct agents can process and generate images for RL training
…3VLProcessor requires transformers 4.57.0+
Thanks for your PR! This feature would be really helpful. Can you write a simple training example that can verify it?
Could you please provide a test startup script as well as the training dataset used for testing? Thanks!
I think we need vllm>=0.11.0 or sglang>=0.5.3 to support Qwen3-VL rollout.
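Pulling the version floors mentioned in this thread together, one plausible way to pin the stack is the install fragment below. This is an untested combination assembled from the comments, not an official requirements file for this PR:

```shell
# Qwen3-VL processor support lands in transformers 4.57.0+ (per this PR);
# rollout needs a recent inference backend (per the comments above).
pip install "transformers>=4.57.0" "vllm>=0.11.0"

# Or, for the SGLang rollout path instead of vLLM:
pip install "transformers>=4.57.0" "sglang>=0.5.3"
```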
Working on getting an environment that works again for the examples, but yes, this needs an updated sglang version in order to work. Currently in dependency hell trying to recreate a working environment for Qwen3-VL.
Does merging this into the nightly version help? It is currently using
For vllm==0.11.0, this works for me: volcengine/verl#3934
You were able to get it working on  I'm currently facing these issues: For SGLang, I was able to get it training until I hit an OOM error on a single GPU, and I am unable to replicate the setup with multiple H100s.
@nancyjlau I would like to know if you can start the rLLM training on the latest version of verl. I tried to replace verl-0.5.0 with the latest one, but it hung. The task pends after outputting the logs: Furthermore, my environment is:
@nancyjlau cu128+torch2.8.0 is compatible with flash-attn 2.7.4.post1 in my env. Btw, which sglang version are you using? I encountered some memory leak problems with sglang.


Added processor support that is tested to work for multimodal VL models like Qwen2.5-VL and Qwen3-VL. Changes include updating the verl submodule to latest main (which includes multimodal support from PRs #2146 and #2398), adding `hf_processor` loading in the rLLM trainers (`train_workflow_pipeline.py` and `train_agent_ppo.py`), and bumping transformers to >=4.57.0 for Qwen3-VL. I have tested `Qwen/Qwen2.5-VL-3B-Instruct` and successfully loaded `Qwen2_5_VLProcessor`. Qwen3-VL models are supported with `transformers >= 4.57.0`. Closes #242
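The description above ties each model family to a transformers version floor. A small hypothetical helper (not part of this PR) can make that gate explicit before attempting to load a processor; the 4.57.0 floor for Qwen3-VL comes from this PR, while the 4.49.0 floor for Qwen2.5-VL is an assumption about when its processor class landed in transformers:

```python
from packaging.version import Version

# Assumed version floors: Qwen2.5-VL support is assumed to need
# transformers >= 4.49.0; Qwen3VLProcessor needs >= 4.57.0 (per this PR).
QWEN25_VL_MIN = Version("4.49.0")
QWEN3_VL_MIN = Version("4.57.0")

def supported_vl_families(transformers_version: str) -> list[str]:
    """Return which Qwen VL families the given transformers version can load."""
    v = Version(transformers_version)
    families = []
    if v >= QWEN25_VL_MIN:
        families.append("Qwen2.5-VL")
    if v >= QWEN3_VL_MIN:
        families.append("Qwen3-VL")
    return families
```

In a trainer, this check could run once at startup (e.g. against `transformers.__version__`) to fail fast with a clear message instead of a late `AutoProcessor.from_pretrained` error.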