rLLM v0.2: RL Training over General Agentic Programs (Blog Post)
We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that enable arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.
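To make "agentic program with a custom reward function" concrete, here is a minimal, framework-agnostic sketch of a solver-judge workflow in plain Python. It does not use the rLLM API: `Step`, `Episode`, `solver_judge`, and `exact_match_reward` are hypothetical names chosen for illustration, and the two model calls are plain callables standing in for whatever rollout engine would be used in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Step:
    role: str          # which agent produced this step: "solver" or "judge"
    prompt: str
    response: str
    reward: float = 0.0


@dataclass
class Episode:
    steps: List[Step] = field(default_factory=list)


def exact_match_reward(answer: str, ground_truth: str) -> float:
    """Custom reward: 1.0 if the answer matches the reference, else 0.0."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0


def solver_judge(question: str, ground_truth: str, solver: LLM, judge: LLM,
                 n_candidates: int = 3) -> Episode:
    """Collect several solver attempts, let the judge pick one, attach rewards."""
    episode = Episode()
    candidates: List[str] = []
    for _ in range(n_candidates):
        prompt = f"Solve: {question}"
        answer = solver(prompt)
        candidates.append(answer)
        # Each solver attempt is rewarded on its own correctness.
        episode.steps.append(
            Step("solver", prompt, answer, exact_match_reward(answer, ground_truth))
        )

    listing = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    judge_prompt = (
        f"Question: {question}\nCandidates:\n{listing}\n"
        "Reply with the index of the best candidate."
    )
    reply = judge(judge_prompt)
    text = reply.strip()
    # Fall back to the first candidate if the reply is not a valid index.
    idx = int(text) if text.isdigit() and int(text) < len(candidates) else 0
    # The judge is rewarded when the candidate it selects is correct.
    episode.steps.append(
        Step("judge", judge_prompt, reply, exact_match_reward(candidates[idx], ground_truth))
    )
    return episode
```

The point of the sketch is that the program is ordinary control flow over model calls, with a reward attached to each step. That is the kind of structure a workflow-level trainer can consume, whereas a single agent-environment loop cannot express the judge's separate role and reward.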
Key Features in v0.2
- Support the official verl==0.5.0 as the training backend, no custom verl fork anymore! verl==0.5.0 comes with support for the following features, which are now supported in rLLM (@kylemontgomery1):
  - Megatron training support (@jeewoo-lee)
  - SGLang as the rollout engine, in addition to vLLM.
- Introduce AgentWorkflowEngine, which enables passing in arbitrary agentic programs for training. (@kylemontgomery1)
- Support more agents and environments
  - Terminus and TerminalBench (@JasonWei05)
  - Tongyi DeepResearch agent (@yayashuxue)
  - AppWorld and AppWorldReactAgent (@sunan135)
- Integration with other agentic frameworks/SDKs
  - Strands SDK from AWS
  - SmolAgents
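Since this release depends on the official verl==0.5.0 rather than a fork, it can be useful to confirm which verl version is actually installed in an environment. The following is a small sketch using only the Python standard library. `check_pin` is a hypothetical helper, not part of rLLM or verl, and it assumes the distribution is published under the name `verl`.

```python
from importlib import metadata
from typing import Callable


def check_pin(package: str, expected: str,
              get_version: Callable[[str], str] = metadata.version) -> str:
    """Return "ok", "mismatch", or "missing" for an exact version pin."""
    try:
        installed = get_version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return "missing"
    return "ok" if installed == expected else "mismatch"


if __name__ == "__main__":
    # Assumed distribution name and the pin stated in the release notes.
    print(check_pin("verl", "0.5.0"))
```

The `get_version` parameter exists only so the check can be exercised without installing anything. By default it queries the installed package metadata.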
What's Changed
- fix <tool_calls_begin> variable by @wj-Mcat in #142
- Fix not registered license from code by @annyan09023 in #144
- fix r2egym import error; update installation README by @jeffreysijuntan in #146
- update deepscaler max_prompt_length to avoid exception during training by @jeffreysijuntan in #148
- fix(syntax): Resolve invalid escape sequence warnings by @tonyz0x0 in #154
- added Tools for SFT by @mananroongta in #160
- update docs by @jeffreysijuntan in #167
- Add dark mode to docs by @philippnormann in #168
- [FIX] Fix tool calling result parsing problem in trajectory visualizer & MCP tool name fixing by @VincentXWD in #174
- [hotfix][miniwob] Fix gymnasium.error.NameNotFound by @abrohamLee in #172
- Load full DeepCoder dataset, instead of LCB subset by @mananroongta in #178
- [feat][docker] Installation with Docker by @abrohamLee in #177
- Add macOS compatibility: exclude GPU dependencies on darwin by @yayashuxue in #180
- Torch 2.7.0 only compatible with MacOS python=3.11 by @yayashuxue in #184
- Migrate to verl v0.5.0 by @kylemontgomery1 in #193
- Terminal Bench Integration into rLLM (Simplified) by @JasonWei05 in #205
- feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by @yayashuxue in #206
- Add VimGolf agent training example by @James4Ever0 in #209
- fix: update search engine source data path by @noiji in #216
- [feature] Adding Megatron support for v0.2 by @jeewoo-lee in #221
- Use RolloutEngine for single_turn_workflow.py by @1stprinciple in #223
- Standalone inference: remove hard verl dependency by @JasonWei05 in #228
- Update pyproject.toml to v0.2.0 by @NIL-zhuang in #229
- proper handling the case that next_observation is empty dict by @erranlli in #233
- [v0.2] Add lazy import to fix circular import and ray init config support by @listar2000 in #236
- v0.2 verl patch by @kylemontgomery1 in #237
- v0.2 masking/parsing fix by @kylemontgomery1 in #238
- v0.2 rollout upgrade by @kylemontgomery1 in #241
- Feat: deepresearch integration by @yayashuxue in #215
- workflow updates by @kylemontgomery1 in #244
- added colab example of solver judge by @jeewoo-lee in #246
- v0.2 misc changes by @kylemontgomery1 in #245
- Add FireworksEngine for disaggregated rollout by @1stprinciple in #243
- AppWorld Integration for rLLM by @sunan135 in #235
- V0.2 by @jeffreysijuntan in #247
- update solver judge workflow by @kylemontgomery1 in #248
- update install instructions, update solver judge notebook by @kylemontgomery1 in #249
New Contributors
- @wj-Mcat made their first contribution in #142
- @annyan09023 made their first contribution in #144
- @tonyz0x0 made their first contribution in #154
- @mananroongta made their first contribution in #160
- @philippnormann made their first contribution in #168
- @VincentXWD made their first contribution in #174
- @abrohamLee made their first contribution in #172
- @yayashuxue made their first contribution in #180
- @kylemontgomery1 made their first contribution in #193
- @JasonWei05 made their first contribution in #205
- @James4Ever0 made their first contribution in #209
- @noiji made their first contribution in #216
- @jeewoo-lee made their first contribution in #221
- @1stprinciple made their first contribution in #223
- @NIL-zhuang made their first contribution in #229
- @erranlli made their first contribution in #233
- @listar2000 made their first contribution in #236
- @sunan135 made their first contribution in #235
Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0