rLLM v0.2: RL Training over General Agentic Programs (Blog Post)

We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that allow arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.
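To make the idea of a trainable agentic program concrete, here is a minimal sketch of a solver-judge workflow with a custom reward function, written in plain Python. It does not use the rLLM API: `Step`, `Episode`, `ModelFn`, `exact_match_reward`, and `solver_judge_workflow` are hypothetical names chosen for illustration, and the model calls are stubbed out. It only shows the shape of program (several solver attempts, one judge decision, per-step rewards) that a workflow-level trainer could in principle consume.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM call: prompt in, text out.
# This is NOT the rLLM API; every name in this sketch is illustrative.
ModelFn = Callable[[str], str]


@dataclass
class Step:
    role: str          # which agent produced the step: "solver" or "judge"
    prompt: str
    response: str
    reward: float = 0.0


@dataclass
class Episode:
    steps: List[Step] = field(default_factory=list)
    final_answer: str = ""
    reward: float = 0.0


def exact_match_reward(answer: str, ground_truth: str) -> float:
    """Custom reward: 1.0 if the answer equals the reference, else 0.0."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0


def solver_judge_workflow(
    question: str,
    ground_truth: str,
    solver: ModelFn,
    judge: ModelFn,
    n_solutions: int = 2,
) -> Episode:
    """Run n solver attempts, let the judge pick one, and score every step."""
    episode = Episode()
    candidates: List[str] = []

    for i in range(n_solutions):
        prompt = f"Attempt {i + 1}. Solve: {question}"
        answer = solver(prompt)
        candidates.append(answer)
        # Each solver step is rewarded on its own answer.
        episode.steps.append(
            Step("solver", prompt, answer, exact_match_reward(answer, ground_truth))
        )

    listing = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    judge_prompt = (
        f"Question: {question}\nCandidates:\n{listing}\n"
        "Reply with the index of the best candidate."
    )
    choice = judge(judge_prompt).strip()

    # Fall back to the first candidate if the reply is not a valid index.
    index = int(choice) if choice.isdecimal() and int(choice) < len(candidates) else 0

    episode.final_answer = candidates[index]
    episode.reward = exact_match_reward(episode.final_answer, ground_truth)
    # The judge step is rewarded on the answer it selected.
    episode.steps.append(Step("judge", judge_prompt, choice, episode.reward))
    return episode


# Demo with stub models: the second solver attempt is correct and the judge picks it.
_answers = iter(["5", "4"])
demo = solver_judge_workflow(
    "2 + 2 = ?", "4", solver=lambda p: next(_answers), judge=lambda p: "1"
)
print(demo.final_answer, demo.reward, [s.reward for s in demo.steps])
```

In this sketch the solver and judge are ordinary callables, so the same function could run against a production model client or a training-time rollout engine; how rLLM actually wires this up is described in the project's own documentation and examples, not here.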

Key Features in v0.2

- Support for the official verl==0.5.0 as the training backend; a custom verl fork is no longer required. verl==0.5.0 brings the following features, which are now available in rLLM (@kylemontgomery1):
  - Megatron training support (@jeewoo-lee)
  - SGLang as the rollout engine, in addition to vLLM
- New AgentWorkflowEngine, which accepts arbitrary agentic programs for training (@kylemontgomery1)
- Support for more agents and environments:
  - Terminus and TerminalBench (@JasonWei05)
  - Tongyi DeepResearch agent (@yayashuxue)
  - AppWorld and AppWorldReactAgent (@sunan135)
- Integration with other agentic frameworks and SDKs:
  - Strands SDK from AWS
  - SmolAgents

What's Changed

fix <tool_calls_begin> variable by @wj-Mcat in #142
Fix not registered license from code by @annyan09023 in #144
fix r2egym import error; update installation README by @jeffreysijuntan in #146
update deepscaler max_prompt_length to avoid exception during training by @jeffreysijuntan in #148
fix(syntax): Resolve invalid escape sequence warnings by @tonyz0x0 in #154
added Tools for SFT by @mananroongta in #160
update docs by @jeffreysijuntan in #167
Add dark mode to docs by @philippnormann in #168
[FIX] Fix tool calling result parsing problem in trajectory visualizer & MCP tool name fixing by @VincentXWD in #174
[hotfix][miniwob] Fix gymnasium.error.NameNotFound by @abrohamLee in #172
Load full DeepCoder dataset, instead of LCB subset by @mananroongta in #178
[feat][docker] Installation with Docker by @abrohamLee in #177
Add macOS compatibility: exclude GPU dependencies on darwin by @yayashuxue in #180
Torch 2.7.0 only compatible with MacOS python=3.11 by @yayashuxue in #184
Migrate to verl v0.5.0 by @kylemontgomery1 in #193
Terminal Bench Integration into rLLM (Simplified) by @JasonWei05 in #205
feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by @yayashuxue in #206
Add VimGolf agent training example by @James4Ever0 in #209
fix: update search engine source data path by @noiji in #216
[feature] Adding Megatron support for v0.2 by @jeewoo-lee in #221
Use RolloutEngine for single_turn_workflow.py by @1stprinciple in #223
Standalone inference: remove hard verl dependency by @JasonWei05 in #228
Update pyproject.toml to v0.2.0 by @NIL-zhuang in #229
proper handling the case that next_observation is empty dict by @erranlli in #233
[v0.2] Add lazy import to fix circular import and ray init config support by @listar2000 in #236
v0.2 verl patch by @kylemontgomery1 in #237
v0.2 masking/parsing fix by @kylemontgomery1 in #238
v0.2 rollout upgrade by @kylemontgomery1 in #241
Feat: deepresearch integration by @yayashuxue in #215
workflow updates by @kylemontgomery1 in #244
added colab example of solver judge by @jeewoo-lee in #246
v0.2 misc changes by @kylemontgomery1 in #245
Add FireworksEngine for disaggregated rollout by @1stprinciple in #243
AppWorld Integration for rLLM by @sunan135 in #235
V0.2 by @jeffreysijuntan in #247
update solver judge workflow by @kylemontgomery1 in #248
update install instructions, update solver judge notebook by @kylemontgomery1 in #249

New Contributors

@wj-Mcat made their first contribution in #142
@annyan09023 made their first contribution in #144
@tonyz0x0 made their first contribution in #154
@mananroongta made their first contribution in #160
@philippnormann made their first contribution in #168
@VincentXWD made their first contribution in #174
@abrohamLee made their first contribution in #172
@yayashuxue made their first contribution in #180
@kylemontgomery1 made their first contribution in #193
@JasonWei05 made their first contribution in #205
@James4Ever0 made their first contribution in #209
@noiji made their first contribution in #216
@jeewoo-lee made their first contribution in #221
@1stprinciple made their first contribution in #223
@NIL-zhuang made their first contribution in #229
@erranlli made their first contribution in #233
@listar2000 made their first contribution in #236
@sunan135 made their first contribution in #235

Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0
