This is the official codebase for the paper RLVR-World: Training World Models with Reinforcement Learning.
Give it a star 🌟 if you find our work useful!
- 🚩 2025.05.26: We release all models and datasets.
- 🚩 2025.05.21: We open-source our training codes.
- 🚩 2025.05.21: Our paper is released on arXiv.
We pioneer training world models through RLVR (reinforcement learning with verifiable rewards):
- World models across various modalities (particularly, language and videos) are unified under a sequence modeling formulation;
- Task-specific prediction metrics serve as verifiable rewards directly optimized by RL.
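A minimal sketch of how a prediction metric can serve as a verifiable reward. It assumes a GRPO-style setup, where several predictions are sampled for the same transition, each is scored against the ground-truth next state, and the scores are normalized within the group to give advantages. That setup is an assumption for illustration, and `verifiable_rewards`, `group_advantages`, and `exact_match` are hypothetical helpers, not functions from this codebase:

```python
from statistics import mean, pstdev
from typing import Callable, List


def verifiable_rewards(predictions: List[str], target: str,
                       metric: Callable[[str, str], float]) -> List[float]:
    """Score every sampled prediction against the ground-truth next state."""
    return [metric(pred, target) for pred in predictions]


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of samples (GRPO-style, assumed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def exact_match(pred: str, target: str) -> float:
    """Hypothetical metric: 1.0 if the predicted state matches exactly."""
    return 1.0 if pred.strip() == target.strip() else 0.0


if __name__ == "__main__":
    samples = ["door: open", "door: closed", "door: open", "door: locked"]
    rewards = verifiable_rewards(samples, "door: open", exact_match)
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print(group_advantages(rewards))
```

For video world models the same pattern would apply with a frame-level metric in place of `exact_match`; which metric the paper actually uses is not stated here.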
At the moment, we provide the following models and datasets:
Modality | Type | Domain | Name |
---|---|---|---|
Language | Dataset | Text game | bytesized32-world-model-cot |
Language | World model | Text game | bytesized32-world-model-sft |
Language | World model | Text game | bytesized32-world-model-rlvr-binary-reward |
Language | World model | Text game | bytesized32-world-model-rlvr-task-specific-reward |
Language | Dataset | Web navigation | webarena-world-model-cot |
Language | World model | Web navigation | webarena-world-model-sft |
Language | World model | Web navigation | webarena-world-model-rlvr |
Video | Tokenizer | Robot manipulation | rt1-frame-tokenizer |
Video | World model | Robot manipulation | rt1-world-model-single-step-base |
Video | World model | Robot manipulation | rt1-world-model-single-step-rlvr |
Video | Tokenizer | Robot manipulation | rt1-compressive-tokenizer |
Video | World model | Robot manipulation | rt1-world-model-multi-step-base |
Video | World model | Robot manipulation | rt1-world-model-multi-step-rlvr |
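The text-game models come in a binary-reward and a task-specific-reward variant. A minimal sketch of the difference, assuming a hypothetical state format (object name mapped to a dict of properties). `property_accuracy_reward` is an illustrative stand-in for a finer-grained metric and not necessarily the reward used in the paper:

```python
from typing import Any, Dict

# Hypothetical state format: object name -> {property name: value}
State = Dict[str, Dict[str, Any]]


def binary_reward(pred: State, target: State) -> float:
    """1.0 only if the entire predicted state equals the ground truth."""
    return 1.0 if pred == target else 0.0


def property_accuracy_reward(pred: State, target: State) -> float:
    """Hypothetical finer-grained reward: fraction of ground-truth
    (object, property) values that the prediction gets right.
    Extra objects or properties in `pred` are not penalized here."""
    total = correct = 0
    for obj, props in target.items():
        for key, value in props.items():
            total += 1
            if pred.get(obj, {}).get(key) == value:
                correct += 1
    return correct / total if total else 1.0


if __name__ == "__main__":
    target = {"stove": {"isOn": True, "temperature": 100},
              "pot": {"containsWater": True}}
    pred = {"stove": {"isOn": True, "temperature": 20},
            "pot": {"containsWater": True}}
    print(binary_reward(pred, target))             # 0.0
    print(property_accuracy_reward(pred, target))  # 0.6666666666666666
```

The binary reward gives no credit to a prediction that is almost right, while a partial-credit metric does, which is one plausible motivation for comparing the two variants.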
See `lang_wm` for:
- Text game state prediction
- Web page state prediction
- Application: Model predictive control for web agents
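A minimal sketch of model predictive control with a language world model, assuming a depth-one lookahead. Each candidate action is "imagined" with the world model instead of being executed on the live website, and the action whose predicted next page scores best is chosen. The callables `propose_actions`, `world_model`, and `score_state` are hypothetical placeholders for an LLM policy, a web-navigation world model, and a goal-progress judge:

```python
from typing import Callable, List, Tuple


def mpc_select_action(
    state: str,
    goal: str,
    propose_actions: Callable[[str], List[str]],
    world_model: Callable[[str, str], str],
    score_state: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Depth-one lookahead: imagine each candidate action with the world
    model and return the one whose predicted next page scores best."""
    candidates = propose_actions(state)
    if not candidates:
        raise ValueError("no candidate actions proposed")
    best_action, best_score = candidates[0], float("-inf")
    for action in candidates:
        predicted_next = world_model(state, action)  # imagined, not executed
        score = score_state(predicted_next, goal)
        if score > best_score:
            best_action, best_score = action, score
    return best_action, best_score


if __name__ == "__main__":
    # Toy stand-ins for an LLM policy, a language world model, and a judge.
    toy_pages = {
        "click [Cart]": "Shopping cart page",
        "click [Help]": "Help center page",
        "type [search] shoes": "Search results for shoes",
    }
    action, score = mpc_select_action(
        state="Home page",
        goal="find shoes",
        propose_actions=lambda s: list(toy_pages),
        world_model=lambda s, a: toy_pages[a],
        score_state=lambda page, g: float(
            sum(word in page.lower() for word in g.lower().split())),
    )
    print(action, score)  # type [search] shoes 1.0
```

Deeper lookahead or sampling multiple predicted pages per action are natural extensions; this sketch keeps one step for clarity.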
See `vid_wm` for:
- Robot manipulation trajectory prediction
- Application: Real2Sim policy evaluation
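A minimal sketch of Real2Sim policy evaluation, assuming a hypothetical interface. Starting from real initial observations, the policy is rolled out inside the learned world model for a fixed horizon, and the success rate is estimated from the imagined final frames. In practice the frames would be images generated by the video world model and `is_success` might be a human label or a learned classifier; all names here are illustrative:

```python
from typing import Any, Callable, Sequence


def real2sim_success_rate(
    initial_frames: Sequence[Any],
    policy: Callable[[Any], Any],
    world_model: Callable[[Any, Any], Any],
    is_success: Callable[[Any], bool],
    horizon: int,
) -> float:
    """Estimate a policy's success rate by rolling it out inside a learned
    world model, starting from real initial observations."""
    if not initial_frames:
        raise ValueError("need at least one initial frame")
    successes = 0
    for frame in initial_frames:
        for _ in range(horizon):
            action = policy(frame)
            frame = world_model(frame, action)  # imagined next observation
        successes += int(is_success(frame))
    return successes / len(initial_frames)


if __name__ == "__main__":
    # Toy stand-in: a "frame" is a 1-D gripper position; success means x >= 5.
    rate = real2sim_success_rate(
        initial_frames=[0, 2, 4],
        policy=lambda x: 1 if x < 5 else 0,
        world_model=lambda x, a: x + a,
        is_success=lambda x: x >= 5,
        horizon=3,
    )
    print(rate)  # 0.6666666666666666
```

The estimate is only as reliable as the world model's multi-step predictions, which is why rollout accuracy over long horizons matters for this application.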
Release progress:
- Video world model with RLVR
- Pre-trained & post-trained video world model weights
- Real2Sim policy evaluation with video world models
- Text game SFT data
- Web page SFT data
- Language world model on text games with RLVR
- Language world model on web pages with RLVR
- Post-trained language world model weights
- Web agents with language world models
If you find this project useful, please cite our paper as:
```bibtex
@article{wu2025rlvr,
  title={RLVR-World: Training World Models with Reinforcement Learning},
  author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
  journal={arXiv preprint arXiv:2505.13934},
  year={2025},
}
```
If you have any questions, please contact [email protected].
We sincerely thank the following GitHub repositories, whose valuable codebases we build upon: