RLVR-World: Training World Models with Reinforcement Learning

This is the official codebase for the paper RLVR-World: Training World Models with Reinforcement Learning.

Give it a star 🌟 if you find our work useful!

🔥 News

🚩 2025.05.26: We release all models and datasets.
🚩 2025.05.21: We open-source our training code.
🚩 2025.05.21: Our paper is released on arXiv.

📋 TL;DR

We pioneer training world models through reinforcement learning with verifiable rewards (RLVR):

World models across modalities (in particular, language and video) are unified under a sequence modeling formulation;
Task-specific prediction metrics serve as verifiable rewards that RL optimizes directly.
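
The two points above can be illustrated with a minimal sketch. It is not the repository's implementation: the token layout in `build_sequence` and the metric in `token_accuracy_reward` are hypothetical stand-ins chosen for illustration, and the actual metrics are defined in the paper and in the lang_wm and vid_wm code. The sketch only shows the idea that a prediction, once expressed as a token sequence, can be scored against the ground-truth next state either with a binary reward or with a graded, task-specific one.

```python
from typing import List, Sequence


def build_sequence(state_tokens: Sequence[int], action_tokens: Sequence[int]) -> List[int]:
    """Concatenate state and action tokens into one model input (hypothetical layout)."""
    return list(state_tokens) + list(action_tokens)


def binary_reward(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Return 1.0 only if the predicted next state matches the target exactly."""
    return 1.0 if list(predicted) == list(target) else 0.0


def token_accuracy_reward(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Illustrative graded metric: share of positions where prediction and target agree."""
    longest = max(len(predicted), len(target))
    if longest == 0:
        return 1.0  # two empty sequences agree trivially
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / longest


prompt = build_sequence([1, 2, 3], [9])   # state tokens followed by action tokens
target = [1, 2, 4]                        # ground-truth next-state tokens
prediction = [1, 2, 5]                    # a sampled model prediction

print(prompt)                                     # [1, 2, 3, 9]
print(binary_reward(prediction, target))          # 0.0
print(token_accuracy_reward(prediction, target))  # two of three positions match
```

A graded reward of this kind gives partial credit to near-miss predictions, whereas the binary reward does not distinguish between a prediction that is almost right and one that is entirely wrong.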

🤗 Models and Datasets

At the moment, we provide the following models and datasets:

Modality	Type	Domain	Name
Language	Dataset	Text game	bytesized32-world-model-cot
Language	World model	Text game	bytesized32-world-model-sft
Language	World model	Text game	bytesized32-world-model-rlvr-binary-reward
Language	World model	Text game	bytesized32-world-model-rlvr-task-specific-reward
Language	Dataset	Web navigation	webarena-world-model-cot
Language	World model	Web navigation	webarena-world-model-sft
Language	World model	Web navigation	webarena-world-model-rlvr
Video	Tokenizer	Robot manipulation	rt1-frame-tokenizer
Video	World model	Robot manipulation	rt1-world-model-single-step-base
Video	World model	Robot manipulation	rt1-world-model-single-step-rlvr
Video	Tokenizer	Robot manipulation	rt1-compressive-tokenizer
Video	World model	Robot manipulation	rt1-world-model-multi-step-base
Video	World model	Robot manipulation	rt1-world-model-multi-step-rlvr

💬 Evaluating Language World Models

See lang_wm:

Text game state prediction
Web page state prediction
Application: Model predictive control for web agents
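
As a rough illustration of the model predictive control application listed above, the sketch below picks an action by asking a world model to predict the next state for each candidate and keeping the best-scoring one. Everything in it is an assumption made for illustration: `select_action`, `toy_world_model`, and `toy_score` are hypothetical names, and the toy integer state stands in for a web page state. The actual agent pipeline lives in lang_wm.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def select_action(
    state: State,
    candidates: Sequence[Action],
    world_model: Callable[[State, Action], State],
    score: Callable[[State], float],
) -> Action:
    """One-step lookahead: predict each candidate's next state, return the best-scoring action."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda action: score(world_model(state, action)))


def toy_world_model(state: int, action: str) -> int:
    """Hypothetical stand-in for a learned world model: the state is a page index."""
    return state + {"next": 1, "back": -1, "stay": 0}[action]


def toy_score(state: int) -> float:
    """Hypothetical goal score: states closer to page 3 score higher."""
    return -abs(3 - state)


best = select_action(2, ["back", "stay", "next"], toy_world_model, toy_score)
print(best)  # next
```

In a real agent the candidates would come from a policy model and the score from a goal-conditioned evaluator; the selection loop itself stays the same.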

🎇 Evaluating Video World Models

See vid_wm:

Robot manipulation trajectory prediction
Application: Real2Sim policy evaluation

🎥 Showcases

🚀 Release Progress

Video world model with RLVR
Pre-trained & post-trained video world model weights
Real2sim policy evaluation with video world models
Text game SFT data
Web page SFT data
Language world model on text games with RLVR
Language world model on web pages with RLVR
Post-trained language world model weights
Web agents with language world models

📜 Citation

If you find this project useful, please cite our paper as:

@article{wu2025rlvr,
    title={RLVR-World: Training World Models with Reinforcement Learning}, 
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
    journal={arXiv preprint arXiv:2505.13934},
    year={2025},
}

🤝 Contact

If you have any questions, please contact [email protected].

💡 Acknowledgement

We sincerely thank the following GitHub repositories, whose codebases we build upon:
