thuml/RLVR-World


RLVR-World: Training World Models with Reinforcement Learning

Project Page | Paper | Hugging Face

This is the official codebase for the paper RLVR-World: Training World Models with Reinforcement Learning.

Give it a star 🌟 if you find our work useful!

🔥 News

  • 🚩 2025.05.26: We release all models and datasets.
  • 🚩 2025.05.21: We open-source our training code.
  • 🚩 2025.05.21: Our paper is released on arXiv.

📋 TL;DR

We pioneer training world models through reinforcement learning with verifiable rewards (RLVR):

  • World models across various modalities (particularly, language and videos) are unified under a sequence modeling formulation;
  • Task-specific prediction metrics serve as verifiable rewards directly optimized by RL.
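To make the second point concrete, here is a minimal sketch of how a prediction metric can double as a verifiable reward. It is an illustration only, not the reward code in this repository: the function names `binary_reward` and `neg_mse_reward` are hypothetical, and negative mean squared error stands in for whatever task-specific metric a given domain actually uses.

```python
from typing import Sequence


def binary_reward(predicted: str, target: str) -> float:
    """Hypothetical binary reward: 1.0 when the predicted next state matches
    the ground truth after whitespace normalization, otherwise 0.0."""
    return 1.0 if predicted.split() == target.split() else 0.0


def neg_mse_reward(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical metric-based reward: negative mean squared error, so that
    a more accurate prediction earns a higher reward."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(target) == 0:
        raise ValueError("target must not be empty")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return -sum(squared_errors) / len(target)
```

Both functions need only the model's prediction and the recorded ground truth, which is what makes the reward verifiable: no learned reward model is involved.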

[Figure: concept]

🤗 Models and Datasets

At the moment, we provide the following models and datasets:

| Modality | Type | Domain | Name |
| --- | --- | --- | --- |
| Language | Dataset | Text game | bytesized32-world-model-cot |
| Language | World model | Text game | bytesized32-world-model-sft |
| Language | World model | Text game | bytesized32-world-model-rlvr-binary-reward |
| Language | World model | Text game | bytesized32-world-model-rlvr-task-specific-reward |
| Language | Dataset | Web navigation | webarena-world-model-cot |
| Language | World model | Web navigation | webarena-world-model-sft |
| Language | World model | Web navigation | webarena-world-model-rlvr |
| Video | Tokenizer | Robot manipulation | rt1-frame-tokenizer |
| Video | World model | Robot manipulation | rt1-world-model-single-step-base |
| Video | World model | Robot manipulation | rt1-world-model-single-step-rlvr |
| Video | Tokenizer | Robot manipulation | rt1-compressive-tokenizer |
| Video | World model | Robot manipulation | rt1-world-model-multi-step-base |
| Video | World model | Robot manipulation | rt1-world-model-multi-step-rlvr |

💬 Evaluating Language World Models

See lang_wm:

  • Text game state prediction
  • Web page state prediction
  • Application: Model predictive control for web agents
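The general idea behind model predictive control with a world model can be sketched as follows: for each candidate action, ask the world model for the predicted next state, score that prediction, and act on the best one. This is a toy, one-step illustration under stated assumptions, not the agent code in lang_wm. The names `choose_action`, `toy_world_model`, and `toy_score` are hypothetical, and the integer "state" stands in for a web page state.

```python
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def choose_action(
    state: State,
    candidates: Iterable[Action],
    world_model: Callable[[State, Action], State],
    score: Callable[[State], float],
) -> Action:
    """One-step lookahead: return the candidate action whose predicted
    next state receives the highest score."""
    options = list(candidates)
    if not options:
        raise ValueError("at least one candidate action is required")
    return max(options, key=lambda action: score(world_model(state, action)))


# Toy stand-ins (assumptions for illustration only).
def toy_world_model(state: int, action: int) -> int:
    """Pretend dynamics: the action is added to the state."""
    return state + action


def toy_score(next_state: int) -> float:
    """Pretend task score: closer to the goal value 5 is better."""
    return -abs(next_state - 5)


best = choose_action(3, [-1, 1, 2], toy_world_model, toy_score)
```

In a real web agent the world model would be a language model predicting the next page state and the score would reflect task progress; the selection loop stays the same.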

🎇 Evaluating Video World Models

See vid_wm:

  • Robot manipulation trajectory prediction
  • Application: Real2Sim policy evaluation

🎥 Showcases

[Figure: showcase]

🚀 Release Progress

  • Video world model with RLVR
  • Pre-trained & post-trained video world model weights
  • Real2sim policy evaluation with video world models
  • Text game SFT data
  • Web page SFT data
  • Language world model on text games with RLVR
  • Language world model on web pages with RLVR
  • Post-trained language world model weights
  • Web agents with language world models

📜 Citation

If you find this project useful, please cite our paper as:

@article{wu2025rlvr,
    title={RLVR-World: Training World Models with Reinforcement Learning}, 
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
    journal={arXiv preprint arXiv:2505.13934},
    year={2025},
}

🤝 Contact

If you have any questions, please contact [email protected].

💡 Acknowledgement

We sincerely thank the following GitHub repos, whose valuable codebases we build upon: