Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, Li Fei-Fei
Conference on Robot Learning (CoRL) 2024
[Website] [arXiv] [PDF] [TRANSIC-Envs] [Model Weights] [Training Data] [Model Card] [Data Card]
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort.
First, follow the instructions to create a virtual environment, install IsaacGym, and install our simulation codebase TRANSIC-Envs.
Now clone this repo and install it.
git clone https://github.com/transic-robot/transic
cd transic
pip3 install -e .
Optionally, if you would like to use our model checkpoints and training data, download them from 🤗Hugging Face.
git clone https://huggingface.co/transic-robot/models transic-models
git clone https://huggingface.co/datasets/transic-robot/data transic-data
The basic syntax to launch teacher policy RL training is
python3 main/rl/train.py task=<task_name> num_envs=<num_of_parallel_envs> \
sim_device=cuda:<gpu_id> rl_device=cuda:<gpu_id> graphics_device_id=<gpu_id>
You need to replace anything within <> with suitable values. For example, you can select task_name with one from here.
Tip
You may need to tune the number of parallel envs num_envs=<num_of_parallel_envs> depending on your GPU memory to achieve the maximum throughput.
Tip
You may use wandb to log experiments. To do this, add wandb_activate=true to the command and specify your wandb username and project name through wandb_entity=<your_wandb_user_name> wandb_project=<your_wandb_project_name>.
The training command will create a folder called runs/{experiment_name} under the current directory, where you can find the training config and saved checkpoints.
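If you launch many runs, it can help to assemble the key=value overrides programmatically. The following is a minimal sketch, not part of the codebase: the helper name build_train_command is hypothetical, and the values 1024 and GPU 0 are arbitrary placeholders. It only builds the argument list shown above and does not start training.

```python
import shlex


def build_train_command(task, num_envs, gpu_id, extra=None):
    """Assemble the teacher-training command as an argv list.

    Hypothetical helper: it mirrors the key=value overrides from the
    command above and nothing more.
    """
    overrides = {
        "task": task,
        "num_envs": num_envs,
        "sim_device": f"cuda:{gpu_id}",
        "rl_device": f"cuda:{gpu_id}",
        "graphics_device_id": gpu_id,
    }
    # Optional extras, e.g. wandb_activate=True or wandb_project="my_project".
    overrides.update(extra or {})

    argv = ["python3", "main/rl/train.py"]
    for key, value in overrides.items():
        # The commands in this README spell booleans in lowercase.
        if isinstance(value, bool):
            value = str(value).lower()
        argv.append(f"{key}={value}")
    return argv


if __name__ == "__main__":
    # 1024 envs on GPU 0 is an arbitrary example, not a recommended setting.
    cmd = build_train_command("Stabilize", num_envs=1024, gpu_id=0,
                              extra={"wandb_activate": True})
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run from the repository root once the placeholder values are replaced with your own.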
To test a checkpoint, run the following command.
python3 main/rl/train.py task=<task_name> num_envs=<num_of_parallel_envs> \
test=true checkpoint=<path_to_your_checkpoint>
Tip
To visualize a trained policy, use either display=true or headless=false. The first option will pop up an OpenCV window showing the env-level workspace from a frontal view. This doesn't require a physical monitor attached. The second option will open the IsaacGym GUI and you will see all parallel environments. This REQUIRES a physical monitor connected to your workstation.
Tip
You can also log policy rollouts as mp4 videos to your wandb. Simply add capture_video=true to the test command.
We use trained teacher policies to generate data for student policies. To do so, simply run the following command.
python3 main/rl/train.py task=<task_name> num_envs=<num_of_parallel_envs> \
test=true checkpoint=<path_to_your_checkpoint> \
save_rollouts=true
Rollouts will be saved in runs/{experiment_name}/rollouts.hdf5.
Tip
By default, this will generate 10K successful trajectories. Each trajectory will have a minimum length of 20 steps. You can change these behaviors by setting save_successful_rollouts_only, num_rollouts_to_save, and min_episode_length.
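The three options above can be read as a simple filter over finished episodes. The sketch below illustrates that reading only; the Episode container and the select_rollouts function are hypothetical, and the actual saving logic in the codebase may differ in detail. The defaults follow the values stated in the tip (10K rollouts, minimum length of 20 steps, successful rollouts only).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Episode:
    """Hypothetical stand-in for one finished rollout."""
    length: int    # number of steps in the episode
    success: bool  # whether the task was completed


def select_rollouts(
    episodes: Iterable[Episode],
    num_rollouts_to_save: int = 10_000,
    min_episode_length: int = 20,
    save_successful_rollouts_only: bool = True,
) -> List[Episode]:
    """Keep episodes that pass the filters until the quota is reached."""
    kept: List[Episode] = []
    for ep in episodes:
        if len(kept) >= num_rollouts_to_save:
            break  # quota reached, stop collecting
        if save_successful_rollouts_only and not ep.success:
            continue  # drop failed episodes
        if ep.length < min_episode_length:
            continue  # drop episodes that are too short
        kept.append(ep)
    return kept
```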
We provide weights for trained RL teachers. To use them, replace checkpoint with the suitable path. For example,
python3 main/rl/train.py task=Stabilize \
test=true checkpoint=<path_to_transic-models/rl/stabilize.pth> \
save_rollouts=true
We also provide pre-generated data for student distillation. They can be found in the distillation folder from our 🤗Hugging Face data repository.
The basic syntax to launch student policy distillation is
python3 main/distillation/train.py task=<task_name> distillation_student_arch=<arch> \
bs=<batch_size> num_envs=<num_of_parallel_envs> exp_root_dir=<where_to_log_experiment> \
data_path=<path_to_hdf5_file> matched_scene_data_path=<path_to_matched_scene_data> \
sim_device=cuda:<gpu_id> rl_device=cuda:<gpu_id> graphics_device_id=<gpu_id> gpus=\[<gpus>\] \
wandb_project=<your_wandb_project_name>
Similarly, you need to replace anything within <> with suitable values. For example, you can select task_name with one from here, but make sure it has the PCD suffix since you are training student policies with visual observations. You can select either pointnet or rnn_pointnet for the policy architecture. You may need to tune the batch size bs and the number of parallel environments num_envs to fit into your GPU. The exp_root_dir specifies where you would like to log the experiment. The data_path is where your generated rollouts are saved. The matched_scene_data_path points to a static, fixed dataset we used to regularize the point cloud encoder. It can be found as distillation/matched_point_cloud_scenes.h5 from our 🤗Hugging Face data repository.
Warning
By default, we add data randomization during the distillation. You may opt to set module.enable_pcd_augmentation=false to turn off point cloud augmentation and module.enable_prop_augmentation=false to turn off proprioception augmentation. However, this will lead to suboptimal student policies that are not robust enough for sim-to-real transfer.
Tip
The argument gpus specifies the devices to use for distillation and follows the same syntax as in PyTorch Lightning. Other device-related arguments such as sim_device, rl_device, and graphics_device_id control which GPU IsaacGym uses. GPUs for distillation and simulation do not need to be the same. In fact, we also support multi-GPU distillation with IsaacGym running on another GPU for evaluation.
The experiment will be logged at exp_root_dir, where you can find the saved config, logs, tensorboard, and checkpoints. Since we periodically switch between training and simulation evaluation, policies are saved based on their success rates. You can find weights of our student policies in the folder student from our 🤗Hugging Face model repository.
To test and visualize trained student policies, run the following command.
python3 main/distillation/test.py task=<task_name> distillation_student_arch=<arch> \
bs=null num_envs=<num_of_parallel_envs> exp_root_dir=<where_to_log_experiment> \
data_path=null matched_scene_data_path=null \
test.ckpt_path=<path_to_student_policy> display=true
Once we have the simulation base policy, we deploy it on a real robot while a human operator monitors its execution. The human operator intervenes in the policy execution when necessary and provides corrections through teleoperation. To collect such correction data, check out the script
python3 main/correction_data_collection.py \
--base-policy-ckpt-path <path_to_simulation_base_policy_ckpt> \
--data-save-path <where_to_save_correction_data>
Note that the real-world observation pipeline and real-robot controller may differ across groups. Therefore, you have to fill in the instantiation of these two components in the script. In our case, we use deoxys as our robot controller. We provide an example of an observation pipeline here.
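To make the two components you need to fill in more concrete, here is a minimal sketch of a human-in-the-loop collection loop. Every name in it (ObservationPipeline, RobotController, HumanOperator, collect_corrections, and the record keys) is hypothetical and chosen for illustration. It does not reflect the deoxys API or the actual structure of main/correction_data_collection.py.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol


class ObservationPipeline(Protocol):
    """Hypothetical interface: returns the current real-world observation."""
    def get_observation(self) -> Any: ...


class RobotController(Protocol):
    """Hypothetical interface: sends one action to the real robot."""
    def execute(self, action: Any) -> None: ...


class HumanOperator(Protocol):
    """Hypothetical interface: returns a teleoperated action, or None
    when the operator lets the base policy continue."""
    def get_correction(self, obs: Any, base_action: Any) -> Optional[Any]: ...


def collect_corrections(
    base_policy: Callable[[Any], Any],
    obs_pipeline: ObservationPipeline,
    controller: RobotController,
    operator: HumanOperator,
    max_steps: int,
) -> List[Dict[str, Any]]:
    """Run the base policy and record where the human intervened."""
    records: List[Dict[str, Any]] = []
    for _ in range(max_steps):
        obs = obs_pipeline.get_observation()
        base_action = base_policy(obs)
        correction = operator.get_correction(obs, base_action)
        intervened = correction is not None
        # Execute the human action when given, otherwise the policy action.
        controller.execute(correction if intervened else base_action)
        records.append({
            "obs": obs,
            "base_action": base_action,
            "human_action": correction,
            "intervened": intervened,
        })
    return records
```

Keeping the observation pipeline and the controller behind small interfaces like these is one way to swap in lab-specific hardware without touching the loop itself.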
We provide correction data we collected during the project in the correction_data folder from our 🤗Hugging Face data repository.
Once we have enough correction data, we can train residual policies with two steps. First, we only learn the residual action head.
python3 main/residual/train.py residual_policy_arch=<arch> \
data_dir=<correction_data_path> exp_root_dir=<where_to_log_experiment> \
residual_policy_task=<task> \
gpus=<gpus> bs=<batch_size> \
module.intervention_pred_loss_weight=0.0 \
wandb_project=<your_wandb_project_name>
For residual_policy_task, use insert for the task Insert and default for others.
We then freeze everything and only learn the head to predict intervention or not.
python3 main/residual/train.py residual_policy_arch=<arch> \
data_dir=<correction_data_path> exp_root_dir=<where_to_log_experiment> \
residual_policy_task=<task> \
gpus=<gpus> bs=<batch_size> \
module.residual_policy.update_intervention_head_only=True \
module.residual_policy.ckpt_path_if_update_intervention_head_only=<path_to_ckpt_from_the_first_step> \
wandb_project=<your_wandb_project_name>
Note
Residual policies can also be trained in a single step where both the action and intervention prediction heads are jointly learned. We found that the two-step method leads to overall better residual policies.
Our trained residual policies can be found in the folder residual from our 🤗Hugging Face model repository.
Once we have both the simulation base policy and the residual policy, we can integrate them for successful sim-to-real transfer. Check out the script
python3 main/integrated_deployment.py \
--base-policy-ckpt-path <path_to_simulation_base_policy_ckpt> \
--residual-policy-ckpt-path <path_to_residual_policy_ckpt>
Similarly, you need to fill in the instantiation for real-world observation pipeline and the real-robot controller.
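Conceptually, the integrated policy combines the base action with a residual correction whenever the intervention head predicts that a correction is needed. The sketch below shows that idea under simplifying assumptions: actions are flat lists of floats, the residual is applied by plain addition, and the 0.5 threshold is an arbitrary choice. The function name integrated_action and the policy signatures are hypothetical, so consult main/integrated_deployment.py for the actual composition.

```python
from typing import Any, Callable, List, Sequence, Tuple

BasePolicy = Callable[[Any], Sequence[float]]
# Assumed signature: returns (residual action, predicted intervention probability).
ResidualPolicy = Callable[[Any, Sequence[float]], Tuple[Sequence[float], float]]


def integrated_action(
    obs: Any,
    base_policy: BasePolicy,
    residual_policy: ResidualPolicy,
    threshold: float = 0.5,  # assumed gating threshold, not taken from the paper
) -> List[float]:
    """Return the base action, corrected by the residual when gated on."""
    base = base_policy(obs)
    residual, p_intervene = residual_policy(obs, base)
    if p_intervene < threshold:
        # No intervention predicted: execute the simulation policy as is.
        return list(base)
    # Intervention predicted: apply the residual (element-wise addition
    # is a simplification made for this sketch).
    return [b + r for b, r in zip(base, residual)]
```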
We would like to acknowledge the following open-source project that greatly inspired our development.
Our paper is posted on arXiv. If you find our work useful, please consider citing us!
@inproceedings{jiang2024transic,
title = {TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction},
author = {Yunfan Jiang and Chen Wang and Ruohan Zhang and Jiajun Wu and Li Fei-Fei},
booktitle = {Conference on Robot Learning},
year = {2024}
}
This codebase is released under the MIT License.