Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis

COMPASS: Cross-Embodiment Mobility Policy via Residual RL and Skill Synthesis

Overview

This repository provides the official PyTorch implementation of COMPASS.

COMPASS is a novel framework for cross-embodiment mobility that combines:

  • Imitation Learning (IL) for strong baseline performance
  • Residual Reinforcement Learning (RL) for embodiment-specific adaptation
  • Policy distillation to create a unified, generalist policy
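As a rough illustration of the residual RL component listed above: in the usual residual formulation, a learned correction is added to the action proposed by a base policy. The sketch below is not taken from this repository. The names `combine_actions`, `base_action`, `residual_action`, and `residual_scale` are hypothetical, and actions are assumed to be plain lists of floats.

```python
from typing import List


def combine_actions(
    base_action: List[float],
    residual_action: List[float],
    residual_scale: float = 1.0,
) -> List[float]:
    """Add a scaled residual correction to a base policy action.

    Assumption: both actions share the same dimensionality and units.
    """
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    # Element-wise sum: the base policy supplies the bulk of the command,
    # the residual policy supplies an embodiment-specific correction.
    return [b + residual_scale * r for b, r in zip(base_action, residual_action)]
```

With `residual_scale=0.0` the function returns the base action unchanged, which is a convenient way to sanity-check a base policy on its own.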

Installation

1. Isaac Lab Installation

  • Install Isaac Lab and the residual RL mobility extension by following these instructions.

2. Environment Setup

  • Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate

3. Dependencies

  • Install the required packages:
    ${ISAACLAB_PATH}/isaaclab.sh -p -m pip install -r requirements.txt

4. X-Mobility Installation

5. Residual RL environment USDs

Usage

Residual RL Specialists

  • Train with the default configurations in configs/train_config.gin:

    ${ISAACLAB_PATH}/isaaclab.sh -p run.py \
        -c configs/train_config.gin \
        -o <output_dir> \
        -b <path/to/x_mobility_ckpt> \
        --enable_camera
  • Evaluate trained model:

    ${ISAACLAB_PATH}/isaaclab.sh -p run.py \
        -c configs/eval_config.gin \
        -o <output_dir> \
        -b <path/to/x_mobility_ckpt> \
        -p <path/to/residual_policy_ckpt> \
        --enable_camera \
        --video \
        --video_interval <video_interval>

NOTE: GPU memory usage is proportional to the number of environments used in residual RL training. For example, 32 environments use around 30 GB of GPU memory, so reduce the number of environments if GPU memory is limited.
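The note above can be turned into a quick back-of-the-envelope estimate. The sketch below assumes strictly linear scaling from the single reference point given (32 environments, about 30 GB) and ignores any fixed overhead, so treat the result as a starting point rather than a guarantee. The function name `estimate_max_envs` is hypothetical and not part of this repository.

```python
import math

# Reference point taken from the note above.
REFERENCE_ENVS = 32
REFERENCE_MEMORY_GB = 30.0


def estimate_max_envs(available_gb: float) -> int:
    """Estimate how many environments fit in the given GPU memory.

    Assumption: memory grows linearly with the environment count and
    there is no fixed overhead, which is a simplification.
    """
    if available_gb <= 0:
        return 0
    return math.floor(available_gb * REFERENCE_ENVS / REFERENCE_MEMORY_GB)
```

For example, under these assumptions a 24 GB GPU would be estimated at 25 environments.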

Policy Distillation

  • Collect specialist data:

    • Update the specialist policy checkpoint paths in configs/distillation_dataset_config_template.yaml
    • Run data collection:
      ${ISAACLAB_PATH}/isaaclab.sh -p record.py \
          -c configs/distillation_dataset_config_template.yaml \
          -o <output_dir> \
          -b <path/to/x_mobility_ckpt> \
          --dataset-name <dataset_name>
  • Train generalist policy:

    python3 distillation_train.py \
        --config-files configs/distillation_config.gin \
        --dataset-path <path/to/specialists_dataset> \
        --output-dir <output_dir>
  • Evaluate generalist policy:

    ${ISAACLAB_PATH}/isaaclab.sh -p run.py \
        -c configs/eval_config.gin \
        -o <output_dir> \
        -b <path/to/x_mobility_ckpt> \
        -d <path/to/generalist_policy_ckpt> \
        --enable_camera \
        --video \
        --video_interval <video_interval>
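The specialist and generalist evaluation commands differ only in how the policy checkpoint is passed (`-p` versus `-d`). A small helper like the one below can assemble either command for scripted runs. It is a sketch that uses only the flags shown in this README. The function name `build_eval_command` and its parameters are hypothetical, and it assumes `-p` and `-d` are alternatives, which the commands above suggest but do not state.

```python
import os
from typing import List, Optional


def build_eval_command(
    output_dir: str,
    x_mobility_ckpt: str,
    specialist_ckpt: Optional[str] = None,
    generalist_ckpt: Optional[str] = None,
    video_interval: Optional[int] = None,
    isaaclab_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for an evaluation run."""
    # Assumption: exactly one of the two policy checkpoints is given.
    if (specialist_ckpt is None) == (generalist_ckpt is None):
        raise ValueError("pass exactly one of specialist_ckpt or generalist_ckpt")
    root = isaaclab_path or os.environ.get("ISAACLAB_PATH", "")
    cmd = [
        os.path.join(root, "isaaclab.sh"), "-p", "run.py",
        "-c", "configs/eval_config.gin",
        "-o", output_dir,
        "-b", x_mobility_ckpt,
    ]
    if specialist_ckpt is not None:
        cmd += ["-p", specialist_ckpt]  # residual RL specialist
    else:
        cmd += ["-d", generalist_ckpt]  # distilled generalist
    cmd.append("--enable_camera")
    if video_interval is not None:
        cmd += ["--video", "--video_interval", str(video_interval)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.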

Model Export

  • Export RL specialist policy to ONNX or JIT formats:

    python3 onnx_conversion.py \
        -b <path/to/x_mobility_ckpt> \
        -r <path/to/residual_policy_ckpt> \
        -o <path/to/output_onnx_file> \
        -j <path/to/output_jit_file>
  • Export generalist policy to ONNX or JIT formats:

    python3 onnx_conversion.py \
        -b <path/to/x_mobility_ckpt> \
        -g <path/to/generalist_policy_ckpt> \
        -e <embodiment_type> \
        -o <path/to/output_onnx_file> \
        -j <path/to/output_jit_file>

Add New Embodiment or Scene

  • Follow these instructions to add a new embodiment or scene to the Isaac Lab RL environment.
  • Register the new embodiment or scene in the EmbodimentEnvCfgMap and EnvSceneAssetCfgMap in run.py, then update the configs or use command-line arguments to select it.
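The registration step might look roughly like the sketch below. It assumes the two maps behave like dictionaries keyed by name; the actual key and value types in run.py may differ. The dictionaries here are stand-ins, and `MyRobotEnvCfg`, `MyWarehouseSceneCfg`, `register`, and the keys `my_robot` and `my_warehouse` are all hypothetical.

```python
# Stand-ins for the maps defined in run.py (assumed to be dict-like).
EmbodimentEnvCfgMap = {}
EnvSceneAssetCfgMap = {}


class MyRobotEnvCfg:
    """Hypothetical environment config for a new embodiment."""


class MyWarehouseSceneCfg:
    """Hypothetical scene asset config for a new scene."""


def register(cfg_map: dict, name: str, cfg: type) -> None:
    """Add an entry and refuse to silently overwrite an existing one."""
    if name in cfg_map:
        raise KeyError(f"'{name}' is already registered")
    cfg_map[name] = cfg


register(EmbodimentEnvCfgMap, "my_robot", MyRobotEnvCfg)
register(EnvSceneAssetCfgMap, "my_warehouse", MyWarehouseSceneCfg)
```

Once registered, the new names would be selected through the configs or the `--embodiment` and `--environment` arguments.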

Logging

The training and evaluation scripts use TensorBoard for logging by default. Weights & Biases (W&B) logging is also supported for more advanced experiment tracking features.

To use TensorBoard (default):

  • Logs will be saved to <output_dir>/tensorboard/
  • View logs with: tensorboard --logdir=<output_dir>/tensorboard/

To use Weights & Biases:

  1. Install and set up W&B: pip install wandb and follow the setup instructions
  2. Log in to your W&B account: wandb login
  3. Add the --logger wandb flag to your command:
    ${ISAACLAB_PATH}/isaaclab.sh -p run.py \
        -c configs/train_config.gin \
        -o <output_dir> \
        -b <path/to/x_mobility_ckpt> \
        --enable_camera \
        --logger wandb \
        --wandb-run-name "experiment_name" \
        --wandb-project-name "project_name" \
        --wandb-entity-name "your_username_or_team"

Pre-trained Generalist Policy Example

We provide a pre-trained generalist policy that works across four robot embodiments:

  • Carter (wheeled robot)
  • H1 (humanoid)
  • G1 (humanoid)
  • Spot (quadruped)

To try out the pre-trained generalist policy:

  1. Download the checkpoint from: https://huggingface.co/nvidia/COMPASS/blob/main/compass_generalist.ckpt
  2. Use the evaluation command shown above with your downloaded checkpoint:
    ${ISAACLAB_PATH}/isaaclab.sh -p run.py \
        -c configs/eval_config.gin \
        -o <output_dir> \
        -b <path/to/x_mobility_ckpt> \
        -d <path/to/downloaded_generalist_policy_ckpt> \
        --enable_camera \
        --embodiment <embodiment_name> \
        --environment <environment_name>

NOTE: The generalist policy uses one-hot embodiment encoding and may not generalize perfectly to unseen embodiment types. For best results with new embodiment types, we recommend fine-tuning with residual RL first.
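To make the note above concrete, the sketch below shows what a one-hot embodiment encoding looks like for the four embodiments listed earlier, and why an unseen embodiment has no slot in it. This is an illustration only: the ordering of `EMBODIMENTS`, the lowercase keys, and the function name `one_hot_embodiment` are assumptions, not the repository's actual encoding.

```python
from typing import List

# Assumption: ordering and spelling are illustrative, not the repo's.
EMBODIMENTS = ["carter", "h1", "g1", "spot"]


def one_hot_embodiment(name: str) -> List[float]:
    """Return a one-hot vector identifying a known embodiment."""
    key = name.lower()
    if key not in EMBODIMENTS:
        # An unseen embodiment has no dedicated entry in the vector,
        # which is why fine-tuning is recommended for new robots.
        raise ValueError(f"unknown embodiment: {name}")
    return [1.0 if e == key else 0.0 for e in EMBODIMENTS]
```

Because the vector length is fixed by the embodiments seen during distillation, supporting a new robot means adding a slot and training again rather than passing a new name at inference time.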

License

COMPASS is released under the Apache License 2.0. See LICENSE for additional details.

Core Contributors

Wei Liu, Huihua Zhao, Chenran Li, Joydeep Biswas, Soha Pouya, Yan Chang

Acknowledgments

We would like to acknowledge the following projects, from which parts of the code in this repo are derived:

Citation

If you find this work useful in your research, please consider citing:

@article{liu2025compass,
  title={COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis},
  author={Liu, Wei and Zhao, Huihua and Li, Chenran and Biswas, Joydeep and Pouya, Soha and Chang, Yan},
  journal={arXiv preprint arXiv:2502.16372},
  year={2025}
}
