Kushal Kedia*, Prithwish Dan*, Angela Chao, Maximus A. Pace, Sanjiban Choudhury
Cornell University, *Equal Contribution
Follow these steps to install RHyME:
- Create and activate the conda environment:
  ```
  cd rhyme
  conda env create -f environment.yml
  conda activate rhyme
  pip install -e .
  ```
To set up the simulation dataset:
- Instructions TBD
Datasets (Visual Encoder):
- robot
- twohands
- robot_segments_paired_twohands and twohands_segments_paired_twohands (Optional)
Datasets (Diffusion Policy):
- robot
- imagined demonstrator dataset (will be created)
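The exact on-disk layout is not specified here; as a rough sketch, assuming everything lives under the `datasets/` folder referenced in the pipeline below, the structure might look like:

```
datasets/
├── robot/                                # robot play data
├── twohands/                             # demonstrator play data
├── robot_segments_paired_twohands/       # optional paired segments
├── twohands_segments_paired_twohands/    # optional paired segments
└── <imagined demonstrator dataset>/      # created by scripts/reconstruction.py
```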
- Pretrain the visual encoder:

  ```
  python scripts/skill_discovery.py
  ```

  Additional options include:
  - `exp_name` (name of the model)
  - `cross_embodiment` (`human`, `singlehand`, `twohands`)
  - `use_paired_data` (`True`/`False`)
  - `paired_dataset.percentage_pairing` (0-1)
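  For example, a pretraining run might look like the following (`MY_ENCODER` is a placeholder name, and the `key=value` override syntax is assumed to match the full commands shown later in this README):

  ```
  python scripts/skill_discovery.py \
      exp_name=MY_ENCODER \
      cross_embodiment=twohands \
      use_paired_data=True \
      paired_dataset.percentage_pairing=0.5
  ```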
- Convert images into latent vectors using the pretrained visual encoder:

  ```
  python scripts/label_sim_kitchen_dataset.py
  ```

  Additional options include:
  - `cross_embodiment` (`human`, `singlehand`, `twohands`)
  - `pretrain_model_name`
  - `ckpt`
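  For example (placeholder values; `pretrain_model_name` and `ckpt` should point at the encoder trained in the previous step):

  ```
  python scripts/label_sim_kitchen_dataset.py \
      cross_embodiment=twohands \
      pretrain_model_name=MY_ENCODER \
      ckpt=40
  ```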
- Compute and store sequence-level distance metrics between cross-embodiment play data and robot data:

  ```
  python scripts/chopped_segment_wise_dists.py
  ```

  Additional options include:
  - `cross_embodiment_segments` (e.g. `twohands_segments_paired_sample`)
  - `pretrain_model_name`
  - `ckpt`
  - `num_chops` (number of clips to retrieve per robot video)
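  For example (illustrative values; `num_chops=2` mirrors the `_ot_2_` suffix in the generated dataset names used later in this README):

  ```
  python scripts/chopped_segment_wise_dists.py \
      cross_embodiment_segments=twohands_segments_paired_sample \
      pretrain_model_name=MY_ENCODER \
      ckpt=40 \
      num_chops=2
  ```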
- "Imagine" the paired demonstrator dataset, and store it in the datasets folder:
Additional options include:
python scripts/reconstruction.py
cross_embodiment_segments (e.g. twohands_segments_paired_sample) pretrain_model_name ckpt ot_lookup (True/False) tcc_lookup (True/False) num_chops (number of clips to retrieve per robot video)
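  For example, to retrieve clips with optimal transport (OT) rather than TCC (illustrative values; they mirror the `_generated_ot_2_ckpt40` dataset name used in the next step):

  ```
  python scripts/reconstruction.py \
      cross_embodiment_segments=twohands_segments_paired_sample \
      pretrain_model_name=MY_ENCODER \
      ckpt=40 \
      ot_lookup=True \
      tcc_lookup=False \
      num_chops=2
  ```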
- Convert the imagined dataset into latent vectors:

  ```
  python scripts/label_sim_kitchen_dataset.py \
      include_robot=False \
      pretrain_model_name=NO_PAIRING_TWOHANDS \
      cross_embodiment=NO_PAIRING_TWOHANDS_twohands_segments_paired_sample_generated_ot_2_ckpt40
  ```

  Additional options include:
  - `include_robot` (`True`/`False`)
  - `pretrain_model_name`
  - `cross_embodiment` (now the name of the dataset reconstructed via OT)
- Train the conditional diffusion policy to translate imagined demonstrator videos into robot actions:

  ```
  python scripts/skill_transfer_composing.py \
      pretrain_model_name=NO_PAIRING_TWOHANDS \
      pretrain_ckpt=40 \
      eval_cfg.demo_type=twohands \
      cross_embodiment=NO_PAIRING_TWOHANDS_twohands_segments_paired_sample_generated_ot_2_ckpt40 \
      dataset.paired_data=True \
      dataset.paired_percent=0.5
  ```

  Additional options include:
  - `pretrain_model_name`
  - `pretrain_ckpt`
  - `eval_cfg.demo_type` (specifies which demonstrator to evaluate on)
  - `cross_embodiment` (the dataset reconstructed via OT)
  - `dataset.paired_data` (`True` if using the imagined paired dataset)
  - `dataset.paired_percent` (fraction of imagined data for hybrid training on the robot/imagined datasets)
```
@article{kedia2024one,
  title={One-Shot Imitation under Mismatched Execution},
  author={Kedia, Kushal and Dan, Prithwish and Choudhury, Sanjiban},
  journal={arXiv preprint arXiv:2409.06615},
  year={2024}
}
```
- Much of the training pipeline is adapted from XSkill.
- The diffusion policy implementation is adapted from Diffusion Policy.
- Many useful utilities are adapted from XIRL.