This repository contains the codebase used in Automated Creation of Digital Cousins for Robust Policy Learning.
More generally, this codebase is designed to generate fully interactive scenes from a single RGB image in a completely automated fashion.
- Linux machine
- Conda
- NVIDIA RTX-enabled GPU (recommended 24+ GB VRAM) + CUDA (12.1+)
Clone this repo:
```
git clone https://github.com/cremebrule/digital-cousins.git
cd digital-cousins
```
We provide two methods of installation, both of which are functionally equivalent and install from source. The first method is a one-line call to install everything, including creating a new conda environment (if it doesn't already exist) and installing all necessary dependencies, whereas the second method gives a step-by-step guide.
```
./install.sh -e acdc -c /PATH/TO/cuda-12.3 [-m]
conda activate acdc
```
- `-e` specifies the name of the conda environment to use
- `-c` specifies the path to the CUDA_HOME installation
- `-m` (optional) should be set if using Mamba; otherwise, Conda will be used
- Create a new conda environment for this repo and activate it:
  ```
  conda create -y -n acdc python=3.10
  conda activate acdc
  ```
- Install ACDC:
  ```
  conda install conda-build
  pip install -r requirements.txt
  pip install -e .
  ```
- Install the following key dependencies used in our pipeline. NOTE: Make sure to install them in the exact following order:
- Make sure we're in the dependencies directory:
  ```
  mkdir -p deps && cd deps
  ```
- DINOv2:
  ```
  git clone https://github.com/facebookresearch/dinov2.git && cd dinov2
  conda-develop . && cd ..   # Note: Do NOT run 'pip install -r requirements.txt'!!
  ```
- Segment Anything 2 (SAM 2):
  ```
  git clone https://github.com/facebookresearch/segment-anything-2.git && cd segment-anything-2
  pip install -e . && cd ..
  ```
- GroundingDINO:
  ```
  git clone https://github.com/IDEA-Research/GroundingDINO.git && cd GroundingDINO
  export CUDA_HOME=/PATH/TO/cuda-12.3   # Make sure to set this!
  pip install --no-build-isolation -e . && cd ..
  ```
- PerspectiveFields:
  ```
  git clone https://github.com/jinlinyi/PerspectiveFields.git && cd PerspectiveFields
  pip install -e . && cd ..
  ```
- Depth-Anything-V2:
  ```
  git clone https://github.com/DepthAnything/Depth-Anything-V2.git && cd Depth-Anything-V2
  pip install -r requirements.txt
  conda-develop . && cd ..
  ```
- CLIP:
  ```
  pip install git+https://github.com/openai/CLIP.git
  ```
- Faiss (GPU):
  ```
  conda install -c pytorch -c nvidia faiss-gpu=1.8.0
  ```
- robomimic:
  ```
  git clone https://github.com/ARISE-Initiative/robomimic.git --branch diffusion-updated --single-branch && cd robomimic
  pip install -e . && cd ..
  ```
- OmniGibson:
  ```
  git clone https://github.com/StanfordVL/OmniGibson.git && cd OmniGibson
  pip install -e . && python -m omnigibson.install --no-install-datasets && cd ..
  ```
- In order to use this repo, we require both the asset image and BEHAVIOR datasets used to match digital cousins, as well as relevant checkpoints used by the underlying foundation models. Use the following commands to download each:
- Asset image and BEHAVIOR datasets:
  ```
  python -m omnigibson.utils.asset_utils --download_assets --download_og_dataset --accept_license
  python -m digital_cousins.utils.dataset_utils --download_acdc_assets
  ```
- Model checkpoints:
  ```
  # Make sure you start in the root directory of ACDC
  mkdir -p checkpoints && cd checkpoints
  wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
  wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
  wget https://huggingface.co/depth-anything/Depth-Anything-V2-Metric-Hypersim-Large/resolve/main/depth_anything_v2_metric_hypersim_vitl.pth
  wget https://huggingface.co/depth-anything/Depth-Anything-V2-Metric-VKITTI-Large/resolve/main/depth_anything_v2_metric_vkitti_vitl.pth
  cd ..
  ```
- Policy checkpoints:
  ```
  mkdir -p training_results && cd training_results
  wget https://huggingface.co/RogerDAI/ACDC/resolve/main/cousin_ckpt.pth
  wget https://huggingface.co/RogerDAI/ACDC/resolve/main/twin_ckpt.pth
  cd ..
  ```
To validate that the entire installation process completed successfully, please run our set of unit tests:
```
python tests/test_models.py --gpt_api_key <KEY> --gpt_version 4o
```
- `--gpt_api_key` specifies the GPT API key to use for GPT queries. Must be compatible with `--gpt_version`
- `--gpt_version` (optional) specifies the GPT version to use. Default is `4o`
Usage is straightforward: simply run our ACDC pipeline on any input image you'd like via our entrypoint:
```
python digital_cousins/pipeline/acdc.py --input_path <INPUT_IMG_PATH> [--config <CONFIG>] [--gpt_api_key <KEY>]
```
- `--input_path` specifies the path to the input RGB image to use
- `--config` (optional) specifies the path to the config to use. If not set, the default config at `digital_cousins/configs/default.yaml` will be used
- `--gpt_api_key` (optional) specifies the GPT API key to use for GPT queries. If not set, this must be set in the loaded config
By default, this will generate all outputs in a directory named `acdc_outputs` in the same directory as `<INPUT_IMG_PATH>`.
We include complex input images published in our work under `examples/images`.
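For reference, a typical invocation on one of those images might look like the following (the filename is a placeholder; substitute any image under `examples/images` and pass your own API key):

```sh
python digital_cousins/pipeline/acdc.py \
    --input_path examples/images/<EXAMPLE_IMG>.png \
    --gpt_api_key <KEY>
```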
To visualize intermediate results, such as the no-cut videos shown on our website, set `pipeline.RealWorldExtractor.call.visualize` to `True` in the config file.
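As a rough sketch, the relevant entry in the config would look something like this (nesting follows the dotted key name above; surrounding keys in `digital_cousins/configs/default.yaml` are omitted and may differ):

```yaml
pipeline:
  RealWorldExtractor:
    call:
      visualize: True   # dump intermediate visualizations during real-world extraction
```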
To load the result in a user-interactive way, simply run:
```
python digital_cousins/scripts/load_scene.py --scene_info_path <SCENE_OUTPUT_JSON_FILE>
```
The user can interact with the scene using keyboard and mouse commands.
To visualize rollouts of the digital twin policy versus the digital cousin policy on the exact digital twin, unseen digital cousins, and a more dissimilar asset, simply run:
```
# Rollout digital twin policy on the exact digital twin (expected success rate ~89%)
python examples/4_evaluate_policy.py --agent training_results/twin_ckpt.pth --eval_category_model_link_name bottom_cabinet,kdbgpm,link_1 --n_rollouts 100 --seed 1

# Rollout digital twin policy on the second hold-out cousin (expected success rate ~88%)
python examples/4_evaluate_policy.py --agent training_results/twin_ckpt.pth --eval_category_model_link_name bottom_cabinet,dajebq,link_3 --n_rollouts 100 --seed 1

# Rollout digital twin policy on the sixth hold-out cousin (expected success rate ~41%)
python examples/4_evaluate_policy.py --agent training_results/twin_ckpt.pth --eval_category_model_link_name bottom_cabinet,nrlayx,link_1 --n_rollouts 100 --seed 1

# Rollout digital twin policy on the dissimilar asset (expected success rate ~48%)
python examples/4_evaluate_policy.py --agent training_results/twin_ckpt.pth --eval_category_model_link_name bottom_cabinet,plccav,dof_rootd_ba001_r --n_rollouts 100 --seed 1

# Rollout digital cousin policy on the exact digital twin (expected success rate ~94%)
python examples/4_evaluate_policy.py --agent training_results/cousin_ckpt.pth --eval_category_model_link_name bottom_cabinet,kdbgpm,link_1 --n_rollouts 100 --seed 1

# Rollout digital cousin policy on the second hold-out cousin (expected success rate ~94%)
python examples/4_evaluate_policy.py --agent training_results/cousin_ckpt.pth --eval_category_model_link_name bottom_cabinet,dajebq,link_3 --n_rollouts 100 --seed 1

# Rollout digital cousin policy on the sixth hold-out cousin (expected success rate ~98%)
python examples/4_evaluate_policy.py --agent training_results/cousin_ckpt.pth --eval_category_model_link_name bottom_cabinet,nrlayx,link_1 --n_rollouts 100 --seed 1

# Rollout digital cousin policy on the dissimilar asset (expected success rate ~38%)
python examples/4_evaluate_policy.py --agent training_results/cousin_ckpt.pth --eval_category_model_link_name bottom_cabinet,plccav,dof_rootd_ba001_r --n_rollouts 100 --seed 1
```
Digital cousin-trained policies can often perform similarly to the equivalent digital twin policy on the exact twin asset, despite not being trained on that specific setup. In held-out cousin setups unseen by both the digital twin and digital cousin policies, we find that the performance disparity sharply increases: while policies trained on digital cousins exhibit more robust performance across these setups, the digital twin policy degrades significantly. This suggests that digital cousins can improve policy robustness to setups that are unseen but still within the distribution of cousins that the policy was trained on.
We provide a full suite of examples showcasing our end-to-end pipeline, including scene generation, automated demo collection, and policy training / evaluation. The examples are listed and ordered under the `examples` directory.
- High-quality digital cousin selection requires a sufficient number of assets in the corresponding BEHAVIOR category. If the number of available assets in a certain category is limited, the results may be sub-optimal. For example, the current BEHAVIOR dataset has only one pot asset, one toaster asset, and two coffee maker assets. In such cases, we suggest collecting a smaller number of digital cousins to ensure the collected digital cousins belong to the same category as the target objects.
- We assume assets can only rotate around their local z-axis, so we cannot model rotation around an object's local x- or y-axis, e.g., a flipped table with its top touching the floor and its legs pointing upward. Also, some assets in the BEHAVIOR dataset have a physically unstable default orientation; for example, some book assets are tilted in their default orientation. To our knowledge, BEHAVIOR will release a new dataset that resolves this problem, and we will pre-process the new dataset and post it on our repository.
- In the config file, `FeatureMatcher.gsam_box_threshold` and `FeatureMatcher.gsam_text_threshold` control the confidence thresholds for object detection. When objects in the input image are missing from the reconstructed digital cousin scenes, consider decreasing these values. For example, when we run ACDC on `tests/test_img_gsam_box_0.22_gam_text_0.18.png`, as shown in the no-cut video on our project website, we set `FeatureMatcher.gsam_box_threshold` to 0.22 and `FeatureMatcher.gsam_text_threshold` to 0.18 (see the config sketch after this list).
- Accurate object position and bounding box estimation depend on the quality of the point cloud and object mask, where the point cloud is computed from the depth image inferred by Depth-Anything-V2. The performance of Depth-Anything-V2 degrades under occlusion, reflective materials, objects cut off at the border of the input image, and non-uniform lighting; the mask quality of Grounded-SAM-2 degrades under occlusion, fine-grained details, and cluttered backgrounds. If an asset becomes unreasonably large, consider tuning `FeatureMatcher.gsam_box_threshold` and `FeatureMatcher.gsam_text_threshold`, and setting `FeatureMatcher.pipeline.SimulatedSceneGenerator.resolve_collision` to `false` to reduce its influence on other assets.
- We only model the 'on top' relationship between objects, so for other object relationships, like kettles in coffee machines and books on bookshelves, one object will simply be placed on top of the other.
- We handle objects on walls, but not objects on ceilings. An input image with no objects on the ceiling is optimal. If objects on ceilings are detected, users can set `FeatureMatcher.pipeline.SimulatedSceneGenerator.discard_objs` to discard unwanted objects at Step 3.
- If Step 2 of ACDC is killed by an OpenAI server error or insufficient RAM, users can resume collecting digital cousins by setting `FeatureMatcher.pipeline.DigitalCousinMatcher.start_at_name` to the name of the object where the process was killed. See `tests/test_models.py` for examples of running only Step 2 and Step 3 of ACDC.
- We assume that assets within semantically similar categories share the same default orientation. For instance, wardrobes, bottom cabinets, and top cabinets should have doors or drawers that open along the local x-axis in their default orientation. However, some assets in the current BEHAVIOR dataset do not adhere to this assumption, potentially leading to incorrect orientations of digital cousins during policy training. To our knowledge, the BEHAVIOR team plans to release an updated dataset that resolves this issue, and we will update our dataset accordingly once it is available.
- If you encounter a torch installation error such as `ImportError: /scr/tydai/miniforge3/envs/acdc/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12`, consider trying this solution.
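As referenced in the detection-threshold note above, a minimal config override for those thresholds might look like the following sketch (values taken from the example above; the exact placement of these keys within `digital_cousins/configs/default.yaml` may differ):

```yaml
FeatureMatcher:
  gsam_box_threshold: 0.22    # lower these values if objects from the input image
  gsam_text_threshold: 0.18   # are missing from the reconstructed digital cousin scene
```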
Please cite Digital Cousins if you use this framework in your publications:
```
@inproceedings{dai2024acdc,
    title={Automated Creation of Digital Cousins for Robust Policy Learning},
    author={Tianyuan Dai and Josiah Wong and Yunfan Jiang and Chen Wang and Cem Gokmen and Ruohan Zhang and Jiajun Wu and Li Fei-Fei},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2024}
}
```