openvla policy integration #10

Open · wants to merge 9 commits into main
21 changes: 15 additions & 6 deletions README.md
@@ -23,9 +23,10 @@ We hope that our work guides and inspires future real-to-sim evaluation efforts.
- [Code Structure](#code-structure)
- [Adding New Policies](#adding-new-policies)
- [Adding New Real-to-Sim Evaluation Environments and Robots](#adding-new-real-to-sim-evaluation-environments-and-robots)
-- [Full Installation (RT-1 and Octo Inference, Env Building)](#full-installation-rt-1-and-octo-inference-env-building)
+- [Full Installation (RT-1, Octo, OpenVLA Inference, Env Building)](#full-installation-rt-1-octo-openvla-inference-env-building)
- [RT-1 Inference Setup](#rt-1-inference-setup)
- [Octo Inference Setup](#octo-inference-setup)
- [OpenVLA Inference Setup](#openvla-inference-setup)
- [Troubleshooting](#troubleshooting)
- [Citation](#citation)

@@ -97,15 +98,15 @@ cd {this_repo}
pip install -e .
```

-**If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo), or add new robots and environments, please additionally follow the full installation instructions [here](#full-installation-rt-1-and-octo-inference-env-building).**
+**If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo, OpenVLA), or add new robots and environments, please additionally follow the full installation instructions [here](#full-installation-rt-1-octo-openvla-inference-env-building).**


## Examples

-- Simple RT-1 and Octo evaluation script on prepackaged environments with visual matching evaluation setup: see [`simpler_env/simple_inference_visual_matching_prepackaged_envs.py`](https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/simple_inference_visual_matching_prepackaged_envs.py).
+- Simple RT-1, Octo, and OpenVLA evaluation script on prepackaged environments with visual matching evaluation setup: see [`simpler_env/simple_inference_visual_matching_prepackaged_envs.py`](https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/simple_inference_visual_matching_prepackaged_envs.py).
- Colab notebook for RT-1 and Octo inference: see [this link](https://colab.research.google.com/github/simpler-env/SimplerEnv/blob/main/example.ipynb).
- Environment interactive visualization and manual control: see [`ManiSkill2_real2sim/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py`](https://github.com/simpler-env/ManiSkill2_real2sim/blob/main/mani_skill2_real2sim/examples/demo_manual_control_custom_envs.py)
-- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results with sweeps over object / robot poses and advanced loggings. These contain both visual matching and variant aggregation evaluation setups along with RT-1, RT-1-X, and Octo policies. See [`scripts/`](https://github.com/simpler-env/SimplerEnv/tree/main/scripts).
+- Policy inference scripts to reproduce our Google Robot and WidowX real-to-sim evaluation results with sweeps over object / robot poses and advanced loggings. These contain both visual matching and variant aggregation evaluation setups along with RT-1, RT-1-X, Octo, and OpenVLA policies. See [`scripts/`](https://github.com/simpler-env/SimplerEnv/tree/main/scripts).
- Real-to-sim evaluation videos from running `scripts/*.sh`: see [this link](https://huggingface.co/datasets/xuanlinli17/simpler-env-eval-example-videos/tree/main).

## Current Environments
@@ -183,6 +184,7 @@ simpler_env/
policies/: policy implementations
rt1/: RT-1 policy implementation
octo/: Octo policy implementation
openvla/: OpenVLA policy implementation
utils/:
env/: environment building and observation utilities
debug/: debugging tools for policies and robots
@@ -205,7 +207,7 @@ scripts/: example bash scripts for policy inference under our variant aggregation

If you want to use existing environments for evaluating new policies, you can keep `./ManiSkill2_real2sim` as is.

-1. Implement new policy inference scripts in `simpler_env/policies/{your_new_policy}`, following the examples for RT-1 (`simpler_env/policies/rt1`) and Octo (`simpler_env/policies/octo`) policies.
+1. Implement new policy inference scripts in `simpler_env/policies/{your_new_policy}`, following the examples for RT-1 (`simpler_env/policies/rt1`), Octo (`simpler_env/policies/octo`), and OpenVLA (`simpler_env/policies/openvla`) policies.
2. You can now use `simpler_env/simple_inference_visual_matching_prepackaged_envs.py` to perform policy evaluations in simulation.
   - If the policy behaviors deviate significantly from those in the real world, you can write scripts similar to `simpler_env/utils/debug/{policy_name}_inference_real_video.py` to debug them. The debugging script performs policy inference by feeding real evaluation video frames into the policy. If the behavior still deviates significantly from real, this may suggest that policy actions are being processed incorrectly for the simulation environments. Please double-check action orderings and action spaces.
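The frame-replay debugging loop described above can be sketched as follows; note that `DummyPolicy` and its `reset`/`step` interface are illustrative stand-ins, not the repo's actual policy classes:

```python
import numpy as np

class DummyPolicy:
    """Illustrative stand-in for a policy wrapper like those in simpler_env/policies/."""
    def reset(self, task_description):
        self.task_description = task_description
    def step(self, image):
        # A real policy would return its predicted action for this frame;
        # here we just emit a zero 7-DoF action (xyz delta, rpy delta, gripper).
        return np.zeros(7)

def replay_frames(policy, frames, task_description):
    """Run policy inference frame-by-frame on pre-recorded real eval video frames."""
    policy.reset(task_description)
    return [policy.step(frame) for frame in frames]

# Synthetic stand-ins for real evaluation video frames.
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(5)]
actions = replay_frames(DummyPolicy(), frames, "put carrot on plate")
print(len(actions))  # one predicted action per frame
```

Comparing the replayed action sequence against the actions logged during the real rollout helps isolate whether the mismatch comes from the policy itself or from how actions are translated into the simulator.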
3. If you'd like to perform customized evaluations,
@@ -219,7 +221,7 @@ If you want to use existing environments for evaluating new policies, you can keep `./ManiSkill2_real2sim` as is.
We provide a step-by-step guide to adding new real-to-sim evaluation environments and robots in [this README](ADDING_NEW_ENVS_ROBOTS.md).


-## Full Installation (RT-1 and Octo Inference, Env Building)
+## Full Installation (RT-1, Octo, OpenVLA Inference, Env Building)

If you'd like to perform evaluations on our provided agents (e.g., RT-1, Octo), or add new robots and environments, please follow the full installation instructions below.

@@ -289,6 +291,13 @@ If you are using CUDA 12, then to use GPU for Octo inference, you need CUDA version

`PATH=/usr/local/cuda-12.3/bin:$PATH LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:$LD_LIBRARY_PATH bash scripts/octo_xxx_script.sh`

### OpenVLA Inference Setup

```
pip install torch==2.3.1 torchvision==0.18.1 timm==0.9.10 tokenizers==0.15.2 accelerate==0.32.1
pip install flash-attn==2.6.1 --no-build-isolation
```
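For context on what happens after inference: OpenVLA-style policies predict actions normalized to [-1, 1], which must be mapped back to the robot's action bounds before execution. A minimal sketch of that un-normalization follows; the bounds here are illustrative placeholders, not the real per-dataset statistics shipped with the model:

```python
import numpy as np

def unnormalize_action(norm_action, low, high):
    """Map an action from [-1, 1] back to [low, high] per dimension."""
    norm_action = np.clip(norm_action, -1.0, 1.0)
    return 0.5 * (norm_action + 1.0) * (high - low) + low

# Illustrative bounds for a 7-DoF action (xyz delta, rpy delta, gripper).
low = np.array([-0.05] * 6 + [0.0])
high = np.array([0.05] * 6 + [1.0])

print(unnormalize_action(np.zeros(7), low, high))  # midpoint of each range
```

A normalized value of -1 maps to `low` and +1 maps to `high`; getting these statistics (and the action ordering) wrong is a common source of sim behavior diverging from real.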

## Troubleshooting

1. If you encounter issues such as
Mode changed 100644 → 100755 (made executable; no content changes):
scripts/octo_bridge.sh
scripts/octo_drawer_variant_agg.sh
scripts/octo_drawer_visual_matching.sh
scripts/octo_move_near_variant_agg.sh
scripts/octo_move_near_visual_matching.sh
scripts/octo_pick_coke_can_variant_agg.sh
scripts/octo_pick_coke_can_visual_matching.sh
scripts/octo_put_in_drawer_variant_agg.sh
scripts/octo_put_in_drawer_visual_matching.sh
49 changes: 49 additions & 0 deletions scripts/openvla_bridge.sh
@@ -0,0 +1,49 @@
gpu_id=0
policy_model=openvla
ckpt_path="openvla/openvla-7b"

scene_name=bridge_table_1_v1
robot=widowx
rgb_overlay_path=ManiSkill2_real2sim/data/real_inpainting/bridge_real_eval_1.png
robot_init_x=0.147
robot_init_y=0.028

python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 60 \
--env-name PutCarrotOnPlateInScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;

python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 60 \
--env-name StackGreenCubeOnYellowCubeBakedTexInScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;

python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 60 \
--env-name PutSpoonOnTableClothInScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;


scene_name=bridge_table_1_v2
robot=widowx_sink_camera_setup
rgb_overlay_path=ManiSkill2_real2sim/data/real_inpainting/bridge_sink.png
robot_init_x=0.127
robot_init_y=0.06

python simpler_env/main_inference.py --policy-model ${policy_model} --ckpt-path ${ckpt_path} \
--robot ${robot} --policy-setup widowx_bridge \
--control-freq 5 --sim-freq 500 --max-episode-steps 120 \
--env-name PutEggplantInBasketScene-v0 --scene-name ${scene_name} \
--rgb-overlay-path ${rgb_overlay_path} \
--robot-init-x ${robot_init_x} ${robot_init_x} 1 --robot-init-y ${robot_init_y} ${robot_init_y} 1 --obj-variation-mode episode --obj-episode-range 0 24 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1;

82 changes: 82 additions & 0 deletions scripts/openvla_drawer_variant_agg.sh
@@ -0,0 +1,82 @@
# shader_dir=rt means that we turn on ray-tracing rendering; this is quite crucial for the open / close drawer task as policies often rely on shadows to infer depth
declare -a ckpt_paths=(
"openvla/openvla-7b"
)

declare -a env_names=(
OpenTopDrawerCustomInScene-v0
OpenMiddleDrawerCustomInScene-v0
OpenBottomDrawerCustomInScene-v0
CloseTopDrawerCustomInScene-v0
CloseMiddleDrawerCustomInScene-v0
CloseBottomDrawerCustomInScene-v0
)

EXTRA_ARGS="--enable-raytracing"


# base setup
scene_name=frl_apartment_stage_simple

EvalSim() {
echo ${ckpt_path} ${env_name}

python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name ${scene_name} \
--robot-init-x 0.65 0.85 3 --robot-init-y -0.2 0.2 3 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0.0 0.0 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
${EXTRA_ARGS}
}


for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EvalSim
done
done
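The `start end n` triplets passed to flags like `--robot-init-x 0.65 0.85 3` presumably expand into n evenly spaced values, so the base setup above sweeps a 3×3 grid of robot base poses per (checkpoint, environment) pair. A sketch of that expansion (the grid logic here is illustrative, not the repo's actual argument parser):

```python
import numpy as np
from itertools import product

def expand(start, stop, n):
    """Expand a `start stop n` triplet into n evenly spaced values."""
    return np.linspace(start, stop, n)

# --robot-init-x 0.65 0.85 3  and  --robot-init-y -0.2 0.2 3
xs = expand(0.65, 0.85, 3)
ys = expand(-0.2, 0.2, 3)
poses = list(product(xs, ys))
print(len(poses))  # 9 robot base poses per (env, ckpt) pair
```

Triplets of the form `v v 1`, as in the visual-matching scripts below, collapse to a single fixed value.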


# backgrounds

declare -a scene_names=(
"modern_bedroom_no_roof"
"modern_office_no_roof"
)

for scene_name in "${scene_names[@]}"; do
for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt"
EvalSim
done
done
done


# lightings
scene_name=frl_apartment_stage_simple

for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt light_mode=brighter"
EvalSim
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt light_mode=darker"
EvalSim
done
done


# new cabinets
scene_name=frl_apartment_stage_simple

for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt station_name=mk_station2"
EvalSim
EXTRA_ARGS="--additional-env-build-kwargs shader_dir=rt station_name=mk_station3"
EvalSim
done
done
132 changes: 132 additions & 0 deletions scripts/openvla_drawer_visual_matching.sh
@@ -0,0 +1,132 @@
# shader_dir=rt means that we turn on ray-tracing rendering; this is quite crucial for the open / close drawer task as policies often rely on shadows to infer depth
declare -a ckpt_paths=(
"openvla/openvla-7b"
)

declare -a env_names=(
OpenTopDrawerCustomInScene-v0
OpenMiddleDrawerCustomInScene-v0
OpenBottomDrawerCustomInScene-v0
CloseTopDrawerCustomInScene-v0
CloseMiddleDrawerCustomInScene-v0
CloseBottomDrawerCustomInScene-v0
)

# URDF variations
declare -a urdf_version_arr=("recolor_cabinet_visual_matching_1" "recolor_tabletop_visual_matching_1" "recolor_tabletop_visual_matching_2" None)

for urdf_version in "${urdf_version_arr[@]}"; do

EXTRA_ARGS="--enable-raytracing --additional-env-build-kwargs station_name=mk_station_recolor light_mode=simple disable_bad_material=True urdf_version=${urdf_version}"

EvalOverlay() {
# A0
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.644 0.644 1 --robot-init-y -0.179 -0.179 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 -0.03 -0.03 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_a0.png \
${EXTRA_ARGS}

# A1
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.765 0.765 1 --robot-init-y -0.182 -0.182 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 -0.02 -0.02 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_a1.png \
${EXTRA_ARGS}

# A2
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.889 0.889 1 --robot-init-y -0.203 -0.203 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 -0.06 -0.06 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_a2.png \
${EXTRA_ARGS}

# B0
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.652 0.652 1 --robot-init-y 0.009 0.009 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_b0.png \
${EXTRA_ARGS}

# B1
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.752 0.752 1 --robot-init-y 0.009 0.009 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_b1.png \
${EXTRA_ARGS}

# B2
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.851 0.851 1 --robot-init-y 0.035 0.035 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_b2.png \
${EXTRA_ARGS}

# C0
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.665 0.665 1 --robot-init-y 0.224 0.224 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 0 0 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_c0.png \
${EXTRA_ARGS}

# C1
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.765 0.765 1 --robot-init-y 0.222 0.222 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 -0.025 -0.025 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_c1.png \
${EXTRA_ARGS}

# C2
python simpler_env/main_inference.py --policy-model openvla --ckpt-path ${ckpt_path} \
--robot google_robot_static \
--control-freq 3 --sim-freq 513 --max-episode-steps 113 \
--env-name ${env_name} --scene-name dummy_drawer \
--robot-init-x 0.865 0.865 1 --robot-init-y 0.222 0.222 1 \
--robot-init-rot-quat-center 0 0 0 1 --robot-init-rot-rpy-range 0 0 1 0 0 1 -0.025 -0.025 1 \
--obj-init-x-range 0 0 1 --obj-init-y-range 0 0 1 \
--rgb-overlay-path ./ManiSkill2_real2sim/data/real_inpainting/open_drawer_c2.png \
${EXTRA_ARGS}
}


for ckpt_path in "${ckpt_paths[@]}"; do
for env_name in "${env_names[@]}"; do
EvalOverlay
done
done



done