Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatible Control Mode between TFDS Dataset and ManiSkill2 Environment #750

Open
Taej-P opened this issue Dec 16, 2024 · 7 comments
Open

Comments

@Taej-P
Copy link

Taej-P commented Dec 16, 2024

Hi,

I extracted actions from the ManiSkill2 TFDS dataset provided by Open-X-Embodiments and tried to execute them in a new environment. However, after checking the TFDS dataset, I found that the controller uses "pd_base_ee_target_delta_pose", but it seems that the current environment created using gym.make does not support this control mode.

How can I set the base frame when extracting actions from TFDS and executing them in the environment?

@Taej-P
Copy link
Author

Taej-P commented Dec 17, 2024

maniskill2 library version is 0.5.3.

If I need to use ManiSkill3 instead of ManiSkill2, how should I set it up in my Python script? Could you provide guidance on how to properly configure and initialize the environment for ManiSkill3?

@StoneT2000
Copy link
Member

Which environment are you creating?

@Taej-P
Copy link
Author

Taej-P commented Dec 18, 2024

I tried to create and run environments for all tasks in the maniskill2 TFDS, which include: 'PegInsertionSide-v0', 'PickSingleYCB-v0', 'LiftCube-v0', 'PickCube-v0', 'StackCube-v0', 'AssemblingKits-v0', 'PickClutterYCB-v0', 'PlugCharger-v0', 'PickSingleEGAD-v0', and 'TurnFaucet-v0'.

To simply verify that they were performing the same actions, I extracted the actions from the TFDS and compared the image observations (from both the TFDS and the environment after performing actions). However, I observed that the behaviors were completely different, and I suspect that the coordinate systems might be different.

@StoneT2000
Copy link
Member

StoneT2000 commented Dec 18, 2024

@xuanlinli17 any idea? My guess is your original RLDS converter modified the controller somewhere? That being said I am actually in the middle of adding back RLDS/Open-X dataset format export support to maniskill3 so you could use that instead potentially in the future.

@xuanlinli17
Copy link
Collaborator

xuanlinli17 commented Dec 18, 2024

The original RLDS converter for ManiSkill used coupled translation and rotation (i.e., multiplication of 2 SE(3)) , while the default in ManiSkill3 is decoupled translation and rotation (i.e., new translation = action_translation + current_translation; new rotation = action rotation * current rotation).

For other datasets in Open-X-Embodiment they also use decoupled translation and rotation.

@Taej-P
Copy link
Author

Taej-P commented Dec 18, 2024

Thank you for your response!

After extracting the actions from the ManiSkill2 TFDS, I would like to replicate the same behavior in a new environment. Is there a way to transform the actions into a space such as the "ee" frame (e.g., using control modes like 'pd_ee_target_delta_pose') provided by ManiSkill2? Is there any information in the TFDS that could be useful for this transformation?

The related code as follow:

import cv2
import tensorflow_datasets as tfds
import gymnasium as gym
import mani_skill2.envs
import numpy as np


builder = tfds.builder_from_directory('./datasets/maniskill_dataset_converted_externally_to_rlds/0.1.0/')
ds = builder.as_dataset(split=['train'])[0]

tasks_description_list = ['PegInsertionSide-v0', 'PickSingleYCB-v0', 'LiftCube-v0', 'PickCube-v0', 'StackCube-v0', 'AssemblingKits-v0', 'PickClutterYCB-v0', 'PlugCharger-v0', 'PickSingleEGAD-v0', 'TurnFaucet-v0']

env_name = tasks_description_list[1]
obs_mode = "rgbd" #@param can be one of ['pointcloud', 'rgbd', 'state_dict', 'state']
control_mode = "pd_ee_target_delta_pose" #@param can be one of ['pd_joint_delta_pos', 'pd_joint_pos', 'pd_ee_delta_pos', 'pd_ee_delta_pose', 'pd_ee_delta_pose_align', 'pd_joint_target_delta_pos', 'pd_ee_target_delta_pos', 'pd_ee_target_delta_pose', 'pd_joint_vel', 'pd_joint_pos_vel', 'pd_joint_delta_pos_vel']
reward_mode = "dense"


import time
for ep in ds:
    task = ep['episode_metadata']['episode_id'].numpy().decode('utf-8').split('_')[0]
    print(ep['episode_metadata'])

    # dataset test
    env = gym.make(task,
                render_mode="rgb_array",
                obs_mode=obs_mode,
                reward_mode=reward_mode,
                control_mode=control_mode,
                enable_shadow=False,
                )
    obs, _ = env.reset()

    for step in ep['steps']:
        action = step['action'].numpy()
        next_obs, reward, done, truncated, info = env.step(action)

        img1 = step['observation']['image'].numpy()
        img2 = cv2.resize(next_obs['image']['base_camera']['rgb'], (256, 256), interpolation=cv2.INTER_AREA)

        total_img = np.hstack((img1, img2))
        cv2.imshow('render', total_img)
        cv2.waitKey(1)

    break

@xuanlinli17
Copy link
Collaborator

xuanlinli17 commented Dec 18, 2024

ManiSkill2 action uses coupled rotation & translation (2 SE3 matrix multiplication), instead of the decoupled rotation & translation in ManiSkill3 (https://maniskill.readthedocs.io/en/latest/user_guide/concepts/controllers.html)

To transform ManiSkill2 target delta pose to ManiSkill3 target delta pose, first calculate the target_pose = sapien.Pose(ms2_action_xyz, ms2_action_quat) * sapien.Pose(obs_xyz, obs_quat)); then ManiSkill3 action_xyz = target_pose.p - obs_xyz; action_rot_matrix=target_pose.to_transformation_matrix()[:3, :3] @ np.linalg.inv(quat2mat(ms2_action_quat)) where quat2mat is the utility in transforms3d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants