[Question] Solving Pick-Cube from Pixels Only #667

Open
SumeetBatra opened this issue Oct 30, 2024 · 12 comments

Comments

@SumeetBatra

Hey! I wanted to see if you guys had any reference code / hyperparameters for SAC solving any of the tabletop tasks using RGB(D) data only and no proprioceptive state information. Thanks!

@StoneT2000
Member

StoneT2000 commented Oct 30, 2024

Sorry, we have not tuned SAC at the moment, only PPO with some proprioception data + one RGB camera. There is some example code with state-based SAC; a simple vision-based version will come eventually. TD-MPC2 is already integrated and supports learning from pixels, and doesn't need much tuning.

If there's a lot of value in testing algorithms with visual-only inputs, we can try to help set that up in the future; we have some DM Control environments benchmarked with PPO, with an option to use visual-only inputs.

@SumeetBatra
Author

I see, thanks for letting me know! I think having some baselines of end-to-end pixel to action policies would be useful. I am currently using SAC for my project but may also try out other algos in the future.

@StoneT2000
Member

Is GPU parallelization important in your case? Or are you working more on, e.g., sample efficiency? I can have some members of the team try to tune an RGB/RGBD SAC version.

@SumeetBatra
Author

It's not important, but if it makes policy convergence faster I'm for GPU parallelization. Sample efficiency is not an issue atm. I appreciate you all looking into this!

@SumeetBatra
Author

Hey! Just wanted to check in and see if this is in the pipeline and if so, if you guys have an expected release date on it. Thanks!

@StoneT2000
Member

Currently working on it! Fixing up the SAC state and RGBD implementations now. Will provide a baseline for PickCube and maybe a few other tasks.

@StoneT2000
Member

StoneT2000 commented Nov 16, 2024

Ok @SumeetBatra, the new baseline is uploaded. I only checked that it works for PushCube and PickCube from pixels. The suggested script to run is:

python sac_rgbd.py --env_id="PickCube-v1" --obs_mode="rgb" \
  --num_envs=32 --utd=0.5 --buffer_size=300_000 \
  --control-mode="pd_ee_delta_pos" --camera_width=64 --camera_height=64 \
  --total_timesteps=1_000_000 --eval_freq=10_000

This was tested and converged after about 1-1.5 hours on a 4090. The SAC code can run faster if I add torch compile / CUDA graphs support and some shared-memory optimization for observation storage, but that will be done in the future.

[video attachment: 31.mp4]

The tiny 64x64 image in each corner is what the policy sees. The policy also sees any relevant state information (like the goal position for the cube and the agent's joint positions).

See the SAC baseline readme: https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac/README.md

I'm sure the other tasks work fine with the same hyperparameters as the PickCube training if trained long enough and an appropriate controller is used.
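
For reference, here is a rough sketch of how to inspect what the policy actually receives; the gym.make keyword arguments mirror the flags in the command above, and the observation dict keys (e.g. "sensor_data", "agent", "extra") may differ slightly depending on your ManiSkill version, so verify them against your installation:

import gymnasium as gym
import mani_skill.envs  # registers the ManiSkill environments with gymnasium

# Mirrors the training command above: RGB observations, delta end-effector control.
env = gym.make("PickCube-v1", obs_mode="rgb", control_mode="pd_ee_delta_pos")
obs, _ = env.reset(seed=0)

# Recursively print every leaf of the observation dict so you can see which
# entries are images (under "sensor_data") and which are low-dimensional state
# (under "agent" and "extra").
def print_obs(tree, prefix=""):
    for key, value in tree.items():
        if isinstance(value, dict):
            print_obs(value, prefix + key + "/")
        else:
            print(prefix + key, getattr(value, "shape", type(value)))

print_obs(obs)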

@SumeetBatra
Author

@StoneT2000 Thank you so much!! I'll take a look and follow up if I have any questions.

@SumeetBatra
Author

SumeetBatra commented Nov 27, 2024

@StoneT2000 I had a chance to look over the sac_rgbd baseline, and it looks like state information is included by default. Is it possible to solve the task without any proprioceptive state information, i.e. from RGB(D) observations only?

EDIT: For extra context, what I'm trying to avoid is needing a perception pipeline to estimate low-dimensional state information when working on real hardware. If any state information is present, ideally it should come from somewhere else, like the robot's own kinematics, and not a noisy / brittle perception system. Now that I think about it, joint angles can be read from the robot directly (and poses computed from them via forward kinematics), so maybe this avoids the need for a perception pipeline? I haven't worked with these systems before, so let me know if I'm misunderstanding something.

@StoneT2000
Member

Hi @SumeetBatra

So generally, when it comes to sim2real / real2sim or testing whether something might work in the real world at all, the state data that is accessible and quite accurate is:

  • joint positions / qpos values
  • tcp_pose / end-effector pose / link poses (tcp_pose is one of the observation states always given in PickCube). These poses are available in the real world via forward kinematics on the current joint positions.
  • anything else like "command" information. For example, in PickCube a goal_pos is given in the observations, which is an xyz position in 3D space.

By default we also give qvel values, but these require estimation and are harder to align between sim and real, so I would just remove them (you usually don't need them to solve tasks; they might help with sample efficiency at times).

If you plan to do sim2real, you will need to make modifications to the environments for transfer regardless. Unless stated otherwise, environments in ManiSkill are designed more for algorithm benchmarking by default.

Also, learning from images only is quite difficult, although maybe not impossible, assuming the goal information is in the images somewhere (for PickCube it is not, but for PegInsertionSide or StackCube it is in the perceived image data). It is best to always include the necessary goal information of the env, as well as qpos values and tcp poses if possible; otherwise learning is slower.
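
As a rough illustration of trimming the state the policy conditions on, here is a minimal observation-wrapper sketch; the key names ("agent", "qvel", "extra") follow the dictionary observation layout described above but may differ by version, so treat it as a starting point rather than a drop-in solution:

import gymnasium as gym

class DropQvelWrapper(gym.ObservationWrapper):
    """Strip joint velocities from the observation, since qvel requires
    estimation on real hardware and is harder to align between sim and real."""

    def observation(self, observation):
        # Assumed layout: proprioception (qpos, qvel) under observation["agent"],
        # task info (tcp_pose, goal_pos) under observation["extra"].
        observation["agent"].pop("qvel", None)
        return observation

# Usage sketch (a full implementation would also update observation_space):
# env = DropQvelWrapper(gym.make("PickCube-v1", obs_mode="rgb"))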

@SumeetBatra
Author

SumeetBatra commented Nov 27, 2024

This is really helpful, thanks!

What kind of modifications are needed to facilitate sim2real transfer? I'm guessing DR in the form of state observation noise and maybe some physics randomization at a minimum? Anything else I'm missing? And is there some existing pipeline for facilitating sim2real transfer in the repo? FYI I'm not concerned with the sim2real perception gap atm, mostly with sim2real physics gap and unmodeled dynamics.

@StoneT2000
Member

Hard to say; our lab is still finishing up some basic reproducible sim2real experiments that we should have ready to share in a month or two, I think. The effort is led by @Xander-Hinrichsen at the moment; he can comment a bit more on his own real-world experience.

At a minimum:

  • object color randomization
  • green-screening a real-world image as the background (works for static, non-mobile robot setups like a single arm)
  • observation noise for state-related data like the agent's qpos (see the sketch at the end of this comment)
  • ensure your simulation controller behaves close to the real-world controller. I'd recommend checking, for each action you can take from some rest position in sim and real, that the qpos of the robot in real and sim stay very close and don't deviate. Our current recommendation that works decently well is to use pd_joint_target_delta_pos controllers and to tune the real-world controller to always try to achieve the joint target.

Then you can easily train an RGB-based policy in sim and deploy it directly in the real world, mostly for simpler tasks with reaching/pushing/pulling type behaviors. Picking a cube is still kind of hard without more advanced tricks; @Xander-Hinrichsen and I are investigating how to make this as simple as possible without resorting to collecting real-world demonstrations or combining RL with imitation learning.
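
For the observation-noise item in the list above, here is a minimal sketch of what that could look like; the key path observation["agent"]["qpos"], the assumption of torch-tensor observations, and the noise scale are all illustrative rather than tuned values from the baselines:

import torch
import gymnasium as gym

class QposNoiseWrapper(gym.ObservationWrapper):
    """Add small Gaussian noise to joint positions during training so the
    policy does not rely on perfectly accurate proprioception."""

    def __init__(self, env, std=0.005):
        super().__init__(env)
        self.std = std

    def observation(self, observation):
        qpos = observation["agent"]["qpos"]  # assumed key for joint positions
        observation["agent"]["qpos"] = qpos + self.std * torch.randn_like(qpos)
        return observation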
