[Question] Solving Pick-Cube from Pixels Only #667

Open
SumeetBatra opened this issue Oct 30, 2024 · 12 comments

Comments

@SumeetBatra

Hey! I wanted to see if you guys had any reference code / hyperparameters for SAC solving any of the tabletop tasks using RGB(D) data only and no proprioceptive state information. Thanks!

@StoneT2000
Member

StoneT2000 commented Oct 30, 2024

Sorry, we have not tuned SAC at the moment, only PPO with some proprioception data + one RGB camera. There is some example code with state-based SAC; a simple vision-based version will come eventually. TD-MPC2 is already integrated and supports learning from pixels, and doesn't need much tuning.

If there's a lot of value in testing algorithms with visual-only inputs, we can try to help set that up in the future; we have some DM Control environments benchmarked with PPO, with an option to use visual-only inputs.

@SumeetBatra
Author

I see, thanks for letting me know! I think having some baselines of end-to-end pixel to action policies would be useful. I am currently using SAC for my project but may also try out other algos in the future.

@StoneT2000
Member

Is GPU parallelization important in your case? Or are you working more on, e.g., sample efficiency? I can have some members of the team try to tune an RGB/RGBD SAC version.

@SumeetBatra
Author

It's not important, but if it makes policy convergence faster I'm for GPU parallelization. Sample efficiency is not an issue atm. I appreciate you all looking into this!

@SumeetBatra
Author

Hey! Just wanted to check in and see if this is in the pipeline and if so, if you guys have an expected release date on it. Thanks!

@StoneT2000
Member

Currently working on it! Fixing up the SAC state and RGBD implementations now. Will provide a baseline for PickCube and maybe a few other tasks.

@StoneT2000
Member

StoneT2000 commented Nov 16, 2024

Ok @SumeetBatra, the new baseline is uploaded. I only checked that it works for PushCube and PickCube from pixels. The suggested script to run is:

python sac_rgbd.py --env_id="PickCube-v1" --obs_mode="rgb" \
  --num_envs=32 --utd=0.5 --buffer_size=300_000 \
  --control-mode="pd_ee_delta_pos" --camera_width=64 --camera_height=64 \
  --total_timesteps=1_000_000 --eval_freq=10_000

This was tested and converged after about 1-1.5 hours on a 4090. The SAC code can run faster if I add torch compile / CUDA graphs support and some shared-memory optimization for observation storage, but that will be done in the future.

[video attachment: 31.mp4]

The tiny 64x64 image in each corner is what the policy sees. The policy also sees any relevant state information (like the goal position for the cube and the agent's joint positions).

See the SAC baseline readme: https://github.com/haosulab/ManiSkill/blob/main/examples/baselines/sac/README.md

I'm sure the other tasks work fine with the same hyperparameters as the PickCube training if trained long enough and an appropriate controller is used.
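
For reference, here is a rough sketch of how to inspect what the policy actually receives; the gym.make keyword arguments mirror the flags in the command above, and the observation dict keys (e.g. "sensor_data", "agent", "extra") may differ slightly depending on your ManiSkill version, so verify them against your installation:

import gymnasium as gym
import mani_skill.envs  # registers the ManiSkill environments with gymnasium

# Mirrors the training command above: RGB observations, delta end-effector control.
env = gym.make("PickCube-v1", obs_mode="rgb", control_mode="pd_ee_delta_pos")
obs, _ = env.reset(seed=0)

# Recursively print every leaf of the observation dict so you can see which
# entries are images (under "sensor_data") and which are low-dimensional state
# (under "agent" and "extra").
def print_obs(tree, prefix=""):
    for key, value in tree.items():
        if isinstance(value, dict):
            print_obs(value, prefix + key + "/")
        else:
            print(prefix + key, getattr(value, "shape", type(value)))

print_obs(obs)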

@SumeetBatra
Author

@StoneT2000 Thank you so much!! I'll take a look and follow up if I have any questions.

@SumeetBatra
Author

SumeetBatra commented Nov 27, 2024

@StoneT2000 I had a chance to look over the sac_rgbd baseline, and it looks like state information is included by default. Is it possible to solve the task without any proprioceptive state information, i.e. from RGB(D) observations only?

EDIT: For extra context, what I'm trying to avoid is needing a perception pipeline to estimate low-dimensional state information when working on real hardware. If any state information is present, ideally it should come from somewhere else, like the robot's own kinematics, and not a noisy / brittle perception system. Now that I think about it, joint angles can be read from the robot directly (and poses computed from them via forward kinematics), so maybe this avoids the need for a perception pipeline? I haven't worked with these systems before, so let me know if I'm misunderstanding something.

@StoneT2000
Member

Hi @SumeetBatra

So generally, when it comes to sim2real / real2sim or testing whether something might work in the real world at all, the state data that is accessible and quite accurate is:

  • joint positions / qpos values
  • tcp_pose / end-effector pose / link poses (tcp_pose is one of the observation states always given in PickCube). These poses are available in the real world via forward kinematics on the current joint positions.
  • anything else like "command" information. For example, in PickCube a goal_pos is given in the observations, which is an xyz position in 3D space.

By default we also give qvel values, but these require estimation and are harder to align between sim and real, so I would just remove them (you usually don't need them to solve tasks; they might help with sample efficiency at times).

If you plan to do sim2real, you will need to make modifications to the environments for transfer regardless. Unless stated otherwise, environments in ManiSkill are designed more for algorithm benchmarking by default.

Also, learning from images only is quite difficult, although maybe not impossible, assuming the goal information is in the images somewhere (for PickCube it is not, but for PegInsertionSide or StackCube it is in the perceived image data). It is best to always include the necessary goal information of the env, as well as qpos values and tcp poses if possible; otherwise learning is slower.
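
As a rough illustration of trimming the state the policy conditions on, here is a minimal observation-wrapper sketch; the key names ("agent", "qvel", "extra") follow the dictionary observation layout described above but may differ by version, so treat it as a starting point rather than a drop-in solution:

import gymnasium as gym

class DropQvelWrapper(gym.ObservationWrapper):
    """Strip joint velocities from the observation, since qvel requires
    estimation on real hardware and is harder to align between sim and real."""

    def observation(self, observation):
        # Assumed layout: proprioception (qpos, qvel) under observation["agent"],
        # task info (tcp_pose, goal_pos) under observation["extra"].
        observation["agent"].pop("qvel", None)
        return observation

# Usage sketch (a full implementation would also update observation_space):
# env = DropQvelWrapper(gym.make("PickCube-v1", obs_mode="rgb"))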

@SumeetBatra
Author

SumeetBatra commented Nov 27, 2024

This is really helpful, thanks!

What kind of modifications are needed to facilitate sim2real transfer? I'm guessing DR in the form of state observation noise and maybe some physics randomization at a minimum? Anything else I'm missing? And is there some existing pipeline for facilitating sim2real transfer in the repo? FYI I'm not concerned with the sim2real perception gap atm, mostly with sim2real physics gap and unmodeled dynamics.

@StoneT2000
Member

Hard to say; our lab is still finishing up some basic reproducible sim2real experiments that we should have ready to share in a month or two, I think. The effort is led by @Xander-Hinrichsen at the moment; he can comment a bit more on his own real-world experience.

At a minimum:

  • object color randomization
  • green-screening a real-world image as the background (works for static, non-mobile robot setups like a single arm)
  • observation noise for state-related data like the agent's qpos (see the sketch at the end of this comment)
  • ensure your simulation controller behaves close to the real-world controller. I'd recommend checking, for each action you can take from some rest position in sim and real, that the qpos of the robot in real and sim stay very close and don't deviate. Our current recommendation that works decently well is to use pd_joint_target_delta_pos controllers and to tune the real-world controller to always try to achieve the joint target.

Then you can easily train an RGB-based policy in sim and deploy it directly in the real world, mostly for simpler tasks with reaching/pushing/pulling type behaviors. Picking a cube is still kind of hard without more advanced tricks; @Xander-Hinrichsen and I are investigating how to make this as simple as possible without resorting to collecting real-world demonstrations or combining RL with imitation learning.
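
For the observation-noise item in the list above, here is a minimal sketch of what that could look like; the key path observation["agent"]["qpos"], the assumption of torch-tensor observations, and the noise scale are all illustrative rather than tuned values from the baselines:

import torch
import gymnasium as gym

class QposNoiseWrapper(gym.ObservationWrapper):
    """Add small Gaussian noise to joint positions during training so the
    policy does not rely on perfectly accurate proprioception."""

    def __init__(self, env, std=0.005):
        super().__init__(env)
        self.std = std

    def observation(self, observation):
        qpos = observation["agent"]["qpos"]  # assumed key for joint positions
        observation["agent"]["qpos"] = qpos + self.std * torch.randn_like(qpos)
        return observation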
