R3M: A Universal Visual Representation for Robot Manipulation #16403

edbeeching · 2022-03-25T10:57:48Z

🌟 New model addition

Model description

We pre-train a visual representation on the Ego4D human video dataset using a combination of time-contrastive learning, video-language alignment, and an L1 penalty that encourages sparse and compact representations. The resulting representation, R3M, can be used as a frozen perception module for downstream policy learning. Across a suite of 12 simulated robot manipulation tasks, we find that R3M improves task success by over 20% compared to training from scratch and by over 10% compared to state-of-the-art visual representations like CLIP and MoCo. Furthermore, R3M enables a Franka Emika Panda arm to learn a range of manipulation tasks in a real, cluttered apartment given just 20 demonstrations.
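For readers unfamiliar with the objectives named above, here is a rough sketch of how a time-contrastive term and an L1 sparsity penalty might be combined into one training loss. This is not the authors' implementation: the function names, the negative-L2 similarity, the loss weights, and the treatment of the video-language alignment term as a precomputed scalar are all assumptions made for illustration.

```python
import math


def l1_penalty(z):
    """Mean absolute value of an embedding; pushing it down encourages sparsity."""
    return sum(abs(v) for v in z) / len(z)


def neg_l2(a, b):
    """Similarity as negative Euclidean distance (an assumed choice)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def time_contrastive_loss(anchor, near, far, sim=neg_l2):
    """InfoNCE-style loss with one positive and one negative.

    `near` is a frame close in time to `anchor`, `far` is a frame further away.
    The loss is small when the anchor is more similar to `near` than to `far`.
    """
    s_pos = sim(anchor, near)
    s_neg = sim(anchor, far)
    # -log(exp(s_pos) / (exp(s_pos) + exp(s_neg))), computed stably.
    m = max(s_pos, s_neg)
    log_denom = m + math.log(math.exp(s_pos - m) + math.exp(s_neg - m))
    return log_denom - s_pos


def combined_loss(anchor, near, far, language_loss=0.0,
                  w_tcn=1.0, w_lang=1.0, w_l1=1e-5):
    """Weighted sum of the three terms; the weights are hypothetical.

    `language_loss` stands in for the video-language alignment term,
    which is assumed to be computed elsewhere.
    """
    tcn = time_contrastive_loss(anchor, near, far)
    sparsity = (l1_penalty(anchor) + l1_penalty(near) + l1_penalty(far)) / 3
    return w_tcn * tcn + w_lang * language_loss + w_l1 * sparsity


if __name__ == "__main__":
    anchor, near, far = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
    print(combined_loss(anchor, near, far))
```

In this toy form, swapping `near` and `far` increases the time-contrastive term, which is the behaviour the objective is meant to penalize.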

Open source status

the model implementation is available: https://github.com/facebookresearch/r3m
the model weights are available: https://github.com/facebookresearch/r3m/blob/main/r3m/example.py
who are the authors: @suraj-nair-1

edbeeching added the New model label Mar 25, 2022

suraj-nair-1 mentioned this issue Apr 21, 2022

Integrating R3M Models into Transformers #16883

Closed

3 tasks
