Skip to content

Commit

Permalink
Adds documentation and example scripts for sensors (isaac-sim#1443)
Browse files Browse the repository at this point in the history
## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Signed-off-by: Michael Gussert <[email protected]>
Co-authored-by: David Hoeller <[email protected]>
Co-authored-by: Kelly Guo <[email protected]>
  • Loading branch information
3 people authored Nov 25, 2024
1 parent d603b67 commit 4d99147
Show file tree
Hide file tree
Showing 21 changed files with 994 additions and 26 deletions.
3 changes: 2 additions & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ Table of Contents

source/overview/developer-guide/index
source/overview/core-concepts/index
source/overview/sensors/index
source/overview/environments
source/overview/reinforcement-learning/index
source/overview/teleop_imitation
Expand All @@ -98,7 +99,7 @@ Table of Contents

source/features/hydra
source/features/multi_gpu
source/features/tiled_rendering
Tiled Rendering</source/overview/sensors/camera>
source/features/reproducibility

.. toctree::
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Original file line number Diff line number Diff line change
@@ -1,7 +1,15 @@
Tiled-Camera Rendering
======================
.. _overview_sensors_camera:

.. currentmodule:: omni.isaac.lab

Camera
========

Camera sensors are uniquely defined by the use of the ``render_product``, a structure for managing data generated by the rendering pipeline (images). Isaac Lab provides the ability to fully control how these renderings are created through camera parameters like focal length, pose, type, etc... and what kind of data you want to render through the use of Annotators, allowing you to record not only RGB, but also Instance segmentation, object pose, object ID, etc...

Rendered images are unique among the supported data types in Isaac Lab due to the inherently large bandwidth requirements for moving those data. A single 800 x 600 image with 32-bit color (a single float per pixel) clocks in at just under 2 MB. If we render at 60 fps and record every frame, that camera needs to move 120 MB/s. Multiply this by the number of cameras in an environment and environments in a simulation, and you can quickly see how scaling a naive vectorization of camera data could lead to bandwidth challenges. NVIDIA's Isaac Lab leverages our expertise in GPU hardware to provide an API that specifically addresses these scaling challenges in the rendering pipeline.

Tiled Rendering
~~~~~~~~~~~~~~~~~

.. note::

Expand All @@ -10,19 +18,9 @@ Tiled-Camera Rendering
Tiled rendering in combination with image processing networks require heavy memory resources, especially
at larger resolutions. We recommend running 512 cameras in the scene on RTX 4090 GPUs or similar.

The Tiled Rendering APIs provide a vectorized interface for collecting data from camera sensors. This is useful for reinforcement learning environments where parallelization can be exploited to accelerate data collection and thus the training loop. Tiled rendering works by using a single ``render_product`` for **all** clones of a single camera in the scene. The desired dimensions of a single image and the number of environments are used to compute a much larger ``render_product``, consisting of the tiled individual renders from the separate clones of the camera. When all cameras have populated their buffers the render product is "completed" and can be moved around as a single, large image, dramatically reducing the overhead for moving the data from the host to the device, for example. Only a single call is used to synchronize the device data, instead of one call per camera, and this is a big part of what makes the Tiled Rendering API more efficient for working with vision data.

Tiled rendering APIs provide a vectorized interface for collecting data from camera sensors.
This is useful for reinforcement learning environments requiring vision in the loop.
Tiled rendering works by concatenating camera outputs from multiple cameras and rendering
one single large image instead of multiple smaller images that would have been produced
by each individual camera. This reduces the amount of time required for rendering and
provides a more efficient API for working with vision data.

Isaac Lab provides tiled rendering APIs for RGB, depth, along with other annotators through the :class:`~sensors.TiledCamera`
class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg`
class, specifying parameters such as the regex expression for all camera paths, the transform
for the cameras, the desired data type, the type of cameras to add to the scene, and the camera
resolution.
Isaac Lab provides tiled rendering APIs for RGB, depth, along with other annotators through the :class:`~sensors.TiledCamera` class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg` class, specifying parameters such as the regex expression for all camera paths, the transform for the cameras, the desired data type, the type of cameras to add to the scene, and the camera resolution.

.. code-block:: python
Expand All @@ -37,28 +35,25 @@ resolution.
height=80,
)
To access the tiled rendering interface, a :class:`~sensors.TiledCamera` object can be created and used
to retrieve data from the cameras.
To access the tiled rendering interface, a :class:`~sensors.TiledCamera` object can be created and used to retrieve data from the cameras.

.. code-block:: python
tiled_camera = TiledCamera(cfg.tiled_camera)
data_type = "rgb"
data = tiled_camera.data.output[data_type]
The returned data will be transformed into the shape (num_cameras, height, width, num_channels), which
can be used directly as observation for reinforcement learning.
The returned data will be transformed into the shape (num_cameras, height, width, num_channels), which can be used directly as observation for reinforcement learning.

When working with rendering, make sure to add the ``--enable_cameras`` argument when launching the
environment. For example:
When working with rendering, make sure to add the ``--enable_cameras`` argument when launching the environment. For example:

.. code-block:: shell
python source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras
Annotators and Data Types
-------------------------
Annotators
~~~~~~~~~~~~~~~~~

Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide APIs for retrieving various types annotator data from replicator:

Expand All @@ -76,6 +71,11 @@ Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide
RGB and RGBA
~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_rgb.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``rgb`` data type returns a 3-channel RGB colored image of type ``torch.uint8``, with dimension (B, H, W, 3).

``rgba`` data type returns a 4-channel RGBA colored image of type ``torch.uint8``, with dimension (B, H, W, 4).
Expand All @@ -85,6 +85,11 @@ To convert the ``torch.uint8`` data to ``torch.float32``, divide the buffer by 2
Depth and Distances
~~~~~~~~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_depth.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``distance_to_camera`` returns a single-channel depth image with distance to the camera optical center. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.

``distance_to_image_plane`` returns a single-channel depth image with distances of 3D points from the camera plane along the camera's Z-axis. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.
Expand All @@ -94,6 +99,11 @@ Depth and Distances
Normals
~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_normals.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``normals`` returns an image containing the local surface normal vectors at each pixel. The buffer has dimension (B, H, W, 3), containing the (x, y, z) information for each vector, and has data type ``torch.float32``.

Motion Vectors
Expand All @@ -104,6 +114,11 @@ Motion Vectors
Semantic Segmentation
~~~~~~~~~~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_semantic.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``semantic_segmentation`` outputs semantic segmentation of each entity in the camera’s viewport that has semantic labels. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['semantic_segmentation']`` containing ID to labels information.

- If ``colorize_semantic_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to semantic labels.
Expand All @@ -113,6 +128,11 @@ Semantic Segmentation
Instance ID Segmentation
~~~~~~~~~~~~~~~~~~~~~~~~

.. figure:: ../../_static/overview/overview_sensors_instanceID.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``instance_id_segmentation_fast`` outputs instance ID segmentation of each entity in the camera’s viewport. The instance ID is unique for each prim in the scene with different paths. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_id_segmentation_fast']`` containing ID to labels information.

The main difference between ``instance_id_segmentation_fast`` and ``instance_segmentation_fast`` are that instance segmentation annotator goes down the hierarchy to the lowest level prim which has semantic labels, where instance ID segmentation always goes down to the leaf prim.
Expand All @@ -124,11 +144,18 @@ The main difference between ``instance_id_segmentation_fast`` and ``instance_seg
Instance Segmentation
"""""""""""""""""""""

.. figure:: ../../_static/overview/overview_sensors_instance.png
:align: center
:figwidth: 100%
:alt: A scene captured in RGB

``instance_segmentation_fast`` outputs instance segmentation of each entity in the camera’s viewport. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_segmentation_fast']`` containing ID to labels and ID to semantic information.

- If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.
- If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``.

- If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel.

- If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from instance ID to semantic labels of that semantic entity.
The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.


Current Limitations
Expand Down
145 changes: 145 additions & 0 deletions docs/source/overview/sensors/contact_sensor.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
.. _overview_sensors_contact:

Contact Sensor
================

.. figure:: ../../_static/overview/overview_sensors_contact_diagram.png
:align: center
:figwidth: 100%
:alt: A contact sensor with filtering

The contact sensor is designed to return the net contact force acting on a given ridgid body. The sensor is written to behave as a physical object, and so the "scope" of the contact sensor is limited to the body (or bodies) that defines it. There are multiple ways to define this scope, depending on your need to filter the forces coming from the contact.

By default, the reported force is the total contact force, but your application may only care about contact forces due to specific objects. Retrieving contact forces from specific objects requires filtering, and this can only be done in a "many-to-one" way. A multi-legged robot that needs filterable contact information for its feet would require one sensor per foot to be defined in the environment, but a robotic hand with contact sensors on the tips of each finger can be defined with a single sensor.

Consider a simple environment with an Anymal Quadruped and a block

.. code-block:: python
@configclass
class ContactSensorsSceneCfg(InteractiveSceneCfg):
"""Design the scene with sensors on the robot."""
# ground plane
ground = AssetBaseCfg(prim_path="/World/defaultGroundPlane", spawn=sim_utils.GroundPlaneCfg())
# lights
dome_light = AssetBaseCfg(
prim_path="/World/Light", spawn=sim_utils.DomeLightCfg(intensity=3000.0, color=(0.75, 0.75, 0.75))
)
# robot
robot = ANYMAL_C_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Rigid Object
cube = RigidObjectCfg(
prim_path="{ENV_REGEX_NS}/Cube",
spawn=sim_utils.CuboidCfg(
size=(0.5,0.5,0.1),
rigid_props=sim_utils.RigidBodyPropertiesCfg(),
mass_props=sim_utils.MassPropertiesCfg(mass=100.0),
collision_props=sim_utils.CollisionPropertiesCfg(),
physics_material=sim_utils.RigidBodyMaterialCfg(static_friction=1.0),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.0, 1.0, 0.0), metallic=0.2),
),
init_state=RigidObjectCfg.InitialStateCfg(pos=(0.5, 0.5, 0.05)),
)
contact_forces_LF = ContactSensorCfg(
prim_path="{ENV_REGEX_NS}/Robot/LF_FOOT",
update_period=0.0,
history_length=6,
debug_vis=True,
filter_prim_paths_expr=["{ENV_REGEX_NS}/Cube"],
)
contact_forces_RF = ContactSensorCfg(
prim_path="{ENV_REGEX_NS}/Robot/RF_FOOT",
update_period=0.0,
history_length=6,
debug_vis=True,
filter_prim_paths_expr=["{ENV_REGEX_NS}/Cube"],
)
contact_forces_H = ContactSensorCfg(
prim_path="{ENV_REGEX_NS}/Robot/.*H_FOOT",
update_period=0.0,
history_length=6,
debug_vis=True,
)
We define the sensors on the feet of the robot in two different ways. The front feet are independent sensors (one sensor body per foot) and the "Cube" is placed under the left foot. The hind feet are defined as a single sensor with multiple bodies.

We can then run the scene and print the data from the sensors

.. code-block:: python
def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
.
.
.
# Simulate physics
while simulation_app.is_running():
.
.
.
# print information from the sensors
print("-------------------------------")
print(scene["contact_forces_LF"])
print("Received force matrix of: ", scene["contact_forces_LF"].data.force_matrix_w)
print("Received contact force of: ", scene["contact_forces_LF"].data.net_forces_w)
print("-------------------------------")
print(scene["contact_forces_RF"])
print("Received force matrix of: ", scene["contact_forces_RF"].data.force_matrix_w)
print("Received contact force of: ", scene["contact_forces_RF"].data.net_forces_w)
print("-------------------------------")
print(scene["contact_forces_H"])
print("Received force matrix of: ", scene["contact_forces_H"].data.force_matrix_w)
print("Received contact force of: ", scene["contact_forces_H"].data.net_forces_w)
Here, we print both the net contact force and the filtered force matrix for each contact sensor defined in the scene. The front left and front right feet report the following

.. code-block:: bash
-------------------------------
Contact sensor @ '/World/envs/env_.*/Robot/LF_FOOT':
view type : <class 'omni.physics.tensors.impl.api.RigidBodyView'>
update period (s) : 0.0
number of bodies : 1
body names : ['LF_FOOT']
Received force matrix of: tensor([[[[-1.3923e-05, 1.5727e-04, 1.1032e+02]]]], device='cuda:0')
Received contact force of: tensor([[[-1.3923e-05, 1.5727e-04, 1.1032e+02]]], device='cuda:0')
-------------------------------
Contact sensor @ '/World/envs/env_.*/Robot/RF_FOOT':
view type : <class 'omni.physics.tensors.impl.api.RigidBodyView'>
update period (s) : 0.0
number of bodies : 1
body names : ['RF_FOOT']
Received force matrix of: tensor([[[[0., 0., 0.]]]], device='cuda:0')
Received contact force of: tensor([[[1.3529e-05, 0.0000e+00, 1.0069e+02]]], device='cuda:0')
Notice that even with filtering, both sensors report the net contact force acting on the foot. However only the left foot has a non zero "force matrix", because the right foot isn't standing on the filtered body, ``/World/envs/env_.*/Cube``. Now, checkout the data coming from the hind feet!

.. code-block:: bash
-------------------------------
Contact sensor @ '/World/envs/env_.*/Robot/.*H_FOOT':
view type : <class 'omni.physics.tensors.impl.api.RigidBodyView'>
update period (s) : 0.0
number of bodies : 2
body names : ['LH_FOOT', 'RH_FOOT']
Received force matrix of: None
Received contact force of: tensor([[[9.7227e-06, 0.0000e+00, 7.2364e+01],
[2.4322e-05, 0.0000e+00, 1.8102e+02]]], device='cuda:0')
In this case, the contact sensor has two bodies: the left and right hind feet. When the force matrix is queried, the result is ``None`` because this is a many body sensor, and presently Isaac Lab only supports "many to one" contact force filtering. Unlike the single body contact sensor, the reported force tensor has multiple entries, with each "row" corresponding to the contact force on a single body of the sensor (matching the ordering at construction).

.. dropdown:: Code for contact_sensor.py
:icon: code

.. literalinclude:: ../../../../source/standalone/demos/sensors/contact_sensor.py
:language: python
:linenos:
Loading

0 comments on commit 4d99147

Please sign in to comment.