Deep Learning Nodes for ROS/ROS2

This repo contains deep learning inference nodes and camera/video streaming nodes for ROS/ROS2 with support for Jetson Nano/TX1/TX2/Xavier NX/AGX Xavier and TensorRT.

The nodes use the image recognition, object detection, and semantic segmentation DNNs from the jetson-inference library and NVIDIA Hello AI World tutorial, which come with several built-in pretrained networks for classification, detection, and segmentation, as well as the ability to load custom user-trained models.

The camera/video streaming nodes support the following input/output interfaces:

  • MIPI CSI cameras
  • V4L2 cameras
  • RTP / RTSP
  • Videos & Images
  • Image sequences
  • OpenGL windows

ROS Melodic and ROS2 Eloquent are supported, and the latest version of JetPack is recommended.

Table of Contents

  • Installation
      • jetson-inference
      • ROS/ROS2
      • ros_deep_learning
  • Testing
      • Video Viewer
      • imagenet Node
      • detectnet Node
      • segnet Node
  • Topics & Parameters

Installation

First, install the latest version of JetPack on your Jetson.

Then, follow the steps below to install the needed components on your Jetson.

jetson-inference

These ROS nodes use the DNN objects from the jetson-inference project (aka Hello AI World). To build and install jetson-inference, see this page or run the commands below:

$ cd ~
$ sudo apt-get install git cmake
$ git clone --recursive https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ mkdir build
$ cd build
$ cmake ../
$ make -j$(nproc)
$ sudo make install
$ sudo ldconfig

Before proceeding, it's worthwhile to test that jetson-inference is working properly on your system by following this step of the Hello AI World tutorial.

ROS/ROS2

Install the ros-melodic-ros-base or ros-eloquent-ros-base package on your Jetson following these directions.

Depending on which version of ROS you're using, install some additional dependencies and create a workspace:

ROS Melodic

$ sudo apt-get install ros-melodic-image-transport ros-melodic-vision-msgs

For ROS Melodic, create a Catkin workspace (~/ros_workspace) using these steps:
http://wiki.ros.org/ROS/Tutorials/InstallingandConfiguringROSEnvironment#Create_a_ROS_Workspace

ROS Eloquent

$ sudo apt-get install ros-eloquent-vision-msgs \
                       ros-eloquent-launch-xml \
                       ros-eloquent-launch-yaml \
                       python3-colcon-common-extensions

For ROS Eloquent, create a workspace (~/ros_workspace) to use:

$ mkdir -p ~/ros_workspace/src

ros_deep_learning

Next, navigate into your ROS workspace's src directory and clone ros_deep_learning:

$ cd ~/ros_workspace/src
$ git clone https://github.com/dusty-nv/ros_deep_learning

Then build it - if you are using ROS Melodic, use catkin_make. If you are using ROS2 Eloquent, use colcon build:

$ cd ~/ros_workspace/

# ROS Melodic
$ catkin_make
$ source devel/setup.bash 

# ROS2 Eloquent
$ colcon build
$ source install/local_setup.bash 

The nodes should now be built and ready to use. Remember to source the overlay as shown above so that ROS can find the nodes.

Testing

Before proceeding, if you're using ROS Melodic make sure that roscore is running first:

$ roscore

If you're using ROS2, running the core service is no longer required.

Video Viewer

First, it's recommended to test that you can stream a video feed using the video_source and video_output nodes. See Camera Streaming & Multimedia for valid input/output streams, and substitute your desired input and output argument below. For example, you can use video files for the input or output, or use V4L2 cameras instead of MIPI CSI cameras. You can also use RTP/RTSP streams over the network.

# ROS Melodic
$ roslaunch ros_deep_learning video_viewer.ros1.launch input:=csi://0 output:=display://0

# ROS2 Eloquent
$ ros2 launch ros_deep_learning video_viewer.ros2.launch input:=csi://0 output:=display://0

imagenet Node

You can launch a classification demo with the following commands, substituting your desired camera or video path for the input argument below (see here for valid input/output streams).

Note that the imagenet node also publishes classification metadata on the imagenet/classification topic in a vision_msgs/Classification2D message -- see the Topics & Parameters section below for more info.

# ROS Melodic
$ roslaunch ros_deep_learning imagenet.ros1.launch input:=csi://0 output:=display://0

# ROS2 Eloquent
$ ros2 launch ros_deep_learning imagenet.ros2.launch input:=csi://0 output:=display://0
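
To consume the classification results programmatically rather than just viewing the overlay, you can subscribe to the classification topic. The following is a minimal, hypothetical ROS Melodic (rospy) sketch that is not part of this repo; it assumes the launch file runs the node in the imagenet namespace (so the topic resolves to /imagenet/classification) and that your vision_msgs version uses numeric id and score fields in ObjectHypothesis:

```python
import rospy
from vision_msgs.msg import Classification2D

def on_classification(msg):
    # each result carries a class ID and a confidence score
    for result in msg.results:
        rospy.loginfo("class %d, confidence %.2f", result.id, result.score)

rospy.init_node("imagenet_listener")
rospy.Subscriber("/imagenet/classification", Classification2D, on_classification)
rospy.spin()
```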

detectnet Node

To launch an object detection demo, substitute your desired camera or video path for the input argument below (see here for valid input/output streams). Note that the detectnet node also publishes detection metadata in a vision_msgs/Detection2DArray message -- see the Topics & Parameters section below for more info.

# ROS Melodic
$ roslaunch ros_deep_learning detectnet.ros1.launch input:=csi://0 output:=display://0

# ROS2 Eloquent
$ ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0
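
If you want to use the detections in your own node, a small subscriber is enough. The following is a hypothetical ROS2 Eloquent (rclpy) sketch; it assumes the topic resolves to /detectnet/detections and uses the standard vision_msgs Detection2DArray fields (bbox.center, bbox.size_x/size_y, results[].score):

```python
import rclpy
from rclpy.node import Node
from vision_msgs.msg import Detection2DArray

class DetectionListener(Node):
    def __init__(self):
        super().__init__('detection_listener')
        self.create_subscription(Detection2DArray, '/detectnet/detections',
                                 self.on_detections, 10)

    def on_detections(self, msg):
        # log the center, size, and best confidence of each detected object
        for det in msg.detections:
            score = det.results[0].score if det.results else 0.0
            self.get_logger().info('object at ({:.0f}, {:.0f}), size {:.0f}x{:.0f}, score {:.2f}'.format(
                det.bbox.center.x, det.bbox.center.y, det.bbox.size_x, det.bbox.size_y, score))

rclpy.init()
rclpy.spin(DetectionListener())
```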

segnet Node

To launch a semantic segmentation demo, substitute your desired camera or video path for the input argument below (see here for valid input/output streams). Note that the segnet node also publishes raw segmentation results to the segnet/class_mask topic -- see the Topics & Parameters section below for more info.

# ROS Melodic
$ roslaunch ros_deep_learning segnet.ros1.launch input:=csi://0 output:=display://0

# ROS2 Eloquent
$ ros2 launch ros_deep_learning segnet.ros2.launch input:=csi://0 output:=display://0
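
The raw per-pixel class IDs on the class_mask topic can be turned into a NumPy array with cv_bridge for downstream processing. This is a hypothetical ROS Melodic (rospy) sketch; it assumes the topic resolves to /segnet/class_mask and that the mask is published as an 8-bit single-channel image, as described in the Topics & Parameters section below:

```python
import rospy
import numpy as np
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()

def on_class_mask(msg):
    # convert the 8-bit single-channel mask into a NumPy array of class IDs
    mask = bridge.imgmsg_to_cv2(msg, desired_encoding='mono8')
    classes, counts = np.unique(mask, return_counts=True)
    rospy.loginfo('classes present: %s', dict(zip(classes.tolist(), counts.tolist())))

rospy.init_node('segnet_mask_listener')
rospy.Subscriber('/segnet/class_mask', Image, on_class_mask)
rospy.spin()
```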

Topics & Parameters

Below are the message topics and parameters that each node implements.

imagenet Node

| Topic Name | I/O | Message Type | Description |
|------------|-----|--------------|-------------|
| `image_in` | Input | `sensor_msgs/Image` | Raw input image |
| `classification` | Output | `vision_msgs/Classification2D` | Classification results (class ID + confidence) |
| `vision_info` | Output | `vision_msgs/VisionInfo` | Vision metadata (class labels parameter list name) |
| `overlay` | Output | `sensor_msgs/Image` | Input image overlayed with the classification results |

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `model_name` | `string` | `"googlenet"` | Built-in model name (see here for valid values) |
| `model_path` | `string` | `""` | Path to custom caffe or ONNX model |
| `prototxt_path` | `string` | `""` | Path to custom caffe prototxt file |
| `input_blob` | `string` | `"data"` | Name of DNN input layer |
| `output_blob` | `string` | `"prob"` | Name of DNN output layer |
| `class_labels_path` | `string` | `""` | Path to custom class labels file |
| `class_labels_HASH` | `vector<string>` | class names | List of class labels, where HASH is model-specific (the actual parameter name is found via the `vision_info` topic) |
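
For example, to recover the class label strings at runtime you can read the parameter name from the vision_info topic. The sketch below is hypothetical (ROS Melodic / rospy) and assumes the VisionInfo message's database_location field carries the fully-qualified name of the class_labels_HASH parameter:

```python
import rospy
from vision_msgs.msg import VisionInfo

def on_vision_info(msg):
    # database_location is assumed to hold the name of the class labels parameter
    labels = rospy.get_param(msg.database_location, [])
    rospy.loginfo('%d class labels loaded from %s', len(labels), msg.database_location)

rospy.init_node('vision_info_listener')
rospy.Subscriber('/imagenet/vision_info', VisionInfo, on_vision_info)
rospy.spin()
```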

detectnet Node

| Topic Name | I/O | Message Type | Description |
|------------|-----|--------------|-------------|
| `image_in` | Input | `sensor_msgs/Image` | Raw input image |
| `detections` | Output | `vision_msgs/Detection2DArray` | Detection results (bounding boxes, class IDs, confidences) |
| `vision_info` | Output | `vision_msgs/VisionInfo` | Vision metadata (class labels parameter list name) |
| `overlay` | Output | `sensor_msgs/Image` | Input image overlayed with the detection results |

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `model_name` | `string` | `"ssd-mobilenet-v2"` | Built-in model name (see here for valid values) |
| `model_path` | `string` | `""` | Path to custom caffe or ONNX model |
| `prototxt_path` | `string` | `""` | Path to custom caffe prototxt file |
| `input_blob` | `string` | `"data"` | Name of DNN input layer |
| `output_cvg` | `string` | `"coverage"` | Name of DNN output layer (coverage/scores) |
| `output_bbox` | `string` | `"bboxes"` | Name of DNN output layer (bounding boxes) |
| `class_labels_path` | `string` | `""` | Path to custom class labels file |
| `class_labels_HASH` | `vector<string>` | class names | List of class labels, where HASH is model-specific (the actual parameter name is found via the `vision_info` topic) |
| `overlay_flags` | `string` | `"box,labels,conf"` | Flags used to generate the overlay (some combination of `none`, `box`, `labels`, `conf`) |
| `mean_pixel_value` | `float` | `0.0` | Mean pixel subtraction value applied to the input (normally 0) |
| `threshold` | `float` | `0.5` | Minimum confidence value for positive detections (0.0 - 1.0) |

segnet Node

| Topic Name | I/O | Message Type | Description |
|------------|-----|--------------|-------------|
| `image_in` | Input | `sensor_msgs/Image` | Raw input image |
| `vision_info` | Output | `vision_msgs/VisionInfo` | Vision metadata (class labels parameter list name) |
| `overlay` | Output | `sensor_msgs/Image` | Input image overlayed with the segmentation results |
| `color_mask` | Output | `sensor_msgs/Image` | Colorized segmentation class mask |
| `class_mask` | Output | `sensor_msgs/Image` | 8-bit single-channel image where each pixel is a class ID |

| Parameter Name | Type | Default | Description |
|----------------|------|---------|-------------|
| `model_name` | `string` | `"fcn-resnet18-cityscapes-1024x512"` | Built-in model name (see here for valid values) |
| `model_path` | `string` | `""` | Path to custom caffe or ONNX model |
| `prototxt_path` | `string` | `""` | Path to custom caffe prototxt file |
| `input_blob` | `string` | `"data"` | Name of DNN input layer |
| `output_blob` | `string` | `"score_fr_21classes"` | Name of DNN output layer |
| `class_colors_path` | `string` | `""` | Path to custom class colors file |
| `class_labels_path` | `string` | `""` | Path to custom class labels file |
| `class_labels_HASH` | `vector<string>` | class names | List of class labels, where HASH is model-specific (the actual parameter name is found via the `vision_info` topic) |
| `mask_filter` | `string` | `"linear"` | Filtering applied to the `color_mask` topic (`linear` or `point`) |
| `overlay_filter` | `string` | `"linear"` | Filtering applied to the `overlay` topic (`linear` or `point`) |
| `overlay_alpha` | `float` | `180.0` | Alpha blending value used by the `overlay` topic (0.0 - 255.0) |

video_source Node

| Topic Name | I/O | Message Type | Description |
|------------|-----|--------------|-------------|
| `raw` | Output | `sensor_msgs/Image` | Raw output image (BGR8) |

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `resource` | `string` | `"csi://0"` | Input stream URI (see here for valid protocols) |
| `codec` | `string` | `""` | Manually specify the codec for compressed streams (see here for valid values) |
| `width` | `int` | `0` | Manually specify the desired width of the stream (0 = stream default) |
| `height` | `int` | `0` | Manually specify the desired height of the stream (0 = stream default) |
| `framerate` | `int` | `0` | Manually specify the desired framerate of the stream (0 = stream default) |
| `loop` | `int` | `0` | For video files: 0 = don't loop, >0 = number of loops, -1 = loop forever |

video_output Node

| Topic Name | I/O | Message Type | Description |
|------------|-----|--------------|-------------|
| `image_in` | Input | `sensor_msgs/Image` | Raw input image |

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `resource` | `string` | `"display://0"` | Output stream URI (see here for valid protocols) |
| `codec` | `string` | `"h264"` | Codec used for compressed streams (see here for valid values) |
| `bitrate` | `int` | `4000000` | Target VBR bitrate of encoded streams (in bits per second) |
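
As an illustration of feeding the video_output node from your own code, you can publish sensor_msgs/Image messages to its image_in topic. The following is a hypothetical ROS2 (rclpy) sketch; it assumes the image_in topic resolves to /video_output/image_in (remap as needed) and uses cv_bridge to convert an OpenCV/NumPy frame:

```python
import numpy as np
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class TestFramePublisher(Node):
    def __init__(self):
        super().__init__('test_frame_publisher')
        self.pub = self.create_publisher(Image, '/video_output/image_in', 10)
        self.bridge = CvBridge()
        self.timer = self.create_timer(1.0 / 30.0, self.on_timer)  # ~30 FPS

    def on_timer(self):
        # publish a solid green 1280x720 BGR8 test frame
        frame = np.full((720, 1280, 3), (0, 255, 0), dtype=np.uint8)
        self.pub.publish(self.bridge.cv2_to_imgmsg(frame, encoding='bgr8'))

rclpy.init()
rclpy.spin(TestFramePublisher())
```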