Performance drop in ROS node vs. standalone execution of detectnet mobilenet-ssd-v2 #137

Open
ashishbhatti opened this issue Mar 22, 2024 · 1 comment

@ashishbhatti

Description:
I am experiencing a significant performance drop when running the mobilenet-ssd-v2 model with a detectnet ROS node compared to standalone execution. The FPS drops by approximately two-thirds, which is unexpected given that the model and its computational load remain unchanged.

Performance Details:

  • Standalone FPS: Approximately 24 FPS
  • ROS Node FPS: Approximately 8 FPS
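For scale, the two figures above can be converted into per-frame time. This is only a back-of-the-envelope sketch using the approximate FPS values reported here; the variable names are illustrative.

```python
def frame_time_ms(fps):
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


standalone_fps = 24.0  # approximate figure reported above
ros_fps = 8.0          # approximate figure reported above

standalone_ms = frame_time_ms(standalone_fps)
ros_ms = frame_time_ms(ros_fps)

# Extra time per frame that the ROS pipeline adds on top of the standalone run.
extra_ms = ros_ms - standalone_ms

# Fractional drop in throughput (the "two-thirds" mentioned in the description).
drop_fraction = 1.0 - ros_fps / standalone_fps

print(f"standalone: {standalone_ms:.1f} ms/frame, ROS: {ros_ms:.1f} ms/frame")
print(f"extra per frame: {extra_ms:.1f} ms, drop: {drop_fraction:.0%}")
```

Under these numbers, the ROS path spends roughly 83 ms more per frame than the standalone run, which is about twice the standalone per-frame time.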

Environment:

  • Model: mobilenet-ssd-v2 with detectnet
  • Platform: NVIDIA Jetson Nano
  • Software: jetson-inference docker container, ROS Noetic

Expected Behavior:
The FPS should be comparable between the ROS node and standalone executions since the model's computational requirements do not change.

Steps to Reproduce:

  1. Run as ROS Node
$ git clone --recursive --depth=1 https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ docker/run.sh --ros=noetic
$ roscore
$ roslaunch ros_deep_learning video_viewer.ros1.launch input:=v4l2:///dev/video0 output:=display://0
  2. Run standalone
$ docker/run.sh
$ cd build/aarch64/bin
$ ./detectnet /dev/video0

Additional Information:
I have attached screenshots demonstrating the FPS in both scenarios.
[Screenshots: FPS in standalone ("normal") execution and in the ROS node]

I am seeking insights or suggestions that could explain the cause of this performance drop and how it might be resolved. Any help would be greatly appreciated.

@dusty-nv
Owner

Hi @ashishbhatti, sorry about that. I no longer have a setup for running the versions you specify, but my initial guess is that it is related to inefficient image transport of video stream topics in Noetic. I think the primary difference is that the detectnet/detectnet.py examples capture images with zero-copy, directly into CUDA memory.

I remember exploring the use of ROS nodelets (for the imageNet classification models in that case) to work around this, so that everything resides inside one process. If you don't need the camera imagery in other nodes, you could explore creating a wrapper node that both captures the camera and does detectNet inferencing inside the same node, which should alleviate the issue.
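A minimal sketch of such a wrapper node in Python follows. It is not taken from ros_deep_learning. It assumes the `jetson_inference`/`jetson_utils` Python bindings (older builds expose them as `jetson.inference`/`jetson.utils`), that `Capture()` returns `None` on a timeout as in recent builds, and the `ssd-mobilenet-v2` network name. The node name, the topic name, and the choice to publish detections as JSON in a `std_msgs/String` are hypothetical choices made to keep the example short. Only the small detection results are published, so full frames never cross a ROS topic.

```python
#!/usr/bin/env python3
"""Sketch: capture and detect in one process; publish only the detections."""
import json


def detections_to_json(detections):
    """Serialize class, confidence and bounding box of each detection."""
    return json.dumps([
        {
            "class_id": int(d.ClassID),
            "confidence": float(d.Confidence),
            "box": [float(d.Left), float(d.Top), float(d.Right), float(d.Bottom)],
        }
        for d in detections
    ])


def run_loop(capture, detect, publish, should_stop, max_frames=None):
    """Capture, detect and publish until should_stop() is true or max_frames
    frames have been processed. Returns the number of frames processed."""
    frames = 0
    while not should_stop() and (max_frames is None or frames < max_frames):
        img = capture()
        if img is None:  # assumed: capture timed out, so try again
            continue
        publish(detections_to_json(detect(img)))
        frames += 1
    return frames


def main():
    # Imported here so the helpers above can be exercised without a Jetson or ROS.
    import rospy
    from std_msgs.msg import String
    from jetson_inference import detectNet
    from jetson_utils import videoSource

    rospy.init_node("detectnet_camera")  # hypothetical node name
    pub = rospy.Publisher("detections_json", String, queue_size=1)  # hypothetical topic

    camera = videoSource("v4l2:///dev/video0")  # frames are captured into CUDA memory
    net = detectNet("ssd-mobilenet-v2", threshold=0.5)

    run_loop(
        capture=camera.Capture,
        detect=net.Detect,
        publish=lambda text: pub.publish(String(data=text)),
        should_stop=rospy.is_shutdown,
    )
```

To run it as a node, call `main()` at the end of the script inside the container started with `docker/run.sh --ros=noetic`. The loop takes its capture, detect and publish steps as plain callables, so it can be checked off-device with stand-in functions before trying it on the Nano.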
