CPU usage component container #2295

Closed
albertarla opened this issue Sep 1, 2023 · 10 comments
Labels
more-information-needed Further information is required

Comments

@albertarla

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • binaries
  • Version or commit hash:
    • ROS2 Humble
  • DDS implementation:
    • Cyclone dds
  • Client library (if applicable):
    • rclcpp_components

Issue description

I have a simple node that publishes a camera image topic. A component_container_isolated then launches several components, each with a single-threaded executor, and every component subscribes to the image topic. The subscription callback runs a computationally demanding task.

The problem appears with the subscription to this image topic. With two or three components I receive the images at 30fps as expected. Beyond that, the more components I add, the lower the received fps. At first I thought the CPU was the cause. After monitoring it, the CPU usage does increase as components are added, but it never goes over 80% of the total.

Expected behavior

I would expect the CPU usage to increase before the fps received in the components starts to drop. As an example, the results I would expect are the following:

  • 1 Component being run -> 30fps -> CPU 55%
  • 2 Components being run -> 30fps -> CPU 55.5%
  • ...
  • 6 Components being run -> 30fps -> CPU 75%
  • ...
  • 8 Components being run -> 30fps -> CPU 90%
  • ...
  • 15 Components being run -> 10fps -> CPU 95%

Actual behavior

As an example, the results I'm getting are the following:

  • 1 Component being run -> 30fps -> CPU 55%
  • 2 Components being run -> 30fps -> CPU 55.5%
  • ...
  • 6 Components being run -> 20fps -> CPU 60%
  • ...
  • 8 Components being run -> 15fps -> CPU 70%
  • ...
  • 15 Components being run -> 4fps -> CPU 80%

Additional information

I have tried the same setup but launching independent nodes for the subscription (without using the component manager). In this situation, my CPU usage gets to 90% or more with only 5 nodes subscribed to the image. Only then does the frequency of the received images start decreasing. In my opinion, this is what should happen when using the component manager.

So, is this behaviour normal? Is there any way to improve this CPU usage?

@fujitatomoya
Collaborator

@albertarla out of curiosity, how do you check the CPU usage? Do you use a specific command?

@fujitatomoya fujitatomoya added the more-information-needed Further information is required label Sep 1, 2023
@Yadunund
Member

Yadunund commented Sep 1, 2023

I'd be curious to also know if intra-process comms was enabled within the container? And also know the QoS for the image subscriptions?

If not enabled, the RMW layer (cyclonedds without any SHM) would be responsible for serializing/deserializing multiple copies of the image for the different subs. At this point, cyclonedds send/receive buffer sizes and the UDP buffer limits in the kernel (in combination with QoS) may be throttling transport. I would expect that increasing both those limits + sizes as described here would result in a CPU load increase while improving transport performance.

I'd reckon the CPU increase outside the container might have more to do with increased CPU footprint from 5 separate processes vs 1 component container.
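The serialization cost mentioned above can be roughed out with simple arithmetic. The following is a minimal sketch, assuming each subscription handles its own full copy of every frame when intra-process comms is off; the 640x480 RGB image size is a hypothetical value, since the thread does not state the resolution, and the function names are illustrative only.

```python
def image_bytes(width: int, height: int, channels: int = 3,
                bytes_per_channel: int = 1) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel


def transport_load_mb_per_s(frame_bytes: int, fps: float,
                            subscribers: int) -> float:
    """Upper-bound data volume if every subscription gets its own copy."""
    return frame_bytes * fps * subscribers / 1e6


# Hypothetical 640x480 RGB8 frame; the real resolution is not given.
frame = image_bytes(640, 480)
for n in (1, 6, 15):
    load = transport_load_mb_per_s(frame, 30.0, n)
    print(f"{n:2d} subscribers: {load:.1f} MB/s")
```

Under these assumptions the volume grows linearly with the number of subscriptions, which is why buffer limits can start to matter well before the CPU is saturated.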

@alsora
Collaborator

alsora commented Sep 2, 2023

Is the publisher node one of the components? Or is it in a separate process?

@albertarla
Author

Hi @fujitatomoya, I don't have precise metrics of the CPU usage, just the general behaviour. I used htop to monitor it, and the values I'm showing as an example approximate what I'm seeing with htop.

Hi @Yadunund, I don't have intra-process comms enabled because (answering @alsora's question) the publisher node runs in another process and is not a component. The QoS settings are the image_transport defaults, Reliable and Volatile. Moreover, the depth for the subscriber is 1. Additionally, I have tried Best effort and Volatile with the same results.

I'll try to increase both limits you are talking about and see if something changes. Thanks for the support!

@Yadunund
Member

Yadunund commented Sep 4, 2023

Ah whoops I missed the part where the publisher is not within the component.

@albertarla
Author

We are looking into our demanding process inside the callback, as it seems the delay may be caused by it and not be related to ROS2. I'll keep the issue updated.

@alsora
Collaborator

alsora commented Sep 5, 2023

@albertarla you are running multiple processes; can you provide more details about the CPU measurement you reported?
Is that the total CPU usage of the computer, or the CPU usage of the subscription process?
Is it normalized by the number of CPU cores?

To get a full understanding, we would need to see the CPU usage of both the publisher and the subscriber process.
The bottleneck may be on the publisher side: it may not be able to send messages to that many subscriptions at the desired rate (there is only one publisher, so it uses a single thread to serve all subscriptions).
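The normalization question above matters because the same load reads very differently depending on the scale. A minimal sketch, assuming a per-process reading where 100% means one fully busy core (the convention htop uses for its per-process column); the 8-core machine is a hypothetical value, as the thread does not state the core count.

```python
def normalized_cpu_percent(per_core_percent: float, cores: int) -> float:
    """Convert a reading where 100% = one full core into a whole-machine share."""
    return per_core_percent / cores


def busy_cores(whole_machine_percent: float, cores: int) -> float:
    """Convert a whole-machine percentage into the equivalent number of busy cores."""
    return whole_machine_percent / 100.0 * cores


CORES = 8  # hypothetical core count

# One thread pinned at 100% of a core is only a small share of the machine.
print(normalized_cpu_percent(100.0, CORES))

# An 80% whole-machine average corresponds to this many fully busy cores.
print(busy_cores(80.0, CORES))
```

On such a machine, a single saturated thread would show up as 12.5% of the total, so a whole-machine average below 100% does not rule out one thread being the bottleneck.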

@albertarla
Author

@alsora I'll try to record the CPU usage of the individual processes by the end of this week and let you know. The data I showed is the average CPU usage of the whole computer, across all the different processes. What I tried to show with this data is that, even though the CPU is not at its maximum, the subscriptions no longer receive 30fps after adding more components. I was expecting the CPU to reach its maximum before performance decreased.

For the moment, the tests I have run show that the frequency of the publisher is stable at 30fps. To test that, I'm computing the frequency at which the publisher node enters the pub callback.
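Measuring how often a callback is entered, as described above, can be sketched as follows. This is not the author's code: `RateMeter` and its methods are hypothetical names, and the timestamps are injected so the example runs without ROS.

```python
import time
from collections import deque
from typing import Optional


class RateMeter:
    """Estimates call frequency from the last `window` timestamps."""

    def __init__(self, window: int = 30) -> None:
        self._stamps = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> None:
        """Record one callback entry; call this at the top of the callback."""
        self._stamps.append(time.monotonic() if now is None else now)

    def hz(self) -> float:
        """Average rate over the window, or 0.0 if it cannot be computed yet."""
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / span if span > 0 else 0.0


# Simulate a callback entered every 1/30 s.
meter = RateMeter(window=10)
for i in range(10):
    meter.tick(now=i / 30.0)
print(round(meter.hz(), 2))
```

Placing the same meter in a subscriber callback would allow the publish rate and the receive rate to be compared on equal terms.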

@jlblancoc

@albertarla Just in case you are interested in testing, we wrote a ROS2 package to measure the CPU usage of one or a set of processes: https://github.com/ual-arm/robotic-simulators-benchmark/tree/main/measure_process_ros2_pkg

We'll try to release it as an independent package... someday :-)

@albertarla
Author

Hi all, I changed the code to use intra-process communication. Now ros2 topic hz shows the expected frequency.

The subscriber still receives the images at a lower frequency, but this is now caused only by the processing time of one task inside the callback. I'm still not sure why the CPU does not reach its maximum before this processing delay appears, but it does not seem to be related to ROS or rclcpp.

I'll close the issue, as I'm going to investigate why the task running inside the callback doesn't use 100% of the available CPU. I'm pretty sure this is not related to ROS anymore. Thanks for the support! If it turns out to be ROS related after all, I'll reopen this issue and let you know.
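The conclusion above, that the receive rate is limited by the processing time inside the callback, can be put as simple arithmetic. A minimal sketch, assuming a single-threaded executor that runs one callback at a time and a subscriber queue depth of 1, so frames that arrive while the callback is busy are dropped; the processing times used are hypothetical values, not measurements from this thread.

```python
def max_callback_rate_hz(processing_time_s: float) -> float:
    """Highest rate one thread can sustain if each callback takes this long."""
    return 1.0 / processing_time_s


def delivered_rate_hz(publish_hz: float, processing_time_s: float) -> float:
    """Rate seen by the callback: capped by the publisher and by processing time."""
    return min(publish_hz, max_callback_rate_hz(processing_time_s))


# Hypothetical per-callback processing times, in seconds.
for t in (0.01, 0.05, 0.25):
    rate = delivered_rate_hz(30.0, t)
    print(f"{t * 1000:.0f} ms per callback -> {rate:.1f} fps")
```

The bound depends on wall-clock time per callback, not on CPU time, so a callback that spends part of its time waiting (on I/O, a lock, or an accelerator) can lower the frame rate without pushing CPU usage toward 100%.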
