
Close the Zenoh session after destroying the last node #21

Closed

Conversation


@Yadunund commented Sep 19, 2024

With this change I can get the failing test here to pass. I didn't address thread safety here, since that is the focus of other PRs in upstream rmw_zenoh.

```cpp
// Erase the node from the set of session_nodes and close the Zenoh
// session if this is the last node.
node->context->impl->session_nodes_.erase(node);
if (node->context->impl->session_nodes_.empty()) {
```
@JEnoch commented Sep 19, 2024


Shouldn't you also test whether node->context->impl->is_shutdown == true?
I mean, could it happen that some process closes all its Nodes without calling rcl_shutdown(), but re-creates Nodes later?

👆My bad: on a graceful closure, the Nodes are deleted before calling rcl_shutdown(), so is_shutdown is always false on last-node removal.
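
For context, a sketch of how the block under review presumably continues; the review anchor truncates the diff, so the body below is an illustration, not the PR's exact code, and close_session() is a hypothetical helper:

```cpp
// Sketch of the surrounding logic, not the PR's exact diff.
// Erase the node from the set of session_nodes and close the Zenoh
// session if this is the last node.
node->context->impl->session_nodes_.erase(node);
if (node->context->impl->session_nodes_.empty()) {
  // Hypothetical helper wrapping whatever underlying zenoh call
  // tears down the owned session.
  node->context->impl->close_session();
}
```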

@JEnoch commented Sep 19, 2024

Could it happen that some ROS application does the following steps:

  1. create a Node
  2. delete the Node, without calling rcl_shutdown()
  3. re-create a Node

In this case the Session is closed at step 2, and step 3 will fail.
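
For illustration, a hypothetical rclcpp sequence exercising this pattern (the node names and the use of rclcpp rather than raw rcl calls are assumptions, not taken from the PR):

```cpp
#include <memory>
#include "rclcpp/rclcpp.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  // 1. create a Node
  auto node = std::make_shared<rclcpp::Node>("first_node");

  // 2. delete the Node without calling rclcpp::shutdown()
  node.reset();

  // 3. re-create a Node: the concern is that, with this PR, the Zenoh
  //    session was already closed at step 2, so this node would have no
  //    session to attach to.
  auto node2 = std::make_shared<rclcpp::Node>("second_node");

  rclcpp::shutdown();
  return 0;
}
```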

@Yadunund (Author)

> Could it happen that some ROS application does the following steps:
>
>   1. create a Node
>   2. delete the Node, without calling rcl_shutdown()
>   3. re-create a Node
>
> In this case the Session is closed at step 2, and step 3 will fail.

Not that I'm aware of, but there could be some tests that aim to do this.

After thinking about this further, I think the best way forward that works with all these different scenarios is to be able to close a session within rmw_context_fini() without segfaulting.

@Yadunund (Author) commented Sep 19, 2024

Actually, after looking at the rmw_shutdown() documentation again, I think closing the session within rmw_shutdown() would make a lot of sense. I would argue that the failing test from ros2#276 (review) does not have the right expectation: if rmw_shutdown() is called on a context, then it should not expect any nodes created with that context to still be present in the graph.

So I'm inclined to close this PR and fix the test expectations instead.
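
A rough sketch of what that could look like, assuming the standard rmw_shutdown() entry point and a hypothetical close_session() helper on the context impl (this is not the actual rmw_zenoh_cpp code):

```cpp
// Sketch only: moves session closure from last-node destruction into
// rmw_shutdown(). close_session() is a hypothetical helper wrapping
// whatever zenoh call tears down the owned session.
rmw_ret_t
rmw_shutdown(rmw_context_t * context)
{
  RMW_CHECK_ARGUMENT_FOR_NULL(context, RMW_RET_INVALID_ARGUMENT);
  RMW_CHECK_ARGUMENT_FOR_NULL(context->impl, RMW_RET_INVALID_ARGUMENT);

  // Closing the session here drops all liveliness tokens at once, so remote
  // graph caches learn about the disappearance via their liveliness
  // subscriber callbacks rather than via individual unregister messages.
  context->impl->close_session();
  context->impl->is_shutdown = true;
  return RMW_RET_OK;
}
```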

@JEnoch commented Sep 20, 2024

I'm also in favor of closing the session in rmw_shutdown().

It would also have the beneficial side effect of lowering traffic when a process shuts down without explicitly cleaning up its Node and pub/sub/services. At closure, Zenoh won't send all the unregister_token messages but will just close the connection. The remote Nodes will detect this closure and automatically call the LivelinessSubscriber callback for each disappeared token.

The drawback is that the order of token unregistration is not guaranteed in this case, and the Node's token will probably be unregistered before some of the Node's pub/sub/services tokens.
With the current graph cache implementation, this leads to warnings such as:

```
[WARN] [1726642070.602629950] [rmw_zenoh_cpp]: Received liveliness token to remove node /_ros2cli_147828 from the graph before all pub/subs/clients/services for this node have been removed. Removing all entities first...
[WARN] [1726642070.602636280] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /_ros2cli_147828 from the graph. Ignoring...
```

But I think those logs can be set to debug level or just removed, as the cache is correctly cleaned up anyway.
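
For illustration, the change could be as small as swapping the logging macro in the graph cache. This is a sketch assuming rcutils-style logging macros; the node_name variable is hypothetical, and the actual rmw_zenoh_cpp logging call may differ:

```cpp
// Before: out-of-order token removal is reported at WARN level.
RCUTILS_LOG_WARN_NAMED(
  "rmw_zenoh_cpp",
  "Received liveliness token to remove node %s from the graph before all "
  "pub/subs/clients/services for this node have been removed. "
  "Removing all entities first...",
  node_name.c_str());  // node_name: hypothetical local variable

// After: same message at DEBUG level, since the cache is cleaned up
// correctly regardless of the unregistration order.
RCUTILS_LOG_DEBUG_NAMED(
  "rmw_zenoh_cpp",
  "Received liveliness token to remove node %s from the graph before all "
  "pub/subs/clients/services for this node have been removed. "
  "Removing all entities first...",
  node_name.c_str());
```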

@Yadunund (Author)

Okay to change the log levels to debug for those printouts.
