Some work participant discovery #2086

eboasson · 2024-09-06T15:15:20Z

This PR is somewhat massive, but it is primarily driven by a simple thing: make it possible to drop locators from SPDP when no participant seems to exist at that address. I've always refused to do that because of some edge cases in asymmetrical discovery and reconnecting after network interrupts, but the problem is that it also meant that MaxAutoParticipantIndex had to remain small to prevent massive bursts of periodic traffic, but that low limit caused enough trouble to make people prefer those bursts. The ROS 2 navigation stack suffers from this in particular: ros2/rmw_cyclonedds#458

Thus, this makes it prune locators after some time, distinguishing between peers specified in the configuration and locators discovered at run-time, and handling locators of participants of which the lease expired differently from those that terminated cleanly.

It also flips the scheduling of SPDP messages around: it used to have a timed event for each local participant and send to all known locators in one burst; now it scans a table of locators and sends SPDP messages for all participants at the same time. The advantage should be packing; most people will never notice because most applications only have a single participant ...

It's been working just fine for months on my laptop, but there are many configurations out there and I have not been using all of them. I'm sure there are still issues with it, but I think the time has come to just merge it and see what breaks. (Perhaps nothing, after all!)

I've made a PR because I want to be done with it 🙂 but have marked it as a draft because I still need to turn the handful of test scenarios in a bash script into tests that run as part of the regular CI.

This replaces the per-participant timed-event for periodically broadcasting its SPDP message by a central device that schedules and sends the SPDP messages for all participants and maintains the set of addresses to which to send them. There are a number of benefits to this model: * We can make it more likely (certain, if we want) that SPDP messages to a single destination go out in a single packet, reducing the packet rate if there are many participants. * We are no longer dependent on an "addrset" for tracking the set of addresses to publish to (and we don't need to special-case SPDP writers). That means we now have the option of "aging" locators and lowering the frequency or giving up on them altogether if there just doesn't seem to be anything at that address.

This eliminates looking up the SPDP sample in the WHC, which became rather painful with the introduction of serdata_plist, and eliminates the problem of the dispose+unregister not getting stored in the WHC (removed from the index, best-effort so not retained until ACKs come in). It also solves having to construct an address set based on the addresses in the SPDP administration and patching that into the SPDP writer, which was a horrible hack to begin with.

The old default of 9 (so 10 participants) was hit rather often. This raises the default to 99 (so 100 participants). The only real downside is that increases the number of SPDP packets 10x if unicast discovery is used.

Signed-off-by: Erik Boasson <[email protected]>

If one adds a host to the peer list, we add MaxAutoParticipantIndex+1 locators for it, but it generally doesn't make much sense to keep pinging all of them if there is no response. Especially if MaxAutoParticipantIndex is large, it causes a lot of periodic traffic.

SPDP locators can be added by the configuration but also by discovering a peer that we (presumably) cannot reach via multicast. The question is what to do with such a discovered address when the peer is no longer there. If we learnt a locator through discovery and we know for certain that the peer is no longer there (i.e., received a dispose/unregister) and that there are also no other peers at the locator, it makes sense to remove the address: a new peer that shows up at that locator would presumably try to contact us. If instead the peer's lease expired, we need to keep pinging it for a some time in case it was just the disconnect. So for each SPDP locator we need to know why we have it in the table of addreses. Signed-off-by: Erik Boasson <[email protected]>

This replaces the somewhat silly (and ancient) tracing of the queue length on every insert.

Signed-off-by: Erik Boasson <[email protected]>

Replaces src/core/xtests/spdp_scenarios.bash

eboasson added 13 commits September 23, 2024 12:19

Refactor SPDP handling

b67c90e

Raise default MaxAutoParticipantIndex

d72b56f

The old default of 9 (so 10 participants) was hit rather often. This raises the default to 99 (so 100 participants). The only real downside is that increases the number of SPDP packets 10x if unicast discovery is used.

Fix memory leak on OOM in _confgen

81e0612

Signed-off-by: Erik Boasson <[email protected]>

Remove as_disc, track addresses in spdp scheduler

51a4c9e

Periodically trace outgoing message queue length

0f9b718

This replaces the somewhat silly (and ancient) tracing of the queue length on every insert.

EADDRNOTAVAIL is not a generic error for sendmsg

bba519e

Signed-off-by: Erik Boasson <[email protected]>

Consider pinging localhost when ppidx not default

4cb337b

Signed-off-by: Erik Boasson <[email protected]>

Peers up to and including MaxAutoParticipantIndex

40db716

Signed-off-by: Erik Boasson <[email protected]>

Add tests for various SPDP-related behaviours

31e30cb

Replaces src/core/xtests/spdp_scenarios.bash

eboasson force-pushed the rework-spdp branch from 33eb9b6 to 31e30cb Compare September 25, 2024 18:02

eboasson marked this pull request as ready for review September 25, 2024 18:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some work participant discovery #2086

Some work participant discovery #2086

eboasson commented Sep 6, 2024

Some work participant discovery #2086

Are you sure you want to change the base?

Some work participant discovery #2086

Conversation

eboasson commented Sep 6, 2024