Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[foxy backport] Lost service responses (#183, #74) (#187) #209

Merged
merged 1 commit into from
Jul 21, 2020

Conversation

jacobperron
Copy link
Member

Backport #187 to Foxy.

I believe since this is only touching a cpp file that the change is ABI compatible.
I will confirm that the change is also wire compatible, to be sure.

* Block rmw_send_response if response reader unknown

The client checks using rmw_service_server_is_available whether the
request it sends will be delivered to service, but that does not imply
that the (independent, as far as DDS is concerned) response reader of
the client has been discovered by the service.  Usually that will be the
case, but there is no guarantee.

Ideally DDS would offer an interface that allows checking the reverse
discovery, but that does not yet exist in either the specification or in
Cyclone.  This commit works around that by delaying publishing the
response until the number of request writers matches the number of
response readers.

Signed-off-by: Erik Boasson <[email protected]>

* Change request headers to use rmw_request_id_t on the wire

Signed-off-by: Erik Boasson <[email protected]>

* Precise check for matched client/service

Assign a unique identifier to each client/service on creation, add it
to the USER_DATA QoS of the reader and writer and use it for the request
ids.  This allows:

* rmw_service_server_is_available to only return true once it has
  discovered a reader/writer pair of a single service (rather than a
  reader from some service and a writer from some service); and
* rmw_send_response to block until it has discovered the requesting
  client's response reader and to abandon the operation when the client
  has disappeared.

The USER_DATA is formatted in the same manner as the participant
USER_DATA, this uses the keys "serviceid" and "clientid".

This is still but a workaround for having a mechanism in DDS to ensure
that the response reader has been discovered prior by the request writer
prior to sending the request.

Signed-off-by: Erik Boasson <[email protected]>

* Address review comments

Signed-off-by: Erik Boasson <[email protected]>

* Backwards compatibility

* Revert commit fb040c5 to retain the
  old wire representation;

* Embed the publication_handle of the request inside rmw_request_id_t,
  possible because reverting to the old wire representation frees up
  enough space, and use this in rmw_send_response to check for the
  presence of the client's reader;

* Clients and services without a client/service id in the reader/writer
  user data are treated as fully matched at all times.

* Replace ERROR by FAILURE to because of windows.h

Signed-off-by: Erik Boasson <[email protected]>

* Timeout rmw_send_response after waiting 100ms for discovery

The discovery will eventually result in the client's reader being known
or its writer no longer being known, so a timeout is not necessary for
correctness.  However, if it ever were to block for a longish
time (which is possible in the face of network failures), returning a
timeout to the caller is expected to result in less confusion.

Signed-off-by: Erik Boasson <[email protected]>

* Make iterators "const auto &"

Signed-off-by: Erik Boasson <[email protected]>

* Add TODO for eliminating rmw_send_response blocking

Signed-off-by: Erik Boasson <[email protected]>
@jacobperron jacobperron requested review from hidmic and eboasson July 21, 2020 19:59
@jacobperron
Copy link
Member Author

On localhost, the following checks work for me:

  • A client of the new version can make requests to a service of the old version.
  • A client of the old version can make requests to a service of the new version.
  • Service and clients of the old version are listed correctly by a ros2cli tool of the new version.
  • Service and clients of the new version are listed correctly by a ros2cli tool of the old version.

Although, I haven't succeeded in discovery of services across hosts (over a VPN); not with the current released version either. I may have not configured something properly.

Copy link
Contributor

@hidmic hidmic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jacobperron
Copy link
Member Author

Although, I haven't succeeded in discovery of services across hosts (over a VPN); not with the current released version either. I may have not configured something properly.

I've figured it out by overriding the default network interface selected. LGTM.

@jacobperron
Copy link
Member Author

jacobperron commented Jul 21, 2020

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
    • Expected test failures.
  • Windows Build Status
    • TestRos2BagRecord.test_incomplete_qos_profile failure is expected.
    • rqt_py_common failure looks unrelated.

@jacobperron jacobperron merged commit 5333ab6 into foxy Jul 21, 2020
@delete-merged-branch delete-merged-branch bot deleted the jacob/backport_187 branch July 21, 2020 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants