[Bug] ROS Services Hang with RosLibRust Service Servers #212

Open
Carter12s opened this issue Dec 3, 2024 · 0 comments
Labels
bug Something isn't working


Describe the bug

We're in the weeds of the poorly documented TCPROS protocol, and this issue could equally validly be filed on rosrust or roslibrust, but this is where I encountered it.

Within abstract_bridge.rs, this chunk of code in process_query:

        let res = spawn_blocking_runtime(move || {
            let description = RawMessageDescription {
                msg_definition: String::from("*"),
                md5sum: topic.md5.clone(),
                msg_type: topic.datatype.clone(),
            };
            ros1_client.req_with_description(&rosrust::RawMessage(payload), description)
        })
        .await

does not resolve as soon as the response arrives when interacting with some ROS1 service server implementations over TCPROS.

Specifically, when interacting with roslibrust (which is what I'm trying to do), the following happens:

  1. Request is sent over TCP socket
  2. Request is received by roslibrust and processed
  3. Response is sent over TCP socket
  4. roslibrust holds the TCP socket open waiting to see if more requests are going to come
  5. The zenoh-ros1-bridge / rosrust receives the response over the TCP socket but doesn't process it, and instead waits for the TCP socket to shut down
  6. This ultimately leads to the zenoh query timing out, even though the bytes of the response were sent to the bridge
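
To illustrate what I'd expect on the client side in step 5, here is a minimal sketch (not the actual rosrust or bridge code). It assumes the TCPROS service response framing as I understand it: a 1-byte ok flag, a 4-byte little-endian length, then the payload. The names read_service_response and demo_response_on_open_socket are hypothetical, and the connection header handshake is skipped entirely.

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;

/// Reads one service response frame (assumed layout: ok byte, u32 LE length,
/// payload). Returns once the frame is complete; it never waits for EOF.
pub fn read_service_response<R: Read>(reader: &mut R) -> io::Result<(bool, Vec<u8>)> {
    let mut ok = [0u8; 1];
    reader.read_exact(&mut ok)?;
    let mut len = [0u8; 4];
    reader.read_exact(&mut len)?;
    let mut payload = vec![0u8; u32::from_le_bytes(len) as usize];
    reader.read_exact(&mut payload)?;
    Ok((ok[0] != 0, payload))
}

/// Hypothetical stand-in for a server that answers once and then keeps the
/// socket open, like roslibrust does in step 4.
pub fn demo_response_on_open_socket() -> io::Result<(bool, Vec<u8>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let (release_tx, release_rx) = mpsc::channel::<()>();

    let server = thread::spawn(move || -> io::Result<()> {
        let (mut socket, _) = listener.accept()?;
        let payload = b"pong";
        socket.write_all(&[1u8])?; // ok flag
        socket.write_all(&(payload.len() as u32).to_le_bytes())?;
        socket.write_all(payload)?;
        socket.flush()?;
        // Hold the socket open until the client has finished reading.
        let _ = release_rx.recv();
        Ok(())
    });

    let mut client = TcpStream::connect(addr)?;
    // The server has not closed the socket, yet this call still returns.
    let response = read_service_response(&mut client)?;
    let _ = release_tx.send(());
    server.join().expect("server thread panicked")?;
    Ok(response)
}
```

Because the reader is length-driven, the response is available even though the server side never closes the connection.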

I can bypass this issue by modifying roslibrust to shut down the TCP socket from its end after it sends the payload, at which point everything behaves normally.
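
For reference, the workaround amounts to something like the following sketch. This is not the real roslibrust code (which is async); it uses blocking std sockets, the same assumed response framing (ok byte, u32 little-endian length, payload), and hypothetical names send_response_and_close and demo_close_after_response. The client here deliberately reads until EOF to mimic a client that waits for the socket to shut down.

```rust
use std::io::{self, Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Writes one response frame, then shuts down the write half so that a
/// client waiting for EOF is unblocked.
pub fn send_response_and_close(socket: &mut TcpStream, payload: &[u8]) -> io::Result<()> {
    socket.write_all(&[1u8])?; // ok flag
    socket.write_all(&(payload.len() as u32).to_le_bytes())?;
    socket.write_all(payload)?;
    socket.flush()?;
    socket.shutdown(Shutdown::Write)
}

/// Runs a one-shot server and a client that only returns once it sees EOF.
pub fn demo_close_after_response() -> io::Result<Vec<u8>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    let server = thread::spawn(move || -> io::Result<()> {
        let (mut socket, _) = listener.accept()?;
        send_response_and_close(&mut socket, b"pong")
    });

    let mut client = TcpStream::connect(addr)?;
    let mut bytes = Vec::new();
    // read_to_end blocks until the peer closes its write half.
    client.read_to_end(&mut bytes)?;
    server.join().expect("server thread panicked")?;
    Ok(bytes)
}
```

Without the shutdown call (and with the server keeping the socket alive), the read_to_end in this sketch would block indefinitely, which matches the hang described above.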

https://wiki.ros.org/ROS/TCPROS describes the "persistent" header field of service requests, but poorly describes how service servers are supposed to behave when this field is not present. In the case of the bridge / rosrust, this field is NOT included in the header.
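
For context, here is a rough sketch of how a connection header carrying that field could be encoded. It assumes the connection header layout as I read it from the wiki (a 4-byte little-endian total length, followed by fields that are each a 4-byte little-endian length plus a name=value string). The function name encode_connection_header is hypothetical and this is not taken from rosrust or roslibrust.

```rust
/// Encodes a connection header from (name, value) pairs.
/// Assumed layout: u32 LE total length, then per field a u32 LE length
/// followed by the bytes of "name=value".
pub fn encode_connection_header(fields: &[(&str, &str)]) -> Vec<u8> {
    let mut body = Vec::new();
    for (name, value) in fields {
        let field = format!("{}={}", name, value);
        body.extend_from_slice(&(field.len() as u32).to_le_bytes());
        body.extend_from_slice(field.as_bytes());
    }
    let mut header = Vec::with_capacity(4 + body.len());
    header.extend_from_slice(&(body.len() as u32).to_le_bytes());
    header.extend_from_slice(&body);
    header
}
```

A client that wants the connection kept open would add ("persistent", "1") next to its other fields; a client that omits it, like the bridge / rosrust here, leaves the server to guess the intended behavior.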

roslibrust has chosen to keep the TCP socket open when the field is not present, which has proven compatible with all of the ROS1 ecosystem we've tested with so far (which isn't everything). I can (and likely will) modify roslibrust's behavior to work around this issue, but regardless, I don't think the bridge should wait for the TCP socket to close before responding to the query; it should respond as soon as the bytes are received.

To reproduce

  1. Start a zenoh-ros1-bridge with a master
  2. Start a roslibrust ros1 service server: cargo run --features ros1 --example ros1_service_server
  3. Call the service over zenoh

Observe that the request makes it to the roslibrust ros1 service server and that it responds, but the zenoh query times out and the response never makes it to the source of the query.

System info

Ubuntu 20.04

@Carter12s Carter12s added the bug Something isn't working label Dec 3, 2024