Lost messages when a port has multiple connections (one remote connection) #248

Open
planthaber opened this issue Dec 5, 2017 · 6 comments

Comments

@planthaber
Contributor

Hi,

I'm writing this to have visible documentation of the issue. The initial explanation, including code to reproduce it, was in a closed PR, #122, which was not a proper solution and raised other issues.

As output Ports are not buffered, all new data is dropped until the data was completely written to the connected input ports.

When an output port has two connections, a new write is only sent once the previous data has been sent and the acks have been received by Corba. This is an issue for remote connections, especially on low-bandwidth networks.

When the low-bandwidth connection takes longer to return the Corba ack than the period of the Task, the next message is not delivered at all. This means that not even the local connection receives the message.

So if you connect a remote GUI over the network, local messages can be lost.

Nevertheless, a simple workaround is to avoid additional remote connections on output ports. This can be achieved by adding a repeater task, which just writes each sample received on an input port to an output port of the same type. This way only the repeater task loses messages, not the important task, which now has two connections (the original connection plus the repeater).
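
To illustrate the repeater idea, here is a minimal C++ sketch. It is a toy model and does not use the RTT API: `ToyOutputPort`, `ToyInputPort` and `ToyRepeater` are hypothetical names, and the slow network link is only represented by calling `update()` less often than the producer writes. The unbuffered input is assumed to be a single slot that the newest sample overwrites.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for an output port (NOT the RTT API): write() hands the
// sample to every connected sink, one after the other.
template <typename T>
class ToyOutputPort {
public:
    void connect(std::function<void(const T&)> sink) {
        sinks_.push_back(std::move(sink));
    }
    void write(const T& sample) {
        for (auto& sink : sinks_) sink(sample);
    }
private:
    std::vector<std::function<void(const T&)>> sinks_;
};

// Toy unbuffered input: a single slot, the newest sample overwrites the old one.
template <typename T>
class ToyInputPort {
public:
    void push(const T& sample) { slot_ = sample; fresh_ = true; }
    // Returns true and fills `out` only if a sample arrived since the last read.
    bool read(T& out) {
        if (!fresh_) return false;
        out = slot_;
        fresh_ = false;
        return true;
    }
private:
    T slot_{};
    bool fresh_ = false;
};

// The repeater: forwards the latest sample it received, nothing else.
// The slow remote consumer would be connected to `out`.
template <typename T>
class ToyRepeater {
public:
    ToyInputPort<T> in;
    ToyOutputPort<T> out;
    void update() {
        T sample{};
        if (in.read(sample)) out.write(sample);
    }
};
```

If a producer port is connected to a local consumer and to `ToyRepeater::in`, and the producer writes twice before the repeater's `update()` runs, the local consumer still sees both samples while the repeater forwards only the second one. The loss is confined to the repeater's side.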

@meyerj
Member

meyerj commented Dec 5, 2017

> As output Ports are not buffered, all new data is dropped until the data was completely written to the connected input ports.

That's only partially true. For remote connections, Orocos adds a local buffer per remote connection for output ports, and the write calls never block on Corba. For buffer connections this additional local buffer can also hold multiple elements, and unless it is full, no samples are lost. A Corba dispatcher thread (one per component) then empties the local buffer. Unless one-way writes are enabled (a compile-time option since #123), the dispatcher writes one sample at a time and does wait for the remote end to acknowledge, which for low-bandwidth or high-latency connections could result in dropped samples in the output buffer. But local connections should not be affected by that.
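
The buffering behaviour described above can be sketched roughly as follows. This is a toy model in plain C++, not the RTT implementation: `ToyChannelBuffer` and its methods are hypothetical names, and dropping the newest sample when the buffer is full is an assumption made for the sketch.

```cpp
#include <cstddef>
#include <deque>

// Toy per-connection buffer (NOT the RTT API). The writer never blocks;
// a separate dispatcher is expected to call transferOne() later.
template <typename T>
class ToyChannelBuffer {
public:
    explicit ToyChannelBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Non-blocking write. Assumption: when the buffer is full, the new
    // sample is dropped and counted.
    bool write(const T& sample) {
        if (queue_.size() >= capacity_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(sample);
        return true;
    }

    // Dispatcher side: forwards exactly one sample per call, modelling
    // "write one sample, then wait for the acknowledgement".
    bool transferOne(T& out) {
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t dropped() const { return dropped_; }
    std::size_t size() const { return queue_.size(); }

private:
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    std::deque<T> queue_;
};
```

With a capacity of 2, three writes in a row without an intervening `transferOne()` lose the third sample. Once the dispatcher has forwarded one sample, the next write succeeds again. Samples are lost only when the writer outpaces the dispatcher for longer than the buffer can absorb.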

Did you actually observe something different?

@planthaber
Contributor Author

Hi,
I experienced the problem on master, I guess because #123 is not merged yet.

@meyerj
Member

meyerj commented Dec 5, 2017

#123 added the compile-time option to switch to one-way writes, but the Corba dispatcher thread has existed for much longer, probably since the beginning of the RTT corba transport plugin. The responsible code is in RemotePorts.cpp:177 and following on master. The dispatcher is triggered in RemoteChannelElement::signal() in RemoteChannelElement.hpp:132 and then calls RemoteChannelElement::transferSamples() to actually empty the buffer and forward to Corba in a non-realtime context.

So any remote connection of an output port should not be able to delay the samples written to another local connection to the same port (or at least not more than an additional local connection would) and should definitely not call directly into the Corba middleware. If that were the case, it would be a bug.

I was not able to run your original example from #122 because I am not familiar with orogen. Simply running orogen -v oneway_test.orogen results in this error on my Ubuntu Xenial system, using the toolchain-2.9 branches of all toolchain packages, which contain all patches from the respective master branches. Can you either help with that problem or provide CMake code and a deployment script to run the example without orogen?

@planthaber
Contributor Author

planthaber commented Dec 13, 2017

> So any remote connection of an output port should not be able to delay the samples written to another local connection to the same port

OK, perhaps there is a piece of information missing: the "local connection" is also a Corba connection, just "locally on the same PC". So both connections use Corba.

https://github.com/orocos-toolchain/rtt/blob/master/rtt/transports/corba/RemoteChannelElement.hpp#L342 This Corba write blocks until the transmission is complete, so transferSamples() and signal() block as well.

If a new signal() arrives while the old one is still running (blocked), I guess it is not executed. I also guess that the port data content is overwritten in this case, or am I wrong?

@meyerj
Member

meyerj commented Dec 13, 2017

Yes, in that case your observations make perfect sense. What @doudou proposed in #227 might be a solution then.

@doudou
Contributor

doudou commented Dec 13, 2017

The multi-dispatcher setup is IMO the best solution to segregate domains (UI vs. system, reliable vs. unreliable).

On single-host setups, you could also work around the problem by using the MQ transport, which does not have the same issue.
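
To make the difference between a shared dispatcher and per-domain dispatchers concrete, here is a small discrete-time simulation in plain C++. It is only a sketch under stated assumptions, not RTT code: `ToyChannel` and `simulate` are hypothetical names, every connection is modelled as a single slot that the newest sample overwrites, a dispatcher is busy for `ack_ticks` ticks per forwarded sample, and channels are served in list order.

```cpp
#include <cstddef>
#include <vector>

// One Corba connection with a single-slot (data) policy. Toy model only.
struct ToyChannel {
    ToyChannel(int ack, int disp) : ack_ticks(ack), dispatcher(disp) {}
    int ack_ticks;             // ticks a blocking write needs until the ack returns
    int dispatcher;            // index of the dispatcher serving this channel
    bool pending = false;      // true while an unsent sample sits in the slot
    int slot = 0;              // the unsent sample (overwritten by newer writes)
    std::vector<int> delivered;
};

// The producer writes sample k at tick k * period to every channel.
// Each dispatcher forwards one sample at a time and stays busy until the ack.
// Assumes period > 0 and ack_ticks > 0.
void simulate(std::vector<ToyChannel>& channels, int n_dispatchers,
              int period, int n_samples) {
    std::vector<int> busy_until(static_cast<std::size_t>(n_dispatchers), 0);
    const int last_write = (n_samples - 1) * period;
    for (int t = 0;; ++t) {
        // Producer side: overwrite the slot of every channel.
        if (t <= last_write && t % period == 0) {
            for (auto& c : channels) {
                c.slot = t / period;
                c.pending = true;
            }
        }
        // Dispatcher side: an idle dispatcher takes the first pending channel.
        bool work_left = (t < last_write);
        for (auto& c : channels) {
            if (!c.pending) continue;
            int& busy = busy_until[static_cast<std::size_t>(c.dispatcher)];
            if (t >= busy) {
                c.delivered.push_back(c.slot);
                c.pending = false;
                busy = t + c.ack_ticks;
            } else {
                work_left = true;  // sample must wait for its dispatcher
            }
        }
        if (!work_left) break;
    }
}
```

As an example, take a fast channel (`ack_ticks = 1`) and a slow one (`ack_ticks = 5`), a period of 2 ticks and 5 samples. In this model, when both share one dispatcher the fast channel receives only samples 0, 3 and 4. When each channel has its own dispatcher, the fast channel receives all five samples and only the slow channel skips some.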
