rcl_wait() returns early when a timer awakes #687
Comments
One potential workaround would be to relax the definition of a ready |
Alternatively, since it's quite likely not every possible timer period can be achieved due to time resolution constraints, the expectation on |
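The comments above are truncated, but one way to read "relax the definition of a ready" timer is to let a timer count as ready when its next call time lies within some tolerance of "now". The following is a minimal sketch of that rule, assuming integer nanosecond timestamps. The helper `is_timer_ready` and the 15.625 ms tolerance are hypothetical illustrations, not rcl's actual readiness check.

```python
# Hypothetical helper (not part of rcl): decide whether a timer is ready.
def is_timer_ready(now_ns: int, next_call_ns: int, tolerance_ns: int = 0) -> bool:
    """Return True if the timer's next call time is at most tolerance_ns away.

    tolerance_ns == 0 gives the strict rule (now >= next call time); a positive
    tolerance lets a wait that returns slightly early still count as a firing.
    """
    if tolerance_ns < 0:
        raise ValueError("tolerance_ns must be non-negative")
    return next_call_ns - now_ns <= tolerance_ns


# A 10 ms timer whose wait returned 2 ms before the deadline.
next_call_ns = 10_000_000
woke_at_ns = 8_000_000

strict = is_timer_ready(woke_at_ns, next_call_ns)  # False: 2 ms still remain
# Assumed tolerance of one 15.625 ms tick (illustrative value only).
relaxed = is_timer_ready(woke_at_ns, next_call_ns, tolerance_ns=15_625_000)  # True
```

The trade-off is that a timer may then fire up to one tolerance early, so any test that checks firing times would have to accept that same slack.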
I guess what I don't understand about this is that if you ask for a 1 nanosecond sleep, then yeah, it might take the OS considerably longer than that to return (maybe on the order of 10 ms for time slicing). But if you ask for a 10 ms sleep, the OS should never return in 9 ms; that's just not doing what the user application asked for. Extending this to |
That is true for a plain sleep, but arguable for a time deadline (and even more so if you want to sustain some periodicity, but that's unrelated to |
I agree with @clalancette about the expectation that it should never return before the time period the user specified. This is the case on Linux and macOS, and I would have extended that to rmw_wait or dds_wait, but maybe that's not guaranteed.
That block from the Microsoft docs makes me sad, but basically I agree it could end early based on that, assuming that's the same kind of timer that is being used by RTI. This is also a condition we could check for and at least log something about it. Or even we could go back to sleep if needed. On the other hand, we could adjust the tests (and our documentation) to reflect reality on Windows (if I understand the core issue correctly).
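Checking for an early return and logging it could look roughly like the sketch below. It assumes a generic blocking wait primitive passed in as `wait_fn`; that parameter, the `timed_wait` name, and the logger name are hypothetical stand-ins for something like rmw_wait(), not an existing ROS 2 API. The clock is injectable so the check can be exercised deterministically.

```python
import logging
import time

logger = logging.getLogger("early_wakeup_check")


def timed_wait(wait_fn, timeout_s, clock=time.monotonic):
    """Run a wait primitive and report whether it returned before timeout_s.

    wait_fn is a hypothetical stand-in for something like rmw_wait(): it is
    called with the timeout in seconds and is expected to block. Returns a
    tuple (elapsed_s, returned_early) measured on `clock`.
    """
    start = clock()
    wait_fn(timeout_s)
    elapsed_s = clock() - start
    returned_early = elapsed_s < timeout_s
    if returned_early:
        # Log the shortfall instead of silently treating the wait as complete.
        logger.warning("wait returned %.3f ms early", (timeout_s - elapsed_s) * 1e3)
    return elapsed_s, returned_early


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Simulate a wait primitive that wakes up halfway through the timeout.
    elapsed, early = timed_wait(lambda t: time.sleep(t / 2), 0.02)
    print(f"elapsed={elapsed * 1e3:.1f} ms, early={early}")
```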
That's fair for
I agree with updating |
👍
👍 I think this one is the problem we have to solve.
It's not only that.
There's that too, yeah. And waking up earlier won't do if the timer clock goes faster than the system's (typical in CI settings using simulation with an RTF > 1). This is a separate issue though.
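To illustrate why waking up early does not help when the timer clock runs faster than the system clock: under a simulated clock with a real-time factor (RTF) above 1, a timeout expressed in simulated time corresponds to a shorter wall-clock wait. The sketch below assumes a constant RTF, which real simulations rarely guarantee, and the function name is hypothetical.

```python
import math


def wall_clock_timeout(sim_timeout_s: float, real_time_factor: float) -> float:
    """Convert a timeout on a simulated clock into system-clock seconds.

    Assumes a constant real-time factor (RTF): with RTF > 1 the simulated
    clock runs faster, so the same simulated duration takes less wall time.
    """
    if not math.isfinite(real_time_factor) or real_time_factor <= 0:
        raise ValueError("real_time_factor must be a positive finite number")
    return sim_timeout_s / real_time_factor


# A 100 ms simulated timeout at RTF 2 elapses after about 50 ms of wall time,
# so a wait bounded by 100 ms on the system clock would wake up far too late.
print(wall_clock_timeout(0.100, 2.0))
```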
I will say that a delay (or sleep) and a timeout are two conceptually different things. If you ask for a 10ms delay, then yes, you'd expect it to last 10ms at least. But if you set a 10ms timeout, I'd expect it to last 10ms at most. And thus RTI Connext's |
I will point out that that MSVC article says that starting on Windows 8,
To be fair, on a non-realtime OS, all timer periods are a rough lower bound. So I think that expectation is pretty well set, and I think we could go back to sleep in that case (note that in the above discussion, I've been ignoring signals on Unix anyway, which is another case that might cause a sleep to wake up early and have to go back to sleep). But I definitely agree with you that the documentation should be clear about this.
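"Going back to sleep" after an early wake-up can be sketched as a loop that recomputes the remaining time against a monotonic deadline and keeps waiting until the deadline has actually passed. Here `threading.Event` stands in for a wait set; `wait_at_least` is a hypothetical name and this is not how rcl or any rmw implementation is written.

```python
import threading
import time


def wait_at_least(event: threading.Event, timeout_s: float, clock=time.monotonic) -> bool:
    """Wait until `event` is set or until `timeout_s` has fully elapsed on `clock`.

    If the underlying wait returns before the deadline without the event being
    set (an early wake-up), go back to waiting for the remaining time.
    Returns True if the event was set, False if the full timeout elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            # Deadline reached: report whether the event got set in the meantime.
            return event.is_set()
        if event.wait(remaining):
            return True
        # event.wait() returned False: possibly an early timeout, so loop and
        # re-check the remaining time instead of trusting it.


if __name__ == "__main__":
    ready = threading.Event()
    threading.Timer(0.01, ready.set).start()
    print(wait_at_least(ready, 1.0))  # True: the event is set after ~10 ms
    print(wait_at_least(threading.Event(), 0.05))  # False, after at least 50 ms
```

The deadline is fixed once up front, so repeated early wake-ups shorten each subsequent wait rather than extending the total beyond the requested timeout plus scheduling latency.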
I don't think so. At least, that's not how I understand e.g. the timeout on
That is true. Not sure about embedded ROS though.
IMHO the choices that |
I think resolving this issue could also resolve some rclpy-related failures. Or at least they are related, since the rclpy issues happen across multiple rmw implementations.
Yeah, I can't agree with this either. For me, a timeout always means "we'll start trying to exit after this period of time". In fact, I'd never assume it comes back before or "on time".
👍 too
Oh well, I guess we agree to disagree. In any case, we have an |
AFAIR, those failures are due to the timer waking up too late (the Windows Server scheduler time slice is much bigger than on macOS and Linux). They might also be related to this, though.
@neil-rti maybe you can weigh in here.
With ros2/rmw#275, |
This one's still causing regressions (I think): https://ci.ros2.org/view/nightly/job/nightly_win_rel/lastCompletedBuild/testReport/projectroot/test/test_timer__rmw_connextdds/
@asorbini had some thoughts about this. I don't remember the exact details, but there was a mismatch on Windows between the clock source that Connext is using internally and the one that we are using in rcl. Maybe he can comment.
My guess was that these errors might have had something to do with the fact that Anyway, I'm afraid I don't yet have a good answer to why this is actually happening.
Note that once ros2/rmw_connextdds#22 is merged, the default implementation of WaitSets for EDIT: maybe I was thinking of this Microsoft page:
Thanks for the follow-up!
Bug report
Required Info:
Steps to reproduce issue
Run timer tests using rmw_connext_cpp.
Expected behavior
Tests pass.
Actual behavior
Tests often fail: rcl_wait() returns early when a timer awakes.
Additional information
For one, even though a clock is specified for timers, how rmw_wait() abides by the given timeout is not specified. This issue persists for both steady time and system time timers.
This issue does not occur when using rmw_fastrtps*_cpp or rmw_cyclonedds_cpp, as these return at the right time or (most often) past it.
Suspecting a bug in rmw_connext_cpp, I could confirm that the expected timeout is passed to RTI Connext's WaitSet::wait() API, which nonetheless returns early for both steady time and system time clocks. However, I believe this is less a bug in RTI Connext than a misuse of rmw_wait() in the implementation of rcl timers, given time resolution: the closest achievable timeout on a given OS is not necessarily greater than or equal to the one specified.