-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
async: Fix hang in wait_all_exit #227
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait()
is just waiting, dowhile self.waiters()
inwait_all
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't fix the issue, you need to check the waiters count after the notified() object is created but before calling await on it. Let's take an example with two threads running in parallel:
The notify_waiters() call doesn't store any permit, it only unblocks existing notified() objects: https://docs.rs/tokio/latest/tokio/sync/struct.Notify.html#method.notify_waiters .
If the notified object is created after notify_waiters() call, await will block forever. There won't be anyone left to unblock it
because notify_waiters() is called only when the last waiter is dropped (unless a subscribe() call and a drop() happens in the meantime).
The sync.Notify struct (the type of self.shared.notify_exit) uses internally seq_cst operations on an AtomicUsize state value. This means that they form a happens before relationship between notified() and notify_waiters() calls, anything that happened before the call to notify_waiters() in thread 1, will be visible for the thread 2 after the notified() call.
Hence, we can test for waiters == 0 just after the notified() call:
By adding a check for the waiters count right before await happens we can protect against this hang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, very detailed. I know how the race happens and have read the documentation.
The problem is that
Notified
need to be created first, then it makes sense to test waiters. So we can movenotified()
forward.My point is that the
wait
function is a helper function, and not public api, it only wrap the timeout or not. It maybe used to unit test or other intenal situation. Thewait_all_exit
is provided to users to ensure concurrency safety.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two reasons for moving the bulk of the code in wait():
Let's inline wait() call in the snippet you suggested:
Between the notified() call and the await there is no check for waiters count so awaiting can still block forever if waiters count is 0. The solution is to add the test for waiters() at the beginning of wait() function. This solution while it seem ok at first it doesn't work because of the second issue.
The timeout, when specified, should apply to the outer while loop as well. Otherwise the code might loop multiple times, beyond the specified timeout, which is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
......
is the code that leave it as is for yours.Will this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, it doesn't resolve the second issue that is related to timeout. The timeout should be applied to the whole while loop because it can iterate multiple times, in case someone calls subscribe() and the waiters() goes from 0 to >0.
For example, if you pass 5 seconds for timeout this code can behave something like this:
The code waits for a total of twelve seconds in total although the original intent was to wait only for 5 seconds at most.
The idea is to move the while block inside the future that is passed to timeout() function. By moving it there, you'll get to the same piece of code I wrote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comment.
I'm going to take some time to review this.