Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to Parking in Bounded and Unbounded Channels #1105

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ibraheemdev
Copy link
Contributor

@ibraheemdev ibraheemdev commented May 2, 2024

Currently, the queues used by the bounded and unbounded channels (also ArrayQueue and SegQueue) are not lock-free. The linearization point of a send or receive is moving the tail or head index respectively, meaning that if a sender or receiver sees the head/tail has moved but the value has not been written yet, they must spin until the corresponding receive or send completes that frees up space in the channel. Blocking in such a case is not possible, as sends might notify a receiver even though they are not visible due to a lagging sender holding up the channel, or vice versa, leading to missed wakeups.

The current solution of unbounded spinning can lead to issues, especially on platforms with a single core or custom thread priorities. One way to allow falling back to blocking is if waiters that see more values in the channel after being woken up notify any other waiters, making up for any missed notifications. However, this ends up being quite expensive and can lead to many unnecessary wakeups. A solution to this is wakeup throttling as implemented by tokio and other userspace schedulers, which avoids notifications if there is already an active thread, the "waker thread". That thread will then notify another waiter if it finds a message, creating a new waker thread. This PR changes the bounded and unbounded channels to fallback to blocking after observing a linearizability condition, using wakeup throttling to makeup for missed notifications without creating too many unnecessary wakeups.

Thanks to @kprotty for pointing me in the right direction for the wakeup throttling algorithm. Hopefully paired with #1038 this should resolve all of the spinning issues in crossbeam/std.

Should resolve #366 and #997 (as well as rust-lang/rust#114851 and rust-lang/rust#112723 when upstreamed to std).

@ibraheemdev ibraheemdev force-pushed the linearizable-waker branch 2 times, most recently from d1602c0 to e2f54ea Compare May 2, 2024 02:36
@ibraheemdev ibraheemdev force-pushed the linearizable-waker branch from e2f54ea to 61013ca Compare May 2, 2024 02:42
@ibraheemdev
Copy link
Contributor Author

We probably also want to introduce a non-linearizable version of try_recv, because this doesn't actually fix the issue of try_recv taking longer than expected.

@ibraheemdev
Copy link
Contributor Author

Here are the benchmark results before and after. Note that all of the fallback cases are very unlikely to be hit on most setups, so this is just measuring the performance difference of new wakeup algorithm, which seems to be pretty mixed.

- test bounded_1::create    ... bench:          91 ns/iter (+/- 0)
+ test bounded_1::create    ... bench:          82 ns/iter (+/- 3)
- test bounded_1::mpmc      ... bench:  10,576,639 ns/iter (+/- 1,012,158)
+ test bounded_1::mpmc      ... bench:  11,206,415 ns/iter (+/- 2,787,440)
- test bounded_1::mpsc      ... bench:  21,551,625 ns/iter (+/- 975,344)
+ test bounded_1::mpsc      ... bench:  21,975,220 ns/iter (+/- 390,863)
- test bounded_1::oneshot   ... bench:         116 ns/iter (+/- 1)
+ test bounded_1::oneshot   ... bench:         104 ns/iter (+/- 3)
- test bounded_1::spmc      ... bench:  22,192,680 ns/iter (+/- 760,426)
+ test bounded_1::spmc      ... bench:  22,268,657 ns/iter (+/- 559,440)
- test bounded_1::spsc      ... bench:  23,097,047 ns/iter (+/- 76,889)
+ test bounded_1::spsc      ... bench:  22,753,449 ns/iter (+/- 143,048)
- test bounded_n::mpmc      ... bench:   2,387,457 ns/iter (+/- 313,821)
+ test bounded_n::mpmc      ... bench:   2,734,135 ns/iter (+/- 315,206)
- test bounded_n::mpsc      ... bench:   5,909,059 ns/iter (+/- 81,411)
+ test bounded_n::mpsc      ... bench:   6,635,209 ns/iter (+/- 134,351)
- test bounded_n::par_inout ... bench:   6,835,077 ns/iter (+/- 234,552)
+ test bounded_n::par_inout ... bench:   6,583,975 ns/iter (+/- 210,953)
- test bounded_n::spmc      ... bench:   5,649,130 ns/iter (+/- 127,809)
+ test bounded_n::spmc      ... bench:   5,748,161 ns/iter (+/- 130,954)
- test bounded_n::spsc      ... bench:   1,126,902 ns/iter (+/- 32,449)
+ test bounded_n::spsc      ... bench:   1,194,757 ns/iter (+/- 15,458)
- test unbounded::create    ... bench:          64 ns/iter (+/- 1)
+ test unbounded::create    ... bench:          57 ns/iter (+/- 1)
- test unbounded::inout     ... bench:          36 ns/iter (+/- 0)
+ test unbounded::inout     ... bench:          44 ns/iter (+/- 0)
- test unbounded::mpmc      ... bench:   1,181,564 ns/iter (+/- 204,692)
+ test unbounded::mpmc      ... bench:   1,200,760 ns/iter (+/- 174,606)
- test unbounded::mpsc      ... bench:   2,900,684 ns/iter (+/- 196,933)
+ test unbounded::mpsc      ... bench:   2,144,163 ns/iter (+/- 199,247)
- test unbounded::oneshot   ... bench:         103 ns/iter (+/- 0)
+ test unbounded::oneshot   ... bench:         113 ns/iter (+/- 0)
- test unbounded::par_inout ... bench:   4,799,151 ns/iter (+/- 487,975)
+ test unbounded::par_inout ... bench:   6,485,203 ns/iter (+/- 407,363)
- test unbounded::spmc      ... bench:   2,366,594 ns/iter (+/- 46,477)
+ test unbounded::spmc      ... bench:   2,014,760 ns/iter (+/- 28,476)
- test unbounded::spsc      ... bench:   1,061,444 ns/iter (+/- 14,589)
+ test unbounded::spsc      ... bench:   1,392,352 ns/iter (+/- 11,792)

@ibraheemdev
Copy link
Contributor Author

There are some changes I made on the rust-lang PR that have to be applied to this one as well.

I believe there's a large optimization here than can be made which is only notifying a waiter if the current thread sees remaining data on the channel.

@the8472
Copy link

the8472 commented Nov 13, 2024

Since some of the issues indicate priority inversions I wonder if parking is the right choice or using inheritance-aware synchronization would be better on platforms that support it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Yielding in crossbeam-channel
3 participants