Fallback to Parking in Bounded and Unbounded Channels #1105

ibraheemdev · 2024-05-02T01:40:58Z

Currently, the queues used by the bounded and unbounded channels (as well as ArrayQueue and SegQueue) are not lock-free. The linearization point of a send or receive is the moment it moves the tail or head index, respectively. As a result, if a receiver sees that the tail has moved but the value has not been written yet (or a sender sees that the head has moved but the slot has not been freed yet), it must spin until the corresponding send or receive completes. Blocking in that case is not possible: a send might notify a receiver even though the message is not yet visible because a lagging sender is holding up the channel (or vice versa), which leads to missed wakeups.
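To make the spinning problem concrete, here is a heavily simplified model (not crossbeam's actual code) in which a send is split into two steps: claiming an index by moving `tail`, which is the linearization point, and then writing the value and setting a `ready` flag. `Slots`, `claim`, `publish` and `spin_read` are hypothetical names for illustration. A receiver that has already observed the claim cannot report "empty", so it has to spin until the lagging sender finishes step 2.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};

/// Hypothetical model of a slot-based queue: moving `tail` linearizes a send,
/// but the value only becomes readable once `ready[index]` is set.
pub struct Slots {
    tail: AtomicUsize,
    values: Vec<AtomicU64>,
    ready: Vec<AtomicBool>,
}

impl Slots {
    pub fn new(capacity: usize) -> Self {
        Slots {
            tail: AtomicUsize::new(0),
            values: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            ready: (0..capacity).map(|_| AtomicBool::new(false)).collect(),
        }
    }

    /// Step 1 of a send: after this, every thread counts the message as sent.
    pub fn claim(&self) -> usize {
        self.tail.fetch_add(1, Ordering::SeqCst)
    }

    /// Number of claimed slots, i.e. what a receiver sees as the channel length.
    pub fn claimed(&self) -> usize {
        self.tail.load(Ordering::SeqCst)
    }

    /// Step 2 of a send: write the value, then publish it with Release.
    pub fn publish(&self, index: usize, value: u64) {
        self.values[index].store(value, Ordering::Relaxed);
        self.ready[index].store(true, Ordering::Release);
    }

    /// A receiver that saw `claimed() > index` must wait for step 2. Blocking
    /// here could miss a wakeup, so it spins; returns (value, spin count).
    pub fn spin_read(&self, index: usize) -> (u64, u64) {
        let mut spins = 0;
        while !self.ready[index].load(Ordering::Acquire) {
            spins += 1;
            hint::spin_loop();
        }
        (self.values[index].load(Ordering::Relaxed), spins)
    }
}
```

If the sender is descheduled between `claim` and `publish` (for example on a single core, or because it runs at a lower priority), `spin_read` burns CPU for as long as the sender stays off the CPU, which is the unbounded spinning this PR addresses.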

The current solution of unbounded spinning can cause problems, especially on single-core platforms or with custom thread priorities. One way to allow falling back to blocking is to have waiters that see more values in the channel after being woken up notify other waiters, making up for any missed notifications. However, this ends up being quite expensive and can lead to many unnecessary wakeups. A solution to this is wakeup throttling, as implemented by tokio and other userspace schedulers, which skips notifications if there is already an active thread, the "waker thread". That thread then notifies another waiter if it finds a message, creating a new waker thread. This PR changes the bounded and unbounded channels to fall back to blocking when they would otherwise have to spin on an incomplete send or receive, using wakeup throttling to make up for missed notifications without causing too many unnecessary wakeups.
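A minimal sketch of the throttling idea described above, assuming a simple permit counter guarded by a `Mutex`/`Condvar`. The `Throttle` type and its `notify`/`wait`/`release` methods are hypothetical and do not reflect the PR's actual implementation: a sender only issues a wakeup if no waker thread is currently active, and the woken thread hands the role on only if it found a message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Condvar, Mutex};

/// Hypothetical wakeup throttle: at most one woken receiver (the
/// "waker thread") is active at a time.
pub struct Throttle {
    waker_active: AtomicBool,
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Throttle {
    pub fn new() -> Self {
        Throttle {
            waker_active: AtomicBool::new(false),
            permits: Mutex::new(0),
            cv: Condvar::new(),
        }
    }

    /// Called after a send. Skips the wakeup if a waker thread is already
    /// active; returns whether a wakeup was actually issued.
    pub fn notify(&self) -> bool {
        if self
            .waker_active
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            return false;
        }
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
        true
    }

    /// Blocks a receiver until it is handed a wakeup permit.
    pub fn wait(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.cv.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Called by the waker thread after it has checked the channel. It gives
    /// up the waker role and, if it found a message, wakes another waiter,
    /// which becomes the new waker thread.
    pub fn release(&self, found_message: bool) {
        self.waker_active.store(false, Ordering::Release);
        if found_message {
            self.notify();
        }
    }
}
```

A burst of sends therefore produces a single wakeup rather than one per send; the chain of `release(true)` calls is what compensates for the notifications that were skipped.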

Thanks to @kprotty for pointing me in the right direction for the wakeup throttling algorithm. Hopefully paired with #1038 this should resolve all of the spinning issues in crossbeam/std.

Should resolve #366 and #997 (as well as rust-lang/rust#114851 and rust-lang/rust#112723 when upstreamed to std).

ibraheemdev · 2024-05-02T04:59:57Z

We probably also want to introduce a non-linearizable version of try_recv, because this PR doesn't actually fix the issue of try_recv taking longer than expected.
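The difference between the two flavours of `try_recv` can be sketched with a single-slot model. `Slot`, `try_recv_linearizable` and `try_recv_nonblocking` are hypothetical names, and for brevity neither method removes the value. The linearizable version treats a claimed-but-unwritten message as present and spins; the non-linearizable version reports "empty" instead, even though another thread may already count the message as sent.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical single-slot model: `claimed` linearizes the send,
/// `ready` is set only after the value has been written.
pub struct Slot {
    claimed: AtomicBool,
    ready: AtomicBool,
    value: AtomicU64,
}

impl Slot {
    pub fn new() -> Self {
        Slot {
            claimed: AtomicBool::new(false),
            ready: AtomicBool::new(false),
            value: AtomicU64::new(0),
        }
    }

    pub fn claim(&self) {
        self.claimed.store(true, Ordering::SeqCst);
    }

    pub fn publish(&self, v: u64) {
        self.value.store(v, Ordering::Relaxed);
        self.ready.store(true, Ordering::Release);
    }

    /// Linearizable: a claimed message counts as present, so this spins until
    /// it is written. A stalled sender stalls this call too.
    pub fn try_recv_linearizable(&self) -> Option<u64> {
        if !self.claimed.load(Ordering::SeqCst) {
            return None;
        }
        while !self.ready.load(Ordering::Acquire) {
            hint::spin_loop();
        }
        Some(self.value.load(Ordering::Relaxed))
    }

    /// Non-linearizable: returns `None` if the claimed message has not been
    /// written yet, so it never waits on another thread.
    pub fn try_recv_nonblocking(&self) -> Option<u64> {
        if !self.ready.load(Ordering::Acquire) {
            return None;
        }
        Some(self.value.load(Ordering::Relaxed))
    }
}
```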

ibraheemdev · 2024-05-02T05:34:34Z

Here are the benchmark results before (-) and after (+). Note that all of the fallback cases are very unlikely to be hit on most setups, so this mostly measures the performance difference of the new wakeup algorithm, which seems to be pretty mixed.

```diff
- test bounded_1::create    ... bench:          91 ns/iter (+/- 0)
+ test bounded_1::create    ... bench:          82 ns/iter (+/- 3)
- test bounded_1::mpmc      ... bench:  10,576,639 ns/iter (+/- 1,012,158)
+ test bounded_1::mpmc      ... bench:  11,206,415 ns/iter (+/- 2,787,440)
- test bounded_1::mpsc      ... bench:  21,551,625 ns/iter (+/- 975,344)
+ test bounded_1::mpsc      ... bench:  21,975,220 ns/iter (+/- 390,863)
- test bounded_1::oneshot   ... bench:         116 ns/iter (+/- 1)
+ test bounded_1::oneshot   ... bench:         104 ns/iter (+/- 3)
- test bounded_1::spmc      ... bench:  22,192,680 ns/iter (+/- 760,426)
+ test bounded_1::spmc      ... bench:  22,268,657 ns/iter (+/- 559,440)
- test bounded_1::spsc      ... bench:  23,097,047 ns/iter (+/- 76,889)
+ test bounded_1::spsc      ... bench:  22,753,449 ns/iter (+/- 143,048)
- test bounded_n::mpmc      ... bench:   2,387,457 ns/iter (+/- 313,821)
+ test bounded_n::mpmc      ... bench:   2,734,135 ns/iter (+/- 315,206)
- test bounded_n::mpsc      ... bench:   5,909,059 ns/iter (+/- 81,411)
+ test bounded_n::mpsc      ... bench:   6,635,209 ns/iter (+/- 134,351)
- test bounded_n::par_inout ... bench:   6,835,077 ns/iter (+/- 234,552)
+ test bounded_n::par_inout ... bench:   6,583,975 ns/iter (+/- 210,953)
- test bounded_n::spmc      ... bench:   5,649,130 ns/iter (+/- 127,809)
+ test bounded_n::spmc      ... bench:   5,748,161 ns/iter (+/- 130,954)
- test bounded_n::spsc      ... bench:   1,126,902 ns/iter (+/- 32,449)
+ test bounded_n::spsc      ... bench:   1,194,757 ns/iter (+/- 15,458)
- test unbounded::create    ... bench:          64 ns/iter (+/- 1)
+ test unbounded::create    ... bench:          57 ns/iter (+/- 1)
- test unbounded::inout     ... bench:          36 ns/iter (+/- 0)
+ test unbounded::inout     ... bench:          44 ns/iter (+/- 0)
- test unbounded::mpmc      ... bench:   1,181,564 ns/iter (+/- 204,692)
+ test unbounded::mpmc      ... bench:   1,200,760 ns/iter (+/- 174,606)
- test unbounded::mpsc      ... bench:   2,900,684 ns/iter (+/- 196,933)
+ test unbounded::mpsc      ... bench:   2,144,163 ns/iter (+/- 199,247)
- test unbounded::oneshot   ... bench:         103 ns/iter (+/- 0)
+ test unbounded::oneshot   ... bench:         113 ns/iter (+/- 0)
- test unbounded::par_inout ... bench:   4,799,151 ns/iter (+/- 487,975)
+ test unbounded::par_inout ... bench:   6,485,203 ns/iter (+/- 407,363)
- test unbounded::spmc      ... bench:   2,366,594 ns/iter (+/- 46,477)
+ test unbounded::spmc      ... bench:   2,014,760 ns/iter (+/- 28,476)
- test unbounded::spsc      ... bench:   1,061,444 ns/iter (+/- 14,589)
+ test unbounded::spsc      ... bench:   1,392,352 ns/iter (+/- 11,792)
```
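To put numbers on how mixed these results are, the relative change per benchmark can be computed from the before/after lines. This is only a small helper sketch; `parse_ns` and `relative_change_pct` are hypothetical and not part of crossbeam or libtest. A negative result means fewer ns/iter, i.e. faster after the change.

```rust
/// Extracts the ns/iter figure from a libtest bench line such as
/// "- test unbounded::mpsc ... bench: 2,900,684 ns/iter (+/- 196,933)".
pub fn parse_ns(line: &str) -> Option<u64> {
    let after = line.split("bench:").nth(1)?;
    let number = after.split_whitespace().next()?;
    // Drop thousands separators before parsing.
    number.replace(',', "").parse().ok()
}

/// Percentage change from `before` to `after`; negative means faster.
pub fn relative_change_pct(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}
```

Applied to the results above, `unbounded::mpsc` gets about 26% faster while `unbounded::par_inout` gets about 35% slower, which matches the "pretty mixed" summary.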

ibraheemdev · 2024-07-11T02:06:11Z

There are some changes I made on the rust-lang PR that have to be applied to this one as well.

I believe there's a large optimization that can be made here: only notifying another waiter if the current thread sees remaining data in the channel.
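The proposed optimization can be sketched as follows. This hypothetical `Chan` uses a `Mutex<VecDeque>` only to keep the example short; in the real lock-free channel the chained notification is what compensates for missed wakeups. The point illustrated is the rule itself: after taking a message, a receiver passes the wakeup on only if it still sees data left in the channel. The `chained` counter exists purely to make that observable.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};

pub struct Chan {
    queue: Mutex<VecDeque<u64>>,
    cv: Condvar,
    /// How many times a receiver passed a wakeup on to another waiter.
    chained: AtomicUsize,
}

impl Chan {
    pub fn new() -> Self {
        Chan {
            queue: Mutex::new(VecDeque::new()),
            cv: Condvar::new(),
            chained: AtomicUsize::new(0),
        }
    }

    pub fn send(&self, v: u64) {
        self.queue.lock().unwrap().push_back(v);
        self.cv.notify_one();
    }

    pub fn recv(&self) -> u64 {
        let mut queue = self.queue.lock().unwrap();
        loop {
            if let Some(v) = queue.pop_front() {
                // Only chain the wakeup if data remains; otherwise a woken
                // waiter would find nothing and go back to sleep.
                if !queue.is_empty() {
                    self.chained.fetch_add(1, Ordering::Relaxed);
                    self.cv.notify_one();
                }
                return v;
            }
            queue = self.cv.wait(queue).unwrap();
        }
    }

    pub fn chained(&self) -> usize {
        self.chained.load(Ordering::Relaxed)
    }
}
```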

the8472 · 2024-11-13T19:28:35Z

Since some of the issues indicate priority inversions, I wonder whether parking is the right choice, or whether using priority-inheritance-aware synchronization would be better on platforms that support it.

implement linearizable wakeups

86debba

ibraheemdev force-pushed the linearizable-waker branch 2 times, most recently from d1602c0 to e2f54ea May 2, 2024 02:36

use blocking state in select macro

61013ca

ibraheemdev force-pushed the linearizable-waker branch from e2f54ea to 61013ca May 2, 2024 02:42

This was referenced May 2, 2024

crossbeam::channel::Receiver::try_recv can block forever if sending thread is blocked #997

Open

Channel sending/receiving ends up spinlooping and deadlocking rust-lang/rust#114851

Closed

ibraheemdev force-pushed the linearizable-waker branch from 9d6381c to cff8e5e May 2, 2024 06:29

fallback to blocking when waiting for block or message writes

b7277c9

ibraheemdev force-pushed the linearizable-waker branch from cff8e5e to b7277c9 May 2, 2024 06:35

taiki-e added the crossbeam-channel label May 2, 2024

ibraheemdev mentioned this pull request May 17, 2024

Fallback to Parking in std::sync::mpsc Channels rust-lang/rust#125204

Closed
