std::sync::Mutex using batch semaphore #155

Merged: 3 commits merged into awslabs:main from feature/semaphore-mutex on Aug 9, 2024

Conversation

Aurel300 (Contributor) commented on Jul 24, 2024

Reimplements std::sync::Mutex using the batch semaphore. The tests already include cases that check causal dependencies and the number of context switches, so these have not changed.

This PR changes block_on to not yield before the future is polled for the first time. This caused one replay test to fail. I'm not sure if we should bump versions for this, since it is technically a breaking change for schedules?

We can delay this PR until after we start the crate reorganisation, but this is ready for reviews in any case.
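For reference, a minimal sketch of the block_on ordering change described above, assuming a simplified stand-in executor (the noop Waker, the switch_sketch helper, and the spin loop are illustrative only, not Shuttle's actual implementation): the future is polled once immediately, and a yield happens only after a pending poll.

use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// Sketch only: poll the future before any scheduling yield, and yield
// only when the poll returns Pending.
fn block_on_sketch<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = Context::from_waker(Waker::noop());
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            // Only now do we hit a scheduling point.
            Poll::Pending => switch_sketch(),
        }
    }
}

fn switch_sketch() {
    // Stand-in for a scheduler yield point (thread::switch in Shuttle).
    std::thread::yield_now();
}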


While trying to make sure the new semaphore implementation is equivalent to the old one in terms of context switches, I found it useful to write out how the methods work at a high level, including their yield points. I'm including it here in case it's useful; a rough code sketch follows the list:

  • Mutex::lock when blocked
    • check available_permits (0 available)
    • (re-entrant deadlock check)
    • BatchSemaphore::acquire_blocking(1) blocked
      • check state (-> not enough permits)
      • enqueue waiter
      • YIELD (block_on poll yield; wait until woken)
      • check and update state (-> permits now available)
      • remove waiter
      • block other failed waiters
      • YIELD (let other threads fail a try_acquire)
    • (acquire_blocking returns)
    • update state
    • acquire inner mutex
  • Mutex::lock when not blocked
    • check available_permits (1)
    • BatchSemaphore::acquire_blocking(1) not blocked
      • check state (-> sufficient permits)
      • update state
      • block other failed waiters
      • YIELD (let other threads fail a try_acquire)
    • (acquire_blocking returns)
    • update state
    • acquire inner mutex
  • Mutex::try_lock failed
    • BatchSemaphore::try_acquire(1) failed
      • YIELD (so another try_acquire in this thread may succeed)
    • (try_acquire returns)
  • Mutex::try_lock successful
    • BatchSemaphore::try_acquire(1) successful
      • update state
      • block other failed waiters
      • YIELD (let other threads fail a try_acquire)
    • (try_acquire returns)
    • update state
    • acquire inner mutex
  • Mutex::drop
    • release inner mutex
    • update state
    • BatchSemaphore::release(1)
      • update state
      • unblock all feasible waiters
      • YIELD (allow waiters to race and wake)
    • (release returns)
  • Mutex::drop when panicking
    • release inner mutex
    • update state
    • BatchSemaphore::release(1)
      • update state
      • forget waiters
      • close semaphore
    • (release returns)
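Here is the rough sketch mentioned above. SemSketch and switch_sketch are simplified stand-ins for the crate's BatchSemaphore and thread::switch (not its real API), and the waiter bookkeeping is reduced to comments; only the placement of the yield points is the point.

use std::cell::Cell;

struct SemSketch {
    permits: Cell<usize>,
}

impl SemSketch {
    fn acquire_blocking(&self, n: usize) {
        // Check state; with too few permits we would enqueue a waiter and
        // YIELD (the block_on poll yield) until a release wakes us.
        while self.permits.get() < n {
            switch_sketch();
        }
        // Check and update state, remove the waiter, block other failed
        // waiters, then YIELD so other threads can fail a try_acquire.
        self.permits.set(self.permits.get() - n);
        switch_sketch();
    }

    fn release(&self, n: usize) {
        // Update state and unblock all feasible waiters, then YIELD so the
        // woken waiters can race.
        self.permits.set(self.permits.get() + n);
        switch_sketch();
    }
}

fn switch_sketch() {
    // Stand-in for shuttle's thread::switch() scheduling point.
}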

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

src/sync/mutex.rs (outdated)
// latter case, we check the state to report a more precise
// error message.
state = self.state.borrow_mut();
if let Some(holder) = state.holder {
Contributor:

If this is None, then we have a bug in the Mutex – swap to state.holder.expect(...) ?

Contributor Author (@Aurel300):

I don't think this is the case: state.holder may still be None because there is a yield between the acquisition of a permit from the semaphore and the actual state update (the two are not performed in a single atomic step).

However, the check is still sufficient here: a thread attempting to re-acquire a lock it already holds must have returned from its first lock call by the time the second call checks the state.
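A small sketch of the reasoning in this reply; the Option<usize> task id and the free function are hypothetical simplifications of the snippet under review.

// Hedged sketch: `holder` mirrors state.holder and `me` the current task id.
fn reentrancy_check_sketch(holder: Option<usize>, me: usize) {
    if let Some(holder) = holder {
        // A holder equal to `me` means this task already returned from its
        // first lock() call, so re-acquiring really is a deadlock.
        if holder == me {
            panic!("deadlock! task {:?} tried to acquire a Mutex it already holds", me);
        }
    }
    // A `None` holder is not a bug: another task may have acquired its permit
    // but not yet recorded itself as holder, because a yield separates the
    // two steps.
}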

src/sync/mutex.rs (outdated)
/// the permits remaining in the semaphore.
fn reblock_if_unfair(&self) {
    let state = self.state.borrow_mut();
    if state.fairness == Fairness::Unfair {
Member:

nit: We're not planning to support changing the fairness once a semaphore has been created. Given that, I wonder if we could just put the fairness field outside the RefCell so we can do this check before calling borrow_mut?

Contributor Author (@Aurel300):

Ok, resolved in the new commit, though fairness needed to be added to BatchSemaphoreState::acquire_permits here, which is maybe not great.
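A minimal sketch of the layout being discussed, assuming hypothetical type names (BatchSemaphoreSketch, StateSketch): fairness is stored outside the RefCell so it can be checked without calling borrow_mut.

use std::cell::RefCell;

#[derive(Clone, Copy, PartialEq, Eq)]
enum Fairness {
    Fair,
    Unfair,
}

struct BatchSemaphoreSketch {
    fairness: Fairness,          // fixed at construction, so no borrow needed
    state: RefCell<StateSketch>, // mutable waiter/permit bookkeeping
}

struct StateSketch {
    // waiters, permits, clocks, ...
}

impl BatchSemaphoreSketch {
    fn reblock_if_unfair(&self) {
        // Read fairness without touching the RefCell at all.
        if self.fairness == Fairness::Unfair {
            let _state = self.state.borrow_mut();
            // re-block eligible waiters here ...
        }
    }
}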

// acquires may succeed, as long as the requesters were not
// blocking on the semaphore at the time of the panic. This is
// used to correctly model lock poisoning.
state.permits_available.release(num_permits, VectorClock::new());
Member:

Why VectorClock::new() here, instead of the current caller's clock?

Contributor Author (@Aurel300):

When this is done as part of Drop handlers, current::clock may panic. I think this is the same thing as mentioned here for the close clock: the way execution stops needs to be changed a bit.

self.semaphore.reblock_if_unfair();

// Yield so other threads can fail a `try_acquire`.
thread::switch();
Member:

[Aside] Even though we're always calling thread::switch() after reblock_if_unfair, I think it's better not to push the switch inside the reblock and to keep it outside, as you have done here.

if let Some(holder) = state.holder {
    if holder == me {
        panic!("deadlock! task {:?} tried to acquire a Mutex it already holds", me);
...
if !self.semaphore.is_closed() {
Member:

Feels a bit strange that you're checking if the semaphore is closed here, but then you call unwrap on the acquire call on line 70. If we're not expecting the semaphore to be closed, maybe just assert that here?

Contributor Author (@Aurel300), Aug 8, 2024:

This is the other way around, though: if the semaphore is not closed, then we can unwrap the acquire_blocking call. Otherwise, we continue and should get a poisoned lock from the inner mutex.
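A hedged, self-contained sketch of that control flow; ToyBatchSemaphore and ToyMutex are hypothetical stand-ins, and only the is_closed/unwrap/poisoning interplay is the point.

use std::sync::{LockResult, Mutex, MutexGuard};

struct ToyBatchSemaphore {
    closed: bool,
}

impl ToyBatchSemaphore {
    fn is_closed(&self) -> bool {
        self.closed
    }
    fn acquire_blocking(&self, _n: usize) -> Result<(), ()> {
        if self.closed { Err(()) } else { Ok(()) }
    }
}

struct ToyMutex<T> {
    semaphore: ToyBatchSemaphore,
    inner: Mutex<T>,
}

impl<T> ToyMutex<T> {
    fn lock(&self) -> LockResult<MutexGuard<'_, T>> {
        if !self.semaphore.is_closed() {
            // Not closed, so acquisition cannot fail: the unwrap is justified.
            self.semaphore.acquire_blocking(1).unwrap();
        }
        // If the semaphore was closed by a panicking holder, fall through:
        // the inner std mutex is poisoned and lock() surfaces that error.
        self.inner.lock()
    }
}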

jorajeev merged commit 4281c33 into awslabs:main on Aug 9, 2024.
5 checks passed.
Aurel300 deleted the feature/semaphore-mutex branch on August 9, 2024 at 02:01.