std::sync::Mutex using batch semaphore #155

Merged: 3 commits merged into awslabs:main from feature/semaphore-mutex on Aug 9, 2024

Conversation

Aurel300 (Contributor) commented on Jul 24, 2024

Reimplements std::sync::Mutex using the batch semaphore. The tests already include cases that check causal dependencies and the number of context switches, so these have not changed.

This PR changes block_on to not yield before the future is polled for the first time. This caused one replay test to fail. I'm not sure if we should bump versions for this, since it is technically a breaking change for schedules?

We can delay this PR until after we start the crate reorganisation, but this is ready for reviews in any case.
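For reference, a minimal sketch of the block_on ordering change described above, assuming a simplified stand-in executor (the noop Waker, the switch_sketch helper, and the spin loop are illustrative only, not Shuttle's actual implementation): the future is polled once immediately, and a yield happens only after a pending poll.

use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// Sketch only: poll the future before any scheduling yield, and yield
// only when the poll returns Pending.
fn block_on_sketch<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = Context::from_waker(Waker::noop());
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            // Only now do we hit a scheduling point.
            Poll::Pending => switch_sketch(),
        }
    }
}

fn switch_sketch() {
    // Stand-in for a scheduler yield point (thread::switch in Shuttle).
    std::thread::yield_now();
}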


While trying to make sure the new semaphore implementation is equivalent to the old one in terms of context switches, I found it useful to write out how the methods work at a high level, including their yield points. I'm including it here in case it's useful; a rough code sketch follows the list:

  • Mutex::lock when blocked
    • check available_permits (0 available)
    • (re-entrant deadlock check)
    • BatchSemaphore::acquire_blocking(1) blocked
      • check state (-> not enough permits)
      • enqueue waiter
      • YIELD (block_on poll yield; wait until woken)
      • check and update state (-> permits now available)
      • remove waiter
      • block other failed waiters
      • YIELD (let other threads fail a try_acquire)
    • (acquire_blocking returns)
    • update state
    • acquire inner mutex
  • Mutex::lock when not blocked
    • check available_permits (1)
    • BatchSemaphore::acquire_blocking(1) not blocked
      • check state (-> sufficient permits)
      • update state
      • block other failed waiters
      • YIELD (let other threads fail a try_acquire)
    • (acquire_blocking returns)
    • update state
    • acquire inner mutex
  • Mutex::try_lock failed
    • BatchSemaphore::try_acquire(1) failed
      • YIELD (so another try_acquire in this thread may succeed)
    • (try_acquire returns)
  • Mutex::try_lock successful
    • BatchSemaphore::try_acquire(1) successful
      • update state
      • block other failed waiters
      • YIELD (let other threads fail a try_acquire)
    • (try_acquire returns)
    • update state
    • acquire inner mutex
  • Mutex::drop
    • release inner mutex
    • update state
    • BatchSemaphore::release(1)
      • update state
      • unblock all feasible waiters
      • YIELD (allow waiters to race and wake)
    • (release returns)
  • Mutex::drop when panicking
    • release inner mutex
    • update state
    • BatchSemaphore::release(1)
      • update state
      • forget waiters
      • close semaphore
    • (release returns)
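Here is the rough sketch mentioned above. SemSketch and switch_sketch are simplified stand-ins for the crate's BatchSemaphore and thread::switch (not its real API), and the waiter bookkeeping is reduced to comments; only the placement of the yield points is the point.

use std::cell::Cell;

struct SemSketch {
    permits: Cell<usize>,
}

impl SemSketch {
    fn acquire_blocking(&self, n: usize) {
        // Check state; with too few permits we would enqueue a waiter and
        // YIELD (the block_on poll yield) until a release wakes us.
        while self.permits.get() < n {
            switch_sketch();
        }
        // Check and update state, remove the waiter, block other failed
        // waiters, then YIELD so other threads can fail a try_acquire.
        self.permits.set(self.permits.get() - n);
        switch_sketch();
    }

    fn release(&self, n: usize) {
        // Update state and unblock all feasible waiters, then YIELD so the
        // woken waiters can race.
        self.permits.set(self.permits.get() + n);
        switch_sketch();
    }
}

fn switch_sketch() {
    // Stand-in for shuttle's thread::switch() scheduling point.
}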

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

src/sync/mutex.rs (outdated)
// latter case, we check the state to report a more precise
// error message.
state = self.state.borrow_mut();
if let Some(holder) = state.holder {
Contributor:

If this is None, then we have a bug in the Mutex – swap to state.holder.expect(...) ?

Contributor Author (@Aurel300):

I don't think this is the case: state.holder may still be None because there is a yield between the acquisition of a permit from the semaphore and the actual state update (the two are not performed in a single atomic step).

However, the check is still sufficient here: a thread attempting to re-acquire a lock it already holds must have returned from its first lock call by the time the second call checks the state.
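A small sketch of the reasoning in this reply; the Option<usize> task id and the free function are hypothetical simplifications of the snippet under review.

// Hedged sketch: `holder` mirrors state.holder and `me` the current task id.
fn reentrancy_check_sketch(holder: Option<usize>, me: usize) {
    if let Some(holder) = holder {
        // A holder equal to `me` means this task already returned from its
        // first lock() call, so re-acquiring really is a deadlock.
        if holder == me {
            panic!("deadlock! task {:?} tried to acquire a Mutex it already holds", me);
        }
    }
    // A `None` holder is not a bug: another task may have acquired its permit
    // but not yet recorded itself as holder, because a yield separates the
    // two steps.
}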

src/sync/mutex.rs (outdated)
/// the permits remaining in the semaphore.
fn reblock_if_unfair(&self) {
    let state = self.state.borrow_mut();
    if state.fairness == Fairness::Unfair {
Member:

nit: We're not planning to support changing the fairness once a semaphore has been created. Given that, I wonder if we could just put the fairness field outside the RefCell so we can do this check before calling borrow_mut?

Contributor Author (@Aurel300):

Ok, resolved in the new commit, though fairness needed to be added to BatchSemaphoreState::acquire_permits here, which is maybe not great.
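A minimal sketch of the layout being discussed, assuming hypothetical type names (BatchSemaphoreSketch, StateSketch): fairness is stored outside the RefCell so it can be checked without calling borrow_mut.

use std::cell::RefCell;

#[derive(Clone, Copy, PartialEq, Eq)]
enum Fairness {
    Fair,
    Unfair,
}

struct BatchSemaphoreSketch {
    fairness: Fairness,          // fixed at construction, so no borrow needed
    state: RefCell<StateSketch>, // mutable waiter/permit bookkeeping
}

struct StateSketch {
    // waiters, permits, clocks, ...
}

impl BatchSemaphoreSketch {
    fn reblock_if_unfair(&self) {
        // Read fairness without touching the RefCell at all.
        if self.fairness == Fairness::Unfair {
            let _state = self.state.borrow_mut();
            // re-block eligible waiters here ...
        }
    }
}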

// acquires may succeed, as long as the requesters were not
// blocking on the semaphore at the time of the panic. This is
// used to correctly model lock poisoning.
state.permits_available.release(num_permits, VectorClock::new());
Member:

Why VectorClock::new() here, instead of the current caller's clock?

Contributor Author (@Aurel300):

When this is done as part of Drop handlers, current::clock may panic. I think this is the same thing as mentioned here for the close clock: the way execution stops needs to be changed a bit.

self.semaphore.reblock_if_unfair();

// Yield so other threads can fail a `try_acquire`.
thread::switch();
Member:

[Aside] Even though we're always calling thread::switch() after reblock_if_unfair, I think it's better not to push the switch inside the reblock and to keep it outside, as you have done here.

if let Some(holder) = state.holder {
    if holder == me {
        panic!("deadlock! task {:?} tried to acquire a Mutex it already holds", me);
...
if !self.semaphore.is_closed() {
Member:

Feels a bit strange that you're checking if the semaphore is closed here, but then you call unwrap on the acquire call on line 70. If we're not expecting the semaphore to be closed, maybe just assert that here?

Contributor Author (@Aurel300), Aug 8, 2024:

This is the other way around, though: if the semaphore is not closed, then we can unwrap the acquire_blocking call. Otherwise, we continue and should get a poisoned lock from the inner mutex.
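A hedged, self-contained sketch of that control flow; ToyBatchSemaphore and ToyMutex are hypothetical stand-ins, and only the is_closed/unwrap/poisoning interplay is the point.

use std::sync::{LockResult, Mutex, MutexGuard};

struct ToyBatchSemaphore {
    closed: bool,
}

impl ToyBatchSemaphore {
    fn is_closed(&self) -> bool {
        self.closed
    }
    fn acquire_blocking(&self, _n: usize) -> Result<(), ()> {
        if self.closed { Err(()) } else { Ok(()) }
    }
}

struct ToyMutex<T> {
    semaphore: ToyBatchSemaphore,
    inner: Mutex<T>,
}

impl<T> ToyMutex<T> {
    fn lock(&self) -> LockResult<MutexGuard<'_, T>> {
        if !self.semaphore.is_closed() {
            // Not closed, so acquisition cannot fail: the unwrap is justified.
            self.semaphore.acquire_blocking(1).unwrap();
        }
        // If the semaphore was closed by a panicking holder, fall through:
        // the inner std mutex is poisoned and lock() surfaces that error.
        self.inner.lock()
    }
}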

jorajeev merged commit 4281c33 into awslabs:main on Aug 9, 2024.
5 checks passed.
Aurel300 deleted the feature/semaphore-mutex branch on August 9, 2024 at 02:01.