Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

race: Relax success ordering from AcqRel to Release. #278

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

briansmith
Copy link
Contributor

@briansmith briansmith commented Mar 13, 2025

See the analogous change in
rust-lang/rust#131746 and the discussion in
#220.

What is the effect of this change? Not much, because before we ever execute the compare_exchange, we do a load with Ordering::Acquire; the compare_exchange is in the #[cold] path already. Thus, this just mostly clarifies our expectations. See the non-doc comment added under the module's doc comment for the reasoning.

How does this change the code gen? Consider this analogous example:

 #[no_mangle]
 fn foo1(y: &mut i32) -> bool {
-    let r = X.compare_exchange(0, 1, Ordering::AcqRel, Ordering::Acquire).is_ok();
+    let r = X.compare_exchange(0, 1, Ordering::Release, Ordering::Acquire).is_ok();
     r
 }

On x86_64, there is no change. Here is the generated code before and after:

foo1:
        mov     rcx, qword ptr [rip + example::X::h9e1b81da80078af7@GOTPCREL]
        mov     edx, 1
        xor     eax, eax
        lock            cmpxchg dword ptr [rcx], edx
        sete    al
        ret

example::X::h9e1b81da80078af7:
        .zero   4

On AArch64, regardless of whether atomics are outlined or not, there is no change. Here
is the generated code with inlined atomics:

foo1:
        adrp    x8, :got:example::X::h40b04fb69d714de3
        ldr     x8, [x8, :got_lo12:example::X::h40b04fb69d714de3]
.LBB0_1:
        ldaxr   w9, [x8]
        cbnz    w9, .LBB0_4
        mov     w0, #1
        stlxr   w9, w0, [x8]
        cbnz    w9, .LBB0_1
        ret
.LBB0_4:
        mov     w0, wzr
        clrex
        ret

example::X::h40b04fb69d714de3:
        .zero   4

For 32-bit ARMv7, with inlined atomics, the resulting diff in the object code is:

@@ -10,14 +10,13 @@
         mov     r0, #1
         strex   r2, r0, [r1]
         cmp     r2, #0
-        beq     .LBB0_5
+        bxeq    lr
         ldrex   r0, [r1]
         cmp     r0, #0
         beq     .LBB0_2
 .LBB0_4:
-        mov     r0, #0
         clrex
-.LBB0_5:
+        mov     r0, #0
         dmb     ish
         bx      lr
 .LCPI0_0:
@@ -54,4 +53,3 @@

 example::X::h47e2038445e1c648:
         .zero   4

@briansmith
Copy link
Contributor Author

briansmith commented Mar 13, 2025

@briansmith briansmith changed the title race: Reduce failure ordering from RelAcq to Release. race: Reduce failure ordering from AcqRel to Release. Mar 13, 2025
@briansmith briansmith force-pushed the b/weaken-release branch 4 times, most recently from 32c2734 to b527b4a Compare March 15, 2025 17:14
@briansmith briansmith changed the title race: Reduce failure ordering from AcqRel to Release. race: Relax success ordering from AcqRel to Release. Mar 15, 2025
@matklad
Copy link
Owner

matklad commented Mar 17, 2025

LGTM, especially with the same change in Rust!

Could your remove the comments though, and instead add the (great) explanation of "there's nothing to synchronize with for 0" to commit message? I think it's ok to hide this in git blame!

And, as usual, could you bump a version in Cargo.toml&add a changelog entry? ^^

@briansmith briansmith force-pushed the b/weaken-release branch 2 times, most recently from 059d0c1 to a245e8d Compare March 17, 2025 18:04
@briansmith
Copy link
Contributor Author

Could your remove the comments though, and instead add the (great) explanation of "there's nothing to synchronize with for 0" to commit message? I think it's ok to hide this in git blame!

I think it's too important to hide in the git log, especially when people go to change the code later. Instead, I added a non-doc comment under the doc comment, and removed the other comments I added.

And, as usual, could you bump a version in Cargo.toml&add a changelog entry? ^^

done.

…ease`.

See the analogous change in
rust-lang/rust#131746 and the discussion in
matklad#220.

What is the effect of this change? Not much, because before we ever
execute the `compare_exchange`, we do a load with `Ordering::Acquire`;
the `compare_exchange` is in the `#[cold]` path already. Thus, this just mostly
clarifies our expectations. See the non-doc comment added under the module's doc
comment for the reasoning.

How does this change the code gen? Consider this analogous example:

```diff
 #[no_mangle]
 fn foo1(y: &mut i32) -> bool {
-    let r = X.compare_exchange(0, 1, Ordering::AcqRel, Ordering::Acquire).is_ok();
+    let r = X.compare_exchange(0, 1, Ordering::Release, Ordering::Acquire).is_ok();
     r
 }

```

On x86_64, there is no change. Here is the generated code before and after:

```
foo1:
        mov     rcx, qword ptr [rip + example::X::h9e1b81da80078af7@GOTPCREL]
        mov     edx, 1
        xor     eax, eax
        lock            cmpxchg dword ptr [rcx], edx
        sete    al
        ret

example::X::h9e1b81da80078af7:
        .zero   4
```

On AArch64, regardless of whether atomics are outlined or not, there is no change. Here
is the generated code with inlined atomics:

```
foo1:
        adrp    x8, :got:example::X::h40b04fb69d714de3
        ldr     x8, [x8, :got_lo12:example::X::h40b04fb69d714de3]
.LBB0_1:
        ldaxr   w9, [x8]
        cbnz    w9, .LBB0_4
        mov     w0, matklad#1
        stlxr   w9, w0, [x8]
        cbnz    w9, .LBB0_1
        ret
.LBB0_4:
        mov     w0, wzr
        clrex
        ret

example::X::h40b04fb69d714de3:
        .zero   4
```

For 32-bit ARMv7, with inlined atomics, the resulting diff in the object code is:

```diff
@@ -10,14 +10,13 @@
         mov     r0, matklad#1
         strex   r2, r0, [r1]
         cmp     r2, #0
-        beq     .LBB0_5
+        bxeq    lr
         ldrex   r0, [r1]
         cmp     r0, #0
         beq     .LBB0_2
 .LBB0_4:
-        mov     r0, #0
         clrex
-.LBB0_5:
+        mov     r0, #0
         dmb     ish
         bx      lr
 .LCPI0_0:
@@ -54,4 +53,3 @@

 example::X::h47e2038445e1c648:
         .zero   4
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants