race: Relax success ordering from `AcqRel` to `Release`. #278

briansmith · 2025-03-13T18:33:10Z

See the analogous change in
rust-lang/rust#131746 and the discussion in
#220.

What is the effect of this change? Not much, because before we ever execute the compare_exchange, we do a load with Ordering::Acquire; the compare_exchange is in the #[cold] path already. Thus, this just mostly clarifies our expectations. See the non-doc comment added under the module's doc comment for the reasoning.

How does this change the code gen? Consider this analogous example:

 #[no_mangle]
 fn foo1(y: &mut i32) -> bool {
-    let r = X.compare_exchange(0, 1, Ordering::AcqRel, Ordering::Acquire).is_ok();
+    let r = X.compare_exchange(0, 1, Ordering::Release, Ordering::Acquire).is_ok();
     r
 }

On x86_64, there is no change. Here is the generated code before and after:

foo1:
        mov     rcx, qword ptr [rip + example::X::h9e1b81da80078af7@GOTPCREL]
        mov     edx, 1
        xor     eax, eax
        lock            cmpxchg dword ptr [rcx], edx
        sete    al
        ret

example::X::h9e1b81da80078af7:
        .zero   4

On AArch64, regardless of whether atomics are outlined or not, there is no change. Here
is the generated code with inlined atomics:

foo1:
        adrp    x8, :got:example::X::h40b04fb69d714de3
        ldr     x8, [x8, :got_lo12:example::X::h40b04fb69d714de3]
.LBB0_1:
        ldaxr   w9, [x8]
        cbnz    w9, .LBB0_4
        mov     w0, #1
        stlxr   w9, w0, [x8]
        cbnz    w9, .LBB0_1
        ret
.LBB0_4:
        mov     w0, wzr
        clrex
        ret

example::X::h40b04fb69d714de3:
        .zero   4

For 32-bit ARMv7, with inlined atomics, the resulting diff in the object code is:

@@ -10,14 +10,13 @@
         mov     r0, #1
         strex   r2, r0, [r1]
         cmp     r2, #0
-        beq     .LBB0_5
+        bxeq    lr
         ldrex   r0, [r1]
         cmp     r0, #0
         beq     .LBB0_2
 .LBB0_4:
-        mov     r0, #0
         clrex
-.LBB0_5:
+        mov     r0, #0
         dmb     ish
         bx      lr
 .LCPI0_0:
@@ -54,4 +53,3 @@

 example::X::h47e2038445e1c648:
         .zero   4

briansmith · 2025-03-13T18:37:41Z

I noticed Aarch64 is the same before/after primarily because it has +outline-atomics by default whereas 32-bit ARM doesn't. So we might experiment with disabling that for AArch64 to see more interesting results for that configuration. EDITED: nope, no change.
Instead of using the godbolt example from issue Potential to weaken ordering for success case of compare_exhanges in race module #220 in the commit message, we should probably document the difference from a real use of get_or_init as I did in Remove spin dependency on x86 and x86_64 targets. briansmith/ring#2331. EDITED: I decided not to bother with this, since this is in the #[cold] path anyway.

matklad · 2025-03-17T16:55:12Z

LGTM, especially with the same change in Rust!

Could your remove the comments though, and instead add the (great) explanation of "there's nothing to synchronize with for 0" to commit message? I think it's ok to hide this in git blame!

And, as usual, could you bump a version in Cargo.toml&add a changelog entry? ^^

briansmith · 2025-03-17T18:05:00Z

Could your remove the comments though, and instead add the (great) explanation of "there's nothing to synchronize with for 0" to commit message? I think it's ok to hide this in git blame!

I think it's too important to hide in the git log, especially when people go to change the code later. Instead, I added a non-doc comment under the doc comment, and removed the other comments I added.

And, as usual, could you bump a version in Cargo.toml&add a changelog entry? ^^

done.

…ease`. See the analogous change in rust-lang/rust#131746 and the discussion in matklad#220. What is the effect of this change? Not much, because before we ever execute the `compare_exchange`, we do a load with `Ordering::Acquire`; the `compare_exchange` is in the `#[cold]` path already. Thus, this just mostly clarifies our expectations. See the non-doc comment added under the module's doc comment for the reasoning. How does this change the code gen? Consider this analogous example: ```diff #[no_mangle] fn foo1(y: &mut i32) -> bool { - let r = X.compare_exchange(0, 1, Ordering::AcqRel, Ordering::Acquire).is_ok(); + let r = X.compare_exchange(0, 1, Ordering::Release, Ordering::Acquire).is_ok(); r } ``` On x86_64, there is no change. Here is the generated code before and after: ``` foo1: mov rcx, qword ptr [rip + example::X::h9e1b81da80078af7@GOTPCREL] mov edx, 1 xor eax, eax lock cmpxchg dword ptr [rcx], edx sete al ret example::X::h9e1b81da80078af7: .zero 4 ``` On AArch64, regardless of whether atomics are outlined or not, there is no change. Here is the generated code with inlined atomics: ``` foo1: adrp x8, :got:example::X::h40b04fb69d714de3 ldr x8, [x8, :got_lo12:example::X::h40b04fb69d714de3] .LBB0_1: ldaxr w9, [x8] cbnz w9, .LBB0_4 mov w0, matklad#1 stlxr w9, w0, [x8] cbnz w9, .LBB0_1 ret .LBB0_4: mov w0, wzr clrex ret example::X::h40b04fb69d714de3: .zero 4 ``` For 32-bit ARMv7, with inlined atomics, the resulting diff in the object code is: ```diff @@ -10,14 +10,13 @@ mov r0, matklad#1 strex r2, r0, [r1] cmp r2, #0 - beq .LBB0_5 + bxeq lr ldrex r0, [r1] cmp r0, #0 beq .LBB0_2 .LBB0_4: - mov r0, #0 clrex -.LBB0_5: + mov r0, #0 dmb ish bx lr .LCPI0_0: @@ -54,4 +53,3 @@ example::X::h47e2038445e1c648: .zero 4 ```

briansmith mentioned this pull request Mar 13, 2025

Potential to weaken ordering for success case of compare_exhanges in race module #220

Open

briansmith changed the title ~~race: Reduce failure ordering from RelAcq to Release.~~ race: Reduce failure ordering from AcqRel to Release. Mar 13, 2025

briansmith force-pushed the b/weaken-release branch 4 times, most recently from 32c2734 to b527b4a Compare March 15, 2025 17:14

briansmith changed the title ~~race: Reduce failure ordering from AcqRel to Release.~~ race: Relax success ordering from AcqRel to Release. Mar 15, 2025

briansmith force-pushed the b/weaken-release branch from b527b4a to 2f82871 Compare March 15, 2025 18:36

briansmith force-pushed the b/weaken-release branch 2 times, most recently from 059d0c1 to a245e8d Compare March 17, 2025 18:04

briansmith force-pushed the b/weaken-release branch from a245e8d to 2a707ee Compare March 17, 2025 18:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

race: Relax success ordering from `AcqRel` to `Release`. #278

race: Relax success ordering from `AcqRel` to `Release`. #278

briansmith commented Mar 13, 2025 •

edited

Loading

briansmith commented Mar 13, 2025 •

edited

Loading

matklad commented Mar 17, 2025

briansmith commented Mar 17, 2025

race: Relax success ordering from AcqRel to Release. #278

Are you sure you want to change the base?

race: Relax success ordering from AcqRel to Release. #278

Conversation

briansmith commented Mar 13, 2025 • edited Loading

briansmith commented Mar 13, 2025 • edited Loading

matklad commented Mar 17, 2025

briansmith commented Mar 17, 2025

race: Relax success ordering from `AcqRel` to `Release`. #278

race: Relax success ordering from `AcqRel` to `Release`. #278

briansmith commented Mar 13, 2025 •

edited

Loading

briansmith commented Mar 13, 2025 •

edited

Loading