Skip to content

implement floor and ceil in assembly on i586 #976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

folkertdev
Copy link
Contributor

@folkertdev folkertdev commented Jul 13, 2025

fixes #837

The assembly is based on

Which both state

/*
 * Written by J.T. Conklin <[email protected]>.
 * Public domain.
 */

Which I believe means we're good in terms of licensing.

@tgross35
Copy link
Contributor

This is awesome, thank you for implementing it!

There is one problem, our libm MSRV unfortunately doesn't support naked functions. I'll be bumping it in the near future but I don't think it will be that high; how hard is this to do with inline asm? Probably adds a few instructions because the loads stack access becomes opaque, but that would still be faster than the slow existing implementation.

@tgross35
Copy link
Contributor

Fix for the red CI in #979 btw

@folkertdev
Copy link
Contributor Author

how hard is this to do with inline asm?

Well, I don't know this target and its assembly that well. So using global_asm! would be much easier (for me, anyway) and more straightforward to upgrade once the MSRV does support naked functions.

I think the only downside is that the symbols might be visible from the outside? Is that a problem?

@folkertdev
Copy link
Contributor Author

actually we can prevent even that with a trick that Björn showed me recently using sym

@folkertdev
Copy link
Contributor Author

folkertdev commented Jul 17, 2025

from what I can find this is a 2024 edition thing


error: extern blocks must be unsafe
  --> builtins-shim/../compiler-builtins/src/math/../../../libm/src/math/arch/i586.rs:23:1
   |
23 | / extern "cdecl" {
24 | |     fn ceil_helper(_: f64) -> f64;
25 | |     fn floor_helper(_: f64) -> f64;
26 | | }
   | |_^

but rust 1.63 won't compile unsafe extern right?

@tgross35
Copy link
Contributor

tgross35 commented Jul 17, 2025

I think the only downside is that the symbols might be visible from the outside? Is that a problem?

I think it may be, also conflicts (though we could make it weak). But this should be pretty easy to move to inline asm; I think something like this would work https://rust.godbolt.org/z/P98rKEW7P, which is reasonably close to the original (not sure if there's a way to turn pointers to locals into stack offsets, to avoid the leal s)

@folkertdev
Copy link
Contributor Author

not sure if there's a way to turn pointers to locals into stack offsets, to avoid the leal s

That's probably fine, we can fix that properly once naked functions are available


we now get errors like this?

  stderr ───
    Spaced Musl sinf arg 1/1: 800 iterations (800 total)

    thread 'musl_quickspace_sinf' panicked at libm-test/tests/compare_built_musl.rs:26:50:
    called `Result::unwrap()` on an `Err` value: 
        input:    (-239880100.0,)
        as hex:   (-0x1.c988f4p+27,)
        as bits:  (0xcd64c47a,)
        expected: -0.17352822            -0x1.6362c4p-3 0xbe31b162
        actual:   -0.17353132            -0x1.636464p-3 0xbe31b232

@quaternic
Copy link
Contributor

You're clobbering the saved control word on the stack with the modified one, so it isn't restored correctly.

I think there's an argument for using a fixed control word for the frndint, but I'll have to come back to this later.

Comment on lines 18 to 25
"movw %dx, ({cw_ptr})", // Apply cw
"fldcw ({cw_ptr})", // ...
Copy link
Contributor

@tgross35 tgross35 Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yeah, as @quaternic mentioned this needs a second stack slot. As a microopt this could be let mut cw_stash = [0u16; 2]; and then these accesses can be -4({stash_ptr}) so we save a register vs. two different u16 locals

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually even better, let mut cw_stash = MaybeUninit::<[u16; 2]>::uninit(); to save the zero init instruction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this to work

https://rust.godbolt.org/z/44Tjcx3bb

the -4 stuff didn't work (also, for u16, why -4?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, took that from the original asm without checking :)

@folkertdev
Copy link
Contributor Author

Somehow these changes introduce a panic somewhere??

error: linking with `cc` failed: exit status: 1
  |
  = note:  "cc" "-m32" "<1 object files omitted>" "-Wl,--as-needed" "-Wl,-Bstatic" "<sysroot>/lib/rustlib/i586-unknown-linux-gnu/lib/libcompiler_builtins-*.rlib" "-Wl,-Bdynamic" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-L" "/tmp/rustcBJvx8Y/raw-dylibs" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "<sysroot>/lib/rustlib/i586-unknown-linux-gnu/lib" "-o" "/home/folkertdev/rust/compiler-builtins/target/i586-unknown-linux-gnu/release-opt/deps/libm-7098e2e2ad176b85" "-Wl,--gc-sections" "-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-Wl,--strip-debug" "-nodefaultlibs"
  = note: some arguments are omitted. use `--verbose` to show all linker arguments
  = note: ld: error: undefined symbol: 
          
          ERROR[no-panic]: detected panic in function `rem_pio2`
          
          >>> referenced by libm.ddf30f7f94fa857e-cgu.0
          >>>               /home/folkertdev/rust/compiler-builtins/target/i586-unknown-linux-gnu/release-opt/deps/libm-7098e2e2ad176b85.libm.ddf30f7f94fa857e-cgu.0.rcgu.o:(core::ops::function::FnOnce::call_once::h239ee4d4be0e829c)
          >>> referenced by libm.ddf30f7f94fa857e-cgu.0
          >>>               /home/folkertdev/rust/compiler-builtins/target/i586-unknown-linux-gnu/release-opt/deps/libm-7098e2e2ad176b85.libm.ddf30f7f94fa857e-cgu.0.rcgu.o:(core::ops::function::FnOnce::call_once::hf87b8ecfbb337da8)
          >>> referenced by libm.ddf30f7f94fa857e-cgu.0
          >>>               /home/folkertdev/rust/compiler-builtins/target/i586-unknown-linux-gnu/release-opt/deps/libm-7098e2e2ad176b85.libm.ddf30f7f94fa857e-cgu.0.rcgu.o:(libm::math::rem_pio2::rem_pio2::h2b1fed0a2d9377e5)
          
          ld: error: undefined symbol: 
          
          ERROR[no-panic]: detected panic in function `j1f`

The culprit is floor, but eh, wut? Maybe inlining no longer happens (sufficiently) with the inline assembly?

@tgross35
Copy link
Contributor

That's bizarre. The failure didn't show up with naked_asm right? I have no idea what could be different but that seems like an interesting thread to pull...

I'll try to look on Monday but it would be good to trace back where that panic is coming from, maybe an assert_unchecked can get rid of it (I'll figure out how to do the MSRV dance around that).

@beetrees
Copy link
Contributor

After doing a bit of testing, it appears that any inline ASM block that doesn't have the pure option causes no_panic to fail.

@tgross35
Copy link
Contributor

After doing a bit of testing, it appears that any inline ASM block that doesn't have the pure option causes no_panic to fail.

Is this bug in no_panic or is does codegen for non-pure asm somehow prevent eliminating an actual panic?

If it's the first case, I think it might be okay to work around this by updating use_arch_required at https://github.com/folkertdev/compiler-builtins/blob/8f43b0b5f568c688c3504f89097221c2d16749b4/libm/src/math/floor.rs and ceil to include not(assert_no_panic).

@folkertdev
Copy link
Contributor Author

I think it's a codegen/optimization issue. When I move the floor and ceil implementations into their own binary, and add #[no_panic], it fails in debug mode (unsurprising), but succeeds with --release. So it is not the assembly inherently.

https://godbolt.org/z/aad7TYaWf (with no_panic turned on locally, godbolt does not have that sadly)

On the other hand in the compiler-builtins codebase even a asm!("nop") is enough to trigger a panic apparently. Neither inline(always) nor inline(never) changes that.

As far as I can tell there is no actual panic there (see e.g. the godbolt)

@quaternic
Copy link
Contributor

I think there's an argument for using a fixed control word for the frndint, but I'll have to come back to this later.

So, what I had in mind there was that while the implementation from NetBSD is surely more correct in supporting unmasked exceptions, we really don't have that support anywhere in the language (except maybe naked functions) so a chunk of it is unnecessary.

I believe this should work just as well:
(using intel syntax)

extern "C" fn floor(mut x: f64) -> f64 {
    unsafe {
        core::arch::asm!(
            "fld qword ptr [{x}]",
            "fstcw [{x}]",
            "mov word ptr [{x} + 2], 0x077f",
            "fldcw [{x} + 2]",
            "frndint",
            "fldcw [{x}]",
            "fstp qword ptr [{x}]",
            x = in(reg) &mut x,
            out("st(0)") _, out("st(1)") _,
            out("st(2)") _, out("st(3)") _,
            out("st(4)") _, out("st(5)") _,
            out("st(6)") _, out("st(7)") _,
            options(nostack),
        );
    }
    x
}

https://godbolt.org/z/8b6a8Yxn1
It's a bit of a hack to reuse the memory of x for the control words, but since the memory operand is needed regardless, might as well.

@folkertdev
Copy link
Contributor Author

That implementation has the right behavior, but still runs into the panic issue.

@quaternic
Copy link
Contributor

quaternic commented Jul 20, 2025

Yeah, I didn't mean to imply it would do anything for that, and there's currently no way to make x87 inline assembly pure since that would require support for either memory operands or x87 register operands.


As for the panic, it seems that all the errors trace back to the same source: rem_pio2_large contains one call to floor, and inserting asm!("") next to that introduces the same no-panic errors.

Since that call looks to be replaceable with trunc, that would be a possible workaround. (Testing now: #982 )

@tgross35
Copy link
Contributor

Does making the functions extern "C" change things? I wonder if that could be the difference with the naked implementation, because of assumptions around unwinding with extern ABIs.

If that doesn't work, I think the change in #982 is fine if it gets us out of this issue.

I think it's a codegen/optimization issue. When I move the floor and ceil implementations into their own binary, and add #[no_panic], it fails in debug mode (unsurprising), but succeeds with --release. So it is not the assembly inherently.

Do you mean putting the definitions in a separate object file but still checking the panic in rem_pio2? If so, that could be falling into no_panic needing LTO

# Release with maximum optimizations, which is very slow to build. This is also
# what is needed to check `no-panic`.
[profile.release-opt]
inherits = "release"
codegen-units = 1
lto = "fat"

@folkertdev
Copy link
Contributor Author

Locally at least, for floor the combination of extern "C" with inline(never) gets rid of the panics. I'm assuming that is fine (e.g. the naked function version also would not get inlined).

Copy link
Contributor

@tgross35 tgross35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few small nits but it looks good to me, @quaternic could you also review?

I do think it is okay to use #982 as well so we can allow floor to be inlined

Comment on lines 14 to 15
let mut cw_stash = core::mem::MaybeUninit::<u16>::uninit();
let mut cw_tmp = core::mem::MaybeUninit::<u16>::uninit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be imported since it is used a few times?

} else {
return x;
// NOTE: this `inline(never)` is load-bearing for making functions that use it panic-free.
// Without it `rem_pio2_large` (and any function that uses it) will contain panics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they actually contain panics? Or is it a false positive that the pattern used by no_panic doesn't handle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to actually pinpoint where the reported panic really is. e.g. https://godbolt.org/z/TcEGeYos9 doesn't show any panics. So there is a chance that it's a limitation in no_panic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace i586 ceil and floor implementations with assembly to fix +/-0
4 participants