Skip to content

Better @sqrt for f32, f64 and f128 #23865

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

chrocapix
Copy link

This branch contains a zig port of the new musl sqrt function for f32, f64 and f128, found in lib/libc/musl/src/math/sqrt*.c

I thoroughly tested the port here:

  • all 2^32 f32 values give the same result bit-for-bit
  • 100e6 random f64 values also give the same results
  • 100e6 random f128 values are under 0.5 ulp away from the exact value (tested against MPFR)

The new functions should be much faster (I only run benchmarks on aarch64 and x86_64, though.)

I ran the following tests on the branch:

  • stage4/bin/zig build test -Denable-llvm=true -Dskip-release=true -Dskip-non-native=true
  • stage4/bin/zig build test-std -Dskip-release=true -Dskip-non-native=true
  • zig test lib/compiler_rt.zig on arches powerpc, riscv64, riscv32, mipsel, aarch64_be, arm, x86_64, powerpc64, hexagon, mips64el, mips64, sparc, sparc64, mips and x86 via qemu.

They all pass expect compiler_rt on sparc which fails with:
LLVM ERROR: SPARCv8 does not handle f128 in calls; pass indirectly
but that test was already failing before my changes.

chrocapix and others added 5 commits April 16, 2025 08:12
f32, f64 and f128 implementations ported from a more recent musl libc.

All three are correctly rounded and much faster than the previous
implementation.
@chrocapix
Copy link
Author

I investigated the test failure on hexagon and it's weird. Using prints, I've located the problem to here: sqrt.zig:240

if (top % 2 != 0) ml >>= 1;

When calling sqrtq with 9.0 directly, no problem. When calling it via @sqrt, that if statement changes ml from 0x90000000000000000000000000000000 to 0x9000000000000000ffffffffffffffff. To make things worse, top is even at that point so ml >>= 1 is not evaluated.

That doesn't look like an sqrtq bug to me but I've no clue how to debug this further. Anybody have any ideas?

@alexrp
Copy link
Member

alexrp commented May 13, 2025

That sounds like an LLVM bug of some kind. If you feel like it, it would be great to get a reduced reproduction of the issue that could be filed upstream. It's still early days for Hexagon support in Zig, so on our end, we should just disable any affected tests on Hexagon; I don't think you need to include a Hexagon-specific workaround in the code.

@chrocapix
Copy link
Author

That sounds like an LLVM bug of some kind. If you feel like it, it would be great to get a reduced reproduction of the issue that could be filed upstream. It's still early days for Hexagon support in Zig, so on our end, we should just disable any affected tests on Hexagon; I don't think you need to include a Hexagon-specific workaround in the code.

Sorry, I didn't mean to push that workaround. I tried but I couldn't reproduce the problem without a call to @sqrt unfortunately.

So now, I revert the that last commit and disable the failing test for hexagon in test/behavior/floatop.zig with something like this:

if (builtin.zig_backend == .stage2_llvm and builtin.cpu.arch == .hexagon) return error.SkipZigTest;

correct?

@alexrp
Copy link
Member

alexrp commented May 13, 2025

I tried but I couldn't reproduce the problem without a call to @sqrt unfortunately.

That's fine. A repro can just be the sqrtq implementation + (strong) export + the code to call it via @sqrt. You can then try to reduce it by removing code from the sqrtq implementation until you're left with as little code as necessary to trigger the bug.

But it's also fine if you don't feel like going through this process; we can just file an issue and follow up in the future.

So now, I revert the that last commit and disable the failing test for hexagon in test/behavior/floatop.zig with something like this:

Yes, that seems reasonable. It'd be good to place the check as near to the failure as possible so that the working parts keep getting tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants