Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix i128.clz edge case #93

Merged
merged 1 commit into from
Jan 23, 2025
Merged

Conversation

BlobMaster41
Copy link
Contributor

In short, the patched functions implement exactly how you do leading/trailing-zero counts on a pair of 64-bit values (treated as a 128-bit integer) by directly checking which half (“hi” vs. “lo”) is zero. By contrast, the original code tried to be clever with a “mask trick” and could break in corner cases (especially around zero and signedness). Here is the essential difference:

  1. In the patched __clz128:

    • We reinterpret hi as an unsigned 64-bit (h = <u64>hi), making sure we do a pure bitwise operation rather than a signed one.
    • If the high half (h) is zero, the leading zeros must come from whatever is in lo. That means we have a full 64 bits of leading zeros in the high half plus however many leading zeros are in lo.
    • If the high half is not zero, we just do a clz on that half (because the leading bits in that 128-bit integer must appear in the high half).
  2. In the patched __ctz128:

    • We simply check the low half first: if lo == 0, then the entire lower 64 bits are zero, so all the trailing zeros must be in those lower bits. We then add another 64 bits of zeros to the count of how many zeros are at the bottom of hi.
    • Otherwise, if lo is non-zero, the trailing zeros can only be in the low half, so we just do ctz(lo).
  3. By contrast, the original code tried to do something like:

    var mask: u64 = <i64>(hi ^ (hi - 1)) >> 63;
    return <i32>clz((hi & ~mask) | (lo & mask)) + ((<i32>mask) & 64);

    That mask is supposed to become 1 if hi == 0 and 0 otherwise (or vice versa) so it can select either hi or lo. In practice, this “mask trick” can fail on certain boundary conditions (like hi and lo both being zero, or if hi is treated as signed). You also lose clarity and risk sign-extension issues, because hi might be a signed i64 in an i128 context.

export function __clz128(lo: u64, hi: u64): i32 {
var mask: u64 = <i64>(hi ^ (hi - 1)) >> 63;
return <i32>clz((hi & ~mask) | (lo & mask)) + (<i32>mask & 64);
export function __clz128(lo: u64, hi: i64): i32 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function __clz128(lo: u64, hi: i64): i32 {
export function __clz128(lo: u64, hi: u64): i32 {

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i64.clz is sign agnostic, so we can use always unsigned types

return <i32>ctz((hi & mask) | (lo & ~mask)) + (<i32>mask & 64);
if (lo == 0) {
// Otherwise, ctz is 64 plus ctz(hi)
return 64 + <i32>i64.ctz(hi);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 64 + <i32>i64.ctz(hi);
return 64 + <i32>ctz(hi);

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify due to ctz / clz is generic and can infer type from argument

return 64 + <i32>i64.ctz(hi);
} else {
// If the lower 64 bits are non-zero, measure ctz(lo)
return <i32>i64.ctz(lo);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return <i32>i64.ctz(lo);
return <i32>ctz(lo);

let h: u64 = <u64>hi; // reinterpret hi as unsigned
if (h == 0) {
// If hi is 0, the leading zeros are "64 plus however many are in lo"
return 64 + <i32>i64.clz(lo);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return 64 + <i32>i64.clz(lo);
return 64 + <i32>clz(lo);

Comment on lines +242 to +243
let h: u64 = <u64>hi; // reinterpret hi as unsigned
if (h == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let h: u64 = <u64>hi; // reinterpret hi as unsigned
if (h == 0) {
if (hi == 0) {

return 64 + <i32>i64.clz(lo);
} else {
// The top 64 bits are set => just measure their leading zeros
return <i32>i64.clz(h);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return <i32>i64.clz(h);
return <i32>clz(h);

@MaxGraey
Copy link
Owner

I agree that this variant is more concise and has no edge-cases than the previous one despite the fact that now clz / ctz are calculated twice due to select (conditional mov).

I will accept this PR after minor adjustments

@MaxGraey MaxGraey merged commit 894ecba into MaxGraey:master Jan 23, 2025
2 checks passed
@MaxGraey
Copy link
Owner

Thanks!

MaxGraey added a commit that referenced this pull request Jan 23, 2025
@BlobMaster41
Copy link
Contributor Author

There are still PRs that I need to open, I will make them asap

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants