Fix i128.clz edge case #93
export function __clz128(lo: u64, hi: u64): i32 {
var mask: u64 = <i64>(hi ^ (hi - 1)) >> 63;
return <i32>clz((hi & ~mask) | (lo & mask)) + (<i32>mask & 64);
export function __clz128(lo: u64, hi: i64): i32 {
- export function __clz128(lo: u64, hi: i64): i32 {
+ export function __clz128(lo: u64, hi: u64): i32 {
`i64.clz` is sign-agnostic, so we can always use unsigned types.
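To illustrate the sign-agnostic point, here is a minimal TypeScript sketch (not AssemblyScript) that models a 64-bit count-leading-zeros with `BigInt`. The helper name `clz64` is hypothetical and exists only for this illustration; the assumption is that it mirrors the Wasm behaviour of looking only at the 64-bit pattern and returning 64 for zero.

```typescript
// Hypothetical model of a 64-bit clz: only the bit pattern matters.
function clz64(bits: bigint): number {
  // Reduce the input to its unsigned 64-bit pattern, whatever its sign.
  const x = BigInt.asUintN(64, bits);
  // Zero has no set bit, so all 64 bits are leading zeros.
  if (x === 0n) return 64;
  // Otherwise the binary string length is the position of the top set bit.
  return 64 - x.toString(2).length;
}

// The same pattern, once viewed as signed and once as unsigned.
const asSigned = -1n;
const asUnsigned = BigInt.asUintN(64, -1n);

console.log(clz64(asSigned) === clz64(asUnsigned)); // prints true
```

Because both views reduce to the same 64 bits, declaring `hi` as `u64` or `i64` should not change the count.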
return <i32>ctz((hi & mask) | (lo & ~mask)) + (<i32>mask & 64);
if (lo == 0) {
// Otherwise, ctz is 64 plus ctz(hi)
return 64 + <i32>i64.ctz(hi);
- return 64 + <i32>i64.ctz(hi);
+ return 64 + <i32>ctz(hi);
We can simplify this, since `ctz` / `clz` are generic and can infer the type from the argument.
return 64 + <i32>i64.ctz(hi);
} else {
// If the lower 64 bits are non-zero, measure ctz(lo)
return <i32>i64.ctz(lo);
- return <i32>i64.ctz(lo);
+ return <i32>ctz(lo);
let h: u64 = <u64>hi; // reinterpret hi as unsigned
if (h == 0) {
// If hi is 0, the leading zeros are "64 plus however many are in lo"
return 64 + <i32>i64.clz(lo);
- return 64 + <i32>i64.clz(lo);
+ return 64 + <i32>clz(lo);
let h: u64 = <u64>hi; // reinterpret hi as unsigned
if (h == 0) {
- let h: u64 = <u64>hi; // reinterpret hi as unsigned
- if (h == 0) {
+ if (hi == 0) {
return 64 + <i32>i64.clz(lo);
} else {
// The top 64 bits are set => just measure their leading zeros
return <i32>i64.clz(h);
- return <i32>i64.clz(h);
+ return <i32>clz(h);
I agree that this variant is more concise and has no edge cases, unlike the previous one, even though clz / ctz are now calculated twice due to select (conditional move). I will accept this PR after minor adjustments.
Thanks!
There are still PRs that I need to open; I will make them ASAP.
In short, the patched functions count leading/trailing zeros on a pair of 64-bit values (treated as a 128-bit integer) by directly checking which half (“hi” vs. “lo”) is zero. By contrast, the original code relied on a “mask trick” that could break in corner cases (especially around zero and signedness). Here is the essential difference:
In the patched `__clz128`:
- We reinterpret `hi` as an unsigned 64-bit value (`h = <u64>hi`), making sure we do a pure bitwise operation rather than a signed one.
- If the high half (`h`) is zero, the leading zeros must come from whatever is in `lo`. That means we have a full 64 bits of leading zeros in the high half plus however many leading zeros are in `lo`.
- Otherwise, we just run `clz` on that half (because the leading bits of the 128-bit integer must appear in the high half).

In the patched `__ctz128`:
- If `lo == 0`, then the entire lower 64 bits are zero, so all of them count as trailing zeros. We then add those 64 zeros to the number of trailing zeros at the bottom of `hi`.
- If `lo` is non-zero, the trailing zeros can only be in the low half, so we just do `ctz(lo)`.

By contrast, the original code derived a `mask` from `hi` (the `hi ^ (hi - 1)` expression in the diff above). That `mask` is supposed to become all ones if `hi == 0` and zero otherwise (or vice versa) so it can select either `hi` or `lo`. In practice, this “mask trick” can fail on certain boundary conditions (like `hi` and `lo` both being zero, or if `hi` is treated as signed). You also lose clarity and risk sign-extension issues, because `hi` might be a signed `i64` in an i128 context.
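The comparison above can be sketched in TypeScript, with `BigInt` standing in for AssemblyScript's `u64`. This is a model under stated assumptions, not the library's code: `clz64`, `ctz64`, `clz128`, `ctz128` and `clz128Mask` are hypothetical names, and `clz128Mask` only transcribes the mask expression as it appears in the diff.

```typescript
const MASK64 = (1n << 64n) - 1n;

// 64-bit count-leading-zeros on the raw bit pattern (64 for zero).
function clz64(x: bigint): number {
  x = BigInt.asUintN(64, x);
  return x === 0n ? 64 : 64 - x.toString(2).length;
}

// 64-bit count-trailing-zeros on the raw bit pattern (64 for zero).
function ctz64(x: bigint): number {
  x = BigInt.asUintN(64, x);
  if (x === 0n) return 64;
  let n = 0;
  while ((x & 1n) === 0n) {
    x >>= 1n;
    n++;
  }
  return n;
}

// Branch-based variants: check which half is zero, then count in the other.
function clz128(lo: bigint, hi: bigint): number {
  return BigInt.asUintN(64, hi) === 0n ? 64 + clz64(lo) : clz64(hi);
}

function ctz128(lo: bigint, hi: bigint): number {
  return BigInt.asUintN(64, lo) === 0n ? 64 + ctz64(hi) : ctz64(lo);
}

// Model of the original expression: mask = <i64>(hi ^ (hi - 1)) >> 63.
function clz128Mask(lo: bigint, hi: bigint): number {
  const h = BigInt.asUintN(64, hi);
  const l = BigInt.asUintN(64, lo);
  // Wrapping subtract, XOR, then reinterpret as signed for the arithmetic shift.
  const x = BigInt.asIntN(64, h ^ BigInt.asUintN(64, h - 1n));
  // The arithmetic shift smears the sign bit: all ones or zero.
  const mask = BigInt.asUintN(64, x >> 63n);
  const selected = (h & ~mask & MASK64) | (l & mask);
  return clz64(selected) + Number(mask & 64n);
}

const topBit = 1n << 63n; // hi with only its most significant bit set

console.log(clz128(1n, 0n), clz128Mask(1n, 0n)); // prints 127 127
console.log(clz128(0n, topBit), clz128Mask(0n, topBit)); // prints 0 128
console.log(ctz128(0n, 1n)); // prints 64
```

In this model the two variants agree when `hi` is zero or small, but the mask expression also yields all ones when only the top bit of `hi` is set, so it selects `lo` and adds 64. The branch-based version has no such case.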