Fix i128.clz edge case #93

BlobMaster41 · 2025-01-20T22:40:08Z

In short, the patched functions compute leading/trailing-zero counts on a pair of 64-bit values (treated as a 128-bit integer) the straightforward way: by directly checking which half (“hi” vs. “lo”) is zero. By contrast, the original code used a “mask trick” that could break in corner cases (especially around zero and signedness). Here is the essential difference:

In the patched __clz128:
- We reinterpret hi as an unsigned 64-bit value (h = <u64>hi), so the comparison and the count are pure bitwise operations rather than signed ones.
- If the high half (h) is zero, the leading zeros continue into lo: the high half contributes a full 64 leading zeros, plus however many leading zeros are in lo.
- If the high half is not zero, we just do a clz on that half, because the first set bit of the 128-bit integer must be in the high half.

In the patched __ctz128:
- We check the low half first: if lo == 0, the entire lower 64 bits are zero, so they contribute 64 trailing zeros, and we add the number of trailing zeros at the bottom of hi.
- Otherwise, lo is non-zero, so the trailing zeros are entirely in the low half and we just do ctz(lo).

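
The two branchy counts described above can be sketched in plain TypeScript. This is only an illustration, not the code from the PR: it models u64 values with BigInt, and the names MASK64, clz64, ctz64, clz128 and ctz128 are hypothetical stand-ins for the AssemblyScript builtins and the patched functions.

```typescript
// Hypothetical model: u64 halves are represented as non-negative BigInt values.
const MASK64 = (1n << 64n) - 1n;

// Leading zeros of a 64-bit value; returns 64 when the value is zero.
function clz64(x: bigint): number {
  const bits = x & MASK64;
  let n = 0;
  for (let bit = 63n; bit >= 0n; bit--) {
    if (((bits >> bit) & 1n) !== 0n) break;
    n++;
  }
  return n;
}

// Trailing zeros of a 64-bit value; returns 64 when the value is zero.
function ctz64(x: bigint): number {
  const bits = x & MASK64;
  let n = 0;
  for (let bit = 0n; bit < 64n; bit++) {
    if (((bits >> bit) & 1n) !== 0n) break;
    n++;
  }
  return n;
}

// If hi is zero, count 64 plus the leading zeros of lo; otherwise only hi matters.
function clz128(lo: bigint, hi: bigint): number {
  return (hi & MASK64) === 0n ? 64 + clz64(lo) : clz64(hi);
}

// If lo is zero, count 64 plus the trailing zeros of hi; otherwise only lo matters.
function ctz128(lo: bigint, hi: bigint): number {
  return (lo & MASK64) === 0n ? 64 + ctz64(hi) : ctz64(lo);
}
```

With both halves zero, each function falls into its "64 plus the other half" branch and reports 128, which is the natural answer for an all-zero 128-bit value.
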
By contrast, the original code tried to do something like:
```
var mask: u64 = <i64>(hi ^ (hi - 1)) >> 63;
return <i32>clz((hi & ~mask) | (lo & mask)) + ((<i32>mask) & 64);
```
That mask is supposed to be all ones if hi == 0 and zero otherwise, so that it can select either lo or hi. In practice, this “mask trick” can fail on certain boundary conditions (like hi and lo both being zero, or if hi is treated as signed). You also lose clarity and risk sign-extension issues, because hi might be a signed i64 in an i128 context.
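
To make the corner case concrete, here is a rough TypeScript model of the original expression. It assumes my reading of the AssemblyScript semantics (u64 wraparound for hi - 1, an arithmetic shift on the i64 cast) and uses BigInt to emulate them; originalMask, originalClz128 and clz64 are hypothetical names, not functions from the repository.

```typescript
const MASK64 = (1n << 64n) - 1n;

// Leading zeros of a 64-bit value; returns 64 when the value is zero.
function clz64(x: bigint): number {
  const bits = x & MASK64;
  let n = 0;
  for (let bit = 63n; bit >= 0n; bit--) {
    if (((bits >> bit) & 1n) !== 0n) break;
    n++;
  }
  return n;
}

// Models: var mask: u64 = <i64>(hi ^ (hi - 1)) >> 63;
function originalMask(hi: bigint): bigint {
  const x = (hi ^ (hi - 1n)) & MASK64;   // hi - 1 wraps around as u64
  const signed = BigInt.asIntN(64, x);   // reinterpret the bits as i64
  return BigInt.asUintN(64, signed >> 63n); // arithmetic shift, stored back as u64
}

// Models: clz((hi & ~mask) | (lo & mask)) + (<i32>mask & 64)
function originalClz128(lo: bigint, hi: bigint): number {
  const mask = originalMask(hi);
  const selected = ((hi & ~mask) | (lo & mask)) & MASK64;
  return clz64(selected) + Number(mask & 64n);
}

// Only the top bit of hi is set, so the 128-bit value has no leading zeros.
const topBit = 1n << 63n;
console.log(originalMask(topBit).toString(16)); // mask value for hi = 0x8000000000000000
console.log(originalClz128(0n, topBit));        // count reported by the modelled mask trick
```

In this model the mask also becomes all ones when hi is exactly 0x8000000000000000, because hi ^ (hi - 1) then has its sign bit set. The expression selects lo and adds 64, even though the true count is 0. The branch-based version has no such input, since it tests hi == 0 directly.
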

MaxGraey · 2025-01-21T15:32:26Z

assembly/globals.ts

-export function __clz128(lo: u64, hi: u64): i32 {
-  var mask: u64 = <i64>(hi ^ (hi - 1)) >> 63;
-  return <i32>clz((hi & ~mask) | (lo & mask)) + (<i32>mask & 64);
+export function __clz128(lo: u64, hi: i64): i32 {


Suggested change

-export function __clz128(lo: u64, hi: i64): i32 {
+export function __clz128(lo: u64, hi: u64): i32 {

i64.clz is sign-agnostic, so we can always use unsigned types.
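
As a small illustration of the sign-agnostic point, the sketch below counts leading zeros on the raw 64-bit pattern in TypeScript with BigInt. clz64Bits is a hypothetical helper written for this example, not an AssemblyScript builtin.

```typescript
// Hypothetical helper: counts leading zeros of the 64-bit pattern of x,
// regardless of whether x was produced as a signed or unsigned value.
function clz64Bits(x: bigint): number {
  const bits = BigInt.asUintN(64, x); // keep only the low 64 bits of the pattern
  let n = 0;
  for (let bit = 63n; bit >= 0n; bit--) {
    if (((bits >> bit) & 1n) !== 0n) break;
    n++;
  }
  return n;
}

const asUnsigned = 0x8000000000000001n;           // read as u64
const asSigned = BigInt.asIntN(64, asUnsigned);   // same bits read as i64 (negative)
console.log(clz64Bits(asSigned) === clz64Bits(asUnsigned)); // same pattern, same count
```

The count depends only on where the first set bit sits in the pattern, so declaring the parameter as u64 or i64 does not change the result.
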

MaxGraey · 2025-01-21T15:33:43Z

assembly/globals.ts

-  return <i32>ctz((hi & mask) | (lo & ~mask)) + (<i32>mask & 64);
+  if (lo == 0) {
+    // Otherwise, ctz is 64 plus ctz(hi)
+    return 64 + <i32>i64.ctz(hi);


Suggested change

-return 64 + <i32>i64.ctz(hi);
+return 64 + <i32>ctz(hi);

We can simplify this because ctz / clz are generic and can infer the type from the argument.

MaxGraey · 2025-01-21T15:34:32Z

assembly/globals.ts

+    return 64 + <i32>i64.ctz(hi);
+  } else {
+    // If the lower 64 bits are non-zero, measure ctz(lo)
+    return <i32>i64.ctz(lo);


Suggested change

-return <i32>i64.ctz(lo);
+return <i32>ctz(lo);

MaxGraey · 2025-01-21T15:34:47Z

assembly/globals.ts

+  let h: u64 = <u64>hi;  // reinterpret hi as unsigned
+  if (h == 0) {
+    // If hi is 0, the leading zeros are "64 plus however many are in lo"
+    return 64 + <i32>i64.clz(lo);


Suggested change

-return 64 + <i32>i64.clz(lo);
+return 64 + <i32>clz(lo);

MaxGraey · 2025-01-21T15:35:00Z

assembly/globals.ts

+  let h: u64 = <u64>hi;  // reinterpret hi as unsigned
+  if (h == 0) {


Suggested change

-let h: u64 = <u64>hi; // reinterpret hi as unsigned
-if (h == 0) {
+if (hi == 0) {

MaxGraey · 2025-01-21T15:35:19Z

assembly/globals.ts

+    return 64 + <i32>i64.clz(lo);
+  } else {
+    // The top 64 bits are set => just measure their leading zeros
+    return <i32>i64.clz(h);


Suggested change

-return <i32>i64.clz(h);
+return <i32>clz(h);

MaxGraey · 2025-01-21T15:38:00Z

I agree that this variant is more concise than the previous one and has no edge cases, even though clz / ctz are now calculated twice due to select (conditional move).

I will accept this PR after minor adjustments.

MaxGraey · 2025-01-23T18:26:57Z

Thanks!

BlobMaster41 · 2025-01-23T22:25:48Z

There are still PRs that I need to open; I will make them ASAP.

Update globals.ts

b840ad2

MaxGraey approved these changes Jan 21, 2025

MaxGraey reviewed Jan 21, 2025

MaxGraey merged commit 894ecba into MaxGraey:master Jan 23, 2025
2 checks passed

MaxGraey added a commit that referenced this pull request Jan 23, 2025

(chore) some cosmetic changes after #93

f64de31
