Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve imulc Performance #21

Open
2 tasks
nlordell opened this issue Sep 18, 2022 · 3 comments
Open
2 tasks

Improve imulc Performance #21

nlordell opened this issue Sep 18, 2022 · 3 comments

Comments

@nlordell
Copy link
Owner

Checked signed integer multiplication (imulc and I256::checked_mul) use division to verify overflows. This is, I think, quite inefficient and unnecessary.

This issue captures the work to:

  • Benched checked signed multiplication
  • Investigate and implement possible alternative algorithms

One idea would be to multiply absolutes and then verify the sign afterwards. I'm not sure if this will work for I256::overflowing_mul (in that the wrapping produces different results), but it might be worth a try!

@nlordell nlordell added good first issue Good for newcomers and removed good first issue Good for newcomers labels Sep 18, 2022
@NCGThompson
Copy link
Contributor

As expected, imulc has abysmal performance. On Kaby Lake, I256::overflowing_mul benches 20 to 27 times slower than wrapping_mul. According to uops.info, on Ice Lake up, 128-bit / 64-bit DIV runs with 1/2 to 1/3 the rthroughput, so it should be better. I can't find similar data for non-intel chips. Interestingly, with or without llvm-intrinsics, U256::overflowing_mul benches about twice the time of U256::wrapping_mul, so we should probably manage our expectations. I might take a look with mca to see where that extra time is coming from.

Since there is no llvm-intrinsics function for imulc, the slow native function is also ran with llvm-intrinsics. Consider adding an intrinsic. Also,

https://github.com/nlordell/ethnum-rs/blob/b66678b45c9f935930a95aac8b0e379b9b7d3dea/src/intrinsics/native/mul.rs#L87C2-L87C2

can trivially be replaced with

let r = abs_a.as_u256().overflowing_mul(abs_b.as_u256());
r.1 || r.0 > I256::MAX.as_u256()

The redundant multiplication is tiny next to a division.

@NCGThompson
Copy link
Contributor

tl;dr: nlordell's idea of verifying the sign afterwards should work and his concern of wrapping problems is unfounded. There is a tiny caveat with I256::MIN.

The advantage of two's complement is that the values wrap around correctly, and are congruent modulo 2bits to their binary representation reinterpreted as an unsigned integer. This means that we can do modulo arithmetic on congruence classes and apply it to signed and unsigned integers simultaneously.

imulc is given two integers, 256 and 256. One of our outputs, 256, is the modular product of our two inputs. The bool output overflow, on the other hand, is partial to whether or not we are interpreting the values as signed or unsigned.

Let's define functions that interpret the congruence classes as regular integers. signed(x) interprets x as a signed integer while unsigned(x) interprets x as an unsigned integer. So for example signed(-1̅0256) = -10 while unsigned(-1̅0256) = 2256 - 10.

overflow should return positive iff signed(256) * signed(256) ≠ signed(256). Equivalently, overflow should return false iff signed(I256::MIN) ≤ signed(256) * signed(256) ≤ signed(I256::MAX).

One idea would be to multiply absolutes and then verify the sign afterwards.

That makes sense. If umulc(...).1 of the absolute values returns true, then we know overflow is true. I'm not going to bother to explain that with math symbols. If umulc didn't overflow and signed(product of absolutes) is non-negative, then we know overflow is false. If the product is negative, then overflow is true unless the product is I256::MIN and exactly one of our inputs is negative.

I'm not sure if this will work for I256::overflowing_mul (in that the wrapping produces different results)

We wan't to be able to "undo" the absolute value at the end by negating the product if exactly one of the inputs was negative. So the question is whether or not 256 ≡ 0̅256 - (0̅256 - 256) * 256 and 256 ≡ (0̅256 - 256) * (0̅256 - 256)

With modulo arithmetic, congruence is preserved with addition, subtraction, and multiplication. Therefor we can rewrite the question more simply: "Does a * b = -(-a * b) and a * b = -a * - b?" This is clearly true.

@NCGThompson
Copy link
Contributor

This is how llvm does it. This function is called on Aarch64 for i128 mul. https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/int_mulo_impl.inc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants