-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
div_mod_knuth added #18
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! This is awesome!
Would you be able to run the benchmarks before and after this change to get an idea for the speed improvement?
Also, I will be busy with personal things for couple of weeks, so don't be surprised if I am a little quiet on this PR.
Sure, U256::add time: [1.6429 ns 1.6462 ns 1.6499 ns] U256::div (lo/lo) time: [8.1225 ns 8.1331 ns 8.1452 ns] U256::div (hi/lo) time: [19.067 ns 19.083 ns 19.100 ns] U256::div (hi/hi) time: [171.46 ns 172.14 ns 172.87 ns] U256::mul time: [2.6050 ns 2.6064 ns 2.6078 ns] U256::sub time: [1.6185 ns 1.6207 ns 1.6232 ns] U256::shl time: [1.9257 ns 1.9313 ns 1.9371 ns] U256::shr time: [1.7386 ns 1.7402 ns 1.7424 ns] U256::ctlz time: [798.04 ps 798.59 ps 799.28 ps] U256::cttz time: [881.57 ps 882.25 ps 883.15 ps] U256::rotate_left time: [2.9088 ns 2.9195 ns 2.9321 ns] U256::rotate_right time: [2.7154 ns 2.7197 ns 2.7249 ns] and after: U256::add time: [1.6201 ns 1.6232 ns 1.6267 ns] U256::div (lo/lo) time: [8.1195 ns 8.1333 ns 8.1489 ns] U256::div (hi/lo) time: [19.281 ns 19.295 ns 19.309 ns] U256::div (hi/hi) time: [24.055 ns 24.095 ns 24.141 ns] U256::mul time: [2.6083 ns 2.6114 ns 2.6154 ns] U256::sub time: [1.6206 ns 1.6233 ns 1.6262 ns] U256::shl time: [1.9175 ns 1.9225 ns 1.9283 ns] U256::shr time: [1.7385 ns 1.7405 ns 1.7439 ns] U256::ctlz time: [799.83 ps 801.46 ps 803.33 ps] U256::cttz time: [884.29 ps 885.95 ps 887.80 ps] U256::rotate_left time: [2.9030 ns 2.9095 ns 2.9165 ns] U256::rotate_right time: [2.7006 ns 2.7032 ns 2.7068 ns] |
Previous post was on my desktop PC Intel i12900K. Below is the Graviton 3 (c7g amazon aws instance) bench: U256::div (lo/lo) time: [20.475 ns 20.477 ns 20.479 ns] U256::div (hi/lo) time: [94.612 ns 94.762 ns 94.916 ns] U256::div (hi/hi) time: [336.59 ns 336.61 ns 336.63 ns] U256::mul time: [5.5886 ns 5.5908 ns 5.5931 ns] U256::sub time: [5.4582 ns 5.4601 ns 5.4619 ns] U256::shl time: [4.2159 ns 4.2164 ns 4.2169 ns] U256::shr time: [4.1342 ns 4.1354 ns 4.1369 ns] U256::ctlz time: [2.4419 ns 2.4420 ns 2.4420 ns] U256::cttz time: [2.4857 ns 2.4858 ns 2.4858 ns] U256::rotate_left time: [5.7390 ns 5.7577 ns 5.7763 ns] U256::rotate_right time: [5.5316 ns 5.5386 ns 5.5461 ns] After: U256::add time: [5.4514 ns 5.4521 ns 5.4529 ns] U256::div (lo/lo) time: [20.474 ns 20.480 ns 20.485 ns] U256::div (hi/lo) time: [94.189 ns 94.202 ns 94.215 ns] U256::div (hi/hi) time: [68.181 ns 68.192 ns 68.204 ns] U256::mul time: [5.5856 ns 5.5879 ns 5.5902 ns] U256::sub time: [5.4708 ns 5.4722 ns 5.4736 ns] U256::shl time: [4.2157 ns 4.2162 ns 4.2167 ns] U256::shr time: [4.1333 ns 4.1340 ns 4.1348 ns] U256::ctlz time: [2.4419 ns 2.4420 ns 2.4421 ns] U256::cttz time: [2.4824 ns 2.4834 ns 2.4842 ns] U256::rotate_left time: [5.7638 ns 5.7770 ns 5.7908 ns] U256::rotate_right time: [5.5470 ns 5.5548 ns 5.5626 ns] |
Graviton 2 (c6g* amazon aws): U256::add time: [8.3676 ns 8.3697 ns 8.3718 ns] U256::div (lo/lo) time: [31.313 ns 31.330 ns 31.345 ns] U256::div (hi/lo) time: [142.58 ns 142.62 ns 142.66 ns] U256::div (hi/hi) time: [432.31 ns 432.42 ns 432.55 ns] U256::mul time: [21.649 ns 21.654 ns 21.661 ns] U256::sub time: [8.3902 ns 8.3923 ns 8.3945 ns] U256::shl time: [5.9219 ns 5.9233 ns 5.9248 ns] U256::shr time: [5.8867 ns 5.8869 ns 5.8871 ns] U256::ctlz time: [3.2917 ns 3.2931 ns 3.2947 ns] U256::cttz time: [3.2852 ns 3.2862 ns 3.2872 ns] U256::rotate_left time: [8.6338 ns 8.6409 ns 8.6480 ns] U256::rotate_right time: [8.5947 ns 8.5984 ns 8.6023 ns] After: U256::add time: [8.3776 ns 8.3796 ns 8.3818 ns] U256::div (lo/lo) time: [31.350 ns 31.386 ns 31.423 ns] U256::div (hi/lo) time: [143.55 ns 143.63 ns 143.72 ns] U256::div (hi/hi) time: [132.81 ns 132.88 ns 132.96 ns] U256::mul time: [21.653 ns 21.658 ns 21.664 ns] U256::sub time: [8.3743 ns 8.3750 ns 8.3757 ns] U256::shl time: [5.9165 ns 5.9166 ns 5.9168 ns] U256::shr time: [5.8862 ns 5.8863 ns 5.8865 ns] U256::ctlz time: [3.2861 ns 3.2862 ns 3.2863 ns] U256::cttz time: [3.2849 ns 3.2857 ns 3.2865 ns] U256::rotate_left time: [8.6495 ns 8.6562 ns 8.6631 ns] |
Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz U256::div (lo/lo) time: [34.656 ns 34.746 ns 34.849 ns] U256::div (hi/lo) time: [101.12 ns 101.34 ns 101.57 ns] U256::div (hi/hi) time: [260.59 ns 261.64 ns 263.05 ns] U256::mul time: [7.2249 ns 7.2486 ns 7.2787 ns] U256::sub time: [3.6266 ns 3.6375 ns 3.6507 ns] U256::shl time: [4.6821 ns 4.7187 ns 4.7582 ns] U256::shr time: [5.4234 ns 5.6343 ns 5.8779 ns] U256::ctlz time: [1.7979 ns 1.8071 ns 1.8172 ns] U256::cttz time: [1.6234 ns 1.6275 ns 1.6317 ns] U256::rotate_left time: [7.6074 ns 7.6350 ns 7.6708 ns] U256::rotate_right time: [6.9703 ns 6.9921 ns 7.0137 ns] After: U256::add time: [3.8436 ns 4.0444 ns 4.2842 ns] U256::div (lo/lo) time: [34.616 ns 34.722 ns 34.836 ns] U256::div (hi/lo) time: [102.35 ns 102.50 ns 102.67 ns] U256::div (hi/hi) time: [76.856 ns 80.566 ns 84.570 ns] U256::mul time: [7.2813 ns 7.3206 ns 7.3653 ns] U256::sub time: [3.6143 ns 3.6161 ns 3.6180 ns] U256::shl time: [4.9900 ns 5.2624 ns 5.5863 ns] U256::shr time: [5.2371 ns 5.3298 ns 5.4537 ns] U256::ctlz time: [1.9459 ns 2.0639 ns 2.2043 ns] U256::cttz time: [1.6350 ns 1.6419 ns 1.6494 ns] U256::rotate_left time: [7.7551 ns 7.7892 ns 7.8216 ns] U256::rotate_right time: [6.8823 ns 6.8993 ns 6.9163 ns] |
Please note that I did not test big endian arch. I don't know how to test it to be honest. I'm reasonably sure that div_mod_knuth is architecture-agnostic, but it is better to test. |
Sorry, just getting back to this now. The changes look great. I also ran the more comprehensive benchmarks from the
|
I'm just going to run the fuzzer for a bit to see if there are any regressions with division, then this should be good to merge. |
Added div_mod_knuth division for two large >128bit numbers. Resolves #16.