div_mod_knuth added #18

mcseemk · 2022-07-05T09:58:30Z

Added div_mod_knuth division for the case where both operands are larger than 128 bits. Resolves #16.
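
For readers unfamiliar with the name: it presumably refers to Knuth's Algorithm D, which is schoolbook long division on multi-word integers with each quotient digit estimated from the top words of the normalized operands. Below is a minimal sketch on little-endian 32-bit limbs. It is not the code from this PR. The names div_mod_knuth_sketch, to_limbs and from_limbs, and the 32-bit limb size (chosen so intermediates fit in u64), are assumptions made for illustration.

```rust
/// Sketch of Knuth's Algorithm D on little-endian base-2^32 limbs.
/// Hypothetical helper, not the PR's implementation.
/// Requires: v has at least 2 limbs, its top limb is nonzero, u.len() >= v.len().
/// Returns (quotient limbs, remainder limbs).
fn div_mod_knuth_sketch(u: &[u32], v: &[u32]) -> (Vec<u32>, Vec<u32>) {
    const B: u64 = 1 << 32;
    let (m, n) = (u.len(), v.len());
    assert!(n >= 2 && m >= n && v[n - 1] != 0);

    // D1: normalize so the divisor's top bit is set (shift both operands left by s).
    let s = v[n - 1].leading_zeros();
    let mut vn = vec![0u32; n];
    let mut un = vec![0u32; m + 1];
    for i in (1..n).rev() {
        vn[i] = ((((v[i] as u64) << 32 | v[i - 1] as u64) << s) >> 32) as u32;
    }
    vn[0] = v[0] << s;
    un[m] = (((u[m - 1] as u64) << s) >> 32) as u32;
    for i in (1..m).rev() {
        un[i] = ((((u[i] as u64) << 32 | u[i - 1] as u64) << s) >> 32) as u32;
    }
    un[0] = u[0] << s;

    let mut q = vec![0u32; m - n + 1];
    for j in (0..=m - n).rev() {
        // D3: estimate the quotient digit from the top two limbs, then correct it.
        let num = (un[j + n] as u64) << 32 | un[j + n - 1] as u64;
        let mut qhat = num / vn[n - 1] as u64;
        let mut rhat = num % vn[n - 1] as u64;
        while qhat >= B || qhat * vn[n - 2] as u64 > (rhat << 32 | un[j + n - 2] as u64) {
            qhat -= 1;
            rhat += vn[n - 1] as u64;
            if rhat >= B {
                break;
            }
        }

        // D4: subtract qhat * vn from the current window of un.
        let (mut carry, mut borrow) = (0u64, 0i64);
        for i in 0..n {
            let p = qhat * vn[i] as u64 + carry;
            carry = p >> 32;
            let t = un[i + j] as i64 - borrow - (p & 0xFFFF_FFFF) as i64;
            un[i + j] = t as u32;
            borrow = (t < 0) as i64;
        }
        let t = un[j + n] as i64 - borrow - carry as i64;
        un[j + n] = t as u32;

        // D5/D6: if the subtraction went negative, qhat was one too large; add back.
        if t < 0 {
            qhat -= 1;
            let mut c = 0u64;
            for i in 0..n {
                let sum = un[i + j] as u64 + vn[i] as u64 + c;
                un[i + j] = sum as u32;
                c = sum >> 32;
            }
            un[j + n] = un[j + n].wrapping_add(c as u32);
        }
        q[j] = qhat as u32;
    }

    // D8: undo the normalization shift to get the remainder.
    let r = (0..n)
        .map(|i| (((un[i + 1] as u64) << 32 | un[i] as u64) >> s) as u32)
        .collect();
    (q, r)
}

/// Split a u128 into four little-endian 32-bit limbs (illustration helper).
fn to_limbs(x: u128) -> Vec<u32> {
    (0..4).map(|i| (x >> (32 * i)) as u32).collect()
}

/// Reassemble up to four little-endian 32-bit limbs into a u128.
fn from_limbs(limbs: &[u32]) -> u128 {
    limbs.iter().rev().fold(0u128, |acc, &w| (acc << 32) | w as u128)
}
```

The helpers make it easy to compare the sketch against native u128 division, after trimming zero limbs off the top of the divisor so that the precondition holds.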

nlordell

Wow! This is awesome!

Would you be able to run the benchmarks before and after this change to get an idea for the speed improvement?

Also, I will be busy with personal things for a couple of weeks, so don't be surprised if I am a little quiet on this PR.

mcseemk · 2022-07-07T23:47:54Z

Sure, benchmarks before the change:

U256::add time: [1.6429 ns 1.6462 ns 1.6499 ns]
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe

U256::div (lo/lo) time: [8.1225 ns 8.1331 ns 8.1452 ns]
Found 15 outliers among 100 measurements (15.00%)
5 (5.00%) high mild
10 (10.00%) high severe

U256::div (hi/lo) time: [19.067 ns 19.083 ns 19.100 ns]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
4 (4.00%) high mild

U256::div (hi/hi) time: [171.46 ns 172.14 ns 172.87 ns]

U256::mul time: [2.6050 ns 2.6064 ns 2.6078 ns]
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe

U256::sub time: [1.6185 ns 1.6207 ns 1.6232 ns]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

U256::shl time: [1.9257 ns 1.9313 ns 1.9371 ns]
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe

U256::shr time: [1.7386 ns 1.7402 ns 1.7424 ns]
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) high mild
13 (13.00%) high severe

U256::ctlz time: [798.04 ps 798.59 ps 799.28 ps]
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low mild
3 (3.00%) high mild
8 (8.00%) high severe

U256::cttz time: [881.57 ps 882.25 ps 883.15 ps]
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe

U256::rotate_left time: [2.9088 ns 2.9195 ns 2.9321 ns]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe

U256::rotate_right time: [2.7154 ns 2.7197 ns 2.7249 ns]
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low mild
2 (2.00%) high mild
10 (10.00%) high severe

and after:

U256::add time: [1.6201 ns 1.6232 ns 1.6267 ns]
change: [-1.5315% -1.1987% -0.8861%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

U256::div (lo/lo) time: [8.1195 ns 8.1333 ns 8.1489 ns]
change: [-0.2028% -0.0507% +0.1022%] (p = 0.52 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe

U256::div (hi/lo) time: [19.281 ns 19.295 ns 19.309 ns]
change: [+1.0802% +1.2256% +1.3774%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe

U256::div (hi/hi) time: [24.055 ns 24.095 ns 24.141 ns]
change: [-86.074% -86.006% -85.935%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe

U256::mul time: [2.6083 ns 2.6114 ns 2.6154 ns]
change: [+0.3169% +0.7328% +1.2796%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe

U256::sub time: [1.6206 ns 1.6233 ns 1.6262 ns]
change: [-0.0591% +0.1864% +0.4386%] (p = 0.15 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

U256::shl time: [1.9175 ns 1.9225 ns 1.9283 ns]
change: [-0.6044% -0.1814% +0.2108%] (p = 0.39 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe

U256::shr time: [1.7385 ns 1.7405 ns 1.7439 ns]
change: [-0.4575% -0.1798% +0.0966%] (p = 0.21 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe

U256::ctlz time: [799.83 ps 801.46 ps 803.33 ps]
change: [+0.8143% +1.2245% +1.6966%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe

U256::cttz time: [884.29 ps 885.95 ps 887.80 ps]
change: [+0.1541% +0.3471% +0.5903%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe

U256::rotate_left time: [2.9030 ns 2.9095 ns 2.9165 ns]
change: [-0.4245% -0.1878% +0.0451%] (p = 0.13 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

U256::rotate_right time: [2.7006 ns 2.7032 ns 2.7068 ns]
change: [-0.6416% -0.4652% -0.3043%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe

mcseemk · 2022-07-08T08:25:33Z

The previous post was from my desktop PC (Intel i9-12900K). Below is the benchmark on Graviton 3 (AWS c7g instance):
Before:
U256::add time: [5.4664 ns 5.4688 ns 5.4718 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
2 (2.00%) low mild
1 (1.00%) high severe

U256::div (lo/lo) time: [20.475 ns 20.477 ns 20.479 ns]
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) low severe
10 (10.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe

U256::div (hi/lo) time: [94.612 ns 94.762 ns 94.916 ns]

U256::div (hi/hi) time: [336.59 ns 336.61 ns 336.63 ns]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
5 (5.00%) high mild
1 (1.00%) high severe

U256::mul time: [5.5886 ns 5.5908 ns 5.5931 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild

U256::sub time: [5.4582 ns 5.4601 ns 5.4619 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
3 (3.00%) low mild
4 (4.00%) high mild

U256::shl time: [4.2159 ns 4.2164 ns 4.2169 ns]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
4 (4.00%) high severe

U256::shr time: [4.1342 ns 4.1354 ns 4.1369 ns]
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe

U256::ctlz time: [2.4419 ns 2.4420 ns 2.4420 ns]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

U256::cttz time: [2.4857 ns 2.4858 ns 2.4858 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

U256::rotate_left time: [5.7390 ns 5.7577 ns 5.7763 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

U256::rotate_right time: [5.5316 ns 5.5386 ns 5.5461 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

After:

U256::add time: [5.4514 ns 5.4521 ns 5.4529 ns]
change: [-0.3786% -0.3263% -0.2762%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) low severe
4 (4.00%) low mild
1 (1.00%) high severe

U256::div (lo/lo) time: [20.474 ns 20.480 ns 20.485 ns]
change: [-0.0804% -0.0553% -0.0291%] (p = 0.00 < 0.05)
Change within noise threshold.

U256::div (hi/lo) time: [94.189 ns 94.202 ns 94.215 ns]
change: [-0.7320% -0.6301% -0.5264%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild

U256::div (hi/hi) time: [68.181 ns 68.192 ns 68.204 ns]
change: [-79.746% -79.742% -79.738%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

U256::mul time: [5.5856 ns 5.5879 ns 5.5902 ns]
change: [-0.1402% -0.0720% -0.0009%] (p = 0.05 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) low mild
1 (1.00%) high mild

U256::sub time: [5.4708 ns 5.4722 ns 5.4736 ns]
change: [+0.2187% +0.2696% +0.3182%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low severe
2 (2.00%) low mild

U256::shl time: [4.2157 ns 4.2162 ns 4.2167 ns]
change: [-0.0453% -0.0197% +0.0044%] (p = 0.11 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild

U256::shr time: [4.1333 ns 4.1340 ns 4.1348 ns]
change: [-0.0832% -0.0447% -0.0078%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) low mild
3 (3.00%) high severe

U256::ctlz time: [2.4419 ns 2.4420 ns 2.4421 ns]
change: [-0.0089% -0.0009% +0.0079%] (p = 0.84 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe

U256::cttz time: [2.4824 ns 2.4834 ns 2.4842 ns]
change: [-0.2846% -0.2477% -0.2093%] (p = 0.00 < 0.05)
Change within noise threshold.

U256::rotate_left time: [5.7638 ns 5.7770 ns 5.7908 ns]
change: [+0.3731% +0.9748% +1.6579%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe

U256::rotate_right time: [5.5470 ns 5.5548 ns 5.5626 ns]
change: [-0.1547% +0.0932% +0.3309%] (p = 0.46 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

mcseemk · 2022-07-08T08:43:51Z

Graviton 2 (AWS c6g* instance):
Before:

U256::add time: [8.3676 ns 8.3697 ns 8.3718 ns]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe

U256::div (lo/lo) time: [31.313 ns 31.330 ns 31.345 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

U256::div (hi/lo) time: [142.58 ns 142.62 ns 142.66 ns]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe

U256::div (hi/hi) time: [432.31 ns 432.42 ns 432.55 ns]
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe

U256::mul time: [21.649 ns 21.654 ns 21.661 ns]
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe

U256::sub time: [8.3902 ns 8.3923 ns 8.3945 ns]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

U256::shl time: [5.9219 ns 5.9233 ns 5.9248 ns]
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe

U256::shr time: [5.8867 ns 5.8869 ns 5.8871 ns]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high severe

U256::ctlz time: [3.2917 ns 3.2931 ns 3.2947 ns]
Found 12 outliers among 100 measurements (12.00%)
10 (10.00%) high mild
2 (2.00%) high severe

U256::cttz time: [3.2852 ns 3.2862 ns 3.2872 ns]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe

U256::rotate_left time: [8.6338 ns 8.6409 ns 8.6480 ns]
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

U256::rotate_right time: [8.5947 ns 8.5984 ns 8.6023 ns]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

After:

U256::add time: [8.3776 ns 8.3796 ns 8.3818 ns]
change: [+0.0877% +0.1233% +0.1592%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
6 (6.00%) high mild
1 (1.00%) high severe

U256::div (lo/lo) time: [31.350 ns 31.386 ns 31.423 ns]
change: [+0.2172% +0.3445% +0.4766%] (p = 0.00 < 0.05)
Change within noise threshold.

U256::div (hi/lo) time: [143.55 ns 143.63 ns 143.72 ns]
change: [+0.5793% +0.6375% +0.7054%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
6 (6.00%) high mild

U256::div (hi/hi) time: [132.81 ns 132.88 ns 132.96 ns]
change: [-69.311% -69.290% -69.270%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

U256::mul time: [21.653 ns 21.658 ns 21.664 ns]
change: [-0.0171% +0.0175% +0.0521%] (p = 0.32 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

U256::sub time: [8.3743 ns 8.3750 ns 8.3757 ns]
change: [-0.1994% -0.1761% -0.1541%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

U256::shl time: [5.9165 ns 5.9166 ns 5.9168 ns]
change: [-0.1599% -0.1259% -0.0897%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe

U256::shr time: [5.8862 ns 5.8863 ns 5.8865 ns]
change: [-0.0270% -0.0121% +0.0038%] (p = 0.13 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

U256::ctlz time: [3.2861 ns 3.2862 ns 3.2863 ns]
change: [-0.2516% -0.2076% -0.1648%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe

U256::cttz time: [3.2849 ns 3.2857 ns 3.2865 ns]
change: [-0.0756% -0.0139% +0.0411%] (p = 0.65 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low severe
2 (2.00%) high mild

U256::rotate_left time: [8.6495 ns 8.6562 ns 8.6631 ns]
change: [+0.0940% +0.2308% +0.3666%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

mcseemk · 2022-07-08T09:00:36Z

Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz
Before:
U256::add time: [3.7044 ns 3.7158 ns 3.7301 ns]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe

U256::div (lo/lo) time: [34.656 ns 34.746 ns 34.849 ns]
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe

U256::div (hi/lo) time: [101.12 ns 101.34 ns 101.57 ns]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

U256::div (hi/hi) time: [260.59 ns 261.64 ns 263.05 ns]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

U256::mul time: [7.2249 ns 7.2486 ns 7.2787 ns]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe

U256::sub time: [3.6266 ns 3.6375 ns 3.6507 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe

U256::shl time: [4.6821 ns 4.7187 ns 4.7582 ns]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe

U256::shr time: [5.4234 ns 5.6343 ns 5.8779 ns]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) high mild
14 (14.00%) high severe

U256::ctlz time: [1.7979 ns 1.8071 ns 1.8172 ns]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

U256::cttz time: [1.6234 ns 1.6275 ns 1.6317 ns]
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe

U256::rotate_left time: [7.6074 ns 7.6350 ns 7.6708 ns]
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe

U256::rotate_right time: [6.9703 ns 6.9921 ns 7.0137 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

After:

U256::add time: [3.8436 ns 4.0444 ns 4.2842 ns]
change: [+0.9428% +3.2698% +6.2352%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
13 (13.00%) high severe

U256::div (lo/lo) time: [34.616 ns 34.722 ns 34.836 ns]
change: [-1.8507% -1.2031% -0.6344%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe

U256::div (hi/lo) time: [102.35 ns 102.50 ns 102.67 ns]
change: [+0.9068% +1.1183% +1.3322%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe

U256::div (hi/hi) time: [76.856 ns 80.566 ns 84.570 ns]
change: [-71.760% -71.055% -70.233%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high severe

U256::mul time: [7.2813 ns 7.3206 ns 7.3653 ns]
change: [-0.3070% +0.5551% +1.2446%] (p = 0.16 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe

U256::sub time: [3.6143 ns 3.6161 ns 3.6180 ns]
change: [-0.8392% -0.5666% -0.3273%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
3 (3.00%) high mild

U256::shl time: [4.9900 ns 5.2624 ns 5.5863 ns]
change: [+4.5167% +8.7856% +13.977%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) high mild
14 (14.00%) high severe

U256::shr time: [5.2371 ns 5.3298 ns 5.4537 ns]
change: [-7.4553% -4.3827% -1.3965%] (p = 0.01 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe

U256::ctlz time: [1.9459 ns 2.0639 ns 2.2043 ns]
change: [+6.6873% +11.117% +15.950%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) high mild
13 (13.00%) high severe

U256::cttz time: [1.6350 ns 1.6419 ns 1.6494 ns]
change: [-1.1075% -0.4105% +0.3172%] (p = 0.27 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

U256::rotate_left time: [7.7551 ns 7.7892 ns 7.8216 ns]
change: [+0.7183% +1.2633% +1.7584%] (p = 0.00 < 0.05)
Change within noise threshold.

U256::rotate_right time: [6.8823 ns 6.8993 ns 6.9163 ns]
change: [-1.9033% -1.4820% -1.0720%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
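
To put the hi/hi results from the four machines side by side, here is a small sketch that recomputes the relative change and the speedup factor from the point estimates (the middle value of each time: [...] triple) quoted in the outputs above. The function names and machine labels are made up for illustration. The percentages can differ slightly from the change: lines, since Criterion derives those from its own statistical estimate rather than from a plain ratio of two point estimates.

```rust
/// (machine, before_ns, after_ns) for U256::div (hi/hi),
/// copied from the benchmark outputs in this thread.
const HI_HI: [(&str, f64, f64); 4] = [
    ("Intel desktop", 172.14, 24.095),
    ("Graviton 3 (c7g)", 336.61, 68.192),
    ("Graviton 2 (c6g*)", 432.42, 132.88),
    ("Xeon E3-1270 v6", 261.64, 80.566),
];

/// Relative change in percent: negative means the new code is faster.
fn relative_change_percent(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

/// Speedup factor: how many times faster the new code is.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// One human-readable summary line per machine.
fn summary_lines() -> Vec<String> {
    HI_HI
        .iter()
        .map(|&(machine, before, after)| {
            format!(
                "{}: {:+.2}% ({:.2}x faster)",
                machine,
                relative_change_percent(before, after),
                speedup(before, after)
            )
        })
        .collect()
}
```

Printing summary_lines() gives a quick per-machine comparison; the other operations are left out because their changes are within noise on most machines.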

mcseemk · 2022-07-08T09:03:07Z

Please note that I did not test on a big-endian architecture; to be honest, I don't know how to test that. I'm reasonably sure that div_mod_knuth is architecture-agnostic, but it would be better to test it.
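
One way to see why the algorithm should be endian-agnostic: arithmetic on integer values (shifts, masks, casts) gives the same result on every target, and byte order only matters once memory is reinterpreted as bytes. A small sketch of that distinction follows. The function names are invented for illustration, and it makes no claim about how the crate lays out its words in memory.

```rust
/// Split a 128-bit value into (high, low) halves using value-level shifts.
/// This gives the same answer on little- and big-endian targets.
fn split(x: u128) -> (u64, u64) {
    ((x >> 64) as u64, x as u64)
}

/// Inverse of `split`, again purely value-level.
fn join(hi: u64, lo: u64) -> u128 {
    (hi as u128) << 64 | lo as u128
}

/// Extract the low half by looking at the in-memory bytes instead.
/// Here the byte order matters: the low half sits in the first 8 bytes
/// on little-endian targets and in the last 8 bytes on big-endian ones.
fn low_half_from_bytes(x: u128) -> u64 {
    let bytes = x.to_ne_bytes();
    let range = if cfg!(target_endian = "little") { 0..8 } else { 8..16 };
    let mut half = [0u8; 8];
    half.copy_from_slice(&bytes[range]);
    u64::from_ne_bytes(half)
}
```

Code written in the style of split and join needs no target-specific handling; only code like low_half_from_bytes, which depends on memory layout, would need a big-endian test run.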

nlordell · 2022-08-03T18:23:05Z

Sorry, just getting back to this now. The changes look great. I also ran the more comprehensive benchmarks from the intx-division branch, and there are some very noticeable improvements, notably:

U256::div/####/####     time:   [36.128 ns 36.137 ns 36.148 ns]                                 
                        change: [-68.388% -68.374% -68.360%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  6 (6.00%) high severe

U256::div/####/###      time:   [57.154 ns 57.161 ns 57.168 ns]                               
                        change: [-86.374% -86.366% -86.359%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

U256::div/###/###       time:   [36.128 ns 36.132 ns 36.136 ns]                               
                        change: [-77.010% -76.966% -76.867%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

nlordell · 2022-08-03T18:23:32Z

I'm just going to run the fuzzer for a bit to see if there are any regressions with division, then this should be good to merge.
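
A rough idea of what such a check looks like, written as a randomized differential test rather than the repository's actual fuzzer: generate random operands, run the implementation under test, and compare against a trusted oracle. In this sketch the "implementation under test" is a plain shift-subtract division on u128 and the oracle is the native / and % operators. All names here (XorShift64, div_mod_shift_subtract, differential_check) are hypothetical.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be nonzero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    fn next_u128(&mut self) -> u128 {
        (self.next_u64() as u128) << 64 | self.next_u64() as u128
    }
}

/// Stand-in for the implementation under test: bit-by-bit long division.
fn div_mod_shift_subtract(n: u128, d: u128) -> (u128, u128) {
    assert!(d != 0, "division by zero");
    let (mut q, mut r) = (0u128, 0u128);
    for i in (0..128).rev() {
        // Remember the bit that the shift below pushes out of the 128-bit word.
        let top = r >> 127;
        r = (r << 1) | ((n >> i) & 1);
        if top == 1 || r >= d {
            r = r.wrapping_sub(d);
            q |= 1u128 << i;
        }
    }
    (q, r)
}

/// Compare the implementation against native division on random inputs.
/// Returns the first failing (dividend, divisor) pair, if any.
fn differential_check(seed: u64, rounds: u32) -> Result<(), (u128, u128)> {
    assert!(seed != 0, "xorshift needs a nonzero seed");
    let mut rng = XorShift64(seed);
    for _ in 0..rounds {
        let n = rng.next_u128();
        // Shift by a random amount so small and large divisors both occur.
        let d = (rng.next_u128() >> (rng.next_u64() % 128)).max(1);
        if div_mod_shift_subtract(n, d) != (n / d, n % d) {
            return Err((n, d));
        }
    }
    Ok(())
}
```

Returning the failing pair makes a regression easy to reproduce as a fixed unit test afterwards.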

div_mod_knuth added

2a6d30b

nlordell reviewed Jul 7, 2022

nlordell merged commit ed3c109 into nlordell:main Aug 3, 2022

nlordell mentioned this pull request Aug 3, 2022

Improve udivmod Performance #16

Open
