@pauleonix pauleonix commented Sep 30, 2025

Description

Closes #6005. The previous PR #5926 was wrongly closed; go there for slightly outdated performance results until I generate new ones.

Since DeviceMerge needs its data in shared memory either way, it is a good candidate to demonstrate the new BlockLoadToShared interface in the real world. It also comes with a complication: the size of the output tile is known at compile time, but the number of elements to load from each of the two ranges to be merged is only calculated at runtime using merge-path. This can be solved by introducing additional padding (size known at compile time) between the two dynamically sized input buffers.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pauleonix pauleonix self-assigned this Sep 30, 2025
@github-project-automation github-project-automation bot moved this to Todo in CCCL Sep 30, 2025

copy-pr-bot bot commented Sep 30, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Sep 30, 2025
@pauleonix pauleonix added the feature request New feature or request. label Sep 30, 2025

pauleonix commented Oct 1, 2025

cub.bench.merge.keys.base

B200 (UBLKCPY)

['/home/pgrossebley/SM_100_merge_keys_final_old.json', '/home/pgrossebley/SM_100_merge_keys_final_newest.json']

base

[0] NVIDIA B200

KeyT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 17.394 us 1.28% 17.396 us 1.27% 0.001 us 0.01% SAME
I8 I32 2^20 1 21.517 us 1.56% 21.523 us 1.55% 0.007 us 0.03% SAME
I8 I32 2^24 1 48.394 us 1.51% 46.388 us 1.72% -2.007 us -4.15% FAST
I8 I32 2^28 1 402.803 us 0.18% 370.989 us 0.33% -31.814 us -7.90% FAST
I8 I32 2^16 0.201 17.419 us 2.21% 17.407 us 2.26% -0.012 us -0.07% SAME
I8 I32 2^20 0.201 21.493 us 1.31% 21.474 us 1.30% -0.018 us -0.08% SAME
I8 I32 2^24 0.201 47.076 us 31.55% 45.684 us 1.89% -1.391 us -2.96% FAST
I8 I32 2^28 0.201 400.456 us 0.11% 367.570 us 0.16% -32.887 us -8.21% FAST
I8 I64 2^16 1 17.296 us 2.04% 17.272 us 2.10% -0.025 us -0.14% SAME
I8 I64 2^20 1 21.489 us 1.42% 21.466 us 1.43% -0.023 us -0.11% SAME
I8 I64 2^24 1 48.373 us 1.41% 46.273 us 1.29% -2.100 us -4.34% FAST
I8 I64 2^28 1 405.016 us 0.22% 360.895 us 0.39% -44.121 us -10.89% FAST
I8 I64 2^16 0.201 17.387 us 2.13% 17.375 us 2.16% -0.012 us -0.07% SAME
I8 I64 2^20 0.201 21.524 us 1.32% 21.513 us 1.35% -0.011 us -0.05% SAME
I8 I64 2^24 0.201 47.017 us 2.60% 43.925 us 0.98% -3.092 us -6.58% FAST
I8 I64 2^28 0.201 401.286 us 0.26% 357.393 us 0.08% -43.893 us -10.94% FAST
I16 I32 2^16 1 17.496 us 2.49% 17.494 us 2.46% -0.002 us -0.01% SAME
I16 I32 2^20 1 22.685 us 5.24% 22.614 us 5.05% -0.071 us -0.31% SAME
I16 I32 2^24 1 58.323 us 0.50% 55.206 us 1.96% -3.117 us -5.34% FAST
I16 I32 2^28 1 547.819 us 0.06% 502.775 us 0.06% -45.044 us -8.22% FAST
I16 I32 2^16 0.201 18.066 us 4.31% 18.053 us 4.37% -0.013 us -0.07% SAME
I16 I32 2^20 0.201 22.079 us 2.87% 22.051 us 2.80% -0.027 us -0.12% SAME
I16 I32 2^24 0.201 53.262 us 2.27% 49.579 us 1.87% -3.683 us -6.92% FAST
I16 I32 2^28 0.201 467.185 us 0.23% 432.468 us 0.22% -34.717 us -7.43% FAST
I16 I64 2^16 1 17.646 us 3.44% 17.622 us 3.25% -0.024 us -0.14% SAME
I16 I64 2^20 1 22.627 us 4.90% 22.695 us 5.19% 0.068 us 0.30% SAME
I16 I64 2^24 1 56.446 us 1.44% 56.199 us 0.66% -0.247 us -0.44% SAME
I16 I64 2^28 1 515.182 us 0.10% 511.094 us 0.10% -4.088 us -0.79% FAST
I16 I64 2^16 0.201 18.271 us 4.70% 18.292 us 4.74% 0.021 us 0.11% SAME
I16 I64 2^20 0.201 22.204 us 3.69% 22.223 us 3.79% 0.018 us 0.08% SAME
I16 I64 2^24 0.201 51.373 us 1.65% 49.809 us 1.61% -1.564 us -3.04% FAST
I16 I64 2^28 0.201 447.229 us 0.22% 431.700 us 0.30% -15.529 us -3.47% FAST
I32 I32 2^16 1 19.170 us 3.49% 19.142 us 3.66% -0.028 us -0.15% SAME
I32 I32 2^20 1 24.940 us 4.08% 24.797 us 4.26% -0.143 us -0.57% SAME
I32 I32 2^24 1 58.860 us 1.22% 55.059 us 1.91% -3.800 us -6.46% FAST
I32 I32 2^28 1 541.017 us 0.24% 473.892 us 0.31% -67.125 us -12.41% FAST
I32 I32 2^16 0.201 18.949 us 4.25% 18.965 us 4.20% 0.016 us 0.08% SAME
I32 I32 2^20 0.201 23.810 us 3.38% 23.655 us 2.65% -0.155 us -0.65% SAME
I32 I32 2^24 0.201 52.229 us 0.86% 50.182 us 0.66% -2.047 us -3.92% FAST
I32 I32 2^28 0.201 434.307 us 0.26% 406.415 us 0.18% -27.892 us -6.42% FAST
I32 I64 2^16 1 19.277 us 2.83% 19.280 us 2.77% 0.003 us 0.01% SAME
I32 I64 2^20 1 24.802 us 4.33% 24.825 us 4.29% 0.023 us 0.09% SAME
I32 I64 2^24 1 58.902 us 1.31% 54.845 us 1.35% -4.057 us -6.89% FAST
I32 I64 2^28 1 540.128 us 0.27% 473.345 us 0.24% -66.783 us -12.36% FAST
I32 I64 2^16 0.201 19.235 us 2.86% 19.236 us 2.82% 0.001 us 0.01% SAME
I32 I64 2^20 0.201 23.547 us 2.06% 23.646 us 2.80% 0.099 us 0.42% SAME
I32 I64 2^24 0.201 52.220 us 0.63% 50.241 us 0.83% -1.979 us -3.79% FAST
I32 I64 2^28 0.201 427.869 us 0.25% 404.551 us 0.15% -23.318 us -5.45% FAST
I64 I32 2^16 1 19.802 us 3.38% 19.772 us 3.30% -0.029 us -0.15% SAME
I64 I32 2^20 1 29.018 us 3.37% 29.143 us 2.99% 0.125 us 0.43% SAME
I64 I32 2^24 1 94.063 us 1.18% 87.145 us 0.87% -6.918 us -7.36% FAST
I64 I32 2^28 1 1.119 ms 0.10% 993.853 us 0.10% -124.692 us -11.15% FAST
I64 I32 2^16 0.201 19.581 us 3.35% 19.593 us 3.35% 0.012 us 0.06% SAME
I64 I32 2^20 0.201 26.304 us 2.77% 26.307 us 2.77% 0.003 us 0.01% SAME
I64 I32 2^24 0.201 72.628 us 0.64% 70.535 us 0.78% -2.093 us -2.88% FAST
I64 I32 2^28 0.201 759.537 us 0.81% 735.329 us 0.18% -24.208 us -3.19% FAST
I64 I64 2^16 1 19.962 us 3.83% 19.924 us 3.81% -0.038 us -0.19% SAME
I64 I64 2^20 1 29.133 us 2.87% 29.568 us 1.71% 0.435 us 1.49% SAME
I64 I64 2^24 1 95.011 us 1.05% 87.896 us 1.23% -7.115 us -7.49% FAST
I64 I64 2^28 1 1.125 ms 0.10% 1.006 ms 0.09% -118.789 us -10.56% FAST
I64 I64 2^16 0.201 19.696 us 2.87% 19.654 us 3.04% -0.043 us -0.22% SAME
I64 I64 2^20 0.201 26.255 us 2.97% 26.453 us 3.59% 0.198 us 0.75% SAME
I64 I64 2^24 0.201 72.364 us 1.04% 70.554 us 0.78% -1.810 us -2.50% FAST
I64 I64 2^28 0.201 759.394 us 0.41% 743.428 us 0.17% -15.966 us -2.10% FAST
I128 I32 2^16 1 21.406 us 2.08% 21.412 us 1.92% 0.006 us 0.03% SAME
I128 I32 2^20 1 35.931 us 1.18% 34.416 us 2.93% -1.515 us -4.22% FAST
I128 I32 2^24 1 194.501 us 0.58% 169.468 us 0.47% -25.034 us -12.87% FAST
I128 I32 2^28 1 2.742 ms 0.05% 2.332 ms 0.06% -410.298 us -14.96% FAST
I128 I32 2^16 0.201 21.266 us 2.28% 21.233 us 2.26% -0.033 us -0.16% SAME
I128 I32 2^20 0.201 31.760 us 1.13% 31.728 us 1.05% -0.032 us -0.10% SAME
I128 I32 2^24 0.201 137.654 us 0.84% 119.246 us 0.87% -18.408 us -13.37% FAST
I128 I32 2^28 0.201 1.842 ms 0.11% 1.546 ms 0.10% -295.156 us -16.03% FAST
I128 I64 2^16 1 21.418 us 1.71% 21.405 us 1.66% -0.014 us -0.06% SAME
I128 I64 2^20 1 36.144 us 1.52% 34.442 us 2.77% -1.703 us -4.71% FAST
I128 I64 2^24 1 193.982 us 0.42% 169.322 us 0.48% -24.660 us -12.71% FAST
I128 I64 2^28 1 2.746 ms 0.05% 2.312 ms 0.07% -434.699 us -15.83% FAST
I128 I64 2^16 0.201 21.452 us 1.71% 21.438 us 1.70% -0.014 us -0.07% SAME
I128 I64 2^20 0.201 31.809 us 1.12% 31.792 us 1.13% -0.018 us -0.06% SAME
I128 I64 2^24 0.201 128.324 us 0.58% 120.125 us 0.62% -8.199 us -6.39% FAST
I128 I64 2^28 0.201 1.691 ms 0.09% 1.555 ms 0.09% -136.649 us -8.08% FAST
F32 I32 2^16 1 19.099 us 3.85% 19.110 us 3.85% 0.010 us 0.05% SAME
F32 I32 2^20 1 24.849 us 4.07% 25.005 us 3.77% 0.156 us 0.63% SAME
F32 I32 2^24 1 58.793 us 1.16% 54.934 us 1.52% -3.859 us -6.56% FAST
F32 I32 2^28 1 541.378 us 0.16% 474.168 us 0.25% -67.210 us -12.41% FAST
F32 I32 2^16 0.201 18.942 us 4.10% 18.944 us 4.07% 0.001 us 0.01% SAME
F32 I32 2^20 0.201 23.614 us 2.71% 23.713 us 3.21% 0.099 us 0.42% SAME
F32 I32 2^24 0.201 52.199 us 0.90% 50.176 us 0.37% -2.024 us -3.88% FAST
F32 I32 2^28 0.201 433.923 us 0.24% 406.155 us 0.21% -27.769 us -6.40% FAST
F32 I64 2^16 1 19.249 us 2.43% 19.263 us 2.39% 0.013 us 0.07% SAME
F32 I64 2^20 1 24.746 us 4.31% 24.816 us 4.21% 0.070 us 0.28% SAME
F32 I64 2^24 1 59.110 us 1.47% 55.055 us 1.57% -4.054 us -6.86% FAST
F32 I64 2^28 1 541.175 us 0.22% 473.683 us 0.23% -67.492 us -12.47% FAST
F32 I64 2^16 0.201 19.241 us 2.68% 19.244 us 2.66% 0.003 us 0.01% SAME
F32 I64 2^20 0.201 23.883 us 3.56% 24.018 us 4.00% 0.135 us 0.57% SAME
F32 I64 2^24 0.201 52.361 us 0.98% 50.299 us 0.94% -2.062 us -3.94% FAST
F32 I64 2^28 0.201 427.292 us 0.17% 404.589 us 0.12% -22.703 us -5.31% FAST
F64 I32 2^16 1 19.791 us 3.14% 19.777 us 3.12% -0.014 us -0.07% SAME
F64 I32 2^20 1 29.117 us 3.07% 29.027 us 3.04% -0.090 us -0.31% SAME
F64 I32 2^24 1 94.534 us 1.17% 87.098 us 0.50% -7.436 us -7.87% FAST
F64 I32 2^28 1 1.121 ms 0.09% 993.475 us 0.13% -127.259 us -11.35% FAST
F64 I32 2^16 0.201 19.559 us 3.60% 19.554 us 3.48% -0.005 us -0.03% SAME
F64 I32 2^20 0.201 26.141 us 2.82% 26.147 us 2.91% 0.006 us 0.02% SAME
F64 I32 2^24 0.201 72.613 us 0.64% 70.497 us 0.82% -2.116 us -2.91% FAST
F64 I32 2^28 0.201 754.944 us 0.18% 729.513 us 0.20% -25.432 us -3.37% FAST
F64 I64 2^16 1 19.861 us 3.52% 19.843 us 3.51% -0.019 us -0.09% SAME
F64 I64 2^20 1 29.227 us 2.72% 29.165 us 2.91% -0.062 us -0.21% SAME
F64 I64 2^24 1 95.350 us 0.60% 88.640 us 1.27% -6.709 us -7.04% FAST
F64 I64 2^28 1 1.130 ms 0.12% 1.002 ms 0.11% -127.572 us -11.29% FAST
F64 I64 2^16 0.201 19.696 us 2.80% 19.690 us 2.75% -0.006 us -0.03% SAME
F64 I64 2^20 0.201 26.343 us 3.16% 26.432 us 3.35% 0.088 us 0.34% SAME
F64 I64 2^24 0.201 72.679 us 0.45% 70.618 us 0.46% -2.060 us -2.83% FAST
F64 I64 2^28 0.201 762.743 us 0.50% 738.925 us 0.16% -23.818 us -3.12% FAST

Summary

  • Total Matches: 112
    • Pass (diff <= min_noise): 55
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 57
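
For reading the tables, the following sketch shows how the Diff, %Diff and Status columns appear to relate to each other. It assumes the comparison works the way the summary labels suggest (the relative difference is compared against the smaller of the two noise values); `relative_diff` and `classify` are hypothetical helpers written for illustration, not the benchmark tooling itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>

// Relative difference of the compared time against the reference time.
// Both times must use the same unit; the result is a fraction (-0.079 for -7.9%).
double relative_diff(double ref_time, double cmp_time)
{
  return (cmp_time - ref_time) / ref_time;
}

// Noise values are fractions as well (0.0018 for 0.18%).
// Assumption: a row counts as SAME when |diff| <= min(ref_noise, cmp_noise).
std::string classify(double ref_time, double ref_noise, double cmp_time, double cmp_noise)
{
  const double diff      = relative_diff(ref_time, cmp_time);
  const double min_noise = std::min(ref_noise, cmp_noise);
  if (std::abs(diff) <= min_noise)
  {
    return "SAME";
  }
  return diff < 0 ? "FAST" : "SLOW";
}
```

For the B200 row `I8 I32 2^28 1`, `classify(402.803, 0.0018, 370.989, 0.0033)` gives a relative difference of about -7.90%, which exceeds the 0.18% minimum noise, so the row is FAST. For `I16 I64 2^24 1`, the -0.44% difference is below the 0.66% minimum noise, so the row is SAME.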

H200 (UBLKCPY)

['/home/pgrossebley/SM_90_merge_keys_final_old.json', '/home/pgrossebley/SM_90_merge_keys_final_newest.json']

base

[0] NVIDIA H200

KeyT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 13.736 us 3.37% 13.593 us 3.42% -0.143 us -1.04% SAME
I8 I32 2^20 1 19.347 us 2.20% 19.126 us 2.22% -0.221 us -1.14% SAME
I8 I32 2^24 1 48.911 us 1.85% 45.290 us 1.12% -3.621 us -7.40% FAST
I8 I32 2^28 1 476.472 us 0.16% 414.281 us 0.15% -62.191 us -13.05% FAST
I8 I32 2^16 0.201 13.539 us 3.34% 13.372 us 3.45% -0.167 us -1.23% SAME
I8 I32 2^20 0.201 18.479 us 2.42% 18.238 us 2.43% -0.241 us -1.30% SAME
I8 I32 2^24 0.201 47.450 us 0.98% 43.503 us 1.10% -3.946 us -8.32% FAST
I8 I32 2^28 0.201 471.208 us 0.17% 408.964 us 0.15% -62.244 us -13.21% FAST
I8 I64 2^16 1 13.774 us 2.97% 13.636 us 3.04% -0.138 us -1.00% SAME
I8 I64 2^20 1 19.400 us 2.22% 19.198 us 2.31% -0.202 us -1.04% SAME
I8 I64 2^24 1 48.807 us 0.98% 45.439 us 1.03% -3.368 us -6.90% FAST
I8 I64 2^28 1 467.005 us 0.18% 417.632 us 0.15% -49.373 us -10.57% FAST
I8 I64 2^16 0.201 13.373 us 3.01% 13.235 us 2.96% -0.138 us -1.03% SAME
I8 I64 2^20 0.201 18.717 us 2.47% 18.561 us 2.43% -0.156 us -0.83% SAME
I8 I64 2^24 0.201 47.285 us 0.99% 43.601 us 1.02% -3.684 us -7.79% FAST
I8 I64 2^28 0.201 462.325 us 0.18% 412.036 us 0.15% -50.289 us -10.88% FAST
I16 I32 2^16 1 14.572 us 2.93% 14.287 us 3.01% -0.284 us -1.95% SAME
I16 I32 2^20 1 21.188 us 2.30% 20.820 us 2.25% -0.369 us -1.74% SAME
I16 I32 2^24 1 61.033 us 0.84% 54.499 us 0.87% -6.534 us -10.71% FAST
I16 I32 2^28 1 670.628 us 0.14% 557.570 us 0.16% -113.058 us -16.86% FAST
I16 I32 2^16 0.201 14.252 us 3.03% 14.017 us 3.06% -0.235 us -1.65% SAME
I16 I32 2^20 0.201 20.273 us 2.48% 19.812 us 2.21% -0.461 us -2.27% FAST
I16 I32 2^24 0.201 54.623 us 1.73% 48.918 us 1.05% -5.705 us -10.44% FAST
I16 I32 2^28 0.201 563.986 us 0.14% 476.165 us 0.18% -87.821 us -15.57% FAST
I16 I64 2^16 1 14.634 us 3.16% 14.238 us 3.26% -0.397 us -2.71% SAME
I16 I64 2^20 1 21.493 us 2.18% 20.759 us 2.22% -0.734 us -3.42% FAST
I16 I64 2^24 1 60.061 us 0.83% 53.481 us 0.90% -6.580 us -10.96% FAST
I16 I64 2^28 1 641.592 us 0.13% 544.175 us 0.16% -97.417 us -15.18% FAST
I16 I64 2^16 0.201 14.435 us 3.06% 14.085 us 3.11% -0.350 us -2.42% SAME
I16 I64 2^20 0.201 20.714 us 2.50% 19.840 us 3.80% -0.875 us -4.22% FAST
I16 I64 2^24 0.201 54.133 us 0.96% 48.050 us 1.10% -6.083 us -11.24% FAST
I16 I64 2^28 0.201 551.713 us 0.16% 464.094 us 0.19% -87.619 us -15.88% FAST
I32 I32 2^16 1 15.498 us 3.60% 15.448 us 3.61% -0.051 us -0.33% SAME
I32 I32 2^20 1 22.230 us 2.06% 22.145 us 2.09% -0.084 us -0.38% SAME
I32 I32 2^24 1 59.742 us 1.01% 56.884 us 1.26% -2.858 us -4.78% FAST
I32 I32 2^28 1 636.943 us 0.19% 575.835 us 0.19% -61.108 us -9.59% FAST
I32 I32 2^16 0.201 14.920 us 3.33% 15.011 us 3.40% 0.091 us 0.61% SAME
I32 I32 2^20 0.201 21.088 us 2.25% 21.071 us 2.23% -0.017 us -0.08% SAME
I32 I32 2^24 0.201 55.442 us 1.28% 57.253 us 1.20% 1.811 us 3.27% SLOW
I32 I32 2^28 0.201 581.417 us 0.19% 589.692 us 0.17% 8.275 us 1.42% SLOW
I32 I64 2^16 1 15.631 us 3.11% 15.366 us 3.14% -0.265 us -1.70% SAME
I32 I64 2^20 1 22.573 us 2.07% 22.330 us 2.17% -0.243 us -1.08% SAME
I32 I64 2^24 1 60.376 us 0.96% 57.318 us 1.24% -3.058 us -5.06% FAST
I32 I64 2^28 1 639.606 us 0.19% 576.387 us 0.18% -63.219 us -9.88% FAST
I32 I64 2^16 0.201 15.023 us 3.13% 14.858 us 4.80% -0.165 us -1.10% SAME
I32 I64 2^20 0.201 21.498 us 2.16% 21.102 us 2.15% -0.396 us -1.84% SAME
I32 I64 2^24 0.201 55.562 us 1.29% 57.482 us 1.20% 1.920 us 3.46% SLOW
I32 I64 2^28 0.201 580.662 us 0.18% 589.767 us 0.17% 9.105 us 1.57% SLOW
I64 I32 2^16 1 16.523 us 2.98% 16.551 us 3.05% 0.029 us 0.17% SAME
I64 I32 2^20 1 26.223 us 1.87% 26.139 us 1.85% -0.085 us -0.32% SAME
I64 I32 2^24 1 99.440 us 0.72% 94.698 us 0.95% -4.743 us -4.77% FAST
I64 I32 2^28 1 1.265 ms 0.24% 1.171 ms 0.50% -94.754 us -7.49% FAST
I64 I32 2^16 0.201 15.927 us 3.26% 15.989 us 3.23% 0.062 us 0.39% SAME
I64 I32 2^20 0.201 23.450 us 2.11% 23.534 us 2.25% 0.085 us 0.36% SAME
I64 I32 2^24 0.201 87.082 us 0.88% 92.073 us 0.76% 4.991 us 5.73% SLOW
I64 I32 2^28 0.201 1.108 ms 0.50% 1.169 ms 0.50% 61.199 us 5.52% SLOW
I64 I64 2^16 1 16.638 us 2.95% 16.662 us 2.97% 0.024 us 0.14% SAME
I64 I64 2^20 1 26.093 us 1.71% 26.096 us 3.17% 0.003 us 0.01% SAME
I64 I64 2^24 1 99.416 us 0.70% 95.766 us 0.92% -3.650 us -3.67% FAST
I64 I64 2^28 1 1.267 ms 0.21% 1.184 ms 0.43% -83.871 us -6.62% FAST
I64 I64 2^16 0.201 15.775 us 2.78% 15.839 us 2.87% 0.064 us 0.40% SAME
I64 I64 2^20 0.201 23.342 us 1.97% 23.626 us 1.95% 0.283 us 1.21% SAME
I64 I64 2^24 0.201 86.882 us 0.78% 91.931 us 0.70% 5.050 us 5.81% SLOW
I64 I64 2^28 0.201 1.108 ms 0.50% 1.169 ms 0.50% 61.830 us 5.58% SLOW
I128 I32 2^16 1 17.557 us 2.53% 17.660 us 2.51% 0.102 us 0.58% SAME
I128 I32 2^20 1 32.318 us 1.56% 32.244 us 1.76% -0.074 us -0.23% SAME
I128 I32 2^24 1 199.430 us 0.33% 185.326 us 0.86% -14.104 us -7.07% FAST
I128 I32 2^28 1 2.868 ms 0.28% 2.624 ms 0.46% -244.183 us -8.51% FAST
I128 I32 2^16 0.201 16.647 us 2.62% 16.778 us 2.63% 0.131 us 0.79% SAME
I128 I32 2^20 0.201 27.330 us 1.85% 28.223 us 1.74% 0.893 us 3.27% SLOW
I128 I32 2^24 0.201 155.857 us 0.59% 159.877 us 0.58% 4.020 us 2.58% SLOW
I128 I32 2^28 0.201 2.240 ms 0.50% 2.286 ms 0.50% 45.724 us 2.04% SLOW
I128 I64 2^16 1 17.811 us 4.19% 17.916 us 2.47% 0.105 us 0.59% SAME
I128 I64 2^20 1 32.329 us 1.54% 32.124 us 1.74% -0.205 us -0.64% SAME
I128 I64 2^24 1 200.509 us 0.32% 184.955 us 0.43% -15.554 us -7.76% FAST
I128 I64 2^28 1 2.885 ms 0.31% 2.620 ms 0.47% -265.687 us -9.21% FAST
I128 I64 2^16 0.201 16.857 us 2.62% 16.945 us 2.59% 0.087 us 0.52% SAME
I128 I64 2^20 0.201 27.551 us 1.80% 28.475 us 1.70% 0.925 us 3.36% SLOW
I128 I64 2^24 0.201 156.255 us 0.56% 160.423 us 0.56% 4.168 us 2.67% SLOW
I128 I64 2^28 0.201 2.248 ms 0.50% 2.289 ms 0.50% 40.335 us 1.79% SLOW
F32 I32 2^16 1 15.178 us 3.07% 15.033 us 3.04% -0.145 us -0.96% SAME
F32 I32 2^20 1 22.380 us 2.02% 22.265 us 2.05% -0.115 us -0.51% SAME
F32 I32 2^24 1 59.790 us 0.99% 57.102 us 1.30% -2.689 us -4.50% FAST
F32 I32 2^28 1 636.927 us 0.18% 575.572 us 0.18% -61.354 us -9.63% FAST
F32 I32 2^16 0.201 14.812 us 3.37% 14.753 us 3.37% -0.059 us -0.40% SAME
F32 I32 2^20 0.201 21.031 us 2.24% 20.962 us 2.26% -0.068 us -0.32% SAME
F32 I32 2^24 0.201 55.284 us 1.26% 57.297 us 1.18% 2.013 us 3.64% SLOW
F32 I32 2^28 0.201 581.101 us 0.18% 589.383 us 0.16% 8.281 us 1.43% SLOW
F32 I64 2^16 1 15.370 us 3.16% 15.152 us 3.23% -0.217 us -1.41% SAME
F32 I64 2^20 1 22.578 us 2.05% 22.307 us 2.08% -0.271 us -1.20% SAME
F32 I64 2^24 1 60.077 us 1.03% 57.404 us 1.29% -2.673 us -4.45% FAST
F32 I64 2^28 1 639.634 us 0.17% 576.288 us 0.18% -63.347 us -9.90% FAST
F32 I64 2^16 0.201 14.995 us 3.36% 14.862 us 3.29% -0.134 us -0.89% SAME
F32 I64 2^20 0.201 21.384 us 2.21% 21.039 us 2.25% -0.346 us -1.62% SAME
F32 I64 2^24 0.201 55.720 us 1.26% 57.530 us 1.23% 1.810 us 3.25% SLOW
F32 I64 2^28 0.201 580.515 us 0.18% 589.459 us 0.16% 8.944 us 1.54% SLOW
F64 I32 2^16 1 16.323 us 2.90% 16.359 us 2.95% 0.035 us 0.22% SAME
F64 I32 2^20 1 26.172 us 1.78% 26.070 us 1.92% -0.102 us -0.39% SAME
F64 I32 2^24 1 98.869 us 0.78% 94.636 us 0.96% -4.233 us -4.28% FAST
F64 I32 2^28 1 1.258 ms 0.21% 1.178 ms 0.46% -80.339 us -6.39% FAST
F64 I32 2^16 0.201 15.667 us 2.78% 15.740 us 2.77% 0.073 us 0.46% SAME
F64 I32 2^20 0.201 23.157 us 1.94% 23.243 us 3.73% 0.086 us 0.37% SAME
F64 I32 2^24 0.201 86.679 us 0.84% 91.686 us 0.74% 5.007 us 5.78% SLOW
F64 I32 2^28 0.201 1.108 ms 0.50% 1.169 ms 0.50% 61.209 us 5.53% SLOW
F64 I64 2^16 1 16.087 us 2.59% 16.111 us 2.64% 0.023 us 0.15% SAME
F64 I64 2^20 1 26.067 us 1.66% 26.065 us 1.74% -0.003 us -0.01% SAME
F64 I64 2^24 1 99.183 us 0.72% 99.961 us 0.82% 0.778 us 0.78% SLOW
F64 I64 2^28 1 1.264 ms 0.20% 1.251 ms 0.42% -12.314 us -0.97% FAST
F64 I64 2^16 0.201 15.757 us 2.70% 15.823 us 2.73% 0.066 us 0.42% SAME
F64 I64 2^20 0.201 23.387 us 1.92% 23.618 us 2.00% 0.231 us 0.99% SAME
F64 I64 2^24 0.201 86.896 us 0.86% 93.167 us 0.72% 6.271 us 7.22% SLOW
F64 I64 2^28 0.201 1.108 ms 0.50% 1.184 ms 0.50% 75.936 us 6.85% SLOW

Summary

  • Total Matches: 112
    • Pass (diff <= min_noise): 51
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 61

A100 (LDGSTS)

['/home/pgrossebley/SM_80_merge_keys_final_old.json', '/home/pgrossebley/SM_80_merge_keys_final_newest.json']

base

[0] NVIDIA A100 80GB PCIe

KeyT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 17.444 us 2.55% 17.314 us 2.79% -0.130 us -0.75% SAME
I8 I32 2^20 1 24.700 us 1.69% 24.336 us 1.93% -0.364 us -1.47% SAME
I8 I32 2^24 1 80.854 us 0.66% 71.104 us 0.82% -9.750 us -12.06% FAST
I8 I32 2^28 1 912.188 us 0.43% 744.099 us 0.39% -168.089 us -18.43% FAST
I8 I32 2^16 0.201 17.491 us 2.37% 17.318 us 2.49% -0.173 us -0.99% SAME
I8 I32 2^20 0.201 23.377 us 2.02% 22.822 us 2.07% -0.555 us -2.37% FAST
I8 I32 2^24 0.201 78.375 us 0.69% 68.069 us 0.80% -10.305 us -13.15% FAST
I8 I32 2^28 0.201 897.439 us 0.26% 724.454 us 0.31% -172.985 us -19.28% FAST
I8 I64 2^16 1 17.854 us 2.86% 17.732 us 2.81% -0.122 us -0.69% SAME
I8 I64 2^20 1 25.227 us 2.01% 24.898 us 1.97% -0.329 us -1.30% SAME
I8 I64 2^24 1 81.502 us 0.70% 71.551 us 0.83% -9.951 us -12.21% FAST
I8 I64 2^28 1 917.073 us 0.43% 752.120 us 0.40% -164.953 us -17.99% FAST
I8 I64 2^16 0.201 17.300 us 2.45% 17.075 us 2.86% -0.225 us -1.30% SAME
I8 I64 2^20 0.201 23.788 us 1.97% 23.270 us 2.06% -0.518 us -2.18% FAST
I8 I64 2^24 0.201 78.938 us 0.66% 68.432 us 0.75% -10.507 us -13.31% FAST
I8 I64 2^28 0.201 902.893 us 0.41% 728.647 us 0.31% -174.246 us -19.30% FAST
I16 I32 2^16 1 18.573 us 2.47% 18.187 us 2.75% -0.386 us -2.08% SAME
I16 I32 2^20 1 27.067 us 1.92% 26.772 us 1.68% -0.295 us -1.09% SAME
I16 I32 2^24 1 96.975 us 0.57% 90.929 us 0.60% -6.046 us -6.23% FAST
I16 I32 2^28 1 1.187 ms 0.50% 1.030 ms 0.35% -157.282 us -13.25% FAST
I16 I32 2^16 0.201 18.619 us 2.50% 18.312 us 2.52% -0.307 us -1.65% SAME
I16 I32 2^20 0.201 25.532 us 1.80% 25.081 us 2.07% -0.451 us -1.77% SAME
I16 I32 2^24 0.201 86.742 us 0.66% 83.531 us 0.88% -3.211 us -3.70% FAST
I16 I32 2^28 0.201 1.021 ms 0.38% 891.606 us 0.59% -129.643 us -12.69% FAST
I16 I64 2^16 1 18.938 us 2.75% 18.817 us 2.73% -0.122 us -0.64% SAME
I16 I64 2^20 1 26.892 us 1.78% 27.207 us 1.89% 0.315 us 1.17% SAME
I16 I64 2^24 1 96.715 us 0.55% 94.363 us 0.55% -2.352 us -2.43% FAST
I16 I64 2^28 1 1.188 ms 0.42% 1.080 ms 0.50% -107.934 us -9.09% FAST
I16 I64 2^16 0.201 18.619 us 2.50% 18.536 us 2.39% -0.083 us -0.45% SAME
I16 I64 2^20 0.201 25.566 us 1.75% 25.539 us 1.73% -0.027 us -0.11% SAME
I16 I64 2^24 0.201 86.688 us 0.64% 85.383 us 0.79% -1.305 us -1.51% FAST
I16 I64 2^28 0.201 1.024 ms 0.46% 920.148 us 0.50% -103.858 us -10.14% FAST
I32 I32 2^16 1 19.055 us 2.77% 18.911 us 2.77% -0.144 us -0.75% SAME
I32 I32 2^20 1 28.709 us 1.56% 28.953 us 1.66% 0.243 us 0.85% SAME
I32 I32 2^24 1 112.452 us 1.06% 118.758 us 1.11% 6.306 us 5.61% SLOW
I32 I32 2^28 1 1.437 ms 0.61% 1.449 ms 0.56% 11.897 us 0.83% SLOW
I32 I32 2^16 0.201 18.045 us 3.09% 18.107 us 2.89% 0.062 us 0.34% SAME
I32 I32 2^20 0.201 26.919 us 1.81% 27.007 us 1.89% 0.089 us 0.33% SAME
I32 I32 2^24 0.201 110.412 us 1.03% 113.505 us 1.16% 3.093 us 2.80% SLOW
I32 I32 2^28 0.201 1.411 ms 0.62% 1.426 ms 0.59% 15.435 us 1.09% SLOW
I32 I64 2^16 1 18.989 us 2.89% 18.901 us 2.97% -0.088 us -0.47% SAME
I32 I64 2^20 1 28.997 us 1.71% 29.278 us 1.76% 0.281 us 0.97% SAME
I32 I64 2^24 1 112.460 us 1.09% 119.723 us 1.14% 7.263 us 6.46% SLOW
I32 I64 2^28 1 1.439 ms 0.59% 1.456 ms 0.54% 16.441 us 1.14% SLOW
I32 I64 2^16 0.201 18.170 us 3.17% 18.250 us 2.70% 0.080 us 0.44% SAME
I32 I64 2^20 0.201 27.147 us 1.93% 27.149 us 1.92% 0.002 us 0.01% SAME
I32 I64 2^24 0.201 110.699 us 1.05% 114.313 us 1.19% 3.614 us 3.27% SLOW
I32 I64 2^28 0.201 1.411 ms 0.62% 1.429 ms 0.59% 18.722 us 1.33% SLOW
I64 I32 2^16 1 20.232 us 2.61% 20.626 us 2.65% 0.394 us 1.95% SAME
I64 I32 2^20 1 35.724 us 1.62% 36.268 us 1.43% 0.544 us 1.52% SLOW
I64 I32 2^24 1 197.665 us 0.70% 205.779 us 0.78% 8.114 us 4.10% SLOW
I64 I32 2^28 1 2.858 ms 0.51% 2.941 ms 0.50% 82.921 us 2.90% SLOW
I64 I32 2^16 0.201 19.559 us 2.69% 19.698 us 2.77% 0.139 us 0.71% SAME
I64 I32 2^20 0.201 32.159 us 1.71% 32.660 us 1.42% 0.501 us 1.56% SLOW
I64 I32 2^24 0.201 192.818 us 0.76% 196.244 us 0.80% 3.427 us 1.78% SLOW
I64 I32 2^28 0.201 2.789 ms 0.56% 2.830 ms 0.50% 41.408 us 1.48% SLOW
I64 I64 2^16 1 20.624 us 2.67% 20.924 us 2.77% 0.300 us 1.45% SAME
I64 I64 2^20 1 35.893 us 1.42% 36.563 us 1.37% 0.670 us 1.87% SLOW
I64 I64 2^24 1 198.112 us 0.73% 206.572 us 0.78% 8.460 us 4.27% SLOW
I64 I64 2^28 1 2.876 ms 0.78% 2.952 ms 0.61% 75.648 us 2.63% SLOW
I64 I64 2^16 0.201 19.370 us 3.08% 19.603 us 2.74% 0.234 us 1.21% SAME
I64 I64 2^20 0.201 32.589 us 1.56% 33.168 us 1.53% 0.578 us 1.77% SLOW
I64 I64 2^24 0.201 193.110 us 0.77% 197.182 us 0.79% 4.072 us 2.11% SLOW
I64 I64 2^28 0.201 2.787 ms 0.54% 2.835 ms 0.50% 47.743 us 1.71% SLOW
I128 I32 2^16 1 21.323 us 2.90% 21.294 us 3.08% -0.030 us -0.14% SAME
I128 I32 2^20 1 47.631 us 1.27% 49.871 us 1.44% 2.239 us 4.70% SLOW
I128 I32 2^24 1 382.559 us 0.53% 388.651 us 0.55% 6.092 us 1.59% SLOW
I128 I32 2^28 1 6.193 ms 1.45% 6.156 ms 0.97% -36.939 us -0.60% SAME
I128 I32 2^16 0.201 20.875 us 3.00% 20.868 us 2.99% -0.007 us -0.03% SAME
I128 I32 2^20 0.201 43.512 us 1.39% 44.597 us 1.44% 1.086 us 2.49% SLOW
I128 I32 2^24 0.201 362.705 us 0.51% 366.162 us 0.50% 3.456 us 0.95% SLOW
I128 I32 2^28 0.201 5.605 ms 0.52% 5.673 ms 0.50% 67.545 us 1.21% SLOW
I128 I64 2^16 1 22.041 us 2.85% 21.960 us 2.79% -0.081 us -0.37% SAME
I128 I64 2^20 1 48.159 us 1.25% 50.089 us 1.50% 1.930 us 4.01% SLOW
I128 I64 2^24 1 382.577 us 0.50% 389.773 us 0.56% 7.197 us 1.88% SLOW
I128 I64 2^28 1 6.291 ms 1.26% 6.337 ms 0.82% 45.761 us 0.73% SAME
I128 I64 2^16 0.201 20.748 us 3.22% 20.616 us 3.07% -0.131 us -0.63% SAME
I128 I64 2^20 0.201 43.313 us 1.55% 44.545 us 1.42% 1.232 us 2.84% SLOW
I128 I64 2^24 0.201 364.649 us 0.55% 366.225 us 0.50% 1.576 us 0.43% SAME
I128 I64 2^28 0.201 5.642 ms 0.51% 5.694 ms 0.50% 52.524 us 0.93% SLOW
F32 I32 2^16 1 18.987 us 3.19% 18.866 us 3.20% -0.121 us -0.64% SAME
F32 I32 2^20 1 28.606 us 1.53% 28.930 us 1.62% 0.324 us 1.13% SAME
F32 I32 2^24 1 112.477 us 1.06% 118.777 us 1.15% 6.300 us 5.60% SLOW
F32 I32 2^28 1 1.479 ms 1.77% 1.464 ms 0.98% -15.147 us -1.02% FAST
F32 I32 2^16 0.201 17.811 us 2.86% 17.915 us 3.78% 0.104 us 0.58% SAME
F32 I32 2^20 0.201 26.693 us 1.67% 26.683 us 1.60% -0.010 us -0.04% SAME
F32 I32 2^24 0.201 110.321 us 1.04% 113.358 us 1.17% 3.037 us 2.75% SLOW
F32 I32 2^28 0.201 1.420 ms 0.61% 1.431 ms 0.57% 10.939 us 0.77% SLOW
F32 I64 2^16 1 18.831 us 3.33% 18.674 us 2.97% -0.156 us -0.83% SAME
F32 I64 2^20 1 28.525 us 1.72% 28.906 us 1.70% 0.382 us 1.34% SAME
F32 I64 2^24 1 112.345 us 1.07% 118.846 us 1.12% 6.501 us 5.79% SLOW
F32 I64 2^28 1 1.466 ms 1.40% 1.466 ms 0.88% 0.260 us 0.02% SAME
F32 I64 2^16 0.201 18.363 us 3.22% 18.218 us 3.42% -0.145 us -0.79% SAME
F32 I64 2^20 0.201 26.721 us 1.70% 26.716 us 1.65% -0.006 us -0.02% SAME
F32 I64 2^24 0.201 110.360 us 1.04% 113.588 us 1.12% 3.228 us 2.92% SLOW
F32 I64 2^28 0.201 1.418 ms 0.62% 1.433 ms 0.59% 15.537 us 1.10% SLOW
F64 I32 2^16 1 20.322 us 3.06% 20.618 us 2.85% 0.297 us 1.46% SAME
F64 I32 2^20 1 35.618 us 1.52% 36.253 us 1.41% 0.635 us 1.78% SLOW
F64 I32 2^24 1 198.903 us 0.74% 206.801 us 0.77% 7.898 us 3.97% SLOW
F64 I32 2^28 1 2.921 ms 0.95% 3.003 ms 1.08% 81.778 us 2.80% SLOW
F64 I32 2^16 0.201 19.088 us 2.85% 19.371 us 3.14% 0.283 us 1.48% SAME
F64 I32 2^20 0.201 32.819 us 1.56% 32.978 us 1.46% 0.159 us 0.49% SAME
F64 I32 2^24 0.201 193.041 us 0.78% 196.522 us 0.79% 3.481 us 1.80% SLOW
F64 I32 2^28 0.201 2.805 ms 0.53% 2.858 ms 0.50% 53.033 us 1.89% SLOW
F64 I64 2^16 1 20.261 us 3.52% 20.670 us 3.27% 0.408 us 2.02% SAME
F64 I64 2^20 1 35.575 us 1.56% 36.117 us 1.35% 0.543 us 1.53% SLOW
F64 I64 2^24 1 198.396 us 0.73% 210.584 us 0.85% 12.189 us 6.14% SLOW
F64 I64 2^28 1 2.958 ms 1.08% 3.141 ms 0.88% 182.976 us 6.19% SLOW
F64 I64 2^16 0.201 19.526 us 3.19% 19.781 us 3.24% 0.255 us 1.30% SAME
F64 I64 2^20 0.201 32.101 us 1.68% 32.652 us 1.39% 0.552 us 1.72% SLOW
F64 I64 2^24 0.201 193.139 us 0.76% 200.658 us 0.84% 7.519 us 3.89% SLOW
F64 I64 2^28 0.201 2.807 ms 0.52% 2.901 ms 0.50% 93.688 us 3.34% SLOW

Summary

  • Total Matches: 112
    • Pass (diff <= min_noise): 47
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 65

RTX 5090 (UBLKCPY)

['/home/pgrossebley/SM_120_merge_keys_old_final_CTK13.json', '/home/pgrossebley/SM_120_merge_keys_newest_final_CTK13.json']

base

[0] NVIDIA GeForce RTX 5090

KeyT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 11.429 us 11.26% 11.402 us 9.92% -0.027 us -0.24% SAME
I8 I32 2^20 1 16.369 us 0.53% 16.366 us 0.59% -0.004 us -0.02% SAME
I8 I32 2^24 1 41.906 us 2.44% 38.935 us 0.58% -2.971 us -7.09% FAST
I8 I32 2^28 1 435.918 us 0.41% 426.919 us 0.42% -8.999 us -2.06% FAST
I8 I32 2^16 0.201 10.337 us 3.97% 10.437 us 5.56% 0.100 us 0.96% SAME
I8 I32 2^20 0.201 16.155 us 1.59% 16.089 us 1.58% -0.067 us -0.41% SAME
I8 I32 2^24 0.201 40.198 us 3.21% 37.754 us 2.71% -2.444 us -6.08% FAST
I8 I32 2^28 0.201 416.804 us 0.39% 405.870 us 0.42% -10.933 us -2.62% FAST
I8 I64 2^16 1 11.614 us 11.60% 11.875 us 10.22% 0.261 us 2.25% SAME
I8 I64 2^20 1 16.238 us 1.46% 16.187 us 1.56% -0.051 us -0.32% SAME
I8 I64 2^24 1 40.967 us 0.35% 38.922 us 0.40% -2.045 us -4.99% FAST
I8 I64 2^28 1 435.873 us 0.40% 426.688 us 0.44% -9.185 us -2.11% FAST
I8 I64 2^16 0.201 11.721 us 8.40% 12.029 us 5.67% 0.308 us 2.63% SAME
I8 I64 2^20 0.201 16.362 us 0.64% 16.351 us 0.78% -0.012 us -0.07% SAME
I8 I64 2^24 0.201 40.720 us 3.52% 38.035 us 2.71% -2.684 us -6.59% FAST
I8 I64 2^28 0.201 418.763 us 0.33% 405.005 us 0.43% -13.758 us -3.29% FAST
I16 I32 2^16 1 14.283 us 2.26% 12.727 us 6.59% -1.555 us -10.89% FAST
I16 I32 2^20 1 13.152 us 7.98% 12.915 us 8.49% -0.237 us -1.80% SAME
I16 I32 2^24 1 57.437 us 0.92% 55.410 us 0.87% -2.027 us -3.53% FAST
I16 I32 2^28 1 804.760 us 0.21% 798.871 us 0.23% -5.888 us -0.73% FAST
I16 I32 2^16 0.201 13.203 us 7.69% 12.392 us 3.62% -0.811 us -6.14% FAST
I16 I32 2^20 0.201 16.373 us 0.48% 16.364 us 0.64% -0.009 us -0.06% SAME
I16 I32 2^24 0.201 55.415 us 0.89% 55.343 us 0.82% -0.072 us -0.13% SAME
I16 I32 2^28 0.201 772.169 us 0.24% 770.532 us 0.24% -1.638 us -0.21% SAME
I16 I64 2^16 1 14.332 us 0.29% 13.926 us 5.86% -0.406 us -2.83% FAST
I16 I64 2^20 1 16.402 us 2.91% 16.458 us 3.56% 0.056 us 0.34% SAME
I16 I64 2^24 1 58.193 us 1.77% 56.158 us 1.84% -2.035 us -3.50% FAST
I16 I64 2^28 1 804.580 us 0.23% 799.094 us 0.25% -5.487 us -0.68% FAST
I16 I64 2^16 0.201 12.458 us 4.54% 12.312 us 1.82% -0.145 us -1.17% SAME
I16 I64 2^20 0.201 16.227 us 1.47% 16.166 us 1.58% -0.061 us -0.38% SAME
I16 I64 2^24 0.201 55.382 us 0.91% 55.172 us 1.65% -0.209 us -0.38% SAME
I16 I64 2^28 0.201 772.380 us 0.25% 771.032 us 0.24% -1.348 us -0.17% SAME
I32 I32 2^16 1 14.334 us 0.19% 14.335 us 0.12% 0.001 us 0.01% SAME
I32 I32 2^20 1 18.517 us 2.15% 18.610 us 2.96% 0.094 us 0.51% SAME
I32 I32 2^24 1 103.801 us 1.31% 102.287 us 1.31% -1.514 us -1.46% FAST
I32 I32 2^28 1 1.528 ms 0.16% 1.517 ms 0.16% -10.758 us -0.70% FAST
I32 I32 2^16 0.201 14.295 us 2.00% 12.425 us 4.15% -1.870 us -13.08% FAST
I32 I32 2^20 0.201 16.643 us 5.18% 16.230 us 1.99% -0.413 us -2.48% FAST
I32 I32 2^24 0.201 99.909 us 1.47% 101.184 us 1.49% 1.275 us 1.28% SAME
I32 I32 2^28 0.201 1.497 ms 0.15% 1.492 ms 0.15% -4.992 us -0.33% FAST
I32 I64 2^16 1 14.332 us 0.27% 14.334 us 0.24% 0.001 us 0.01% SAME
I32 I64 2^20 1 18.841 us 3.95% 18.898 us 4.09% 0.056 us 0.30% SAME
I32 I64 2^24 1 103.542 us 1.39% 102.105 us 1.42% -1.438 us -1.39% SAME
I32 I64 2^28 1 1.528 ms 0.16% 1.517 ms 0.15% -11.021 us -0.72% FAST
I32 I64 2^16 0.201 14.333 us 0.39% 13.949 us 5.72% -0.384 us -2.68% FAST
I32 I64 2^20 0.201 18.848 us 4.11% 18.997 us 4.26% 0.149 us 0.79% SAME
I32 I64 2^24 0.201 100.630 us 1.42% 101.664 us 1.39% 1.034 us 1.03% SAME
I32 I64 2^28 0.201 1.498 ms 0.14% 1.493 ms 0.14% -5.348 us -0.36% FAST
I64 I32 2^16 1 14.342 us 1.30% 14.373 us 2.08% 0.030 us 0.21% SAME
I64 I32 2^20 1 22.401 us 1.11% 22.360 us 1.13% -0.042 us -0.19% SAME
I64 I32 2^24 1 197.146 us 0.57% 196.265 us 0.59% -0.881 us -0.45% SAME
I64 I32 2^28 1 3.054 ms 0.10% 3.039 ms 0.10% -15.222 us -0.50% FAST
I64 I32 2^16 0.201 13.689 us 7.18% 12.501 us 5.15% -1.187 us -8.67% FAST
I64 I32 2^20 0.201 22.319 us 1.21% 22.498 us 2.96% 0.180 us 0.80% SAME
I64 I32 2^24 0.201 193.581 us 0.73% 195.248 us 0.68% 1.667 us 0.86% SLOW
I64 I32 2^28 0.201 2.995 ms 0.10% 2.988 ms 0.10% -7.158 us -0.24% FAST
I64 I64 2^16 1 14.700 us 5.34% 14.851 us 6.00% 0.152 us 1.03% SAME
I64 I64 2^20 1 22.255 us 1.29% 22.316 us 2.29% 0.061 us 0.28% SAME
I64 I64 2^24 1 198.965 us 0.70% 197.998 us 0.69% -0.967 us -0.49% SAME
I64 I64 2^28 1 3.055 ms 0.09% 3.040 ms 0.09% -15.018 us -0.49% FAST
I64 I64 2^16 0.201 12.473 us 4.84% 12.385 us 3.58% -0.088 us -0.71% SAME
I64 I64 2^20 0.201 22.181 us 1.14% 22.176 us 1.09% -0.005 us -0.02% SAME
I64 I64 2^24 0.201 194.248 us 0.61% 195.746 us 0.63% 1.498 us 0.77% SLOW
I64 I64 2^28 0.201 2.995 ms 0.09% 2.988 ms 0.10% -6.919 us -0.23% FAST
I128 I32 2^16 1 16.376 us 0.36% 16.378 us 0.34% 0.001 us 0.01% SAME
I128 I32 2^20 1 35.052 us 3.08% 35.254 us 3.13% 0.201 us 0.57% SAME
I128 I32 2^24 1 385.861 us 0.45% 386.261 us 0.45% 0.400 us 0.10% SAME
I128 I32 2^28 1 6.163 ms 0.06% 6.149 ms 0.07% -14.466 us -0.23% FAST
I128 I32 2^16 0.201 16.382 us 0.19% 16.381 us 0.37% -0.001 us -0.01% SAME
I128 I32 2^20 0.201 33.559 us 2.90% 33.517 us 2.82% -0.042 us -0.12% SAME
I128 I32 2^24 0.201 377.670 us 0.43% 378.117 us 0.42% 0.448 us 0.12% SAME
I128 I32 2^28 0.201 6.042 ms 0.06% 6.030 ms 0.08% -11.844 us -0.20% FAST
I128 I64 2^16 1 16.382 us 0.39% 16.382 us 0.33% 0.000 us 0.00% SAME
I128 I64 2^20 1 34.013 us 2.89% 34.612 us 0.89% 0.599 us 1.76% SLOW
I128 I64 2^24 1 387.368 us 0.40% 387.368 us 0.41% 0.001 us 0.00% SAME
I128 I64 2^28 1 6.165 ms 0.06% 6.151 ms 0.06% -14.395 us -0.23% FAST
I128 I64 2^16 0.201 15.644 us 6.58% 15.723 us 6.12% 0.079 us 0.50% SAME
I128 I64 2^20 0.201 33.780 us 2.99% 33.236 us 3.35% -0.544 us -1.61% SAME
I128 I64 2^24 0.201 378.207 us 0.46% 378.599 us 0.44% 0.392 us 0.10% SAME
I128 I64 2^28 0.201 6.042 ms 0.07% 6.032 ms 0.07% -10.436 us -0.17% FAST
F32 I32 2^16 1 14.329 us 0.38% 14.010 us 5.28% -0.318 us -2.22% FAST
F32 I32 2^20 1 18.194 us 3.47% 16.999 us 6.18% -1.195 us -6.57% FAST
F32 I32 2^24 1 103.936 us 1.38% 102.896 us 1.35% -1.040 us -1.00% SAME
F32 I32 2^28 1 1.528 ms 0.14% 1.517 ms 0.15% -10.798 us -0.71% FAST
F32 I32 2^16 0.201 14.334 us 0.25% 13.393 us 7.61% -0.941 us -6.57% FAST
F32 I32 2^20 0.201 18.317 us 2.36% 17.824 us 6.42% -0.493 us -2.69% FAST
F32 I32 2^24 0.201 102.106 us 1.33% 102.701 us 1.46% 0.594 us 0.58% SAME
F32 I32 2^28 0.201 1.498 ms 0.14% 1.492 ms 0.14% -5.161 us -0.34% FAST
F32 I64 2^16 1 14.334 us 0.34% 14.336 us 0.62% 0.001 us 0.01% SAME
F32 I64 2^20 1 19.448 us 4.13% 19.484 us 4.11% 0.036 us 0.19% SAME
F32 I64 2^24 1 105.411 us 1.43% 103.859 us 1.37% -1.552 us -1.47% FAST
F32 I64 2^28 1 1.529 ms 0.15% 1.518 ms 0.15% -11.320 us -0.74% FAST
F32 I64 2^16 0.201 14.196 us 3.65% 12.882 us 7.22% -1.314 us -9.26% FAST
F32 I64 2^20 0.201 16.518 us 5.16% 16.298 us 3.48% -0.221 us -1.34% SAME
F32 I64 2^24 0.201 101.624 us 1.36% 102.772 us 1.36% 1.148 us 1.13% SAME
F32 I64 2^28 0.201 1.498 ms 0.14% 1.493 ms 0.14% -5.511 us -0.37% FAST
F64 I32 2^16 1 14.585 us 4.56% 14.841 us 5.91% 0.256 us 1.76% SAME
F64 I32 2^20 1 24.537 us 0.55% 24.542 us 0.54% 0.005 us 0.02% SAME
F64 I32 2^24 1 196.735 us 0.71% 195.994 us 0.70% -0.741 us -0.38% SAME
F64 I32 2^28 1 3.054 ms 0.10% 3.039 ms 0.10% -15.191 us -0.50% FAST
F64 I32 2^16 0.201 15.112 us 6.58% 15.243 us 6.67% 0.131 us 0.87% SAME
F64 I32 2^20 0.201 22.361 us 2.08% 22.385 us 2.23% 0.024 us 0.11% SAME
F64 I32 2^24 0.201 191.650 us 0.67% 192.510 us 0.64% 0.860 us 0.45% SAME
F64 I32 2^28 0.201 2.992 ms 0.09% 2.985 ms 0.09% -7.031 us -0.23% FAST
F64 I64 2^16 1 14.634 us 4.97% 14.720 us 5.45% 0.085 us 0.58% SAME
F64 I64 2^20 1 24.395 us 1.12% 24.446 us 1.04% 0.051 us 0.21% SAME
F64 I64 2^24 1 197.107 us 0.66% 196.543 us 0.66% -0.564 us -0.29% SAME
F64 I64 2^28 1 3.054 ms 0.10% 3.039 ms 0.09% -14.755 us -0.48% FAST
F64 I64 2^16 0.201 15.110 us 6.58% 15.117 us 6.59% 0.007 us 0.05% SAME
F64 I64 2^20 0.201 22.745 us 3.82% 22.351 us 1.17% -0.394 us -1.73% FAST
F64 I64 2^24 0.201 191.845 us 0.67% 192.470 us 0.61% 0.625 us 0.33% SAME
F64 I64 2^28 0.201 2.992 ms 0.09% 2.986 ms 0.10% -6.587 us -0.22% FAST

Summary

  • Total Matches: 112
    • Pass (diff <= min_noise): 62
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 50
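
For reading the tables, here is a minimal sketch of how the Diff, %Diff and Status columns appear to relate, assuming the rule stated in the summary (a row passes when the relative difference is within the smaller of the two noise values). The function name `classify` and the `UNKNOWN` label are hypothetical; this is not the comparison script's actual code.

```python
import math


def classify(ref_time, ref_noise, cmp_time, cmp_noise):
    """Return (diff, frac_diff, status) for one benchmark row.

    Both times must use the same unit; noise values are fractions
    (e.g. 0.0068 for 0.68%). Hypothetical helper for illustration only.
    """
    diff = cmp_time - ref_time            # negative means the new build is faster
    frac_diff = diff / ref_time           # the %Diff column, as a fraction
    min_noise = min(ref_noise, cmp_noise)

    if math.isinf(min_noise):
        status = "UNKNOWN"                # assumed label for the "infinite noise" case
    elif abs(frac_diff) <= min_noise:
        status = "SAME"                   # counted as "Pass" in the summary
    elif diff < 0:
        status = "FAST"
    else:
        status = "SLOW"
    return diff, frac_diff, status


# Rows taken from the keys table above (times in microseconds).
rows = [
    (13.689, 0.0718, 12.501, 0.0515),     # I64 I32 2^16 0.201 -> FAST
    (18.848, 0.0411, 18.997, 0.0426),     # I32 I64 2^20 0.201 -> SAME
    (193.581, 0.0073, 195.248, 0.0068),   # I64 I32 2^24 0.201 -> SLOW
]
for row in rows:
    diff, frac, status = classify(*row)
    print(f"{diff:+.3f} us  {frac:+.2%}  {status}")
```

Under this reading, the "Failure" count covers every row tagged FAST or SLOW, so it does not by itself indicate a regression: 62 passes plus 50 failures gives the 112 total matches.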

@pauleonix
Contributor Author

pauleonix commented Oct 1, 2025

cub.bench.merge.pairs.base

B200 (UBLKCPY)

['/home/pgrossebley/SM_100_merge_pairs_final_old.json', '/home/pgrossebley/SM_100_merge_pairs_final_newest.json']

base

[0] NVIDIA B200

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 1 19.602 us 2.92% 19.444 us 1.06% -0.158 us -0.81% SAME
I8 I8 I32 2^20 1 24.156 us 3.99% 23.524 us 1.17% -0.631 us -2.61% FAST
I8 I8 I32 2^24 1 97.525 us 0.71% 71.244 us 1.37% -26.282 us -26.95% FAST
I8 I8 I32 2^28 1 1.198 ms 0.10% 744.065 us 0.14% -453.848 us -37.89% FAST
I8 I8 I32 2^16 0.201 19.484 us 1.86% 19.379 us 2.02% -0.105 us -0.54% SAME
I8 I8 I32 2^20 0.201 23.536 us 1.01% 23.550 us 1.09% 0.014 us 0.06% SAME
I8 I8 I32 2^24 0.201 95.577 us 0.86% 69.384 us 1.51% -26.194 us -27.41% FAST
I8 I8 I32 2^28 0.201 1.193 ms 0.05% 737.128 us 0.15% -456.003 us -38.22% FAST
I8 I8 I64 2^16 1 19.676 us 3.62% 19.396 us 1.63% -0.279 us -1.42% SAME
I8 I8 I64 2^20 1 24.301 us 4.25% 23.575 us 1.41% -0.725 us -2.99% FAST
I8 I8 I64 2^24 1 100.093 us 1.07% 93.623 us 0.83% -6.470 us -6.46% FAST
I8 I8 I64 2^28 1 1.261 ms 0.08% 1.126 ms 0.10% -134.911 us -10.70% FAST
I8 I8 I64 2^16 0.201 19.540 us 2.48% 19.424 us 2.35% -0.116 us -0.60% SAME
I8 I8 I64 2^20 0.201 23.542 us 1.21% 23.551 us 1.34% 0.010 us 0.04% SAME
I8 I8 I64 2^24 0.201 99.169 us 0.50% 91.160 us 0.50% -8.009 us -8.08% FAST
I8 I8 I64 2^28 0.201 1.257 ms 0.06% 1.119 ms 0.08% -138.023 us -10.98% FAST
I8 I16 I32 2^16 1 19.909 us 4.48% 19.419 us 1.58% -0.491 us -2.47% FAST
I8 I16 I32 2^20 1 25.055 us 3.65% 23.586 us 1.38% -1.469 us -5.86% FAST
I8 I16 I32 2^24 1 100.719 us 1.13% 72.232 us 1.61% -28.487 us -28.28% FAST
I8 I16 I32 2^28 1 1.227 ms 0.10% 753.259 us 0.14% -473.274 us -38.59% FAST
I8 I16 I32 2^16 0.201 19.483 us 2.16% 19.250 us 2.35% -0.233 us -1.19% SAME
I8 I16 I32 2^20 0.201 23.530 us 1.51% 23.531 us 1.35% 0.000 us 0.00% SAME
I8 I16 I32 2^24 0.201 99.202 us 0.63% 70.629 us 0.56% -28.572 us -28.80% FAST
I8 I16 I32 2^28 0.201 1.221 ms 0.07% 750.413 us 0.10% -470.996 us -38.56% FAST
I8 I16 I64 2^16 1 19.894 us 4.58% 19.317 us 1.87% -0.578 us -2.90% FAST
I8 I16 I64 2^20 1 25.041 us 3.81% 23.623 us 1.61% -1.418 us -5.66% FAST
I8 I16 I64 2^24 1 103.828 us 0.66% 72.472 us 1.57% -31.356 us -30.20% FAST
I8 I16 I64 2^28 1 1.293 ms 0.11% 757.417 us 0.13% -535.349 us -41.41% FAST
I8 I16 I64 2^16 0.201 19.500 us 3.56% 19.439 us 1.55% -0.061 us -0.31% SAME
I8 I16 I64 2^20 0.201 23.877 us 3.59% 23.595 us 1.52% -0.281 us -1.18% SAME
I8 I16 I64 2^24 0.201 101.765 us 0.91% 70.557 us 0.57% -31.209 us -30.67% FAST
I8 I16 I64 2^28 0.201 1.289 ms 0.06% 754.960 us 0.09% -534.027 us -41.43% FAST
I8 I32 I32 2^16 1 19.443 us 0.81% 19.346 us 1.81% -0.097 us -0.50% SAME
I8 I32 I32 2^20 1 25.626 us 1.66% 25.642 us 1.06% 0.016 us 0.06% SAME
I8 I32 I32 2^24 1 86.651 us 1.54% 79.424 us 0.95% -7.227 us -8.34% FAST
I8 I32 I32 2^28 1 997.591 us 0.22% 879.188 us 0.10% -118.403 us -11.87% FAST
I8 I32 I32 2^16 0.201 19.375 us 1.65% 19.456 us 1.26% 0.081 us 0.42% SAME
I8 I32 I32 2^20 0.201 25.615 us 1.15% 25.515 us 1.76% -0.101 us -0.39% SAME
I8 I32 I32 2^24 0.201 85.022 us 0.48% 78.427 us 1.09% -6.595 us -7.76% FAST
I8 I32 I32 2^28 0.201 988.963 us 0.24% 874.111 us 0.11% -114.852 us -11.61% FAST
I8 I32 I64 2^16 1 19.468 us 1.25% 19.453 us 1.01% -0.015 us -0.08% SAME
I8 I32 I64 2^20 1 25.799 us 1.79% 25.673 us 1.47% -0.127 us -0.49% SAME
I8 I32 I64 2^24 1 85.869 us 1.31% 80.044 us 1.41% -5.826 us -6.78% FAST
I8 I32 I64 2^28 1 1.006 ms 0.09% 882.342 us 0.11% -124.145 us -12.33% FAST
I8 I32 I64 2^16 0.201 19.403 us 1.92% 19.433 us 2.27% 0.030 us 0.16% SAME
I8 I32 I64 2^20 0.201 25.684 us 1.38% 25.523 us 1.73% -0.160 us -0.62% SAME
I8 I32 I64 2^24 0.201 85.067 us 0.43% 78.507 us 1.01% -6.560 us -7.71% FAST
I8 I32 I64 2^28 0.201 1.002 ms 0.11% 877.825 us 0.11% -123.694 us -12.35% FAST
I8 I64 I32 2^16 1 18.124 us 5.58% 18.774 us 5.16% 0.650 us 3.59% SAME
I8 I64 I32 2^20 1 26.431 us 2.97% 26.161 us 3.42% -0.270 us -1.02% SAME
I8 I64 I32 2^24 1 95.376 us 1.26% 85.705 us 1.25% -9.671 us -10.14% FAST
I8 I64 I32 2^28 1 1.156 ms 0.13% 995.373 us 0.13% -160.370 us -13.88% FAST
I8 I64 I32 2^16 0.201 17.650 us 4.60% 17.748 us 4.37% 0.098 us 0.55% SAME
I8 I64 I32 2^20 0.201 25.615 us 0.65% 25.624 us 0.72% 0.009 us 0.03% SAME
I8 I64 I32 2^24 0.201 93.300 us 0.44% 84.950 us 0.63% -8.350 us -8.95% FAST
I8 I64 I32 2^28 0.201 1.137 ms 0.10% 979.592 us 0.10% -157.806 us -13.87% FAST
I8 I64 I64 2^16 1 17.831 us 5.47% 18.598 us 5.47% 0.767 us 4.30% SAME
I8 I64 I64 2^20 1 26.885 us 3.40% 26.753 us 3.78% -0.132 us -0.49% SAME
I8 I64 I64 2^24 1 95.928 us 1.26% 86.545 us 1.25% -9.383 us -9.78% FAST
I8 I64 I64 2^28 1 1.158 ms 0.12% 995.931 us 0.11% -161.607 us -13.96% FAST
I8 I64 I64 2^16 0.201 17.663 us 3.68% 17.530 us 2.80% -0.133 us -0.76% SAME
I8 I64 I64 2^20 0.201 25.730 us 1.46% 25.614 us 1.26% -0.117 us -0.45% SAME
I8 I64 I64 2^24 0.201 93.822 us 1.12% 85.021 us 0.94% -8.801 us -9.38% FAST
I8 I64 I64 2^28 0.201 1.141 ms 0.11% 978.476 us 0.11% -162.778 us -14.26% FAST
I16 I8 I32 2^16 1 20.747 us 5.03% 19.477 us 1.06% -1.270 us -6.12% FAST
I16 I8 I32 2^20 1 26.879 us 3.48% 25.773 us 5.98% -1.106 us -4.11% FAST
I16 I8 I32 2^24 1 119.725 us 0.30% 109.878 us 0.68% -9.847 us -8.22% FAST
I16 I8 I32 2^28 1 1.480 ms 0.06% 1.358 ms 0.08% -122.343 us -8.27% FAST
I16 I8 I32 2^16 0.201 19.833 us 4.06% 19.603 us 2.62% -0.231 us -1.16% SAME
I16 I8 I32 2^20 0.201 25.567 us 4.53% 24.731 us 4.23% -0.836 us -3.27% SAME
I16 I8 I32 2^24 0.201 106.502 us 0.81% 101.099 us 0.73% -5.403 us -5.07% FAST
I16 I8 I32 2^28 0.201 1.322 ms 0.09% 1.221 ms 0.09% -101.017 us -7.64% FAST
I16 I8 I64 2^16 1 20.884 us 5.24% 19.493 us 1.34% -1.392 us -6.66% FAST
I16 I8 I64 2^20 1 27.194 us 3.00% 25.544 us 3.55% -1.650 us -6.07% FAST
I16 I8 I64 2^24 1 119.760 us 0.34% 85.577 us 1.14% -34.184 us -28.54% FAST
I16 I8 I64 2^28 1 1.483 ms 0.07% 955.382 us 0.03% -527.904 us -35.59% FAST
I16 I8 I64 2^16 0.201 20.138 us 5.00% 19.760 us 3.65% -0.378 us -1.88% SAME
I16 I8 I64 2^20 0.201 25.526 us 4.81% 24.852 us 3.95% -0.674 us -2.64% SAME
I16 I8 I64 2^24 0.201 110.944 us 0.86% 78.853 us 0.13% -32.091 us -28.93% FAST
I16 I8 I64 2^28 0.201 1.375 ms 0.07% 848.957 us 0.06% -526.015 us -38.26% FAST
I16 I16 I32 2^16 1 20.467 us 5.29% 19.636 us 2.95% -0.831 us -4.06% FAST
I16 I16 I32 2^20 1 26.036 us 3.12% 26.913 us 3.82% 0.877 us 3.37% SLOW
I16 I16 I32 2^24 1 122.319 us 0.76% 85.362 us 1.02% -36.956 us -30.21% FAST
I16 I16 I32 2^28 1 1.513 ms 0.07% 942.845 us 0.08% -570.371 us -37.69% FAST
I16 I16 I32 2^16 0.201 20.266 us 5.17% 20.215 us 3.66% -0.051 us -0.25% SAME
I16 I16 I32 2^20 0.201 25.997 us 2.75% 24.426 us 4.68% -1.570 us -6.04% FAST
I16 I16 I32 2^24 0.201 112.969 us 0.88% 78.041 us 1.12% -34.928 us -30.92% FAST
I16 I16 I32 2^28 0.201 1.403 ms 0.09% 861.140 us 0.17% -542.300 us -38.64% FAST
I16 I16 I64 2^16 1 20.493 us 5.78% 19.785 us 3.06% -0.708 us -3.46% FAST
I16 I16 I64 2^20 1 27.186 us 2.99% 27.047 us 3.21% -0.139 us -0.51% SAME
I16 I16 I64 2^24 1 121.402 us 0.76% 86.975 us 0.48% -34.426 us -28.36% FAST
I16 I16 I64 2^28 1 1.494 ms 0.07% 949.526 us 0.08% -544.658 us -36.45% FAST
I16 I16 I64 2^16 0.201 20.421 us 4.68% 20.361 us 4.30% -0.060 us -0.29% SAME
I16 I16 I64 2^20 0.201 26.143 us 4.22% 24.342 us 3.91% -1.801 us -6.89% FAST
I16 I16 I64 2^24 0.201 112.418 us 1.21% 78.244 us 1.05% -34.174 us -30.40% FAST
I16 I16 I64 2^28 0.201 1.385 ms 0.09% 841.879 us 0.14% -543.426 us -39.23% FAST
I16 I32 I32 2^16 1 19.567 us 2.19% 19.730 us 2.98% 0.163 us 0.83% SAME
I16 I32 I32 2^20 1 27.444 us 2.20% 27.394 us 2.38% -0.050 us -0.18% SAME
I16 I32 I32 2^24 1 100.163 us 1.13% 80.377 us 1.40% -19.786 us -19.75% FAST
I16 I32 I32 2^28 1 1.185 ms 0.12% 867.267 us 0.16% -317.669 us -26.81% FAST
I16 I32 I32 2^16 0.201 20.034 us 3.75% 18.789 us 5.95% -1.245 us -6.21% FAST
I16 I32 I32 2^20 0.201 26.014 us 4.38% 24.476 us 4.23% -1.539 us -5.91% FAST
I16 I32 I32 2^24 0.201 94.900 us 0.75% 74.646 us 0.60% -20.253 us -21.34% FAST
I16 I32 I32 2^28 0.201 1.100 ms 0.12% 778.660 us 0.17% -321.777 us -29.24% FAST
I16 I32 I64 2^16 1 19.900 us 3.52% 19.925 us 3.58% 0.025 us 0.13% SAME
I16 I32 I64 2^20 1 27.597 us 1.25% 27.608 us 1.27% 0.011 us 0.04% SAME
I16 I32 I64 2^24 1 97.770 us 1.03% 80.249 us 1.52% -17.521 us -17.92% FAST
I16 I32 I64 2^28 1 1.154 ms 0.12% 868.068 us 0.16% -285.479 us -24.75% FAST
I16 I32 I64 2^16 0.201 20.415 us 4.67% 19.127 us 7.81% -1.288 us -6.31% FAST
I16 I32 I64 2^20 0.201 25.395 us 6.06% 24.999 us 4.05% -0.396 us -1.56% SAME
I16 I32 I64 2^24 0.201 92.904 us 0.79% 74.700 us 0.56% -18.204 us -19.59% FAST
I16 I32 I64 2^28 0.201 1.073 ms 0.14% 781.467 us 0.19% -291.187 us -27.15% FAST
I16 I64 I32 2^16 1 19.064 us 7.23% 19.542 us 5.57% 0.478 us 2.51% SAME
I16 I64 I32 2^20 1 29.372 us 2.70% 28.703 us 3.67% -0.669 us -2.28% SAME
I16 I64 I32 2^24 1 110.261 us 0.92% 102.636 us 1.04% -7.625 us -6.92% FAST
I16 I64 I32 2^28 1 1.372 ms 0.11% 1.248 ms 0.10% -123.652 us -9.01% FAST
I16 I64 I32 2^16 0.201 18.756 us 7.21% 18.946 us 7.60% 0.190 us 1.01% SAME
I16 I64 I32 2^20 0.201 26.390 us 3.28% 26.699 us 3.42% 0.309 us 1.17% SAME
I16 I64 I32 2^24 0.201 101.801 us 1.02% 92.935 us 0.76% -8.866 us -8.71% FAST
I16 I64 I32 2^28 0.201 1.234 ms 0.12% 1.080 ms 0.14% -153.345 us -12.43% FAST
I16 I64 I64 2^16 1 19.221 us 8.02% 19.841 us 5.61% 0.621 us 3.23% SAME
I16 I64 I64 2^20 1 29.680 us 1.32% 29.166 us 3.24% -0.514 us -1.73% FAST
I16 I64 I64 2^24 1 126.208 us 0.55% 103.635 us 0.60% -22.573 us -17.89% FAST
I16 I64 I64 2^28 1 1.648 ms 0.08% 1.262 ms 0.11% -386.549 us -23.45% FAST
I16 I64 I64 2^16 0.201 18.730 us 4.97% 19.158 us 7.26% 0.427 us 2.28% SAME
I16 I64 I64 2^20 0.201 29.271 us 2.61% 27.160 us 2.88% -2.111 us -7.21% FAST
I16 I64 I64 2^24 0.201 117.292 us 0.77% 93.138 us 0.51% -24.154 us -20.59% FAST
I16 I64 I64 2^28 0.201 1.498 ms 0.16% 1.090 ms 0.11% -407.691 us -27.22% FAST
I32 I8 I32 2^16 1 21.356 us 2.23% 21.481 us 1.03% 0.125 us 0.59% SAME
I32 I8 I32 2^20 1 28.224 us 3.47% 27.641 us 0.63% -0.583 us -2.06% FAST
I32 I8 I32 2^24 1 93.733 us 1.10% 85.925 us 0.84% -7.808 us -8.33% FAST
I32 I8 I32 2^28 1 1.070 ms 0.04% 969.626 us 0.08% -100.440 us -9.39% FAST
I32 I8 I32 2^16 0.201 21.036 us 3.71% 20.197 us 7.20% -0.840 us -3.99% FAST
I32 I8 I32 2^20 0.201 27.638 us 1.19% 25.962 us 3.65% -1.676 us -6.06% FAST
I32 I8 I32 2^24 0.201 85.091 us 0.48% 76.917 us 0.61% -8.174 us -9.61% FAST
I32 I8 I32 2^28 0.201 958.701 us 0.13% 843.184 us 0.17% -115.516 us -12.05% FAST
I32 I8 I64 2^16 1 21.466 us 1.48% 21.465 us 1.53% -0.000 us -0.00% SAME
I32 I8 I64 2^20 1 27.692 us 1.63% 27.731 us 1.36% 0.040 us 0.14% SAME
I32 I8 I64 2^24 1 93.656 us 1.62% 88.044 us 1.05% -5.613 us -5.99% FAST
I32 I8 I64 2^28 1 1.071 ms 0.11% 1.009 ms 0.14% -62.213 us -5.81% FAST
I32 I8 I64 2^16 0.201 21.289 us 2.48% 20.688 us 5.67% -0.601 us -2.82% FAST
I32 I8 I64 2^20 0.201 27.648 us 1.07% 26.893 us 3.93% -0.755 us -2.73% FAST
I32 I8 I64 2^24 0.201 85.190 us 0.64% 79.215 us 1.04% -5.975 us -7.01% FAST
I32 I8 I64 2^28 0.201 961.156 us 0.11% 874.524 us 0.19% -86.633 us -9.01% FAST
I32 I16 I32 2^16 1 21.495 us 0.88% 21.463 us 1.68% -0.032 us -0.15% SAME
I32 I16 I32 2^20 1 27.632 us 1.08% 27.648 us 0.57% 0.016 us 0.06% SAME
I32 I16 I32 2^24 1 98.551 us 1.20% 79.565 us 1.47% -18.987 us -19.27% FAST
I32 I16 I32 2^28 1 1.179 ms 0.10% 858.364 us 0.15% -320.490 us -27.19% FAST
I32 I16 I32 2^16 0.201 21.301 us 2.27% 20.503 us 6.10% -0.798 us -3.75% FAST
I32 I16 I32 2^20 0.201 26.803 us 3.97% 25.625 us 1.93% -1.177 us -4.39% FAST
I32 I16 I32 2^24 0.201 89.579 us 0.99% 70.810 us 0.69% -18.769 us -20.95% FAST
I32 I16 I32 2^28 0.201 1.046 ms 0.08% 744.869 us 0.14% -300.812 us -28.77% FAST
I32 I16 I64 2^16 1 21.398 us 1.71% 21.242 us 3.28% -0.156 us -0.73% SAME
I32 I16 I64 2^20 1 27.757 us 1.51% 27.662 us 1.01% -0.095 us -0.34% SAME
I32 I16 I64 2^24 1 98.674 us 0.86% 80.313 us 1.03% -18.361 us -18.61% FAST
I32 I16 I64 2^28 1 1.172 ms 0.12% 864.561 us 0.16% -307.245 us -26.22% FAST
I32 I16 I64 2^16 0.201 21.458 us 1.53% 20.773 us 5.19% -0.685 us -3.19% FAST
I32 I16 I64 2^20 0.201 26.621 us 4.10% 25.604 us 1.41% -1.017 us -3.82% FAST
I32 I16 I64 2^24 0.201 89.306 us 0.67% 71.060 us 1.05% -18.247 us -20.43% FAST
I32 I16 I64 2^28 0.201 1.043 ms 0.06% 748.964 us 0.13% -294.491 us -28.22% FAST
I32 I32 I32 2^16 1 19.479 us 3.40% 19.559 us 3.95% 0.081 us 0.41% SAME
I32 I32 I32 2^20 1 27.048 us 3.90% 26.570 us 4.28% -0.478 us -1.77% SAME
I32 I32 I32 2^24 1 94.504 us 1.27% 91.794 us 1.57% -2.710 us -2.87% FAST
I32 I32 I32 2^28 1 1.117 ms 0.10% 1.080 ms 0.10% -37.543 us -3.36% FAST
I32 I32 I32 2^16 0.201 19.282 us 3.01% 19.316 us 2.89% 0.034 us 0.18% SAME
I32 I32 I32 2^20 0.201 25.611 us 1.40% 25.580 us 0.62% -0.032 us -0.12% SAME
I32 I32 I32 2^24 0.201 83.137 us 0.77% 81.613 us 1.26% -1.525 us -1.83% FAST
I32 I32 I32 2^28 0.201 954.846 us 0.17% 936.245 us 0.15% -18.601 us -1.95% FAST
I32 I32 I64 2^16 1 19.842 us 5.03% 20.327 us 5.84% 0.484 us 2.44% SAME
I32 I32 I64 2^20 1 27.353 us 3.79% 27.390 us 3.39% 0.037 us 0.14% SAME
I32 I32 I64 2^24 1 95.071 us 0.60% 86.811 us 0.76% -8.260 us -8.69% FAST
I32 I32 I64 2^28 1 1.115 ms 0.10% 988.386 us 0.07% -126.699 us -11.36% FAST
I32 I32 I64 2^16 0.201 19.335 us 1.91% 19.330 us 2.27% -0.005 us -0.02% SAME
I32 I32 I64 2^20 0.201 25.709 us 2.21% 25.578 us 1.17% -0.131 us -0.51% SAME
I32 I32 I64 2^24 0.201 83.799 us 1.28% 76.461 us 1.37% -7.338 us -8.76% FAST
I32 I32 I64 2^28 0.201 967.087 us 0.15% 840.357 us 0.17% -126.730 us -13.10% FAST
I32 I64 I32 2^16 1 19.595 us 4.16% 20.276 us 5.75% 0.681 us 3.47% SAME
I32 I64 I32 2^20 1 27.653 us 0.84% 27.667 us 1.22% 0.014 us 0.05% SAME
I32 I64 I32 2^24 1 119.527 us 0.61% 108.488 us 1.04% -11.039 us -9.24% FAST
I32 I64 I32 2^28 1 1.535 ms 0.10% 1.365 ms 0.10% -170.528 us -11.11% FAST
I32 I64 I32 2^16 0.201 19.387 us 1.73% 19.525 us 2.44% 0.138 us 0.71% SAME
I32 I64 I32 2^20 0.201 27.636 us 1.05% 27.627 us 1.68% -0.009 us -0.03% SAME
I32 I64 I32 2^24 0.201 105.714 us 0.65% 96.697 us 1.20% -9.017 us -8.53% FAST
I32 I64 I32 2^28 0.201 1.325 ms 0.12% 1.183 ms 0.10% -142.575 us -10.76% FAST
I32 I64 I64 2^16 1 19.675 us 3.85% 20.295 us 5.52% 0.620 us 3.15% SAME
I32 I64 I64 2^20 1 27.854 us 1.78% 27.805 us 1.66% -0.049 us -0.18% SAME
I32 I64 I64 2^24 1 119.591 us 0.51% 108.895 us 0.96% -10.696 us -8.94% FAST
I32 I64 I64 2^28 1 1.534 ms 0.10% 1.369 ms 0.09% -165.282 us -10.77% FAST
I32 I64 I64 2^16 0.201 19.420 us 1.49% 19.580 us 2.78% 0.160 us 0.82% SAME
I32 I64 I64 2^20 0.201 27.670 us 1.10% 27.474 us 2.36% -0.195 us -0.71% SAME
I32 I64 I64 2^24 0.201 104.465 us 1.18% 96.979 us 1.21% -7.486 us -7.17% FAST
I32 I64 I64 2^28 0.201 1.317 ms 0.13% 1.184 ms 0.11% -133.213 us -10.11% FAST
I64 I8 I32 2^16 1 20.731 us 6.37% 21.602 us 6.36% 0.871 us 4.20% SAME
I64 I8 I32 2^20 1 32.706 us 3.68% 31.438 us 2.20% -1.269 us -3.88% FAST
I64 I8 I32 2^24 1 128.047 us 0.50% 101.601 us 0.60% -26.446 us -20.65% FAST
I64 I8 I32 2^28 1 1.643 ms 0.08% 1.241 ms 0.10% -401.649 us -24.45% FAST
I64 I8 I32 2^16 0.201 19.517 us 2.20% 19.981 us 3.85% 0.464 us 2.38% SLOW
I64 I8 I32 2^20 0.201 29.839 us 4.56% 28.309 us 2.38% -1.530 us -5.13% FAST
I64 I8 I32 2^24 0.201 111.573 us 0.54% 87.478 us 1.05% -24.095 us -21.60% FAST
I64 I8 I32 2^28 0.201 1.403 ms 0.11% 1.006 ms 0.11% -396.879 us -28.28% FAST
I64 I8 I64 2^16 1 20.745 us 6.36% 21.746 us 6.55% 1.001 us 4.83% SAME
I64 I8 I64 2^20 1 31.505 us 1.99% 31.667 us 1.42% 0.163 us 0.52% SAME
I64 I8 I64 2^24 1 115.984 us 0.59% 103.776 us 0.57% -12.208 us -10.53% FAST
I64 I8 I64 2^28 1 1.455 ms 0.07% 1.265 ms 0.07% -190.370 us -13.08% FAST
I64 I8 I64 2^16 0.201 19.854 us 4.33% 20.166 us 4.30% 0.312 us 1.57% SAME
I64 I8 I64 2^20 0.201 28.735 us 3.20% 28.329 us 2.90% -0.406 us -1.41% SAME
I64 I8 I64 2^24 0.201 99.270 us 0.36% 88.955 us 0.64% -10.315 us -10.39% FAST
I64 I8 I64 2^28 0.201 1.176 ms 0.09% 1.019 ms 0.10% -157.458 us -13.39% FAST
I64 I16 I32 2^16 1 20.876 us 6.10% 21.116 us 4.95% 0.240 us 1.15% SAME
I64 I16 I32 2^20 1 33.286 us 2.54% 30.757 us 2.72% -2.529 us -7.60% FAST
I64 I16 I32 2^24 1 130.233 us 0.42% 103.468 us 0.28% -26.765 us -20.55% FAST
I64 I16 I32 2^28 1 1.679 ms 0.08% 1.262 ms 0.10% -417.073 us -24.84% FAST
I64 I16 I32 2^16 0.201 19.668 us 3.30% 19.799 us 4.31% 0.131 us 0.67% SAME
I64 I16 I32 2^20 0.201 30.100 us 3.06% 28.426 us 3.37% -1.674 us -5.56% FAST
I64 I16 I32 2^24 0.201 114.776 us 0.94% 89.118 us 0.53% -25.658 us -22.36% FAST
I64 I16 I32 2^28 0.201 1.463 ms 0.18% 1.031 ms 0.11% -431.753 us -29.51% FAST
I64 I16 I64 2^16 1 21.044 us 6.41% 21.182 us 5.18% 0.139 us 0.66% SAME
I64 I16 I64 2^20 1 33.117 us 3.28% 31.633 us 1.61% -1.485 us -4.48% FAST
I64 I16 I64 2^24 1 130.331 us 0.56% 104.935 us 1.20% -25.396 us -19.49% FAST
I64 I16 I64 2^28 1 1.684 ms 0.08% 1.285 ms 0.09% -399.471 us -23.72% FAST
I64 I16 I64 2^16 0.201 19.632 us 2.62% 19.618 us 2.71% -0.014 us -0.07% SAME
I64 I16 I64 2^20 0.201 30.404 us 3.42% 28.482 us 3.05% -1.922 us -6.32% FAST
I64 I16 I64 2^24 0.201 114.028 us 0.84% 89.418 us 1.12% -24.610 us -21.58% FAST
I64 I16 I64 2^28 0.201 1.454 ms 0.13% 1.042 ms 0.11% -412.079 us -28.35% FAST
I64 I32 I32 2^16 1 21.544 us 6.34% 22.000 us 5.01% 0.456 us 2.12% SAME
I64 I32 I32 2^20 1 31.138 us 2.94% 31.454 us 2.25% 0.316 us 1.02% SAME
I64 I32 I32 2^24 1 125.856 us 1.07% 114.141 us 0.93% -11.715 us -9.31% FAST
I64 I32 I32 2^28 1 1.582 ms 0.11% 1.428 ms 0.11% -153.892 us -9.73% FAST
I64 I32 I32 2^16 0.201 20.047 us 4.53% 20.603 us 5.79% 0.556 us 2.78% SAME
I64 I32 I32 2^20 0.201 28.247 us 2.78% 28.239 us 2.73% -0.008 us -0.03% SAME
I64 I32 I32 2^24 0.201 106.434 us 1.02% 97.284 us 0.16% -9.150 us -8.60% FAST
I64 I32 I32 2^28 0.201 1.321 ms 0.14% 1.171 ms 0.11% -149.867 us -11.34% FAST
I64 I32 I64 2^16 1 21.592 us 7.25% 22.129 us 6.16% 0.537 us 2.49% SAME
I64 I32 I64 2^20 1 31.547 us 1.82% 31.489 us 2.08% -0.058 us -0.18% SAME
I64 I32 I64 2^24 1 126.256 us 1.01% 114.543 us 0.77% -11.713 us -9.28% FAST
I64 I32 I64 2^28 1 1.592 ms 0.11% 1.438 ms 0.09% -153.687 us -9.66% FAST
I64 I32 I64 2^16 0.201 20.419 us 4.29% 20.686 us 4.45% 0.266 us 1.30% SAME
I64 I32 I64 2^20 0.201 28.346 us 2.98% 28.345 us 2.95% -0.001 us -0.00% SAME
I64 I32 I64 2^24 0.201 106.851 us 0.94% 97.442 us 0.67% -9.409 us -8.81% FAST
I64 I32 I64 2^28 0.201 1.325 ms 0.13% 1.182 ms 0.12% -143.340 us -10.82% FAST
I64 I64 I32 2^16 1 19.602 us 2.47% 19.584 us 2.57% -0.018 us -0.09% SAME
I64 I64 I32 2^20 1 33.343 us 2.39% 33.256 us 2.63% -0.087 us -0.26% SAME
I64 I64 I32 2^24 1 150.748 us 0.76% 143.859 us 0.91% -6.888 us -4.57% FAST
I64 I64 I32 2^28 1 2.054 ms 0.06% 1.950 ms 0.07% -104.312 us -5.08% FAST
I64 I64 I32 2^16 0.201 19.633 us 2.42% 19.539 us 2.12% -0.095 us -0.48% SAME
I64 I64 I32 2^20 0.201 29.919 us 3.19% 29.571 us 3.83% -0.348 us -1.16% SAME
I64 I64 I32 2^24 0.201 123.993 us 0.66% 124.199 us 0.59% 0.206 us 0.17% SAME
I64 I64 I32 2^28 0.201 1.637 ms 0.11% 1.640 ms 0.11% 2.544 us 0.16% SLOW
I64 I64 I64 2^16 1 19.926 us 3.65% 19.771 us 3.21% -0.155 us -0.78% SAME
I64 I64 I64 2^20 1 33.623 us 1.66% 33.614 us 1.73% -0.009 us -0.03% SAME
I64 I64 I64 2^24 1 151.622 us 0.56% 144.549 us 0.95% -7.073 us -4.67% FAST
I64 I64 I64 2^28 1 2.063 ms 0.07% 1.956 ms 0.08% -106.401 us -5.16% FAST
I64 I64 I64 2^16 0.201 19.708 us 2.76% 19.577 us 2.88% -0.131 us -0.66% SAME
I64 I64 I64 2^20 0.201 30.437 us 3.36% 29.943 us 3.92% -0.494 us -1.62% SAME
I64 I64 I64 2^24 0.201 123.912 us 0.89% 124.382 us 0.54% 0.469 us 0.38% SAME
I64 I64 I64 2^28 0.201 1.638 ms 0.10% 1.651 ms 0.09% 12.689 us 0.77% SLOW
I128 I8 I32 2^16 1 23.162 us 3.49% 23.456 us 1.87% 0.294 us 1.27% SAME
I128 I8 I32 2^20 1 37.994 us 1.21% 36.079 us 1.91% -1.915 us -5.04% FAST
I128 I8 I32 2^24 1 212.834 us 0.58% 181.584 us 0.35% -31.250 us -14.68% FAST
I128 I8 I32 2^28 1 3.014 ms 0.04% 2.517 ms 0.05% -496.780 us -16.48% FAST
I128 I8 I32 2^16 0.201 21.377 us 2.43% 21.389 us 2.37% 0.012 us 0.06% SAME
I128 I8 I32 2^20 0.201 33.802 us 0.87% 32.143 us 2.58% -1.659 us -4.91% FAST
I128 I8 I32 2^24 0.201 154.930 us 0.43% 137.953 us 0.81% -16.977 us -10.96% FAST
I128 I8 I32 2^28 0.201 2.112 ms 0.07% 1.830 ms 0.10% -281.463 us -13.33% FAST
I128 I8 I64 2^16 1 22.981 us 4.61% 23.553 us 2.55% 0.573 us 2.49% SAME
I128 I8 I64 2^20 1 38.254 us 1.91% 36.419 us 2.47% -1.835 us -4.80% FAST
I128 I8 I64 2^24 1 212.536 us 0.33% 183.893 us 0.49% -28.642 us -13.48% FAST
I128 I8 I64 2^28 1 3.000 ms 0.04% 2.552 ms 0.05% -448.314 us -14.94% FAST
I128 I8 I64 2^16 0.201 21.374 us 1.75% 21.519 us 1.88% 0.145 us 0.68% SAME
I128 I8 I64 2^20 0.201 33.815 us 0.86% 32.070 us 2.47% -1.745 us -5.16% FAST
I128 I8 I64 2^24 0.201 153.549 us 0.73% 138.200 us 0.76% -15.349 us -10.00% FAST
I128 I8 I64 2^28 0.201 2.088 ms 0.07% 1.834 ms 0.10% -254.542 us -12.19% FAST
I128 I16 I32 2^16 1 23.480 us 2.61% 23.289 us 2.12% -0.192 us -0.82% SAME
I128 I16 I32 2^20 1 37.323 us 2.83% 36.366 us 2.64% -0.957 us -2.57% SAME
I128 I16 I32 2^24 1 208.270 us 0.40% 185.072 us 0.62% -23.198 us -11.14% FAST
I128 I16 I32 2^28 1 2.959 ms 0.04% 2.561 ms 0.05% -398.545 us -13.47% FAST
I128 I16 I32 2^16 0.201 21.516 us 2.86% 21.450 us 2.22% -0.067 us -0.31% SAME
I128 I16 I32 2^20 0.201 33.716 us 1.40% 33.034 us 3.32% -0.683 us -2.02% FAST
I128 I16 I32 2^24 0.201 149.168 us 0.61% 140.805 us 0.66% -8.362 us -5.61% FAST
I128 I16 I32 2^28 0.201 2.025 ms 0.07% 1.887 ms 0.08% -138.230 us -6.82% FAST
I128 I16 I64 2^16 1 23.326 us 3.25% 23.492 us 1.51% 0.165 us 0.71% SAME
I128 I16 I64 2^20 1 38.171 us 1.57% 36.742 us 49.39% -1.430 us -3.75% FAST
I128 I16 I64 2^24 1 214.786 us 0.47% 189.412 us 32.62% -25.374 us -11.81% FAST
I128 I16 I64 2^28 1 3.037 ms 0.05% 2.604 ms 0.05% -433.539 us -14.27% FAST
I128 I16 I64 2^16 0.201 21.495 us 2.05% 21.400 us 2.06% -0.094 us -0.44% SAME
I128 I16 I64 2^20 0.201 33.819 us 1.03% 33.306 us 2.87% -0.513 us -1.52% FAST
I128 I16 I64 2^24 0.201 159.401 us 0.67% 141.872 us 0.84% -17.528 us -11.00% FAST
I128 I16 I64 2^28 0.201 2.182 ms 0.06% 1.893 ms 0.07% -288.433 us -13.22% FAST
I128 I32 I32 2^16 1 23.426 us 2.07% 23.560 us 1.49% 0.133 us 0.57% SAME
I128 I32 I32 2^20 1 37.808 us 1.92% 37.349 us 2.69% -0.459 us -1.22% SAME
I128 I32 I32 2^24 1 210.290 us 0.31% 185.662 us 0.39% -24.628 us -11.71% FAST
I128 I32 I32 2^28 1 2.976 ms 0.04% 2.571 ms 0.05% -404.579 us -13.59% FAST
I128 I32 I32 2^16 0.201 21.358 us 2.93% 21.527 us 3.12% 0.169 us 0.79% SAME
I128 I32 I32 2^20 0.201 33.777 us 1.26% 33.535 us 2.40% -0.243 us -0.72% SAME
I128 I32 I32 2^24 0.201 151.291 us 0.69% 144.620 us 0.44% -6.671 us -4.41% FAST
I128 I32 I32 2^28 0.201 2.067 ms 0.08% 1.951 ms 0.09% -115.674 us -5.60% FAST
I128 I32 I64 2^16 1 23.358 us 2.13% 23.447 us 1.50% 0.089 us 0.38% SAME
I128 I32 I64 2^20 1 38.546 us 2.56% 36.922 us 3.01% -1.624 us -4.21% FAST
I128 I32 I64 2^24 1 216.539 us 0.38% 187.892 us 0.44% -28.647 us -13.23% FAST
I128 I32 I64 2^28 1 3.060 ms 0.04% 2.615 ms 0.05% -445.202 us -14.55% FAST
I128 I32 I64 2^16 0.201 21.412 us 2.50% 21.399 us 1.91% -0.013 us -0.06% SAME
I128 I32 I64 2^20 0.201 33.822 us 0.85% 33.727 us 1.82% -0.096 us -0.28% SAME
I128 I32 I64 2^24 0.201 163.261 us 0.47% 145.311 us 0.76% -17.950 us -10.99% FAST
I128 I32 I64 2^28 0.201 2.245 ms 0.09% 1.958 ms 0.08% -286.727 us -12.77% FAST
I128 I64 I32 2^16 1 21.820 us 4.92% 22.469 us 5.28% 0.649 us 2.98% SAME
I128 I64 I32 2^20 1 38.071 us 1.41% 38.008 us 1.04% -0.063 us -0.17% SAME
I128 I64 I32 2^24 1 238.597 us 0.08% 213.962 us 0.19% -24.634 us -10.32% FAST
I128 I64 I32 2^28 1 3.464 ms 0.04% 3.056 ms 0.05% -408.034 us -11.78% FAST
I128 I64 I32 2^16 0.201 21.498 us 1.08% 21.435 us 1.84% -0.062 us -0.29% SAME
I128 I64 I32 2^20 0.201 35.849 us 0.52% 34.648 us 3.26% -1.201 us -3.35% FAST
I128 I64 I32 2^24 0.201 193.443 us 0.32% 181.097 us 0.40% -12.346 us -6.38% FAST
I128 I64 I32 2^28 0.201 2.746 ms 0.06% 2.547 ms 0.08% -199.640 us -7.27% FAST
I128 I64 I64 2^16 1 22.034 us 4.49% 22.358 us 4.94% 0.324 us 1.47% SAME
I128 I64 I64 2^20 1 38.157 us 1.51% 38.139 us 1.34% -0.018 us -0.05% SAME
I128 I64 I64 2^24 1 238.490 us 0.17% 215.981 us 0.19% -22.508 us -9.44% FAST
I128 I64 I64 2^28 1 3.469 ms 0.04% 3.098 ms 0.05% -371.141 us -10.70% FAST
I128 I64 I64 2^16 0.201 21.456 us 1.56% 21.469 us 1.63% 0.013 us 0.06% SAME
I128 I64 I64 2^20 0.201 35.854 us 1.15% 35.156 us 3.05% -0.698 us -1.95% FAST
I128 I64 I64 2^24 0.201 192.928 us 0.56% 182.664 us 0.62% -10.264 us -5.32% FAST
I128 I64 I64 2^28 0.201 2.741 ms 0.06% 2.575 ms 0.07% -165.868 us -6.05% FAST
F32 I8 I32 2^16 1 21.332 us 1.92% 21.249 us 2.35% -0.084 us -0.39% SAME
F32 I8 I32 2^20 1 27.770 us 1.89% 27.641 us 1.38% -0.130 us -0.47% SAME
F32 I8 I32 2^24 1 93.870 us 1.08% 85.767 us 0.96% -8.103 us -8.63% FAST
F32 I8 I32 2^28 1 1.070 ms 0.11% 969.081 us 0.13% -100.547 us -9.40% FAST
F32 I8 I32 2^16 0.201 21.113 us 3.49% 20.133 us 7.21% -0.981 us -4.64% FAST
F32 I8 I32 2^20 0.201 27.610 us 1.30% 25.838 us 2.78% -1.772 us -6.42% FAST
F32 I8 I32 2^24 0.201 85.116 us 0.59% 77.009 us 0.78% -8.107 us -9.52% FAST
F32 I8 I32 2^28 0.201 957.571 us 0.13% 842.129 us 0.17% -115.442 us -12.06% FAST
F32 I8 I64 2^16 1 21.436 us 1.74% 21.448 us 1.60% 0.011 us 0.05% SAME
F32 I8 I64 2^20 1 27.703 us 1.64% 27.640 us 1.30% -0.063 us -0.23% SAME
F32 I8 I64 2^24 1 93.253 us 1.50% 88.175 us 1.01% -5.078 us -5.45% FAST
F32 I8 I64 2^28 1 1.071 ms 0.11% 1.009 ms 0.14% -62.284 us -5.82% FAST
F32 I8 I64 2^16 0.201 21.324 us 2.58% 20.339 us 6.41% -0.985 us -4.62% FAST
F32 I8 I64 2^20 0.201 27.663 us 1.08% 26.813 us 3.99% -0.849 us -3.07% FAST
F32 I8 I64 2^24 0.201 85.179 us 0.54% 79.189 us 0.89% -5.990 us -7.03% FAST
F32 I8 I64 2^28 0.201 960.252 us 0.12% 873.979 us 0.20% -86.273 us -8.98% FAST
F32 I16 I32 2^16 1 21.427 us 2.16% 21.492 us 1.02% 0.065 us 0.30% SAME
F32 I16 I32 2^20 1 27.641 us 0.57% 27.728 us 1.71% 0.087 us 0.31% SAME
F32 I16 I32 2^24 1 98.320 us 1.22% 79.669 us 1.16% -18.652 us -18.97% FAST
F32 I16 I32 2^28 1 1.179 ms 0.11% 858.997 us 0.17% -320.073 us -27.15% FAST
F32 I16 I32 2^16 0.201 21.398 us 1.92% 20.517 us 6.03% -0.882 us -4.12% FAST
F32 I16 I32 2^20 0.201 26.707 us 4.04% 25.603 us 1.07% -1.103 us -4.13% FAST
F32 I16 I32 2^24 0.201 89.449 us 0.91% 70.718 us 0.46% -18.731 us -20.94% FAST
F32 I16 I32 2^28 0.201 1.046 ms 0.09% 745.020 us 0.15% -300.893 us -28.77% FAST
F32 I16 I64 2^16 1 21.477 us 1.74% 21.494 us 1.65% 0.018 us 0.08% SAME
F32 I16 I64 2^20 1 27.714 us 1.26% 27.749 us 1.52% 0.034 us 0.12% SAME
F32 I16 I64 2^24 1 98.616 us 0.93% 80.195 us 1.05% -18.421 us -18.68% FAST
F32 I16 I64 2^28 1 1.173 ms 0.13% 865.665 us 0.16% -306.991 us -26.18% FAST
F32 I16 I64 2^16 0.201 21.379 us 1.69% 20.738 us 5.09% -0.642 us -3.00% FAST
F32 I16 I64 2^20 0.201 26.277 us 3.95% 25.578 us 1.43% -0.699 us -2.66% FAST
F32 I16 I64 2^24 0.201 89.223 us 0.63% 70.872 us 1.02% -18.351 us -20.57% FAST
F32 I16 I64 2^28 0.201 1.044 ms 0.06% 748.801 us 0.12% -294.700 us -28.24% FAST
F32 I32 I32 2^16 1 19.601 us 4.14% 19.790 us 5.13% 0.189 us 0.97% SAME
F32 I32 I32 2^20 1 26.420 us 4.25% 27.090 us 3.70% 0.670 us 2.54% SAME
F32 I32 I32 2^24 1 94.086 us 1.27% 91.426 us 1.74% -2.659 us -2.83% FAST
F32 I32 I32 2^28 1 1.118 ms 0.10% 1.081 ms 0.10% -37.266 us -3.33% FAST
F32 I32 I32 2^16 0.201 19.380 us 2.64% 19.387 us 2.13% 0.006 us 0.03% SAME
F32 I32 I32 2^20 0.201 25.594 us 0.72% 25.584 us 1.48% -0.011 us -0.04% SAME
F32 I32 I32 2^24 0.201 83.144 us 0.79% 81.600 us 1.26% -1.544 us -1.86% FAST
F32 I32 I32 2^28 0.201 954.704 us 0.18% 936.290 us 0.16% -18.414 us -1.93% FAST
F32 I32 I64 2^16 1 19.968 us 5.40% 19.578 us 4.27% -0.389 us -1.95% SAME
F32 I32 I64 2^20 1 27.693 us 2.01% 26.581 us 4.20% -1.112 us -4.02% FAST
F32 I32 I64 2^24 1 94.839 us 0.83% 86.321 us 1.27% -8.518 us -8.98% FAST
F32 I32 I64 2^28 1 1.116 ms 0.09% 988.800 us 0.11% -126.887 us -11.37% FAST
F32 I32 I64 2^16 0.201 19.389 us 1.79% 19.475 us 1.83% 0.086 us 0.44% SAME
F32 I32 I64 2^20 0.201 25.619 us 1.30% 25.581 us 1.30% -0.038 us -0.15% SAME
F32 I32 I64 2^24 0.201 83.611 us 1.18% 76.647 us 1.29% -6.964 us -8.33% FAST
F32 I32 I64 2^28 0.201 966.878 us 0.15% 840.504 us 0.18% -126.373 us -13.07% FAST
F32 I64 I32 2^16 1 19.826 us 4.06% 20.168 us 5.47% 0.342 us 1.72% SAME
F32 I64 I32 2^20 1 27.681 us 1.25% 27.650 us 1.25% -0.031 us -0.11% SAME
F32 I64 I32 2^24 1 119.665 us 0.46% 108.294 us 1.02% -11.371 us -9.50% FAST
F32 I64 I32 2^28 1 1.536 ms 0.09% 1.366 ms 0.10% -169.940 us -11.06% FAST
F32 I64 I32 2^16 0.201 19.465 us 1.72% 19.479 us 2.31% 0.013 us 0.07% SAME
F32 I64 I32 2^20 0.201 27.660 us 1.36% 27.626 us 1.10% -0.034 us -0.12% SAME
F32 I64 I32 2^24 0.201 105.576 us 0.54% 96.178 us 1.15% -9.399 us -8.90% FAST
F32 I64 I32 2^28 0.201 1.326 ms 0.11% 1.183 ms 0.08% -142.907 us -10.78% FAST
F32 I64 I64 2^16 1 19.603 us 3.10% 20.391 us 5.88% 0.788 us 4.02% SLOW
F32 I64 I64 2^20 1 27.837 us 1.70% 27.859 us 1.82% 0.021 us 0.08% SAME
F32 I64 I64 2^24 1 119.748 us 0.36% 108.830 us 0.98% -10.919 us -9.12% FAST
F32 I64 I64 2^28 1 1.535 ms 0.09% 1.370 ms 0.10% -164.524 us -10.72% FAST
F32 I64 I64 2^16 0.201 19.362 us 1.73% 19.460 us 1.86% 0.098 us 0.50% SAME
F32 I64 I64 2^20 0.201 27.664 us 1.00% 27.655 us 1.00% -0.009 us -0.03% SAME
F32 I64 I64 2^24 0.201 104.870 us 1.11% 96.796 us 1.17% -8.074 us -7.70% FAST
F32 I64 I64 2^28 0.201 1.318 ms 0.12% 1.185 ms 0.08% -133.545 us -10.13% FAST
F64 I8 I32 2^16 1 20.494 us 6.05% 21.326 us 6.42% 0.832 us 4.06% SAME
F64 I8 I32 2^20 1 32.796 us 3.57% 31.171 us 2.69% -1.626 us -4.96% FAST
F64 I8 I32 2^24 1 126.514 us 0.76% 101.427 us 0.29% -25.087 us -19.83% FAST
F64 I8 I32 2^28 1 1.633 ms 0.08% 1.232 ms 0.07% -400.888 us -24.55% FAST
F64 I8 I32 2^16 0.201 19.556 us 2.31% 19.885 us 3.57% 0.329 us 1.68% SAME
F64 I8 I32 2^20 0.201 29.548 us 3.98% 28.159 us 2.52% -1.388 us -4.70% FAST
F64 I8 I32 2^24 0.201 110.918 us 0.91% 87.064 us 0.48% -23.854 us -21.51% FAST
F64 I8 I32 2^28 0.201 1.381 ms 0.11% 998.460 us 0.12% -382.354 us -27.69% FAST
F64 I8 I64 2^16 1 20.570 us 5.99% 21.548 us 6.11% 0.978 us 4.75% SAME
F64 I8 I64 2^20 1 31.589 us 1.61% 31.582 us 1.85% -0.007 us -0.02% SAME
F64 I8 I64 2^24 1 115.922 us 0.47% 103.629 us 0.56% -12.293 us -10.60% FAST
F64 I8 I64 2^28 1 1.455 ms 0.07% 1.258 ms 0.11% -197.934 us -13.60% FAST
F64 I8 I64 2^16 0.201 19.812 us 4.30% 20.020 us 3.96% 0.209 us 1.05% SAME
F64 I8 I64 2^20 0.201 28.536 us 3.16% 28.542 us 3.04% 0.006 us 0.02% SAME
F64 I8 I64 2^24 0.201 99.083 us 0.64% 87.677 us 1.27% -11.406 us -11.51% FAST
F64 I8 I64 2^28 0.201 1.171 ms 0.11% 1.009 ms 0.11% -162.322 us -13.86% FAST
F64 I16 I32 2^16 1 21.181 us 6.03% 21.270 us 5.68% 0.090 us 0.42% SAME
F64 I16 I32 2^20 1 33.049 us 2.65% 30.893 us 2.92% -2.156 us -6.52% FAST
F64 I16 I32 2^24 1 130.322 us 0.55% 103.603 us 0.56% -26.719 us -20.50% FAST
F64 I16 I32 2^28 1 1.679 ms 0.08% 1.256 ms 0.07% -423.320 us -25.21% FAST
F64 I16 I32 2^16 0.201 20.010 us 4.58% 19.998 us 4.91% -0.012 us -0.06% SAME
F64 I16 I32 2^20 0.201 30.270 us 3.32% 28.322 us 2.37% -1.947 us -6.43% FAST
F64 I16 I32 2^24 0.201 113.603 us 0.40% 89.075 us 0.29% -24.528 us -21.59% FAST
F64 I16 I32 2^28 0.201 1.433 ms 0.16% 1.026 ms 0.13% -406.429 us -28.37% FAST
F64 I16 I64 2^16 1 21.308 us 7.26% 21.308 us 5.96% 0.000 us 0.00% SAME
F64 I16 I64 2^20 1 33.253 us 2.70% 31.628 us 1.70% -1.625 us -4.89% FAST
F64 I16 I64 2^24 1 130.308 us 0.43% 104.561 us 1.12% -25.748 us -19.76% FAST
F64 I16 I64 2^28 1 1.678 ms 0.08% 1.277 ms 0.09% -400.686 us -23.88% FAST
F64 I16 I64 2^16 0.201 20.150 us 4.43% 19.886 us 4.17% -0.264 us -1.31% SAME
F64 I16 I64 2^20 0.201 30.369 us 3.16% 28.560 us 3.14% -1.809 us -5.96% FAST
F64 I16 I64 2^24 0.201 113.395 us 0.64% 89.235 us 0.75% -24.160 us -21.31% FAST
F64 I16 I64 2^28 0.201 1.424 ms 0.11% 1.033 ms 0.11% -391.363 us -27.48% FAST
F64 I32 I32 2^16 1 20.759 us 6.60% 21.484 us 5.52% 0.726 us 3.50% SAME
F64 I32 I32 2^20 1 31.055 us 2.86% 30.949 us 2.86% -0.106 us -0.34% SAME
F64 I32 I32 2^24 1 125.832 us 1.06% 113.881 us 1.16% -11.951 us -9.50% FAST
F64 I32 I32 2^28 1 1.580 ms 0.11% 1.419 ms 0.08% -161.422 us -10.21% FAST
F64 I32 I32 2^16 0.201 19.629 us 3.04% 20.189 us 4.08% 0.561 us 2.86% SAME
F64 I32 I32 2^20 0.201 28.227 us 2.22% 27.948 us 2.35% -0.279 us -0.99% SAME
F64 I32 I32 2^24 0.201 105.608 us 0.52% 97.072 us 0.75% -8.536 us -8.08% FAST
F64 I32 I32 2^28 0.201 1.311 ms 0.14% 1.163 ms 0.10% -147.990 us -11.29% FAST
F64 I32 I64 2^16 1 21.460 us 7.49% 21.817 us 6.71% 0.357 us 1.66% SAME
F64 I32 I64 2^20 1 31.534 us 1.89% 31.497 us 2.12% -0.036 us -0.12% SAME
F64 I32 I64 2^24 1 125.999 us 1.05% 114.713 us 0.89% -11.285 us -8.96% FAST
F64 I32 I64 2^28 1 1.588 ms 0.11% 1.425 ms 0.08% -163.308 us -10.28% FAST
F64 I32 I64 2^16 0.201 19.708 us 2.88% 20.222 us 4.65% 0.514 us 2.61% SAME
F64 I32 I64 2^20 0.201 28.695 us 3.21% 28.421 us 2.97% -0.274 us -0.95% SAME
F64 I32 I64 2^24 0.201 105.836 us 0.82% 97.296 us 0.36% -8.540 us -8.07% FAST
F64 I32 I64 2^28 0.201 1.316 ms 0.14% 1.173 ms 0.09% -143.279 us -10.89% FAST
F64 I64 I32 2^16 1 19.714 us 2.76% 19.971 us 3.58% 0.256 us 1.30% SAME
F64 I64 I32 2^20 1 32.972 us 2.63% 32.952 us 3.44% -0.020 us -0.06% SAME
F64 I64 I32 2^24 1 150.823 us 0.82% 143.043 us 0.59% -7.780 us -5.16% FAST
F64 I64 I32 2^28 1 2.051 ms 0.06% 1.936 ms 0.07% -114.969 us -5.61% FAST
F64 I64 I32 2^16 0.201 19.577 us 2.69% 19.464 us 1.43% -0.113 us -0.58% SAME
F64 I64 I32 2^20 0.201 29.483 us 3.82% 29.063 us 4.26% -0.419 us -1.42% SAME
F64 I64 I32 2^24 0.201 123.179 us 0.85% 123.197 us 0.92% 0.019 us 0.02% SAME
F64 I64 I32 2^28 0.201 1.631 ms 0.11% 1.631 ms 0.11% -0.398 us -0.02% SAME
F64 I64 I64 2^16 1 19.919 us 3.59% 20.174 us 4.38% 0.255 us 1.28% SAME
F64 I64 I64 2^20 1 33.581 us 1.76% 33.153 us 3.05% -0.428 us -1.27% SAME
F64 I64 I64 2^24 1 151.694 us 0.61% 143.762 us 0.84% -7.932 us -5.23% FAST
F64 I64 I64 2^28 1 2.063 ms 0.07% 1.946 ms 0.08% -117.088 us -5.68% FAST
F64 I64 I64 2^16 0.201 19.689 us 2.99% 19.624 us 2.85% -0.065 us -0.33% SAME
F64 I64 I64 2^20 0.201 29.958 us 3.42% 29.887 us 4.21% -0.070 us -0.24% SAME
F64 I64 I64 2^24 0.201 124.044 us 0.76% 124.215 us 0.50% 0.171 us 0.14% SAME
F64 I64 I64 2^28 0.201 1.636 ms 0.10% 1.644 ms 0.09% 8.194 us 0.50% SLOW

Summary

  • Total Matches: 448
    • Pass (diff <= min_noise): 167
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 281
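
For reading the Status column: below is a minimal sketch of how the rows appear to be classified. It is inferred from the `diff <= min_noise` wording in the summary and from the rows above, not taken from the comparison script. `classify` and its parameters are hypothetical names, and using the smaller of the two noise values as the threshold is an assumption.

```python
def classify(ref_time, cmp_time, ref_noise, cmp_noise):
    """Return 'SAME', 'FAST' or 'SLOW' for one benchmark row.

    Both times must use the same unit; noise values are fractions
    (4.25% -> 0.0425).
    Assumption: the threshold is the smaller of the two noise values.
    """
    frac_diff = (cmp_time - ref_time) / ref_time
    min_noise = min(ref_noise, cmp_noise)
    if abs(frac_diff) <= min_noise:
        return "SAME"
    # Outside the noise band: the sign decides between speedup and slowdown.
    return "FAST" if frac_diff < 0 else "SLOW"


# Three rows copied from the table above (times in us, noise as fractions).
rows = [
    (26.420, 27.090, 0.0425, 0.0370),  # F32 I32 I32 2^20 1
    (94.086, 91.426, 0.0127, 0.0174),  # F32 I32 I32 2^24 1
    (19.603, 20.391, 0.0310, 0.0588),  # F32 I64 I64 2^16 1
]
for row in rows:
    print(classify(*row))
```

Under this reading, the "Failure" count covers every row outside the noise band, so it would include FAST rows as well as SLOW ones.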

H200 (UBLKCPY)

['/home/pgrossebley/SM_90_merge_pairs_final_old.json', '/home/pgrossebley/SM_90_merge_pairs_final_newest.json']

base

[0] NVIDIA H200

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 1 15.784 us 2.83% 15.176 us 2.99% -0.608 us -3.85% FAST
I8 I8 I32 2^20 1 25.904 us 1.76% 22.231 us 2.05% -3.674 us -14.18% FAST
I8 I8 I32 2^24 1 103.762 us 0.51% 73.331 us 0.72% -30.432 us -29.33% FAST
I8 I8 I32 2^28 1 1.335 ms 0.07% 861.529 us 0.13% -473.451 us -35.47% FAST
I8 I8 I32 2^16 0.201 15.405 us 2.62% 14.911 us 2.81% -0.494 us -3.21% FAST
I8 I8 I32 2^20 0.201 24.269 us 2.05% 21.316 us 1.97% -2.953 us -12.17% FAST
I8 I8 I32 2^24 0.201 102.062 us 0.53% 71.952 us 0.72% -30.109 us -29.50% FAST
I8 I8 I32 2^28 0.201 1.328 ms 0.07% 847.391 us 0.10% -481.042 us -36.21% FAST
I8 I8 I64 2^16 1 15.891 us 2.78% 15.253 us 2.98% -0.638 us -4.02% FAST
I8 I8 I64 2^20 1 25.983 us 1.92% 22.193 us 1.89% -3.790 us -14.59% FAST
I8 I8 I64 2^24 1 105.268 us 0.50% 73.071 us 0.69% -32.197 us -30.59% FAST
I8 I8 I64 2^28 1 1.368 ms 0.07% 847.070 us 0.14% -521.102 us -38.09% FAST
I8 I8 I64 2^16 0.201 15.483 us 2.71% 14.858 us 2.87% -0.625 us -4.03% FAST
I8 I8 I64 2^20 0.201 24.471 us 2.05% 21.375 us 2.10% -3.096 us -12.65% FAST
I8 I8 I64 2^24 0.201 103.630 us 0.51% 71.578 us 0.77% -32.052 us -30.93% FAST
I8 I8 I64 2^28 0.201 1.362 ms 0.07% 841.965 us 0.10% -520.124 us -38.19% FAST
I8 I16 I32 2^16 1 16.277 us 2.74% 15.524 us 3.02% -0.753 us -4.62% FAST
I8 I16 I32 2^20 1 26.370 us 1.85% 22.580 us 1.94% -3.791 us -14.37% FAST
I8 I16 I32 2^24 1 106.912 us 0.50% 74.301 us 0.77% -32.611 us -30.50% FAST
I8 I16 I32 2^28 1 1.375 ms 0.10% 889.908 us 0.14% -485.121 us -35.28% FAST
I8 I16 I32 2^16 0.201 15.849 us 3.11% 15.251 us 3.10% -0.597 us -3.77% FAST
I8 I16 I32 2^20 0.201 24.810 us 2.07% 21.723 us 2.29% -3.087 us -12.44% FAST
I8 I16 I32 2^24 0.201 105.470 us 0.63% 73.344 us 0.85% -32.126 us -30.46% FAST
I8 I16 I32 2^28 0.201 1.369 ms 0.11% 869.966 us 0.11% -499.385 us -36.47% FAST
I8 I16 I64 2^16 1 16.312 us 2.92% 15.504 us 2.94% -0.808 us -4.95% FAST
I8 I16 I64 2^20 1 26.691 us 1.70% 22.673 us 2.00% -4.018 us -15.05% FAST
I8 I16 I64 2^24 1 107.614 us 0.52% 75.054 us 0.75% -32.560 us -30.26% FAST
I8 I16 I64 2^28 1 1.391 ms 0.14% 899.218 us 0.17% -492.251 us -35.38% FAST
I8 I16 I64 2^16 0.201 15.916 us 3.26% 15.228 us 3.33% -0.688 us -4.32% FAST
I8 I16 I64 2^20 0.201 25.056 us 2.03% 21.907 us 2.16% -3.149 us -12.57% FAST
I8 I16 I64 2^24 0.201 106.344 us 0.61% 73.907 us 0.84% -32.437 us -30.50% FAST
I8 I16 I64 2^28 0.201 1.388 ms 0.15% 880.306 us 0.13% -507.985 us -36.59% FAST
I8 I32 I32 2^16 1 15.473 us 3.34% 15.207 us 3.27% -0.266 us -1.72% SAME
I8 I32 I32 2^20 1 25.730 us 1.80% 22.035 us 2.15% -3.694 us -14.36% FAST
I8 I32 I32 2^24 1 119.990 us 0.55% 74.962 us 0.84% -45.028 us -37.53% FAST
I8 I32 I32 2^28 1 1.622 ms 0.29% 887.072 us 0.19% -734.720 us -45.30% FAST
I8 I32 I32 2^16 0.201 15.110 us 3.40% 14.927 us 3.45% -0.183 us -1.21% SAME
I8 I32 I32 2^20 0.201 24.143 us 2.11% 21.010 us 2.37% -3.132 us -12.97% FAST
I8 I32 I32 2^24 0.201 119.154 us 0.84% 73.848 us 0.96% -45.306 us -38.02% FAST
I8 I32 I32 2^28 0.201 1.606 ms 0.26% 879.359 us 0.21% -726.603 us -45.24% FAST
I8 I32 I64 2^16 1 15.530 us 3.32% 15.385 us 3.29% -0.145 us -0.93% SAME
I8 I32 I64 2^20 1 22.449 us 2.27% 22.170 us 2.28% -0.279 us -1.24% SAME
I8 I32 I64 2^24 1 87.151 us 0.67% 76.450 us 0.82% -10.701 us -12.28% FAST
I8 I32 I64 2^28 1 1.110 ms 0.37% 912.678 us 0.22% -197.210 us -17.77% FAST
I8 I32 I64 2^16 0.201 15.213 us 3.17% 15.126 us 3.16% -0.087 us -0.57% SAME
I8 I32 I64 2^20 0.201 21.888 us 2.54% 21.241 us 2.38% -0.647 us -2.95% FAST
I8 I32 I64 2^24 0.201 85.135 us 0.88% 75.190 us 0.95% -9.944 us -11.68% FAST
I8 I32 I64 2^28 0.201 1.112 ms 0.37% 912.839 us 0.29% -199.612 us -17.94% FAST
I8 I64 I32 2^16 1 14.635 us 3.45% 14.848 us 3.45% 0.213 us 1.45% SAME
I8 I64 I32 2^20 1 24.936 us 2.07% 23.390 us 2.11% -1.545 us -6.20% FAST
I8 I64 I32 2^24 1 123.881 us 1.65% 101.158 us 0.62% -22.723 us -18.34% FAST
I8 I64 I32 2^28 1 1.785 ms 0.20% 1.339 ms 0.09% -445.343 us -24.95% FAST
I8 I64 I32 2^16 0.201 14.270 us 3.42% 14.567 us 3.47% 0.297 us 2.08% SAME
I8 I64 I32 2^20 0.201 23.479 us 2.39% 22.155 us 2.29% -1.324 us -5.64% FAST
I8 I64 I32 2^24 0.201 124.476 us 1.23% 101.187 us 0.58% -23.289 us -18.71% FAST
I8 I64 I32 2^28 0.201 1.775 ms 0.18% 1.332 ms 0.10% -442.767 us -24.95% FAST
I8 I64 I64 2^16 1 14.711 us 3.30% 14.964 us 3.29% 0.253 us 1.72% SAME
I8 I64 I64 2^20 1 24.834 us 1.95% 23.331 us 2.05% -1.503 us -6.05% FAST
I8 I64 I64 2^24 1 124.505 us 1.12% 101.640 us 0.56% -22.865 us -18.36% FAST
I8 I64 I64 2^28 1 1.790 ms 0.21% 1.344 ms 0.10% -446.758 us -24.95% FAST
I8 I64 I64 2^16 0.201 14.119 us 3.25% 14.393 us 3.21% 0.274 us 1.94% SAME
I8 I64 I64 2^20 0.201 23.212 us 2.18% 22.079 us 2.20% -1.133 us -4.88% FAST
I8 I64 I64 2^24 0.201 125.165 us 1.31% 101.205 us 0.56% -23.960 us -19.14% FAST
I8 I64 I64 2^28 0.201 1.781 ms 0.17% 1.336 ms 0.09% -445.062 us -24.99% FAST
I16 I8 I32 2^16 1 16.902 us 2.85% 15.999 us 3.20% -0.903 us -5.34% FAST
I16 I8 I32 2^20 1 27.857 us 1.66% 23.514 us 1.93% -4.342 us -15.59% FAST
I16 I8 I32 2^24 1 121.639 us 0.58% 86.457 us 0.60% -35.181 us -28.92% FAST
I16 I8 I32 2^28 1 1.568 ms 0.08% 1.012 ms 0.39% -556.427 us -35.49% FAST
I16 I8 I32 2^16 0.201 16.212 us 2.50% 15.444 us 2.68% -0.768 us -4.74% FAST
I16 I8 I32 2^20 0.201 26.013 us 1.87% 22.565 us 2.06% -3.448 us -13.26% FAST
I16 I8 I32 2^24 0.201 112.562 us 0.68% 79.684 us 0.77% -32.878 us -29.21% FAST
I16 I8 I32 2^28 0.201 1.448 ms 0.13% 936.098 us 0.46% -511.518 us -35.34% FAST
I16 I8 I64 2^16 1 17.167 us 2.48% 16.030 us 2.89% -1.138 us -6.63% FAST
I16 I8 I64 2^20 1 28.220 us 3.73% 23.644 us 1.90% -4.576 us -16.21% FAST
I16 I8 I64 2^24 1 123.746 us 0.73% 86.895 us 0.60% -36.851 us -29.78% FAST
I16 I8 I64 2^28 1 1.600 ms 0.12% 1.020 ms 0.35% -580.097 us -36.25% FAST
I16 I8 I64 2^16 0.201 16.555 us 2.55% 15.483 us 2.74% -1.073 us -6.48% FAST
I16 I8 I64 2^20 0.201 26.256 us 1.89% 22.607 us 1.93% -3.649 us -13.90% FAST
I16 I8 I64 2^24 0.201 114.964 us 0.73% 79.762 us 0.74% -35.202 us -30.62% FAST
I16 I8 I64 2^28 0.201 1.483 ms 0.20% 939.254 us 0.35% -544.169 us -36.68% FAST
I16 I16 I32 2^16 1 17.215 us 3.06% 16.345 us 3.22% -0.870 us -5.05% FAST
I16 I16 I32 2^20 1 28.528 us 1.72% 24.080 us 2.02% -4.448 us -15.59% FAST
I16 I16 I32 2^24 1 126.782 us 0.76% 87.031 us 0.65% -39.752 us -31.35% FAST
I16 I16 I32 2^28 1 1.625 ms 0.10% 1.024 ms 0.11% -601.504 us -37.01% FAST
I16 I16 I32 2^16 0.201 16.615 us 2.65% 15.973 us 2.84% -0.642 us -3.86% FAST
I16 I16 I32 2^20 0.201 26.229 us 2.04% 22.957 us 2.05% -3.271 us -12.47% FAST
I16 I16 I32 2^24 0.201 116.956 us 0.69% 80.785 us 0.72% -36.171 us -30.93% FAST
I16 I16 I32 2^28 0.201 1.504 ms 0.16% 972.505 us 0.36% -531.789 us -35.35% FAST
I16 I16 I64 2^16 1 17.426 us 3.10% 16.529 us 3.23% -0.897 us -5.15% FAST
I16 I16 I64 2^20 1 28.770 us 1.85% 24.181 us 1.96% -4.589 us -15.95% FAST
I16 I16 I64 2^24 1 127.668 us 1.39% 87.546 us 0.69% -40.123 us -31.43% FAST
I16 I16 I64 2^28 1 1.657 ms 0.15% 1.035 ms 0.13% -621.139 us -37.49% FAST
I16 I16 I64 2^16 0.201 16.759 us 3.01% 16.146 us 3.12% -0.613 us -3.66% FAST
I16 I16 I64 2^20 0.201 26.589 us 1.85% 23.249 us 2.07% -3.340 us -12.56% FAST
I16 I16 I64 2^24 0.201 118.339 us 0.79% 81.203 us 0.72% -37.136 us -31.38% FAST
I16 I16 I64 2^28 0.201 1.544 ms 0.27% 991.636 us 0.50% -552.590 us -35.78% FAST
I16 I32 I32 2^16 1 15.944 us 3.22% 15.473 us 3.21% -0.471 us -2.95% SAME
I16 I32 I32 2^20 1 23.251 us 2.07% 22.820 us 2.03% -0.431 us -1.85% SAME
I16 I32 I32 2^24 1 109.446 us 0.87% 86.344 us 0.75% -23.102 us -21.11% FAST
I16 I32 I32 2^28 1 1.447 ms 0.23% 1.049 ms 0.14% -398.181 us -27.51% FAST
I16 I32 I32 2^16 0.201 14.996 us 2.74% 14.845 us 2.82% -0.151 us -1.01% SAME
I16 I32 I32 2^20 0.201 21.745 us 2.16% 21.408 us 3.39% -0.337 us -1.55% SAME
I16 I32 I32 2^24 0.201 98.666 us 1.10% 82.409 us 0.80% -16.257 us -16.48% FAST
I16 I32 I32 2^28 0.201 1.258 ms 0.40% 994.490 us 0.16% -263.208 us -20.93% FAST
I16 I32 I64 2^16 1 15.775 us 2.87% 15.278 us 2.87% -0.498 us -3.15% FAST
I16 I32 I64 2^20 1 23.190 us 1.98% 22.804 us 1.96% -0.386 us -1.66% SAME
I16 I32 I64 2^24 1 108.909 us 0.87% 86.832 us 0.74% -22.077 us -20.27% FAST
I16 I32 I64 2^28 1 1.460 ms 0.35% 1.060 ms 0.15% -400.398 us -27.42% FAST
I16 I32 I64 2^16 0.201 15.063 us 2.94% 14.809 us 2.83% -0.254 us -1.68% SAME
I16 I32 I64 2^20 0.201 21.796 us 2.03% 21.520 us 2.04% -0.276 us -1.27% SAME
I16 I32 I64 2^24 0.201 99.656 us 1.04% 82.907 us 0.78% -16.750 us -16.81% FAST
I16 I32 I64 2^28 0.201 1.274 ms 0.38% 1.004 ms 0.14% -269.402 us -21.15% FAST
I16 I64 I32 2^16 1 15.795 us 3.31% 15.854 us 3.36% 0.059 us 0.37% SAME
I16 I64 I32 2^20 1 25.993 us 1.88% 24.018 us 1.96% -1.975 us -7.60% FAST
I16 I64 I32 2^24 1 147.478 us 0.67% 114.396 us 0.69% -33.082 us -22.43% FAST
I16 I64 I32 2^28 1 2.079 ms 0.19% 1.529 ms 0.15% -549.648 us -26.44% FAST
I16 I64 I32 2^16 0.201 15.146 us 3.51% 15.238 us 3.44% 0.092 us 0.61% SAME
I16 I64 I32 2^20 0.201 24.531 us 2.10% 23.424 us 3.23% -1.107 us -4.51% FAST
I16 I64 I32 2^24 0.201 136.093 us 0.87% 110.366 us 0.62% -25.727 us -18.90% FAST
I16 I64 I32 2^28 0.201 1.853 ms 0.17% 1.473 ms 0.10% -379.951 us -20.50% FAST
I16 I64 I64 2^16 1 15.962 us 2.81% 16.067 us 2.88% 0.105 us 0.66% SAME
I16 I64 I64 2^20 1 26.065 us 1.91% 24.202 us 1.93% -1.862 us -7.15% FAST
I16 I64 I64 2^24 1 149.181 us 0.68% 115.225 us 0.65% -33.956 us -22.76% FAST
I16 I64 I64 2^28 1 2.103 ms 0.20% 1.535 ms 0.14% -567.876 us -27.01% FAST
I16 I64 I64 2^16 0.201 15.205 us 3.46% 15.345 us 3.41% 0.140 us 0.92% SAME
I16 I64 I64 2^20 0.201 24.830 us 2.05% 23.603 us 2.01% -1.227 us -4.94% FAST
I16 I64 I64 2^24 0.201 136.422 us 0.88% 110.818 us 0.63% -25.604 us -18.77% FAST
I16 I64 I64 2^28 0.201 1.857 ms 0.16% 1.476 ms 0.11% -381.367 us -20.53% FAST
I32 I8 I32 2^16 1 16.517 us 2.76% 16.087 us 2.79% -0.430 us -2.60% SAME
I32 I8 I32 2^20 1 27.152 us 1.79% 23.438 us 2.00% -3.715 us -13.68% FAST
I32 I8 I32 2^24 1 130.785 us 0.82% 77.853 us 0.81% -52.932 us -40.47% FAST
I32 I8 I32 2^28 1 1.743 ms 0.32% 900.123 us 0.32% -843.348 us -48.37% FAST
I32 I8 I32 2^16 0.201 15.951 us 3.02% 15.651 us 3.04% -0.300 us -1.88% SAME
I32 I8 I32 2^20 0.201 25.319 us 2.04% 22.333 us 1.94% -2.986 us -11.79% FAST
I32 I8 I32 2^24 0.201 121.430 us 0.81% 73.955 us 0.93% -47.475 us -39.10% FAST
I32 I8 I32 2^28 0.201 1.582 ms 0.31% 842.349 us 0.40% -739.207 us -46.74% FAST
I32 I8 I64 2^16 1 16.573 us 2.89% 16.283 us 2.85% -0.290 us -1.75% SAME
I32 I8 I64 2^20 1 27.008 us 1.70% 23.735 us 1.89% -3.273 us -12.12% FAST
I32 I8 I64 2^24 1 134.769 us 1.74% 94.712 us 1.33% -40.058 us -29.72% FAST
I32 I8 I64 2^28 1 1.810 ms 0.39% 1.229 ms 0.51% -581.485 us -32.13% FAST
I32 I8 I64 2^16 0.201 16.144 us 3.15% 15.820 us 3.29% -0.324 us -2.01% SAME
I32 I8 I64 2^20 0.201 25.336 us 1.85% 22.440 us 1.98% -2.896 us -11.43% FAST
I32 I8 I64 2^24 0.201 123.752 us 0.91% 86.997 us 1.26% -36.755 us -29.70% FAST
I32 I8 I64 2^28 0.201 1.627 ms 0.33% 1.095 ms 0.46% -531.965 us -32.69% FAST
I32 I16 I32 2^16 1 16.885 us 2.89% 16.410 us 2.80% -0.475 us -2.82% FAST
I32 I16 I32 2^20 1 23.866 us 2.03% 23.367 us 2.12% -0.499 us -2.09% FAST
I32 I16 I32 2^24 1 104.317 us 0.87% 85.237 us 0.78% -19.080 us -18.29% FAST
I32 I16 I32 2^28 1 1.358 ms 0.26% 1.034 ms 0.32% -323.932 us -23.85% FAST
I32 I16 I32 2^16 0.201 16.067 us 3.37% 15.842 us 5.10% -0.225 us -1.40% SAME
I32 I16 I32 2^20 0.201 22.671 us 2.11% 22.361 us 2.16% -0.310 us -1.37% SAME
I32 I16 I32 2^24 0.201 94.677 us 0.97% 80.347 us 0.86% -14.330 us -15.14% FAST
I32 I16 I32 2^28 0.201 1.230 ms 0.40% 958.433 us 0.37% -271.655 us -22.08% FAST
I32 I16 I64 2^16 1 17.115 us 3.10% 16.617 us 3.19% -0.498 us -2.91% SAME
I32 I16 I64 2^20 1 24.249 us 2.06% 23.671 us 2.06% -0.578 us -2.38% FAST
I32 I16 I64 2^24 1 107.443 us 0.75% 86.094 us 0.75% -21.349 us -19.87% FAST
I32 I16 I64 2^28 1 1.431 ms 0.37% 1.046 ms 0.31% -385.037 us -26.90% FAST
I32 I16 I64 2^16 0.201 16.258 us 3.10% 16.046 us 3.24% -0.212 us -1.31% SAME
I32 I16 I64 2^20 0.201 22.806 us 2.11% 22.359 us 2.18% -0.447 us -1.96% SAME
I32 I16 I64 2^24 0.201 96.458 us 1.20% 80.615 us 0.87% -15.843 us -16.42% FAST
I32 I16 I64 2^28 0.201 1.232 ms 0.39% 963.305 us 0.21% -268.316 us -21.79% FAST
I32 I32 I32 2^16 1 16.058 us 2.98% 16.087 us 2.93% 0.029 us 0.18% SAME
I32 I32 I32 2^20 1 26.051 us 1.97% 23.891 us 2.06% -2.160 us -8.29% FAST
I32 I32 I32 2^24 1 122.515 us 1.03% 95.164 us 0.71% -27.351 us -22.32% FAST
I32 I32 I32 2^28 1 1.682 ms 0.18% 1.217 ms 0.11% -465.378 us -27.67% FAST
I32 I32 I32 2^16 0.201 15.638 us 3.04% 15.689 us 3.06% 0.051 us 0.33% SAME
I32 I32 I32 2^20 0.201 24.347 us 1.90% 22.814 us 2.12% -1.533 us -6.30% FAST
I32 I32 I32 2^24 0.201 111.842 us 1.20% 92.482 us 0.75% -19.360 us -17.31% FAST
I32 I32 I32 2^28 0.201 1.531 ms 0.32% 1.188 ms 0.41% -343.625 us -22.44% FAST
I32 I32 I64 2^16 1 16.262 us 3.11% 16.352 us 3.07% 0.090 us 0.55% SAME
I32 I32 I64 2^20 1 26.267 us 1.87% 24.097 us 2.02% -2.169 us -8.26% FAST
I32 I32 I64 2^24 1 124.129 us 1.27% 95.433 us 0.68% -28.696 us -23.12% FAST
I32 I32 I64 2^28 1 1.717 ms 0.25% 1.223 ms 0.31% -494.089 us -28.77% FAST
I32 I32 I64 2^16 0.201 15.773 us 3.09% 15.878 us 3.19% 0.105 us 0.66% SAME
I32 I32 I64 2^20 0.201 24.519 us 1.93% 22.963 us 2.17% -1.557 us -6.35% FAST
I32 I32 I64 2^24 0.201 111.975 us 1.08% 92.768 us 0.75% -19.207 us -17.15% FAST
I32 I32 I64 2^28 0.201 1.532 ms 0.32% 1.191 ms 0.41% -341.225 us -22.27% FAST
I32 I64 I32 2^16 1 16.553 us 2.93% 16.715 us 2.92% 0.161 us 0.98% SAME
I32 I64 I32 2^20 1 26.896 us 2.00% 25.514 us 2.03% -1.382 us -5.14% FAST
I32 I64 I32 2^24 1 138.171 us 1.39% 129.012 us 0.67% -9.159 us -6.63% FAST
I32 I64 I32 2^28 1 1.925 ms 0.15% 1.787 ms 0.18% -138.319 us -7.18% FAST
I32 I64 I32 2^16 0.201 15.759 us 3.11% 15.936 us 3.03% 0.177 us 1.12% SAME
I32 I64 I32 2^20 0.201 25.474 us 1.99% 24.653 us 2.05% -0.821 us -3.22% FAST
I32 I64 I32 2^24 0.201 127.931 us 0.66% 126.938 us 0.65% -0.994 us -0.78% FAST
I32 I64 I32 2^28 0.201 1.780 ms 0.18% 1.754 ms 0.20% -26.000 us -1.46% FAST
I32 I64 I64 2^16 1 16.585 us 2.81% 16.826 us 2.81% 0.241 us 1.46% SAME
I32 I64 I64 2^20 1 26.826 us 1.92% 25.413 us 1.91% -1.413 us -5.27% FAST
I32 I64 I64 2^24 1 138.345 us 0.56% 129.706 us 0.61% -8.639 us -6.24% FAST
I32 I64 I64 2^28 1 1.929 ms 0.16% 1.798 ms 0.17% -131.265 us -6.80% FAST
I32 I64 I64 2^16 0.201 15.822 us 2.98% 16.023 us 2.92% 0.201 us 1.27% SAME
I32 I64 I64 2^20 0.201 25.641 us 1.98% 24.792 us 1.97% -0.849 us -3.31% FAST
I32 I64 I64 2^24 0.201 127.988 us 0.61% 126.959 us 0.63% -1.029 us -0.80% FAST
I32 I64 I64 2^28 0.201 1.784 ms 0.17% 1.756 ms 0.19% -28.185 us -1.58% FAST
I64 I8 I32 2^16 1 16.966 us 2.60% 17.173 us 2.53% 0.207 us 1.22% SAME
I64 I8 I32 2^20 1 27.796 us 1.74% 26.335 us 1.86% -1.461 us -5.26% FAST
I64 I8 I32 2^24 1 140.780 us 0.74% 114.500 us 0.72% -26.279 us -18.67% FAST
I64 I8 I32 2^28 1 1.933 ms 0.46% 1.495 ms 0.39% -438.760 us -22.69% FAST
I64 I8 I32 2^16 0.201 16.222 us 2.84% 16.500 us 2.70% 0.278 us 1.71% SAME
I64 I8 I32 2^20 0.201 25.665 us 2.05% 24.234 us 1.95% -1.431 us -5.58% FAST
I64 I8 I32 2^24 0.201 127.827 us 1.00% 106.190 us 0.64% -21.636 us -16.93% FAST
I64 I8 I32 2^28 0.201 1.769 ms 0.44% 1.375 ms 0.50% -394.278 us -22.29% FAST
I64 I8 I64 2^16 1 16.848 us 2.94% 17.010 us 2.89% 0.162 us 0.96% SAME
I64 I8 I64 2^20 1 27.987 us 1.77% 26.509 us 1.79% -1.479 us -5.28% FAST
I64 I8 I64 2^24 1 142.281 us 0.66% 114.893 us 0.76% -27.388 us -19.25% FAST
I64 I8 I64 2^28 1 1.952 ms 0.44% 1.499 ms 0.39% -452.573 us -23.19% FAST
I64 I8 I64 2^16 0.201 16.448 us 2.88% 16.618 us 2.93% 0.171 us 1.04% SAME
I64 I8 I64 2^20 0.201 25.708 us 1.86% 24.324 us 2.03% -1.384 us -5.38% FAST
I64 I8 I64 2^24 0.201 129.597 us 1.03% 106.490 us 0.67% -23.107 us -17.83% FAST
I64 I8 I64 2^28 0.201 1.798 ms 0.39% 1.377 ms 0.50% -420.480 us -23.39% FAST
I64 I16 I32 2^16 1 17.172 us 2.32% 17.311 us 2.33% 0.139 us 0.81% SAME
I64 I16 I32 2^20 1 28.161 us 1.71% 26.443 us 1.81% -1.718 us -6.10% FAST
I64 I16 I32 2^24 1 146.020 us 0.66% 117.339 us 1.48% -28.682 us -19.64% FAST
I64 I16 I32 2^28 1 2.016 ms 0.47% 1.545 ms 0.46% -471.839 us -23.40% FAST
I64 I16 I32 2^16 0.201 16.581 us 2.62% 16.727 us 2.68% 0.145 us 0.88% SAME
I64 I16 I32 2^20 0.201 26.275 us 2.20% 24.542 us 1.92% -1.733 us -6.59% FAST
I64 I16 I32 2^24 0.201 133.994 us 0.94% 112.090 us 0.62% -21.904 us -16.35% FAST
I64 I16 I32 2^28 0.201 1.884 ms 0.35% 1.481 ms 0.50% -403.468 us -21.42% FAST
I64 I16 I64 2^16 1 17.298 us 2.42% 17.462 us 2.45% 0.164 us 0.95% SAME
I64 I16 I64 2^20 1 28.312 us 1.74% 26.597 us 1.83% -1.715 us -6.06% FAST
I64 I16 I64 2^24 1 147.594 us 0.73% 126.879 us 0.73% -20.715 us -14.03% FAST
I64 I16 I64 2^28 1 2.043 ms 0.45% 1.698 ms 0.48% -345.379 us -16.90% FAST
I64 I16 I64 2^16 0.201 16.719 us 2.74% 16.858 us 2.75% 0.140 us 0.84% SAME
I64 I16 I64 2^20 0.201 26.212 us 1.97% 24.581 us 1.99% -1.631 us -6.22% FAST
I64 I16 I64 2^24 0.201 134.687 us 0.95% 117.976 us 0.69% -16.712 us -12.41% FAST
I64 I16 I64 2^28 0.201 1.906 ms 0.32% 1.571 ms 0.50% -334.945 us -17.58% FAST
I64 I32 I32 2^16 1 17.047 us 2.50% 17.225 us 2.49% 0.178 us 1.04% SAME
I64 I32 I32 2^20 1 28.353 us 1.80% 27.531 us 1.84% -0.822 us -2.90% FAST
I64 I32 I32 2^24 1 142.894 us 0.60% 131.470 us 1.31% -11.424 us -7.99% FAST
I64 I32 I32 2^28 1 1.973 ms 0.48% 1.788 ms 0.50% -185.656 us -9.41% FAST
I64 I32 I32 2^16 0.201 16.492 us 3.14% 16.662 us 3.10% 0.170 us 1.03% SAME
I64 I32 I32 2^20 0.201 26.044 us 1.95% 25.016 us 2.15% -1.029 us -3.95% FAST
I64 I32 I32 2^24 0.201 128.936 us 0.71% 128.303 us 0.66% -0.633 us -0.49% SAME
I64 I32 I32 2^28 0.201 1.780 ms 0.50% 1.757 ms 0.50% -23.263 us -1.31% FAST
I64 I32 I64 2^16 1 17.383 us 2.64% 17.571 us 2.60% 0.188 us 1.08% SAME
I64 I32 I64 2^20 1 28.561 us 1.93% 27.967 us 2.03% -0.593 us -2.08% FAST
I64 I32 I64 2^24 1 143.656 us 0.61% 132.285 us 0.70% -11.371 us -7.92% FAST
I64 I32 I64 2^28 1 1.981 ms 0.48% 1.795 ms 0.50% -186.327 us -9.40% FAST
I64 I32 I64 2^16 0.201 16.643 us 2.99% 16.820 us 3.09% 0.178 us 1.07% SAME
I64 I32 I64 2^20 0.201 26.285 us 1.99% 25.240 us 1.93% -1.045 us -3.98% FAST
I64 I32 I64 2^24 0.201 129.459 us 0.65% 128.730 us 0.62% -0.729 us -0.56% SAME
I64 I32 I64 2^28 0.201 1.785 ms 0.50% 1.758 ms 0.50% -26.628 us -1.49% FAST
I64 I64 I32 2^16 1 17.097 us 2.80% 17.316 us 2.74% 0.219 us 1.28% SAME
I64 I64 I32 2^20 1 30.309 us 1.74% 30.371 us 1.70% 0.061 us 0.20% SAME
I64 I64 I32 2^24 1 175.537 us 0.44% 174.788 us 0.43% -0.749 us -0.43% SAME
I64 I64 I32 2^28 1 2.537 ms 0.46% 2.530 ms 0.50% -6.380 us -0.25% SAME
I64 I64 I32 2^16 0.201 16.287 us 2.93% 16.446 us 2.96% 0.158 us 0.97% SAME
I64 I64 I32 2^20 0.201 27.250 us 2.06% 27.767 us 2.05% 0.516 us 1.90% SAME
I64 I64 I32 2^24 0.201 157.763 us 0.57% 165.940 us 0.50% 8.177 us 5.18% SLOW
I64 I64 I32 2^28 0.201 2.291 ms 0.50% 2.408 ms 0.50% 117.108 us 5.11% SLOW
I64 I64 I64 2^16 1 17.141 us 2.65% 17.414 us 2.67% 0.273 us 1.59% SAME
I64 I64 I64 2^20 1 30.243 us 1.66% 30.348 us 1.61% 0.106 us 0.35% SAME
I64 I64 I64 2^24 1 176.196 us 0.42% 175.812 us 0.41% -0.384 us -0.22% SAME
I64 I64 I64 2^28 1 2.544 ms 0.45% 2.541 ms 0.50% -3.527 us -0.14% SAME
I64 I64 I64 2^16 0.201 16.149 us 2.82% 16.349 us 2.80% 0.200 us 1.24% SAME
I64 I64 I64 2^20 0.201 27.092 us 1.90% 27.774 us 2.00% 0.682 us 2.52% SLOW
I64 I64 I64 2^24 0.201 157.878 us 0.55% 166.233 us 0.49% 8.355 us 5.29% SLOW
I64 I64 I64 2^28 0.201 2.289 ms 0.50% 2.409 ms 0.50% 119.747 us 5.23% SLOW
I128 I8 I32 2^16 1 18.275 us 2.44% 18.577 us 2.40% 0.302 us 1.65% SAME
I128 I8 I32 2^20 1 35.408 us 1.40% 34.583 us 1.38% -0.825 us -2.33% FAST
I128 I8 I32 2^24 1 223.888 us 0.41% 210.258 us 0.58% -13.631 us -6.09% FAST
I128 I8 I32 2^28 1 3.257 ms 0.50% 3.040 ms 0.50% -216.933 us -6.66% FAST
I128 I8 I32 2^16 0.201 17.487 us 2.58% 17.769 us 2.48% 0.282 us 1.61% SAME
I128 I8 I32 2^20 0.201 29.897 us 1.91% 30.142 us 1.72% 0.246 us 0.82% SAME
I128 I8 I32 2^24 0.201 182.975 us 0.49% 176.978 us 0.53% -5.996 us -3.28% FAST
I128 I8 I32 2^28 0.201 2.659 ms 0.50% 2.538 ms 0.50% -120.763 us -4.54% FAST
I128 I8 I64 2^16 1 18.408 us 2.24% 18.575 us 2.24% 0.167 us 0.91% SAME
I128 I8 I64 2^20 1 35.523 us 1.31% 35.601 us 3.56% 0.078 us 0.22% SAME
I128 I8 I64 2^24 1 225.036 us 0.43% 226.184 us 0.66% 1.148 us 0.51% SLOW
I128 I8 I64 2^28 1 3.271 ms 0.50% 3.292 ms 0.50% 20.572 us 0.63% SLOW
I128 I8 I64 2^16 0.201 17.581 us 4.26% 17.712 us 2.67% 0.131 us 0.75% SAME
I128 I8 I64 2^20 0.201 30.068 us 1.99% 30.950 us 1.78% 0.882 us 2.93% SLOW
I128 I8 I64 2^24 0.201 185.306 us 0.47% 192.074 us 0.51% 6.767 us 3.65% SLOW
I128 I8 I64 2^28 0.201 2.697 ms 0.50% 2.792 ms 0.50% 95.068 us 3.53% SLOW
I128 I16 I32 2^16 1 18.602 us 2.46% 18.822 us 2.45% 0.221 us 1.19% SAME
I128 I16 I32 2^20 1 36.009 us 1.69% 36.341 us 1.82% 0.332 us 0.92% SAME
I128 I16 I32 2^24 1 229.340 us 0.62% 232.328 us 0.78% 2.987 us 1.30% SLOW
I128 I16 I32 2^28 1 3.334 ms 0.50% 3.371 ms 0.50% 36.478 us 1.09% SLOW
I128 I16 I32 2^16 0.201 17.405 us 2.74% 17.628 us 2.74% 0.223 us 1.28% SAME
I128 I16 I32 2^20 0.201 30.623 us 1.80% 31.631 us 1.72% 1.008 us 3.29% SLOW
I128 I16 I32 2^24 0.201 191.125 us 0.43% 198.631 us 0.45% 7.506 us 3.93% SLOW
I128 I16 I32 2^28 0.201 2.787 ms 0.50% 2.897 ms 0.50% 110.566 us 3.97% SLOW
I128 I16 I64 2^16 1 18.736 us 2.47% 18.980 us 2.45% 0.244 us 1.30% SAME
I128 I16 I64 2^20 1 36.164 us 1.38% 36.206 us 1.69% 0.042 us 0.12% SAME
I128 I16 I64 2^24 1 231.466 us 0.46% 233.339 us 0.79% 1.873 us 0.81% SLOW
I128 I16 I64 2^28 1 3.362 ms 0.50% 3.393 ms 0.50% 31.469 us 0.94% SLOW
I128 I16 I64 2^16 0.201 18.077 us 2.75% 18.293 us 2.72% 0.216 us 1.19% SAME
I128 I16 I64 2^20 0.201 30.992 us 1.94% 31.788 us 1.81% 0.796 us 2.57% SLOW
I128 I16 I64 2^24 0.201 193.485 us 0.45% 199.129 us 0.46% 5.644 us 2.92% SLOW
I128 I16 I64 2^28 0.201 2.823 ms 0.50% 2.899 ms 0.50% 76.362 us 2.71% SLOW
I128 I32 I32 2^16 1 18.837 us 2.40% 19.056 us 2.36% 0.219 us 1.16% SAME
I128 I32 I32 2^20 1 36.417 us 1.43% 36.908 us 1.64% 0.490 us 1.35% SAME
I128 I32 I32 2^24 1 236.331 us 0.49% 237.234 us 0.57% 0.902 us 0.38% SAME
I128 I32 I32 2^28 1 3.449 ms 0.50% 3.462 ms 0.50% 12.927 us 0.37% SAME
I128 I32 I32 2^16 0.201 17.854 us 2.83% 18.072 us 2.83% 0.218 us 1.22% SAME
I128 I32 I32 2^20 0.201 31.135 us 1.93% 32.303 us 1.77% 1.168 us 3.75% SLOW
I128 I32 I32 2^24 0.201 201.164 us 0.37% 207.735 us 0.41% 6.572 us 3.27% SLOW
I128 I32 I32 2^28 0.201 2.955 ms 0.50% 3.041 ms 0.50% 85.479 us 2.89% SLOW
I128 I32 I64 2^16 1 18.905 us 2.41% 19.108 us 2.44% 0.203 us 1.08% SAME
I128 I32 I64 2^20 1 36.464 us 1.67% 36.502 us 1.80% 0.039 us 0.11% SAME
I128 I32 I64 2^24 1 239.000 us 0.56% 238.168 us 0.56% -0.832 us -0.35% SAME
I128 I32 I64 2^28 1 3.493 ms 0.50% 3.496 ms 0.50% 2.664 us 0.08% SAME
I128 I32 I64 2^16 0.201 17.969 us 2.63% 18.188 us 2.58% 0.219 us 1.22% SAME
I128 I32 I64 2^20 0.201 31.582 us 1.79% 32.378 us 3.85% 0.797 us 2.52% SLOW
I128 I32 I64 2^24 0.201 203.201 us 0.34% 208.798 us 0.39% 5.597 us 2.75% SLOW
I128 I32 I64 2^28 0.201 2.979 ms 0.50% 3.054 ms 0.50% 74.309 us 2.49% SLOW
I128 I64 I32 2^16 1 18.458 us 2.53% 18.617 us 2.47% 0.159 us 0.86% SAME
I128 I64 I32 2^20 1 37.599 us 1.52% 36.506 us 1.50% -1.093 us -2.91% FAST
I128 I64 I32 2^24 1 283.196 us 0.36% 258.232 us 0.37% -24.965 us -8.82% FAST
I128 I64 I32 2^28 1 4.272 ms 0.50% 3.869 ms 0.50% -403.590 us -9.45% FAST
I128 I64 I32 2^16 0.201 17.813 us 2.80% 18.006 us 2.75% 0.193 us 1.08% SAME
I128 I64 I32 2^20 0.201 34.164 us 2.44% 33.937 us 1.58% -0.227 us -0.67% SAME
I128 I64 I32 2^24 0.201 254.669 us 0.41% 239.549 us 0.38% -15.120 us -5.94% FAST
I128 I64 I32 2^28 0.201 3.846 ms 0.50% 3.600 ms 0.50% -246.238 us -6.40% FAST
I128 I64 I64 2^16 1 18.510 us 2.54% 18.635 us 2.51% 0.125 us 0.67% SAME
I128 I64 I64 2^20 1 37.770 us 1.47% 36.366 us 1.50% -1.404 us -3.72% FAST
I128 I64 I64 2^24 1 285.239 us 0.30% 256.973 us 0.35% -28.266 us -9.91% FAST
I128 I64 I64 2^28 1 4.308 ms 0.50% 3.855 ms 0.50% -452.562 us -10.51% FAST
I128 I64 I64 2^16 0.201 17.666 us 2.47% 17.827 us 2.49% 0.161 us 0.91% SAME
I128 I64 I64 2^20 0.201 34.076 us 1.52% 33.777 us 1.58% -0.299 us -0.88% SAME
I128 I64 I64 2^24 0.201 255.505 us 0.37% 239.081 us 0.38% -16.424 us -6.43% FAST
I128 I64 I64 2^28 0.201 3.861 ms 0.50% 3.593 ms 0.50% -268.310 us -6.95% FAST
F32 I8 I32 2^16 1 16.537 us 2.97% 16.168 us 2.97% -0.369 us -2.23% SAME
F32 I8 I32 2^20 1 27.092 us 1.66% 23.323 us 3.36% -3.770 us -13.91% FAST
F32 I8 I32 2^24 1 131.122 us 0.84% 77.731 us 0.78% -53.391 us -40.72% FAST
F32 I8 I32 2^28 1 1.741 ms 0.29% 900.509 us 0.32% -840.851 us -48.29% FAST
F32 I8 I32 2^16 0.201 16.055 us 4.67% 15.730 us 2.90% -0.325 us -2.02% SAME
F32 I8 I32 2^20 0.201 25.528 us 2.13% 22.493 us 2.02% -3.035 us -11.89% FAST
F32 I8 I32 2^24 0.201 120.951 us 0.76% 73.705 us 0.95% -47.246 us -39.06% FAST
F32 I8 I32 2^28 0.201 1.580 ms 0.31% 842.610 us 0.39% -737.462 us -46.67% FAST
F32 I8 I64 2^16 1 16.645 us 2.90% 16.353 us 2.81% -0.292 us -1.75% SAME
F32 I8 I64 2^20 1 27.104 us 1.75% 23.772 us 1.90% -3.331 us -12.29% FAST
F32 I8 I64 2^24 1 134.738 us 0.94% 93.647 us 1.27% -41.092 us -30.50% FAST
F32 I8 I64 2^28 1 1.806 ms 0.35% 1.227 ms 0.49% -578.909 us -32.05% FAST
F32 I8 I64 2^16 0.201 16.202 us 2.78% 15.857 us 2.94% -0.345 us -2.13% SAME
F32 I8 I64 2^20 0.201 25.251 us 1.90% 22.443 us 1.97% -2.808 us -11.12% FAST
F32 I8 I64 2^24 0.201 122.727 us 0.90% 86.903 us 1.32% -35.824 us -29.19% FAST
F32 I8 I64 2^28 0.201 1.627 ms 0.33% 1.096 ms 0.48% -531.730 us -32.68% FAST
F32 I16 I32 2^16 1 16.469 us 3.10% 15.999 us 3.14% -0.470 us -2.85% SAME
F32 I16 I32 2^20 1 23.844 us 1.93% 23.337 us 1.90% -0.507 us -2.13% FAST
F32 I16 I32 2^24 1 104.149 us 0.68% 85.342 us 0.76% -18.807 us -18.06% FAST
F32 I16 I32 2^28 1 1.359 ms 0.26% 1.035 ms 0.31% -324.549 us -23.88% FAST
F32 I16 I32 2^16 0.201 15.718 us 3.12% 15.517 us 3.16% -0.201 us -1.28% SAME
F32 I16 I32 2^20 0.201 22.484 us 2.05% 22.168 us 2.05% -0.316 us -1.40% SAME
F32 I16 I32 2^24 0.201 94.463 us 0.89% 79.970 us 0.87% -14.493 us -15.34% FAST
F32 I16 I32 2^28 0.201 1.228 ms 0.40% 957.031 us 0.39% -271.151 us -22.08% FAST
F32 I16 I64 2^16 1 16.685 us 2.92% 16.255 us 2.96% -0.430 us -2.58% SAME
F32 I16 I64 2^20 1 23.871 us 1.84% 23.253 us 1.87% -0.618 us -2.59% FAST
F32 I16 I64 2^24 1 107.111 us 0.80% 86.156 us 0.73% -20.955 us -19.56% FAST
F32 I16 I64 2^28 1 1.431 ms 0.32% 1.046 ms 0.30% -384.486 us -26.87% FAST
F32 I16 I64 2^16 0.201 15.852 us 4.95% 15.652 us 3.22% -0.200 us -1.26% SAME
F32 I16 I64 2^20 0.201 22.483 us 2.08% 22.032 us 2.13% -0.451 us -2.01% SAME
F32 I16 I64 2^24 0.201 95.342 us 0.97% 80.511 us 0.84% -14.831 us -15.56% FAST
F32 I16 I64 2^28 0.201 1.231 ms 0.41% 962.429 us 0.36% -268.548 us -21.82% FAST
F32 I32 I32 2^16 1 15.529 us 2.65% 15.574 us 2.63% 0.045 us 0.29% SAME
F32 I32 I32 2^20 1 25.706 us 1.94% 23.736 us 1.96% -1.970 us -7.66% FAST
F32 I32 I32 2^24 1 122.313 us 0.73% 94.710 us 0.68% -27.603 us -22.57% FAST
F32 I32 I32 2^28 1 1.682 ms 0.21% 1.223 ms 0.32% -458.927 us -27.28% FAST
F32 I32 I32 2^16 0.201 15.301 us 2.91% 15.365 us 2.92% 0.064 us 0.42% SAME
F32 I32 I32 2^20 0.201 24.375 us 2.18% 22.798 us 2.10% -1.577 us -6.47% FAST
F32 I32 I32 2^24 0.201 111.978 us 1.12% 92.302 us 0.72% -19.676 us -17.57% FAST
F32 I32 I32 2^28 0.201 1.526 ms 0.31% 1.190 ms 0.41% -336.501 us -22.05% FAST
F32 I32 I64 2^16 1 15.921 us 2.86% 16.039 us 4.74% 0.118 us 0.74% SAME
F32 I32 I64 2^20 1 26.262 us 2.02% 24.145 us 2.09% -2.117 us -8.06% FAST
F32 I32 I64 2^24 1 124.389 us 0.78% 95.622 us 0.68% -28.766 us -23.13% FAST
F32 I32 I64 2^28 1 1.716 ms 0.23% 1.229 ms 0.32% -486.716 us -28.37% FAST
F32 I32 I64 2^16 0.201 15.508 us 2.90% 15.653 us 2.97% 0.145 us 0.94% SAME
F32 I32 I64 2^20 0.201 24.485 us 2.24% 22.870 us 2.16% -1.615 us -6.59% FAST
F32 I32 I64 2^24 0.201 112.477 us 1.02% 92.772 us 0.72% -19.705 us -17.52% FAST
F32 I32 I64 2^28 0.201 1.525 ms 0.31% 1.193 ms 0.40% -332.406 us -21.80% FAST
F32 I64 I32 2^16 1 16.697 us 2.90% 16.891 us 2.96% 0.194 us 1.16% SAME
F32 I64 I32 2^20 1 26.898 us 1.87% 25.547 us 1.98% -1.352 us -5.02% FAST
F32 I64 I32 2^24 1 137.857 us 0.59% 128.907 us 0.65% -8.950 us -6.49% FAST
F32 I64 I32 2^28 1 1.926 ms 0.17% 1.787 ms 0.18% -139.361 us -7.24% FAST
F32 I64 I32 2^16 0.201 15.412 us 3.16% 15.616 us 3.19% 0.204 us 1.33% SAME
F32 I64 I32 2^20 0.201 25.513 us 1.93% 24.754 us 2.10% -0.759 us -2.98% FAST
F32 I64 I32 2^24 0.201 127.498 us 0.67% 126.527 us 0.65% -0.971 us -0.76% FAST
F32 I64 I32 2^28 0.201 1.779 ms 0.18% 1.753 ms 0.20% -25.854 us -1.45% FAST
F32 I64 I64 2^16 1 16.852 us 2.78% 17.155 us 2.71% 0.303 us 1.80% SAME
F32 I64 I64 2^20 1 27.063 us 1.92% 25.715 us 1.96% -1.348 us -4.98% FAST
F32 I64 I64 2^24 1 138.602 us 0.54% 129.834 us 0.63% -8.768 us -6.33% FAST
F32 I64 I64 2^28 1 1.930 ms 0.13% 1.798 ms 0.17% -132.225 us -6.85% FAST
F32 I64 I64 2^16 0.201 15.458 us 3.20% 15.704 us 3.10% 0.247 us 1.60% SAME
F32 I64 I64 2^20 0.201 25.675 us 2.23% 24.854 us 2.08% -0.821 us -3.20% FAST
F32 I64 I64 2^24 0.201 128.146 us 0.61% 127.285 us 0.64% -0.862 us -0.67% FAST
F32 I64 I64 2^28 0.201 1.784 ms 0.18% 1.756 ms 0.19% -27.864 us -1.56% FAST
F64 I8 I32 2^16 1 17.220 us 2.68% 17.443 us 2.64% 0.223 us 1.30% SAME
F64 I8 I32 2^20 1 27.940 us 1.85% 26.434 us 1.90% -1.506 us -5.39% FAST
F64 I8 I32 2^24 1 139.936 us 0.75% 114.141 us 0.77% -25.795 us -18.43% FAST
F64 I8 I32 2^28 1 1.922 ms 0.47% 1.491 ms 0.38% -431.397 us -22.44% FAST
F64 I8 I32 2^16 0.201 16.205 us 3.00% 16.476 us 3.05% 0.271 us 1.67% SAME
F64 I8 I32 2^20 0.201 25.971 us 2.10% 24.501 us 2.05% -1.470 us -5.66% FAST
F64 I8 I32 2^24 0.201 126.229 us 1.03% 106.032 us 0.66% -20.197 us -16.00% FAST
F64 I8 I32 2^28 0.201 1.741 ms 0.45% 1.371 ms 0.50% -369.865 us -21.25% FAST
F64 I8 I64 2^16 1 17.350 us 2.79% 17.540 us 2.77% 0.190 us 1.10% SAME
F64 I8 I64 2^20 1 28.103 us 1.78% 26.607 us 1.88% -1.496 us -5.32% FAST
F64 I8 I64 2^24 1 141.486 us 0.82% 114.644 us 0.75% -26.841 us -18.97% FAST
F64 I8 I64 2^28 1 1.947 ms 0.50% 1.494 ms 0.41% -453.068 us -23.27% FAST
F64 I8 I64 2^16 0.201 16.638 us 3.16% 16.861 us 3.16% 0.223 us 1.34% SAME
F64 I8 I64 2^20 0.201 26.120 us 2.23% 24.606 us 2.02% -1.514 us -5.80% FAST
F64 I8 I64 2^24 0.201 128.703 us 1.26% 106.322 us 0.64% -22.380 us -17.39% FAST
F64 I8 I64 2^28 0.201 1.798 ms 0.44% 1.373 ms 0.50% -425.211 us -23.65% FAST
F64 I16 I32 2^16 1 17.097 us 2.69% 17.346 us 2.67% 0.249 us 1.46% SAME
F64 I16 I32 2^20 1 28.043 us 1.88% 26.452 us 1.78% -1.591 us -5.67% FAST
F64 I16 I32 2^24 1 145.445 us 1.18% 117.074 us 0.72% -28.372 us -19.51% FAST
F64 I16 I32 2^28 1 2.020 ms 0.50% 1.538 ms 0.47% -481.838 us -23.86% FAST
F64 I16 I32 2^16 0.201 16.277 us 2.84% 16.479 us 2.78% 0.201 us 1.24% SAME
F64 I16 I32 2^20 0.201 25.988 us 2.07% 24.453 us 1.90% -1.535 us -5.91% FAST
F64 I16 I32 2^24 0.201 129.826 us 0.98% 112.168 us 0.64% -17.658 us -13.60% FAST
F64 I16 I32 2^28 0.201 1.803 ms 0.47% 1.479 ms 0.50% -324.182 us -17.98% FAST
F64 I16 I64 2^16 1 17.178 us 2.48% 17.419 us 2.47% 0.241 us 1.40% SAME
F64 I16 I64 2^20 1 27.986 us 1.78% 26.666 us 1.83% -1.319 us -4.71% FAST
F64 I16 I64 2^24 1 146.590 us 1.25% 126.655 us 0.72% -19.935 us -13.60% FAST
F64 I16 I64 2^28 1 2.057 ms 0.50% 1.696 ms 0.49% -360.760 us -17.54% FAST
F64 I16 I64 2^16 0.201 16.435 us 2.68% 16.653 us 2.62% 0.218 us 1.33% SAME
F64 I16 I64 2^20 0.201 26.325 us 1.92% 24.720 us 2.06% -1.605 us -6.10% FAST
F64 I16 I64 2^24 0.201 130.985 us 1.02% 117.640 us 0.70% -13.345 us -10.19% FAST
F64 I16 I64 2^28 0.201 1.819 ms 0.44% 1.566 ms 0.50% -253.302 us -13.92% FAST
F64 I32 I32 2^16 1 16.928 us 3.00% 17.135 us 2.91% 0.207 us 1.22% SAME
F64 I32 I32 2^20 1 29.166 us 1.63% 27.337 us 2.13% -1.829 us -6.27% FAST
F64 I32 I32 2^24 1 163.042 us 0.72% 131.484 us 0.66% -31.558 us -19.36% FAST
F64 I32 I32 2^28 1 2.308 ms 0.30% 1.787 ms 0.38% -521.097 us -22.58% FAST
F64 I32 I32 2^16 0.201 15.808 us 2.98% 16.012 us 2.89% 0.205 us 1.30% SAME
F64 I32 I32 2^20 0.201 26.446 us 2.27% 24.698 us 1.94% -1.748 us -6.61% FAST
F64 I32 I32 2^24 0.201 149.135 us 1.02% 127.618 us 0.65% -21.517 us -14.43% FAST
F64 I32 I32 2^28 0.201 2.131 ms 0.38% 1.754 ms 0.50% -377.346 us -17.71% FAST
F64 I32 I64 2^16 1 16.976 us 2.81% 17.173 us 2.76% 0.198 us 1.16% SAME
F64 I32 I64 2^20 1 29.275 us 1.65% 27.377 us 1.93% -1.898 us -6.48% FAST
F64 I32 I64 2^24 1 164.075 us 0.71% 131.973 us 0.65% -32.101 us -19.57% FAST
F64 I32 I64 2^28 1 2.328 ms 0.45% 1.794 ms 0.50% -534.008 us -22.94% FAST
F64 I32 I64 2^16 0.201 15.917 us 4.43% 16.125 us 2.88% 0.208 us 1.31% SAME
F64 I32 I64 2^20 0.201 26.622 us 2.30% 24.938 us 1.96% -1.685 us -6.33% FAST
F64 I32 I64 2^24 0.201 150.583 us 1.15% 128.135 us 1.62% -22.448 us -14.91% FAST
F64 I32 I64 2^28 0.201 2.156 ms 0.33% 1.756 ms 0.50% -399.563 us -18.54% FAST
F64 I64 I32 2^16 1 16.885 us 2.52% 17.082 us 2.45% 0.197 us 1.17% SAME
F64 I64 I32 2^20 1 29.921 us 1.69% 29.952 us 1.72% 0.030 us 0.10% SAME
F64 I64 I32 2^24 1 175.531 us 0.43% 175.101 us 0.42% -0.430 us -0.25% SAME
F64 I64 I32 2^28 1 2.536 ms 0.46% 2.531 ms 0.50% -5.539 us -0.22% SAME
F64 I64 I32 2^16 0.201 16.165 us 2.83% 16.367 us 2.73% 0.202 us 1.25% SAME
F64 I64 I32 2^20 0.201 26.795 us 1.89% 27.530 us 1.97% 0.735 us 2.74% SLOW
F64 I64 I32 2^24 0.201 157.415 us 0.54% 165.569 us 0.62% 8.154 us 5.18% SLOW
F64 I64 I32 2^28 0.201 2.288 ms 0.50% 2.400 ms 0.50% 112.349 us 4.91% SLOW
F64 I64 I64 2^16 1 16.958 us 2.48% 17.198 us 2.41% 0.240 us 1.42% SAME
F64 I64 I64 2^20 1 30.393 us 1.77% 30.524 us 1.79% 0.131 us 0.43% SAME
F64 I64 I64 2^24 1 176.359 us 0.46% 176.449 us 0.47% 0.090 us 0.05% SAME
F64 I64 I64 2^28 1 2.543 ms 0.46% 2.542 ms 0.50% -1.964 us -0.08% SAME
F64 I64 I64 2^16 0.201 16.694 us 2.86% 16.944 us 2.83% 0.249 us 1.49% SAME
F64 I64 I64 2^20 0.201 27.259 us 1.99% 27.884 us 1.94% 0.625 us 2.29% SLOW
F64 I64 I64 2^24 0.201 158.064 us 0.57% 166.296 us 1.09% 8.232 us 5.21% SLOW
F64 I64 I64 2^28 0.201 2.286 ms 0.50% 2.407 ms 0.50% 120.540 us 5.27% SLOW

Summary

  • Total Matches: 448
    • Pass (diff <= min_noise): 130
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 318
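
The pass/failure rule in the summary can be sketched as follows. This is a minimal illustration, not the comparison script itself: the function name `classify` and the handling of infinite noise are assumptions, and the only rule taken from the output above is that a row counts as a pass when the relative difference does not exceed the smaller of the two noise values.

```python
import math


def classify(ref_time, cmp_time, ref_noise, cmp_noise):
    """Classify one benchmark row (hypothetical helper, not part of any tool).

    Times must share one unit; noise values are fractions (2.13% -> 0.0213).
    Returns (diff, frac_diff, status).
    """
    diff = cmp_time - ref_time
    frac_diff = diff / ref_time
    min_noise = min(ref_noise, cmp_noise)

    if math.isinf(min_noise):
        # Assumption: a row without usable noise data cannot be judged.
        status = "UNKNOWN"
    elif abs(frac_diff) <= min_noise:
        # "Pass (diff <= min_noise)": the change is within measurement noise.
        status = "SAME"
    elif diff < 0:
        # "Failure (diff > min_noise)" with a shorter Cmp time.
        status = "FAST"
    else:
        # "Failure (diff > min_noise)" with a longer Cmp time.
        status = "SLOW"
    return diff, frac_diff, status


# Rows taken from the B200 table above (times in microseconds).
fast_row = classify(23.844, 23.337, 0.0193, 0.0190)  # F32 I16 I32 2^20, entropy 1
same_row = classify(22.484, 22.168, 0.0205, 0.0205)  # F32 I16 I32 2^20, entropy 0.201
slow_row = classify(26.795, 27.530, 0.0189, 0.0197)  # F64 I64 I32 2^20, entropy 0.201
print(fast_row[2], same_row[2], slow_row[2])
```

Under this rule a "Failure" only means that the difference exceeds the noise floor in either direction, so the 318 failures above include both FAST and SLOW rows.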

A100 (LDGSTS)

The A100 results did not fit within the GitHub comment character limit; see the newer comment below.

RTX 5090 (UBLKCPY)

['/home/pgrossebley/SM_120_merge_pairs_old_final_CTK13.json', '/home/pgrossebley/SM_120_merge_pairs_newest_final_CTK13.json']

base

[0] NVIDIA GeForce RTX 5090

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 1 11.771 us 7.56% 11.770 us 7.56% -0.001 us -0.01% SAME
I8 I8 I32 2^20 1 18.290 us 1.28% 18.182 us 1.44% -0.108 us -0.59% SAME
I8 I8 I32 2^24 1 80.736 us 1.26% 59.197 us 1.01% -21.539 us -26.68% FAST
I8 I8 I32 2^28 1 1.017 ms 0.22% 777.521 us 0.24% -239.766 us -23.57% FAST
I8 I8 I32 2^16 0.201 10.288 us 2.69% 10.300 us 2.96% 0.012 us 0.12% SAME
I8 I8 I32 2^20 0.201 18.307 us 1.22% 18.163 us 2.61% -0.144 us -0.79% SAME
I8 I8 I32 2^24 0.201 78.219 us 1.12% 57.219 us 1.80% -21.000 us -26.85% FAST
I8 I8 I32 2^28 0.201 1.001 ms 0.21% 758.440 us 0.29% -242.922 us -24.26% FAST
I8 I8 I64 2^16 1 14.475 us 8.38% 13.848 us 6.30% -0.627 us -4.33% SAME
I8 I8 I64 2^20 1 18.334 us 1.55% 18.292 us 1.28% -0.042 us -0.23% SAME
I8 I8 I64 2^24 1 71.260 us 1.17% 59.361 us 0.88% -11.899 us -16.70% FAST
I8 I8 I64 2^28 1 829.328 us 0.19% 775.783 us 0.27% -53.545 us -6.46% FAST
I8 I8 I64 2^16 0.201 10.442 us 5.83% 10.316 us 3.74% -0.126 us -1.21% SAME
I8 I8 I64 2^20 0.201 18.281 us 1.31% 18.246 us 1.51% -0.036 us -0.20% SAME
I8 I8 I64 2^24 0.201 68.179 us 1.53% 59.666 us 2.39% -8.513 us -12.49% FAST
I8 I8 I64 2^28 0.201 811.722 us 0.15% 758.387 us 0.29% -53.335 us -6.57% FAST
I8 I16 I32 2^16 1 10.295 us 3.19% 10.276 us 2.59% -0.019 us -0.19% SAME
I8 I16 I32 2^20 1 18.559 us 2.92% 18.402 us 0.68% -0.158 us -0.85% FAST
I8 I16 I32 2^24 1 90.351 us 0.92% 78.307 us 1.18% -12.044 us -13.33% FAST
I8 I16 I32 2^28 1 1.185 ms 0.15% 1.132 ms 0.18% -53.480 us -4.51% FAST
I8 I16 I32 2^16 0.201 13.625 us 7.14% 13.526 us 7.40% -0.099 us -0.73% SAME
I8 I16 I32 2^20 0.201 18.323 us 1.18% 18.274 us 1.32% -0.049 us -0.27% SAME
I8 I16 I32 2^24 0.201 88.189 us 0.80% 78.641 us 1.50% -9.548 us -10.83% FAST
I8 I16 I32 2^28 0.201 1.163 ms 0.16% 1.115 ms 0.19% -47.677 us -4.10% FAST
I8 I16 I64 2^16 1 10.290 us 3.07% 10.293 us 3.17% 0.003 us 0.03% SAME
I8 I16 I64 2^20 1 18.356 us 1.77% 18.280 us 1.30% -0.076 us -0.42% SAME
I8 I16 I64 2^24 1 85.750 us 1.13% 78.643 us 1.30% -7.107 us -8.29% FAST
I8 I16 I64 2^28 1 1.170 ms 0.18% 1.134 ms 0.17% -36.323 us -3.10% FAST
I8 I16 I64 2^16 0.201 11.366 us 8.96% 10.348 us 4.41% -1.018 us -8.96% FAST
I8 I16 I64 2^20 0.201 18.496 us 3.64% 18.230 us 1.40% -0.266 us -1.44% FAST
I8 I16 I64 2^24 0.201 85.209 us 1.70% 78.825 us 1.78% -6.383 us -7.49% FAST
I8 I16 I64 2^28 0.201 1.150 ms 0.17% 1.116 ms 0.16% -34.203 us -2.97% FAST
I8 I32 I32 2^16 1 13.886 us 6.03% 13.403 us 7.56% -0.483 us -3.48% SAME
I8 I32 I32 2^20 1 20.464 us 0.66% 19.318 us 5.30% -1.146 us -5.60% FAST
I8 I32 I32 2^24 1 127.663 us 1.20% 126.074 us 1.23% -1.589 us -1.24% FAST
I8 I32 I32 2^28 1 1.868 ms 0.14% 1.855 ms 0.15% -12.829 us -0.69% FAST
I8 I32 I32 2^16 0.201 12.755 us 6.73% 12.855 us 7.13% 0.100 us 0.79% SAME
I8 I32 I32 2^20 0.201 20.299 us 1.57% 18.558 us 4.10% -1.742 us -8.58% FAST
I8 I32 I32 2^24 0.201 127.007 us 1.08% 125.909 us 0.95% -1.098 us -0.86% SAME
I8 I32 I32 2^28 0.201 1.848 ms 0.14% 1.838 ms 0.16% -10.579 us -0.57% FAST
I8 I32 I64 2^16 1 13.184 us 7.69% 13.034 us 7.54% -0.150 us -1.14% SAME
I8 I32 I64 2^20 1 19.921 us 4.60% 18.298 us 1.73% -1.623 us -8.15% FAST
I8 I32 I64 2^24 1 128.786 us 1.24% 127.103 us 1.17% -1.683 us -1.31% FAST
I8 I32 I64 2^28 1 1.868 ms 0.16% 1.856 ms 0.15% -12.117 us -0.65% FAST
I8 I32 I64 2^16 0.201 14.151 us 4.12% 13.885 us 6.10% -0.267 us -1.88% SAME
I8 I32 I64 2^20 0.201 20.304 us 1.23% 18.971 us 5.36% -1.333 us -6.57% FAST
I8 I32 I64 2^24 0.201 124.119 us 1.07% 123.521 us 0.95% -0.598 us -0.48% SAME
I8 I32 I64 2^28 0.201 1.848 ms 0.14% 1.838 ms 0.16% -9.769 us -0.53% FAST
I8 I64 I32 2^16 1 13.357 us 7.61% 12.901 us 7.22% -0.456 us -3.41% SAME
I8 I64 I32 2^20 1 24.103 us 4.46% 22.699 us 2.33% -1.403 us -5.82% FAST
I8 I64 I32 2^24 1 215.623 us 0.58% 214.787 us 0.62% -0.835 us -0.39% SAME
I8 I64 I32 2^28 1 3.367 ms 0.10% 3.346 ms 0.10% -20.965 us -0.62% FAST
I8 I64 I32 2^16 0.201 13.622 us 7.15% 13.478 us 7.60% -0.143 us -1.05% SAME
I8 I64 I32 2^20 0.201 23.216 us 4.56% 22.276 us 1.20% -0.939 us -4.05% FAST
I8 I64 I32 2^24 0.201 214.775 us 0.64% 214.123 us 0.61% -0.651 us -0.30% SAME
I8 I64 I32 2^28 0.201 3.326 ms 0.09% 3.308 ms 0.10% -18.009 us -0.54% FAST
I8 I64 I64 2^16 1 14.332 us 0.29% 14.333 us 0.30% 0.001 us 0.01% SAME
I8 I64 I64 2^20 1 25.476 us 7.33% 23.465 us 3.48% -2.011 us -7.89% FAST
I8 I64 I64 2^24 1 215.852 us 0.58% 215.137 us 0.58% -0.714 us -0.33% SAME
I8 I64 I64 2^28 1 3.367 ms 0.09% 3.347 ms 0.10% -20.172 us -0.60% FAST
I8 I64 I64 2^16 0.201 13.508 us 7.43% 13.404 us 7.61% -0.103 us -0.76% SAME
I8 I64 I64 2^20 0.201 22.314 us 1.44% 22.275 us 1.85% -0.039 us -0.17% SAME
I8 I64 I64 2^24 0.201 214.607 us 0.68% 213.961 us 0.62% -0.646 us -0.30% SAME
I8 I64 I64 2^28 0.201 3.327 ms 0.09% 3.309 ms 0.09% -18.248 us -0.55% FAST
I16 I8 I32 2^16 1 14.804 us 5.92% 14.289 us 2.02% -0.515 us -3.48% FAST
I16 I8 I32 2^20 1 15.364 us 6.42% 15.369 us 6.42% 0.005 us 0.03% SAME
I16 I8 I32 2^24 1 100.379 us 0.31% 79.646 us 0.99% -20.733 us -20.65% FAST
I16 I8 I32 2^28 1 1.325 ms 0.16% 1.163 ms 0.18% -161.982 us -12.23% FAST
I16 I8 I32 2^16 0.201 14.569 us 4.51% 14.333 us 0.23% -0.236 us -1.62% FAST
I16 I8 I32 2^20 0.201 18.278 us 1.80% 18.234 us 2.14% -0.044 us -0.24% SAME
I16 I8 I32 2^24 0.201 92.085 us 0.55% 77.684 us 1.17% -14.401 us -15.64% FAST
I16 I8 I32 2^28 0.201 1.202 ms 0.12% 1.126 ms 0.17% -75.885 us -6.31% FAST
I16 I8 I64 2^16 1 15.817 us 5.78% 14.334 us 0.29% -1.483 us -9.38% FAST
I16 I8 I64 2^20 1 20.410 us 4.54% 18.575 us 4.70% -1.835 us -8.99% FAST
I16 I8 I64 2^24 1 99.483 us 1.06% 87.572 us 1.11% -11.911 us -11.97% FAST
I16 I8 I64 2^28 1 1.313 ms 0.16% 1.202 ms 0.11% -110.708 us -8.43% FAST
I16 I8 I64 2^16 0.201 14.336 us 0.62% 14.331 us 0.55% -0.005 us -0.04% SAME
I16 I8 I64 2^20 0.201 14.061 us 5.33% 12.626 us 8.16% -1.435 us -10.21% FAST
I16 I8 I64 2^24 0.201 95.237 us 1.10% 82.309 us 0.99% -12.928 us -13.57% FAST
I16 I8 I64 2^28 0.201 1.215 ms 0.16% 1.148 ms 0.12% -67.317 us -5.54% FAST
I16 I16 I32 2^16 1 16.374 us 0.38% 14.333 us 0.72% -2.041 us -12.47% FAST
I16 I16 I32 2^20 1 20.372 us 1.54% 18.412 us 3.09% -1.960 us -9.62% FAST
I16 I16 I32 2^24 1 110.195 us 0.75% 102.754 us 0.96% -7.441 us -6.75% FAST
I16 I16 I32 2^28 1 1.544 ms 0.10% 1.510 ms 0.15% -34.737 us -2.25% FAST
I16 I16 I32 2^16 0.201 14.832 us 5.92% 14.334 us 0.42% -0.498 us -3.36% FAST
I16 I16 I32 2^20 0.201 18.500 us 3.87% 18.227 us 1.46% -0.273 us -1.47% FAST
I16 I16 I32 2^24 0.201 102.422 us 0.26% 101.144 us 1.08% -1.277 us -1.25% FAST
I16 I16 I32 2^28 0.201 1.477 ms 0.16% 1.478 ms 0.15% 1.090 us 0.07% SAME
I16 I16 I64 2^16 1 16.381 us 0.21% 14.335 us 0.41% -2.046 us -12.49% FAST
I16 I16 I64 2^20 1 20.695 us 3.81% 20.036 us 5.79% -0.659 us -3.19% SAME
I16 I16 I64 2^24 1 118.627 us 0.82% 102.255 us 0.93% -16.371 us -13.80% FAST
I16 I16 I64 2^28 1 1.568 ms 0.11% 1.510 ms 0.14% -58.362 us -3.72% FAST
I16 I16 I64 2^16 0.201 15.496 us 6.91% 14.282 us 2.31% -1.214 us -7.83% FAST
I16 I16 I64 2^20 0.201 18.223 us 1.54% 18.192 us 1.45% -0.030 us -0.17% SAME
I16 I16 I64 2^24 0.201 103.906 us 0.91% 100.531 us 1.16% -3.375 us -3.25% FAST
I16 I16 I64 2^28 0.201 1.487 ms 0.14% 1.478 ms 0.15% -8.529 us -0.57% FAST
I16 I32 I32 2^16 1 14.327 us 0.46% 14.327 us 0.44% 0.000 us 0.00% SAME
I16 I32 I32 2^20 1 21.912 us 4.93% 20.646 us 7.19% -1.267 us -5.78% FAST
I16 I32 I32 2^24 1 148.712 us 0.88% 147.117 us 0.80% -1.595 us -1.07% FAST
I16 I32 I32 2^28 1 2.277 ms 0.12% 2.259 ms 0.11% -18.687 us -0.82% FAST
I16 I32 I32 2^16 0.201 14.332 us 0.29% 14.335 us 0.43% 0.003 us 0.02% SAME
I16 I32 I32 2^20 0.201 20.398 us 1.81% 18.969 us 5.41% -1.429 us -7.01% FAST
I16 I32 I32 2^24 0.201 144.963 us 1.01% 144.881 us 1.01% -0.082 us -0.06% SAME
I16 I32 I32 2^28 0.201 2.225 ms 0.12% 2.212 ms 0.13% -12.170 us -0.55% FAST
I16 I32 I64 2^16 1 14.334 us 0.37% 14.334 us 0.52% -0.000 us -0.00% SAME
I16 I32 I64 2^20 1 23.101 us 3.76% 21.032 us 3.74% -2.070 us -8.96% FAST
I16 I32 I64 2^24 1 149.071 us 0.87% 147.825 us 0.79% -1.245 us -0.84% FAST
I16 I32 I64 2^28 1 2.276 ms 0.12% 2.259 ms 0.11% -17.791 us -0.78% FAST
I16 I32 I64 2^16 0.201 14.334 us 0.47% 14.333 us 0.40% -0.001 us -0.01% SAME
I16 I32 I64 2^20 0.201 20.389 us 2.12% 18.317 us 2.10% -2.072 us -10.16% FAST
I16 I32 I64 2^24 0.201 147.443 us 1.07% 147.025 us 1.07% -0.419 us -0.28% SAME
I16 I32 I64 2^28 0.201 2.223 ms 0.12% 2.212 ms 0.13% -11.425 us -0.51% FAST
I16 I64 I32 2^16 1 14.331 us 0.58% 14.324 us 0.62% -0.006 us -0.04% SAME
I16 I64 I32 2^20 1 28.059 us 3.46% 24.045 us 3.88% -4.014 us -14.31% FAST
I16 I64 I32 2^24 1 240.768 us 0.63% 240.521 us 0.62% -0.248 us -0.10% SAME
I16 I64 I32 2^28 1 3.774 ms 0.09% 3.762 ms 0.10% -11.787 us -0.31% FAST
I16 I64 I32 2^16 0.201 14.333 us 0.22% 14.334 us 0.40% 0.001 us 0.01% SAME
I16 I64 I32 2^20 0.201 26.374 us 4.04% 23.254 us 4.72% -3.120 us -11.83% FAST
I16 I64 I32 2^24 0.201 235.649 us 0.60% 237.197 us 0.60% 1.548 us 0.66% SLOW
I16 I64 I32 2^28 0.201 3.688 ms 0.09% 3.684 ms 0.09% -4.176 us -0.11% FAST
I16 I64 I64 2^16 1 14.334 us 0.53% 14.336 us 0.57% 0.002 us 0.01% SAME
I16 I64 I64 2^20 1 28.226 us 4.03% 25.114 us 4.33% -3.112 us -11.03% FAST
I16 I64 I64 2^24 1 240.181 us 0.56% 240.296 us 0.54% 0.115 us 0.05% SAME
I16 I64 I64 2^28 1 3.777 ms 0.09% 3.763 ms 0.10% -13.591 us -0.36% FAST
I16 I64 I64 2^16 0.201 14.334 us 0.55% 14.334 us 0.35% -0.000 us -0.00% SAME
I16 I64 I64 2^20 0.201 26.176 us 4.86% 23.026 us 3.67% -3.150 us -12.03% FAST
I16 I64 I64 2^24 0.201 236.693 us 0.68% 238.586 us 0.64% 1.892 us 0.80% SLOW
I16 I64 I64 2^28 0.201 3.689 ms 0.08% 3.684 ms 0.10% -5.158 us -0.14% FAST
I32 I8 I32 2^16 1 14.327 us 0.61% 14.327 us 0.41% 0.001 us 0.01% SAME
I32 I8 I32 2^20 1 21.398 us 4.62% 19.242 us 5.41% -2.156 us -10.08% FAST
I32 I8 I32 2^24 1 129.551 us 0.97% 128.412 us 1.02% -1.138 us -0.88% SAME
I32 I8 I32 2^28 1 1.907 ms 0.14% 1.893 ms 0.13% -13.954 us -0.73% FAST
I32 I8 I32 2^16 0.201 14.333 us 0.26% 14.335 us 0.32% 0.001 us 0.01% SAME
I32 I8 I32 2^20 0.201 20.950 us 3.69% 19.370 us 5.93% -1.581 us -7.54% FAST
I32 I8 I32 2^24 0.201 125.524 us 0.97% 125.236 us 1.06% -0.287 us -0.23% SAME
I32 I8 I32 2^28 0.201 1.870 ms 0.14% 1.859 ms 0.13% -10.544 us -0.56% FAST
I32 I8 I64 2^16 1 14.333 us 0.41% 14.334 us 0.55% 0.001 us 0.01% SAME
I32 I8 I64 2^20 1 22.090 us 3.16% 21.473 us 5.47% -0.617 us -2.79% SAME
I32 I8 I64 2^24 1 129.829 us 0.98% 128.075 us 0.98% -1.754 us -1.35% FAST
I32 I8 I64 2^28 1 1.906 ms 0.14% 1.894 ms 0.13% -12.781 us -0.67% FAST
I32 I8 I64 2^16 0.201 14.334 us 0.48% 14.334 us 0.52% 0.001 us 0.00% SAME
I32 I8 I64 2^20 0.201 20.801 us 5.12% 18.962 us 4.37% -1.839 us -8.84% FAST
I32 I8 I64 2^24 0.201 126.443 us 1.23% 125.677 us 1.27% -0.766 us -0.61% SAME
I32 I8 I64 2^28 0.201 1.871 ms 0.14% 1.860 ms 0.13% -11.260 us -0.60% FAST
I32 I16 I32 2^16 1 14.327 us 0.48% 14.328 us 0.48% 0.001 us 0.01% SAME
I32 I16 I32 2^20 1 22.728 us 5.92% 21.808 us 4.32% -0.920 us -4.05% SAME
I32 I16 I32 2^24 1 152.426 us 0.81% 150.824 us 0.82% -1.602 us -1.05% FAST
I32 I16 I32 2^28 1 2.283 ms 0.12% 2.267 ms 0.11% -16.171 us -0.71% FAST
I32 I16 I32 2^16 0.201 14.334 us 0.24% 14.334 us 0.27% 0.000 us 0.00% SAME
I32 I16 I32 2^20 0.201 20.819 us 3.35% 19.477 us 6.34% -1.341 us -6.44% FAST
I32 I16 I32 2^24 0.201 148.896 us 0.80% 148.272 us 0.85% -0.624 us -0.42% SAME
I32 I16 I32 2^28 0.201 2.233 ms 0.12% 2.221 ms 0.13% -11.555 us -0.52% FAST
I32 I16 I64 2^16 1 14.333 us 0.48% 14.334 us 0.54% 0.001 us 0.01% SAME
I32 I16 I64 2^20 1 23.911 us 3.93% 21.936 us 3.59% -1.975 us -8.26% FAST
I32 I16 I64 2^24 1 151.220 us 0.98% 150.428 us 0.87% -0.792 us -0.52% SAME
I32 I16 I64 2^28 1 2.283 ms 0.12% 2.268 ms 0.12% -14.645 us -0.64% FAST
I32 I16 I64 2^16 0.201 14.335 us 0.64% 14.336 us 0.55% 0.001 us 0.01% SAME
I32 I16 I64 2^20 0.201 21.066 us 3.81% 20.977 us 3.94% -0.089 us -0.42% SAME
I32 I16 I64 2^24 0.201 146.041 us 0.78% 145.490 us 0.81% -0.551 us -0.38% SAME
I32 I16 I64 2^28 0.201 2.233 ms 0.13% 2.222 ms 0.13% -10.468 us -0.47% FAST
I32 I32 I32 2^16 1 14.339 us 1.13% 14.326 us 0.75% -0.013 us -0.09% SAME
I32 I32 I32 2^20 1 28.650 us 1.25% 24.513 us 1.04% -4.137 us -14.44% FAST
I32 I32 I32 2^24 1 204.887 us 0.56% 196.435 us 0.62% -8.452 us -4.13% FAST
I32 I32 I32 2^28 1 3.146 ms 0.10% 3.036 ms 0.10% -110.520 us -3.51% FAST
I32 I32 I32 2^16 0.201 14.334 us 0.36% 14.334 us 0.39% -0.000 us -0.00% SAME
I32 I32 I32 2^20 0.201 27.731 us 3.60% 21.411 us 4.14% -6.321 us -22.79% FAST
I32 I32 I32 2^24 0.201 198.466 us 0.64% 193.091 us 0.75% -5.374 us -2.71% FAST
I32 I32 I32 2^28 0.201 3.061 ms 0.10% 2.970 ms 0.09% -91.290 us -2.98% FAST
I32 I32 I64 2^16 1 14.402 us 2.63% 14.384 us 2.25% -0.019 us -0.13% SAME
I32 I32 I64 2^20 1 24.442 us 1.12% 22.580 us 2.82% -1.862 us -7.62% FAST
I32 I32 I64 2^24 1 196.627 us 0.62% 195.097 us 0.59% -1.530 us -0.78% FAST
I32 I32 I64 2^28 1 3.057 ms 0.09% 3.036 ms 0.09% -20.758 us -0.68% FAST
I32 I32 I64 2^16 0.201 14.354 us 1.51% 14.346 us 1.16% -0.009 us -0.06% SAME
I32 I32 I64 2^20 0.201 23.821 us 3.53% 21.703 us 4.06% -2.118 us -8.89% FAST
I32 I32 I64 2^24 0.201 192.581 us 0.65% 193.241 us 0.71% 0.659 us 0.34% SAME
I32 I32 I64 2^28 0.201 2.984 ms 0.09% 2.970 ms 0.11% -13.809 us -0.46% FAST
I32 I64 I32 2^16 1 14.326 us 0.52% 14.325 us 0.57% -0.000 us -0.00% SAME
I32 I64 I32 2^20 1 28.659 us 0.83% 28.861 us 2.30% 0.203 us 0.71% SAME
I32 I64 I32 2^24 1 289.569 us 0.49% 288.037 us 0.51% -1.533 us -0.53% FAST
I32 I64 I32 2^28 1 4.556 ms 0.08% 4.527 ms 0.07% -29.830 us -0.65% FAST
I32 I64 I32 2^16 0.201 14.334 us 0.35% 14.335 us 0.46% 0.001 us 0.01% SAME
I32 I64 I32 2^20 0.201 30.788 us 2.61% 30.941 us 3.39% 0.152 us 0.50% SAME
I32 I64 I32 2^24 0.201 283.489 us 0.59% 283.677 us 0.57% 0.188 us 0.07% SAME
I32 I64 I32 2^28 0.201 4.451 ms 0.07% 4.434 ms 0.08% -17.083 us -0.38% FAST
I32 I64 I64 2^16 1 14.334 us 0.59% 14.335 us 0.57% 0.001 us 0.01% SAME
I32 I64 I64 2^20 1 30.621 us 1.17% 28.234 us 2.61% -2.387 us -7.79% FAST
I32 I64 I64 2^24 1 289.987 us 0.55% 288.392 us 0.56% -1.595 us -0.55% FAST
I32 I64 I64 2^28 1 4.559 ms 0.07% 4.528 ms 0.08% -31.518 us -0.69% FAST
I32 I64 I64 2^16 0.201 14.339 us 0.83% 14.337 us 0.57% -0.002 us -0.01% SAME
I32 I64 I64 2^20 0.201 28.907 us 3.90% 30.297 us 5.73% 1.390 us 4.81% SLOW
I32 I64 I64 2^24 0.201 283.786 us 0.58% 283.709 us 0.60% -0.077 us -0.03% SAME
I32 I64 I64 2^28 0.201 4.451 ms 0.08% 4.434 ms 0.07% -17.866 us -0.40% FAST
I64 I8 I32 2^16 1 14.345 us 1.44% 14.334 us 0.92% -0.010 us -0.07% SAME
I64 I8 I32 2^20 1 26.622 us 1.25% 24.661 us 2.13% -1.962 us -7.37% FAST
I64 I8 I32 2^24 1 219.789 us 0.55% 221.037 us 0.59% 1.248 us 0.57% SLOW
I64 I8 I32 2^28 1 3.435 ms 0.10% 3.429 ms 0.10% -5.671 us -0.17% FAST
I64 I8 I32 2^16 0.201 14.345 us 1.14% 14.343 us 1.05% -0.002 us -0.02% SAME
I64 I8 I32 2^20 0.201 26.500 us 1.15% 24.511 us 1.88% -1.990 us -7.51% FAST
I64 I8 I32 2^24 0.201 216.496 us 0.74% 217.780 us 0.72% 1.283 us 0.59% SAME
I64 I8 I32 2^28 0.201 3.370 ms 0.10% 3.368 ms 0.10% -1.338 us -0.04% SAME
I64 I8 I64 2^16 1 14.514 us 4.03% 14.443 us 3.22% -0.070 us -0.49% SAME
I64 I8 I64 2^20 1 26.385 us 2.13% 25.810 us 3.85% -0.575 us -2.18% FAST
I64 I8 I64 2^24 1 221.835 us 0.61% 222.136 us 0.61% 0.300 us 0.14% SAME
I64 I8 I64 2^28 1 3.436 ms 0.10% 3.431 ms 0.09% -4.768 us -0.14% FAST
I64 I8 I64 2^16 0.201 14.397 us 2.51% 14.396 us 2.49% -0.001 us -0.00% SAME
I64 I8 I64 2^20 0.201 26.479 us 1.32% 24.866 us 3.67% -1.613 us -6.09% FAST
I64 I8 I64 2^24 0.201 217.344 us 0.79% 218.326 us 0.77% 0.982 us 0.45% SAME
I64 I8 I64 2^28 0.201 3.373 ms 0.09% 3.370 ms 0.10% -2.767 us -0.08% SAME
I64 I16 I32 2^16 1 12.279 us 0.52% 12.282 us 0.47% 0.003 us 0.02% SAME
I64 I16 I32 2^20 1 28.621 us 1.37% 26.818 us 2.53% -1.803 us -6.30% FAST
I64 I16 I32 2^24 1 243.267 us 0.55% 242.982 us 0.54% -0.284 us -0.12% SAME
I64 I16 I32 2^28 1 3.796 ms 0.08% 3.785 ms 0.08% -11.064 us -0.29% FAST
I64 I16 I32 2^16 0.201 10.627 us 7.56% 12.052 us 5.39% 1.425 us 13.41% SLOW
I64 I16 I32 2^20 0.201 28.402 us 2.06% 26.501 us 0.90% -1.901 us -6.69% FAST
I64 I16 I32 2^24 0.201 236.770 us 0.64% 238.199 us 0.60% 1.429 us 0.60% SLOW
I64 I16 I32 2^28 0.201 3.727 ms 0.09% 3.723 ms 0.07% -4.190 us -0.11% FAST
I64 I16 I64 2^16 1 12.287 us 0.65% 12.288 us 0.81% 0.001 us 0.01% SAME
I64 I16 I64 2^20 1 28.794 us 2.57% 26.945 us 3.31% -1.849 us -6.42% FAST
I64 I16 I64 2^24 1 243.478 us 0.56% 243.009 us 0.53% -0.470 us -0.19% SAME
I64 I16 I64 2^28 1 3.798 ms 0.08% 3.786 ms 0.08% -11.959 us -0.31% FAST
I64 I16 I64 2^16 0.201 10.560 us 7.07% 11.970 us 6.17% 1.409 us 13.35% SLOW
I64 I16 I64 2^20 0.201 29.684 us 3.88% 30.219 us 3.22% 0.535 us 1.80% SAME
I64 I16 I64 2^24 0.201 237.127 us 0.69% 238.827 us 0.67% 1.700 us 0.72% SLOW
I64 I16 I64 2^28 0.201 3.727 ms 0.08% 3.724 ms 0.08% -3.565 us -0.10% FAST
I64 I32 I32 2^16 1 14.333 us 0.83% 14.332 us 0.72% -0.001 us -0.01% SAME
I64 I32 I32 2^20 1 28.683 us 1.02% 28.635 us 0.58% -0.048 us -0.17% SAME
I64 I32 I32 2^24 1 290.879 us 0.54% 290.379 us 0.55% -0.501 us -0.17% SAME
I64 I32 I32 2^28 1 4.551 ms 0.08% 4.539 ms 0.07% -11.655 us -0.26% FAST
I64 I32 I32 2^16 0.201 14.337 us 0.72% 14.337 us 0.56% -0.000 us -0.00% SAME
I64 I32 I32 2^20 0.201 30.562 us 1.54% 27.982 us 3.34% -2.580 us -8.44% FAST
I64 I32 I32 2^24 0.201 283.120 us 0.38% 285.326 us 0.43% 2.205 us 0.78% SLOW
I64 I32 I32 2^28 0.201 4.458 ms 0.07% 4.456 ms 0.08% -2.073 us -0.05% SAME
I64 I32 I64 2^16 1 14.399 us 2.55% 14.381 us 2.18% -0.018 us -0.12% SAME
I64 I32 I64 2^20 1 30.501 us 2.07% 28.525 us 1.49% -1.976 us -6.48% FAST
I64 I32 I64 2^24 1 292.631 us 0.54% 290.836 us 0.53% -1.795 us -0.61% FAST
I64 I32 I64 2^28 1 4.569 ms 0.07% 4.540 ms 0.07% -29.061 us -0.64% FAST
I64 I32 I64 2^16 0.201 14.373 us 2.08% 14.355 us 1.56% -0.018 us -0.12% SAME
I64 I32 I64 2^20 0.201 27.754 us 3.70% 27.383 us 3.87% -0.371 us -1.34% SAME
I64 I32 I64 2^24 0.201 285.343 us 0.41% 285.494 us 0.44% 0.151 us 0.05% SAME
I64 I32 I64 2^28 0.201 4.475 ms 0.07% 4.457 ms 0.08% -18.416 us -0.41% FAST
I64 I64 I32 2^16 1 14.337 us 1.10% 14.335 us 0.94% -0.002 us -0.01% SAME
I64 I64 I32 2^20 1 34.927 us 1.29% 34.929 us 1.31% 0.002 us 0.01% SAME
I64 I64 I32 2^24 1 386.334 us 0.41% 385.382 us 0.39% -0.952 us -0.25% SAME
I64 I64 I32 2^28 1 6.164 ms 0.06% 6.139 ms 0.06% -24.970 us -0.41% FAST
I64 I64 I32 2^16 0.201 10.294 us 3.23% 10.593 us 7.33% 0.299 us 2.91% SAME
I64 I64 I32 2^20 0.201 34.782 us 1.44% 35.154 us 2.61% 0.372 us 1.07% SAME
I64 I64 I32 2^24 0.201 376.456 us 0.36% 376.992 us 0.38% 0.535 us 0.14% SAME
I64 I64 I32 2^28 0.201 5.999 ms 0.07% 5.988 ms 0.06% -10.875 us -0.18% FAST
I64 I64 I64 2^16 1 15.626 us 6.31% 15.579 us 6.40% -0.046 us -0.30% SAME
I64 I64 I64 2^20 1 34.558 us 3.52% 34.903 us 2.81% 0.345 us 1.00% SAME
I64 I64 I64 2^24 1 387.167 us 0.44% 386.140 us 0.43% -1.027 us -0.27% SAME
I64 I64 I64 2^28 1 6.167 ms 0.06% 6.140 ms 0.06% -27.144 us -0.44% FAST
I64 I64 I64 2^16 0.201 10.533 us 6.86% 11.284 us 9.07% 0.751 us 7.13% SLOW
I64 I64 I64 2^20 0.201 34.663 us 1.02% 34.721 us 1.41% 0.059 us 0.17% SAME
I64 I64 I64 2^24 0.201 376.706 us 0.37% 377.113 us 0.37% 0.407 us 0.11% SAME
I64 I64 I64 2^28 0.201 6.001 ms 0.06% 5.989 ms 0.06% -12.068 us -0.20% FAST
I128 I8 I32 2^16 1 16.350 us 1.37% 16.336 us 1.73% -0.014 us -0.09% SAME
I128 I8 I32 2^20 1 38.912 us 2.48% 38.805 us 2.53% -0.107 us -0.27% SAME
I128 I8 I32 2^24 1 413.338 us 0.37% 411.463 us 0.37% -1.875 us -0.45% FAST
I128 I8 I32 2^28 1 6.541 ms 0.06% 6.507 ms 0.05% -34.268 us -0.52% FAST
I128 I8 I32 2^16 0.201 16.092 us 4.43% 16.039 us 4.77% -0.053 us -0.33% SAME
I128 I8 I32 2^20 0.201 36.434 us 3.61% 36.845 us 2.99% 0.410 us 1.13% SAME
I128 I8 I32 2^24 0.201 402.371 us 0.43% 401.500 us 0.44% -0.871 us -0.22% SAME
I128 I8 I32 2^28 0.201 6.413 ms 0.06% 6.389 ms 0.06% -24.321 us -0.38% FAST
I128 I8 I64 2^16 1 16.369 us 0.95% 16.361 us 1.30% -0.008 us -0.05% SAME
I128 I8 I64 2^20 1 39.340 us 2.51% 39.038 us 2.09% -0.303 us -0.77% SAME
I128 I8 I64 2^24 1 411.609 us 0.37% 410.013 us 0.38% -1.596 us -0.39% FAST
I128 I8 I64 2^28 1 6.542 ms 0.05% 6.509 ms 0.06% -33.212 us -0.51% FAST
I128 I8 I64 2^16 0.201 16.154 us 3.98% 16.118 us 4.25% -0.035 us -0.22% SAME
I128 I8 I64 2^20 0.201 36.421 us 3.66% 36.944 us 3.30% 0.523 us 1.43% SAME
I128 I8 I64 2^24 0.201 403.232 us 0.42% 402.339 us 0.43% -0.893 us -0.22% SAME
I128 I8 I64 2^28 0.201 6.416 ms 0.06% 6.390 ms 0.06% -26.189 us -0.41% FAST
I128 I16 I32 2^16 1 16.345 us 1.28% 16.314 us 2.16% -0.031 us -0.19% SAME
I128 I16 I32 2^20 1 38.839 us 2.29% 38.549 us 2.07% -0.290 us -0.75% SAME
I128 I16 I32 2^24 1 433.791 us 0.33% 432.425 us 0.33% -1.366 us -0.31% SAME
I128 I16 I32 2^28 1 6.898 ms 0.05% 6.863 ms 0.05% -34.984 us -0.51% FAST
I128 I16 I32 2^16 0.201 14.616 us 4.86% 14.624 us 4.88% 0.008 us 0.06% SAME
I128 I16 I32 2^20 0.201 36.654 us 3.31% 36.309 us 3.49% -0.345 us -0.94% SAME
I128 I16 I32 2^24 0.201 424.860 us 0.34% 424.345 us 0.33% -0.515 us -0.12% SAME
I128 I16 I32 2^28 0.201 6.768 ms 0.06% 6.743 ms 0.06% -24.186 us -0.36% FAST
I128 I16 I64 2^16 1 16.374 us 0.70% 16.359 us 1.30% -0.015 us -0.09% SAME
I128 I16 I64 2^20 1 40.400 us 2.33% 39.432 us 2.65% -0.969 us -2.40% FAST
I128 I16 I64 2^24 1 432.280 us 0.36% 432.762 us 0.35% 0.482 us 0.11% SAME
I128 I16 I64 2^28 1 6.872 ms 0.06% 6.864 ms 0.05% -7.571 us -0.11% FAST
I128 I16 I64 2^16 0.201 14.914 us 6.20% 15.026 us 6.44% 0.111 us 0.75% SAME
I128 I16 I64 2^20 0.201 37.637 us 2.59% 38.030 us 3.39% 0.393 us 1.04% SAME
I128 I16 I64 2^24 0.201 423.231 us 0.36% 424.658 us 0.34% 1.426 us 0.34% SAME
I128 I16 I64 2^28 0.201 6.742 ms 0.06% 6.745 ms 0.06% 3.084 us 0.05% SAME
I128 I32 I32 2^16 1 14.508 us 3.89% 14.458 us 3.37% -0.050 us -0.35% SAME
I128 I32 I32 2^20 1 42.152 us 2.64% 42.643 us 2.18% 0.492 us 1.17% SAME
I128 I32 I32 2^24 1 479.019 us 0.34% 477.399 us 0.34% -1.620 us -0.34% FAST
I128 I32 I32 2^28 1 7.615 ms 0.13% 7.582 ms 0.34% -33.593 us -0.44% FAST
I128 I32 I32 2^16 0.201 16.120 us 4.25% 15.942 us 5.27% -0.178 us -1.10% SAME
I128 I32 I32 2^20 0.201 40.829 us 3.06% 41.731 us 2.57% 0.902 us 2.21% SAME
I128 I32 I32 2^24 0.201 469.155 us 0.32% 469.009 us 0.33% -0.146 us -0.03% SAME
I128 I32 I32 2^28 0.201 7.480 ms 0.31% 7.458 ms 0.40% -22.002 us -0.29% SAME
I128 I32 I64 2^16 1 16.374 us 0.78% 16.383 us 1.19% 0.010 us 0.06% SAME
I128 I32 I64 2^20 1 40.816 us 1.64% 41.149 us 2.09% 0.333 us 0.82% SAME
I128 I32 I64 2^24 1 478.721 us 0.36% 478.422 us 0.37% -0.299 us -0.06% SAME
I128 I32 I64 2^28 1 7.590 ms 0.14% 7.583 ms 0.36% -7.048 us -0.09% SAME
I128 I32 I64 2^16 0.201 16.381 us 0.38% 16.382 us 0.23% 0.001 us 0.01% SAME
I128 I32 I64 2^20 0.201 39.874 us 2.37% 41.128 us 3.20% 1.254 us 3.15% SLOW
I128 I32 I64 2^24 0.201 469.321 us 0.33% 471.282 us 0.31% 1.961 us 0.42% SLOW
I128 I32 I64 2^28 0.201 7.455 ms 0.32% 7.460 ms 0.42% 4.817 us 0.06% SAME
I128 I64 I32 2^16 1 16.377 us 0.81% 16.386 us 0.99% 0.009 us 0.05% SAME
I128 I64 I32 2^20 1 45.838 us 2.22% 45.577 us 2.06% -0.261 us -0.57% SAME
I128 I64 I32 2^24 1 576.201 us 0.28% 574.683 us 0.30% -1.518 us -0.26% SAME
I128 I64 I32 2^28 1 9.248 ms 0.04% 9.216 ms 0.04% -32.345 us -0.35% FAST
I128 I64 I32 2^16 0.201 15.241 us 6.67% 15.178 us 6.63% -0.063 us -0.41% SAME
I128 I64 I32 2^20 0.201 45.489 us 2.39% 45.787 us 2.46% 0.298 us 0.66% SAME
I128 I64 I32 2^24 0.201 559.597 us 0.31% 559.881 us 0.31% 0.285 us 0.05% SAME
I128 I64 I32 2^28 0.201 9.003 ms 0.04% 8.989 ms 0.04% -13.525 us -0.15% FAST
I128 I64 I64 2^16 1 15.707 us 6.10% 15.477 us 6.55% -0.229 us -1.46% SAME
I128 I64 I64 2^20 1 45.051 us 1.46% 45.030 us 1.46% -0.021 us -0.05% SAME
I128 I64 I64 2^24 1 576.711 us 0.29% 575.854 us 0.31% -0.857 us -0.15% SAME
I128 I64 I64 2^28 1 9.252 ms 0.04% 9.221 ms 0.04% -31.259 us -0.34% FAST
I128 I64 I64 2^16 0.201 16.381 us 0.58% 16.381 us 0.27% 0.000 us 0.00% SAME
I128 I64 I64 2^20 0.201 46.429 us 1.91% 46.753 us 1.50% 0.324 us 0.70% SAME
I128 I64 I64 2^24 0.201 560.260 us 0.29% 560.522 us 0.30% 0.263 us 0.05% SAME
I128 I64 I64 2^28 0.201 9.006 ms 0.05% 8.992 ms 0.04% -13.818 us -0.15% FAST
F32 I8 I32 2^16 1 14.399 us 2.71% 14.324 us 0.50% -0.075 us -0.52% FAST
F32 I8 I32 2^20 1 21.400 us 4.62% 21.219 us 4.42% -0.181 us -0.85% SAME
F32 I8 I32 2^24 1 130.101 us 1.20% 128.459 us 1.12% -1.642 us -1.26% FAST
F32 I8 I32 2^28 1 1.906 ms 0.14% 1.893 ms 0.15% -13.180 us -0.69% FAST
F32 I8 I32 2^16 0.201 14.332 us 0.26% 14.332 us 0.28% -0.000 us -0.00% SAME
F32 I8 I32 2^20 0.201 20.611 us 2.67% 19.129 us 5.76% -1.482 us -7.19% FAST
F32 I8 I32 2^24 0.201 125.484 us 1.12% 124.703 us 1.27% -0.781 us -0.62% SAME
F32 I8 I32 2^28 0.201 1.870 ms 0.13% 1.859 ms 0.13% -10.982 us -0.59% FAST
F32 I8 I64 2^16 1 14.334 us 0.61% 14.333 us 0.37% -0.001 us -0.01% SAME
F32 I8 I64 2^20 1 21.568 us 4.38% 21.354 us 4.25% -0.215 us -0.99% SAME
F32 I8 I64 2^24 1 130.161 us 1.01% 128.417 us 1.01% -1.744 us -1.34% FAST
F32 I8 I64 2^28 1 1.906 ms 0.14% 1.893 ms 0.14% -12.801 us -0.67% FAST
F32 I8 I64 2^16 0.201 14.333 us 0.25% 14.333 us 0.24% 0.000 us 0.00% SAME
F32 I8 I64 2^20 0.201 18.899 us 5.31% 18.268 us 1.73% -0.631 us -3.34% FAST
F32 I8 I64 2^24 0.201 125.222 us 1.25% 125.138 us 1.31% -0.084 us -0.07% SAME
F32 I8 I64 2^28 0.201 1.871 ms 0.13% 1.860 ms 0.13% -11.141 us -0.60% FAST
F32 I16 I32 2^16 1 14.326 us 0.44% 14.329 us 0.39% 0.003 us 0.02% SAME
F32 I16 I32 2^20 1 21.941 us 5.95% 21.304 us 4.60% -0.637 us -2.90% SAME
F32 I16 I32 2^24 1 149.363 us 0.85% 148.119 us 0.83% -1.244 us -0.83% FAST
F32 I16 I32 2^28 1 2.283 ms 0.12% 2.267 ms 0.12% -15.638 us -0.69% FAST
F32 I16 I32 2^16 0.201 14.333 us 0.30% 14.334 us 0.40% 0.001 us 0.00% SAME
F32 I16 I32 2^20 0.201 21.285 us 5.19% 20.747 us 3.18% -0.538 us -2.53% SAME
F32 I16 I32 2^24 0.201 145.592 us 0.73% 145.237 us 0.73% -0.354 us -0.24% SAME
F32 I16 I32 2^28 0.201 2.232 ms 0.12% 2.221 ms 0.13% -11.433 us -0.51% FAST
F32 I16 I64 2^16 1 14.332 us 0.38% 14.333 us 0.34% 0.001 us 0.01% SAME
F32 I16 I64 2^20 1 22.239 us 3.89% 22.022 us 3.49% -0.217 us -0.98% SAME
F32 I16 I64 2^24 1 149.493 us 0.79% 148.293 us 0.78% -1.200 us -0.80% FAST
F32 I16 I64 2^28 1 2.282 ms 0.12% 2.267 ms 0.12% -14.244 us -0.62% FAST
F32 I16 I64 2^16 0.201 14.334 us 0.54% 14.334 us 0.52% -0.000 us -0.00% SAME
F32 I16 I64 2^20 0.201 21.168 us 4.04% 20.948 us 4.79% -0.219 us -1.04% SAME
F32 I16 I64 2^24 0.201 146.122 us 0.75% 145.679 us 0.69% -0.443 us -0.30% SAME
F32 I16 I64 2^28 0.201 2.232 ms 0.12% 2.221 ms 0.12% -10.819 us -0.48% FAST
F32 I32 I32 2^16 1 14.325 us 0.56% 14.328 us 0.53% 0.003 us 0.02% SAME
F32 I32 I32 2^20 1 28.434 us 2.32% 24.162 us 3.40% -4.272 us -15.02% FAST
F32 I32 I32 2^24 1 204.047 us 0.61% 196.480 us 0.57% -7.567 us -3.71% FAST
F32 I32 I32 2^28 1 3.145 ms 0.10% 3.035 ms 0.10% -110.090 us -3.50% FAST
F32 I32 I32 2^16 0.201 14.335 us 0.51% 14.334 us 0.48% -0.001 us -0.01% SAME
F32 I32 I32 2^20 0.201 25.623 us 4.46% 22.045 us 6.01% -3.578 us -13.96% FAST
F32 I32 I32 2^24 0.201 199.278 us 0.53% 194.509 us 0.59% -4.769 us -2.39% FAST
F32 I32 I32 2^28 0.201 3.060 ms 0.09% 2.968 ms 0.10% -91.785 us -3.00% FAST
F32 I32 I64 2^16 1 14.365 us 1.83% 14.350 us 1.33% -0.015 us -0.10% SAME
F32 I32 I64 2^20 1 24.456 us 0.93% 24.433 us 0.99% -0.024 us -0.10% SAME
F32 I32 I64 2^24 1 198.434 us 0.70% 196.459 us 0.70% -1.975 us -1.00% FAST
F32 I32 I64 2^28 1 3.056 ms 0.10% 3.035 ms 0.09% -20.887 us -0.68% FAST
F32 I32 I64 2^16 0.201 14.338 us 0.73% 14.336 us 0.65% -0.002 us -0.01% SAME
F32 I32 I64 2^20 0.201 23.537 us 4.37% 21.522 us 4.04% -2.014 us -8.56% FAST
F32 I32 I64 2^24 0.201 193.320 us 0.59% 194.000 us 0.66% 0.680 us 0.35% SAME
F32 I32 I64 2^28 0.201 2.983 ms 0.09% 2.969 ms 0.09% -13.717 us -0.46% FAST
F32 I64 I32 2^16 1 14.336 us 1.21% 14.344 us 1.27% 0.008 us 0.06% SAME
F32 I64 I32 2^20 1 31.593 us 3.26% 30.060 us 3.56% -1.533 us -4.85% FAST
F32 I64 I32 2^24 1 288.621 us 0.45% 287.006 us 0.45% -1.615 us -0.56% FAST
F32 I64 I32 2^28 1 4.556 ms 0.07% 4.526 ms 0.08% -29.127 us -0.64% FAST
F32 I64 I32 2^16 0.201 14.337 us 0.83% 14.337 us 0.64% 0.000 us 0.00% SAME
F32 I64 I32 2^20 0.201 28.283 us 2.82% 28.966 us 5.09% 0.684 us 2.42% SAME
F32 I64 I32 2^24 0.201 282.845 us 0.51% 283.296 us 0.53% 0.451 us 0.16% SAME
F32 I64 I32 2^28 0.201 4.451 ms 0.07% 4.433 ms 0.08% -17.651 us -0.40% FAST
F32 I64 I64 2^16 1 14.459 us 3.44% 14.476 us 3.62% 0.018 us 0.12% SAME
F32 I64 I64 2^20 1 31.978 us 2.95% 29.127 us 3.56% -2.851 us -8.92% FAST
F32 I64 I64 2^24 1 290.281 us 0.52% 288.600 us 0.52% -1.681 us -0.58% FAST
F32 I64 I64 2^28 1 4.557 ms 0.07% 4.527 ms 0.08% -30.867 us -0.68% FAST
F32 I64 I64 2^16 0.201 12.739 us 6.73% 13.209 us 7.79% 0.470 us 3.69% SAME
F32 I64 I64 2^20 0.201 27.573 us 5.41% 30.335 us 4.88% 2.762 us 10.02% SLOW
F32 I64 I64 2^24 0.201 283.142 us 0.53% 283.432 us 0.57% 0.290 us 0.10% SAME
F32 I64 I64 2^28 0.201 4.450 ms 0.08% 4.433 ms 0.08% -17.943 us -0.40% FAST
F64 I8 I32 2^16 1 14.390 us 2.31% 14.373 us 1.93% -0.018 us -0.12% SAME
F64 I8 I32 2^20 1 26.673 us 2.25% 26.470 us 1.32% -0.203 us -0.76% SAME
F64 I8 I32 2^24 1 222.371 us 0.57% 222.275 us 0.62% -0.095 us -0.04% SAME
F64 I8 I32 2^28 1 3.438 ms 0.10% 3.428 ms 0.09% -9.524 us -0.28% FAST
F64 I8 I32 2^16 0.201 14.825 us 5.88% 14.780 us 5.71% -0.045 us -0.31% SAME
F64 I8 I32 2^20 0.201 26.534 us 2.29% 24.408 us 2.24% -2.125 us -8.01% FAST
F64 I8 I32 2^24 0.201 216.050 us 0.75% 216.820 us 0.76% 0.769 us 0.36% SAME
F64 I8 I32 2^28 0.201 3.371 ms 0.10% 3.366 ms 0.12% -5.193 us -0.15% FAST
F64 I8 I64 2^16 1 14.413 us 2.73% 14.383 us 2.15% -0.030 us -0.21% SAME
F64 I8 I64 2^20 1 26.659 us 2.23% 26.555 us 1.53% -0.104 us -0.39% SAME
F64 I8 I64 2^24 1 221.809 us 0.54% 222.279 us 0.62% 0.470 us 0.21% SAME
F64 I8 I64 2^28 1 3.439 ms 0.10% 3.431 ms 0.11% -8.213 us -0.24% FAST
F64 I8 I64 2^16 0.201 14.446 us 3.22% 14.415 us 2.74% -0.031 us -0.22% SAME
F64 I8 I64 2^20 0.201 26.393 us 1.73% 25.373 us 4.20% -1.020 us -3.87% FAST
F64 I8 I64 2^24 0.201 216.581 us 0.74% 217.828 us 0.81% 1.247 us 0.58% SAME
F64 I8 I64 2^28 0.201 3.372 ms 0.10% 3.368 ms 0.10% -4.266 us -0.13% FAST
F64 I16 I32 2^16 1 14.926 us 6.66% 14.854 us 6.49% -0.073 us -0.49% SAME
F64 I16 I32 2^20 1 27.757 us 3.79% 27.020 us 3.44% -0.738 us -2.66% SAME
F64 I16 I32 2^24 1 244.419 us 0.52% 243.923 us 0.55% -0.496 us -0.20% SAME
F64 I16 I32 2^28 1 3.800 ms 0.09% 3.783 ms 0.08% -16.777 us -0.44% FAST
F64 I16 I32 2^16 0.201 14.656 us 5.07% 14.593 us 4.65% -0.063 us -0.43% SAME
F64 I16 I32 2^20 0.201 27.845 us 3.53% 26.394 us 1.03% -1.451 us -5.21% FAST
F64 I16 I32 2^24 0.201 238.312 us 0.49% 239.924 us 0.55% 1.612 us 0.68% SLOW
F64 I16 I32 2^28 0.201 3.728 ms 0.08% 3.720 ms 0.08% -7.972 us -0.21% FAST
F64 I16 I64 2^16 1 14.686 us 5.24% 14.531 us 4.13% -0.155 us -1.05% SAME
F64 I16 I64 2^20 1 27.167 us 3.69% 26.653 us 2.23% -0.514 us -1.89% SAME
F64 I16 I64 2^24 1 245.579 us 0.68% 244.949 us 0.67% -0.630 us -0.26% SAME
F64 I16 I64 2^28 1 3.800 ms 0.09% 3.784 ms 0.08% -15.734 us -0.41% FAST
F64 I16 I64 2^16 0.201 15.057 us 6.48% 15.111 us 6.57% 0.055 us 0.36% SAME
F64 I16 I64 2^20 0.201 30.018 us 3.15% 28.612 us 2.02% -1.406 us -4.68% FAST
F64 I16 I64 2^24 0.201 239.801 us 0.47% 241.070 us 0.53% 1.269 us 0.53% SLOW
F64 I16 I64 2^28 0.201 3.728 ms 0.08% 3.721 ms 0.08% -6.592 us -0.18% FAST
F64 I32 I32 2^16 1 14.664 us 5.10% 14.531 us 4.12% -0.133 us -0.91% SAME
F64 I32 I32 2^20 1 28.076 us 3.40% 28.300 us 2.86% 0.224 us 0.80% SAME
F64 I32 I32 2^24 1 291.899 us 0.50% 290.379 us 0.50% -1.520 us -0.52% FAST
F64 I32 I32 2^28 1 4.568 ms 0.07% 4.537 ms 0.07% -30.800 us -0.67% FAST
F64 I32 I32 2^16 0.201 15.265 us 6.66% 15.361 us 6.65% 0.096 us 0.63% SAME
F64 I32 I32 2^20 0.201 28.572 us 0.95% 28.020 us 3.33% -0.552 us -1.93% FAST
F64 I32 I32 2^24 0.201 286.802 us 0.41% 286.754 us 0.47% -0.049 us -0.02% SAME
F64 I32 I32 2^28 0.201 4.474 ms 0.07% 4.455 ms 0.07% -18.682 us -0.42% FAST
F64 I32 I64 2^16 1 14.718 us 5.41% 14.486 us 3.67% -0.232 us -1.58% SAME
F64 I32 I64 2^20 1 28.733 us 2.29% 28.754 us 1.87% 0.021 us 0.07% SAME
F64 I32 I64 2^24 1 292.927 us 0.55% 291.056 us 0.54% -1.871 us -0.64% FAST
F64 I32 I64 2^28 1 4.569 ms 0.08% 4.538 ms 0.07% -30.723 us -0.67% FAST
F64 I32 I64 2^16 0.201 14.520 us 4.04% 14.450 us 3.24% -0.071 us -0.49% SAME
F64 I32 I64 2^20 0.201 29.820 us 3.68% 27.295 us 4.04% -2.525 us -8.47% FAST
F64 I32 I64 2^24 0.201 285.807 us 0.46% 286.032 us 0.46% 0.225 us 0.08% SAME
F64 I32 I64 2^28 0.201 4.475 ms 0.07% 4.456 ms 0.07% -18.775 us -0.42% FAST
F64 I64 I32 2^16 1 15.917 us 5.33% 15.847 us 5.65% -0.070 us -0.44% SAME
F64 I64 I32 2^20 1 35.819 us 3.49% 35.566 us 3.10% -0.254 us -0.71% SAME
F64 I64 I32 2^24 1 387.656 us 0.38% 386.971 us 0.39% -0.685 us -0.18% SAME
F64 I64 I32 2^28 1 6.165 ms 0.06% 6.139 ms 0.07% -26.196 us -0.42% FAST
F64 I64 I32 2^16 0.201 15.386 us 6.64% 15.410 us 6.63% 0.024 us 0.16% SAME
F64 I64 I32 2^20 0.201 34.784 us 1.18% 35.017 us 2.33% 0.233 us 0.67% SAME
F64 I64 I32 2^24 0.201 377.104 us 0.43% 377.532 us 0.43% 0.429 us 0.11% SAME
F64 I64 I32 2^28 0.201 5.999 ms 0.06% 5.988 ms 0.06% -11.227 us -0.19% FAST
F64 I64 I64 2^16 1 16.306 us 2.33% 16.353 us 1.45% 0.047 us 0.29% SAME
F64 I64 I64 2^20 1 35.343 us 3.24% 34.143 us 3.76% -1.199 us -3.39% FAST
F64 I64 I64 2^24 1 387.929 us 0.38% 387.064 us 0.39% -0.865 us -0.22% SAME
F64 I64 I64 2^28 1 6.168 ms 0.07% 6.140 ms 0.06% -28.605 us -0.46% FAST
F64 I64 I64 2^16 0.201 15.883 us 5.32% 15.832 us 5.60% -0.051 us -0.32% SAME
F64 I64 I64 2^20 0.201 34.830 us 1.69% 35.013 us 2.36% 0.183 us 0.52% SAME
F64 I64 I64 2^24 0.201 377.466 us 0.49% 377.994 us 0.51% 0.528 us 0.14% SAME
F64 I64 I64 2^28 0.201 6.001 ms 0.06% 5.989 ms 0.05% -12.258 us -0.20% FAST

Summary

  • Total Matches: 448
    • Pass (diff <= min_noise): 223
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 225
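
The Status column and the Pass/Failure counts above appear to follow a simple rule: a row counts as SAME when the relative difference stays within the smaller of the two noise figures, and as FAST or SLOW otherwise. The sketch below illustrates that rule in Python. It is inferred from the summary labels and the rows in these tables, not copied from the comparison script; the function name `classify` and the `"UNKNOWN"` label are hypothetical.

```python
import math


def classify(ref_time, cmp_time, ref_noise_pct, cmp_noise_pct):
    """Classify one benchmark row (assumed rule, inferred from the tables).

    Times must share a unit; noise values are percentages.
    """
    # Assumption: infinite noise makes the comparison undecidable.
    if math.isinf(ref_noise_pct) or math.isinf(cmp_noise_pct):
        return "UNKNOWN"

    diff = cmp_time - ref_time
    pct_diff = 100.0 * diff / ref_time
    min_noise = min(ref_noise_pct, cmp_noise_pct)

    # Within the smaller noise band: treated as no measurable change.
    if abs(pct_diff) <= min_noise:
        return "SAME"
    return "FAST" if diff < 0 else "SLOW"


# Row "I128 I32 I64 2^20 0.201": 39.874 us vs 41.128 us, noise 2.37% / 3.20%
print(classify(39.874, 41.128, 2.37, 3.20))
```

Because the tables show rounded times, a row that sits right on the noise boundary may classify differently here than in the original report.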

@bernhardmgruber
Contributor

The benchmark looks very promising! There are a few runs, though, that regressed a lot, like some 2^16 workloads with more than 20% slowdown. I think it would be worthwhile to understand what's going on there.
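
One way to locate such outliers is to scan the flattened table rows for SLOW entries above a chosen threshold. The following is a minimal sketch, assuming every row ends with the ten tokens for Ref Time, Ref Noise, Cmp Time, Cmp Noise, Diff, %Diff and Status, as in the tables in this thread; `parse_row` and `worst_regressions` are hypothetical helper names.

```python
def parse_row(line):
    """Split one whitespace-flattened table row into (axes, pct_diff, status)."""
    tokens = line.split()
    status = tokens[-1]
    pct_diff = float(tokens[-2].rstrip("%"))
    # The last ten tokens are the measurements: two times with units,
    # two noise values, the diff with its unit, %Diff and Status.
    axes = tokens[:-10]
    return axes, pct_diff, status


def worst_regressions(lines, threshold_pct):
    """Return SLOW rows whose %Diff exceeds threshold_pct, worst first."""
    rows = [parse_row(line) for line in lines if line.strip()]
    slow = [r for r in rows if r[2] == "SLOW" and r[1] > threshold_pct]
    return sorted(slow, key=lambda r: r[1], reverse=True)


rows = [
    "I64 I32 I64 2^28 1 4.435 ms 0.50% 5.639 ms 0.50% 1.204 ms 27.16% SLOW",
    "I8 I8 I32 2^28 1 2.174 ms 0.19% 1.468 ms 0.30% -706.162 us -32.48% FAST",
    "I64 I64 I32 2^20 1 44.235 us 1.37% 54.261 us 1.15% 10.025 us 22.66% SLOW",
]
for axes, pct, _ in worst_regressions(rows, 20.0):
    print(" ".join(axes), f"{pct:.2f}%")
```

Parsing from the right keeps the helper independent of how many axis columns a given table has.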

@pauleonix
Contributor Author

cub.bench.merge.pairs.base (cont'd due to character limit for GH comments)

A100 (LDGSTS)

['/home/pgrossebley/SM_80_merge_pairs_final_old.json', '/home/pgrossebley/SM_80_merge_pairs_final_newest.json']

base

[0] NVIDIA A100 80GB PCIe

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 1 20.601 us 2.25% 19.908 us 2.60% -0.693 us -3.36% FAST
I8 I8 I32 2^20 1 32.337 us 1.76% 28.648 us 1.54% -3.689 us -11.41% FAST
I8 I8 I32 2^24 1 161.636 us 0.35% 118.892 us 0.50% -42.744 us -26.44% FAST
I8 I8 I32 2^28 1 2.174 ms 0.19% 1.468 ms 0.30% -706.162 us -32.48% FAST
I8 I8 I32 2^16 0.201 20.102 us 2.56% 19.904 us 2.57% -0.198 us -0.98% SAME
I8 I8 I32 2^20 0.201 31.067 us 1.83% 26.964 us 1.84% -4.103 us -13.21% FAST
I8 I8 I32 2^24 0.201 159.401 us 0.37% 113.544 us 0.48% -45.856 us -28.77% FAST
I8 I8 I32 2^28 0.201 2.160 ms 0.35% 1.424 ms 0.23% -736.187 us -34.09% FAST
I8 I8 I64 2^16 1 20.767 us 2.50% 20.416 us 2.59% -0.350 us -1.69% SAME
I8 I8 I64 2^20 1 32.952 us 1.68% 31.918 us 1.51% -1.034 us -3.14% FAST
I8 I8 I64 2^24 1 162.157 us 0.39% 151.255 us 0.36% -10.901 us -6.72% FAST
I8 I8 I64 2^28 1 2.192 ms 0.29% 1.979 ms 0.29% -213.123 us -9.72% FAST
I8 I8 I64 2^16 0.201 20.005 us 2.57% 20.057 us 2.56% 0.051 us 0.26% SAME
I8 I8 I64 2^20 0.201 31.532 us 1.74% 29.349 us 1.86% -2.183 us -6.92% FAST
I8 I8 I64 2^24 0.201 159.634 us 0.37% 145.391 us 0.38% -14.242 us -8.92% FAST
I8 I8 I64 2^28 0.201 2.177 ms 0.26% 1.965 ms 0.27% -212.256 us -9.75% FAST
I8 I16 I32 2^16 1 20.653 us 2.42% 20.313 us 2.40% -0.340 us -1.64% SAME
I8 I16 I32 2^20 1 34.122 us 1.73% 29.279 us 1.77% -4.843 us -14.19% FAST
I8 I16 I32 2^24 1 165.070 us 0.36% 121.829 us 0.49% -43.241 us -26.20% FAST
I8 I16 I32 2^28 1 2.212 ms 0.27% 1.518 ms 0.50% -693.678 us -31.36% FAST
I8 I16 I32 2^16 0.201 20.201 us 2.48% 19.996 us 2.68% -0.205 us -1.01% SAME
I8 I16 I32 2^20 0.201 32.247 us 1.82% 27.675 us 1.57% -4.571 us -14.18% FAST
I8 I16 I32 2^24 0.201 162.768 us 0.39% 118.446 us 0.55% -44.322 us -27.23% FAST
I8 I16 I32 2^28 0.201 2.195 ms 0.24% 1.500 ms 0.52% -695.073 us -31.66% FAST
I8 I16 I64 2^16 1 21.003 us 2.49% 20.249 us 2.43% -0.754 us -3.59% FAST
I8 I16 I64 2^20 1 33.923 us 1.64% 29.163 us 1.78% -4.761 us -14.03% FAST
I8 I16 I64 2^24 1 168.004 us 0.36% 122.756 us 0.49% -45.248 us -26.93% FAST
I8 I16 I64 2^28 1 2.257 ms 0.28% 1.531 ms 0.50% -725.709 us -32.15% FAST
I8 I16 I64 2^16 0.201 20.506 us 2.32% 20.457 us 2.39% -0.049 us -0.24% SAME
I8 I16 I64 2^20 0.201 32.295 us 1.90% 27.747 us 1.59% -4.548 us -14.08% FAST
I8 I16 I64 2^24 0.201 165.760 us 0.40% 119.178 us 0.54% -46.582 us -28.10% FAST
I8 I16 I64 2^28 0.201 2.235 ms 0.27% 1.527 ms 0.57% -707.830 us -31.67% FAST
I8 I32 I32 2^16 1 19.446 us 2.05% 20.091 us 2.60% 0.645 us 3.32% SLOW
I8 I32 I32 2^20 1 33.249 us 1.55% 29.896 us 1.63% -3.353 us -10.09% FAST
I8 I32 I32 2^24 1 184.600 us 0.60% 150.512 us 0.77% -34.088 us -18.47% FAST
I8 I32 I32 2^28 1 2.650 ms 0.58% 2.026 ms 0.50% -623.942 us -23.54% FAST
I8 I32 I32 2^16 0.201 18.999 us 2.75% 19.830 us 2.59% 0.831 us 4.37% SLOW
I8 I32 I32 2^20 0.201 31.414 us 1.60% 28.226 us 1.82% -3.188 us -10.15% FAST
I8 I32 I32 2^24 0.201 182.081 us 0.98% 149.533 us 0.75% -32.548 us -17.88% FAST
I8 I32 I32 2^28 0.201 2.583 ms 0.55% 2.018 ms 0.50% -564.442 us -21.86% FAST
I8 I32 I64 2^16 1 19.445 us 2.36% 20.132 us 2.55% 0.686 us 3.53% SLOW
I8 I32 I64 2^20 1 33.141 us 1.52% 29.672 us 1.56% -3.469 us -10.47% FAST
I8 I32 I64 2^24 1 186.386 us 0.52% 149.137 us 0.80% -37.249 us -19.98% FAST
I8 I32 I64 2^28 1 2.688 ms 0.55% 2.031 ms 0.50% -657.010 us -24.44% FAST
I8 I32 I64 2^16 0.201 19.583 us 2.34% 20.400 us 2.53% 0.817 us 4.17% SLOW
I8 I32 I64 2^20 0.201 31.223 us 1.66% 27.955 us 1.76% -3.268 us -10.47% FAST
I8 I32 I64 2^24 0.201 183.888 us 0.92% 148.882 us 0.77% -35.006 us -19.04% FAST
I8 I32 I64 2^28 0.201 2.637 ms 0.52% 2.022 ms 0.50% -614.978 us -23.32% FAST
I8 I64 I32 2^16 1 18.524 us 2.35% 19.414 us 2.38% 0.890 us 4.80% SLOW
I8 I64 I32 2^20 1 35.482 us 1.62% 35.805 us 1.74% 0.323 us 0.91% SAME
I8 I64 I32 2^24 1 225.668 us 0.90% 217.746 us 0.74% -7.922 us -3.51% FAST
I8 I64 I32 2^28 1 3.333 ms 0.50% 3.195 ms 0.50% -138.150 us -4.15% FAST
I8 I64 I32 2^16 0.201 17.829 us 2.88% 18.903 us 2.79% 1.074 us 6.03% SLOW
I8 I64 I32 2^20 0.201 34.294 us 1.73% 33.952 us 1.64% -0.342 us -1.00% SAME
I8 I64 I32 2^24 0.201 223.908 us 0.89% 218.326 us 0.82% -5.582 us -2.49% FAST
I8 I64 I32 2^28 0.201 3.273 ms 0.50% 3.169 ms 0.50% -103.760 us -3.17% FAST
I8 I64 I64 2^16 1 18.477 us 2.46% 19.441 us 2.44% 0.964 us 5.22% SLOW
I8 I64 I64 2^20 1 35.390 us 1.59% 35.934 us 1.92% 0.544 us 1.54% SAME
I8 I64 I64 2^24 1 227.899 us 0.85% 217.748 us 0.75% -10.151 us -4.45% FAST
I8 I64 I64 2^28 1 3.392 ms 0.50% 3.196 ms 0.50% -196.284 us -5.79% FAST
I8 I64 I64 2^16 0.201 17.901 us 2.93% 18.910 us 2.79% 1.009 us 5.64% SLOW
I8 I64 I64 2^20 0.201 34.459 us 1.78% 34.148 us 1.78% -0.312 us -0.90% SAME
I8 I64 I64 2^24 0.201 226.788 us 0.85% 218.384 us 0.79% -8.404 us -3.71% FAST
I8 I64 I64 2^28 0.201 3.340 ms 0.50% 3.170 ms 0.50% -170.396 us -5.10% FAST
I16 I8 I32 2^16 1 21.357 us 2.17% 20.998 us 2.49% -0.359 us -1.68% SAME
I16 I8 I32 2^20 1 35.013 us 1.54% 31.540 us 1.49% -3.473 us -9.92% FAST
I16 I8 I32 2^24 1 186.182 us 0.36% 149.758 us 0.43% -36.424 us -19.56% FAST
I16 I8 I32 2^28 1 2.532 ms 0.49% 1.951 ms 0.50% -581.173 us -22.96% FAST
I16 I8 I32 2^16 0.201 20.593 us 2.22% 20.839 us 2.38% 0.246 us 1.19% SAME
I16 I8 I32 2^20 0.201 32.618 us 1.69% 29.287 us 1.75% -3.330 us -10.21% FAST
I16 I8 I32 2^24 0.201 171.376 us 0.37% 135.216 us 0.55% -36.161 us -21.10% FAST
I16 I8 I32 2^28 0.201 2.311 ms 0.42% 1.782 ms 0.50% -528.769 us -22.88% FAST
I16 I8 I64 2^16 1 21.790 us 2.24% 21.365 us 2.16% -0.425 us -1.95% SAME
I16 I8 I64 2^20 1 35.709 us 1.50% 31.652 us 1.38% -4.057 us -11.36% FAST
I16 I8 I64 2^24 1 188.393 us 0.36% 149.802 us 0.56% -38.591 us -20.48% FAST
I16 I8 I64 2^28 1 2.569 ms 0.43% 1.991 ms 0.50% -578.804 us -22.53% FAST
I16 I8 I64 2^16 0.201 20.610 us 2.07% 20.656 us 2.20% 0.046 us 0.22% SAME
I16 I8 I64 2^20 0.201 33.281 us 1.64% 29.433 us 1.65% -3.848 us -11.56% FAST
I16 I8 I64 2^24 0.201 173.422 us 0.35% 135.238 us 0.49% -38.184 us -22.02% FAST
I16 I8 I64 2^28 0.201 2.347 ms 0.28% 1.792 ms 0.50% -554.788 us -23.64% FAST
I16 I16 I32 2^16 1 21.733 us 2.31% 20.976 us 2.47% -0.758 us -3.49% FAST
I16 I16 I32 2^20 1 35.884 us 1.49% 31.875 us 1.38% -4.010 us -11.17% FAST
I16 I16 I32 2^24 1 192.635 us 0.40% 157.780 us 0.68% -34.854 us -18.09% FAST
I16 I16 I32 2^28 1 2.611 ms 0.49% 2.094 ms 0.83% -517.627 us -19.82% FAST
I16 I16 I32 2^16 0.201 21.002 us 2.50% 21.106 us 2.49% 0.104 us 0.50% SAME
I16 I16 I32 2^20 0.201 33.373 us 1.61% 29.673 us 1.43% -3.700 us -11.09% FAST
I16 I16 I32 2^24 0.201 177.416 us 0.41% 147.373 us 0.73% -30.043 us -16.93% FAST
I16 I16 I32 2^28 0.201 2.387 ms 0.31% 1.989 ms 0.58% -397.820 us -16.66% FAST
I16 I16 I64 2^16 1 21.947 us 2.41% 21.529 us 2.07% -0.418 us -1.90% SAME
I16 I16 I64 2^20 1 35.996 us 1.45% 32.268 us 1.60% -3.728 us -10.36% FAST
I16 I16 I64 2^24 1 194.845 us 0.39% 157.647 us 0.61% -37.197 us -19.09% FAST
I16 I16 I64 2^28 1 2.644 ms 0.33% 2.110 ms 0.55% -534.190 us -20.20% FAST
I16 I16 I64 2^16 0.201 21.237 us 2.28% 21.364 us 2.29% 0.127 us 0.60% SAME
I16 I16 I64 2^20 0.201 33.577 us 1.70% 30.052 us 1.65% -3.525 us -10.50% FAST
I16 I16 I64 2^24 0.201 180.397 us 0.46% 146.593 us 0.85% -33.803 us -18.74% FAST
I16 I16 I64 2^28 0.201 2.429 ms 0.31% 1.982 ms 0.62% -447.149 us -18.41% FAST
I16 I32 I32 2^16 1 20.002 us 2.63% 20.797 us 2.37% 0.795 us 3.97% SLOW
I16 I32 I32 2^20 1 30.028 us 1.64% 31.813 us 1.56% 1.786 us 5.95% SLOW
I16 I32 I32 2^24 1 177.396 us 0.80% 173.404 us 1.10% -3.993 us -2.25% FAST
I16 I32 I32 2^28 1 2.545 ms 0.50% 2.389 ms 0.57% -155.449 us -6.11% FAST
I16 I32 I32 2^16 0.201 19.210 us 2.64% 19.905 us 2.75% 0.695 us 3.62% SLOW
I16 I32 I32 2^20 0.201 29.340 us 1.79% 30.027 us 1.63% 0.687 us 2.34% SLOW
I16 I32 I32 2^24 0.201 164.970 us 1.00% 164.518 us 1.19% -0.452 us -0.27% SAME
I16 I32 I32 2^28 0.201 2.311 ms 0.50% 2.302 ms 0.56% -9.875 us -0.43% SAME
I16 I32 I64 2^16 1 19.835 us 2.62% 20.610 us 2.34% 0.775 us 3.91% SLOW
I16 I32 I64 2^20 1 30.257 us 1.71% 31.633 us 1.55% 1.376 us 4.55% SLOW
I16 I32 I64 2^24 1 178.113 us 0.74% 174.359 us 1.17% -3.754 us -2.11% FAST
I16 I32 I64 2^28 1 2.549 ms 0.50% 2.406 ms 0.75% -142.377 us -5.59% FAST
I16 I32 I64 2^16 0.201 19.634 us 2.44% 20.478 us 2.49% 0.844 us 4.30% SLOW
I16 I32 I64 2^20 0.201 29.044 us 1.76% 29.657 us 1.37% 0.613 us 2.11% SLOW
I16 I32 I64 2^24 0.201 165.164 us 1.03% 164.577 us 1.21% -0.587 us -0.36% SAME
I16 I32 I64 2^28 0.201 2.325 ms 0.50% 2.305 ms 0.65% -19.836 us -0.85% FAST
I16 I64 I32 2^16 1 19.709 us 2.56% 20.928 us 2.66% 1.219 us 6.19% SLOW
I16 I64 I32 2^20 1 37.343 us 1.46% 37.715 us 1.56% 0.372 us 0.99% SAME
I16 I64 I32 2^24 1 259.322 us 0.77% 254.570 us 0.85% -4.753 us -1.83% FAST
I16 I64 I32 2^28 1 3.843 ms 0.50% 3.769 ms 0.71% -73.346 us -1.91% FAST
I16 I64 I32 2^16 0.201 18.638 us 2.51% 20.312 us 2.64% 1.675 us 8.99% SLOW
I16 I64 I32 2^20 0.201 36.022 us 1.69% 35.829 us 1.38% -0.193 us -0.54% SAME
I16 I64 I32 2^24 0.201 248.048 us 0.75% 245.718 us 0.77% -2.329 us -0.94% FAST
I16 I64 I32 2^28 0.201 3.643 ms 0.50% 3.575 ms 0.50% -68.447 us -1.88% FAST
I16 I64 I64 2^16 1 19.562 us 2.51% 20.752 us 2.48% 1.190 us 6.08% SLOW
I16 I64 I64 2^20 1 37.573 us 1.46% 37.875 us 1.51% 0.302 us 0.80% SAME
I16 I64 I64 2^24 1 259.708 us 0.78% 255.118 us 0.81% -4.590 us -1.77% FAST
I16 I64 I64 2^28 1 3.872 ms 0.50% 3.800 ms 0.59% -72.182 us -1.86% FAST
I16 I64 I64 2^16 0.201 19.030 us 2.84% 20.264 us 2.94% 1.234 us 6.48% SLOW
I16 I64 I64 2^20 0.201 36.355 us 1.65% 36.221 us 1.48% -0.135 us -0.37% SAME
I16 I64 I64 2^24 0.201 248.609 us 0.78% 246.667 us 0.74% -1.942 us -0.78% FAST
I16 I64 I64 2^28 0.201 3.648 ms 0.50% 3.589 ms 0.50% -58.679 us -1.61% FAST
I32 I8 I32 2^16 1 20.350 us 2.36% 20.813 us 2.77% 0.464 us 2.28% SAME
I32 I8 I32 2^20 1 33.823 us 1.29% 31.612 us 1.47% -2.211 us -6.54% FAST
I32 I8 I32 2^24 1 198.651 us 0.53% 176.572 us 0.54% -22.079 us -11.11% FAST
I32 I8 I32 2^28 1 2.789 ms 0.50% 2.522 ms 1.83% -266.807 us -9.57% FAST
I32 I8 I32 2^16 0.201 19.870 us 2.57% 20.841 us 2.83% 0.972 us 4.89% SLOW
I32 I8 I32 2^20 0.201 31.789 us 1.61% 29.589 us 1.57% -2.200 us -6.92% FAST
I32 I8 I32 2^24 0.201 181.076 us 0.51% 156.856 us 0.92% -24.220 us -13.38% FAST
I32 I8 I32 2^28 0.201 2.505 ms 0.50% 2.214 ms 1.42% -291.023 us -11.62% FAST
I32 I8 I64 2^16 1 20.806 us 2.39% 21.165 us 2.64% 0.360 us 1.73% SAME
I32 I8 I64 2^20 1 34.481 us 1.44% 31.800 us 1.41% -2.681 us -7.77% FAST
I32 I8 I64 2^24 1 201.563 us 0.80% 178.129 us 0.50% -23.435 us -11.63% FAST
I32 I8 I64 2^28 1 2.840 ms 0.68% 2.587 ms 1.97% -252.650 us -8.90% FAST
I32 I8 I64 2^16 0.201 19.783 us 2.62% 20.406 us 2.75% 0.623 us 3.15% SLOW
I32 I8 I64 2^20 0.201 32.210 us 1.60% 29.803 us 1.49% -2.407 us -7.47% FAST
I32 I8 I64 2^24 0.201 182.963 us 0.52% 159.248 us 0.85% -23.715 us -12.96% FAST
I32 I8 I64 2^28 0.201 2.542 ms 0.50% 2.294 ms 1.32% -248.598 us -9.78% FAST
I32 I16 I32 2^16 1 19.816 us 2.78% 20.664 us 2.49% 0.848 us 4.28% SLOW
I32 I16 I32 2^20 1 30.434 us 1.60% 31.928 us 1.49% 1.494 us 4.91% SLOW
I32 I16 I32 2^24 1 169.105 us 0.79% 177.327 us 1.25% 8.222 us 4.86% SLOW
I32 I16 I32 2^28 1 2.455 ms 0.57% 2.480 ms 1.45% 25.144 us 1.02% SLOW
I32 I16 I32 2^16 0.201 19.278 us 2.65% 20.372 us 2.82% 1.094 us 5.67% SLOW
I32 I16 I32 2^20 0.201 28.750 us 1.75% 30.295 us 1.72% 1.544 us 5.37% SLOW
I32 I16 I32 2^24 0.201 163.342 us 1.00% 165.313 us 1.01% 1.971 us 1.21% SLOW
I32 I16 I32 2^28 0.201 2.316 ms 0.60% 2.294 ms 0.79% -22.110 us -0.95% FAST
I32 I16 I64 2^16 1 20.405 us 2.68% 21.037 us 2.87% 0.633 us 3.10% SLOW
I32 I16 I64 2^20 1 31.195 us 1.65% 32.478 us 1.54% 1.283 us 4.11% SLOW
I32 I16 I64 2^24 1 172.070 us 0.63% 199.264 us 0.56% 27.194 us 15.80% SLOW
I32 I16 I64 2^28 1 2.585 ms 2.32% 2.906 ms 1.97% 321.651 us 12.45% SLOW
I32 I16 I64 2^16 0.201 19.192 us 2.95% 20.277 us 2.67% 1.085 us 5.65% SLOW
I32 I16 I64 2^20 0.201 29.155 us 1.84% 30.835 us 1.49% 1.680 us 5.76% SLOW
I32 I16 I64 2^24 0.201 163.741 us 0.96% 178.052 us 0.97% 14.311 us 8.74% SLOW
I32 I16 I64 2^28 0.201 2.328 ms 0.91% 2.606 ms 1.18% 277.820 us 11.93% SLOW
I32 I32 I32 2^16 1 19.480 us 2.59% 20.371 us 2.66% 0.891 us 4.57% SLOW
I32 I32 I32 2^20 1 34.618 us 1.40% 34.209 us 1.54% -0.409 us -1.18% SAME
I32 I32 I32 2^24 1 211.133 us 0.84% 215.125 us 1.09% 3.992 us 1.89% SLOW
I32 I32 I32 2^28 1 3.202 ms 1.65% 3.138 ms 1.10% -63.279 us -1.98% FAST
I32 I32 I32 2^16 0.201 19.020 us 3.00% 20.154 us 2.99% 1.134 us 5.96% SLOW
I32 I32 I32 2^20 0.201 33.167 us 1.64% 32.145 us 1.60% -1.022 us -3.08% FAST
I32 I32 I32 2^24 0.201 199.468 us 0.89% 203.289 us 0.83% 3.820 us 1.92% SLOW
I32 I32 I32 2^28 0.201 2.953 ms 0.77% 2.955 ms 0.61% 1.213 us 0.04% SAME
I32 I32 I64 2^16 1 19.932 us 2.78% 20.834 us 2.78% 0.902 us 4.53% SLOW
I32 I32 I64 2^20 1 34.754 us 1.32% 39.484 us 1.34% 4.730 us 13.61% SLOW
I32 I32 I64 2^24 1 211.749 us 0.86% 250.035 us 1.12% 38.286 us 18.08% SLOW
I32 I32 I64 2^28 1 3.204 ms 1.22% 3.748 ms 1.57% 543.758 us 16.97% SLOW
I32 I32 I64 2^16 0.201 19.137 us 2.90% 20.300 us 2.79% 1.162 us 6.07% SLOW
I32 I32 I64 2^20 0.201 32.904 us 1.65% 35.824 us 1.51% 2.920 us 8.87% SLOW
I32 I32 I64 2^24 0.201 199.962 us 0.91% 221.114 us 0.91% 21.152 us 10.58% SLOW
I32 I32 I64 2^28 0.201 2.953 ms 0.68% 3.335 ms 0.95% 381.222 us 12.91% SLOW
I32 I64 I32 2^16 1 20.141 us 2.81% 21.054 us 2.72% 0.913 us 4.53% SLOW
I32 I64 I32 2^20 1 38.764 us 1.53% 39.999 us 1.23% 1.236 us 3.19% SLOW
I32 I64 I32 2^24 1 280.723 us 0.54% 296.514 us 0.74% 15.791 us 5.63% SLOW
I32 I64 I32 2^28 1 4.287 ms 0.50% 4.521 ms 0.61% 233.777 us 5.45% SLOW
I32 I64 I32 2^16 0.201 19.103 us 3.00% 20.528 us 2.60% 1.426 us 7.46% SLOW
I32 I64 I32 2^20 0.201 37.276 us 1.62% 37.766 us 1.30% 0.489 us 1.31% SLOW
I32 I64 I32 2^24 0.201 274.364 us 0.51% 285.889 us 0.66% 11.525 us 4.20% SLOW
I32 I64 I32 2^28 0.201 4.135 ms 0.50% 4.334 ms 0.50% 198.585 us 4.80% SLOW
I32 I64 I64 2^16 1 19.893 us 2.97% 20.831 us 2.69% 0.939 us 4.72% SLOW
I32 I64 I64 2^20 1 39.291 us 1.60% 42.879 us 1.20% 3.588 us 9.13% SLOW
I32 I64 I64 2^24 1 280.971 us 0.56% 335.969 us 0.83% 54.998 us 19.57% SLOW
I32 I64 I64 2^28 1 4.298 ms 0.50% 5.312 ms 1.11% 1.014 ms 23.59% SLOW
I32 I64 I64 2^16 0.201 19.352 us 2.78% 20.472 us 2.65% 1.119 us 5.78% SLOW
I32 I64 I64 2^20 0.201 37.073 us 1.55% 40.151 us 1.32% 3.078 us 8.30% SLOW
I32 I64 I64 2^24 0.201 274.232 us 0.52% 309.844 us 0.82% 35.612 us 12.99% SLOW
I32 I64 I64 2^28 0.201 4.138 ms 0.50% 4.764 ms 0.66% 626.632 us 15.14% SLOW
I64 I8 I32 2^16 1 21.086 us 2.78% 21.872 us 2.66% 0.786 us 3.73% SLOW
I64 I8 I32 2^20 1 39.582 us 1.44% 40.350 us 1.31% 0.768 us 1.94% SLOW
I64 I8 I32 2^24 1 246.361 us 0.85% 255.423 us 1.16% 9.062 us 3.68% SLOW
I64 I8 I32 2^28 1 3.880 ms 1.11% 3.956 ms 1.11% 75.556 us 1.95% SLOW
I64 I8 I32 2^16 0.201 19.809 us 3.24% 21.003 us 3.05% 1.194 us 6.03% SLOW
I64 I8 I32 2^20 0.201 37.243 us 1.48% 36.476 us 1.45% -0.767 us -2.06% FAST
I64 I8 I32 2^24 0.201 233.425 us 0.96% 231.506 us 0.92% -1.920 us -0.82% SAME
I64 I8 I32 2^28 0.201 3.602 ms 1.56% 3.478 ms 0.89% -124.146 us -3.45% FAST
I64 I8 I64 2^16 1 20.893 us 2.97% 21.759 us 2.91% 0.866 us 4.15% SLOW
I64 I8 I64 2^20 1 39.388 us 1.43% 42.890 us 1.13% 3.501 us 8.89% SLOW
I64 I8 I64 2^24 1 249.512 us 0.87% 303.948 us 0.83% 54.436 us 21.82% SLOW
I64 I8 I64 2^28 1 3.954 ms 1.00% 4.786 ms 1.08% 831.962 us 21.04% SLOW
I64 I8 I64 2^16 0.201 20.360 us 2.83% 21.347 us 2.71% 0.987 us 4.85% SLOW
I64 I8 I64 2^20 0.201 36.811 us 1.44% 38.700 us 1.39% 1.889 us 5.13% SLOW
I64 I8 I64 2^24 0.201 236.455 us 0.96% 263.264 us 0.80% 26.809 us 11.34% SLOW
I64 I8 I64 2^28 0.201 3.686 ms 0.83% 4.116 ms 1.06% 430.011 us 11.66% SLOW
I64 I16 I32 2^16 1 21.362 us 2.80% 21.965 us 2.83% 0.602 us 2.82% SLOW
I64 I16 I32 2^20 1 40.704 us 1.39% 44.002 us 1.10% 3.298 us 8.10% SLOW
I64 I16 I32 2^24 1 261.888 us 0.75% 303.466 us 1.21% 41.578 us 15.88% SLOW
I64 I16 I32 2^28 1 4.161 ms 1.25% 4.756 ms 0.91% 594.302 us 14.28% SLOW
I64 I16 I32 2^16 0.201 20.122 us 3.25% 21.280 us 2.87% 1.159 us 5.76% SLOW
I64 I16 I32 2^20 0.201 37.761 us 1.59% 39.918 us 1.38% 2.157 us 5.71% SLOW
I64 I16 I32 2^24 0.201 245.199 us 0.80% 274.063 us 0.92% 28.864 us 11.77% SLOW
I64 I16 I32 2^28 0.201 3.783 ms 1.10% 4.267 ms 0.87% 484.200 us 12.80% SLOW
I64 I16 I64 2^16 1 21.078 us 2.93% 21.998 us 2.80% 0.920 us 4.36% SLOW
I64 I16 I64 2^20 1 39.943 us 1.25% 43.880 us 1.13% 3.938 us 9.86% SLOW
I64 I16 I64 2^24 1 263.043 us 0.74% 312.183 us 1.17% 49.140 us 18.68% SLOW
I64 I16 I64 2^28 1 4.180 ms 0.83% 4.928 ms 0.50% 748.167 us 17.90% SLOW
I64 I16 I64 2^16 0.201 20.547 us 2.85% 21.680 us 2.59% 1.133 us 5.51% SLOW
I64 I16 I64 2^20 0.201 37.091 us 1.50% 39.448 us 1.41% 2.357 us 6.35% SLOW
I64 I16 I64 2^24 0.201 245.853 us 0.79% 276.985 us 0.91% 31.131 us 12.66% SLOW
I64 I16 I64 2^28 0.201 3.805 ms 0.71% 4.316 ms 1.00% 511.221 us 13.44% SLOW
I64 I32 I32 2^16 1 21.072 us 3.04% 21.871 us 3.08% 0.798 us 3.79% SLOW
I64 I32 I32 2^20 1 40.178 us 1.34% 41.309 us 1.65% 1.131 us 2.81% SLOW
I64 I32 I32 2^24 1 285.370 us 0.59% 307.027 us 0.91% 21.657 us 7.59% SLOW
I64 I32 I32 2^28 1 4.407 ms 0.50% 4.770 ms 0.50% 363.052 us 8.24% SLOW
I64 I32 I32 2^16 0.201 19.915 us 3.35% 20.891 us 3.13% 0.976 us 4.90% SLOW
I64 I32 I32 2^20 0.201 38.527 us 1.50% 38.040 us 1.42% -0.487 us -1.26% SAME
I64 I32 I32 2^24 0.201 275.559 us 0.53% 286.728 us 0.69% 11.169 us 4.05% SLOW
I64 I32 I32 2^28 0.201 4.162 ms 0.50% 4.383 ms 0.50% 220.983 us 5.31% SLOW
I64 I32 I64 2^16 1 20.886 us 3.10% 21.701 us 2.90% 0.815 us 3.90% SLOW
I64 I32 I64 2^20 1 40.059 us 1.43% 45.570 us 1.15% 5.511 us 13.76% SLOW
I64 I32 I64 2^24 1 285.510 us 0.62% 352.654 us 1.09% 67.143 us 23.52% SLOW
I64 I32 I64 2^28 1 4.435 ms 0.50% 5.639 ms 0.50% 1.204 ms 27.16% SLOW
I64 I32 I64 2^16 0.201 20.483 us 3.12% 21.085 us 2.96% 0.602 us 2.94% SAME
I64 I32 I64 2^20 0.201 38.061 us 1.52% 41.310 us 1.34% 3.249 us 8.54% SLOW
I64 I32 I64 2^24 0.201 274.963 us 0.54% 315.023 us 0.81% 40.061 us 14.57% SLOW
I64 I32 I64 2^28 0.201 4.166 ms 0.50% 4.934 ms 0.75% 767.704 us 18.43% SLOW
I64 I64 I32 2^16 1 20.204 us 3.05% 21.373 us 2.69% 1.169 us 5.79% SLOW
I64 I64 I32 2^20 1 44.235 us 1.37% 54.261 us 1.15% 10.025 us 22.66% SLOW
I64 I64 I32 2^24 1 372.792 us 0.44% 452.055 us 0.72% 79.262 us 21.26% SLOW
I64 I64 I32 2^28 1 5.837 ms 0.50% 7.389 ms 0.80% 1.553 ms 26.60% SLOW
I64 I64 I32 2^16 0.201 19.208 us 3.34% 20.851 us 2.95% 1.643 us 8.56% SLOW
I64 I64 I32 2^20 0.201 42.405 us 1.42% 47.953 us 1.25% 5.548 us 13.08% SLOW
I64 I64 I32 2^24 0.201 358.130 us 0.45% 410.620 us 0.67% 52.490 us 14.66% SLOW
I64 I64 I32 2^28 0.201 5.560 ms 0.50% 6.549 ms 0.50% 988.945 us 17.79% SLOW
I64 I64 I64 2^16 1 20.064 us 3.24% 21.919 us 2.73% 1.856 us 9.25% SLOW
I64 I64 I64 2^20 1 44.658 us 1.34% 50.648 us 1.41% 5.990 us 13.41% SLOW
I64 I64 I64 2^24 1 373.593 us 0.45% 405.718 us 0.74% 32.125 us 8.60% SLOW
I64 I64 I64 2^28 1 5.844 ms 0.50% 6.535 ms 0.68% 690.625 us 11.82% SLOW
I64 I64 I64 2^16 0.201 19.957 us 3.34% 21.359 us 3.01% 1.402 us 7.03% SLOW
I64 I64 I64 2^20 0.201 42.408 us 1.44% 45.179 us 1.45% 2.771 us 6.53% SLOW
I64 I64 I64 2^24 0.201 358.397 us 0.46% 379.562 us 0.58% 21.165 us 5.91% SLOW
I64 I64 I64 2^28 0.201 5.564 ms 0.50% 5.950 ms 0.50% 385.493 us 6.93% SLOW
I128 I8 I32 2^16 1 22.957 us 2.78% 23.457 us 2.66% 0.500 us 2.18% SAME
I128 I8 I32 2^20 1 50.975 us 1.27% 52.997 us 1.71% 2.022 us 3.97% SLOW
I128 I8 I32 2^24 1 418.859 us 0.60% 438.930 us 0.69% 20.071 us 4.79% SLOW
I128 I8 I32 2^28 1 6.832 ms 1.56% 7.133 ms 1.59% 300.488 us 4.40% SLOW
I128 I8 I32 2^16 0.201 21.320 us 2.82% 21.502 us 2.93% 0.182 us 0.86% SAME
I128 I8 I32 2^20 0.201 45.988 us 1.26% 47.292 us 1.44% 1.304 us 2.84% SLOW
I128 I8 I32 2^24 0.201 388.976 us 0.50% 394.076 us 0.50% 5.100 us 1.31% SLOW
I128 I8 I32 2^28 0.201 6.055 ms 0.50% 6.137 ms 0.50% 81.953 us 1.35% SLOW
I128 I8 I64 2^16 1 22.757 us 2.59% 23.252 us 2.78% 0.495 us 2.18% SAME
I128 I8 I64 2^20 1 51.023 us 1.43% 53.934 us 1.57% 2.912 us 5.71% SLOW
I128 I8 I64 2^24 1 419.609 us 0.59% 439.374 us 0.71% 19.765 us 4.71% SLOW
I128 I8 I64 2^28 1 6.823 ms 2.36% 7.138 ms 1.42% 314.901 us 4.62% SLOW
I128 I8 I64 2^16 0.201 21.575 us 3.05% 22.252 us 2.88% 0.677 us 3.14% SLOW
I128 I8 I64 2^20 0.201 46.282 us 1.17% 47.710 us 1.40% 1.428 us 3.09% SLOW
I128 I8 I64 2^24 0.201 388.763 us 0.50% 393.453 us 0.51% 4.690 us 1.21% SLOW
I128 I8 I64 2^28 0.201 6.063 ms 0.50% 6.135 ms 0.50% 72.115 us 1.19% SLOW
I128 I16 I32 2^16 1 22.863 us 2.80% 23.528 us 2.73% 0.665 us 2.91% SLOW
I128 I16 I32 2^20 1 51.959 us 1.39% 54.143 us 1.50% 2.184 us 4.20% SLOW
I128 I16 I32 2^24 1 434.258 us 0.56% 451.791 us 0.64% 17.532 us 4.04% SLOW
I128 I16 I32 2^28 1 7.042 ms 1.22% 7.198 ms 2.58% 156.116 us 2.22% SLOW
I128 I16 I32 2^16 0.201 21.418 us 3.02% 22.068 us 2.80% 0.650 us 3.03% SLOW
I128 I16 I32 2^20 0.201 47.160 us 1.21% 48.582 us 1.43% 1.422 us 3.01% SLOW
I128 I16 I32 2^24 0.201 408.667 us 0.50% 413.223 us 0.50% 4.557 us 1.11% SLOW
I128 I16 I32 2^28 0.201 6.355 ms 0.50% 6.430 ms 0.50% 74.377 us 1.17% SLOW
I128 I16 I64 2^16 1 22.773 us 2.63% 23.417 us 2.70% 0.644 us 2.83% SLOW
I128 I16 I64 2^20 1 51.938 us 1.35% 54.429 us 1.50% 2.491 us 4.80% SLOW
I128 I16 I64 2^24 1 435.600 us 0.51% 449.588 us 0.62% 13.988 us 3.21% SLOW
I128 I16 I64 2^28 1 7.091 ms 1.25% 7.314 ms 2.02% 223.152 us 3.15% SLOW
I128 I16 I64 2^16 0.201 21.565 us 2.83% 21.967 us 2.72% 0.402 us 1.86% SAME
I128 I16 I64 2^20 0.201 47.807 us 1.22% 48.499 us 1.34% 0.692 us 1.45% SLOW
I128 I16 I64 2^24 0.201 409.314 us 0.50% 412.549 us 0.50% 3.235 us 0.79% SLOW
I128 I16 I64 2^28 0.201 6.357 ms 0.50% 6.420 ms 0.50% 63.381 us 1.00% SLOW
I128 I32 I32 2^16 1 22.643 us 2.65% 23.223 us 2.98% 0.579 us 2.56% SAME
I128 I32 I32 2^20 1 53.695 us 1.30% 56.342 us 1.38% 2.647 us 4.93% SLOW
I128 I32 I32 2^24 1 467.131 us 0.50% 478.193 us 0.51% 11.061 us 2.37% SLOW
I128 I32 I32 2^28 1 7.403 ms 0.50% 7.649 ms 0.51% 245.599 us 3.32% SLOW
I128 I32 I32 2^16 0.201 21.980 us 3.06% 22.736 us 2.71% 0.756 us 3.44% SLOW
I128 I32 I32 2^20 0.201 49.280 us 1.15% 50.528 us 1.36% 1.248 us 2.53% SLOW
I128 I32 I32 2^24 0.201 446.263 us 0.50% 451.163 us 0.50% 4.900 us 1.10% SLOW
I128 I32 I32 2^28 0.201 6.940 ms 0.50% 7.030 ms 0.52% 90.029 us 1.30% SLOW
I128 I32 I64 2^16 1 23.263 us 2.79% 23.778 us 2.59% 0.515 us 2.21% SAME
I128 I32 I64 2^20 1 53.885 us 1.33% 55.833 us 1.43% 1.949 us 3.62% SLOW
I128 I32 I64 2^24 1 467.783 us 0.52% 478.847 us 0.56% 11.064 us 2.37% SLOW
I128 I32 I64 2^28 1 7.475 ms 0.52% 7.695 ms 0.71% 220.035 us 2.94% SLOW
I128 I32 I64 2^16 0.201 21.765 us 2.76% 22.086 us 2.73% 0.322 us 1.48% SAME
I128 I32 I64 2^20 0.201 49.260 us 1.13% 50.054 us 1.34% 0.795 us 1.61% SLOW
I128 I32 I64 2^24 0.201 446.796 us 0.50% 450.557 us 0.50% 3.761 us 0.84% SLOW
I128 I32 I64 2^28 0.201 6.942 ms 0.50% 7.017 ms 0.50% 74.601 us 1.07% SLOW
I128 I64 I32 2^16 1 22.130 us 2.95% 23.033 us 2.97% 0.903 us 4.08% SLOW
I128 I64 I32 2^20 1 57.751 us 1.12% 58.935 us 1.18% 1.184 us 2.05% SLOW
I128 I64 I32 2^24 1 565.813 us 0.49% 572.643 us 0.50% 6.830 us 1.21% SLOW
I128 I64 I32 2^28 1 9.242 ms 0.50% 9.348 ms 0.50% 105.716 us 1.14% SLOW
I128 I64 I32 2^16 0.201 21.655 us 2.95% 22.598 us 2.96% 0.943 us 4.35% SLOW
I128 I64 I32 2^20 0.201 54.453 us 1.00% 56.150 us 1.13% 1.697 us 3.12% SLOW
I128 I64 I32 2^24 0.201 538.200 us 0.50% 545.646 us 0.50% 7.446 us 1.38% SLOW
I128 I64 I32 2^28 0.201 8.510 ms 0.50% 8.661 ms 0.50% 150.565 us 1.77% SLOW
I128 I64 I64 2^16 1 22.698 us 2.70% 23.519 us 2.86% 0.821 us 3.62% SLOW
I128 I64 I64 2^20 1 57.863 us 1.10% 59.324 us 1.17% 1.461 us 2.53% SLOW
I128 I64 I64 2^24 1 566.077 us 0.49% 573.482 us 0.47% 7.405 us 1.31% SLOW
I128 I64 I64 2^28 1 9.261 ms 0.50% 9.326 ms 0.50% 65.756 us 0.71% SLOW
I128 I64 I64 2^16 0.201 21.868 us 2.84% 22.485 us 3.08% 0.617 us 2.82% SAME
I128 I64 I64 2^20 0.201 54.715 us 1.04% 56.037 us 1.13% 1.322 us 2.42% SLOW
I128 I64 I64 2^24 0.201 538.767 us 0.50% 545.613 us 0.50% 6.846 us 1.27% SLOW
I128 I64 I64 2^28 0.201 8.516 ms 0.50% 8.647 ms 0.50% 130.964 us 1.54% SLOW
F32 I8 I32 2^16 1 20.707 us 2.77% 21.291 us 3.06% 0.584 us 2.82% SLOW
F32 I8 I32 2^20 1 33.958 us 1.37% 31.660 us 1.47% -2.298 us -6.77% FAST
F32 I8 I32 2^24 1 197.950 us 0.57% 177.918 us 0.60% -20.033 us -10.12% FAST
F32 I8 I32 2^28 1 3.010 ms 1.75% 2.571 ms 2.47% -438.902 us -14.58% FAST
F32 I8 I32 2^16 0.201 20.076 us 2.90% 20.808 us 2.88% 0.731 us 3.64% SLOW
F32 I8 I32 2^20 0.201 32.041 us 1.57% 29.629 us 1.54% -2.412 us -7.53% FAST
F32 I8 I32 2^24 0.201 180.513 us 0.48% 156.148 us 0.89% -24.364 us -13.50% FAST
F32 I8 I32 2^28 0.201 2.510 ms 0.48% 2.239 ms 2.15% -270.776 us -10.79% FAST
F32 I8 I64 2^16 1 21.027 us 2.54% 21.542 us 2.61% 0.515 us 2.45% SAME
F32 I8 I64 2^20 1 34.730 us 1.30% 32.159 us 1.60% -2.571 us -7.40% FAST
F32 I8 I64 2^24 1 202.029 us 1.16% 179.382 us 0.56% -22.647 us -11.21% FAST
F32 I8 I64 2^28 1 2.951 ms 2.11% 2.613 ms 2.57% -338.502 us -11.47% FAST
F32 I8 I64 2^16 0.201 20.438 us 2.78% 20.661 us 2.76% 0.223 us 1.09% SAME
F32 I8 I64 2^20 0.201 32.678 us 1.42% 30.135 us 1.72% -2.543 us -7.78% FAST
F32 I8 I64 2^24 0.201 184.055 us 0.59% 159.823 us 0.81% -24.232 us -13.17% FAST
F32 I8 I64 2^28 0.201 2.641 ms 2.69% 2.315 ms 2.25% -325.731 us -12.34% FAST
F32 I16 I32 2^16 1 20.459 us 2.58% 20.748 us 3.00% 0.289 us 1.41% SAME
F32 I16 I32 2^20 1 30.780 us 1.49% 32.037 us 1.55% 1.257 us 4.08% SLOW
F32 I16 I32 2^24 1 168.753 us 0.79% 177.191 us 1.18% 8.438 us 5.00% SLOW
F32 I16 I32 2^28 1 2.533 ms 2.22% 2.485 ms 1.50% -48.521 us -1.92% FAST
F32 I16 I32 2^16 0.201 19.663 us 2.94% 20.651 us 2.75% 0.987 us 5.02% SLOW
F32 I16 I32 2^20 0.201 28.913 us 1.83% 30.550 us 1.61% 1.637 us 5.66% SLOW
F32 I16 I32 2^24 0.201 163.477 us 1.02% 165.267 us 1.02% 1.790 us 1.09% SLOW
F32 I16 I32 2^28 0.201 2.354 ms 1.40% 2.308 ms 1.14% -46.307 us -1.97% FAST
F32 I16 I64 2^16 1 20.923 us 2.86% 21.129 us 2.82% 0.205 us 0.98% SAME
F32 I16 I64 2^20 1 31.018 us 1.58% 32.132 us 1.59% 1.115 us 3.59% SLOW
F32 I16 I64 2^24 1 171.343 us 0.59% 199.153 us 0.60% 27.810 us 16.23% SLOW
F32 I16 I64 2^28 1 2.614 ms 2.39% 2.889 ms 1.72% 274.981 us 10.52% SLOW
F32 I16 I64 2^16 0.201 19.803 us 2.90% 20.616 us 2.70% 0.813 us 4.11% SLOW
F32 I16 I64 2^20 0.201 29.219 us 1.84% 30.559 us 1.56% 1.341 us 4.59% SLOW
F32 I16 I64 2^24 0.201 162.904 us 0.97% 177.546 us 0.98% 14.642 us 8.99% SLOW
F32 I16 I64 2^28 0.201 2.346 ms 1.32% 2.601 ms 1.23% 255.630 us 10.90% SLOW
F32 I32 I32 2^16 1 19.935 us 3.08% 20.846 us 2.67% 0.911 us 4.57% SLOW
F32 I32 I32 2^20 1 34.952 us 1.35% 34.840 us 1.40% -0.112 us -0.32% SAME
F32 I32 I32 2^24 1 211.176 us 0.89% 215.662 us 1.08% 4.486 us 2.12% SLOW
F32 I32 I32 2^28 1 3.225 ms 2.07% 3.137 ms 1.21% -88.128 us -2.73% FAST
F32 I32 I32 2^16 0.201 19.387 us 2.85% 20.207 us 3.07% 0.820 us 4.23% SLOW
F32 I32 I32 2^20 0.201 33.201 us 1.72% 32.816 us 1.45% -0.385 us -1.16% SAME
F32 I32 I32 2^24 0.201 199.528 us 0.91% 203.996 us 0.81% 4.467 us 2.24% SLOW
F32 I32 I32 2^28 0.201 2.961 ms 0.68% 2.954 ms 0.58% -6.536 us -0.22% SAME
F32 I32 I64 2^16 1 20.003 us 2.91% 20.778 us 2.79% 0.775 us 3.88% SLOW
F32 I32 I64 2^20 1 34.888 us 1.31% 39.714 us 1.31% 4.827 us 13.83% SLOW
F32 I32 I64 2^24 1 211.561 us 0.88% 249.070 us 1.04% 37.509 us 17.73% SLOW
F32 I32 I64 2^28 1 3.218 ms 1.19% 3.733 ms 1.51% 514.802 us 16.00% SLOW
F32 I32 I64 2^16 0.201 19.600 us 2.86% 20.442 us 2.79% 0.843 us 4.30% SLOW
F32 I32 I64 2^20 0.201 33.273 us 1.62% 35.948 us 1.53% 2.676 us 8.04% SLOW
F32 I32 I64 2^24 0.201 199.593 us 0.88% 222.097 us 0.93% 22.504 us 11.28% SLOW
F32 I32 I64 2^28 0.201 2.969 ms 0.58% 3.348 ms 1.13% 379.059 us 12.77% SLOW
F32 I64 I32 2^16 1 20.250 us 2.82% 21.159 us 2.67% 0.909 us 4.49% SLOW
F32 I64 I32 2^20 1 38.884 us 1.50% 40.257 us 1.32% 1.373 us 3.53% SLOW
F32 I64 I32 2^24 1 280.849 us 0.57% 296.886 us 0.76% 16.037 us 5.71% SLOW
F32 I64 I32 2^28 1 4.292 ms 0.50% 4.520 ms 0.59% 227.337 us 5.30% SLOW
F32 I64 I32 2^16 0.201 19.360 us 2.75% 20.543 us 2.68% 1.184 us 6.11% SLOW
F32 I64 I32 2^20 0.201 37.278 us 1.54% 38.038 us 1.30% 0.760 us 2.04% SLOW
F32 I64 I32 2^24 0.201 274.094 us 0.50% 285.914 us 0.66% 11.821 us 4.31% SLOW
F32 I64 I32 2^28 0.201 4.133 ms 0.50% 4.332 ms 0.50% 199.040 us 4.82% SLOW
F32 I64 I64 2^16 1 20.048 us 3.06% 20.989 us 2.79% 0.941 us 4.69% SLOW
F32 I64 I64 2^20 1 39.194 us 1.49% 43.092 us 1.18% 3.898 us 9.95% SLOW
F32 I64 I64 2^24 1 280.869 us 0.56% 335.947 us 0.85% 55.078 us 19.61% SLOW
F32 I64 I64 2^28 1 4.299 ms 0.50% 5.312 ms 1.04% 1.013 ms 23.56% SLOW
F32 I64 I64 2^16 0.201 19.312 us 2.88% 20.800 us 2.62% 1.489 us 7.71% SLOW
F32 I64 I64 2^20 0.201 37.689 us 1.56% 40.390 us 1.37% 2.701 us 7.17% SLOW
F32 I64 I64 2^24 0.201 274.120 us 0.51% 309.555 us 0.83% 35.435 us 12.93% SLOW
F32 I64 I64 2^28 0.201 4.138 ms 0.50% 4.788 ms 0.64% 650.211 us 15.71% SLOW
F64 I8 I32 2^16 1 20.899 us 2.82% 22.189 us 2.80% 1.291 us 6.18% SLOW
F64 I8 I32 2^20 1 39.014 us 1.39% 40.328 us 1.32% 1.313 us 3.37% SLOW
F64 I8 I32 2^24 1 243.880 us 0.85% 254.781 us 1.21% 10.902 us 4.47% SLOW
F64 I8 I32 2^28 1 3.885 ms 1.58% 3.933 ms 0.87% 47.850 us 1.23% SLOW
F64 I8 I32 2^16 0.201 20.242 us 3.19% 21.082 us 3.03% 0.840 us 4.15% SLOW
F64 I8 I32 2^20 0.201 36.449 us 1.55% 36.305 us 1.48% -0.144 us -0.40% SAME
F64 I8 I32 2^24 0.201 230.766 us 1.03% 231.946 us 0.92% 1.180 us 0.51% SAME
F64 I8 I32 2^28 0.201 3.570 ms 0.94% 3.477 ms 0.74% -92.462 us -2.59% FAST
F64 I8 I64 2^16 1 21.381 us 3.03% 22.119 us 2.99% 0.739 us 3.46% SLOW
F64 I8 I64 2^20 1 40.369 us 1.43% 43.454 us 1.21% 3.085 us 7.64% SLOW
F64 I8 I64 2^24 1 245.838 us 0.80% 301.233 us 1.01% 55.395 us 22.53% SLOW
F64 I8 I64 2^28 1 3.906 ms 1.34% 4.773 ms 0.82% 866.645 us 22.19% SLOW
F64 I8 I64 2^16 0.201 20.248 us 3.09% 21.169 us 2.95% 0.921 us 4.55% SLOW
F64 I8 I64 2^20 0.201 36.978 us 1.60% 39.187 us 1.32% 2.209 us 5.97% SLOW
F64 I8 I64 2^24 0.201 231.337 us 0.98% 262.759 us 0.83% 31.422 us 13.58% SLOW
F64 I8 I64 2^28 0.201 3.570 ms 0.78% 4.108 ms 0.95% 537.332 us 15.05% SLOW
F64 I16 I32 2^16 1 21.086 us 3.40% 21.931 us 2.80% 0.845 us 4.01% SLOW
F64 I16 I32 2^20 1 39.860 us 1.37% 43.663 us 1.21% 3.803 us 9.54% SLOW
F64 I16 I32 2^24 1 260.237 us 0.72% 301.764 us 1.21% 41.527 us 15.96% SLOW
F64 I16 I32 2^28 1 4.161 ms 3.05% 4.751 ms 0.82% 590.474 us 14.19% SLOW
F64 I16 I32 2^16 0.201 20.288 us 3.19% 21.256 us 3.05% 0.968 us 4.77% SLOW
F64 I16 I32 2^20 0.201 37.029 us 1.56% 39.670 us 1.42% 2.641 us 7.13% SLOW
F64 I16 I32 2^24 0.201 244.721 us 0.83% 273.772 us 0.86% 29.051 us 11.87% SLOW
F64 I16 I32 2^28 0.201 3.763 ms 0.73% 4.225 ms 0.84% 461.307 us 12.26% SLOW
F64 I16 I64 2^16 1 21.418 us 2.97% 22.369 us 2.77% 0.951 us 4.44% SLOW
F64 I16 I64 2^20 1 40.089 us 1.29% 44.022 us 1.11% 3.933 us 9.81% SLOW
F64 I16 I64 2^24 1 261.043 us 0.73% 310.383 us 1.17% 49.340 us 18.90% SLOW
F64 I16 I64 2^28 1 4.184 ms 2.96% 4.957 ms 0.92% 773.098 us 18.48% SLOW
F64 I16 I64 2^16 0.201 20.321 us 3.05% 21.583 us 2.79% 1.262 us 6.21% SLOW
F64 I16 I64 2^20 0.201 37.333 us 1.49% 39.811 us 1.35% 2.478 us 6.64% SLOW
F64 I16 I64 2^24 0.201 245.545 us 0.82% 275.042 us 0.86% 29.497 us 12.01% SLOW
F64 I16 I64 2^28 0.201 3.793 ms 0.62% 4.269 ms 0.68% 476.165 us 12.55% SLOW
F64 I32 I32 2^16 1 21.101 us 3.05% 21.988 us 2.80% 0.887 us 4.20% SLOW
F64 I32 I32 2^20 1 40.095 us 1.31% 40.973 us 1.50% 0.879 us 2.19% SLOW
F64 I32 I32 2^24 1 284.629 us 0.61% 307.083 us 0.92% 22.454 us 7.89% SLOW
F64 I32 I32 2^28 1 4.377 ms 0.56% 4.769 ms 2.02% 391.853 us 8.95% SLOW
F64 I32 I32 2^16 0.201 19.928 us 3.41% 21.354 us 3.00% 1.426 us 7.15% SLOW
F64 I32 I32 2^20 0.201 38.616 us 1.47% 38.428 us 1.46% -0.187 us -0.49% SAME
F64 I32 I32 2^24 0.201 274.716 us 0.56% 287.486 us 0.71% 12.770 us 4.65% SLOW
F64 I32 I32 2^28 0.201 4.152 ms 0.50% 4.365 ms 0.50% 213.091 us 5.13% SLOW
F64 I32 I64 2^16 1 20.836 us 3.10% 21.854 us 3.18% 1.018 us 4.89% SLOW
F64 I32 I64 2^20 1 39.915 us 1.36% 45.224 us 1.09% 5.309 us 13.30% SLOW
F64 I32 I64 2^24 1 284.848 us 0.59% 350.763 us 1.08% 65.916 us 23.14% SLOW
F64 I32 I64 2^28 1 4.420 ms 0.56% 5.557 ms 0.70% 1.136 ms 25.71% SLOW
F64 I32 I64 2^16 0.201 20.263 us 3.19% 21.482 us 2.94% 1.219 us 6.02% SLOW
F64 I32 I64 2^20 0.201 38.233 us 1.52% 41.344 us 1.34% 3.111 us 8.14% SLOW
F64 I32 I64 2^24 0.201 274.931 us 0.52% 314.746 us 0.84% 39.815 us 14.48% SLOW
F64 I32 I64 2^28 0.201 4.164 ms 0.50% 4.908 ms 0.73% 744.433 us 17.88% SLOW
F64 I64 I32 2^16 1 20.374 us 3.29% 21.772 us 2.84% 1.398 us 6.86% SLOW
F64 I64 I32 2^20 1 44.060 us 1.28% 53.996 us 1.17% 9.936 us 22.55% SLOW
F64 I64 I32 2^24 1 373.512 us 0.42% 449.081 us 0.70% 75.568 us 20.23% SLOW
F64 I64 I32 2^28 1 5.843 ms 0.50% 7.365 ms 1.02% 1.522 ms 26.05% SLOW
F64 I64 I32 2^16 0.201 19.385 us 3.37% 20.607 us 2.74% 1.222 us 6.30% SLOW
F64 I64 I32 2^20 0.201 42.097 us 1.45% 48.004 us 1.25% 5.907 us 14.03% SLOW
F64 I64 I32 2^24 0.201 358.626 us 0.46% 408.884 us 0.70% 50.257 us 14.01% SLOW
F64 I64 I32 2^28 0.201 5.560 ms 0.50% 6.578 ms 0.50% 1.018 ms 18.30% SLOW
F64 I64 I64 2^16 1 20.259 us 3.13% 21.777 us 2.81% 1.518 us 7.49% SLOW
F64 I64 I64 2^20 1 44.383 us 1.28% 54.355 us 1.14% 9.972 us 22.47% SLOW
F64 I64 I64 2^24 1 373.854 us 0.43% 453.855 us 0.72% 80.000 us 21.40% SLOW
F64 I64 I64 2^28 1 5.835 ms 0.50% 7.484 ms 0.92% 1.650 ms 28.28% SLOW
F64 I64 I64 2^16 0.201 19.835 us 3.19% 21.157 us 2.96% 1.322 us 6.66% SLOW
F64 I64 I64 2^20 0.201 42.434 us 1.42% 48.316 us 1.28% 5.882 us 13.86% SLOW
F64 I64 I64 2^24 0.201 358.885 us 0.45% 409.272 us 0.73% 50.387 us 14.04% SLOW
F64 I64 I64 2^28 0.201 5.566 ms 0.50% 6.543 ms 0.50% 977.660 us 17.57% SLOW

Summary

  • Total Matches: 448
    • Pass (diff <= min_noise): 49
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 399
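
The status column and the pass/failure counts above can be reproduced with a small sketch. It assumes, based only on the summary wording and the table rows, that a row is SAME when the absolute relative difference does not exceed the smaller of the two noise values, and FAST or SLOW otherwise depending on the sign. The function name `classify` and the handling of infinite noise are hypothetical and not taken from the benchmark tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical helper, not part of the benchmark tooling.
// Times may be in any unit as long as both use the same one.
// Noise values are fractions (0.0272 means 2.72%).
inline std::string classify(double ref_time, double ref_noise, double cmp_time, double cmp_noise)
{
  assert(ref_time > 0.0);

  // Assumption: a row with non-finite noise cannot be judged ("Unknown" in the summary).
  if (!std::isfinite(ref_noise) || !std::isfinite(cmp_noise))
  {
    return "UNKNOWN";
  }

  const double frac_diff = (cmp_time - ref_time) / ref_time; // the %Diff column as a fraction
  const double min_noise = std::min(ref_noise, cmp_noise);

  if (std::abs(frac_diff) <= min_noise)
  {
    return "SAME"; // counted as Pass
  }
  return frac_diff < 0.0 ? "FAST" : "SLOW"; // both counted as Failure
}
```

For example, the row with 21.565 us (2.83% noise) against 21.967 us (2.72% noise) has a relative difference of about 1.86%, which is below the 2.72% minimum noise, so it is reported as SAME.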

@pauleonix
Contributor Author

pauleonix commented Oct 5, 2025

@bernhardmgruber It seems that benchmarking small problem sizes (2^16) is not very reproducible on the RTX 5090. I replaced the old results with a newer run that has far fewer and smaller slowdowns. The few bigger slowdowns at 2^16 elements now appear for different benchmarks than in the previous results. I can also see differences in the 2^16 results depending on whether I run just those benchmarks or the full default set including bigger sizes.

The new run also uses a slightly different environment, hopefully nearer to that of the cluster: CTK 12.9 -> 13.0, graphical -> multiuser, persistence mode turned on, and some probably less important things like kernel updates. This might have somewhat increased reproducibility. Runtimes for the old implementation profited more from the change than runtimes for the new implementation.

bernhardmgruber added a commit to bernhardmgruber/cccl that referenced this pull request Oct 7, 2025
This avoids needing to specify the cache modifier in two places and eases integration of NVIDIA#6077
bernhardmgruber added a commit that referenced this pull request Oct 7, 2025
This avoids needing to specify the cache modifier in two places and eases integration of #6077
@copy-pr-bot
Contributor

copy-pr-bot bot commented Nov 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@bernhardmgruber
Contributor

/ok to test cf6dfa4

@bernhardmgruber
Contributor

I rebased the branch on main.

(void) translate_indices;

item_type* items1_shared;
if constexpr (keys_use_block_load_to_shared)

Important: I think this should be items_use_block_load_to_shared

Contributor

@bernhardmgruber bernhardmgruber Nov 3, 2025

Yes, the new tests in #6455 expose this, but only after some refactoring that I have not yet pushed to this branch.
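
The kind of mix-up discussed in this thread can be illustrated with a reduced sketch. Only the two flag names `keys_use_block_load_to_shared` and `items_use_block_load_to_shared` come from the review. The trait `can_use_block_load_to_shared`, the `load_config` struct, the `load_path` enum, and the `wide_item` type are hypothetical stand-ins, since the actual condition and surrounding code are not shown here.

```cpp
#include <type_traits>

// Hypothetical condition; the real criterion used in the PR is not shown in this thread.
template <typename T>
inline constexpr bool can_use_block_load_to_shared = std::is_trivially_copyable_v<T> && sizeof(T) <= 8;

// One flag per input type, so keys and items can take different load paths.
template <typename KeyT, typename ItemT>
struct load_config
{
  static constexpr bool keys_use_block_load_to_shared  = can_use_block_load_to_shared<KeyT>;
  static constexpr bool items_use_block_load_to_shared = can_use_block_load_to_shared<ItemT>;
};

enum class load_path
{
  block_load_to_shared,
  fallback
};

template <typename KeyT, typename ItemT>
constexpr load_path key_load_path()
{
  if constexpr (load_config<KeyT, ItemT>::keys_use_block_load_to_shared)
  {
    return load_path::block_load_to_shared;
  }
  else
  {
    return load_path::fallback;
  }
}

template <typename KeyT, typename ItemT>
constexpr load_path item_load_path()
{
  // The item branch must test the item flag. Testing the key flag here would
  // compile fine and silently pick the wrong path whenever the two flags differ.
  if constexpr (load_config<KeyT, ItemT>::items_use_block_load_to_shared)
  {
    return load_path::block_load_to_shared;
  }
  else
  {
    return load_path::fallback;
  }
}

// Illustrative item type that fails the hypothetical condition above.
struct wide_item
{
  char data[64];
};

static_assert(key_load_path<int, wide_item>() == load_path::block_load_to_shared);
static_assert(item_load_path<int, wide_item>() == load_path::fallback);
```

Such a bug stays hidden as long as keys and items always satisfy the condition together, which would be consistent with it only surfacing once tests cover differing key and item types.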

@bernhardmgruber
Contributor

/ok to test c2c52f0

@bernhardmgruber
Contributor

@pauleonix and @elstehle: #6460 offers a refactoring and additional tuning policies on top of this PR. We should merge it into this PR and then merge this PR.

@github-actions
Contributor

github-actions bot commented Nov 5, 2025

🥳 CI Workflow Results

🟩 Finished in 5h 21m: Pass: 100%/81 | Total: 4d 21h | Max: 5h 20m | Hits: 50%/72769

See results here.

@bernhardmgruber bernhardmgruber marked this pull request as ready for review November 6, 2025 09:17
@bernhardmgruber bernhardmgruber requested review from a team as code owners November 6, 2025 09:17
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Nov 6, 2025
@bernhardmgruber bernhardmgruber merged commit e2f868f into NVIDIA:main Nov 6, 2025
186 of 189 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Nov 6, 2025
@bernhardmgruber
Contributor

Great work @pauleonix! The first CUB algo is now using BlockLoadToShared in production :)

@miscco
Contributor

miscco commented Nov 6, 2025

That is awesome @pauleonix , great job 🎉

Labels

feature request New feature or request.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[FEA] Optimize cub::DeviceMerge by using cub::detail::BlockLoadToShared

3 participants