pauleonix commented Sep 18, 2025

Description

closes #6005

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pauleonix pauleonix self-assigned this Sep 18, 2025
copy-pr-bot bot commented Sep 18, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-project-automation github-project-automation bot moved this to Todo in CCCL Sep 18, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Sep 18, 2025
@pauleonix pauleonix force-pushed the device_merge_tma branch 2 times, most recently from 049f46a to a24e563 Compare September 18, 2025 22:22
@pauleonix pauleonix mentioned this pull request Sep 19, 2025

pauleonix commented Sep 19, 2025

Benchmarking cub.bench.merge.keys.base (Updated with the newest correctness fixes)

RTX 5090 (Using UBLKCPY)

['/home/pgrossebley/merge_keys_old.json', '/home/pgrossebley/merge_keys_newer.json']

base

[0] NVIDIA GeForce RTX 5090

KeyT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 12.675 us 8.89% 13.828 us 6.40% 1.153 us 9.10% SLOW
I8 I32 2^20 1 14.346 us 1.00% 16.382 us 0.43% 2.036 us 14.19% SLOW
I8 I32 2^24 1 42.092 us 2.26% 45.054 us 0.12% 2.962 us 7.04% SLOW
I8 I32 2^28 1 451.306 us 0.26% 475.488 us 0.18% 24.182 us 5.36% SLOW
I8 I32 2^16 0.201 13.689 us 7.60% 13.083 us 8.76% -0.606 us -4.42% SAME
I8 I32 2^20 0.201 16.384 us 0.25% 15.971 us 1.29% -0.413 us -2.52% FAST
I8 I32 2^24 0.201 41.863 us 2.44% 44.441 us 2.30% 2.578 us 6.16% SLOW
I8 I32 2^28 0.201 432.111 us 0.23% 457.249 us 0.22% 25.138 us 5.82% SLOW
I8 I64 2^16 1 13.261 us 5.80% 14.104 us 4.61% 0.844 us 6.36% SLOW
I8 I64 2^20 1 16.384 us 0.19% 16.376 us 1.16% -0.008 us -0.05% SAME
I8 I64 2^24 1 42.961 us 0.82% 43.037 us 0.57% 0.076 us 0.18% SAME
I8 I64 2^28 1 451.983 us 0.16% 459.084 us 0.18% 7.101 us 1.57% SLOW
I8 I64 2^16 0.201 13.448 us 7.22% 12.602 us 7.84% -0.846 us -6.29% SAME
I8 I64 2^20 0.201 16.176 us 3.82% 16.385 us 0.41% 0.208 us 1.29% SLOW
I8 I64 2^24 0.201 42.265 us 2.46% 42.685 us 1.77% 0.420 us 0.99% SAME
I8 I64 2^28 0.201 432.584 us 0.21% 440.786 us 0.21% 8.202 us 1.90% SLOW
I16 I32 2^16 1 14.265 us 1.26% 14.208 us 3.48% -0.057 us -0.40% SAME
I16 I32 2^20 1 16.471 us 2.57% 16.711 us 4.70% 0.240 us 1.46% SAME
I16 I32 2^24 1 57.591 us 1.18% 55.341 us 0.56% -2.250 us -3.91% FAST
I16 I32 2^28 1 804.790 us 0.19% 800.113 us 0.15% -4.678 us -0.58% FAST
I16 I32 2^16 0.201 12.130 us 5.35% 12.797 us 6.91% 0.667 us 5.50% SLOW
I16 I32 2^20 0.201 16.387 us 0.67% 16.418 us 1.76% 0.031 us 0.19% SAME
I16 I32 2^24 0.201 55.597 us 1.32% 53.483 us 1.24% -2.114 us -3.80% FAST
I16 I32 2^28 0.201 771.795 us 0.16% 771.045 us 0.19% -0.750 us -0.10% SAME
I16 I64 2^16 1 13.996 us 1.77% 14.294 us 2.06% 0.298 us 2.13% SLOW
I16 I64 2^20 1 16.566 us 3.55% 17.339 us 6.00% 0.773 us 4.66% SLOW
I16 I64 2^24 1 57.774 us 1.46% 55.881 us 1.68% -1.893 us -3.28% FAST
I16 I64 2^28 1 804.123 us 0.16% 800.113 us 0.15% -4.010 us -0.50% FAST
I16 I64 2^16 0.201 12.998 us 7.99% 12.678 us 6.37% -0.320 us -2.46% SAME
I16 I64 2^20 0.201 16.661 us 4.24% 16.613 us 3.92% -0.048 us -0.29% SAME
I16 I64 2^24 0.201 56.234 us 1.84% 54.794 us 1.69% -1.440 us -2.56% FAST
I16 I64 2^28 0.201 772.242 us 0.17% 770.761 us 0.19% -1.481 us -0.19% FAST
I32 I32 2^16 1 14.246 us 1.39% 14.334 us 0.21% 0.088 us 0.62% SLOW
I32 I32 2^20 1 17.156 us 5.90% 19.271 us 5.20% 2.114 us 12.32% SLOW
I32 I32 2^24 1 103.841 us 1.29% 102.454 us 1.49% -1.387 us -1.34% FAST
I32 I32 2^28 1 1.528 ms 0.10% 1.518 ms 0.10% -10.121 us -0.66% FAST
I32 I32 2^16 0.201 13.975 us 1.72% 14.306 us 1.72% 0.331 us 2.37% SLOW
I32 I32 2^20 0.201 19.317 us 5.27% 18.877 us 4.50% -0.441 us -2.28% SAME
I32 I32 2^24 0.201 101.781 us 1.31% 101.826 us 1.38% 0.045 us 0.04% SAME
I32 I32 2^28 0.201 1.497 ms 0.09% 1.492 ms 0.09% -4.919 us -0.33% FAST
I32 I64 2^16 1 13.977 us 1.73% 14.336 us 0.08% 0.359 us 2.57% SLOW
I32 I64 2^20 1 17.631 us 7.26% 20.198 us 3.49% 2.567 us 14.56% SLOW
I32 I64 2^24 1 104.375 us 1.05% 103.091 us 1.21% -1.284 us -1.23% FAST
I32 I64 2^28 1 1.529 ms 0.10% 1.519 ms 0.10% -10.342 us -0.68% FAST
I32 I64 2^16 0.201 13.939 us 1.60% 13.438 us 7.55% -0.501 us -3.60% FAST
I32 I64 2^20 0.201 19.430 us 5.28% 19.173 us 5.15% -0.257 us -1.32% SAME
I32 I64 2^24 0.201 102.184 us 0.96% 101.904 us 1.23% -0.280 us -0.27% SAME
I32 I64 2^28 0.201 1.498 ms 0.09% 1.492 ms 0.10% -5.352 us -0.36% FAST
I64 I32 2^16 1 14.351 us 2.59% 14.676 us 5.21% 0.325 us 2.26% SAME
I64 I32 2^20 1 22.542 us 0.80% 22.750 us 2.81% 0.208 us 0.92% SLOW
I64 I32 2^24 1 197.287 us 0.55% 197.205 us 0.55% -0.082 us -0.04% SAME
I64 I32 2^28 1 3.052 ms 0.06% 3.038 ms 0.06% -13.716 us -0.45% FAST
I64 I32 2^16 0.201 9.890 us 2.55% 12.971 us 7.90% 3.081 us 31.16% SLOW
I64 I32 2^20 0.201 22.527 us 0.33% 22.527 us 0.22% -0.001 us -0.00% SAME
I64 I32 2^24 0.201 192.536 us 0.55% 193.461 us 0.61% 0.925 us 0.48% SAME
I64 I32 2^28 0.201 2.994 ms 0.05% 2.987 ms 0.06% -6.973 us -0.23% FAST
I64 I64 2^16 1 9.868 us 2.39% 15.312 us 6.71% 5.443 us 55.16% SLOW
I64 I64 2^20 1 22.564 us 1.20% 22.535 us 1.16% -0.029 us -0.13% SAME
I64 I64 2^24 1 196.942 us 0.57% 197.354 us 0.60% 0.413 us 0.21% SAME
I64 I64 2^28 1 3.053 ms 0.06% 3.040 ms 0.06% -13.221 us -0.43% FAST
I64 I64 2^16 0.201 13.651 us 10.18% 13.290 us 9.31% -0.360 us -2.64% SAME
I64 I64 2^20 0.201 22.531 us 0.38% 22.538 us 0.69% 0.007 us 0.03% SAME
I64 I64 2^24 0.201 192.213 us 0.65% 193.533 us 0.59% 1.320 us 0.69% SLOW
I64 I64 2^28 0.201 2.996 ms 0.05% 2.987 ms 0.06% -8.204 us -0.27% FAST
I128 I32 2^16 1 15.630 us 5.60% 16.382 us 0.37% 0.752 us 4.81% SLOW
I128 I32 2^20 1 34.417 us 2.57% 34.761 us 2.71% 0.344 us 1.00% SAME
I128 I32 2^24 1 386.795 us 0.34% 386.869 us 0.29% 0.075 us 0.02% SAME
I128 I32 2^28 1 6.164 ms 0.03% 6.147 ms 0.05% -16.109 us -0.26% FAST
I128 I32 2^16 0.201 15.276 us 6.63% 16.378 us 0.63% 1.102 us 7.22% SLOW
I128 I32 2^20 0.201 33.977 us 3.33% 33.705 us 3.89% -0.272 us -0.80% SAME
I128 I32 2^24 0.201 376.207 us 0.34% 378.449 us 0.32% 2.243 us 0.60% SLOW
I128 I32 2^28 0.201 6.039 ms 0.04% 6.027 ms 0.07% -11.724 us -0.19% FAST
I128 I64 2^16 1 15.903 us 2.98% 16.384 us 0.37% 0.481 us 3.03% SLOW
I128 I64 2^20 1 35.786 us 2.93% 35.531 us 2.82% -0.255 us -0.71% SAME
I128 I64 2^24 1 386.796 us 0.35% 387.148 us 0.31% 0.352 us 0.09% SAME
I128 I64 2^28 1 6.165 ms 0.04% 6.149 ms 0.06% -16.211 us -0.26% FAST
I128 I64 2^16 0.201 15.969 us 1.30% 16.384 us 0.24% 0.415 us 2.60% SLOW
I128 I64 2^20 0.201 33.161 us 4.23% 32.840 us 3.42% -0.322 us -0.97% SAME
I128 I64 2^24 0.201 376.372 us 0.33% 378.614 us 0.31% 2.242 us 0.60% SLOW
I128 I64 2^28 0.201 6.040 ms 0.03% 6.030 ms 0.06% -10.711 us -0.18% FAST
F32 I32 2^16 1 14.117 us 1.83% 14.335 us 0.46% 0.218 us 1.54% SLOW
F32 I32 2^20 1 18.808 us 4.21% 19.589 us 5.18% 0.782 us 4.16% SAME
F32 I32 2^24 1 103.413 us 1.39% 102.675 us 1.44% -0.738 us -0.71% SAME
F32 I32 2^28 1 1.529 ms 0.10% 1.518 ms 0.10% -10.742 us -0.70% FAST
F32 I32 2^16 0.201 13.053 us 7.89% 14.319 us 1.30% 1.266 us 9.70% SLOW
F32 I32 2^20 0.201 18.549 us 2.58% 19.447 us 5.27% 0.898 us 4.84% SLOW
F32 I32 2^24 0.201 100.287 us 1.24% 101.776 us 1.38% 1.489 us 1.48% SLOW
F32 I32 2^28 0.201 1.497 ms 0.08% 1.492 ms 0.11% -4.954 us -0.33% FAST
F32 I64 2^16 1 13.929 us 1.54% 14.336 us 0.43% 0.407 us 2.92% SLOW
F32 I64 2^20 1 19.666 us 5.09% 19.984 us 4.40% 0.317 us 1.61% SAME
F32 I64 2^24 1 103.036 us 1.42% 103.101 us 1.22% 0.065 us 0.06% SAME
F32 I64 2^28 1 1.529 ms 0.12% 1.519 ms 0.11% -10.785 us -0.71% FAST
F32 I64 2^16 0.201 13.908 us 1.45% 14.250 us 2.90% 0.342 us 2.46% SLOW
F32 I64 2^20 0.201 19.336 us 5.26% 19.599 us 5.17% 0.263 us 1.36% SAME
F32 I64 2^24 0.201 100.649 us 1.35% 101.710 us 1.36% 1.061 us 1.05% SAME
F32 I64 2^28 0.201 1.498 ms 0.09% 1.492 ms 0.11% -5.441 us -0.36% FAST
F64 I32 2^16 1 14.529 us 3.80% 15.141 us 6.60% 0.612 us 4.21% SLOW
F64 I32 2^20 1 24.576 us 0.29% 24.586 us 0.73% 0.010 us 0.04% SAME
F64 I32 2^24 1 196.678 us 0.65% 198.255 us 0.62% 1.577 us 0.80% SLOW
F64 I32 2^28 1 3.051 ms 0.06% 3.039 ms 0.06% -12.583 us -0.41% FAST
F64 I32 2^16 0.201 14.699 us 5.87% 15.529 us 6.50% 0.830 us 5.65% SAME
F64 I32 2^20 0.201 22.573 us 1.35% 22.685 us 2.43% 0.112 us 0.50% SAME
F64 I32 2^24 0.201 191.889 us 0.61% 193.579 us 0.61% 1.690 us 0.88% SLOW
F64 I32 2^28 0.201 2.991 ms 0.06% 2.987 ms 0.06% -4.590 us -0.15% FAST
F64 I64 2^16 1 15.432 us 4.54% 16.027 us 4.87% 0.594 us 3.85% SAME
F64 I64 2^20 1 24.587 us 1.10% 24.618 us 1.20% 0.031 us 0.13% SAME
F64 I64 2^24 1 196.678 us 0.66% 198.473 us 0.66% 1.795 us 0.91% SLOW
F64 I64 2^28 1 3.052 ms 0.06% 3.039 ms 0.06% -12.461 us -0.41% FAST
F64 I64 2^16 0.201 14.978 us 5.92% 16.049 us 4.73% 1.071 us 7.15% SLOW
F64 I64 2^20 0.201 22.544 us 0.81% 22.790 us 3.05% 0.246 us 1.09% SLOW
F64 I64 2^24 0.201 192.605 us 0.54% 193.719 us 0.62% 1.113 us 0.58% SLOW
F64 I64 2^28 0.201 2.993 ms 0.06% 2.987 ms 0.07% -5.919 us -0.20% FAST

Summary

  • Total Matches: 112
    • Pass (diff <= min_noise): 40
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 72
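The Status column and the Pass/Failure labels above can be reproduced with a small sketch. It assumes, based only on the summary labels and the rows shown, that `min_noise` is the smaller of the two noise values and that the comparison uses the relative difference; the function name `classify` is hypothetical, and the "Unknown (infinite noise)" case is not modeled.

```python
def classify(ref_time, ref_noise, cmp_time, cmp_noise):
    """Return (diff, frac_diff, status) for one benchmark row.

    Both times must use the same unit. Noise values are fractions,
    e.g. 0.0889 for 8.89%.
    """
    diff = cmp_time - ref_time      # the "Diff" column
    frac_diff = diff / ref_time     # the "%Diff" column, as a fraction
    # Assumption: the threshold is the smaller of the two noise values.
    min_noise = min(ref_noise, cmp_noise)
    if abs(frac_diff) <= min_noise:
        status = "SAME"             # counted as "Pass" in the summary
    elif diff < 0:
        status = "FAST"             # new code is faster than the reference
    else:
        status = "SLOW"             # new code is slower than the reference
    return diff, frac_diff, status


# First RTX 5090 row: I8 I32 2^16, entropy 1
diff, frac_diff, status = classify(12.675, 0.0889, 13.828, 0.0640)
print(f"{diff:.3f} us, {frac_diff:.2%}, {status}")
```

Under this rule a row with a large relative difference can still be reported as SAME when both measurements are noisy, which is why several 2^16 rows with differences above 4% are not flagged.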
B200 (Using UBLKCPY)

['/home/pgrossebley/merge_keys_old.json', '/home/pgrossebley/merge_keys_new.json']

base

[0] NVIDIA B200

KeyT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I32 2^16 1 17.389 us 1.28% 17.378 us 1.28% -0.011 us -0.06% SAME
I8 I32 2^20 1 21.478 us 1.80% 21.567 us 46.15% 0.089 us 0.42% SAME
I8 I32 2^24 1 48.591 us 1.84% 47.741 us 2.82% -0.850 us -1.75% SAME
I8 I32 2^28 1 404.752 us 0.16% 384.518 us 0.24% -20.234 us -5.00% FAST
I8 I32 2^16 0.201 17.341 us 1.98% 17.380 us 2.37% 0.039 us 0.22% SAME
I8 I32 2^20 0.201 21.409 us 1.65% 21.415 us 1.67% 0.006 us 0.03% SAME
I8 I32 2^24 0.201 47.192 us 2.30% 46.035 us 0.72% -1.157 us -2.45% FAST
I8 I32 2^28 0.201 402.483 us 0.09% 380.330 us 0.23% -22.153 us -5.50% FAST
I8 I64 2^16 1 17.243 us 2.10% 17.320 us 3.10% 0.077 us 0.45% SAME
I8 I64 2^20 1 21.502 us 46.46% 21.510 us 2.31% 0.009 us 0.04% SAME
I8 I64 2^24 1 48.485 us 30.65% 46.437 us 1.75% -2.047 us -4.22% FAST
I8 I64 2^28 1 405.787 us 0.32% 372.046 us 0.20% -33.741 us -8.31% FAST
I8 I64 2^16 0.201 17.353 us 2.06% 17.325 us 2.02% -0.028 us -0.16% SAME
I8 I64 2^20 0.201 21.469 us 1.43% 21.479 us 1.40% 0.010 us 0.04% SAME
I8 I64 2^24 0.201 47.265 us 2.68% 44.426 us 2.41% -2.839 us -6.01% FAST
I8 I64 2^28 0.201 402.439 us 0.06% 367.616 us 0.06% -34.823 us -8.65% FAST
I16 I32 2^16 1 17.471 us 2.11% 17.501 us 2.74% 0.030 us 0.17% SAME
I16 I32 2^20 1 22.708 us 44.91% 23.352 us 6.03% 0.643 us 2.83% SAME
I16 I32 2^24 1 58.316 us 0.38% 55.279 us 2.05% -3.038 us -5.21% FAST
I16 I32 2^28 1 545.863 us 0.11% 500.980 us 0.17% -44.883 us -8.22% FAST
I16 I32 2^16 0.201 17.900 us 4.15% 18.083 us 4.37% 0.184 us 1.03% SAME
I16 I32 2^20 0.201 22.092 us 3.12% 22.086 us 2.99% -0.006 us -0.03% SAME
I16 I32 2^24 0.201 53.049 us 2.19% 49.589 us 1.70% -3.459 us -6.52% FAST
I16 I32 2^28 0.201 465.034 us 0.21% 432.329 us 0.24% -32.705 us -7.03% FAST
I16 I64 2^16 1 17.671 us 3.43% 18.325 us 6.24% 0.654 us 3.70% SLOW
I16 I64 2^20 1 22.566 us 4.66% 23.422 us 6.36% 0.856 us 3.79% SAME
I16 I64 2^24 1 57.522 us 2.15% 56.278 us 0.85% -1.244 us -2.16% FAST
I16 I64 2^28 1 521.339 us 0.10% 525.322 us 0.10% 3.983 us 0.76% SLOW
I16 I64 2^16 0.201 18.255 us 4.73% 18.016 us 4.33% -0.240 us -1.31% SAME
I16 I64 2^20 0.201 22.300 us 3.92% 22.407 us 4.05% 0.107 us 0.48% SAME
I16 I64 2^24 0.201 51.942 us 29.90% 50.803 us 2.93% -1.139 us -2.19% SAME
I16 I64 2^28 0.201 453.013 us 0.21% 439.105 us 0.27% -13.908 us -3.07% FAST
I32 I32 2^16 1 19.060 us 4.00% 19.119 us 4.31% 0.059 us 0.31% SAME
I32 I32 2^20 1 24.838 us 4.20% 24.963 us 42.38% 0.125 us 0.50% SAME
I32 I32 2^24 1 58.826 us 1.14% 56.760 us 1.50% -2.066 us -3.51% FAST
I32 I32 2^28 1 539.030 us 0.27% 485.078 us 0.20% -53.952 us -10.01% FAST
I32 I32 2^16 0.201 18.890 us 4.39% 19.049 us 3.80% 0.159 us 0.84% SAME
I32 I32 2^20 0.201 23.601 us 2.75% 23.831 us 3.93% 0.230 us 0.97% SAME
I32 I32 2^24 0.201 52.383 us 29.40% 50.150 us 0.69% -2.233 us -4.26% FAST
I32 I32 2^28 0.201 426.846 us 0.18% 406.572 us 0.11% -20.274 us -4.75% FAST
I32 I64 2^16 1 19.283 us 2.46% 19.447 us 3.85% 0.165 us 0.85% SAME
I32 I64 2^20 1 24.727 us 4.45% 24.845 us 4.20% 0.118 us 0.48% SAME
I32 I64 2^24 1 59.066 us 1.41% 56.567 us 2.44% -2.499 us -4.23% FAST
I32 I64 2^28 1 539.364 us 0.27% 485.579 us 0.21% -53.785 us -9.97% FAST
I32 I64 2^16 0.201 19.060 us 3.55% 19.088 us 3.47% 0.028 us 0.15% SAME
I32 I64 2^20 0.201 23.542 us 43.87% 23.594 us 2.11% 0.051 us 0.22% SAME
I32 I64 2^24 0.201 52.325 us 0.87% 50.323 us 0.92% -2.002 us -3.83% FAST
I32 I64 2^28 0.201 426.684 us 0.19% 408.606 us 0.06% -18.079 us -4.24% FAST
I64 I32 2^16 1 19.849 us 3.47% 19.957 us 4.03% 0.107 us 0.54% SAME
I64 I32 2^20 1 29.003 us 3.44% 28.800 us 39.86% -0.203 us -0.70% SAME
I64 I32 2^24 1 94.105 us 1.16% 89.900 us 1.28% -4.205 us -4.47% FAST
I64 I32 2^28 1 1.120 ms 0.12% 1.022 ms 0.10% -97.557 us -8.71% FAST
I64 I32 2^16 0.201 19.623 us 3.66% 19.494 us 1.70% -0.130 us -0.66% SAME
I64 I32 2^20 0.201 26.281 us 41.35% 26.340 us 3.61% 0.059 us 0.23% SAME
I64 I32 2^24 0.201 72.615 us 0.57% 70.760 us 1.34% -1.855 us -2.56% FAST
I64 I32 2^28 0.201 763.833 us 3.94% 751.244 us 0.15% -12.589 us -1.65% FAST
I64 I64 2^16 1 20.015 us 3.85% 19.934 us 3.63% -0.080 us -0.40% SAME
I64 I64 2^20 1 29.256 us 2.59% 29.208 us 2.67% -0.048 us -0.16% SAME
I64 I64 2^24 1 94.512 us 1.26% 90.504 us 1.33% -4.008 us -4.24% FAST
I64 I64 2^28 1 1.125 ms 0.09% 1.033 ms 0.10% -92.018 us -8.18% FAST
I64 I64 2^16 0.201 19.692 us 3.01% 19.651 us 2.81% -0.040 us -0.20% SAME
I64 I64 2^20 0.201 26.175 us 2.91% 26.206 us 41.53% 0.032 us 0.12% SAME
I64 I64 2^24 0.201 72.382 us 0.98% 71.710 us 1.66% -0.672 us -0.93% SAME
I64 I64 2^28 0.201 764.441 us 5.73% 757.050 us 0.13% -7.391 us -0.97% FAST
I128 I32 2^16 1 21.427 us 2.36% 21.394 us 2.76% -0.033 us -0.15% SAME
I128 I32 2^20 1 35.960 us 1.36% 34.241 us 2.53% -1.719 us -4.78% FAST
I128 I32 2^24 1 192.586 us 0.60% 171.124 us 0.35% -21.461 us -11.14% FAST
I128 I32 2^28 1 2.728 ms 0.05% 2.347 ms 0.06% -381.424 us -13.98% FAST
I128 I32 2^16 0.201 21.209 us 2.21% 21.480 us 45.31% 0.271 us 1.28% SAME
I128 I32 2^20 0.201 31.704 us 1.07% 31.037 us 3.52% -0.666 us -2.10% FAST
I128 I32 2^24 0.201 130.773 us 33.32% 118.310 us 0.80% -12.464 us -9.53% FAST
I128 I32 2^28 0.201 1.715 ms 0.06% 1.545 ms 0.08% -169.631 us -9.89% FAST
I128 I64 2^16 1 21.370 us 1.73% 21.392 us 1.73% 0.022 us 0.10% SAME
I128 I64 2^20 1 36.213 us 35.09% 34.175 us 2.04% -2.038 us -5.63% FAST
I128 I64 2^24 1 193.755 us 0.37% 171.284 us 16.59% -22.471 us -11.60% FAST
I128 I64 2^28 1 2.750 ms 2.59% 2.344 ms 1.89% -405.779 us -14.76% FAST
I128 I64 2^16 0.201 21.451 us 1.58% 21.390 us 1.93% -0.061 us -0.29% SAME
I128 I64 2^20 0.201 31.858 us 37.25% 30.564 us 3.47% -1.294 us -4.06% FAST
I128 I64 2^24 0.201 128.192 us 0.54% 118.505 us 19.33% -9.686 us -7.56% FAST
I128 I64 2^28 0.201 1.692 ms 0.08% 1.547 ms 0.07% -144.625 us -8.55% FAST
F32 I32 2^16 1 19.010 us 4.19% 19.040 us 4.23% 0.030 us 0.16% SAME
F32 I32 2^20 1 24.744 us 4.16% 25.465 us 1.89% 0.721 us 2.92% SLOW
F32 I32 2^24 1 58.864 us 1.16% 57.091 us 1.75% -1.773 us -3.01% FAST
F32 I32 2^28 1 538.543 us 0.24% 486.106 us 0.20% -52.437 us -9.74% FAST
F32 I32 2^16 0.201 18.819 us 48.86% 19.065 us 49.06% 0.246 us 1.30% SAME
F32 I32 2^20 0.201 23.703 us 3.02% 23.650 us 3.04% -0.053 us -0.22% SAME
F32 I32 2^24 0.201 52.343 us 29.70% 50.344 us 32.77% -1.999 us -3.82% SAME
F32 I32 2^28 0.201 426.919 us 0.16% 407.032 us 7.67% -19.888 us -4.66% FAST
F32 I64 2^16 1 19.223 us 2.55% 19.404 us 2.95% 0.181 us 0.94% SAME
F32 I64 2^20 1 24.876 us 4.10% 24.775 us 4.19% -0.101 us -0.40% SAME
F32 I64 2^24 1 59.498 us 1.50% 56.964 us 2.32% -2.534 us -4.26% FAST
F32 I64 2^28 1 538.946 us 0.23% 486.130 us 0.17% -52.816 us -9.80% FAST
F32 I64 2^16 0.201 19.240 us 48.28% 19.238 us 2.65% -0.002 us -0.01% SAME
F32 I64 2^20 0.201 23.488 us 1.91% 23.711 us 3.21% 0.223 us 0.95% SAME
F32 I64 2^24 0.201 52.393 us 28.71% 50.312 us 0.93% -2.081 us -3.97% FAST
F32 I64 2^28 0.201 426.519 us 0.23% 409.101 us 6.67% -17.418 us -4.08% FAST
F64 I32 2^16 1 19.860 us 3.50% 19.977 us 3.72% 0.117 us 0.59% SAME
F64 I32 2^20 1 28.712 us 3.11% 29.102 us 2.88% 0.390 us 1.36% SAME
F64 I32 2^24 1 93.996 us 1.12% 89.991 us 1.23% -4.005 us -4.26% FAST
F64 I32 2^28 1 1.119 ms 0.13% 1.024 ms 0.11% -95.148 us -8.50% FAST
F64 I32 2^16 0.201 19.688 us 3.34% 19.511 us 1.86% -0.176 us -0.89% SAME
F64 I32 2^20 0.201 26.219 us 41.88% 26.203 us 41.52% -0.016 us -0.06% SAME
F64 I32 2^24 0.201 72.418 us 0.90% 70.599 us 0.63% -1.819 us -2.51% FAST
F64 I32 2^28 0.201 767.827 us 1.07% 744.573 us 3.76% -23.253 us -3.03% FAST
F64 I64 2^16 1 19.855 us 3.43% 19.969 us 3.61% 0.114 us 0.58% SAME
F64 I64 2^20 1 29.313 us 2.64% 29.332 us 2.40% 0.019 us 0.07% SAME
F64 I64 2^24 1 95.255 us 0.45% 91.056 us 0.85% -4.199 us -4.41% FAST
F64 I64 2^28 1 1.129 ms 0.09% 1.035 ms 0.11% -94.239 us -8.35% FAST
F64 I64 2^16 0.201 19.659 us 3.25% 19.687 us 47.52% 0.028 us 0.14% SAME
F64 I64 2^20 0.201 26.559 us 3.40% 26.329 us 3.18% -0.230 us -0.87% SAME
F64 I64 2^24 0.201 72.639 us 0.46% 71.145 us 1.29% -1.494 us -2.06% FAST
F64 I64 2^28 0.201 768.973 us 3.88% 755.758 us 5.52% -13.215 us -1.72% SAME

Summary

  • Total Matches: 112
    • Pass (diff <= min_noise): 55
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 57
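To cross-check summary counts like these against the pasted rows, the whitespace-separated table lines can be parsed and tallied. This is a rough sketch, not part of the benchmark tooling: `parse_row` and `tally` are hypothetical helpers, and it assumes that "Pass" corresponds to SAME rows and "Failure" to FAST or SLOW rows.

```python
from collections import Counter

# Conversion factors to microseconds for the time units seen in the tables.
_TO_US = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}


def parse_row(line):
    """Parse one whitespace-separated table row into a dict.

    The last ten tokens have a fixed layout; everything before them is
    treated as axis values (KeyT, OffsetT, Elements, Entropy, ...).
    """
    tokens = line.split()
    axes = tokens[:-10]
    (ref_t, ref_u, ref_n, cmp_t, cmp_u, cmp_n,
     diff, diff_u, pct, status) = tokens[-10:]
    return {
        "axes": axes,
        "ref_us": float(ref_t) * _TO_US[ref_u],
        "ref_noise_pct": float(ref_n.rstrip("%")),
        "cmp_us": float(cmp_t) * _TO_US[cmp_u],
        "cmp_noise_pct": float(cmp_n.rstrip("%")),
        "diff_us": float(diff) * _TO_US[diff_u],
        "diff_pct": float(pct.rstrip("%")),
        "status": status,
    }


def tally(lines):
    """Count rows by status (assumes Pass == SAME, Failure == FAST/SLOW)."""
    counts = Counter(parse_row(line)["status"] for line in lines)
    return {
        "total": sum(counts.values()),
        "pass": counts["SAME"],
        "failure": counts["FAST"] + counts["SLOW"],
    }


rows = [
    "I8 I32 2^16 1 17.389 us 1.28% 17.378 us 1.28% -0.011 us -0.06% SAME",
    "I8 I32 2^28 1 404.752 us 0.16% 384.518 us 0.24% -20.234 us -5.00% FAST",
]
print(tally(rows))
```

Parsing from the end of each line keeps the helper usable for both the keys tables (four axis columns) and the pairs tables (five axis columns).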

pauleonix commented Sep 19, 2025

Benchmarking cub.bench.merge.pairs.base (Updated with the newest correctness fixes)

RTX 5090 (Using UBLKCPY)

['/home/pgrossebley/merge_pairs_old.json', '/home/pgrossebley/merge_pairs_newer.json']

base

[0] NVIDIA GeForce RTX 5090

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 1 15.419 us 4.65% 14.051 us 6.78% -1.368 us -8.87% FAST
I8 I8 I32 2^20 1 19.576 us 5.22% 18.563 us 2.75% -1.013 us -5.18% FAST
I8 I8 I32 2^24 1 87.981 us 0.85% 79.869 us 0.07% -8.112 us -9.22% FAST
I8 I8 I32 2^28 1 1.140 ms 0.14% 1.007 ms 0.14% -132.127 us -11.59% FAST
I8 I8 I32 2^16 0.201 14.178 us 8.75% 14.201 us 3.57% 0.024 us 0.17% SAME
I8 I8 I32 2^20 0.201 18.541 us 2.48% 18.026 us 1.14% -0.515 us -2.78% FAST
I8 I8 I32 2^24 0.201 87.533 us 1.46% 76.493 us 1.32% -11.040 us -12.61% FAST
I8 I8 I32 2^28 0.201 1.117 ms 0.19% 975.285 us 0.16% -141.630 us -12.68% FAST
I8 I8 I64 2^16 1 14.911 us 7.54% 14.280 us 6.85% -0.631 us -4.23% SAME
I8 I8 I64 2^20 1 18.436 us 0.58% 18.500 us 2.06% 0.064 us 0.34% SAME
I8 I8 I64 2^24 1 84.353 us 0.95% 79.863 us 0.19% -4.490 us -5.32% FAST
I8 I8 I64 2^28 1 1.072 ms 0.18% 1.002 ms 0.13% -70.269 us -6.56% FAST
I8 I8 I64 2^16 0.201 13.804 us 5.28% 14.241 us 3.03% 0.437 us 3.16% SLOW
I8 I8 I64 2^20 0.201 18.477 us 1.64% 18.434 us 0.58% -0.043 us -0.23% SAME
I8 I8 I64 2^24 0.201 83.631 us 1.29% 77.340 us 1.13% -6.291 us -7.52% FAST
I8 I8 I64 2^28 0.201 1.055 ms 0.25% 965.701 us 0.16% -88.950 us -8.43% FAST
I8 I16 I32 2^16 1 12.798 us 6.91% 13.813 us 9.59% 1.015 us 7.93% SLOW
I8 I16 I32 2^20 1 18.443 us 0.92% 18.711 us 3.78% 0.268 us 1.45% SLOW
I8 I16 I32 2^24 1 94.633 us 0.96% 86.197 us 0.73% -8.436 us -8.91% FAST
I8 I16 I32 2^28 1 1.226 ms 0.14% 1.170 ms 0.12% -55.709 us -4.54% FAST
I8 I16 I32 2^16 0.201 13.463 us 5.97% 14.117 us 4.51% 0.654 us 4.86% SLOW
I8 I16 I32 2^20 0.201 18.653 us 3.41% 18.475 us 1.75% -0.178 us -0.96% SAME
I8 I16 I32 2^24 0.201 95.108 us 1.41% 82.670 us 1.35% -12.438 us -13.08% FAST
I8 I16 I32 2^28 0.201 1.204 ms 0.12% 1.135 ms 0.10% -68.753 us -5.71% FAST
I8 I16 I64 2^16 1 14.152 us 7.35% 15.187 us 7.67% 1.035 us 7.31% SAME
I8 I16 I64 2^20 1 18.478 us 1.67% 18.814 us 4.31% 0.336 us 1.82% SLOW
I8 I16 I64 2^24 1 92.930 us 1.08% 87.320 us 1.13% -5.610 us -6.04% FAST
I8 I16 I64 2^28 1 1.209 ms 0.14% 1.169 ms 0.12% -39.969 us -3.31% FAST
I8 I16 I64 2^16 0.201 13.603 us 4.42% 14.189 us 3.76% 0.586 us 4.31% SLOW
I8 I16 I64 2^20 0.201 18.432 us 0.22% 18.473 us 1.61% 0.041 us 0.22% SLOW
I8 I16 I64 2^24 0.201 92.900 us 1.45% 83.240 us 1.26% -9.660 us -10.40% FAST
I8 I16 I64 2^28 0.201 1.188 ms 0.13% 1.136 ms 0.12% -52.097 us -4.38% FAST
I8 I32 I32 2^16 1 13.301 us 7.25% 13.858 us 6.38% 0.558 us 4.19% SAME
I8 I32 I32 2^20 1 20.414 us 1.76% 20.479 us 0.26% 0.066 us 0.32% SLOW
I8 I32 I32 2^24 1 127.608 us 1.05% 125.690 us 1.37% -1.918 us -1.50% FAST
I8 I32 I32 2^28 1 1.866 ms 0.11% 1.857 ms 0.11% -8.978 us -0.48% FAST
I8 I32 I32 2^16 0.201 13.677 us 4.09% 14.094 us 4.69% 0.417 us 3.05% SAME
I8 I32 I32 2^20 0.201 20.479 us 0.15% 20.092 us 3.99% -0.387 us -1.89% FAST
I8 I32 I32 2^24 0.201 125.830 us 1.04% 123.771 us 1.06% -2.060 us -1.64% FAST
I8 I32 I32 2^28 0.201 1.849 ms 0.11% 1.840 ms 0.11% -8.835 us -0.48% FAST
I8 I32 I64 2^16 1 13.648 us 3.99% 14.269 us 2.61% 0.621 us 4.55% SLOW
I8 I32 I64 2^20 1 21.159 us 4.62% 20.454 us 1.74% -0.705 us -3.33% FAST
I8 I32 I64 2^24 1 126.694 us 0.99% 125.309 us 1.13% -1.385 us -1.09% FAST
I8 I32 I64 2^28 1 1.867 ms 0.10% 1.856 ms 0.10% -11.396 us -0.61% FAST
I8 I32 I64 2^16 0.201 13.815 us 3.02% 14.262 us 2.71% 0.447 us 3.24% SLOW
I8 I32 I64 2^20 0.201 18.932 us 4.65% 20.243 us 3.23% 1.311 us 6.92% SLOW
I8 I32 I64 2^24 0.201 126.293 us 1.02% 123.690 us 1.07% -2.603 us -2.06% FAST
I8 I32 I64 2^28 0.201 1.849 ms 0.10% 1.839 ms 0.11% -9.439 us -0.51% FAST
I8 I64 I32 2^16 1 14.170 us 2.85% 14.334 us 0.42% 0.165 us 1.16% SLOW
I8 I64 I32 2^20 1 23.867 us 5.61% 23.591 us 4.34% -0.276 us -1.15% SAME
I8 I64 I32 2^24 1 214.973 us 0.58% 216.270 us 0.61% 1.297 us 0.60% SLOW
I8 I64 I32 2^28 1 3.366 ms 0.07% 3.347 ms 0.06% -19.860 us -0.59% FAST
I8 I64 I32 2^16 0.201 11.429 us 5.68% 14.335 us 0.46% 2.906 us 25.42% SLOW
I8 I64 I32 2^20 0.201 23.138 us 4.17% 22.528 us 0.45% -0.609 us -2.63% FAST
I8 I64 I32 2^24 0.201 215.190 us 0.62% 216.407 us 0.56% 1.217 us 0.57% SLOW
I8 I64 I32 2^28 0.201 3.327 ms 0.06% 3.309 ms 0.06% -17.212 us -0.52% FAST
I8 I64 I64 2^16 1 11.482 us 5.50% 14.336 us 0.26% 2.853 us 24.85% SLOW
I8 I64 I64 2^20 1 24.939 us 5.69% 24.023 us 3.78% -0.916 us -3.67% SAME
I8 I64 I64 2^24 1 215.857 us 0.58% 215.765 us 0.62% -0.092 us -0.04% SAME
I8 I64 I64 2^28 1 3.367 ms 0.07% 3.348 ms 0.06% -19.095 us -0.57% FAST
I8 I64 I64 2^16 0.201 13.884 us 1.62% 14.336 us 0.43% 0.452 us 3.25% SLOW
I8 I64 I64 2^20 0.201 22.606 us 1.75% 22.634 us 2.02% 0.028 us 0.13% SAME
I8 I64 I64 2^24 0.201 214.799 us 0.63% 215.007 us 0.56% 0.208 us 0.10% SAME
I8 I64 I64 2^28 0.201 3.327 ms 0.06% 3.310 ms 0.06% -16.508 us -0.50% FAST
I16 I8 I32 2^16 1 16.071 us 1.67% 14.352 us 1.36% -1.719 us -10.69% FAST
I16 I8 I32 2^20 1 20.657 us 2.80% 20.259 us 4.56% -0.398 us -1.93% SAME
I16 I8 I32 2^24 1 107.529 us 1.07% 80.123 us 0.88% -27.406 us -25.49% FAST
I16 I8 I32 2^28 1 1.452 ms 0.49% 1.165 ms 0.09% -286.445 us -19.73% FAST
I16 I8 I32 2^16 0.201 15.964 us 2.36% 14.335 us 0.44% -1.629 us -10.21% FAST
I16 I8 I32 2^20 0.201 18.989 us 4.83% 18.631 us 3.39% -0.358 us -1.88% SAME
I16 I8 I32 2^24 0.201 100.311 us 0.34% 77.649 us 1.30% -22.662 us -22.59% FAST
I16 I8 I32 2^28 0.201 1.308 ms 0.16% 1.127 ms 0.13% -180.347 us -13.79% FAST
I16 I8 I64 2^16 1 16.017 us 2.46% 14.352 us 1.25% -1.665 us -10.40% FAST
I16 I8 I64 2^20 1 22.864 us 3.52% 20.728 us 6.70% -2.136 us -9.34% FAST
I16 I8 I64 2^24 1 91.089 us 1.36% 94.475 us 0.73% 3.386 us 3.72% SLOW
I16 I8 I64 2^28 1 1.224 ms 0.14% 1.295 ms 0.10% 70.948 us 5.80% SLOW
I16 I8 I64 2^16 0.201 15.940 us 1.10% 14.336 us 0.44% -1.603 us -10.06% FAST
I16 I8 I64 2^20 0.201 22.640 us 5.33% 19.339 us 5.87% -3.301 us -14.58% FAST
I16 I8 I64 2^24 0.201 89.567 us 1.41% 83.975 us 0.30% -5.592 us -6.24% FAST
I16 I8 I64 2^28 0.201 1.182 ms 0.15% 1.175 ms 0.11% -6.880 us -0.58% FAST
I16 I16 I32 2^16 1 15.035 us 8.07% 16.367 us 1.17% 1.332 us 8.86% SLOW
I16 I16 I32 2^20 1 20.595 us 2.31% 20.656 us 2.82% 0.062 us 0.30% SAME
I16 I16 I32 2^24 1 107.362 us 0.95% 102.806 us 0.87% -4.556 us -4.24% FAST
I16 I16 I32 2^28 1 1.529 ms 0.08% 1.510 ms 0.11% -18.750 us -1.23% FAST
I16 I16 I32 2^16 0.201 13.880 us 3.17% 14.356 us 1.51% 0.476 us 3.43% SLOW
I16 I16 I32 2^20 0.201 18.626 us 3.23% 20.450 us 1.65% 1.824 us 9.79% SLOW
I16 I16 I32 2^24 0.201 102.437 us 0.32% 100.994 us 1.06% -1.443 us -1.41% FAST
I16 I16 I32 2^28 0.201 1.468 ms 0.12% 1.479 ms 0.09% 10.883 us 0.74% SLOW
I16 I16 I64 2^16 1 15.773 us 4.18% 14.452 us 3.31% -1.321 us -8.38% FAST
I16 I16 I64 2^20 1 20.897 us 3.95% 21.031 us 4.34% 0.134 us 0.64% SAME
I16 I16 I64 2^24 1 111.797 us 1.41% 102.893 us 0.98% -8.904 us -7.96% FAST
I16 I16 I64 2^28 1 1.544 ms 0.07% 1.513 ms 0.09% -31.122 us -2.02% FAST
I16 I16 I64 2^16 0.201 14.337 us 5.91% 14.336 us 0.28% -0.001 us -0.01% SAME
I16 I16 I64 2^20 0.201 20.356 us 4.99% 19.501 us 6.39% -0.855 us -4.20% SAME
I16 I16 I64 2^24 0.201 102.515 us 0.47% 101.073 us 1.06% -1.442 us -1.41% FAST
I16 I16 I64 2^28 0.201 1.473 ms 0.09% 1.480 ms 0.11% 6.483 us 0.44% SLOW
I16 I32 I32 2^16 1 14.238 us 1.43% 14.334 us 0.19% 0.097 us 0.68% SLOW
I16 I32 I32 2^20 1 21.099 us 4.87% 21.512 us 4.73% 0.414 us 1.96% SAME
I16 I32 I32 2^24 1 148.935 us 0.79% 148.247 us 0.81% -0.688 us -0.46% SAME
I16 I32 I32 2^28 1 2.274 ms 0.07% 2.261 ms 0.07% -12.950 us -0.57% FAST
I16 I32 I32 2^16 0.201 13.983 us 1.71% 14.335 us 0.30% 0.352 us 2.51% SLOW
I16 I32 I32 2^20 0.201 22.560 us 1.18% 20.495 us 3.66% -2.065 us -9.15% FAST
I16 I32 I32 2^24 0.201 144.532 us 0.90% 145.956 us 0.94% 1.424 us 0.99% SLOW
I16 I32 I32 2^28 0.201 2.222 ms 0.07% 2.212 ms 0.09% -9.601 us -0.43% FAST
I16 I32 I64 2^16 1 13.934 us 1.54% 14.336 us 0.58% 0.403 us 2.89% SLOW
I16 I32 I64 2^20 1 23.558 us 5.19% 22.364 us 2.49% -1.194 us -5.07% FAST
I16 I32 I64 2^24 1 149.897 us 0.71% 148.230 us 0.78% -1.667 us -1.11% FAST
I16 I32 I64 2^28 1 2.274 ms 0.07% 2.261 ms 0.07% -13.352 us -0.59% FAST
I16 I32 I64 2^16 0.201 13.951 us 1.61% 14.336 us 0.11% 0.384 us 2.75% SLOW
I16 I32 I64 2^20 0.201 20.740 us 3.30% 20.984 us 4.51% 0.244 us 1.18% SAME
I16 I32 I64 2^24 0.201 146.198 us 1.13% 146.328 us 1.04% 0.130 us 0.09% SAME
I16 I32 I64 2^28 0.201 2.223 ms 0.07% 2.213 ms 0.08% -10.598 us -0.48% FAST
I16 I64 I32 2^16 1 14.278 us 1.14% 14.334 us 0.22% 0.056 us 0.39% SLOW
I16 I64 I32 2^20 1 28.061 us 3.61% 25.054 us 4.82% -3.007 us -10.72% FAST
I16 I64 I32 2^24 1 238.998 us 0.70% 242.563 us 0.50% 3.565 us 1.49% SLOW
I16 I64 I32 2^28 1 3.773 ms 0.06% 3.767 ms 0.06% -5.835 us -0.15% FAST
I16 I64 I32 2^16 0.201 13.986 us 1.73% 14.335 us 0.18% 0.348 us 2.49% SLOW
I16 I64 I32 2^20 0.201 24.857 us 2.91% 25.763 us 6.61% 0.905 us 3.64% SLOW
I16 I64 I32 2^24 0.201 235.608 us 0.53% 237.895 us 0.65% 2.288 us 0.97% SLOW
I16 I64 I32 2^28 0.201 3.686 ms 0.06% 3.685 ms 0.07% -1.465 us -0.04% SAME
I16 I64 I64 2^16 1 14.048 us 1.82% 14.335 us 0.44% 0.287 us 2.05% SLOW
I16 I64 I64 2^20 1 30.651 us 1.69% 26.579 us 2.01% -4.072 us -13.28% FAST
I16 I64 I64 2^24 1 239.196 us 0.67% 241.553 us 0.55% 2.357 us 0.99% SLOW
I16 I64 I64 2^28 1 3.773 ms 0.05% 3.757 ms 0.05% -15.876 us -0.42% FAST
I16 I64 I64 2^16 0.201 13.984 us 1.73% 14.336 us 0.47% 0.353 us 2.52% SLOW
I16 I64 I64 2^20 0.201 27.362 us 4.95% 27.970 us 6.12% 0.608 us 2.22% SAME
I16 I64 I64 2^24 0.201 235.984 us 0.65% 236.878 us 0.67% 0.895 us 0.38% SAME
I16 I64 I64 2^28 0.201 3.688 ms 0.05% 3.674 ms 0.06% -13.535 us -0.37% FAST
I32 I8 I32 2^16 1 14.965 us 6.61% 14.334 us 0.20% -0.630 us -4.21% FAST
I32 I8 I32 2^20 1 21.339 us 4.74% 19.372 us 6.35% -1.967 us -9.22% FAST
I32 I8 I32 2^24 1 127.083 us 1.25% 129.907 us 1.13% 2.824 us 2.22% SLOW
I32 I8 I32 2^28 1 1.902 ms 0.08% 1.893 ms 0.10% -8.927 us -0.47% FAST
I32 I8 I32 2^16 0.201 13.984 us 1.71% 14.335 us 0.61% 0.351 us 2.51% SLOW
I32 I8 I32 2^20 0.201 21.181 us 4.59% 21.150 us 4.66% -0.031 us -0.15% SAME
I32 I8 I32 2^24 0.201 124.362 us 0.92% 125.594 us 1.32% 1.232 us 0.99% SLOW
I32 I8 I32 2^28 0.201 1.863 ms 0.10% 1.856 ms 0.10% -6.538 us -0.35% FAST
I32 I8 I64 2^16 1 14.004 us 1.98% 14.337 us 0.61% 0.333 us 2.37% SLOW
I32 I8 I64 2^20 1 22.310 us 2.83% 22.320 us 2.78% 0.010 us 0.04% SAME
I32 I8 I64 2^24 1 128.173 us 1.16% 129.559 us 0.99% 1.386 us 1.08% SLOW
I32 I8 I64 2^28 1 1.903 ms 0.08% 1.894 ms 0.10% -9.234 us -0.49% FAST
I32 I8 I64 2^16 0.201 13.962 us 1.65% 14.336 us 0.52% 0.374 us 2.68% SLOW
I32 I8 I64 2^20 0.201 21.912 us 4.29% 19.566 us 5.86% -2.346 us -10.70% FAST
I32 I8 I64 2^24 0.201 125.166 us 1.13% 126.670 us 1.42% 1.504 us 1.20% SLOW
I32 I8 I64 2^28 0.201 1.865 ms 0.09% 1.856 ms 0.10% -8.552 us -0.46% FAST
I32 I16 I32 2^16 1 14.241 us 1.40% 14.335 us 0.31% 0.094 us 0.66% SLOW
I32 I16 I32 2^20 1 21.543 us 5.20% 22.133 us 5.47% 0.589 us 2.73% SAME
I32 I16 I32 2^24 1 153.589 us 0.78% 151.799 us 0.60% -1.790 us -1.17% FAST
I32 I16 I32 2^28 1 2.281 ms 0.08% 2.268 ms 0.07% -12.527 us -0.55% FAST
I32 I16 I32 2^16 0.201 14.012 us 1.79% 14.335 us 0.30% 0.323 us 2.30% SLOW
I32 I16 I32 2^20 0.201 21.481 us 4.77% 21.393 us 5.61% -0.088 us -0.41% SAME
I32 I16 I32 2^24 0.201 145.967 us 0.70% 146.085 us 0.74% 0.118 us 0.08% SAME
I32 I16 I32 2^28 0.201 2.231 ms 0.08% 2.222 ms 0.09% -9.622 us -0.43% FAST
I32 I16 I64 2^16 1 14.059 us 1.84% 14.337 us 0.57% 0.278 us 1.98% SLOW
I32 I16 I64 2^20 1 23.307 us 4.79% 22.368 us 2.63% -0.939 us -4.03% FAST
I32 I16 I64 2^24 1 153.130 us 0.84% 151.729 us 0.58% -1.401 us -0.91% FAST
I32 I16 I64 2^28 1 2.282 ms 0.08% 2.268 ms 0.07% -14.053 us -0.62% FAST
I32 I16 I64 2^16 0.201 14.008 us 1.77% 14.336 us 0.34% 0.328 us 2.34% SLOW
I32 I16 I64 2^20 0.201 23.612 us 4.50% 21.752 us 4.57% -1.860 us -7.88% FAST
I32 I16 I64 2^24 0.201 147.432 us 0.80% 146.354 us 0.86% -1.078 us -0.73% SAME
I32 I16 I64 2^28 0.201 2.232 ms 0.08% 2.222 ms 0.09% -10.314 us -0.46% FAST
I32 I32 I32 2^16 1 14.252 us 1.40% 14.335 us 0.18% 0.083 us 0.58% SLOW
I32 I32 I32 2^20 1 28.556 us 1.80% 24.213 us 3.23% -4.343 us -15.21% FAST
I32 I32 I32 2^24 1 207.185 us 0.48% 196.897 us 0.59% -10.288 us -4.97% FAST
I32 I32 I32 2^28 1 3.145 ms 0.06% 3.036 ms 0.06% -108.724 us -3.46% FAST
I32 I32 I32 2^16 0.201 14.035 us 1.82% 14.336 us 0.40% 0.301 us 2.14% SLOW
I32 I32 I32 2^20 0.201 27.686 us 4.18% 22.221 us 3.31% -5.465 us -19.74% FAST
I32 I32 I32 2^24 0.201 198.053 us 0.61% 194.153 us 0.62% -3.900 us -1.97% FAST
I32 I32 I32 2^28 0.201 3.062 ms 0.06% 2.968 ms 0.06% -94.450 us -3.08% FAST
I32 I32 I64 2^16 1 14.055 us 2.36% 14.338 us 0.61% 0.283 us 2.01% SLOW
I32 I32 I64 2^20 1 24.566 us 0.67% 24.574 us 0.75% 0.008 us 0.03% SAME
I32 I32 I64 2^24 1 195.851 us 0.62% 200.821 us 0.53% 4.970 us 2.54% SLOW
I32 I32 I64 2^28 1 3.041 ms 0.08% 3.080 ms 0.06% 39.589 us 1.30% SLOW
I32 I32 I64 2^16 0.201 13.946 us 1.61% 14.336 us 0.50% 0.390 us 2.80% SLOW
I32 I32 I64 2^20 0.201 24.139 us 3.50% 23.713 us 4.88% -0.426 us -1.76% SAME
I32 I32 I64 2^24 0.201 190.950 us 0.60% 197.548 us 0.56% 6.598 us 3.46% SLOW
I32 I32 I64 2^28 0.201 2.968 ms 0.06% 3.012 ms 0.06% 44.171 us 1.49% SLOW
I32 I64 I32 2^16 1 14.256 us 1.57% 14.693 us 5.30% 0.438 us 3.07% SLOW
I32 I64 I32 2^20 1 28.343 us 2.96% 29.634 us 3.45% 1.291 us 4.56% SLOW
I32 I64 I32 2^24 1 288.101 us 0.55% 287.918 us 0.51% -0.183 us -0.06% SAME
I32 I64 I32 2^28 1 4.554 ms 0.04% 4.532 ms 0.05% -22.546 us -0.50% FAST
I32 I64 I32 2^16 0.201 14.066 us 1.86% 14.407 us 2.64% 0.341 us 2.42% SLOW
I32 I64 I32 2^20 0.201 29.090 us 4.37% 26.814 us 3.74% -2.277 us -7.83% FAST
I32 I64 I32 2^24 0.201 284.094 us 0.51% 283.874 us 0.52% -0.219 us -0.08% SAME
I32 I64 I32 2^28 0.201 4.447 ms 0.04% 4.432 ms 0.04% -15.171 us -0.34% FAST
I32 I64 I64 2^16 1 14.109 us 2.92% 14.499 us 3.85% 0.389 us 2.76% SAME
I32 I64 I64 2^20 1 29.223 us 3.38% 27.583 us 4.49% -1.640 us -5.61% FAST
I32 I64 I64 2^24 1 288.949 us 0.52% 288.839 us 0.56% -0.111 us -0.04% SAME
I32 I64 I64 2^28 1 4.557 ms 0.05% 4.533 ms 0.04% -23.782 us -0.52% FAST
I32 I64 I64 2^16 0.201 13.948 us 1.74% 14.533 us 4.47% 0.585 us 4.19% SLOW
I32 I64 I64 2^20 0.201 30.914 us 2.41% 28.783 us 2.08% -2.131 us -6.89% FAST
I32 I64 I64 2^24 0.201 284.749 us 0.45% 285.099 us 0.50% 0.350 us 0.12% SAME
I32 I64 I64 2^28 0.201 4.448 ms 0.04% 4.433 ms 0.05% -15.060 us -0.34% FAST
I64 I8 I32 2^16 1 14.354 us 2.72% 14.596 us 4.67% 0.242 us 1.68% SAME
I64 I8 I32 2^20 1 24.748 us 2.31% 26.527 us 1.63% 1.779 us 7.19% SLOW
I64 I8 I32 2^24 1 222.089 us 0.54% 222.646 us 0.64% 0.557 us 0.25% SAME
I64 I8 I32 2^28 1 3.445 ms 0.06% 3.431 ms 0.07% -13.838 us -0.40% FAST
I64 I8 I32 2^16 0.201 14.137 us 3.74% 14.359 us 1.49% 0.222 us 1.57% SLOW
I64 I8 I32 2^20 0.201 26.619 us 0.44% 24.606 us 1.02% -2.013 us -7.56% FAST
I64 I8 I32 2^24 0.201 217.144 us 0.73% 219.361 us 0.65% 2.218 us 1.02% SLOW
I64 I8 I32 2^28 0.201 3.377 ms 0.06% 3.369 ms 0.07% -8.154 us -0.24% FAST
I64 I8 I64 2^16 1 14.885 us 6.01% 15.658 us 6.26% 0.773 us 5.19% SAME
I64 I8 I64 2^20 1 28.565 us 1.66% 26.537 us 1.77% -2.028 us -7.10% FAST
I64 I8 I64 2^24 1 221.964 us 0.54% 223.071 us 0.63% 1.107 us 0.50% SAME
I64 I8 I64 2^28 1 3.432 ms 0.06% 3.432 ms 0.08% 0.059 us 0.00% SAME
I64 I8 I64 2^16 0.201 14.350 us 5.31% 15.168 us 6.65% 0.818 us 5.70% SLOW
I64 I8 I64 2^20 0.201 26.688 us 1.36% 25.273 us 3.84% -1.415 us -5.30% FAST
I64 I8 I64 2^24 0.201 216.578 us 0.76% 221.240 us 0.61% 4.662 us 2.15% SLOW
I64 I8 I64 2^28 0.201 3.366 ms 0.06% 3.370 ms 0.08% 3.906 us 0.12% SLOW
I64 I16 I32 2^16 1 14.418 us 3.31% 15.664 us 6.24% 1.246 us 8.64% SLOW
I64 I16 I32 2^20 1 30.208 us 3.05% 28.551 us 3.14% -1.657 us -5.49% FAST
I64 I16 I32 2^24 1 244.928 us 0.52% 245.370 us 0.57% 0.442 us 0.18% SAME
I64 I16 I32 2^28 1 3.794 ms 0.05% 3.784 ms 0.05% -9.857 us -0.26% FAST
I64 I16 I32 2^16 0.201 14.248 us 4.69% 14.795 us 5.78% 0.547 us 3.84% SAME
I64 I16 I32 2^20 0.201 28.487 us 2.08% 26.378 us 2.68% -2.109 us -7.40% FAST
I64 I16 I32 2^24 0.201 239.628 us 0.59% 241.832 us 0.62% 2.204 us 0.92% SLOW
I64 I16 I32 2^28 0.201 3.723 ms 0.05% 3.719 ms 0.05% -3.805 us -0.10% FAST
I64 I16 I64 2^16 1 14.950 us 5.84% 16.167 us 3.92% 1.216 us 8.14% SLOW
I64 I16 I64 2^20 1 30.627 us 2.05% 28.718 us 1.14% -1.909 us -6.23% FAST
I64 I16 I64 2^24 1 245.254 us 0.48% 246.000 us 0.47% 0.746 us 0.30% SAME
I64 I16 I64 2^28 1 3.795 ms 0.05% 3.784 ms 0.05% -10.794 us -0.28% FAST
I64 I16 I64 2^16 0.201 14.585 us 5.85% 15.411 us 6.64% 0.826 us 5.67% SAME
I64 I16 I64 2^20 0.201 28.439 us 2.30% 26.625 us 0.20% -1.814 us -6.38% FAST
I64 I16 I64 2^24 0.201 240.127 us 0.61% 242.180 us 0.60% 2.054 us 0.86% SLOW
I64 I16 I64 2^28 0.201 3.724 ms 0.05% 3.721 ms 0.04% -3.534 us -0.09% FAST
I64 I32 I32 2^16 1 14.583 us 3.98% 14.452 us 3.30% -0.131 us -0.90% SAME
I64 I32 I32 2^20 1 30.237 us 3.10% 28.687 us 0.66% -1.550 us -5.13% FAST
I64 I32 I32 2^24 1 291.917 us 0.46% 292.278 us 0.41% 0.361 us 0.12% SAME
I64 I32 I32 2^28 1 4.548 ms 0.04% 4.538 ms 0.04% -10.016 us -0.22% FAST
I64 I32 I32 2^16 0.201 14.835 us 5.76% 14.364 us 1.66% -0.471 us -3.17% FAST
I64 I32 I32 2^20 0.201 28.685 us 0.59% 28.588 us 1.47% -0.097 us -0.34% SAME
I64 I32 I32 2^24 0.201 284.989 us 0.31% 286.011 us 0.41% 1.022 us 0.36% SLOW
I64 I32 I32 2^28 0.201 4.456 ms 0.04% 4.455 ms 0.04% -1.135 us -0.03% SAME
I64 I32 I64 2^16 1 15.301 us 4.95% 15.710 us 6.12% 0.409 us 2.67% SAME
I64 I32 I64 2^20 1 30.318 us 3.58% 30.212 us 3.82% -0.105 us -0.35% SAME
I64 I32 I64 2^24 1 293.012 us 0.40% 289.964 us 0.43% -3.048 us -1.04% FAST
I64 I32 I64 2^28 1 4.568 ms 0.05% 4.528 ms 0.05% -39.237 us -0.86% FAST
I64 I32 I64 2^16 0.201 15.255 us 5.31% 15.892 us 5.52% 0.637 us 4.18% SAME
I64 I32 I64 2^20 0.201 28.676 us 0.39% 28.665 us 0.42% -0.011 us -0.04% SAME
I64 I32 I64 2^24 0.201 286.816 us 0.27% 283.779 us 0.41% -3.036 us -1.06% FAST
I64 I32 I64 2^28 0.201 4.473 ms 0.04% 4.443 ms 0.05% -30.625 us -0.68% FAST
I64 I64 I32 2^16 1 14.660 us 4.51% 14.342 us 0.79% -0.318 us -2.17% FAST
I64 I64 I32 2^20 1 32.955 us 1.83% 33.376 us 2.96% 0.422 us 1.28% SAME
I64 I64 I32 2^24 1 388.486 us 0.31% 388.024 us 0.38% -0.461 us -0.12% SAME
I64 I64 I32 2^28 1 6.161 ms 0.03% 6.140 ms 0.03% -21.561 us -0.35% FAST
I64 I64 I32 2^16 0.201 14.961 us 5.73% 15.012 us 6.43% 0.051 us 0.34% SAME
I64 I64 I32 2^20 0.201 32.807 us 1.00% 34.924 us 1.33% 2.117 us 6.45% SLOW
I64 I64 I32 2^24 0.201 377.199 us 0.33% 378.162 us 0.30% 0.963 us 0.26% SAME
I64 I64 I32 2^28 0.201 5.998 ms 0.03% 5.987 ms 0.04% -11.033 us -0.18% FAST
I64 I64 I64 2^16 1 15.871 us 3.37% 16.148 us 4.04% 0.277 us 1.75% SAME
I64 I64 I64 2^20 1 33.637 us 3.03% 35.012 us 4.11% 1.374 us 4.09% SLOW
I64 I64 I64 2^24 1 388.847 us 0.33% 387.170 us 0.39% -1.677 us -0.43% FAST
I64 I64 I64 2^28 1 6.163 ms 0.04% 6.142 ms 0.03% -21.777 us -0.35% FAST
I64 I64 I64 2^16 0.201 15.548 us 4.74% 15.969 us 5.17% 0.421 us 2.71% SAME
I64 I64 I64 2^20 0.201 32.945 us 1.81% 34.799 us 2.13% 1.853 us 5.63% SLOW
I64 I64 I64 2^24 0.201 377.514 us 0.32% 378.760 us 0.36% 1.246 us 0.33% SLOW
I64 I64 I64 2^28 0.201 5.999 ms 0.03% 5.989 ms 0.04% -9.666 us -0.16% FAST
I128 I8 I32 2^16 1 16.300 us 1.19% 16.569 us 3.54% 0.269 us 1.65% SLOW
I128 I8 I32 2^20 1 38.697 us 2.30% 41.588 us 2.71% 2.891 us 7.47% SLOW
I128 I8 I32 2^24 1 412.866 us 0.36% 411.997 us 0.34% -0.869 us -0.21% SAME
I128 I8 I32 2^28 1 6.536 ms 0.39% 6.503 ms 0.04% -33.028 us -0.51% FAST
I128 I8 I32 2^16 0.201 16.061 us 1.60% 15.874 us 5.57% -0.186 us -1.16% SAME
I128 I8 I32 2^20 0.201 37.756 us 2.88% 38.353 us 2.61% 0.596 us 1.58% SAME
I128 I8 I32 2^24 0.201 401.261 us 0.36% 403.507 us 0.30% 2.246 us 0.56% SLOW
I128 I8 I32 2^28 0.201 6.409 ms 0.50% 6.383 ms 0.04% -25.702 us -0.40% FAST
I128 I8 I64 2^16 1 16.067 us 1.56% 16.771 us 4.78% 0.705 us 4.39% SLOW
I128 I8 I64 2^20 1 38.140 us 2.67% 40.905 us 0.95% 2.765 us 7.25% SLOW
I128 I8 I64 2^24 1 413.155 us 0.36% 413.287 us 0.39% 0.133 us 0.03% SAME
I128 I8 I64 2^28 1 6.536 ms 0.43% 6.505 ms 0.03% -30.808 us -0.47% FAST
I128 I8 I64 2^16 0.201 16.032 us 1.51% 16.383 us 0.49% 0.351 us 2.19% SLOW
I128 I8 I64 2^20 0.201 38.270 us 2.68% 38.079 us 3.11% -0.191 us -0.50% SAME
I128 I8 I64 2^24 0.201 401.699 us 0.38% 402.193 us 0.30% 0.494 us 0.12% SAME
I128 I8 I64 2^28 0.201 6.409 ms 0.50% 6.386 ms 0.04% -23.249 us -0.36% FAST
I128 I16 I32 2^16 1 16.300 us 1.37% 16.123 us 4.25% -0.177 us -1.09% SAME
I128 I16 I32 2^20 1 38.766 us 1.66% 40.271 us 2.74% 1.504 us 3.88% SLOW
I128 I16 I32 2^24 1 434.207 us 0.28% 434.335 us 0.29% 0.128 us 0.03% SAME
I128 I16 I32 2^28 1 6.873 ms 0.25% 6.857 ms 0.03% -16.665 us -0.24% FAST
I128 I16 I32 2^16 0.201 16.068 us 1.57% 15.634 us 6.32% -0.434 us -2.70% FAST
I128 I16 I32 2^20 0.201 38.116 us 2.84% 38.878 us 3.53% 0.762 us 2.00% SAME
I128 I16 I32 2^24 0.201 423.702 us 0.28% 423.831 us 0.34% 0.129 us 0.03% SAME
I128 I16 I32 2^28 0.201 6.742 ms 0.48% 6.738 ms 0.04% -4.457 us -0.07% FAST
I128 I16 I64 2^16 1 16.114 us 1.72% 16.914 us 5.31% 0.800 us 4.96% SLOW
I128 I16 I64 2^20 1 39.509 us 2.46% 41.048 us 1.15% 1.539 us 3.90% SLOW
I128 I16 I64 2^24 1 436.502 us 0.28% 433.362 us 0.31% -3.140 us -0.72% FAST
I128 I16 I64 2^28 1 6.905 ms 0.23% 6.859 ms 0.03% -45.558 us -0.66% FAST
I128 I16 I64 2^16 0.201 16.023 us 1.51% 16.383 us 0.36% 0.360 us 2.24% SLOW
I128 I16 I64 2^20 0.201 38.312 us 2.66% 39.183 us 3.21% 0.871 us 2.27% SAME
I128 I16 I64 2^24 0.201 425.665 us 0.27% 425.234 us 0.36% -0.431 us -0.10% SAME
I128 I16 I64 2^28 0.201 6.775 ms 0.41% 6.740 ms 0.03% -34.799 us -0.51% FAST
I128 I32 I32 2^16 1 16.313 us 1.63% 17.360 us 5.89% 1.047 us 6.42% SLOW
I128 I32 I32 2^20 1 43.845 us 2.63% 41.066 us 2.01% -2.779 us -6.34% FAST
I128 I32 I32 2^24 1 479.667 us 0.25% 479.347 us 0.29% -0.320 us -0.07% SAME
I128 I32 I32 2^28 1 7.588 ms 0.09% 7.574 ms 0.33% -13.106 us -0.17% FAST
I128 I32 I32 2^16 0.201 16.081 us 1.63% 16.383 us 0.31% 0.302 us 1.88% SLOW
I128 I32 I32 2^20 0.201 40.366 us 3.35% 40.771 us 3.31% 0.405 us 1.00% SAME
I128 I32 I32 2^24 0.201 469.744 us 0.32% 471.675 us 0.25% 1.930 us 0.41% SLOW
I128 I32 I32 2^28 0.201 7.452 ms 0.31% 7.454 ms 0.40% 2.460 us 0.03% SAME
I128 I32 I64 2^16 1 16.045 us 1.56% 17.777 us 5.36% 1.732 us 10.79% SLOW
I128 I32 I64 2^20 1 43.233 us 1.66% 42.358 us 2.54% -0.875 us -2.02% FAST
I128 I32 I64 2^24 1 482.218 us 0.30% 479.794 us 0.32% -2.424 us -0.50% FAST
I128 I32 I64 2^28 1 7.613 ms 0.09% 7.577 ms 0.32% -36.417 us -0.48% FAST
I128 I32 I64 2^16 0.201 15.906 us 3.48% 15.887 us 5.52% -0.019 us -0.12% SAME
I128 I32 I64 2^20 0.201 40.990 us 3.24% 41.483 us 3.07% 0.493 us 1.20% SAME
I128 I32 I64 2^24 0.201 471.173 us 0.31% 470.432 us 0.27% -0.741 us -0.16% SAME
I128 I32 I64 2^28 0.201 7.481 ms 0.28% 7.458 ms 0.41% -22.716 us -0.30% FAST
I128 I64 I32 2^16 1 16.326 us 1.19% 16.362 us 1.33% 0.036 us 0.22% SAME
I128 I64 I32 2^20 1 46.841 us 1.83% 49.724 us 1.87% 2.884 us 6.16% SLOW
I128 I64 I32 2^24 1 576.553 us 0.22% 575.396 us 0.26% -1.157 us -0.20% SAME
I128 I64 I32 2^28 1 9.244 ms 0.02% 9.216 ms 0.02% -27.663 us -0.30% FAST
I128 I64 I32 2^16 0.201 16.059 us 1.56% 16.383 us 0.41% 0.324 us 2.01% SLOW
I128 I64 I32 2^20 0.201 47.036 us 1.45% 46.270 us 2.38% -0.766 us -1.63% FAST
I128 I64 I32 2^24 0.201 561.070 us 0.28% 561.599 us 0.27% 0.530 us 0.09% SAME
I128 I64 I32 2^28 0.201 8.997 ms 0.03% 8.984 ms 0.03% -13.284 us -0.15% FAST
I128 I64 I64 2^16 1 16.053 us 1.73% 16.629 us 4.00% 0.576 us 3.59% SLOW
I128 I64 I64 2^20 1 45.942 us 2.24% 47.500 us 1.73% 1.558 us 3.39% SLOW
I128 I64 I64 2^24 1 577.352 us 0.23% 576.918 us 0.27% -0.434 us -0.08% SAME
I128 I64 I64 2^28 1 9.246 ms 0.02% 9.221 ms 0.02% -24.607 us -0.27% FAST
I128 I64 I64 2^16 0.201 15.912 us 3.24% 16.166 us 3.90% 0.254 us 1.60% SAME
I128 I64 I64 2^20 0.201 45.407 us 1.84% 47.655 us 1.99% 2.248 us 4.95% SLOW
I128 I64 I64 2^24 0.201 560.361 us 0.25% 562.259 us 0.27% 1.898 us 0.34% SLOW
I128 I64 I64 2^28 0.201 9.001 ms 0.03% 8.989 ms 0.03% -12.288 us -0.14% FAST
F32 I8 I32 2^16 1 14.288 us 1.26% 14.336 us 0.24% 0.047 us 0.33% SLOW
F32 I8 I32 2^20 1 21.570 us 4.73% 21.153 us 4.54% -0.418 us -1.94% SAME
F32 I8 I32 2^24 1 128.754 us 1.25% 128.722 us 1.09% -0.033 us -0.03% SAME
F32 I8 I32 2^28 1 1.901 ms 0.09% 1.894 ms 0.10% -7.352 us -0.39% FAST
F32 I8 I32 2^16 0.201 13.698 us 4.74% 14.336 us 0.06% 0.637 us 4.65% SLOW
F32 I8 I32 2^20 0.201 20.716 us 3.17% 21.011 us 5.72% 0.294 us 1.42% SAME
F32 I8 I32 2^24 0.201 123.915 us 1.00% 124.851 us 1.26% 0.936 us 0.76% SAME
F32 I8 I32 2^28 0.201 1.862 ms 0.09% 1.857 ms 0.09% -5.527 us -0.30% FAST
F32 I8 I64 2^16 1 14.074 us 1.84% 14.336 us 0.32% 0.262 us 1.86% SLOW
F32 I8 I64 2^20 1 21.945 us 4.23% 22.250 us 3.16% 0.306 us 1.39% SAME
F32 I8 I64 2^24 1 128.971 us 1.14% 129.079 us 1.02% 0.108 us 0.08% SAME
F32 I8 I64 2^28 1 1.903 ms 0.09% 1.895 ms 0.08% -7.221 us -0.38% FAST
F32 I8 I64 2^16 0.201 13.978 us 1.69% 14.336 us 0.07% 0.358 us 2.56% SLOW
F32 I8 I64 2^20 0.201 21.694 us 4.64% 21.944 us 4.20% 0.251 us 1.15% SAME
F32 I8 I64 2^24 0.201 124.028 us 1.13% 127.163 us 1.22% 3.136 us 2.53% SLOW
F32 I8 I64 2^28 0.201 1.864 ms 0.09% 1.857 ms 0.08% -7.757 us -0.42% FAST
F32 I16 I32 2^16 1 10.411 us 4.65% 12.164 us 4.00% 1.753 us 16.84% SLOW
F32 I16 I32 2^20 1 23.409 us 4.32% 22.447 us 5.51% -0.962 us -4.11% SAME
F32 I16 I32 2^24 1 150.589 us 0.95% 150.732 us 1.06% 0.143 us 0.09% SAME
F32 I16 I32 2^28 1 2.280 ms 0.06% 2.267 ms 0.07% -12.332 us -0.54% FAST
F32 I16 I32 2^16 0.201 11.197 us 6.70% 11.988 us 6.05% 0.792 us 7.07% SLOW
F32 I16 I32 2^20 0.201 21.735 us 5.84% 21.602 us 4.72% -0.133 us -0.61% SAME
F32 I16 I32 2^24 0.201 146.548 us 0.78% 147.080 us 0.74% 0.531 us 0.36% SAME
F32 I16 I32 2^28 0.201 2.231 ms 0.07% 2.221 ms 0.07% -9.615 us -0.43% FAST
F32 I16 I64 2^16 1 11.809 us 4.73% 12.178 us 3.84% 0.369 us 3.13% SAME
F32 I16 I64 2^20 1 22.220 us 4.65% 22.463 us 1.67% 0.243 us 1.09% SAME
F32 I16 I64 2^24 1 151.001 us 0.83% 151.047 us 0.91% 0.046 us 0.03% SAME
F32 I16 I64 2^28 1 2.281 ms 0.07% 2.268 ms 0.08% -13.458 us -0.59% FAST
F32 I16 I64 2^16 0.201 12.369 us 6.24% 12.289 us 0.51% -0.079 us -0.64% FAST
F32 I16 I64 2^20 0.201 23.189 us 5.90% 21.890 us 4.37% -1.299 us -5.60% FAST
F32 I16 I64 2^24 0.201 145.927 us 0.71% 146.483 us 0.80% 0.555 us 0.38% SAME
F32 I16 I64 2^28 0.201 2.231 ms 0.08% 2.222 ms 0.08% -9.049 us -0.41% FAST
F32 I32 I32 2^16 1 14.278 us 1.18% 12.248 us 2.29% -2.030 us -14.22% FAST
F32 I32 I32 2^20 1 28.716 us 1.42% 24.354 us 2.63% -4.362 us -15.19% FAST
F32 I32 I32 2^24 1 204.812 us 0.53% 195.926 us 0.63% -8.886 us -4.34% FAST
F32 I32 I32 2^28 1 3.144 ms 0.06% 3.035 ms 0.06% -108.526 us -3.45% FAST
F32 I32 I32 2^16 0.201 14.013 us 1.79% 12.287 us 0.28% -1.725 us -12.31% FAST
F32 I32 I32 2^20 0.201 28.032 us 3.81% 22.373 us 2.69% -5.659 us -20.19% FAST
F32 I32 I32 2^24 0.201 199.317 us 0.59% 193.732 us 0.57% -5.585 us -2.80% FAST
F32 I32 I32 2^28 0.201 3.061 ms 0.07% 2.967 ms 0.06% -94.236 us -3.08% FAST
F32 I32 I64 2^16 1 14.062 us 1.84% 12.291 us 0.77% -1.771 us -12.59% FAST
F32 I32 I64 2^20 1 24.568 us 0.56% 24.474 us 3.16% -0.094 us -0.38% SAME
F32 I32 I64 2^24 1 196.768 us 0.56% 199.087 us 0.54% 2.318 us 1.18% SLOW
F32 I32 I64 2^28 1 3.039 ms 0.07% 3.080 ms 0.06% 40.604 us 1.34% SLOW
F32 I32 I64 2^16 0.201 13.982 us 1.72% 12.289 us 0.58% -1.693 us -12.11% FAST
F32 I32 I64 2^20 0.201 24.121 us 3.53% 24.650 us 5.18% 0.529 us 2.19% SAME
F32 I32 I64 2^24 0.201 191.768 us 0.71% 197.145 us 0.51% 5.378 us 2.80% SLOW
F32 I32 I64 2^28 0.201 2.968 ms 0.06% 3.011 ms 0.06% 42.600 us 1.44% SLOW
F32 I64 I32 2^16 1 14.272 us 1.20% 14.335 us 0.34% 0.063 us 0.44% SLOW
F32 I64 I32 2^20 1 29.852 us 3.88% 28.728 us 1.25% -1.124 us -3.76% FAST
F32 I64 I32 2^24 1 287.890 us 0.45% 290.512 us 0.41% 2.622 us 0.91% SLOW
F32 I64 I32 2^28 1 4.553 ms 0.04% 4.531 ms 0.05% -21.971 us -0.48% FAST
F32 I64 I32 2^16 0.201 14.048 us 2.01% 14.335 us 0.49% 0.287 us 2.04% SLOW
F32 I64 I32 2^20 0.201 28.763 us 3.29% 28.304 us 5.24% -0.459 us -1.60% SAME
F32 I64 I32 2^24 0.201 283.073 us 0.52% 284.412 us 0.56% 1.339 us 0.47% SAME
F32 I64 I32 2^28 0.201 4.447 ms 0.04% 4.432 ms 0.05% -15.502 us -0.35% FAST
F32 I64 I64 2^16 1 14.025 us 1.81% 14.336 us 0.15% 0.311 us 2.22% SLOW
F32 I64 I64 2^20 1 29.912 us 4.48% 29.495 us 3.65% -0.417 us -1.39% SAME
F32 I64 I64 2^24 1 287.017 us 0.43% 291.484 us 0.46% 4.467 us 1.56% SLOW
F32 I64 I64 2^28 1 4.556 ms 0.04% 4.533 ms 0.05% -22.865 us -0.50% FAST
F32 I64 I64 2^16 0.201 14.047 us 2.63% 14.336 us 0.41% 0.289 us 2.06% SLOW
F32 I64 I64 2^20 0.201 31.451 us 3.35% 27.775 us 4.49% -3.676 us -11.69% FAST
F32 I64 I64 2^24 0.201 282.937 us 0.55% 285.230 us 0.55% 2.294 us 0.81% SLOW
F32 I64 I64 2^28 0.201 4.448 ms 0.03% 4.432 ms 0.04% -16.222 us -0.36% FAST
F64 I8 I32 2^16 1 14.331 us 0.49% 16.046 us 5.00% 1.715 us 11.97% SLOW
F64 I8 I32 2^20 1 26.625 us 0.54% 26.684 us 1.32% 0.059 us 0.22% SAME
F64 I8 I32 2^24 1 223.075 us 0.61% 223.964 us 0.61% 0.889 us 0.40% SAME
F64 I8 I32 2^28 1 3.450 ms 0.07% 3.432 ms 0.08% -17.284 us -0.50% FAST
F64 I8 I32 2^16 0.201 14.333 us 0.57% 14.406 us 2.64% 0.073 us 0.51% SAME
F64 I8 I32 2^20 0.201 25.371 us 3.93% 26.624 us 0.16% 1.253 us 4.94% SLOW
F64 I8 I32 2^24 0.201 218.849 us 0.63% 221.667 us 0.63% 2.818 us 1.29% SLOW
F64 I8 I32 2^28 0.201 3.381 ms 0.06% 3.367 ms 0.06% -14.163 us -0.42% FAST
F64 I8 I64 2^16 1 15.301 us 5.16% 15.665 us 8.58% 0.365 us 2.38% SAME
F64 I8 I64 2^20 1 26.777 us 2.33% 26.920 us 2.74% 0.143 us 0.53% SAME
F64 I8 I64 2^24 1 223.318 us 0.46% 223.959 us 0.68% 0.641 us 0.29% SAME
F64 I8 I64 2^28 1 3.433 ms 0.06% 3.437 ms 0.08% 3.827 us 0.11% SLOW
F64 I8 I64 2^16 0.201 14.252 us 4.61% 14.841 us 5.96% 0.589 us 4.13% SAME
F64 I8 I64 2^20 0.201 26.739 us 1.78% 26.653 us 0.97% -0.086 us -0.32% SAME
F64 I8 I64 2^24 0.201 216.833 us 0.71% 219.843 us 0.60% 3.009 us 1.39% SLOW
F64 I8 I64 2^28 0.201 3.368 ms 0.06% 3.368 ms 0.06% 0.068 us 0.00% SAME
F64 I16 I32 2^16 1 12.156 us 1.97% 13.346 us 7.66% 1.191 us 9.79% SLOW
F64 I16 I32 2^20 1 27.612 us 3.73% 26.988 us 3.00% -0.624 us -2.26% SAME
F64 I16 I32 2^24 1 246.681 us 0.56% 244.881 us 0.50% -1.800 us -0.73% FAST
F64 I16 I32 2^28 1 3.794 ms 0.05% 3.784 ms 0.05% -10.692 us -0.28% FAST
F64 I16 I32 2^16 0.201 15.242 us 5.47% 12.291 us 0.75% -2.951 us -19.36% FAST
F64 I16 I32 2^20 0.201 26.902 us 2.62% 26.749 us 1.93% -0.153 us -0.57% SAME
F64 I16 I32 2^24 0.201 240.418 us 0.55% 240.176 us 0.57% -0.242 us -0.10% SAME
F64 I16 I32 2^28 0.201 3.726 ms 0.06% 3.718 ms 0.05% -7.639 us -0.21% FAST
F64 I16 I64 2^16 1 11.907 us 2.00% 12.382 us 3.50% 0.476 us 4.00% SLOW
F64 I16 I64 2^20 1 27.987 us 4.23% 27.766 us 4.57% -0.221 us -0.79% SAME
F64 I16 I64 2^24 1 246.393 us 0.59% 245.100 us 0.52% -1.293 us -0.52% FAST
F64 I16 I64 2^28 1 3.798 ms 0.06% 3.784 ms 0.05% -13.726 us -0.36% FAST
F64 I16 I64 2^16 0.201 11.914 us 2.23% 12.290 us 0.68% 0.376 us 3.16% SLOW
F64 I16 I64 2^20 0.201 28.460 us 2.32% 26.652 us 0.93% -1.808 us -6.35% FAST
F64 I16 I64 2^24 0.201 241.788 us 0.46% 241.343 us 0.53% -0.445 us -0.18% SAME
F64 I16 I64 2^28 0.201 3.726 ms 0.06% 3.720 ms 0.04% -6.174 us -0.17% FAST
F64 I32 I32 2^16 1 15.573 us 5.22% 16.408 us 5.98% 0.835 us 5.36% SLOW
F64 I32 I32 2^20 1 30.310 us 2.91% 29.129 us 3.07% -1.181 us -3.90% FAST
F64 I32 I32 2^24 1 291.706 us 0.43% 291.053 us 0.42% -0.653 us -0.22% SAME
F64 I32 I32 2^28 1 4.549 ms 0.05% 4.537 ms 0.05% -12.571 us -0.28% FAST
F64 I32 I32 2^16 0.201 14.067 us 2.80% 14.991 us 6.39% 0.923 us 6.56% SLOW
F64 I32 I32 2^20 0.201 28.721 us 1.10% 30.750 us 0.83% 2.029 us 7.07% SLOW
F64 I32 I32 2^24 0.201 284.195 us 0.38% 285.199 us 0.41% 1.004 us 0.35% SAME
F64 I32 I32 2^28 0.201 4.458 ms 0.05% 4.455 ms 0.04% -2.508 us -0.06% FAST
F64 I32 I64 2^16 1 15.918 us 5.40% 16.069 us 7.53% 0.152 us 0.95% SAME
F64 I32 I64 2^20 1 33.099 us 3.92% 34.294 us 3.70% 1.195 us 3.61% SAME
F64 I32 I64 2^24 1 300.797 us 0.43% 290.420 us 0.41% -10.377 us -3.45% FAST
F64 I32 I64 2^28 1 4.674 ms 0.05% 4.529 ms 0.05% -145.095 us -3.10% FAST
F64 I32 I64 2^16 0.201 14.111 us 3.61% 14.683 us 5.25% 0.571 us 4.05% SLOW
F64 I32 I64 2^20 0.201 33.067 us 2.24% 28.823 us 1.87% -4.244 us -12.83% FAST
F64 I32 I64 2^24 0.201 292.563 us 0.35% 282.744 us 0.45% -9.819 us -3.36% FAST
F64 I32 I64 2^28 0.201 4.566 ms 0.04% 4.443 ms 0.05% -122.911 us -2.69% FAST
F64 I64 I32 2^16 1 15.444 us 5.39% 15.456 us 6.69% 0.012 us 0.08% SAME
F64 I64 I32 2^20 1 35.125 us 2.44% 33.948 us 3.98% -1.177 us -3.35% FAST
F64 I64 I32 2^24 1 387.264 us 0.42% 387.693 us 0.33% 0.428 us 0.11% SAME
F64 I64 I32 2^28 1 6.162 ms 0.04% 6.141 ms 0.03% -20.729 us -0.34% FAST
F64 I64 I32 2^16 0.201 15.705 us 3.70% 15.587 us 6.41% -0.118 us -0.75% SAME
F64 I64 I32 2^20 0.201 34.880 us 1.13% 34.041 us 3.25% -0.840 us -2.41% FAST
F64 I64 I32 2^24 0.201 377.055 us 0.40% 377.317 us 0.39% 0.261 us 0.07% SAME
F64 I64 I32 2^28 0.201 5.998 ms 0.04% 5.987 ms 0.03% -10.396 us -0.17% FAST
F64 I64 I64 2^16 1 16.013 us 2.07% 16.038 us 4.87% 0.025 us 0.16% SAME
F64 I64 I64 2^20 1 34.322 us 3.03% 37.649 us 3.31% 3.327 us 9.69% SLOW
F64 I64 I64 2^24 1 388.039 us 0.42% 387.866 us 0.32% -0.173 us -0.04% SAME
F64 I64 I64 2^28 1 6.165 ms 0.03% 6.143 ms 0.03% -22.026 us -0.36% FAST
F64 I64 I64 2^16 0.201 15.317 us 5.19% 12.290 us 0.79% -3.027 us -19.76% FAST
F64 I64 I64 2^20 0.201 32.898 us 3.28% 35.665 us 3.04% 2.767 us 8.41% SLOW
F64 I64 I64 2^24 0.201 377.081 us 0.39% 377.786 us 0.43% 0.706 us 0.19% SAME
F64 I64 I64 2^28 0.201 5.999 ms 0.04% 5.990 ms 0.04% -8.541 us -0.14% FAST

Summary

  • Total Matches: 448
    • Pass (diff <= min_noise): 130
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 318
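
For readers decoding the Status column, the rows above appear consistent with the rule hinted at by the summary labels: a row is SAME when the absolute relative difference does not exceed the smaller of the two noise values, and otherwise FAST or SLOW by the sign of the difference. Below is a minimal sketch of that rule. The function name `classify` and its parameters are hypothetical, and the rule is inferred from the table rather than taken from the comparison script itself.

```python
def classify(ref_time, ref_noise_pct, cmp_time, cmp_noise_pct):
    """Classify one benchmark row as SAME, FAST, or SLOW.

    Times must share a unit; noise values are percentages, as printed
    in the table. This is an assumed reading of the
    "diff <= min_noise" labels in the summary, not the tool's own code.
    """
    diff = cmp_time - ref_time
    pct_diff = 100.0 * diff / ref_time  # the %Diff column
    min_noise = min(ref_noise_pct, cmp_noise_pct)
    if abs(pct_diff) <= min_noise:
        return "SAME"  # difference is within the smaller noise band
    return "FAST" if diff < 0 else "SLOW"


# Three rows from the I32/I32 block of the table above (times in us).
print(classify(14.252, 1.40, 14.335, 0.18))  # I32 I32 I32 2^16 entropy 1
print(classify(24.566, 0.67, 24.574, 0.75))  # I32 I32 I64 2^20 entropy 1
print(classify(28.556, 1.80, 24.213, 3.23))  # I32 I32 I32 2^20 entropy 1
```

Under this reading, a SLOW or FAST status on a sub-1% difference (common at 2^28, where noise is only a few hundredths of a percent) signals that the difference is measurable, not that it is large.
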

B200 (Using UBLKCPY)

['/home/pgrossebley/merge_pairs_old.json', '/home/pgrossebley/merge_pairs_new.json']

base

[0] NVIDIA B200

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} Entropy Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 1 19.573 us 2.81% 19.458 us 1.20% -0.115 us -0.59% SAME
I8 I8 I32 2^20 1 24.124 us 4.21% 24.708 us 4.16% 0.585 us 2.42% SAME
I8 I8 I32 2^24 1 97.590 us 0.76% 99.551 us 1.00% 1.960 us 2.01% SLOW
I8 I8 I32 2^28 1 1.194 ms 0.09% 1.211 ms 0.07% 17.318 us 1.45% SLOW
I8 I8 I32 2^16 0.201 19.457 us 48.01% 19.467 us 1.19% 0.010 us 0.05% SAME
I8 I8 I32 2^20 0.201 23.540 us 43.72% 23.514 us 1.25% -0.026 us -0.11% SAME
I8 I8 I32 2^24 0.201 95.257 us 0.35% 95.084 us 0.63% -0.173 us -0.18% SAME
I8 I8 I32 2^28 0.201 1.191 ms 0.05% 1.161 ms 0.09% -29.387 us -2.47% FAST
I8 I8 I64 2^16 1 19.699 us 3.81% 19.699 us 3.67% 0.001 us 0.00% SAME
I8 I8 I64 2^20 1 24.606 us 4.44% 24.434 us 4.24% -0.171 us -0.70% SAME
I8 I8 I64 2^24 1 79.766 us 1.36% 75.493 us 1.26% -4.273 us -5.36% FAST
I8 I8 I64 2^28 1 913.524 us 0.14% 789.730 us 0.09% -123.794 us -13.55% FAST
I8 I8 I64 2^16 0.201 19.519 us 3.15% 19.518 us 2.47% -0.000 us -0.00% SAME
I8 I8 I64 2^20 0.201 23.478 us 1.68% 23.471 us 1.43% -0.007 us -0.03% SAME
I8 I8 I64 2^24 0.201 78.743 us 0.61% 72.608 us 0.68% -6.135 us -7.79% FAST
I8 I8 I64 2^28 0.201 912.458 us 0.04% 777.221 us 0.03% -135.238 us -14.82% FAST
I8 I16 I32 2^16 1 19.462 us 4.55% 20.163 us 6.25% 0.701 us 3.60% SAME
I8 I16 I32 2^20 1 25.014 us 3.78% 25.185 us 3.35% 0.171 us 0.68% SAME
I8 I16 I32 2^24 1 101.143 us 1.09% 74.896 us 0.61% -26.247 us -25.95% FAST
I8 I16 I32 2^28 1 1.235 ms 0.09% 789.467 us 0.10% -445.253 us -36.06% FAST
I8 I16 I32 2^16 0.201 19.458 us 1.35% 19.427 us 1.07% -0.032 us -0.16% SAME
I8 I16 I32 2^20 0.201 23.484 us 1.73% 23.528 us 2.26% 0.045 us 0.19% SAME
I8 I16 I32 2^24 0.201 98.810 us 0.97% 72.656 us 0.50% -26.154 us -26.47% FAST
I8 I16 I32 2^28 0.201 1.230 ms 0.05% 786.326 us 0.14% -443.582 us -36.07% FAST
I8 I16 I64 2^16 1 19.616 us 3.83% 20.193 us 5.33% 0.577 us 2.94% SAME
I8 I16 I64 2^20 1 25.000 us 3.86% 25.008 us 3.91% 0.008 us 0.03% SAME
I8 I16 I64 2^24 1 81.478 us 0.92% 75.259 us 1.01% -6.219 us -7.63% FAST
I8 I16 I64 2^28 1 921.913 us 0.14% 794.836 us 0.15% -127.077 us -13.78% FAST
I8 I16 I64 2^16 0.201 19.420 us 2.88% 19.317 us 1.93% -0.102 us -0.53% SAME
I8 I16 I64 2^20 0.201 23.772 us 3.32% 23.805 us 3.37% 0.034 us 0.14% SAME
I8 I16 I64 2^24 0.201 79.699 us 1.38% 72.678 us 0.50% -7.021 us -8.81% FAST
I8 I16 I64 2^28 0.201 920.780 us 0.07% 788.065 us 0.12% -132.716 us -14.41% FAST
I8 I32 I32 2^16 1 19.375 us 1.73% 19.401 us 1.86% 0.026 us 0.13% SAME
I8 I32 I32 2^20 1 25.604 us 2.06% 25.740 us 2.66% 0.135 us 0.53% SAME
I8 I32 I32 2^24 1 86.496 us 1.50% 82.303 us 1.52% -4.193 us -4.85% FAST
I8 I32 I32 2^28 1 1.000 ms 0.18% 918.396 us 0.13% -81.693 us -8.17% FAST
I8 I32 I32 2^16 0.201 19.234 us 2.38% 19.170 us 2.55% -0.064 us -0.33% SAME
I8 I32 I32 2^20 0.201 25.629 us 1.61% 25.611 us 1.90% -0.018 us -0.07% SAME
I8 I32 I32 2^24 0.201 85.008 us 0.66% 79.324 us 1.20% -5.684 us -6.69% FAST
I8 I32 I32 2^28 0.201 992.058 us 0.24% 907.678 us 0.13% -84.380 us -8.51% FAST
I8 I32 I64 2^16 1 19.449 us 1.57% 19.441 us 1.55% -0.008 us -0.04% SAME
I8 I32 I64 2^20 1 25.833 us 2.22% 26.450 us 4.17% 0.616 us 2.39% SLOW
I8 I32 I64 2^24 1 86.781 us 1.48% 83.467 us 0.99% -3.314 us -3.82% FAST
I8 I32 I64 2^28 1 1.011 ms 0.10% 925.980 us 0.12% -85.293 us -8.43% FAST
I8 I32 I64 2^16 0.201 19.342 us 2.26% 19.323 us 2.39% -0.018 us -0.09% SAME
I8 I32 I64 2^20 0.201 25.708 us 1.81% 25.725 us 1.90% 0.017 us 0.07% SAME
I8 I32 I64 2^24 0.201 85.119 us 0.54% 81.046 us 0.60% -4.073 us -4.78% FAST
I8 I32 I64 2^28 0.201 1.007 ms 0.10% 918.444 us 0.08% -88.726 us -8.81% FAST
I8 I64 I32 2^16 1 17.961 us 6.38% 19.294 us 2.00% 1.333 us 7.42% SLOW
I8 I64 I32 2^20 1 26.470 us 2.90% 26.438 us 3.24% -0.032 us -0.12% SAME
I8 I64 I32 2^24 1 95.310 us 1.30% 87.530 us 1.00% -7.779 us -8.16% FAST
I8 I64 I32 2^28 1 1.156 ms 0.13% 1.023 ms 0.13% -133.769 us -11.57% FAST
I8 I64 I32 2^16 0.201 17.589 us 3.80% 17.950 us 6.19% 0.361 us 2.05% SAME
I8 I64 I32 2^20 0.201 25.621 us 1.10% 25.196 us 3.47% -0.424 us -1.66% FAST
I8 I64 I32 2^24 0.201 93.308 us 0.51% 85.195 us 0.67% -8.113 us -8.69% FAST
I8 I64 I32 2^28 0.201 1.138 ms 0.12% 989.851 us 0.11% -148.160 us -13.02% FAST
I8 I64 I64 2^16 1 17.853 us 5.72% 19.332 us 2.35% 1.479 us 8.29% SLOW
I8 I64 I64 2^20 1 26.897 us 3.25% 27.014 us 4.30% 0.117 us 0.43% SAME
I8 I64 I64 2^24 1 94.548 us 1.38% 96.187 us 0.89% 1.639 us 1.73% SLOW
I8 I64 I64 2^28 1 1.149 ms 0.14% 1.152 ms 0.12% 2.907 us 0.25% SLOW
I8 I64 I64 2^16 0.201 17.741 us 3.80% 17.980 us 5.45% 0.239 us 1.35% SAME
I8 I64 I64 2^20 0.201 25.781 us 2.14% 25.799 us 2.13% 0.017 us 0.07% SAME
I8 I64 I64 2^24 0.201 93.338 us 0.74% 93.300 us 0.68% -0.038 us -0.04% SAME
I8 I64 I64 2^28 0.201 1.134 ms 0.12% 1.128 ms 0.10% -6.164 us -0.54% FAST
I16 I8 I32 2^16 1 20.770 us 6.10% 19.843 us 5.22% -0.927 us -4.47% SAME
I16 I8 I32 2^20 1 27.022 us 3.16% 26.919 us 3.42% -0.103 us -0.38% SAME
I16 I8 I32 2^24 1 119.229 us 0.88% 115.362 us 0.71% -3.867 us -3.24% FAST
I16 I8 I32 2^28 1 1.485 ms 0.07% 1.456 ms 0.07% -29.618 us -1.99% FAST
I16 I8 I32 2^16 0.201 20.463 us 5.47% 19.812 us 3.20% -0.651 us -3.18% SAME
I16 I8 I32 2^20 0.201 25.603 us 4.89% 24.427 us 3.84% -1.176 us -4.59% FAST
I16 I8 I32 2^24 0.201 106.979 us 0.73% 100.798 us 0.90% -6.181 us -5.78% FAST
I16 I8 I32 2^28 0.201 1.324 ms 0.10% 1.247 ms 0.11% -77.495 us -5.85% FAST
I16 I8 I64 2^16 1 20.998 us 5.94% 19.814 us 3.81% -1.185 us -5.64% FAST
I16 I8 I64 2^20 1 27.203 us 2.85% 27.215 us 2.87% 0.012 us 0.04% SAME
I16 I8 I64 2^24 1 119.784 us 0.52% 116.691 us 1.03% -3.093 us -2.58% FAST
I16 I8 I64 2^28 1 1.488 ms 0.07% 1.478 ms 0.05% -10.695 us -0.72% FAST
I16 I8 I64 2^16 0.201 20.731 us 5.91% 19.840 us 3.12% -0.891 us -4.30% FAST
I16 I8 I64 2^20 0.201 25.281 us 5.30% 24.522 us 4.57% -0.759 us -3.00% SAME
I16 I8 I64 2^24 0.201 111.438 us 0.49% 102.587 us 1.12% -8.851 us -7.94% FAST
I16 I8 I64 2^28 0.201 1.381 ms 0.11% 1.263 ms 0.11% -118.281 us -8.56% FAST
I16 I16 I32 2^16 1 21.159 us 4.53% 21.532 us 4.04% 0.372 us 1.76% SAME
I16 I16 I32 2^20 1 26.326 us 3.64% 26.144 us 3.25% -0.182 us -0.69% SAME
I16 I16 I32 2^24 1 121.190 us 0.82% 91.524 us 0.93% -29.666 us -24.48% FAST
I16 I16 I32 2^28 1 1.505 ms 0.07% 1.033 ms 0.05% -472.148 us -31.36% FAST
I16 I16 I32 2^16 0.201 20.324 us 5.76% 20.358 us 3.98% 0.034 us 0.17% SAME
I16 I16 I32 2^20 0.201 25.464 us 5.05% 25.917 us 3.50% 0.453 us 1.78% SAME
I16 I16 I32 2^24 0.201 109.508 us 0.51% 82.207 us 1.31% -27.301 us -24.93% FAST
I16 I16 I32 2^28 0.201 1.348 ms 0.08% 907.281 us 0.16% -440.981 us -32.71% FAST
I16 I16 I64 2^16 1 20.898 us 5.62% 21.701 us 4.29% 0.804 us 3.85% SAME
I16 I16 I64 2^20 1 27.293 us 2.63% 27.221 us 2.74% -0.072 us -0.27% SAME
I16 I16 I64 2^24 1 120.822 us 0.90% 91.125 us 0.57% -29.697 us -24.58% FAST
I16 I16 I64 2^28 1 1.494 ms 0.07% 1.029 ms 0.12% -464.809 us -31.11% FAST
I16 I16 I64 2^16 0.201 20.455 us 5.58% 20.636 us 4.45% 0.181 us 0.88% SAME
I16 I16 I64 2^20 0.201 26.498 us 4.21% 26.578 us 3.58% 0.081 us 0.31% SAME
I16 I16 I64 2^24 0.201 111.439 us 0.96% 82.181 us 1.42% -29.258 us -26.25% FAST
I16 I16 I64 2^28 0.201 1.383 ms 0.09% 907.890 us 0.18% -475.244 us -34.36% FAST
I16 I32 I32 2^16 1 19.577 us 2.64% 19.969 us 3.58% 0.392 us 2.00% SAME
I16 I32 I32 2^20 1 27.376 us 2.34% 27.312 us 2.42% -0.064 us -0.23% SAME
I16 I32 I32 2^24 1 100.785 us 1.09% 83.040 us 0.96% -17.745 us -17.61% FAST
I16 I32 I32 2^28 1 1.195 ms 0.12% 924.125 us 0.15% -270.550 us -22.65% FAST
I16 I32 I32 2^16 0.201 20.105 us 4.19% 19.756 us 6.03% -0.349 us -1.74% SAME
I16 I32 I32 2^20 0.201 26.229 us 4.87% 24.768 us 4.53% -1.460 us -5.57% FAST
I16 I32 I32 2^24 0.201 94.800 us 0.77% 75.179 us 1.63% -19.621 us -20.70% FAST
I16 I32 I32 2^28 0.201 1.104 ms 0.10% 804.078 us 0.17% -299.787 us -27.16% FAST
I16 I32 I64 2^16 1 19.876 us 47.81% 20.013 us 3.56% 0.137 us 0.69% SAME
I16 I32 I64 2^20 1 27.600 us 1.24% 27.583 us 1.46% -0.017 us -0.06% SAME
I16 I32 I64 2^24 1 97.362 us 0.63% 101.477 us 0.61% 4.115 us 4.23% SLOW
I16 I32 I64 2^28 1 1.147 ms 0.12% 1.225 ms 0.12% 77.805 us 6.78% SLOW
I16 I32 I64 2^16 0.201 20.484 us 4.53% 20.512 us 4.37% 0.028 us 0.14% SAME
I16 I32 I64 2^20 0.201 25.638 us 5.82% 25.375 us 5.24% -0.263 us -1.02% SAME
I16 I32 I64 2^24 0.201 92.163 us 1.22% 89.040 us 0.76% -3.123 us -3.39% FAST
I16 I32 I64 2^28 0.201 1.069 ms 0.11% 1.051 ms 0.21% -18.517 us -1.73% FAST
I16 I64 I32 2^16 1 19.617 us 6.40% 20.073 us 3.14% 0.456 us 2.32% SAME
I16 I64 I32 2^20 1 29.246 us 3.00% 29.621 us 1.30% 0.374 us 1.28% SAME
I16 I64 I32 2^24 1 109.895 us 0.70% 104.130 us 0.98% -5.764 us -5.25% FAST
I16 I64 I32 2^28 1 1.370 ms 0.10% 1.280 ms 0.10% -89.755 us -6.55% FAST
I16 I64 I32 2^16 0.201 18.980 us 6.96% 20.204 us 6.49% 1.224 us 6.45% SAME
I16 I64 I32 2^20 0.201 26.430 us 2.92% 26.422 us 2.77% -0.008 us -0.03% SAME
I16 I64 I32 2^24 0.201 101.650 us 1.08% 93.122 us 0.94% -8.528 us -8.39% FAST
I16 I64 I32 2^28 0.201 1.234 ms 0.12% 1.098 ms 0.11% -136.243 us -11.04% FAST
I16 I64 I64 2^16 1 19.648 us 7.30% 20.204 us 4.04% 0.556 us 2.83% SAME
I16 I64 I64 2^20 1 29.728 us 1.43% 29.639 us 1.09% -0.089 us -0.30% SAME
I16 I64 I64 2^24 1 126.116 us 0.47% 115.773 us 0.62% -10.343 us -8.20% FAST
I16 I64 I64 2^28 1 1.651 ms 0.08% 1.458 ms 0.07% -193.390 us -11.71% FAST
I16 I64 I64 2^16 0.201 19.510 us 6.66% 20.700 us 4.79% 1.191 us 6.10% SLOW
I16 I64 I64 2^20 0.201 29.253 us 2.63% 27.134 us 2.86% -2.119 us -7.24% FAST
I16 I64 I64 2^24 0.201 117.456 us 0.66% 101.980 us 1.07% -15.477 us -13.18% FAST
I16 I64 I64 2^28 0.201 1.500 ms 0.15% 1.240 ms 0.12% -259.671 us -17.31% FAST
I32 I8 I32 2^16 1 21.239 us 2.98% 21.271 us 2.42% 0.032 us 0.15% SAME
I32 I8 I32 2^20 1 28.047 us 3.15% 27.626 us 1.13% -0.421 us -1.50% FAST
I32 I8 I32 2^24 1 93.766 us 1.61% 90.009 us 1.11% -3.757 us -4.01% FAST
I32 I8 I32 2^28 1 1.070 ms 0.07% 1.041 ms 0.10% -29.713 us -2.78% FAST
I32 I8 I32 2^16 0.201 21.053 us 3.77% 21.227 us 3.22% 0.174 us 0.83% SAME
I32 I8 I32 2^20 0.201 27.574 us 1.52% 27.204 us 3.40% -0.369 us -1.34% SAME
I32 I8 I32 2^24 0.201 85.095 us 0.55% 79.782 us 1.37% -5.313 us -6.24% FAST
I32 I8 I32 2^28 0.201 960.520 us 0.12% 905.889 us 0.33% -54.631 us -5.69% FAST
I32 I8 I64 2^16 1 21.383 us 1.93% 21.450 us 1.53% 0.067 us 0.32% SAME
I32 I8 I64 2^20 1 27.707 us 1.91% 27.638 us 1.14% -0.069 us -0.25% SAME
I32 I8 I64 2^24 1 93.047 us 1.42% 92.631 us 1.04% -0.416 us -0.45% SAME
I32 I8 I64 2^28 1 1.068 ms 0.09% 1.107 ms 0.22% 38.754 us 3.63% SLOW
I32 I8 I64 2^16 0.201 21.191 us 2.84% 21.273 us 2.47% 0.081 us 0.38% SAME
I32 I8 I64 2^20 0.201 27.626 us 1.02% 27.227 us 3.26% -0.399 us -1.45% FAST
I32 I8 I64 2^24 0.201 85.115 us 0.62% 80.742 us 1.14% -4.373 us -5.14% FAST
I32 I8 I64 2^28 0.201 955.441 us 0.16% 906.789 us 0.20% -48.652 us -5.09% FAST
I32 I16 I32 2^16 1 21.248 us 2.77% 21.482 us 1.55% 0.235 us 1.10% SAME
I32 I16 I32 2^20 1 27.627 us 1.10% 27.626 us 1.05% -0.000 us -0.00% SAME
I32 I16 I32 2^24 1 99.638 us 1.61% 81.847 us 1.09% -17.790 us -17.85% FAST
I32 I16 I32 2^28 1 1.190 ms 0.11% 902.915 us 0.13% -287.375 us -24.14% FAST
I32 I16 I32 2^16 0.201 21.301 us 2.74% 20.568 us 6.00% -0.733 us -3.44% FAST
I32 I16 I32 2^20 0.201 27.083 us 3.64% 25.580 us 2.06% -1.503 us -5.55% FAST
I32 I16 I32 2^24 0.201 90.124 us 1.25% 72.124 us 1.52% -18.000 us -19.97% FAST
I32 I16 I32 2^28 0.201 1.054 ms 0.10% 760.408 us 0.13% -293.571 us -27.85% FAST
I32 I16 I64 2^16 1 21.333 us 1.96% 21.387 us 1.92% 0.054 us 0.25% SAME
I32 I16 I64 2^20 1 27.802 us 1.83% 27.787 us 1.56% -0.014 us -0.05% SAME
I32 I16 I64 2^24 1 98.670 us 0.85% 99.816 us 1.43% 1.146 us 1.16% SLOW
I32 I16 I64 2^28 1 1.176 ms 0.11% 1.206 ms 0.11% 30.851 us 2.62% SLOW
I32 I16 I64 2^16 0.201 21.367 us 1.90% 21.375 us 2.35% 0.007 us 0.03% SAME
I32 I16 I64 2^20 0.201 26.329 us 4.14% 25.757 us 2.75% -0.572 us -2.17% SAME
I32 I16 I64 2^24 0.201 89.244 us 0.78% 85.173 us 0.87% -4.071 us -4.56% FAST
I32 I16 I64 2^28 0.201 1.045 ms 0.08% 986.327 us 0.17% -59.087 us -5.65% FAST
I32 I32 I32 2^16 1 19.447 us 3.73% 21.284 us 2.06% 1.837 us 9.45% SLOW
I32 I32 I32 2^20 1 26.870 us 4.28% 27.766 us 1.48% 0.896 us 3.33% SLOW
I32 I32 I32 2^24 1 93.767 us 1.54% 89.943 us 1.57% -3.824 us -4.08% FAST
I32 I32 I32 2^28 1 1.114 ms 0.11% 1.042 ms 0.09% -72.662 us -6.52% FAST
I32 I32 I32 2^16 0.201 19.372 us 2.08% 21.464 us 1.32% 2.092 us 10.80% SLOW
I32 I32 I32 2^20 0.201 25.563 us 1.39% 25.901 us 3.19% 0.338 us 1.32% SAME
I32 I32 I32 2^24 0.201 82.997 us 0.47% 76.974 us 0.83% -6.023 us -7.26% FAST
I32 I32 I32 2^28 0.201 951.397 us 0.16% 867.063 us 0.16% -84.335 us -8.86% FAST
I32 I32 I64 2^16 1 19.627 us 4.39% 21.382 us 1.59% 1.755 us 8.94% SLOW
I32 I32 I64 2^20 1 27.859 us 2.35% 27.933 us 2.03% 0.074 us 0.27% SAME
I32 I32 I64 2^24 1 94.981 us 0.65% 99.573 us 1.01% 4.592 us 4.83% SLOW
I32 I32 I64 2^28 1 1.126 ms 0.10% 1.208 ms 0.06% 81.570 us 7.24% SLOW
I32 I32 I64 2^16 0.201 19.335 us 1.88% 21.374 us 1.74% 2.039 us 10.54% SLOW
I32 I32 I64 2^20 0.201 25.613 us 1.90% 26.269 us 3.91% 0.656 us 2.56% SLOW
I32 I32 I64 2^24 0.201 84.896 us 1.12% 85.284 us 0.66% 0.388 us 0.46% SAME
I32 I32 I64 2^28 0.201 974.916 us 0.13% 992.848 us 0.13% 17.932 us 1.84% SLOW
I32 I64 I32 2^16 1 20.023 us 4.96% 21.363 us 1.65% 1.340 us 6.69% SLOW
I32 I64 I32 2^20 1 27.630 us 0.88% 28.368 us 3.57% 0.737 us 2.67% SLOW
I32 I64 I32 2^24 1 119.061 us 0.89% 113.541 us 0.40% -5.520 us -4.64% FAST
I32 I64 I32 2^28 1 1.535 ms 0.10% 1.447 ms 0.10% -88.477 us -5.76% FAST
I32 I64 I32 2^16 0.201 19.515 us 2.91% 19.561 us 3.58% 0.046 us 0.23% SAME
I32 I64 I32 2^20 0.201 27.580 us 1.37% 27.623 us 1.02% 0.043 us 0.16% SAME
I32 I64 I32 2^24 0.201 105.425 us 0.91% 98.222 us 1.17% -7.203 us -6.83% FAST
I32 I64 I32 2^28 0.201 1.325 ms 0.12% 1.214 ms 0.12% -111.205 us -8.39% FAST
I32 I64 I64 2^16 1 19.902 us 4.86% 21.447 us 1.54% 1.545 us 7.76% SLOW
I32 I64 I64 2^20 1 27.874 us 1.92% 28.458 us 3.74% 0.585 us 2.10% SLOW
I32 I64 I64 2^24 1 119.450 us 0.62% 125.853 us 0.29% 6.403 us 5.36% SLOW
I32 I64 I64 2^28 1 1.535 ms 0.09% 1.651 ms 0.08% 116.151 us 7.57% SLOW
I32 I64 I64 2^16 0.201 19.516 us 2.47% 19.455 us 2.46% -0.061 us -0.31% SAME
I32 I64 I64 2^20 0.201 27.638 us 1.22% 27.645 us 1.16% 0.007 us 0.02% SAME
I32 I64 I64 2^24 0.201 104.524 us 1.20% 107.163 us 1.09% 2.639 us 2.52% SLOW
I32 I64 I64 2^28 0.201 1.320 ms 0.13% 1.362 ms 0.13% 41.413 us 3.14% SLOW
I64 I8 I32 2^16 1 21.516 us 7.69% 21.856 us 3.22% 0.340 us 1.58% SAME
I64 I8 I32 2^20 1 32.981 us 3.52% 31.474 us 2.14% -1.507 us -4.57% FAST
I64 I8 I32 2^24 1 128.282 us 0.62% 104.077 us 1.02% -24.205 us -18.87% FAST
I64 I8 I32 2^28 1 1.658 ms 0.06% 1.275 ms 0.07% -382.803 us -23.09% FAST
I64 I8 I32 2^16 0.201 20.401 us 5.63% 19.694 us 4.62% -0.707 us -3.46% SAME
I64 I8 I32 2^20 0.201 30.173 us 4.10% 28.485 us 3.13% -1.688 us -5.59% FAST
I64 I8 I32 2^24 0.201 113.459 us 0.50% 87.436 us 1.18% -26.023 us -22.94% FAST
I64 I8 I32 2^28 0.201 1.438 ms 0.13% 1.009 ms 0.14% -428.640 us -29.82% FAST
I64 I8 I64 2^16 1 21.701 us 7.30% 21.940 us 3.27% 0.238 us 1.10% SAME
I64 I8 I64 2^20 1 32.458 us 3.84% 31.555 us 1.76% -0.904 us -2.78% FAST
I64 I8 I64 2^24 1 128.110 us 0.58% 105.844 us 0.90% -22.266 us -17.38% FAST
I64 I8 I64 2^28 1 1.650 ms 0.08% 1.302 ms 0.08% -347.607 us -21.07% FAST
I64 I8 I64 2^16 0.201 20.492 us 5.46% 19.895 us 4.60% -0.597 us -2.91% SAME
I64 I8 I64 2^20 0.201 30.734 us 3.41% 28.831 us 3.20% -1.904 us -6.19% FAST
I64 I8 I64 2^24 0.201 113.545 us 0.34% 89.193 us 0.74% -24.352 us -21.45% FAST
I64 I8 I64 2^28 0.201 1.448 ms 0.14% 1.038 ms 0.13% -410.241 us -28.34% FAST
I64 I16 I32 2^16 1 20.870 us 6.23% 22.027 us 3.15% 1.157 us 5.55% SLOW
I64 I16 I32 2^20 1 33.192 us 2.77% 31.295 us 2.45% -1.897 us -5.72% FAST
I64 I16 I32 2^24 1 130.129 us 0.37% 113.957 us 0.67% -16.172 us -12.43% FAST
I64 I16 I32 2^28 1 1.678 ms 0.07% 1.414 ms 0.10% -264.023 us -15.73% FAST
I64 I16 I32 2^16 0.201 19.649 us 3.35% 20.084 us 4.96% 0.435 us 2.21% SAME
I64 I16 I32 2^20 0.201 30.187 us 2.84% 28.155 us 2.63% -2.032 us -6.73% FAST
I64 I16 I32 2^24 0.201 114.651 us 1.00% 97.196 us 0.43% -17.456 us -15.23% FAST
I64 I16 I32 2^28 0.201 1.465 ms 0.20% 1.158 ms 0.15% -307.506 us -20.99% FAST
I64 I16 I64 2^16 1 21.374 us 6.64% 22.109 us 3.67% 0.735 us 3.44% SAME
I64 I16 I64 2^20 1 33.050 us 3.21% 31.648 us 1.46% -1.402 us -4.24% FAST
I64 I16 I64 2^24 1 130.118 us 0.56% 107.618 us 0.59% -22.500 us -17.29% FAST
I64 I16 I64 2^28 1 1.684 ms 0.08% 1.317 ms 0.11% -367.425 us -21.82% FAST
I64 I16 I64 2^16 0.201 19.982 us 4.95% 20.379 us 5.67% 0.397 us 1.99% SAME
I64 I16 I64 2^20 0.201 30.475 us 3.42% 28.664 us 3.18% -1.811 us -5.94% FAST
I64 I16 I64 2^24 0.201 113.993 us 0.91% 91.004 us 0.47% -22.989 us -20.17% FAST
I64 I16 I64 2^28 0.201 1.461 ms 0.16% 1.064 ms 0.13% -397.676 us -27.22% FAST
I64 I32 I32 2^16 1 21.059 us 5.92% 22.366 us 4.32% 1.308 us 6.21% SLOW
I64 I32 I32 2^20 1 31.101 us 2.78% 31.132 us 2.82% 0.031 us 0.10% SAME
I64 I32 I32 2^24 1 125.177 us 1.03% 128.277 us 0.57% 3.100 us 2.48% SLOW
I64 I32 I32 2^28 1 1.582 ms 0.11% 1.658 ms 0.09% 75.905 us 4.80% SLOW
I64 I32 I32 2^16 0.201 20.402 us 3.72% 20.628 us 5.71% 0.226 us 1.11% SAME
I64 I32 I32 2^20 0.201 28.409 us 3.00% 28.429 us 2.61% 0.020 us 0.07% SAME
I64 I32 I32 2^24 0.201 106.029 us 0.98% 107.382 us 0.54% 1.353 us 1.28% SLOW
I64 I32 I32 2^28 0.201 1.321 ms 0.15% 1.339 ms 0.13% 17.856 us 1.35% SLOW
I64 I32 I64 2^16 1 21.520 us 6.94% 22.700 us 4.01% 1.180 us 5.48% SLOW
I64 I32 I64 2^20 1 31.533 us 1.81% 31.571 us 1.64% 0.038 us 0.12% SAME
I64 I32 I64 2^24 1 126.247 us 0.89% 129.323 us 1.02% 3.077 us 2.44% SLOW
I64 I32 I64 2^28 1 1.590 ms 0.10% 1.678 ms 0.10% 88.107 us 5.54% SLOW
I64 I32 I64 2^16 0.201 20.359 us 4.35% 20.692 us 5.17% 0.333 us 1.64% SAME
I64 I32 I64 2^20 0.201 28.405 us 2.91% 28.385 us 3.02% -0.020 us -0.07% SAME
I64 I32 I64 2^24 0.201 107.114 us 0.80% 108.109 us 0.94% 0.995 us 0.93% SLOW
I64 I32 I64 2^28 0.201 1.333 ms 0.14% 1.354 ms 0.12% 20.993 us 1.57% SLOW
I64 I64 I32 2^16 1 20.064 us 3.56% 21.544 us 4.45% 1.480 us 7.38% SLOW
I64 I64 I32 2^20 1 33.409 us 2.28% 33.367 us 2.40% -0.043 us -0.13% SAME
I64 I64 I32 2^24 1 150.021 us 0.82% 150.880 us 0.51% 0.859 us 0.57% SLOW
I64 I64 I32 2^28 1 2.055 ms 0.06% 2.073 ms 0.07% 18.248 us 0.89% SLOW
I64 I64 I32 2^16 0.201 19.507 us 1.68% 19.741 us 3.12% 0.233 us 1.20% SAME
I64 I64 I32 2^20 0.201 29.892 us 3.71% 29.745 us 3.64% -0.147 us -0.49% SAME
I64 I64 I32 2^24 0.201 123.909 us 0.74% 125.102 us 0.93% 1.193 us 0.96% SLOW
I64 I64 I32 2^28 0.201 1.640 ms 0.10% 1.669 ms 0.09% 29.922 us 1.83% SLOW
I64 I64 I64 2^16 1 19.958 us 3.55% 21.678 us 4.85% 1.720 us 8.62% SLOW
I64 I64 I64 2^20 1 33.550 us 1.91% 33.750 us 1.72% 0.199 us 0.59% SAME
I64 I64 I64 2^24 1 151.296 us 0.60% 153.654 us 0.81% 2.358 us 1.56% SLOW
I64 I64 I64 2^28 1 2.060 ms 0.08% 2.110 ms 0.07% 50.212 us 2.44% SLOW
I64 I64 I64 2^16 0.201 19.727 us 2.97% 19.696 us 2.78% -0.031 us -0.16% SAME
I64 I64 I64 2^20 0.201 29.826 us 4.63% 30.584 us 3.04% 0.758 us 2.54% SAME
I64 I64 I64 2^24 0.201 123.921 us 0.78% 126.374 us 0.77% 2.452 us 1.98% SLOW
I64 I64 I64 2^28 0.201 1.641 ms 0.10% 1.697 ms 0.08% 55.567 us 3.39% SLOW
I128 I8 I32 2^16 1 23.201 us 2.47% 23.320 us 1.96% 0.119 us 0.51% SAME
I128 I8 I32 2^20 1 38.013 us 1.31% 36.721 us 2.98% -1.292 us -3.40% FAST
I128 I8 I32 2^24 1 212.362 us 0.49% 185.823 us 0.50% -26.539 us -12.50% FAST
I128 I8 I32 2^28 1 3.014 ms 0.05% 2.586 ms 0.06% -427.060 us -14.17% FAST
I128 I8 I32 2^16 0.201 21.458 us 1.33% 21.560 us 2.26% 0.102 us 0.48% SAME
I128 I8 I32 2^20 0.201 33.735 us 1.02% 32.088 us 2.81% -1.647 us -4.88% FAST
I128 I8 I32 2^24 0.201 155.172 us 0.69% 138.279 us 0.34% -16.893 us -10.89% FAST
I128 I8 I32 2^28 0.201 2.118 ms 0.07% 1.838 ms 0.09% -280.172 us -13.23% FAST
I128 I8 I64 2^16 1 22.777 us 5.17% 23.510 us 1.40% 0.733 us 3.22% SLOW
I128 I8 I64 2^20 1 38.197 us 1.56% 36.897 us 3.03% -1.299 us -3.40% FAST
I128 I8 I64 2^24 1 212.212 us 0.27% 185.131 us 0.57% -27.081 us -12.76% FAST
I128 I8 I64 2^28 1 2.999 ms 0.05% 2.560 ms 0.06% -439.168 us -14.64% FAST
I128 I8 I64 2^16 0.201 21.363 us 1.75% 21.499 us 2.74% 0.136 us 0.64% SAME
I128 I8 I64 2^20 0.201 33.789 us 1.04% 31.786 us 1.38% -2.004 us -5.93% FAST
I128 I8 I64 2^24 0.201 154.186 us 0.69% 136.560 us 0.61% -17.625 us -11.43% FAST
I128 I8 I64 2^28 0.201 2.095 ms 0.07% 1.828 ms 0.12% -267.215 us -12.75% FAST
I128 I16 I32 2^16 1 23.535 us 1.72% 23.535 us 1.61% 0.000 us 0.00% SAME
I128 I16 I32 2^20 1 38.041 us 1.30% 37.505 us 2.72% -0.536 us -1.41% FAST
I128 I16 I32 2^24 1 215.441 us 0.66% 189.456 us 0.40% -25.985 us -12.06% FAST
I128 I16 I32 2^28 1 3.060 ms 0.04% 2.628 ms 0.06% -432.750 us -14.14% FAST
I128 I16 I32 2^16 0.201 21.251 us 2.56% 21.496 us 4.50% 0.245 us 1.15% SAME
I128 I16 I32 2^20 0.201 33.754 us 1.18% 32.937 us 3.30% -0.817 us -2.42% FAST
I128 I16 I32 2^24 0.201 160.804 us 0.25% 142.071 us 0.68% -18.733 us -11.65% FAST
I128 I16 I32 2^28 0.201 2.201 ms 0.07% 1.897 ms 0.08% -303.687 us -13.80% FAST
I128 I16 I64 2^16 1 22.820 us 5.03% 23.471 us 2.00% 0.651 us 2.85% SLOW
I128 I16 I64 2^20 1 38.200 us 1.51% 37.563 us 3.03% -0.637 us -1.67% FAST
I128 I16 I64 2^24 1 214.181 us 0.42% 187.874 us 0.47% -26.307 us -12.28% FAST
I128 I16 I64 2^28 1 3.036 ms 0.04% 2.610 ms 0.06% -426.713 us -14.05% FAST
I128 I16 I64 2^16 0.201 21.399 us 1.97% 21.888 us 4.36% 0.489 us 2.29% SLOW
I128 I16 I64 2^20 0.201 33.819 us 1.13% 32.247 us 2.96% -1.572 us -4.65% FAST
I128 I16 I64 2^24 0.201 159.341 us 0.68% 140.597 us 0.58% -18.744 us -11.76% FAST
I128 I16 I64 2^28 0.201 2.184 ms 0.07% 1.889 ms 0.10% -294.931 us -13.50% FAST
I128 I32 I32 2^16 1 23.439 us 1.69% 23.431 us 1.81% -0.008 us -0.03% SAME
I128 I32 I32 2^20 1 38.038 us 1.38% 37.533 us 2.34% -0.506 us -1.33% SAME
I128 I32 I32 2^24 1 216.163 us 0.23% 189.592 us 0.29% -26.571 us -12.29% FAST
I128 I32 I32 2^28 1 3.079 ms 0.05% 2.638 ms 0.06% -440.807 us -14.32% FAST
I128 I32 I32 2^16 0.201 21.457 us 1.85% 22.666 us 5.18% 1.208 us 5.63% SLOW
I128 I32 I32 2^20 0.201 33.799 us 1.61% 33.766 us 1.18% -0.033 us -0.10% SAME
I128 I32 I32 2^24 0.201 163.738 us 0.68% 145.158 us 0.78% -18.580 us -11.35% FAST
I128 I32 I32 2^28 0.201 2.259 ms 0.09% 1.957 ms 0.09% -301.951 us -13.37% FAST
I128 I32 I64 2^16 1 22.847 us 4.63% 23.352 us 1.73% 0.506 us 2.21% SLOW
I128 I32 I64 2^20 1 38.656 us 2.66% 38.059 us 1.99% -0.597 us -1.54% SAME
I128 I32 I64 2^24 1 215.284 us 0.55% 189.222 us 0.56% -26.063 us -12.11% FAST
I128 I32 I64 2^28 1 3.057 ms 0.05% 2.623 ms 0.06% -433.372 us -14.18% FAST
I128 I32 I64 2^16 0.201 21.553 us 2.91% 22.496 us 5.35% 0.944 us 4.38% SLOW
I128 I32 I64 2^20 0.201 33.836 us 1.09% 33.222 us 3.02% -0.614 us -1.81% FAST
I128 I32 I64 2^24 0.201 163.216 us 0.49% 144.636 us 0.40% -18.580 us -11.38% FAST
I128 I32 I64 2^28 0.201 2.248 ms 0.09% 1.948 ms 0.09% -300.298 us -13.36% FAST
I128 I64 I32 2^16 1 21.963 us 4.21% 22.992 us 4.34% 1.029 us 4.69% SLOW
I128 I64 I32 2^20 1 38.385 us 1.69% 40.200 us 1.36% 1.815 us 4.73% SLOW
I128 I64 I32 2^24 1 238.475 us 0.18% 238.727 us 0.33% 0.252 us 0.11% SAME
I128 I64 I32 2^28 1 3.463 ms 0.04% 3.466 ms 0.06% 3.719 us 0.11% SLOW
I128 I64 I32 2^16 0.201 21.405 us 1.63% 21.572 us 2.67% 0.167 us 0.78% SAME
I128 I64 I32 2^20 0.201 35.838 us 0.58% 35.925 us 1.51% 0.087 us 0.24% SAME
I128 I64 I32 2^24 0.201 193.232 us 0.40% 202.624 us 0.51% 9.392 us 4.86% SLOW
I128 I64 I32 2^28 0.201 2.746 ms 0.06% 2.904 ms 0.07% 157.684 us 5.74% SLOW
I128 I64 I64 2^16 1 21.940 us 4.41% 23.478 us 1.72% 1.538 us 7.01% SLOW
I128 I64 I64 2^20 1 38.405 us 1.84% 40.429 us 1.73% 2.024 us 5.27% SLOW
I128 I64 I64 2^24 1 237.242 us 0.51% 244.443 us 0.29% 7.200 us 3.04% SLOW
I128 I64 I64 2^28 1 3.453 ms 0.04% 3.554 ms 0.05% 101.410 us 2.94% SLOW
I128 I64 I64 2^16 0.201 21.417 us 1.50% 21.592 us 3.06% 0.175 us 0.82% SAME
I128 I64 I64 2^20 0.201 35.803 us 1.77% 36.301 us 2.63% 0.498 us 1.39% SAME
I128 I64 I64 2^24 0.201 192.410 us 0.62% 206.263 us 0.46% 13.854 us 7.20% SLOW
I128 I64 I64 2^28 0.201 2.741 ms 0.06% 2.965 ms 0.06% 224.481 us 8.19% SLOW
F32 I8 I32 2^16 1 21.145 us 3.09% 21.449 us 1.14% 0.304 us 1.44% SLOW
F32 I8 I32 2^20 1 27.741 us 1.79% 27.597 us 1.19% -0.144 us -0.52% SAME
F32 I8 I32 2^24 1 93.494 us 1.40% 90.001 us 1.01% -3.494 us -3.74% FAST
F32 I8 I32 2^28 1 1.070 ms 0.07% 1.041 ms 0.12% -29.489 us -2.76% FAST
F32 I8 I32 2^16 0.201 20.938 us 3.57% 21.119 us 3.47% 0.181 us 0.87% SAME
F32 I8 I32 2^20 0.201 27.570 us 1.22% 26.946 us 3.87% -0.623 us -2.26% FAST
F32 I8 I32 2^24 0.201 85.026 us 0.34% 79.356 us 1.20% -5.670 us -6.67% FAST
F32 I8 I32 2^28 0.201 960.065 us 0.11% 905.696 us 0.35% -54.369 us -5.66% FAST
F32 I8 I64 2^16 1 21.371 us 1.93% 21.432 us 1.53% 0.061 us 0.28% SAME
F32 I8 I64 2^20 1 27.669 us 1.66% 27.652 us 1.10% -0.017 us -0.06% SAME
F32 I8 I64 2^24 1 93.024 us 1.51% 92.428 us 0.99% -0.596 us -0.64% SAME
F32 I8 I64 2^28 1 1.067 ms 0.13% 1.108 ms 0.27% 41.029 us 3.84% SLOW
F32 I8 I64 2^16 0.201 21.282 us 2.70% 21.355 us 2.31% 0.073 us 0.34% SAME
F32 I8 I64 2^20 0.201 27.593 us 1.20% 27.207 us 3.27% -0.386 us -1.40% FAST
F32 I8 I64 2^24 0.201 85.242 us 0.71% 80.926 us 1.03% -4.317 us -5.06% FAST
F32 I8 I64 2^28 0.201 954.720 us 0.15% 906.812 us 0.22% -47.908 us -5.02% FAST
F32 I16 I32 2^16 1 21.413 us 1.83% 21.458 us 1.52% 0.045 us 0.21% SAME
F32 I16 I32 2^20 1 27.620 us 0.74% 27.617 us 1.15% -0.003 us -0.01% SAME
F32 I16 I32 2^24 1 99.849 us 1.36% 81.981 us 0.89% -17.869 us -17.90% FAST
F32 I16 I32 2^28 1 1.190 ms 0.12% 903.163 us 0.14% -287.252 us -24.13% FAST
F32 I16 I32 2^16 0.201 21.199 us 2.75% 20.396 us 6.37% -0.804 us -3.79% FAST
F32 I16 I32 2^20 0.201 26.719 us 4.10% 25.546 us 2.05% -1.174 us -4.39% FAST
F32 I16 I32 2^24 0.201 89.871 us 1.24% 72.257 us 1.27% -17.614 us -19.60% FAST
F32 I16 I32 2^28 0.201 1.054 ms 0.10% 760.596 us 0.11% -293.465 us -27.84% FAST
F32 I16 I64 2^16 1 21.378 us 1.88% 21.484 us 1.71% 0.107 us 0.50% SAME
F32 I16 I64 2^20 1 27.720 us 1.47% 27.720 us 1.47% 0.000 us 0.00% SAME
F32 I16 I64 2^24 1 98.665 us 0.84% 99.695 us 1.42% 1.030 us 1.04% SLOW
F32 I16 I64 2^28 1 1.176 ms 0.11% 1.207 ms 0.12% 31.088 us 2.64% SLOW
F32 I16 I64 2^16 0.201 21.182 us 3.27% 20.835 us 4.91% -0.347 us -1.64% SAME
F32 I16 I64 2^20 0.201 25.754 us 2.85% 25.625 us 2.21% -0.129 us -0.50% SAME
F32 I16 I64 2^24 0.201 89.115 us 0.74% 84.638 us 1.28% -4.477 us -5.02% FAST
F32 I16 I64 2^28 0.201 1.046 ms 0.07% 986.286 us 0.17% -59.248 us -5.67% FAST
F32 I32 I32 2^16 1 19.821 us 4.75% 21.245 us 2.11% 1.423 us 7.18% SLOW
F32 I32 I32 2^20 1 26.546 us 4.34% 27.667 us 39.99% 1.121 us 4.22% SAME
F32 I32 I32 2^24 1 93.460 us 1.51% 96.279 us 0.99% 2.819 us 3.02% SLOW
F32 I32 I32 2^28 1 1.115 ms 0.10% 1.160 ms 0.07% 44.870 us 4.02% SLOW
F32 I32 I32 2^16 0.201 19.229 us 3.22% 20.935 us 4.63% 1.707 us 8.88% SLOW
F32 I32 I32 2^20 0.201 25.567 us 0.77% 25.796 us 3.03% 0.228 us 0.89% SLOW
F32 I32 I32 2^24 0.201 82.973 us 0.34% 83.035 us 0.63% 0.062 us 0.08% SAME
F32 I32 I32 2^28 0.201 951.129 us 0.17% 956.660 us 0.15% 5.531 us 0.58% SLOW
F32 I32 I64 2^16 1 19.569 us 3.63% 21.321 us 1.81% 1.752 us 8.95% SLOW
F32 I32 I64 2^20 1 27.374 us 3.21% 27.790 us 1.69% 0.416 us 1.52% SAME
F32 I32 I64 2^24 1 94.792 us 0.80% 97.491 us 1.35% 2.699 us 2.85% SLOW
F32 I32 I64 2^28 1 1.127 ms 0.11% 1.182 ms 0.10% 55.409 us 4.92% SLOW
F32 I32 I64 2^16 0.201 19.325 us 1.91% 20.602 us 5.57% 1.277 us 6.61% SLOW
F32 I32 I64 2^20 0.201 25.601 us 1.24% 25.888 us 3.08% 0.287 us 1.12% SAME
F32 I32 I64 2^24 0.201 84.626 us 1.24% 84.200 us 1.37% -0.426 us -0.50% SAME
F32 I32 I64 2^28 0.201 974.806 us 0.13% 976.552 us 0.16% 1.746 us 0.18% SLOW
F32 I64 I32 2^16 1 20.220 us 5.05% 21.538 us 1.28% 1.318 us 6.52% SLOW
F32 I64 I32 2^20 1 27.655 us 1.28% 28.322 us 3.56% 0.666 us 2.41% SLOW
F32 I64 I32 2^24 1 119.376 us 0.71% 113.372 us 0.49% -6.003 us -5.03% FAST
F32 I64 I32 2^28 1 1.536 ms 0.08% 1.448 ms 0.09% -87.671 us -5.71% FAST
F32 I64 I32 2^16 0.201 19.426 us 2.11% 19.596 us 3.66% 0.170 us 0.88% SAME
F32 I64 I32 2^20 0.201 27.626 us 1.33% 27.594 us 1.17% -0.032 us -0.11% SAME
F32 I64 I32 2^24 0.201 105.424 us 0.66% 98.105 us 1.10% -7.319 us -6.94% FAST
F32 I64 I32 2^28 0.201 1.326 ms 0.12% 1.215 ms 0.10% -111.172 us -8.38% FAST
F32 I64 I64 2^16 1 19.928 us 4.79% 21.420 us 1.59% 1.492 us 7.49% SLOW
F32 I64 I64 2^20 1 27.843 us 1.79% 28.809 us 4.09% 0.966 us 3.47% SLOW
F32 I64 I64 2^24 1 119.651 us 0.47% 125.861 us 0.27% 6.211 us 5.19% SLOW
F32 I64 I64 2^28 1 1.535 ms 0.07% 1.652 ms 0.07% 116.296 us 7.58% SLOW
F32 I64 I64 2^16 0.201 19.449 us 2.32% 19.397 us 3.20% -0.052 us -0.26% SAME
F32 I64 I64 2^20 0.201 27.636 us 1.05% 27.605 us 1.23% -0.032 us -0.11% SAME
F32 I64 I64 2^24 0.201 104.806 us 1.08% 107.411 us 0.92% 2.605 us 2.49% SLOW
F32 I64 I64 2^28 0.201 1.321 ms 0.12% 1.363 ms 0.12% 41.688 us 3.16% SLOW
F64 I8 I32 2^16 1 21.407 us 6.84% 21.548 us 1.47% 0.141 us 0.66% SAME
F64 I8 I32 2^20 1 33.097 us 3.25% 31.597 us 1.65% -1.500 us -4.53% FAST
F64 I8 I32 2^24 1 126.986 us 0.96% 103.457 us 0.80% -23.529 us -18.53% FAST
F64 I8 I32 2^28 1 1.635 ms 0.07% 1.269 ms 0.10% -366.680 us -22.42% FAST
F64 I8 I32 2^16 0.201 19.896 us 3.70% 19.822 us 4.29% -0.073 us -0.37% SAME
F64 I8 I32 2^20 0.201 29.823 us 4.24% 28.560 us 3.38% -1.263 us -4.24% FAST
F64 I8 I32 2^24 0.201 111.081 us 0.84% 87.057 us 0.36% -24.023 us -21.63% FAST
F64 I8 I32 2^28 0.201 1.388 ms 0.10% 998.790 us 0.14% -388.900 us -28.02% FAST
F64 I8 I64 2^16 1 21.793 us 6.89% 21.610 us 2.40% -0.182 us -0.84% SAME
F64 I8 I64 2^20 1 32.848 us 3.40% 31.653 us 1.35% -1.194 us -3.64% FAST
F64 I8 I64 2^24 1 127.712 us 0.82% 105.714 us 0.58% -21.998 us -17.23% FAST
F64 I8 I64 2^28 1 1.644 ms 0.08% 1.297 ms 0.10% -346.824 us -21.10% FAST
F64 I8 I64 2^16 0.201 20.360 us 4.98% 19.704 us 3.89% -0.656 us -3.22% SAME
F64 I8 I64 2^20 0.201 30.530 us 3.84% 28.721 us 3.23% -1.808 us -5.92% FAST
F64 I8 I64 2^24 0.201 112.694 us 0.98% 88.916 us 0.49% -23.778 us -21.10% FAST
F64 I8 I64 2^28 0.201 1.431 ms 0.13% 1.024 ms 0.14% -407.295 us -28.46% FAST
F64 I16 I32 2^16 1 21.284 us 7.25% 22.248 us 3.76% 0.964 us 4.53% SLOW
F64 I16 I32 2^20 1 33.334 us 2.53% 31.372 us 2.25% -1.963 us -5.89% FAST
F64 I16 I32 2^24 1 130.082 us 0.29% 113.788 us 0.54% -16.295 us -12.53% FAST
F64 I16 I32 2^28 1 1.676 ms 0.07% 1.413 ms 0.09% -263.094 us -15.69% FAST
F64 I16 I32 2^16 0.201 20.105 us 4.28% 20.687 us 6.61% 0.581 us 2.89% SAME
F64 I16 I32 2^20 0.201 30.235 us 3.63% 28.410 us 2.94% -1.825 us -6.04% FAST
F64 I16 I32 2^24 0.201 112.825 us 0.96% 96.953 us 0.74% -15.872 us -14.07% FAST
F64 I16 I32 2^28 0.201 1.425 ms 0.15% 1.152 ms 0.13% -273.202 us -19.17% FAST
F64 I16 I64 2^16 1 21.558 us 7.28% 22.365 us 3.81% 0.807 us 3.74% SAME
F64 I16 I64 2^20 1 33.110 us 3.06% 31.416 us 2.12% -1.694 us -5.12% FAST
F64 I16 I64 2^24 1 130.261 us 0.44% 107.671 us 0.60% -22.591 us -17.34% FAST
F64 I16 I64 2^28 1 1.680 ms 0.07% 1.311 ms 0.07% -368.888 us -21.95% FAST
F64 I16 I64 2^16 0.201 20.159 us 4.74% 20.951 us 6.33% 0.792 us 3.93% SAME
F64 I16 I64 2^20 0.201 30.293 us 3.44% 28.483 us 3.02% -1.810 us -5.97% FAST
F64 I16 I64 2^24 0.201 113.401 us 0.60% 91.061 us 0.42% -22.341 us -19.70% FAST
F64 I16 I64 2^28 0.201 1.434 ms 0.12% 1.055 ms 0.13% -378.684 us -26.40% FAST
F64 I32 I32 2^16 1 20.408 us 5.63% 22.048 us 3.27% 1.641 us 8.04% SLOW
F64 I32 I32 2^20 1 31.259 us 2.72% 31.095 us 2.73% -0.164 us -0.52% SAME
F64 I32 I32 2^24 1 124.956 us 1.00% 128.538 us 0.64% 3.582 us 2.87% SLOW
F64 I32 I32 2^28 1 1.580 ms 0.11% 1.663 ms 0.08% 82.355 us 5.21% SLOW
F64 I32 I32 2^16 0.201 19.873 us 4.08% 19.973 us 4.88% 0.100 us 0.51% SAME
F64 I32 I32 2^20 0.201 28.106 us 2.73% 28.265 us 2.31% 0.160 us 0.57% SAME
F64 I32 I32 2^24 0.201 105.515 us 0.44% 106.854 us 0.94% 1.338 us 1.27% SLOW
F64 I32 I32 2^28 0.201 1.309 ms 0.14% 1.328 ms 0.13% 19.301 us 1.47% SLOW
F64 I32 I64 2^16 1 20.915 us 6.58% 22.516 us 4.05% 1.601 us 7.66% SLOW
F64 I32 I64 2^20 1 31.560 us 1.76% 31.547 us 1.76% -0.013 us -0.04% SAME
F64 I32 I64 2^24 1 126.265 us 1.00% 129.607 us 0.98% 3.343 us 2.65% SLOW
F64 I32 I64 2^28 1 1.590 ms 0.09% 1.679 ms 0.09% 88.947 us 5.59% SLOW
F64 I32 I64 2^16 0.201 20.155 us 4.40% 19.837 us 3.66% -0.318 us -1.58% SAME
F64 I32 I64 2^20 0.201 28.816 us 3.15% 28.830 us 3.13% 0.014 us 0.05% SAME
F64 I32 I64 2^24 0.201 106.383 us 1.03% 107.533 us 0.41% 1.150 us 1.08% SLOW
F64 I32 I64 2^28 0.201 1.324 ms 0.14% 1.345 ms 0.12% 20.558 us 1.55% SLOW
F64 I64 I32 2^16 1 19.926 us 3.26% 21.490 us 3.39% 1.564 us 7.85% SLOW
F64 I64 I32 2^20 1 33.258 us 2.44% 33.326 us 2.73% 0.069 us 0.21% SAME
F64 I64 I32 2^24 1 149.756 us 0.91% 152.888 us 0.58% 3.132 us 2.09% SLOW
F64 I64 I32 2^28 1 2.050 ms 0.06% 2.091 ms 0.07% 41.040 us 2.00% SLOW
F64 I64 I32 2^16 0.201 19.498 us 1.85% 19.514 us 3.23% 0.016 us 0.08% SAME
F64 I64 I32 2^20 0.201 29.626 us 3.98% 29.742 us 3.50% 0.116 us 0.39% SAME
F64 I64 I32 2^24 0.201 123.051 us 0.89% 126.065 us 0.42% 3.013 us 2.45% SLOW
F64 I64 I32 2^28 0.201 1.632 ms 0.10% 1.681 ms 0.09% 49.078 us 3.01% SLOW
F64 I64 I64 2^16 1 19.910 us 3.41% 21.700 us 4.49% 1.791 us 9.00% SLOW
F64 I64 I64 2^20 1 33.585 us 1.67% 33.749 us 1.71% 0.164 us 0.49% SAME
F64 I64 I64 2^24 1 151.342 us 0.54% 153.467 us 0.58% 2.125 us 1.40% SLOW
F64 I64 I64 2^28 1 2.060 ms 0.07% 2.107 ms 0.07% 47.283 us 2.30% SLOW
F64 I64 I64 2^16 0.201 19.743 us 3.25% 19.888 us 4.18% 0.145 us 0.74% SAME
F64 I64 I64 2^20 0.201 29.550 us 4.28% 30.309 us 3.41% 0.760 us 2.57% SAME
F64 I64 I64 2^24 0.201 123.798 us 0.78% 126.283 us 0.55% 2.485 us 2.01% SLOW
F64 I64 I64 2^28 0.201 1.637 ms 0.10% 1.689 ms 0.09% 52.524 us 3.21% SLOW

Summary

  • Total Matches: 448
    • Pass (diff <= min_noise): 149
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 299
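
For readers checking individual rows, the Status column can be reproduced from the other columns. The sketch below is an assumption inferred from the summary labels (`diff <= min_noise` versus `diff > min_noise`) and from the rows above, not the comparison script's actual source; the function name `classify` and its parameters are hypothetical. It assumes both times are given in the same unit (the 2^28 rows mix ms and us) and that noise is given in percent.

```python
def classify(ref_time, cmp_time, ref_noise_pct, cmp_noise_pct):
    """Return (diff, pct_diff, status) for one benchmark row.

    Hypothetical helper: times must share a unit, noise values are percentages.
    """
    diff = cmp_time - ref_time              # negative means the new build is faster
    pct_diff = 100.0 * diff / ref_time      # relative change versus the reference
    min_noise = min(ref_noise_pct, cmp_noise_pct)

    if abs(pct_diff) <= min_noise:
        status = "SAME"                     # counted as "Pass" in the summary
    elif diff < 0:
        status = "FAST"                     # counted as "Failure" in the summary
    else:
        status = "SLOW"                     # counted as "Failure" in the summary
    return diff, pct_diff, status


# Three rows taken from the table above (times in us, noise in %).
for row in [
    (99.638, 81.847, 1.61, 1.09),   # I32 I16 I32 2^24, entropy 1
    (93.047, 92.631, 1.42, 1.04),   # I32 I8  I64 2^24, entropy 1
    (19.447, 21.284, 3.73, 2.06),   # I32 I32 I32 2^16, entropy 1
]:
    diff, pct, status = classify(*row)
    print(f"{diff:+8.3f} us  {pct:+7.2f}%  {status}")
```

Under this reading, "Failure" only means the difference exceeds the smaller of the two noise values, so it covers both the FAST and the SLOW rows.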

@pauleonix pauleonix force-pushed the device_merge_tma branch 2 times, most recently from 9363b0e to e210716 Compare September 19, 2025 06:16

@pauleonix pauleonix force-pushed the device_merge_tma branch 2 times, most recently from 8209e0c to e3de734 Compare September 20, 2025 05:39

@pauleonix pauleonix force-pushed the device_merge_tma branch 2 times, most recently from da9dc89 to 33cf65e Compare September 24, 2025 03:29

@github-actions
Contributor

😬 CI Workflow Results

🟥 Finished in 9h 08m: Pass: 19%/185 | Total: 23h 08m | Max: 1h 02m

See results here.

@copy-pr-bot copy-pr-bot bot deleted the branch NVIDIA:pull-request/5780 September 30, 2025 08:29
@copy-pr-bot copy-pr-bot bot closed this Sep 30, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in CCCL Sep 30, 2025
@pauleonix
Contributor Author

Resumed in #6077, as this PR was automatically closed when its target branch was deleted. (The target should have been changed back to main before merging #5780; I can't reopen the PR or edit its target anymore.)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[FEA] Optimize cub::DeviceMerge by using cub::detail::BlockLoadToShared
