Skip to content

v1.9.13

Compare
Choose a tag to compare
@klauspost klauspost released this 03 Aug 16:14
· 87 commits to master since this release
7bd2279

Wider AVX2 loops and less usage. (#162)

  • Experiment with 64 bytes/loop AVX2
  • Only reduce when doing 64.
  • Use no more than 8 goroutines for avx2 codegen.
name                         old speed      new speed      delta
Encode10x2x10000-32          33.3GB/s ± 0%  37.5GB/s ± 1%  +12.49%   (p=0.000 n=9+10)
Encode100x20x10000-32        3.79GB/s ± 5%  3.77GB/s ± 5%     ~     (p=0.853 n=10+10)
Encode17x3x1M-32             78.2GB/s ± 1%  76.0GB/s ± 6%     ~     (p=0.123 n=10+10)
Encode10x4x16M-32            28.3GB/s ± 0%  27.7GB/s ± 2%   -2.32%   (p=0.000 n=8+10)
Encode5x2x1M-32               112GB/s ± 1%   113GB/s ± 1%     ~     (p=0.796 n=10+10)
Encode10x2x1M-32              149GB/s ± 1%   129GB/s ± 3%  -13.24%   (p=0.000 n=9+10)
Encode10x4x1M-32             99.1GB/s ± 1%  91.5GB/s ± 3%   -7.74%  (p=0.000 n=10+10)
Encode50x20x1M-32            19.7GB/s ± 1%  19.8GB/s ± 1%     ~      (p=0.447 n=9+10)
Encode17x3x16M-32            33.4GB/s ± 0%  33.3GB/s ± 1%   -0.46%   (p=0.043 n=10+9)
Encode_8x4x8M-32             30.1GB/s ± 1%  29.4GB/s ± 3%   -2.31%  (p=0.000 n=10+10)
Encode_12x4x12M-32           30.6GB/s ± 0%  30.5GB/s ± 0%     ~      (p=0.720 n=10+9)
Encode_16x4x16M-32           31.5GB/s ± 0%  31.5GB/s ± 0%     ~      (p=0.497 n=10+9)
Encode_16x4x32M-32           31.9GB/s ± 0%  31.5GB/s ± 4%     ~     (p=0.165 n=10+10)
Encode_16x4x64M-32           32.4GB/s ± 0%  32.3GB/s ± 0%     ~       (p=0.321 n=9+8)
Encode_8x5x8M-32             28.4GB/s ± 0%  28.4GB/s ± 1%     ~      (p=0.237 n=10+8)
Encode_8x6x8M-32             27.0GB/s ± 0%  27.2GB/s ± 2%     ~     (p=0.075 n=10+10)
Encode_8x7x8M-32             26.0GB/s ± 1%  25.8GB/s ± 1%   -0.53%   (p=0.003 n=9+10)
Encode_8x9x8M-32             24.6GB/s ± 1%  24.4GB/s ± 1%   -0.63%  (p=0.000 n=10+10)
Encode_8x10x8M-32            23.7GB/s ± 1%  23.7GB/s ± 0%   +0.32%   (p=0.035 n=10+9)
Encode_8x11x8M-32            23.0GB/s ± 1%  22.8GB/s ± 0%   -0.59%    (p=0.000 n=9+8)
Encode_8x8x05M-32            66.4GB/s ± 1%  64.2GB/s ± 1%   -3.32%  (p=0.000 n=10+10)
Encode_8x8x1M-32             56.7GB/s ± 0%  75.7GB/s ± 2%  +33.55%    (p=0.000 n=9+9)
Encode_8x8x8M-32             24.9GB/s ± 0%  24.9GB/s ± 1%     ~      (p=0.146 n=8+10)
Encode_8x8x32M-32            23.8GB/s ± 0%  23.4GB/s ± 0%   -1.42%   (p=0.000 n=9+10)
Encode_24x8x24M-32           29.9GB/s ± 0%  29.9GB/s ± 0%     ~      (p=0.278 n=10+9)
Encode_24x8x48M-32           30.7GB/s ± 1%  30.7GB/s ± 0%     ~       (p=0.351 n=9+7)
StreamEncode10x2x10000-32    15.5GB/s ± 1%  16.5GB/s ± 0%   +6.53%   (p=0.000 n=10+9)
StreamEncode100x20x10000-32  2.09GB/s ± 1%  2.06GB/s ± 2%   -1.78%  (p=0.000 n=10+10)
StreamEncode17x3x1M-32       12.2GB/s ± 2%  12.3GB/s ± 1%   +1.19%   (p=0.008 n=10+9)
StreamEncode10x4x16M-32      8.68GB/s ± 0%  9.47GB/s ± 1%   +9.05%   (p=0.000 n=8+10)
StreamEncode5x2x1M-32        12.3GB/s ± 1%  13.2GB/s ± 1%   +7.61%  (p=0.000 n=10+10)
StreamEncode10x2x1M-32       11.5GB/s ± 4%  13.3GB/s ± 2%  +15.15%   (p=0.000 n=10+7)