Releases · klauspost/reedsolomon · GitHub

28 Nov 12:47

klauspost

v1.11.3

BUGFIX: Fixes v1.11.2 Leopard GF8 GFNI code

What's Changed

Leopard GF8: Write directly to output by @klauspost in #226
Disable broken Leopard GF8 GFNI code by @klauspost in #229

Full Changelog: v1.11.2...v1.11.3

18 Nov 08:28

klauspost

v1.11.2

What's Changed

Add Leopard GF8 AVX512 by @klauspost in #222
Add AVX2 xor by @klauspost in #223
Add AVX512 GFNI processing by @klauspost in #224
Leopard GF8 GFNI by @klauspost in #225

Full Changelog: v1.11.1...v1.11.2

04 Oct 14:26

klauspost

v1.11.1

What's Changed

Implement 8-bit leopard variant by @elias-orijtech in #209
Fix incorrect leopardFF16.TotalShards by @elias-orijtech in #210
Use assembly for fftDIT28 mulAdd by @klauspost in #214
Add 8 bit errorfield by @klauspost in #213
Add ARM64+PPC64LE leopard 8 assembly. by @klauspost in #212
Add fftDIT and ifftDIT assembly by @klauspost in #218
Add inversion cache for leopard GF8 by @klauspost in #219
Split leopard gf8 input by @klauspost in #221

Full Changelog: v1.11.0...v1.11.1

12 Sep 06:28

klauspost

v1.11.0

What's Changed

Add an efficient implementation of shard counts up to 65536 by @elias-orijtech in #191
Implement jerasure algorithm of matrix generation for interoperability by @vitalif in #200
Add GF16 AVX2, AVX512 and SSSE3 by @klauspost in #193
Add GF16 Split/Join by @klauspost in #194
Add Leopard bitfield by @klauspost in #208
Improve fwht speed by @klauspost in #198
Decode benchmark by @elias-orijtech in #197
Unroll ifftDIT4/fftDIT4 branches. by @klauspost in #204
Simplify function selection by @klauspost in #206
Update interfaces & tests by @klauspost in #203
Upgrade Go CICD by @klauspost in #201

New Contributors

@elias-orijtech made their first contribution in #191

Full Changelog: v1.10.0...v1.11.0

21 Jun 17:06

klauspost

v1.10.0

What's Changed

Publish withSSE/withAVX options by @vitalif in #186
Add custom coding matrix support by @vitalif in #187
Implement ReconstructSome() to reconstruct only specific data shards by @vitalif in #189
Use VPTERNLOGD on GOAMD64=v4 in AVX2 by @klauspost in #182
Upgrade cpuid to avoid long startup time on Xen guests by @klauspost in #190

New Contributors

@vitalif made their first contribution in #186

Full Changelog: v1.9.16...v1.10.0

08 Feb 15:50

klauspost

v1.9.16

What's Changed

avx2: Improve speed when > 10 input or output shards. by @klauspost in #174

Full Changelog: v1.9.15...v1.9.16

02 Dec 15:41

klauspost

v1.9.15

What's Changed

fix: avoid the padding of last data shard is not zero by @jiangfucheng in #173
Unroll pure go xor loop by @klauspost in #172
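
The padding fix in #173 appears to concern the bytes that follow the end of the input when it does not divide evenly into data shards. As a rough illustration of the general idea, and not the library's actual splitting code, the sketch below copies the input into a freshly allocated buffer so that the tail of the last data shard is always zero. The name splitPadded is hypothetical.

```go
package main

// splitPadded is a hypothetical helper, not part of the reedsolomon API.
// It divides data into dataShards equally sized shards and guarantees
// that any bytes past the end of the input are zero.
func splitPadded(data []byte, dataShards int) [][]byte {
	if dataShards <= 0 || len(data) == 0 {
		return nil
	}
	// Round up so that every input byte fits into some shard.
	perShard := (len(data) + dataShards - 1) / dataShards

	// make returns zeroed memory, so everything after the copied
	// input is zero padding rather than leftover bytes.
	buf := make([]byte, perShard*dataShards)
	copy(buf, data)

	shards := make([][]byte, dataShards)
	for i := range shards {
		shards[i] = buf[i*perShard : (i+1)*perShard]
	}
	return shards
}
```

Copying into a new buffer costs an allocation; reusing spare capacity of the input slice avoids that, but then the padding region has to be cleared explicitly before it is used.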

New Contributors

@jiangfucheng made their first contribution in #173

Full Changelog: v1.9.14...v1.9.15

29 Oct 13:05

klauspost

v1.9.14

What's Changed

Add progressive erasure shard encoding by @klauspost in #170
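
Progressive encoding lets parity be updated as each data shard arrives instead of requiring all data shards at once. The sketch below shows the idea in its simplest form, assuming a single XOR parity shard rather than the library's Galois-field arithmetic. The xorParity type and its methods are hypothetical and not part of the reedsolomon API.

```go
package main

// xorParity is a hypothetical accumulator for one XOR parity shard.
// It stands in for the general idea only; real Reed-Solomon parity
// multiplies each data shard by a matrix coefficient before adding.
type xorParity struct {
	parity []byte
}

// newXORParity returns an accumulator whose parity starts out as zero.
func newXORParity(shardSize int) *xorParity {
	return &xorParity{parity: make([]byte, shardSize)}
}

// add folds one data shard into the running parity. Each data shard
// must be added exactly once; the order of calls does not matter
// because XOR is commutative and associative.
func (p *xorParity) add(shard []byte) {
	if len(shard) != len(p.parity) {
		panic("shard size mismatch")
	}
	for i, b := range shard {
		p.parity[i] ^= b
	}
}
```

Because the accumulator starts at zero and each shard contributes independently, shards can be folded in as they arrive from a network or disk, and the parity is complete once the last one has been added.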

Full Changelog: v1.9.13...v1.9.14

03 Aug 16:14

klauspost

v1.9.13

Wider AVX2 loops and less usage. (#162)

Experiment with 64 bytes/loop AVX2
Only reduce when doing 64.
Use no more than 8 goroutines for avx2 codegen.

name                         old speed      new speed      delta
Encode10x2x10000-32          33.3GB/s ± 0%  37.5GB/s ± 1%  +12.49%   (p=0.000 n=9+10)
Encode100x20x10000-32        3.79GB/s ± 5%  3.77GB/s ± 5%     ~     (p=0.853 n=10+10)
Encode17x3x1M-32             78.2GB/s ± 1%  76.0GB/s ± 6%     ~     (p=0.123 n=10+10)
Encode10x4x16M-32            28.3GB/s ± 0%  27.7GB/s ± 2%   -2.32%   (p=0.000 n=8+10)
Encode5x2x1M-32               112GB/s ± 1%   113GB/s ± 1%     ~     (p=0.796 n=10+10)
Encode10x2x1M-32              149GB/s ± 1%   129GB/s ± 3%  -13.24%   (p=0.000 n=9+10)
Encode10x4x1M-32             99.1GB/s ± 1%  91.5GB/s ± 3%   -7.74%  (p=0.000 n=10+10)
Encode50x20x1M-32            19.7GB/s ± 1%  19.8GB/s ± 1%     ~      (p=0.447 n=9+10)
Encode17x3x16M-32            33.4GB/s ± 0%  33.3GB/s ± 1%   -0.46%   (p=0.043 n=10+9)
Encode_8x4x8M-32             30.1GB/s ± 1%  29.4GB/s ± 3%   -2.31%  (p=0.000 n=10+10)
Encode_12x4x12M-32           30.6GB/s ± 0%  30.5GB/s ± 0%     ~      (p=0.720 n=10+9)
Encode_16x4x16M-32           31.5GB/s ± 0%  31.5GB/s ± 0%     ~      (p=0.497 n=10+9)
Encode_16x4x32M-32           31.9GB/s ± 0%  31.5GB/s ± 4%     ~     (p=0.165 n=10+10)
Encode_16x4x64M-32           32.4GB/s ± 0%  32.3GB/s ± 0%     ~       (p=0.321 n=9+8)
Encode_8x5x8M-32             28.4GB/s ± 0%  28.4GB/s ± 1%     ~      (p=0.237 n=10+8)
Encode_8x6x8M-32             27.0GB/s ± 0%  27.2GB/s ± 2%     ~     (p=0.075 n=10+10)
Encode_8x7x8M-32             26.0GB/s ± 1%  25.8GB/s ± 1%   -0.53%   (p=0.003 n=9+10)
Encode_8x9x8M-32             24.6GB/s ± 1%  24.4GB/s ± 1%   -0.63%  (p=0.000 n=10+10)
Encode_8x10x8M-32            23.7GB/s ± 1%  23.7GB/s ± 0%   +0.32%   (p=0.035 n=10+9)
Encode_8x11x8M-32            23.0GB/s ± 1%  22.8GB/s ± 0%   -0.59%    (p=0.000 n=9+8)
Encode_8x8x05M-32            66.4GB/s ± 1%  64.2GB/s ± 1%   -3.32%  (p=0.000 n=10+10)
Encode_8x8x1M-32             56.7GB/s ± 0%  75.7GB/s ± 2%  +33.55%    (p=0.000 n=9+9)
Encode_8x8x8M-32             24.9GB/s ± 0%  24.9GB/s ± 1%     ~      (p=0.146 n=8+10)
Encode_8x8x32M-32            23.8GB/s ± 0%  23.4GB/s ± 0%   -1.42%   (p=0.000 n=9+10)
Encode_24x8x24M-32           29.9GB/s ± 0%  29.9GB/s ± 0%     ~      (p=0.278 n=10+9)
Encode_24x8x48M-32           30.7GB/s ± 1%  30.7GB/s ± 0%     ~       (p=0.351 n=9+7)
StreamEncode10x2x10000-32    15.5GB/s ± 1%  16.5GB/s ± 0%   +6.53%   (p=0.000 n=10+9)
StreamEncode100x20x10000-32  2.09GB/s ± 1%  2.06GB/s ± 2%   -1.78%  (p=0.000 n=10+10)
StreamEncode17x3x1M-32       12.2GB/s ± 2%  12.3GB/s ± 1%   +1.19%   (p=0.008 n=10+9)
StreamEncode10x4x16M-32      8.68GB/s ± 0%  9.47GB/s ± 1%   +9.05%   (p=0.000 n=8+10)
StreamEncode5x2x1M-32        12.3GB/s ± 1%  13.2GB/s ± 1%   +7.61%  (p=0.000 n=10+10)
StreamEncode10x2x1M-32       11.5GB/s ± 4%  13.3GB/s ± 2%  +15.15%   (p=0.000 n=10+7)

11 Mar 10:50

klauspost

v1.9.12

Allow zero parity shards (#161)
