Releases: klauspost/reedsolomon
v1.11.3
BUGFIX: Fixes v1.11.2 Leopard GF8 GFNI code
What's Changed
- Leopard GF8: Write directly to output by @klauspost in #226
- Disable broken Leopard GF8 GFNI code by @klauspost in #229
Full Changelog: v1.11.2...v1.11.3
v1.11.2
What's Changed
- Add Leopard GF8 AVX512 by @klauspost in #222
- Add AVX2 xor by @klauspost in #223
- Add AVX512 GFNI processing by @klauspost in #224
- Leopard GF8 GFNI by @klauspost in #225
Full Changelog: v1.11.1...v1.11.2
v1.11.1
What's Changed
- Implement 8-bit leopard variant by @elias-orijtech in #209
- Fix incorrect leopardFF16.TotalShards by @elias-orijtech in #210
- Use assembly for fftDIT28 mulAdd by @klauspost in #214
- Add 8 bit errorfield by @klauspost in #213
- Add ARM64+PPC64LE leopard 8 assembly. by @klauspost in #212
- Add fftDIT and ifftDIT assembly by @klauspost in #218
- Add inversion cache for leopard GF8 by @klauspost in #219
- Split leopard gf8 input by @klauspost in #221
Full Changelog: v1.11.0...v1.11.1
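For readers unfamiliar with the "mulAdd" operation named above, the following is a minimal pure-Go sketch of a multiply-add over GF(2^8). It is not the library's code: the names `gfMul` and `mulAdd` are used here for illustration only, the reduction polynomial 0x11D is an assumption, and the real implementation relies on lookup tables and assembly rather than a bitwise loop.

```go
package main

import "fmt"

// gfMul multiplies a and b in GF(2^8) with shift-and-add multiplication.
// The reduction polynomial x^8+x^4+x^3+x^2+1 (0x11D) is an assumption
// made for this sketch.
func gfMul(a, b byte) byte {
	var p byte
	for b != 0 {
		if b&1 != 0 {
			p ^= a // addition in GF(2^8) is XOR
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1D // reduce: low 8 bits of 0x11D
		}
		b >>= 1
	}
	return p
}

// mulAdd computes dst[i] ^= c * src[i] for every byte of src.
// dst must be at least as long as src.
func mulAdd(dst, src []byte, c byte) {
	for i := range src {
		dst[i] ^= gfMul(c, src[i])
	}
}

func main() {
	dst := []byte{0x01, 0x02, 0x03}
	src := []byte{0x10, 0x80, 0xFF}
	mulAdd(dst, src, 2)
	fmt.Printf("% x\n", dst) // prints "21 1f e0"
}
```

The assembly variants process many bytes per instruction, but the per-byte result they compute has this shape.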
v1.11.0
What's Changed
- Add an efficient implementation of shard counts up to 65536 by @elias-orijtech in #191
- Implement jerasure algorithm of matrix generation for interoperability by @vitalif in #200
- Add GF16 AVX2, AVX512 and SSSE3 by @klauspost in #193
- Add GF16 Split/Join by @klauspost in #194
- Add Leopard bitfield by @klauspost in #208
- Improve fwht speed by @klauspost in #198
- Decode benchmark by @elias-orijtech in #197
- Unroll ifftDIT4/fftDIT4 branches. by @klauspost in #204
- Simplify function selection by @klauspost in #206
- Update interfaces & tests by @klauspost in #203
- Upgrade Go CICD by @klauspost in #201
New Contributors
- @elias-orijtech made their first contribution in #191
Full Changelog: v1.10.0...v1.11.0
v1.10.0
What's Changed
- Publish withSSE/withAVX options by @vitalif in #186
- Add custom coding matrix support by @vitalif in #187
- Implement ReconstructSome() to reconstruct only specific data shards by @vitalif in #189
- Use VPTERNLOGD on GOAMD64=v4 in AVX2 by @klauspost in #182
- Upgrade cpuid to avoid long startup time on Xen guests by @klauspost in #190
New Contributors
Full Changelog: v1.9.16...v1.10.0
v1.9.16
What's Changed
- avx2: Improve speed when > 10 input or output shards. by @klauspost in #174
Full Changelog: v1.9.15...v1.9.16
v1.9.15
What's Changed
- fix: ensure the padding of the last data shard is zero by @jiangfucheng in #173
- Unroll pure go xor loop by @klauspost in #172
New Contributors
- @jiangfucheng made their first contribution in #173
Full Changelog: v1.9.14...v1.9.15
v1.9.14
What's Changed
- Add progressive erasure shard encoding by @klauspost in #170
Full Changelog: v1.9.13...v1.9.14
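The idea behind progressive encoding can be sketched with plain XOR parity: because parity is a sum over the data shards, it can be accumulated one shard at a time as shards arrive. The example below is a deliberately simplified stand-in, not the library's API. It uses a single XOR parity shard instead of Reed-Solomon coefficients, and `addShard` is a hypothetical helper.

```go
package main

import "fmt"

// addShard folds one data shard into a running parity buffer.
// With XOR parity every coefficient is 1; Reed-Solomon would multiply
// each shard by a per-shard GF coefficient before adding.
func addShard(parity, shard []byte) {
	for i := range shard {
		parity[i] ^= shard[i]
	}
}

func main() {
	shards := [][]byte{[]byte("abcd"), []byte("efgh"), []byte("ijkl")}

	// The parity buffer must start zeroed; shards are then added as they arrive.
	parity := make([]byte, 4)
	for _, s := range shards {
		addShard(parity, s)
	}

	// Recover shard 1 from the parity and the two surviving shards.
	rec := append([]byte(nil), parity...)
	addShard(rec, shards[0])
	addShard(rec, shards[2])
	fmt.Println(string(rec)) // prints "efgh"
}
```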
v1.9.13
Wider AVX2 loops and less usage. (#162)
- Experiment with 64 bytes/loop AVX2
- Only reduce when doing 64.
- Use no more than 8 goroutines for avx2 codegen.
name                         old speed      new speed      delta
Encode10x2x10000-32          33.3GB/s ± 0%  37.5GB/s ± 1%  +12.49%  (p=0.000 n=9+10)
Encode100x20x10000-32        3.79GB/s ± 5%  3.77GB/s ± 5%  ~        (p=0.853 n=10+10)
Encode17x3x1M-32             78.2GB/s ± 1%  76.0GB/s ± 6%  ~        (p=0.123 n=10+10)
Encode10x4x16M-32            28.3GB/s ± 0%  27.7GB/s ± 2%  -2.32%   (p=0.000 n=8+10)
Encode5x2x1M-32              112GB/s ± 1%   113GB/s ± 1%   ~        (p=0.796 n=10+10)
Encode10x2x1M-32             149GB/s ± 1%   129GB/s ± 3%   -13.24%  (p=0.000 n=9+10)
Encode10x4x1M-32             99.1GB/s ± 1%  91.5GB/s ± 3%  -7.74%   (p=0.000 n=10+10)
Encode50x20x1M-32            19.7GB/s ± 1%  19.8GB/s ± 1%  ~        (p=0.447 n=9+10)
Encode17x3x16M-32            33.4GB/s ± 0%  33.3GB/s ± 1%  -0.46%   (p=0.043 n=10+9)
Encode_8x4x8M-32             30.1GB/s ± 1%  29.4GB/s ± 3%  -2.31%   (p=0.000 n=10+10)
Encode_12x4x12M-32           30.6GB/s ± 0%  30.5GB/s ± 0%  ~        (p=0.720 n=10+9)
Encode_16x4x16M-32           31.5GB/s ± 0%  31.5GB/s ± 0%  ~        (p=0.497 n=10+9)
Encode_16x4x32M-32           31.9GB/s ± 0%  31.5GB/s ± 4%  ~        (p=0.165 n=10+10)
Encode_16x4x64M-32           32.4GB/s ± 0%  32.3GB/s ± 0%  ~        (p=0.321 n=9+8)
Encode_8x5x8M-32             28.4GB/s ± 0%  28.4GB/s ± 1%  ~        (p=0.237 n=10+8)
Encode_8x6x8M-32             27.0GB/s ± 0%  27.2GB/s ± 2%  ~        (p=0.075 n=10+10)
Encode_8x7x8M-32             26.0GB/s ± 1%  25.8GB/s ± 1%  -0.53%   (p=0.003 n=9+10)
Encode_8x9x8M-32             24.6GB/s ± 1%  24.4GB/s ± 1%  -0.63%   (p=0.000 n=10+10)
Encode_8x10x8M-32            23.7GB/s ± 1%  23.7GB/s ± 0%  +0.32%   (p=0.035 n=10+9)
Encode_8x11x8M-32            23.0GB/s ± 1%  22.8GB/s ± 0%  -0.59%   (p=0.000 n=9+8)
Encode_8x8x05M-32            66.4GB/s ± 1%  64.2GB/s ± 1%  -3.32%   (p=0.000 n=10+10)
Encode_8x8x1M-32             56.7GB/s ± 0%  75.7GB/s ± 2%  +33.55%  (p=0.000 n=9+9)
Encode_8x8x8M-32             24.9GB/s ± 0%  24.9GB/s ± 1%  ~        (p=0.146 n=8+10)
Encode_8x8x32M-32            23.8GB/s ± 0%  23.4GB/s ± 0%  -1.42%   (p=0.000 n=9+10)
Encode_24x8x24M-32           29.9GB/s ± 0%  29.9GB/s ± 0%  ~        (p=0.278 n=10+9)
Encode_24x8x48M-32           30.7GB/s ± 1%  30.7GB/s ± 0%  ~        (p=0.351 n=9+7)
StreamEncode10x2x10000-32    15.5GB/s ± 1%  16.5GB/s ± 0%  +6.53%   (p=0.000 n=10+9)
StreamEncode100x20x10000-32  2.09GB/s ± 1%  2.06GB/s ± 2%  -1.78%   (p=0.000 n=10+10)
StreamEncode17x3x1M-32       12.2GB/s ± 2%  12.3GB/s ± 1%  +1.19%   (p=0.008 n=10+9)
StreamEncode10x4x16M-32      8.68GB/s ± 0%  9.47GB/s ± 1%  +9.05%   (p=0.000 n=8+10)
StreamEncode5x2x1M-32        12.3GB/s ± 1%  13.2GB/s ± 1%  +7.61%   (p=0.000 n=10+10)
StreamEncode10x2x1M-32       11.5GB/s ± 4%  13.3GB/s ± 2%  +15.15%  (p=0.000 n=10+7)