
Faster short string compression with terminator byte and AVX512 #48

Open
XiangpengHao opened this issue Oct 26, 2024 · 2 comments

@XiangpengHao (Contributor) commented Oct 26, 2024

Hi SpiralDB, I've had a great experience using the fsst lib and vortex -- thank you for building them.

I'm trying to make fsst even faster. Currently, fsst compresses the vortex varbin array by iterating over each string and compressing it individually. This works well for long strings but not for short ones, especially strings shorter than 8 bytes -- in that case, every string compression falls back to the slow path.

The original paper suggests copying the short strings into a new buffer with a terminator byte between them (Section 5.2). With that single long buffer, we can compress faster even with scalar code (and potentially auto-vectorized code). The long buffer also lets us compress with AVX512.
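To make the idea concrete, here is a minimal sketch (not the fsst crate's actual API) of the packing step: concatenate many short strings into one contiguous buffer separated by a terminator byte, and record offsets so the boundaries can be recovered after decompression. The choice of `0xFF` as the terminator is an assumption for illustration; FSST actually reserves a dedicated terminator symbol in its table.

```rust
/// Pack short strings into one buffer, separated by `terminator`, so the
/// compressor can make a single fast-path pass over a long input instead
/// of one slow-path invocation per short string.
/// Returns the packed buffer plus offsets (offsets[i]..offsets[i+1]-1 is
/// string i, with its trailing terminator at offsets[i+1]-1).
fn pack_with_terminator(strings: &[&[u8]], terminator: u8) -> (Vec<u8>, Vec<usize>) {
    let total: usize = strings.iter().map(|s| s.len() + 1).sum();
    let mut buf = Vec::with_capacity(total);
    let mut offsets = Vec::with_capacity(strings.len() + 1);
    for s in strings {
        offsets.push(buf.len());
        buf.extend_from_slice(s);
        buf.push(terminator); // marks the string boundary
    }
    offsets.push(buf.len());
    (buf, offsets)
}

fn main() {
    let inputs: &[&[u8]] = &[b"cat", b"dog"];
    let (buf, offsets) = pack_with_terminator(inputs, 0xFF);
    assert_eq!(buf, vec![b'c', b'a', b't', 0xFF, b'd', b'o', b'g', 0xFF]);
    assert_eq!(offsets, vec![0, 4, 8]);
    println!("packed {} bytes, {} strings", buf.len(), offsets.len() - 1);
}
```

The terminator only works if that byte value can never appear inside a compressed symbol, which is why it has to be reserved in the symbol table first.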

I guess the first step is to add a terminator byte to the symbol table, as the C++ implementation does.

Do you have plans to implement this, or any thoughts on the approach? I'd love to hear them.

@a10y (Contributor) commented Oct 27, 2024

Hey @XiangpengHao , thanks for broaching this.

As you noted, the implementation right now is entirely based on the Section 5.1 "Predicated Scalar Compression" scheme. This corresponds to the compressBulk C++ impl, except that, as you point out, we don't split the input into 511-byte segments plus a terminator byte.

My reasoning at the time was that (a) comments in the C++ code indicated the 511-byte segmentation in the scalar fallback exists mainly to keep the output format consistent with the AVX512 variant, so that data compresses the same way across architectures, and (b) it seemed complicated.

I think the first step would be some benchmarking to show that the speedup from avoiding the "slow path" loop is not outweighed by the cost of memcpying and preparing 511-byte chunks from the input.
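For reference, the chunk-preparation cost being weighed here looks roughly like this sketch (a hypothetical helper, not part of the fsst crate): split the input into segments of at most 511 bytes, each followed by a terminator byte, mirroring the segmentation the C++ compressBulk path uses to keep scalar and AVX512 output identical.

```rust
/// Split `input` into 511-byte segments, appending a terminator byte
/// after each segment. This is the extra memcpy work whose cost would
/// need to be benchmarked against the fast-path speedup.
fn segment(input: &[u8], terminator: u8) -> Vec<u8> {
    const SEG: usize = 511;
    let mut out = Vec::with_capacity(input.len() + input.len() / SEG + 1);
    for chunk in input.chunks(SEG) {
        out.extend_from_slice(chunk);
        out.push(terminator); // one terminator per segment
    }
    out
}

fn main() {
    // 1024 bytes -> segments of 511, 511, and 2 bytes, plus 3 terminators.
    let input = vec![0u8; 1024];
    let out = segment(&input, 0xFF);
    assert_eq!(out.len(), 1024 + 3);
    assert_eq!(out[511], 0xFF);
    println!("segmented {} -> {} bytes", input.len(), out.len());
}
```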

I think if we do this, it'd end up greatly changing the structure of the Compressor, so I'd just wanna prove the idea out a bit more before we go that route. As you mentioned, it is a bridge we would need to cross anyway if we were to implement the AVX512 "meat grinder" compressor.

@XiangpengHao (Contributor, Author) commented:

Thank you for the comments @a10y. They make perfect sense to me!
