You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi SpiralDB, I've had a great experience using the fsst lib and vortex -- thank you for building them.
I'm trying to make fsst even faster. Currently the fsst compress the vortex varbin array by iterating each string and compress them individually. This works well for long strings but not so much for short strings, especially strings that are shorter than 8 bytes -- in that case, every string compression will fallback to the slow path.
The original paper suggests to copy the short strings to a new buffer and add a terminator between strings (Section 5.2). With the new long buffer, we can compress faster even with scalar code (and potentially auto-vectorized code). The new long buffer also allows us to compress with AVX512.
I guess the first step is to add a terminator byte to the symbol table, as shown in the c++ implementation.
Would you happen to have plans to implement this or any thoughts on the approach? I’d love to hear your thoughts.
The text was updated successfully, but these errors were encountered:
As you noted, the implementation right now is entirely based on the Section 5.1 "Predicated Scalar Compression" solution. This corresponds to the compressBulkC++ impl, however as you note we don't split the input segments into 511 byte chunks + terminator byte.
My reasoning at the time was that (a) comments in the C++ code mostly indicated that the 511-byte segmentation for the scalar fallback impl is mainly to keep consistent output format with the AVX512 variant, so that data compresses the same way across architectures, and (b) it seemed complicated.
I think the first step would be doing some benchmarking to show that the speedup from not falling back to the "slow path" loop is not outweighed by having to memcpy and prepare chunks of 511 bytes from the input.
I think if we do this, it'd end up greatly changing the structure of the Compressor, so I'd just wanna prove the idea out a bit more before we go that route. As you mentioned, it is a bridge we would need to cross anyway if we were to implement the AVX512 "meat grinder" compressor
Hi SpiralDB, I've had a great experience using the fsst lib and vortex -- thank you for building them.
I'm trying to make fsst even faster. Currently the fsst compress the vortex varbin array by iterating each string and compress them individually. This works well for long strings but not so much for short strings, especially strings that are shorter than 8 bytes -- in that case, every string compression will fallback to the slow path.
The original paper suggests to copy the short strings to a new buffer and add a terminator between strings (Section 5.2). With the new long buffer, we can compress faster even with scalar code (and potentially auto-vectorized code). The new long buffer also allows us to compress with AVX512.
I guess the first step is to add a terminator byte to the symbol table, as shown in the c++ implementation.
Would you happen to have plans to implement this or any thoughts on the approach? I’d love to hear your thoughts.
The text was updated successfully, but these errors were encountered: