Do pre-decompress gzip with IAA hardware #5718
Comments
Regarding checking "whether IAA can decompress the data", the logic is as follows: Otherwise, fall back to the Velox SW path to decompress the data. Hi @yaqi-zhao, feel free to add your comments if the above description is inaccurate. Thanks!
@george-gu-2021 The zlib header and history buffer are checked in one step. We just read the compressed data header to check whether it is in zlib header format, and then read the window size from the header.
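The header check described in the comment above can be sketched roughly as follows. This is a minimal illustration based on the zlib header layout in RFC 1950, not the code from the actual patch: the names `zlibWindowSize`, `canUseIaa`, and `kMaxIaaWindowBytes` are hypothetical, and the 4 KB limit is taken from the issue description.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumption: the accelerator path is only used when the history window is at
// most 4 KB, as stated in the issue description. The constant name is made up.
constexpr uint32_t kMaxIaaWindowBytes = 4096;

// Parses the 2-byte zlib header (RFC 1950) and returns the window size in
// bytes, or std::nullopt if the buffer does not start with a valid
// zlib/deflate header.
std::optional<uint32_t> zlibWindowSize(const uint8_t* data, size_t size) {
  if (data == nullptr || size < 2) {
    return std::nullopt;
  }
  const uint8_t cmf = data[0];
  const uint8_t flg = data[1];
  // CM (low 4 bits of CMF) must be 8, which means "deflate".
  if ((cmf & 0x0F) != 8) {
    return std::nullopt;
  }
  // CINFO (high 4 bits of CMF) is log2(window size) - 8; values above 7 are
  // not allowed by the specification.
  const uint32_t cinfo = cmf >> 4;
  if (cinfo > 7) {
    return std::nullopt;
  }
  // FCHECK: CMF * 256 + FLG must be a multiple of 31.
  if (((static_cast<uint32_t>(cmf) << 8) | flg) % 31 != 0) {
    return std::nullopt;
  }
  return 1u << (cinfo + 8);
}

// Hypothetical predicate: true when the data looks like a zlib stream whose
// window fits the assumed hardware limit; otherwise the caller would fall
// back to software decompression.
bool canUseIaa(const uint8_t* data, size_t size) {
  const auto window = zlibWindowSize(data, size);
  return window.has_value() && *window <= kMaxIaaWindowBytes;
}
```

For example, a stream starting with bytes `0x48 0x0D` declares a 4 KB window and would pass this check, while the common `0x78 0x9C` header declares a 32 KB window and would take the software path.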
I am curious how you implement
Hi @Yuhta, actually, to avoid affecting the current
@yaqi-zhao Yes, if you can implement the logic inside
@yaqi-zhao Also, I would recommend rebasing your work on #5914, so that
Description
The Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator that provides very high
throughput compression and decompression combined with primitive analytic functions. It is available in the newest generation of Intel® Xeon® Scalable processors ("Sapphire Rapids").
We can offload GZip decompression (with a 4 KB window size) to the IAA hardware and save CPU bandwidth. Here is a description of how to offload GZip decompression to the IAA hardware.
We use IAA to pre-decompress data in the Velox native Parquet reader and get good performance in a TPC-H-like benchmark. The early Velox test data are as follows (the results are preliminary and vary across environments; they are for reference only, not a commitment):
About 2X performance gain compared with the current gzip (window size is 4 KB) solution
About 20~40% performance gain compared with ZSTD compressed files
About 5~10% performance gain compared with Snappy compressed files
Implementation Brief
To minimize the impact on the current code, we add a new IAAPageReader class. The ParquetReader checks whether IAA can be used to decompress the data. If so, the ParquetReader uses IAAPageReader to do the subsequent work. The criteria checked in ParquetReader are:
Here is a flow chart of the code change: