GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression #38067

Merged
merged 5 commits into from
Oct 25, 2023

Conversation

mapleFU
Member

@mapleFU mapleFU commented Oct 6, 2023

Rationale for this change

Currently, the compression benchmarks are enabled with ARROW_WITH_BENCHMARKS_REFERENCE.

Note that they only cover the compressor (made by Codec::MakeCompressor()) and the decompressor (made by Codec::MakeDecompressor()). However, Parquet uses Codec itself to encode and decode, so I'd like to add benchmarks that use Codec directly.

What changes are included in this PR?

Add benchmark for direct compression and decompression
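
For readers unfamiliar with the distinction, here is a rough sketch of what a "direct" (one-shot) benchmark looks like: the whole input goes in, the whole output comes out, and no stream state is kept between calls. Everything in it is a toy stand-in and an assumption for illustration: `ToyRleCodec`, its methods, the run-length scheme, and `MeasureCompressThroughput` are hypothetical and are not Arrow's Codec or the code added in this PR.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy one-shot codec (hypothetical, NOT Arrow's Codec): run-length encoding
// as (count, value) byte pairs, with each run capped at 255.
class ToyRleCodec {
 public:
  // Worst case: every input byte becomes its own (count, value) pair.
  static size_t MaxCompressedLen(size_t input_len) { return 2 * input_len; }

  // One-shot compression. `out` must hold MaxCompressedLen(in_len) bytes.
  // Returns the number of bytes written.
  static size_t Compress(const uint8_t* in, size_t in_len, uint8_t* out) {
    size_t o = 0;
    for (size_t i = 0; i < in_len;) {
      size_t run = 1;
      while (i + run < in_len && in[i + run] == in[i] && run < 255) ++run;
      out[o++] = static_cast<uint8_t>(run);
      out[o++] = in[i];
      i += run;
    }
    return o;
  }

  // One-shot decompression into a caller-sized buffer.
  // Returns the number of bytes written (never more than out_cap).
  static size_t Decompress(const uint8_t* in, size_t in_len, uint8_t* out,
                           size_t out_cap) {
    size_t o = 0;
    for (size_t i = 0; i + 1 < in_len; i += 2) {
      for (uint8_t k = 0; k < in[i] && o < out_cap; ++k) out[o++] = in[i + 1];
    }
    return o;
  }
};

// Times `iters` one-shot Compress calls over the same input and returns the
// throughput in input bytes per second (0.0 if the clock did not advance).
double MeasureCompressThroughput(const std::vector<uint8_t>& data, int iters) {
  std::vector<uint8_t> out(ToyRleCodec::MaxCompressedLen(data.size()));
  size_t sink = 0;  // accumulate results so the calls are not trivially dead
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    sink += ToyRleCodec::Compress(data.data(), data.size(), out.data());
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  (void)sink;
  double total_bytes = static_cast<double>(data.size()) * iters;
  return elapsed.count() > 0 ? total_bytes / elapsed.count() : 0.0;
}
```

A real benchmark would leave timing and iteration counts to the benchmark framework; the hand-rolled loop is only there to keep the sketch free of dependencies.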

Are these changes tested?

no need

Are there any user-facing changes?

no

@github-actions

github-actions bot commented Oct 6, 2023

⚠️ GitHub issue #38042 has been automatically assigned in GitHub to PR creator.

@mapleFU mapleFU requested review from kou and pitrou October 20, 2023 06:14
@mapleFU
Member Author

mapleFU commented Oct 20, 2023

@pitrou @kou Since Parquet uses the compression/decompression methods in Codec, I've added a group of benchmark cases here. Would you mind taking a look?

@kou kou changed the title GH-38042: [C++] Benchmark: Add benchmark for non-stream Codec Compression/Decompression GH-38042: [C++][Benchmark] Add non-stream Codec Compression/Decompression Oct 20, 2023
Member

@kou kou left a comment

+1

Comment on lines +254 to +256
BENCHMARK_TEMPLATE(ReferenceCompression, Compression::LZ4_FRAME);
BENCHMARK_TEMPLATE(ReferenceStreamingDecompression, Compression::LZ4_FRAME);
BENCHMARK_TEMPLATE(ReferenceDecompression, Compression::LZ4_FRAME);
Member

Is LZ4_FRAME OK?
It seems that Parquet doesn't use LZ4_FRAME.

Member

We can even benchmark both LZ4 variants.

Member Author

@mapleFU mapleFU Oct 23, 2023

It seems that Parquet doesn't use LZ4_FRAME

Aha, I remember that parquet-mr implemented LZ4 first, and Arrow implemented a different version (LZ4_FRAME). LZ4 stores an extra length here.

Maybe apache/parquet-format#168 helps
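
To make the "extra length" remark concrete, here is a small sketch of a length-prefixed frame. It rests on an assumption that is not stated in this thread: that the Hadoop-style LZ4 framing written by parquet-mr puts two big-endian 32-bit sizes (decompressed size, then compressed size) in front of each compressed block. The names `FrameHeader`, `WrapFrame`, and `ParseFrameHeader` are hypothetical and do not come from Arrow or Parquet; no actual LZ4 compression is performed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (not confirmed in this thread):
//   bytes 0..3  big-endian uint32: decompressed size
//   bytes 4..7  big-endian uint32: compressed size
//   bytes 8..   compressed payload
struct FrameHeader {
  uint32_t decompressed_size;
  uint32_t compressed_size;
};

// Appends `v` to `out` as 4 big-endian bytes.
inline void PutBE32(std::vector<uint8_t>& out, uint32_t v) {
  out.push_back(static_cast<uint8_t>(v >> 24));
  out.push_back(static_cast<uint8_t>(v >> 16));
  out.push_back(static_cast<uint8_t>(v >> 8));
  out.push_back(static_cast<uint8_t>(v));
}

// Reads 4 big-endian bytes starting at `p`.
inline uint32_t GetBE32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Wraps an already-compressed block with the 8-byte length prefix.
std::vector<uint8_t> WrapFrame(const std::vector<uint8_t>& compressed,
                               uint32_t decompressed_size) {
  std::vector<uint8_t> out;
  out.reserve(compressed.size() + 8);
  PutBE32(out, decompressed_size);
  PutBE32(out, static_cast<uint32_t>(compressed.size()));
  out.insert(out.end(), compressed.begin(), compressed.end());
  return out;
}

// Parses the prefix into `*header`. Returns false if the buffer is shorter
// than the prefix or the declared compressed size exceeds the remaining bytes.
bool ParseFrameHeader(const std::vector<uint8_t>& buf, FrameHeader* header) {
  if (buf.size() < 8) return false;
  uint32_t decompressed = GetBE32(buf.data());
  uint32_t compressed = GetBE32(buf.data() + 4);
  if (compressed > buf.size() - 8) return false;
  header->decompressed_size = decompressed;
  header->compressed_size = compressed;
  return true;
}
```

Under that assumption, the per-block cost of the prefix is a fixed 8 bytes plus a header parse, which is consistent with the comment below that the variants should not differ much in a benchmark.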

Member Author

And I don't think they differ much...

I haven't added LZ4 for now, but feel free to add it if necessary.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Oct 20, 2023
Member

@kou kou left a comment

+1

@kou kou merged commit 3be5e60 into apache:main Oct 25, 2023
34 of 35 checks passed
@kou kou removed the awaiting merge Awaiting merge label Oct 25, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Oct 25, 2023
@pitrou
Member

pitrou commented Oct 25, 2023

So we could have added LZ4 and Snappy here. @mapleFU Would you like to do that as a followup PR?

@mapleFU
Member Author

mapleFU commented Oct 25, 2023

Let me rush it :-)

(Just curious: is it related to #38389?)

@pitrou
Member

pitrou commented Oct 25, 2023

It's just reasonable to benchmark all available codecs, not a subset of them.

@mapleFU
Member Author

mapleFU commented Oct 25, 2023

@pitrou added #38453

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 25, 2023
…ompression (apache#38067)

* Closes: apache#38042

Authored-by: mwish <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 3be5e60.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024