GH-44081: [C++][Parquet] Fix reported metrics in parquet-arrow-reader-writer-benchmark #44082
cc @boshek @austin3dickey for CB history breakage
…reader-writer-benchmark

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns
9076d07 to 726e7de
Also, note that in all cases, some of the reported figures were too optimistic (never too pessimistic).
@@ -104,13 +107,28 @@ std::shared_ptr<ColumnDescriptor> MakeSchema(Repetition::type repetition) {
                                             repetition == Repetition::REPEATED);
 }

 template <bool nullable, typename ParquetType>
So `nullable` was previously unused?
Yes, as you can see.
Float16Type is a logical type rather than a physical type, so this case differs from the others. But this LGTM.
I know, but this was convenient :-)
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e0ac5d5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 32 possible false positives for unstable benchmarks that are known to sometimes produce them.
…reader-writer-benchmark (apache#44082)

### Rationale for this change

1. items/sec and bytes/sec were set to the same value in some benchmarks
2. bytes/sec was incorrectly computed for boolean columns

### What changes are included in this PR?

Fix parquet-arrow-reader-writer-benchmark to report correct metrics.

#### Example (column writing)

Before:
```
--------------------------------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>      43138428 ns     43118609 ns           15 bytes_per_second=927.674Mi/s items_per_second=972.736M/s
BM_WriteColumn<true,Int32Type>      150528627 ns    150480597 ns            5 bytes_per_second=265.815Mi/s items_per_second=278.727M/s
BM_WriteColumn<false,Int64Type>      49243514 ns     49214955 ns           14 bytes_per_second=1.58742Gi/s items_per_second=1.70448G/s
BM_WriteColumn<true,Int64Type>      151526550 ns    151472832 ns            5 bytes_per_second=528.148Mi/s items_per_second=553.803M/s
BM_WriteColumn<false,DoubleType>     59101372 ns     59068058 ns           12 bytes_per_second=1.32263Gi/s items_per_second=1.42016G/s
BM_WriteColumn<true,DoubleType>     159944872 ns    159895095 ns            4 bytes_per_second=500.328Mi/s items_per_second=524.632M/s
BM_WriteColumn<false,BooleanType>    32855604 ns     32845322 ns           21 bytes_per_second=304.457Mi/s items_per_second=319.247M/s
BM_WriteColumn<true,BooleanType>    150566118 ns    150528329 ns            5 bytes_per_second=66.4327Mi/s items_per_second=69.6597M/s
```

After:
```
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------
BM_WriteColumn<false,Int32Type>      43919180 ns     43895926 ns           16 bytes_per_second=911.246Mi/s items_per_second=238.878M/s
BM_WriteColumn<true,Int32Type>      153981290 ns    153929841 ns            5 bytes_per_second=259.859Mi/s items_per_second=68.1204M/s
BM_WriteColumn<false,Int64Type>      49906105 ns     49860098 ns           14 bytes_per_second=1.56688Gi/s items_per_second=210.304M/s
BM_WriteColumn<true,Int64Type>      154273499 ns    154202319 ns            5 bytes_per_second=518.799Mi/s items_per_second=68M/s
BM_WriteColumn<false,DoubleType>     59789490 ns     59733498 ns           12 bytes_per_second=1.30789Gi/s items_per_second=175.542M/s
BM_WriteColumn<true,DoubleType>     161235860 ns    161169670 ns            4 bytes_per_second=496.371Mi/s items_per_second=65.0604M/s
BM_WriteColumn<false,BooleanType>    32962097 ns     32950864 ns           21 bytes_per_second=37.9353Mi/s items_per_second=318.224M/s
BM_WriteColumn<true,BooleanType>    154103499 ns    154052873 ns            5 bytes_per_second=8.1141Mi/s items_per_second=68.066M/s
```

#### Example (column reading)

Before:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0       6456731 ns      6453510 ns          108 bytes_per_second=1.51323Gi/s items_per_second=1.62482G/s
BM_ReadColumn<false,BooleanType>/1/20      19012505 ns     19006068 ns           36 bytes_per_second=526.148Mi/s items_per_second=551.706M/s
BM_ReadColumn<true,BooleanType>/-1/1       58365426 ns     58251529 ns           12 bytes_per_second=171.669Mi/s items_per_second=180.008M/s
BM_ReadColumn<true,BooleanType>/5/10       46498966 ns     46442191 ns           15 bytes_per_second=215.321Mi/s items_per_second=225.781M/s
BM_ReadIndividualRowGroups                 29617575 ns     29600557 ns           24 bytes_per_second=2.63931Gi/s items_per_second=2.83394G/s
BM_ReadMultipleRowGroups                   47416980 ns     47288951 ns           15 bytes_per_second=1.65208Gi/s items_per_second=1.7739G/s
BM_ReadMultipleRowGroupsGenerator          29741012 ns     29722112 ns           24 bytes_per_second=2.62851Gi/s items_per_second=2.82235G/s
```

After:
```
---------------------------------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------
BM_ReadColumn<false,BooleanType>/-1/0       6438249 ns      6435159 ns          109 bytes_per_second=194.245Mi/s items_per_second=1.62945G/s
BM_ReadColumn<false,BooleanType>/1/20      19427495 ns     19419378 ns           37 bytes_per_second=64.3687Mi/s items_per_second=539.964M/s
BM_ReadColumn<true,BooleanType>/-1/1       58342877 ns     58298236 ns           12 bytes_per_second=21.4415Mi/s items_per_second=179.864M/s
BM_ReadColumn<true,BooleanType>/5/10       46591584 ns     46532288 ns           15 bytes_per_second=26.8631Mi/s items_per_second=225.344M/s
BM_ReadIndividualRowGroups                 30039049 ns     30021676 ns           23 bytes_per_second=2.60229Gi/s items_per_second=349.273M/s
BM_ReadMultipleRowGroups                   47877663 ns     47650438 ns           15 bytes_per_second=1.63954Gi/s items_per_second=220.056M/s
BM_ReadMultipleRowGroupsGenerator          30377987 ns     30360019 ns           23 bytes_per_second=2.57329Gi/s items_per_second=345.381M/s
```

### Are these changes tested?

Manually by running benchmarks.

### Are there any user-facing changes?

No, but this breaks historical comparisons in continuous benchmarking.

* GitHub Issue: apache#44081

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
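The relationship between the two counters in the "After" figures can be cross-checked with a little arithmetic. The sketch below is not code from this PR: `ValueBytes`, `MiBPerSecond`, and `Consistent` are hypothetical helpers, and the assumption that booleans are counted at one bit per value is inferred from the "After" numbers rather than stated in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not taken from the PR: number of bytes occupied by
// `num_values` values of `bit_width` bits each, rounded up to a whole byte.
// Assumed widths: Boolean = 1, Int32 = 32, Int64 and Double = 64.
constexpr int64_t ValueBytes(int64_t num_values, int64_t bit_width) {
  return (num_values * bit_width + 7) / 8;
}

// Convert an items/s rate into MiB/s for values of the given bit width.
constexpr double MiBPerSecond(double items_per_second, int64_t bit_width) {
  return items_per_second * static_cast<double>(bit_width) / 8.0 /
         (1024.0 * 1024.0);
}

// True when a reported bytes/s figure (expressed in MiB/s) agrees with the
// items/s figure to within a relative tolerance.
inline bool Consistent(double items_per_second, int64_t bit_width,
                       double reported_mib_per_second,
                       double rel_tol = 1e-3) {
  const double expected = MiBPerSecond(items_per_second, bit_width);
  return std::fabs(expected - reported_mib_per_second) <= rel_tol * expected;
}
```

Under these assumptions, `Consistent(318.224e6, 1, 37.9353)` holds for the "After" row of `BM_WriteColumn<false,BooleanType>`, and `Consistent(238.878e6, 32, 911.246)` holds for `BM_WriteColumn<false,Int32Type>`. The corresponding "Before" boolean row, `Consistent(319.247e6, 1, 304.457)`, does not hold, which matches the roughly 8x overstatement of bytes/sec for boolean columns.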