GH-44052: [C++][Compute] Reduce the complexity of row segmenter #44053
Conversation
The benchmark numbers, run on my Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz, show a maximum 50x speedup.

Before:

----------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:0 24934 ns 24906 ns 27871 bytes_per_second=9.80236Gi/s items_per_second=1.31565G/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:0 24729 ns 24687 ns 27358 bytes_per_second=9.88957Gi/s items_per_second=1.32736G/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:0 24630 ns 24616 ns 28100 bytes_per_second=9.91805Gi/s items_per_second=1.33118G/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:0 24633 ns 24620 ns 28112 bytes_per_second=9.91645Gi/s items_per_second=1.33096G/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:0 24679 ns 24656 ns 28651 bytes_per_second=9.90192Gi/s items_per_second=1.32901G/s
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:1 199781 ns 199339 ns 3606 bytes_per_second=2.4495Gi/s items_per_second=164.383M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:1 222783 ns 220778 ns 2905 bytes_per_second=2.21164Gi/s items_per_second=148.421M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:1 254637 ns 254545 ns 2703 bytes_per_second=1.91825Gi/s items_per_second=128.732M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:1 452659 ns 452356 ns 1550 bytes_per_second=1.07942Gi/s items_per_second=72.4385M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:1 1241185 ns 1240526 ns 572 bytes_per_second=403.055Mi/s items_per_second=26.4146M/s
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:2 351964 ns 351202 ns 1983 bytes_per_second=1.39031Gi/s items_per_second=93.3024M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:2 1076041 ns 1010477 ns 727 bytes_per_second=494.816Mi/s items_per_second=32.4282M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:2 4587149 ns 4584191 ns 152 bytes_per_second=109.071Mi/s items_per_second=7.14804M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:2 25132123 ns 25118536 ns 28 bytes_per_second=19.9056Mi/s items_per_second=1.30453M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:2 102196767 ns 102086286 ns 7 bytes_per_second=4.89782Mi/s items_per_second=320.983k/s
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:3 424279 ns 423836 ns 1619 bytes_per_second=1.15205Gi/s items_per_second=77.3129M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:3 1180743 ns 1157215 ns 610 bytes_per_second=432.072Mi/s items_per_second=28.3163M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:3 5738171 ns 5729356 ns 118 bytes_per_second=87.2698Mi/s items_per_second=5.71932M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:3 30695547 ns 30656783 ns 23 bytes_per_second=16.3096Mi/s items_per_second=1.06887M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:3 125980210 ns 125906333 ns 6 bytes_per_second=3.97121Mi/s items_per_second=260.257k/s

After:

----------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:0 24717 ns 24709 ns 28098 bytes_per_second=9.8808Gi/s items_per_second=1.32618G/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:0 24283 ns 24275 ns 28709 bytes_per_second=10.0574Gi/s items_per_second=1.34989G/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:0 24426 ns 24413 ns 28328 bytes_per_second=10.0006Gi/s items_per_second=1.34226G/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:0 24331 ns 24320 ns 28073 bytes_per_second=10.0385Gi/s items_per_second=1.34735G/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:0 26190 ns 25537 ns 28471 bytes_per_second=9.56034Gi/s items_per_second=1.28317G/s
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:1 197105 ns 196813 ns 3587 bytes_per_second=2.48093Gi/s items_per_second=166.493M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:1 207379 ns 207241 ns 3384 bytes_per_second=2.3561Gi/s items_per_second=158.115M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:1 254707 ns 254565 ns 2777 bytes_per_second=1.9181Gi/s items_per_second=128.721M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:1 439539 ns 439415 ns 1589 bytes_per_second=1.11121Gi/s items_per_second=74.5719M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:1 1225127 ns 1224517 ns 576 bytes_per_second=408.324Mi/s items_per_second=26.7599M/s
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:2 355268 ns 355082 ns 1985 bytes_per_second=1.37512Gi/s items_per_second=92.283M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:2 410702 ns 410486 ns 1703 bytes_per_second=1.18952Gi/s items_per_second=79.8273M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:2 667262 ns 667087 ns 1035 bytes_per_second=749.527Mi/s items_per_second=49.121M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:2 1087405 ns 1086757 ns 651 bytes_per_second=460.084Mi/s items_per_second=30.1521M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:2 2133928 ns 2132308 ns 331 bytes_per_second=234.488Mi/s items_per_second=15.3674M/s
BenchmarkRowSegmenter/Rows:32768/Segments:1/SegmentKeys:3 437444 ns 437048 ns 1592 bytes_per_second=1.11722Gi/s items_per_second=74.9757M/s
BenchmarkRowSegmenter/Rows:32768/Segments:4/SegmentKeys:3 505824 ns 505355 ns 1385 bytes_per_second=989.403Mi/s items_per_second=64.8415M/s
BenchmarkRowSegmenter/Rows:32768/Segments:16/SegmentKeys:3 756347 ns 755999 ns 906 bytes_per_second=661.377Mi/s items_per_second=43.344M/s
BenchmarkRowSegmenter/Rows:32768/Segments:64/SegmentKeys:3 1294590 ns 1293676 ns 546 bytes_per_second=386.496Mi/s items_per_second=25.3294M/s
BenchmarkRowSegmenter/Rows:32768/Segments:256/SegmentKeys:3 2552917 ns 2551420 ns 274 bytes_per_second=195.969Mi/s items_per_second=12.843M/s
Hi friends, @pitrou @felipecrv @mapleFU , would you like to take a look? Thanks.
ARROW_ASSIGN_OR_RAISE(auto segment_exec_batch, batch.SelectValues(ids));
ExecSpan segment_batch(segment_exec_batch);

while (true) {
This is the only call site in non-testing code.
}
// Assert next is the last (empty) segment.
The new API generates fewer segments than before - there is no meaningless (length == 0) trailing segment.
@@ -629,61 +622,47 @@ TEST(RowSegmenter, Basics) {
auto batch2 = ExecBatchFromJSON(types2, "[[1, 1], [1, 2], [2, 2]]");
auto batch1 = ExecBatchFromJSON(types1, "[[1], [1], [2]]");
ExecBatch batch0({}, 3);
{
The new API doesn't accept an offset argument, so there is no need for such testing.
ExecSpan span0(batch0);
TestSegments(segmenter, span0, {{0, 3, true, true}, {3, 0, true, true}});
Here and below, the changes are just because the new API doesn't generate the zero-length ending segment.
@@ -54,13 +55,8 @@ using group_id_t = std::remove_const<decltype(kNoGroupId)>::type;
using GroupIdType = CTypeTraits<group_id_t>::ArrowType;
auto g_group_id_type = std::make_shared<GroupIdType>();

inline const uint8_t* GetValuesAsBytes(const ArraySpan& data, int64_t offset = 0) {
Moved w/o touching the code inside.
@@ -102,21 +109,6 @@ Segment MakeSegment(int64_t batch_length, int64_t offset, int64_t length, bool e
return Segment{offset, length, offset + length >= batch_length, extends};
}

// Used by SimpleKeySegmenter::GetNextSegment to find the match-length of a value within a
// fixed-width buffer
int64_t GetMatchLength(const uint8_t* match_bytes, int64_t match_width,
Moved w/o touching the code inside.
@@ -147,21 +152,15 @@ struct SimpleKeySegmenter : public BaseRowSegmenter {
save_key_data_(static_cast<size_t>(key_type_.type->byte_width())),
extend_was_called_(false) {}

Status CheckType(const DataType& type) {
Moved w/o touching the code inside.
@@ -97,7 +97,12 @@ class ARROW_EXPORT RowSegmenter {
virtual Status Reset() = 0;

/// \brief Get the next segment for the given batch starting from the given offset
/// DEPRECATED: Due to its inefficiency, use GetSegments instead.
ARROW_DEPRECATED("Deprecated in 18.0.0. Use GetSegments instead.")
Deprecate instead of removing it because I assume this is a public API (since grouper.h is a public header).
// Runs the grouper on a single row. This is used to determine the group id of the
// first row of a new segment to see if it extends the previous segment.
template <typename Batch>
Result<group_id_t> MapGroupIdAt(const Batch& batch, int64_t offset) {
Moved w/o touching the code inside.
template <typename... Args>
static void BenchmarkRowSegmenter(benchmark::State& state, Args&&...) {
Can we turn this into a more realistic GroupBy benchmark? For example:
- M=0 to 1 non-segmented keys
- N=1 to 2 segmented keys
- K=key cardinality (2,8,64)
Sure, will do. And the name should include GroupBy for sure.
Updated.
ArrayVector segments(num_segments);
for (int i = 0; i < num_segments; ++i) {
  ASSERT_OK_AND_ASSIGN(
      segments[i],
      Constant(std::make_shared<Int64Scalar>(i))->Generate(num_rows / num_segments));
}
// Concat all segments to form the segment key.
ASSERT_OK_AND_ASSIGN(auto segment_key, Concatenate(segments));
// num_segment_keys copies of the segment key.
ArrayVector segment_keys(num_segment_keys, segment_key);
Less trivially, you can, for each segmented key, 1) generate a random Int64 array of the right cardinality (using the min and max values) 2) sort it to make it segmented. Then the multi-column segments will not trivially map to the individual key segments.
Then the multi-column segments will not trivially map to the individual key segments.
Sorry, I don't follow this part :(
Could you elaborate a bit? Thank you.
I mean that you have the exact same segment lengths in each segment key column. By using random generation the patterns would be slightly different.
Ah, you mean introducing some randomness into the individual segment lengths. That sounds good given that this particular benchmark needs to be more realistic.
Will do. Thank you for elaborating.
Done.
std::vector<std::vector<int64_t>> row_segmenter_args = {
    {32 * 1024}, benchmark::CreateRange(1, 256, 4), benchmark::CreateDenseRange(0, 3, 1)};

BENCHMARK(BenchmarkRowSegmenter)
This is a GroupBy benchmark, so should probably have "GroupBy" in its name like other benchmarks?
Done.
BenchmarkGroupBy(state, {{"count", ""}}, {arg}, /*keys=*/{}, segment_keys);

state.SetItemsProcessed(num_rows * state.iterations());
Why don't we call SetItemsProcessed in BenchmarkGroupBy instead? It should probably be relevant for the other GroupBy benchmarks.
I guess they should. Moved into BenchmarkGroupBy. The other benchmarks sharing it will have items_per_second metrics now.
using arrow::Concatenate;
using arrow::ConstantArrayGenerator;
These look unnecessary?
Removed.
cpp/src/arrow/compute/row/grouper.cc
}

ARROW_DEPRECATED("Deprecated in 18.0.0. Use GetSegments instead.")
It's already in the declaration, is it useful to duplicate the deprecation here?
I guess not so useful. Removed.
I'm curious, is the […]
If you mean the […], I hacked the code and got the benchmark number.
I see, thanks. Are segment keys used often? If so, it might be worth improving […]
Not in traditional RDBMSes, but they could be essential in time-series scenarios. (I believe I saw the original author mention that somewhere, but I can't find it at the moment.) I can help with enhancing the simple key case anyway.
That said, it's quite low priority.
Co-authored-by: Antoine Pitrou <[email protected]>
auto segment_key = rng.Int64(num_rows, /*min=*/0, /*max=*/num_segments - 1);
int64_t* values = segment_key->data()->GetMutableValues<int64_t>(1);
std::sort(values, values + num_rows);
// num_segment_keys copies of the segment key.
ArrayVector segment_keys(num_segment_keys, segment_key);
This is not much better than before, is it? I would expect something like:
ArrayVector segment_keys(num_segment_keys);
for (auto& segment_key : segment_keys) {
  segment_key = rng.Int64(num_rows, /*min=*/0, /*max=*/num_segments - 1);
  int64_t* values = segment_key->data()->GetMutableValues<int64_t>(1);
  std::sort(values, values + num_rows);
}
Ah, I see your point. It's only that I want the number of segments to be exactly as specified. Combining independently random keys, even with the same distribution, will make the number of segments (potentially, much) bigger.
But that's also much more realistic, isn't it?
That said, as you prefer.
But that's also much more realistic, isn't it?
I think having an exact number of segments would help more with diagnosing performance issues than introducing yet another level of randomness, so I'd rather keep it as is :)
}
BENCHMARK(CountGroupByIntsSegmentedByInts)
    ->ArgNames({"Keys", "SegmentKeys", "Segments"})
    ->ArgsProduct({{1, 2}, {0, 1, 2}, benchmark::CreateRange(1, 256, 8)});
I've tried to use 1+ segment keys and 0 non-segment keys to get an idea of the purely-segmented performance, but I get:
Invalid: The provided function (hash_count) is a hash aggregate function. Since there are no keys to group by, a scalar aggregate function was expected (normally these do not start with hash_)
It's a bit of a bummer, isn't it?
As my other comment explained, an aggregation without any group-by keys (a.k.a. non-segment keys here) is a scalar aggregation and requires "count" rather than "hash_count".
This is a "group by aggregation" benchmark (the counterpart of the previous "CountScalar" one), so there has to be at least one group-by key.
}

template <typename... Args>
static void CountScalarSegmentedByInts(benchmark::State& state, Args&&...) {
So "CountScalar" is actually the case where Keys=0 but there are segment keys?
This is quite misleading.
Please let me explain a bit.
Though both are named "aggregation", most compute engines have two variants of them depending on whether there are "group by" keys or not (take SQL select count(*) from t group by c and select count(*) from t for instance). Acero calls them "scalar aggregation" and "group by aggregation", and they work with the aggregation functions "count/sum/..." and "hash_count/hash_sum/..." respectively. This is understandable because without a group-by key, the aggregation just needs to hold one "scalar" value (e.g., the current count/sum/...) during the whole computation, whereas a group-by key immediately requires structures like a hash table.
Segment keys, on the other hand, work orthogonally to group-by keys and apply to both scalar and group-by aggregations.
Back to your question: yes, "CountScalar" implies exactly that this is a "scalar aggregation" (i.e., without any group-by keys) on a "count" function, and "SegmentedByInts" implies that there are potential segment keys.
Hope this clears things up a bit.
But select count(*) from t is not what this is doing. It's still doing a "group by", it's just not using a hash table for it.
You are right, what this benchmark does is not simply a select count(*) from t. That was merely to explain the difference between a scalar agg and a group-by agg. But that doesn't mean this benchmark is a group by - at least not in Acero.
Though I'm not entirely sure how a segmented non-group-by agg maps to a SQL basis, in Acero, specifying segment keys doesn't actually require a group-by's existence. One just specifies segment keys and group-by keys independently within AggregateNodeOptions:
arrow/cpp/src/arrow/acero/options.h, lines 332 to 348 in 9576a41:
class ARROW_ACERO_EXPORT AggregateNodeOptions : public ExecNodeOptions {
 public:
  /// \brief create an instance from values
  explicit AggregateNodeOptions(std::vector<Aggregate> aggregates,
                                std::vector<FieldRef> keys = {},
                                std::vector<FieldRef> segment_keys = {})
      : aggregates(std::move(aggregates)),
        keys(std::move(keys)),
        segment_keys(std::move(segment_keys)) {}
  // aggregations which will be applied to the targeted fields
  std::vector<Aggregate> aggregates;
  // keys by which aggregations will be grouped (optional)
  std::vector<FieldRef> keys;
  // keys by which aggregations will be segmented (optional)
  std::vector<FieldRef> segment_keys;
};
So I would let the naming reflect the underlying implementation (scalar agg vs. group by agg).
(I understand the argument could be: a segment key still implies group by semantic. I agree. But that's a more end-to-end perspective and doesn't reflect how we handle the segment key).
int64_t num_keys = state.range(0);
ArrayVector keys(num_keys);
for (auto& key : keys) {
  key = rng.Int64(num_rows, /*min=*/0, /*max=*/64);
It's a bit weird that this doesn't use num_segments. For comparison purposes, I would expect a similar cardinality in segmented and non-segmented keys.
Generally the cardinalities of a segment key and a group-by key are relatively independent, so I chose not to reuse the segment key's cardinality for the group-by key as well. If we want to make the group-by cardinality variable, we can use another independent parameter.
But my concern is that the performance of segment keys and that of group-by keys are also independent; that is why this benchmark uses a fixed group-by key cardinality. What do you think?
The point here is to compare performance of segmented vs. non-segmented keys, right?
In these two particular benchmarks, they are to show how the number of segment keys and the number of segments perform with a predefined group by setup. In other words, the group by portion of the benchmark is merely to make the benchmark more realistic. They are not designed for the comparison you suggested.
IIUC, an apples-to-apples comparison might be: a segmented scalar agg (N segment keys and 0 group-by keys) vs. a non-segmented group-by agg (0 segment keys and N group-by keys), with the same key distribution. In this comparison, we can answer how a group by can be done more efficiently by taking advantage of the segmented nature of the key(s) (i.e., using a segmented agg). If you are suggesting such a comparison, yes, that makes sense. I think I can add a new benchmark for it.
What do you think? Thanks.
Ok, thanks for the explanation. A new benchmark can be added in a later PR if we want, this one is fine now that I understand the motivation.
+1, thanks for this
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 3d6d581. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 255 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change

As described in #44052, AnyKeysSegmenter::GetNextSegment currently has O(n*m) complexity, where n is the number of rows in a batch and m is the number of segments in that batch (a "segment" is a group of contiguous rows that share the same segment key). This is because each invocation of the method computes the group ids of all the remaining rows in the batch, even though it is only interested in the first group, making the rest of the computation a waste. In this PR I introduce a new API, GetSegments (and subsequently deprecate the old GetNextSegment), to compute the group ids only once and iterate over all the segments, avoiding the duplicated computation. This reduces the complexity from O(n*m) to O(n).

What changes are included in this PR?

- grouper.h is a public header, so I assume RowSegmenter::GetNextSegment is a public API and only deprecate it instead of removing it.
- Introduce RowSegmenter::GetSegments and update the call-sites.
- Benchmarks show a maximum 50x speedup (nearly the O(n*m) to O(n) complexity reduction).

Are these changes tested?
Legacy tests are sufficient.
Are there any user-facing changes?
Yes.
This PR includes breaking changes to public APIs.
The API RowSegmenter::GetNextSegment is deprecated due to its inefficiency and replaced with a more efficient one, RowSegmenter::GetSegments.