-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Aggregator support batch serialize #9777
base: master
Are you sure you want to change the base?
Conversation
ff696d3
to
8bbb062
Compare
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
30a5b1c
to
241537d
Compare
fc95526
to
3354b7f
Compare
…ize_refine Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
…h into hashagg_batch_serialize
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
bb98d79
to
a2f168e
Compare
Signed-off-by: guo-shaoge <[email protected]>
void init(size_t start_row, const ColumnRawPtrs & key_columns, const TiDB::TiDBCollators & collators) | ||
{ | ||
batch_row_idx = start_row; | ||
byte_size.resize_fill_zero(key_columns[0]->size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already have start_row here, do we need to resize to total size of key_columns[0]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
countSerializeByteSizeForCmp
doesn't have start_row parameter. So we have to resize byte_size
to the total size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So will byte_size be inited more than once here with different start_row parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Whenever a block needs to be aggregated (whether it is a new block or the result of a resize exception causing start_row to be greater than 0), the function countSerializeByteSizeForCmp will be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment added
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
@@ -661,7 +662,10 @@ AggregatedDataVariants::Type Aggregator::chooseAggregationMethodInner() | |||
if (params.keys_size == 1 && types_not_null[0]->isFixedString()) | |||
return AggregatedDataVariants::Type::key_fixed_string; | |||
|
|||
return ChooseAggregationMethodFastPath(params.keys_size, types_not_null, params.collators); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to keep the implementation of ChooseAggregationMethodFastPath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it can be removed, i will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
std::vector<typename Method::State::KeyHolderType> key_holders; | ||
if constexpr (enable_prefetch) | ||
std::vector<KeyHolderType> key_holders; | ||
if constexpr (enable_prefetch || batch_get_key_holder) | ||
{ | ||
// mini batch will only be used when HashTable is big(a.k.a enable_prefetch is true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments needed be updated
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: yibin87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
What problem does this PR solve?
Issue Number: close #9692
Problem Summary: Reduce virtual function call for
key_serialize
andkey_string
.Workload: tpch-50g
Queries: same with #9679
Workload: clickbench
Queries: https://github.com/ClickHouse/ClickBench/blob/fdfdb5d94f2a668dce1f63d55498aa34510e4c9c/clickhouse/queries.sql#L11
NOTE:
two_keys_num64_strbinpadding
. opt key iskey_serialized
.What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note