Skip to content

Commit

Permalink
fix: Use the right row container to extract rows in grouping set (fac…
Browse files Browse the repository at this point in the history
…ebookincubator#11899)

Summary:
Pull Request resolved: facebookincubator#11899

Currently we always extract rows from grouping set through the default row container in the grouping set.
It can cause potential performance and even data correctness problem in spill path as the row extract depends
on the column stats recorded in the row container to decide whether apply fast no null path or slow path with nulls.
If we use default row container to extract rows from the row container used by spill, the column stats is not accurate.
If spill merge row container has null, then we get random non-null value for non column. This is correctness problem.
If spill merge row container has no null, then we get slow path with null handling

This PR fixes by always use the right container to extract rows

Reviewed By: Yuhta

Differential Revision: D67352373

fbshipit-source-id: 9d48857c4b3ca270bfc6777e0c3e4fbc03ee1e03
  • Loading branch information
xiaoxmeng authored and facebook-github-bot committed Dec 17, 2024
1 parent 9b64b94 commit 2d2cdbd
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 6 deletions.
12 changes: 7 additions & 5 deletions velox/exec/GroupingSet.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -760,22 +760,23 @@ bool GroupingSet::getOutput(
}
return false;
}
extractGroups(folly::Range<char**>(groups, numGroups), result);
extractGroups(
table_->rows(), folly::Range<char**>(groups, numGroups), result);
return true;
}

void GroupingSet::extractGroups(
RowContainer* rowContainer,
folly::Range<char**> groups,
const RowVectorPtr& result) {
result->resize(groups.size());
if (groups.empty()) {
return;
}
RowContainer& rows = *table_->rows();
const auto totalKeys = rows.keyTypes().size();
const auto totalKeys = rowContainer->keyTypes().size();
for (int32_t i = 0; i < totalKeys; ++i) {
auto& keyVector = result->childAt(i);
rows.extractColumn(
rowContainer->extractColumn(
groups.data(),
groups.size(),
groupingKeyOutputProjections_[i],
Expand Down Expand Up @@ -1252,7 +1253,8 @@ void GroupingSet::extractSpillResult(const RowVectorPtr& result) {
mergeRows_->listRows(
&iter, rows.size(), RowContainer::kUnlimited, rows.data());
}
extractGroups(folly::Range<char**>(rows.data(), rows.size()), result);
extractGroups(
mergeRows_.get(), folly::Range<char**>(rows.data(), rows.size()), result);
mergeRows_->clear();
}

Expand Down
5 changes: 4 additions & 1 deletion velox/exec/GroupingSet.h
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,10 @@ class GroupingSet {
// Copies the grouping keys and aggregates for 'groups' into 'result' If
// partial output, extracts the intermediate type for aggregates, final result
// otherwise.
void extractGroups(folly::Range<char**> groups, const RowVectorPtr& result);
void extractGroups(
RowContainer* container,
folly::Range<char**> groups,
const RowVectorPtr& result);

// Produces output in if spilling has occurred. First produces data
// from non-spilled partitions, then merges spill runs and unspilled data
Expand Down

0 comments on commit 2d2cdbd

Please sign in to comment.