Skip to content

Commit

Permalink
refactor: Simplify the partial full handling for distinct aggregation
Browse files Browse the repository at this point in the history
Summary:
The current partial full is cleared when there is no new distinct in add input if it has been set.
The reason for doing this is to avoid distinct partial aggregation hanging problem
as we only reset the grouping set in case of partial full if there is new distinct found in get output.
This is a bit tricky and it is better to handle reset partial full unconditionally in get output for
distinct type of aggregation.

Differential Revision: D67288365
  • Loading branch information
xiaoxmeng authored and facebook-github-bot committed Dec 16, 2024
1 parent 2d31862 commit 3917617
Showing 1 changed file with 5 additions and 8 deletions.
13 changes: 5 additions & 8 deletions velox/exec/HashAggregation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -205,11 +205,6 @@ void HashAggregation::addInput(RowVectorPtr input) {
if (newDistincts_) {
// Save input to use for output in getOutput().
input_ = input;
} else {
// If no new distinct groups (meaning we don't have anything to output),
// then we need to ensure we 'need input'. For that we need to reset
// the 'partial full' flag.
partialFull_ = false;
}
}
}
Expand Down Expand Up @@ -348,7 +343,11 @@ RowVectorPtr HashAggregation::getOutput() {
}

if (isDistinct_) {
return getDistinctOutput();
auto output = getDistinctOutput();
if (partialFull_) {
resetPartialOutputIfNeed();
}
return output;
}

const auto& queryConfig = operatorCtx_->driverCtx()->queryConfig();
Expand Down Expand Up @@ -393,8 +392,6 @@ RowVectorPtr HashAggregation::getDistinctOutput() {
// Drop reference to input_ to make it singly-referenced at the producer and
// allow for memory reuse.
input_ = nullptr;

resetPartialOutputIfNeed();
return output;
}
VELOX_CHECK(!newDistincts_);
Expand Down

0 comments on commit 3917617

Please sign in to comment.