
Enable spilling support for partial aggregation #7558

Closed

Conversation

zhztheplayer
Contributor

@zhztheplayer zhztheplayer commented Nov 14, 2023

Resolves #7930; also related to #7511.

Partial aggregation currently relies on the following options to reduce memory usage:

  • kMaxPartialAggregationMemory
  • kMaxExtendedPartialAggregationMemory

This PR adds the following option to enable spilling in partial aggregation:

  • kPartialAggregationSpillEnabled (default value: false)

When kPartialAggregationSpillEnabled is set to true, partial aggregation will spill data to disk under memory pressure, just as final aggregation does. At the same time, flushing will be disabled.

Also, once spilling has been triggered, the code path that abandons partial aggregation will be disabled.
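
A minimal sketch of what the proposed config entry might look like in core::QueryConfig, assuming the `partial_aggregation_spill_enabled` key and the false default described in this PR (not merged code):

// Hypothetical addition to core::QueryConfig, following the pattern of the
// existing kAggregationSpillEnabled entry (a sketch, not the merged code).
static constexpr const char* kPartialAggregationSpillEnabled =
    "partial_aggregation_spill_enabled";

// Returns true if the partial phase of aggregation is allowed to spill.
bool partialAggregationSpillEnabled() const {
  return get<bool>(kPartialAggregationSpillEnabled, false);
}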


@zhztheplayer zhztheplayer marked this pull request as ready for review November 17, 2023 06:11
@zhztheplayer zhztheplayer changed the title WIP: Enable spilling support for partial aggregation Enable spilling support for partial aggregation Nov 17, 2023
@zhztheplayer zhztheplayer changed the title Enable spilling support for partial aggregation WIP: Enable spilling support for partial aggregation Nov 17, 2023
@zhztheplayer zhztheplayer marked this pull request as draft November 17, 2023 06:12
@zhztheplayer zhztheplayer marked this pull request as ready for review November 17, 2023 07:18
@zhztheplayer zhztheplayer changed the title WIP: Enable spilling support for partial aggregation Enable spilling support for partial aggregation Nov 17, 2023
@zhztheplayer
Contributor Author

@mbasmanova @xiaoxmeng Would you like to review? Thanks!

@zhztheplayer
Contributor Author

@mbasmanova @xiaoxmeng Hi, would you please help review? Thanks!

Contributor

@xiaoxmeng xiaoxmeng left a comment

@zhztheplayer if we need to spill in partial aggregation for the Gluten use case, we might need a separate config for this. Thanks!

@zhztheplayer
Contributor Author

@xiaoxmeng

Contributor

@mbasmanova mbasmanova left a comment

Let's finish the design discussion in #7511 before proceeding with code changes.

@zhztheplayer
Contributor Author

Hi @mbasmanova @xiaoxmeng, should we continue the review? Thanks!

Contributor

@mbasmanova mbasmanova left a comment

@zhztheplayer Would you update the PR description to describe the latest changes? The description seems outdated.

Also, please document the new configs in https://facebookincubator.github.io/velox/configs.html

-  return (isFinal() || isSingle()) && preGroupedKeys().empty() &&
-      queryConfig.aggregationSpillEnabled();
+  if ((isFinal() || isSingle()) && queryConfig.aggregationSpillEnabled()) {
+    return preGroupedKeys().empty();
Contributor

This is a bit confusing. Perhaps,

if (preGroupedKeys().empty()) {
   return false();
}

...

Contributor Author

Sure. But did you mean the following?

if (!preGroupedKeys().empty()) {
  return false; 
}

Contributor

Oh yes. Sorry for the typo.
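
A minimal sketch of the combined check being discussed, assuming an isPartial() accessor and the partialAggregationSpillEnabled() getter proposed in this PR (not the merged code):

// Hypothetical consolidated AggregationNode::canSpill(): pre-grouped keys rule
// out spilling, final/single aggregations follow the existing config, and the
// partial step follows the new config proposed by this PR.
bool AggregationNode::canSpill(const QueryConfig& queryConfig) const {
  if (!preGroupedKeys().empty()) {
    return false;
  }
  if (isFinal() || isSingle()) {
    return queryConfig.aggregationSpillEnabled();
  }
  return isPartial() && queryConfig.partialAggregationSpillEnabled();
}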

@@ -650,7 +663,8 @@ class ApproxPercentileAggregate : public exec::Aggregate {
   if constexpr (checkIntermediateInputs) {
     VELOX_USER_CHECK(rowVec);
     for (int i = kPercentiles; i <= kAccuracy; ++i) {
-      VELOX_USER_CHECK(rowVec->childAt(i)->isConstantEncoding());
+      VELOX_USER_CHECK(isConstantVector(
VELOX_USER_CHECK(isConstantVector(
Contributor

Unrelated change? Would you extract it into a separate PR with a clear description and tests?

Contributor Author

It was related: without the change, the AggregationFuzzer would fail.

However, I'll try reverting the change, since the failure might be related to flushing and spilling both being enabled at one point in this PR. It's probably no longer needed.

Contributor

@mbasmanova mbasmanova left a comment

Would you check if AggregationFuzzer covers the new code paths? If not, let's extend it.

Comment on lines 1066 to 1072
if (injectPartialSpill) {
  spillDirectory = exec::test::TempDirectoryPath::create();
  builder.spillDirectory(spillDirectory->path)
      .config(core::QueryConfig::kSpillEnabled, "true")
      .config(core::QueryConfig::kPartialAggregationSpillEnabled, "true")
      .config(core::QueryConfig::kTestingSpillPct, "100");
}
Contributor Author

Flushing is disabled when partial spilling is on, so this new fuzzer dimension is needed to avoid losing coverage.

@zhztheplayer
Contributor Author

@mbasmanova Thanks

@zhztheplayer
Contributor Author

Hi @mbasmanova, I've addressed the current comments. Would you please take another look? Thank you.

Contributor

@mbasmanova mbasmanova left a comment

@zhztheplayer Thank you for iterating. Would you update the PR description to describe the changes to the Fuzzer? Would you also run the Fuzzer for 1h+ to make sure there are no failures? BTW, have you seen spilling in partial aggregation during Fuzzer runs? How often does it happen?

@duanmeng @xiaoxmeng Please help review this PR.

- boolean
- false
- When `spill_enabled` is true, determines whether the partial phase of HashAggregation operator can spill to disk under memory pressure.
Flushing will be disabled so max_partial_aggregation_memory and max_extended_partial_aggregation_memory will be ignored when turning the option on.
Contributor

typos: " the option on. this option."

@@ -28,7 +28,7 @@ std::shared_ptr<TempDirectoryPath> TempDirectoryPath::create() {
 }

 TempDirectoryPath::~TempDirectoryPath() {
-  LOG(INFO) << "TempDirectoryPath:: removing all files from" << path;
+  LOG(INFO) << "TempDirectoryPath:: removing all files from " << path;
Contributor

Nice fix. Thanks.

const auto spillDirectory = exec::test::TempDirectoryPath::create();
core::PlanNodeId partialAggNodeId;
core::PlanNodeId finalAggNodeId;
std::shared_ptr<core::QueryCtx> queryCtx = newQueryCtx(maxQueryCapacity);
Contributor

auto

.config(core::QueryConfig::kAggregationSpillEnabled, "true")
.config(
    core::QueryConfig::kMaxPartialAggregationMemory,
    std::to_string(1LL << 30)) // disable flush
Contributor

Why do we need this? Enabling partial spilling disables flushing, no?

Contributor Author

@zhztheplayer zhztheplayer Dec 18, 2023

The test case explicitly eliminates the impact of flushing by disabling it, regardless of whether we have made flushing and spilling mutually exclusive. In the future we may decide to let flushing and spilling work together; this case should still pass and still test spilling independently of flushing in that case.

The next test case ensures flushing is disabled when spilling is enabled. Does this make sense to you?

    core::QueryConfig::kMaxExtendedPartialAggregationMemory,
    std::to_string(1LL << 30)) // disable flush
.config(
    core::QueryConfig::kAbandonPartialAggregationMinPct,
Contributor

// avoid abandoning

Why do we need this?

Contributor Author

The test case is meant to exercise spilling, but if partial aggregation is abandoned, spilling will hardly ever happen.
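
A minimal sketch of the test setup being discussed, combining the config snippets above; the plan, spillDirectory, and expectedResult variables as well as the kAbandonPartialAggregationMinPct value are illustrative assumptions, not the PR's exact code:

// Hypothetical AggregationTest case: flushing and abandoning are disabled so
// that memory pressure reliably drives partial aggregation to spill.
exec::test::AssertQueryBuilder(plan)
    .spillDirectory(spillDirectory->path)
    .config(core::QueryConfig::kSpillEnabled, "true")
    .config(core::QueryConfig::kAggregationSpillEnabled, "true")
    .config(core::QueryConfig::kPartialAggregationSpillEnabled, "true")
    // Disable flushing by raising the partial-aggregation memory limits.
    .config(
        core::QueryConfig::kMaxPartialAggregationMemory,
        std::to_string(1LL << 30))
    .config(
        core::QueryConfig::kMaxExtendedPartialAggregationMemory,
        std::to_string(1LL << 30))
    // Avoid abandoning partial aggregation so spilling can actually happen
    // (an unreachable percentage; the value here is illustrative).
    .config(core::QueryConfig::kAbandonPartialAggregationMinPct, "200")
    .assertResults(expectedResult);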

* - partial_aggregation_spill_enabled
- boolean
- false
- When `spill_enabled` is true, determines whether the partial phase of HashAggregation operator can spill to disk under memory pressure.
Contributor

Let's explain when it could be beneficial to enable this? E.g. much fewer reducers, slow or inefficient shuffle, anything else?

Contributor Author

Can we just describe the result of turning the option on? For example, note to users that the size of data emitted from partial aggregation is reduced as much as possible when it is set to true. Users will know it should be on if they rely on an inefficient shuffle or have other specific reasons.

Collaborator

@duanmeng duanmeng left a comment

@zhztheplayer This PR is crucial to batch / ETL queries, 👍 . Perhaps we could add some test cases in the AggregationTest.

@duanmeng
Collaborator

@mbasmanova @xiaoxmeng Do we need two separate spill-enable flags if our goal is to support spilling for aggregation in all equivalent query plans? WDYT?

@zhztheplayer
Copy link
Contributor Author

@zhztheplayer This PR is crucial to batch / ETL queries, 👍 . Perhaps we could add some test cases in the AggregationTest.

Thanks @duanmeng. I've added some tests to AggregationTest.

@zhztheplayer
Contributor Author

@mbasmanova @xiaoxmeng Do we need two separate spill-enable flags if our goal is to support spilling for aggregation in all equivalent query plans? WDYT?

Let's keep them separate for the initial version. Spilling for partial aggregation should be off by default since it disables flushing, while for regular aggregation spilling is on by default.

@zhztheplayer
Contributor Author

zhztheplayer commented Dec 20, 2023

@mbasmanova @xiaoxmeng @duanmeng

After going through some history about partial aggregation and companion functions (especially #4412 and #4566), I feel this topic is heading in the wrong direction, so I am going to close this PR.

Reason:

Spark's partial aggregation doesn't actually support flushing, so ideally Gluten should not use Velox's (and probably Presto's) PartialAggregation. Velox's FinalAggregation + companion functions is the better choice for Gluten. Since final aggregation already supports spilling, no extra effort is needed, in my opinion.

PartialAggregation in Velox is conceptually an optimization for the case where an aggregation doesn't have to emit fully aggregated data. Spark doesn't have a plan node type that maps to this functionality yet. Adding spilling support to PartialAggregation may help in some rare corner cases, but it more or less breaks PartialAggregation's design in the end.

@zhztheplayer
Contributor Author

Closing. Please reopen if anyone thinks it's still needed.

@mbasmanova
Contributor

@zhztheplayer Thank you for the update. It comes as a bit of a surprise, but I guess it makes sense. Would you also comment on the two related issues and close them?

Resolves #7930; also related to #7511.

@mbasmanova
Contributor

@zhztheplayer Curious, what prompted you to take another look and go "through some history about partial aggregation and companion functions".

@zhztheplayer
Contributor Author

zhztheplayer commented Dec 21, 2023

@zhztheplayer Curious, what prompted you to take another look and go "through some history about partial aggregation and companion functions".

One of Gluten's users reported an issue where the expression count(distinct l_orderkey) returns a wrong result in Gluten. Spark generated the following plan:

+- HashAggregate(keys=[], functions=[count(distinct l_orderkey#114L)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=140]
      +- (Partial) HashAggregate(keys=[], functions=[partial_count(distinct l_orderkey#114L)])    // <- node 2
         +- (Partial) HashAggregate(keys=[l_orderkey#114L], functions=[])   // <- node 1
            +- Exchange hashpartitioning(l_orderkey#114L, 100), ENSURE_REQUIREMENTS, [plan_id=136]
               +- (Partial) HashAggregate(keys=[l_orderkey#114L], functions=[])
                  +- FileScan

The plan node I marked as node 2 uses node 1's output as the distinct input for partial_count. However, since node 1 is currently mapped to Velox's PartialAggregation, it may flush intermediate data at any time, which makes node 2 output a wrong count.

The user's issue looks to be caused by the incorrect mapping from Spark's Aggregation + partial function to Velox's PartialAggregation. If we correct the mapping to go from Spark's Aggregation + partial function to Velox's FinalAggregation + partial companion function, then we solve this count(distinct) issue and don't need to add spilling support to partial aggregation either. I'm still working with @rui-mo on some refactoring in Gluten to use the new mapping, but in any case I think this topic can be closed, since it was driven by an issue rooted in Gluten's non-optimal design.
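
A minimal sketch, in Velox PlanBuilder terms, of the corrected mapping described above; the companion-function name (count_partial), rowType, and the exact builder calls are illustrative assumptions, not Gluten's actual code:

// Hypothetical corrected mapping: Spark's partial steps are expressed as Velox
// single aggregations, which never flush, so the distinct input of node 2
// stays intact.
auto plan = exec::test::PlanBuilder()
                .tableScan(rowType)
                // Node 1: distinct on l_orderkey.
                .singleAggregation({"l_orderkey"}, {})
                // Node 2: the partial companion function emits the intermediate
                // count accumulator instead of relying on PartialAggregation.
                .singleAggregation({}, {"count_partial(l_orderkey)"})
                .planNode();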

@mbasmanova
Contributor

@zhztheplayer Thank you for sharing this context. This is super helpful and now I have a better understanding. Indeed, it sounds like "partial agg" in Spark doesn't mean the same as in Presto / Velox and you are correct that it probably needs to be mapped to final agg + partial companion function. CC: @kagamiori
