[Multi-stage] Support is_enable_group_trim agg option #14664

Jackie-Jiang · 2024-12-16T08:32:00Z

Introduce is_enable_group_trim as an agg option to enable group trim in the leaf stage.
Group trim can be enabled when the query has both ORDER BY and LIMIT.

E.g.

SELECT /*+ aggOptions(is_partitioned_by_group_by_keys='true', is_enable_group_trim='true') */ {tbl1}.num, COUNT(*), SUM({tbl1}.val), SUM({tbl1}.num), COUNT(DISTINCT {tbl1}.val) FROM {tbl1} WHERE {tbl1}.val >= 0 AND {tbl1}.name != 'a' GROUP BY {tbl1}.num ORDER BY COUNT(*) DESC LIMIT 1
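The eligibility rule stated above can be sketched as follows. This is a minimal sketch that assumes three things: the hint arrives as a string map, the planner exposes the number of ORDER BY keys, and LIMIT is optional. The class GroupTrimEligibility and its method are hypothetical and are not Pinot's implementation. Only the is_enable_group_trim key comes from this PR.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not Pinot's actual code): decides whether leaf-stage
 * group trim may be enabled, based on the aggOptions hint and the presence
 * of ORDER BY keys and a LIMIT.
 */
public final class GroupTrimEligibility {
  // Hint key as used in the example query of this PR.
  public static final String IS_ENABLE_GROUP_TRIM = "is_enable_group_trim";

  private GroupTrimEligibility() {
  }

  /**
   * @param aggOptions  key/value pairs from the aggOptions(...) hint
   * @param numSortKeys number of ORDER BY keys (collations) above the aggregate
   * @param limit       LIMIT value, or null when the query has no LIMIT
   */
  public static boolean canEnableGroupTrim(Map<String, String> aggOptions, int numSortKeys, Integer limit) {
    // The option must be explicitly turned on through the hint.
    boolean hinted = Boolean.parseBoolean(aggOptions.getOrDefault(IS_ENABLE_GROUP_TRIM, "false"));
    // Trimming needs a sort key to rank groups and a limit to know how many to keep.
    return hinted && numSortKeys > 0 && limit != null;
  }

  public static void main(String[] args) {
    Map<String, String> hint = Map.of(
        "is_partitioned_by_group_by_keys", "true",
        IS_ENABLE_GROUP_TRIM, "true");
    // ORDER BY COUNT(*) DESC LIMIT 1, as in the example query.
    System.out.println(canEnableGroupTrim(hint, 1, 1));     // true
    // Same hint, but no ORDER BY: nothing to rank groups by.
    System.out.println(canEnableGroupTrim(hint, 0, 1));     // false
    // No hint at all.
    System.out.println(canEnableGroupTrim(Map.of(), 1, 1)); // false
  }
}
```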

codecov-commenter · 2024-12-16T09:11:15Z

Codecov Report

Attention: Patch coverage is 74.01575% with 33 lines in your changes missing coverage. Please review.

Project coverage is 64.01%. Comparing base (59551e4) to head (7c21d95).
Report is 1477 commits behind head on master.

Files with missing lines	Patch %	Lines
...el/rules/PinotAggregateExchangeNodeInsertRule.java	72.50%	10 Missing and 12 partials ⚠️
...he/pinot/query/planner/explain/PlanNodeMerger.java	0.00%	4 Missing ⚠️
.../query/planner/logical/EquivalentStagesFinder.java	0.00%	3 Missing ⚠️
...he/pinot/query/planner/plannode/AggregateNode.java	66.66%	0 Missing and 3 partials ⚠️
.../runtime/plan/server/ServerPlanRequestVisitor.java	94.11%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #14664      +/-   ##
============================================
+ Coverage     61.75%   64.01%   +2.26%     
- Complexity      207     1606    +1399     
============================================
  Files          2436     2703     +267     
  Lines        133233   148997   +15764     
  Branches      20636    22839    +2203     
============================================
+ Hits          82274    95387   +13113     
- Misses        44911    46622    +1711     
- Partials       6048     6988     +940

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (+99.99%)`	⬆️
integration	`100.00% <ø> (+99.99%)`	⬆️
integration1	`100.00% <ø> (+99.99%)`	⬆️
integration2	`0.00% <ø> (ø)`
java-11	`63.98% <74.01%> (+2.27%)`	⬆️
java-21	`63.89% <74.01%> (+2.26%)`	⬆️
skip-bytebuffers-false	`63.98% <74.01%> (+2.24%)`	⬆️
skip-bytebuffers-true	`63.87% <74.01%> (+36.14%)`	⬆️
temurin	`64.01% <74.01%> (+2.26%)`	⬆️
unittests	`64.01% <74.01%> (+2.26%)`	⬆️
unittests1	`56.36% <74.01%> (+9.46%)`	⬆️
unittests2	`34.41% <7.08%> (+6.68%)`	⬆️

Flags with carried forward coverage won't be shown.

gortiz · 2024-12-18T07:32:38Z

cc @bziobrowski

gortiz · 2024-12-18T07:50:36Z

...r/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java

+      List<RexNode> projects = projectRel.getProjects();
+      List<RelFieldCollation> collations = sortRel.getCollation().getFieldCollations();
+      if (collations.isEmpty()) {
+        // Cannot enable group trim without sort key.


I understand this is required for within-segment trimming, right? Is it actually required for cross-segment trimming?

Anyway, I'm not sure we should make this kind of decision here. Couldn't we populate the fields and let the actual operator decide whether an optimization can be applied?

More specifically: don't you think it would be useful to keep the limit even if we don't have collations?

A non-empty collation is required for both within-segment and cross-segment trimming, but it could still be beneficial to propagate the limit when the collation is missing.
That is because when a limit is present (!= Integer.MAX_VALUE), the combine operator might limit the number of group-by keys in the indexed table.
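This sketch shows the alternative suggested in the comment. The planner always carries the limit and sort keys, and the runtime decides separately for each optimization. LeafAggregateHints and its methods are hypothetical names. The claim that capping group keys needs only a limit is taken from the comment and has not been checked against Pinot's combine operator.

```java
import java.util.List;

/**
 * Hypothetical sketch: the planner always populates limit and sort keys,
 * and the runtime decides which optimizations apply. Not a Pinot API.
 */
public final class LeafAggregateHints {
  // "No limit" sentinel, matching the Integer.MAX_VALUE convention in the comment.
  public static final int NO_LIMIT = Integer.MAX_VALUE;

  private final List<String> sortKeys; // ORDER BY keys pushed down from the Sort node
  private final int limit;             // pushed-down LIMIT, or NO_LIMIT

  public LeafAggregateHints(List<String> sortKeys, int limit) {
    this.sortKeys = List.copyOf(sortKeys);
    this.limit = limit;
  }

  /** Within-segment and cross-segment trimming need both a sort key and a limit. */
  public boolean canTrimGroups() {
    return !sortKeys.isEmpty() && limit != NO_LIMIT;
  }

  /** Per the comment, capping group-by keys in the combine step needs only a limit. */
  public boolean canCapGroupKeys() {
    return limit != NO_LIMIT;
  }

  public static void main(String[] args) {
    LeafAggregateHints sortedAndLimited = new LeafAggregateHints(List.of("COUNT(*) DESC"), 10);
    LeafAggregateHints limitOnly = new LeafAggregateHints(List.of(), 10);
    System.out.println(sortedAndLimited.canTrimGroups() + " " + sortedAndLimited.canCapGroupKeys()); // true true
    System.out.println(limitOnly.canTrimGroups() + " " + limitOnly.canCapGroupKeys());               // false true
  }
}
```

With this design, dropping the limit when collations are empty becomes a runtime decision rather than a planner one.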

gortiz · 2024-12-18T07:53:42Z

...r/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java

+public class PinotAggregateExchangeNodeInsertRule {
+
+  public static class SortProjectAggregate extends RelOptRule {


Although the javadoc in the main class applies to all rules, it would be cool to add a javadoc for each rule to explain what it does (and maybe add an example).

gortiz · 2024-12-18T09:17:14Z

...r/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java

+    }
+
+    @Override
+    public void onMatch(RelOptRuleCall call) {


nit:

In Calcite, rules have both onMatch and matches methods. This method seems to be doing both. I think it would be better to override matches so that the rule is not applied in the cases that are filtered out here.

The final query is the same, and we would need to repeat some code, but the advantage of using matches is that it helps when trying to debug which rules apply (using MarkerFilter, as explained in https://youtu.be/_phzRNCWJfw?si=hH9ukXAS2Iml11nq&t=331).

I've just created #14680 so we don't forget how to enable these logs.
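For reference, Calcite's RelOptRule declares boolean matches(RelOptRuleCall), which returns true by default, alongside onMatch(RelOptRuleCall). The toy model below has no Calcite dependency, so it runs standalone. ToyRule, Call, EnableGroupTrimRule and fire are hypothetical stand-ins. They only show how early-return checks can move into matches() so that a planner can report skipped rules.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (no Calcite dependency) of splitting a rule into a cheap
 * matches() guard and an onMatch() transformation. In Calcite the
 * corresponding methods are RelOptRule#matches and RelOptRule#onMatch;
 * everything else here is hypothetical.
 */
public final class MatchesVsOnMatch {

  /** Minimal stand-in for a matched Sort -> Project -> Aggregate pattern. */
  static final class Call {
    final List<String> sortKeys;
    final Integer limit;
    final List<String> transformed = new ArrayList<>();

    Call(List<String> sortKeys, Integer limit) {
      this.sortKeys = sortKeys;
      this.limit = limit;
    }
  }

  abstract static class ToyRule {
    final String name;

    ToyRule(String name) {
      this.name = name;
    }

    /** Guard: return false to skip the rule; defaults to true, like Calcite's. */
    boolean matches(Call call) {
      return true;
    }

    abstract void onMatch(Call call);
  }

  /** Preconditions live in matches() instead of early returns in onMatch(). */
  static final class EnableGroupTrimRule extends ToyRule {
    EnableGroupTrimRule() {
      super("EnableGroupTrimRule");
    }

    @Override
    boolean matches(Call call) {
      // Same filtering as an early return in onMatch, but visible to the planner.
      return !call.sortKeys.isEmpty() && call.limit != null;
    }

    @Override
    void onMatch(Call call) {
      call.transformed.add("aggregate with group trim, limit=" + call.limit);
    }
  }

  /** Planner step that can log rejected rules because matches() is separate. */
  static List<String> fire(ToyRule rule, Call call) {
    List<String> log = new ArrayList<>();
    if (rule.matches(call)) {
      log.add(rule.name + ": applied");
      rule.onMatch(call);
    } else {
      log.add(rule.name + ": skipped by matches()");
    }
    return log;
  }

  public static void main(String[] args) {
    ToyRule rule = new EnableGroupTrimRule();
    System.out.println(fire(rule, new Call(List.of("COUNT(*) DESC"), 1)));
    System.out.println(fire(rule, new Call(List.of(), 1)));
  }
}
```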

bziobrowski · 2024-12-18T15:50:28Z

pinot-query-planner/src/test/resources/queries/GroupByPlans.json

@@ -249,6 +249,39 @@
          "\n              LogicalTableScan(table=[[default, a]])",
          "\n"
        ]
+      },


Is this PR meant to enable trimming both within segments (minSegmentTrim) and across segments (minServerTrim)?
If so, it'd be good to mention that:

- the former is disabled by default and requires a query option (or cluster setting);
- the latter is enabled by default but probably hasn't been applied so far.

I reckon it'd also be good to test the effect this hint has on a query without an ORDER BY and/or LIMIT clause.

Apart from manual hinting, I think we could propagate the LIMIT and ORDER BY details to leaf plans and enable both types of trimming if:

- the ORDER BY is based on group-by key(s) only (not aggregates);
- there is no HAVING clause.
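The automatic check proposed above can be sketched as follows. This is a minimal sketch that assumes ORDER BY and GROUP BY expressions are available as plain strings. AutoGroupTrimCheck is a hypothetical name, and the comments give the likely rationale for each condition.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed check: enable within- and
 * cross-segment trimming without a hint when ORDER BY references only
 * group-by keys and there is no HAVING clause. Not Pinot code.
 */
public final class AutoGroupTrimCheck {
  private AutoGroupTrimCheck() {
  }

  public static boolean canAutoEnableTrim(Set<String> groupByKeys, List<String> orderByExprs,
      boolean hasHaving, Integer limit) {
    if (limit == null || orderByExprs.isEmpty()) {
      // Without ORDER BY + LIMIT there is no ranking to trim against.
      return false;
    }
    if (hasHaving) {
      // HAVING filters on final aggregates, so groups dropped early
      // might have been needed to fill the LIMIT.
      return false;
    }
    // Ranking by key values does not depend on partial aggregates,
    // so every ORDER BY expression must be a group-by key.
    return groupByKeys.containsAll(orderByExprs);
  }

  public static void main(String[] args) {
    Set<String> keys = Set.of("num");
    System.out.println(canAutoEnableTrim(keys, List.of("num"), false, 10));      // true
    System.out.println(canAutoEnableTrim(keys, List.of("COUNT(*)"), false, 10)); // false
    System.out.println(canAutoEnableTrim(keys, List.of("num"), true, 10));       // false
  }
}
```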

bziobrowski · 2024-12-18T15:54:13Z

pinot-query-runtime/src/test/resources/queries/QueryHints.json

@@ -321,6 +321,10 @@
        "description": "aggregate with skip intermediate stage hint (via hint option is_partitioned_by_group_by_keys)",
        "sql": "SELECT /*+ aggOptions(is_partitioned_by_group_by_keys='true') */ {tbl1}.num, COUNT(*), SUM({tbl1}.val), SUM({tbl1}.num), COUNT(DISTINCT {tbl1}.val) FROM {tbl1} WHERE {tbl1}.val >= 0 AND {tbl1}.name != 'a' GROUP BY {tbl1}.num"
      },


Is there a test asserting that trimming affects results?
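Here is a hypothetical illustration of why such a test is useful. Trimming each segment's groups before merging can change the final ORDER BY COUNT(*) DESC LIMIT result. GroupTrimEffect is not Pinot code, and the trim size of 1 is deliberately tiny. In practice the trim size is usually larger than the limit, which makes this less likely but does not rule it out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical illustration (not Pinot code): trimming each segment's groups
 * to a small size before merging can change the top-N answer.
 */
public final class GroupTrimEffect {
  private GroupTrimEffect() {
  }

  /** Keeps the trimSize groups with the highest counts (ties broken by key). */
  static Map<String, Long> trim(Map<String, Long> groups, int trimSize) {
    return groups.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Long>comparingByKey()))
        .limit(trimSize)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
            (a, b) -> a, LinkedHashMap::new));
  }

  /** Sums per-segment counts; trimSize <= 0 means no trimming. */
  static Map<String, Long> merge(List<Map<String, Long>> segments, int trimSize) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> segment : segments) {
      Map<String, Long> kept = trimSize > 0 ? trim(segment, trimSize) : segment;
      kept.forEach((key, count) -> merged.merge(key, count, Long::sum));
    }
    return merged;
  }

  /** ORDER BY COUNT(*) DESC LIMIT limit over the merged groups. */
  static List<String> topKeys(Map<String, Long> merged, int limit) {
    return new ArrayList<>(trim(merged, limit).keySet());
  }

  public static void main(String[] args) {
    List<Map<String, Long>> segments = List.of(
        Map.of("x", 5L, "y", 4L),
        Map.of("z", 6L, "y", 4L));
    System.out.println(topKeys(merge(segments, 0), 1)); // [y]  (exact: y=8, z=6, x=5)
    System.out.println(topKeys(merge(segments, 1), 1)); // [z]  (y trimmed from both segments)
  }
}
```

A test along these lines would run the hinted query on data where the correct top group does not lead in any single segment, then assert on the result that is expected once trimming is applied.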

Jackie-Jiang added the labels enhancement, documentation, Configuration (Config changes: addition/deletion/change in behavior), and multi-stage (Related to the multi-stage query engine) Dec 16, 2024

Jackie-Jiang requested a review from xiangfu0 December 16, 2024 08:32

Jackie-Jiang force-pushed the aggregate_limit_push_down branch from 9eaf163 to e6d0b43 December 16, 2024 18:26

Support is_enable_group_trim agg option

7c21d95

Jackie-Jiang force-pushed the aggregate_limit_push_down branch from e6d0b43 to 7c21d95 December 16, 2024 19:26

gortiz requested review from yashmayya and gortiz December 18, 2024 07:23

gortiz approved these changes Dec 18, 2024


bziobrowski reviewed Dec 18, 2024
