Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Multi-stage] Support is_enable_group_trim agg option #14664

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Jackie-Jiang
Copy link
Contributor

Introduce is_enable_group_trim as an agg option to enable group trim in the leaf stage.
Group trim can be enabled when there is order by and limit.

E.g.

SELECT /*+ aggOptions(is_partitioned_by_group_by_keys='true', is_enable_group_trim='true') */ {tbl1}.num, COUNT(*), SUM({tbl1}.val), SUM({tbl1}.num), COUNT(DISTINCT {tbl1}.val) FROM {tbl1} WHERE {tbl1}.val >= 0 AND {tbl1}.name != 'a' GROUP BY {tbl1}.num ORDER BY COUNT(*) DESC LIMIT 1

@Jackie-Jiang Jackie-Jiang added enhancement documentation Configuration Config changes (addition/deletion/change in behavior) multi-stage Related to the multi-stage query engine labels Dec 16, 2024
@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 74.01575% with 33 lines in your changes missing coverage. Please review.

Project coverage is 64.01%. Comparing base (59551e4) to head (7c21d95).
Report is 1477 commits behind head on master.

Files with missing lines Patch % Lines
...el/rules/PinotAggregateExchangeNodeInsertRule.java 72.50% 10 Missing and 12 partials ⚠️
...he/pinot/query/planner/explain/PlanNodeMerger.java 0.00% 4 Missing ⚠️
.../query/planner/logical/EquivalentStagesFinder.java 0.00% 3 Missing ⚠️
...he/pinot/query/planner/plannode/AggregateNode.java 66.66% 0 Missing and 3 partials ⚠️
.../runtime/plan/server/ServerPlanRequestVisitor.java 94.11% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14664      +/-   ##
============================================
+ Coverage     61.75%   64.01%   +2.26%     
- Complexity      207     1606    +1399     
============================================
  Files          2436     2703     +267     
  Lines        133233   148997   +15764     
  Branches      20636    22839    +2203     
============================================
+ Hits          82274    95387   +13113     
- Misses        44911    46622    +1711     
- Partials       6048     6988     +940     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.98% <74.01%> (+2.27%) ⬆️
java-21 63.89% <74.01%> (+2.26%) ⬆️
skip-bytebuffers-false 63.98% <74.01%> (+2.24%) ⬆️
skip-bytebuffers-true 63.87% <74.01%> (+36.14%) ⬆️
temurin 64.01% <74.01%> (+2.26%) ⬆️
unittests 64.01% <74.01%> (+2.26%) ⬆️
unittests1 56.36% <74.01%> (+9.46%) ⬆️
unittests2 34.41% <7.08%> (+6.68%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang Jackie-Jiang force-pushed the aggregate_limit_push_down branch from 9eaf163 to e6d0b43 Compare December 16, 2024 18:26
@Jackie-Jiang Jackie-Jiang force-pushed the aggregate_limit_push_down branch from e6d0b43 to 7c21d95 Compare December 16, 2024 19:26
@gortiz gortiz requested review from yashmayya and gortiz December 18, 2024 07:23
@gortiz
Copy link
Contributor

gortiz commented Dec 18, 2024

cc @bziobrowski

List<RexNode> projects = projectRel.getProjects();
List<RelFieldCollation> collations = sortRel.getCollation().getFieldCollations();
if (collations.isEmpty()) {
// Cannot enable group trim without sort key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is required for within segment trimming right? Is it an actual requirement for cross segment trimming?

Anyway, I'm not sure if we should make this kind of decisions here. Couldn't we populate the fields and let the actual operator decide whether an optimization can be applied or not?

More specifically: Don't you think it may be interesting to keep the limit even if we don't have collations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having non-empty collation is required for trimming within segment and cross-segment but it could be beneficial to propagate limit even if collation is missing.
That is because when limit is present (!= Integer.MAX_VALUE) combine operator might limit the number of group by keys in indexed table.

Comment on lines +88 to +90
public class PinotAggregateExchangeNodeInsertRule {

public static class SortProjectAggregate extends RelOptRule {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the javadoc in the main class applies to all rules, it would be cool to add a javadoc for each rule to explain what they do (and maybe add an example)

}

@Override
public void onMatch(RelOptRuleCall call) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Calcite we have onMatch and matches methods. This method seems to be doing both. I think it would be better to override matches to do not apply the rule in all the cases that are filtered out here.

For the final query is the same and we would need to repeat some code, but the advantage of using matches is that if try to debug which rules apply (using MarkerFilter, as explained in https://youtu.be/_phzRNCWJfw?si=hH9ukXAS2Iml11nq&t=331).

I've just created #14680 to do not forget how to enable these logs.

@@ -249,6 +249,39 @@
"\n LogicalTableScan(table=[[default, a]])",
"\n"
]
},
Copy link
Contributor

@bziobrowski bziobrowski Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR meant to enable trimming within (minSegmentTrim) and across segments (minServerTrim) ?
If so - It'd be good to mention that :

  • the former is disabled by default and requires using query option (or cluster setting)
  • the latter is enabled by default but probably haven't been applied so far

I reckon It'd be good to also test this hint has on a query without order by and/or limit clause.

Apart from manual hinting I think we could propagate limit and order by details to leaf plans and enable both types of trimming if:

  • order by is based on group by key(s) only (not aggregates)
  • there is no HAVING clause

@@ -321,6 +321,10 @@
"description": "aggregate with skip intermediate stage hint (via hint option is_partitioned_by_group_by_keys)",
"sql": "SELECT /*+ aggOptions(is_partitioned_by_group_by_keys='true') */ {tbl1}.num, COUNT(*), SUM({tbl1}.val), SUM({tbl1}.num), COUNT(DISTINCT {tbl1}.val) FROM {tbl1} WHERE {tbl1}.val >= 0 AND {tbl1}.name != 'a' GROUP BY {tbl1}.num"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test asserting that trimming affects results ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) documentation enhancement multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants