-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Multi-stage] Support is_enable_group_trim agg option #14664
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14664 +/- ##
============================================
+ Coverage 61.75% 64.01% +2.26%
- Complexity 207 1606 +1399
============================================
Files 2436 2703 +267
Lines 133233 148997 +15764
Branches 20636 22839 +2203
============================================
+ Hits 82274 95387 +13113
- Misses 44911 46622 +1711
- Partials 6048 6988 +940
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
9eaf163
to
e6d0b43
Compare
e6d0b43
to
7c21d95
Compare
cc @bziobrowski |
List<RexNode> projects = projectRel.getProjects(); | ||
List<RelFieldCollation> collations = sortRel.getCollation().getFieldCollations(); | ||
if (collations.isEmpty()) { | ||
// Cannot enable group trim without sort key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this is required for within segment trimming right? Is it an actual requirement for cross segment trimming?
Anyway, I'm not sure if we should make this kind of decisions here. Couldn't we populate the fields and let the actual operator decide whether an optimization can be applied or not?
More specifically: Don't you think it may be interesting to keep the limit even if we don't have collations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having non-empty collation is required for trimming within segment and cross-segment but it could be beneficial to propagate limit even if collation is missing.
That is because when limit is present (!= Integer.MAX_VALUE) combine operator might limit the number of group by keys in indexed table.
public class PinotAggregateExchangeNodeInsertRule { | ||
|
||
public static class SortProjectAggregate extends RelOptRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the javadoc in the main class applies to all rules, it would be cool to add a javadoc for each rule to explain what they do (and maybe add an example)
} | ||
|
||
@Override | ||
public void onMatch(RelOptRuleCall call) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Calcite we have onMatch
and matches
methods. This method seems to be doing both. I think it would be better to override matches
to do not apply the rule in all the cases that are filtered out here.
For the final query is the same and we would need to repeat some code, but the advantage of using matches is that if try to debug which rules apply (using MarkerFilter, as explained in https://youtu.be/_phzRNCWJfw?si=hH9ukXAS2Iml11nq&t=331).
I've just created #14680 to do not forget how to enable these logs.
@@ -249,6 +249,39 @@ | |||
"\n LogicalTableScan(table=[[default, a]])", | |||
"\n" | |||
] | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this PR meant to enable trimming within (minSegmentTrim) and across segments (minServerTrim) ?
If so - It'd be good to mention that :
- the former is disabled by default and requires using query option (or cluster setting)
- the latter is enabled by default but probably haven't been applied so far
I reckon It'd be good to also test this hint has on a query without order by and/or limit clause.
Apart from manual hinting I think we could propagate limit and order by details to leaf plans and enable both types of trimming if:
- order by is based on group by key(s) only (not aggregates)
- there is no HAVING clause
@@ -321,6 +321,10 @@ | |||
"description": "aggregate with skip intermediate stage hint (via hint option is_partitioned_by_group_by_keys)", | |||
"sql": "SELECT /*+ aggOptions(is_partitioned_by_group_by_keys='true') */ {tbl1}.num, COUNT(*), SUM({tbl1}.val), SUM({tbl1}.num), COUNT(DISTINCT {tbl1}.val) FROM {tbl1} WHERE {tbl1}.val >= 0 AND {tbl1}.name != 'a' GROUP BY {tbl1}.num" | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test asserting that trimming affects results ?
Introduce
is_enable_group_trim
as an agg option to enable group trim in the leaf stage.Group trim can be enabled when there is order by and limit.
E.g.