Add group by trimming to MSQE/V2 query engine #14727

Merged
merged 10 commits into from
Jan 14, 2025

Conversation

bziobrowski
Contributor

@bziobrowski bziobrowski commented Dec 30, 2024

This PR adds the following to the multi-stage query engine (MSQE):

  • group_trim_size hint: enables trimming at the aggregate operator stage when the query has both ORDER BY and LIMIT (currently this also requires the is_enable_group_trim hint). Note: is_enable_group_trim also enables v1-style trimming of leaf-stage group-by results. See the grouping algorithm documentation for details.
  • error_on_num_groups_limit hint or errorOnNumGroupsLimit query option: throws an exception when num_groups_limit is reached in the aggregate operator, instead of setting a metadata flag.
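
As a rough illustration of the second bullet, the Java sketch below contrasts the two behaviors: setting a flag versus throwing. It is not Pinot's code. The class NumGroupsLimitSketch, its Result holder, and the countByKey method are hypothetical names, and the exact point at which the limit counts as "reached" (here, when a new group would push the group count past the limit) is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not Pinot's actual aggregate operator. */
final class NumGroupsLimitSketch {

  /** Hypothetical result holder: per-group counts plus a "limit reached" metadata flag. */
  static final class Result {
    final Map<String, Long> counts = new HashMap<>();
    boolean numGroupsLimitReached;
  }

  /**
   * Counts rows per group key, creating at most numGroupsLimit groups.
   * Assumption: the limit is "reached" when a new group would exceed it.
   */
  static Result countByKey(List<String> keys, int numGroupsLimit, boolean errorOnNumGroupsLimit) {
    Result result = new Result();
    for (String key : keys) {
      boolean isNewGroup = !result.counts.containsKey(key);
      if (isNewGroup && result.counts.size() >= numGroupsLimit) {
        if (errorOnNumGroupsLimit) {
          // Option enabled: fail the query instead of returning partial groups.
          throw new IllegalStateException("num_groups_limit reached: " + numGroupsLimit);
        }
        // Option disabled: record the flag and silently drop the new group.
        result.numGroupsLimitReached = true;
        continue;
      }
      result.counts.merge(key, 1L, Long::sum);
    }
    return result;
  }
}
```

With the option disabled, a caller has to inspect the flag to notice that some groups were dropped. With it enabled, the same input fails fast.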

Examples:

  • enable group by trimming in MSQE intermediate stage:
    Query:
select /*+  aggOptions(is_enable_group_trim='true',num_groups_limit='50') */ i, j, count(*) as cnt
from tab
group by i, j
order by i, j desc
limit 5

Execution plan:

LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], offset=[0], fetch=[5])
  PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1 DESC]], ...)
    LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], fetch=[5])
      PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL]...) <-- trimming happens here
        PinotLogicalExchange(distribution=[hash[0, 1]])
          LeafStageCombineOperator(table=[mytable])
            StreamingInstanceResponse
              CombineGroupBy
                GroupBy(groupKeys=[[i, j]], aggregations=[[count(*)]])
                  Project(columns=[[i, j]])
                    DocIdSet(maxDocs=[40000])
                      FilterMatchEntireSegment(numDocs=[80])
  • enable group by trimming in MSQE leaf and intermediate stage:
    Query:
select /*+  aggOptions(is_enable_group_trim='true',group_trim_size='3') */ t1.i, t1.j, count(*) as cnt
 from tab t1
 join tab t2 on 1=1
 group by t1.i, t1.j
 order by t1.i asc, t1.j asc
 limit 5

Execution plan:

LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], offset=[0], fetch=[5])
  PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1]], isSortOnSender=[false], isSortOnReceiver=[true])
    LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[5])
      PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL], ...) <-- trimming happens here
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT()], aggType=[LEAF], ...) <-- trimming happens here
            LogicalJoin(condition=[true], joinType=[inner])
              PinotLogicalExchange(distribution=[random])
                LeafStageCombineOperator(table=[mytable])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[mytable], totalDocs=[80])
                        Project(columns=[[i, j]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[80])
              PinotLogicalExchange(distribution=[broadcast])
                LeafStageCombineOperator(table=[mytable])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[mytable], totalDocs=[80])
                        Transform(expressions=[['0']])
                          Project(columns=[[]])
                            DocIdSet(maxDocs=[40000])
                              FilterMatchEntireSegment(numDocs=[80])
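
To make the "trimming happens here" markers in the plans above more concrete, here is a minimal Java sketch of trimming at two levels: each partial (leaf-side) result is trimmed, the partials are merged, and the merged result is trimmed again. It is not Pinot's implementation. TwoStageTrimSketch, trim, and aggregate are hypothetical names, a single string stands in for the (i, j) group key, and ordering is ascending by group key only. Under those assumptions, and with a trim size no smaller than the limit, the trimmed result matches the untrimmed one. When ordering by an aggregated value instead, trimming partial results can drop groups that would have ranked higher after merging.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: hypothetical names, not Pinot's operators. */
final class TwoStageTrimSketch {

  /** Keeps the first trimSize groups in ascending key order and drops the rest. */
  static TreeMap<String, Long> trim(Map<String, Long> groups, int trimSize) {
    TreeMap<String, Long> kept = new TreeMap<>(groups);
    // Remove the largest keys until only trimSize groups remain.
    while (kept.size() > Math.max(trimSize, 0)) {
      kept.pollLastEntry();
    }
    return kept;
  }

  /**
   * Trims each partial count map, merges the survivors by summing counts,
   * then trims the merged result down to the query limit.
   */
  static TreeMap<String, Long> aggregate(List<Map<String, Long>> partials, int trimSize, int limit) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> partial : partials) {
      // Leaf-side trim: only the best trimSize groups of this partial are sent on.
      trim(partial, trimSize).forEach((key, count) -> merged.merge(key, count, Long::sum));
    }
    // Final-side trim: keep the groups that satisfy ORDER BY ... LIMIT.
    return trim(merged, limit);
  }
}
```

Calling aggregate with trimSize set to Integer.MAX_VALUE disables the leaf-side trim, which gives a simple way to compare trimmed and untrimmed results on the same input.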

cc @Jackie-Jiang @gortiz

@codecov-commenter

codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 59.18367% with 60 lines in your changes missing coverage. Please review.

Project coverage is 63.89%. Comparing base (59551e4) to head (94f20d6).
Report is 1570 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| ...inot/controller/helix/ControllerRequestClient.java | 0.00% | 22 Missing ⚠️ |
| ...ry/runtime/operator/MultistageGroupByExecutor.java | 52.63% | 17 Missing and 1 partial ⚠️ |
| ...inot/query/runtime/operator/AggregateOperator.java | 64.10% | 6 Missing and 8 partials ⚠️ |
| ...va/org/apache/pinot/query/runtime/QueryRunner.java | 42.85% | 1 Missing and 3 partials ⚠️ |
| .../pinot/query/service/dispatch/QueryDispatcher.java | 92.85% | 0 Missing and 1 partial ⚠️ |
| ...spi/utils/builder/ControllerRequestURLBuilder.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14727      +/-   ##
============================================
+ Coverage     61.75%   63.89%   +2.14%     
- Complexity      207     1612    +1405     
============================================
  Files          2436     2704     +268     
  Lines        133233   151088   +17855     
  Branches      20636    23342    +2706     
============================================
+ Hits          82274    96537   +14263     
- Misses        44911    47323    +2412     
- Partials       6048     7228    +1180     
| Flag | Coverage Δ |
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.84% <59.18%> (+2.13%) ⬆️ |
| java-21 | 63.75% <59.18%> (+2.12%) ⬆️ |
| skip-bytebuffers-false | 63.89% <59.18%> (+2.14%) ⬆️ |
| skip-bytebuffers-true | 63.70% <59.18%> (+35.97%) ⬆️ |
| temurin | 63.89% <59.18%> (+2.14%) ⬆️ |
| unittests | 63.89% <59.18%> (+2.14%) ⬆️ |
| unittests1 | 56.31% <69.60%> (+9.42%) ⬆️ |
| unittests2 | 34.15% <10.88%> (+6.42%) ⬆️ |


# Conflicts:
#	pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/logical/PinotLogicalAggregate.java
#	pinot-query-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java
#	pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/AggregateNode.java
#	pinot-query-planner/src/test/resources/queries/GroupByPlans.json
#	pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestVisitor.java
#	pinot-query-runtime/src/test/resources/queries/QueryHints.json
@gortiz gortiz merged commit b6904da into apache:master Jan 14, 2025
21 checks passed