-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: add rule to decorrelate union operators #131141
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r1, 10 of 10 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/norm/rules/decorrelate.opt
line 410 at r3 (raw file):
# TryDecorrelateUnion replaces a Union/UnionAll beneath a ScalarGroupBy with an # InnerJoin between two ScalarGroupBy operators. A Project operator coalesces
It's really a cross-join isn't it? A diagram of the transformation in this comment, either in an abbreviated opt plan format or SQL, might be helpful in explaining the rule, but up to you.
pkg/sql/opt/norm/testdata/rules/decorrelate
line 7595 at r3 (raw file):
│ └── j └── projections └── CASE WHEN COALESCE("?column?", "?column?") IS NOT NULL THEN 1 ELSE 0 END
nit: I realize it'll blow up the size of these plans, but a more verbose output might be nice, e.g., to see that the correct columns are coalesced here.
This commit makes a small improvement to the `EnsureKey` custom function (used in decorrelation rules), so that it can add passthrough columns to a `Project` operator in an effort to find a key. This can prevent rules from adding unnecessary `Ordinality` operators to the query plan. Epic: None Release note: None
This commit makes a small improvement to the subquery-hoisting rules so that hoisting an `EXISTS` subquery can often avoid projecting a new column to check for NULL values. This can allow other optimization rules to match later on. Epic: None Release note: None
This commit adds a new rule `TryDecorrelateUnion`, which matches on a `Union` or `UnionAll` operator in the input of a `ScalarGroupBy`. The `ScalarGroupBy` must have "any-not-null" semantics, meaning it produces an arbitrary non-null value from each input column. If these conditions are satisfied, the `Union` operator is replaced by an `InnerJoin` between two `ScalarGroupBy` operators. A `Project` coalesces columns from each side of the join to produce the final aggregated values. This transformation does not itself decorrelate the `Union` operators, but it does make it easier for other rules to do so. Release note: None Epic: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
pkg/sql/opt/norm/rules/decorrelate.opt
line 410 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
It's really a cross-join isn't it? A diagram of the transformation in this comment, either in an abbreviated opt plan format or SQL, might be helpful in explaining the rule, but up to you.
Done.
pkg/sql/opt/norm/testdata/rules/decorrelate
line 7595 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: I realize it'll blow up the size of these plans, but a more verbose output might be nice, e.g., to see that the correct columns are coalesced here.
Done. Turns out there's a show-columns
directive.
2653bec
to
1ec3568
Compare
bors r+ |
Post-merge thought: all these tests have a single column projected in the subquery. Should we have some tests with multiple columns, expressions on those columns, and column references instead of constants? E.g.:
|
opt: improve EnsureKey custom function
This commit makes a small improvement to the
EnsureKey
custom function(used in decorrelation rules), so that it can add passthrough columns to
a
Project
operator in an effort to find a key. This can prevent rulesfrom adding unnecessary
Ordinality
operators to the query plan.Epic: None
Release note: None
opt: improve exists subquery hoisting
This commit makes a small improvement to the subquery-hoisting rules so
that hoisting an
EXISTS
subquery can often avoid projecting a newcolumn to check for NULL values. This can allow other optimization rules
to match later on.
Epic: None
Release note: None
opt: add rule to decorrelate unions in EXISTS subqueries
This commit adds a new rule
TryDecorrelateUnion
, which matches on aUnion
orUnionAll
operator in the input of aScalarGroupBy
. TheScalarGroupBy
must have "any-not-null" semantics, meaning it producesan arbitrary non-null value from each input column.
If these conditions are satisfied, the
Union
operator is replaced by anInnerJoin
between twoScalarGroupBy
operators. AProject
coalescescolumns from each side of the join to produce the final aggregated values.
This transformation does not itself decorrelate the
Union
operators, butit does make it easier for other rules to do so.
Release note: None
Epic: None