Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add rule to decorrelate union operators #131141

Merged
merged 3 commits into from
Oct 3, 2024

Conversation

DrewKimball
Copy link
Collaborator

opt: improve EnsureKey custom function

This commit makes a small improvement to the EnsureKey custom function
(used in decorrelation rules), so that it can add passthrough columns to
a Project operator in an effort to find a key. This can prevent rules
from adding unnecessary Ordinality operators to the query plan.

Epic: None

Release note: None

opt: improve exists subquery hoisting

This commit makes a small improvement to the subquery-hoisting rules so
that hoisting an EXISTS subquery can often avoid projecting a new
column to check for NULL values. This can allow other optimization rules
to match later on.

Epic: None

Release note: None

opt: add rule to decorrelate unions in EXISTS subqueries

This commit adds a new rule TryDecorrelateUnion, which matches on a
Union or UnionAll operator in the input of a ScalarGroupBy. The
ScalarGroupBy must have "any-not-null" semantics, meaning it produces
an arbitrary non-null value from each input column.

If these conditions are satisfied, the Union operator is replaced by an
InnerJoin between two ScalarGroupBy operators. A Project coalesces
columns from each side of the join to produce the final aggregated values.

This transformation does not itself decorrelate the Union operators, but
it does make it easier for other rules to do so.

Release note: None

Epic: None

@DrewKimball DrewKimball requested review from mgartner and a team September 21, 2024 08:04
@DrewKimball DrewKimball requested a review from a team as a code owner September 21, 2024 08:04
Copy link

blathers-crl bot commented Sep 21, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Very nice!!!

Reviewed 6 of 6 files at r1, 10 of 10 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/opt/norm/rules/decorrelate.opt line 410 at r3 (raw file):


# TryDecorrelateUnion replaces a Union/UnionAll beneath a ScalarGroupBy with an
# InnerJoin between two ScalarGroupBy operators. A Project operator coalesces

It's really a cross-join isn't it? A diagram of the transformation in this comment, either in an abbreviated opt plan format or SQL, might be helpful in explaining the rule, but up to you.


pkg/sql/opt/norm/testdata/rules/decorrelate line 7595 at r3 (raw file):

 │              └── j
 └── projections
      └── CASE WHEN COALESCE("?column?", "?column?") IS NOT NULL THEN 1 ELSE 0 END

nit: I realize it'll blow up the size of these plans, but a more verbose output might be nice, e.g., to see that the correct columns are coalesced here.

This commit makes a small improvement to the `EnsureKey` custom function
(used in decorrelation rules), so that it can add passthrough columns to
a `Project` operator in an effort to find a key. This can prevent rules
from adding unnecessary `Ordinality` operators to the query plan.

Epic: None

Release note: None
This commit makes a small improvement to the subquery-hoisting rules so
that hoisting an `EXISTS` subquery can often avoid projecting a new
column to check for NULL values. This can allow other optimization rules
to match later on.

Epic: None

Release note: None
This commit adds a new rule `TryDecorrelateUnion`, which matches on a
`Union` or `UnionAll` operator in the input of a `ScalarGroupBy`. The
`ScalarGroupBy` must have "any-not-null" semantics, meaning it produces
an arbitrary non-null value from each input column.

If these conditions are satisfied, the `Union` operator is replaced by an
`InnerJoin` between two `ScalarGroupBy` operators. A `Project` coalesces
columns from each side of the join to produce the final aggregated values.

This transformation does not itself decorrelate the `Union` operators, but
it does make it easier for other rules to do so.

Release note: None

Epic: None
Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/opt/norm/rules/decorrelate.opt line 410 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

It's really a cross-join isn't it? A diagram of the transformation in this comment, either in an abbreviated opt plan format or SQL, might be helpful in explaining the rule, but up to you.

Done.


pkg/sql/opt/norm/testdata/rules/decorrelate line 7595 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: I realize it'll blow up the size of these plans, but a more verbose output might be nice, e.g., to see that the correct columns are coalesced here.

Done. Turns out there's a show-columns directive.

@DrewKimball
Copy link
Collaborator Author

bors r+

@craig craig bot merged commit 0c0af95 into cockroachdb:master Oct 3, 2024
23 checks passed
@DrewKimball DrewKimball deleted the decorrelate branch October 3, 2024 03:16
@mgartner
Copy link
Collaborator

mgartner commented Oct 4, 2024

pkg/sql/opt/norm/testdata/rules/decorrelate line 7801 at r6 (raw file):

# No-op because there's an Intersect instead of a Union.
norm expect-not=TryDecorrelateUnion format=(hide-all,show-columns)
SELECT *, CASE WHEN EXISTS (SELECT 1 FROM xy WHERE x = k INTERSECT SELECT 1 FROM uv WHERE v = k) THEN 1 ELSE 0 END FROM a;

Post-merge thought: all these tests have a single column projected in the subquery. Should we have some tests with multiple columns, expressions on those columns, and column references instead of constants? E.g.:

... EXISTS (SELECT 1, 2 ... UNION ...)

... EXISTS (SELECT x ... UNION ...)

... EXISTS (SELECT x + y ... UNION ...)

... EXISTS (SELECT x, y ... UNION ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants