[SPARK-51114] [SQL] Refactor PullOutNondeterministic rule #49837
Updated from cccf3a3 to e08b60b.
Please specify in the PR description that this is for the single-pass Analyzer.
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala (outdated, resolved)
Updated from e08b60b to ddd9b52.
Hi @MaxGekk, could you please review when you have time? Thanks!
...main/scala/org/apache/spark/sql/catalyst/analysis/NondeterministicExpressionCollection.scala (three review threads, outdated, resolved)
  nondeterministicToAttributes
}

def tryConvertNondeterministicToAttribute(
This feels too trivial to be a reusable helper. Please use `getOrDefault` in place.
`getOrDefault` can't be used here because the default value is of type `Expression`, but `getOrDefault` expects a `NamedExpression`. I would leave it as it is.
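To illustrate the type mismatch described above, here is a minimal sketch using hypothetical, simplified stand-ins for Catalyst's `Expression` and `NamedExpression` (not the real classes). It assumes the lookup table is a `java.util.HashMap` keyed by `Expression` with `NamedExpression` values; `TryConvertSketch` and `tryConvert` are illustrative names, not the PR's code.

```scala
import java.util.{HashMap => JHashMap}

// Hypothetical, simplified stand-ins for Catalyst's expression hierarchy.
sealed trait Expression
sealed trait NamedExpression extends Expression
case class Rand(seed: Long) extends Expression
case class Literal(value: Int) extends Expression
case class AttributeReference(name: String) extends NamedExpression

object TryConvertSketch {
  // Assumed shape: each pulled-out nondeterministic expression maps to the attribute replacing it.
  val nondeterToAttr = new JHashMap[Expression, NamedExpression]()
  nondeterToAttr.put(Rand(42L), AttributeReference("_nondeterministic"))

  // Does not compile: getOrDefault's default must have the value type (NamedExpression),
  // but the fallback `e` is only known to be an Expression.
  //   def convert(e: Expression): Expression = nondeterToAttr.getOrDefault(e, e)

  // Hypothetical helper: return the replacement attribute if one exists, else `e` unchanged.
  def tryConvert(e: Expression): Expression = {
    val attr = nondeterToAttr.get(e)
    if (attr != null) attr else e
  }
}
```

Spelling the fallback out in a small helper keeps the result typed as `Expression`, which is the reason given above for keeping it.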
-    val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child)
+    val nondeterToAttr =
+      NondeterministicExpressionCollection.getNondeterministicToAttributes(a.groupingExpressions)
+    val newChild = Project(a.child.output ++ nondeterToAttr.values.asScala.toSeq, a.child)
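The rewrite in the diff above can be sketched end to end on a toy plan model. Everything here is hypothetical and simplified (there are no expression IDs, so aliases get numbered names, and `Aggregate` carries only grouping expressions): a shared helper object collects nondeterministic grouping expressions into a `java.util.LinkedHashMap`, and the rule evaluates them once in a child `Project` and then groups by the resulting attributes. Only the overall shape mirrors the PR; none of these names are Catalyst's API.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.jdk.CollectionConverters._

// Toy expression model (not Catalyst).
sealed trait Expression { def deterministic: Boolean = true }
case class Attr(name: String) extends Expression
case class Rand(seed: Long) extends Expression { override def deterministic: Boolean = false }
case class Alias(child: Expression, name: String) extends Expression {
  override def deterministic: Boolean = child.deterministic
  def toAttribute: Attr = Attr(name)
}

// Toy plan model (not Catalyst).
sealed trait Plan { def output: Seq[Attr] }
case class Relation(output: Seq[Attr]) extends Plan
case class Project(projectList: Seq[Expression], child: Plan) extends Plan {
  def output: Seq[Attr] = projectList.map {
    case a: Attr => a
    case al: Alias => al.toAttribute
    case other => throw new IllegalArgumentException(s"unnamed expression: $other")
  }
}
case class Aggregate(groupingExpressions: Seq[Expression], child: Plan) extends Plan {
  def output: Seq[Attr] = groupingExpressions.collect { case a: Attr => a }
}

// Hypothetical shared helper, in the spirit of NondeterministicExpressionCollection:
// callable from the fixed-point rule and from a single-pass resolver alike.
object NondeterministicCollection {
  def getNondeterministicToAttributes(exprs: Seq[Expression]): JLinkedHashMap[Expression, Alias] = {
    val result = new JLinkedHashMap[Expression, Alias]()
    exprs.filterNot(_.deterministic).foreach { e =>
      if (!result.containsKey(e)) result.put(e, Alias(e, s"_nondeterministic${result.size()}"))
    }
    result
  }
}

object PullOutNondeterministicSketch {
  def apply(a: Aggregate): Aggregate = {
    val nondeterToAttr =
      NondeterministicCollection.getNondeterministicToAttributes(a.groupingExpressions)
    if (nondeterToAttr.isEmpty) a
    else {
      // Evaluate each nondeterministic grouping expression once, in a Project below the Aggregate.
      val newChild = Project(a.child.output ++ nondeterToAttr.values.asScala.toSeq, a.child)
      // Group by the produced attributes instead of the original expressions.
      val newGrouping = a.groupingExpressions.map { e =>
        val alias = nondeterToAttr.get(e)
        if (alias != null) alias.toAttribute else e
      }
      Aggregate(newGrouping, newChild)
    }
  }
}
```

For example, grouping `Relation(Seq(Attr("a")))` by `Attr("a")` and `Rand(1L)` yields a child `Project` that outputs `a` plus `Alias(Rand(1L), "_nondeterministic0")`, and the aggregate then groups by `Attr("_nondeterministic0")`.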
This is a separate issue (unrelated to this refactoring), but the `Project` list order here is non-deterministic. We should switch to a `LinkedHashMap` in a separate PR.
Somewhat similar to #49319.
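The ordering concern can be seen with plain Java maps. In this minimal sketch, string keys stand in for expressions and `aliasesInIterationOrder` is a hypothetical helper; it is not the follow-up PR's actual change.

```scala
import java.util.{Map => JMap}
import scala.jdk.CollectionConverters._

object ProjectListOrder {
  // Stand-in for appending aliases to the child's output: insert keys in a fixed order,
  // then read the values back in whatever order the map iterates them.
  def aliasesInIterationOrder(map: JMap[String, String], keys: Seq[String]): Seq[String] = {
    keys.foreach(k => map.put(k, s"alias_$k"))
    map.values().asScala.toSeq
  }
}
```

With a `java.util.LinkedHashMap` the aliases come back in insertion order, so the resulting `Project` list is stable; with a plain `java.util.HashMap` the order follows the keys' hash buckets and is not guaranteed.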
Yeah, makes sense. I'll make a follow-up after this one is merged.
+1
Updated from ddd9b52 to 460f8e0.
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala (resolved)
Updated from 460f8e0 to 62c97dd.
    }
    e -> ne
  }
}.toMap
Shall we also keep the previous code that creates an immutable map?
Done.
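For comparison, an immutable-map variant in the style of the `e -> ne ... }.toMap` fragment above might look like the sketch below. It uses hypothetical toy types (not Catalyst's) and numbers the alias names only because the toy model has no expression IDs; it is an assumption about the general shape of the previous code, not a copy of it.

```scala
// Toy expression model (not Catalyst).
sealed trait Expression { def deterministic: Boolean = true }
case class Attr(name: String) extends Expression
case class Rand(seed: Long) extends Expression { override def deterministic: Boolean = false }
case class Alias(child: Expression, name: String) extends Expression

object ImmutableNondeterministicMap {
  // Hypothetical: pair each distinct nondeterministic expression with an alias,
  // then freeze the pairs into an immutable Map.
  def nondeterministicToAttributes(exprs: Seq[Expression]): Map[Expression, Alias] =
    exprs.filterNot(_.deterministic).distinct.zipWithIndex.map { case (e, i) =>
      e -> Alias(e, s"_nondeterministic$i")
    }.toMap
}
```

Scala's default immutable `Map` does not promise insertion order once it holds more than four entries, so callers that need a stable order would still want the `LinkedHashMap`-backed variant or a `ListMap`.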
Updated from 62c97dd to b200cf9.
LGTM
The AQE test failure is unrelated. Thanks, merging to master!
What changes were proposed in this pull request?
Refactor the PullOutNondeterministic rule body so it can be reused in the single-pass analyzer.
Why are the changes needed?
Better reusability of the PullOutNondeterministic rule.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests (this is a pure refactoring).
Was this patch authored or co-authored using generative AI tooling?
No.