[SPARK-49992][SQL] Default collation resolution for DDL and DML queries #48962
Closed
stefankandic wants to merge 60 commits into apache:master from stefankandic:fixSessionCollationOrder
cloud-fan reviewed Nov 28, 2024
...talyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala
cloud-fan reviewed Nov 29, 2024
...talyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala
cloud-fan approved these changes Nov 29, 2024
The last commit changes only a code comment, so we don't need to test it again. I'm merging it to master, thanks!
cloud-fan pushed a commit that referenced this pull request Feb 13, 2025
… apply object level collation for DDL queries
### What changes were proposed in this pull request?
This PR is a partial revert of the original PR #48962, which introduced the resolution of the default session-level collation for DDL and DML queries. The part that is reverted is the default collation resolution for DML queries. The part that is kept is the default collation resolution for DDL queries, which is required to apply the object-level collation introduced in PR #49084. As part of this logic, object-level collation is now applied to DDL queries accordingly, with the main logic implemented in the ResolveDefaultStringTypes.stringTypeForDDLCommand() method.
### Why are the changes needed?
There were unresolved technical issues when attempting to merge the functionality from PR #48962 on the Delta side, due to its effect on DML queries, so it was decided to pause this functionality for now and partially revert the unused parts to keep the code cleaner going forward. This is also in line with customer feedback, where object-level collation is a much more requested feature. The focus is therefore on resolving object-level collation for DDL queries instead, allowing the collation to be specified per table or view on creation or modification, and propagating the specified default collation to subsequent queries on top of those entities.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests that cover the collations functionality, as well as new dedicated tests for applying object-level collation to the underlying columns.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #49772 from dejankrak-db/revert-session-collations.
Lead-authored-by: Dejan Krakovic <[email protected]>
Co-authored-by: Stefan Kandic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Feb 13, 2025
… apply object level collation for DDL queries
(cherry picked from commit e92e12a)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR proposes not using session-level collation in DDL commands (create/alter view/table, add/replace columns).
Also, resolution of the default collation should happen in the analyzer and not in the parser. However, because we check for the default string type using reference equality with the `StringType` object, we cannot simply replace this object with `StringType("UTF8_BINARY")`: the two compare as equal, so the tree node framework will just return the old plan. Because of this, we have to perform the resolution in two steps, first changing the `StringType` object into a `TemporaryStringType` and then changing that back to `StringType("UTF8_BINARY")`, which is no longer considered a default string type.
In addition, the dependent rules `ResolveInlineTables` and `CollationTypeCoercion` are updated so that they don't execute while there are still unresolved string types in the plan.
Why are the changes needed?
The default collation for DDL commands should be tied to the object being created or altered (e.g., table, view, schema) rather than the session-level setting. Since object-level collations are not yet supported, we will assume the UTF8_BINARY collation by default for now.
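The two-step resolution described under "What changes were proposed in this pull request?" can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not the code from this PR: the `DataType`, `StringType`, `TemporaryStringType`, `Column`, `Plan`, and `transformTypes` definitions below are simplified, hypothetical stand-ins for the Catalyst classes and tree node framework, written to show why a one-step replacement is silently dropped when the default type is detected by reference equality but plans are compared by value.

```scala
object CollationSketch {
  // Simplified stand-ins for Spark's types; NOT the real Catalyst classes.
  sealed trait DataType

  class StringType(val collation: String) extends DataType {
    // Value equality: any two string types with the same collation are equal.
    override def equals(other: Any): Boolean = other match {
      case s: StringType => s.collation == collation
      case _ => false
    }
    override def hashCode(): Int = collation.hashCode
    override def toString: String = s"StringType($collation)"
  }

  // The singleton stands for "no collation specified" (the default string type).
  object StringType extends StringType("UTF8_BINARY") {
    def apply(collation: String): StringType = new StringType(collation)
  }

  // Hypothetical placeholder type that is never equal to a StringType.
  final case class TemporaryStringType(collation: String) extends DataType

  final case class Column(name: String, dataType: DataType)
  final case class Plan(columns: Seq[Column])

  // Default detection uses reference equality with the singleton.
  def isDefault(dt: DataType): Boolean = dt eq StringType

  def hasDefault(plan: Plan): Boolean =
    plan.columns.exists(c => isDefault(c.dataType))

  // Guard in the spirit of the dependent rules: skip while types are unresolved.
  def hasUnresolvedStringTypes(plan: Plan): Boolean =
    plan.columns.exists { c =>
      isDefault(c.dataType) || c.dataType.isInstanceOf[TemporaryStringType]
    }

  // Mimics a tree transform that returns the OLD plan when the result is equal to it.
  def transformTypes(plan: Plan)(rule: PartialFunction[DataType, DataType]): Plan = {
    val rewritten = Plan(plan.columns.map { c =>
      if (rule.isDefinedAt(c.dataType)) c.copy(dataType = rule(c.dataType)) else c
    })
    if (rewritten == plan) plan else rewritten
  }

  // One step: the replacement equals the original, so the old plan comes back.
  def naive(plan: Plan): Plan =
    transformTypes(plan) { case dt if isDefault(dt) => StringType("UTF8_BINARY") }

  // Two steps: go through the placeholder so each step yields an unequal plan.
  def twoPass(plan: Plan): Plan = {
    val temp = transformTypes(plan) {
      case dt if isDefault(dt) => TemporaryStringType("UTF8_BINARY")
    }
    transformTypes(temp) { case TemporaryStringType(c) => StringType(c) }
  }

  def main(args: Array[String]): Unit = {
    val plan = Plan(Seq(Column("c", StringType)))
    println(s"naive still default:    ${hasDefault(naive(plan))}")
    println(s"two-pass still default: ${hasDefault(twoPass(plan))}")
  }
}
```

In this model the one-step rewrite leaves the singleton in place because the rewritten plan compares equal to the original, while the two-step rewrite ends with a fresh `StringType("UTF8_BINARY")` instance that is equal in value but no longer the default object. The `hasUnresolvedStringTypes` check is a hypothetical helper showing the kind of guard a dependent rule could consult before running.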
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added new unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.