[SPARK-49992][SQL] Default collation resolution for DDL and DML queries #48962

Closed

stefankandic
Contributor

@stefankandic stefankandic commented Nov 25, 2024

What changes were proposed in this pull request?

This PR proposes not using session-level collation in DDL commands (create/alter view/table, add/replace columns).

Also, resolution of the default collation should happen in the analyzer and not in the parser. However, because we detect the default string type by reference equality with the StringType object, we cannot simply replace this object with StringType("UTF8_BINARY"): the two compare as equal, so the tree node framework just returns the old plan. Because of this we have to perform the resolution in two steps: first the StringType object is changed into a TemporaryStringType, and then that is changed back to StringType("UTF8_BINARY"), which is no longer considered a default string type.
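
To illustrate why a single replacement is lost and why the detour through a temporary type works, here is a minimal toy model in Python. It is not Spark code: the `transform` helper only imitates a tree node framework that keeps the old node when the rewritten node compares equal, and the class and function names other than StringType and TemporaryStringType are made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    collation: str = "UTF8_BINARY"


@dataclass(frozen=True)
class TemporaryStringType:
    collation: str


# Assumption: one shared instance stands in for the default StringType object.
DEFAULT_STRING = StringType()


def is_default(t):
    # Default-ness is decided by identity (reference equality), not by value.
    return t is DEFAULT_STRING


def transform(node, rule):
    # Imitates a framework that returns the old node if the rule's result is equal to it.
    new = rule(node)
    return node if new == node else new


def direct(t):
    # One-step attempt: replace the default with an explicit UTF8_BINARY type.
    return StringType("UTF8_BINARY") if is_default(t) else t


def to_temporary(t):
    # Step 1: move the default type to a class that never compares equal to StringType.
    return TemporaryStringType("UTF8_BINARY") if is_default(t) else t


def from_temporary(t):
    # Step 2: turn the temporary type into a fresh, explicit StringType.
    return StringType(t.collation) if isinstance(t, TemporaryStringType) else t


one_pass = transform(DEFAULT_STRING, direct)
two_pass = transform(transform(DEFAULT_STRING, to_temporary), from_temporary)

print(is_default(one_pass))  # True: the rewrite was discarded as a no-op
print(is_default(two_pass))  # False: equal by value, but a different object
```

In this model the one-step rewrite produces a new object that is equal to the old one, so the old (still default) object is kept. The two-step version changes the node to something unequal first, so each step is accepted.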

In addition, the dependent rules ResolveInlineTables and CollationTypeCoercion are updated so that they do not execute while there are still unresolved string types in the plan.
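
The guard on dependent rules can be pictured with a small Python sketch. This is a toy model under stated assumptions, not the Spark implementation: `Plan`, `Column`, `dependent_rule`, and `analyze` are hypothetical names, a collation of `None` stands in for an unresolved string type, and the `coerced` flag is only a placeholder for whatever work a rule such as ResolveInlineTables or CollationTypeCoercion would do.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    # None models a string type whose default collation is not resolved yet.
    collation: Optional[str]


@dataclass(frozen=True)
class Plan:
    columns: Tuple[Column, ...]
    coerced: bool = False  # placeholder marker: has the dependent rule run?


def has_unresolved_string_types(plan):
    return any(c.collation is None for c in plan.columns)


def resolve_default_string_types(plan, default="UTF8_BINARY"):
    # Fill in the default collation wherever none has been resolved.
    cols = tuple(
        Column(c.name, default if c.collation is None else c.collation)
        for c in plan.columns
    )
    return replace(plan, columns=cols)


def dependent_rule(plan):
    # Guard: leave the plan untouched while unresolved string types remain.
    if has_unresolved_string_types(plan):
        return plan
    return replace(plan, coerced=True)


def analyze(plan, rules, max_iterations=10):
    # Apply the rules repeatedly until a full pass no longer changes the plan.
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan
    return plan


plan0 = Plan((Column("a", None), Column("b", "UNICODE")))
# The dependent rule is listed first on purpose: it is skipped in the first pass.
result = analyze(plan0, [dependent_rule, resolve_default_string_types])
```

Even with the dependent rule ordered first, it does nothing on the first pass and only takes effect once every string type carries a collation.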

Why are the changes needed?

The default collation for DDL commands should be tied to the object being created or altered (e.g., table, view, schema) rather than the session-level setting. Since object-level collations are not yet supported, we will assume the UTF8_BINARY collation by default for now.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@cloud-fan
Contributor

The last commit only changes a code comment, so we don't need to test it again. I'm merging it to master, thanks!

@cloud-fan cloud-fan closed this in b45045e Nov 29, 2024
cloud-fan pushed a commit that referenced this pull request Feb 13, 2025
… apply object level collation for DDL queries

### What changes were proposed in this pull request?

This PR is a partial revert of the original PR #48962, which introduced resolution of the default session-level collation for DDL and DML queries.
The reverted part is the default collation resolution for DML queries. The part that is kept is the default collation resolution for DDL queries, which is required to apply the object-level collation introduced in PR #49084. As part of this logic, object-level collation is now applied to DDL queries, with the main logic implemented in the ResolveDefaultStringTypes.stringTypeForDDLCommand() method.
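
As a rough illustration of the idea, the following Python sketch picks the collation for a string column in a DDL command. It is not the stringTypeForDDLCommand() method and does not reflect its signature: the function name is hypothetical, and the precedence order (explicit column collation, then the object's default, then UTF8_BINARY) is an assumption made for the example. The session-level collation is deliberately not an input.

```python
from typing import Optional

# UTF8_BINARY is the collation assumed when nothing else is specified.
FALLBACK_COLLATION = "UTF8_BINARY"


def collation_for_ddl_column(
    explicit: Optional[str], object_default: Optional[str]
) -> str:
    # Hypothetical helper: choose the collation of one string column in a DDL command.
    if explicit is not None:
        # Assumption: a collation written on the column itself is kept as-is.
        return explicit
    if object_default is not None:
        # Otherwise use the default collation of the table or view being created or altered.
        return object_default
    return FALLBACK_COLLATION


with_object_default = collation_for_ddl_column(None, "UNICODE")
with_explicit = collation_for_ddl_column("UTF8_LCASE", "UNICODE")
with_nothing = collation_for_ddl_column(None, None)
```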

### Why are the changes needed?

Because there were unresolved technical issues when attempting to merge the functionality from PR #48962 on the Delta side, due to its effect on DML queries, it was decided to pause this functionality for now and to partially revert the unused parts to keep the code cleaner going forward.
This is also in line with customer feedback, where object-level collation is a much more requested feature. The focus is therefore on introducing the resolution of object-level collation for DDL queries instead, which allows a collation to be specified per table or view on creation or modification and propagates the specified default collation to subsequent queries on top of those entities.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests that cover the collation functionality, as well as new dedicated tests for applying object-level collation to the underlying columns.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #49772 from dejankrak-db/revert-session-collations.

Lead-authored-by: Dejan Krakovic
Co-authored-by: Stefan Kandic
Signed-off-by: Wenchen Fan
cloud-fan pushed a commit that referenced this pull request Feb 13, 2025
… apply object level collation for DDL queries

(cherry picked from commit e92e12a)
Signed-off-by: Wenchen Fan
2 participants