Spark 4.0: Add schema conversion support for default values #14407
Conversation
Overall I think this looks pretty good for the first part of respecting default values in Spark. Just some minor comments.
Thank you @geruh !
Review comments were left on:
- spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/TypeToSparkType.java
- spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/TestSparkSchemaUtil.java
- spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSparkDefaultValues.java
@geruh Thanks for the PR! It looks good to me overall. Just left a few minor comments.
amogh-jahagirdar left a comment:
Thanks @geruh, I'll leave it up for a bit in case anyone else has any comments.
Thanks @geruh and @huaxingao for reviewing!
This PR adds support for default values in Spark. When an Iceberg schema is converted to Spark's StructType, default values are now passed through to Spark's column metadata using the `CURRENT_DEFAULT` and `EXISTS_DEFAULT` keys that Spark recognizes. The changes extend the `TypeToSparkType` converter to extract default values from Iceberg fields and convert them to Spark SQL string representations, enabling Spark to understand and use the defaults defined in Iceberg. Tests for initial defaults weren't added here since that functionality already works without these changes; I'll follow up with additions to this new test suite.
Note: The current tests focus on default write capabilities, as partial-column inserts for DSv2 tables aren't available until Spark 4.1.0 per apache/spark#50044.
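To make the metadata pass-through concrete, here is a minimal sketch in plain Java (no Spark dependency) of how a field's default, once rendered as a Spark SQL literal string, can be recorded under the two metadata keys Spark's default-value resolution looks for. The `defaultMetadata` helper and class name are hypothetical, used only to illustrate the mapping; the actual PR attaches these keys to Spark `StructField` metadata inside `TypeToSparkType`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DefaultValueMetadataSketch {
  // Metadata keys Spark recognizes for column defaults.
  static final String CURRENT_DEFAULT = "CURRENT_DEFAULT";
  static final String EXISTS_DEFAULT = "EXISTS_DEFAULT";

  // Hypothetical helper: given a default rendered as a Spark SQL literal
  // (e.g. "42" or "'unknown'"), build the column-metadata entries.
  static Map<String, String> defaultMetadata(String sqlLiteral) {
    Map<String, String> metadata = new LinkedHashMap<>();
    metadata.put(CURRENT_DEFAULT, sqlLiteral); // used when new rows omit the column
    metadata.put(EXISTS_DEFAULT, sqlLiteral);  // assumed for rows written before the column existed
    return metadata;
  }

  public static void main(String[] args) {
    System.out.println(defaultMetadata("42"));
  }
}
```

In the real conversion, `CURRENT_DEFAULT` would come from the field's write default and `EXISTS_DEFAULT` from its initial default; the sketch above collapses the two into one literal for brevity.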