Spark 3.3: Backport support for default values #11988
Conversation
Force-pushed from 526f807 to 8f4c8e5
@@ -65,7 +91,7 @@ public abstract class AvroDataTest {
          required(117, "dec_38_10", Types.DecimalType.of(38, 10)) // Spark's maximum precision
      );

-  @Rule public TemporaryFolder temp = new TemporaryFolder();
+  @TempDir protected Path temp;
As I mentioned on the PR for 3.4, this JUnit 4 temp folder wasn't working for JUnit 5 parameterized tests. I made some tests independent of this (to keep the backport small) and ended up porting subclasses of AvroDataTest to JUnit 5 in a larger backport.
These test changes were the only significant deviations from the original PRs.
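For context, a minimal sketch of the difference (the class and method names below are made up, not the actual AvroDataTest code): a JUnit 4 @Rule is only applied by the JUnit 4 runner, so under the Jupiter engine the temporary folder is never created, while @TempDir is injected by JUnit 5 itself and works with @ParameterizedTest.

// Illustrative only; ExampleAvroDataTest and writesToTempDir are hypothetical names.
import java.nio.file.Path;
import org.junit.jupiter.api.io.TempDir;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class ExampleAvroDataTest {
  // JUnit 4 equivalent: @Rule public TemporaryFolder temp = new TemporaryFolder();
  // That rule is ignored by the Jupiter engine, so the folder is never created.
  // JUnit 5 injects a fresh temporary directory here before each test:
  @TempDir protected Path temp;

  @ParameterizedTest
  @ValueSource(strings = {"avro", "parquet"})
  void writesToTempDir(String format) {
    Path file = temp.resolve("data." + format);
    // write test records to `file` and read them back here
  }
}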
              DateTimeUtil.isoTimestamptzToMicros("2024-12-17T23:59:59.999999+00:00")),
      // Arguments.of(
      //     Types.TimestampType.withoutZone(),
      //     DateTimeUtil.isoTimestampToMicros("2024-12-17T23:59:59.999999")),
Spark 3.3 doesn't support TimestampNTZ without a flag, so this 3.3 backport doesn't remove withSQLConf below or the testTimestampWithoutZone case. It also doesn't use TimestampType.withoutZone() in default tests or in tests that use SUPPORTED_PRIMITIVES.
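For readers unfamiliar with the utility, here is a rough sketch of what the retained withSQLConf wrapping accomplishes: set a session config, run the test body, then restore the previous values. This is not the Iceberg test utility itself, and the property key shown is an assumption about the Spark 3.3 integration rather than something taken from this PR.

// Sketch of a withSQLConf-style helper; names and the flag key are assumptions.
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.SparkSession;

public class WithSqlConfSketch {
  // Apply the given configs, run the action, then restore the previous session state.
  static void withSQLConf(SparkSession spark, Map<String, String> conf, Runnable action) {
    Map<String, String> previous = new HashMap<>();
    conf.forEach(
        (key, value) -> {
          if (spark.conf().contains(key)) {
            previous.put(key, spark.conf().get(key));
          }
          spark.conf().set(key, value);
        });
    try {
      action.run();
    } finally {
      conf.keySet()
          .forEach(
              key -> {
                if (previous.containsKey(key)) {
                  spark.conf().set(key, previous.get(key));
                } else {
                  spark.conf().unset(key);
                }
              });
    }
  }

  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().master("local[1]").getOrCreate();
    withSQLConf(
        spark,
        // Assumed flag name for the Spark 3.3 module; Spark 3.4+ reads TimestampNTZ natively.
        Map.of("spark.sql.iceberg.handle-timestamp-without-timezone", "true"),
        () -> {
          // queries over timestamp-without-zone columns would run here
        });
    spark.stop();
  }
}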
Do we still want to backport new features to Spark 3.3 given its support is deprecated?
I think it is best to keep the Spark versions as close as possible 👍
Here's what we say on "Deprecated".
Isn't this PR to achieve feature parity?
Yeah, I think @manuzhang is technically correct; generally we wouldn't backport to 3.3, and we'd remove 3.3 support in the 1.9 release anyway.
All that said, I'm not really opposed to getting this in for 3.3 (though I'd say we should document that it is supported for 3.3) unless there are strong objections. Going forward we should just be mindful of this, to make sure we don't increase our maintenance burden.
@manuzhang, I think this is a good idea. While we don't really expect people to use default values yet, Spark versions stay around a long time. Having this support helps ensure that there aren't correctness issues when people use this version with Spark 3.3 a few years from now. It's not strictly necessary, but since it wasn't very difficult (just porting the 3.4 changes) I thought it would be a good idea to do it. If you're against it, we can discuss more.
@amogh-jahagirdar @rdblue I agree with your rationale, but I'm confused about the criteria here. Shall we backport other features from 3.4 / 3.5 since they are also nice and not difficult to have? It might also be confusing to contributors / users that the meaning of deprecation seems arbitrary.
@manuzhang, this could be a correctness issue with Spark 3.3 and v3 tables, so I think it is an important fix. The language you're referencing is also trying to set expectations for other people, not limit what we will commit:

"People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity."

I'm the one interested in backporting this to avoid potential problems, but there should still not be an expectation that the Iceberg community will backport everything just because the branch is still there.
Thanks for the explanation @rdblue, I missed the statement in the docs:

"People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity."

Given that, and that default values probably should go in to avoid any future correctness issues if people use this version with Spark 3.3, I think it makes sense to get this in.
This backports support for default values from 3.5.
Each PR is backported as a separate commit: #11299, #11803, #11811, #11815, and #11832.
This contains the same changes as #11987.