Spark 3.3: Backport support for default values #11988

Merged
merged 5 commits into from
Jan 21, 2025

rdblue
Contributor

@rdblue rdblue commented Jan 17, 2025

This backports support for default values from 3.5.

Each PR is backported as a separate commit: #11299, #11803, #11811, #11815, and #11832.

This contains the same changes as #11987.

@github-actions github-actions bot added the spark label Jan 17, 2025
@rdblue rdblue force-pushed the spark-3.3-default-values branch from 526f807 to 8f4c8e5 Compare January 17, 2025 03:26
@@ -65,7 +91,7 @@ public abstract class AvroDataTest {
         required(117, "dec_38_10", Types.DecimalType.of(38, 10)) // Spark's maximum precision
     );
 
-  @Rule public TemporaryFolder temp = new TemporaryFolder();
+  @TempDir protected Path temp;
Contributor Author

As I mentioned on the PR for 3.4, this JUnit 4 temp folder wasn't working for JUnit 5 parameterized tests. I made some tests independent of this (to keep the backport small) and ended up porting subclasses of AvroDataTest to JUnit 5 in a larger backport.

These test changes were the only significant deviations from the original PRs.

DateTimeUtil.isoTimestamptzToMicros("2024-12-17T23:59:59.999999+00:00")),
// Arguments.of(
// Types.TimestampType.withoutZone(),
// DateTimeUtil.isoTimestampToMicros("2024-12-17T23:59:59.999999")),
Contributor Author

Spark 3.3 doesn't support TimestampNTZ without a flag, so this 3.3 backport doesn't remove withSQLConf below or the testTimestampWithoutZone case. It also doesn't use TimestampType.withoutZone() in default tests or in tests that use SUPPORTED_PRIMITIVES.
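
For context on the two timestamp flavors mentioned here, the following is a minimal sketch of how an ISO-8601 string maps to microseconds from the epoch for a zoned value (timestamptz) versus a zone-less value (timestamp without zone). It uses only `java.time`. The class and method names are hypothetical, and this is not presented as Iceberg's `DateTimeUtil` implementation.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only; not Iceberg's DateTimeUtil.
final class TimestampMicrosSketch {
  private static final OffsetDateTime EPOCH =
      OffsetDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);

  // timestamptz: the offset in the string is applied, so the result identifies an absolute instant.
  static long tzToMicros(String iso) {
    return ChronoUnit.MICROS.between(EPOCH, OffsetDateTime.parse(iso));
  }

  // timestamp without zone: no offset is parsed; the wall-clock value is
  // measured from 1970-01-01T00:00:00 with no zone adjustment.
  static long ntzToMicros(String iso) {
    return ChronoUnit.MICROS.between(EPOCH.toLocalDateTime(), LocalDateTime.parse(iso));
  }

  public static void main(String[] args) {
    System.out.println(tzToMicros("2024-12-17T23:59:59.999999+00:00"));
    System.out.println(ntzToMicros("2024-12-17T23:59:59.999999"));
    // The same instant written with a different offset yields the same timestamptz value.
    System.out.println(tzToMicros("2024-12-18T01:59:59.999999+02:00"));
  }
}
```

The zoned variant normalizes any offset to the same instant, while the zone-less variant has no offset to normalize, which is why the two types need separate handling in an engine.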

@rdblue rdblue added this to the Iceberg 1.8.0 milestone Jan 17, 2025
@manuzhang
Collaborator

Do we still want to back-port new features to Spark 3.3 given its support is deprecated?

@Fokko
Contributor

Fokko commented Jan 17, 2025

> Do we still want to back-port new features to Spark 3.3 given its support is deprecated?

I think it is best to keep the Spark versions as close as possible 👍

@manuzhang
Collaborator

Here's what we say on "Deprecated".

> Deprecated: an engine version is no longer actively maintained. People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity.

Isn't this PR to achieve feature parity?

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Yeah, I think @manuzhang is technically correct: generally we wouldn't backport to 3.3.
We'd remove the 3.3 support anyway in the 1.9 release.

All that said, I'm not really opposed to getting it in for 3.3 (though I'd say we should document that this is supported for 3.3) unless there are strong objections. Going forward, we probably want to be mindful of this so that we don't increase our maintenance burden.

@rdblue
Contributor Author

rdblue commented Jan 17, 2025

@manuzhang, I think this is a good idea. While we don't really expect people to use default values yet, Spark versions stay around a long time. Having this support helps ensure that there aren't correctness issues when people use this version with Spark 3.3 a few years from now. It's not strictly necessary, but since it wasn't very difficult (just porting the 3.4 changes) I thought it would be a good idea to do it.

If you're against it, we can discuss more.

@manuzhang
Collaborator

@amogh-jahagirdar @rdblue I agree with your rationale, but I'm confused about the criteria here. Shall we back-port other features from 3.4 / 3.5, since they are also nice and not difficult to have? It might also confuse contributors / users if the meaning of deprecation seems arbitrary.

@rdblue
Contributor Author

rdblue commented Jan 21, 2025

@manuzhang, this could be a correctness issue with Spark 3.3 and v3 tables, so I think it is an important fix. The language you're referencing is also trying to set expectations for other people, not limit what we will commit:

> People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity.

I'm the one interested in backporting this to avoid potential problems, but there should still not be an expectation that the Iceberg community will backport everything just because the branch is still there.
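
To make the correctness concern concrete, here is a minimal sketch. It assumes the concern is a column that is missing from older data files: a default-aware reader fills in the column's initial default, while a reader without that support returns null. The `Field` class, `project` method, and `supportsDefaults` flag are hypothetical stand-ins, not Iceberg or Spark APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; these are not Iceberg or Spark classes.
final class DefaultValueSketch {
  static final class Field {
    final String name;
    final Object initialDefault; // value assumed for rows written before the column existed

    Field(String name, Object initialDefault) {
      this.name = name;
      this.initialDefault = initialDefault;
    }
  }

  // Projects a row stored in a data file onto the current table schema.
  static Map<String, Object> project(
      Map<String, Object> fileRow, List<Field> schema, boolean supportsDefaults) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Field field : schema) {
      if (fileRow.containsKey(field.name)) {
        result.put(field.name, fileRow.get(field.name));
      } else {
        // Column is missing from the file: a default-aware reader fills in the
        // initial default; a reader without support falls back to null.
        result.put(field.name, supportsDefaults ? field.initialDefault : null);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(new Field("id", null), new Field("region", "unknown"));
    Map<String, Object> oldRow = new LinkedHashMap<>();
    oldRow.put("id", 1L); // written before "region" was added to the table
    System.out.println(project(oldRow, schema, true));
    System.out.println(project(oldRow, schema, false));
  }
}
```

Under these assumptions, the two readers disagree about the same stored row, which is the kind of silent divergence a backport is meant to avoid.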

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Thanks for the explanation, @rdblue. I missed the statement in the docs:

> People who are still interested in the version can backport any necessary feature or bug fix from newer versions, but the community will not spend effort in achieving feature parity.

Given that, and that default values probably should go in to avoid any future correctness issues if people use this version with Spark 3.3, I think it makes sense to get this in.

@amogh-jahagirdar amogh-jahagirdar merged commit 5b13760 into apache:main Jan 21, 2025
31 checks passed
4 participants