Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark] Deprecate tableVersion field in type widening metadata #3334

Merged

Conversation

johanl-db
Copy link
Collaborator

Description

The protocol specification for type widening dropped the requirement to populate a tableVersion field as part of type change history to track in which version of the table a given change was applied.
See protocol update: #3297
This field was used at some point during the preview but isn't needed anymore.

It is now deprecated:

  • The field is preserved in table metadata that already contains it.
  • The field isn't set anymore when the stable table feature is active on the table.
  • The field is still set when only the preview table feature is active on the table.

The last point is necessary to avoid breaking preview clients (Delta 3.2 & Delta 4.0 preview) that require the field to be set.

How was this patch tested?

  • Updated existing metadata tests to cover tableVersion not being set by default.
  • Added metadata tests to explicitly cover tableVersion being set.
  • Added tests covering tableVersion when using the preview and stable table features.

Does this PR introduce any user-facing changes?

Yes. As of this change, a table that supports the stable table feature typeWidening won't have a tableVersion field in the type change history stored in the table metadata.

Tables that only support the preview table feature typeWidening-preview don't see any change.

Copy link
Collaborator

@tomvanbussel tomvanbussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, LGTM otherwise

Comment on lines 59 to 60
// tableVersion was a field present during the preview and removed afterwards. We preserve it if
// it's already present in the type change metadata of the table to avoid breaking older clients.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do they still use it for? Would it be possible to always set it to some constant number (e.g. zero)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information is used to decide which files to rewrite when dropping the table feature.
That is, until #3155 where we switched to looking at parquet footers instead.
Delta 3.2 and Delta 4.0 preview use the "old" way.

We could set it to Long.MaxValue, which would effectively cause all files to be rewritten but that's pretty hacky

// The version field isn't used anymore but we need to populate it in case the table doesn't
// use the stable feature, as preview clients may then still access the table and rely on
// the field being present.
if (!txn.protocol.isFeatureSupported(TypeWideningTableFeature)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check for TypeWideningTableFeature not being there instead of TypeWideningPreviewTableFeature being there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because a table can technically have both the preview and stable feature if one is present and a user manually add the other using ALTER TABLE t SET TBLPROPERTIES ('delta.feature.typeWidening[-preview]' = 'supported')

As soon as the stable feature is supported, we know preview versions can't access the table and we don't need to populate tableVersion.

This is a bit convoluted for an edge case that really should be extremely rare, on second thoughts it's probably better to always set tableVersion when the preview feature is supported to avoid having to reason about this edge case.
I updated this change

s"TBLPROPERTIES ('${DeltaConfigs.ENABLE_TYPE_WIDENING.key}' = 'true')")
sql(s"CREATE TABLE $testTableName (a int) USING delta TBLPROPERTIES (" +
s"'${DeltaConfigs.ENABLE_TYPE_WIDENING.key}' = 'true', " +
s"'${propertyKey(TypeWideningTableFeature)}' = 'supported')")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Isn't the feature automatically enabled when setting the property?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preview feature is enabled by default, this forces the stable to be added instead.

I added a comment to clarify that and that we're testing with stable feature by default and only explicit set the preview feature in specific tests that cover the old behavior re: tableVersion.

  • added a missing test for the preview feature

Copy link
Collaborator

@bart-samwel bart-samwel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@scottsand-db scottsand-db merged commit 70bfe82 into delta-io:master Jul 9, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants