[DOC][3.3] Doc changes for InCommitTimestamps #3979

Merged: 8 commits, Dec 20, 2024

dhruvarya-db
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (Doc)

Description

Updates docs with details about InCommitTimestamps.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

No

@dhruvarya-db dhruvarya-db changed the title [DOC] Doc changes for InCommitTimestamps [DOC][3.3] Doc changes for InCommitTimestamps Dec 16, 2024
### In-Commit Timestamps

#### Overview
<Delta> 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modifications. This feature addresses limitations of the traditional approach that relied on file modification timestamps, particularly in scenarios involving data migration or replication.
Collaborator

Do we add delta-spark versions in our documentation?

Collaborator Author

Looks like it has been done before. From the same file:

You can selectively overwrite only the data that matches an arbitrary expression. This feature is available with DataFrames in <Delta> 1.1.0 and above and supported in SQL in <Delta> 2.4.0 and above.

#### Background
Previously, <Delta> used file modification timestamps as the source of truth for table modifications. This approach presented several challenges:
Collaborator

@prakharjain09 prakharjain09 Dec 17, 2024

Suggested change
Previously, <Delta> used file modification timestamps as the source of truth for table modifications. This approach presented several challenges:
Without the In-Commit Timestamps feature, <Delta> uses file modification timestamps as the commit timestamp. Commit timestamps are needed for various use cases, e.g. time travel to a specific time in the past. This approach has several limitations:

#### Background
Collaborator

@prakharjain09 prakharjain09 Dec 17, 2024

This is documentation, so we don't want to explain what Delta used to do or why this feature was built.
Instead, we want to tell users how Delta behaves with and without this feature.

  • Section 1: Overview.
  • Section 2: This could be renamed to Feature Details, i.e. we can merge the Background and Feature Details sections.
  • Inside Feature Details, we can first talk about how Delta behaves when the feature is enabled.
  • Inside Feature Details, we can next talk about how Delta behaves when the feature is disabled, plus its limitations.
  • Section 3: We can talk about how to enable the feature.

3. Time Travel Reliability: These timestamp changes could affect the accuracy and consistency of time travel queries

#### Enabling the Feature
This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`:
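
As a minimal sketch of the step described above, the property can be set with a standard Spark SQL `ALTER TABLE ... SET TBLPROPERTIES` statement. The helper name `enable_in_commit_timestamps_sql` and the table name `my_catalog.events` are hypothetical placeholders; only the property name `delta.enableInCommitTimestamps` comes from the text. The helper just builds the statement, so it runs without a Spark installation.

```python
def enable_in_commit_timestamps_sql(table_name: str) -> str:
    """Build an ALTER TABLE statement that sets the table property.

    `table_name` is a hypothetical placeholder; the property name is the
    one quoted in the documentation text above.
    """
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES ('delta.enableInCommitTimestamps' = 'true')"
    )


statement = enable_in_commit_timestamps_sql("my_catalog.events")
print(statement)

# With an active SparkSession named `spark` (assumed, not created here),
# the statement would be executed as:
#   spark.sql(statement)
```

Keeping the statement as a plain string makes it easy to review or log before running it against a real table.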
Collaborator

Suggested change
This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`:
This is a Writer table feature and can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`:

Collaborator

Also attach a link to the Writer Table Feature section (if any).

@allisonport-db allisonport-db merged commit d5edfab into delta-io:branch-3.3 Dec 20, 2024
16 of 19 checks passed
3 participants