[DOC][3.3] Doc changes for InCommitTimestamps (#3979)

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [X] Other (Doc)

## Description

Updates docs with details about InCommitTimestamps.  

## How was this patch tested?

N/A

## Does this PR introduce _any_ user-facing changes?

No

---------

Co-authored-by: Allison Portis <[email protected]>
dhruvarya-db and allisonport-db authored Dec 20, 2024
1 parent 2e186b4 commit d5edfab
Showing 4 changed files with 45 additions and 1 deletion.
31 changes: 31 additions & 0 deletions docs/source/delta-batch.md
@@ -740,6 +740,37 @@ Each time a checkpoint is written, Delta automatically cleans up log entries old
.. note::
Due to log entry cleanup, instances can arise where you cannot time travel to a version that is less than the retention interval. <Delta> requires all consecutive log entries since the previous checkpoint to time travel to a particular version. For example, with a table initially consisting of log entries for versions [0, 19] and a checkpoint at version 10, if the log entry for version 0 is cleaned up, then you cannot time travel to versions [1, 9]. Increasing the table property `delta.logRetentionDuration` can help avoid these situations.

### In-Commit Timestamps

#### Overview
<Delta> 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modification timestamps. These modification timestamps are needed for various use cases, such as time travel to a specific point in the past. This feature addresses limitations of the traditional approach, which relies on file modification timestamps, particularly in scenarios involving data migration or replication.

#### Feature Details
In-Commit Timestamps stores modification timestamps within the commit itself, ensuring they remain unchanged regardless of file system operations. This provides several benefits:

- **Immutable History**: Timestamps become part of the table's permanent commit history
- **Consistent Time Travel**: Queries using timestamp-based time travel produce reliable results even after table migration

Without the In-Commit Timestamps feature, <Delta> uses file modification timestamps as the commit timestamp. This approach has several limitations:

1. Data Migration Issues: When tables are moved between storage locations, file modification timestamps can change, potentially disrupting historical tracking
2. Replication Scenarios: Timestamp inconsistencies can arise when replicating data across different environments
3. Time Travel Reliability: These timestamp changes can affect the accuracy and consistency of time travel queries
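
The migration limitation can be reproduced outside of <Delta> with ordinary files. The following is a minimal sketch, not part of <Delta> itself: it assumes a hypothetical commit file name and made-up directory names, backdates the file, and then copies it the way a simple migration tool might.

```python
import os
import shutil
import tempfile

# Hypothetical layout: directory and file names are for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "old_location")
    dst_dir = os.path.join(tmp, "new_location")
    os.makedirs(src_dir)
    os.makedirs(dst_dir)

    src = os.path.join(src_dir, "00000000000000000000.json")
    with open(src, "w") as f:
        f.write("{}")

    # Pretend the commit was written in September 2020 (Unix seconds).
    original_time = 1_600_000_000
    os.utime(src, (original_time, original_time))

    # shutil.copy copies content and permission bits, but not the
    # modification time, so the copy is stamped with the current time.
    dst = shutil.copy(src, dst_dir)

    src_mtime = os.path.getmtime(src)
    dst_mtime = os.path.getmtime(dst)

# The copied file looks newer than the original commit.
print("original mtime:", src_mtime)
print("copied mtime:  ", dst_mtime)
```

A reader that derives the commit time from the copied file would see the time of the copy rather than the time of the original commit, which is the kind of drift that a timestamp stored inside the commit avoids.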

#### Enabling the Feature
This is a [writer table feature](versioning.md#what-are-table-features) and can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`:

```sql
ALTER TABLE <table_name>
SET TBLPROPERTIES ('delta.enableInCommitTimestamps' = 'true');
```

After enabling In-Commit Timestamps:
- Only new write operations will include the embedded timestamps
- File modification timestamps will continue to be used for historical commits performed before enablement

See the [Versioning](./versioning) section for more details about compatibility.
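
The behavior after enablement can be pictured with a small toy model. This is a sketch under stated assumptions, not <Delta>'s implementation: the `Commit` class, its field names, and `effective_timestamp_ms` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Toy commit record; field names are illustrative, not Delta's schema."""
    version: int
    # Modification time of the commit file, in milliseconds.
    file_mtime_ms: int
    # Present only for commits written after the feature was enabled.
    in_commit_timestamp_ms: Optional[int] = None


def effective_timestamp_ms(commit: Commit) -> int:
    """Prefer the timestamp stored in the commit; otherwise use the file's."""
    if commit.in_commit_timestamp_ms is not None:
        return commit.in_commit_timestamp_ms
    return commit.file_mtime_ms


# Versions 0 and 1 predate enablement; version 2 was written afterwards.
history = [
    Commit(version=0, file_mtime_ms=1_000),
    Commit(version=1, file_mtime_ms=2_000),
    Commit(version=2, file_mtime_ms=3_500, in_commit_timestamp_ms=3_000),
]

timestamps = [effective_timestamp_ms(c) for c in history]
print(timestamps)
```

In this model, changing `file_mtime_ms` for version 2 leaves its effective timestamp untouched, while versions 0 and 1 still depend on the file's modification time.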

<a id="deltadataframewrites"></a>

## Write to a table
2 changes: 1 addition & 1 deletion docs/source/delta-drop-feature.md
@@ -30,7 +30,7 @@ You can drop the following Delta table features:
- `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in <Delta> 3.3.0 and above.
- `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in <Delta> 3.3.0 and above.
- `checkConstraints`. See [_](delta-constraints.md). Drop support for check constraints is available in <Delta> 3.3.0 and above.
- `inCommitTimestamp`. See [_](delta-batch.md#in-tommit-timestamps). Drop support for In-Commit Timestamp is available in <Delta> 3.3.0 and above.
- `inCommitTimestamp`. See [_](delta-batch.md#in-commit-timestamps). In-Commit Timestamp is available in <Delta> 3.3.0 and above.

You cannot drop other [Delta table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).

11 changes: 11 additions & 0 deletions docs/source/table-properties.md
@@ -169,6 +169,17 @@ properties are set. Available Delta table properties include:
| |
| Default: `classic` |
+-------------------------------------------------------------------------------------------+
| `delta.enableInCommitTimestamps` |
| |
| `true` for enabling the InCommitTimestamps table feature. |
| |
| |
| See [_](delta-batch.md#in-commit-timestamps). |
| |
| Data type: `Boolean` |
| |
| Default: `false` |
+-------------------------------------------------------------------------------------------+

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
2 changes: 2 additions & 0 deletions docs/source/versioning.md
@@ -29,6 +29,7 @@ The following <Delta> features break forward compatibility. Features are enabled
Row Tracking, [Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-row-tracking.md)
Type widening (Preview),[Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-type-widening.md)
Identity columns, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#use-identity-columns)
In-Commit Timestamps, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#in-commit-timestamps)

<a id="table-protocol"></a>

@@ -113,6 +114,7 @@ The following table shows minimum protocol versions required for <Delta> feature
Vacuum Protocol Check,7,3,[Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check)
Row Tracking,7,3,[_](/delta-row-tracking.md)
Type widening (Preview),7,3,[_](/delta-type-widening.md)
In-Commit Timestamps,7,3,[In-Commit Timestamps Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps)

<a id="upgrade"></a>
