From 10af5ebf1b642ed5f3863adfa8573dfd2e6ef6c4 Mon Sep 17 00:00:00 2001 From: Dhruv Arya Date: Mon, 16 Dec 2024 23:39:01 +0530 Subject: [PATCH 1/7] add ict docs --- docs/source/delta-batch.md | 7 +++++++ docs/source/delta-drop-feature.md | 2 +- docs/source/table-properties.md | 12 +++++++++++- docs/source/versioning.md | 2 ++ 4 files changed, 21 insertions(+), 2 deletions(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index b3168f05307..34db66f69cf 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -742,6 +742,13 @@ Each time a checkpoint is written, Delta automatically cleans up log entries old .. note:: Due to log entry cleanup, instances can arise where you cannot time travel to a version that is less than the retention interval. requires all consecutive log entries since the previous checkpoint to time travel to a particular version. For example, with a table initially consisting of log entries for versions [0, 19] and a checkpoint at verison 10, if the log entry for version 0 is cleaned up, then you cannot time travel to versions [1, 9]. Increasing the table property `delta.logRetentionDuration` can help avoid these situations. +### In-Commit Timestamps + +Historically, Delta has relied on file modification timetamps to be the source of truth for when +the table was modified. This becomes problematic when tables are moved from one storage location to another since the file modification timestamps change in such scenarios. To ensure that the timestamps +used for time travel don't change in such scenarios and that timestamp-based time travel queries produce +consistent results, the [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) table feature was introduced in Delta 3.3. This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`. See the [Versioning](./versioning) section for more details around compatibility. 
+ ## Write to a table diff --git a/docs/source/delta-drop-feature.md b/docs/source/delta-drop-feature.md index 1189199c1f3..343ef7be819 100644 --- a/docs/source/delta-drop-feature.md +++ b/docs/source/delta-drop-feature.md @@ -27,7 +27,7 @@ You can drop the following Delta table features: - `deletionVectors`. See [_](delta-deletion-vectors.md). - `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in 3.2.0 and above. - `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in 3.1.0 and above. - +- `inCommitTimestamp`. See [In-Commit Timestamps Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps). You cannot drop other [Delta table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features). ## How are Delta table features dropped? diff --git a/docs/source/table-properties.md b/docs/source/table-properties.md index 01a58269d94..43377d41916 100644 --- a/docs/source/table-properties.md +++ b/docs/source/table-properties.md @@ -169,6 +169,16 @@ properties are set. Available Delta table properties include: | | | Default: `classic` | +-------------------------------------------------------------------------------------------+ - +| `delta.enableInCommitTimestamps` | +| | +| `true` for enabling the InCommitTimestamps table feature. | +| | +| | +| See [_](/presto-integration.md#step-3-update-manifests). | +| | +| Data type: `Boolean` | +| | +| Default: `false` | ++-------------------------------------------------------------------------------------------+ .. replace:: Delta Lake .. 
replace:: Apache Spark \ No newline at end of file diff --git a/docs/source/versioning.md b/docs/source/versioning.md index 2135a6d5b0d..bf6df741a0a 100644 --- a/docs/source/versioning.md +++ b/docs/source/versioning.md @@ -29,6 +29,7 @@ The following features break forward compatibility. Features are enabled Row Tracking, [Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-row-tracking.md) Type widening (Preview),[Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-type-widening.md) Identity columns, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#use-identity-columns) + In-Commit Timestamps, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#in-commit-timestamps) @@ -113,6 +114,7 @@ The following table shows minimum protocol versions required for feature Vacuum Protocol Check,7,3,[Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check) Row Tracking,7,3,[_](/delta-row-tracking.md) Type widening (Preview),7,3,[_](/delta-type-widening.md) + In-Commit Timestamps,7,3,[In-Commit Timestamps Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) From a51ec83b2309551defad02d91ec240f7ecdee6da Mon Sep 17 00:00:00 2001 From: Dhruv Arya Date: Mon, 16 Dec 2024 23:50:12 +0530 Subject: [PATCH 2/7] improvements --- docs/source/delta-batch.md | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index 34db66f69cf..a9353635aec 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -744,10 +744,35 @@ Each time a checkpoint is written, Delta automatically cleans up log entries old ### In-Commit Timestamps -Historically, Delta has relied on file modification timetamps to be the source of truth for when -the table was modified. 
This becomes problematic when tables are moved from one storage location to another since the file modification timestamps change in such scenarios. To ensure that the timestamps -used for time travel don't change in such scenarios and that timestamp-based time travel queries produce -consistent results, the [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) table feature was introduced in Delta 3.3. This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`. See the [Versioning](./versioning) section for more details around compatibility. +#### Overview +Delta Lake 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modifications. This feature addresses limitations of the traditional approach that relied on file modification timestamps, particularly in scenarios involving data migration or replication. + +#### Background +Previously, Delta Lake used file modification timestamps as the source of truth for table modifications. This approach presented several challenges: + +1. Data Migration Issues: When tables were moved between storage locations, file modification timestamps would change, potentially disrupting historical tracking +2. Replication Scenarios: Timestamp inconsistencies could arise when replicating data across different environments +3. Time Travel Reliability: These timestamp changes could affect the accuracy and consistency of time travel queries + +#### Feature Details +In-Commit Timestamps stores modification timestamps within the commit itself, ensuring they remain unchanged regardless of file system operations. 
This provides several benefits: + +- **Immutable History**: Timestamps become part of the table's permanent commit history +- **Consistent Time Travel**: Queries using timestamp-based time travel produce reliable results even after table migration + +### Enabling the Feature +This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`: + +```sql +ALTER TABLE +SET TBLPROPERTIES ('delta.enableInCommitTimestamps' = 'true'); +``` + +After enabling In-Commit Timestamps: +- Only new write operations will include the embedded timestamps +- File modification timestamps will continue to be used for historical commits performed before enablement + +See the [Versioning](./versioning) section for more details around compatibility. From 72c1b744d6a45b81c5c4e2c1f27ab8e8b881db2a Mon Sep 17 00:00:00 2001 From: Dhruv Arya Date: Mon, 16 Dec 2024 23:51:34 +0530 Subject: [PATCH 3/7] fix heading --- docs/source/delta-batch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index a9353635aec..7479408376a 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -760,7 +760,7 @@ In-Commit Timestamps stores modification timestamps within the commit itself, en - **Immutable History**: Timestamps become part of the table's permanent commit history - **Consistent Time Travel**: Queries using timestamp-based time travel produce reliable results even after table migration -### Enabling the Feature +#### Enabling the Feature This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`: ```sql From 6b6a3aabc6f64ffbf82ade3d13ae73c87d0a0fff Mon Sep 17 00:00:00 2001 From: Dhruv Arya Date: Mon, 16 Dec 2024 23:52:27 +0530 Subject: [PATCH 4/7] fix spacing --- docs/source/table-properties.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/table-properties.md b/docs/source/table-properties.md index 
43377d41916..35c20da29b2 100644 --- a/docs/source/table-properties.md +++ b/docs/source/table-properties.md @@ -180,5 +180,6 @@ properties are set. Available Delta table properties include: | | | Default: `false` | +-------------------------------------------------------------------------------------------+ + .. replace:: Delta Lake .. replace:: Apache Spark \ No newline at end of file From 4601bb54ed092218d041d36334b4fe84b2369184 Mon Sep 17 00:00:00 2001 From: Dhruv Arya Date: Mon, 16 Dec 2024 23:56:33 +0530 Subject: [PATCH 5/7] fix --- docs/source/delta-batch.md | 4 ++-- docs/source/table-properties.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index 7479408376a..dcdfbb3b9e0 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -745,10 +745,10 @@ Each time a checkpoint is written, Delta automatically cleans up log entries old ### In-Commit Timestamps #### Overview -Delta Lake 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modifications. This feature addresses limitations of the traditional approach that relied on file modification timestamps, particularly in scenarios involving data migration or replication. + 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modifications. This feature addresses limitations of the traditional approach that relied on file modification timestamps, particularly in scenarios involving data migration or replication. #### Background -Previously, Delta Lake used file modification timestamps as the source of truth for table modifications. This approach presented several challenges: +Previously, used file modification timestamps as the source of truth for table modifications. 
This approach presented several challenges: 1. Data Migration Issues: When tables were moved between storage locations, file modification timestamps would change, potentially disrupting historical tracking 2. Replication Scenarios: Timestamp inconsistencies could arise when replicating data across different environments diff --git a/docs/source/table-properties.md b/docs/source/table-properties.md index 35c20da29b2..318173d3cdf 100644 --- a/docs/source/table-properties.md +++ b/docs/source/table-properties.md @@ -174,7 +174,7 @@ properties are set. Available Delta table properties include: | `true` for enabling the InCommitTimestamps table feature. | | | | | -| See [_](/presto-integration.md#step-3-update-manifests). | +| See [_](delta-batch.md#in--commit-timestamps). | | | | Data type: `Boolean` | | | From 3a82f21b19a79e3d6ad470ebcce7fab13694bc59 Mon Sep 17 00:00:00 2001 From: Dhruv Arya Date: Tue, 17 Dec 2024 08:48:30 +0530 Subject: [PATCH 6/7] update as per feedback --- docs/source/delta-batch.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index dcdfbb3b9e0..bed70f5e7e8 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -745,14 +745,7 @@ Each time a checkpoint is written, Delta automatically cleans up log entries old ### In-Commit Timestamps #### Overview - 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modifications. This feature addresses limitations of the traditional approach that relied on file modification timestamps, particularly in scenarios involving data migration or replication. - -#### Background -Previously, used file modification timestamps as the source of truth for table modifications. This approach presented several challenges: - -1. 
Data Migration Issues: When tables were moved between storage locations, file modification timestamps would change, potentially disrupting historical tracking -2. Replication Scenarios: Timestamp inconsistencies could arise when replicating data across different environments -3. Time Travel Reliability: These timestamp changes could affect the accuracy and consistency of time travel queries + 3.3 introduced [In-Commit Timestamps](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps) to provide a more reliable and consistent way to track table modification timestamps. These modification timestamps are needed for various use cases, e.g. time travel to a specific time in the past. This feature addresses limitations of the traditional approach that relied on file modification timestamps, particularly in scenarios involving data migration or replication. #### Feature Details In-Commit Timestamps stores modification timestamps within the commit itself, ensuring they remain unchanged regardless of file system operations. This provides several benefits: @@ -760,6 +753,12 @@ In-Commit Timestamps stores modification timestamps within the commit itself, en - **Immutable History**: Timestamps become part of the table's permanent commit history - **Consistent Time Travel**: Queries using timestamp-based time travel produce reliable results even after table migration +Without the In-Commit Timestamp feature, uses file modification timestamps as the commit timestamp. This approach has various limitations: + +1. Data Migration Issues: When tables were moved between storage locations, file modification timestamps would change, potentially disrupting historical tracking +2. Replication Scenarios: Timestamp inconsistencies could arise when replicating data across different environments +3. 
Time Travel Reliability: These timestamp changes could affect the accuracy and consistency of time travel queries + #### Enabling the Feature This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`: From 9fb8db23e81abe1e0b3fb484267ec42c19a7d49e Mon Sep 17 00:00:00 2001 From: Allison Portis Date: Thu, 19 Dec 2024 14:08:32 -0800 Subject: [PATCH 7/7] respond to comment --- docs/source/delta-batch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index bed70f5e7e8..7b4546dc1c5 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -760,7 +760,7 @@ Without the In-Commit Timestamp feature, uses file modification timestam 3. Time Travel Reliability: These timestamp changes could affect the accuracy and consistency of time travel queries #### Enabling the Feature -This feature can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`: +This is a [writer table feature](versioning.md#what-are-table-features) and can be enabled by setting the table property `delta.enableInCommitTimestamps` to `true`: ```sql ALTER TABLE