Spec: Add added-rows field to Snapshot (#11976)
RussellSpitzer authored Jan 17, 2025
1 parent 4d0f40c commit f895b33
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion format/spec.md
@@ -411,6 +411,8 @@ The `first_row_id` of the EXISTING file `data1` was already assigned, so the fil

Files `data2` and `data3` are written with `null` for `first_row_id` and are assigned `first_row_id` at read time based on the manifest's `first_row_id` and the `record_count` of previously listed ADDED files in this manifest: (1,000 + 0) and (1,000 + 50).
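
The read-time assignment described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the `Entry` class and `assign_first_row_ids` function are hypothetical names, and the values used for `data1` are placeholders because the source does not give them here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a manifest entry; not an Iceberg API type."""
    name: str
    status: str  # "ADDED", "EXISTING", or "DELETED"
    record_count: int
    first_row_id: Optional[int] = None


def assign_first_row_ids(manifest_first_row_id, entries):
    """Resolve first_row_id for each entry in listing order.

    Entries that already carry a first_row_id keep it. ADDED entries with
    a null first_row_id get the manifest's first_row_id plus the
    record_count of the previously listed ADDED entries.
    """
    next_id = manifest_first_row_id
    resolved = {}
    for entry in entries:
        if entry.first_row_id is not None:
            resolved[entry.name] = entry.first_row_id
        elif entry.status == "ADDED":
            resolved[entry.name] = next_id
            next_id += entry.record_count
    return resolved


entries = [
    # data1's values are placeholders chosen for illustration only.
    Entry("data1", "EXISTING", record_count=25, first_row_id=100),
    Entry("data2", "ADDED", record_count=50),
    Entry("data3", "ADDED", record_count=50),
]
resolved = assign_first_row_ids(1000, entries)
```

With a manifest `first_row_id` of 1,000, `data2` resolves to 1,000 + 0 and `data3` to 1,000 + 50, matching the values in the paragraph above.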

The snapshot then populates the total number of `added-rows` based on the sum of all added rows in the manifests: 100 (50 + 50).

When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225:
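
The `next-row-id` arithmetic above can be checked with a short sketch. The function name `next_row_id_after_commit` is hypothetical and only restates the sum described in the text.

```python
def next_row_id_after_commit(current_next_row_id, added_rows_counts):
    """Advance the table's next-row-id by the rows added across all manifests."""
    return current_next_row_id + sum(added_rows_counts)


# added1: 100, added2: 0, added3: 125, starting from next-row-id 1,000
new_next_row_id = next_row_id_after_commit(1000, [100, 0, 125])
```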


@@ -664,7 +666,9 @@ A snapshot consists of the following fields:
| _optional_ | | | **`manifests`** | A list of manifest file locations. Must be omitted if `manifest-list` is present |
| _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` as a _required_ field (see below) |
| _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
| | | _optional_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. Required if [Row Lineage](#row-lineage) is enabled |


The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are:

@@ -692,6 +696,8 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on

The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifests in the snapshot's manifest list.

The snapshot's `added-rows` is the sum of all the [`added_rows_count`](#manifest-lists) in all added manifests.
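
As a rough sketch of this rule, the snippet below sums `added_rows_count` over the manifest list entries that were added by the snapshot being committed. It assumes "added manifests" means entries whose `added_snapshot_id` equals the new snapshot's ID, and it treats a missing `added_rows_count` as 0. The dictionaries and snapshot IDs are made up for illustration.

```python
def snapshot_added_rows(snapshot_id, manifest_list):
    """Sum added_rows_count over manifests added by the given snapshot.

    Assumption: a manifest counts as "added" when its added_snapshot_id
    matches the snapshot being committed.
    """
    return sum(
        manifest.get("added_rows_count") or 0
        for manifest in manifest_list
        if manifest["added_snapshot_id"] == snapshot_id
    )


# Hypothetical manifest list: two manifests added by snapshot 7, one carried over.
manifest_list = [
    {"added_snapshot_id": 6, "added_rows_count": 300},
    {"added_snapshot_id": 7, "added_rows_count": 50},
    {"added_snapshot_id": 7, "added_rows_count": 50},
]
added_rows = snapshot_added_rows(7, manifest_list)
```

Manifests carried over from earlier snapshots are skipped, so only the two manifests written by snapshot 7 contribute to the total.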


### Manifest Lists
