diff --git a/protocol_rfcs/README.md b/protocol_rfcs/README.md
index 8ff6cb58a5a..835b5be46d3 100644
--- a/protocol_rfcs/README.md
+++ b/protocol_rfcs/README.md
@@ -23,7 +23,7 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,
 | 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking |
 | 2023-04-24 | [variant-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | https://github.com/delta-io/delta/issues/2864 | Variant Data Type |
 | 2024-04-30 | [collated-string-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/collated-string-type.md) | https://github.com/delta-io/delta/issues/2894 | Collated String Type |
-| 2025-01-09 | [variant-shredding.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md) | https://github.com/delta-io/delta/issues/4032 | Variant Shredding |
+| 2025-02-07 | [variant-shredding.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md) | https://github.com/delta-io/delta/issues/4032 | Variant Shredding |
 ### Accepted RFCs
diff --git a/protocol_rfcs/variant-shredding.md b/protocol_rfcs/variant-shredding.md
index 7d158f5ad93..e147df25f51 100644
--- a/protocol_rfcs/variant-shredding.md
+++ b/protocol_rfcs/variant-shredding.md
@@ -13,7 +13,7 @@ Shredding allows Variant data to be be more efficiently stored and queried.
 This feature enables support for shredding of the Variant data type, to store and query Variant data more efficiently. Shredding a Variant value is taking paths from the Variant value, and storing them as a typed column in the file. The shredding does not duplicate data, so if a value is stored in the typed column, it is removed from the Variant binary.
-Storing Variant values as typed columns is faster to access, and enables skipping with statistics.
+Storing Variant values as typed columns is faster to access, and enables data skipping with statistics.
 The `variantShredding` feature depends on the `variantType` feature.
@@ -31,7 +31,7 @@ Struct field name | Parquet primitive type | Description
 -|-|-
 metadata | binary | (required) The binary-encoded Variant metadata, as described in [Parquet Variant binary encoding](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md)
 value | binary | (optional) The binary-encoded Variant value, as described in [Parquet Variant binary encoding](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md)
-typed_value | * | (optional) This can be any Parquet type, representing the data stored in the Variant. Details of the shredding scheme is found in the [Parquet Variant binary encoding](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md)
+typed_value | * | (optional) This can be any Parquet type, representing the data stored in the Variant. Details of the shredding scheme is found in the [Parquet Variant binary encoding](https://github.com/apache/parquet-format/blob/master/VariantShredding.md)
 ## Writer Requirements for Variant Shredding
@@ -42,4 +42,4 @@ When Variant Shredding is supported (`writerFeatures` field of a table's `protoc
 When Variant type is supported (`readerFeatures` field of a table's `protocol` action contains `variantShredding`), readers:
 - must recognize and tolerate a `variant` data type in a Delta schema
-- must tolerate a parquet schema that is either unshredded (only `metadata` and `value` struct fields) or shredded (`metadata`, `value`, and `typed_value` struct fields) when reading a Variant data type from file.
+- must recognize and correctly process a parquet schema that is either unshredded (only `metadata` and `value` struct fields) or shredded (`metadata`, `value`, and `typed_value` struct fields) when reading a Variant data type from file.