From 5e5e50c36777cb59b0b2cde802e46fad0743308a Mon Sep 17 00:00:00 2001 From: Allison Portis Date: Thu, 19 Dec 2024 15:41:08 -0800 Subject: [PATCH] [3.3][Docs] Update docs for 3.3 release (#3992) #### Which Delta project/connector is this regarding? - [ ] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [x] Other (fill in here) ## Description Updates versions in docs. Minor fixes / clarifications in docs. Adds to list of supported features for dropping. ## How was this patch tested? Local build ## Does this PR introduce _any_ user-facing changes? No (cherry picked from commit 2e186b46749a592c8ea71d1b771daa21ba5f39a5) --- docs/source/delta-batch.md | 2 -- docs/source/delta-clustering.md | 8 +++++--- docs/source/delta-drop-feature.md | 4 ++++ docs/source/delta-sharing.md | 4 ++++ docs/source/delta-storage.md | 10 +++++----- docs/source/delta-type-widening.md | 8 ++++---- docs/source/delta-utility.md | 2 +- docs/source/quick-start.md | 16 ++++++++-------- docs/source/releases.md | 1 + 9 files changed, 32 insertions(+), 23 deletions(-) diff --git a/docs/source/delta-batch.md b/docs/source/delta-batch.md index b3168f05307..4e3bf03cdfb 100644 --- a/docs/source/delta-batch.md +++ b/docs/source/delta-batch.md @@ -55,8 +55,6 @@ You can create tables in the following ways. SQL also supports creating a table at a path, without creating an entry in the Hive metastore. - .. 
code-language-tabs:: - ```sql -- Create or replace table with path CREATE OR REPLACE TABLE delta.`/tmp/delta/people10m` ( diff --git a/docs/source/delta-clustering.md b/docs/source/delta-clustering.md index 4abfba70435..c9f1ae28c7f 100644 --- a/docs/source/delta-clustering.md +++ b/docs/source/delta-clustering.md @@ -33,7 +33,7 @@ To enable liquid clustering, add the `CLUSTER BY` phrase to a table creation sta -- Create an empty table CREATE TABLE table1(col0 int, col1 string) USING DELTA CLUSTER BY (col0); - -- Using a CTAS statement + -- Using a CTAS statement (Delta 3.3+) CREATE EXTERNAL TABLE table2 CLUSTER BY (col0) -- specify clustering after table name, not in subquery LOCATION 'table_location' AS SELECT * FROM table1; @@ -61,14 +61,14 @@ To enable liquid clustering, add the `CLUSTER BY` phrase to a table creation sta .. warning:: Tables created with liquid clustering have `Clustering` and `DomainMetadata` table features enabled (both writer features) and use Delta writer version 7 and reader version 1. Table protocol versions cannot be downgraded. See [_](/versioning.md). -You can enable liquid clustering on an existing unpartitioned Delta table using the following syntax: +In 3.3 and above you can enable liquid clustering on an existing unpartitioned Delta table using the following syntax: ```sql ALTER TABLE CLUSTER BY () ``` -.. important:: Default behavior does not apply clustering to previously written data. To force reclustering for all records, you must use `OPTIMIZE FULL`. See [_](#optimize-full). +.. important:: Default behavior does not apply clustering to previously written data. To force reclustering for all records, you must use `OPTIMIZE FULL`. See [_](#recluster-entire-table). ## Choose clustering columns @@ -97,6 +97,8 @@ OPTIMIZE table_name; Liquid clustering is incremental, meaning that data is only rewritten as necessary to accommodate data that needs to be clustered. 
Already clustered data files with different clustering columns are not rewritten. +### Recluster entire table + In 3.3 and above, you can force reclustering of all records in a table with the following syntax: ```sql diff --git a/docs/source/delta-drop-feature.md b/docs/source/delta-drop-feature.md index 1189199c1f3..61081e01eb9 100644 --- a/docs/source/delta-drop-feature.md +++ b/docs/source/delta-drop-feature.md @@ -27,6 +27,10 @@ You can drop the following Delta table features: - `deletionVectors`. See [_](delta-deletion-vectors.md). - `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in 3.2.0 and above. - `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in 3.1.0 and above. +- `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in 3.3.0 and above. +- `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in 3.3.0 and above. +- `checkConstraints`. See [_](delta-constraints.md). Drop support for check constraints is available in 3.3.0 and above. +- `inCommitTimestamp`. See [_](delta-batch.md#in-commit-timestamps). Drop support for In-Commit Timestamp is available in 3.3.0 and above. You cannot drop other [Delta table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features). 
diff --git a/docs/source/delta-sharing.md b/docs/source/delta-sharing.md index 3a0c0e7241a..8791bd07fc5 100644 --- a/docs/source/delta-sharing.md +++ b/docs/source/delta-sharing.md @@ -182,6 +182,10 @@ Please remember to set the spark configurations mentioned in [_](delta-batch.md# | - | - | | [Deletion Vectors](delta-deletion-vectors.md) | 3.1.0 | | [Column Mapping](delta-column-mapping.md) | 3.1.0 | +| [Timestamp without Timezone](https://spark.apache.org/docs/latest/sql-ref-datatypes.html) | 3.3.0 | +| [Type widening (Preview)](/delta-type-widening.md) | 3.3.0 | +| [Variant Type (Preview)](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | 3.3.0 | + Batch queries can be performed as is, because it can automatically resolve the `responseFormat` based on the table features of the shared table. An additional option `responseFormat=delta` needs to be set for cdf and streaming queries when reading shared tables with Deletion Vectors or Column Mapping enabled. diff --git a/docs/source/delta-storage.md b/docs/source/delta-storage.md index 59aba10a22f..ae41dc987cf 100644 --- a/docs/source/delta-storage.md +++ b/docs/source/delta-storage.md @@ -66,11 +66,11 @@ In this default mode, supports concurrent reads from multiple clusters, This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [_](#setup-configuration-s3-multi-cluster). -#. Use the following command to launch a Spark shell with and S3 support (assuming you use Spark 3.5.0 which is pre-built for Hadoop 3.3.4): +#. 
Use the following command to launch a Spark shell with and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4): ```bash bin/spark-shell \ - --packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \ + --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \ --conf spark.hadoop.fs.s3a.access.key= \ --conf spark.hadoop.fs.s3a.secret.key= ``` @@ -91,7 +91,7 @@ For efficient listing of metadata files on S3, set the configuration `de ```scala bin/spark-shell \ - --packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \ + --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \ --conf spark.hadoop.fs.s3a.access.key= \ --conf spark.hadoop.fs.s3a.secret.key= \ --conf "spark.hadoop.delta.enableFastS3AListFrom=true @@ -138,11 +138,11 @@ This mode supports concurrent writes to S3 from multiple clusters and has to be This section explains how to quickly start reading and writing Delta tables on S3 using multi-cluster mode. -#. Use the following command to launch a Spark shell with and S3 support (assuming you use Spark 3.5.0 which is pre-built for Hadoop 3.3.4): +#. 
Use the following command to launch a Spark shell with and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4): ```bash bin/spark-shell \ - --packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.1.0 \ + --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.3.0 \ --conf spark.hadoop.fs.s3a.access.key= \ --conf spark.hadoop.fs.s3a.secret.key= \ --conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \ diff --git a/docs/source/delta-type-widening.md b/docs/source/delta-type-widening.md index ca6238e63b6..e4ef2ec6a0f 100644 --- a/docs/source/delta-type-widening.md +++ b/docs/source/delta-type-widening.md @@ -4,13 +4,13 @@ description: Learn about type widening in Delta. # Delta type widening -.. note:: This feature is available in preview in 3.2. +.. note:: This feature is available in preview in 3.2 and above. The type widening feature allows changing the type of columns in a Delta table to a wider type. This enables manual type changes using the `ALTER TABLE ALTER COLUMN` command and automatic type migration with schema evolution in `INSERT` and `MERGE INTO` commands. ## Supported type changes -The feature preview in 3.2 supports a limited set of type changes: +The feature preview in 3.2 and above supports a limited set of type changes: - `BYTE` to `SHORT` and `INT`. 
- `SHORT` to `INT` @@ -31,7 +31,7 @@ You can enable type widening on an existing table by setting the `delta.enableTy Alternatively, you can enable type widening during table creation: ```sql - CREATE TABLE T(c1 INT) USING DELTA TBLPROPERTIES('delta.enableTypeWidening' = 'true') + CREATE TABLE USING DELTA TBLPROPERTIES('delta.enableTypeWidening' = 'true') ``` To disable type widening: @@ -68,7 +68,7 @@ When all conditions are satisfied, the target table schema is updated automatica The type widening feature can be removed from a Delta table using the `DROP FEATURE` command: ```sql - ALTER TABLE DROP FEATURE 'typeWidening-preview' [TRUNCATE HISTORY] + ALTER TABLE DROP FEATURE 'typeWidening-preview' [TRUNCATE HISTORY] ``` See [_](delta-drop-feature.md) for more information on dropping Delta table features. diff --git a/docs/source/delta-utility.md b/docs/source/delta-utility.md index b4e50224f8b..a1e5a9b5e8d 100644 --- a/docs/source/delta-utility.md +++ b/docs/source/delta-utility.md @@ -37,7 +37,7 @@ default retention threshold for the files is 7 days. To change this behavior, se VACUUM eventsTable LITE -- This VACUUM in ‘LITE’ mode runs faster. -- Instead of finding all files in the table directory, `VACUUM LITE` uses the Delta transaction log to identify and remove files no longer referenced by any table versions within the retention duration. -- If `VACUUM LITE` cannot be completed because the Delta log has been pruned a `DELTA_CANNOT_VACUUM_LITE` exception is raised. - -- This mode is available only in 3.3 and above. + -- This mode is available only in Delta 3.3 and above. VACUUM '/data/events' -- vacuum files in path-based table diff --git a/docs/source/quick-start.md b/docs/source/quick-start.md index f056c0255f8..a8359fa3041 100644 --- a/docs/source/quick-start.md +++ b/docs/source/quick-start.md @@ -18,13 +18,13 @@ Follow these instructions to set up with Spark. You can run the steps in #. 
Run as a project: Set up a Maven or SBT project (Scala or Java) with , copy the code snippets into a source file, and run the project. Alternatively, you can use the [examples provided in the Github repository](https://github.com/delta-io/delta/tree/master/examples). -.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with `3.2.0`. See the [release compatibility matrix](releases.md) for details. +.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with `3.3.0`. See the [release compatibility matrix](releases.md) for details. ### Prerequisite: set up Java As mentioned in the official installation instructions [here](https://spark.apache.org/docs/latest/index.html#downloading), make sure you have a valid Java version installed (8, 11, or 17) and that Java is configured correctly on your system using either the system `PATH` or `JAVA_HOME` environmental variable. -Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of that is compatible with `3.2.0`. +Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of that is compatible with `3.3.0`. ### Set up interactive shell @@ -35,7 +35,7 @@ To use interactively within the Spark SQL, Scala, or Python shell, you n Download the [compatible version](releases.md) of by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-sql` in the extracted directory. 
```bash -bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" +bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" ``` #### PySpark Shell @@ -49,7 +49,7 @@ bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.exten #. Run PySpark with the package and additional configurations: ```bash - pyspark --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" + pyspark --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" ``` #### Spark Scala Shell @@ -57,7 +57,7 @@ bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.exten Download the [compatible version](releases.md) of by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-shell` in the extracted directory. 
```bash -bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" +bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" ``` ### Set up project @@ -72,7 +72,7 @@ You include in your Maven project by adding it as a dependency in your P io.delta delta-spark_2.12 - 3.2.0 + 3.3.0 ``` @@ -81,12 +81,12 @@ You include in your Maven project by adding it as a dependency in your P You include in your SBT project by adding the following line to your `build.sbt` file: ```scala -libraryDependencies += "io.delta" %% "delta-spark" % "3.2.0" +libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0" ``` #### Python -To set up a Python project (for example, for unit testing), you can install using `pip install delta-spark==3.2.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in . +To set up a Python project (for example, for unit testing), you can install using `pip install delta-spark==3.3.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in . ```python import pyspark diff --git a/docs/source/releases.md b/docs/source/releases.md index d12ff0db269..dca37de50be 100644 --- a/docs/source/releases.md +++ b/docs/source/releases.md @@ -17,6 +17,7 @@ The following table lists versions and their compatible versions. | version | version | | --- | --- | +| 3.3.x | 3.5.x | | 3.2.x | 3.5.x | | 3.1.x | 3.5.x | | 3.0.x | 3.5.x |
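
Review note: the `--packages` values bumped by this patch are comma-separated Maven coordinates of the form `group:artifact:version`. A shortened version string in one of them is easy to miss when reading the diff. The sketch below is one way to sanity-check such a value. `malformed_coordinates` is a hypothetical helper, not part of Delta or Spark, and the strict `x.y.z` version pattern is an assumption that fits the coordinates in this patch, not a general Maven rule.

```python
import re
from typing import List

# Hypothetical helper (not a Delta or Spark API).
# Assumption: every coordinate should look like group:artifact:x.y.z.
COORDINATE = re.compile(r"^[\w.\-]+:[\w.\-]+:\d+\.\d+\.\d+$")


def malformed_coordinates(packages: str) -> List[str]:
    """Return the coordinates in a --packages value that lack a full x.y.z version."""
    return [
        coordinate.strip()
        for coordinate in packages.split(",")
        if not COORDINATE.match(coordinate.strip())
    ]


# The multi-cluster S3 coordinates as intended for the 3.3 release.
packages = (
    "io.delta:delta-spark_2.12:3.3.0,"
    "org.apache.hadoop:hadoop-aws:3.3.4,"
    "io.delta:delta-storage-s3-dynamodb:3.3.0"
)
print(malformed_coordinates(packages))
```

An empty list means every coordinate carries a full three-part version. A value such as `io.delta:delta-spark_2.12:3` would be returned as malformed.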