[3.3][Docs] Update docs for 3.3 release #3992

Merged 1 commit on Dec 19, 2024
2 changes: 0 additions & 2 deletions docs/source/delta-batch.md
@@ -55,8 +55,6 @@ You can create tables in the following ways.

SQL also supports creating a table at a path, without creating an entry in the Hive metastore.

.. code-language-tabs::

```sql
-- Create or replace table with path
CREATE OR REPLACE TABLE delta.`/tmp/delta/people10m` (
8 changes: 5 additions & 3 deletions docs/source/delta-clustering.md
@@ -33,7 +33,7 @@ To enable liquid clustering, add the `CLUSTER BY` phrase to a table creation sta
-- Create an empty table
CREATE TABLE table1(col0 int, col1 string) USING DELTA CLUSTER BY (col0);

-- Using a CTAS statement
-- Using a CTAS statement (Delta 3.3+)
CREATE EXTERNAL TABLE table2 CLUSTER BY (col0) -- specify clustering after table name, not in subquery
LOCATION 'table_location'
AS SELECT * FROM table1;
@@ -61,14 +61,14 @@ To enable liquid clustering, add the `CLUSTER BY` phrase to a table creation sta

.. warning:: Tables created with liquid clustering have `Clustering` and `DomainMetadata` table features enabled (both writer features) and use Delta writer version 7 and reader version 1. Table protocol versions cannot be downgraded. See [_](/versioning.md).
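
To check the protocol of a clustered table, a minimal PySpark sketch is shown below; it assumes an active SparkSession named `spark` with Delta configured and a hypothetical table named `events_clustered`.

```python
# Minimal sketch: inspect the protocol versions of a clustered table.
# Assumes an active SparkSession `spark` and a hypothetical table `events_clustered`.
detail = spark.sql("DESCRIBE DETAIL events_clustered")
# minReaderVersion / minWriterVersion reflect the protocol described in the warning above;
# recent releases also expose a `tableFeatures` column listing enabled features.
detail.select("minReaderVersion", "minWriterVersion").show()
```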

You can enable liquid clustering on an existing unpartitioned Delta table using the following syntax:
In <Delta> 3.3 and above, you can enable liquid clustering on an existing unpartitioned Delta table using the following syntax:

```sql
ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>)
```

.. important:: Default behavior does not apply clustering to previously written data. To force reclustering for all records, you must use `OPTIMIZE FULL`. See [_](#optimize-full).
.. important:: Default behavior does not apply clustering to previously written data. To force reclustering for all records, you must use `OPTIMIZE FULL`. See [_](#recluster-entire-table).
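
As an illustration, a minimal PySpark sketch of a full recluster is shown below; the table name `events_clustered` is hypothetical and an active SparkSession `spark` is assumed.

```python
# Sketch: force reclustering of all records, not only newly written data.
# Assumes an active SparkSession `spark` and a hypothetical clustered table `events_clustered`.
spark.sql("OPTIMIZE events_clustered FULL")
```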

## Choose clustering columns

Expand Down Expand Up @@ -97,6 +97,8 @@ OPTIMIZE table_name;

Liquid clustering is incremental, meaning that data is only rewritten as necessary to accommodate data that needs to be clustered. Already clustered data files with different clustering columns are not rewritten.

### Recluster entire table

In <Delta> 3.3 and above, you can force reclustering of all records in a table with the following syntax:

```sql
4 changes: 4 additions & 0 deletions docs/source/delta-drop-feature.md
@@ -27,6 +27,10 @@ You can drop the following Delta table features:
- `deletionVectors`. See [_](delta-deletion-vectors.md).
- `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in <Delta> 3.2.0 and above.
- `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in <Delta> 3.1.0 and above.
- `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in <Delta> 3.3.0 and above.
- `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in <Delta> 3.3.0 and above.
- `checkConstraints`. See [_](delta-constraints.md). Drop support for check constraints is available in <Delta> 3.3.0 and above.
- `inCommitTimestamp`. See [_](delta-batch.md#in-commit-timestamps). Drop support for In-Commit Timestamp is available in <Delta> 3.3.0 and above.
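
For example, dropping one of these features from PySpark might look like the following sketch; the table name `events` is hypothetical and an active SparkSession `spark` is assumed.

```python
# Sketch: drop the column mapping table feature from a hypothetical table `events`.
# Assumes an active SparkSession `spark` with Delta configured.
spark.sql("ALTER TABLE events DROP FEATURE 'columnMapping'")
# Once earlier table versions no longer need to be readable, history can
# optionally be truncated as well:
# spark.sql("ALTER TABLE events DROP FEATURE 'columnMapping' TRUNCATE HISTORY")
```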

You cannot drop other [Delta table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).

4 changes: 4 additions & 0 deletions docs/source/delta-sharing.md
@@ -182,6 +182,10 @@ Please remember to set the spark configurations mentioned in [_](delta-batch.md#
| - | - |
| [Deletion Vectors](delta-deletion-vectors.md) | 3.1.0 |
| [Column Mapping](delta-column-mapping.md) | 3.1.0 |
| [Timestamp without Timezone](https://spark.apache.org/docs/latest/sql-ref-datatypes.html) | 3.3.0 |
| [Type widening (Preview)](/delta-type-widening.md) | 3.3.0 |
| [Variant Type (Preview)](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | 3.3.0 |


Batch queries can be performed as is, because the `responseFormat` is automatically resolved based on the table features of the shared table.
The additional option `responseFormat=delta` must be set for CDF and streaming queries when reading shared tables with Deletion Vectors or Column Mapping enabled, as in the sketch below.
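
A minimal PySpark sketch of a streaming read that sets this option follows; the profile path, share, schema, and table names are placeholders, and it assumes the Delta Sharing Spark connector is on the classpath with an active SparkSession `spark`.

```python
# Sketch: stream from a shared table that has Deletion Vectors or Column Mapping enabled.
# The profile file and share coordinates below are placeholders.
table_url = "/path/to/profile.share#my_share.my_schema.my_table"

stream_df = (
    spark.readStream.format("deltaSharing")
    .option("responseFormat", "delta")  # needed for streaming and CDF reads of such tables
    .load(table_url)
)
```
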
10 changes: 5 additions & 5 deletions docs/source/delta-storage.md
@@ -66,11 +66,11 @@ In this default mode, <Delta> supports concurrent reads from multiple clusters,

This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [_](#setup-configuration-s3-multi-cluster).

#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.0 which is pre-built for Hadoop 3.3.4):
#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):

```bash
bin/spark-shell \
--packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
--conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key>
```
@@ -91,7 +91,7 @@ For efficient listing of <Delta> metadata files on S3, set the configuration `de

```bash
bin/spark-shell \
--packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
--conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
--conf "spark.hadoop.delta.enableFastS3AListFrom=true
@@ -138,11 +138,11 @@ This mode supports concurrent writes to S3 from multiple clusters and has to be

This section explains how to quickly start reading and writing Delta tables on S3 using multi-cluster mode.

#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.0 which is pre-built for Hadoop 3.3.4):
#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):

```bash
bin/spark-shell \
--packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.1.0 \
--packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.3.0 \
--conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
--conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
--conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
8 changes: 4 additions & 4 deletions docs/source/delta-type-widening.md
@@ -4,13 +4,13 @@ description: Learn about type widening in Delta.

# Delta type widening

.. note:: This feature is available in preview in <Delta> 3.2.
.. note:: This feature is available in preview in <Delta> 3.2 and above.

The type widening feature allows changing the type of columns in a Delta table to a wider type. This enables manual type changes using the `ALTER TABLE ALTER COLUMN` command and automatic type migration with schema evolution in `INSERT` and `MERGE INTO` commands.

## Supported type changes

The feature preview in <Delta> 3.2 supports a limited set of type changes:
The feature preview in <Delta> 3.2 and above supports a limited set of type changes:
- `BYTE` to `SHORT` and `INT`.
- `SHORT` to `INT`
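
For example, a widening change within this supported set might look like the following PySpark sketch; the table and column names are hypothetical, and an active SparkSession `spark` is assumed.

```python
# Sketch: widen a BYTE column to INT on a hypothetical table `ints` with type widening enabled.
# Assumes an active SparkSession `spark` with Delta configured.
spark.sql(
    "CREATE TABLE ints (c BYTE) USING DELTA "
    "TBLPROPERTIES ('delta.enableTypeWidening' = 'true')"
)
spark.sql("ALTER TABLE ints ALTER COLUMN c TYPE INT")
```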

@@ -31,7 +31,7 @@ You can enable type widening on an existing table by setting the `delta.enableTy
Alternatively, you can enable type widening during table creation:

```sql
CREATE TABLE T(c1 INT) USING DELTA TBLPROPERTIES('delta.enableTypeWidening' = 'true')
CREATE TABLE <table_name> (<col_name> <col_type>) USING DELTA TBLPROPERTIES('delta.enableTypeWidening' = 'true')
```

To disable type widening:
@@ -68,7 +68,7 @@ When all conditions are satisfied, the target table schema is updated automatica
The type widening feature can be removed from a Delta table using the `DROP FEATURE` command:

```sql
ALTER TABLE <table-name> DROP FEATURE 'typeWidening-preview' [TRUNCATE HISTORY]
ALTER TABLE <table_name> DROP FEATURE 'typeWidening-preview' [TRUNCATE HISTORY]
```

See [_](delta-drop-feature.md) for more information on dropping Delta table features.
2 changes: 1 addition & 1 deletion docs/source/delta-utility.md
@@ -37,7 +37,7 @@ default retention threshold for the files is 7 days. To change this behavior, se
VACUUM eventsTable LITE -- This VACUUM in ‘LITE’ mode runs faster.
-- Instead of finding all files in the table directory, `VACUUM LITE` uses the Delta transaction log to identify and remove files no longer referenced by any table versions within the retention duration.
-- If `VACUUM LITE` cannot be completed because the Delta log has been pruned a `DELTA_CANNOT_VACUUM_LITE` exception is raised.
-- This mode is available only in <Delta> 3.3 and above.
-- This mode is available only in Delta 3.3 and above.

VACUUM '/data/events' -- vacuum files in path-based table

16 changes: 8 additions & 8 deletions docs/source/quick-start.md
@@ -18,13 +18,13 @@ Follow these instructions to set up <Delta> with Spark. You can run the steps in

#. Run as a project: Set up a Maven or SBT project (Scala or Java) with <Delta>, copy the code snippets into a source file, and run the project. Alternatively, you can use the [examples provided in the Github repository](https://github.com/delta-io/delta/tree/master/examples).

.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.2.0`. See the [release compatibility matrix](releases.md) for details.
.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.3.0`. See the [release compatibility matrix](releases.md) for details.

### Prerequisite: set up Java

As mentioned in the official <AS> installation instructions [here](https://spark.apache.org/docs/latest/index.html#downloading), make sure you have a valid Java version installed (8, 11, or 17) and that Java is configured correctly on your system using either the system `PATH` or `JAVA_HOME` environment variable.

Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.2.0`.
Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.3.0`.

### Set up interactive shell

@@ -35,7 +35,7 @@ To use <Delta> interactively within the Spark SQL, Scala, or Python shell, you n
Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-sql` in the extracted directory.

```bash
bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

#### PySpark Shell
@@ -49,15 +49,15 @@ bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.exten
#. Run PySpark with the <Delta> package and additional configurations:

```bash
pyspark --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
pyspark --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

#### Spark Scala Shell

Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-shell` in the extracted directory.

```bash
bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

### Set up project
Expand All @@ -72,7 +72,7 @@ You include <Delta> in your Maven project by adding it as a dependency in your P
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-spark_2.12</artifactId>
<version>3.2.0</version>
<version>3.3.0</version>
</dependency>
```

@@ -81,12 +81,12 @@ You include <Delta> in your SBT project by adding the following line to your `bu
You include <Delta> in your SBT project by adding the following line to your `build.sbt` file:

```scala
libraryDependencies += "io.delta" %% "delta-spark" % "3.2.0"
libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
```

#### Python

To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.2.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.3.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.

```python
import pyspark
1 change: 1 addition & 0 deletions docs/source/releases.md
@@ -17,6 +17,7 @@ The following table lists <Delta> versions and their compatible <AS> versions.

| <Delta> version | <AS> version |
| --- | --- |
| 3.3.x | 3.5.x |
| 3.2.x | 3.5.x |
| 3.1.x | 3.5.x |
| 3.0.x | 3.5.x |