[Docs] Update docs for 3.3 release (#3992) (#4015)

#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description


Cherry-pick 2e186b4 to master.

## How was this patch tested?


## Does this PR introduce _any_ user-facing changes?

allisonport-db authored Jan 6, 2025
1 parent d6d6b04 commit 55801a2
Showing 9 changed files with 32 additions and 23 deletions.
2 changes: 0 additions & 2 deletions docs/source/delta-batch.md
@@ -55,8 +55,6 @@ You can create tables in the following ways.

SQL also supports creating a table at a path, without creating an entry in the Hive metastore.

.. code-language-tabs::

```sql
-- Create or replace table with path
CREATE OR REPLACE TABLE delta.`/tmp/delta/people10m` (
8 changes: 5 additions & 3 deletions docs/source/delta-clustering.md
@@ -33,7 +33,7 @@ To enable liquid clustering, add the `CLUSTER BY` phrase to a table creation sta
-- Create an empty table
CREATE TABLE table1(col0 int, col1 string) USING DELTA CLUSTER BY (col0);

-- Using a CTAS statement
-- Using a CTAS statement (Delta 3.3+)
CREATE EXTERNAL TABLE table2 CLUSTER BY (col0) -- specify clustering after table name, not in subquery
LOCATION 'table_location'
AS SELECT * FROM table1;
@@ -61,14 +61,14 @@ To enable liquid clustering, add the `CLUSTER BY` phrase to a table creation sta

.. warning:: Tables created with liquid clustering have `Clustering` and `DomainMetadata` table features enabled (both writer features) and use Delta writer version 7 and reader version 1. Table protocol versions cannot be downgraded. See [_](/versioning.md).

You can enable liquid clustering on an existing unpartitioned Delta table using the following syntax:
In <Delta> 3.3 and above, you can enable liquid clustering on an existing unpartitioned Delta table using the following syntax:

```sql
ALTER TABLE <table_name>
CLUSTER BY (<clustering_columns>)
```

.. important:: Default behavior does not apply clustering to previously written data. To force reclustering for all records, you must use `OPTIMIZE FULL`. See [_](#optimize-full).
.. important:: Default behavior does not apply clustering to previously written data. To force reclustering for all records, you must use `OPTIMIZE FULL`. See [_](#recluster-entire-table).

## Choose clustering columns

@@ -97,6 +97,8 @@ OPTIMIZE table_name;

Liquid clustering is incremental, meaning that data is only rewritten as necessary to accommodate data that needs to be clustered. Already clustered data files with different clustering columns are not rewritten.

### Recluster entire table

In <Delta> 3.3 and above, you can force reclustering of all records in a table with the following syntax:

```sql
4 changes: 4 additions & 0 deletions docs/source/delta-drop-feature.md
@@ -27,6 +27,10 @@ You can drop the following Delta table features:
- `deletionVectors`. See [_](delta-deletion-vectors.md).
- `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in <Delta> 3.2.0 and above.
- `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in <Delta> 3.1.0 and above.
- `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in <Delta> 3.3.0 and above.
- `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in <Delta> 3.3.0 and above.
- `checkConstraints`. See [_](delta-constraints.md). Drop support for check constraints is available in <Delta> 3.3.0 and above.
- `inCommitTimestamp`. See [_](delta-batch.md#in-commit-timestamps). Drop support for In-Commit Timestamp is available in <Delta> 3.3.0 and above.

You cannot drop other [Delta table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features).
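
The version notes in the list above can be read as a small lookup table. The following is a minimal sketch of that reading, not part of Delta itself: `DROP_SUPPORT`, `parse_version`, and `can_drop` are hypothetical names, the table covers only the entries visible in this hunk, and treating features listed without a version as always droppable is an assumption.

```python
# Minimum Delta version in which DROP FEATURE supports each feature,
# copied from the list above. None means the list states no version.
DROP_SUPPORT = {
    "deletionVectors": None,
    "typeWidening-preview": None,
    "v2Checkpoint": (3, 1, 0),
    "columnMapping": (3, 3, 0),
    "vacuumProtocolCheck": (3, 3, 0),
    "checkConstraints": (3, 3, 0),
    "inCommitTimestamp": (3, 3, 0),
}


def parse_version(version: str) -> tuple:
    """Turn '3.3' or '3.3.0' into (3, 3, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '3.3'
    return tuple(parts[:3])


def can_drop(feature: str, delta_version: str) -> bool:
    """Hypothetical helper: True if the list above permits dropping `feature`."""
    if feature not in DROP_SUPPORT:
        return False  # features outside the list cannot be dropped
    minimum = DROP_SUPPORT[feature]
    # Assumption: a feature listed without a version is treated as droppable.
    return minimum is None or parse_version(delta_version) >= minimum
```

For example, `can_drop("columnMapping", "3.2.1")` is false under this table, while `can_drop("columnMapping", "3.3.0")` is true.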

4 changes: 4 additions & 0 deletions docs/source/delta-sharing.md
@@ -182,6 +182,10 @@ Please remember to set the spark configurations mentioned in [_](delta-batch.md#
| - | - |
| [Deletion Vectors](delta-deletion-vectors.md) | 3.1.0 |
| [Column Mapping](delta-column-mapping.md) | 3.1.0 |
| [Timestamp without Timezone](https://spark.apache.org/docs/latest/sql-ref-datatypes.html) | 3.3.0 |
| [Type widening (Preview)](/delta-type-widening.md) | 3.3.0 |
| [Variant Type (Preview)](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | 3.3.0 |


Batch queries work as is, because the `responseFormat` is resolved automatically from the table features of the shared table.
For CDF and streaming queries, you must also set the option `responseFormat=delta` when reading shared tables with Deletion Vectors or Column Mapping enabled.
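
The rule in the two sentences above can be sketched as a small decision function. This is an illustration only: `reader_options`, the query type strings, and the feature names used as set members are all hypothetical choices made here, not part of the Delta Sharing API.

```python
# Table features that the text above says require responseFormat=delta
# for CDF and streaming reads. The string names are chosen for this sketch.
FEATURES_NEEDING_DELTA_FORMAT = {"deletionVectors", "columnMapping"}


def reader_options(query_type: str, table_features: set) -> dict:
    """Hypothetical helper returning extra reader options for a shared table.

    query_type is one of "batch", "cdf", or "streaming" (names assumed here).
    """
    if query_type == "batch":
        return {}  # batch reads resolve responseFormat automatically
    if table_features & FEATURES_NEEDING_DELTA_FORMAT:
        return {"responseFormat": "delta"}
    return {}
```

The returned dictionary could then be handed to a reader, for example through PySpark's `DataFrameReader.options(**opts)`.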
10 changes: 5 additions & 5 deletions docs/source/delta-storage.md
@@ -66,11 +66,11 @@ In this default mode, <Delta> supports concurrent reads from multiple clusters,

This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [_](#setup-configuration-s3-multi-cluster).

#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.0 which is pre-built for Hadoop 3.3.4):
#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):

```bash
bin/spark-shell \
--packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
--conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key>
```
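
Because the Delta version appears in more than one package coordinate, it is easy for the coordinates to drift apart when the docs are bumped. One way to keep them in step is to build the `--packages` value from a single version string. The sketch below assumes a hypothetical helper named `delta_packages`; the coordinates themselves are the ones used in the commands in this file.

```python
def delta_packages(delta_version: str, hadoop_aws_version: str = "3.3.4",
                   multi_cluster: bool = False) -> str:
    """Hypothetical helper: build the value for spark-shell's --packages flag."""
    coords = [
        f"io.delta:delta-spark_2.12:{delta_version}",
        f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
    ]
    if multi_cluster:
        # The multi-cluster command in this file also lists the DynamoDB
        # log store artifact, pinned to the same Delta version.
        coords.append(f"io.delta:delta-storage-s3-dynamodb:{delta_version}")
    return ",".join(coords)


print(delta_packages("3.3.0"))
```

Passing `multi_cluster=True` appends the `delta-storage-s3-dynamodb` coordinate with the same version, so the two Delta artifacts cannot disagree.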
@@ -91,7 +91,7 @@ For efficient listing of <Delta> metadata files on S3, set the configuration `de

```bash
bin/spark-shell \
--packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
--conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
--conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
--conf "spark.hadoop.delta.enableFastS3AListFrom=true"
@@ -138,11 +138,11 @@ This mode supports concurrent writes to S3 from multiple clusters and has to be

This section explains how to quickly start reading and writing Delta tables on S3 using multi-cluster mode.

#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.0 which is pre-built for Hadoop 3.3.4):
#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):

```bash
bin/spark-shell \
--packages io.delta:delta-spark_2.12:3.1.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.1.0 \
--packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.3.0 \
--conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
--conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
--conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
8 changes: 4 additions & 4 deletions docs/source/delta-type-widening.md
@@ -4,13 +4,13 @@ description: Learn about type widening in Delta.

# Delta type widening

.. note:: This feature is available in preview in <Delta> 3.2.
.. note:: This feature is available in preview in <Delta> 3.2 and above.

The type widening feature allows changing the type of columns in a Delta table to a wider type. This enables manual type changes using the `ALTER TABLE ALTER COLUMN` command and automatic type migration with schema evolution in `INSERT` and `MERGE INTO` commands.

## Supported type changes

The feature preview in <Delta> 3.2 supports a limited set of type changes:
The feature preview in <Delta> 3.2 and above supports a limited set of type changes:
- `BYTE` to `SHORT` and `INT`.
- `SHORT` to `INT`

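The conversions listed above amount to a small lookup from a source type to the set of wider types it may change to. A minimal sketch of that lookup follows; `SUPPORTED_WIDENINGS` and `can_widen` are hypothetical names, and the table covers only the conversions shown in this hunk.

```python
# Widening conversions listed above: each key may widen to any type in its set.
SUPPORTED_WIDENINGS = {
    "BYTE": {"SHORT", "INT"},
    "SHORT": {"INT"},
}


def can_widen(source: str, target: str) -> bool:
    """Hypothetical helper: True if source -> target appears in the list above."""
    allowed = SUPPORTED_WIDENINGS.get(source.upper(), set())
    return target.upper() in allowed
```

Under this table `can_widen("byte", "int")` is true, while the narrowing direction `can_widen("INT", "SHORT")` is false.
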
@@ -31,7 +31,7 @@ You can enable type widening on an existing table by setting the `delta.enableTy
Alternatively, you can enable type widening during table creation:

```sql
CREATE TABLE T(c1 INT) USING DELTA TBLPROPERTIES('delta.enableTypeWidening' = 'true')
CREATE TABLE <table_name> USING DELTA TBLPROPERTIES('delta.enableTypeWidening' = 'true')
```

To disable type widening:
@@ -68,7 +68,7 @@ When all conditions are satisfied, the target table schema is updated automatica
The type widening feature can be removed from a Delta table using the `DROP FEATURE` command:

```sql
ALTER TABLE <table-name> DROP FEATURE 'typeWidening-preview' [TRUNCATE HISTORY]
ALTER TABLE <table_name> DROP FEATURE 'typeWidening-preview' [TRUNCATE HISTORY]
```

See [_](delta-drop-feature.md) for more information on dropping Delta table features.
2 changes: 1 addition & 1 deletion docs/source/delta-utility.md
@@ -37,7 +37,7 @@ default retention threshold for the files is 7 days. To change this behavior, se
VACUUM eventsTable LITE -- This VACUUM in ‘LITE’ mode runs faster.
-- Instead of finding all files in the table directory, `VACUUM LITE` uses the Delta transaction log to identify and remove files no longer referenced by any table versions within the retention duration.
-- If `VACUUM LITE` cannot be completed because the Delta log has been pruned, a `DELTA_CANNOT_VACUUM_LITE` exception is raised.
-- This mode is available only in <Delta> 3.3 and above.
-- This mode is available only in Delta 3.3 and above.

VACUUM '/data/events' -- vacuum files in path-based table

16 changes: 8 additions & 8 deletions docs/source/quick-start.md
@@ -18,13 +18,13 @@ Follow these instructions to set up <Delta> with Spark. You can run the steps in

#. Run as a project: Set up a Maven or SBT project (Scala or Java) with <Delta>, copy the code snippets into a source file, and run the project. Alternatively, you can use the [examples provided in the GitHub repository](https://github.com/delta-io/delta/tree/master/examples).

.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.2.0`. See the [release compatibility matrix](releases.md) for details.
.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.3.0`. See the [release compatibility matrix](releases.md) for details.

### Prerequisite: set up Java

As mentioned in the official <AS> installation instructions [here](https://spark.apache.org/docs/latest/index.html#downloading), make sure you have a valid Java version installed (8, 11, or 17) and that Java is configured correctly on your system using either the system `PATH` or the `JAVA_HOME` environment variable.

Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.2.0`.
Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.3.0`.

### Set up interactive shell

@@ -35,7 +35,7 @@ To use <Delta> interactively within the Spark SQL, Scala, or Python shell, you n
Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-sql` in the extracted directory.

```bash
bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

#### PySpark Shell
@@ -49,15 +49,15 @@ bin/spark-sql --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.exten
#. Run PySpark with the <Delta> package and additional configurations:

```bash
pyspark --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
pyspark --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

#### Spark Scala Shell

Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-shell` in the extracted directory.

```bash
bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

### Set up project
@@ -72,7 +72,7 @@ You include <Delta> in your Maven project by adding it as a dependency in your P
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-spark_2.12</artifactId>
<version>3.2.0</version>
<version>3.3.0</version>
</dependency>
```

@@ -81,12 +81,12 @@ You include <Delta> in your Maven project by adding it as a dependency in your P
You include <Delta> in your SBT project by adding the following line to your `build.sbt` file:

```scala
libraryDependencies += "io.delta" %% "delta-spark" % "3.2.0"
libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
```

#### Python

To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.2.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.3.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.

```python
import pyspark
1 change: 1 addition & 0 deletions docs/source/releases.md
@@ -17,6 +17,7 @@ The following table lists <Delta> versions and their compatible <AS> versions.

| <Delta> version | <AS> version |
| --- | --- |
| 3.3.x | 3.5.x |
| 3.2.x | 3.5.x |
| 3.1.x | 3.5.x |
| 3.0.x | 3.5.x |
