From 2a3887be60980c1d9953e04a700914924be357a2 Mon Sep 17 00:00:00 2001
From: Matthew Powers
Date: Tue, 21 Nov 2023 10:59:07 -0500
Subject: [PATCH 1/4] docs on append, overwrite, delete and z ordering

---
 .../appending-overwriting-delta-lake-table.md | 76 +++++++++++++++++++
 docs/usage/create-delta-lake-table.md         | 25 ++++++
 .../deleting-rows-from-delta-lake-table.md    | 34 +++++++++
 docs/usage/optimize/delta-lake-z-order.md     | 16 ++++
 .../small-file-compaction-with-optimize.md    |  0
 mkdocs.yml                                    | 17 +++--
 6 files changed, 162 insertions(+), 6 deletions(-)
 create mode 100644 docs/usage/appending-overwriting-delta-lake-table.md
 create mode 100644 docs/usage/create-delta-lake-table.md
 create mode 100644 docs/usage/deleting-rows-from-delta-lake-table.md
 create mode 100644 docs/usage/optimize/delta-lake-z-order.md
 rename docs/usage/{ => optimize}/small-file-compaction-with-optimize.md (100%)

diff --git a/docs/usage/appending-overwriting-delta-lake-table.md b/docs/usage/appending-overwriting-delta-lake-table.md
new file mode 100644
index 0000000000..29a77aec1b
--- /dev/null
+++ b/docs/usage/appending-overwriting-delta-lake-table.md
@@ -0,0 +1,76 @@
+# Appending to and overwriting a Delta Lake table
+
+This section explains how to append to an existing Delta table and how to overwrite a Delta table.
+
+## Delta Lake append transactions
+
+Suppose you have a Delta table with the following contents:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
++-------+----------+
+```
+
+Append two additional rows of data to the table:
+
+```python
+df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
+dl.writer.write_deltalake("tmp/some-table", df, mode="append")
+```
+
+Here are the updated contents of the Delta table:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
+|     8 | dd       |
+|     9 | ee       |
++-------+----------+
+```
+
+Now let's see how to perform an overwrite transaction.
+
+## Delta Lake overwrite transactions
+
+Now let's see how to overwrite the existing Delta table.
+
+```python
+df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
+dl.writer.write_deltalake("tmp/some-table", df, mode="overwrite")
+```
+
+Here are the contents of the Delta table after the overwrite operation:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|    11 | aa       |
+|    22 | bb       |
++-------+----------+
+```
+
+Overwriting just performs a logical delete. It doesn't physically remove the previous data from storage. Time travel back to the previous version to confirm that the old version of the table is still accessible.
+
+```
+dt = dl.DeltaTable("tmp/some-table", version=1)
+
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
+|     8 | dd       |
+|     9 | ee       |
++-------+----------+
+```
diff --git a/docs/usage/create-delta-lake-table.md b/docs/usage/create-delta-lake-table.md
new file mode 100644
index 0000000000..363802a5fe
--- /dev/null
+++ b/docs/usage/create-delta-lake-table.md
@@ -0,0 +1,25 @@
+# Creating a Delta Lake Table
+
+This section explains how to create a Delta Lake table.
+
+You can easily write a DataFrame to a Delta table.
+
+```python
+import deltalake as dl
+import pandas as pd
+
+df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
+dl.writer.write_deltalake("tmp/some-table", df)
+```
+
+Here are the contents of the Delta table in storage:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
++-------+----------+
+```
diff --git a/docs/usage/deleting-rows-from-delta-lake-table.md b/docs/usage/deleting-rows-from-delta-lake-table.md
new file mode 100644
index 0000000000..2471690f50
--- /dev/null
+++ b/docs/usage/deleting-rows-from-delta-lake-table.md
@@ -0,0 +1,34 @@
+# Deleting rows from a Delta Lake table
+
+This section explains how to delete rows from a Delta Lake table.
+
+Suppose you have the following Delta table with four rows:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
+|     4 | d        |
++-------+----------+
+```
+
+Here's how to delete all the rows where the `num` is greater than 2:
+
+```python
+dt = dl.DeltaTable("tmp/my-table")
+dt.delete("num > 2")
+```
+
+Here are the contents of the Delta table after the delete operation has been performed:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
++-------+----------+
+```
diff --git a/docs/usage/optimize/delta-lake-z-order.md b/docs/usage/optimize/delta-lake-z-order.md
new file mode 100644
index 0000000000..81ddabefcb
--- /dev/null
+++ b/docs/usage/optimize/delta-lake-z-order.md
@@ -0,0 +1,16 @@
+# Delta Lake Z Order
+
+This section explains how to Z Order a Delta table.
+
+Z Ordering colocates similar data in the same files, which allows for more better file skipping and faster queries.
+
+Suppose you have a table with `first_name`, `age`, and `country` columns.
+
+If you Z Order the data by the `country` column, then individuals from the same country will be stored in the same files. When you subsequently query the data for individuals from a given country, the query will execute faster because more data can be skipped.
+
+Here's how to Z Order a Delta table:
+
+```python
+dt = DeltaTable("tmp")
+dt.optimize.z_order(["country"])
+```
diff --git a/docs/usage/small-file-compaction-with-optimize.md b/docs/usage/optimize/small-file-compaction-with-optimize.md
similarity index 100%
rename from docs/usage/small-file-compaction-with-optimize.md
rename to docs/usage/optimize/small-file-compaction-with-optimize.md
diff --git a/mkdocs.yml b/mkdocs.yml
index 41f0ee309c..514872e5c8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -19,12 +19,17 @@ nav:
   - Usage:
     - Installation: usage/installation.md
     - Overview: usage/index.md
-    - Loading a Delta Table: usage/loading-table.md
-    - Examining a Delta Table: usage/examining-table.md
-    - Querying a Delta Table: usage/querying-delta-tables.md
-    - Managing a Delta Table: usage/managing-tables.md
-    - Writing Delta Tables: usage/writing-delta-tables.md
-    - Small file compaction: usage/small-file-compaction-with-optimize.md
+    - Creating a table: usage/create-delta-lake-table.md
+    - Loading a table: usage/loading-table.md
+    - Append/overwrite tables: usage/appending-overwriting-delta-lake-table.md
+    - Examining a table: usage/examining-table.md
+    - Querying a table: usage/querying-delta-tables.md
+    - Managing a table: usage/managing-tables.md
+    - Writing a table: usage/writing-delta-tables.md
+    - Deleting rows from a table: usage/deleting-rows-from-delta-lake-table.md
+    - Optimize:
+      - Small file compaction: usage/optimize/small-file-compaction-with-optimize.md
+      - Z Order: usage/optimize/delta-lake-z-order.md
   - API Reference:
     - api/delta_table.md
     - api/schema.md

From 9123367979c786cfee64a41c3dc92798e554fe5e Mon Sep 17 00:00:00 2001
From: Matthew Powers
Date: Wed, 22 Nov 2023 07:37:06 -0500
Subject: [PATCH 2/4] docs: update imports

---
 docs/usage/appending-overwriting-delta-lake-table.md | 6 ++++--
 docs/usage/create-delta-lake-table.md                | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/usage/appending-overwriting-delta-lake-table.md b/docs/usage/appending-overwriting-delta-lake-table.md
index 29a77aec1b..46f8881c86 100644
--- a/docs/usage/appending-overwriting-delta-lake-table.md
+++ b/docs/usage/appending-overwriting-delta-lake-table.md
@@ -19,8 +19,10 @@ Suppose you have a Delta table with the following contents:
 Append two additional rows of data to the table:

 ```python
+from deltalake.writer import write_deltalake
+
 df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
-dl.writer.write_deltalake("tmp/some-table", df, mode="append")
+write_deltalake("tmp/some-table", df, mode="append")
 ```

 Here are the updated contents of the Delta table:
@@ -45,7 +47,7 @@ Now let's see how to overwrite the existing Delta table.

 ```python
 df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
-dl.writer.write_deltalake("tmp/some-table", df, mode="overwrite")
+write_deltalake("tmp/some-table", df, mode="overwrite")
 ```

 Here are the contents of the Delta table after the overwrite operation:
diff --git a/docs/usage/create-delta-lake-table.md b/docs/usage/create-delta-lake-table.md
index 363802a5fe..f14adeac8e 100644
--- a/docs/usage/create-delta-lake-table.md
+++ b/docs/usage/create-delta-lake-table.md
@@ -6,10 +6,11 @@ You can easily write a DataFrame to a Delta table.

 ```python
 import deltalake as dl
+from deltalake.writer import write_deltalake
 import pandas as pd

 df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
-dl.writer.write_deltalake("tmp/some-table", df)
+write_deltalake("tmp/some-table", df)
 ```

 Here are the contents of the Delta table in storage:

From c1d97ac0e5c2b8b546e6a5efa7f697022213a9e9 Mon Sep 17 00:00:00 2001
From: Matthew Powers
Date: Wed, 22 Nov 2023 14:58:57 -0500
Subject: [PATCH 3/4] docs: remove unnecessary import

---
 docs/usage/create-delta-lake-table.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/usage/create-delta-lake-table.md b/docs/usage/create-delta-lake-table.md
index f14adeac8e..831882c0e6 100644
--- a/docs/usage/create-delta-lake-table.md
+++ b/docs/usage/create-delta-lake-table.md
@@ -5,7 +5,6 @@ This section explains how to create a Delta Lake table.
 You can easily write a DataFrame to a Delta table.

 ```python
-import deltalake as dl
 from deltalake.writer import write_deltalake
 import pandas as pd


From 163f61af778c5ecc0e02fc28ca448b8a01b6b186 Mon Sep 17 00:00:00 2001
From: Matthew Powers
Date: Wed, 22 Nov 2023 15:11:42 -0500
Subject: [PATCH 4/4] docs: more import fixes

---
 docs/usage/appending-overwriting-delta-lake-table.md | 4 ++--
 docs/usage/create-delta-lake-table.md                | 2 +-
 docs/usage/deleting-rows-from-delta-lake-table.md    | 2 +-
 docs/usage/optimize/delta-lake-z-order.md            | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/usage/appending-overwriting-delta-lake-table.md b/docs/usage/appending-overwriting-delta-lake-table.md
index 46f8881c86..0930d8da1e 100644
--- a/docs/usage/appending-overwriting-delta-lake-table.md
+++ b/docs/usage/appending-overwriting-delta-lake-table.md
@@ -19,7 +19,7 @@ Suppose you have a Delta table with the following contents:
 Append two additional rows of data to the table:

 ```python
-from deltalake.writer import write_deltalake
+from deltalake import write_deltalake, DeltaTable

 df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
 write_deltalake("tmp/some-table", df, mode="append")
@@ -64,7 +64,7 @@ Here are the contents of the Delta table after the overwrite operation:
 Overwriting just performs a logical delete. It doesn't physically remove the previous data from storage. Time travel back to the previous version to confirm that the old version of the table is still accessible.

 ```
-dt = dl.DeltaTable("tmp/some-table", version=1)
+dt = DeltaTable("tmp/some-table", version=1)

 +-------+----------+
 |   num | letter   |
diff --git a/docs/usage/create-delta-lake-table.md b/docs/usage/create-delta-lake-table.md
index 831882c0e6..3a2f023a47 100644
--- a/docs/usage/create-delta-lake-table.md
+++ b/docs/usage/create-delta-lake-table.md
@@ -5,7 +5,7 @@ This section explains how to create a Delta Lake table.
 You can easily write a DataFrame to a Delta table.

 ```python
-from deltalake.writer import write_deltalake
+from deltalake import write_deltalake
 import pandas as pd

 df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
diff --git a/docs/usage/deleting-rows-from-delta-lake-table.md b/docs/usage/deleting-rows-from-delta-lake-table.md
index 2471690f50..e1833c84b9 100644
--- a/docs/usage/deleting-rows-from-delta-lake-table.md
+++ b/docs/usage/deleting-rows-from-delta-lake-table.md
@@ -18,7 +18,7 @@ Suppose you have the following Delta table with four rows:
 Here's how to delete all the rows where the `num` is greater than 2:

 ```python
-dt = dl.DeltaTable("tmp/my-table")
+dt = DeltaTable("tmp/my-table")
 dt.delete("num > 2")
 ```

diff --git a/docs/usage/optimize/delta-lake-z-order.md b/docs/usage/optimize/delta-lake-z-order.md
index 81ddabefcb..54be212c47 100644
--- a/docs/usage/optimize/delta-lake-z-order.md
+++ b/docs/usage/optimize/delta-lake-z-order.md
@@ -2,7 +2,7 @@

 This section explains how to Z Order a Delta table.

-Z Ordering colocates similar data in the same files, which allows for more better file skipping and faster queries.
+Z Ordering colocates similar data in the same files, which allows for better file skipping and faster queries.

 Suppose you have a table with `first_name`, `age`, and `country` columns.
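
The Z Order page says that colocating similar data allows better file skipping. The sketch below illustrates that idea with a toy model in plain Python. It is not how `deltalake` implements Z Ordering: the helper names (`split_into_files`, `files_to_read`), the 16 sample rows, and the four-rows-per-file layout are all assumptions made for illustration, and with a single column the "Z Order" reduces to a plain sort.

```python
from typing import Dict, List

Row = Dict[str, object]


def split_into_files(rows: List[Row], rows_per_file: int) -> List[List[Row]]:
    """Chunk rows into fixed-size groups, each standing in for one data file."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_to_read(files: List[List[Row]], column: str, value: str) -> int:
    """Count files whose [min, max] range for `column` could contain `value`.

    A file can be skipped when `value` falls outside its min/max range.
    """
    count = 0
    for file_rows in files:
        values = [row[column] for row in file_rows]
        if min(values) <= value <= max(values):
            count += 1
    return count


# Hypothetical sample data: 16 rows with countries interleaved (AR, BR, CN, DE, AR, ...).
countries = ["AR", "BR", "CN", "DE"]
rows = [
    {"first_name": f"p{i}", "age": 20 + i, "country": countries[i % 4]}
    for i in range(16)
]

# Layout 1: rows stored in arrival order, so every file spans all countries.
unsorted_files = split_into_files(rows, 4)

# Layout 2: rows clustered by country before being split into files.
clustered_files = split_into_files(sorted(rows, key=lambda r: r["country"]), 4)

reads_before = files_to_read(unsorted_files, "country", "BR")
reads_after = files_to_read(clustered_files, "country", "BR")
print(reads_before, reads_after)
```

In this toy layout, a query for one country has to open every file before clustering and only one file afterwards, which is the effect the page describes.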