diff --git a/docs/usage/appending-overwriting-delta-lake-table.md b/docs/usage/appending-overwriting-delta-lake-table.md
new file mode 100644
index 0000000000..0930d8da1e
--- /dev/null
+++ b/docs/usage/appending-overwriting-delta-lake-table.md
@@ -0,0 +1,79 @@
+# Appending to and overwriting a Delta Lake table
+
+This section explains how to append to an existing Delta table and how to overwrite a Delta table.
+
+## Delta Lake append transactions
+
+Suppose you have a Delta table with the following contents:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
++-------+----------+
+```
+
+Append two additional rows of data to the table:
+
+```python
+from deltalake import write_deltalake, DeltaTable
+import pandas as pd
+
+df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
+write_deltalake("tmp/some-table", df, mode="append")
+```
+
+Here are the updated contents of the Delta table:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
+|     8 | dd       |
+|     9 | ee       |
++-------+----------+
+```
+
+Now let's see how to perform an overwrite transaction.
+
+## Delta Lake overwrite transactions
+
+Here's how to overwrite the existing Delta table:
+
+```python
+df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
+write_deltalake("tmp/some-table", df, mode="overwrite")
+```
+
+Here are the contents of the Delta table after the overwrite operation:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|    11 | aa       |
+|    22 | bb       |
++-------+----------+
+```
+
+Overwriting just performs a logical delete. It doesn't physically remove the previous data from storage. Time travel back to the previous version to confirm that the old version of the table is still accessible.
+
+```
+dt = DeltaTable("tmp/some-table", version=1)
+
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
+|     8 | dd       |
+|     9 | ee       |
++-------+----------+
+```
diff --git a/docs/usage/create-delta-lake-table.md b/docs/usage/create-delta-lake-table.md
new file mode 100644
index 0000000000..3a2f023a47
--- /dev/null
+++ b/docs/usage/create-delta-lake-table.md
@@ -0,0 +1,25 @@
+# Creating a Delta Lake Table
+
+This section explains how to create a Delta Lake table.
+
+You can easily write a DataFrame to a Delta table.
+
+```python
+from deltalake import write_deltalake
+import pandas as pd
+
+df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
+write_deltalake("tmp/some-table", df)
+```
+
+Here are the contents of the Delta table in storage:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
++-------+----------+
+```
diff --git a/docs/usage/deleting-rows-from-delta-lake-table.md b/docs/usage/deleting-rows-from-delta-lake-table.md
new file mode 100644
index 0000000000..e1833c84b9
--- /dev/null
+++ b/docs/usage/deleting-rows-from-delta-lake-table.md
@@ -0,0 +1,34 @@
+# Deleting rows from a Delta Lake table
+
+This section explains how to delete rows from a Delta Lake table.
+
+Suppose you have the following Delta table with four rows:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
+|     3 | c        |
+|     4 | d        |
++-------+----------+
+```
+
+Here's how to delete all the rows where `num` is greater than 2:
+
+```python
+dt = DeltaTable("tmp/my-table")
+dt.delete("num > 2")
+```
+
+Here are the contents of the Delta table after the delete operation:
+
+```
++-------+----------+
+|   num | letter   |
+|-------+----------|
+|     1 | a        |
+|     2 | b        |
++-------+----------+
+```
diff --git a/docs/usage/optimize/delta-lake-z-order.md b/docs/usage/optimize/delta-lake-z-order.md
new file mode 100644
index 0000000000..54be212c47
--- /dev/null
+++ b/docs/usage/optimize/delta-lake-z-order.md
@@ -0,0 +1,16 @@
+# Delta Lake Z Order
+
+This section explains how to Z Order a Delta table.
+
+Z Ordering colocates similar data in the same files, which allows for better file skipping and faster queries.
+
+Suppose you have a table with `first_name`, `age`, and `country` columns.
+
+If you Z Order the data by the `country` column, then individuals from the same country will be stored in the same files. When you subsequently query the data for individuals from a given country, the query will execute faster because more files can be skipped.
+
+Here's how to Z Order a Delta table:
+
+```python
+dt = DeltaTable("tmp")
+dt.optimize.z_order(["country"])
+```
diff --git a/docs/usage/small-file-compaction-with-optimize.md b/docs/usage/optimize/small-file-compaction-with-optimize.md
similarity index 100%
rename from docs/usage/small-file-compaction-with-optimize.md
rename to docs/usage/optimize/small-file-compaction-with-optimize.md
diff --git a/mkdocs.yml b/mkdocs.yml
index 41f0ee309c..514872e5c8 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -19,12 +19,17 @@ nav:
   - Usage:
     - Installation: usage/installation.md
     - Overview: usage/index.md
-    - Loading a Delta Table: usage/loading-table.md
-    - Examining a Delta Table: usage/examining-table.md
-    - Querying a Delta Table: usage/querying-delta-tables.md
-    - Managing a Delta Table: usage/managing-tables.md
-    - Writing Delta Tables: usage/writing-delta-tables.md
-    - Small file compaction: usage/small-file-compaction-with-optimize.md
+    - Creating a table: usage/create-delta-lake-table.md
+    - Loading a table: usage/loading-table.md
+    - Append/overwrite tables: usage/appending-overwriting-delta-lake-table.md
+    - Examining a table: usage/examining-table.md
+    - Querying a table: usage/querying-delta-tables.md
+    - Managing a table: usage/managing-tables.md
+    - Writing a table: usage/writing-delta-tables.md
+    - Deleting rows from a table: usage/deleting-rows-from-delta-lake-table.md
+    - Optimize:
+      - Small file compaction: usage/optimize/small-file-compaction-with-optimize.md
+      - Z Order: usage/optimize/delta-lake-z-order.md
   - API Reference:
     - api/delta_table.md
     - api/schema.md