forked from delta-io/delta-rs
docs: on append, overwrite, delete and z-ordering (delta-io#1897)
# Description
Adds docs on how to append, overwrite, delete rows, and Z Order Delta tables. Will add much more detailed pages in the future. Just getting the high-level skeleton of the docs developed.
1 parent 2e1f0c9, commit fee4d77
Showing 6 changed files with 164 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
# Appending to and overwriting a Delta Lake table | ||
|
||
This section explains how to append to an existing Delta table and how to overwrite a Delta table. | ||
|
||
## Delta Lake append transactions | ||
|
||
Suppose you have a Delta table with the following contents: | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 1 | a | | ||
| 2 | b | | ||
| 3 | c | | ||
+-------+----------+ | ||
``` | ||
|
||
Append two additional rows of data to the table: | ||
|
||
```python | ||
from deltalake import write_deltalake, DeltaTable | ||
import pandas as pd | ||
|
||
df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]}) | ||
write_deltalake("tmp/some-table", df, mode="append") | ||
``` | ||
|
||
Here are the updated contents of the Delta table: | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 1 | a | | ||
| 2 | b | | ||
| 3 | c | | ||
| 8 | dd | | ||
| 9 | ee | | ||
+-------+----------+ | ||
``` | ||
|
||
Now let's see how to perform an overwrite transaction. | ||
|
||
## Delta Lake overwrite transactions | ||
|
||
Overwrite the existing Delta table with two new rows: | ||
|
||
```python | ||
df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]}) | ||
write_deltalake("tmp/some-table", df, mode="overwrite") | ||
``` | ||
|
||
Here are the contents of the Delta table after the overwrite operation: | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 11 | aa | | ||
| 22 | bb | | ||
+-------+----------+ | ||
``` | ||
|
||
An overwrite is a logical delete: the previous data files stay in storage and are only marked as removed in the transaction log. Time travel back to the previous version to confirm that the old data is still accessible. | ||
|
||
```python | ||
dt = DeltaTable("tmp/some-table", version=1) | ||
dt.to_pandas() | ||
``` | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 1 | a | | ||
| 2 | b | | ||
| 3 | c | | ||
| 8 | dd | | ||
| 9 | ee | | ||
+-------+----------+ | ||
``` |
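Why the old version stays readable can be illustrated with a toy model of the transaction log. This is only a sketch of the idea, not the `deltalake` API or the real log format: the file names and the `active_files` helper are hypothetical. Each commit records the data files it adds and the ones it logically removes, and a version is reconstructed by replaying commits up to that point.

```python
# Toy model of a Delta-style transaction log (illustration only).
# File names are made up; real logs store richer "add"/"remove" actions.
commits = [
    {"add": ["part-0.parquet"], "remove": []},  # version 0: create
    {"add": ["part-1.parquet"], "remove": []},  # version 1: append
    {   # version 2: overwrite marks the old files as removed
        "add": ["part-2.parquet"],
        "remove": ["part-0.parquet", "part-1.parquet"],
    },
]


def active_files(commits, version):
    """Replay commits 0..version and return the files visible at that version."""
    files = set()
    for commit in commits[: version + 1]:
        files -= set(commit["remove"])
        files |= set(commit["add"])
    return sorted(files)


print(active_files(commits, 2))  # ['part-2.parquet']
print(active_files(commits, 1))  # ['part-0.parquet', 'part-1.parquet']
```

The overwrite never touches `part-0.parquet` or `part-1.parquet` on disk, which is why reading version 1 still returns the five original rows.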
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Creating a Delta Lake Table | ||
|
||
This section explains how to create a Delta Lake table. | ||
|
||
You can write a pandas DataFrame to a Delta table with `write_deltalake`. | ||
|
||
```python | ||
from deltalake import write_deltalake | ||
import pandas as pd | ||
|
||
df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]}) | ||
write_deltalake("tmp/some-table", df) | ||
``` | ||
|
||
Here are the contents of the Delta table in storage: | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 1 | a | | ||
| 2 | b | | ||
| 3 | c | | ||
+-------+----------+ | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# Deleting rows from a Delta Lake table | ||
|
||
This section explains how to delete rows from a Delta Lake table. | ||
|
||
Suppose you have the following Delta table with four rows: | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 1 | a | | ||
| 2 | b | | ||
| 3 | c | | ||
| 4 | d | | ||
+-------+----------+ | ||
``` | ||
|
||
Here's how to delete all the rows where `num` is greater than 2: | ||
|
||
```python | ||
from deltalake import DeltaTable | ||
|
||
dt = DeltaTable("tmp/my-table") | ||
dt.delete("num > 2") | ||
``` | ||
|
||
Here are the contents of the Delta table after the delete operation has been performed: | ||
|
||
``` | ||
+-------+----------+ | ||
| num | letter | | ||
|-------+----------| | ||
| 1 | a | | ||
| 2 | b | | ||
+-------+----------+ | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Delta Lake Z Order | ||
|
||
This section explains how to Z Order a Delta table. | ||
|
||
Z Ordering colocates similar data in the same files, which allows for better file skipping and faster queries. | ||
|
||
Suppose you have a table with `first_name`, `age`, and `country` columns. | ||
|
||
If you Z Order the data by the `country` column, then individuals from the same country will be stored in the same files. When you subsequently query the data for individuals from a given country, the query runs faster because more files can be skipped. | ||
|
||
Here's how to Z Order a Delta table: | ||
|
||
```python | ||
from deltalake import DeltaTable | ||
|
||
dt = DeltaTable("tmp") | ||
dt.optimize.z_order(["country"]) | ||
``` |
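With a single column, Z Ordering behaves much like sorting on that column. The name comes from the Z-order (Morton) curve, which matters when you pass several columns: their values are combined by interleaving bits so that rows close in all columns end up close together. The sketch below illustrates that general idea on small integer pairs. It is not how delta-rs implements `z_order`, and `z_value` is a hypothetical helper.

```python
def z_value(x: int, y: int, bits: int = 8) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


# Sorting by the interleaved code keeps nearby (x, y) pairs next to each other.
points = [(3, 3), (0, 0), (2, 1), (1, 0), (0, 1), (1, 1)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)
# [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (3, 3)]
```

Rows written in this order land in files that each cover a small range of both columns, which is what lets a reader skip files that cannot match a filter.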
File renamed without changes.