docs: on append, overwrite, delete and z-ordering #1897

Merged
merged 4 commits into from
Nov 22, 2023
78 changes: 78 additions & 0 deletions docs/usage/appending-overwriting-delta-lake-table.md
@@ -0,0 +1,78 @@
# Appending to and overwriting a Delta Lake table

This section explains how to append to an existing Delta table and how to overwrite a Delta table.

## Delta Lake append transactions

Suppose you have a Delta table with the following contents:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|     1 | a        |
|     2 | b        |
|     3 | c        |
+-------+----------+
```

Append two additional rows of data to the table:

```python
import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
# mode="append" adds the new rows and keeps the existing data
write_deltalake("tmp/some-table", df, mode="append")
```

Here are the updated contents of the Delta table:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|     1 | a        |
|     2 | b        |
|     3 | c        |
|     8 | dd       |
|     9 | ee       |
+-------+----------+
```

Now let's see how to perform an overwrite transaction.

## Delta Lake overwrite transactions

Use `mode="overwrite"` to replace the contents of the existing Delta table:

```python
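import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
# mode="overwrite" replaces all existing rows with the contents of df
write_deltalake("tmp/some-table", df, mode="overwrite")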
```

Here are the contents of the Delta table after the overwrite operation:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|    11 | aa       |
|    22 | bb       |
+-------+----------+
```

An overwrite only performs a logical delete: the previous data files are no longer part of the latest table version, but they are not physically removed from storage. Time travel back to the previous version to confirm that the old data is still accessible:

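```python
from deltalake import DeltaTable

# Load the table as it was at version 1, right after the append
dt = DeltaTable("tmp/some-table", version=1)
dt.to_pandas()
```

Version 1 still contains all five rows:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|     1 | a        |
|     2 | b        |
|     3 | c        |
|     8 | dd       |
|     9 | ee       |
+-------+----------+
```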
25 changes: 25 additions & 0 deletions docs/usage/create-delta-lake-table.md
@@ -0,0 +1,25 @@
# Creating a Delta Lake Table

This section explains how to create a Delta Lake table.

Use `write_deltalake` to write a pandas DataFrame to a new Delta table:

```python
from deltalake import write_deltalake
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
write_deltalake("tmp/some-table", df)
```

Here are the contents of the Delta table in storage:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|     1 | a        |
|     2 | b        |
|     3 | c        |
+-------+----------+
```
34 changes: 34 additions & 0 deletions docs/usage/deleting-rows-from-delta-lake-table.md
@@ -0,0 +1,34 @@
# Deleting rows from a Delta Lake table

This section explains how to delete rows from a Delta Lake table.

Suppose you have the following Delta table with four rows:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|     1 | a        |
|     2 | b        |
|     3 | c        |
|     4 | d        |
+-------+----------+
```

Here's how to delete all the rows where `num` is greater than 2:

```python
from deltalake import DeltaTable

dt = DeltaTable("tmp/my-table")
# Remove every row that matches the SQL-style predicate
dt.delete("num > 2")
```

Here are the contents of the Delta table after the delete operation has been performed:

```
+-------+----------+
|   num | letter   |
|-------+----------|
|     1 | a        |
|     2 | b        |
+-------+----------+
```
16 changes: 16 additions & 0 deletions docs/usage/optimize/delta-lake-z-order.md
@@ -0,0 +1,16 @@
# Delta Lake Z Order

This section explains how to Z Order a Delta table.

Z Ordering colocates similar data in the same files, which allows for better file skipping and faster queries.

Suppose you have a table with `first_name`, `age`, and `country` columns.

If you Z Order the data by the `country` column, then individuals from the same country will be stored in the same files. When you subsequently query the data for individuals from a given country, the query can skip more files, so it typically executes faster.
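With a single column, Z-ordering behaves much like sorting; the technique pays off when you filter on several columns. The pure-Python sketch below is a simplified model of the Z-order curve, not how delta-rs computes its keys. It interleaves the bits of two small integer columns into a Morton key, sorts 16 rows by that key, splits them into four hypothetical "files", and counts how many files a filter must read when only each file's min/max values are known. Delta Lake records per-file min/max statistics in its transaction log for this kind of skipping.

```python
def morton_key(x: int, y: int, bits: int = 8) -> int:
    """Interleave the bits of x and y into a single Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits land on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits land on odd positions
    return key


def file_ranges(rows, rows_per_file=4):
    """Split sorted rows into 'files' and record each file's (min, max) per column."""
    ranges = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        ranges.append(tuple((min(col), max(col)) for col in zip(*chunk)))
    return ranges


def files_to_read(ranges, column, value):
    """Count files whose (min, max) range for `column` could contain `value`."""
    return sum(1 for r in ranges if r[column][0] <= value <= r[column][1])


# A 4 x 4 grid of (x, y) rows standing in for two table columns
rows = [(x, y) for x in range(4) for y in range(4)]

linear = sorted(rows)                                # sort by x, then y
zorder = sorted(rows, key=lambda r: morton_key(*r))  # sort by Z-order key

linear_ranges = file_ranges(linear)
zorder_ranges = file_ranges(zorder)

# Files read for the filters x == 0 and y == 0, out of 4 files
print(files_to_read(linear_ranges, 0, 0), files_to_read(linear_ranges, 1, 0))  # 1 4
print(files_to_read(zorder_ranges, 0, 0), files_to_read(zorder_ranges, 1, 0))  # 2 2
```

Sorting by `x` alone lets a filter on `x` skip three of the four files but forces a full scan for a filter on `y`. The Z-order layout skips half of the files for either filter, which is the trade-off Z-ordering makes across multiple columns.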

Here's how to Z Order a Delta table:

```python
from deltalake import DeltaTable

dt = DeltaTable("tmp")
# Column names are passed as a list of strings
dt.optimize.z_order(["country"])
```
17 changes: 11 additions & 6 deletions mkdocs.yml
@@ -19,12 +19,17 @@ nav:
- Usage:
- Installation: usage/installation.md
- Overview: usage/index.md
- - Loading a Delta Table: usage/loading-table.md
- - Examining a Delta Table: usage/examining-table.md
- - Querying a Delta Table: usage/querying-delta-tables.md
- - Managing a Delta Table: usage/managing-tables.md
- - Writing Delta Tables: usage/writing-delta-tables.md
- - Small file compaction: usage/small-file-compaction-with-optimize.md
+ - Creating a table: usage/create-delta-lake-table.md
+ - Loading a table: usage/loading-table.md
+ - Append/overwrite tables: usage/appending-overwriting-delta-lake-table.md
+ - Examining a table: usage/examining-table.md
+ - Querying a table: usage/querying-delta-tables.md
+ - Managing a table: usage/managing-tables.md
+ - Writing a table: usage/writing-delta-tables.md
+ - Deleting rows from a table: usage/deleting-rows-from-delta-lake-table.md
+ - Optimize:
+ - Small file compaction: usage/optimize/small-file-compaction-with-optimize.md
+ - Z Order: usage/optimize/delta-lake-z-order.md
- API Reference:
- api/delta_table.md
- api/schema.md