Commit

Merge branch 'main' into 1783-run-python-doctest
marijncv committed Nov 23, 2023
2 parents f309383 + 7250544 commit fb316e9
Showing 7 changed files with 166 additions and 8 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -6,7 +6,7 @@
<p align="center">
A native Rust library for Delta Lake, with bindings to Python
<br>
- <a href="https://delta-io.github.io/delta-rs/python/">Python docs</a>
+ <a href="https://delta-io.github.io/delta-rs/">Python docs</a>
·
<a href="https://docs.rs/deltalake/latest/deltalake/">Rust docs</a>
·
@@ -48,7 +48,7 @@ API that lets you query, inspect, and operate your Delta Lake with ease.

[pypi]: https://pypi.org/project/deltalake/
[pypi-dl]: https://img.shields.io/pypi/dm/deltalake?style=flat-square&color=00ADD4
- [py-docs]: https://delta-io.github.io/delta-rs/python/
+ [py-docs]: https://delta-io.github.io/delta-rs/
[rs-docs]: https://docs.rs/deltalake/latest/deltalake/
[crates]: https://crates.io/crates/deltalake
[crates-dl]: https://img.shields.io/crates/d/deltalake?color=F75101
78 changes: 78 additions & 0 deletions docs/usage/appending-overwriting-delta-lake-table.md
@@ -0,0 +1,78 @@
# Appending to and overwriting a Delta Lake table

This section explains how to append to an existing Delta table and how to overwrite a Delta table.

## Delta Lake append transactions

Suppose you have a Delta table with the following contents:

```
+-------+----------+
| num | letter |
|-------+----------|
| 1 | a |
| 2 | b |
| 3 | c |
+-------+----------+
```

Append two additional rows of data to the table:

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
write_deltalake("tmp/some-table", df, mode="append")
```

Here are the updated contents of the Delta table:

```
+-------+----------+
| num | letter |
|-------+----------|
| 1 | a |
| 2 | b |
| 3 | c |
| 8 | dd |
| 9 | ee |
+-------+----------+
```

Now let's see how to perform an overwrite transaction.

## Delta Lake overwrite transactions

The following code overwrites the existing Delta table with two new rows:

```python
df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
write_deltalake("tmp/some-table", df, mode="overwrite")
```

Here are the contents of the Delta table after the overwrite operation:

```
+-------+----------+
| num | letter |
|-------+----------|
| 11 | aa |
| 22 | bb |
+-------+----------+
```

An overwrite is only a logical delete: it doesn't physically remove the previous data from storage. You can confirm this by time traveling back to the previous version (version 1, the one written by the append), which is still accessible.

```
dt = DeltaTable("tmp/some-table", version=1)
+-------+----------+
| num | letter |
|-------+----------|
| 1 | a |
| 2 | b |
| 3 | c |
| 8 | dd |
| 9 | ee |
+-------+----------+
```
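
To make the idea of a logical delete more concrete, here is a minimal sketch in plain Python. It is a toy model and not the `deltalake` API: the file names and the `commits` structure are hypothetical, and the assumption is that each commit records which data files it adds and which it removes. A snapshot at a given version is then the result of replaying the commits up to that version.

```python
# Toy model of a transaction log (illustrative only, not the deltalake API).
# File names are hypothetical. Each commit lists the data files it adds
# and the data files it logically removes.
commits = [
    {"add": ["part-0.parquet"], "remove": []},  # version 0: create
    {"add": ["part-1.parquet"], "remove": []},  # version 1: append
    {
        "add": ["part-2.parquet"],
        "remove": ["part-0.parquet", "part-1.parquet"],
    },  # version 2: overwrite
]


def active_files(commits, version):
    """Replay commits 0..version and return the files in that snapshot."""
    files = set()
    for commit in commits[: version + 1]:
        files -= set(commit["remove"])
        files |= set(commit["add"])
    return sorted(files)


print(active_files(commits, 2))  # ['part-2.parquet']
print(active_files(commits, 1))  # ['part-0.parquet', 'part-1.parquet']
```

In this model the overwrite only marks the older files as removed in the log. Nothing is erased, so replaying the log up to version 1 still yields the five-row table.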
25 changes: 25 additions & 0 deletions docs/usage/create-delta-lake-table.md
@@ -0,0 +1,25 @@
# Creating a Delta Lake Table

This section explains how to create a Delta Lake table.

You can easily write a DataFrame to a Delta table.

```python
from deltalake import write_deltalake
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
write_deltalake("tmp/some-table", df)
```

Here are the contents of the Delta table in storage:

```
+-------+----------+
| num | letter |
|-------+----------|
| 1 | a |
| 2 | b |
| 3 | c |
+-------+----------+
```
34 changes: 34 additions & 0 deletions docs/usage/deleting-rows-from-delta-lake-table.md
@@ -0,0 +1,34 @@
# Deleting rows from a Delta Lake table

This section explains how to delete rows from a Delta Lake table.

Suppose you have the following Delta table with four rows:

```
+-------+----------+
| num | letter |
|-------+----------|
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+-------+----------+
```

Here's how to delete all the rows where the `num` is greater than 2:

```python
from deltalake import DeltaTable

dt = DeltaTable("tmp/my-table")
dt.delete("num > 2")
```

Here are the contents of the Delta table after the delete operation:

```
+-------+----------+
| num | letter |
|-------+----------|
| 1 | a |
| 2 | b |
+-------+----------+
```
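
The effect of the predicate can be checked with a small plain-Python sketch. This is only an illustration of the row-level semantics, not how `deltalake` evaluates the predicate internally. The `matches` helper is a hypothetical stand-in for the predicate string `"num > 2"`.

```python
# Plain-Python illustration of the delete predicate (not the deltalake API).
rows = [
    {"num": 1, "letter": "a"},
    {"num": 2, "letter": "b"},
    {"num": 3, "letter": "c"},
    {"num": 4, "letter": "d"},
]


def matches(row):
    """Hypothetical stand-in for the predicate string "num > 2"."""
    return row["num"] > 2


# Rows that match the predicate are deleted; all other rows are kept.
remaining = [row for row in rows if not matches(row)]
print([row["num"] for row in remaining])  # [1, 2]
```

The comparison is strict, so the row with `num` equal to 2 is kept.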
16 changes: 16 additions & 0 deletions docs/usage/optimize/delta-lake-z-order.md
@@ -0,0 +1,16 @@
# Delta Lake Z Order

This section explains how to Z Order a Delta table.

Z Ordering colocates similar data in the same files, which allows for better file skipping and faster queries.

Suppose you have a table with `first_name`, `age`, and `country` columns.

If you Z Order the data by the `country` column, then individuals from the same country will be stored in the same files. When you subsequently query the data for individuals from a given country, the query will execute faster because more data can be skipped.

Here's how to Z Order a Delta table:

```python
from deltalake import DeltaTable

dt = DeltaTable("tmp")
dt.optimize.z_order(["country"])  # column names are passed as strings
```
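
Z Ordering is most interesting with more than one column. The sketch below shows the general idea with a Z-order (Morton) value built by interleaving the bits of two small integers. It is a simplified illustration under stated assumptions, not the implementation used by `deltalake`: the `interleave_bits` helper, the integer-encoded columns, and the four-row "files" are all hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Hypothetical rows with two integer-encoded columns, each in the range 0..3.
rows = [(x, y) for x in range(4) for y in range(4)]

# Sort rows by their Z-order value, then cut the sorted rows into
# "files" of four rows each.
rows.sort(key=lambda row: interleave_bits(row[0], row[1]))
files = [rows[i : i + 4] for i in range(0, len(rows), 4)]

for file_rows in files:
    print(file_rows)
```

In this toy data set each file ends up covering a narrow range of both columns, so a filter on either column can skip the files whose range does not contain the requested value.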
17 changes: 11 additions & 6 deletions mkdocs.yml
@@ -19,12 +19,17 @@ nav:
   - Usage:
     - Installation: usage/installation.md
     - Overview: usage/index.md
-    - Loading a Delta Table: usage/loading-table.md
-    - Examining a Delta Table: usage/examining-table.md
-    - Querying a Delta Table: usage/querying-delta-tables.md
-    - Managing a Delta Table: usage/managing-tables.md
-    - Writing Delta Tables: usage/writing-delta-tables.md
-    - Small file compaction: usage/small-file-compaction-with-optimize.md
+    - Creating a table: usage/create-delta-lake-table.md
+    - Loading a table: usage/loading-table.md
+    - Append/overwrite tables: usage/appending-overwriting-delta-lake-table.md
+    - Examining a table: usage/examining-table.md
+    - Querying a table: usage/querying-delta-tables.md
+    - Managing a table: usage/managing-tables.md
+    - Writing a table: usage/writing-delta-tables.md
+    - Deleting rows from a table: usage/deleting-rows-from-delta-lake-table.md
+    - Optimize:
+      - Small file compaction: usage/optimize/small-file-compaction-with-optimize.md
+      - Z Order: usage/optimize/delta-lake-z-order.md
   - API Reference:
     - api/delta_table.md
     - api/schema.md