
Commit

docs(python): Add example for writing hive partitioned parquet to user guide (#17483)
nameexhaustion authored Jul 8, 2024
1 parent b347717 commit 2b54214
Showing 2 changed files with 44 additions and 1 deletion.
13 changes: 13 additions & 0 deletions docs/src/python/user-guide/io/hive.py
@@ -116,3 +116,16 @@ def dir_recurse(path: Path):
print(df)

# --8<-- [end:scan_file_hive]

# --8<-- [start:write_parquet_partitioned_show_data]
df = pl.DataFrame({"a": [1, 1, 2, 2, 3], "b": [1, 1, 1, 2, 2], "c": 1})
print(df)
# --8<-- [end:write_parquet_partitioned_show_data]

# --8<-- [start:write_parquet_partitioned]
df.write_parquet_partitioned("docs/data/hive_write/", ["a", "b"])
# --8<-- [end:write_parquet_partitioned]

# --8<-- [start:write_parquet_partitioned_show_paths]
print_paths("docs/data/hive_write/")
# --8<-- [end:write_parquet_partitioned_show_paths]
32 changes: 31 additions & 1 deletion docs/user-guide/io/hive.md
@@ -1,4 +1,4 @@
## Hive partitioned data
## Scanning hive partitioned data

Polars supports scanning hive partitioned parquet and IPC datasets, with planned support for other
formats in the future.
@@ -69,3 +69,33 @@ Pass `hive_partitioning=True` to enable hive partition parsing:
```python exec="on" result="text" session="user-guide/io/hive"
--8<-- "python/user-guide/io/hive.py:scan_file_hive"
```
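Conceptually, hive partition parsing recovers column values from the `key=value` directory segments of each file path. The sketch below illustrates that idea with the standard library only. The function `parse_hive_path` and the example path are hypothetical; the sketch keeps every value as a string and is not how Polars implements this internally.

```python
from pathlib import PurePosixPath


def parse_hive_path(path: str) -> dict[str, str]:
    """Hypothetical helper: extract {column: value} from key=value directory segments."""
    # Drop the last component, which is the data file itself.
    directories = PurePosixPath(path).parts[:-1]
    values: dict[str, str] = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        # Segments without "=" (e.g. the dataset root) are not partition keys.
        if sep:
            values[key] = value
    return values


# Hypothetical path; the file name is a placeholder, not what Polars writes.
print(parse_hive_path("data/hive_dataset/a=2/b=1/part-0.parquet"))
```

Values come back as strings (`'2'`, `'1'`), since the sketch does no type handling.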

## Writing hive partitioned data

> Note: The following functionality is considered _unstable_ and is subject to change.

Polars supports writing hive partitioned parquet datasets, with planned support for other formats.

### Example

This example uses the following DataFrame:

{{code_block('user-guide/io/hive','write_parquet_partitioned_show_data',['write_parquet_partitioned'])}}

```python exec="on" result="text" session="user-guide/io/hive"
--8<-- "python/user-guide/io/hive.py:write_parquet_partitioned_show_data"
```

We will write it to a hive-partitioned parquet dataset, partitioned by the columns `a` and `b`:

{{code_block('user-guide/io/hive','write_parquet_partitioned',['write_parquet_partitioned'])}}

```python exec="on" result="text" session="user-guide/io/hive"
--8<-- "python/user-guide/io/hive.py:write_parquet_partitioned"
```

The output is a hive partitioned parquet dataset with the following paths:

```python exec="on" result="text" session="user-guide/io/hive"
--8<-- "python/user-guide/io/hive.py:write_parquet_partitioned_show_paths"
```
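Each distinct combination of the partition columns `a` and `b` becomes one nested `key=value` directory. The following stdlib-only sketch derives those directory names from the same column values as the example DataFrame. `hive_partition_dirs` is a hypothetical helper; it does not call Polars and does not model the file names Polars writes inside each directory.

```python
from collections import defaultdict

# Same column data as the example DataFrame, as plain lists (no Polars needed).
data = {"a": [1, 1, 2, 2, 3], "b": [1, 1, 1, 2, 2], "c": [1, 1, 1, 1, 1]}
partition_by = ["a", "b"]


def hive_partition_dirs(data: dict[str, list], partition_by: list[str]) -> dict[str, list[int]]:
    """Hypothetical helper: map each hive directory (e.g. "a=2/b=1") to its row indices."""
    n_rows = len(data[partition_by[0]])
    groups: dict[str, list[int]] = defaultdict(list)
    for i in range(n_rows):
        # One "key=value" path segment per partition column, in partition_by order.
        path = "/".join(f"{col}={data[col][i]}" for col in partition_by)
        groups[path].append(i)
    return dict(groups)


for path, rows in sorted(hive_partition_dirs(data, partition_by).items()):
    print(path, rows)
```

Running it prints four directories, `a=1/b=1`, `a=2/b=1`, `a=2/b=2` and `a=3/b=2`, each followed by the indices of the rows it would hold.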
