docs(python): Add example for writing hive partitioned parquet to user guide (#17483)
pola-rs · Jul 8, 2024 · 2b54214
1 parent b347717

Showing 2 changed files with 44 additions and 1 deletion.
diff --git a/docs/src/python/user-guide/io/hive.py b/docs/src/python/user-guide/io/hive.py
@@ -116,3 +116,16 @@ def dir_recurse(path: Path):
 print(df)
 
 # --8<-- [end:scan_file_hive]
+
+# --8<-- [start:write_parquet_partitioned_show_data]
+df = pl.DataFrame({"a": [1, 1, 2, 2, 3], "b": [1, 1, 1, 2, 2], "c": 1})
+print(df)
+# --8<-- [end:write_parquet_partitioned_show_data]
+
+# --8<-- [start:write_parquet_partitioned]
+df.write_parquet_partitioned("docs/data/hive_write/", ["a", "b"])
+# --8<-- [end:write_parquet_partitioned]
+
+# --8<-- [start:write_parquet_partitioned_show_paths]
+print_paths("docs/data/hive_write/")
+# --8<-- [end:write_parquet_partitioned_show_paths]
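The write call above partitions the example DataFrame on `a` and `b`. The stdlib-only sketch below roughly illustrates the resulting layout by listing the `key=value` directories you would expect for this data. It assumes the usual hive convention: one directory level per partition column, in the order given. The helper `hive_partition_dirs` is hypothetical and not part of Polars. The sketch also does not model the parquet files that Polars writes inside each leaf directory.

```python
from __future__ import annotations


# Hypothetical helper (not part of Polars): derive the hive-style directory
# for each distinct combination of partition-column values.
def hive_partition_dirs(columns: dict[str, list], partition_by: list[str]) -> list[str]:
    n_rows = len(next(iter(columns.values())))
    # Collect the distinct value tuples of the partition columns.
    keys = {tuple(columns[name][i] for name in partition_by) for i in range(n_rows)}
    # One "name=value" segment per partition column, joined as nested directories.
    return [
        "/".join(f"{name}={value}" for name, value in zip(partition_by, key))
        for key in sorted(keys)
    ]


# Same data as the example DataFrame; `c` is spelled out as five 1s to mirror
# the broadcast scalar `"c": 1` in the original constructor.
data = {"a": [1, 1, 2, 2, 3], "b": [1, 1, 1, 2, 2], "c": [1] * 5}
for path in hive_partition_dirs(data, ["a", "b"]):
    print(path)
```

For this data, the sketch yields four leaf directories: `a=1/b=1`, `a=2/b=1`, `a=2/b=2` and `a=3/b=2`.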
diff --git a/docs/user-guide/io/hive.md b/docs/user-guide/io/hive.md
@@ -1,4 +1,4 @@
-## Hive partitioned data
+## Scanning hive partitioned data
 
 Polars supports scanning hive partitioned parquet and IPC datasets, with planned support for other
 formats in the future.
@@ -69,3 +69,33 @@ Pass `hive_partitioning=True` to enable hive partition parsing:
 ```python exec="on" result="text" session="user-guide/io/hive"
 --8<-- "python/user-guide/io/hive.py:scan_file_hive"
 ```
+
+## Writing hive partitioned data
+
+> Note: The following functionality is considered _unstable_, and is subject to change.
+
+Polars supports writing hive partitioned parquet datasets, with planned support for other formats.
+
+### Example
+
+For this example the following DataFrame is used:
+
+{{code_block('user-guide/io/hive','write_parquet_partitioned_show_data',['write_parquet_partitioned'])}}
+
+```python exec="on" result="text" session="user-guide/io/hive"
+--8<-- "python/user-guide/io/hive.py:write_parquet_partitioned_show_data"
+```
+
+We will write it to a hive-partitioned parquet dataset, partitioned by the columns `a` and `b`:
+
+{{code_block('user-guide/io/hive','write_parquet_partitioned',['write_parquet_partitioned'])}}
+
+```python exec="on" result="text" session="user-guide/io/hive"
+--8<-- "python/user-guide/io/hive.py:write_parquet_partitioned"
+```
+
+The output is a hive partitioned parquet dataset with the following paths:
+
+```python exec="on" result="text" session="user-guide/io/hive"
+--8<-- "python/user-guide/io/hive.py:write_parquet_partitioned_show_paths"
+```
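Reading such a dataset back relies on the reverse step, where each `key=value` segment of a file's path becomes a column value. The following is a minimal sketch of that parsing using only the standard library. The function `parse_hive_path` and the file name `data.parquet` are hypothetical. Polars' own hive parsing can produce typed columns, but this sketch keeps every value as a string.

```python
from __future__ import annotations

from pathlib import PurePosixPath


# Hypothetical helper (not part of Polars): recover partition values from the
# `key=value` directory segments of a hive-style file path.
def parse_hive_path(path: str) -> dict[str, str]:
    values: dict[str, str] = {}
    # Only directory segments are inspected; the file name itself is skipped.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep:  # segments without "=" (e.g. "docs") are not partition columns
            values[key] = value
    return values


print(parse_hive_path("docs/data/hive_write/a=2/b=1/data.parquet"))
```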