Update Python IO docs (#853)
kylebarron authored Nov 12, 2024
1 parent 49fd4cb commit 7721ab8
Showing 14 changed files with 581 additions and 72 deletions.
16 changes: 16 additions & 0 deletions python/docs/api/io/arrow_ipc.md
@@ -0,0 +1,16 @@
# Arrow IPC

GeoArrow data can be read from and written to the [Arrow IPC format](https://arrow.apache.org/docs/python/ipc.html).

The Arrow IPC format can fully represent GeoArrow data, so reading a saved file back into memory reproduces the original data exactly.

Because Arrow IPC supports GeoArrow data generically, with no geospatial-specific handling, the functions to read and write Arrow IPC files live in [`arro3`](https://github.com/kylebarron/arro3).

Refer to:

- [`arro3.io.read_ipc`][]
- [`arro3.io.read_ipc_stream`][]
- [`arro3.io.write_ipc`][]
- [`arro3.io.write_ipc_stream`][]

Arrow IPC files saved without internal compression can also be memory-mapped, which makes reading faster.
6 changes: 6 additions & 0 deletions python/docs/api/io/csv.md
@@ -0,0 +1,6 @@
# CSV

Read and write CSV files with a geometry column encoded as Well-Known Text.

::: geoarrow.rust.io.read_csv
::: geoarrow.rust.io.write_csv
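The expected file layout can be illustrated with Python's standard `csv` module alone. This sketch writes a CSV whose `geometry` column holds WKT and then reads it back. The column names and rows are made up, and the snippet does not show how `read_csv` selects the geometry column. WKT for multi-vertex geometries contains commas, so a CSV writer has to quote those values.

```python
import csv
import io

# Made-up rows: "geometry" holds each feature's geometry as a WKT string.
rows = [
    {"id": "1", "geometry": "POINT (30 10)"},
    {"id": "2", "geometry": "LINESTRING (30 10, 10 30, 40 40)"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "geometry"])
writer.writeheader()
# The default QUOTE_MINIMAL quoting wraps the comma-containing WKT in double quotes.
writer.writerows(rows)
text = buffer.getvalue()
print(text.splitlines())
# ['id,geometry', '1,POINT (30 10)', '2,"LINESTRING (30 10, 10 30, 40 40)"']

# Reading it back: the quoted WKT comes back as a single field.
parsed = list(csv.DictReader(io.StringIO(text)))
print(parsed[1]["geometry"])  # LINESTRING (30 10, 10 30, 40 40)
```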
3 changes: 0 additions & 3 deletions python/docs/api/io/enums.md

This file was deleted.

7 changes: 7 additions & 0 deletions python/docs/api/io/flatgeobuf.md
@@ -0,0 +1,7 @@
# FlatGeobuf

Read and write [FlatGeobuf](https://flatgeobuf.org/) files.

::: geoarrow.rust.io.read_flatgeobuf
::: geoarrow.rust.io.read_flatgeobuf_async
::: geoarrow.rust.io.write_flatgeobuf
29 changes: 0 additions & 29 deletions python/docs/api/io/functions.md

This file was deleted.

10 changes: 10 additions & 0 deletions python/docs/api/io/gdal.md
@@ -0,0 +1,10 @@
# GDAL

GDAL natively supports reading data from any vector driver as GeoArrow data, and writing GeoArrow data to any vector driver.

From Python, you have two options:

- Use [`pyogrio`'s Arrow integration](https://pyogrio.readthedocs.io/en/latest/api.html#arrow-integration) directly.
- Use the [`geoarrow.rust.core.read_pyogrio`][] wrapper for reading.

The wrapper calls `pyogrio` under the hood, so `pyogrio` must be installed. It lives in `geoarrow.rust.core` rather than `geoarrow.rust.io` because it does not depend on any Rust IO code.
8 changes: 8 additions & 0 deletions python/docs/api/io/geojson.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# GeoJSON

Read and write GeoJSON and newline-delimited GeoJSON files.

::: geoarrow.rust.io.read_geojson
::: geoarrow.rust.io.read_geojson_lines
::: geoarrow.rust.io.write_geojson
::: geoarrow.rust.io.write_geojson_lines
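The difference between the two layouts can be seen with the standard `json` module alone. This sketch uses made-up features and says nothing about the options `read_geojson_lines` or `write_geojson_lines` accept. Standard GeoJSON wraps everything in one `FeatureCollection`, while the newline-delimited form puts one `Feature` per line.

```python
import json

# Two made-up point features.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        "properties": {"id": 1},
    },
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
        "properties": {"id": 2},
    },
]

# Standard GeoJSON: the whole file is a single FeatureCollection object.
geojson_text = json.dumps({"type": "FeatureCollection", "features": features})

# Newline-delimited GeoJSON: one Feature object per line, with no wrapper object.
geojson_lines_text = "\n".join(json.dumps(feature) for feature in features)

# Each line parses on its own, so a reader can stream the file line by line.
parsed = [json.loads(line) for line in geojson_lines_text.splitlines() if line.strip()]
print(len(geojson_lines_text.splitlines()))  # 2
```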
13 changes: 13 additions & 0 deletions python/docs/api/io/geoparquet.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# GeoParquet

Read and write [GeoParquet](https://geoparquet.org/) files.

::: geoarrow.rust.io.read_parquet
::: geoarrow.rust.io.read_parquet_async
::: geoarrow.rust.io.write_parquet
::: geoarrow.rust.io.ParquetDataset
::: geoarrow.rust.io.ParquetFile
::: geoarrow.rust.io.ParquetWriter
::: geoarrow.rust.io.types.BboxCovering
::: geoarrow.rust.io.enums.GeoParquetEncoding
::: geoarrow.rust.io.types.GeoParquetEncodingT
6 changes: 6 additions & 0 deletions python/docs/api/io/postgis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# PostGIS

Read from a PostGIS database.

::: geoarrow.rust.io.read_postgis
::: geoarrow.rust.io.read_postgis_async
3 changes: 0 additions & 3 deletions python/docs/api/io/types.md

This file was deleted.

26 changes: 11 additions & 15 deletions python/geoarrow-io/python/geoarrow/rust/io/_io.pyi
@@ -21,11 +21,7 @@ from geoarrow.rust.core import NativeArray
 from pyproj import CRS

 from .enums import GeoParquetEncoding
-from .types import (
-    BboxPaths,
-    GeoParquetEncodingT,
-    IntFloat,
-)
+from .types import BboxCovering, GeoParquetEncodingT

 class ParquetFile:
     def __init__(self, path: str, fs: ObjectStore) -> None:
@@ -60,7 +56,7 @@ class ParquetFile:
             CRS
         """
     def row_group_bounds(
-        self, row_group_idx: int, bbox_paths: BboxPaths | None = None
+        self, row_group_idx: int, bbox_paths: BboxCovering | None = None
     ) -> List[float]:
         """Get the bounds of a single row group.
@@ -71,7 +67,7 @@ class ParquetFile:
         Returns:
             The bounds of a single row group.
         """
-    def row_groups_bounds(self, bbox_paths: BboxPaths | None = None) -> NativeArray:
+    def row_groups_bounds(self, bbox_paths: BboxCovering | None = None) -> NativeArray:
         """
         Get the bounds of all row groups.
@@ -99,8 +95,8 @@ class ParquetFile:
         batch_size: int | None = None,
         limit: int | None = None,
         offset: int | None = None,
-        bbox: Sequence[IntFloat] | None = None,
-        bbox_paths: BboxPaths | None = None,
+        bbox: Sequence[int | float] | None = None,
+        bbox_paths: BboxCovering | None = None,
     ) -> Table:
         """Perform an async read with the given options
@@ -120,8 +116,8 @@ class ParquetFile:
         batch_size: int | None = None,
         limit: int | None = None,
         offset: int | None = None,
-        bbox: Sequence[IntFloat] | None = None,
-        bbox_paths: BboxPaths | None = None,
+        bbox: Sequence[int | float] | None = None,
+        bbox_paths: BboxCovering | None = None,
     ) -> Table:
         """Perform a sync read with the given options
@@ -174,8 +170,8 @@ class ParquetDataset:
         batch_size: int | None = None,
         limit: int | None = None,
         offset: int | None = None,
-        bbox: Sequence[IntFloat] | None = None,
-        bbox_paths: BboxPaths | None = None,
+        bbox: Sequence[int | float] | None = None,
+        bbox_paths: BboxCovering | None = None,
     ) -> Table:
         """Perform an async read with the given options
@@ -196,8 +192,8 @@ class ParquetDataset:
         batch_size: int | None = None,
         limit: int | None = None,
         offset: int | None = None,
-        bbox: Sequence[IntFloat] | None = None,
-        bbox_paths: BboxPaths | None = None,
+        bbox: Sequence[int | float] | None = None,
+        bbox_paths: BboxCovering | None = None,
     ) -> Table:
         """Perform a sync read with the given options
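Per-row-group bounds, such as those returned by `row_group_bounds`, are typically used to skip row groups that cannot intersect a query `bbox`. The following is a hypothetical pure-Python sketch of that pruning idea, not how geoarrow-rust implements its `bbox` filter. It assumes that both the bounds and the query box are ordered `[xmin, ymin, xmax, ymax]`, and the helper names are illustrative only.

```python
from typing import List, Sequence


def boxes_intersect(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if two [xmin, ymin, xmax, ymax] boxes overlap; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def row_groups_to_read(
    all_bounds: Sequence[Sequence[float]], bbox: Sequence[float]
) -> List[int]:
    """Hypothetical helper: indices of row groups whose bounds intersect `bbox`."""
    return [i for i, bounds in enumerate(all_bounds) if boxes_intersect(bounds, bbox)]


# Made-up bounds, one [xmin, ymin, xmax, ymax] list per row group
# (the ordering is an assumption for this sketch).
bounds_per_row_group = [
    [0.0, 0.0, 10.0, 10.0],
    [20.0, 20.0, 30.0, 30.0],
    [5.0, 25.0, 15.0, 35.0],
]
print(row_groups_to_read(bounds_per_row_group, [8.0, 8.0, 22.0, 22.0]))  # [0, 1]
```

Only row groups whose boxes pass this cheap test would need to be decoded. Rows inside those groups still need a per-row check against the query box.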
26 changes: 22 additions & 4 deletions python/geoarrow-io/python/geoarrow/rust/io/types.py
@@ -1,8 +1,6 @@
 from __future__ import annotations

-from typing import Literal, Sequence, TypedDict, Union
-
-IntFloat = Union[int, float]
+from typing import Literal, Sequence, TypedDict


 GeoParquetEncodingT = Literal["wkb", "native"]
@@ -11,8 +9,28 @@
 """


-class BboxPaths(TypedDict):
+class BboxCovering(TypedDict):
+    """Column names for the per-row bounding box covering used in spatial partitioning.
+
+    The spatial partitioning defined in GeoParquet 1.1 allows for a [`"covering"`
+    field](https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md#covering).
+    The covering should be four float columns that represent the bounding box of each
+    row of the data.
+
+    As of GeoParquet 1.1, this metadata is included in the Parquet file itself, but this
+    typed dict can be used with spatially-partitioned GeoParquet datasets that do not
+    write GeoParquet 1.1 metadata. Providing this information is unnecessary for
+    GeoParquet 1.1 files with included covering information.
+    """
+
     xmin: Sequence[str]
+    """The path to the xmin bounding box column."""
+
     ymin: Sequence[str]
+    """The path to the ymin bounding box column."""
+
     xmax: Sequence[str]
+    """The path to the xmax bounding box column."""
+
     ymax: Sequence[str]
+    """The path to the ymax bounding box column."""
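This sketch shows what a `BboxCovering` value might look like for a dataset whose per-row boxes live in a struct column named `bbox`, which is the layout the GeoParquet 1.1 specification uses in its covering example. A dataset with four flat columns would instead use single-element paths such as `["minx"]`. The `TypedDict` is mirrored locally so the snippet runs without geoarrow-rust installed. `resolve` and `row_bbox` are hypothetical helpers that show how each path addresses a (possibly nested) field. They are not part of the library.

```python
from typing import Any, Dict, Sequence, Tuple, TypedDict


class BboxCovering(TypedDict):
    """Local mirror of the TypedDict added in this commit."""

    xmin: Sequence[str]
    ymin: Sequence[str]
    xmax: Sequence[str]
    ymax: Sequence[str]


# Covering for a struct column "bbox" with four float fields (GeoParquet 1.1 style).
covering: BboxCovering = {
    "xmin": ["bbox", "xmin"],
    "ymin": ["bbox", "ymin"],
    "xmax": ["bbox", "xmax"],
    "ymax": ["bbox", "ymax"],
}


def resolve(row: Dict[str, Any], path: Sequence[str]) -> float:
    """Hypothetical helper: follow a column path into a nested row and return a float."""
    value: Any = row
    for key in path:
        value = value[key]
    return float(value)


def row_bbox(row: Dict[str, Any], cov: BboxCovering) -> Tuple[float, float, float, float]:
    """Hypothetical helper: read one row's (xmin, ymin, xmax, ymax) via the covering paths."""
    return (
        resolve(row, cov["xmin"]),
        resolve(row, cov["ymin"]),
        resolve(row, cov["xmax"]),
        resolve(row, cov["ymax"]),
    )


row = {"id": 7, "bbox": {"xmin": -1.0, "ymin": 2.0, "xmax": 3.0, "ymax": 4.5}}
print(row_bbox(row, covering))  # (-1.0, 2.0, 3.0, 4.5)
```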
10 changes: 7 additions & 3 deletions python/mkdocs.yml
@@ -36,9 +36,13 @@ nav:
           - api/compute/enums.md
           - api/compute/types.md
       - geoarrow.rust.io:
-          - api/io/functions.md
-          - api/io/types.md
-          - api/io/enums.md
+          - api/io/csv.md
+          - api/io/flatgeobuf.md
+          - api/io/geojson.md
+          - api/io/geoparquet.md
+          - api/io/postgis.md
+          - api/io/arrow_ipc.md
+          - api/io/gdal.md
   - Ecosystem:
       - ecosystem/geopandas.md
       - ecosystem/lonboard.md