Read version number from the schema (#159)
* Read version number from the schema

* Update lockfile after poetry update

* Generate example before extracting schema

* Allow for differences in the example.parquet file

* Move test script to test directory
tschaub authored Dec 14, 2022
1 parent 84ae2d9 commit 592fb0a
Showing 11 changed files with 416 additions and 845 deletions.
14 changes: 6 additions & 8 deletions .github/workflows/scripts.yml
@@ -29,12 +29,6 @@ jobs:
 geoparquet_validator $example || exit 1;
 done
-- name: Test json schema
-run: |
-python -m pip install pytest
-cd tests
-pytest test_json_schema.py -v
 test-json-metadata:
 runs-on: ubuntu-latest
 steps:
@@ -56,8 +50,12 @@ jobs:
 - name: Run scripts
 run: |
 cd scripts
+poetry run pytest test_json_schema.py -v
+poetry run python generate_example.py
 poetry run python update_example_schemas.py
 cd ../examples
-# Assert no changes in the git repo, aka that the json version of the
-# schemas are up to date
+# Assert that the version number and file metadata are up to date
+# Allow for differences in example.parquet
+git restore example.parquet
+git diff
 test -z "$(git status --porcelain)"
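The updated step runs the JSON schema tests, regenerates the example file, runs `update_example_schemas.py`, discards any change to `example.parquet` with `git restore`, prints the remaining diff, and fails unless `git status --porcelain` prints nothing. For illustration only, the same gate can be written in Python by parsing porcelain output against an allow-list of paths. `unexpected_changes` and `assert_clean` are hypothetical names; unlike the workflow, this sketch ignores the allowed paths rather than restoring them, and it does not unescape quoted paths.

```python
from __future__ import annotations

import subprocess


def unexpected_changes(porcelain: str, allowed: set[str]) -> list[str]:
    """Return changed paths from `git status --porcelain` output that are not in `allowed`.

    Porcelain v1 lines look like "XY path": a two-character status code, a space,
    then the path. Renames are reported as "old -> new"; the new path is used here.
    Quoted paths (with unusual characters) are not unescaped in this sketch.
    """
    unexpected = []
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        path = line[3:]  # skip the two status characters and the separating space
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        if path not in allowed:
            unexpected.append(path)
    return unexpected


def assert_clean(allowed: set[str]) -> None:
    """Exit with an error (like `test -z "$(git status --porcelain)"`) if other files changed."""
    result = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    changed = unexpected_changes(result.stdout, allowed)
    if changed:
        raise SystemExit(f"files changed after regeneration: {changed}")
```

For example, `assert_clean({"examples/example.parquet"})` could be called from any directory in the checkout, because porcelain output reports paths relative to the repository root.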
5 changes: 2 additions & 3 deletions .gitignore
@@ -1,3 +1,2 @@
-# Ignore GeoPackage file used in conversion to GeoParquet
-*.gpkg*
-tests/data/*
+/scripts/data/
+/scripts/__pycache__/
8 changes: 0 additions & 8 deletions examples/environment.yml

This file was deleted.

Binary file modified examples/example.parquet
Binary file not shown.
15 changes: 2 additions & 13 deletions format-specs/geoparquet.md
@@ -41,23 +41,12 @@ A GeoParquet file MUST include a `geo` key in the Parquet metadata (see [`FileMe

 | Field Name | Type | Description |
 | ------------------ | ------ | -------------------------------------------------------------------- |
-| version | string | **REQUIRED.** The version of the GeoParquet metadata standard used when writing. |
-| primary_column | string | **REQUIRED.** The name of the "primary" geometry column. |
+| version | string | **REQUIRED.** The version identifier for the GeoParquet specification. |
+| primary_column | string | **REQUIRED.** The name of the "primary" geometry column. In cases where a GeoParquet file contains multiple geometry columns, the primary geometry may be used by default in geospatial operations. |
 | columns | object\<string, [Column Metadata](#column-metadata)> | **REQUIRED.** Metadata about geometry columns. Each key is the name of a geometry column in the table. |

 At this level, additional implementation-specific fields (e.g. library name) MAY be present, and readers should be robust in ignoring those.

-### Additional file metadata information
-
-#### primary_column
-
-This indicates the "primary" or "active" geometry for systems that can store multiple geometries,
-but have a default geometry used for geospatial operations.
-
-#### version
-
-Version of the GeoParquet spec used, currently 0.5.0-dev
-
 ### Column metadata

 Each geometry column in the dataset MUST be included in the `columns` field above with the following content, keyed by the column name:
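The file-level table in this hunk has three required fields, and the surrounding text says readers should ignore extra implementation-specific keys. A minimal reader-side check could look like the sketch below. It is an illustration only: the function name, the error messages, and the choice to check just presence, basic JSON types, and that `primary_column` is a key of `columns` are assumptions here. It is not the validation performed by `schema.json`, and it does not inspect the per-column metadata.

```python
from __future__ import annotations

import json
from typing import Any

# Required file-level fields from the table above, with the Python type of each JSON value.
REQUIRED_FIELDS = {
    "version": str,
    "primary_column": str,
    "columns": dict,
}


def check_geo_metadata(raw: bytes | str) -> dict[str, Any]:
    """Parse the `geo` metadata value and check the required file-level fields."""
    meta = json.loads(raw)
    if not isinstance(meta, dict):
        raise ValueError("geo metadata must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in meta:
            raise ValueError(f"missing required field: {name!r}")
        if not isinstance(meta[name], expected):
            raise ValueError(f"field {name!r} has the wrong type")
    # The primary column is a geometry column, and every geometry column MUST
    # appear in `columns`, so its name should be one of the keys there.
    if meta["primary_column"] not in meta["columns"]:
        raise ValueError("primary_column is not a key of columns")
    # Any other keys (e.g. a library name) are allowed and simply left in place.
    return meta
```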
20 changes: 19 additions & 1 deletion scripts/README.md
@@ -21,11 +21,29 @@ poetry update
 To run a script, prefix it with `poetry run`. For example:

 ```
-poetry run python update_example_schemas.py
+poetry run python generate_example.py
 ```

 Using `poetry run` ensures that you're running the python script using _this_ local environment, not your global environment.

+### Tests
+
+To run the tests, change into the `scripts` directory and run the following:
+
+```
+poetry run pytest test_json_schema.py -v
+```
+
+### example.parquet
+
+The `example.parquet` file in the `examples` directory is generated with the `generate_example.py` script. This script needs to be updated and run any time there are changes to the "geo" file metadata or to the version constant in `schema.json`.
+
+To update the `../examples/example.parquet` file, run this from the `scripts` directory:
+
+```
+poetry run python generate_example.py
+```
+
 ### nz-building-outlines to Parquet

 ```bash
11 changes: 9 additions & 2 deletions examples/example.py → scripts/generate_example.py
@@ -22,8 +22,15 @@
 table = pa.Table.from_pandas(df.head().to_wkb())


+def get_version() -> str:
+    """Read the version const from the schema.json file"""
+    with open(HERE / "../format-specs/schema.json") as f:
+        spec_schema = json.load(f)
+    return spec_schema["properties"]["version"]["const"]
+
+
 metadata = {
-    "version": "0.5.0-dev",
+    "version": get_version(),
     "primary_column": "geometry",
     "columns": {
         "geometry": {
@@ -42,4 +49,4 @@
 )
 table = table.cast(schema)

-pq.write_table(table, HERE / "example.parquet")
+pq.write_table(table, HERE / "../examples/example.parquet")
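With this change, `format-specs/schema.json` becomes the one place the version string is maintained: `get_version()` reads the `const` under `properties.version`, and the generated `example.parquet` picks it up. The sketch below is a hypothetical variant that takes the schema path as an argument instead of the fixed `HERE / "../format-specs/schema.json"`, which makes it easy to exercise against a throwaway file. It is not the committed code, and the temporary schema in the demo is a made-up stand-in containing only the keys the lookup needs.

```python
import json
import tempfile
from pathlib import Path


def get_version(schema_path: Path) -> str:
    """Read the version const from a JSON Schema file like format-specs/schema.json."""
    with open(schema_path, encoding="utf-8") as f:
        spec_schema = json.load(f)
    # Same lookup as generate_example.py: properties -> version -> const.
    # A KeyError here means the schema no longer pins the version with "const".
    return spec_schema["properties"]["version"]["const"]


if __name__ == "__main__":
    # Hypothetical stand-in for schema.json with just enough structure for the lookup.
    with tempfile.TemporaryDirectory() as tmp:
        schema_path = Path(tmp) / "schema.json"
        schema_path.write_text(
            json.dumps({"properties": {"version": {"const": "0.5.0-dev"}}}),
            encoding="utf-8",
        )
        print(get_version(schema_path))  # prints 0.5.0-dev
```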