Merge pull request #62 from GNS-Science/50_epic_ths_table_overhaul_versioning_support_plus_arrow

50 epic ths table overhaul versioning support plus arrow
chrisbc authored May 27, 2024
2 parents 77efd33 + f3e1984 commit a87209d
Showing 147 changed files with 19,672 additions and 2,487 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.7.9
current_version = 0.9.0
commit = True
tag = True

5 changes: 2 additions & 3 deletions .github/workflows/dev.yml
@@ -4,11 +4,10 @@ name: dev workflow

# Controls when the action will run.
on:
# Triggers the workflow on push or pull request events but only for the master branch
push:
branches: [ main ]
branches: [ main, pre-release ]
pull_request:
branches: [ main ]
branches: [ main, pre-release ]

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
39 changes: 38 additions & 1 deletion CHANGELOG.md
@@ -1,7 +1,44 @@
# Changelog

## [0.7.9] - 2024-02-26
## [0.9.0] - 2024-05-27

### Added
- V4 epic tables
- parquet support
- new scripts:
- ths_r4_filter_dataset
- ths_r4_import
- ths_r4_migrate
- ths_r4_query
- migration/ths_r4_sanity
- extract datasets directly from hdf5
- more documentation

### Changed
- switch to nzshm-common#pre-release branch
- switch to nzshm-model#pre-release branch
- move outdated scripts to scripts/legacy
- new documentation theme

## [0.8.0] - 2024-02
### Added
- db_adapter architecture
- sqlite3 as db_adapter for localstorage option
- new environment variable for localstorage
- more documentation
- use tmp_path for new localstorage tests
- db_adapter supports SS field type
- dynamodb unique behaviour implemented in sqlite
- support for .env configuration (using python-dotenv)

### Changed
- update openquake dependency for NSHM GSIMs
- drop python 3.8 and update deps for openquake
- more test coverage
- refactor tests to use temporary folders correctly
- migrated to pynamodb>=6.0

## [0.7.9] - 2024-02-26
### Changed
- dependencies for compatibility with openquake-engine v3.19

13 changes: 10 additions & 3 deletions README.md
@@ -12,11 +12,18 @@
* PyPI: <https://pypi.org/project/toshi-hazard-store/>
* Free software: GPL-3.0-only


This library provides different hazard storage options used within NSHM hazard pipelines. Third parties may wish to
process models based on, or similar in scale to, the NZSHM 22.

## Features

* Main purpose is to upload Openquake hazard results to DynamoDB tables defined herein.
* relates the results to the toshi hazard id identifying the OQ hazard job run.
* extracts metadata from the openquake hdf5 solution
* Extract realisations from PSHA (openquake) hazard calcs and store these in a Parquet dataset.
* Manage Openquake hazard results in AWS DynamoDB tables defined herein (used by the NSHM project).
* CLI tools for end users
* **Legacy features:**
    * Option for caching using sqlite; see the NZSHM22_HAZARD_STORE_LOCAL_CACHE environment variable.
    * Option to use a local sqlite store instead of DynamoDB; see the THS_USE_SQLITE_ADAPTER and THS_SQLITE_FOLDER variables.

## Credits

8 changes: 7 additions & 1 deletion docs/api.md
@@ -1 +1,7 @@
::: toshi_hazard_store
## Hazard Queries

::: toshi_hazard_store.query.hazard_query

## Gridded Hazard Queries

::: toshi_hazard_store.query.gridded_hazard_query
18 changes: 18 additions & 0 deletions docs/cli/legacy.md
@@ -0,0 +1,18 @@
# CLI Reference (Legacy)

This page provides documentation for our command line tools.

These scripts relate to V3 and earlier THS DynamoDB models. These are
superseded by revision_4 for new hazard calculations from May 2024.

::: mkdocs-click
:module: scripts.legacy.ths_testing
:command: cli
:prog_name: ths_testing

::: mkdocs-click
:module: scripts.legacy.ths_cache
:command: cli
:prog_name: ths_cache

This module may be deprecated.
12 changes: 12 additions & 0 deletions docs/cli/store_hazard_v4.md
@@ -0,0 +1,12 @@
::: scripts.store_hazard_v4
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.store_hazard_v4
:command: main
:prog_name: store_hazard_v4
:depth: 1
12 changes: 12 additions & 0 deletions docs/cli/ths_r4_defrag.md
@@ -0,0 +1,12 @@
::: scripts.ths_r4_defrag
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.ths_r4_defrag
:command: main
:prog_name: ths_r4_defrag
:depth: 1
12 changes: 12 additions & 0 deletions docs/cli/ths_r4_filter_dataset.md
@@ -0,0 +1,12 @@
::: scripts.ths_r4_filter_dataset
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.ths_r4_filter_dataset
:command: main
:prog_name: ths_r4_filter_dataset
:depth: 1
12 changes: 12 additions & 0 deletions docs/cli/ths_r4_import.md
@@ -0,0 +1,12 @@
::: scripts.ths_r4_import
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.ths_r4_import
:command: main
:prog_name: ths_r4_import
:depth: 1
12 changes: 12 additions & 0 deletions docs/cli/ths_r4_migrate.md
@@ -0,0 +1,12 @@
::: scripts.ths_r4_migrate
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.ths_r4_migrate
:command: main
:prog_name: ths_r4_migrate
:depth: 1
12 changes: 12 additions & 0 deletions docs/cli/ths_r4_query.md
@@ -0,0 +1,12 @@
::: scripts.ths_r4_query
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.ths_r4_query
:command: main
:prog_name: ths_r4_query
:depth: 1
12 changes: 12 additions & 0 deletions docs/cli/ths_r4_sanity.md
@@ -0,0 +1,12 @@
::: scripts.ths_r4_sanity
:depth: 1
options:
members: no

# Click CLI documentation

::: mkdocs-click
:module: scripts.ths_r4_sanity
:command: main
:prog_name: ths_r4_sanity
:depth: 1
75 changes: 75 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,75 @@
# Configuration


The toshi_hazard_store project was originally designed to support the AWS DynamoDB database service. It now provides an option
to use a local sqlite3 store as an alternative.

Caveats for local storage:

- a complete model (e.g. the NSHM_v1.0.4 dataset) will likely prove too large for this option.
- this is a single-user solution.
- we provide no way to migrate data between storage backends (although in principle this should be relatively easy)


Run-time options let you configure the library for your use case. Settings are made using environment variables and/or a local `.env` (dotenv) file; see [python-dotenv](https://github.com/theskumar/python-dotenv).

The `.env` file should be created in the folder from which the Python interpreter is invoked, typically the root folder of your project.


### General settings

| Variable | Default | Description | for Cloud | for Local |
|---------|---------|-------------|-----------|-----------|
| **NZSHM22_HAZARD_STORE_STAGE** | None | discriminator for table names | Required | Required |
| **NZSHM22_HAZARD_STORE_NUM_WORKERS** | 1 | number of parallel workers for batch operations | Optional integer | N/A (single worker only) |
| **THS_USE_SQLITE_ADAPTER** | FALSE | use local (sqlite) storage? | N/A | TRUE |


### Cloud settings

The NZSHM toshi-hazard-store database is available for public, read-only access using AWS API credentials (contact via email: [email protected]).

- AWS credentials will be provided as so-called `short-term credentials`, in the form of an `aws_access_key_id` and an `aws_secret_access_key`.

- Typically these are configured in your local credentials file as described in [Authenticate with short-term credentials](https://docs.aws.amazon.com/cli/v1/userguide/cli-authentication-short-term.html).

- The `AWS_PROFILE` environment variable determines the credentials used by THS at run time.


| Variable | Default | Description | for Cloud | for Local |
|---------|---------|-------------|-----------|-----------|
| **AWS_PROFILE** | None | Name of your AWS credentials profile | Required | N/A |
| **NZSHM22_HAZARD_STORE_REGION** | None | AWS region, e.g. us-east-1 | Required | N/A |
| **NZSHM22_HAZARD_STORE_LOCAL_CACHE** | None | folder for local cache | Optional (leave unset to disable caching) | N/A |



### Local (off-cloud) settings

| Variable | Default | Description | for Cloud | for Local |
|---------|---------|-------------|-----------|-----------|
| **THS_SQLITE_FOLDER** | None | folder for local storage | N/A | Required |


## Example .env file

```
# GENERAL settings
NZSHM22_HAZARD_STORE_STAGE=TEST
NZSHM22_HAZARD_STORE_NUM_WORKERS=4
# IMPORTANT !!
THS_USE_SQLITE_ADAPTER=TRUE
# CLOUD settings
AWS_PROFILE={YOUR AWS PROFILE}
NZSHM22_HAZARD_STORE_REGION={us-east-1}
# LOCAL Caching (Optional, cloud only)
NZSHM22_HAZARD_STORE_LOCAL_CACHE=/home/chrisbc/.cache/toshi_hazard_store
# LOCAL Storage settings
THS_SQLITE_FOLDER=/GNSDATA/LIB/toshi-hazard-store/LOCALSTORAGE
```

These settings can be overridden by specifying values in the local environment.
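
For illustration, here is a minimal sketch of reading these settings with [python-dotenv](https://github.com/theskumar/python-dotenv). It assumes only the variable names documented above; the library's own settings handling may differ:

```python
# Minimal sketch: load the settings documented above via python-dotenv.
# The defaults shown are illustrative assumptions, not library behaviour.
import os

from dotenv import load_dotenv

load_dotenv()  # reads a `.env` file from the current working directory, if present

stage = os.getenv("NZSHM22_HAZARD_STORE_STAGE")                        # required (cloud and local)
num_workers = int(os.getenv("NZSHM22_HAZARD_STORE_NUM_WORKERS", "1"))
use_sqlite = os.getenv("THS_USE_SQLITE_ADAPTER", "FALSE").upper() == "TRUE"
sqlite_folder = os.getenv("THS_SQLITE_FOLDER")                         # required when use_sqlite is True

print(stage, num_workers, use_sqlite, sqlite_folder)
```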
62 changes: 62 additions & 0 deletions docs/domain_model/disaggregation_models.md
@@ -0,0 +1,62 @@
**Tables:**

- **DisaggAggregationExceedance** - Disaggregation curves of Probability of Exceedance
- **DisaggAggregationOccurence** - Disaggregation curves of Probability of Occurrence

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.

The base class **DisaggAggregationBase** defines attributes common to both types of disaggregation curve.

```mermaid
classDiagram
direction TB
class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True)  # a location downsampled to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)
nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid
version = VersionAttribute()
uniq_id = UnicodeAttribute()
lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)
created = TimestampAttribute(default=datetime_now)
}
class DisaggAggregationBase{
... fields from LocationIndexedModel
hazard_model_id = UnicodeAttribute()
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
hazard_agg = EnumConstrainedUnicodeAttribute(AggregationEnum) # eg MEAN
disagg_agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
disaggs = CompressedPickleAttribute() # a very compressible numpy array,
bins = PickleAttribute() # a much smaller numpy array
shaking_level = FloatAttribute()
probability = EnumAttribute(ProbabilityEnum) # eg TEN_PCT_IN_50YRS
}
class DisaggAggregationExceedance{
... fields from DisaggAggregationBase
}
class DisaggAggregationOccurence{
... fields from DisaggAggregationBase
}
LocationIndexedModel <|-- DisaggAggregationBase
DisaggAggregationBase <|-- DisaggAggregationExceedance
DisaggAggregationBase <|-- DisaggAggregationOccurence
```
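
For orientation, a minimal query sketch against these models, assuming they are exposed via `toshi_hazard_store.model` and follow standard PynamoDB query semantics; the key values below are hypothetical, and the supported query helpers live in `toshi_hazard_store.query`:

```python
# Minimal sketch (not part of this commit): a direct PynamoDB query against
# DisaggAggregationExceedance. Import path and key formats are assumptions for
# illustration; prefer the helpers in toshi_hazard_store.query for real use.
from toshi_hazard_store import model

partition_key = "-41.0~175.0"  # hypothetical 1.0-degree downsampled location key

for item in model.DisaggAggregationExceedance.query(
    partition_key,
    model.DisaggAggregationExceedance.sort_key.startswith("-41.300~174.780"),  # hypothetical prefix
):
    print(item.hazard_model_id, item.imt, item.probability, item.shaking_level)
```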
25 changes: 25 additions & 0 deletions docs/domain_model/gridded_hazard_models.md
@@ -0,0 +1,25 @@
**Tables:**

- **GriddedHazard** - stores one value in grid_poes for each grid point defined by location_grid_id.
- **HazardAggregation** - stores aggregate hazard curves ([see openquake_models for details](./openquake_models.md))

```mermaid
classDiagram
direction LR
class GriddedHazard{
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)
version = VersionAttribute()
created = TimestampAttribute(default=datetime_now)
hazard_model_id = UnicodeAttribute()
location_grid_id = UnicodeAttribute()
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
poe = FloatAttribute()
grid_poes = CompressedListAttribute()
}
GriddedHazard --> "1..*" HazardAggregation
```
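
A similarly hedged sketch for **GriddedHazard**, again assuming the `toshi_hazard_store.model` import path; the partition key value is a placeholder, and `toshi_hazard_store.query.gridded_hazard_query` provides the supported query interface:

```python
# Minimal sketch (not part of this commit): reading gridded hazard values back
# from GriddedHazard items. The partition key below is a hypothetical placeholder.
from toshi_hazard_store import model

for ghaz in model.GriddedHazard.query("A_PARTITION_KEY"):
    # grid_poes holds one PoE value per point of the grid named by location_grid_id
    print(ghaz.hazard_model_id, ghaz.location_grid_id, ghaz.vs30, ghaz.imt, ghaz.agg, ghaz.poe)
    print(len(ghaz.grid_poes), "grid values")
```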
