Feature/53 th sv2 db adapters #54

Open
wants to merge 73 commits into
base: main
Changes from 43 commits
6419fef
WIP on adapter and sqlite implementation
chrisbc Dec 13, 2023
1445c1e
sqlite adapter is progressing
chrisbc Dec 14, 2023
d0e2d8b
added exists(), delete_table to sqlite adapter;
chrisbc Dec 14, 2023
5d6784b
added save() to sqlite adapter;
chrisbc Dec 14, 2023
b6c9e5f
added save() & query();
chrisbc Dec 14, 2023
8a23e1f
refactor db_adapter packages;
chrisbc Dec 14, 2023
fde0e73
refactoring
chrisbc Dec 14, 2023
84e7259
clone all oq models for v2 extract testing;
chrisbc Dec 14, 2023
50543ac
working test 1;
chrisbc Dec 17, 2023
fa0b1c3
v2 realisations OK; added test script/ths_v2.py;
chrisbc Dec 18, 2023
9290e52
simplified adapter pattern; added tests for pynamodb vs sqlite
chrisbc Dec 19, 2023
f3a93ce
update test script
chrisbc Dec 19, 2023
9fb34b0
batch support for sqlite;
chrisbc Dec 20, 2023
d507227
get caching back together; update changelog
chrisbc Dec 21, 2023
d281655
Bump version: 0.7.7 → 0.8.0
chrisbc Dec 21, 2023
5efa553
fix caching imports;
chrisbc Dec 21, 2023
0c3ac60
fix imports; reduce bounds for randint in test;
chrisbc Dec 21, 2023
32f641d
fix imports
chrisbc Dec 21, 2023
b6447d3
more docs; test using tmp_path;
chrisbc Jan 8, 2024
1144e57
tweak sqlite adapter folder location
chrisbc Jan 8, 2024
8909765
get pytest and mocking pynamodb working on meta table;
chrisbc Jan 8, 2024
5781a96
detox
chrisbc Jan 8, 2024
5418d20
WIP on dynamic base classes
chrisbc Jan 9, 2024
3e55e5e
dynamic_base_class working on ToshiOpenquakeMeta
chrisbc Jan 10, 2024
b7c2dcb
WIP on dyanic_base_class setup
chrisbc Jan 10, 2024
9c240c0
OpenquakeRealization dynamic working
chrisbc Jan 10, 2024
850dcd1
WIP on refactoring tests
chrisbc Jan 11, 2024
e63c615
delete duplicated test
chrisbc Jan 14, 2024
cb58a74
realization tests passing with db_adapter
chrisbc Jan 14, 2024
b9e32bd
HazardAggregation tests working with db_adapter
chrisbc Jan 15, 2024
0ce29f8
caching tests fixed
chrisbc Jan 15, 2024
091f278
moved db_adapter package; dropped unuused v2 models;
chrisbc Jan 15, 2024
93d8a3e
solved PyanmodbAdapterInterface typing and inheritance configuration;…
chrisbc Jan 16, 2024
584eda6
WIP on transform + export
chrisbc Jan 16, 2024
981ccba
reproduced pickle.dump error n oq_rlz.
chrisbc Jan 16, 2024
31ab9fb
pickle error relates to rebase importing package instead of module
chrisbc Jan 16, 2024
0af7db7
all tests passing; store_hazard script workging with SqliteAdapter; d…
chrisbc Jan 17, 2024
1f89db0
make get_tables dynamic; add test showing how NOT to rebase models;
chrisbc Jan 17, 2024
85a5826
testing [sqlite] rlz with ths_testing script;
chrisbc Jan 17, 2024
c307e51
update cli docs;
chrisbc Jan 17, 2024
d8bfb96
update docs; add model digrams and outline of proposed changes.
chrisbc Jan 18, 2024
b1df4ae
get all tests running in tox; fix db_adapter pattern in hazard_query.…
chrisbc Jan 25, 2024
91377c0
reorg hazard_query; add scripts watch to mkdocs config; fix random ed…
chrisbc Jan 25, 2024
1298d40
minor docs updates;
chrisbc Feb 25, 2024
0ea15d1
updated black; simplify pyproject.toml so `poetry check` passes;
chrisbc Feb 25, 2024
35eedac
black formatting
chrisbc Feb 25, 2024
bf596e8
Merge remote-tracking branch 'origin/update-urllib3-dep' into feature…
chrisbc Feb 25, 2024
c5bf44f
full support for SS round-tripping in sqlite with test cover;
chrisbc Feb 26, 2024
df3323c
update openquake to 3.19 to get all the NSHM GSIMS; re-store test_tra…
chrisbc Feb 26, 2024
a0d8b12
update CHANGELOG
chrisbc Feb 26, 2024
8aac572
update changelog
chrisbc Feb 26, 2024
dac19d1
cleanup tweaks;
chrisbc Feb 26, 2024
632c1e0
add test cover for unique contraints; sqlite is incomplete;
chrisbc Feb 27, 2024
40399bb
WIP on unique constraint for sqlite;
chrisbc Feb 27, 2024
cfedbb0
add deduplication to sqlite adapter batch save;
chrisbc Feb 27, 2024
7b578cd
commenting
chrisbc Feb 27, 2024
40bff41
changelog
chrisbc Feb 27, 2024
0fd4e35
added python-dotenv; .env support; fixed tests to use tempfile properly;
chrisbc Feb 28, 2024
1280759
fix breaking cache put test; WIP;
chrisbc Feb 28, 2024
3e9ed96
update to pynamodb>=6;
chrisbc Feb 29, 2024
439b0d4
WIP on serialization improvements;
chrisbc Feb 29, 2024
d1ef3de
more serialisation WIP
chrisbc Mar 1, 2024
0707287
refactoring WIP;
chrisbc Mar 4, 2024
09e139e
test coverage on db_adapter for custom attributes;
chrisbc Mar 4, 2024
d223a55
more cover and fixes for custom attributes
chrisbc Mar 4, 2024
b4207fc
formatting
chrisbc Mar 4, 2024
fc0e2c8
detox; tests passing;
chrisbc Mar 4, 2024
65d6db4
format; log level tweak;
chrisbc Mar 5, 2024
6f347c9
updated docs
chrisbc Mar 6, 2024
9ee00df
resolved merge conflicts on main
chrisbc Mar 6, 2024
8697193
add auto-configuration
chrisbc Mar 6, 2024
8cae1cc
configure logging and error handler for batch ops
chrisbc Mar 11, 2024
2899861
add cdc test with logging
chrisbc Mar 11, 2024
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.7.7
+current_version = 0.8.0
commit = True
tag = True

9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,14 @@
# Changelog

## [0.8.0] - 2024-01-08
### Added
- db_adapter architecture
- sqlite3 as db_adapter for localstorage option
- new environment variable for localstorage
- more documentation
- use tmp_path for new localstorage tests
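To make the adapter idea concrete, here is a minimal sketch of a sqlite3-backed store offering the operations named in the commit log (`exists()`, `save()`, `query()`, `delete_table`). The class name `SqliteKVAdapter`, its table layout and its method signatures are assumptions for illustration only; they are not the API added by this PR.

```python
import sqlite3


class SqliteKVAdapter:
    """Hypothetical adapter: one table keyed by (partition_key, sort_key)."""

    def __init__(self, path=":memory:", table="items"):
        self.conn = sqlite3.connect(path)
        self.table = table  # illustrative only; the name is not sanitised

    def exists(self):
        """Return True if the backing table has been created."""
        row = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
            (self.table,),
        ).fetchone()
        return row is not None

    def create_table(self):
        # The composite primary key enforces uniqueness per key pair.
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} ("
            "partition_key TEXT NOT NULL, sort_key TEXT NOT NULL, body TEXT, "
            "PRIMARY KEY (partition_key, sort_key))"
        )

    def delete_table(self):
        self.conn.execute(f"DROP TABLE IF EXISTS {self.table}")

    def save(self, partition_key, sort_key, body):
        # INSERT OR REPLACE overwrites a row that has the same key pair.
        with self.conn:
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.table} VALUES (?, ?, ?)",
                (partition_key, sort_key, body),
            )

    def query(self, partition_key, sort_key_prefix=""):
        """Return (sort_key, body) rows in a partition, filtered by key prefix."""
        cur = self.conn.execute(
            f"SELECT sort_key, body FROM {self.table} "
            "WHERE partition_key = ? AND substr(sort_key, 1, length(?)) = ? "
            "ORDER BY sort_key",
            (partition_key, sort_key_prefix, sort_key_prefix),
        )
        return cur.fetchall()
```

Saving twice with the same key pair leaves a single row, which is one simple way to get the deduplication behaviour the later commits mention.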


## [0.7.7] - 2023-12-13
### Changed
- fix publication workflow
7 changes: 4 additions & 3 deletions README.md
@@ -14,9 +14,10 @@

## Features

-* Main purpose is to upload Openquake hazard results to a DynamodDB tables defined herein.
-* relates the results to the toshi hazard id identifying the OQ hazard job run.
-* extracts metadata from the openquake hdf5 solution
+* Manage Openquake hazard results in AWS DynamoDB tables defined herein.
+* Option for caching using sqlite; see the NZSHM22_HAZARD_STORE_LOCAL_CACHE environment variable.
+* Option to use a local sqlite store instead of DynamoDB; see the THS_USE_SQLITE_ADAPTER and THS_SQLITE_FOLDER variables.
+* CLI tools for end users
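
A minimal sketch of how a caller might read the two sqlite settings named above. Only the variable names `THS_USE_SQLITE_ADAPTER` and `THS_SQLITE_FOLDER` come from the README; the helper `sqlite_settings`, the set of accepted truthy strings, and the `None` fallback are assumptions, since the PR excerpt does not show how the package parses them.

```python
import os
from pathlib import Path

# Assumed truthy spellings; the package may accept a different set.
TRUTHY = {"1", "true", "yes"}


def sqlite_settings(env=None):
    """Return (use_sqlite, folder) from an environment-like mapping."""
    env = os.environ if env is None else env
    use_sqlite = env.get("THS_USE_SQLITE_ADAPTER", "").strip().lower() in TRUTHY
    raw_folder = env.get("THS_SQLITE_FOLDER")
    # No default folder is invented here: an unset variable yields None.
    folder = Path(raw_folder).expanduser() if raw_folder else None
    return use_sqlite, folder
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.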

## Credits

8 changes: 7 additions & 1 deletion docs/api.md
@@ -1 +1,7 @@
-::: toshi_hazard_store
+## Hazard Queries
+
+::: toshi_hazard_store.query.hazard_query
+
+## Gridded Hazard Queries
+
+::: toshi_hazard_store.query.gridded_hazard_query
15 changes: 15 additions & 0 deletions docs/cli.md
@@ -0,0 +1,15 @@
# CLI Reference

This page provides documentation for our command line tools.

::: mkdocs-click
:module: scripts.ths_testing
:command: cli
:prog_name: ths_testing

::: mkdocs-click
:module: scripts.ths_cache
:command: cli
:prog_name: ths_cache

This module may be deprecated.
62 changes: 62 additions & 0 deletions docs/domain_model/disaggregation_models.md
@@ -0,0 +1,62 @@
**Tables:**

- **DisaggAggregationExceedance** - Disaggregation curves of Probability of Exceedance
- **DisaggAggregationOccurence** - Disaggregation curves of Probability of Occurrence

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.

The base class **DisaggAggregationBase** defines attributes common to both types of disaggregation curve.

```mermaid
classDiagram
direction TB

class LocationIndexedModel {

partition_key = UnicodeAttribute(hash_key=True) # For this we will use a downsampled location to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)

nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid

version = VersionAttribute()
uniq_id = UnicodeAttribute()

lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees

vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)

created = TimestampAttribute(default=datetime_now)

}

class DisaggAggregationBase{
... fields from LocationIndexedModel
hazard_model_id = UnicodeAttribute()
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)

hazard_agg = EnumConstrainedUnicodeAttribute(AggregationEnum) # eg MEAN
disagg_agg = EnumConstrainedUnicodeAttribute(AggregationEnum)

disaggs = CompressedPickleAttribute() # a very compressible numpy array,
bins = PickleAttribute() # a much smaller numpy array

shaking_level = FloatAttribute()
probability = EnumAttribute(ProbabilityEnum) # eg TEN_PCT_IN_50YRS
}

class DisaggAggregationExceedance{
... fields from DisaggAggregationBase
}

class DisaggAggregationOccurence{
... fields from DisaggAggregationBase
}
LocationIndexedModel <|-- DisaggAggregationBase
DisaggAggregationBase <|-- DisaggAggregationExceedance
DisaggAggregationBase <|-- DisaggAggregationOccurence
```
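
The `disaggs` field is described above as "a very compressible numpy array". A rough sketch of why a compressed pickle pays off for such data, using only the standard library; the helpers `compress_pickle` and `decompress_pickle` are hypothetical, and the codec actually used by `CompressedPickleAttribute` is not shown in this diff.

```python
import pickle
import zlib


def compress_pickle(obj):
    """Pickle an object, then zlib-compress the resulting bytes."""
    return zlib.compress(pickle.dumps(obj))


def decompress_pickle(blob):
    """Reverse compress_pickle: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


# A mostly-zero list stands in for a sparse disaggregation array.
sparse = [0.0] * 10_000
blob = compress_pickle(sparse)
restored = decompress_pickle(blob)
```

Only unpickle data from a trusted store, since `pickle.loads` can execute arbitrary code.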
25 changes: 25 additions & 0 deletions docs/domain_model/gridded_hazard_models.md
@@ -0,0 +1,25 @@
**Tables:**

- **GriddedHazard** - Each grid point defined by location_grid_id has a value in grid_poes.
- **HazardAggregation** - stores aggregate hazard curves [see ./openquake_models for details](./openquake_models.md)

```mermaid
classDiagram
direction LR

class GriddedHazard{
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)
version = VersionAttribute()
created = TimestampAttribute(default=datetime_now)
hazard_model_id = UnicodeAttribute()
location_grid_id = UnicodeAttribute()
vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
poe = FloatAttribute()
grid_poes = CompressedListAttribute()
}

GriddedHazard --> "1..*" HazardAggregation
```
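
A small sketch of the relationship described above, assuming that `grid_poes` is index-aligned with the grid points that `location_grid_id` resolves to. That alignment, and the helper name `poes_by_grid_point`, are assumptions; the diff does not show how the grid is looked up.

```python
def poes_by_grid_point(grid_points, grid_poes):
    """Pair each grid point with the value at the same index in grid_poes."""
    if len(grid_points) != len(grid_poes):
        raise ValueError("grid_points and grid_poes must have the same length")
    return dict(zip(grid_points, grid_poes))
```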
95 changes: 95 additions & 0 deletions docs/domain_model/openquake_models.md
@@ -0,0 +1,95 @@
## CURRENT STATE

These table models are used to store data created by GEM's **openquake** PSHA engine. Data is extracted from the HDF5 files created by openquake and stored with relevant metadata in the following tables.

## Seismic Hazard Model diagram

**Tables:**

- **ToshiOpenquakeMeta** - stores metadata from the job configuration and the openquake results.

```mermaid
classDiagram
direction LR

class ToshiOpenquakeMeta {
partition_key = UnicodeAttribute(hash_key=True) # a static value as we actually don't want to partition our data
hazsol_vs30_rk = UnicodeAttribute(range_key=True)

created = TimestampAttribute(default=datetime_now)

hazard_solution_id = UnicodeAttribute()
general_task_id = UnicodeAttribute()
vs30 = NumberAttribute() # vs30 value

imts = UnicodeSetAttribute() # list of IMTs
locations_id = UnicodeAttribute() # Location codes identifier (ENUM?)
source_ids = UnicodeSetAttribute()
source_tags = UnicodeSetAttribute()
inv_time = NumberAttribute() # Investigation time in years

src_lt = JSONAttribute() # sources meta as DataFrame JSON
gsim_lt = JSONAttribute() # gmpe meta as DataFrame JSON
rlz_lt = JSONAttribute() # realization meta as DataFrame JSON
}
```

**Tables:**

- **OpenquakeRealization** - stores the individual hazard realisation curves.
- **HazardAggregation** - stores aggregate hazard curves from **OpenquakeRealization** curves.

The base class **LocationIndexedModel** provides common attributes and indexing for models that support location-based indexing.


```mermaid
classDiagram
direction TB

class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True) # For this we will use a downsampled location to 1.0 degree
sort_key = UnicodeAttribute(range_key=True)

nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
nloc_01 = UnicodeAttribute() # 0.01deg ~1km grid
nloc_1 = UnicodeAttribute() # 0.1deg ~10km grid
nloc_0 = UnicodeAttribute() # 1.0deg ~100km grid

version = VersionAttribute()
uniq_id = UnicodeAttribute()

lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees

vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)

created = TimestampAttribute(default=datetime_now)

}

class OpenquakeRealization {
... fields from LocationIndexedModel
hazard_solution_id = UnicodeAttribute()
source_tags = UnicodeSetAttribute()
source_ids = UnicodeSetAttribute()

rlz = IntegerAttribute() # index of the openquake realization
values = ListAttribute(of=IMTValuesAttribute)
}

class HazardAggregation {
... fields from LocationIndexedModel
hazard_model_id = UnicodeAttribute() # e.g. `NSHM_V1.0.4`
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
values = ListAttribute(of=LevelValuePairAttribute)
}


ToshiOpenquakeMeta --> "0..*" OpenquakeRealization
HazardAggregation --> "1..*" OpenquakeRealization
LocationIndexedModel <|-- OpenquakeRealization
LocationIndexedModel <|-- HazardAggregation

```
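
The `nloc_*` attributes in **LocationIndexedModel** hold the same location snapped to progressively coarser grids, and the partition key uses the 1.0 degree grid. A minimal sketch of that downsampling follows. The resolutions come from the comments in the diagram; the `lat~lon` code format and the helper names are assumptions, not taken from this PR.

```python
from decimal import Decimal, ROUND_HALF_UP

# Resolutions named in the LocationIndexedModel comments, in degrees.
RESOLUTIONS = {"nloc_001": "0.001", "nloc_01": "0.01", "nloc_1": "0.1", "nloc_0": "1.0"}


def downsample(value, resolution):
    """Snap a coordinate to the nearest multiple of resolution (a decimal string)."""
    res = Decimal(resolution)
    steps = (Decimal(str(value)) / res).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return steps * res


def location_codes(lat, lon):
    """Return a hypothetical 'lat~lon' code for each grid resolution."""
    codes = {}
    for name, res in RESOLUTIONS.items():
        # Number of decimal places implied by the resolution (0.001 -> 3, 1.0 -> 0).
        places = max(0, -Decimal(res).normalize().as_tuple().exponent)
        codes[name] = f"{downsample(lat, res):.{places}f}~{downsample(lon, res):.{places}f}"
    return codes
```

Decimal arithmetic is used so that values such as 0.1 are not distorted by binary floating point before they are formatted into keys.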
115 changes: 115 additions & 0 deletions docs/domain_model/proposed_hazard_models.md
@@ -0,0 +1,115 @@
## FUTURE STATE

These table models are used to store data created by any suitable PSHA engine.

## Seismic Hazard Model diagram

Different hazard engines, versions and/or configurations may produce compatible calculation curves.

This model is similar to the current one, except that:

- the concept of compatible producer configs is supported
- **HazardRealizationCurve** records are identified solely by internal attributes & relationships, so **toshi_hazard_solution_id** is removed but can be recorded in **HazardRealizationMeta**.

**TODO:** formalise logic tree branch identification for both source and GMM logic trees so that these are:

- a) unique and unambiguous, and
- b) easily relatable to **nzshm_model** instances.

**Tables:**

- **CompatibleHazardConfig (CHC)** - defines a logical identifier for compatible **HCPCs**. Model managers must ensure that compatibility holds true.
- **HazardCurveProducerConfig (HCPC)** - stores the unique attributes that define compatible hazard curve producers.
- **HazardRealizationMeta** - stores metadata common to a set of hazard realization curves.
- **HazardRealizationCurve** - stores the individual hazard realisation curves.
- **HazardAggregation** - stores the aggregated hazard curves [see ./openquake_models for details](./openquake_models.md)

```mermaid
classDiagram
direction TB

class CompatibleHazardConfig {
primary_key
}

class HazardCurveProducerConfig {
primary_key
fk_compatible_config

producer_software = UnicodeAttribute()
producer_version_id = UnicodeAttribute()
configuration_hash = UnicodeAttribute()
configuration_data = UnicodeAttribute()
}

class HazardRealizationMeta {
partition_key = UnicodeAttribute(hash_key=True) # a static value as we actually don't want to partition our data
sort_key = UnicodeAttribute(range_key=True)

fk_compatible_config
fk_producer_config

created = TimestampAttribute(default=datetime_now)

?hazard_solution_id = UnicodeAttribute()
?general_task_id = UnicodeAttribute()
vs30 = NumberAttribute() # vs30 value

src_lt = JSONAttribute() # sources meta as DataFrame JSON
gsim_lt = JSONAttribute() # gmpe meta as DataFrame JSON
rlz_lt = JSONAttribute() # realization meta as DataFrame JSON
}

class LocationIndexedModel {
partition_key = UnicodeAttribute(hash_key=True)
sort_key = UnicodeAttribute(range_key=True)

nloc_001 = UnicodeAttribute() # 0.001deg ~100m grid
etc...
version = VersionAttribute()
uniq_id = UnicodeAttribute()

lat = FloatAttribute() # latitude decimal degrees
lon = FloatAttribute() # longitude decimal degrees

vs30 = EnumConstrainedIntegerAttribute(VS30Enum)
site_vs30 = FloatAttribute(null=True)

created = TimestampAttribute(default=datetime_now)
}

class HazardRealizationCurve {
... fields from LocationIndexedModel
fk_metadata
fk_compatible_config

?source_tags = UnicodeSetAttribute()
?source_ids = UnicodeSetAttribute()

rlz # TODO ID of the realization
values = ListAttribute(of=IMTValuesAttribute)
}

class HazardAggregation {
... fields from LocationIndexedModel

fk_compatible_config

hazard_model_id = UnicodeAttribute() # e.g. `NSHM_V1.0.4`
imt = EnumConstrainedUnicodeAttribute(IntensityMeasureTypeEnum)
agg = EnumConstrainedUnicodeAttribute(AggregationEnum)
values = ListAttribute(of=LevelValuePairAttribute)
}

CompatibleHazardConfig --> "1..*" HazardCurveProducerConfig
HazardRealizationMeta --> "*..1" HazardCurveProducerConfig
HazardRealizationMeta --> "*..1" CompatibleHazardConfig

LocationIndexedModel <|-- HazardRealizationCurve
LocationIndexedModel <|-- HazardAggregation

HazardRealizationCurve --> "*..1" CompatibleHazardConfig
HazardRealizationCurve --> "*..1" HazardRealizationMeta

HazardAggregation --> "*..1" CompatibleHazardConfig
```
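
One possible reading of the `configuration_hash` and `configuration_data` pair in **HazardCurveProducerConfig** is that the hash is a digest of the configuration data, so that two producers can be compared cheaply. The sketch below assumes exactly that. The use of SHA-256 over canonical JSON and both function names are illustrative choices, not something this proposal specifies.

```python
import hashlib
import json


def configuration_hash(config):
    """Digest a JSON-serialisable config so that key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_producer_config(a, b):
    """Two producer configs match when their canonical digests are equal."""
    return configuration_hash(a) == configuration_hash(b)
```

Matching hashes only show that two configurations are identical. Whether differing configurations are still compatible remains a decision for model managers, as noted for **CompatibleHazardConfig**.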
1 change: 1 addition & 0 deletions docs/gridded_hazard_query_api.md
@@ -0,0 +1 @@
::: toshi_hazard_store.query.gridded_hazard_query
1 change: 1 addition & 0 deletions docs/hazard_disagg_query_api.md
@@ -0,0 +1 @@
::: toshi_hazard_store.query.disagg_queries
1 change: 1 addition & 0 deletions docs/hazard_query_api.md
@@ -0,0 +1 @@
::: toshi_hazard_store.query.hazard_query
10 changes: 9 additions & 1 deletion docs/installation.md
@@ -5,11 +5,19 @@
To install toshi-hazard-store, run this command in your
terminal:

### using pip

``` console
$ pip install toshi-hazard-store
```

This is the preferred method to install toshi-hazard-store, as it will always install the most recent stable release.
### using poetry

``` console
$ poetry add toshi-hazard-store
```

These are the preferred methods to install toshi-hazard-store, as they will always install the most recent stable release.
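
After either install, a quick way to confirm which release was picked up is to ask the standard library for the distribution version. This is a generic sketch using `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of toshi-hazard-store.

```python
from importlib import metadata


def installed_version(dist_name="toshi-hazard-store"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```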

If you don't have [pip][] installed, this [Python installation guide][]
can guide you through the process.