Postgres ranges rebuild #1017

Merged: 32 commits, May 10, 2024

Commits
54eb1b3
Implement new schema and management - nothing is using it yet!
SpacemanPaul May 3, 2024
4e3da8f
Start getting update process onto new schema - incomplete work in pro…
SpacemanPaul May 6, 2024
be3a6bd
Writing to the new layer-based schema, with batch caching.
SpacemanPaul May 6, 2024
5aea75b
Reading from the new layer range table. More product->layer renaming.
SpacemanPaul May 6, 2024
7cadc5d
Passing mypy, failing tests.
SpacemanPaul May 6, 2024
fba9a30
Passing unit tests, server initialising. Integration tests still failing.
SpacemanPaul May 6, 2024
50091bc
Passing integration tests.
SpacemanPaul May 7, 2024
df9c9ea
make datacube/env handling more generic (one step closer to multi-db)…
SpacemanPaul May 7, 2024
685a939
Passing all tests.
SpacemanPaul May 7, 2024
2b7e8e5
Add new tests and fix broken tests.
SpacemanPaul May 8, 2024
5f92285
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2024
ca68fca
lintage.
SpacemanPaul May 8, 2024
bee3f22
lintage.
SpacemanPaul May 8, 2024
bf2dd9b
Don't rely on DEA Explorer
SpacemanPaul May 8, 2024
a03c8f9
Update db to postgres 16 and use DB_URL
SpacemanPaul May 8, 2024
d9ec247
Revert main docker-compose.yaml
SpacemanPaul May 8, 2024
932023a
Need port as well.
SpacemanPaul May 8, 2024
57750fe
Fix nodb test fixture for GH
SpacemanPaul May 8, 2024
08d236f
Oops - used non-raw github link.
SpacemanPaul May 8, 2024
9be4ad2
Fix ows-update call in GHA test prep script.
SpacemanPaul May 8, 2024
15802ca
Update documentation.
SpacemanPaul May 9, 2024
5b2d437
Fix spelling or add (non-)words to wordlist.
SpacemanPaul May 9, 2024
cdc04e5
Various fixes/cleanups found on self-review.
SpacemanPaul May 9, 2024
746e877
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 9, 2024
88d3fce
Make no_db in test_no_db_routes a proper fixture.
SpacemanPaul May 10, 2024
52c04ea
Documentation edits
SpacemanPaul May 10, 2024
39d277a
Some cleanup in wms_utils.py
SpacemanPaul May 10, 2024
3fc0b17
Some cleanup in update_ranges_impl.py
SpacemanPaul May 10, 2024
fc83287
Make access in initialiser more consistent.
SpacemanPaul May 10, 2024
d6af6e9
Provide better examples of role granting in scripts and documentation.
SpacemanPaul May 10, 2024
b5ddeed
Fix inconsistent indentation.
SpacemanPaul May 10, 2024
0de93b4
Typo
SpacemanPaul May 10, 2024
Changes from all commits
3 changes: 2 additions & 1 deletion .env_simple
@@ -4,7 +4,8 @@
################
# ODC DB Config
# ##############
-DB_HOSTNAME=postgres
+ODC_DEFAULT_DB_URL=postgresql://opendatacubeusername:opendatacubepassword@postgres:5432/opendatacube
+# Needed for docker db image.
DB_PORT=5432
DB_USERNAME=opendatacubeusername
DB_PASSWORD=opendatacubepassword
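
A single connection URL now replaces the separate ODC `DB_*` connection variables; the remaining `DB_*` values are only consumed by the docker database image. A minimal sketch of how such a URL decomposes, using SQLAlchemy's URL parser and the example credentials from the hunk above:

```python
# Sketch: pulling apart an ODC_DEFAULT_DB_URL-style connection string.
from sqlalchemy.engine import make_url

url = make_url("postgresql://opendatacubeusername:opendatacubepassword@postgres:5432/opendatacube")
print(url.username, url.host, url.port, url.database)
# opendatacubeusername postgres 5432 opendatacube
```
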
23 changes: 4 additions & 19 deletions CONTRIBUTING.rst
@@ -95,7 +95,7 @@ Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests (and should pass them - and all pre-existing tests!)
2. If the pull request adds or modifies functionality, the docs should be updated.
-3. The pull request should work for Python 3.7+. Check the results of
+3. The pull request should work for Python 3.10+. Check the results of
the github actions and make sure that your PR passes all checks and
does not decrease test coverage.

@@ -143,8 +143,9 @@ indexing and create db dump

# now go to ows container
docker exec -it datacube-ows_ows_1 bash
-datacube-ows-update --schema --role <db_read_role>
-datacube-ows-update --views
+# Run this as a database superuser role
+datacube-ows-update --schema --read-role <db_read_role> --write-role <db_write_role>
+# Run this as the <db_write_role> user above
+datacube-ows-update
exit
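
The read and write roles must already exist before `datacube-ows-update --schema` can grant to them. A hypothetical setup sketch (the role names, connection details, and use of psycopg2 are assumptions, not part of this PR):

```python
# Assumed setup sketch: create the roles that --read-role/--write-role
# refer to. Run with superuser credentials; all names are illustrative.
import psycopg2

conn = psycopg2.connect("postgresql://superuser:password@localhost:5432/opendatacube")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE ROLE ows_reader NOLOGIN;")
    cur.execute("CREATE ROLE ows_writer NOLOGIN;")
conn.close()
```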

@@ -178,22 +179,6 @@ manually modify translation for `de` for `assert` test to pass, then create `ows
docker cp datacube-ows_ows_1:/tmp/translations datacube-ows/integrations/cfg/


-Generating database relationship diagram
------------------------------------------
-
-.. code-block:: console
-
-   docker run -it --rm -v "$PWD:/output" --network="host" schemaspy/schemaspy:snapshot -u $DB_USERNAME -host localhost -port $DB_PORT -db $DB_DATABASE -t pgsql11 -schemas wms -norows -noviews -pfp -imageformat svg
-
-Merge relationship diagram and orphan diagram
-
-.. code-block:: console
-
-   python3 svg_stack.py --direction=h --margin=100 ../wms/diagrams/summary/relationships.real.large.svg ../wms/diagrams/orphans/orphans.svg > ows.merged.large.svg
-
-   cp svg_stack/ows.merged.large.svg ../datacube-ows/docs/diagrams/db-relationship-diagram.svg
-

Links
-----

22 changes: 10 additions & 12 deletions README.rst
@@ -3,23 +3,20 @@ datacube-ows
============

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Linting/badge.svg
-:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ALinting
+:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ACode%20Linting

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Tests/badge.svg
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ATests

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Docker/badge.svg
-:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ADocker
+:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ADockerfile%20Linting

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Scan/badge.svg
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3A%22Scan%22

.. image:: https://codecov.io/gh/opendatacube/datacube-ows/branch/master/graph/badge.svg
:target: https://codecov.io/gh/opendatacube/datacube-ows

-.. image:: https://img.shields.io/pypi/v/datacube?label=datacube
-   :alt: PyPI
-
Datacube Open Web Services
--------------------------

@@ -41,7 +38,7 @@ Features
System Architecture
-------------------

-.. image:: docs/diagrams/ows_diagram.png
+.. image:: docs/diagrams/ows_diagram1.9.png
:width: 700

Community
@@ -141,14 +138,14 @@ To run the standard Docker image, create a docker volume containing your ows con
-e AWS_DEFAULT_REGION=ap-southeast-2 \ # AWS Default Region (supply even if NOT accessing files on S3! See Issue #151)
-e SENTRY_DSN=https://[email protected]/projid \ # Key for Sentry logging (optional)
\ # Database connection URL: postgresql://<username>:<password>@<hostname>:<port>/<database>
--e ODC_DEFAULT_DB_URL=postgresql://cube:DataCube@172.17.0.1:5432/datacube \
+-e ODC_DEFAULT_DB_URL=postgresql://myuser:mypassword@172.17.0.1:5432/mydb \
-e PYTHONPATH=/code # The default PATH is under env, change this to target /code
-p 8080:8000 \ # Publish the gunicorn port (8000) on the Docker
\ # container at port 8080 on the host machine.
--mount source=test_cfg,target=/code/datacube_ows/config \ # Mount the docker volume where the config lives
name_of_built_container

-The image is based on the standard ODC container.
+The image is based on the standard ODC container and an external database.

Installation with Conda
------------
@@ -157,7 +154,7 @@ The following instructions are for installing on a clean Linux system.

* Create a conda python 3.8 and activate conda environment::

-conda create -n ows -c conda-forge python=3.8 datacube pre_commit postgis
+conda create -n ows -c conda-forge python=3.10 datacube pre_commit postgis
conda activate ows

* install the latest release using pip install::
@@ -186,7 +183,7 @@ The following instructions are for installing on a clean Linux system.
# to create schema, tables and materialised views used by datacube-ows.

export DATACUBE_OWS_CFG=datacube_ows.ows_cfg_example.ows_cfg
-datacube-ows-update --role ubuntu --schema
+datacube-ows-update --write-role ubuntu --schema


* Create a configuration file for your service, and all data products you wish to publish in
@@ -253,8 +250,9 @@ Local Postgres database
| xargs -n1 -I {} datacube dataset add s3://deafrica-data/{}

5. Write an ows config file to identify the products you want available in ows, see example here: https://github.com/opendatacube/datacube-ows/blob/master/datacube_ows/ows_cfg_example.py
-6. Run `datacube-ows-update --schema --role <db_read_role>` to create ows specific tables
-7. Run `datacube-ows-update` to generate ows extents.
+6. Run ``datacube-ows-update --schema --read-role <db_read_role> --write-role <db_write_role>`` as a database
+   superuser role to create ows specific tables and views.
+7. Run ``datacube-ows-update`` as ``db_write_role`` to populate ows extent tables.

Apache2 mod_wsgi
----------------
4 changes: 2 additions & 2 deletions check-code-all.sh
@@ -21,7 +21,7 @@ datacube product add https://raw.githubusercontent.com/GeoscienceAustralia/dea-c

# Geomedian for summary product testing

-datacube product add https://explorer-aws.dea.ga.gov.au/products/ga_ls8c_nbart_gm_cyear_3.odc-product.yaml
+datacube product add https://raw.githubusercontent.com/GeoscienceAustralia/dea-config/master/products/baseline_satellite_data/geomedian-au/ga_ls8c_nbart_gm_cyear_3.odc-product.yaml

# S2 multiproduct datasets
datacube dataset add https://dea-public-data.s3.ap-southeast-2.amazonaws.com/baseline/ga_s2bm_ard_3/52/LGM/2017/07/19/20170719T030622/ga_s2bm_ard_3-2-1_52LGM_2017-07-19_final.odc-metadata.yaml --ignore-lineage
@@ -44,7 +44,7 @@ datacube dataset add https://dea-public-data.s3.ap-southeast-2.amazonaws.com/der
datacube dataset add https://dea-public-data.s3.ap-southeast-2.amazonaws.com/derivative/ga_ls8c_nbart_gm_cyear_3/3-0-0/x17/y37/2021--P1Y/ga_ls8c_nbart_gm_cyear_3_x17y37_2021--P1Y_final.odc-metadata.yaml --ignore-lineage

# create material view for ranges extents
-datacube-ows-update --schema --role $DB_USERNAME
+datacube-ows-update --schema --write-role $DB_USERNAME
datacube-ows-update

# run test
4 changes: 2 additions & 2 deletions datacube_ows/cfg_parser_impl.py
@@ -117,12 +117,12 @@ def parse_path(path: str | None, parse_only: bool, folders: bool, styles: bool,
click.echo()
click.echo("Layers and Styles")
click.echo("=================")
-for lyr in cfg.product_index.values():
+for lyr in cfg.layer_index.values():
click.echo(f"{lyr.name} [{','.join(lyr.product_names)}]")
print_styles(lyr)
click.echo()
if input_file or output_file:
-layers_report(cfg.product_index, input_file, output_file)
+layers_report(cfg.layer_index, input_file, output_file)
return True


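
For context, the renamed `layer_index` mapping is used exactly as `product_index` was. A minimal sketch of walking it outside the CLI, assuming `DATACUBE_OWS_CFG` points at a valid configuration:

```python
# Sketch: iterating the renamed layer_index, mirroring the loop above.
# Assumes the DATACUBE_OWS_CFG environment variable is set.
from datacube_ows.ows_configuration import get_config

cfg = get_config()
for name, lyr in cfg.layer_index.items():
    print(f"{name} [{','.join(lyr.product_names)}]")
```
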
36 changes: 18 additions & 18 deletions datacube_ows/loading.py
@@ -125,37 +125,37 @@ def simple_layer_query(cls, layer: OWSNamedLayer,
class DataStacker:
@log_call
def __init__(self,
-product: OWSNamedLayer,
+layer: OWSNamedLayer,
geobox: GeoBox,
times: list[datetime.datetime],
resampling: Resampling | None = None,
style: StyleDef | None = None,
bands: list[str] | None = None):
-self._product = product
-self.cfg = product.global_cfg
+self._layer = layer
+self.cfg = layer.global_cfg
self._geobox = geobox
self._resampling = resampling if resampling is not None else "nearest"
self.style = style
if style:
self._needed_bands = list(style.needed_bands)
elif bands:
-self._needed_bands = [self._product.band_idx.locale_band(b) for b in bands]
+self._needed_bands = [self._layer.band_idx.locale_band(b) for b in bands]
else:
-self._needed_bands = list(self._product.band_idx.measurements.keys())
+self._needed_bands = list(self._layer.band_idx.measurements.keys())

-for band in self._product.always_fetch_bands:
+for band in self._layer.always_fetch_bands:
if band not in self._needed_bands:
self._needed_bands.append(band)
self.raw_times = times
-if product.mosaic_date_func:
-self._times = [product.mosaic_date_func(product.ranges["times"])]
+if self._layer.mosaic_date_func:
+self._times = [self._layer.mosaic_date_func(layer.ranges.times)]
else:
self._times = [
-self._product.search_times(
+self._layer.search_times(
t, self._geobox)
for t in times
]
-self.group_by = self._product.dataset_groupby()
+self.group_by = self._layer.dataset_groupby()
self.resource_limited = False

def needed_bands(self) -> list[str]:
@@ -185,7 +185,7 @@ def datasets(self, index: datacube.index.Index,
# Not returning datasets - use main product only
queries = [
ProductBandQuery.simple_layer_query(
-self._product,
+self._layer,
self.needed_bands(),
self.resource_limited)

@@ -194,10 +194,10 @@ def datasets(self, index: datacube.index.Index,
# we have a style - lets go with that.
queries = ProductBandQuery.style_queries(self.style)
elif all_flag_bands:
-queries = ProductBandQuery.full_layer_queries(self._product, self.needed_bands())
+queries = ProductBandQuery.full_layer_queries(self._layer, self.needed_bands())
else:
# Just take needed bands.
-queries = [ProductBandQuery.simple_layer_query(self._product, self.needed_bands())]
+queries = [ProductBandQuery.simple_layer_query(self._layer, self.needed_bands())]

if point:
geom = point
@@ -338,14 +338,14 @@ def manual_data_stack(self,
d = self.read_data_for_single_dataset(ds, measurements, self._geobox, fuse_func=fuse_func)
extent_mask = None
for band in non_flag_bands:
-for f in self._product.extent_mask_func:
+for f in self._layer.extent_mask_func:
if extent_mask is None:
extent_mask = f(d, band)
else:
extent_mask &= f(d, band)
if extent_mask is not None:
d = d.where(extent_mask)
-if self._product.solar_correction and not skip_corrections:
+if self._layer.solar_correction and not skip_corrections:
for band in non_flag_bands:
d[band] = solar_correct_data(d[band], ds)
if merged is None:
@@ -383,7 +383,7 @@ def read_data(self,
measurements=measurements,
fuse_func=fuse_func,
skip_broken_datasets=skip_broken,
-patch_url=self._product.patch_url,
+patch_url=self._layer.patch_url,
resampling=resampling)
except Exception as e:
_LOG.error("Error (%s) in load_data: %s", e.__class__.__name__, str(e))
@@ -399,7 +399,7 @@ def read_data_for_single_dataset(self,
resampling: Resampling = "nearest",
fuse_func: datacube.api.core.FuserFunction | None = None) -> xarray.Dataset:
datasets = [dataset]
-dc_datasets = datacube.Datacube.group_datasets(datasets, self._product.time_resolution.dataset_groupby())
+dc_datasets = datacube.Datacube.group_datasets(datasets, self._layer.time_resolution.dataset_groupby())
CredentialManager.check_cred()
try:
return datacube.Datacube.load_data(
@@ -408,7 +408,7 @@ def read_data_for_single_dataset(self,
measurements=measurements,
fuse_func=fuse_func,
skip_broken_datasets=skip_broken,
-patch_url=self._product.patch_url,
+patch_url=self._layer.patch_url,
resampling=resampling)
except Exception as e:
_LOG.error("Error (%s) in load_data: %s", e.__class__.__name__, str(e))
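
The `DataStacker` changes are a mechanical `product` to `layer` rename; behaviour is unchanged. An illustrative construction, where `layer`, `geobox` and `times` stand in for values normally derived from an OWS request (the band names are also placeholders):

```python
# Sketch only: constructing a DataStacker after the rename. The
# layer/geobox/times values are hypothetical stand-ins, not from this PR.
from datacube_ows.loading import DataStacker

stacker = DataStacker(layer, geobox, times, bands=["red", "green", "blue"])
print(stacker.needed_bands())
```
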
10 changes: 5 additions & 5 deletions datacube_ows/mv_index.py
@@ -35,11 +35,11 @@ def get_sqlalc_engine(index: Index) -> Engine:

def get_st_view(meta: MetaData) -> Table:
return Table('space_time_view', meta,
-                 Column('id', UUID()),
-                 Column('dataset_type_ref', SMALLINT()),
-                 Column('spatial_extent', Geometry(from_text='ST_GeomFromGeoJSON', name='geometry')),
-                 Column('temporal_extent', TSTZRANGE())
-                 )
+                 Column('id', UUID()),
+                 Column('dataset_type_ref', SMALLINT()),
+                 Column('spatial_extent', Geometry(from_text='ST_GeomFromGeoJSON', name='geometry')),
+                 Column('temporal_extent', TSTZRANGE()),
+                 schema="ows")


_meta = MetaData()
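
Because the `Table` is now declared with `schema="ows"`, SQLAlchemy emits `ows.space_time_view` in generated SQL and callers need no changes. A minimal query sketch, with an illustrative connection URL:

```python
# Sketch: querying the schema-qualified space_time_view.
# The connection URL is an assumption for illustration.
from sqlalchemy import MetaData, create_engine, select
from datacube_ows.mv_index import get_st_view

st_view = get_st_view(MetaData())
engine = create_engine("postgresql://opendatacubeusername:opendatacubepassword@localhost:5432/opendatacube")
with engine.connect() as conn:
    for row in conn.execute(select(st_view.c.id).limit(5)):
        print(row.id)
```
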
21 changes: 12 additions & 9 deletions datacube_ows/ogc.py
@@ -179,14 +179,17 @@ def ogc_wcs_impl():
def ping():
db_ok = False
cfg = get_config()
-    with cfg.dc.index._db.give_me_a_connection() as conn:
-        results = conn.execute(text("""
-            SELECT *
-            FROM wms.product_ranges
-            LIMIT 1""")
-        )
-        for r in results:
-            db_ok = True
+    try:
+        with cfg.dc.index._db.give_me_a_connection() as conn:
+            results = conn.execute(text("""
+                SELECT *
+                FROM ows.layer_ranges
+                LIMIT 1""")
+            )
+            for r in results:
+                db_ok = True
+    except Exception:
+        pass
if db_ok:
return (render_template("ping.html", status="Up"), 200, resp_headers({"Content-Type": "text/html"}))
else:
@@ -202,7 +205,7 @@ def ping():
def legend(layer, style, dates=None):
# pylint: disable=redefined-outer-name
cfg = get_config()
-product = cfg.product_index.get(layer)
+product = cfg.layer_index.get(layer)
if not product:
return ("Unknown Layer", 404, resp_headers({"Content-Type": "text/plain"}))
if dates is None:
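
With the `try`/`except` in place, a missing or unreadable `ows.layer_ranges` table now yields a Down response rather than an unhandled exception. A quick smoke test against a running instance (host and port are assumptions):

```python
# Smoke-test sketch for the hardened /ping route.
# The host and port are assumptions, not part of this PR.
import requests

resp = requests.get("http://localhost:8000/ping")
print(resp.status_code)        # 200 only when the ows.layer_ranges check passes
print("Up" in resp.text)
```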