Add STAC catalog #297

Merged
merged 91 commits into from
Aug 8, 2023

Conversation

@matprov matprov commented Feb 24, 2023

Overview

  • Adds STAC to the stack (optional) when ./components/stac is added to EXTRA_CONF_DIRS.

Changes

  • Service stac (API) gets added with endpoints /twitcher/ows/proxy/stac and /stac.

  • STAC catalog can be explored via the stac-browser component, available under /stac-browser.

  • Image crim-ca/stac-app is a STAC implementation based on stac-utils/stac-fastapi.

  • Image crim-ca/stac-browser is a fork of radiantearth/stac-browser.

  • Adds Magpie permissions and service for stac endpoints.

  • Uses stac-populator to populate STAC catalog with sample collection
    items via CEDA STAC Generator, employed in sample
    CMIP Dataset Ingestion Workflows.

Demo Instance

STAC API : https://stac-dev.crim.ca/stac/
STAC Browser : https://stac-dev.crim.ca/stac-browser/

By default, the STAC API returns 10 items per request to keep the payload small. Append ?limit=200 to the URL to request up to 200 items instead. The response payload includes a link to the next page of items; that link carries a token query parameter that the STAC API uses to return the following results.
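
The paging behaviour described above can be sketched in Python. This is a minimal illustration, not code from this PR: the helper names `with_limit` and `next_page_url` are hypothetical, and it assumes the response carries a `links` array whose next-page entry has `rel` set to `next`, as in the STAC API specification.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_limit(url: str, limit: int) -> str:
    """Return `url` with its `limit` query parameter set to `limit`."""
    parts = urlsplit(url)
    # Drop any existing limit so the new value is the only one present.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "limit"]
    query.append(("limit", str(limit)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def next_page_url(payload: dict) -> Optional[str]:
    """Return the href of the rel="next" link, or None on the last page."""
    for link in payload.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None
```

A client would request `with_limit(search_url, 200)`, then keep following `next_page_url(response_json)` until it returns `None`.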

Sample STAC API collection query using a CLI

Remove the -c flag to query across all collections.

pip install pystac-client
stac-client search https://stac-dev.crim.ca/stac -c c604ffb6d610adbb9a6b4787db7b8fd7 --query "variable_id=txgt_32" "scenario=ssp585"
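
For readers who prefer to build the request body themselves, the sketch below shows how the `key=value` filters above could map onto a STAC API Query extension body. It is an illustration only: `build_search_body` is a hypothetical helper, it handles nothing but equality, and pystac-client does its own, more complete parsing.

```python
from typing import Dict, List, Optional


def build_search_body(filters: List[str], collection: Optional[str] = None) -> Dict:
    """Turn ["key=value", ...] into a search body using the `eq` operator."""
    query = {}
    for item in filters:
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected 'key=value', got {item!r}")
        query[key.strip()] = {"eq": value.strip()}
    body: Dict = {"query": query}
    # Omitting the collection mirrors dropping the -c flag on the CLI.
    if collection is not None:
        body["collections"] = [collection]
    return body


body = build_search_body(
    ["variable_id=txgt_32", "scenario=ssp585"],
    collection="c604ffb6d610adbb9a6b4787db7b8fd7",
)
```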

Sample STAC API global query using CQL via cURL call

Note that the operators are described here: https://portal.ogc.org/files/96288

curl --location --globoff 'https://stac-dev.crim.ca/stac/search' \
--header 'Content-Type: application/json' \
--data '{
   "filter":{
      "and":[
         {
            "eq":[
               {
                  "property":"freq"
               },
               "MS"
            ]
         },
         {
            "like":[
               {
                  "property":"variable_id"
               },
               "tr_%"
            ]
         }
      ],
      "intersects":[
         {
            "property":"geometry"
         },
         {
            "type":"Polygon",
            "coordinates":[
               [
                  [
                     -140.99778,
                     41.6751050889
                  ],
                  [
                     -140.99778,
                     83.23324
                  ],
                  [
                     -52.6480987209,
                     83.23324
                  ],
                  [
                     -52.6480987209,
                     41.6751050889
                  ],
                  [
                     -140.99778,
                     41.6751050889
                  ]
               ]
            ]
         }
      ],
      "anyinteracts":[
         {
            "property":"datetime"
         },
         [
            "2010-05-03T13:21:30.040Z",
            "2022-05-03T13:21:30.040Z"
         ]
      ]
   }
}'
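
The same request body can be assembled programmatically. The following Python sketch reproduces the structure of the cURL payload above; the helper names `bbox_polygon` and `build_filter` are hypothetical, and the filter layout is copied from the example as-is rather than checked against the CQL specification.

```python
import json


def bbox_polygon(west, south, east, north):
    """Closed GeoJSON ring with the same vertex order as the cURL example."""
    return {
        "type": "Polygon",
        "coordinates": [[
            [west, south], [west, north], [east, north], [east, south],
            [west, south],  # the first vertex is repeated to close the ring
        ]],
    }


def build_filter(freq, variable_pattern, bbox, start, end):
    """Build the search body: attribute, spatial, and temporal constraints."""
    return {
        "filter": {
            "and": [
                {"eq": [{"property": "freq"}, freq]},
                {"like": [{"property": "variable_id"}, variable_pattern]},
            ],
            "intersects": [{"property": "geometry"}, bbox_polygon(*bbox)],
            "anyinteracts": [{"property": "datetime"}, [start, end]],
        }
    }


body = build_filter(
    "MS", "tr_%",
    (-140.99778, 41.6751050889, -52.6480987209, 83.23324),
    "2010-05-03T13:21:30.040Z", "2022-05-03T13:21:30.040Z",
)
payload = json.dumps(body)  # string to send as the POST body to /stac/search
```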

Get the queryables of the CMIP6 collection, statically created at collection creation

https://stac-dev.crim.ca/stac/collections/c604ffb6d610adbb9a6b4787db7b8fd7

Get the queryables of the CMIP6 collection, dynamically created at query time

https://stac-dev.crim.ca/stac/collections/c604ffb6d610adbb9a6b4787db7b8fd7/queryables

Get the queryables of the union of the CMIP5 and CMIP6 collections, dynamically created at query time

https://stac-dev.crim.ca/stac/queryables?collections=0798aa197d54eb4332767a5a4077fb0f,c604ffb6d610adbb9a6b4787db7b8fd7
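
The three URL shapes above follow a simple pattern, sketched here in Python. The function names are hypothetical; only the URL layout comes from the examples in this description.

```python
from typing import Iterable


def collection_url(base: str, collection_id: str) -> str:
    """URL of the collection document itself."""
    return f"{base.rstrip('/')}/collections/{collection_id}"


def collection_queryables_url(base: str, collection_id: str) -> str:
    """URL of the queryables for one collection."""
    return f"{collection_url(base, collection_id)}/queryables"


def union_queryables_url(base: str, collection_ids: Iterable[str]) -> str:
    """URL of the queryables for the union of several collections."""
    return f"{base.rstrip('/')}/queryables?collections={','.join(collection_ids)}"


BASE = "https://stac-dev.crim.ca/stac"
CMIP5 = "0798aa197d54eb4332767a5a4077fb0f"
CMIP6 = "c604ffb6d610adbb9a6b4787db7b8fd7"
```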

daccs_configs_branch: stac_populator
daccs_skip_ci: true

fyi @huard @mishaschwartz

@crim-jenkins-bot

E2E Test Results

DACCS-iac Pipeline Results

Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1830/
Result : failure

BIRDHOUSE_DEPLOY_BRANCH : add_stac
DACCS_CONFIGS_BRANCH : stac_populator
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master

DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-133.rdext.crim.ca

Infrastructure deployment failed. Instance has not been destroyed. @matprov

@fmigneault fmigneault left a comment

Quick review pass.
Will review again once the platform works with those configs.

CHANGES.md (outdated, resolved)
birdhouse/components/stac/.gitignore (outdated, resolved)
birdhouse/optional-components/README.rst (outdated, resolved)

matprov commented Jul 27, 2023

> Is there a way to change or remove these links so that they're not confusing to the end-user?

@mishaschwartz Now possible via

stac:
    environment:
        - OPENAPI_URL=/stac/api
        - DOCS_URL=/stac/api.html
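
As a rough picture of how a service might consume these two variables, here is a Python sketch. It is not taken from stac-app: the helper name `doc_urls` and the fallback values `/api` and `/api.html` are assumptions for illustration. FastAPI's constructor does accept `openapi_url` and `docs_url` arguments, which is where values like these would typically end up.

```python
import os
from typing import Dict, Mapping, Optional


def doc_urls(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the OpenAPI and docs paths from the environment.

    The fallback values are assumed defaults, not confirmed ones.
    """
    env = os.environ if env is None else env
    return {
        "openapi_url": env.get("OPENAPI_URL", "/api"),
        "docs_url": env.get("DOCS_URL", "/api.html"),
    }


# With the compose settings above, the documentation moves under /stac.
settings = doc_urls({"OPENAPI_URL": "/stac/api", "DOCS_URL": "/stac/api.html"})
```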

@fmigneault fmigneault left a comment

Just a minor edit left to fix.

Is there anything still blocking this PR?

birdhouse/optional-components/README.rst (outdated, resolved)

matprov commented Aug 8, 2023

> Is there anything still blocking this PR?

Only approvals of @mishaschwartz and @tlvu

@matprov matprov merged commit bc3273c into master Aug 8, 2023
3 checks passed
@matprov matprov deleted the add_stac branch August 8, 2023 14:41
@fmigneault

@matprov
The bump versions were not applied. Are they planned in a follow-up PR?

@tlvu tlvu left a comment

First off, totally sorry for my late review. I had too much on my plate lately and I completely forgot about this PR, given that it has been open for about 6 months and activity seems to have picked up only recently.

I found a few things to fix here, but it's alright: they are all in new code, so there is no regression.

@@ -15,7 +15,32 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)

@matprov No bumpversion? FYI the release procedure https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/README.rst#release-procedure

Planning on a quick subsequent PR?

services:
  stac:
    container_name: stac
    image: ghcr.io/crim-ca/stac-app:main

Not using an exact version for reproducibility?

@mishaschwartz mishaschwartz Aug 10, 2023

I'm guessing this first? crim-ca/stac-app#1

environment:
  - POSTGRES_USER=${STAC_POSTGRES_USER}
  - POSTGRES_PASS=${STAC_POSTGRES_PASSWORD}
  - POSTGRES_DBNAME=postgis

Just curious, is postgis hardcoded somewhere? Why not call the DB simply stac?


stac-browser:
  container_name: stac-browser
  image: ghcr.io/crim-ca/stac-browser:docker_image_push

Versioned image for reproducibility?

  - PGDATABASE=postgis
volumes:
  - stac-db:/var/lib/postgresql/data
healthcheck:

Nice to have a healthcheck here. It would be nice for the other 2 containers to have some sort of healthcheck as well.
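
One possible shape for such a check on an HTTP service is sketched below in Python. It is only an illustration of the idea: `is_healthy` is a hypothetical helper, and which URL each container should probe is left open.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, HTTP errors, and malformed URLs
        # all count as unhealthy.
        return False
```

A healthcheck command could call this function and exit with a non-zero status when it returns `False`.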

retries: 5

# extend proxy with endpoint and config for STAC API access
proxy:

This fragment duplicates the existing file birdhouse/components/stac/config/proxy/docker-compose-extra.yml!

@@ -4,4 +4,5 @@
proxy_set_header X-Forwarded-Proto $real_scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Host $host:$server_port;
proxy_set_header Forwarded "proto=https;host=${PAVICS_FQDN}"; # Helps the STAC component to craft URLs containing the full PAVICS_FQDN

Better use PAVICS_FQDN_PUBLIC for anything public facing.

# populates STAC catalog with sample collection items
stac-populator:
  container_name: stac-populator
  image: ghcr.io/crim-ca/stac-populator:master

Versioned image for reproducibility?

  - STAC_ASSET_GENERATOR_TIMEOUT=${STAC_ASSET_GENERATOR_TIMEOUT}
  - STAC_HOST=http://stac:8000/stac # STAC API internally accessed to avoid Twitcher authentication
command: >
  bash -c "./wait-for-it.sh stac:8000 -t 30 && ./populate.sh"
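
The command above waits up to 30 seconds for stac:8000 to accept TCP connections before running the populate script. The Python sketch below shows the same wait-then-run idea; `wait_for_port` is a hypothetical helper and not part of stac-populator.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until `host:port` accepts a TCP connection or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet: give up once the deadline has passed,
            # otherwise pause briefly and retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would run the population step only when `wait_for_port("stac", 8000)` returns `True`, mirroring the `&&` in the shell command.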

Noob question about the stac-populator: does it just populate once and exit, or does it stay in the background, listen for new data, and repopulate?

Another noob question about the stac-populator: how does it know which collection to crawl to populate the stac-db? Is the path https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/datasets/catalog.html hardcoded? This should be configurable.

Or are we crawling directly on disk? But then I do not see any volume mount.

@tlvu The pipeline for populating STAC is being handled in a separate repo (https://github.com/crim-ca/stac-populator). I think this is just for testing.

@@ -0,0 +1,6 @@
version: "3.4"

This entire file should be in birdhouse/optional-components/stac-public-access/config/magpie/docker-compose-extra.yml to follow the new layout I think.

@mishaschwartz Oh, this one is a special case! If this file moves to birdhouse/optional-components/stac-public-access/config/magpie/docker-compose-extra.yml, then there is no docker-compose-extra.yml file at the root of this component, birdhouse/optional-components/stac-public-access/!

Would the "inner" docker-compose fragment file be discovered even if there is no file at the root of the component?

This works... but if we want to follow the pattern we use elsewhere we should really add this to birdhouse/optional-components/stac-public-access/config/magpie/docker-compose-extra.yml so that it will only be set if magpie is enabled as well.

We're almost certainly going to make magpie a required component but it would be nicer to keep the pattern we've already established.

Thanks for finding that @tlvu

Yes agreed. The new directory layout pattern is not only for looking nice and tidy, it's to allow 100% flexible deployment.

I think we need to document this pattern here https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/README.rst and explain the proper reason behind it.
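
To make the layout pattern discussed in this thread concrete, here is a small Python sketch of one way fragment discovery could work. It is not the birdhouse-deploy implementation: `compose_fragments` is a hypothetical function, and it assumes that a fragment under `config/<name>/` applies only when a component directory called `<name>` is also enabled. In this sketch an inner fragment is picked up whether or not the component has a root-level file; whether the real scripts behave the same way is not settled by this thread.

```python
from pathlib import Path
from typing import List

FRAGMENT = "docker-compose-extra.yml"


def compose_fragments(component_dirs: List[Path]) -> List[Path]:
    """List the compose fragments that apply to the enabled components."""
    enabled = {directory.name for directory in component_dirs}
    found: List[Path] = []
    for directory in component_dirs:
        root_fragment = directory / FRAGMENT
        if root_fragment.is_file():
            found.append(root_fragment)
        config_dir = directory / "config"
        if not config_dir.is_dir():
            continue
        for sub in sorted(config_dir.iterdir()):
            inner_fragment = sub / FRAGMENT
            # Include config/<name>/ only when <name> is enabled too.
            if sub.name in enabled and inner_fragment.is_file():
                found.append(inner_fragment)
    return found
```

Under these assumptions, the magpie-specific settings of `stac-public-access` would be applied only on deployments where magpie is enabled.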

mishaschwartz added a commit that referenced this pull request Aug 10, 2023
@mishaschwartz mishaschwartz mentioned this pull request Aug 10, 2023
mishaschwartz added a commit that referenced this pull request Aug 15, 2023
## Overview

This PR includes some changes that were suggested in a review for #297.
But because the PR was already merged they are included here:

- removes extra block to include in docker compose files (no longer needed)
- moves docker compose file in `stac-public-access` component to the correct location
- uses `PAVICS_FQDN_PUBLIC` for public facing URLs in all places

## Changes

**Non-breaking changes**
- code reorganization

**Breaking changes**
None

## Related Issue / Discussion

- Related to #297 

## Additional Information
Labels
component/magpie Related to https://github.com/Ouranosinc/Magpie component/STAC Features or components related to STAC component/twitcher Related to https://github.com/bird-house/twitcher documentation Improvements or additions to documentation project/DACCS Related to DACCS project (https://github.com/orgs/DACCS-Climate)
6 participants