H2Ox is a team of Oxford University PhD students and researchers who won first prize in the Wave2Web Hackathon, September 2021, organised by the World Resources Institute and sponsored by Microsoft and BlackRock. In the Wave2Web hackathon, teams competed to predict reservoir levels at four reservoirs in the Kaveri basin west of Bengaluru: Kabini, Krishnaraja Sagar, Harangi, and Hemavathy. H2Ox used sequence-to-sequence models with meteorological and forecast forcing data to predict reservoir levels up to 90 days into the future.
The H2Ox dashboard can be found at https://h2ox.org. The data API can be accessed at https://api.h2ox.org. All code and repos can be found at https://github.com/H2Oxford. Our Prototype Submission Slides are here. The H2Ox team is Lucas Kruitwagen, Chris Arderne, Tommy Lees, and Lisa Thalheimer.
This repo is for a dockerised service to ingest ECMWF ERA5-Land data into a Zarr archive. The Zarr data is rechunked in the time domain in blocks of four years. This ensures efficient access to moderately-sized chunks of data, facilitating timeseries research. Two variables are ingested: two-meter temperature (t2m) and total precipitation (tp).
For development, the repo can be pip installed with the `-e` flag and `[pre-commit]` options:

```
git clone https://github.com/H2Oxford/h2ox-chirps.git
cd h2ox-chirps
pip install -e .[pre-commit]
```
For containerised deployment, a docker container can be built from this repo. This repo supports the creation of three different dockerised services:

- an `enqueuer` queues up data requests from the Copernicus Data Store (CDS)
- the `downloader` periodically pings the CDS to determine if the data is ready for download. When it is ready, it downloads the data and stores it in cloud storage.
- the `ingestor` ingests the downloaded data into a zarr archive, rechunking it in the time dimension (see the sketch below).
The apps use three sequential cloud storage buckets to store download-status JSON tokens, and then raw `.nc` files of large chunks of contiguous ECMWF data.
The three apps then behave as follows: the `enqueuer` stores a queue token in `CLOUD_STAGING_QUEUE` and periodically checks the CDS API to see if the data is ready for download. When the data is ready, the `enqueuer` places a queue token in `CLOUD_STAGING_SCHEDULE`, indicating the data is ready for download, and returns a 'success' message. The `downloader` then downloads the data and stores it in the `CLOUD_STAGING_RAW` directory.
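A minimal sketch of this queue-then-poll pattern, using the cdsapi library's asynchronous mode (the request body, dataset name, and filename are illustrative, not the app's actual request):

```python
import cdsapi

# wait_until_complete=False returns a queued request immediately
# instead of blocking until the CDS has prepared the data.
client = cdsapi.Client(wait_until_complete=False)

result = client.retrieve(
    "reanalysis-era5-land",
    {
        "variable": ["2m_temperature", "total_precipitation"],
        "year": "2021",
        "month": "09",
        "day": "01",
        "time": "00:00",
        "format": "netcdf",
    },
)

# On a later invocation, refresh the request state and, if the CDS has
# finished preparing the data, download it to staging.
result.update()
if result.reply["state"] == "completed":
    result.download("era5land_chunk.nc")
```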
This repo also allows the user to specify a `PROVIDER` environment variable, making the docker container flexible to different cloud service ecosystems.
The code at `h2ox/provider` allows utilities specific to different cloud service providers to be imported in a flexible way.
Google Cloud Platform (GCP) is provided as a full implementation, but other cloud service providers could be added.
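A hypothetical sketch of how such a provider switch might be wired (the module layout under `h2ox.provider` is assumed for illustration):

```python
import importlib
import os


def get_provider_utils():
    # Map the PROVIDER env var to a provider-specific utilities module,
    # e.g. PROVIDER=GCP -> h2ox.provider.gcp (module path is illustrative).
    provider = os.environ.get("PROVIDER", "GCP").lower()
    return importlib.import_module(f"h2ox.provider.{provider}")


# All cloud-specific calls go through the returned module, so adding a new
# provider only requires implementing the same interface, e.g. for AWS.
utils = get_provider_utils()
```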
The Copernicus Data Store serves ERA5-Land data via the CDS API library.
To protect the CDS API and to schedule repeated updates, this app schedules requests to the CDS queue.
Users of this app will need to request credentialed access to the CDS API.
Then, to use this app and the CDS API, the user needs to specify the URL and API key in `~/.cdsapirc`.
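The credentials file typically takes this form (the URL and key are issued with your CDS account; this is the classic CDS API v2 format):

```
url: https://cds.climate.copernicus.eu/api/v2
key: <UID>:<API-key>
```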
A slackbot messenger is also implemented to post updates to a slack workspace.
Follow these instructions to set up a slackbot user, and then set the `SLACKBOT_TOKEN` and `SLACKBOT_TARGET` environment variables.
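A minimal sketch of posting such an update, assuming the slack_sdk client (the repo's actual messenger implementation may differ):

```python
import os

from slack_sdk import WebClient

# Post an ingestion update to the target channel using the bot token.
client = WebClient(token=os.environ["SLACKBOT_TOKEN"])
client.chat_postMessage(
    channel=os.environ["SLACKBOT_TARGET"],
    text="era5land ingestion complete for 2021-09",
)
```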
The three different services require environment variables to target the various cloud and ECMWF resources:

```
PROVIDER=<GCP|e.g. AWS>                                        # a string to tell the app which utilities to use in src/h2ox/provider
CLOUD_STAGING_QUEUE=<gs://path/to/queue/tokens/>               # path to the tokens for enqueued data requests
CLOUD_STAGING_SCHEDULE=<gs://path/to/download/staging/tokens/> # path to the tokens for data which has been staged
CLOUD_STAGING_RAW=<gs://path/to/raw/ncdata/files/>             # path to the raw staged .nc files
SLACKBOT_TOKEN=<my-slackbot-token>                             # a token for a slack-bot messenger
SLACKBOT_TARGET=<my-slackbot-target>                           # target channel to issue ingestion updates
CDSAPI_URL=<url-included-in-cds-credentials>                   # the url used to access the CDS api
CDSAPI_KEY=<key-included-in-cds-credentials>                   # the key to access the CDS api
TARGET=<gs://my/era5/zarr/archive>                             # the cloud path for the zarr archive
ZERO_DT=<YYYY-mm-dd>                                           # the initial date offset of the zarr archive
N_WORKERS=<int>                                                # the number of workers the cloud machine should use for data ingestion
```
To build each app, the Docker service needs to be given the `MAIN` build argument, which is the filepath to the app root directory, e.g. for the enqueuer:

```
docker build -t <my-tag> --build-arg MAIN=apps/enqueuer .
```
Cloud Build container registry services can also be targeted at forks of this repository. The Cloud Build service will need to provide the `MAIN` build argument.
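For example, a `cloudbuild.yaml` along these lines would pass the build argument (the image name is illustrative):

```yaml
steps:
  - name: "gcr.io/cloud-builders/docker"
    args:
      - "build"
      - "-t"
      - "gcr.io/$PROJECT_ID/h2ox-enqueuer"
      - "--build-arg"
      - "MAIN=apps/enqueuer"
      - "."
images:
  - "gcr.io/$PROJECT_ID/h2ox-enqueuer"
```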
To run the docker container, the environment variables can be passed as a `.env` file:

```
docker run --env-file=.env -t <my-tag>
```
xarray can be used with a zarr backend to lazily access very large zarr archives.
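For example (a sketch assuming gcsfs is installed for `gs://` paths; the path and coordinates are illustrative):

```python
import xarray as xr

# Open the archive lazily; no data is read until a computation demands it.
ds = xr.open_zarr("gs://my/era5/zarr/archive")

# Select a 2m-temperature timeseries at a point near the Kaveri basin.
t2m = ds["t2m"].sel(latitude=12.5, longitude=76.5, method="nearest")

# Load a four-year window, matching the archive's chunking scheme.
arr = t2m.sel(time=slice("2017-01-01", "2020-12-31")).load()
```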
ERA5Land can be cited as:
Muñoz Sabater, J. (2019): ERA5-Land hourly data from 1981 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). doi: 10.24381/cds.e2161bac
Our Wave2Web submission can be cited as:
Kruitwagen, L., Arderne, C., Lees, T., Thalheimer, L., Kuzma, S., & Basak, S. (2022): Wave2Web: Near-real-time reservoir availability prediction for water security in India. Preprint submitted to EarthArXiv, doi: 10.31223/X5V06F. Available at https://eartharxiv.org/repository/view/3381/