ESGF Virtual Aggregation

Remote data access to Virtual Analysis Ready Data (Virtual ARD) for climate datasets of the ESGF.

Run the demo, check this Pangeo Showcase, or see Run your own ESGF Virtual Aggregation below.

Important: the ESGF Virtual Aggregation depends on ESGF data nodes being available. Data nodes are frequently unavailable (roughly half of the time), so expect errors when trying to load datasets. Check the status of ESGF data nodes here.
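Because a load fails whenever the data node behind a file is down, it can help to retry transient errors. The following is a minimal sketch, not part of this repository: `open_with_retry` is a hypothetical helper, and retrying on `OSError` is an assumption about what your client raises for an unreachable OPeNDAP URL; adjust `retry_on` to match.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_retry(
    opener: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: tuple = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `opener` until it succeeds, waiting base_delay * 2**i between tries.

    Hypothetical helper: ESGF data nodes may be temporarily unavailable,
    so failures of the types in `retry_on` are retried with exponential backoff.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for i in range(attempts):
        try:
            return opener()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * 2 ** i)
    raise AssertionError("unreachable")
```

For example, `ds = open_with_retry(lambda: xarray.open_dataset(url))`. Backoff gives a briefly unavailable node time to recover, while a node that stays down still surfaces the original error after the last attempt.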

Rationale

The ESGF is a federated file distribution service for climate data. Remote data access and virtual datasets are possible through OPeNDAP and netCDF-java, which are available by default in all ESGF nodes. However, these capabilities have so far gone unused. The ESGF Virtual Aggregation puts them to work and provides:

Analysis Ready Data (ARD) in the form of virtual datasets, that is, with no data duplication needed.
Remote data access without the need to download files: open a URL and get direct access to an analytical data cube.
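Remote access stays cheap because OPeNDAP lets a client request only a slice of a variable. A minimal sketch of how such a DAP2 request URL is formed, assuming the standard `.ascii`/`.dods` response suffixes and the `[start:stride:stop]` hyperslab syntax (stop is inclusive); the base URL and variable name are placeholders, and `dap2_subset_url` is a hypothetical helper.

```python
from urllib.parse import quote


def dap2_subset_url(base_url, variable, slices, response="ascii"):
    """Build a DAP2 constraint-expression URL for a hyperslab of `variable`.

    `slices` holds one (start, stride, stop) tuple per dimension; DAP2 stop
    indices are inclusive. The constraint is percent-encoded so servers that
    reject raw brackets in query strings still accept it.
    """
    if response not in ("ascii", "dods"):
        raise ValueError(f"unsupported DAP2 response type: {response}")
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    constraint = quote(f"{variable}{hyperslab}", safe="")
    return f"{base_url}.{response}?{constraint}"
```

For a variable with dimensions (time, lat, lon), the slices `[(0, 1, 9), (0, 1, 0), (0, 1, 0)]` request the first 10 time steps at a single grid point. Clients such as xarray build requests like this for you; the sketch only makes the mechanism visible.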

Run your own ESGF Virtual Aggregation

Generating the ESGF Virtual Aggregation involves two steps:

Query the ESGF federation for metadata and store it in a local SQL database.
Generate virtual aggregations (NcMLs) from the SQL database.
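To illustrate the two-step design, here is a minimal sketch of the kind of local SQLite store that step 1 could produce and step 2 could read. The table name, columns, helper names, and sample values are hypothetical and do not reflect the schema used by search.py; the point is that once file-level metadata is local, grouping files into aggregations is a plain SQL query.

```python
import sqlite3


def create_store(path=":memory:"):
    """Create a hypothetical file-level metadata table."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS files (
               dataset_id TEXT NOT NULL,    -- e.g. a CMIP6 dataset identifier
               variant_label TEXT NOT NULL,
               opendap_url TEXT NOT NULL,
               time_start TEXT NOT NULL     -- ISO date of the first time step
           )"""
    )
    return con


def files_by_dataset(con):
    """Return {dataset_id: [opendap_url, ...]} with URLs sorted by time_start."""
    groups = {}
    rows = con.execute(
        "SELECT dataset_id, opendap_url FROM files ORDER BY dataset_id, time_start"
    )
    for dataset_id, url in rows:
        groups.setdefault(dataset_id, []).append(url)
    return groups
```

In this sketch, step 2 only reads the local database, so it can be rerun without querying the federation again.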

ESGF Virtual Aggregation is fully customizable via selection files. See the sample file selection-sample.

The following command generates the metadata SQL database from the selection-sample file.

python search.py -d sample.db -s selection-sample
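For context, ESGF index nodes expose a search API (`/esg-search/search`) that returns file-level metadata, including OPeNDAP endpoints. The sketch below only builds such a query URL and picks the OPeNDAP endpoint out of one returned record; the index node host and facet values are examples, the helper names are hypothetical, and this is not necessarily how search.py issues its requests.

```python
from urllib.parse import urlencode


def esgf_file_query(index_url, facets, limit=100, offset=0):
    """Build an ESGF search API URL returning File records as Solr JSON.

    `index_url` is an index node search endpoint such as
    "https://<index-node>/esg-search/search"; `facets` maps facet names
    (project, experiment_id, variable_id, ...) to a value or a list of values.
    """
    params = {
        "type": "File",
        "format": "application/solr+json",
        "latest": "true",
        "limit": limit,
        "offset": offset,
    }
    params.update(facets)
    # doseq=True repeats a facet once per value when a list is given
    return f"{index_url}?{urlencode(params, doseq=True)}"


def file_opendap_url(doc):
    """Return the OPeNDAP endpoint of a File record, or None if absent.

    Each entry in doc["url"] is assumed to be "URL|mime-type|service".
    """
    for entry in doc.get("url", []):
        parts = entry.split("|")
        url, service = parts[0], parts[-1]
        if service == "OPENDAP":
            # The advertised URL points at the HTML form; drop ".html".
            return url[: -len(".html")] if url.endswith(".html") else url
    return None
```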

Now, generate the virtual aggregations (both esgf_dataset and esgf_ensemble) from the database using 4 parallel jobs.

python ncmls.py -j4 --database sample.db -p esgf_ensemble
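NcML files are small XML documents. The sketch below uses the NcML 2.2 namespace and its standard aggregation types to show what an esgf_dataset-like aggregation (files concatenated along `time` with `joinExisting`) and an esgf_ensemble-like aggregation (members stacked along a new `variant_label` dimension with `joinNew`) could look like. It is illustrative only: the helper names are hypothetical, the file URLs are placeholders, and the NcMLs produced by ncmls.py may differ and carry more metadata.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
# Serialize NcML elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", NCML_NS)


def _q(tag):
    """Qualify a tag name with the NcML namespace."""
    return f"{{{NCML_NS}}}{tag}"


def join_existing(parent, urls, dim="time"):
    """Append a joinExisting aggregation of `urls` along `dim` to `parent`."""
    agg = ET.SubElement(parent, _q("aggregation"), type="joinExisting", dimName=dim)
    for url in urls:
        ET.SubElement(agg, _q("netcdf"), location=url)
    return agg


def dataset_ncml(urls):
    """One dataset: time-ordered files concatenated along time."""
    root = ET.Element(_q("netcdf"))
    join_existing(root, urls)
    return ET.tostring(root, encoding="unicode")


def ensemble_ncml(members, variable, dim="variant_label"):
    """Several members stacked along a new dimension.

    `members` maps a variant label to that member's time-ordered file URLs.
    """
    root = ET.Element(_q("netcdf"))
    agg = ET.SubElement(root, _q("aggregation"), type="joinNew", dimName=dim)
    ET.SubElement(agg, _q("variableAgg"), name=variable)
    for label, urls in members.items():
        member = ET.SubElement(agg, _q("netcdf"), coordValue=label)
        join_existing(member, urls)  # each member is itself a time aggregation
    return ET.tostring(root, encoding="unicode")
```

With `joinNew`, the aggregated variable gains a leading `variant_label` dimension, similar to the coordinate inspected in the xarray example further down. Since each NcML is independent, generating many of them parallelizes trivially, which is presumably what the `-j4` option exploits.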

The virtual aggregations are NcML files. To read them you need a client based on netCDF-java, or you can set up a THREDDS Data Server (TDS) and read them via OPeNDAP (see the next section).

Run your own server

A THREDDS Data Server (TDS) with access to the ESGF Virtual Aggregation datasets is available at https://hub.ipcc.ifca.es/thredds.

You may deploy your own THREDDS Data Server and perform remote data analysis on the ESGF Virtual Aggregation dataset.

docker run -p 8080:8080 -v $(pwd)/content:/usr/local/tomcat/content/thredds unidata/thredds-docker:5.0-beta7

Now visit localhost:8080/thredds and browse the server's catalog. You can download the NcML from the HTTPServer endpoint, or use the OPeNDAP service to get the OPeNDAP URL (it should look like http://localhost:8080/thredds/dodsC/...).
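The same listing is available as XML by replacing catalog.html with catalog.xml in a catalog URL. The sketch below extracts OPeNDAP URLs from such a document; it assumes the default TDS service base `dodsC` and the THREDDS InvCatalog 1.0 namespace, it does not follow `catalogRef` links into subdirectories, and `opendap_urls` is a hypothetical helper. Which catalog path you fetch depends on your content layout.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def opendap_urls(catalog_xml, base="http://localhost:8080/thredds"):
    """List OPeNDAP URLs for every dataset that has a urlPath in a TDS catalog."""
    root = ET.fromstring(catalog_xml)
    return [
        f"{base}/dodsC/{ds.get('urlPath')}"
        for ds in root.iter(f"{{{CATALOG_NS}}}dataset")  # nested datasets too
        if ds.get("urlPath")  # container datasets have no urlPath
    ]
```

The XML itself could be fetched with `urllib.request.urlopen(...).read()` from a URL of the form http://localhost:8080/thredds/catalog/.../catalog.xml.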

The OPeNDAP URL can then be used for remote data analysis with xarray.

import dask
import xarray

# run dask computations with the multiprocessing scheduler
dask.config.set(scheduler="processes")

# OPeNDAP URL of a virtual aggregation served by the local TDS
url = "http://localhost:8080/thredds/dodsC/esgeva/demo/CMIP6_CMIP_AS-RCEC_TaiESM1_historical_day_tas_gn_v20200626_esgf.ceda.ac.uk.ncml"
# open lazily (metadata and index coordinates only) and chunk along time
ds = xarray.open_dataset(url).chunk({"time": 100})

# query the size of the dataset on the server side
print(ds.attrs["size_human"])

# view the variant_label coordinate
print(ds["variant_label"][...].compute())

# compute spatial mean for all variant_labels
# this involves transferring the necessary data from the server
means = ds["tas"].mean(["lat", "lon"]).compute()
print(means)
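Because the spatial mean reads every value of `tas`, the chunking chosen with `.chunk({"time": 100})` decides how the transfer is split. The sketch below does that arithmetic for a hypothetical shape; `chunk_plan` is a hypothetical helper, and it assumes that dimensions not listed in the chunk mapping stay whole and that each chunk is fetched by roughly one remote read.

```python
import math


def chunk_plan(shape, itemsize, chunks):
    """Return (number_of_chunks, bytes_per_full_chunk) for a dask-style layout.

    `shape` and `chunks` map dimension names to sizes; dimensions absent from
    `chunks` are assumed to be kept whole.
    """
    n_chunks = 1
    chunk_bytes = itemsize
    for dim, size in shape.items():
        step = max(1, min(chunks.get(dim, size), size))
        n_chunks *= math.ceil(size / step)
        chunk_bytes *= step
    return n_chunks, chunk_bytes
```

For instance, a hypothetical float32 `tas` with (variant_label, time, lat, lon) sizes 3, 250, 10, 20 gives `chunk_plan(shape, 4, {"time": 100}) == (3, 240000)`: three chunks of about 240 kB each, the last one only half full.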

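See the notebooks (demo.ipynb, model_evaluation.ipynb, performance.ipynb, results.ipynb, stats.ipynb, validation.ipynb) for usage and reproducibility.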


zequihg50/esgf-virtual-aggregation

Folders and files

Latest commit

History

Repository files navigation

ESGF Virtual Aggregation

Rationale

Run your own ESGF Virtual Aggregation

Run your own server

About

Resources

Stars

Watchers

Forks

Releases 3

Packages 0

Languages

Packages