Add a notebook demonstrating how to find and download files from ESGF #2265

Draft
wants to merge 1 commit into
base: main
387 changes: 387 additions & 0 deletions notebooks/search-and-download-files.ipynb
@@ -0,0 +1,387 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "12dfb0f0-64d9-4c95-942d-c1ad1290cae3",
"metadata": {},
"source": [
"# Finding and downloading files"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e26927eb-c121-4660-b2a1-a3548dd8dd97",
"metadata": {},
"outputs": [],
"source": [
"from esmvalcore.esgf.facets import FACETS\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "c453def8-0262-4d46-9c58-389eb6610cdd",
"metadata": {},
"source": [
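"ESMValCore uses one set of facet names for all projects. The table below shows how each ESMValCore facet (rows) maps to the facet name used on ESGF for each project (columns). An empty cell means that `FACETS` defines no mapping for that facet in that project."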
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d32be466-a972-4350-8c71-6ee11849c3cb",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CMIP3</th>\n",
" <th>CMIP5</th>\n",
" <th>CMIP6</th>\n",
" <th>CORDEX</th>\n",
" <th>obs4MIPs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>dataset</th>\n",
" <td>model</td>\n",
" <td>model</td>\n",
" <td>source_id</td>\n",
" <td>rcm_name</td>\n",
" <td>source_id</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ensemble</th>\n",
" <td>ensemble</td>\n",
" <td>ensemble</td>\n",
" <td>member_id</td>\n",
" <td>ensemble</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>exp</th>\n",
" <td>experiment</td>\n",
" <td>experiment</td>\n",
" <td>experiment_id</td>\n",
" <td>experiment</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>frequency</th>\n",
" <td>time_frequency</td>\n",
" <td>time_frequency</td>\n",
" <td></td>\n",
" <td>time_frequency</td>\n",
" <td>time_frequency</td>\n",
" </tr>\n",
" <tr>\n",
" <th>short_name</th>\n",
" <td>variable</td>\n",
" <td>variable</td>\n",
" <td>variable</td>\n",
" <td>variable</td>\n",
" <td>variable</td>\n",
" </tr>\n",
" <tr>\n",
" <th>institute</th>\n",
" <td></td>\n",
" <td>institute</td>\n",
" <td>institution_id</td>\n",
" <td>institute</td>\n",
" <td>institute</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mip</th>\n",
" <td></td>\n",
" <td>cmor_table</td>\n",
" <td>table_id</td>\n",
" <td></td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>product</th>\n",
" <td></td>\n",
" <td>product</td>\n",
" <td></td>\n",
" <td>product</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>activity</th>\n",
" <td></td>\n",
" <td></td>\n",
" <td>activity_drs</td>\n",
" <td></td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>grid</th>\n",
" <td></td>\n",
" <td></td>\n",
" <td>grid_label</td>\n",
" <td></td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>driver</th>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>driving_model</td>\n",
" <td></td>\n",
" </tr>\n",
" <tr>\n",
" <th>domain</th>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>domain</td>\n",
" <td></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CMIP3 CMIP5 CMIP6 CORDEX \\\n",
"dataset model model source_id rcm_name \n",
"ensemble ensemble ensemble member_id ensemble \n",
"exp experiment experiment experiment_id experiment \n",
"frequency time_frequency time_frequency time_frequency \n",
"short_name variable variable variable variable \n",
"institute institute institution_id institute \n",
"mip cmor_table table_id \n",
"product product product \n",
"activity activity_drs \n",
"grid grid_label \n",
"driver driving_model \n",
"domain domain \n",
"\n",
" obs4MIPs \n",
"dataset source_id \n",
"ensemble \n",
"exp \n",
"frequency time_frequency \n",
"short_name variable \n",
"institute institute \n",
"mip \n",
"product \n",
"activity \n",
"grid \n",
"driver \n",
"domain "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame.from_dict(FACETS).fillna('')"
]
},
{
"cell_type": "markdown",
"id": "777bcbe6-1c95-4517-9cb9-0fb02e877d98",
"metadata": {},
"source": [
"## Finding files"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "760df6c9-0d8e-41ab-b570-39b0f43aedd4",
"metadata": {},
"outputs": [],
"source": [
"from esmvalcore.esgf import find_files"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "86b45f45-be53-4e96-a2e5-931cd9a0aab3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"165"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"files = find_files(\n",
" project='CMIP6',\n",
" mip='Amon',\n",
" short_name='tas',\n",
" dataset='AWI-CM-1-1-MR',\n",
" exp='historical',\n",
" ensemble='r1i1p1f1',\n",
")\n",
"len(files)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "aeec9582-b935-4340-a5ba-e970cea6117d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[ESGFFile:CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/tas/gn/v20200720/tas_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_185001-185012.nc on hosts ['esgf-data1.llnl.gov', 'esgf.ceda.ac.uk', 'esgf-data04.diasjp.net', 'esgf3.dkrz.de'],\n",
" ESGFFile:CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/tas/gn/v20200720/tas_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_185101-185112.nc on hosts ['esgf-data1.llnl.gov', 'esgf.ceda.ac.uk', 'esgf-data04.diasjp.net', 'esgf3.dkrz.de']]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"files[:2]"
]
},
{
"cell_type": "markdown",
"id": "173a7b7b-ce9b-49a5-bb50-d235ecfc95bc",
"metadata": {},
"source": [
"## Downloading files"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "34ec6cde-71a1-4390-ae15-73c4f13092e5",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"from esmvalcore.esgf import download"
]
},
{
"cell_type": "markdown",
"id": "4359bf3d-45b7-4f73-9e26-7bbe998c711d",
"metadata": {},
"source": [
"Download the first two files from the list in parallel (`n_jobs=2`) into `~/climate_data`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "215eafe3-43ab-4114-b36c-66ad2ec654e6",
"metadata": {},
"outputs": [],
"source": [
"download_dir = Path.home() / 'climate_data'\n",
"download(files[:2], dest_folder=download_dir, n_jobs=2)"
]
},
{
"cell_type": "markdown",
"id": "7a36710b-3395-4b20-82ba-b17ef93ae205",
"metadata": {},
"source": [
"Check that the download succeeded by looking up the local path of the first file and testing that it exists:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dd6214c2-bd93-4dc9-87df-85c4a1917237",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LocalFile('/home/bandela/climate_data/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/tas/gn/v20200720/tas_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_185001-185012.nc')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"esgf_file = files[0]\n",
"local_file = esgf_file.local_file(dest_folder=download_dir)\n",
"local_file"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ea3e8602-4648-4a1e-8331-74eec002fefd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"local_file.exists()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
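
A note on the facet table in the notebook: the mapping can be read as a plain rename from ESMValCore facet names to ESGF facet names. The sketch below is a minimal illustration that runs without ESMValCore or network access. The `CMIP6_FACETS` dictionary is copied by hand from the CMIP6 column of the table, and `to_esgf_names` is a hypothetical helper written for this example; it is not part of `esmvalcore.esgf` and is not necessarily how `find_files` translates facets internally.

```python
# Hand-copied from the CMIP6 column of the facet table in the notebook.
# Facets with an empty cell (frequency, product, driver, domain) are left out.
CMIP6_FACETS = {
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "short_name": "variable",
    "institute": "institution_id",
    "mip": "table_id",
    "activity": "activity_drs",
    "grid": "grid_label",
}


def to_esgf_names(facets, mapping):
    """Hypothetical helper: rename ESMValCore facets to ESGF facet names.

    Facets that have no entry in ``mapping`` are skipped.
    """
    return {
        mapping[name]: value
        for name, value in facets.items()
        if name in mapping
    }


# The same facets that the notebook passes to find_files (minus `project`).
query = to_esgf_names(
    {
        "mip": "Amon",
        "short_name": "tas",
        "dataset": "AWI-CM-1-1-MR",
        "exp": "historical",
        "ensemble": "r1i1p1f1",
    },
    CMIP6_FACETS,
)
print(query)
```

Passing a facet that has no CMIP6 entry in the table, such as `frequency`, simply drops it from the result in this sketch; whether the real search ignores or rejects such facets is not shown in the notebook.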