---
title: "Configuration"
teaching: 0
exercises: 0
questions:
- What is the user configuration file and how should I use it?

objectives:
- Understand the contents of the config-user.yml file
- Prepare a personalized config-user.yml file
- Configure ESMValTool to use some settings

keypoints:
- The ``config-user.yml`` file tells ESMValTool where to find input data.
- "``rootpath`` defines the root directory for the input data."
- "``output_dir`` defines the destination directory."

---

## The configuration file

The ``config-user.yml`` configuration file contains all the global level information
needed by ESMValTool to run. This is an
[YAML file](https://yaml.org/spec/1.2/spec.html). An example configuration file
can be found in the root directory of the ESMValTool repository:
[config-user-example.yml](https://github.com/ESMValGroup/ESMValTool/blob/master/config-user-example.yml).

First, we make a working directory ``esmvaltool_tutorial``.
In a new terminal, run:

~~~bash
mkdir esmvaltool_tutorial
cd esmvaltool_tutorial
~~~

Now, we download the configuration file to our working directory.
To do that, click on
[this link](https://raw.githubusercontent.com/ESMValGroup/ESMValTool/master/config-user-example.yml)
to see a raw version of the file, right-click and press ``save as``,
then you can rename it to ``config-user.yml`` and save it into the working directory
``esmvaltool_tutorial``.

Now, let's change our working directory in a terminal window to ``esmvaltool_tutorial``.
Then, we run a text editor called Nano to have a look inside the configuration file:

~~~bash
nano config-user.yml
~~~
{: .source}

This file contains the information for:

- Rootpath to input data
- Directory structure for the data from different projects
- Number of tasks that can be run in parallel
- Destination directory
- Auxiliary data directory
- Output settings

> ## Text editor side note
>
> No matter what editor you use, you will need to know where it searches
> for and saves files. If you start it from the shell, it will (probably)
> use your current working directory as its default location. We use ``nano``
> in examples here because it is one of the least complex text editors.
> Press <kbd>ctrl</kbd> + <kbd>O</kbd> to save the file,
> and then <kbd>ctrl</kbd> + <kbd>X</kbd> to exit ``nano``.
{: .callout}

## Rootpath to input data

ESMValTool groups input data into categories, called projects, based on their source.
The projects currently listed in the configuration file are shown below.
For example, CMIP is used for datasets from the Coupled Model Intercomparison Project,
whereas OBS is used for observational datasets.
More information about the projects can be found in the ESMValTool
[documentation](https://docs.esmvaltool.org/en/latest/input.html).
The ``rootpath`` setting specifies the directories where ESMValTool will look for input data.
For each project, you can define either a single path or several paths as a list.

~~~YAML
rootpath:
  CMIP3: [~/cmip3_inputpath1, ~/cmip3_inputpath2]
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
  CMIP6: [~/cmip6_inputpath1, ~/cmip6_inputpath2]
  OBS: ~/obs_inputpath
  OBS6: ~/obs6_inputpath
  obs4mips: ~/obs4mips_inputpath
  ana4mips: ~/ana4mips_inputpath
  native6: ~/native6_inputpath
  RAWOBS: ~/rawobs_inputpath
  default: ~/default_inputpath
~~~

In this lesson, we will work with data from
[CMIP5](https://esgf-node.llnl.gov/projects/cmip5/).
We add the root path of the folder where your data are available to the ``CMIP5`` entry:

~~~YAML
rootpath:
  ...
  CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2, ~/esmvaltool_tutorial/data]
~~~
{: .source}

> ## Setting the correct rootpath
>
> - To get the data (or its correct rootpath), check the instructions in
>   [Setup]({{ page.root }}{% link setup.md %}).
> - For more information about setting the rootpath, see also the ESMValTool
>   [documentation](https://esmvaltool.readthedocs.io/projects/esmvalcore/en/latest/esmvalcore/datafinder.html).
{: .callout}

## Directory structure for the data from different projects

Input data can come from various models, observations and reanalyses that adhere
to the [CF/CMOR standard](https://cmor.llnl.gov/).
The ``drs`` (data reference syntax) setting describes the directory structure of the input data for each project.
Let's use ``default`` for ``CMIP5`` in our example here:

~~~YAML
drs:
  CMIP5: default
~~~

> ## Available drs
>
> The ``drs`` setting describes the file structure for several projects
> (e.g. ``CMIP6``, ``CMIP5``, ``obs4mips``, ``OBS6``, ``OBS``) on several key machines
> (e.g. ``BADC``, ``CP4CDS``, ``DKRZ``, ``ETHZ``, ``SMHI``, ``BSC``).
> For more information about ``drs``, you can visit the ESMValTool
> [documentation](https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/find_data.html#cmor-drs).
{: .callout}
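As an illustration only, suppose you are working on the ``DKRZ`` machine and your CMIP5 and CMIP6 data follow the ``DKRZ`` layout defined in ESMValTool's ``config-developer.yml``. Under that assumption, the ``drs`` section could select that layout for each project like this (a sketch, not a recommended setting for your system):

~~~YAML
drs:
  # Assumption: running on DKRZ, with data stored in the DKRZ directory layout
  CMIP5: DKRZ
  CMIP6: DKRZ
~~~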

## Number of parallel tasks

The ``max_parallel_tasks`` setting enables parallel processing.
You can set the number of tasks that run in parallel to
1, 2, 3, 4, ... or to ``null``, which tells
ESMValTool to use all available CPUs:

~~~YAML
max_parallel_tasks: null
~~~

> ## Set the number of tasks
>
> If you run out of memory, try setting ``max_parallel_tasks`` to 1.
> Then, check how much memory a run needs by inspecting
> the file ``run/resource_usage.txt`` in the output directory.
> Based on that number, you can increase the number of parallel tasks
> again to a value that suits the amount of memory available on your system.
{: .callout}
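For example, a first run on a machine with little memory could use the setting below. The value ``2`` in the commented line is only an illustration; choose whatever your own ``resource_usage.txt`` suggests:

~~~YAML
# Run one task at a time while checking how much memory is needed
max_parallel_tasks: 1
# After inspecting run/resource_usage.txt you might raise it, e.g.:
# max_parallel_tasks: 2
~~~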

## Destination directory

The destination directory is the root path where ESMValTool will store its output,
i.e. figures, data, logs, etc. With every run, ESMValTool automatically creates
a new output folder, named after the recipe followed by the date and time of the run
in the format YYYYMMDD_HHMMSS.
This folder contains four subfolders: ``plots``, ``preproc``, ``run`` and ``work``.

You can choose a destination directory that suits your system.
Let's name our destination directory ``esmvaltool_output`` in the working directory:

~~~YAML
output_dir: ./esmvaltool_output
~~~

> ## Content of subfolders
>
> - ``plots``: all plots, split by individual diagnostics and fields.
> - ``preproc``: all the preprocessed data and the ``metadata.yml``
>   interface files. Note that by default this directory is deleted after
>   each run, because most users only need the results from the diagnostic scripts.
> - ``run``: all log files, a copy of the recipe, a summary of the resource
>   usage (``resource_usage.txt``), the ``settings.yml`` interface files
>   and temporary files created by the diagnostic scripts.
> - ``work``: any diagnostic script results that are not plots,
>   e.g. files in NetCDF format (depending on the diagnostic script).
>
> We explain more about the output in the next
> [lesson]({{ page.root }}{% link _episodes/04-toy-example.md %}).
{: .callout}

## Auxiliary data directory

The ``auxiliary_data_dir`` setting is the path where any required
additional auxiliary data files are stored. This location tells
the diagnostic scripts where to find these files if they cannot be downloaded
at runtime. This option should not be used for model or observational datasets, but
for data files (e.g. shapefiles) used in plotting, such as coastline descriptions.

~~~YAML
auxiliary_data_dir: ~/auxiliary_data
~~~

## Output settings

These settings inform ESMValTool about your preferences for specific actions.
You can turn the boolean settings on or off with ``true`` or ``false`` values.
Most of these settings are fairly self-explanatory:

~~~YAML
# Diagnostics create plots? [true]/false
write_plots: true
# Diagnostics write NetCDF files? [true]/false
write_netcdf: true
# Set the console log level debug, [info], warning, error
log_level: info
# Exit on warning (only for NCL diagnostic scripts)? true/[false]
exit_on_warning: false
# Plot file format? [png]/pdf/ps/eps/epsi
output_file_type: png
# Use netCDF compression true/[false]
compress_netcdf: false
# Save intermediary cubes in the preprocessor true/[false]
save_intermediary_cubes: false
# Remove the preproc dir if all fine
remove_preproc_dir: true
# Path to custom config-developer file, to customise project configurations.
# See config-developer.yml for an example. Set to [null] to use the default
# config_developer_file: null
# Get profiling information for diagnostics
# Only available for Python diagnostics
profile_diagnostic: false
~~~
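Putting the settings from this episode together, a minimal ``config-user.yml`` for this tutorial could look like the sketch below. It assumes the tutorial data are stored in ``~/esmvaltool_tutorial/data`` (as in the ``rootpath`` example above) and keeps the other values shown in this episode; adjust the paths to your own system.

~~~YAML
# Sketch of a complete config-user.yml for this tutorial (adjust paths as needed)
rootpath:
  CMIP5: ~/esmvaltool_tutorial/data   # assumption: where the tutorial data were saved
  default: ~/default_inputpath        # used for projects not listed above
drs:
  CMIP5: default
max_parallel_tasks: null              # use all available CPUs
output_dir: ./esmvaltool_output
auxiliary_data_dir: ~/auxiliary_data
write_plots: true
write_netcdf: true
log_level: info
exit_on_warning: false
output_file_type: png
compress_netcdf: false
save_intermediary_cubes: false
remove_preproc_dir: true
profile_diagnostic: false
~~~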

> ## Make your own configuration file
>
> It is possible to have several configuration files for different purposes,
> for example ``config-user_formalised_runs.yml`` and ``config-user_debugging.yml``.
{: .callout}
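As a sketch of such a variant, a hypothetical ``config-user_debugging.yml`` could start as a copy of ``config-user.yml`` in which only the settings below are changed (the file name is just the example from the callout above):

~~~YAML
# Hypothetical config-user_debugging.yml: copy config-user.yml, then change these settings
log_level: debug                # more detailed console output
max_parallel_tasks: 1           # run one task at a time, so failures are easier to trace
save_intermediary_cubes: true   # also store intermediate preprocessor results
remove_preproc_dir: false       # keep the preproc directory after the run
~~~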
>
> ## Saving preprocessed data
>
> In the configuration file, which settings are useful to make sure preprocessed data
> is stored when ESMValTool is run?
>
>> ## Solution
>>
>> If the option ``save_intermediary_cubes`` is set to ``true`` in
>> the ``config-user.yml`` file, then the intermediary cubes will also be saved
>> in the folder ``preproc``. Also, if the option ``remove_preproc_dir``
>> is set to ``false``, then the ``preproc/`` directory contains all
>> the preprocessed data and the metadata interface files.
> {: .solution}
{: .challenge}

{% include links.md %}
