diff --git a/docs/source/conf.py b/docs/source/conf.py index bc5887e..6f5fe0c 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -12,17 +12,20 @@ # import os import sys +from datetime import datetime, timezone sys.path.insert(0, os.path.abspath('../..')) +from axiom import __version__ # -- Project information ----------------------------------------------------- project = 'Axiom' -copyright = '2021, Ben Schroeter' +year = datetime.now(timezone.utc).year +copyright = f'{year}, Ben Schroeter' author = 'Ben Schroeter' # The full version, including alpha/beta/rc tags -release = '0.0.1' +release = __version__ # -- General configuration --------------------------------------------------- diff --git a/docs/source/drs.rst b/docs/source/drs.rst deleted file mode 100644 index b2fc197..0000000 --- a/docs/source/drs.rst +++ /dev/null @@ -1,77 +0,0 @@ -DRS -=== - -Axiom has inbuilt functionality to convert CCAM outputs into Data Reference Syntax (DRS). - -Command-line interface ----------------------- - -.. code-block:: shell - - axiom drs -h - usage: axiom drs [ARGS] input_files [input_files ...] output_directory - - DRS utility - - positional arguments: - input_files Input filepaths - output_directory Output base directory (DRS structure built from here) - - optional arguments: - -h, --help show this help message and exit - -o, --overwrite Overwrite existing output - -s START_YEAR, --start_year START_YEAR - Start year - -e END_YEAR, --end_year END_YEAR - End year - -r INPUT_RESOLUTION, --input_resolution INPUT_RESOLUTION - Input resolution in km, leave blank to auto-detect - from path. - -f [output_frequency ...], --output_frequency [output_frequency ...] - Output frequency, Examples include "12min", "1M" (1 - month) etc. see https://pandas.pydata.org/pandas- - docs/stable/user_guide/timeseries.html#offset-aliases.
- -p {DELWP,WINE,ACS,_default,_default_12min}, --project {DELWP,WINE,ACS,_default,_default_12min} - -m {ERA,ERA-NUDGED,ERA5,ACCESS1-0,CCSM4,CNRM-CM5,GFDL-ESM2M,HadGEM2,MIROC5,MPI-ESM-LR,NorESM1-M}, --model {ERA,ERA-NUDGED,ERA5,ACCESS1-0,CCSM4,CNRM-CM5,GFDL-ESM2M,HadGEM2,MIROC5,MPI-ESM-LR,NorESM1-M} - -d [domain ...], --domain [domain ...] - Domains to process, space-separated. - -v [variable ...], --variable [variable ...] - Variables to process, omit to use those defined in - config. - --cordex Process for CORDEX - - -Python API ----------- - -The DRS functionality can be accessed via the Python API. For example: - -.. code-block:: python - - import axiom.drs as drs - import glob - - # Get a list of input files. - input_files = sorted(glob.glob('/path/to/input/files/*.nc')) - - # Call the command - drs.main( - input_files, - output_directory='/path/to/build/drs', # The full DRS structure will be built from here. - start_year=2019, end_year=2019, # A single year - output_frequency='1M', # Monthly frequency - project='DELWP', - model='ACCESS1-0', - variable='tasmax', - domains=['AUS-50'], - cordex=True, - input_resolution=None, # Auto-detect from input files. - overwrite=True # Do not skip existing outputs, overwrite them. - ) - -Error Tolerance ---------------- - -Axiom has built-in error tolerance and recoverability in the form of sandboxed processing of each variable. This ensures that exceptions do not cause the system to crash, allowing subsequent variables to continue processing if a prior variable should fail to process for whatever reason. - -A list of regular expressions is maintained in the drs.json configuration file (recoverable_errors) which allow the system to query stacktraces and cope with any transient errors due to HPC configuration and where a simple re-running of the task will fix the problem without human intervention. 
\ No newline at end of file diff --git a/docs/source/drs/configuration copy.rst b/docs/source/drs/configuration copy.rst deleted file mode 100644 index c9de97c..0000000 --- a/docs/source/drs/configuration copy.rst +++ /dev/null @@ -1,161 +0,0 @@ -Configuration -============= - -The Axiom DRS subsystem is based on a principal of cascading user configuration, whereby default configuration files are loaded from the Axiom installation's data directory, then OVERRIDDEN with a user-defined configuration file in the user's $HOME/.axiom folder. - -- drs.json -- models.json -- projects.json -- domains.json - -drs.json --------- - -The drs.json file contains all of the configuration options required to drive the DRS subsystem, it also defines some metadata defaults that are applied when building the metadata interpolation context used to create filepaths and metadata keys. - -As there are a lot of keys and there is still rapid development underway it is not possible to describe every setting available in the drs.json file. Instead, users are encouraged to look at (or simply use) the drs.json file included in the axiom/data directory of the repository. - -.. list-table:: - :widths: 10 10 40 40 - :header-rows: 1 - - * - Key - - Type - - Description - - Example - * - time_units - - string - - Reference time units applied to outputs. - - "days since 1949-12-01 00:00:00" - * - reference_time - - string - - Reference time applied to outputs. - - "1949-12-01 00:00:00" - * - dask - - dictionary - - Settings to control the connection to dask. - - See below. - * - dask['enable'] - - boolean - - Enable the dask interface - - true - * - dask['restart_client_between_variables'] - - boolean - - Restart the client between variables. - - true - -models.json ------------ - -The models.json file defines preliminary metadata that is used to build the interpolation context object used in filepaths and metadata. 
Top-level keys reflect the "model" argument that is passed to the DRS subsystem (either through the CLI or via the Python API) and point to a dictionary of otherwise arbitrary keys (NB: Arbitrary in the sense that nothing is inherently required unless explicitly used or interpolated elsewhere). - -.. code-block:: json - - { - "NCC-NorESM2-MM": { - "model_lower": "noresm2-mm", - "model_short": "norsesm2", - "gcm_model": "NorESM2-MM", - "gcm_institute": "NCC", - "run_type": "Climate change", - "mode": "bias- and variance-corrected sea surface temperatures", - "description": "%(run_type)s run using %(gcm_institute)s-%(gcm_model)s %(experiment)s %(ensemble)s %(mode)s" - } - -See the axiom/data directory for a sample models.json file. - -projects.json -------------- - -The projects.json file defines preliminary metadata that is used to build the interpolation context object used in filepaths and metadata. Top-level keys reflect the "project" argument that is passed to the DRS subsystem (either through the CLI or via the Python API) and point to a dictionary of otherwise arbitrary keys (NB: Arbitrary in the sense that nothing is inherently required unless explicitly used or interpolated elsewhere). - -.. 
code-block:: json - - { - "CORDEX-CMIP6": { - "base": "surf.ccam_%(res_km)skm", - "project_lower": "acs", - "rcp": "TBA", - "experiment": "", - "project_long": "2021 Climate and Resiliences Service Australia", - "variables_2d": [ - "pr", - "ps", - "ts", - "clh", - "cll", - "clm", - "clt", - "prc", - "prw", - "psl", - "sic", - "snc", - "snd", - "snm", - "snw", - "tas", - "uas", - "vas", - "hfls", - "hfss", - "hurs", - "huss", - "mrro", - "mrso", - "orog", - "prsn", - "rlds", - "rlut", - "rsds", - "rsdt", - "rsus", - "rsut", - "sund", - "tauu", - "tauv", - "zmla", - "clivi", - "clwvi", - "mrfso", - "mrros", - "sftlf", - "ta200", - "ta500", - "ta850", - "ua200", - "ua500", - "ua850", - "va200", - "va500", - "va850", - "zg200", - "zg500", - "hus850", - "prhmax", - "tasmax", - "tasmin", - "evspsbl", - "sfcWind", - "evspsblpot", - "sfcWindmax" - ], - "variables_3d": {}, - "variables_fixed": [ - "orog", - "sftlf", - "sftlaf", - "srfurf", - "sfturf" - ] - } - } - -See the axiom/data directory for a sample projects.json file. - -domains.json ------------- - -The domains.json file specifies keyed domain directives that are accessed through the CLI or Python API. - -See the axiom/data directory for details. 
\ No newline at end of file diff --git a/docs/source/custom_processors.rst b/docs/source/drs/custom_processors.rst similarity index 100% rename from docs/source/custom_processors.rst rename to docs/source/drs/custom_processors.rst diff --git a/docs/source/performance.rst b/docs/source/drs/performance.rst similarity index 100% rename from docs/source/performance.rst rename to docs/source/drs/performance.rst diff --git a/docs/source/drs_configuration.rst b/docs/source/drs_configuration.rst deleted file mode 100644 index 1d57f71..0000000 --- a/docs/source/drs_configuration.rst +++ /dev/null @@ -1,161 +0,0 @@ -DRS Configuration -================= - -The Axiom DRS subsystem is based on a principal of cascading user configuration, whereby default configuration files are loaded from the Axiom installation's data directory, then OVERRIDDEN with a user-defined configuration file in the user's $HOME/.axiom folder. - -- drs.json -- models.json -- projects.json -- domains.json - -drs.json --------- - -The drs.json file contains all of the configuration options required to drive the DRS subsystem, it also defines some metadata defaults that are applied when building the metadata interpolation context used to create filepaths and metadata keys. - -As there are a lot of keys and there is still rapid development underway it is not possible to describe every setting available in the drs.json file. Instead, users are encouraged to look at (or simply use) the drs.json file included in the axiom/data directory of the repository. - -.. list-table:: - :widths: 10 10 40 40 - :header-rows: 1 - - * - Key - - Type - - Description - - Example - * - time_units - - string - - Reference time units applied to outputs. - - "days since 1949-12-01 00:00:00" - * - reference_time - - string - - Reference time applied to outputs. - - "1949-12-01 00:00:00" - * - dask - - dictionary - - Settings to control the connection to dask. - - See below. 
- * - dask['enable'] - - boolean - - Enable the dask interface - - true - * - dask['restart_client_between_variables'] - - boolean - - Restart the client between variables. - - true - -models.json ------------ - -The models.json file defines preliminary metadata that is used to build the interpolation context object used in filepaths and metadata. Top-level keys reflect the "model" argument that is passed to the DRS subsystem (either through the CLI or via the Python API) and point to a dictionary of otherwise arbitrary keys (NB: Arbitrary in the sense that nothing is inherently required unless explicitly used or interpolated elsewhere). - -.. code-block:: json - - { - "NCC-NorESM2-MM": { - "model_lower": "noresm2-mm", - "model_short": "norsesm2", - "gcm_model": "NorESM2-MM", - "gcm_institute": "NCC", - "run_type": "Climate change", - "mode": "bias- and variance-corrected sea surface temperatures", - "description": "%(run_type)s run using %(gcm_institute)s-%(gcm_model)s %(experiment)s %(ensemble)s %(mode)s" - } - -See the axiom/data directory for a sample models.json file. - -projects.json -------------- - -The projects.json file defines preliminary metadata that is used to build the interpolation context object used in filepaths and metadata. Top-level keys reflect the "project" argument that is passed to the DRS subsystem (either through the CLI or via the Python API) and point to a dictionary of otherwise arbitrary keys (NB: Arbitrary in the sense that nothing is inherently required unless explicitly used or interpolated elsewhere). - -.. 
code-block:: json - - { - "CORDEX-CMIP6": { - "base": "surf.ccam_%(res_km)skm", - "project_lower": "acs", - "rcp": "TBA", - "experiment": "", - "project_long": "2021 Climate and Resiliences Service Australia", - "variables_2d": [ - "pr", - "ps", - "ts", - "clh", - "cll", - "clm", - "clt", - "prc", - "prw", - "psl", - "sic", - "snc", - "snd", - "snm", - "snw", - "tas", - "uas", - "vas", - "hfls", - "hfss", - "hurs", - "huss", - "mrro", - "mrso", - "orog", - "prsn", - "rlds", - "rlut", - "rsds", - "rsdt", - "rsus", - "rsut", - "sund", - "tauu", - "tauv", - "zmla", - "clivi", - "clwvi", - "mrfso", - "mrros", - "sftlf", - "ta200", - "ta500", - "ta850", - "ua200", - "ua500", - "ua850", - "va200", - "va500", - "va850", - "zg200", - "zg500", - "hus850", - "prhmax", - "tasmax", - "tasmin", - "evspsbl", - "sfcWind", - "evspsblpot", - "sfcWindmax" - ], - "variables_3d": {}, - "variables_fixed": [ - "orog", - "sftlf", - "sftlaf", - "srfurf", - "sfturf" - ] - } - } - -See the axiom/data directory for a sample projects.json file. - -domains.json ------------- - -The domains.json file specifies keyed domain directives that are accessed through the CLI or Python API. - -See the axiom/data directory for details. \ No newline at end of file diff --git a/docs/source/index.rst b/docs/source/index.rst index 8ccd888..77a0720 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -9,13 +9,14 @@ Axiom is a Python library and command-line utility for verifying metadata agains about installation + cli .. toctree:: :maxdepth: 2 :caption: Metadata metadata - schemas + schemas/schemas conversion validation reporting @@ -24,7 +25,13 @@ Axiom is a Python library and command-line utility for verifying metadata agains :maxdepth: 2 :caption: Data Reference Syntax (DRS) - payloads + drs/index + drs/configuration + drs/payloads + drs/fault-tolerance + drs/typical-workflow + drs/custom_processors + drs/performance .. 
toctree:: :maxdepth: 2 diff --git a/docs/source/installation.rst b/docs/source/installation.rst index da29046..b249e42 100644 --- a/docs/source/installation.rst +++ b/docs/source/installation.rst @@ -3,9 +3,7 @@ Installation Axiom is installed via pip and should be installed inside a virtual environment such as conda. -As the project matures, these steps will become more automated and familiar. - -Create environment ------------------- +Install Axiom from scratch +-------------------------- .. code-block:: shell @@ -14,34 +12,27 @@ Create environment - conda create -n axiom_dev + conda create -n axiom_dev pip conda activate axiom_dev + # Install Axiom from pip + pip install acs-axiom + -Install Axiom -------------- +Install Axiom on NCI +-------------------- .. code-block:: shell - # Clone the repository - git clone git@github.com:AusClimateService/axiom.git - - # Navigate to the local copy - cd axiom - - # Install - pip install . - - # Move back up - cd .. + # Select the hh5 modules + module use /g/data/hh5/public/modules + # Load the module + module load conda/analysis3 -Install Axiom Schemas --------------------- - -Most of the utilities dependencies will be installed automatically, with the exception of the Axiom Schemas component, which must be installed separately. - -.. code-block:: shell + # Create a virtual environment and activate it + conda create -n axiom_dev pip + conda activate axiom_dev - # Clone Axiom Schemas - git clone git@github.com:AusClimateService/axiom-schemas.git + # Install Axiom + pip install acs-axiom - cd axiom_schemas - pip install -e .
\ No newline at end of file + # Alternatively, install to user space + pip install --user acs-axiom \ No newline at end of file diff --git a/docs/source/payloads.rst b/docs/source/payloads.rst deleted file mode 100644 index 39ec140..0000000 --- a/docs/source/payloads.rst +++ /dev/null @@ -1,59 +0,0 @@ -Payloads -======== - -The payload consumption system of Axiom was developed in response to the overwhelming number of command-line arguments required to correctly configure a DRS processing instance. Rather than supplying a myriad of arguments, a user can supply a payload (JSON) file with an expected structure to initiate a DRS processing task. This approach has the added benefit of providing a mechanism for a decoupled workflow whereby a model simulation could write a payload file in a known location to be picked up periodically by another process, essentially acting as a simple message queue. - - -Anatomy of a Payload --------------------- - -.. list-table:: - :widths: 25 25 50 - :header-rows: 1 - - * - Key - - Type - - Description - * - input_files - - REQUIRED - - A globbable path to input files for processing. - * - output_directory - - REQUIRED - - Destination path from which DRS structure will be built. - * - start_year - - REQUIRED - - First year to process. - * - end_year - - REQUIRED - - Last year to process (set to same value as start_year to process 1 year). - * - output_frequency - - REQUIRED - - Desired output frequency of output data, following the syntax of https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases - * - project - - REQUIRED - - Project key to read from projects.json for metadata. - * - model - - REQUIRED - - Model key to read from models.json for metadata. - * - domain - - REQUIRED - - Domain key to read from domains.json, or a domain directive. - * - variables - - REQUIRED - - A list of variable names to process. 
An empty list will attempt to process all variables described by the schema references in drs.json. - * - input_resolution - - OPTIONAL - - The input resolution in km of the input data. Leaving this blank will attempt to auto-detect the resolution from the input paths. It is best to provide the input resolution if known. - -Any additional key/value pairs will be added to the processing context, which at the very least will be added to the output metadata, but may otherwise affect certain processing logic (usually in the case of custom pre/postprocessors) or interpolation templates described by drs.json. - - -Generating Payloads -------------------- - -Payloads can be generated in a number of ways. - -1. Hand-writing JSON files matching the correct format. -2. Using the ``Payload`` class located in ``axiom.drs.payloads``. -3. Programmatically using the ``generate_payloads`` method in ``axiom.drs.payloads``. -4. Using the command-line utility ``axiom drs_gen_payloads``. \ No newline at end of file diff --git a/docs/source/schemas.rst b/docs/source/schemas.rst deleted file mode 100644 index fa3e81e..0000000 --- a/docs/source/schemas.rst +++ /dev/null @@ -1,109 +0,0 @@ -.. _schemas: -Metadata Schemas -================ - -A metadata schema (also referred to as a "specification") is a configuration file which defines the rules and standards that metadata must conform to in order to "meet the standard" and pass validation checks. They too are written as a JSON configuration file following a strict format and using validation rules defined by the Cerberus validation library, a subsystem used by Axiom. - -https://docs.python-cerberus.org/en/stable/index.html - -A metadata schema follows a format as follows: - -.. 
code-block:: json - - { - "name": "Name of the specification", - "version: "0.1.0", - "description": "A description for the specification.", - "contact": "The person who can be contacted regarding the specification.", - "contact_email": "contact@example.com", - "_global": {}, - "variables": {} - } - -The main points of the schema are listed under the "_global" and "variables" keys, which are expanded as Cerberus validation dictionaries, however, the other items are required by Axiom to properly process the schema. Additional entries in the header are allowed but ignored by Axiom. - -Global attributes -~~~~~~~~~~~~~~~~~ - -Global attributes are listed in the "_global" key of the schema JSON file, with all child keys evaluated using the Cerberus validation subsystem. The format is a key-value pair of attribute names (key) and attribute rules (value), for example: - -.. code-block:: json - - { - "_global": { - "author": {"type": "string"}, - "description": {"type": "string"}, - "date_created": {"type": "datetime"} - } - } - -Validation rules can be found at https://docs.python-cerberus.org/en/stable/validation-rules.html. Note: all global metadata keys are required by default, so there is no need to add ``"required": true`` to the validation rules for a given metadata key. If an existing key is not required or can take multiple forms, consider either omitting it from the specification, or updating the specification to enforce a standard. The latter is preferable. - -Variable attributes -~~~~~~~~~~~~~~~~~~~ - -Variable attributes are defined in much the same way as global attributes, with the exception of being nested under their variable name. - -For example: - -.. code-block:: json - - { - "variables": { - "t2": { - "units": {"type": "string", "allowed": ["K", "C"]}, - "description": {"type": "string"} - } - } - } - -Again, all attributes defined in a variable's schema are required by default. 
- -Default variable attribute --------------------------- - -There is one special configuration option for variable schemas, the ``_default`` configuration option. If a schema provides a ``_default`` configuration option, it sets a base set of validation rules which all other variables inherit or override. - -For example: - -.. code-block:: json - - { - "variables": { - "_default": { - "units" : {"type": "string"}, - "description" : {"type": "string"}, - "standard_name": {"type": "string"}, - "long_name": {"type": "string"} - } - } - } - - -This example will enforce all variables to require units, description, standard_name and long_name as metadata attributes, unless they provide their own set of rules. - -Putting it all together -~~~~~~~~~~~~~~~~~~~~~~~ - -Using the above examples, the complete metadata schema could take the following form: - -.. code-block:: json - - { - "name": "My specification", - "version: "0.1.0", - "description": "A simple specification.", - "contact": "John Smith", - "contact_email": "john.smith@example.com", - "_global": { - "author": {"type": "string"}, - "description": {"type": "string"}, - "date_created": {"type": "datetime"} - }, - "variables": { - "t2": { - "units": {"type": "string", "allowed": ["K", "C"]}, - "description": {"type": "string"} - } - } - } \ No newline at end of file
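The updated `conf.py` sets `release` by importing `axiom`, which only works if the package and all of its import-time dependencies can be loaded when Sphinx runs. A minimal sketch of an alternative, assuming the package is installed under the distribution name `acs-axiom` (the name used in the `pip install` commands in `installation.rst`), reads the version from installed package metadata with the standard-library `importlib.metadata`, so the package is never imported. `get_release` and the `0.0.0+unknown` fallback are hypothetical names chosen for illustration, not part of Axiom.

```python
from datetime import datetime, timezone
from importlib.metadata import PackageNotFoundError, version


def get_release(dist_name: str = "acs-axiom", fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of dist_name, or fallback if it is not installed."""
    try:
        # Reads the version recorded in the installed distribution's metadata;
        # the package itself is not imported.
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the docs build environment.
        return fallback


# Timezone-aware alternative to datetime.utcnow(), which is deprecated as of Python 3.12.
year = datetime.now(timezone.utc).year
copyright = f"{year}, Ben Schroeter"
author = "Ben Schroeter"
release = get_release()
```

With this approach the documented version matches whatever is installed in the docs build environment; a source tree that has not been installed reports the fallback string instead of failing the build.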