Add docstrings and docs pages for RequiredDataValidator and `DataValidator` (#470)

* Add docstrings and docs pages for `RequiredDataValidator` and `DataValidator`

* Update docs/user_guide/data-validation.rst

Co-authored-by: Daniel Huppmann <[email protected]>

* Update docs/user_guide/data-validation.rst

Co-authored-by: Daniel Huppmann <[email protected]>

* Update docs/user_guide/data-validation.rst

Co-authored-by: Daniel Huppmann <[email protected]>

* Update docs/user_guide/data-validation.rst

Co-authored-by: Daniel Huppmann <[email protected]>

* Create subpages and ToC

* Change title to Validation

* Update docs/user_guide/validation/data-validation.rst

Co-authored-by: Daniel Huppmann <[email protected]>

* Update docs/user_guide/validation/required-data-validation.rst

Co-authored-by: Daniel Huppmann <[email protected]>

---------

Co-authored-by: David Almeida <[email protected]>
Co-authored-by: Daniel Huppmann <[email protected]>
3 people authored Feb 21, 2025
1 parent e3541cc commit 4e5d9a9
Showing 7 changed files with 160 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/user_guide.rst
@@ -49,3 +49,4 @@ the RegionProcessor and validated using DataStructureDefinition.
   user_guide/model-registration
   user_guide/config
   user_guide/local-usage
   user_guide/validation
17 changes: 17 additions & 0 deletions docs/user_guide/validation.rst
@@ -0,0 +1,17 @@
.. _validation:

.. currentmodule:: nomenclature

Validation
==========

The **nomenclature** package allows users to validate IAMC data in several ways.

For this, validation requirements and criteria can be specified in YAML configuration
files.

.. toctree::
   :maxdepth: 1

   validation/data-validation
   validation/required-data-validation
64 changes: 64 additions & 0 deletions docs/user_guide/validation/data-validation.rst
@@ -0,0 +1,64 @@
.. _data-validation:

.. currentmodule:: nomenclature

Data validation
===============

**Data validation** checks if timeseries data values are within specified ranges.

Consider the example below:

.. code:: yaml

  - variable: Primary Energy
    year: 2010
    validation:
      - upper_bound: 5
        lower_bound: 1
      - warning_level: low
        upper_bound: 2.5
        lower_bound: 1
  - variable: Primary Energy|Coal
    year: 2010
    value: 5
    rtol: 2
    atol: 1

Each criteria item contains **data filter arguments** and **validation arguments**.

Data filter arguments include: ``model``, ``scenario``, ``region``, ``variable``,
``unit``, and ``year``.
For the first criteria item, the data is filtered for variable *Primary Energy*
and year 2010.

The ``validation`` arguments include: ``upper_bound``/``lower_bound`` *or*
``value``/``rtol``/``atol`` (relative tolerance, absolute tolerance). Only one
of the two can be set for each ``warning_level``.
The possible levels are: ``error``, ``high``, ``medium``, or ``low``.
Multiple warning levels, each with its own criteria, can be set for the same
data filters. They must be listed in descending order of severity; otherwise a
``ValidationError`` is raised.
In the example, for the first criteria item, the validation arguments are set
for warning level ``error`` (by default, in case of omission) and ``low``,
using bounds.
Datapoints flagged at one warning level are skipped for lower-severity levels in
the same criteria item (e.g. datapoints flagged for the ``error`` level are not
checked again for ``low``).
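
The ordering and skipping rules described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: ``check_order`` and ``flag_datapoints`` are hypothetical helpers, plain numbers stand in for IAMC datapoints, and a ``ValueError`` stands in for the ``ValidationError`` mentioned in the text.

```python
# Illustrative sketch only; not the nomenclature implementation.
SEVERITY = ["error", "high", "medium", "low"]  # most to least severe


def check_order(levels):
    """Raise if warning levels are not listed in descending order of severity."""
    ranks = [SEVERITY.index(level) for level in levels]
    if ranks != sorted(ranks):
        # Stand-in for the ValidationError described in the docs
        raise ValueError(f"Warning levels not in descending severity: {levels}")


def flag_datapoints(values, criteria):
    """Return {index: warning_level} for values outside the given bounds.

    `criteria` is a list of dicts with optional `warning_level` (default
    "error"), `upper_bound` and `lower_bound` keys.
    """
    check_order([c.get("warning_level", "error") for c in criteria])
    flagged = {}
    for c in criteria:
        level = c.get("warning_level", "error")
        upper = c.get("upper_bound", float("inf"))
        lower = c.get("lower_bound", float("-inf"))
        for i, value in enumerate(values):
            if i in flagged:
                continue  # already flagged at a more severe level
            if not lower <= value <= upper:
                flagged[i] = level
    return flagged


# Bounds taken from the first criteria item of the YAML example
criteria = [
    {"upper_bound": 5, "lower_bound": 1},
    {"warning_level": "low", "upper_bound": 2.5, "lower_bound": 1},
]
flags = flag_datapoints([0.5, 2.0, 3.0, 6.0], criteria)
print(flags)
```

In this sketch, the values 0.5 and 6.0 fall outside the ``error`` bounds and are not re-checked for ``low``, while 3.0 passes the ``error`` bounds but exceeds the tighter ``low`` upper bound.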

The second criteria item (for variable *Primary Energy|Coal*) uses the old notation.
This notation is deprecated because it is more verbose (each warning level requires
a separate criteria item) and slower to process.
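
To make the ``value``/``rtol``/``atol`` arguments more concrete, the sketch below converts them into an accepted range. It assumes the tolerances combine as ``atol + rtol * |value|``, the convention used by ``numpy.isclose``; the text above does not state how the package combines them, and ``tolerance_bounds`` is a hypothetical helper.

```python
def tolerance_bounds(value, rtol=0.0, atol=0.0):
    """Return (lower, upper) for a target value with relative/absolute tolerance.

    Assumption: tolerances combine as atol + rtol * |value| (numpy.isclose style).
    """
    tolerance = atol + rtol * abs(value)
    return value - tolerance, value + tolerance


# Arguments taken from the second criteria item of the YAML example
lower, upper = tolerance_bounds(5, rtol=2, atol=1)
print(lower, upper)
```

Under that assumption, the tolerance for the second criteria item would be 1 + 2 * 5 = 11 around the target value 5.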

Standard usage
--------------

Run the following in a Python script to check that an IAMC dataset has valid data.

.. code-block:: python

  from nomenclature.processor import DataValidator

  # ...setting directory/file paths and loading dataset
  DataValidator.from_file(data_val_yaml).apply(df)
43 changes: 43 additions & 0 deletions docs/user_guide/validation/required-data-validation.rst
@@ -0,0 +1,43 @@
.. _required-data-validation:

.. currentmodule:: nomenclature

Required data validation
========================

**Required data validation** checks if certain models, variables, regions and/or
periods of time are covered in the timeseries data.

For this, a configuration file specifies the model(s) and dimension(s) expected
in the dataset. These are ``variable``, ``region`` and/or ``year``.
Alternatively, instead of using ``variable``, it is possible to declare measurands,
which jointly specify variables and units.

.. code:: yaml

  description: Required variables for running MAGICC
  model: model_a
  required_data:
    - measurand:
        Emissions|CO2:
          unit: Mt CO2/yr
      region: World
      year: [2020, 2030, 2040, 2050]

In the example above, for *model_a*, the dataset must include datapoints of the
variable *Emissions|CO2* (measured in *Mt CO2/yr*), in the region *World*, for the
years 2020, 2030, 2040 and 2050.
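
The coverage check described above can be illustrated with a minimal pure-Python sketch. It is not how the package implements the check: ``missing_required`` is a hypothetical helper, and plain tuples of (model, variable, unit, region, year) stand in for IAMC datapoints.

```python
from itertools import product


# Illustrative sketch only; plain tuples stand in for IAMC datapoints.
def missing_required(datapoints, model, measurands, regions, years):
    """Return the required (model, variable, unit, region, year) keys not in `datapoints`."""
    present = set(datapoints)
    required = [
        (model, variable, unit, region, year)
        for (variable, unit), region, year in product(
            measurands.items(), regions, years
        )
    ]
    return [key for key in required if key not in present]


# A dataset that covers 2020, 2030 and 2040 but lacks 2050
data = [
    ("model_a", "Emissions|CO2", "Mt CO2/yr", "World", year)
    for year in (2020, 2030, 2040)
]
missing = missing_required(
    data,
    model="model_a",
    measurands={"Emissions|CO2": "Mt CO2/yr"},
    regions=["World"],
    years=[2020, 2030, 2040, 2050],
)
print(missing)
```

Here the only missing combination is the one for the year 2050, which is the kind of gap the required data validation is meant to report.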

Standard usage
--------------

Run the following in a Python script to check that an IAMC dataset includes all
required data.

.. code-block:: python

  from nomenclature import RequiredDataValidator

  # ...setting directory/file paths and loading dataset
  RequiredDataValidator.from_file(req_data_yaml).apply(df)
18 changes: 18 additions & 0 deletions nomenclature/processor/data_validator.py
@@ -149,6 +149,24 @@ def from_file(cls, file: Path | str) -> "DataValidator":
        return cls(file=file, criteria_items=criteria_items)

    def apply(self, df: IamDataFrame) -> IamDataFrame:
        """Validates data in IAMC format according to specified criteria.

        Logs warning/error messages for each criterion that is not met.

        Parameters
        ----------
        df : IamDataFrame
            Data in IAMC format to be validated

        Returns
        -------
        IamDataFrame

        Raises
        ------
        `ValueError` if any criterion has a warning level of `error`
        """

        fail_list = []
        error = False

17 changes: 17 additions & 0 deletions nomenclature/processor/required_data.py
@@ -147,6 +147,8 @@ def _wrong_unit_variables(


class RequiredDataValidator(Processor):
    """Processor for validating required dimensions in IAMC datapoints"""

    description: str | None = None
    model: list[str] | None = None
    required_data: list[RequiredData]
@@ -164,6 +166,21 @@ def from_file(cls, file: Path | str) -> "RequiredDataValidator":
        return cls(file=file, **content)

    def apply(self, df: IamDataFrame) -> IamDataFrame:
        """Validates data in IAMC format according to required models and dimensions.

        Parameters
        ----------
        df : IamDataFrame
            Data in IAMC format to be validated

        Returns
        -------
        IamDataFrame

        Raises
        ------
        `ValueError` if any required dimension is not found in the data
        """
        if self.model is not None:
            models_to_check = [model for model in df.model if model in self.model]
        else:
1 change: 0 additions & 1 deletion tests/data/required_data/required_data/requiredData.yaml
@@ -6,4 +6,3 @@ required_data:
        unit: Mt CO2/yr
    region: World
    year: [2020, 2030, 2040, 2050]
