From 68e29a5514307eaa95b02cc33f0c8ebca9668828 Mon Sep 17 00:00:00 2001
From: Charlie Zender
Date: Thu, 11 Apr 2024 05:41:45 -0700
Subject: [PATCH 01/37] Draft PR for lossy compression with @cofinoa

---
 ch01.adoc |  4 +++-
 ch08.adoc | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/ch01.adoc b/ch01.adoc
index e8bda96d..ff906c6d 100644
--- a/ch01.adoc
+++ b/ch01.adoc
@@ -92,6 +92,8 @@ The word "apex" refers to position of this group at the vertex of the tree of gr
 longitude dimension:: A dimension of a netCDF variable that has an associated longitude coordinate variable.
+lossy compression variable:: A variable used as a container for attributes that define a specific lossy compression algorithm. The type of the variable is arbitrary since it contains no data.
+
 multidimensional coordinate variable:: An auxiliary coordinate variable that is multidimensional.
 nearest item:: The item (variable or group) that can be reached via the shortest traversal of the file from the referring group following the rules set forth in the <>.
@@ -223,4 +225,4 @@ The UGRID conventions description is referenced from, rather than rewritten into
 A summary indicating how UGRID relates to other parts of the CF conventions, and which features of UGRID are excluded from CF, can be found in <>.
 To reduce the chance of ambiguities arising from their accidental re-use, all of the UGRID standardized attributes are specified in <> and <>.
-The UGRID conventions have their own conformance document, which should be used in conjunction with the CF conformance document when checking the validity of datasets.
\ No newline at end of file
+The UGRID conventions have their own conformance document, which should be used in conjunction with the CF conformance document when checking the validity of datasets.
diff --git a/ch08.adoc b/ch08.adoc
index 591d3312..1261195f 100644
--- a/ch08.adoc
+++ b/ch08.adoc
@@ -675,3 +675,30 @@ The data creator shall specify the floating-point arithmetic precision used duri
 Using the given computational precision in the interpolation computations is a necessary, but not sufficient, condition for the data user to be able to reconstitute the coordinates to an accuracy comparable to that intended by the data creator.
 For instance, a **`computational_precision**` value of **`"64"**` would specify that, using the same implementation and hardware as the creator of the compressed dataset, sufficient accuracy could not be reached when using a floating-point precision lower than 64-bit floating-point arithmetic in the interpolation computations required to reconstitute the coordinates.
+[[lossy-compression-by-quantization, Section 8.4, "Lossy Compression by Quantization"]]
+=== Lossy Compression by Quantization
+
+Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and is scientifically pointless. Quantization is a technique that can eliminate false precision, usually by setting the least significant bits of IEEE floating-point values to zero. The results are themselves valid IEEE values with quantized precision---no special software or decoding is necessary to read them. Moreover, the quantized bits compress more efficiently than random bits. Thus quantization is often referred to as a form of lossy compression though, strictly speaking, quantization is a pre-conditioner for compression that must actually be performed by a subsequent codec.
+
+These CF conventions constitute a framework to provide quantization properties as metadata that accompanies quantized data. The goals are twofold.
First, to inform the interested user how, and to what degree, the quantized data differ from the original (and irrecoverable) unquantized data. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that the data producer expects from the source model or measurement.
+
+Software can use a variety of algorithms to quantize data and write it in netCDF format. In practice, data purveyors are likely to employ the same quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of quantization. This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file. The lossy compression container variable records the generic properties of the algorithm, while the algorithm parameters applied to specific variables are as attributes of those variables.
+
+[[lossy-compression-variable, Section 8.4.1, "Lossy Compression Varible"]]
+==== Lossy compression variable
+
+A lossy compression variable provides the description of a lossy compression algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. Its purpose is to act as a container for the generic attributes of the algorithm. All lossy compression variables must have at least three attributes: family, algorithm, and implementation.
+
+The family describes the overall type of lossy compression. The only valid value of family currently is quantize. Future versions of CF are anticipated to support other families of lossy algorithms, e.g., rounding, zfp, fpzip. The family attribute quickly conveys the general class of lossy algorithm to the user who may not be familiar with the names of all possible algorithms in each class.
+
+The algorithm attribute that defines a specific algorithm depends on the value of family.
The netCDF library itself has supported three quantization algorithms since 2023: BitGroom, BitRound, and Granular BitRound. The valid values of algorithm for the quantize family are thus bitgroom, bitround, and granular_bitround. These controlled vocabulary values are case-insensitive versions of the algorithm names, with white space replaced by underscores.
+
+The final required attribute in a lossy compression variable is implementation. This attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and the name of the author(s) if deemed relevant. It is recommonded that implementation include the version of the library or client software that applied that algorithm.
+
+[[per-variable-lossy-compression-attributes, Section 8.4.2, "Per-variable lossy compression attributes"]]
+==== Per-variable lossy compression attributes
+
+Each variable that has been lossily compressed must include at least two attributes. The lossy_compression attribute
+
+
+The number of bits so quantized can be chosen by multiples criteria including: 1) Preserving a desired floating point precision that corresponds to the known measurement or model precision or accuracy; 2) Achieving a desired reduction in required storage space.

From 4e2e09716b9a532e5c1a9531a6023c5e174bf1fb Mon Sep 17 00:00:00 2001
From: Charlie Zender
Date: Fri, 12 Apr 2024 04:13:03 -0700
Subject: [PATCH 02/37] Finish most draft text in section. Change family from required to optional. No formatting or examples yet.

---
 ch08.adoc    | 15 +++++++++------
 history.adoc |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/ch08.adoc b/ch08.adoc
index 1261195f..ee28245c 100644
--- a/ch08.adoc
+++ b/ch08.adoc
@@ -682,23 +682,26 @@ Geoscientific models and measurements generate false precision (scientifically m
 These CF conventions constitute a framework to provide quantization properties as metadata that accompanies quantized data. The goals are twofold. First, to inform the interested user how, and to what degree, the quantized data differ from the original (and irrecoverable) unquantized data. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that the data producer expects from the source model or measurement.
-Software can use a variety of algorithms to quantize data and write it in netCDF format. In practice, data purveyors are likely to employ the same quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of quantization. This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file. The lossy compression container variable records the generic properties of the algorithm, while the algorithm parameters applied to specific variables are as attributes of those variables.
+Software can use a variety of algorithms to quantize data and write it in netCDF format. In practice, data purveyors are likely to employ the same quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of quantization. This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file.
The lossy compression container variable records the generic properties of the algorithm, while the algorithm parameters applied to specific variables are as attributes of those variables. The values of many of the attributes make use of controlled vocabularies that are case-insensitive, with white space replaced by underscores.
 [[lossy-compression-variable, Section 8.4.1, "Lossy Compression Varible"]]
 ==== Lossy compression variable
-A lossy compression variable provides the description of a lossy compression algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. Its purpose is to act as a container for the generic attributes of the algorithm. All lossy compression variables must have at least three attributes: family, algorithm, and implementation.
+A lossy compression variable provides the description of a lossy compression algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. Its purpose is to act as a container for the generic attributes of the algorithm. Lossy compression variables are recommended to have at least three attributes: family, algorithm, and implementation.
-The family describes the overall type of lossy compression. The only valid value of family currently is quantize. Future versions of CF are anticipated to support other families of lossy algorithms, e.g., rounding, zfp, fpzip. The family attribute quickly conveys the general class of lossy algorithm to the user who may not be familiar with the names of all possible algorithms in each class.
+The family attribute conveys the general class of lossy algorithm to the user who may not be familiar with the names of all possible algorithms in each class. The only valid value of family currently is quantize. For this reason family is an optional though recommended attribute. If and when algorithms outside the quantize family are supported, the family attribute may become required.
Other potential families of lossy algorithms include rounding, packing, zfp, and fpzip.
-The algorithm attribute that defines a specific algorithm depends on the value of family. The netCDF library itself has supported three quantization algorithms since 2023: BitGroom, BitRound, and Granular BitRound. The valid values of algorithm for the quantize family are thus bitgroom, bitround, and granular_bitround. These controlled vocabulary values are case-insensitive versions of the algorithm names, with white space replaced by underscores.
+The algorithm attribute that defines a specific algorithm depends on the value of family. The netCDF library itself has supported three quantization algorithms since 2023: BitGroom, BitRound, and Granular BitRound. The controlled vocabulary for the quantize family of algorithm is thus bitgroom, bitround, and granular_bitround.
-The final required attribute in a lossy compression variable is implementation. This attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and the name of the author(s) if deemed relevant. It is recommonded that implementation include the version of the library or client software that applied that algorithm.
+The final attribute in a lossy compression variable is implementation. This required attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and, if deemed relevant, the author name(s).
 [[per-variable-lossy-compression-attributes, Section 8.4.2, "Per-variable lossy compression attributes"]]
 ==== Per-variable lossy compression attributes
-Each variable that has been lossily compressed must include at least two attributes. The lossy_compression attribute
+Each variable that has been lossily compressed must include at least two attributes.
The lossy compression variables are associated with the data variables by the lossy_compression attribute. This attribute is attached to data variables so that variables with compressed with different algorithms may be present in a single file.
+Data variables that have been lossily compressed must also record the specific parameter value(s) used in the compression algorithm. The input parameter for all algorithms in the quantize family specifies the preserved precision. BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantize the trailing bits. Thus all data variables quantized by BitRound shall have a corresponding attribute lossy_compression_nsb. The value of lossy_compression_nsb is an integer with 1 <= NSB <= 23 for data type float or real, and 1 <= NSB <= 52 for data type double.
+
+The Bitgroom and Granular Bitgroom algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. The actual number of mantissa bits quantized depends on the algorithm. Thus all data variables quantized by BitGroom or Granular BitGroom shall have a corresponding attribute lossy_compression_nsd. The value of lossy_compression_nsd is an integer with 1 <= NSD <= 7 for data type float or real, and 1 <= NSD <= 16 for data type double.
 The number of bits so quantized can be chosen by multiples criteria including: 1) Preserving a desired floating point precision that corresponds to the known measurement or model precision or accuracy; 2) Achieving a desired reduction in required storage space.
diff --git a/history.adoc b/history.adoc
index 5c246d51..9556dad6 100644
--- a/history.adoc
+++ b/history.adoc
@@ -7,6 +7,7 @@
 === Working version (most recent first)
+* {issues}403[Issue #403]: Metadata to encode lossy compression properties
 * {issues}511[Issue #511]: Appendix B: New element in XML file header to record the "first published date"
 * {issues}509[Issue #509]: In exceptional cases allow a standard name to be aliased into two alternatives
 * {issues}501[Issue #501]: Clarify that data variables and variables containing coordinate data are highly recommended to have **`long_name`** or **`standard_name`** attributes, that **`cf_role`** is used only for discrete sampling geometries and UGRID mesh topologies, and that CF does not prohibit CF attributes from being used in ways that are not defined by CF but that in such cases their meaning is not defined by CF.

From 7fe0c232cb3a7c02c65c1b35a7134824d2bde25a Mon Sep 17 00:00:00 2001
From: Charlie Zender
Date: Fri, 12 Apr 2024 05:22:16 -0700
Subject: [PATCH 03/37] Start formatting

---
 ch08.adoc | 62 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/ch08.adoc b/ch08.adoc
index ee28245c..55f21c5e 100644
--- a/ch08.adoc
+++ b/ch08.adoc
@@ -678,30 +678,64 @@ For instance, a **`computational_precision**` value of **`"64"**` would specify
 [[lossy-compression-by-quantization, Section 8.4, "Lossy Compression by Quantization"]]
 === Lossy Compression by Quantization
-Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and is scientifically pointless. Quantization is a technique that can eliminate false precision, usually by setting the least significant bits of IEEE floating-point values to zero.
The results are themselves valid IEEE values with quantized precision---no special software or decoding is necessary to read them. Moreover, the quantized bits compress more efficiently than random bits. Thus quantization is often referred to as a form of lossy compression though, strictly speaking, quantization is a pre-conditioner for compression that must actually be performed by a subsequent codec.
-
-These CF conventions constitute a framework to provide quantization properties as metadata that accompanies quantized data. The goals are twofold. First, to inform the interested user how, and to what degree, the quantized data differ from the original (and irrecoverable) unquantized data. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that the data producer expects from the source model or measurement.
-
-Software can use a variety of algorithms to quantize data and write it in netCDF format. In practice, data purveyors are likely to employ the same quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of quantization. This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file. The lossy compression container variable records the generic properties of the algorithm, while the algorithm parameters applied to specific variables are as attributes of those variables. The values of many of the attributes make use of controlled vocabularies that are case-insensitive, with white space replaced by underscores.
-
-[[lossy-compression-variable, Section 8.4.1, "Lossy Compression Varible"]]
+Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space.
+False precision can mislead (by implying noise is signal) and is scientifically pointless.
+The quantization technique can eliminate false precision, usually by setting the least significant bits of [<>] floating-point mantissas to zeros.
+The quantized results are valid [<>] values---no special software or decoder is necessary to read them.
+Importantly, the quantized bits compress more efficiently than random bits.
+Thus quantization is often referred to as a lossy compression technique though, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent codec.
+
+These CF conventions constitute a framework to provide quantization properties as metadata that accompanies quantized data.
+The goals are twofold.
+First, to inform interested users how, and to what degree, the quantized data differ from the original (and irrecoverable) unquantized data.
+Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data.
+These conventions also allow users to better understand the precision that data producers expect from source models or measurements.
+
+Software can use a variety of algorithms to quantize data and write it in netCDF format.
+In practice, data purveyors are likely to employ the same lossy quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of compression.
+This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file.
+The **`lossy compression`** container variable records the generic properties of the algorithm, while the algorithm parameters are stored as attributes of the specific data variables to which they were applied.
+Keeping with CF precedents, many attributes make use of controlled vocabularies that are case-insensitive, with white space replaced by underscores.
+
+[[lossy-compression-variable, Section 8.4.1, "Lossy Compression Variable"]]
 ==== Lossy compression variable
-A lossy compression variable provides the description of a lossy compression algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. Its purpose is to act as a container for the generic attributes of the algorithm. Lossy compression variables are recommended to have at least three attributes: family, algorithm, and implementation.
+A **`lossy compression`** variable provides the description of a lossy compression algorithm via a collection of attached attributes.
+It is of arbitrary type since it contains no data.
+Its purpose is to act as a container for the generic attributes of an algorithm.
+Lossy compression variables are recommended to have at least three attributes: **`family`**, **`algorithm`**, and **`implementation`**.
-The family attribute conveys the general class of lossy algorithm to the user who may not be familiar with the names of all possible algorithms in each class. The only valid value of family currently is quantize. For this reason family is an optional though recommended attribute. If and when algorithms outside the quantize family are supported, the family attribute may become required. Other potential families of lossy algorithms include rounding, packing, zfp, and fpzip.
+The **`family`** attribute conveys the general class of lossy algorithm to the user who may not be familiar with the names of all possible algorithms in each class.
+The only valid value of **`family`** currently is **`quantize`**.
+For this reason **`family`** is an optional though recommended attribute.
+If and when algorithms outside the quantize family are supported, the **`family`** attribute may become required.
+Other potential families of lossy algorithms include rounding, packing, zfp, and fpzip.
-The algorithm attribute that defines a specific algorithm depends on the value of family.
The netCDF library itself has supported three quantization algorithms since 2023: BitGroom, BitRound, and Granular BitRound. The controlled vocabulary for the quantize family of algorithm is thus bitgroom, bitround, and granular_bitround.
+The **`algorithm`** attribute name a specific algorithm from those defined for the give **`family`**.
+The netCDF library itself has supported three quantization algorithms since 2023: BitGroom, BitRound, and Granular BitRound.
+The controlled vocabulary for the **`quantize`** algorithms consists of **`bitgroom`**, **`bitround`**, and **`granular_bitround`**.
-The final attribute in a lossy compression variable is implementation. This required attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and, if deemed relevant, the author name(s).
+The final attribute in a lossy compression variable is **`implementation`**.
+This required attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version.
+Other information such as the author name(s) may be included in **`implementation`** if it is required convey a reproducible version of the algorithm employed.
 [[per-variable-lossy-compression-attributes, Section 8.4.2, "Per-variable lossy compression attributes"]]
 ==== Per-variable lossy compression attributes
-Each variable that has been lossily compressed must include at least two attributes. The lossy compression variables are associated with the data variables by the lossy_compression attribute. This attribute is attached to data variables so that variables with compressed with different algorithms may be present in a single file.
+Each variable that has been lossily compressed must include at least two attributes.
+The lossy compression variables are associated with the data variables by the lossy_compression attribute.
+This attribute is attached to data variables so that variables with compressed with different algorithms may be present in a single file.
-Data variables that have been lossily compressed must also record the specific parameter value(s) used in the compression algorithm. The input parameter for all algorithms in the quantize family specifies the preserved precision. BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantize the trailing bits. Thus all data variables quantized by BitRound shall have a corresponding attribute lossy_compression_nsb. The value of lossy_compression_nsb is an integer with 1 <= NSB <= 23 for data type float or real, and 1 <= NSB <= 52 for data type double.
+Data variables that have been lossily compressed must also record the specific parameter value(s) used in the compression algorithm.
+The input parameter for all algorithms in the quantize family specifies the preserved precision.
+BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantize the trailing bits.
+Thus all data variables quantized by BitRound shall have a corresponding attribute lossy_compression_nsb.
+The value of lossy_compression_nsb is an integer with 1 <= NSB <= 23 for data type float or real, and 1 <= NSB <= 52 for data type double.
-The Bitgroom and Granular Bitgroom algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. The actual number of mantissa bits quantized depends on the algorithm. Thus all data variables quantized by BitGroom or Granular BitGroom shall have a corresponding attribute lossy_compression_nsd. The value of lossy_compression_nsd is an integer with 1 <= NSD <= 7 for data type float or real, and 1 <= NSD <= 16 for data type double.
+The Bitgroom and Granular Bitgroom algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation.
+The actual number of mantissa bits quantized depends on the algorithm.
+Thus all data variables quantized by BitGroom or Granular BitGroom shall have a corresponding attribute lossy_compression_nsd.
+The value of lossy_compression_nsd is an integer with 1 <= NSD <= 7 for data type float or real, and 1 <= NSD <= 16 for data type double.
+fxm got to here editing
 The number of bits so quantized can be chosen by multiples criteria including: 1) Preserving a desired floating point precision that corresponds to the known measurement or model precision or accuracy; 2) Achieving a desired reduction in required storage space.

From e50a1d8d708ac38a1e27429a0ae9da3406cba2f3 Mon Sep 17 00:00:00 2001
From: Charlie Zender
Date: Wed, 17 Apr 2024 10:59:16 -0700
Subject: [PATCH 04/37] Finish per-variable and description sections. Add two examples.

---
 bibliography.adoc |   8 +++-
 ch08.adoc         | 109 ++++++++++++++++++++++++++++++++++++++--------
 toc-extra.adoc    |   4 +-
 3 files changed, 99 insertions(+), 22 deletions(-)

diff --git a/bibliography.adoc b/bibliography.adoc
index 5673cc8d..3e0b5f9b 100644
--- a/bibliography.adoc
+++ b/bibliography.adoc
@@ -3,11 +3,15 @@
 [bibliography]
 === References
+- [[[CFDM]]] link:$$https://doi.org/10.5194/gmd-10-4619-2017$$[A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)]. Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. E.: _Geosci. Model Dev._, 10, 4619-4646, 2017.
 - [[[COARDS]]] link:$$https://ferret.pmel.noaa.gov/Ferret/documentation/coards-netcdf-conventions$$[Conventions for the standardization of NetCDF Files]. Sponsored by the "Cooperative Ocean/Atmosphere Research Data Service," a NOAA/university cooperative for the sharing and distribution of global atmospheric and oceanographic research data sets. May 1995.
+- [[[DCG19]]] link:$$https://doi.org/10.5194/gmd-12-4099-2019$$[Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files]. Delaunay, X., A. Courtois, and F. Gouillon: _Geosci. Model Dev._, 12, 4099-4113, 2019.
 - [[[FGDC]]] link:$$https://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/v2_0698.pdf$$[Content Standard for Digital Geospatial Metadata]. Federal Geographic Data Committee, FGDC-STD-001-1998.
 - [[[IEEE_754]]] link:$$https://doi.org/10.1109/IEEESTD.2019.8766229$$[IEEE Standard for Floating-Point Arithmetic], in _IEEE Std 754-2019 (Revision of IEEE 754-2008)_, 22 July 2019.
+- [[[Kou21]]] link:$$https://doi.org/10.5194/gmd-14-377-2021$$[A note on precision-preserving compression of scientific data]. Kouznetsov, R.: _Geosci. Model Dev._, 14, 377-389, 2021.
+- [[[KRD21]]] link:$$https://doi.org/10.1038/s43588-021-00156-2$$[Compressing atmospheric data into its real information content]. Klöwer, M., Razinger, M., Dominguez, J. J., Düben, P., and Palmer, T. N.: _Nat. Comput. Sci._, 1, 713-724, 2021.
 - [[[NetCDF]]] link:$$https://doi.org/10.5065/D6H70CW6$$[NetCDF Software Package]. UNIDATA Program Center of the University Corporation for Atmospheric Research.
 - [[[NUG]]] link:$$https://docs.unidata.ucar.edu/nug/current/index.html$$[The NetCDF User's Guide].
 - [[[OGC_WKT-CRS]]] link:$$https://www.opengeospatial.org/standards/wkt-crs$$[OGC Well-known text representation of coordinate reference systems].
@@ -16,7 +20,7 @@ OGC document 12-063. 1st May 2015.
 - [[[SCH02]]] link:$$https://doi.org/10.1175/1520-0493(2002)130<2459:ANTFVC>2.0.CO;2$$[A new terrain-following vertical coordinate formulation for atmospheric prediction models]. C Schaer, D Leuenberger, and O Fuhrer. 2002. _Monthly Weather Review_. 130. 2459-2480.
 - [[[Snyder]]] link:$$https://doi.org/10.3133/pp1395$$[Map Projections: A Working Manual]. USGS Professional Paper 1395.
 - [[[UDUNITS]]] link:$$https://doi.org/10.5065/D6KD1WN0$$[UDUNITS Software Package]. UNIDATA Program Center of the University Corporation for Atmospheric Research.
+- [[[UGRID]]] link:$$https://ugrid-conventions.github.io/ugrid-conventions$$[UGRID Conventions for storing unstructured (or flexible mesh) data in netCDF files]
 - [[[W3C]]] link:$$https://www.w3.org/$$[World Wide Web Consortium (W3C)].
 - [[[XML]]] link:$$https://www.w3.org/TR/1998/REC-xml-19980210$$[Extensible Markup Language (XML) 1.0]. T. Bray, J. Paoli, and C.M. Sperberg-McQueen. 10 February 1998.
-- [[[CFDM]]] link:$$https://doi.org/10.5194/gmd-10-4619-2017$$[A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)]. Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. E.: _Geosci. Model Dev._, 10, 4619-4646, 2017.
-- [[[UGRID]]] link:$$https://ugrid-conventions.github.io/ugrid-conventions$$[UGRID Conventions for storing unstructured (or flexible mesh) data in netCDF files]
+- [[[Zen16]]] link:$$https://doi.org/10.5194/gmd-9-3199-2016$$[Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)]. Zender, C. S.: _Geosci. Model Dev._, 9, 3199-3211, 2016.
diff --git a/ch08.adoc b/ch08.adoc
index 55f21c5e..3996e9c7 100644
--- a/ch08.adoc
+++ b/ch08.adoc
@@ -678,14 +678,15 @@ For instance, a **`computational_precision**` value of **`"64"**` would specify
 [[lossy-compression-by-quantization, Section 8.4, "Lossy Compression by Quantization"]]
 === Lossy Compression by Quantization
-Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space.
+Geoscientific models and measurements generate false floating-point precision (scientifically meaningless data bits) that wastes storage space.
 False precision can mislead (by implying noise is signal) and is scientifically pointless.
The quantization technique can eliminate false precision, usually by setting the least significant bits of [<>] floating-point mantissas to zeros. +(Quantization of integer types, although theoretically allowed, is not covered by this convention.) The quantized results are valid [<>] values---no special software or decoder is necessary to read them. Importantly, the quantized bits compress more efficiently than random bits. Thus quantization is often referred to as a lossy compression technique though, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent codec. -These CF conventions constitute a framework to provide quantization properties as metadata that accompanies quantized data. +These CF conventions define a metadata framework to provide quantization properties alongside quantized data. The goals are twofold. First, to inform interested users how, and to what degree, the quantized data differ from the original (and irrecoverable) unquantized data. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. @@ -711,31 +712,101 @@ For this reason **`family`** is an optional though recommended attribute. If and when algorithms outside the quantize family are supported, the **`family`** attribute may become required. Other potential families of lossy algorithms include rounding, packing, zfp, and fpzip. -The **`algorithm`** attribute name a specific algorithm from those defined for the give **`family`**. -The netCDF library itself has supported three quantization algorithms since 2023: BitGroom, BitRound, and Granular BitRound. -The controlled vocabulary for the **`quantize`** algorithms consists of **`bitgroom`**, **`bitround`**, and **`granular_bitround`**. +The **`algorithm`** attribute names a specific lossy compression algorithm. 
+Four quantization algorithms are currently recognized for **`family = quantize`**: BitRound, BitGroom, DigitRound, and Granular BitRound. +The controlled vocabulary for the **`quantize`** algorithms thus consists of **`bitround`**, **`bitgroom`**, **`digitround`**, and **`granular_bitround`**. +See <> for a brief summary of these algorithms. The final attribute in a lossy compression variable is **`implementation`**. This required attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. -Other information such as the author name(s) may be included in **`implementation`** if it is required convey a reproducible version of the algorithm employed. +**`implementation`** should include any other information required to disambiguate the source of the algorithm employed. [[per-variable-lossy-compression-attributes, Section 8.4.2, "Per-variable lossy compression attributes"]] ==== Per-variable lossy compression attributes Each variable that has been lossily compressed must include at least two attributes. -The lossy compression variables are associated with the data variables by the lossy_compression attribute. -This attribute is attached to data variables so that variables with compressed with different algorithms may be present in a single file. +Data variables use the **`lossy_compression`** attribute to associate themselves with a **`lossy compression`** variable. +This attribute is attached to data variables so that variables compressed with different algorithms may be present in a single file. + +Data variables that have been lossily compressed must also record the specific parameter value(s) used in the lossy compression algorithm. +The input parameter for all algorithms in the **`quantize`** family determines the precision preserved by the algorithm. 
+BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantizes the trailing bits. +All data variables quantized by BitRound must record the NSB in the **`lossy_compression_nsb`** attribute. +Note that BitRound __counts only explicitly represented mantissa bits__. +It does not include the most-significant-bit with value 1 that implicitly begins all [<>] mantissas. +Thus **`lossy_compression_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. + +The Bitgroom, Granular Bitgroom, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. +The actual number of mantissa bits quantized depends on the algorithm. +Thus all data variables quantized by BitGroom, Granular BitGroom, or DigitRound must have a corresponding attribute **`lossy_compression_nsd`**. +The value of **`lossy_compression_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. -Data variables that have been lossily compressed must also record the specific parameter value(s) used in the compression algorithm. -The input parameter for all algorithms in the quantize family specifies the preserved precision. -BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantize the trailing bits. -Thus all data variables quantized by BitRound shall have a corresponding attribute lossy_compression_nsb. -The value of lossy_compression_nsb is an integer with 1 <= NSB <= 23 for data type float or real, and 1 <= NSB <= 52 for data type double. +[[example-lossy-compression-nsb-libnetcdf]] +[caption="Example 8.8. 
"] +.Lossy compression performed by BitRound algorithm in libnetcdf +==== +---- + variables: + char compression_info ; + compression_info:family = "quantize" ; + compression_info:algorithm = "bitround" ; + compression_info:implementation = "libnetcdf version 4.9.3-development" ; + + float ps(time,lat,lon) ; + ps:_QuantizeBitRoundNumberOfSignificantBits = 9 ; + ps:lossy_compression = "compression_info" ; + ps:lossy_compression_nsb = 9 ; + ps:standard_name = "surface_air_pressure" ; + ps:units = "Pa" ; +---- +Note how the same NSB is reported in two attributes of the data variable **`ps`**. The **`lossy_compression`** container variable (**`compression_info`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute [<>] which contains the same parameter value as the CF **`lossy_compression_nsb`** attribute. +==== -The Bitgroom and Granular Bitgroom algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. -The actual number of mantissa bits quantized depends on the algorithm. -Thus all data variables quantized by BitGroom or Granular BitGroom shall have a corresponding attribute lossy_compression_nsd. -The value of lossy_compression_nsd is an integer with 1 <= NSD <= 7 for data type float or real, and 1 <= NSD <= 16 for data type double. +[[example-lossy-compression-nsd-multiple-variables-nco]] +[caption="Example 8.9. "] +.Lossy compression performed by Granular BitGroom algorithm in NCO +==== +Quantization of different variables to different levels often makes good scientific sense. Here the pressure variable **`ps`** has four significant digits of precision while the temperature variable **`ts`** retains only three significant digits. 
+---- + variables: + char compression_info ; + compression_info:family = "quantize" ; + compression_info:algorithm = "granular_bitround" ; + compression_info:implementation = "NCO version 5.2.5-alpha01" ; + + float ps(time,lat,lon) ; + ps:standard_name = "surface_air_pressure" ; + ps:units = "Pa" ; + ps:lossy_compression = "compression_info" ; + ps:lossy_compression_nsd = 4 ; + + float ts(time) ; + ts:standard_name = "surface_temperature" ; + ts:units = "K" ; + ts:lossy_compression = "compression_info" ; + ts:lossy_compression_nsd = 3 ; +---- +Both variables were quantized by the same algorithm and so utilize the same **`lossy_compression`** variable. **`compression_info`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. +==== -fxm got to here editing -The number of bits so quantized can be chosen by multiples criteria including: 1) Preserving a desired floating point precision that corresponds to the known measurement or model precision or accuracy; 2) Achieving a desired reduction in required storage space. +[[quantization-algorithms-description, Section 8.4.3, "Description of Quantization Algorithms"]] +==== Description of Quantization Algorithms + +This section briefly describes and contrasts each recognized **`quantize`** algorithm and points to further documentation. +BitRound is also called the "round-to-nearest" method [<>] and the "half-to-even" method [<>]. +This is the default [<>] rounding method and is thus bias-free and conservative for random distributions of numbers. +BitRound is preferred when the number of significant bits (NSB) to retain is known. + +The other **`quantize`** algorithms guarantee to preserve a given number of significant (base-10 representation) digits (NSD). +Their quantization errors never exceed half of the unit value at the NSD decimal place [<>]. 
+BitGroom [<>] appeared first, though is now known to be suboptimal in accuracy [<>] and in compressibility compared to later methods. +DigitRound [<>] has superior compressibility for a given NSD compared to BitGroom. +Granular BitGroom combines the DigitRound approach for compressibility with the BitRound approach for quantization. +Granular BitGroom and DigitRound are both good choices when the NSD to retain is known. + +The netCDF C and Fortran libraries can directly invoke BitRound, BitGroom, and Granular BitRound [<>]. +The netCDF library attaches a long, system-defined attribute to every data variable that it quantizes, such as +**`_QuantizeBitRoundNumberOfSignificantBits = 9`** in Example 8.8. +The leading underscore indicates that the netCDF library wrote this attribute [<>]. +Any data variable that has the library-defined attribute should, in addition, contain the corresponding CF metadata. +Example 8.9 shows how the CF metadata might appear for other (non-netCDF library) implementations of **`quantize`** algorithms. diff --git a/toc-extra.adoc b/toc-extra.adoc index eac9bb14..94965a20 100644 --- a/toc-extra.adoc +++ b/toc-extra.adoc @@ -96,6 +96,8 @@ J.5. <> 8.5. <> 8.6. <> 8.7. <> +8.8. <> +8.9. <> B.1. <> H.1. <> H.2. <> @@ -119,4 +121,4 @@ H.19. <> H.20. <> H.21. <> H.22. <> -I.1. <> \ No newline at end of file +I.1. <> From cd122e789ed00d69ed6b0f5f9e56d0910be68678 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Wed, 17 Apr 2024 12:24:52 -0700 Subject: [PATCH 05/37] Conformance doc --- conformance.adoc | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/conformance.adoc b/conformance.adoc index fddd28ea..efe1fc33 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -528,7 +528,25 @@ The requirements on all other bounds tie point variable attributes are the same * An interpolation variable should have 0 dimensions. 
* The recommendations on bounds tie point variable attributes are the same as for bounds variables described in <>. +[[lossy-compression-by-quantization]] +=== 8.4 Lossy Compression by Quantization + +*Requirements:* + +* Lossy compression variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. +* The value of **`algorithm`** must be **`bitround`**, **`bitgroom`**, **`digitround`**, or **`granular_bitround`**. +* The value of **`implementation`** is a free-form string that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. +* Data variables that were lossily compressed via quantization must have a string-valued attribute named **`lossy_compression`**. +* The value of **`lossy_compression`** is the name of the **`lossy compression`** container variable. +* Data variables that were lossily compressed via quantization must have an integer type attribute named either **`lossy_compression_nsb`** (for **`algorithm = bitround`**) or **`lossy_compression_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). +* **`lossy_compression_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. +* **`lossy_compression_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**.   + +*Recommendations:* + +* Lossy compression variables are recommend to have a string-valued attribute named **`family`**. The only supported value is currently **`quantize`**. 
+ [[parametric-vertical-coordinates]] === Appendix D Parametric Vertical Coordinates From e165e95476e4bf53497c5f53e62b2ca460668b25 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Wed, 26 Jun 2024 15:01:51 -0700 Subject: [PATCH 06/37] Update to replace "lossy compression" by "quantization" container variable and attribute names in accord with Jonathan Gregory's comments. --- ch01.adoc | 2 +- ch08.adoc | 102 ++++++++++++++++++++++------------------------- conformance.adoc | 19 +++++---- history.adoc | 2 +- toc-extra.adoc | 4 +- 5 files changed, 61 insertions(+), 68 deletions(-) diff --git a/ch01.adoc b/ch01.adoc index ff906c6d..ba37db1d 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -92,7 +92,7 @@ The word "apex" refers to position of this group at the vertex of the tree of gr longitude dimension:: A dimension of a netCDF variable that has an associated longitude coordinate variable. -lossy compression variable:: A variable used as a container for attributes that define a specific lossy compression algorithm. The type of the variable is arbitrary since it contains no data. +quantization variable:: A variable used as a container for attributes that define a specific quantization algorithm. The type of the variable is arbitrary since it contains no data. multidimensional coordinate variable:: An auxiliary coordinate variable that is multidimensional. diff --git a/ch08.adoc b/ch08.adoc index 3996e9c7..5d9f2220 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -5,7 +5,7 @@ There are three methods for reducing dataset size: packing, lossless compression, and lossy compression. By packing we mean altering the data in a way that reduces its precision (but has no other effect on accuracy). By lossless compression we mean techniques that store the data more efficiently and result in no loss of precision or accuracy. -By lossy compression we mean techniques that store the data more efficiently and retain its precision but result in some loss in accuracy. 
+By lossy compression we mean techniques that either store the data more efficiently and retain its precision but result in some loss in accuracy, or techniques that intentionally reduce data precision to improve the efficiency of subsequent lossless compression. Lossless compression only works in certain circumstances, e.g., when a variable contains a significant amount of missing or repeated data values. In this case it is possible to make use of standard utilities, e.g., UNIX **`compress`** or GNU **`gzip`**, to compress the entire file after it has been written. @@ -675,118 +675,112 @@ The data creator shall specify the floating-point arithmetic precision used duri Using the given computational precision in the interpolation computations is a necessary, but not sufficient, condition for the data user to be able to reconstitute the coordinates to an accuracy comparable to that intended by the data creator. For instance, a **`computational_precision**` value of **`"64"**` would specify that, using the same implementation and hardware as the creator of the compressed dataset, sufficient accuracy could not be reached when using a floating-point precision lower than 64-bit floating-point arithmetic in the interpolation computations required to reconstitute the coordinates. -[[lossy-compression-by-quantization, Section 8.4, "Lossy Compression by Quantization"]] -=== Lossy Compression by Quantization +[[lossy-compression-via-quantization, Section 8.4, "Lossy Compression via Quantization"]] +=== Lossy Compression via Quantization Geoscientific models and measurements generate false floating-point precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and is scientifically pointless. -The quantization technique can eliminate false precision, usually by setting the least significant bits of [<>] floating-point mantissas to zeros. 
+The quantization technique can eliminate false precision, usually by rounding the least significant bits of [<>] floating-point mantissas to zeros. (Quantization of integer types, although theoretically allowed, is not covered by this convention.) The quantized results are valid [<>] values---no special software or decoder is necessary to read them. Importantly, the quantized bits compress more efficiently than random bits. -Thus quantization is often referred to as a lossy compression technique though, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent codec. +Thus quantization is sometimes referred to as a form of lossy compression though, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent codec. These CF conventions define a metadata framework to provide quantization properties alongside quantized data. The goals are twofold. -First, to inform interested users how, and to what degree, the quantized data differ from the original (and irrecoverable) unquantized data. +First, to inform interested users how, and to what degree, the quantized data differ from the original unquantized data, which are not stored in the dataset and may no longer exist. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that data producers expect from source models or measurements. -Software can use a variety of algorithms to quantize data and write it in netCDF format. -In practice, data purveyors are likely to employ the same lossy quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of compression. +Quantization is irreversible so data providers must exercise judgement about which fields to quantize, and to what level. 
Observed and simulated geophysical fields always have finite accuracy and precision. Domain specialists are best qualified to suggest appropriate quantization levels. However, fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. In general, we recommend against quantizing any **`coordinate variable`**, **`bounds`** variable, **`cell_measures`** variables, and any variables employed in **`formula_terms`**. Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. + +Software can use a variety of algorithms to quantize data and to write it in netCDF format. +In practice, data purveyors are likely to employ the same quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of quantization. This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file. -The **`lossy compression`** container variable records the generic properties of the algorithm, while the algorithm parameters are stored as attributes of the specific data variables to which they were applied. -Keeping with CF precedents, many attributes make use of controlled vocabularies that are case-insensitive, with white space replaced by underscores. 
+The **`quantization`** container variable records the generic properties of the algorithm, while the algorithm parameters are stored as attributes of the specific data variables to which they were applied. +Keeping with CF precedents, quantization attributes that make use of controlled vocabularies are case-insensitive, with white space replaced by underscores. -[[lossy-compression-variable, Section 8.4.1, "Lossy Compression Variable"]] -==== Lossy compression variable +[[quantization-variable, Section 8.4.1, "Quantization Variable"]] +==== Quantization variable -A **`lossy compression`** variable provides the description of a lossy compression algorithm via a collection of attached attributes. +A **`quantization`** variable describes a quantization algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. Its purpose is to act as a container for the generic attributes of an algorithm. -Lossy compression variables are recommended to have at least three attributes: **`family`**, **`algorithm`**, and **`implementation`**. - -The **`family`** attribute conveys the general class of lossy algorithm to the user who may not be familiar with the names of all possible algorithms in each class. -The only valid value of **`family`** currently is **`quantize`**. -For this reason **`family`** is an optional though recommended attribute. -If and when algorithms outside the quantize family are supported, the **`family`** attribute may become required. -Other potential families of lossy algorithms include rounding, packing, zfp, and fpzip. +Quantization variables must have at least two attributes: **`algorithm`** and **`implementation`**. -The **`algorithm`** attribute names a specific lossy compression algorithm. -Four quantization algorithms are currently recognized for **`family = quantize`**: BitRound, BitGroom, DigitRound, and Granular BitRound. 
-The controlled vocabulary for the **`quantize`** algorithms thus consists of **`bitround`**, **`bitgroom`**, **`digitround`**, and **`granular_bitround`**. +The **`algorithm`** attribute names a specific quantization algorithm. +Four quantization algorithms are currently recognized: BitRound, BitGroom, DigitRound, and Granular BitRound. +The controlled vocabulary for these algorithms thus consists of **`bitround`**, **`bitgroom`**, **`digitround`**, and **`granular_bitround`**. See <> for a brief summary of these algorithms. -The final attribute in a lossy compression variable is **`implementation`**. +The second attribute in a quantization variable is **`implementation`**. This required attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. **`implementation`** should include any other information required to disambiguate the source of the algorithm employed. -[[per-variable-lossy-compression-attributes, Section 8.4.2, "Per-variable lossy compression attributes"]] -==== Per-variable lossy compression attributes +[[per-variable-quantization-attributes, Section 8.4.2, "Per-variable quantization attributes"]] +==== Per-variable quantization attributes -Each variable that has been lossily compressed must include at least two attributes. -Data variables use the **`lossy_compression`** attribute to associate themselves with a **`lossy compression`** variable. +Each variable that has been quantized must include at least two attributes. +Data variables use the **`quantization`** attribute to associate themselves with a **`quantization`** container variable. This attribute is attached to data variables so that variables compressed with different algorithms may be present in a single file. -Data variables that have been lossily compressed must also record the specific parameter value(s) used in the lossy compression algorithm. 
-The input parameter for all algorithms in the **`quantize`** family determines the precision preserved by the algorithm. +Data variables that have been quantized must also record the specific parameter value(s) used in the quantization algorithm. +The input parameter for all quantization algorithms determines the precision preserved by the algorithm. BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantizes the trailing bits. -All data variables quantized by BitRound must record the NSB in the **`lossy_compression_nsb`** attribute. +All data variables quantized by BitRound must record the NSB in the **`quantization_nsb`** attribute. Note that BitRound __counts only explicitly represented mantissa bits__. It does not include the most-significant-bit with value 1 that implicitly begins all [<>] mantissas. -Thus **`lossy_compression_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. +Thus **`quantization_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. The Bitgroom, Granular Bitgroom, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. The actual number of mantissa bits quantized depends on the algorithm. -Thus all data variables quantized by BitGroom, Granular BitGroom, or DigitRound must have a corresponding attribute **`lossy_compression_nsd`**. -The value of **`lossy_compression_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. +Thus all data variables quantized by BitGroom, Granular BitRound, or DigitRound must have a corresponding attribute **`quantization_nsd`**. 
+The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. -[[example-lossy-compression-nsb-libnetcdf]] +[[example-quantization-nsb-libnetcdf]] [caption="Example 8.8. "] -.Lossy compression performed by BitRound algorithm in libnetcdf +.Quantization performed by BitRound algorithm in libnetcdf ==== ---- variables: - char compression_info ; - compression_info:family = "quantize" ; - compression_info:algorithm = "bitround" ; - compression_info:implementation = "libnetcdf version 4.9.3-development" ; + char quantization ; + quantization:algorithm = "bitround" ; + quantization:implementation = "libnetcdf version 4.9.3-development" ; float ps(time,lat,lon) ; ps:_QuantizeBitRoundNumberOfSignificantBits = 9 ; - ps:lossy_compression = "compression_info" ; - ps:lossy_compression_nsb = 9 ; + ps:quantization = "quantization" ; + ps:quantization_nsb = 9 ; ps:standard_name = "surface_air_pressure" ; ps:units = "Pa" ; ---- -Note how the same NSB is reported in two attributes of the data variable **`ps`**. The **`lossy_compression`** container variable (**`compression_info`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute [<>] which contains the same parameter value as the CF **`lossy_compression_nsb`** attribute. +Note how the same NSB is reported in two attributes of the data variable **`ps`**. The **`quantization`** container variable (**`quantization`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute [<>] which contains the same parameter value as the CF **`quantization_nsb`** attribute. 
==== -[[example-lossy-compression-nsd-multiple-variables-nco]] +[[example-quantization-nsd-multiple-variables-nco]] [caption="Example 8.9. "] -.Lossy compression performed by Granular BitGroom algorithm in NCO +.Quantization performed by Granular BitRound algorithm in NCO ==== Quantization of different variables to different levels often makes good scientific sense. Here the pressure variable **`ps`** has four significant digits of precision while the temperature variable **`ts`** retains only three significant digits. ---- variables: - char compression_info ; - compression_info:family = "quantize" ; - compression_info:algorithm = "granular_bitround" ; - compression_info:implementation = "NCO version 5.2.5-alpha01" ; + char quantization ; + quantization:algorithm = "granular_bitround" ; + quantization:implementation = "NCO version 5.2.5-alpha01" ; float ps(time,lat,lon) ; ps:standard_name = "surface_air_pressure" ; ps:units = "Pa" ; - ps:lossy_compression = "compression_info" ; - ps:lossy_compression_nsd = 4 ; + ps:quantization = "quantization" ; + ps:quantization_nsd = 4 ; float ts(time) ; ts:standard_name = "surface_temperature" ; ts:units = "K" ; - ts:lossy_compression = "compression_info" ; - ts:lossy_compression_nsd = 3 ; + ts:quantization = "quantization" ; + ts:quantization_nsd = 3 ; ---- -Both variables were quantized by the same algorithm and so utilize the same **`lossy_compression`** variable. **`compression_info`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. +Both variables were quantized by the same algorithm and so utilize the same **`quantization`** variable. **`quantization`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. 
==== [[quantization-algorithms-description, Section 8.4.3, "Description of Quantization Algorithms"]] diff --git a/conformance.adoc b/conformance.adoc index efe1fc33..133565ae 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -528,24 +528,23 @@ The requirements on all other bounds tie point variable attributes are the same * An interpolation variable should have 0 dimensions. * The recommendations on bounds tie point variable attributes are the same as for bounds variables described in <>. -[[lossy-compression-by-quantization]] -=== 8.4 Lossy Compression by Quantization +[[lossy-compression-via-quantization]] +=== 8.4 Lossy Compression via Quantization *Requirements:* -* Lossy compression variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. +* Quantization variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. * The value of **`algorithm`** must be **`bitround`**, **`bitgroom`**, **`digitround`**, or **`granular_bitround`**. * The value of **`implementation`** is a free-form string that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. -* Data variables that were lossily compressed via quantization must have a string-valued attribute named **`lossy_compression`**. -* The value of **`lossy_compression`** is the name of the **`lossy compression`** container variable. -* Data variables that were lossily compressed via quantization must have an integer type attribute named either **`lossy_compression_nsb`** (for **`algorithm = bitround`**) or **`lossy_compression_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). -* **`lossy_compression_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. 
-* **`lossy_compression_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. -  +* Data variables that were quantized must have a string-valued attribute named **`quantization`**. +* The value of **`quantization`** is the name of the **`quantization`** container variable. +* Data variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (for **`algorithm = bitround`**) or **`quantization_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). +* **`quantization_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. +* **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. *Recommendations:* -* Lossy compression variables are recommend to have a string-valued attribute named **`family`**. The only supported value is currently **`quantize`**. 
+* None (fxm) [[parametric-vertical-coordinates]] === Appendix D Parametric Vertical Coordinates diff --git a/history.adoc b/history.adoc index 9556dad6..e5dc6438 100644 --- a/history.adoc +++ b/history.adoc @@ -7,7 +7,7 @@ === Working version (most recent first) -* {issues}403[Issue #403]: Metadata to encode lossy compression properties +* {issues}403[Issue #403]: Metadata to encode quantization properties * {issues}511[Issue #511]: Appendix B: New element in XML file header to record the "first published date" * {issues}509[Issue #509]: In exceptional cases allow a standard name to be aliased into two alternatives * {issues}501[Issue #501]: Clarify that data variables and variables containing coordinate data are highly recommended to have **`long_name`** or **`standard_name`** attributes, that **`cf_role`** is used only for discrete sampling geometries and UGRID mesh topologies, and that CF does not prohibit CF attributes from being used in ways that are not defined by CF but that in such cases their meaning is not defined by CF. diff --git a/toc-extra.adoc b/toc-extra.adoc index 94965a20..e73da712 100644 --- a/toc-extra.adoc +++ b/toc-extra.adoc @@ -96,8 +96,8 @@ J.5. <> 8.5. <> 8.6. <> 8.7. <> -8.8. <> -8.9. <> +8.8. <> +8.9. <> B.1. <> H.1. <> H.2. <> From 743519fcde7364cc44bb0c3b88902f651f0b278b Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Wed, 26 Jun 2024 17:12:50 -0700 Subject: [PATCH 07/37] Include all quantization algorithms in Appendix A. --- appa.adoc | 35 ++++++++++++++++++++++++++++++++++- ch08.adoc | 22 +++++++++++----------- conformance.adoc | 2 +- 3 files changed, 46 insertions(+), 13 deletions(-) diff --git a/appa.adoc b/appa.adoc index d59818f3..61006b4e 100644 --- a/appa.adoc +++ b/appa.adoc @@ -9,7 +9,7 @@ See <> for the grid mapping attributes, and <> for the distinction between **BI** and **BO**), and **-** for variables with some other purpose. 
+For variable attributes, the possible values of "Use" are: **C** for variables containing coordinate data, **D** for data variables, **M** for geometry container variables, **Q** for quantization container variables, **Do** for domain variables, **BI** and **BO** for boundary variables (see <> for the distinction between **BI** and **BO**), and **-** for variables with some other purpose. CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways. "Links" indicates the location of the attribute"s original definition (first link) and sections where the attribute is discussed in this document (additional links as necessary). @@ -38,6 +38,13 @@ Attribute If both **`scale_factor`** and **`add_offset`** attributes are present, the data are first scaled before the offset is added. In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general. +| **`algorithm`** +| S +| Q +| <>, and <> +| Name of the quantization algorithm employed. +Either **`bitround`**, **`bitgroom`**, **`digitround`**, or **`granular_bitround`**. + | **`ancillary_variables`** | S | D @@ -200,6 +207,12 @@ Use in conjunction with **`flag_meanings`**. | link:$$https://www.unidata.ucar.edu/software/netcdf/docs/attribute_conventions.html$$[NUG Appendix A, "Attribute Conventions"] | List of the applications that have modified the original data. +| **`implementation`** +| S +| Q +| <>, and <> +| The name and version of the library or client software that performed the quantization with **`algorithm`**. + | **`instance_dimension`** | S | - @@ -300,6 +313,26 @@ Allowed for auxiliary coordinate variables but not allowed for coordinate variab | <> | Direction of increasing vertical coordinate value. 
+| **`quantization`** +| S +| D +| <> +| Identifies a variable that defines a quantization algorithm and its provenance. + +| **`quantization_nsb`** +| N +| D +| <>, and <> +| Specifies the number of significant bits retained in the IEEE mantissa of data quantized with the BitRound algorithm. +Use in conjunction with **`quantization`**. + +| **`quantization_nsd`** +| N +| D +| <>, and <> +| Specifies the number of significant base-10 digits retained in the IEEE mantissa of data quantized with base-10 quantization algorithms. +Use in conjunction with **`quantization`**. + | **`references`** | S | G, D diff --git a/ch08.adoc b/ch08.adoc index 5d9f2220..3e23620c 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -724,7 +724,7 @@ Each variable that has been quantized must include at least two attributes. Data variables use the **`quantization`** attribute to associate themselves with a **`quantization`** container variable. This attribute is attached to data variables so that variables compressed with different algorithms may be present in a single file. -Data variables that have been quantized must also record the specific parameter value(s) used in the quantization algorithm. +Data variables that have been quantized must also record the specific parameter value used in the quantization algorithm. The input parameter for all quantization algorithms determines the precision preserved by the algorithm. BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantizes the trailing bits. All data variables quantized by BitRound must record the NSB in the **`quantization_nsb`** attribute. @@ -732,7 +732,7 @@ Note that BitRound __counts only explicitly represented mantissa bits__. It does not include the most-significant-bit with value 1 that implicitly begins all [<>] mantissas. 
Thus **`quantization_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. -The Bitgroom, Granular Bitgroom, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. +The BitGroom, Granular BitGroom, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. The actual number of mantissa bits quantized depends on the algorithm. Thus all data variables quantized by BitGroom, Granular BitGroom, or DigitRound must have a corresponding attribute **`quantization_nsd`**. The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. @@ -743,13 +743,13 @@ The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for ==== ---- variables: - char quantization ; - quantization:algorithm = "bitround" ; - quantization:implementation = "libnetcdf version 4.9.3-development" ; + char quantization_info ; + quantization_info:algorithm = "bitround" ; + quantization_info:implementation = "libnetcdf version 4.9.3-development" ; float ps(time,lat,lon) ; ps:_QuantizeBitRoundNumberOfSignificantBits = 9 ; - ps:quantization = "quantization" ; + ps:quantization = "quantization_info" ; ps:quantization_nsb = 9 ; ps:standard_name = "surface_air_pressure" ; ps:units = "Pa" ; @@ -764,20 +764,20 @@ Note how the same NSB is reported in two attributes of the data variable **`ps`* Quantization of different variables to different levels often makes good scientific sense. Here the pressure variable **`ps`** has four significant digits of precision while the temperature variable **`ts`** retains only three significant digits. 
---- variables: - char quantization ; - quantization:algorithm = "granular_bitround" ; - quantization:implementation = "NCO version 5.2.5-alpha01" ; + char quantization_info ; + quantization_info:algorithm = "granular_bitround" ; + quantization_info:implementation = "NCO version 5.2.5-alpha01" ; float ps(time,lat,lon) ; ps:standard_name = "surface_air_pressure" ; ps:units = "Pa" ; - ps:quantization = "quantization" ; + ps:quantization = "quantization_info" ; ps:quantization_nsd = 4 ; float ts(time) ; ts:standard_name = "surface_temperature" ; ts:units = "K" ; - ts:quantization = "quantization" ; + ts:quantization = "quantization_info" ; ts:quantization_nsd = 3 ; ---- Both variables were quantized by the same algorithm and so utilize the same **`quantization`** variable. **`quantization`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. diff --git a/conformance.adoc b/conformance.adoc index 133565ae..e43c2b30 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -544,7 +544,7 @@ The requirements on all other bounds tie point variable attributes are the same *Recommendations:* -* None (fxm) +* The value of **`implementation`** should be specified as _library_ _version_ _number_, e.g., **`libnetcdf version 4.9.2`** or as _client_ _version_ _number_, e.g., **`NCO version 5.2.6`**. 
[[parametric-vertical-coordinates]] === Appendix D Parametric Vertical Coordinates From 4d0730e234ff49a70a7b5825c8a486d61ed5c186 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Wed, 26 Jun 2024 18:17:23 -0700 Subject: [PATCH 08/37] Address/merge textual suggestions from JGs PR review --- appa.adoc | 1 - ch08.adoc | 47 ++++++++++++++++++++++++----------------------- conformance.adoc | 9 ++++----- 3 files changed, 28 insertions(+), 29 deletions(-) diff --git a/appa.adoc b/appa.adoc index 61006b4e..6cb8418b 100644 --- a/appa.adoc +++ b/appa.adoc @@ -43,7 +43,6 @@ In cases where there is a strong constraint on dataset size, it is allowed to pa | Q | <>, and <> | Name of the quantization algorithm employed. -Either **`bitround`**, **`bitgroom`**, **`digitround`**, or **`granular_bitround`**. | **`ancillary_variables`** | S diff --git a/ch08.adoc b/ch08.adoc index 3e23620c..95bdc1b7 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -680,52 +680,53 @@ For instance, a **`computational_precision**` value of **`"64"**` would specify Geoscientific models and measurements generate false floating-point precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and is scientifically pointless. -The quantization technique can eliminate false precision, usually by rounding the least significant bits of [<>] floating-point mantissas to zeros. +Quantization algorithms can eliminate false precision, usually by rounding the least significant bits of [<>] floating-point mantissas to zeros. (Quantization of integer types, although theoretically allowed, is not covered by this convention.) The quantized results are valid [<>] values---no special software or decoder is necessary to read them. Importantly, the quantized bits compress more efficiently than random bits. 
-Thus quantization is sometimes referred to as a form of lossy compression though, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent codec. +Thus quantization is sometimes referred to as a form of lossy compression although, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent compressor. -These CF conventions define a metadata framework to provide quantization properties alongside quantized data. +These CF conventions define a metadata framework to record quantization properties alongside quantized data variables. The goals are twofold. First, to inform interested users how, and to what degree, the quantized data differ from the original unquantized data, which are not stored in the dataset and may no longer exist. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. -These conventions also allow users to better understand the precision that data producers expect from source models or measurements. - -Quantization is irreversible so data providers must exercise judgement about which fields to quantize, and to what level. Observed and simulated geophysical fields always have finite accuracy and precision. Domain specialists are best qualified to suggest appropriate quantization levels. However, fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. 
In general, we recommend against quantizing any **`coordinate variable`**, **`bounds`** variable, **`cell_measures`** variables, and any variables employed in **`formula_terms`**. Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. - -Software can use a variety of algorithms to quantize data and to write it in netCDF format. -In practice, data purveyors are likely to employ the same quantization algorithm to multiple variables in a single file, possibly with variable-specific levels of quantization. -This suggests the use of a hybrid version of the container variable model for each quantization algorithm employed on variables in a given file. -The **`quantization`** container variable records the generic properties of the algorithm, while the algorithm parameters are stored as attributes of the specific data variables to which they were applied. -Keeping with CF precedents, quantization attributes that make use of controlled vocabularies are case-insensitive, with white space replaced by underscores. +These conventions also allow users to better understand the precision that data producers expect from source models or measurements. +These conventions are intended to apply only to floating-point data variables. + +Quantization is irreversible so data providers must exercise judgement about which fields to quantize, and to what level. +Observed and simulated geophysical fields always have finite accuracy and precision. +Domain specialists are best qualified to suggest appropriate quantization levels. +However, fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. +These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). 
+Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. +In general, we recommend against quantizing any **`coordinate variable`**, **`bounds`** variable, **`cell_measures`** variables, and any variables employed in **`formula_terms`**. +Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. [[quantization-variable, Section 8.4.1, "Quantization Variable"]] ==== Quantization variable -A **`quantization`** variable describes a quantization algorithm via a collection of attached attributes. +A quantization variable describes a quantization algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. -Its purpose is to act as a container for the generic attributes of an algorithm. -Quantization variables are recommended to have at least two attributes: **`algorithm`**, and **`implementation`**. +Its purpose is to act as a container for the generic attributes of a quantization algorithm. +Quantization variables are required to have at least two attributes: **`algorithm`**, and **`implementation`**. The **`algorithm`** attribute names a specific quantization algorithm. Four quantization algorithms are currently recognized: BitRound, BitGroom, DigitRound, and Granular BitRound. The controlled vocabulary these algorithms thus consists of **`bitround`**, **`bitgroom`**, **`digitround`**, and **`granular_bitround`**. See <> for a brief summary of these algorithms. -The second attribute in a quantization variable is **`implementation`**. 
-This required attribute contains free-form text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. +The second attribute required in a quantization variable is **`implementation`**. +This attribute contains free-form, unstandardized text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. **`implementation`** should include any other information required to disambiguate the source of the algorithm employed. [[per-variable-quantization-attributes, Section 8.4.2, "Per-variable quantization attributes"]] ==== Per-variable quantization attributes -Each variable that has been quantized must include at least two attributes. -Data variables use the **`quantization`** attribute to associate themselves with a **`quantization`** container variable. -This attribute is attached to data variables so that variables compressed with different algorithms may be present in a single file. - -Data variables that have been quantized must also record the specific parameter value used in the quantization algorithm. +Each data variable that has been quantized must include at least two attributes to describe the quantization. +First, all such data variables must have a `**quantization**` attribute containing the name of the quantization variable describing the algorithm. +Second, all such variables must record the specific parameter value used in the quantization algorithm. The input parameter for all quantization algorithms determines the precision preserved by the algorithm. + BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantizes the trailing bits. All data variables quantized by BitRound must record the NSB in the **`quantization_nsb`** attribute. Note that BitRound __counts only explicitly represented mantissa bits__. 
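As an illustration of the BitRound description in the hunk above, the following is a minimal Python sketch. It is not the libnetcdf or NCO implementation. The function name `bitround_float32` is hypothetical, and the sketch assumes single-precision input, round-half-to-even on the discarded bits, and that NaN and infinite values are passed through untouched. It shows that NSB counts only the explicitly stored mantissa bits, so retaining NSB bits zeroes the trailing `23 - NSB` bits of a `float`.

```python
import struct


def bitround_float32(value: float, nsb: int) -> float:
    """Keep `nsb` explicit mantissa bits of a float32 (hypothetical helper).

    Assumption: ties are rounded half-to-even; NaN and +/-inf are returned as-is.
    """
    if not 1 <= nsb <= 23:
        raise ValueError("nsb must satisfy 1 <= nsb <= 23 for float32")

    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]

    # An all-ones exponent marks NaN or infinity: leave those untouched.
    if (bits >> 23) & 0xFF == 0xFF:
        return value

    drop = 23 - nsb  # number of trailing mantissa bits to zero
    if drop > 0:
        half = 1 << (drop - 1)
        last_kept = (bits >> drop) & 1  # decides the direction of exact ties
        bits += half - 1 + last_kept    # round to nearest, ties to even
        bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # zero the discarded bits

    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

With `nsb = 9`, the value `1.0 + 2**-10` lies exactly halfway between two representable quantized values and rounds to `1.0`, while `1.0 + 2**-9` already fits in nine explicit mantissa bits and is returned unchanged. A carry out of the mantissa simply increments the exponent, which is the intended round-up to the next power of two.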
@@ -788,7 +789,7 @@ Both variables were quantized by the same algorithm and so utilize the same **`q This section briefly describes and contrasts each recognized **`quantize`** algorithm and points to further documentation. BitRound is also called the "round-to-nearest" method [<>] and the "half-to-even" method [<>]. -This is the default [<>] rounding method and is thus bias-free and conservative for random distributions of numbers. +This is the default [<>] rounding method and is bias-free and conservative for random distributions of numbers. BitRound is preferred when the number of significant bits (NSB) to retain is known. The other **`quantize`** algorithms guarantee to preserve a given number of significant (base-10 representation) digits (NSD). diff --git a/conformance.adoc b/conformance.adoc index e43c2b30..01cc8027 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -534,13 +534,12 @@ The requirements on all other bounds tie point variable attributes are the same *Requirements:* * Quantization variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. -* The value of **`algorithm`** must be **`bitround`**, **`bitgroom`**, **`digitround`**, or **`granular_bitround`**. -* The value of **`implementation`** is a free-form string that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. +* The value of **`algorithm`** must be one of the values permitted by this section. * Data variables that were quantized must have a string-valued attribute named **`quantization`**. -* The value of **`quantization`** is the name of the **`quantization`** container variable. +* The value of **`quantization`** must be the name of the quantization container variable which exists in the file. 
* Data variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (for **`algorithm = bitround`**) or **`quantization_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). -* **`quantization_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. -* **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. +* The value of **`quantization_nsb`** must be in the range **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. +* The value of **`quantization_nsd`** must be in the range **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. *Recommendations:* From d92e51de8372e7e0c139c81ea79c6d72cb25fa2c Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Wed, 26 Jun 2024 18:34:25 -0700 Subject: [PATCH 09/37] Fix fonts/wording in figure captions --- ch08.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ch08.adoc b/ch08.adoc index 5b2965e6..ba644ead 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -755,7 +755,7 @@ The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for ps:standard_name = "surface_air_pressure" ; ps:units = "Pa" ; ---- -Note how the same NSB is reported in two attributes of the data variable **`ps`**. The **`quantization`** container variable (**`quantization`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute [<>] which contains the same parameter value as the CF **`quantization_nsb`** attribute. +Note how the same NSB is reported in two attributes of the data variable **`ps`**. 
The quantization variable (**`quantization_info`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute [<>] which contains the same parameter value as the CF **`quantization_nsb`** attribute. ==== [[example-quantization-nsd-multiple-variables-nco]] @@ -781,7 +781,7 @@ Quantization of different variables to different levels often makes good scienti ts:quantization = "quantization_info" ; ts:quantization_nsd = 3 ; ---- -Both variables were quantized by the same algorithm and so utilize the same **`quantization`** variable. **`quantization`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. +Both variables were quantized by the same algorithm and so utilize the same quantization variable. **`quantization_info`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. 
==== [[quantization-algorithms-description, Section 8.4.3, "Description of Quantization Algorithms"]] From 1ce9eefd4278d33aeb01f2ade9fc89c3c266166a Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Thu, 27 Jun 2024 10:40:40 -0700 Subject: [PATCH 10/37] Update implementation attributes in example to realistic version numbers --- ch08.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ch08.adoc b/ch08.adoc index ba644ead..2e59cdef 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -746,7 +746,7 @@ The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for variables: char quantization_info ; quantization_info:algorithm = "bitround" ; - quantization_info:implementation = "libnetcdf version 4.9.3-development" ; + quantization_info:implementation = "libnetcdf version 4.9.2" ; float ps(time,lat,lon) ; ps:_QuantizeBitRoundNumberOfSignificantBits = 9 ; @@ -767,7 +767,7 @@ Quantization of different variables to different levels often makes good scienti variables: char quantization_info ; quantization_info:algorithm = "granular_bitround" ; - quantization_info:implementation = "NCO version 5.2.5-alpha01" ; + quantization_info:implementation = "NCO version 5.2.7" ; float ps(time,lat,lon) ; ps:standard_name = "surface_air_pressure" ; From 75b4f4938ce70eaa422cc1984ffbf85a5674a087 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 28 Jun 2024 13:44:00 -0700 Subject: [PATCH 11/37] Address second round of JGs suggestions --- appa.adoc | 6 +++--- ch08.adoc | 22 +++++++++------------- conformance.adoc | 3 +++ 3 files changed, 15 insertions(+), 16 deletions(-) diff --git a/appa.adoc b/appa.adoc index 6cb8418b..1eaa8ee9 100644 --- a/appa.adoc +++ b/appa.adoc @@ -41,7 +41,7 @@ In cases where there is a strong constraint on dataset size, it is allowed to pa | **`algorithm`** | S | Q -| <>, and <> +| <>, and <> | Name of the quantization algorithm employed. 
| **`ancillary_variables`** @@ -209,7 +209,7 @@ Use in conjunction with **`flag_meanings`**. | **`implementation`** | S | Q -| <>, and <> +| <>, and <> | The name and version of the library or client software that performed the quantization with **`algorithm`**. | **`instance_dimension`** @@ -315,7 +315,7 @@ Allowed for auxiliary coordinate variables but not allowed for coordinate variab | **`quantization`** | S | D -| <> +| <> | Identifies a variable that defines a quantization algorithm and its provenance. | **`quantization_nsb`** diff --git a/ch08.adoc b/ch08.adoc index 2e59cdef..dcc06e07 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -686,24 +686,20 @@ The quantized results are valid [<>] values---no special software or d Importantly, the quantized bits compress more efficiently than random bits. Thus quantization is sometimes referred to as a form of lossy compression although, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent compressor. -These CF conventions define a metadata framework to record quantization properties alongside quantized data variables. +The CF conventions of this section define a metadata framework to record quantization properties alongside quantized floating-point data variables. The goals are twofold. First, to inform interested users how, and to what degree, the quantized data differ from the original unquantized data, which are not stored in the dataset and may no longer exist. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that data producers expect from source models or measurements. -These conventions are intended to apply only to floating-point data variables. +Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. 
-Quantization is irreversible so data providers must exercise judgement about which fields to quantize, and to what level. -Observed and simulated geophysical fields always have finite accuracy and precision. -Domain specialists are best qualified to suggest appropriate quantization levels. -However, fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. -These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). +These conventions must not be used with data variables of integer type, or any other kind of CF variable. +This is because fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. +These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. -In general, we recommend against quantizing any **`coordinate variable`**, **`bounds`** variable, **`cell_measures`** variables, and any variables employed in **`formula_terms`**. -Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. 
-[[quantization-variable, Section 8.4.1, "Quantization Variable"]] -==== Quantization variable +[[quantization-variables, Section 8.4.1, "Quantization Variables"]] +==== Quantization variables A quantization variable describes a quantization algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. @@ -719,7 +715,7 @@ The second attribute required in a quantization variable is **`implementation`** This attribute contains free-form, unstandardized text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. **`implementation`** should include any other information required to disambiguate the source of the algorithm employed. -[[per-variable-quantization-attributes, Section 8.4.2, "Per-variable quantization attributes"]] +[[per-variable-quantization-attributes, Section 8.4.2, "Per-variable Quantization Attributes"]] ==== Per-variable quantization attributes Each data variable that has been quantized must include at least two attributes to describe the quantization. @@ -785,7 +781,7 @@ Both variables were quantized by the same algorithm and so utilize the same quan ==== [[quantization-algorithms-description, Section 8.4.3, "Description of Quantization Algorithms"]] -==== Description of Quantization Algorithms +==== Description of quantization algorithms This section briefly describes and contrasts each recognized **`quantize`** algorithm and points to further documentation. BitRound is also called the "round-to-nearest" method [<>] and the "half-to-even" method [<>]. diff --git a/conformance.adoc b/conformance.adoc index 01cc8027..2c3aa758 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -535,6 +535,7 @@ The requirements on all other bounds tie point variable attributes are the same * Quantization variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. 
* The value of **`algorithm`** must be one of the values permitted by this section. +* Only floating-point type data variables can be quantized. * Data variables that were quantized must have a string-valued attribute named **`quantization`**. * The value of **`quantization`** must be the name of the quantization container variable which exists in the file. * Data variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (for **`algorithm = bitround`**) or **`quantization_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). @@ -545,6 +546,8 @@ The requirements on all other bounds tie point variable attributes are the same * The value of **`implementation`** should be specified as _library_ _version_ _number_, e.g., **`libnetcdf version 4.9.2`** or as _client_ _version_ _number_, e.g., **`NCO version 5.2.6`**. +* Data variables that appear in **`formula_terms`** attributes should not be quantized + [[parametric-vertical-coordinates]] === Appendix D Parametric Vertical Coordinates From 010e87a9e73a432f49b8db65c6e4ac35484c526d Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Tue, 2 Jul 2024 14:07:26 -0700 Subject: [PATCH 12/37] implement third round of JG suggestions --- ch08.adoc | 1 + conformance.adoc | 7 +++---- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/ch08.adoc b/ch08.adoc index dcc06e07..f1a02339 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -697,6 +697,7 @@ These conventions must not be used with data variables of integer type, or any o This is because fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). 
Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. +For the same reason, it is recommended not to quantize any data variable which is referenced by a **`formula_terms`** attribute of any variable. [[quantization-variables, Section 8.4.1, "Quantization Variables"]] ==== Quantization variables diff --git a/conformance.adoc b/conformance.adoc index 2c3aa758..48867310 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -533,10 +533,9 @@ The requirements on all other bounds tie point variable attributes are the same *Requirements:* -* Quantization variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. +* Quantization container variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. * The value of **`algorithm`** must be one of the values permitted by this section. -* Only floating-point type data variables can be quantized. -* Data variables that were quantized must have a string-valued attribute named **`quantization`**. +* Only floating-point type data variables can be quantized. Quantized variables must have and are identified by having a string-valued attribute named **`quantization`**. * The value of **`quantization`** must be the name of the quantization container variable which exists in the file. * Data variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (for **`algorithm = bitround`**) or **`quantization_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). * The value of **`quantization_nsb`** must be in the range **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. 
@@ -546,7 +545,7 @@ The requirements on all other bounds tie point variable attributes are the same * The value of **`implementation`** should be specified as _library_ _version_ _number_, e.g., **`libnetcdf version 4.9.2`** or as _client_ _version_ _number_, e.g., **`NCO version 5.2.6`**. -* Data variables that appear in **`formula_terms`** attributes should not be quantized +* Data variables that appear in **`formula_terms`** attributes should not be quantized and therefore should not have a **`quantization`** attribute. [[parametric-vertical-coordinates]] === Appendix D Parametric Vertical Coordinates From 03fb0a7eb72f394a8ff71332c4c44459a5f4d5c5 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 12:35:18 -0700 Subject: [PATCH 13/37] Update conformance.adoc Co-authored-by: David Hassell --- conformance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conformance.adoc b/conformance.adoc index 48867310..baf23337 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -535,7 +535,7 @@ The requirements on all other bounds tie point variable attributes are the same * Quantization container variables must have two string-valued attributes, **`algorithm`** and **`implementation`**. * The value of **`algorithm`** must be one of the values permitted by this section. -* Only floating-point type data variables can be quantized. Quantized variables must have and are identified by having a string-valued attribute named **`quantization`**. +* Only floating-point type variables can be quantized. Quantized variables are identified by having a string-valued attribute named **`quantization`**. * The value of **`quantization`** must be the name of the quantization container variable which exists in the file. * Data variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (for **`algorithm = bitround`**) or **`quantization_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). 
 * The value of **`quantization_nsb`** must be in the range **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. From 6f138ad44d039aeccce74dac9b3edb00dbbc3a96 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:01:48 -0700 Subject: [PATCH 14/37] Update ch08.adoc Stylistic or formatting changes suggested by DH Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index f1a02339..af337092 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -680,7 +680,7 @@ For instance, a **`computational_precision**` value of **`"64"**` would specify Geoscientific models and measurements generate false floating-point precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and is scientifically pointless. -Quantization algorithms can eliminate false precision, usually by rounding the least significant bits of [<>] floating-point mantissas to zeros. +Quantization algorithms can eliminate false precision, usually by rounding the least significant bits of <> floating-point mantissas to zeros. (Quantization of integer types, although theoretically allowed, is not covered by this convention.) The quantized results are valid [<>] values---no special software or decoder is necessary to read them. Importantly, the quantized bits compress more efficiently than random bits. 
From 6b6c3298c4485d577f67a7591628540adcc99cd5 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:02:32 -0700 Subject: [PATCH 15/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index af337092..e996c20e 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -682,7 +682,7 @@ Geoscientific models and measurements generate false floating-point precision (s False precision can mislead (by implying noise is signal) and is scientifically pointless. Quantization algorithms can eliminate false precision, usually by rounding the least significant bits of <> floating-point mantissas to zeros. (Quantization of integer types, although theoretically allowed, is not covered by this convention.) -The quantized results are valid [<>] values---no special software or decoder is necessary to read them. +The quantized results are valid <> values---no special software or decoder is necessary to read them. Importantly, the quantized bits compress more efficiently than random bits. Thus quantization is sometimes referred to as a form of lossy compression although, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent compressor. From 761ba8b2ab7d617db3c26006d8337079c7c09461 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:04:29 -0700 Subject: [PATCH 16/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index e996c20e..e18d58d4 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -705,7 +705,7 @@ For the same reason, it is recommended not to quantize any data variable which i A quantization variable describes a quantization algorithm via a collection of attached attributes. It is of arbitrary type since it contains no data. Its purpose is to act as a container for the generic attributes of a quantization algorithm. 
-Quantization variables are required to have at least two attributes: **`algorithm`**, and **`implementation`**. +Quantization variables are required to have at least two attributes: **`algorithm`** and **`implementation`**. The **`algorithm`** attribute names a specific quantization algorithm. Four quantization algorithms are currently recognized: BitRound, BitGroom, DigitRound, and Granular BitRound. From 5a08a6ac8281fc4499850e990adad97ca1a22847 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:04:56 -0700 Subject: [PATCH 17/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index e18d58d4..f9b21b8d 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -798,7 +798,7 @@ Granular BitGroom and DigitRound are both good choices when the NSD to retain is The netCDF C and Fortran libraries can directly invoke BitRound, BitGroom, and Granular BitRound [<>]. The netCDF library attaches a long, system-defined attribute to every data variable that it quantizes, such as -**`_QuantizeBitRoundNumberOfSignificantBits = 9`** in Example 8.8. +**`_QuantizeBitRoundNumberOfSignificantBits = 9`** in <>. The leading underscore indicates that the netCDF library wrote this attribute [<>]. Any data variable that has the library-defined attribute should, in addition, contain the corresponding CF metadata. Example 8.9 shows how the CF metadata might appear for other (non-netCDF library) implementations of **`quantize`** algorithms. 
From 03e37717db15bb53dbd6c5f98eedd0751213492e Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:05:32 -0700 Subject: [PATCH 18/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index f9b21b8d..5f334a74 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -752,7 +752,9 @@ The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for ps:standard_name = "surface_air_pressure" ; ps:units = "Pa" ; ---- -Note how the same NSB is reported in two attributes of the data variable **`ps`**. The quantization variable (**`quantization_info`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute [<>] which contains the same parameter value as the CF **`quantization_nsb`** attribute. +Note how the same NSB is reported in two attributes of the data variable **`ps`**. +The quantization variable (**`quantization_info`**) **`implementation`** attribute reveals that the netCDF library applied the BitRound algorithm. +The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignificantBits`** attribute <> which contains the same parameter value as the CF **`quantization_nsb`** attribute (see the main text for further details). 
==== [[example-quantization-nsd-multiple-variables-nco]] From 0bc82a14643de59903a39e07378be6f8d0427db9 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:07:12 -0700 Subject: [PATCH 19/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 5f334a74..13a67958 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -695,7 +695,7 @@ Use of these conventions ensures that all quantized variables are clearly marked These conventions must not be used with data variables of integer type, or any other kind of CF variable. This is because fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. -These fields can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**) and properties derived from these coordinates (e.g., **`area`**, **`volume`**). +These variables can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**), properties derived from these coordinates (e.g., **`area`**, **`volume`**), and variables referenced by the **`formula_terms`** attribute of a coordinate variable. Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. For the same reason, it is recommended not to quantize any data variable which is referenced by a **`formula_terms`** attribute of any variable. 
From 05ac07a8e73af8813a286b227c890a1f9d51e3f4 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 13:35:01 -0700 Subject: [PATCH 20/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 13a67958..1b4300ae 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -780,7 +780,9 @@ Quantization of different variables to different levels often makes good scienti ts:quantization = "quantization_info" ; ts:quantization_nsd = 3 ; ---- -Both variables were quantized by the same algorithm and so utilize the same quantization variable. **`quantization_info`** reveals that the Granular BitRound algorithm in NCO performed the quantization. Since the netCDF library did not perform the quantization, there is no system-defined long quantization attribute. +Both variables were quantized by the same algorithm and so utilize the same quantization variable. +**`quantization_info`** reveals that the Granular BitRound algorithm in NCO performed the quantization. +Since the netCDF library did not perform the quantization, there is no system-defined underscored quantization attribute. ==== [[quantization-algorithms-description, Section 8.4.3, "Description of Quantization Algorithms"]] From de29a87d08129d488a3554310f1826ce44a44ce0 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:21:04 -0700 Subject: [PATCH 21/37] Update conformance.adoc Co-authored-by: David Hassell --- conformance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conformance.adoc b/conformance.adoc index baf23337..d0e9790a 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -537,7 +537,7 @@ The requirements on all other bounds tie point variable attributes are the same * The value of **`algorithm`** must be one of the values permitted by this section. * Only floating-point type variables can be quantized. 
Quantized variables are identified by having a string-valued attribute named **`quantization`**. * The value of **`quantization`** must be the name of the quantization container variable which exists in the file. -* Data variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (for **`algorithm = bitround`**) or **`quantization_nsd`** (for **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). +* Variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (if the corresponding quantization variable has the **`algorithm`** attribute value **`bitround`**) or **`quantization_nsd`** (if the corresponding quantization variable has one of the **`algorithm`** attribute values **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). * The value of **`quantization_nsb`** must be in the range **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. * The value of **`quantization_nsd`** must be in the range **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. From 0d8354e80aecfcd92970535db92a0d209c1be2df Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:22:33 -0700 Subject: [PATCH 22/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 1b4300ae..88eb474f 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -804,5 +804,5 @@ The netCDF C and Fortran libraries can directly invoke BitRound, BitGroom, and G The netCDF library attaches a long, system-defined attribute to every data variable that it quantizes, such as **`_QuantizeBitRoundNumberOfSignificantBits = 9`** in <>. The leading underscore indicates that the netCDF library wrote this attribute [<>]. 
-Any data variable that has the library-defined attribute should, in addition, contain the corresponding CF metadata. +Any variable that has the library-defined attribute must, in addition, contain the corresponding CF metadata. Example 8.9 shows how the CF metadata might appear for other (non-netCDF library) implementations of **`quantize`** algorithms. From 7cc0412fb0599de4b2283270184bfa0776e5eb48 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:25:04 -0700 Subject: [PATCH 23/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 88eb474f..701a4bf1 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -694,7 +694,7 @@ These conventions also allow users to better understand the precision that data Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. These conventions must not be used with data variables of integer type, or any other kind of CF variable. -This is because fields that describe idealized or reference coordinate grids, or grid transformations, are often known to the highest precision possible. +This is because variables that describe metadata are often known to the highest precision possible, and degrading the precision of metadata properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks. These variables can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**), properties derived from these coordinates (e.g., **`area`**, **`volume`**), and variables referenced by the **`formula_terms`** attribute of a coordinate variable. 
Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. For the same reason, it is recommended not to quantize any data variable which is referenced by a **`formula_terms`** attribute of any variable. From 9ccd62a156e275a62880983c9b3a620b648802c2 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:27:50 -0700 Subject: [PATCH 24/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 701a4bf1..24c9a37f 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -713,7 +713,7 @@ The controlled vocabulary these algorithms thus consists of **`bitround`**, **`b See <> for a brief summary of these algorithms. The second attribute required in a quantization variable is **`implementation`**. -This attribute contains free-form, unstandardized text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, and the software version. +This attribute contains free-form, unstandardized text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and any other information required to disambiguate the source of the algorithm employed. **`implementation`** should include any other information required to disambiguate the source of the algorithm employed. 
[[per-variable-quantization-attributes, Section 8.4.2, "Per-variable Quantization Attributes"]] From dd47793a2983a89c1ced5630075f6acffd7e9692 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:29:04 -0700 Subject: [PATCH 25/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 24c9a37f..866fac26 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -714,7 +714,6 @@ See <> for a brief summary of these algorit The second attribute required in a quantization variable is **`implementation`**. This attribute contains free-form, unstandardized text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and any other information required to disambiguate the source of the algorithm employed. -**`implementation`** should include any other information required to disambiguate the source of the algorithm employed. [[per-variable-quantization-attributes, Section 8.4.2, "Per-variable Quantization Attributes"]] ==== Per-variable quantization attributes From 91bb467e12ce77dc5e487e6dd495e9fe5035f84c Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:30:27 -0700 Subject: [PATCH 26/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 866fac26..44d3fda9 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -726,7 +726,7 @@ The input parameter for all quantization algorithms determines the precision pre BitRound retains the specified number of significant bits (NSB) in the IEEE mantissa, and quantizes the trailing bits. All data variables quantized by BitRound must record the NSB in the **`quantization_nsb`** attribute. Note that BitRound __counts only explicitly represented mantissa bits__. 
-It does not include the most-significant-bit with value 1 that implicitly begins all [<>] mantissas. +It does not include the most-significant-bit with value 1 that implicitly begins all <> mantissas. Thus **`quantization_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. The BitGroom, Granular BitGroom, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. From 93ad1095bb6312f59d6ea13a002c965f40d46b40 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:31:38 -0700 Subject: [PATCH 27/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 44d3fda9..d5f21c9f 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -794,7 +794,7 @@ BitRound is preferred when the number of significant bits (NSB) to retain is kno The other **`quantize`** algorithms guarantee to preserve a given number of significant (base-10 representation) digits (NSD). Their quantization errors never exceed half of the unit value at the NSD decimal place [<>]. -BitGroom [<>] appeared first, though is now known to be suboptimal in accuracy [<>] and in compressibility compared to later methods. +BitGroom <> appeared first, though is now known to be suboptimal in accuracy <> and in compressibility compared to later methods. DigitRound [<>] has superior compressibility for a given NSD compared to BitGroom. Granular BitGroom combines the DigitRound approach for compressibility with the BitRound approach for quantization. Granular BitGroom and DigitRound are both good choices when the NSD to retain is known. 
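The NSB counting rule in the hunk above (only explicitly stored mantissa bits are counted, so `1 <= NSB <= 23` for a 32-bit float) can be illustrated with a minimal Python sketch. It rounds to the nearest representable value with ties going to the even mantissa, which is how the text characterizes BitRound. The function name `bitround_float32` is hypothetical, and the code is not taken from the netCDF library, NCO, or any other implementation. It assumes only the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 explicit mantissa bits).

```python
import struct


def bitround_float32(value: float, nsb: int) -> float:
    """Round `value` (as a 32-bit float) to `nsb` explicit mantissa bits.

    Illustrative sketch only: round-to-nearest, ties-to-even, applied
    directly to the IEEE 754 binary32 bit pattern.
    """
    if not 1 <= nsb <= 23:
        raise ValueError("nsb must satisfy 1 <= nsb <= 23 for 32-bit floats")

    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]

    # Leave NaN and infinity (exponent field all ones) untouched.
    if (bits >> 23) & 0xFF == 0xFF:
        return struct.unpack("<f", struct.pack("<I", bits))[0]

    drop = 23 - nsb  # number of trailing mantissa bits to zero out
    if drop > 0:
        half = 1 << (drop - 1)          # half of the last retained bit
        lsb = (bits >> drop) & 1        # last retained bit, used to break ties
        bits = (bits + half - 1 + lsb) & 0xFFFFFFFF  # carry may bump the exponent
        bits &= ~((1 << drop) - 1) & 0xFFFFFFFF      # zero the trailing bits

    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    # NSB = 9 matches the value used for `ps` in the patch's example.
    print(bitround_float32(101325.0, 9))
```

The result is still an ordinary 32-bit float, so no decoder is needed to read it. The zeroed trailing bits are what a subsequent lossless compressor can exploit.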
From 5abbb1b017ff66178df8d0c0e5f7befb0981a2ab Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:32:23 -0700 Subject: [PATCH 28/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index d5f21c9f..c3bc67e7 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -793,7 +793,7 @@ This is the default [<>] rounding method and is bias-free and conserva BitRound is preferred when the number of significant bits (NSB) to retain is known. The other **`quantize`** algorithms guarantee to preserve a given number of significant (base-10 representation) digits (NSD). -Their quantization errors never exceed half of the unit value at the NSD decimal place [<>]. +Their quantization errors never exceed half of the unit value at the NSD decimal place <>. BitGroom <> appeared first, though is now known to be suboptimal in accuracy <> and in compressibility compared to later methods. DigitRound [<>] has superior compressibility for a given NSD compared to BitGroom. Granular BitGroom combines the DigitRound approach for compressibility with the BitRound approach for quantization. From afb0fe5e8fff9eee401e72196bfea90b4f944063 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:33:01 -0700 Subject: [PATCH 29/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index c3bc67e7..6990cfba 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -788,7 +788,7 @@ Since the netCDF library did not perform the quantization, there is no system-de ==== Description of quantization algorithms This section briefly describes and contrasts each recognized **`quantize`** algorithm and points to further documentation. -BitRound is also called the "round-to-nearest" method [<>] and the "half-to-even" method [<>]. 
+BitRound is also called the "round-to-nearest" method <> and the "half-to-even" method <>. This is the default [<>] rounding method and is bias-free and conservative for random distributions of numbers. BitRound is preferred when the number of significant bits (NSB) to retain is known. From e767aae06cc304d39c1bed04e3daddc4861da761 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:33:52 -0700 Subject: [PATCH 30/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 6990cfba..fbe259ac 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -789,7 +789,7 @@ Since the netCDF library did not perform the quantization, there is no system-de This section briefly describes and contrasts each recognized **`quantize`** algorithm and points to further documentation. BitRound is also called the "round-to-nearest" method <> and the "half-to-even" method <>. -This is the default [<>] rounding method and is bias-free and conservative for random distributions of numbers. +This is the default <> rounding method and is bias-free and conservative for random distributions of numbers. BitRound is preferred when the number of significant bits (NSB) to retain is known. The other **`quantize`** algorithms guarantee to preserve a given number of significant (base-10 representation) digits (NSD). From ef9134430e1027ae54adbeb49816b8cb33775fa7 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:34:50 -0700 Subject: [PATCH 31/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index fbe259ac..d3d67f8f 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -795,7 +795,7 @@ BitRound is preferred when the number of significant bits (NSB) to retain is kno The other **`quantize`** algorithms guarantee to preserve a given number of significant (base-10 representation) digits (NSD). 
Their quantization errors never exceed half of the unit value at the NSD decimal place <>. BitGroom <> appeared first, though is now known to be suboptimal in accuracy <> and in compressibility compared to later methods. -DigitRound [<>] has superior compressibility for a given NSD compared to BitGroom. +DigitRound <> has superior compressibility for a given NSD compared to BitGroom. Granular BitGroom combines the DigitRound approach for compressibility with the BitRound approach for quantization. Granular BitGroom and DigitRound are both good choices when the NSD to retain is known. From ac96f722b6b9e23db78a0bdd84ad4b5b756e6f5e Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Fri, 26 Jul 2024 15:35:25 -0700 Subject: [PATCH 32/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index d3d67f8f..65aabc5e 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -802,6 +802,6 @@ Granular BitGroom and DigitRound are both good choices when the NSD to retain is The netCDF C and Fortran libraries can directly invoke BitRound, BitGroom, and Granular BitRound [<>]. The netCDF library attaches a long, system-defined attribute to every data variable that it quantizes, such as **`_QuantizeBitRoundNumberOfSignificantBits = 9`** in <>. -The leading underscore indicates that the netCDF library wrote this attribute [<>]. +The leading underscore indicates that the netCDF library wrote this attribute <>. Any variable that has the library-defined attribute must, in addition, contain the corresponding CF metadata. Example 8.9 shows how the CF metadata might appear for other (non-netCDF library) implementations of **`quantize`** algorithms. 
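The requirements this patch series adds to conformance.adoc could be checked along the following lines. This is a sketch under stated assumptions: the function `check_quantized_variable` and the plain-dictionary representation of attributes are hypothetical, chosen so that the example needs no netCDF reader. The vocabulary and the NSB/NSD limits are copied from the requirement text in the patches.

```python
# Per-variable parameter attribute required by each recognized algorithm.
ALGORITHM_PARAMETER = {
    "bitround": "quantization_nsb",
    "bitgroom": "quantization_nsd",
    "digitround": "quantization_nsd",
    "granular_bitround": "quantization_nsd",
}

# Upper limits quoted in the conformance requirements (lower limit is 1).
UPPER_LIMIT = {
    ("quantization_nsb", "float"): 23,
    ("quantization_nsb", "real"): 23,
    ("quantization_nsb", "double"): 52,
    ("quantization_nsd", "float"): 7,
    ("quantization_nsd", "real"): 7,
    ("quantization_nsd", "double"): 15,
}


def check_quantized_variable(var_type, var_attrs, containers):
    """Return a list of problems found for one quantized variable.

    var_type   -- CDL type name of the variable, e.g. "float"
    var_attrs  -- dict of the variable's attributes
    containers -- dict mapping variable names to their attribute dicts
    """
    if var_type not in ("float", "real", "double"):
        return ["only floating-point variables can be quantized"]

    name = var_attrs.get("quantization")
    if not isinstance(name, str):
        return ["missing string-valued 'quantization' attribute"]

    container = containers.get(name)
    if container is None:
        return [f"quantization variable '{name}' does not exist"]

    problems = []
    if not isinstance(container.get("implementation"), str):
        problems.append("quantization variable lacks a string 'implementation'")

    algorithm = container.get("algorithm")
    if algorithm not in ALGORITHM_PARAMETER:
        problems.append(f"unrecognized algorithm: {algorithm!r}")
        return problems  # cannot tell which parameter attribute to expect

    param = ALGORITHM_PARAMETER[algorithm]
    value = var_attrs.get(param)
    if isinstance(value, bool) or not isinstance(value, int):
        problems.append(f"missing integer attribute '{param}'")
    elif not 1 <= value <= UPPER_LIMIT[(param, var_type)]:
        problems.append(f"'{param}' = {value} is out of range for {var_type}")
    return problems
```

A real checker would read the attributes from the file and report against the numbered conformance items. Returning plain strings keeps the sketch short.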
From f4f80a4289b8a548222f639710f0c0fa65957b0d Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Tue, 30 Jul 2024 09:02:52 -0700 Subject: [PATCH 33/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 65aabc5e..e8d70184 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -696,7 +696,6 @@ Use of these conventions ensures that all quantized variables are clearly marked These conventions must not be used with data variables of integer type, or any other kind of CF variable. This is because variables that describe metadata are often known to the highest precision possible, and degrading the precision of metadata properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks. These variables can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**), properties derived from these coordinates (e.g., **`area`**, **`volume`**), and variables referenced by the **`formula_terms`** attribute of a coordinate variable. -Degrading the precision of such grid properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks which should generally be performed with the highest precision possible. For the same reason, it is recommended not to quantize any data variable which is referenced by a **`formula_terms`** attribute of any variable. 
[[quantization-variables, Section 8.4.1, "Quantization Variables"]] From 9215cde3d2213cb52861260c4d4b9ddbc5461bee Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Tue, 30 Jul 2024 14:53:40 -0700 Subject: [PATCH 34/37] Update ch08.adoc Co-authored-by: David Hassell --- ch08.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index e8d70184..37377520 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -696,7 +696,6 @@ Use of these conventions ensures that all quantized variables are clearly marked These conventions must not be used with data variables of integer type, or any other kind of CF variable. This is because variables that describe metadata are often known to the highest precision possible, and degrading the precision of metadata properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks. These variables can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**), properties derived from these coordinates (e.g., **`area`**, **`volume`**), and variables referenced by the **`formula_terms`** attribute of a coordinate variable. -For the same reason, it is recommended not to quantize any data variable which is referenced by a **`formula_terms`** attribute of any variable. [[quantization-variables, Section 8.4.1, "Quantization Variables"]] ==== Quantization variables From b950ad248b4dc929ed55ee8660e83478751c1099 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Tue, 30 Jul 2024 14:59:14 -0700 Subject: [PATCH 35/37] Remove overly meta sentence flagged by DH. --- ch08.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/ch08.adoc b/ch08.adoc index 37377520..6ee61451 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -691,7 +691,6 @@ The goals are twofold. 
First, to inform interested users how, and to what degree, the quantized data differ from the original unquantized data, which are not stored in the dataset and may no longer exist. Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that data producers expect from source models or measurements. -Use of these conventions ensures that all quantized variables are clearly marked as such, and thus alerts users to cases where these guidelines have not been followed. These conventions must not be used with data variables of integer type, or any other kind of CF variable. This is because variables that describe metadata are often known to the highest precision possible, and degrading the precision of metadata properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks. From b467fcf2b9165f5151488d7c8d78b77cd498eae4 Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Tue, 13 Aug 2024 22:19:49 -0700 Subject: [PATCH 36/37] Implement latest suggestions by JG to resolve DH review points regarding quantization of ancillary/domain variables, and the form of the implementation attribute. Plus miscellaneous fixes. --- ch08.adoc | 19 +++++++++++-------- conformance.adoc | 10 ++++------ 2 files changed, 15 insertions(+), 14 deletions(-) diff --git a/ch08.adoc b/ch08.adoc index 6ee61451..66cc1d0f 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -692,8 +692,9 @@ First, to inform interested users how, and to what degree, the quantized data di Second, to provide the necessary provenance metadata for users to reproduce the data transformations on the same or other raw data. These conventions also allow users to better understand the precision that data producers expect from source models or measurements. 
-These conventions must not be used with data variables of integer type, or any other kind of CF variable. -This is because variables that describe metadata are often known to the highest precision possible, and degrading the precision of metadata properties may have unintended side effects on the accuracy of subsequent operations such regridding, interpolation, and conservation checks. +These conventions must not be used with data variables of integer type. +They must not be used with any variable, even if it is also a data variable, that serves as a coordinate variable, or is named by a **`coordinates`**, **`formula_terms`** or **`cell_measures`** attribute of any other variable. +This is because variables that provide metadata or are used in computation of domain metrics are often known to the highest precision possible, and degrading the precision of metadata properties may have unintended side effects on the accuracy of subsequent operations such as regridding, interpolation, and conservation checks. These variables can include spatial and temporal coordinate variables (e.g., **`latitude`**, **`longitude`**, **`level`**, **`time`**), properties derived from these coordinates (e.g., **`area`**, **`volume`**), and variables referenced by the **`formula_terms`** attribute of a coordinate variable. [[quantization-variables, Section 8.4.1, "Quantization Variables"]] @@ -710,7 +711,9 @@ The controlled vocabulary these algorithms thus consists of **`bitround`**, **`b See <> for a brief summary of these algorithms. The second attribute required in a quantization variable is **`implementation`**. -This attribute contains free-form, unstandardized text that concisely conveys the algorithm provenance, including the name of the library or client that performed the quantization, the software version, and any other information required to disambiguate the source of the algorithm employed. 
+This attribute contains unstandardized text that concisely conveys the algorithm provenance including the name of the library or client that performed the quantization, the software version, and any other information required to disambiguate the source of the algorithm employed. +The text must take the form "_software-name_ version _version-string_ [( _optional-information_ )]" such as +**`libnetcdf version 4.9.2`** in <>. [[per-variable-quantization-attributes, Section 8.4.2, "Per-variable Quantization Attributes"]] ==== Per-variable quantization attributes @@ -726,9 +729,9 @@ Note that BitRound __counts only explicitly represented mantissa bits__. It does not include the most-significant-bit with value 1 that implicitly begins all <> mantissas. Thus **`quantization_nsb`** is an integer type attribute with **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. -The BitGroom, Granular BitGroom, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. +The BitGroom, Granular BitRound, and DigitRound algorithms guarantee preservation of a specified number of significant digits (NSD) in base 10 representation. The actual number of mantissa bits quantized depends on the algorithm. -Thus all data variables quantized by BitGroom, Granular BitGroom, or DigitRound must have a corresponding attribute **`quantization_nsd`**. +Thus all data variables quantized by BitGroom, Granular BitRound, or DigitRound must have a corresponding attribute **`quantization_nsd`**. The value of **`quantization_nsd`** is an integer with **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. [[example-quantization-nsb-libnetcdf]] @@ -755,7 +758,7 @@ The netCDF library wrote the system-defined **`_QuantizeBitRoundNumberOfSignific [[example-quantization-nsd-multiple-variables-nco]] [caption="Example 8.9. 
"] -.Quantization performed by Granular BitGroom algorithm in NCO +.Quantization performed by Granular BitRound algorithm in NCO ==== Quantization of different variables to different levels often makes good scientific sense. Here the pressure variable **`ps`** has four significant digits of precision while the temperature variable **`ts`** retains only three significant digits. ---- @@ -793,8 +796,8 @@ The other **`quantize`** algorithms guarantee to preserve a given number of sign Their quantization errors never exceed half of the unit value at the NSD decimal place <>. BitGroom <> appeared first, though is now known to be suboptimal in accuracy <> and in compressibility compared to later methods. DigitRound <> has superior compressibility for a given NSD compared to BitGroom. -Granular BitGroom combines the DigitRound approach for compressibility with the BitRound approach for quantization. -Granular BitGroom and DigitRound are both good choices when the NSD to retain is known. +Granular BitRound combines the DigitRound approach for compressibility with the BitRound approach for quantization. +Granular BitRound and DigitRound are both good choices when the NSD to retain is known. The netCDF C and Fortran libraries can directly invoke BitRound, BitGroom, and Granular BitRound [<>]. 
The netCDF library attaches a long, system-defined attribute to every data variable that it quantizes, such as diff --git a/conformance.adoc b/conformance.adoc index 8f4fb4b3..4e490ab9 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -548,12 +548,10 @@ The requirements on all other bounds tie point variable attributes are the same * Variables that were quantized must have an integer type attribute named either **`quantization_nsb`** (if the corresponding quantization variable has the **`algorithm`** attribute value **`bitround`**) or **`quantization_nsd`** (if the corresponding quantization variable has one of the **`algorithm`** attribute values **`bitgroom`**, **`digitround`**, or **`granular_bitround`**). * The value of **`quantization_nsb`** must be in the range **`1 \<= NSB \<= 23`** for data type **`float`** or **`real`**, and **`1 \<= NSB \<= 52`** for data type **`double`**. * The value of **`quantization_nsd`** must be in the range **`1 \<= NSD \<= 7`** for data type **`float`** or **`real`**, and **`1 \<= NSD \<= 15`** for data type **`double`**. - -*Recommendations:* - -* The value of **`implementation`** should be specified as _library_ _version_ _number_, e.g., **`libnetcdf version 4.9.2`** or as _client_ _version_ _number_, e.g., **`NCO version 5.2.6`**. - -* Data variables that appear in **`formula_terms`** attributes should not be quantized and therefore should not have a **`quantization`** attribute. +* Variables that serve as a coordinate variable, or are named by a **`coordinates`**, **`formula_terms`**, or **`cell_measures`** attribute of any other variable must not have a **`quantization`** attribute. +* The value of **`implementation`** must take the form +"_software-name_ version _version-string_ [( _optional-information_ )]", +where brackets indicate optional words.
[[parametric-vertical-coordinates]] === Appendix D Parametric Vertical Coordinates From 85a57c12aed847af9bed29d8ab6d9a1f8a294c4b Mon Sep 17 00:00:00 2001 From: Charlie Zender Date: Wed, 14 Aug 2024 14:08:00 -0700 Subject: [PATCH 37/37] Fix two small typos --- ch08.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ch08.adoc b/ch08.adoc index 66cc1d0f..6c387ad2 100644 --- a/ch08.adoc +++ b/ch08.adoc @@ -707,7 +707,7 @@ Quantization variables are required to have at least two attributes: **`algorith The **`algorithm`** attribute names a specific quantization algorithm. Four quantization algorithms are currently recognized: BitRound, BitGroom, DigitRound, and Granular BitRound. -The controlled vocabulary these algorithms thus consists of **`bitround`**, **`bitgroom`**, **`digitround`**, and **`granular_bitround`**. +The controlled vocabulary for these algorithms thus consists of **`bitround`**, **`bitgroom`**, **`digitround`**, and **`granular_bitround`**. See <> for a brief summary of these algorithms. The second attribute required in a quantization variable is **`implementation`**. @@ -719,7 +719,7 @@ The text must take the form "_software-name_ version _version-string_ [( _option ==== Per-variable quantization attributes Each data variable that has been quantized must include at least two attributes to describe the quantization. -First, all such data variables must have a `**quantization**` attribute containing the name of the quantization variable describing the algorithm. +First, all such data variables must have a **`quantization`** attribute containing the name of the quantization variable describing the algorithm. Second, all such variables must record the specific parameter value used in the quantization algorithm. The input parameter for all quantization algorithms determines the precision preserved by the algorithm.
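Reviewer's illustration, not part of the patch: the requirements stated in this series (the controlled vocabulary for **`algorithm`**, the form of **`implementation`**, and the NSB/NSD ranges per data type) can be sketched as a small metadata check. Everything below is a hypothetical helper rather than an existing conformance tool. In particular, the regular expression is only one plausible reading of "_software-name_ version _version-string_ [( _optional-information_ )]", and its whitespace handling is an assumption.

```python
import re

# Parameter attribute required by each algorithm in the controlled vocabulary.
ALGORITHM_PARAMETER = {
    "bitround": "quantization_nsb",
    "bitgroom": "quantization_nsd",
    "digitround": "quantization_nsd",
    "granular_bitround": "quantization_nsd",
}

# Upper bounds on the parameter value; the lower bound is always 1.
MAX_VALUE = {
    ("quantization_nsb", "float"): 23,
    ("quantization_nsb", "double"): 52,
    ("quantization_nsd", "float"): 7,
    ("quantization_nsd", "double"): 15,
}

# Assumed reading of the required form of the implementation attribute.
IMPLEMENTATION_RE = re.compile(r"(.+?) version (\S+)(?:\s+\(\s*(.+?)\s*\))?")


def check_quantization(algorithm, implementation, data_type, name, value):
    """Return a list of problems found; an empty list means no problem found."""
    problems = []
    expected = ALGORITHM_PARAMETER.get(algorithm)
    if expected is None:
        problems.append(f"unknown algorithm: {algorithm!r}")
    elif name != expected:
        problems.append(f"{algorithm} requires {expected}, found {name}")
    if IMPLEMENTATION_RE.fullmatch(implementation) is None:
        problems.append(f"implementation has unexpected form: {implementation!r}")
    kind = "float" if data_type in ("float", "real") else data_type
    upper = MAX_VALUE.get((name, kind))
    if upper is None:
        problems.append(f"no limit known for {name} on type {data_type}")
    elif not (isinstance(value, int) and 1 <= value <= upper):
        problems.append(f"{name} must be an integer in 1..{upper}")
    return problems


if __name__ == "__main__":
    print(check_quantization(
        "bitround", "libnetcdf version 4.9.2", "float", "quantization_nsb", 9))
```

The check takes plain values rather than reading a file, so it can be wired to whichever netCDF reader a project already uses.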