This repository has been archived by the owner on Mar 6, 2023. It is now read-only.
Mike Taves edited this page Oct 24, 2019 · 9 revisions

NetCDF is a binary file format used to store multidimensional data. This document describes the conventions used for model interoperability.

File structure

General naming rules

NetCDF files are built from named components: dimensions, variables and attributes. Each name should normally use only lowercase letters (with a few exceptions, such as Conventions and _FillValue), digits and underscores, with no spaces, hyphens or other special characters. Essentially, names should look as they would in a programming language like Python. Note that names are case sensitive.
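As a rough illustration, the naming rule can be checked with a regular expression. The helper is_conventional_name is hypothetical. The sketch assumes names must start with a lowercase letter, because netCDF reserves names beginning with an underscore for system use (e.g. _FillValue). The exception set covers only the two exceptions this document mentions.

```python
import re

# Lowercase letter first, then lowercase letters, digits or underscores.
# Leading underscores are excluded: netCDF reserves them for system names.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")

# Exceptions named in this document; extend as the group agrees (assumption).
_ALLOWED_EXCEPTIONS = {"Conventions", "_FillValue"}


def is_conventional_name(name: str) -> bool:
    """Return True if name follows the naming rules (case sensitive)."""
    return name in _ALLOWED_EXCEPTIONS or _NAME_PATTERN.fullmatch(name) is not None
```

For example, "river_flow_rate" passes, while "River-Flow", "flow rate" and "2d_field" do not.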

Dimensions

Variables use dimensions to define their shape and size. Dimensions are generally ordered T, Z, Y, X.

Here is a list of common dimensions, in the order in which they should appear within variables:

  1. time - usually UNLIMITED, even if there is only one time step
  2. depth
  3. latitude / northing
  4. longitude / easting
  5. reach / id - or any other type of unique identifier
  6. Any other dimension, such as ensemble or scenario, which often has a size of one
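The recommended ordering can be expressed as a sort key. This is a minimal sketch. The helper order_dimensions and its rank table are hypothetical and built directly from the numbered list. Any dimension not listed (e.g. ensemble) is placed last and keeps its original relative position.

```python
# Rank of each common dimension, following the numbered list of dimensions.
# Alternative names (latitude/northing, longitude/easting, reach/id) share a rank.
_DIMENSION_RANK = {
    "time": 0,
    "depth": 1,
    "latitude": 2,
    "northing": 2,
    "longitude": 3,
    "easting": 3,
    "reach": 4,
    "id": 4,
}
_OTHER_RANK = 5  # any other dimension, e.g. ensemble or scenario


def order_dimensions(dims):
    """Return dims as a tuple sorted into the conventional order.

    sorted() is stable, so dimensions with equal rank (such as two
    unlisted ones) keep their original relative order.
    """
    return tuple(sorted(dims, key=lambda d: _DIMENSION_RANK.get(d, _OTHER_RANK)))
```

With the netCDF4 Python library, a tuple ordered this way is what you would pass as the dimensions argument of Dataset.createVariable.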

Variables

A netCDF file generally has a variable describing each dimension, often with the same name as the dimension (e.g. float time(time)). A variable whose only dimension shares its name is known as a coordinate variable.

Each variable has both dimensions and attributes (see "Dimensions" and "Attributes").
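To spot dimensions that lack a describing variable, a file's structure can be compared against the coordinate-variable pattern: a one-dimensional variable named after its dimension. The function below is a hypothetical sketch that works on plain Python mappings rather than an open file. With netCDF4, the keys of Dataset.dimensions and each variable's dimensions tuple would supply equivalent data.

```python
def missing_coordinate_variables(dimensions, variables):
    """Return the dimensions that have no coordinate variable.

    dimensions: iterable of dimension names.
    variables:  mapping of variable name -> sequence of its dimension names.
    A coordinate variable is taken to be a variable whose only dimension
    has the same name as the variable itself, e.g. float time(time).
    """
    return [dim for dim in dimensions if tuple(variables.get(dim, ())) != (dim,)]
```

For a file with dimensions time, reach and ensemble, where only time(time) and reach(reach) are defined, the function reports ensemble.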

Attributes

Each variable should have four attributes:

  • cdsm_name - the CDSM name, as agreed upon within the interoperable model group; must be unique within this file.
  • standard_name - the CF Standard Name, if one is defined; otherwise use "(no standard name)"
  • long_name - a descriptive name for the variable, for example "easting" or "evapotranspiration flux"
  • units - e.g. "m", "m3 s-1", "dimensionless", "hours since 1970-01-01 00:00:00" (see Time variable)
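The attribute rules can be checked mechanically. This is a sketch under assumptions: check_variable_attributes is a hypothetical helper, and each variable's attributes are given as a plain dict. It reports missing required attributes and any cdsm_name that appears on more than one variable in the file.

```python
REQUIRED_VARIABLE_ATTRIBUTES = ("cdsm_name", "standard_name", "long_name", "units")


def check_variable_attributes(variables):
    """Return a list of human-readable problems, empty if none were found.

    variables: mapping of variable name -> dict of that variable's attributes.
    Checks that every required attribute is present and that each
    cdsm_name is used by only one variable in the file.
    """
    problems = []
    first_user = {}  # cdsm_name -> name of the first variable using it
    for var_name, attrs in variables.items():
        for attr in REQUIRED_VARIABLE_ATTRIBUTES:
            if attr not in attrs:
                problems.append(f"{var_name}: missing attribute {attr!r}")
        cdsm_name = attrs.get("cdsm_name")
        if cdsm_name is None:
            continue  # already reported as missing above
        if cdsm_name in first_user:
            problems.append(
                f"{var_name}: cdsm_name {cdsm_name!r} already used by {first_user[cdsm_name]!r}"
            )
        else:
            first_user[cdsm_name] = var_name
    return problems
```

Note that "(no standard name)" counts as a present standard_name, matching the rule above.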

Data types, values, missing data

Data types in netCDF include external and user-defined types. External data types have names starting with "NC_" and correspond to types in other programming languages. For example, NC_INT is a 32-bit signed integer.
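Python's standard struct module can illustrate the sizes of the fixed-width external types. The mapping below is an assumption made for illustration, because struct format codes are not part of netCDF. It covers numeric types only; the unsigned and 64-bit integer types exist only in the netCDF-4 format. The ">" prefix is used to get standard, platform-independent sizes, and it also selects big-endian byte order.

```python
import struct

# Illustrative mapping from netCDF external numeric types to struct format codes.
# NC_CHAR and NC_STRING are omitted.
NC_TO_STRUCT = {
    "NC_BYTE": "b",    # 8-bit signed integer
    "NC_UBYTE": "B",   # 8-bit unsigned integer (netCDF-4)
    "NC_SHORT": "h",   # 16-bit signed integer
    "NC_USHORT": "H",  # 16-bit unsigned integer (netCDF-4)
    "NC_INT": "i",     # 32-bit signed integer
    "NC_UINT": "I",    # 32-bit unsigned integer (netCDF-4)
    "NC_INT64": "q",   # 64-bit signed integer (netCDF-4)
    "NC_UINT64": "Q",  # 64-bit unsigned integer (netCDF-4)
    "NC_FLOAT": "f",   # 32-bit IEEE floating point
    "NC_DOUBLE": "d",  # 64-bit IEEE floating point
}


def size_in_bytes(nc_type: str) -> int:
    """Size in bytes of one value of the given type."""
    return struct.calcsize(">" + NC_TO_STRUCT[nc_type])


def encode(nc_type: str, value) -> bytes:
    """Pack one value; struct.error is raised if an integer is out of range."""
    return struct.pack(">" + NC_TO_STRUCT[nc_type], value)
```

For example, size_in_bytes("NC_INT") gives 4, and encode("NC_INT", 2**31) raises struct.error because the value does not fit in a 32-bit signed integer.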

While not required, it is good practice to assign the following attributes to a variable of any data type:

  • _FillValue - a special value that indicates a missing value (e.g. NA or blank)
  • valid_min - lower limit for variable, e.g. 0
  • valid_max - upper limit for variable, e.g. 1e+6
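As a sketch of how a reader might apply these attributes, the function below replaces fill values and out-of-range values with None. Values outside valid_min/valid_max are treated as missing, as CF does. mask_values is hypothetical and works on plain Python lists. The netCDF4 library performs comparable masking by default when reading data (see its set_auto_mask option).

```python
import math


def _is_fill(value, fill_value) -> bool:
    """True if value equals fill_value, handling a NaN fill value."""
    if fill_value is None:
        return False
    if isinstance(fill_value, float) and math.isnan(fill_value):
        # NaN never compares equal to itself, so test with isnan instead.
        return isinstance(value, float) and math.isnan(value)
    return value == fill_value


def mask_values(values, fill_value=None, valid_min=None, valid_max=None):
    """Return a copy of values with missing or out-of-range entries set to None."""
    masked = []
    for v in values:
        out_of_range = (valid_min is not None and v < valid_min) or (
            valid_max is not None and v > valid_max
        )
        masked.append(None if _is_fill(v, fill_value) or out_of_range else v)
    return masked
```

For example, with _FillValue = -9999.0, valid_min = 0 and valid_max = 1e+6, the values [-9999.0, 0.5, -1.0, 2e6] become [None, 0.5, None, None].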

Time variable

Dates and times are stored as numbers relative to a reference time given in the units attribute. By agreement, the units are fixed at "hours since 1970-01-01 00:00:00", with the calendar attribute set to gregorian or standard (i.e. real dates). The data type can then be any numeric type.

To convert to/from these dates in Python, see netCDF4's date2num and num2date functions.
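Because the units are fixed, the conversion can also be sketched with Python's standard library alone. This sketch makes two assumptions. First, the reference time is UTC. Second, all dates fall after 1582-10-15, where Python's proleptic Gregorian calendar agrees with the standard calendar. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Reference time of the agreed units, assumed here to be UTC.
TIME_UNITS = "hours since 1970-01-01 00:00:00"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_hours(dt: datetime) -> float:
    """Convert a timezone-aware datetime to hours since the reference time."""
    # Subtracting a naive datetime from EPOCH raises TypeError by design.
    return (dt - EPOCH) / timedelta(hours=1)


def hours_to_date(hours: float) -> datetime:
    """Convert hours since the reference time back to a UTC datetime."""
    return EPOCH + timedelta(hours=hours)
```

For example, date_to_hours(datetime(1970, 1, 2, 6, tzinfo=timezone.utc)) gives 30.0.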

Cell methods

The cell_methods attribute describes how the cell values of a field represent it (see CF conventions for cell methods). Its value is a string of the form name: method. For example, the cell_methods of variable river_flow_rate(time,site) can be time: mean, indicating that each value of river_flow_rate is the mean river flow rate over the given time period. See Cell Methods for more methods.
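A simplified parser can turn a cell_methods string into (names, method) pairs. This is a hypothetical sketch. It handles one or more name: method pairs, including several names sharing one method (e.g. lat: lon: mean), and discards parenthesised comments. It does not interpret the where, over or within clauses that CF also allows.

```python
import re

# One or more "name:" tokens followed by a method word,
# e.g. "time: mean" or "lat: lon: mean".
_PAIR = re.compile(r"((?:\w+:\s*)+)(\w+)")


def parse_cell_methods(value: str) -> list[tuple[list[str], str]]:
    """Return (names, method) pairs from a cell_methods string (simplified)."""
    # Drop parenthesised comments such as "(interval: 1 hr)" first,
    # so their "name: value" text is not mistaken for a cell method.
    without_comments = re.sub(r"\([^)]*\)", "", value)
    return [
        (re.findall(r"(\w+):", names), method)
        for names, method in _PAIR.findall(without_comments)
    ]
```

For the example above, parse_cell_methods("time: mean") returns [(["time"], "mean")].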

Global attributes

Global attributes describe general information. A minimal set of global attributes should include:

  • title - a short description of the file contents, e.g. "simulated streamflow"
  • institution - e.g. "NIWA"
  • Conventions (note the uppercase "C") - normally "CF-1.7"
  • source - normally the name of the simulation software
  • comments - any relevant notes on the data which could be useful
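The minimal set of global attributes can be checked the same way. missing_global_attributes is a hypothetical helper that takes a plain dict. With netCDF4, such a dict could be built as {name: ds.getncattr(name) for name in ds.ncattrs()}. Because names are case sensitive, "conventions" does not satisfy "Conventions".

```python
# Minimal set of global attributes listed in this section (case sensitive).
MINIMAL_GLOBAL_ATTRIBUTES = ("title", "institution", "Conventions", "source", "comments")


def missing_global_attributes(attrs):
    """Return the minimal global attributes that are absent or blank in attrs."""
    return [
        name
        for name in MINIMAL_GLOBAL_ATTRIBUTES
        if not str(attrs.get(name, "")).strip()
    ]
```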

Resources