Unsigned byte according to CF conventions #493

cpaulik · 2015-12-01T10:00:55Z

Reading Section 2.2 of the CF conventions, I gather that only the np.byte datatype is allowed for byte data, but that if valid_range is given as np.ubyte values then the byte data should be interpreted as unsigned.
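To make the signed/unsigned distinction concrete, here is a minimal NumPy-only sketch. It assumes the variable comes back as an int8 array (the sample values are made up) and shows how the same bits read when reinterpreted as unsigned:

```python
import numpy as np

# Values as a signed-byte (np.byte / int8) variable would return them.
signed = np.array([-128, -1, 0, 127], dtype=np.int8)

# Reinterpret the same bits as unsigned; view() neither copies nor modifies the data.
unsigned = signed.view(np.uint8)

print(unsigned.tolist())  # [128, 255, 0, 127]
```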

This is currently not the case in this library. If we want to support this during auto_mask_and_scale then I'm happy to look into it.

jswhit · 2015-12-01T13:39:27Z

You can use unsigned data types when the file is created with the NETCDF4 flag, or even in the old format if the file is created using the new NETCDF3_64BIT_DATA flag available in version 1.2.2 (4.4.0 of the C lib). So, unless I'm missing something, it seems the CF section you refer to is out of date, and is trying to work around a problem that no longer exists.

jswhit · 2015-12-01T13:44:43Z

cf-convention/CF-2#3

jswhit · 2015-12-01T13:51:02Z

I guess I see your point though - there are probably a lot of files out there with byte variables that should be interpreted as unsigned. Are you proposing we check valid_range and return an np.uint8 array if valid_range indicates the data is unsigned?

Not sure where auto_mask_and_scale enters into it...

cpaulik · 2015-12-01T15:24:55Z

Writing ubyte is perfectly possible with the library. The problem I'm having is that I work in a project where the data has to be CF-compliant according to the Compliance Checker, so I cannot use ubyte even though it would be technically possible and sensible.

Long story short: Yes I do propose that we check valid_range, valid_min and valid_max attributes and return np.uint8 if these attributes have unsigned data type.

I guess the decision would be:
if both values of valid_range are uint8, or if both valid_min and valid_max are uint8, then return uint8.
Should this be the default, or is it too likely that this conversion would break somebody's code?

On second thought, auto_mask_and_scale does not really enter into it, since it only handles the offset and scale factor.
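The decision rule proposed in this comment could be sketched roughly as follows. This is only an illustration: `interpret_bytes` and `_all_uint8` are hypothetical helper names, and `attrs` is a plain dict standing in for a variable's attributes, not anything the library provides.

```python
import numpy as np

def _all_uint8(*values):
    """True if every given attribute value is present and has dtype uint8."""
    return all(v is not None and np.asarray(v).dtype == np.uint8 for v in values)

def interpret_bytes(data, attrs):
    """Return int8 `data` viewed as uint8 when the validity attributes are unsigned."""
    if data.dtype != np.int8:
        return data
    if (_all_uint8(attrs.get("valid_range"))
            or _all_uint8(attrs.get("valid_min"), attrs.get("valid_max"))):
        return data.view(np.uint8)
    return data

raw = np.array([-1, 0, 100], dtype=np.int8)

# valid_range stored as unsigned bytes: reinterpret the data.
out = interpret_bytes(raw, {"valid_range": np.array([0, 255], dtype=np.uint8)})
print(out.tolist())  # [255, 0, 100]

# Signed validity attributes: data is returned unchanged.
same = interpret_bytes(raw, {"valid_min": np.int8(-100), "valid_max": np.int8(100)})
print(same.dtype)  # int8
```

Note that the check looks only at the data type of the attributes, not at their values, which is the distinction the next comment asks about.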

jswhit · 2015-12-01T18:56:27Z

Do you want np.uint8 if valid_range is unsigned, or if valid_min > 0?
The only thing that concerns me about this is that up until now we have left metadata conventions to be the concern of downstream applications, and let netcdf4-python handle the general low-level detail of reading and writing. However, in this case I don't see the harm of returning an unsigned numpy array if the valid_min/valid_max indicate that the data should fit. What is the real benefit though - client code could always cast the array to uint as needed, right?

shoyer · 2015-12-02T18:35:26Z

> Long story short: Yes I do propose that we check valid_range, valid_min and valid_max attributes and return np.uint8 if these attributes have unsigned data type.

I am 👎 on returning data with a different data type to follow CF conventions, unless the option can be toggled on/off. For tools like xray, we definitely don't want to be using this behavior in netCDF4. My preference would be for such new behavior to be opt in, because it would be slightly trickier to disable it otherwise in a cross-version compatible manner.

> The only thing that concerns me about this is that up until now we have left metadata conventions to be the concern of downstream applications, and let netcdf4-python handle the general low-level detail of reading and writing.

This is exactly my perspective 👍.

ocefpaf · 2015-12-02T18:42:42Z

I am also a 👍 to leave any convention, beyond those that define what a netCDF file is, to the downstream applications.

cpaulik · 2015-12-02T19:33:45Z

That is fine with me. But the question is then which of the netCDF attribute conventions should be implemented.

Should we follow http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html ? If so, the section about Attribute Conventions specifies the same implicit conversion in its entry on signedness, which says:

> Deprecated attribute, originally designed to indicate whether byte values should be treated as signed or unsigned. The attributes valid_min and valid_max may be used for this purpose. For example, if you intend that a byte variable store only non-negative values, you can use valid_min = 0 and valid_max = 255

cpaulik · 2015-12-02T19:57:32Z

Anyway. If we want this then it should definitely be opt-in.

JohnLCaron · 2015-12-02T23:41:11Z

Generally the netcdf libraries don't use the attribute conventions (except for the reserved _ (underscore) attributes). It's up to the "user" to handle that. In Java we have a wrapper class that will take attribute conventions into account, so the user can get the data with or without them.
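For illustration, an opt-in wrapper along the lines discussed in this thread might look something like the sketch below. `ConventionAwareArray` and its `apply_conventions` flag are hypothetical names; this is not the netCDF-Java class or any netcdf4-python API, and `attrs` is again a plain dict standing in for variable attributes.

```python
import numpy as np

class ConventionAwareArray:
    """Hypothetical wrapper; not part of netcdf4-python or netCDF-Java."""

    def __init__(self, raw, attrs):
        self.raw = raw      # values exactly as stored
        self.attrs = attrs  # plain dict standing in for variable attributes

    def get(self, apply_conventions=False):
        """Return raw values unless the caller opts in to convention handling."""
        if not apply_conventions:
            return self.raw
        valid_range = self.attrs.get("valid_range")
        if (self.raw.dtype == np.int8 and valid_range is not None
                and np.asarray(valid_range).dtype == np.uint8):
            return self.raw.view(np.uint8)
        return self.raw

var = ConventionAwareArray(
    np.array([-56, 0, 56], dtype=np.int8),
    {"valid_range": np.array([0, 255], dtype=np.uint8)},
)
print(var.get().tolist())                        # [-56, 0, 56]
print(var.get(apply_conventions=True).tolist())  # [200, 0, 56]
```

Keeping the default off means existing callers see the stored values unchanged, and downstream tools can ignore the convention handling entirely.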

forman mentioned this issue Apr 11, 2016

Problem interpreting short arrays #554

Closed

deeplycloudy mentioned this issue May 4, 2017

Interpretation of reserved _Unsigned attribute written by netCDF-Java #656

Closed

cpaulik mentioned this issue Jun 1, 2017

Usage of valid_min, valid_max attributes for masking data #670

Closed

aleksandervines mentioned this issue Jan 11, 2018

Check CF-compliance of test files nansencenter/nansat#184

Closed
