Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsigned byte according to CF conventions #493

Open
cpaulik opened this issue Dec 1, 2015 · 10 comments
Open

Unsigned byte according to CF conventions #493

cpaulik opened this issue Dec 1, 2015 · 10 comments

Comments

@cpaulik
Copy link
Contributor

cpaulik commented Dec 1, 2015

Reading Section 2.2 of the CF conventions I gather that I can only use np.byte datatypes but if valid_range is using np.ubyte values then the byte data should be interpreted as unsigned.

This is currently not the case in this library. If we want to support this during auto_mask_and_scale then I'm happy to look into it.

@jswhit
Copy link
Collaborator

jswhit commented Dec 1, 2015

You can use unsigned data types when the file is created with the NETCDF4 flag, or even in the old format if the file is created using the new NETCDF3_64BIT_DATA flag available in version 1.2.2 (4.4.0 of the C lib). So, unless I'm missing something, it seems the CF section you refer to is out of date, and is trying to work around a problem that no longer exists.

@jswhit
Copy link
Collaborator

jswhit commented Dec 1, 2015

cf-convention/CF-2#3

@jswhit
Copy link
Collaborator

jswhit commented Dec 1, 2015

I guess I see your point though - there are probably a lot of files out there with byte variables that should be interpreted as unsigned. Are you proposing we check valid_range and returned an np.uint8 array if valid_range indicates the data is unsigned?

Not sure where auto_mask_and_scale enters into it...

@cpaulik
Copy link
Contributor Author

cpaulik commented Dec 1, 2015

Writing ubyte is very possible with the library. The problem I'm having is that I work in a project where the data has to be CF-conform according to the Compliance Checker so I can not use ubyte even though it would be technically possible and sensible.

Long story short: Yes I do propose that we check valid_range, valid_min and valid_max attributes and return np.uint8 if these attributes have unsigned data type.

I guess the decision would be:
if both attributes of valid_range are uint8 or if both valid_min and valid_max are uint8 then return uint8
Should this be the default or is it too likely that this conversion would break somebodies code?

On second thought auto_mask_and_scale does not really enter into it since it is only for offset and scale factor.

@jswhit
Copy link
Collaborator

jswhit commented Dec 1, 2015

Do you want np.uint8 if if valid_range is unsigned, or if valid_min > 0?
The only thing that concerns me about this is that up until now we have left metadata conventions to be the concern of downstream applications, and let netcdf4-python handle the general low-level detail of reading and writing. However, in this case I don't see the harm of returning an unsigned numpy array if the valid_min/valid_max indicate that the data should fit. What is the real benefit though - client code could always cast the array to uint as needed, right?

@shoyer
Copy link
Contributor

shoyer commented Dec 2, 2015

Long story short: Yes I do propose that we check valid_range, valid_min and valid_max attributes and return np.uint8 if these attributes have unsigned data type.

I am 👎 on returning data with a different data type to follow CF conventions, unless the option can be toggled on/off. For tools like xray, we definitely don't want to be using this behavior in netCDF4. My preference would be for such new behavior to be opt in, because it would be slightly trickier to disable it otherwise in a cross-version compatible manner.

The only thing that concerns me about this is that up until now we have left metadata conventions to be the concern of downstream applications, and let netcdf4-python handle the general low-level detail of reading and writing.

This is exactly my perspective 👍.

@ocefpaf
Copy link
Collaborator

ocefpaf commented Dec 2, 2015

I am also a 👍 to leave any convention, beyond those that define what a netCDF file is, to the downstream applications.

@cpaulik
Copy link
Contributor Author

cpaulik commented Dec 2, 2015

That is fine with me. But the question is then what of the netCDF Attribute conventions should be implemented?

Should we follow http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html ? If so then the section about Attribute Conventions specifies the same implicit conversion in the section about signedness which says

Deprecated attribute, originally designed to indicate whether byte values should be treated as signed or unsigned. The attributes valid_min and valid_max may be used for this purpose. For example, if you intend that a byte variable store only non-negative values, you can use valid_min = 0 and valid_max = 255

@cpaulik
Copy link
Contributor Author

cpaulik commented Dec 2, 2015

Anyway. If we want this then it should definitely be opt-in.

@JohnLCaron
Copy link

Generally the netcdf libraries dont use the attribute conventions (except for the reserved _ (underscore) attributes). its up to the "user" to handle that. in java we have a wrapper class that will take into account attribute conventions, so the user can get it with or without.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants