Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Support integer data types for netCDF _Unsigned convention #6352

Closed
snowman2 opened this issue Sep 13, 2022 · 5 comments
Closed

BUG: Support integer data types for netCDF _Unsigned convention #6352

snowman2 opened this issue Sep 13, 2022 · 5 comments
Assignees

Comments

@snowman2
Copy link
Contributor

Related:

Currently, GDAL only checks for the unsigned type when the datatype is byte:

// Special For Byte Bands: check for signed/unsigned byte.
if( nc_datatype == NC_BYTE )
{
// netcdf uses signed byte by default, but GDAL uses unsigned by default
// This may cause unexpected results, but is needed for back-compat.
if( poNCDFDS->bIsGdalFile )
bSignedData = false;
else
bSignedData = true;
// For NC4 format NC_BYTE is (normally) signed, NC_UBYTE is unsigned.
// But in case a NC3 file was converted automatically and has hints
// that it is unsigned, take them into account
if( poNCDFDS->eFormat == NCDF_FORMAT_NC4 )
{
bSignedData = true;
}
// Fix nodata value as it was stored signed.
if( !bSignedData && dfNoData < 0 )
{
dfNoData += 256;
}
// If we got valid_range, test for signed/unsigned range.
// http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html
if( bValidRangeValid )
{
// If we got valid_range={0,255}, treat as unsigned.
if( adfValidRange[0] == 0 && adfValidRange[1] == 255 )
{
bSignedData = false;
// Fix nodata value as it was stored signed.
if( dfNoData < 0 )
{
dfNoData += 256;
}
// Reset valid_range.
bValidRangeValid = false;
}
// If we got valid_range={-128,127}, treat as signed.
else if( adfValidRange[0] == -128 && adfValidRange[1] == 127 )
{
bSignedData = true;
// Reset valid_range.
bValidRangeValid = false;
}
}
// Else test for _Unsigned.
// http://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html
else
{
char *pszTemp = nullptr;
if( NCDFGetAttr(cdfid, nZId, "_Unsigned", &pszTemp) == CE_None )
{
if( EQUAL(pszTemp, "true") )
bSignedData = false;
else if( EQUAL(pszTemp, "false") )
bSignedData = true;
CPLFree(pszTemp);
}
// Fix nodata value as it was stored signed.
if( !bSignedData && dfNoData < 0 )
{
dfNoData += 256;
}
}
if( bSignedData )
{
// set PIXELTYPE=SIGNEDBYTE
// See http://trac.osgeo.org/gdal/wiki/rfc14_imagestructure
GDALPamRasterBand::SetMetadataItem("PIXELTYPE", "SIGNEDBYTE", "IMAGE_STRUCTURE");
}
}

However, there are scenarios when the datatype is an integer (See: corteva/rioxarray#574). Thoughts on this?

Also, sometimes users want to convert this data to other formats. I didn't see a standard way to preserve the original data type while denoting that the data should be converted to unsigned in the GDAL metadata. Did I miss something?

@snowman2
Copy link
Contributor Author

From file referenced in corteva/rioxarray#574:

	short LST(y, x) ;
		LST:_FillValue = -1s ;
		LST:long_name = "ABI L2+ Land Surface (Skin) Temperature" ;
		LST:standard_name = "surface_temperature" ;
		LST:_Unsigned = "true" ;
		LST:valid_range = 9200s, -9536s ;
		LST:scale_factor = 0.0025f ;
		LST:add_offset = 190.f ;
		LST:units = "K" ;
		LST:resolution = "y: 0.000056 rad x: 0.000056 rad" ;
		LST:coordinates = "retrieval_local_zenith_angle quantitative_local_zenith_angle solar_zenith_angle t y x" ;
		LST:grid_mapping = "goes_imager_projection" ;
		LST:cell_methods = "retrieval_local_zenith_angle: point (good or degraded quality pixel produced) quantitative_local_zenith_angle: point (good quality pixel produced) solar_zenith_angle: point (good quality pixel produced) t: point area: point where land" ;
		LST:ancillary_variables = "DQF" ;

@snowman2
Copy link
Contributor Author

@rouault
Copy link
Member

rouault commented Sep 13, 2022

I didn't see a standard way to preserve the original data type while denoting that the data should be converted to unsigned in the GDAL metadata. Did I miss something?

no. That would be quite horrible to have in the GDAL abstraction that remnant of a netCDF hack (or "implementation detail" if one wants to be more neutral)
(That said, the only exception to this is for signed byte as we don't have a native signed byte type for GDAL raster data types. So this is modeled as GDT_Byte + PIXELTYPE=SIGNEDBYTE in band STRUCTURE metadata domain.)

The right fix is here would be to expose short + _Unsigned=true as GDT_UInt16

@snowman2
Copy link
Contributor Author

That makes sense to me. Thanks for the clarification 👍. I will update the issue title.

@snowman2 snowman2 changed the title Handling of netCDF unsigned data conventions BUG: Support integer data types for netCDF _Unsigned convention Sep 14, 2022
@rouault rouault self-assigned this Sep 15, 2022
rouault added a commit that referenced this issue Sep 16, 2022
netCDF: handle variables of type NC_SHORT with _Unsigned=true as GDT_UInt16 (fixes #6352)
@snowman2
Copy link
Contributor Author

Thanks @rouault 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants