BUG: Support integer data types for netCDF _Unsigned convention #6352

snowman2 · 2022-09-13T21:03:11Z

to_raster() save function changes original data corteva/rioxarray#574
Interpretation of reserved _Unsigned attribute written by netCDF-Java Unidata/netcdf4-python#656
Best practice when the _Unsigned attribute is present in NetCDF files pydata/xarray#1444

Currently, GDAL only checks for the unsigned type when the datatype is byte:

Lines 626 to 702 in 8e4c84b

    
           // Special For Byte Bands: check for signed/unsigned byte. 
        
           if( nc_datatype == NC_BYTE ) 
        
           { 
        
               // netcdf uses signed byte by default, but GDAL uses unsigned by default 
        
               // This may cause unexpected results, but is needed for back-compat. 
        
               if( poNCDFDS->bIsGdalFile ) 
        
                   bSignedData = false; 
        
               else 
        
                   bSignedData = true; 
        
               // For NC4 format NC_BYTE is (normally) signed, NC_UBYTE is unsigned. 
        
               // But in case a NC3 file was converted automatically and has hints 
        
               // that it is unsigned, take them into account 
        
               if( poNCDFDS->eFormat == NCDF_FORMAT_NC4 ) 
        
               { 
        
                   bSignedData = true; 
        
               } 
        
               // Fix nodata value as it was stored signed. 
        
               if( !bSignedData && dfNoData < 0 ) 
        
               { 
        
                   dfNoData += 256; 
        
               } 
        
               // If we got valid_range, test for signed/unsigned range. 
        
               // http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Attribute-Conventions.html 
        
               if( bValidRangeValid ) 
        
               { 
        
                   // If we got valid_range={0,255}, treat as unsigned. 
        
                   if( adfValidRange[0] == 0 && adfValidRange[1] == 255 ) 
        
                   { 
        
                       bSignedData = false; 
        
                       // Fix nodata value as it was stored signed. 
        
                       if( dfNoData < 0 ) 
        
                       { 
        
                           dfNoData += 256; 
        
                       } 
        
                       // Reset valid_range. 
        
                       bValidRangeValid = false; 
        
                   } 
        
                   // If we got valid_range={-128,127}, treat as signed. 
        
                   else if( adfValidRange[0] == -128 && adfValidRange[1] == 127 ) 
        
                   { 
        
                       bSignedData = true; 
        
                       // Reset valid_range. 
        
                       bValidRangeValid = false; 
        
                   } 
        
               } 
        
               // Else test for _Unsigned. 
        
               // http://www.unidata.ucar.edu/software/netcdf/docs/BestPractices.html 
        
               else 
        
               { 
        
                   char *pszTemp = nullptr; 
        
                   if( NCDFGetAttr(cdfid, nZId, "_Unsigned", &pszTemp) == CE_None ) 
        
                   { 
        
                       if( EQUAL(pszTemp, "true") ) 
        
                           bSignedData = false; 
        
                       else if( EQUAL(pszTemp, "false") ) 
        
                           bSignedData = true; 
        
                       CPLFree(pszTemp); 
        
                   } 
        
                   // Fix nodata value as it was stored signed. 
        
                   if( !bSignedData && dfNoData < 0 ) 
        
                   { 
        
                       dfNoData += 256; 
        
                   } 
        
               } 
        
               if( bSignedData ) 
        
               { 
        
                   // set PIXELTYPE=SIGNEDBYTE 
        
                   // See http://trac.osgeo.org/gdal/wiki/rfc14_imagestructure 
        
                   GDALPamRasterBand::SetMetadataItem("PIXELTYPE", "SIGNEDBYTE", "IMAGE_STRUCTURE"); 
        
               } 
        
           }

However, there are scenarios when the datatype is an integer (See: corteva/rioxarray#574). Thoughts on this?

Also, sometimes users want to convert this data to other formats. I didn't see a standard way to preserve the original data type while denoting that the data should be converted to unsigned in the GDAL metadata. Did I miss something?

snowman2 · 2022-09-13T21:10:04Z

From file referenced in corteva/rioxarray#574:

	short LST(y, x) ;
		LST:_FillValue = -1s ;
		LST:long_name = "ABI L2+ Land Surface (Skin) Temperature" ;
		LST:standard_name = "surface_temperature" ;
		LST:_Unsigned = "true" ;
		LST:valid_range = 9200s, -9536s ;
		LST:scale_factor = 0.0025f ;
		LST:add_offset = 190.f ;
		LST:units = "K" ;
		LST:resolution = "y: 0.000056 rad x: 0.000056 rad" ;
		LST:coordinates = "retrieval_local_zenith_angle quantitative_local_zenith_angle solar_zenith_angle t y x" ;
		LST:grid_mapping = "goes_imager_projection" ;
		LST:cell_methods = "retrieval_local_zenith_angle: point (good or degraded quality pixel produced) quantitative_local_zenith_angle: point (good quality pixel produced) solar_zenith_angle: point (good quality pixel produced) t: point area: point where land" ;
		LST:ancillary_variables = "DQF" ;

snowman2 · 2022-09-13T21:10:50Z

How xarray handles it:
https://github.com/pydata/xarray/blob/abe1e613a96b000ae603c53d135828df532b952e/xarray/coding/variables.py#L306-L334

rouault · 2022-09-13T21:41:38Z

I didn't see a standard way to preserve the original data type while denoting that the data should be converted to unsigned in the GDAL metadata. Did I miss something?

no. That would be quite horrible to have in the GDAL abstraction that remnant of a netCDF hack (or "implementation detail" if one wants to be more neutral)
(That said, the only exception to this is for signed byte as we don't have a native signed byte type for GDAL raster data types. So this is modeled as GDT_Byte + PIXELTYPE=SIGNEDBYTE in band STRUCTURE metadata domain.)

The right fix is here would be to expose short + _Unsigned=true as GDT_UInt16

snowman2 · 2022-09-14T01:36:15Z

That makes sense to me. Thanks for the clarification 👍. I will update the issue title.

netCDF: handle variables of type NC_SHORT with _Unsigned=true as GDT_UInt16 (fixes #6352)

snowman2 · 2022-09-16T22:10:33Z

Thanks @rouault 👍

snowman2 changed the title ~~Handling of netCDF unsigned data conventions~~ BUG: Support integer data types for netCDF _Unsigned convention Sep 14, 2022

rouault self-assigned this Sep 15, 2022

rouault closed this as completed in b7cf47c Sep 16, 2022

rouault added a commit that referenced this issue Sep 16, 2022

Merge pull request #6369 from rouault/fix_6352

d9abe2e

netCDF: handle variables of type NC_SHORT with _Unsigned=true as GDT_UInt16 (fixes #6352)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Support integer data types for netCDF _Unsigned convention #6352

BUG: Support integer data types for netCDF _Unsigned convention #6352

snowman2 commented Sep 13, 2022

snowman2 commented Sep 13, 2022

snowman2 commented Sep 13, 2022

rouault commented Sep 13, 2022

snowman2 commented Sep 14, 2022

snowman2 commented Sep 16, 2022

BUG: Support integer data types for netCDF _Unsigned convention #6352

BUG: Support integer data types for netCDF _Unsigned convention #6352

Comments

snowman2 commented Sep 13, 2022

snowman2 commented Sep 13, 2022

snowman2 commented Sep 13, 2022

rouault commented Sep 13, 2022

snowman2 commented Sep 14, 2022

snowman2 commented Sep 16, 2022