xr.open_dataset() reading ubyte variables as float32 from DAP server #7782
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template! |
As both variables have a _FillValue attached, xarray converts these values to NaN, effectively casting to float32 in this case. You might inspect the You can deactivate the automatic conversion by adding kwarg There is more information in the docs: https://docs.xarray.dev/en/stable/user-guide/io.html |
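To illustrate why a _FillValue ends in a float dtype: NaN has no integer representation, so replacing fill values with NaN forces a cast. Below is a minimal NumPy sketch of that effect. The sample values and the helper name `mask_fill_with_nan` are made up for illustration, and this is not xarray's actual decoding code.

```python
import numpy as np


def mask_fill_with_nan(raw, fill_value):
    """Replace fill_value with NaN. Hypothetical helper, not part of xarray."""
    # NaN only exists for floating point types, so the data must be cast first.
    as_float = raw.astype(np.float32)
    as_float[raw == fill_value] = np.nan
    return as_float


# Made-up sample: unsigned bytes where 255 plays the role of _FillValue.
raw = np.array([0, 100, 215, 255], dtype=np.uint8)
decoded = mask_fill_with_nan(raw, fill_value=255)

print(raw.dtype, "->", decoded.dtype)
print(decoded)
```

The cast is the price of using NaN as the missing-value marker; the integer dtype can only survive if masking is skipped or done with a separate mask.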
Thank you for your quick reply. Adding the It would save me quite a lot of processing time since using |
Then you are somewhat deadlocked. You might be able to achieve what you want by using I'll add a code example tomorrow if no one beats me to it. |
Your suggestion worked perfectly, thank you very much! Avoiding using |
Do these two have to be linked? I wonder if we can handle the filling later : xarray/xarray/coding/variables.py Lines 397 to 407 in 2657787
It seems like this code is setting fill values to the right type for CFMaskCoder which is the next step Lines 266 to 272 in 2657787
|
@dcherian Yes, that would work. We would want to check the different attributes and apply the coders only as needed. That might need some refactoring. I've been wrapping my head around this for several weeks now. |
The current approach seems OK, no? It seems like the bug is that
EDIT: I mean that each coder checks whether it is applicable, so we already do that |
@dcherian The main issue here is that we have two different CF things which are applied, Unsigned and _FillValue/missing_value. For netcdf4-python the values would just be masked and the dtype would be preserved. For xarray it will be cast to float32 because of the _FillValue/missing_value. I agree, moving the Unsigned Coder out of mask_and_scale should help in that particular case. |
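The difference described here, masking while keeping the integer dtype versus filling with NaN, can be sketched with a NumPy masked array. This is only an illustration with made-up values; it assumes 255 as the unsigned form of the fill value and does not call netCDF4-python or xarray.

```python
import numpy as np

# Made-up sample: unsigned bytes where 255 plays the role of _FillValue.
raw = np.array([0, 100, 215, 255], dtype=np.uint8)
fill_value = 255

# A masked array keeps a separate boolean mask, so the data can stay uint8.
masked = np.ma.masked_equal(raw, fill_value)

print(masked.dtype)
print(masked)
print(masked.mask)
```

Because the missing-value information lives in the mask rather than in the data, no cast to float is needed.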
This is how netCDF4-python handles this data with different parameters:

import netCDF4 as nc
with nc.Dataset("http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc") as ds_dap:
    v = ds_dap["scfv"]
    print(v)
    print("\n- default")
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
    print("\n- maskandscale False")
    ds_dap.set_auto_maskandscale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
    print("\n- mask/scale False")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
    print("\n- mask True / scale False")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
    print("\n- mask False / scale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
    print("\n- mask True / scale True")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
    print("\n- maskandscale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    ds_dap.set_auto_maskandscale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

<class 'netCDF4._netCDF4.Variable'>
int8 scfv(time, lat, lon)
_Unsigned: true
_FillValue: -1
standard_name: snow_area_fraction_viewable_from_above
long_name: Snow Cover Fraction Viewable
units: percent
valid_range: [ 0 -2]
actual_range: [ 0 100]
flag_values: [-51 -50 -46 -41 -4 -3 -2]
flag_meanings: Cloud Polar_Night_or_Night Water Permanent_Snow_and_Ice Classification_failed Input_Data_Error No_Satellite_Acquisition
missing_value: -1
ancillary_variables: scfv_unc
grid_mapping: spatial_ref
_ChunkSizes: [ 1 1385 2770]
unlimited dimensions: time
current shape = (1, 18000, 36000)
filling off
- default
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]
- maskandscale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]
- mask/scale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]
- mask True / scale False
variable dtype: int8
first 2 elements: int8 [-- --]
last 2 elements: int8 [-- --]
- mask False / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]
- mask True / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]
- maskandscale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

First, the dataset was created with As we can see from the above output, in netCDF4-python If Xarray is trying to align with netCDF4-python it should separate We would need a similar approach here for Xarray with additional kwargs |
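The `-41` and `215` values in the output above are the same stored byte, read once as signed and once as unsigned. A minimal NumPy sketch of that reinterpretation follows, using the flag_values from the variable's attributes; the sample array `stored` is made up for illustration, and this is not the code either library uses internally.

```python
import numpy as np

# Bytes as stored on disk: signed int8, flagged with _Unsigned = "true".
stored = np.array([-41, -1, -2, 0, 100], dtype=np.int8)

# view() reinterprets the same bits as uint8 without copying or rescaling.
unsigned = stored.view(np.uint8)
print(unsigned)

# The attributes are stored signed too, so they need the same treatment.
flag_values = np.array([-51, -50, -46, -41, -4, -3, -2], dtype=np.int8)
flag_values_unsigned = flag_values.view(np.uint8)
print(flag_values_unsigned)
```

Read this way, the `_FillValue` of -1 corresponds to 255 and the upper bound of `valid_range`, -2, corresponds to 254.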
Thanks for the in-depth investigation!
Do we know why this is so?
👍 |
TL;DR: NETCDF3 detail to allow (signal) unsigned integer, still used in recent formats
My suggestion would be to nudge the user by issuing warnings and linking to new, yet-to-be-added documentation on the topic. This could be in line with the cf-coding conformance checks which were discussed yesterday in the dev meeting. |
What happened?
Trying to open and save a netCDF file through CEDA's DAP server (http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc) whose variables scfv and scfv_unc are of type ubyte. File DDS is as follows:
And its DAS has attribute _Unsigned as true.
Using xr.open_dataset("http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc"), the mentioned variables get read as float32 instead of ubyte or at least byte.
What did you expect to happen?
The returned Dataset should have scfv and scfv_unc of dtype ubyte.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.9.15 (main, Nov 24 2022, 14:31:59)
[GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.80.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.11.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.3
cfgrib: 0.9.10.3
iris: None
bottleneck: None
dask: 2022.11.1
distributed: None
matplotlib: 3.5.2
cartopy: 0.21.0
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: 0.20.1
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 22.2.2
conda: 22.9.0
pytest: 7.2.0
IPython: 7.33.0
sphinx: 5.3.0