[fairly critical] extract_time() preprocessor, masked data and new dask=2024.8.0 return cubes with fill values instead of masked elements #2505
I can't see where the issue is coming from: if I look at a cube (netCDF4 Dataset) before and after extract_time, it looks the same from netCDF4's point of view, so this means iris is missing something it needs.
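A minimal sketch of the kind of netCDF4-level check described above (the file name and variable name are assumptions, not from the issue): the declared _FillValue and the on-disk data look identical before and after extract_time, which is what points the finger at the iris/dask side.

```python
import netCDF4

# Open the file backing the cube and inspect it at the netCDF4 level
# (hypothetical file and variable names used for illustration).
ds = netCDF4.Dataset("problem_cube.nc")
var = ds.variables["tas"]  # assumed variable name
print(getattr(var, "_FillValue", None))  # declared fill value on disk
print(var[:2, ...])  # netCDF4 auto-masks fill values when reading
ds.close()
```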
the code above, edited for the last cube in the list:

```python
import cf_units
import iris
import numpy as np
import pytest

from esmvalcore.preprocessor import extract_time

esmvaltool_sample_data = pytest.importorskip("esmvaltool_sample_data")

cubes = esmvaltool_sample_data.load_timeseries_cubes(mip_table='day')
c0 = cubes[0]
time_slice = {'start_year': 2001, 'end_year': 2002, 'start_month': 12,
              'end_month': 2, 'start_day': 1, 'end_day': 1}
cube = cubes[-1]
cubes = extract_time(cube, **time_slice)
regrid_kwargs = {
    'grid': c0,
    'scheme': iris.analysis.Nearest(),
}
reg_cub = cube.regrid(**regrid_kwargs)
print(np.mean(reg_cub.data))
```

returns a valid numerical result - my brain is currently fried from trying to understand wth is going on in here, so am just gonna call it a day, for now.
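As a side note, a small illustration (not from the issue) of why the mean is a useful probe here: np.mean over a properly masked array ignores the masked points, while the same values with a 1e36 fill left in place blow the mean up.

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
filled = np.array([1.0, 1e36, 3.0])  # mask lost, fill value baked into the data

print(np.mean(masked))  # 2.0 - the masked element is ignored
print(np.mean(filled))  # ~3.3e35 - the fill value contaminates the result
```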
OK got the bugger (almost)! It turns out that if one constructs a list of cubes, the bug creeps in:

```python
import cf_units
import iris
import numpy as np
import pytest

from esmvalcore.preprocessor import extract_time

esmvaltool_sample_data = pytest.importorskip("esmvaltool_sample_data")

cubes = esmvaltool_sample_data.load_timeseries_cubes(mip_table='day')
c0 = cubes[0]
time_slice = {'start_year': 2001, 'end_year': 2002, 'start_month': 12,
              'end_month': 2, 'start_day': 1, 'end_day': 1}
cubes = [cubes[-1]]
cubes = [extract_time(cube, **time_slice) for cube in cubes]
regrid_kwargs = {
    'grid': c0,
    'scheme': iris.analysis.Nearest(),
}
for cube in cubes:
    reg_cub = cube.regrid(**regrid_kwargs)
    print(np.mean(reg_cub.data))
```

this will make the output from the mean computation invalid (the fill values leak into it).
even simpler:

```python
import cf_units
import iris
import numpy as np
import pytest

from esmvalcore.preprocessor import extract_time

esmvaltool_sample_data = pytest.importorskip("esmvaltool_sample_data")

cubes = esmvaltool_sample_data.load_timeseries_cubes(mip_table='day')
c0 = cubes[0]
time_slice = {'start_year': 2001, 'end_year': 2002, 'start_month': 12,
              'end_month': 2, 'start_day': 1, 'end_day': 1}
cubes = [cubes[-1]]
cubes = [extract_time(cube, **time_slice) for cube in cubes]
print(cubes[0].data)
```

that cube will have 1e36s inside its data instead of masked elements!
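For what it's worth, a hedged little helper (not part of the issue; the name is made up) for checking exactly what the comment above describes, i.e. whether a cube's realized data kept its mask or came back as plain fill values:

```python
import numpy as np


def mask_report(cube):
    """Print whether a cube's realized data still carries a mask (hypothetical helper)."""
    data = cube.data  # realizes the lazy data
    print("container:", type(data))
    print("has masked points:", np.ma.is_masked(data))
    print("max value:", float(np.max(data)))  # ~1e36 when fill values leaked through
```

Calling it on cubes[0] from the snippet above would be expected to report no masked points and a max of roughly 1e36 when the bug is triggered.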
OK - pushing the enemy even closer to the town center: it appears that the problem is with the sample data loader. Saving the "problem" cube to disk and loading it from there, i.e.:

```python
import cf_units
import iris
import numpy as np
import pytest

from esmvalcore.preprocessor import extract_time

esmvaltool_sample_data = pytest.importorskip("esmvaltool_sample_data")

# cubes = esmvaltool_sample_data.load_timeseries_cubes(mip_table='day')
time_slice = {'start_year': 2001, 'end_year': 2002, 'start_month': 12,
              'end_month': 2, 'start_day': 1, 'end_day': 1}
# iris.save(cubes[-1], "problem_cube.nc")
# cubes = [cubes[-1]]
cubes = [iris.load_cube("problem_cube.nc")]
cubes = [extract_time(cube, **time_slice) for cube in cubes]
print(cubes[0].data)
```

will result in correct behaviour of the masked bit.
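One hedged way to compare the two load paths (a sketch, not from the issue; it assumes the problem_cube.nc written above) is to look at the lazy container before realizing it, since the interesting difference is whether the dask graph carries a mask at all:

```python
import iris
import numpy as np

cube_from_disk = iris.load_cube("problem_cube.nc")  # assumed file from the snippet above
lazy = cube_from_disk.core_data()  # the dask array backing the cube, not yet realized
print(type(lazy), lazy.dtype, lazy.shape)
# The dask array's meta shows whether the graph is masked-array-aware.
print(isinstance(lazy._meta, np.ma.MaskedArray))
```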
blithering Hell! I isolated the problem - it's an iris issue, here's the MRE:

```python
import iris
import numpy as np

c1 = iris.load_cube("cubb-1.nc")
c2 = iris.load_cube("cubb-2.nc")

# apply slice to concatenated cube
slicer = (
    np.random.choice(a=[False, True], size=(730,)),
    slice(None, None, None),
    slice(None, None, None),
    slice(None, None, None),
)
# can use this to slice each of cubes c1 and/or c2
slicer1 = (
    np.random.choice(a=[False, True], size=(365,)),
    slice(None, None, None),
    slice(None, None, None),
    slice(None, None, None),
)

cube = iris.cube.CubeList([c1, c2]).concatenate_cube()
cubes = cube[slicer]
print("After slicing", cubes.data)
```

Gonna close this, after I open an issue at iris 🍺
Issue opened in iris: SciTools/iris#6109 - will close this one, and open a separate issue that monitors progress on the iris issue.
extract_time() is causing an attribute loss that leads to fill_value not being taken into account in the next preprocessor, see the example code below: toggle the time slicing, and if it's on one gets infs in the mean computation with lazy data, i.e. values of 1e+36 (the dataset's fill value) for data points. This is happening only with the new Dask (2024.8.0), and not with the old 2024.7.1, see #2503.
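A trivial check (not from the issue) to confirm which of the two dask releases is actually active in the environment being tested:

```python
import dask

# The behaviour described above differs between dask 2024.7.1 and 2024.8.0.
print(dask.__version__)
```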