Fix for expver dimension #56

DarshanSP19 · 2023-09-25T12:14:37Z

Added a fix for the expver dimension in data from recent months.

DarshanSP19 · 2023-09-25T12:19:00Z

Hey @alxmrs, for the majority of the data files this fix seems to work (verified after backfilling the data). But the following two variables hit the assertion: assert disjoint_nans, "The nans are not disjoint in expver=1 vs 5". It seems the data itself is the problem.

cloud_base_height
convective_inhibition

alxmrs

LGTM. If this has been tested on realistic data, then it makes sense to me.

alxmrs · 2023-09-28T21:52:24Z

src/arco_era5/source_data.py

@@ -372,12 +372,12 @@ def _read_nc_dataset(gpath_file):
        # and: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Dataupdatefrequency  # pylint: disable=line-too-long
        # for further details.

-        all_dims_except_time = tuple(set(dataarray.dims) - {"time"})
+        all_dims_except_time = tuple(set(dataarray.dims) - {"time", "expver"})
        # Should have only trailing nans.
        a = dataarray.sel(expver=1).isnull().any(dim=all_dims_except_time)


According to Alvaro, the fix should be:

dataarray.sel(expver=1, drop=True)

alvarosg · 2023-09-28T21:56:18Z

If I understand correctly, the reason this was not working is that dataarray.sel(expver=1) and dataarray.sel(expver=5), even though they no longer have the "expver" dim, both remember which expver coordinate they came from (they keep a scalar expver coordinate), and then they complain when running arithmetic between them because the scalar coordinates don't match. By adding "expver" to the dimensions to reduce, the scalar coordinate disappears too. However, this is unintuitive, because the proposed code tries to reduce across a dim that no longer exists.

I believe the right fix, rather than setting:
all_dims_except_time = tuple(set(dataarray.dims) - {"time", "expver"})

should be to use:

dataarray.sel(expver=1, drop=True)
dataarray.sel(expver=5, drop=True)

If you confirm this alternative fix works, this would be preferable.
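To make the difference between the two selections concrete, here is a minimal sketch on a toy array. The shape, values, and variable names are made up for illustration and are not taken from the ERA5 files; it only shows what xarray does with the scalar coordinate.

```python
import numpy as np
import xarray as xr

# Toy array with an "expver" dimension (illustrative shape and values only).
da = xr.DataArray(
    np.arange(8.0).reshape(2, 4),
    dims=("expver", "time"),
    coords={"expver": [1, 5], "time": np.arange(4)},
)

# Plain selection: the dimension is removed, a scalar coordinate is kept.
kept = da.sel(expver=1)

# Selection with drop=True: the dimension and the coordinate are both removed.
dropped = da.sel(expver=1, drop=True)

print(kept.dims, "expver" in kept.coords)
print(dropped.dims, "expver" in dropped.coords)
```

In both cases "expver" is no longer a dimension, so it cannot be passed to a later reduction such as .any(dim=...).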

Talking more with Alvaro here: I think these errors only crop up if we download ERA5T, not ERA5. If we simply updated the data, or put limits into how recently we were updating the data, I believe these issues would go away.

We've downloaded the data for 2023 on 21-22 September. Will that cause any trouble?
I tried the snippet below, but it gives an error.

all_dims_except_time = tuple(set(dataarray.dims) - {"time"})
a = dataarray.sel(expver=1, drop=True).isnull().any(dim=all_dims_except_time)
b = dataarray.sel(expver=5, drop=True).isnull().any(dim=all_dims_except_time)
disjoint_nans = bool(next(iter((a ^ b).all().data_vars.values())))
assert disjoint_nans, "The nans are not disjoint in expver=1 vs 5"
dataarray = dataarray.sel(expver=1).combine_first(dataarray.sel(expver=5))

Error

ValueError: 'expver' not found in array dimensions ('time', 'latitude', 'longitude')
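One way a message like this can arise is when the list of dimensions to reduce is built before the selection and therefore still contains "expver". The sketch below reproduces that situation on a toy DataArray; the array, its shape, and the name reduce_dims are assumptions for illustration, and it may not be the exact cause in the real files.

```python
import numpy as np
import xarray as xr

# Toy array; all values are NaN because only the dimensions matter here.
da = xr.DataArray(
    np.full((2, 3, 2, 2), np.nan),
    dims=("expver", "time", "latitude", "longitude"),
    coords={"expver": [1, 5]},
)

# Built before the selection, so it still lists "expver".
all_dims_except_time = tuple(set(da.dims) - {"time"})

selected = da.sel(expver=1, drop=True)  # "expver" is no longer a dimension
try:
    selected.isnull().any(dim=all_dims_except_time)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Leaving "expver" out of the reduction avoids the error.
reduce_dims = tuple(set(da.dims) - {"time", "expver"})
a = selected.isnull().any(dim=reduce_dims)
print(a.dims)
```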

Could you indicate which line is throwing the error?

Perhaps the problem is that this file has an "expver" coordinate, but not an "expver" dim, which is something we have not found before, but can probably be solved by changing:

if "expver" in dataarray.coords:
    # previous code

to

if "expver" in dataarray.dims:
    # previous code
elif "expver" in dataarray.coords:
    dataarray = dataarray.drop_vars('expver')
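Putting the pieces of this thread together, a self-contained sketch of the dims-versus-coords handling could look like the following. It works on a single DataArray rather than the Dataset used in source_data.py, and the function name collapse_expver and the toy inputs are hypothetical; treat it as an illustration under those assumptions, not as the merged code.

```python
import numpy as np
import xarray as xr


def collapse_expver(da: xr.DataArray) -> xr.DataArray:
    """Return `da` without any trace of "expver" (hypothetical helper)."""
    if "expver" in da.dims:
        # Reduce over everything except time; "expver" is gone after sel().
        reduce_dims = [d for d in da.dims if d not in ("time", "expver")]
        final = da.sel(expver=1, drop=True)
        prelim = da.sel(expver=5, drop=True)
        a = final.isnull().any(dim=reduce_dims)
        b = prelim.isnull().any(dim=reduce_dims)
        # Each time step must have data in exactly one of the two versions.
        assert bool((a ^ b).all()), "The nans are not disjoint in expver=1 vs 5"
        return final.combine_first(prelim)
    if "expver" in da.coords:
        # Coordinate without a dimension: just drop it.
        return da.drop_vars("expver")
    return da


# Case 1: "expver" is a dimension; expver=1 covers the first two time steps.
data = np.full((2, 4, 2), np.nan)
data[0, :2] = 1.0
data[1, 2:] = 5.0
with_dim = xr.DataArray(
    data, dims=("expver", "time", "latitude"), coords={"expver": [1, 5]}
)
merged = collapse_expver(with_dim)

# Case 2: "expver" is only a scalar coordinate.
with_coord = xr.DataArray(
    np.ones((4, 2)), dims=("time", "latitude"), coords={"expver": 1}
)
cleaned = collapse_expver(with_coord)

print(merged.values)
print(list(cleaned.coords))
```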

We've downloaded the data for 2023 on 21-22 September. Will that cause any trouble?

In short, yes I do think this is the root issue. Our goal with keeping data up-to-date is to have a ~1-2 month delay in freshness exactly for this reason. Unless we plan to re-download and update ERA5T data to ERA5 (I don't recommend this), then it would be best to change the interval at which we ingest data to avoid the case that this PR is trying to solve for.
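As a rough illustration of limiting the ingestion interval, the helper below computes the most recent month that is at least a given number of months behind the current one. The function name latest_safe_month and the default of two months are assumptions based on the "~1-2 month delay" mentioned above, not existing project code or an ECMWF guarantee.

```python
import datetime
from typing import Tuple


def latest_safe_month(today: datetime.date, delay_months: int = 2) -> Tuple[int, int]:
    """Return (year, month) lying `delay_months` before the month of `today`."""
    # Count months from year 0 so that year boundaries are handled uniformly.
    months = today.year * 12 + (today.month - 1) - delay_months
    return months // 12, months % 12 + 1


print(latest_safe_month(datetime.date(2023, 9, 21)))
print(latest_safe_month(datetime.date(2024, 1, 15)))
```

An ingestion job could then skip any month later than the returned one.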

DarshanSP19 · 2023-10-04T11:02:48Z

@alxmrs After verifying the code against the April, May and June 2023 data, it seems that the changes are required for data ingestion. The data for April and May was also downloaded on 21st September, so it should be fresh enough for further processing.

alxmrs

LGTM.

alxmrs · 2023-10-04T17:57:52Z

Thanks for the discussion in this PR.

Fix for expver dimension

ab5609c

DarshanSP19 self-assigned this Sep 25, 2023

DarshanSP19 requested a review from alxmrs September 25, 2023 12:18

alxmrs reviewed Sep 28, 2023

alxmrs approved these changes Sep 28, 2023

alxmrs reviewed Sep 28, 2023

alxmrs reviewed Sep 28, 2023

alvarosg reviewed Sep 28, 2023

Rename all_dims

e3478a5

alxmrs approved these changes Oct 4, 2023

DarshanSP19 merged commit f1327ae into main Oct 5, 2023
3 checks passed

DarshanSP19 deleted the fix-expver-dimension-issue branch October 5, 2023 10:30


DarshanSP19 commented Sep 25, 2023

DarshanSP19 commented Sep 25, 2023

alxmrs left a comment

alxmrs Sep 28, 2023

alvarosg Sep 28, 2023

alxmrs Sep 28, 2023

DarshanSP19 Sep 29, 2023

alvarosg Sep 29, 2023

alxmrs Sep 29, 2023

DarshanSP19 commented Oct 4, 2023

alxmrs left a comment

alxmrs commented Oct 4, 2023
