First Pass Zonal Average #82
Conversation
mgrover1 commented Mar 18, 2021 (edited by andersy005)
- Add zonal averaging and regridding tool
- Update environment files
- Closes #33 (Add function to compute zonal mean)
Thanks @mgrover1
Mostly looks good.
One issue is how to handle TLONG/TLAT vs ULONG/ULAT. It would be nice if there were standard attributes that told us that one of them is the corner points for the other. I don't know that this exists, but it could be worth digging around the sgrid conventions to see if there's something we can use (https://sgrid.github.io/sgrid/).
Second, with this approach, the lowest-order operation is the regrid to a uniformly spaced grid (`ds` → `regridded`). Operations after that are real easy, e.g. `regridded.mean("latitude")`, and also let you do things other than the average (e.g. `regridded.std("latitude")`).

Should we instead have users do

```python
uniform = pop_tools.to_uniform_grid(ds, dx=0.1, dy=0.1)
zonal_average = uniform.mean("latitude")
```
pop_tools/zonal_average.py (outdated):
```python
self.method_gen_grid = method_gen_grid

# If the user does not input a mask, use default mask
if not mask:
```
applying a mask when `mask` is `None` is counter-intuitive =) Should the default be `mask=True`?
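A minimal sketch of the suggested default (the parameter and helper names here are assumptions, not the PR's actual API):

```python
# sketch: key the behaviour off an explicit boolean instead of None
def apply_region_mask(ds, mask=True):
    """If mask is True, apply the default region mask; if False, leave ds untouched."""
    if mask:
        region_mask = get_default_mask(ds)  # hypothetical helper
        ds = ds.where(region_mask > 0)
    return ds
```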
pop_tools/zonal_average.py (outdated):
```python
).rename({'TLAT': 'lat', 'TLONG': 'lon'})

# Include only points that will have surrounding corners
data_ds = data_ds.isel({'nlon': data_ds.nlon[1:], 'nlat': data_ds.nlat[1:]})
```
This won't work for datasets returned by `to_xgcm_grid_and_dataset`, which renames to `nlon_t`, `nlon_u`, `nlat_t`, `nlat_u`. I think the solution here is to use cf_xarray:

```python
data_ds = data_ds.cf.isel(X=slice(1, None), Y=slice(1, None))
```
Do you have a suggestion for adding a new X/Y coordinate? I noticed that the only CF axis within the POP grids is time:

```
Coordinates:
- CF Axes: * T: ['time']
             X, Y, Z: n/a
- CF Coordinates:   longitude: ['TLONG', 'ULONG']
                    latitude: ['TLAT', 'ULAT', 'lat_aux_grid']
                  * vertical: ['z_t']
                  * time: ['time']
- Cell Measures:   area, volume: n/a
- Standard Names:  n/a

Data Variables:
- Cell Measures:   area, volume: n/a
- Standard Names:  n/a
```
ugh yeah I forgot. We need to make `nlat`, `nlon` dimension coordinates for this to work. That allows us to assign attributes like `axis: "X"`:

```python
ds["nlon"] = ("nlon", np.arange(ds.sizes["nlon"]), {"axis": "X"})
ds["nlat"] = ("nlat", np.arange(ds.sizes["nlat"]), {"axis": "Y"})
```

and similarly for `nlat_*`, `nlon_*` in `to_xgcm_grid_dataset`. This too seems like a follow-up PR.
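To illustrate how the pieces would fit together, here is a minimal sketch (assuming a POP dataset `ds`, and that cf_xarray picks the axes up from these attributes):

```python
import cf_xarray  # noqa: F401  (registers the .cf accessor)
import numpy as np

# make nlat/nlon dimension coordinates carrying CF axis attributes
ds["nlon"] = ("nlon", np.arange(ds.sizes["nlon"]), {"axis": "X"})
ds["nlat"] = ("nlat", np.arange(ds.sizes["nlat"]), {"axis": "Y"})

# the rename-agnostic selection from the earlier suggestion now works
trimmed = ds.cf.isel(X=slice(1, None), Y=slice(1, None))
```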
pop_tools/zonal_average.py (outdated):
```python
data_ds = data_ds.isel({'nlon': data_ds.nlon[1:], 'nlat': data_ds.nlat[1:]})

# Use ulat and ulong values as grid corners, rename variables to match xESMF syntax
grid_corners = grid_ds[['ULAT', 'ULONG']].rename(
```
this is wrong for `UVEL`, for which the centers are `ULAT`, `ULONG` and the corners are `TLAT`, `TLONG`.

EDIT: I see you've fixed this :) but a good solution would be to use `UVEL.cf["longitude"]` to get `ULONG`, and in theory there should be some standard attribute to lead you to `TLONG` from there. I do not know what that attribute is. Might have to look into the sgrid conventions (https://sgrid.github.io/sgrid/).
Alternately, this kind of inference requires knowledge of the underlying grid, so an `xgcm.Grid` object would be useful input.

We can hardcode this for POP now and move on, but solving `UVEL` regridding vs `TEMP` regridding by interpreting some standard attribute or using the xgcm object allows us to robustly solve it for POP & MOM & (insert-NCAR-model-here) regridding. So some brainstorming would be useful...
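As a concrete starting point, a minimal sketch of the hardcoded POP route (the helper name is hypothetical; the T/U center-corner pairing follows the POP B-grid layout discussed above):

```python
import cf_xarray  # noqa: F401  (registers the .cf accessor)

# on the POP B-grid, T-point centers have U-point corners and vice versa
POP_CORNERS = {
    ("TLAT", "TLONG"): ("ULAT", "ULONG"),
    ("ULAT", "ULONG"): ("TLAT", "TLONG"),
}

def infer_corner_names(da):
    """Return (corner_lat, corner_lon) names for a POP DataArray (sketch)."""
    centers = (da.cf["latitude"].name, da.cf["longitude"].name)
    return POP_CORNERS[centers]
```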
pop_tools/zonal_average.py (outdated):
```python
else:
    self.mask_labels = self.grid['region_name']

else:
```
Do we ever need no mask? I guess a global average across all basins is somewhat useless, but it would be required for the atmosphere.
pop_tools/zonal_average.py (outdated):
```python
    and ('velocity' not in obj[var].long_name.lower())
):
    ds_list.append(obj[var])
return xr.merge(ds_list).map(self._regrid_dataarray, keep_attrs=True, **kwargs)
```
instead, build a list of variable names and then `ds[list_of_varnames].map(...)`
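A minimal sketch of that pattern, reusing the predicate from the diff above (`obj` and `self._regrid_dataarray` as in the PR; this is illustrative, not the PR's actual code):

```python
# collect names first, then map over the subset in one go; avoids merge()
varnames = [
    name
    for name, da in obj.data_vars.items()
    if 'velocity' not in da.attrs.get('long_name', '').lower()
]
regridded = obj[varnames].map(self._regrid_dataarray, keep_attrs=True, **kwargs)
```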
pop_tools/zonal_average.py (outdated):
```python
for region in tqdm(np.unique(mask)):

    if region != 0:
        ds_list.append(data.where(mask == region).groupby('lat').mean())
```
Why groupby? Just a simple `.mean("lat")`.

Instead of this loop, I would recommend taking a 2D mask variable and expanding it out to 3D with a new `region` dimension (so that each slice along region has 1's only for the appropriate region) and then multiplying. That way you're doing `where` on a small array rather than a giant one. It also avoids the `concat` down below.
Matt and I were discussing: making use of the `pop_tools.region_mask_3d` tool would be helpful with this, replacing "0" with `nan` values...

```python
# Read in the 3D mask
mask3d = pop_tools.region_mask_3d(grid_name, mask_name='default')

# Apply the 3D mask
data_masked = mask3d.where(mask3d > 0) * data[list_relevant_variables]
```

where the dimensions of `data_masked` would be `('region', 'nlat', 'nlon', 'time', 'z_t')`.
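For completeness, a minimal sketch of the zonal step that would follow (assuming the masked data has already been regridded so `lon` is a dimension; names are illustrative):

```python
# NaNs outside each region are skipped by mean(), so one reduction
# produces a zonal average per region without any per-region loop
zonal_mean = data_masked.mean('lon')  # dims: ('region', 'lat', 'time', 'z_t')
```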
pop_tools/zonal_average.py (outdated):
```python
if vertical_average:

    # Run the vertical, weighted average
    out = out.weighted(out['z_t'].fillna(0)).mean(dim=['z_t'])
```
why weight by `z_t` and not `DZT`?
Could this be calculated by using the following?

```python
DZT = out.z_t.diff(dim='z_t')
```

Or am I misunderstanding what DZT is?
`out.z_t.diff(dim='z_t')` will return the distance between layer centers, and for 60 levels it will only return an array of length 59:

```python
>>> import pop_tools
>>> gx1v7_grid = pop_tools.get_grid("POP_gx1v7")
>>> gx1v7_grid.z_t.diff(dim='z_t')
<xarray.DataArray 'z_t' (z_t: 59)>
array([ 1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1000. , 1000. , 1000. , 1000. ,
1009.8404 , 1038.0646 , 1081.22175, 1136.90105, 1205.11015,
1286.69055, 1383.0544 , 1496.13345, 1628.40275, 1782.946 ,
1963.55735, 2174.8772 , 2422.5496 , 2713.41105, 3055.7061 ,
3459.30485, 3935.90165, 4499.1272 , 5164.489 , 5948.97315,
6870.06835, 7943.9187 , 9182.2754 , 10588.062 , 12149.8325 ,
13836.2715 , 15593.0264 , 17344.9219 , 19005.6846 , 20494.0957 ,
21751.2334 , 22751.50195, 23502.97165, 24038.333 , 24401.99805,
24638.8965 , 24787.6709 , 24878.15135, 24931.6328 , 24962.4424 ,
24979.77735, 24989.31735, 24994.45895, 24997.17675])
```
You want the cell thickness - the difference between layer interfaces, which is `z_w_bot - z_w`. Those arrays should be the same length although the dimension names are different, but here's a kludgy example of at least returning a DataArray (the coordinates need to be cleaned up):

```python
>>> gx1v7_grid.z_w_bot.swap_dims(z_w_bot="z_t") - gx1v7_grid.z_w.swap_dims(z_w="z_t")
<xarray.DataArray (z_t: 60)>
array([ 1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1019.6808, 1056.4484, 1105.9951, 1167.807 ,
1242.4133, 1330.9678, 1435.141 , 1557.1259, 1699.6796,
1866.2124, 2060.9023, 2288.8521, 2556.2471, 2870.575 ,
3240.8372, 3677.7725, 4194.0308, 4804.2236, 5524.7544,
6373.1919, 7366.9448, 8520.8926, 9843.6582, 11332.4658,
12967.1992, 14705.3438, 16480.709 , 18209.1348, 19802.2344,
21185.957 , 22316.5098, 23186.4941, 23819.4492, 24257.2168,
24546.7793, 24731.0137, 24844.3281, 24911.9746, 24951.291 ,
24973.5938, 24985.9609, 24992.6738, 24996.2441, 24998.1094])
Coordinates:
z_w_bot (z_t) float64 1e+03 2e+03 3e+03 4e+03 ... 5e+05 5.25e+05 5.5e+05
z_w (z_t) float64 0.0 1e+03 2e+03 3e+03 ... 4.75e+05 5e+05 5.25e+05
Dimensions without coordinates: z_t
```
I think `out` might already have a `dz` data array, though, which would be the best thing to use:

```python
>>> gx1v7_grid["dz"]
<xarray.DataArray 'dz' (z_t: 60)>
array([ 1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1000. , 1000. , 1000. , 1000. ,
1000. , 1019.6808, 1056.4484, 1105.9951, 1167.807 ,
1242.4133, 1330.9678, 1435.141 , 1557.1259, 1699.6796,
1866.2124, 2060.9023, 2288.8521, 2556.2471, 2870.575 ,
3240.8372, 3677.7725, 4194.0308, 4804.2236, 5524.7544,
6373.1919, 7366.9448, 8520.8926, 9843.6582, 11332.4658,
12967.1992, 14705.3438, 16480.709 , 18209.1348, 19802.2344,
21185.957 , 22316.5098, 23186.4941, 23819.4492, 24257.2168,
24546.7793, 24731.0137, 24844.3281, 24911.9746, 24951.291 ,
24973.5938, 24985.9609, 24992.6738, 24996.2441, 24998.1094])
Coordinates:
* z_t (z_t) float64 500.0 1.5e+03 2.5e+03 ... 5.125e+05 5.375e+05
Attributes:
units: cm
long_name: thickness of layer k
```
I adjusted this to utilize the `dz` DataArray within the grid:

```python
out = out.weighted(self.grid.dz).mean(dim='z_t')
```

which seems to work well.
I will look into that... for now I was just using TLONG/TLAT and ULONG/ULAT since we read the grid directly from pop-tools, but moving toward this would definitely make it more extensible.

I like that suggested approach - the reason for the current implementation is to reproduce the previously existing ZA code, but following this would make more sense.
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Anderson Banihirwe <[email protected]>
👍 we could hardcode it for now, choose the bounds based on ...
pop_tools/zonal_average.py (outdated):
```python
import numpy as np
import xarray as xr
import xesmf as xe
```
Should we make `xesmf` a soft dependency, i.e.

```python
try:
    import xesmf as xe
except ImportError:
    message = (
        "Zonal averaging requires the xesmf package.\n\n"
        "Please conda install as follows:\n\n"
        "  conda install -c conda-forge xesmf>=0.4.0"
    )
    raise ImportError(message)
```

??
So... I went back and took a look at Xoak, which is another way to approach this problem, allowing the user to index based on coordinates not within the default dimensions... below is the function I used to accomplish a zonal average. The nice thing about this approach is that inputting the output latitudes is easier than the previous approach using xESMF; in this example, I use the `lat_aux_grid` values:

```python
def zonal_average(ds, out_lats):
    # Set the index to be TLAT and TLONG
    ds.xoak.set_index(['TLAT', 'TLONG'], 'sklearn_geo_balltree')
    # Output lons
    lons = np.arange(-180, 180, 1)
    # Build a meshgrid with the lats and lons
    nlons, nlats = np.meshgrid(lons, out_lats)
    # Build an output dataset
    ds_data = xr.Dataset({
        'lat': (('y', 'x'), nlats),
        'lon': (('y', 'x'), nlons),
    })
    # Use xoak to select lats/lons
    out = ds.xoak.sel(TLAT=ds_data.lat, TLONG=ds_data.lon)
    # Reset the coordinates
    out = out.reset_coords()
    # Add one-dimensional lats and lons
    out['lon'] = ('x', lons)
    out['lat'] = ('y', out_lats)
    # Swap the dimensions
    out = out.swap_dims({'x': 'lon', 'y': 'lat'})
    return out

za_temp = zonal_average(ds[['TEMP']], ds.lat_aux_grid.values).mean('lon')
```

where `ds` is the input dataset.
Should we allow users to specify which POP grid to use (e.g. 'POP_gx1v7')? Also, with this approach, would we automatically apply the region masking so the zonal average would include this? These are reasons why I am currently using the approach similar to xESMF, where one sets up the regridder (using the associated POP grid), then applies the regridder to some data.
Any suggestions on how to speed up this calculation? Currently, the calculation takes a few minutes on the test files in pop-tools, regridding to a 0.25 degree grid... this is the current zonal average section:

```python
def zonal_average(self, obj, vertical_average=False, **kwargs):
    data = self.regrid(obj, **kwargs)
    mask = self.regrid(self.mask, regrid_method='nearest_s2d', **kwargs)

    # Attach a name to the mask
    mask.name = self.mask.name

    # Replace zeros with nans and group into regions
    out = mask.where(mask > 0) * data

    # Check to see if a weighted vertical average is needed
    if vertical_average:
        # Calculate the vertical weighted average based on depth of the layer
        out = out.weighted(self.grid.dz).mean(dim='z_t')

    return out
```

I think the main slow-down may be that it is separating into 13 regions.
It looks like you only need ...

Re: speed. What does snakeviz say? It looks like you're regenerating the destination grid for every variable. That could be avoided.
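One way to avoid that: build each xESMF regridder once and cache it keyed by method. A minimal sketch under assumed names (`src_grid` and `dst_grid` are xESMF-style grid datasets; this is not the PR's actual class):

```python
import xesmf as xe

class CachedRegridder:
    """Build each xESMF Regridder once and reuse it for every variable."""

    def __init__(self, src_grid, dst_grid):
        self.src_grid = src_grid
        self.dst_grid = dst_grid
        self._cache = {}

    def regrid(self, da, method='conservative'):
        # the expensive weight generation happens only on the first call per method
        if method not in self._cache:
            self._cache[method] = xe.Regridder(self.src_grid, self.dst_grid, method)
        return self._cache[method](da)
```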
After some discussion this week with @matt-long and @klindsay28, I think it would be best to implement a destination grid that matches the input grid... here is a function that is hard-coded for now in terms of dealing with POP; I am not sure how to generalize this since the grid corner information is specific to POP. Also, @klindsay28, how did you deal with the cyclic point in the previous implementation? Using conservative regridding, there are missing values near the origin of the grid. Using another interpolation method (e.g. bilinear) would fix this, but in order to conserve areas/values, it would be good to stick with conservative regridding. The resultant dataset matches the input for xESMF, which makes it easy to implement into the current development.

```python
def flip_coords(ds):
    """
    Given a set of longitudes and latitudes, flips s-hemisphere into northern hemisphere
    """
    tlat = ds.TLAT.values
    ulat = ds.ULAT.values
    nlat = ds.nlat.values
    nlat_shem_t = nlat[np.where(tlat[:, 0] < 0)]
    nlat_shem_u = nlat[np.where(ulat[:, 0] < 0)]
    new_ds_t = ds.sel(nlat=nlat_shem_t)
    new_ds_u = ds.sel(nlat=nlat_shem_u)
    # Pull out tlats/tlons
    tlat = new_ds_t.TLAT.values[1:]
    tlon = new_ds_t.TLONG.values
    # Pull out ulats/ulons
    ulat = new_ds_u.ULAT.values
    ulon = new_ds_u.ULONG.values
    # Use the grid spacing from the farthest south
    grid_spacing = np.diff(tlat[:, 0])[0]
    out_tlats = np.append(
        np.append(tlat[:, 0], -tlat[:, 0][::-1]),
        np.arange(np.max(-tlat[:, 0]) + grid_spacing, 90, grid_spacing),
    )
    out_tlons = sorted(tlon[0, :])
    tlons, tlats = np.meshgrid(out_tlons, out_tlats)
    out_ulats = np.append(
        np.append(np.append(ulat[:, 0], 0), -ulat[:, 0][::-1]),
        np.arange(np.max(-ulat[:, 0]) + grid_spacing, 90, grid_spacing),
    )
    out_ulons = sorted(ulon[0, :])
    ulons, ulats = np.meshgrid(out_ulons, out_ulats)
    ds_out = xr.Dataset(
        coords=dict(
            lon=(["x"], out_tlons[1:]),
            lat=(["y"], out_tlats[1:]),
            lon_b=(["x_b"], out_ulons),
            lat_b=(["y_b"], out_ulats),
        )
    )
    return ds_out
```
I am struggling with finding the correct bounds for the POP grid... here is a function I have that uses the grid corner function in pop-tools... would this be the right way to go about this?

```python
def gen_corner_calc(ULAT, ULONG):
    """
    Generates corner information and creates single dataset with output
    """
    # Use the function in pop-tools to get the grid corner information
    corn_lat, corn_lon = pop_tools.grid._compute_corners(ULAT, ULONG)
    lon_shape, lat_shape = corn_lon[:, :, 0].shape
    out_shape = (lon_shape + 1, lat_shape + 1)
    # Generate numpy arrays to store destination lats/lons
    out_lons = np.zeros(out_shape)
    out_lats = np.zeros(out_shape)
    # Assign the northeast corner information
    out_lons[1:, 1:] = corn_lon[:, :, 0]
    out_lats[1:, 1:] = corn_lat[:, :, 0]
    # Assign the northwest corner information
    out_lons[:-1, 1:] = corn_lon[:, :, 1]
    out_lats[:-1, 1:] = corn_lat[:, :, 1]
    # Assign the southwest corner information
    out_lons[:-1, :-1] = corn_lon[:, :, 2]
    out_lats[:-1, :-1] = corn_lat[:, :, 2]
    # Assign the southeast corner information
    out_lons[1:, :-1] = corn_lon[:, :, 3]
    out_lats[1:, :-1] = corn_lat[:, :, 3]
    ds_out = xr.Dataset(
        coords=dict(
            lon_b=(["nlat_b", "nlon_b"], out_lons),
            lat_b=(["nlat_b", "nlon_b"], out_lats),
        )
    )
    return ds_out
```
```python
print(data, dst_grid)

# Convert the mask to a uniform grid, using nearest neighbor interpolation
mask_regrid = to_uniform_grid(
```
can we have this support DataArrays directly? Call `.to_dataset()` on the DataArray in `to_uniform_grid`
pop_tools/regrid.py (outdated):
```python
else:
    raise TypeError('Missing defined longitude axis bounds')

lons, lats = np.meshgrid(sorted(lon), lat)
```
is the sorting really required?
Not necessarily - I can go ahead and remove it
pop_tools/regrid.py (outdated):
```python
for var in obj:
    if 'nlat' in obj[var].dims and 'nlon' in obj[var].dims:

        if obj[var].cf['latitude'].name == 'TLAT':
```
merge this with the above?
So adjust that to this?

```python
if 'nlat' in obj[var].dims and 'nlon' in obj[var].dims and obj[var].cf['latitude'].name == 'TLAT':
```
yes
pop_tools/regrid.py (outdated):
```python
print(np.unique(mask_regrid))

# Add a mask to the regridding
dst_grid['mask'] = (('y', 'x'), (mask_regrid.where(mask_regrid == 0, 1, 0)))
```
does np.logical_not do what you want here?
What do you mean?
You're inverting the mask here:

```python
(mask_regrid.where(mask_regrid == 0, 1, 0))
```

I think `np.logical_not(mask_regrid)` could be equivalent and possibly faster?
pop_tools/regrid.py (outdated):
```python
ds_list = [
    data_regrid.where(mask_regrid == region)
    .weighted(weights_regrid.where(mask_regrid == region).fillna(0))
    .mean('x')
    for region in tqdm(np.unique(mask_regrid))
    if region != 0
]
```
A few comments here:

- This function should have one job: compute a zonal average. Why should it bring in these regions?
- Do you really need the `where` on the weights? Also, `.where(..., 0)` will save an extra copy.
- This could be a lot faster if you made `mask_regrid` a 3D variable `(lat, lon, region)` and then passed it in as a weight (see the sketch below).
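A minimal sketch of that last point, on toy data with assumed dimension names:

```python
import numpy as np
import xarray as xr

# toy 3D mask: (region, lat, lon), with 1s inside each region and 0s outside
nregion, nlat, nlon = 3, 4, 8
mask3d = xr.DataArray(
    (np.random.rand(nregion, nlat, nlon) > 0.5).astype(float),
    dims=('region', 'lat', 'lon'),
)
data = xr.DataArray(np.random.rand(nlat, nlon), dims=('lat', 'lon'))

# one weighted reduction covers every region; no per-region loop or concat
zonal = data.weighted(mask3d).mean('lon')  # dims: ('region', 'lat')
```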
Yeah...

- From the user API perspective, I thought it would be helpful to be able to specify the grid name and handle the logic within here... where would that be better placed? Or what is the higher-level API through which we would want the user to specify the region information?
- I can remove the `where` on the weights.
- That is true; this might also be a better way of dealing with the fractional areas of the masked grid cells...
One way would be to expect users to pass in `ds * ds.region_mask` if `region_mask` were a 3D variable `(region, nlat, nlon)`. Would this be equivalent?

Does the order of operations here (regrid, then multiply by region mask and take the mean) follow what's in Keith's FORTRAN program?
Co-authored-by: Deepak Cherian <[email protected]>
for more information, see https://pre-commit.ci
pop_tools/regrid.py (outdated):
```python
elif method == 'lat_axis':
    out_ds = xr.Dataset()

    if lat_axis_bnds is not None:
```
It appears that `lat_axis_bnds` is required for this function to work properly. Instead of having it as a keyword argument, could we just make it a required argument and remove this check?
```
regrid_method = str
    Regridding method for xESMF, default is conservative

**kwargs
```
If I understand this correctly, these `kwargs` are meant to be passed to

```python
def _regrid_dataset(da_in, dst_grid, regrid_method=None)
```

This function only has one keyword argument, `regrid_method`. Could we just expose this argument in the `to_uniform_grid` signature instead of using the catch-all `kwargs`?
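A sketch of what that explicit signature might look like (the body is schematic; `_generate_dst_grid` is a hypothetical helper, not part of the PR):

```python
def to_uniform_grid(ds, dx=0.25, dy=0.25, regrid_method='conservative'):
    """Regrid a POP dataset to a uniform lat/lon grid (signature sketch only)."""
    dst_grid = _generate_dst_grid(dx, dy)  # hypothetical helper
    # regrid_method is surfaced explicitly instead of hiding behind **kwargs
    return ds.map(_regrid_dataset, dst_grid=dst_grid, regrid_method=regrid_method)
```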
```python
if isinstance(ds, xr.DataArray):
    grid_center_lat = ds.cf['latitude'].name
    grid_center_lon = ds.cf['longitude'].name
    ds = ds.to_dataset()

elif isinstance(ds, xr.Dataset):
    var = list(ds.variables)[0]
    grid_center_lat = ds[var].cf['latitude'].name
    grid_center_lon = ds[var].cf['longitude'].name
```
We are using the `cf_xarray` accessor here, but I don't see `cf_xarray` in the imports. Is this working as expected without the `cf_xarray` import?
also `list(ds.variables)[0]`?

I think the only sane way to do this is to loop over DataArrays and check `"TLAT" in DataArray.attrs["coordinates"]`. So take an input dataset, then split it into two: variables with coordinates TLAT, TLONG, and those with ULAT, ULONG; process the two datasets separately, then combine them into a single dataset and return that.
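A minimal sketch of that split-process-merge pattern (`process` stands in for the regridding step; the `coordinates`-attribute check is as suggested above):

```python
import xarray as xr

def split_process_merge(ds, process):
    """Process T-grid and U-grid variables separately, then recombine (sketch)."""
    t_vars = [v for v in ds.data_vars if 'TLAT' in ds[v].attrs.get('coordinates', '')]
    u_vars = [v for v in ds.data_vars if 'ULAT' in ds[v].attrs.get('coordinates', '')]
    return xr.merge([process(ds[t_vars]), process(ds[u_vars])])
```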
@dcherian This logic is handled within the `to_uniform_grid` function...
@mgrover1, as Deepak pointed out, this line is going to result in errors when the "first" grabbed variable doesn't have spatial dimensions...

```python
var = list(ds.variables)[0]
```

Could we make this more robust?
I can go ahead and remove the check to see if it is a Dataset, since it should be a DataArray and this function is meant to be used internally... in `to_uniform_grid`, we check to see if `nlat`, `nlon`, and `TLAT` are in the dimensions, creating a list of variables. These variables are then passed into the `prep_for_xesmf` function, iterating over those variables.
@mgrover1 is this function something that would be generally useful for geoscience applications, or is it specific to POP? Just curious. Thanks!
Co-authored-by: Anderson Banihirwe <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Anderson Banihirwe <[email protected]>
In its current state, it is fairly POP-specific... but it should be relatively straightforward to make this more generic to the point where it could be applied to other use cases.
Co-authored-by: Anderson Banihirwe <[email protected]>
```python
assert isinstance(out, xr.Dataset)
assert 'region' in out.dims
assert 'TEMP' in out.variables
assert 14 == len(out.region)
```
Could we produce some output with Keith's za functionality, and then do some numerical comparison as part of the test?
I am planning on holding off on this until we can figure out why it's not lining up exactly... It has something to do with the region masks, but I can't nail it down.
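When that's resolved, a minimal sketch of such a comparison test (the reference filename and wrapper are hypothetical; `xr.testing.assert_allclose` does the numerical comparison):

```python
import xarray as xr

def test_zonal_average_matches_za():
    # hypothetical reference output produced by Keith's Fortran za tool
    expected = xr.open_dataset('za_reference_output.nc')
    actual = compute_zonal_average()  # hypothetical wrapper around the new code
    xr.testing.assert_allclose(actual['TEMP'], expected['TEMP'], rtol=1e-5)
```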
pop_tools/regrid.py (outdated):
```
Parameters
----------
method: str
    Method to use for generating the destination grid, options include 'regular_grid' or 'define_lat_aux'
```
It's `lat_aux_grid` down below.
Co-authored-by: Deepak Cherian <[email protected]>
An ASP grad student is asking about regridding POP to a uniform grid. Can we merge the ...
@dcherian that makes sense to me!