-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RRFS-SD reader #154
base: develop
Are you sure you want to change the base?
Add RRFS-SD reader #154
Conversation
this addresses #152 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just few initial comments. I'll work on the pressure calculation once I have extracted some sample data.
if "ug/kg" in dset[i].attrs["units"]: | ||
# ug/kg -> ug/m3 using dry air density | ||
dset[i] = dset[i] * dset["pres_pa_mid"] / dset["temperature_k"] / 287.05535 | ||
dset[i].attrs["units"] = r"$\mu g m^{-3}$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want a pretty version here could use unicode, e.g. μg/m³
or μg m⁻³
. But maybe better to use just ASCII ug/m3
or ug m-3
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s stick with ascii. It may cause issues down the pipeline if we save the raw files out. It might not be recognized as CF convention otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think ug m-3
best for CF (though they would want kg m-3
for canonical).
|
||
|
||
|
||
# convert "ug/kg to ug/m3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to be optional like Jordan's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont see why
Co-authored-by: Zachary Moon <[email protected]>
Co-authored-by: Zachary Moon <[email protected]>
Co-authored-by: Zachary Moon <[email protected]>
Co-authored-by: Zachary Moon <[email protected]>
Co-authored-by: Zachary Moon <[email protected]>
dset['pm25_sd'] = dset.smoke + dset.dust | ||
dset['pm25_sd'].attrs['long_name'] = 'Particulate Matter < 2.5 microns' | ||
dset['pm25_sd'].attrs['units'] = "ug/kg" | ||
if {'smoke', 'dust', 'coarsepm'} <= dset.data_vars.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't know about this syntax. Really clean and useful.
phalf(k) = a(k) + surfpres * b(k) | ||
|
||
Mid layer pressures are calculated by: | ||
pfull(k) = (phalf(k+1)-phalf(k))/log(phalf(k+1)/phalf(k)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure where this formula came from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the formula used in the post_fv3 https://github.com/NOAA-EMC/fv3atm/blob/a82381c0b751a15e5343de5078ef836b2c444c89/io/post_fv3.F90#L4266-L4276
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zmoon still not sure where this formula came from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rschwant do you know? I think that you originally did this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raffaele sent it to me to describe the vertical structure in the model when I was helping with the boundary conditions. I'll forward you all the email. This is how he described it to me.
Interface pressure levels are computed using the hybrid interface formula:
p(k) = a(k) + ps * b(k)
where ps is the (actual/reference) surface pressure. These pressure levels correspond to phalf in your output dyn*.nc files, while pfull are computed as:
pfull(k) = (phalf(k+1)-phalf(k))/log(phalf(k+1)/phalf(k))
If there is an official typical way of calculating this we can use that instead too. We haven't really tested this much since we have not used the aircraft evaluation in MELDODIES MONET much in the RRFS-CMAQ model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll compare the two methods' results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is how Raffaele calculated it for UFS-Aerosols in the exglobal_aero_init function https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/merge_fv3_aerosol_tile.py#L99-L101
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p = dset.dp_pa.copy().load() # Have to load into memory here so can assign levels. | ||
srfpres = dset.surfpres_pa.copy().load() | ||
for k in range(len(dset.z)): | ||
if surf_only: | ||
pres_2 = 0.0 + srfpres * 0.9978736 | ||
pres_1 = 0.0 + srfpres * 1.0 | ||
else: | ||
pres_2 = dset.ak[k + 1] + srfpres * dset.bk[k + 1] | ||
pres_1 = dset.ak[k] + srfpres * dset.bk[k] | ||
p[:, k, :, :] = (pres_2 - pres_1) / np.log(pres_2 / pres_1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like this
p = dset.dp_pa.copy().load() # Have to load into memory here so can assign levels. | |
srfpres = dset.surfpres_pa.copy().load() | |
for k in range(len(dset.z)): | |
if surf_only: | |
pres_2 = 0.0 + srfpres * 0.9978736 | |
pres_1 = 0.0 + srfpres * 1.0 | |
else: | |
pres_2 = dset.ak[k + 1] + srfpres * dset.bk[k + 1] | |
pres_1 = dset.ak[k] + srfpres * dset.bk[k] | |
p[:, k, :, :] = (pres_2 - pres_1) / np.log(pres_2 / pres_1) | |
surfpres = dset.surfpres_pa | |
a = dset.ak | |
b = dset.bk | |
if surf_only: | |
phalf_kp1 = 0.0 + surfpres * 0.9978736 | |
phalf_k = 0.0 + surfpres * 1.0 | |
else: | |
phalf_kp1 = a.shift(z=-1) + surfpres * b.shift(z=-1) | |
phalf_k = a + surfpres * b | |
p = (phalf_kp1 - phalf_k) / np.log(phalf_kp1 / phalf_k) |
going for consistency with the docstring variable names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ak
and bk
are dataset attrs, not variables
phalf_kp1 = a[1:] + surfpres * b[1:]
but may need to make them DataArray
s for the dims to match up properly and such, maybe create
a_k = xr.DataArray(ds.ak[:-1], dims="z")
a_kp1 = xr.DataArray(ds.ak[1:], dims="z")
b_k = xr.DataArray(ds.bk[:-1], dims="z")
b_kp1 = xr.DataArray(ds.bk[1:], dims="z")
@bbakernoaa I made an extraction (forecast hour 6 of the run you shared, 5 levels closest to surface, selected variables, etc.), now available at https://csl.noaa.gov/groups/csl4/modeldata/melodies-monet/data/example_model_data/rrfssd_example/rrfs-sd_dynf006.nc (104M) for testing. We can set it up like I did here, which also uses data stored in that location. |
maybe we can add some netcdf tricks to compress that even further using integers instead of floats and the add_offset and scale_factor netcdf attributes |
Perhaps, but I think it is better to keep format closer to the original. And I used NCO lossy compression, which already decreases the size a lot due to quantization, I don't know if additionally doing the int packing transform would make it any more compressible. |
Geoptential height with attributes. | ||
""" | ||
sfc = f.surfalt_m.load() | ||
dz = f.dz_m.load() * -1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (loading all dz) is what Maggie's run is failing on, specifically:
Unable to allocate 129. GiB for an array with shape (37, 65, 2961, 4881) and data type float32
Like the pressure calc we should be able to write this in a Dask-friendly way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also like pressure calc, for the surf-only case there is a short version.
# These are negative in RRFS-CMAQ, but you resorted and are adding from the surface, | ||
# so make them positive. | ||
dz[:, 0, :, :] = dz[:, 0, :, :] + sfc # Add the surface altitude to the first model level only | ||
z = dz.rolling(z=len(f.z), min_periods=1).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think .cumsum()
could be used.
create a separate reader for RRFS-SD. This is based on
_rrfs_cmaq_mm
but removes many of the functions as rrfs-sd is obviously much simpler.