Division of Features in Top-Level API #20
I'm not thoroughly familiar with xarray's backend API :( So, I'm going to tinker with xarray's backend API source code sometime next week to figure out how much we can get out of it at minimal cost (performance- and user-experience-wise), and then I'll post some updates.
So here are my thoughts on this! I see four core categories of features for xwrf: coordinate handling, CF attribute cleanup, diagnostics, and destaggering.

I initially envisioned (like I believe @lpilz is alluding to) that all four of these could work in the xwrf backend. However, given the hiccup that Dask chunking is a post-backend operation in xarray (which I didn't realize when writing the first cut of #14), I'm not sure that the workarounds required to fit diagnostics and destaggering lazily into the backend are worth it...they could be, though... So, two options came to mind:

1) Do it all in the backend

This is the cleanest from the user-API point of view, as users can get their cleaned, ready-to-use dataset as simply as:

```python
ds = xr.open_dataset(
    'wrfout_d01_2017-12-02_12 00 00',
    engine='xwrf',
    backend_kwargs={...}
)
```

However, making the core diagnostics and destaggering lazy-loading in the backend will take much more work behind the scenes. I'm with @andersy005 in that I'm not thoroughly familiar with xarray's backend API, but what I can say is that, if we go this path, we will need some way to create "mocked variables" whose [...]

2) Do coordinates and CF attrs in the backend, but diagnostics and destaggering externally

This would be fairly easy to implement! The diagnostics could look just like those in https://github.com/jthielen/xwrf/blob/salem-style-coords-backup-0/xwrf/diagnostics.py, and the destaggering could be a "search and replace" routine that looks for a [...]

It is less ideal from the user side, however. If we have a "combo" method for both the core diagnostics and destaggering together, it would look something like:

```python
ds = xr.open_dataset(
    'wrfout_d01_2017-12-02_12 00 00',
    engine='xwrf',
    backend_kwargs={...}
).wrf.diag_and_destagger()
```

or

```python
ds = xr.open_dataset(
    'wrfout_d01_2017-12-02_12 00 00',
    engine='xwrf',
    backend_kwargs={...}
).pipe(xwrf.diag_and_destagger)
```

Not terrible, but trying to justify always needing two separate operations to read in WRF output doesn't seem like the greatest UX. Also, we'll have to come up with some "CF adjacent" metadata for the raw [...]

Does anyone have strong preferences between these two options? Or another option altogether?
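For concreteness, the "search and replace" destaggering routine mentioned for option 2 could be sketched roughly as follows. This is only an illustration (the function name `destagger` and its signature are hypothetical, not xwrf API); it averages adjacent values along a WRF staggered dimension onto the mass grid:

```python
import numpy as np
import xarray as xr

def destagger(var, stagger_dim):
    """Average `var` along `stagger_dim` onto the unstaggered (mass) grid.

    `stagger_dim` is a WRF staggered dimension name ending in '_stag',
    e.g. 'west_east_stag'. The result has one fewer point along that
    dimension and drops the '_stag' suffix from its name.
    """
    left = var.isel({stagger_dim: slice(None, -1)})
    right = var.isel({stagger_dim: slice(1, None)})
    data = 0.5 * (left.values + right.values)
    # Rename only the staggered dimension; all others pass through unchanged
    new_dims = [d[:-len('_stag')] if d == stagger_dim else d for d in var.dims]
    return xr.DataArray(data, dims=new_dims, attrs=var.attrs)
```

A full implementation would also need to scan the dataset for any variable carrying a `_stag` dimension and replace it in place, which is the "search and replace" part.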
Also, in response to the raised points about not including diagnostics in the backend:
👍
As far as I can tell, we're only talking about 5 raw fields ("T", "P", "PB", "PH" and "PHB") being mapped to 4 derived fields ("air_potential_temperature", "air_pressure", "geopotential" and "geopotential_height"). I can't think of any case where the result coming out of the xwrf backend with respect to those fields would need correction by the user before xwrf diagnostics (i.e., if those fields are messed up so that the xwrf diagnostics won't work, they almost certainly would have been messed up in a way that the xwrf backend wouldn't recognize them either). But I could be mistaken too!
👍 It also makes it much easier to implement, as we'd just write the expected code (e.g., [...])

If you're post-processing (and possibly saving to another format), I'm not sure why you would want to re-save the raw components instead of the combined fields. It's more data and harder to describe with quality metadata (and I shudder at the thought of intentionally re-saving data with WRF-style metadata).
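For reference, the 5-raw-to-4-derived mapping discussed above is simple arithmetic on WRF's offset/split fields (the function name `derive_fields` is hypothetical; the constants reflect WRF's conventions, where `T` is perturbation potential temperature relative to a 300 K base and `P`/`PH` are perturbations from the base-state `PB`/`PHB`):

```python
import numpy as np
import xarray as xr

G = 9.81        # m s-2, gravitational acceleration (assumed constant)
T_BASE = 300.0  # K, base offset of WRF's perturbation potential temperature 'T'

def derive_fields(ds):
    """Combine WRF's raw T, P, PB, PH, PHB into the four derived fields."""
    pot_temp = ds['T'] + T_BASE      # air_potential_temperature [K]
    pressure = ds['P'] + ds['PB']    # air_pressure [Pa]
    geopot = ds['PH'] + ds['PHB']    # geopotential [m2 s-2]
    height = geopot / G              # geopotential_height [m]
    return ds.assign(
        air_potential_temperature=pot_temp,
        air_pressure=pressure,
        geopotential=geopot,
        geopotential_height=height,
    )
```

Since these are plain element-wise operations, they stay lazy when the inputs are Dask-backed and run eagerly otherwise, which is exactly the flexibility argued for above.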
Also, re: MPAS

Prefacing this by saying I have not worked with MPAS itself at all: taking a look through the raw example MPAS files available through MPAS-Analysis's documentation, I don't see any overlap with WRF-style metadata or weird offset/split components. So, if we stick to the limited scope for xwrf mentioned in #20 (comment), then I wouldn't foresee MPAS profiting from it directly. However, the upstream improvements to MetPy and xgcm that will be needed to achieve feature parity with wrf-python (e.g., grid-capable CAPE, sigma/hybrid-sigma to isobaric vertical coordinate transformation) will likely benefit MPAS too (assuming that MPAS output can be made CF-friendly as well)! I believe FV3 would be a similar story (based on the limited experience I have with the FV3-LAM short-range weather app).
Of the two options, I would prefer 2), for some of the same reasons @andersy005 mentioned. I was trying to remember whether we discussed this, and didn't find anything in the meeting notes regarding this topic, so I would like to add another option:

3) Do everything in an xarray accessor

I guess this would be the easiest option. I looked through the [...] To me, there are quite a few reasons for considering this option: [...]

Again, this is only true provided we aren't missing any data from just loading [...]
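For anyone unfamiliar with the accessor mechanism option 3 relies on, here is a minimal sketch using xarray's real `register_dataset_accessor` hook. The accessor name `xwrf` and the method `postprocess` are hypothetical placeholders, not settled API:

```python
import xarray as xr

@xr.register_dataset_accessor('xwrf')
class XWRFAccessor:
    """Hypothetical accessor bundling xwrf's cleaning steps in one call."""

    def __init__(self, ds):
        self._ds = ds

    def postprocess(self):
        """Would apply coordinate decoding, diagnostics, and destaggering."""
        ds = self._ds
        # Placeholder: the real cleaning steps would go here
        return ds
```

With this registered, the workflow would be `xr.open_dataset('wrfout_...').xwrf.postprocess()`, keeping data loading and computation conceptually separate while still offering a one-liner.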
I'm planning to start working on the xarray accessor unless there are objections and/or someone has started working on this already.
That would be great 👍

I think #44 closes this discussion, do you agree?

I concur! (Feel free to re-open if otherwise.)

Progress!!!! 😄
While detailed API discussions will be ongoing based on #13 and other issues/discussions that follow from that, #14 (comment) and #14 (comment) raised a more high-level API point that would be good to clear up first: what features go into the xwrf backend, and what goes elsewhere (such as a `.wrf` accessor)?

Original comments:
Wouldn't creating custom backend arrays be overkill? Assuming we want to support reading files via the netCDF4-python library, we might be able to write a custom data store that borrows from xarray's `NetCDF4DataStore`: https://github.com/pydata/xarray/blob/5db40465955a30acd601d0c3d7ceaebe34d28d11/xarray/backends/netCDF4_.py#L291. With this custom data store, we would have more control over what to do with variables, dimensions, and attrs before passing them to xarray. Wouldn't this suffice for the data loading (without the derived quantities)?

I think there's value in keeping the backend plugin simple (e.g. performing simple tasks such as decoding coordinates, fixing attributes/metadata, etc.) and everything else outside the backend. Deriving quantities doesn't seem simple enough to warrant having this functionality during data loading.
Some of the benefits of deriving quantities outside the backend are that this approach:

(1) doesn't obfuscate what's going on,
(2) gives users the opportunity to fix aspects of the dataset that might be missed by xwrf during data loading before passing this cleaned dataset to the functionality for deriving quantities, and
(3) removes the requirement for deriving quantities to be a lazy operation, i.e. if your dataset is in memory, deriving the quantity is done eagerly.
Originally posted by @andersy005 in #14 (comment)
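The simple-backend-plugin idea quoted above could look roughly like the following sketch. Only `BackendEntrypoint` and its `open_dataset` hook are real xarray API; the class name, the delegation to the built-in `netcdf4` engine, and the `_postprocess` helper are assumptions for illustration:

```python
import xarray as xr
from xarray.backends import BackendEntrypoint

class XWRFBackendEntrypoint(BackendEntrypoint):
    """Hypothetical xwrf backend: reuse the netCDF4 machinery, then
    clean metadata before handing the Dataset to the user."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        # Delegate the actual file reading to xarray's built-in engine
        ds = xr.open_dataset(filename_or_obj, engine='netcdf4',
                             drop_variables=drop_variables, **kwargs)
        return self._postprocess(ds)

    @staticmethod
    def _postprocess(ds):
        # Placeholder for the "simple tasks": decode coordinates,
        # fix attributes/metadata, etc. No derived quantities here.
        return ds
```

Registering this class under an `xwrf` entry point would make `engine='xwrf'` available, while diagnostics stay in separate, backend-agnostic functions.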
Also, wouldn't it be beneficial for deriving quantities to be backend-agnostic? I'm imagining cases in which the data have been post-processed and saved in a different format (e.g. Zarr) and you still want to be able to use the same code for deriving quantities on the fly.
Originally posted by @andersy005 in #14 (comment)
This sounds like it factors directly into the "keep the solutions as general as possible (so that maybe also MPAS can profit from it)" discussion. However, I feel that we have to think about the user perspective too. I don't have any set opinions on this, and we should definitely discuss it, maybe in a larger group too. Here are some thoughts on this so far:
I think the reason users like `wrf-python` is that it is an easy one-stop shop for getting WRF output to work with Python; this is especially true because lots of users are scientists, not software engineers or programmers. I personally take from this that it would be prudent to keep the UX as easy as possible, and I think this is what the backend approach does really well. Basically, users just have to add the `engine='xwrf'` kwarg and then it just works (TM), meaning that it provides users with CF-compliant, de-WRFified meteo data. Also, given that the de-WRFification of the variable data is not too difficult (it's basically just adding fields for three variables), I think the overhead in complexity wouldn't be too great. However, while I do see that it breaks the conceptual barrier between data loading (and decoding, etc.) and computation, this breakage would be required in order to provide the user with meteo data rather than raw WRF fields.

@andersy005, do you already have some other ideas on how one could handle this elegantly?

Also, should we move this discussion to a separate issue, maybe?
Originally posted by @lpilz in #14 (comment)