Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace CDAT #346

Open
forsyth2 opened this issue Nov 9, 2022 · 13 comments · Fixed by #519 · May be fixed by #651
Open

Replace CDAT #346

forsyth2 opened this issue Nov 9, 2022 · 13 comments · Fixed by #519 · May be fixed by #651
Labels
priority: high High priority task

Comments

@forsyth2
Copy link
Collaborator

forsyth2 commented Nov 9, 2022

conda-forge/cdtime-feedstock#52 mentions that we need to replace CDAT in zppy. Note that we already have #80, which is specifically about cdscan.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 9, 2022

@tomvothecoder I'm not quite sure what needs to be done for zppy.

$ git grep -n cdscan
docs/source/dev_guide/new_diags_set.rst:100:        cdscan -x ${xml_name} ${rofDir}/${v}_*.nc
zppy/templates/e3sm_diags.bash:104:        # Add this time series file to the list of files for cdscan to use
zppy/templates/e3sm_diags.bash:111:    cdscan -x ${xml_name} -f ${v}_files.txt
zppy/templates/e3sm_diags.bash:132:  cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
zppy/templates/global_time_series.bash:46:cdscan -x glb.xml *.nc
zppy/templates/global_time_series.bash:70:    cdscan -x glb.xml mpaso.glb*.nc

$ git grep -n cdat
# No matches

$ git grep -n cdms
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10:        self.f = cdms2.open(filename)

$ emacs conda/dev.yml
# No dependencies appear obviously related to cdat

$ emacs conda/meta.yml
# No dependencies appear obviously related to cdat

@tomvothecoder
Copy link
Collaborator

@forsyth2 I don't have familiarity with the zppy codebase to give input on what needs to be done specifically.

My general approach to refactoring is to identify what needs to be refactored, understand the implementation, and find alternative solutions that would reproduce the same/similar results.

@tomvothecoder
Copy link
Collaborator

It looks like you identified the CDAT modules that are used in zppy, which is a good first step. Luckily, the dependency is not that significant from what I can tell.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 9, 2022

Ok, great!

https://cdat.llnl.gov/ lists "specific CDAT packages (cdms2, cdat_info, cdutil, cdtime, genutil, libcdms)". Out of those, it looks like zppy has:

$ git grep -n cdms2
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10:        self.f = cdms2.open(filename)

$ git grep -n cdutil
zppy/templates/readTS.py:2:import cdutil
zppy/templates/readTS.py:65:            v = cdutil.YEAR(v)

So, basically I just need replacements for cdcscan, cdms2.open, and cdutil.YEAR.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Dec 12, 2022

Those are the packages whose dependencies are being maintained until end of CY2023.

I doubt zppy uses other modules from CDAT. It might be a good idea to check for other modules that operate on cdms2 objects like MV2 though.

@forsyth2
Copy link
Collaborator Author

Thanks, out of the list at https://anaconda.org/cdat/repo, we just have:

$ git grep -n cdms2
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10:        self.f = cdms2.open(filename)

$ git grep -n cdutil
zppy/templates/readTS.py:2:import cdutil
zppy/templates/readTS.py:65:            v = cdutil.YEAR(v)

@forsyth2 forsyth2 moved this from Todo to In Progress in forsyth2 current tasks Dec 16, 2022
@forsyth2
Copy link
Collaborator Author

We can probably use xarray to replace cdms2.open, and an XML reader to replace cdscan (mfopen to open multiple files). May have to replace with multiple tools.

@forsyth2 forsyth2 moved this from In Progress to Todo in forsyth2 current tasks May 12, 2023
@forsyth2
Copy link
Collaborator Author

@tomvothecoder What's the timeline/deadline for this issue? Are we trying not to have anything CDAT-related by the next Unified release?

@tomvothecoder
Copy link
Collaborator

@forsyth2 Just CC'd you in an email chain with Xylar. He gave us 6-9 months to refactor any E3SM packages from CDAT to alternative solutions so they can work in E3SM Unified.

It looks like zppy's CDAT dependency is minimal and much more straightforward to refactor compared to e3sm_diags.

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 2, 2023

@tomvothecoder Sounds good; I've moved this issue up in priority, so it can be completed by the next release (February/March of 2024)

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Oct 2, 2023

@tomvothecoder I apologize if this is a naive question, but are there xCDAT equivalents for all the CDAT commands? That seems like it would most simplify this change.

It looks like we need to change 3 things here:

@tomvothecoder
Copy link
Collaborator

Replace cdms2.open; I'm assuming this would be done with xarray.open_dataset (docs.xarray.dev/en/stable/generated/xarray.open_dataset.html) or perhaps xcdat.open_dataset (xcdat.readthedocs.io/en/stable/generated/xcdat.open_dataset.html#xcdat.open_dataset).

Yes, you can use either API if a single dataset is being opened. xcdat.open_dataset() has some additional capabilities listed in the docs. If you don't need them, xr.open_dataset() will work fine.

Replace cdscan -x; is there a recommended XML reader that could handle this?

In #80, Jill says "cdscan is used to concatenate files before running e3sm-diags and global mean time-series. One possibility is to use xarray's IO to read-in multiple files."

Since you'll be moving to xarray/xCDAT, I suggest exploring xarray.open_mfdataset() or xcdat.open_mfdataset(). These APIs can read a glob or list of .nc file paths and combine them into a single xr.Dataset.

Replace cdutil.YEAR; this seems trickiest. Looking at the file where it's defined, CDAT/cdutil@master/cdutil/times.py, there is quite a bit of functionality. Would the best option be to determine which functionality is needed by zppy and write that myself??

xCDAT has APIs for temporal averaging. The equivalent to cdutil.YEAR should be the xCDAT accessor method, xr.Dataset.temporal.group_average(...freq="year"). The docs include examples on how to use it.


Here's a CDAT to xCDAT mapping table (not all features are 1:1): https://xcdat.readthedocs.io/en/stable/api.html#cdat-mapping-table

@forsyth2
Copy link
Collaborator Author

Reopening since #519 was a partial resolution.

@forsyth2 forsyth2 reopened this May 23, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in forsyth2 current tasks May 23, 2024
@forsyth2 forsyth2 linked a pull request Dec 10, 2024 that will close this issue
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: high High priority task
Projects
None yet
2 participants