
Replace CDAT #519

Merged: forsyth2 merged 4 commits into main from issue-346 on May 23, 2024
Conversation

@forsyth2 (Collaborator) commented Oct 3, 2023

Replace CDAT. Resolves #346. Resolves #80.

@forsyth2 forsyth2 added the DevOps (CI/CD, configuration, etc.) and priority: high (High priority task) labels on Oct 3, 2023
@forsyth2 forsyth2 self-assigned this Oct 3, 2023
@forsyth2 forsyth2 marked this pull request as draft October 3, 2023 23:43
@@ -43,7 +43,7 @@ atmosphere_only=${atmosphere_only,,}
echo 'Create xml files for atm'
export CDMS_NO_MPI=true
cd ${case_dir}/post/atm/glb/ts/monthly/${ts_num_years}yr
-cdscan -x glb.xml *.nc
+python cdscan_replacement.py glb.xml *.nc

@forsyth2 (Collaborator, Author) commented Oct 5, 2023

zppy isn't finding the cdscan_replacement.py file.

global_time_series adds Python scripts to an accessible directory via the block at https://github.com/E3SM-Project/zppy/blob/main/zppy/global_time_series.py#L55:

            c["global_time_series_dir"] = os.path.join(
                scriptDir, "{}_dir".format(prefix)
            )
            if not os.path.exists(c["global_time_series_dir"]):
                os.mkdir(c["global_time_series_dir"])
            scripts = ["coupled_global.py", "readTS.py", "ocean_month.py"]
            for script in scripts:
                script_template = templateEnv.get_template(script)
                script_file = os.path.join(c["global_time_series_dir"], script)
                with open(script_file, "w") as f:
                    f.write(script_template.render(**c))
                makeExecutable(script_file)

Would it make sense to do something similar for cdscan_replacement? It seems like there should be a simpler way since there are no template parameters in that file.
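
(For what that simpler path might look like: a minimal sketch, assuming the script ships next to the Jinja templates; this is not zppy's actual code, and the directory arguments are placeholders.)

import os
import shutil

def copy_script(template_dir: str, destination_dir: str) -> str:
    # Hypothetical helper: copy the parameter-free script verbatim instead of
    # rendering it through Jinja like the templated scripts above.
    src = os.path.join(template_dir, "cdscan_replacement.py")
    dst = os.path.join(destination_dir, "cdscan_replacement.py")
    shutil.copy(src, dst)
    os.chmod(dst, 0o755)  # stand-in for zppy's makeExecutable helper
    return dst

Something like copy_script(templateDir, c["global_time_series_dir"]) could then replace the template-rendering loop for this one file.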

@xylar (Contributor) commented:

@forsyth2, why not make this a proper entry point in the zppy package? That's the "right" way to do this:
https://packaging.python.org/en/latest/specifications/entry-points/

@xylar (Contributor) commented:

To do that, you would add your functions along with a main() function equivalent to your current if __name__ == "__main__": block to zppy somewhere and then you would edit your setup.py to add another entry point like this one:
https://github.com/E3SM-Project/zppy/blob/main/setup.py#L33
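
(A minimal sketch of what that could look like, assuming the replacement code lives in a hypothetical zppy/cdscan_replacement.py module exposing a main() function; this is not zppy's actual setup.py.)

from setuptools import find_packages, setup

setup(
    name="zppy",
    packages=find_packages(include=["zppy", "zppy.*"]),
    entry_points={
        "console_scripts": [
            # Hypothetical new console script: exposes
            # zppy/cdscan_replacement.py's main() as `zppy_cdscan_replacement`.
            "zppy_cdscan_replacement = zppy.cdscan_replacement:main",
        ]
    },
)

After reinstalling the package (e.g. pip install -e .), the zppy_cdscan_replacement command becomes available on PATH in that environment.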

@forsyth2 (Collaborator, Author) commented Oct 13, 2023

@xylar Thanks, I tried implementing it that way, but I'm getting zppy_cdscan_replacement: command not found errors.

$ grep -n "cdscan_replacement" *.o*
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1850-1851.o405614:1:/var/spool/slurmd/job405614/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1850-1853.o405616:1:/var/spool/slurmd/job405616/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1852-1853.o405615:1:/var/spool/slurmd/job405615/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1850-1851.o405611:1:/var/spool/slurmd/job405611/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1850-1853.o405613:1:/var/spool/slurmd/job405613/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o405612:1:/var/spool/slurmd/job405612/slurm_script: line 120: zppy_cdscan_replacement: command not found
global_time_series_1850-1860.o405621:2:/var/spool/slurmd/job405621/slurm_script: line 58: zppy_cdscan_replacement: command not found

Yet the command does show up when I run which in the dev environment.

$ which zppy
~/miniconda3/envs/zppy_dev_issue_346/bin/zppy
$ which zppy_cdscan_replacement
~/miniconda3/envs/zppy_dev_issue_346/bin/zppy_cdscan_replacement

@xylar (Contributor) commented:

Are you using a different environment (e.g. E3SM-Unified) when you try to call zppy_cdscan_replacement? If that command is part of the environment you use to launch jobs but then it's not available in the environment that's actually used when you run the jobs, it would make sense that you're getting these errors.

@forsyth2 (Collaborator, Author) commented:

Hmm I don't think that worked but I can't seem to find any documentation as to why.

I do see on Oct. 3, Tom noted "I suggest exploring xarray.open_mfdataset()" in #346 (comment), but then in this thread from Oct. 5, I'm trying to use cdscan_replacement.py. That seems to imply something about open_mfdataset wasn't working properly.

@xylar (Contributor) commented:

Oh, interesting! Still, I think it's worth trying again with open_mfdataset and reminding yourself what the error was. Maybe we can solve it.

@forsyth2 (Collaborator, Author) commented:

I'm looking into it; I want to say it was something about it needing to convert to xml.

It also occurs to me that even if we use xarray, we'd need to call it from a bash script, so the same general problem (of needing to get a bash wrapper defined in the environment) would remain.

@tomvothecoder (Collaborator) commented May 14, 2024

Can you explain the step-by-step requirements/description for the functionality involving cdscan? This would make it easier to identify the equivalent APIs needed from other packages to replace the cdscan code.

My guess, based on your comment here, is that the code just takes a glob of .nc filepaths and converts it to an XML using cdscan. I assume this XML is used downstream with cdms2 (or another CDAT package) to open up the dataset.

$ git grep -n cdscan
docs/source/dev_guide/new_diags_set.rst:100:        cdscan -x ${xml_name} ${rofDir}/${v}_*.nc
zppy/templates/e3sm_diags.bash:104:        # Add this time series file to the list of files for cdscan to use
zppy/templates/e3sm_diags.bash:111:    cdscan -x ${xml_name} -f ${v}_files.txt
zppy/templates/e3sm_diags.bash:132:  cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
zppy/templates/global_time_series.bash:46:cdscan -x glb.xml *.nc
zppy/templates/global_time_series.bash:70:    cdscan -x glb.xml mpaso.glb*.nc

$ git grep -n cdms
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10:        self.f = cdms2.open(filename)

If that's the case, you can probably bypass the XML step entirely and just pass the glob paths (e.g., ${rofDir}/${v}_*.nc) directly to xarray.open_mfdataset() to open up the dataset. However, you'll need to do some prototyping to validate a solution (I'm just throwing things out).

import xarray

# Glob path with bash-style placeholders taken from the template above
path = "${rofDir}/${v}_*.nc"
dataset = xarray.open_mfdataset(path)

@forsyth2 (Collaborator, Author) commented:

Can you explain the step-by-step requirements/description for the functionality involving cdscan?

For e3sm_diags: cdscan generates an xml file and places it in the relevant directory, whose path is then passed to e3sm_diags as a parameter.

The relevant code steps:
  1. cdscan is used to generate an xml file based on the nc files corresponding to a particular variable, between January of the start year and December of the end year (see the sketch after this list).
$ git grep -n -B 2 cdscan e3sm_diags.bash
e3sm_diags.bash-102-      for file in ${ts_dir_source}/${v}_${YYYY}*.nc
e3sm_diags.bash-103-      do
e3sm_diags.bash:104:        # Add this time series file to the list of files for cdscan to use
--
e3sm_diags.bash-109-    xml_name=${v}_${begin_year}01_${end_year}12.xml
e3sm_diags.bash-110-    export CDMS_NO_MPI=true
e3sm_diags.bash:111:    zppy_cdscan_replacement ${xml_name} ${v}_files.txt
--
e3sm_diags.bash-130-  v="RIVER_DISCHARGE_OVER_LAND_LIQ"
e3sm_diags.bash-131-  xml_name=${v}_${begin_year}01_${end_year}12.xml
e3sm_diags.bash:132:  zppy_cdscan_replacement ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
  2. This is done inside the create_links_ts function (and also the create_links_ts_rof function):
$ git grep -n -A 3 "create_links_ts(" e3sm_diags.bash
e3sm_diags.bash:84:create_links_ts()
e3sm_diags.bash-85-{
e3sm_diags.bash-86-  ts_dir_source=$1
e3sm_diags.bash-87-  ts_dir_destination=$2
  3. These 2 functions have created the xml files in the passed-in destination directories:
$ git grep -n -B 1 "create_links_ts" e3sm_diags.bash
e3sm_diags.bash-83-
e3sm_diags.bash:84:create_links_ts()
--
e3sm_diags.bash-120-
e3sm_diags.bash:121:create_links_ts_rof()
--
e3sm_diags.bash-201-ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
e3sm_diags.bash:202:create_links_ts ${ts_dir_source} ${ts_dir_primary} ${Y1} ${Y2} 5
--
e3sm_diags.bash-205-ts_dir_ref=ts_ref
e3sm_diags.bash:206:create_links_ts ${ts_dir_source} ${ts_dir_ref} ${ref_Y1} ${ref_Y2} 6
--
e3sm_diags.bash-221-ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
e3sm_diags.bash:222:create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_primary} ${Y1} ${Y2} 7
--
e3sm_diags.bash-225-ts_rof_dir_ref=ts_rof_ref
e3sm_diags.bash:226:create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_ref} ${ref_Y1} ${ref_Y2} 8
  4. Following one of the destination directories, we can see that ts_dir_primary is used to define test_ts in the generated e3sm_diags python file.
$ git grep -n -B 1 ts_dir_primary e3sm_diags.bash
e3sm_diags.bash-195-{% if run_type == "model_vs_obs" %}
e3sm_diags.bash:196:ts_dir_primary=ts
e3sm_diags.bash-197-{% elif run_type == "model_vs_model" %}
e3sm_diags.bash:198:ts_dir_primary=ts_test
--
e3sm_diags.bash-201-ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
e3sm_diags.bash:202:create_links_ts ${ts_dir_source} ${ts_dir_primary} ${Y1} ${Y2} 5
--
e3sm_diags.bash-278-short_name = '${short}'
e3sm_diags.bash:279:test_ts = '${ts_dir_primary}'
  5. That test_ts value is used to define the test_data_path for several parameters.
$ git grep -n test_ts e3sm_diags.bash 
e3sm_diags.bash:279:test_ts = '${ts_dir_primary}'
e3sm_diags.bash:350:enso_param.test_data_path = test_ts
e3sm_diags.bash:406:qbo_param.test_data_path = test_ts
e3sm_diags.bash:436:ts_param.test_data_path = test_ts
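
(For reference, a hedged sketch of what step 1 reduces to for one variable if the xml step were bypassed; the variable and directory names are placeholders mirroring the bash template, not actual zppy code.)

import glob

import xarray

v = "PRECT"  # assumed variable name
ts_dir_source = "post/atm/180x360_aave/ts/monthly/10yr"  # assumed path
# Instead of building ${v}_files.txt and running cdscan, open the per-variable
# time-series files directly as one multi-file dataset:
files = sorted(glob.glob(f"{ts_dir_source}/{v}_*.nc"))
dataset = xarray.open_mfdataset(files)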

For global_time_series: cdscan generates glb.xml files which are ultimately opened and processed to generate a global annual average for each variable contained.

The relevant code steps:
  1. cdscan is used to generate a glb.xml file based on all the nc files in a particular glb subdirectory, for multiple subdirectories:
$ git grep -n -B 1 cdscan global_time_series.bash 
global_time_series.bash-44-    cd ${case_dir}/post/atm/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:45:    zppy_cdscan_replacement glb.xml *.nc
--
global_time_series.bash-56-    cd ${case_dir}/post/lnd/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:57:    zppy_cdscan_replacement -x glb.xml *.nc
--
global_time_series.bash-80-    cd ${case_dir}/post/ocn/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:81:    zppy_cdscan_replacement glb.xml mpaso.glb*.nc
  2. These glb.xml files are then used to define values for keys in the exp dictionary:
$ git grep -n -B 2 "glb.xml" coupled_global.py
coupled_global.py-741-            "atmos": None
coupled_global.py-742-            if not use_atmos
coupled_global.py:743:            else "{}/post/atm/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-746-            "ice": None
coupled_global.py-747-            if not plots_ice
coupled_global.py:748:            else "{}/post/ice/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-751-            "land": None
coupled_global.py-752-            if not plots_lnd
coupled_global.py:753:            else "{}/post/lnd/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-756-            "ocean": None
coupled_global.py-757-            if not use_ocn
coupled_global.py:758:            else "{}/post/ocn/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-764-            "vol": None
coupled_global.py-765-            if not use_ocn
coupled_global.py:766:            else "{}/post/ocn/glb/ts/monthly/{}yr/glb.xml".format(
  3. These glb.xml files are then passed into set_var, implicitly as exp[exp_key]:
$ git grep -n "set_var" coupled_global.py
coupled_global.py:552:def set_var(exp, exp_key, var_list, valid_vars, invalid_vars, rgn):
coupled_global.py:786:        set_var(exp, "atmos", vars_original, valid_vars, invalid_vars, rgn)
coupled_global.py:787:        set_var(exp, "atmos", plots_atm, valid_vars, invalid_vars, rgn)
coupled_global.py:788:        set_var(exp, "ice", plots_ice, valid_vars, invalid_vars, rgn)
coupled_global.py:789:        set_var(exp, "land", plots_lnd, valid_vars, invalid_vars, rgn)
coupled_global.py:790:        set_var(exp, "ocean", plots_ocn, valid_vars, invalid_vars, rgn)
  4. They're then passed to the TS class constructor:
$ git grep -n "exp\[exp_key\]" coupled_global.py
coupled_global.py:553:    if exp[exp_key] is not None:
coupled_global.py:554:        ts = TS(exp[exp_key])
  5. The glb.xml files are ultimately opened, formerly by cdms2.open and with this PR by xcdat.open_dataset:
$ git grep -n -A 5 "TS" readTS.py
readTS.py:4:class TS(object):
readTS.py-5-    def __init__(self, filename):
readTS.py-6-
readTS.py-7-        self.filename = filename
readTS.py-8-
readTS.py-9-        self.f = xcdat.open_dataset(filename)
  6. For every variable in glb.xml, the globalAnnual is computed (the last 2 blocks have essentially just bypassed the function call of step 3):
$ git grep -n -B 3 "ts.globalAnnual" coupled_global.py
coupled_global.py-554-        ts = TS(exp[exp_key])
coupled_global.py-555-        for var in var_list:
coupled_global.py-556-            try:
coupled_global.py:557:                v, units = ts.globalAnnual(var)
--
coupled_global.py-792-        # Optionally read ohc
coupled_global.py-793-        if exp["ocean"] is not None:
coupled_global.py-794-            ts = TS(exp["ocean"])
coupled_global.py:795:            exp["annual"]["ohc"], _ = ts.globalAnnual("ohc")
--
coupled_global.py-798-
coupled_global.py-799-        if exp["vol"] is not None:
coupled_global.py-800-            ts = TS(exp["vol"])
coupled_global.py:801:            exp["annual"]["volume"], _ = ts.globalAnnual("volume")
  7. globalAnnual was formerly computed with cdutil.YEAR; with this PR it uses self.f.temporal.group_average (see the sketch after this list):
$ git grep -n group_average readTS.py
readTS.py:68:            v = self.f.temporal.group_average(v, "year")
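
(Putting that chain together, a minimal sketch of the cdscan + cdms2.open + cdutil.YEAR path replaced with xCDAT; the path and variable name are illustrative assumptions.)

import xcdat

# Instead of `cdscan -x glb.xml *.nc` followed by `cdms2.open("glb.xml")`,
# open the whole monthly global-mean time series in one call:
ds = xcdat.open_mfdataset("post/atm/glb/ts/monthly/10yr/*.nc")  # assumed path

# Instead of `cdutil.YEAR(v)`, compute annual means with the temporal accessor:
var = "RESTOM"  # assumed variable name
annual = ds.temporal.group_average(var, "year")[var]
units = ds[var].attrs.get("units")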

I assume this XML is used downstream with cdms2 (or another CDAT package) to open up the dataset.

Per the above, this is correct for global_time_series, but not directly for e3sm_diags. (It is probably the case within the e3sm_diags package itself, but that is outside the scope of zppy.)

you can probably bypass the XML step entirely and just pass the glob paths

Ok thanks, I will try experimenting with this.

@forsyth2 (Collaborator, Author) commented May 10, 2024

Following the example of E3SM-Project/e3sm_diags#803:

Running:

directory=zppy
cd ${directory}

# https://anaconda.org/cdat/repo?sort=_name&sort_order=asc
for cdat_package in cd77 cdat cdat_compute_graph cdat_info cdat_jupyter_vcdat cdms2 cdp cdtime cdutil cibots compute_graph django-allow-cidr dv3d e3sm_nex esgf-compute-api esgf-compute-cert esgf-search ezget genutil image-compare jupyter-vcdat jupyterlab-git lats libcdms lmoments mesalib myproxyclient output_viewer selenium_testlib testsrunner thermo toolchain vcdat vcs vcs-js vcsaddons vtk vtk-cdat wk xmgrace
do
    git grep -n ${cdat_package} ${directory}/* > /dev/null
    if [ $? == 0 ]; then
	echo ""
	echo "git grep -n ${cdat_package} ${directory}/*"
	git grep -n ${cdat_package} ${directory}/*
    fi
done

Gives:

zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10:        self.f = cdms2.open(filename)

git grep -n cdutil zppy/*
zppy/templates/readTS.py:2:import cdutil
zppy/templates/readTS.py:69:            v = cdutil.YEAR(v)

git grep -n wk zppy/*
zppy/templates/mpas_analysis.bash:301:size=`wc -c ${identifier}/logs/taskProgress.log | awk '{print $1}'`

And copied from #346 (comment):

$ git grep -n cdscan
docs/source/dev_guide/new_diags_set.rst:100:        cdscan -x ${xml_name} ${rofDir}/${v}_*.nc
zppy/templates/e3sm_diags.bash:104:        # Add this time series file to the list of files for cdscan to use
zppy/templates/e3sm_diags.bash:111:    cdscan -x ${xml_name} -f ${v}_files.txt
zppy/templates/e3sm_diags.bash:132:  cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
zppy/templates/global_time_series.bash:46:cdscan -x glb.xml *.nc
zppy/templates/global_time_series.bash:70:    cdscan -x glb.xml mpaso.glb*.nc

$ git grep -n cdat
# No matches

$ git grep -n cdms
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10:        self.f = cdms2.open(filename)

$ emacs conda/dev.yml
# No dependencies appear obviously related to cdat

$ emacs conda/meta.yml
# No dependencies appear obviously related to cdat

@forsyth2 (Collaborator, Author) commented:

@tomvothecoder On the wider topic of any CDAT usage in zppy, it seems like there is very little direct CDAT dependency. All CDAT usage appears to happen in the analysis packages' environments, as opposed to the environment zppy itself is run from. The zppy dev environment doesn't even seem to depend on CDAT at all -- notice that https://github.com/E3SM-Project/zppy/blob/main/conda/dev.yml doesn't appear to include any CDAT packages.

Comment on lines 78 to 81
# Refactor note: `AttributeError: 'Dataset' object has no attribute 'temporal'` seems to always occur
# Regardless if using CDAT or not, if using as object or class method.
# v = self.f.temporal.group_average(v, "year")
v = xarray.Dataset.temporal.group_average(v, "year")

@forsyth2 (Collaborator, Author) commented:

@tomvothecoder It does indeed seem like I'm able to eliminate the cdscan call that generates an xml, plus the self.f = cdms2.open(filename) line, by using xarray.open_mfdataset(f"{directory}/*.nc").

However, no matter what I do, Python doesn't seem to think temporal exists. I'm looking at https://xcdat.readthedocs.io/en/latest/generated/xarray.Dataset.temporal.group_average.html, and this is running in the latest Unified environment. Am I using the wrong averaging function here? (This line replaces v = cdutil.YEAR(v))

@tomvothecoder (Collaborator) commented:

Make sure to import xcdat to gain access to the .temporal accessor class from xCDAT.

The xCDAT accessor classes are built on top of xarray.Dataset objects.
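
(For reference, a minimal sketch of the working form; the path and variable name are assumptions. The AttributeError: 'Dataset' object has no attribute 'temporal' in the refactor note above is what you get when xcdat has not been imported in the running process, since importing it is what registers the accessor.)

import xcdat  # importing xcdat registers the .temporal accessor on xarray.Dataset

ds = xcdat.open_mfdataset("glb/ts/monthly/10yr/*.nc")  # assumed path
# The accessor is used through a Dataset *instance*, not through the
# xarray.Dataset class as in the commented-out refactor attempt above:
annual = ds.temporal.group_average("TS", "year")  # "TS" is an assumed variable name

A dataset opened with plain xarray.open_mfdataset gets the same accessor, as long as xcdat has been imported in that process.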

@forsyth2 (Collaborator, Author) commented:

Oh I see, I just assumed it would be an xarray sub-module. Thanks for the info!

@forsyth2 (Collaborator, Author) commented:

@tomvothecoder After much debugging/learning of xarray's types and data structures, I appear to actually have global_time_series working without any CDAT dependencies.

e3sm_diags may still require the cdscan replacement to create xml file (unless the e3sm_diags package is updated similarly to take the glob path directly).

As a side note, it's a real shame we weren't using typing throughout zppy development, or e3sm_diags development for that matter. Using explicit types really clarifies whether a variable is a CDAT remnant. In any case, using typing more going forward will certainly help clean up the code.
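
(As an illustration of that point, a hedged sketch; the signature is an assumption inferred from how ts.globalAnnual is called above, not the actual readTS.py code.)

from typing import Optional, Tuple

import xarray

class TS(object):
    def globalAnnual(self, var: str) -> Tuple[xarray.DataArray, Optional[str]]:
        # An annotated signature like this makes it obvious at a glance that the
        # return value is an xarray DataArray plus a units string, not a cdms2
        # transient variable left over from the CDAT era.
        ...

Annotations like this immediately flag any remaining CDAT types during a refactor.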

import matplotlib as mpl
import matplotlib.backends.backend_pdf
import matplotlib.pyplot as plt
import numpy as np
import xarray
import xcdat # noqa: F401

@forsyth2 (Collaborator, Author) commented:

Not needed here

@xylar (Contributor) commented May 16, 2024

e3sm_diags may still require the cdscan replacement to create xml file (unless the e3sm_diags package is updated similarly to take the glob path directly).

I would hope that a similar approach would work in e3sm_diags and anywhere else using cdscan. Is the issue only the difference between passing a file (e.g. the XML file produced by cdscan) that has a list of files vs. just passing the list of files?

@tomvothecoder (Collaborator) commented:

e3sm_diags may still require the cdscan replacement to create xml file (unless the e3sm_diags package is updated similarly to take the glob path directly).

e3sm_diags doesn't actually use cdscan as a dependency. I validated this by performing a full code-base search and no results appeared (it's also not in the dev.yml).

What I think zppy is doing is creating the XML linking multi-file datasets (more than 1 .nc file) then passing it to e3sm_diags to be opened with cdms2 internally. You probably have better knowledge though since this is the zppy codebase. Also pinging @chengzhuzhang to see if this sounds correct.

If it is correct, then we need to refactor e3sm_diags to use Xarray/xCDAT first before refactoring zppy is possible. I think cdms2 only supports opening multi-file datasets using cdscan, while Xarray can support the glob path or list of paths.
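
(A minimal illustration of that last point; the file names are placeholders. xarray accepts either a glob string or an explicit list of paths.)

import xarray

ds_from_glob = xarray.open_mfdataset("ts/PRECT_*.nc")  # glob string
ds_from_list = xarray.open_mfdataset(
    ["ts/PRECT_000101_000912.nc", "ts/PRECT_001001_001812.nc"]  # explicit list of paths
)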

@chengzhuzhang (Collaborator) commented:

@tomvothecoder you are correct, cdscan is not a dependency for e3sm_diags. I think instead of xmls, zppy will pass a list of file paths (which xarray.open_mfdataset supports) before running e3sm_diags, which we should test with the refactored e3sm_diags code.

@forsyth2 (Collaborator, Author) commented:

Is the issue only the difference between passing a file (e.g. the XML file produced by cdscan) that has a list of files vs. just passing the list of files?

@xylar Yes, I believe that was the point of producing the xml files.

What I think zppy is doing is creating the XML linking multi-file datasets (more than 1 .nc file) then passing it to e3sm_diags to be opened with cdms2 internally.

@tomvothecoder Yes, sorry, I should have been clearer. By "e3sm_diags may still require the cdscan replacement to create xml file", I meant that zppy would need the replacement in order to generate the xml file for e3sm_diags.

I think cdms2 only supports opening multi-file datasets using cdscan, while Xarray can support the glob path or list of paths.

@tomvothecoder Interesting, yes that's what I meant by "(unless the e3sm_diags package is updated similarly to take the glob path directly)." Ok, so it sounds like we want to go with that option then? That is, give e3sm_diags the ability to take in multiple files, rather than having zppy create a single file for it to use?

Basically the two options I was getting at were:

  1. Create a cdscan replacement in zppy so zppy can pass a single xml file to e3sm_diags
  2. Update e3sm_diags so it can just take a list of files (via glob path) directly

So it sounds like we want (2)

@tomvothecoder (Collaborator) commented:

Basically the two options I was getting at were:

  1. Create a cdscan replacement in zppy so zppy can pass a single xml file to e3sm_diags

  2. Update e3sm_diags so it can just take a list of files (via glob path) directly

So it sounds like we want (2)

Option (1) will work for the existing CDAT-based e3sm_diags codebase. However, option (2) is required for the refactored Xarray/xCDAT-based e3sm_diags codebase since cdms2 is being removed (which uses the XML files generated by cdscan through zppy).

You're correct, we want option (2). You can create a separate branch specifically to refactor the zppy code to pass a glob of input files to e3sm_diags. We can test this branch with e3sm_diags either by:

  1. Installing a custom build of e3sm_diags with the cdat-migration-fy24 branch in a zppy dev env -- this is preferred since we can test earlier rather than later
  2. Waiting for a new release of e3sm_diags with completely refactored code -- this is the worst case

@forsyth2 (Collaborator, Author) commented:

Installing a custom build of e3sm_diags with the cdat-migration-fy24 branch in a zppy dev env -- this is preferred since we can test earlier rather than later

@tomvothecoder Sure, that sounds good to me. Would it be better to merge the parts we can now though? I.e. global_time_series is now working without CDAT, so I would say we should just merge that now. And then we can do this test of e3sm_diags through zppy using the cdat-migration-fy24, but not actually merge that until e3sm_diags is in fact refactored on main (since otherwise it's not really useful and in fact breaks the current implementation)?

@tomvothecoder (Collaborator) commented:

Installing a custom build of e3sm_diags with the cdat-migration-fy24 branch in a zppy dev env -- this is preferred since we can test earlier rather than later

@tomvothecoder Sure, that sounds good to me. Would it be better to merge the parts we can now though? I.e. global_time_series is now working without CDAT, so I would say we should just merge that now. And then we can do this test of e3sm_diags through zppy using the cdat-migration-fy24, but not actually merge that until e3sm_diags is in fact refactored on main (since otherwise it's not really useful and in fact breaks the current implementation)?

Yes, I would split out the work to refactor e3sm_diags and cdscan into a separate branch, since it will be long-standing. Work that can be merged into main can be done on its own branch(es) ahead of time.

@forsyth2 forsyth2 force-pushed the issue-346 branch 2 times, most recently from 10451af to cd15860, on May 23, 2024 21:53
@forsyth2 forsyth2 marked this pull request as ready for review May 23, 2024 21:58
@forsyth2 forsyth2 merged commit 0883805 into main May 23, 2024
4 checks passed
@forsyth2 forsyth2 deleted the issue-346 branch May 23, 2024 21:59
Labels: DevOps (CI/CD, configuration, etc.), priority: high (High priority task)

Successfully merging this pull request may close these issues: Replace CDAT; Explore replacement of cdscan.

4 participants