Replace CDAT #519
Conversation
@@ -43,7 +43,7 @@ atmosphere_only=${atmosphere_only,,}
echo 'Create xml files for atm'
export CDMS_NO_MPI=true
cd ${case_dir}/post/atm/glb/ts/monthly/${ts_num_years}yr
-cdscan -x glb.xml *.nc
+python cdscan_replacement.py glb.xml *.nc
`zppy` isn't finding the `cdscan_replacement.py` file.

`global_time_series` adds Python scripts to an accessible directory via the block at https://github.com/E3SM-Project/zppy/blob/main/zppy/global_time_series.py#L55:
c["global_time_series_dir"] = os.path.join(
scriptDir, "{}_dir".format(prefix)
)
if not os.path.exists(c["global_time_series_dir"]):
os.mkdir(c["global_time_series_dir"])
scripts = ["coupled_global.py", "readTS.py", "ocean_month.py"]
for script in scripts:
script_template = templateEnv.get_template(script)
script_file = os.path.join(c["global_time_series_dir"], script)
with open(script_file, "w") as f:
f.write(script_template.render(**c))
makeExecutable(script_file)
Would it make sense to do something similar for `cdscan_replacement`? It seems like there should be a simpler way, since there are no template parameters in that file.
@forsyth2, why not make this a proper entry point in the `zppy` package? That's the "right" way to do this:
https://packaging.python.org/en/latest/specifications/entry-points/
To do that, you would add your functions along with a `main()` function (equivalent to your current `if __name__ == "__main__":` block) to `zppy` somewhere, and then you would edit your `setup.py` to add another entry point like this one:
https://github.com/E3SM-Project/zppy/blob/main/setup.py#L33
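As a rough sketch of what that entry-point addition might look like (the module path `zppy.cdscan_replacement` and its `main()` are assumptions for illustration, not zppy's actual layout):

```python
# Hypothetical sketch of the entry_points mapping that would be passed to
# setuptools.setup() in zppy's setup.py. Target module names are assumed.
ENTRY_POINTS = {
    "console_scripts": [
        "zppy = zppy.__main__:main",  # existing-style entry point (assumed name)
        "zppy_cdscan_replacement = zppy.cdscan_replacement:main",  # new
    ]
}

# At install time, pip generates a small wrapper script on $PATH for each
# "name = module:function" entry, so the command works from any shell.
for spec in ENTRY_POINTS["console_scripts"]:
    name, target = (s.strip() for s in spec.split("="))
    print(name, "->", target)
```

The wrapper script imports the module and calls the named function, so no template-copying step is needed.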
@xylar Thanks, I tried implementing it that way, but I'm getting `zppy_cdscan_replacement: command not found` errors.
$ grep -n "cdscan_replacement" *.o*
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1850-1851.o405614:1:/var/spool/slurmd/job405614/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1850-1853.o405616:1:/var/spool/slurmd/job405616/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_environment_commands_model_vs_obs_1852-1853.o405615:1:/var/spool/slurmd/job405615/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1850-1851.o405611:1:/var/spool/slurmd/job405611/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1850-1853.o405613:1:/var/spool/slurmd/job405613/slurm_script: line 120: zppy_cdscan_replacement: command not found
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o405612:1:/var/spool/slurmd/job405612/slurm_script: line 120: zppy_cdscan_replacement: command not found
global_time_series_1850-1860.o405621:2:/var/spool/slurmd/job405621/slurm_script: line 58: zppy_cdscan_replacement: command not found
Yet the command does show up when I run `which` in the dev environment.
$ which zppy
~/miniconda3/envs/zppy_dev_issue_346/bin/zppy
$ which zppy_cdscan_replacement
~/miniconda3/envs/zppy_dev_issue_346/bin/zppy_cdscan_replacement
Are you using a different environment (e.g. E3SM-Unified) when you try to call `zppy_cdscan_replacement`? If that command is part of the environment you use to launch jobs but is not available in the environment that's actually used when you run the jobs, it would make sense that you're getting these errors.
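To make that failure mode obvious, a small guard like the following could go in the generated batch scripts (hypothetical; the function name and wording are illustrative):

```shell
# Hypothetical guard for the generated batch scripts: report clearly when
# the entry point is missing from the environment the job activates,
# instead of failing later with "command not found".
check_entry_point() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 at $(command -v "$1")"
    else
        echo "missing: $1 (is the right conda env activated in this job?)"
    fi
}

check_entry_point zppy_cdscan_replacement
```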
Hmm, I don't think that worked, but I can't seem to find any documentation as to why.

I do see that on Oct. 3, Tom noted "I suggest exploring xarray.open_mfdataset()" in #346 (comment), but then in this thread from Oct. 5, I'm trying to use `cdscan_replacement.py`. That seems to imply something about `open_mfdataset` wasn't working properly.
Oh, interesting! Still, I think it's worth trying again with open_mfdataset and reminding yourself what the error was. Maybe we can solve it.
I'm looking into it; I want to say it was something about it needing to convert to xml.

It also occurs to me that even if we use `xarray`, we'd need to call it from a `bash` script, so the same general problem (of needing to get a bash wrapper defined in the environment) would remain.
Can you explain the step-by-step requirements/description for the functionality involving `cdscan`? This would make it easier to identify the equivalent APIs needed from other packages to replace the `cdscan` code.

From my guess, in your comment here, it just looks like the code takes a glob of `.nc` filepaths and converts it to XML using `cdscan`. I assume this XML is used downstream with `cdms2` (or another CDAT package) to open up the dataset.
$ git grep -n cdscan
docs/source/dev_guide/new_diags_set.rst:100: cdscan -x ${xml_name} ${rofDir}/${v}_*.nc
zppy/templates/e3sm_diags.bash:104: # Add this time series file to the list of files for cdscan to use
zppy/templates/e3sm_diags.bash:111: cdscan -x ${xml_name} -f ${v}_files.txt
zppy/templates/e3sm_diags.bash:132: cdscan -x ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
zppy/templates/global_time_series.bash:46:cdscan -x glb.xml *.nc
zppy/templates/global_time_series.bash:70: cdscan -x glb.xml mpaso.glb*.nc
$ git grep -n cdms
zppy/templates/readTS.py:1:import cdms2
zppy/templates/readTS.py:10: self.f = cdms2.open(filename)
If that's the case, you can probably bypass the XML step entirely and just pass the glob paths (e.g., `${rofDir}/${v}_*.nc`) directly to `xarray.open_mfdataset()` to open up the dataset. However, you'll need to do some prototyping to validate a solution (I'm just throwing things out).
path = "${rofDir}/${v}_*.nc"
dataset = xarray.open_mfdataset(path)
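Building on that, a slightly fuller sketch of the idea (the helper name and directory layout are illustrative; the `xarray` call itself is left commented since it needs a real dataset on disk):

```python
# Hypothetical helper: reproduce the glob pattern that cdscan received in
# e3sm_diags.bash, so it can be handed to xarray.open_mfdataset instead.
def ts_glob(ts_dir: str, var: str) -> str:
    """Build the per-variable time-series glob, e.g. <dir>/PRECT_*.nc."""
    return f"{ts_dir}/{var}_*.nc"

# Directory layout modeled on the grep output in this thread.
pattern = ts_glob("post/rof/native/ts/monthly/10yr", "RIVER_DISCHARGE_OVER_LAND_LIQ")
print(pattern)

# With xarray installed, the XML step disappears entirely:
# import xarray
# ds = xarray.open_mfdataset(pattern)  # lazily concatenates files along time
```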
> Can you explain the step-by-step requirements/description for the functionality involving `cdscan`?

For `e3sm_diags`: `cdscan` generates an `xml` file, placing it in the relevant directory path parameter for `e3sm_diags` to use.
The relevant code steps

- `cdscan` is used to generate an `xml` file based on `nc` files corresponding to a particular variable between January of the start year and December of the end year:
$ git grep -n -B 2 cdscan e3sm_diags.bash
e3sm_diags.bash-102- for file in ${ts_dir_source}/${v}_${YYYY}*.nc
e3sm_diags.bash-103- do
e3sm_diags.bash:104: # Add this time series file to the list of files for cdscan to use
--
e3sm_diags.bash-109- xml_name=${v}_${begin_year}01_${end_year}12.xml
e3sm_diags.bash-110- export CDMS_NO_MPI=true
e3sm_diags.bash:111: zppy_cdscan_replacement ${xml_name} ${v}_files.txt
--
e3sm_diags.bash-130- v="RIVER_DISCHARGE_OVER_LAND_LIQ"
e3sm_diags.bash-131- xml_name=${v}_${begin_year}01_${end_year}12.xml
e3sm_diags.bash:132: zppy_cdscan_replacement ${xml_name} ${ts_rof_dir_source}/${v}_*.nc
- This is done inside the `create_links_ts` function (and also the `create_links_ts_rof` function):
$ git grep -n -A 3 "create_links_ts(" e3sm_diags.bash
e3sm_diags.bash:84:create_links_ts()
e3sm_diags.bash-85-{
e3sm_diags.bash-86- ts_dir_source=$1
e3sm_diags.bash-87- ts_dir_destination=$2
- These 2 functions have created the `xml` files in the passed-in destination directories:
$ git grep -n -B 1 "create_links_ts" e3sm_diags.bash
e3sm_diags.bash-83-
e3sm_diags.bash:84:create_links_ts()
--
e3sm_diags.bash-120-
e3sm_diags.bash:121:create_links_ts_rof()
--
e3sm_diags.bash-201-ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
e3sm_diags.bash:202:create_links_ts ${ts_dir_source} ${ts_dir_primary} ${Y1} ${Y2} 5
--
e3sm_diags.bash-205-ts_dir_ref=ts_ref
e3sm_diags.bash:206:create_links_ts ${ts_dir_source} ${ts_dir_ref} ${ref_Y1} ${ref_Y2} 6
--
e3sm_diags.bash-221-ts_rof_dir_source="{{ output }}/post/rof/native/ts/monthly/{{ ts_num_years }}yr"
e3sm_diags.bash:222:create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_primary} ${Y1} ${Y2} 7
--
e3sm_diags.bash-225-ts_rof_dir_ref=ts_rof_ref
e3sm_diags.bash:226:create_links_ts_rof ${ts_rof_dir_source} ${ts_rof_dir_ref} ${ref_Y1} ${ref_Y2} 8
- Following one of the destination directories, we can see that `ts_dir_primary` is used to define `test_ts` in the generated `e3sm_diags` python file:
$ git grep -n -B 1 ts_dir_primary e3sm_diags.bash
e3sm_diags.bash-195-{% if run_type == "model_vs_obs" %}
e3sm_diags.bash:196:ts_dir_primary=ts
e3sm_diags.bash-197-{% elif run_type == "model_vs_model" %}
e3sm_diags.bash:198:ts_dir_primary=ts_test
--
e3sm_diags.bash-201-ts_dir_source={{ output }}/post/atm/{{ grid }}/ts/monthly/{{ '%dyr' % (ts_num_years) }}
e3sm_diags.bash:202:create_links_ts ${ts_dir_source} ${ts_dir_primary} ${Y1} ${Y2} 5
--
e3sm_diags.bash-278-short_name = '${short}'
e3sm_diags.bash:279:test_ts = '${ts_dir_primary}'
- That `test_ts` value is used to define the `test_data_path` for several parameters:
$ git grep -n test_ts e3sm_diags.bash
e3sm_diags.bash:279:test_ts = '${ts_dir_primary}'
e3sm_diags.bash:350:enso_param.test_data_path = test_ts
e3sm_diags.bash:406:qbo_param.test_data_path = test_ts
e3sm_diags.bash:436:ts_param.test_data_path = test_ts
For `global_time_series`: `cdscan` generates `glb.xml` files, which are ultimately opened and processed to generate a global annual average for each variable contained.
The relevant code steps
- `cdscan` is used to generate a `glb.xml` file based on all the `nc` files in a particular `glb` subdirectory, for multiple subdirectories:
$ git grep -n -B 1 cdscan global_time_series.bash
global_time_series.bash-44- cd ${case_dir}/post/atm/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:45: zppy_cdscan_replacement glb.xml *.nc
--
global_time_series.bash-56- cd ${case_dir}/post/lnd/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:57: zppy_cdscan_replacement -x glb.xml *.nc
--
global_time_series.bash-80- cd ${case_dir}/post/ocn/glb/ts/monthly/${ts_num_years}yr
global_time_series.bash:81: zppy_cdscan_replacement glb.xml mpaso.glb*.nc
- Which are then used to define values for keys in the `exp` dictionary:
$ git grep -n -B 2 "glb.xml" coupled_global.py
coupled_global.py-741- "atmos": None
coupled_global.py-742- if not use_atmos
coupled_global.py:743: else "{}/post/atm/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-746- "ice": None
coupled_global.py-747- if not plots_ice
coupled_global.py:748: else "{}/post/ice/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-751- "land": None
coupled_global.py-752- if not plots_lnd
coupled_global.py:753: else "{}/post/lnd/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-756- "ocean": None
coupled_global.py-757- if not use_ocn
coupled_global.py:758: else "{}/post/ocn/glb/ts/monthly/{}yr/glb.xml".format(
--
coupled_global.py-764- "vol": None
coupled_global.py-765- if not use_ocn
coupled_global.py:766: else "{}/post/ocn/glb/ts/monthly/{}yr/glb.xml".format(
- These `glb.xml` files are then passed into `set_var`, implicitly as `exp[exp_key]`:
$ git grep -n "set_var" coupled_global.py
coupled_global.py:552:def set_var(exp, exp_key, var_list, valid_vars, invalid_vars, rgn):
coupled_global.py:786: set_var(exp, "atmos", vars_original, valid_vars, invalid_vars, rgn)
coupled_global.py:787: set_var(exp, "atmos", plots_atm, valid_vars, invalid_vars, rgn)
coupled_global.py:788: set_var(exp, "ice", plots_ice, valid_vars, invalid_vars, rgn)
coupled_global.py:789: set_var(exp, "land", plots_lnd, valid_vars, invalid_vars, rgn)
coupled_global.py:790: set_var(exp, "ocean", plots_ocn, valid_vars, invalid_vars, rgn)
- They're then passed to the `TS` class constructor:
$ git grep -n "exp\[exp_key\]" coupled_global.py
coupled_global.py:553: if exp[exp_key] is not None:
coupled_global.py:554: ts = TS(exp[exp_key])
- The `glb.xml` files are ultimately opened, formerly by `cdms2.open` and with this PR by `xcdat.open_dataset`:
$ git grep -n -A 5 "TS" readTS.py
readTS.py:4:class TS(object):
readTS.py-5- def __init__(self, filename):
readTS.py-6-
readTS.py-7- self.filename = filename
readTS.py-8-
readTS.py-9- self.f = xcdat.open_dataset(filename)
- For every variable in `glb.xml`, the `globalAnnual` is computed (the last 2 blocks have essentially just bypassed the function call of step 3):
git grep -n -B 3 "ts.globalAnnual" coupled_global.py
coupled_global.py-554- ts = TS(exp[exp_key])
coupled_global.py-555- for var in var_list:
coupled_global.py-556- try:
coupled_global.py:557: v, units = ts.globalAnnual(var)
--
coupled_global.py-792- # Optionally read ohc
coupled_global.py-793- if exp["ocean"] is not None:
coupled_global.py-794- ts = TS(exp["ocean"])
coupled_global.py:795: exp["annual"]["ohc"], _ = ts.globalAnnual("ohc")
--
coupled_global.py-798-
coupled_global.py-799- if exp["vol"] is not None:
coupled_global.py-800- ts = TS(exp["vol"])
coupled_global.py:801: exp["annual"]["volume"], _ = ts.globalAnnual("volume")
- `globalAnnual` is computed, formerly by `cdutil.YEAR` and with this PR by `self.f.temporal.group_average`:
$ git grep -n group_average readTS.py
readTS.py:68: v = self.f.temporal.group_average(v, "year")
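Conceptually, `group_average(v, "year")` on monthly data is a time-weighted annual mean. A pure-Python illustration of that idea (not xcdat's actual implementation, which derives weights from time bounds and handles calendars; the no-leap month lengths here are an assumption for the sketch):

```python
# Days per month for a no-leap calendar (illustrative assumption).
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def annual_mean(monthly_values):
    """Time-weighted mean of 12 monthly values, weighting by month length."""
    assert len(monthly_values) == 12
    total_days = sum(MONTH_DAYS)
    return sum(v * d for v, d in zip(monthly_values, MONTH_DAYS)) / total_days

# A constant field averages to itself under any weighting:
print(annual_mean([2.0] * 12))  # -> 2.0
```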
> I assume this XML is used downstream with cdms2 (or another CDAT package) to open up the dataset.

Per the above, this is correct for `global_time_series`, but not for `e3sm_diags` (at least not directly; that is indeed probably the case within the `e3sm_diags` package, but that is outside the scope of the `zppy` package).

> you can probably bypass the XML step entirely and just pass the glob paths

Ok thanks, I will try experimenting with this.
Following the example of E3SM-Project/e3sm_diags#803: Running:
Gives:
And copied from #346 (comment):
@tomvothecoder On the wider topic of any CDAT usage in
zppy/templates/readTS.py (Outdated)
# Refactor note: `AttributeError: 'Dataset' object has no attribute 'temporal'` seems to always occur
# Regardless if using CDAT or not, if using as object or class method.
# v = self.f.temporal.group_average(v, "year")
v = xarray.Dataset.temporal.group_average(v, "year")
@tomvothecoder It does indeed seem like I'm able to eliminate the `cdscan` call to generate an `xml` + the `self.f = cdms2.open(filename)` line by using `xarray.open_mfdataset(f"{directory}/*.nc")`.

However, no matter what I do, Python doesn't seem to think `temporal` exists. I'm looking at https://xcdat.readthedocs.io/en/latest/generated/xarray.Dataset.temporal.group_average.html, and this is running in the latest Unified environment. Am I using the wrong averaging function here? (This line replaces `v = cdutil.YEAR(v)`.)
Make sure to `import xcdat` to gain access to the `.temporal` accessor class from xCDAT. The xCDAT accessor classes extend on top of `xarray.Dataset` objects.
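The mechanism behind this is that accessors are attached to `xarray.Dataset` as a side effect of importing the package that defines them. A dependency-free sketch of that registration pattern (mimicking the spirit of `xarray.register_dataset_accessor`, not xcdat's actual code):

```python
# Toy re-creation of xarray's accessor mechanism, to show why the
# `import xcdat` is required before `.temporal` exists.
_ACCESSORS = {}

def register_dataset_accessor(name):
    """Toy version of xarray's accessor decorator."""
    def decorator(cls):
        _ACCESSORS[name] = cls
        return cls
    return decorator

class Dataset:
    """Stand-in for xarray.Dataset: accessors resolve lazily by name."""
    def __getattr__(self, name):
        if name in _ACCESSORS:
            return _ACCESSORS[name](self)
        raise AttributeError(f"'Dataset' object has no attribute '{name}'")

# This registration only runs when the defining module is imported --
# hence the AttributeError when `import xcdat` is missing.
@register_dataset_accessor("temporal")
class TemporalAccessor:
    def __init__(self, ds):
        self._ds = ds
    def group_average(self, var, freq):
        return f"{freq}-average of {var}"

ds = Dataset()
print(ds.temporal.group_average("TREFHT", "year"))  # -> year-average of TREFHT
```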
Oh I see, I just assumed it would be an `xarray` sub-module. Thanks for the info!
@tomvothecoder After much debugging/learning of
As a side note, it's a real shame we weren't using typing on all
zppy/templates/coupled_global.py (Outdated)
import matplotlib as mpl
import matplotlib.backends.backend_pdf
import matplotlib.pyplot as plt
import numpy as np
import xarray
import xcdat  # noqa: F401
Not needed here
I would hope that a similar approach would work in
What I think If it is correct, then we need to refactor
@tomvothecoder you are correct
@xylar Yes, I believe that was the point of producing the
@tomvothecoder Yes, sorry I should have been clearer. By "
@tomvothecoder Interesting, yes that's what I meant by "(unless the Basically the two options I was getting at were:
So it sounds like we want (2)
Option (1) will work for the existing CDAT-based You're correct, we want option (2). You can create a separate branch specifically to refactor the
@tomvothecoder Sure, that sounds good to me. Would it be better to merge the parts we can now though? I.e.
Yes, I would split out the work to refactor
Force-pushed from 10451af to cd15860
Replace CDAT. Resolves #346. Resolves #80.