[rrfs-mpas-jedi] Updates for running rrfs-workflow on WCOSS2 #803
Conversation
README.md can be updated to include WCOSS2 now. :)
@SamuelDegelia-NOAA I agree with you. But that can be done in a separate PR. Thanks!
…e, and update README
Done!
LGTM.
Thanks a lot for completing this heavy-lift work and addressing my comments!
I feel like these definitions of things like NDATE and FSYNC shouldn't be needed for WCOSS2. If the prod_util module is loaded, they should be available. Or is something different with the spack libraries?
I just kept the same method for loading the `prod_util` commands that we used for the other machines. This modulefile manually defines these variables. But it does look like we could just load `prod_util` through spack-stack (or as a default module, as is available on WCOSS2), and these commands and variables would be available without needing this extra `modulefiles/prod_util` directory.
@guoqing-noaa Do you know why we went with this method that manually defines the `NDATE` etc. variables instead of just loading `prod_util` through spack-stack?
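For reference, a minimal sketch of the two approaches being discussed; the paths below are illustrative assumptions, not the repo's actual values:

```sh
# (a) What the hand-maintained modulefile effectively does: define the
#     prod_util variables itself (install path is hypothetical).
export NDATE=/path/to/prod_util/exec/ndate
export FSYNC=/path/to/prod_util/exec/fsync

# (b) Load prod_util as a module instead; its own modulefile then sets
#     NDATE, FSYNC, etc. and puts err_chk/err_exit/cpreq on PATH.
module load prod_util
```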
@MatthewPyle-NOAA and @SamuelDegelia-NOAA
As @SamuelDegelia-NOAA mentioned, the wcoss2.lua is essentially a copy of the original `prod_util` lua file from whatever is available on each platform.
I do not recall all the details, but I think the reason to load this separately is that we only want to load modules as needed. The workflow only uses `err_exit`, `err_chk`, `NDATE`, and `cpreq` from `prod_util`, so we don't need any of the module dependencies in the original `prod_util`. Also, I think we would load a few extra modules if we loaded `prod_util` directly from spack-stack.
We may revisit this solution in the future. Thanks!
Thanks for the explanations, @SamuelDegelia-NOAA and @guoqing-noaa!
DESCRIPTION OF CHANGES:
This PR adds config files and various other updates to allow running rrfs-workflow (version 2) on WCOSS2. Results from the 2024052700 retro are compared against results from Hera in #773. The workflow appears to be working as expected on WCOSS2.
A few notes:
- The builds on WCOSS2 must use the compiler wrappers (`cc`, `CC`, and `ftn`). These compilers are needed to correctly handle MPI on WCOSS2. For MPASSIT, this is handled by a hash update. For MPAS-Model, since we only plan to update the model at certain times, I instead replace `Makefile` using the _workaround_ method (see the first sketch after this list).
- Building on WCOSS2 sources `versions/build.ver`, which overwrote the module versions loaded for UPP. To solve this, I added `versions/unset.ver`, which clears these variables before loading `sorc/UPP/modulefiles/wcoss2.lua` (see the second sketch after this list).
- On WCOSS2, the MPI run command depends on runtime variables (`NTASKS`, `PPN`) and thus is defined in `workflow/sideload/launch.sh` instead of in the `exp.setup` file (see the third sketch after this list). This means we can probably remove `MPI_RUN_CMD` from the `exp.setup` files in a future update.
- A suitable python environment for `ioda_bufr` is not available on WCOSS2. So instead we source a python virtual environment used for RRFSv1.
TESTS CONDUCTED:
Ran 24 h of cycling with the default `exp/exp.conus12km` configuration file. Results are shown in issue #773.
Machines/Platforms:
WCOSS2
ISSUE:
Resolves #773