Add module files for building SCM with spack-stack on Derecho, Hera, Jet, Orion #406
scm/etc/modules/derecho_intel.lua
load(pathJoin("intel-classic", os.getenv("intel_classic_ver") or "2023.0.0"))
load(pathJoin("cray-mpich", os.getenv("cray_mpich_ver") or "8.1.25"))

prepend_path("MODULEPATH","/glade/work/epicufsrt/contrib/derecho/hpc-stack/intel-classic-2023.0.0/modulefiles/stack")
I hope you are aware that hpc-stack is essentially frozen and that EPIC, EMC and the UFS community have moved on to spack-stack. spack-stack modules are available on all platforms (please correct me if I am wrong) and I am pretty sure they can be used as drop-in replacements for hpc-stack modules. On Derecho, you'd also have a gnu stack available, by the way: https://spack-stack.readthedocs.io/en/1.5.1/PreConfiguredSites.html#ncar-wyoming-derecho
What's more, you wouldn't need conda to build Python envs on top of the software stack, since everything should be available (I am happy to try this for you).
I think that there is still a reliance on some non-standard python packages, like f90nml. Would we still need to create a python environment on top of the one from spack-stack in this case?
Give me a few minutes to try this and answer your question please
@climbfuji Thanks for the link; last I had checked spack-stack was still not supported on Derecho so I stuck with the libraries I knew would work. I will see if those work for the SCM build.
@climbfuji I wasn't able to get spack-stack to work on Derecho. I receive the following error when attempting to load the spack-stack modules:
libfabric/1.15.2.0
While processing the following module(s):
Module fullname Module Filename
--------------- ---------------
stack-cray-mpich/8.1.25 /glade/work/epicufsrt/contrib/spack-stack/derecho/spack-stack-1.5.1/envs/unified-env/install/modulefiles/intel/2021.10.0/stack-cray-mpich/8.1.25.lua
derecho_intel /glade/derecho/scratch/kavulich/SCM/PR_406/ccpp-scm/scm/etc/modules/derecho_intel.lua
Which doesn't make sense to me, because libfabric/1.15.2.0 is supposed to be loaded inside the stack-cray-mpich/8.1.25 module (and trying to manually load it right before that step doesn't work either). I'll admit I'm a little unclear on the finer details of these modules, though; is it possible I'm not loading these in the right order or something? The modulefile I'm using is below:
help([[
This module loads libraries for building the CCPP Single-Column Model on
the CISL machine Derecho (Cray) using Intel-classic-2023.0.0
]])
whatis([===[Loads libraries needed for building the CCPP SCM on Derecho ]===])
load(pathJoin("cmake", os.getenv("cmake_ver") or "3.26.3"))
load(pathJoin("ncarenv", os.getenv("ncarenv_ver") or "23.06"))
load(pathJoin("craype", os.getenv("craype_ver") or "2.7.20"))
prepend_path("MODULEPATH","/glade/work/epicufsrt/contrib/spack-stack/derecho/spack-stack-1.5.1/envs/unified-env/install/modulefiles/Core")
load("stack-intel/2021.10.0")
load("stack-cray-mpich/8.1.25")
load("stack-python/3.10.8")
load("bacio/2.4.1")
load("sp/2.3.3")
load("w3emc/2.9.2")
setenv("CC","cc")
setenv("FC","ftn")
setenv("CXX","CC")
setenv("CMAKE_C_COMPILER","cc")
setenv("CMAKE_CXX_COMPILER","CC")
setenv("CMAKE_Fortran_COMPILER","ftn")
setenv("CMAKE_Platform","derecho.intel")
> I think that there is still a reliance on some non-standard python packages, like f90nml. Would we still need to create a python environment on top of the one from spack-stack in this case?
@grantfirl It looks like we do need to keep our own python environment, at least for now (the spack-stack environment does not contain f90nml as you expected).
It does, module load py-f90nml
Thanks again @climbfuji. So all the python packages in spack-stack are loaded via modules? I didn't think to look there, it looks like a lot of good packages 👍
Yes, all of them are modules and all python packages start with py-.
You can extend these spack python environments in case packages are missing: load whatever you can as modules from spack-stack, then create a virtual environment (python3 -m venv venv) and install the missing packages via pip (python3 -m pip install NAME). This way, all the spack-stack Python utilities are used unless there are version conflicts.
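The approach above depends on knowing which packages the loaded spack-stack modules already provide. A minimal sketch of such a check, assuming it is run with the spack-stack Python after the relevant py- modules have been loaded; the function name `find_missing` and the package list are hypothetical and only for illustration:

```python
import importlib.util


def find_missing(packages):
    """Return the top-level package names that cannot be imported."""
    missing = []
    for name in packages:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list; replace with the packages your workflow imports.
    wanted = ["f90nml"]
    for name in find_missing(wanted):
        # Echo the pip command suggested in the comment above
        print(f"python3 -m pip install {name}")
```

Note that this checks import names, which can differ from the names used by pip or by the py- modules, so the printed commands are suggestions to review rather than something to run blindly.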
scm/etc/modules/hera_intel.lua
prepend_path("MODULEPATH","/contrib/sutils/modulefiles")
load("sutils")

prepend_path("MODULEPATH", "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.4.1/envs/unified-env/install/modulefiles/Core")
Can we use the same version of spack-stack as in ufs-weather-model: https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/ufs_hera.intel.lua?
Yes - ufs will move to 1.5.1 shortly. Do you want me to create a PR and update to 1.5.1? This way you don't waste your time if something goes less smoothly than I was bragging.
@mkavulich Can you update this PR at least to spack-stack-1.5.0? Then, once ufs moves to 1.5.1, we can try to follow suit. I'd like to keep the SCM using the same stack as ufs-wm going forward; we just haven't been paying attention for a while due to not having a release.
I'll push a branch with an example spack-stack file for hera/intel shortly. The only thing that's missing, I think, is the optional doxygen. We can consider adding that to spack-stack (but I remember it's a bit of a tricky package). We can also consider adding a separate template for the scm so that users who just want that (and not the full ufs-weather-model) only have to build a few libraries. But then again, the ufs-weather-model template should be good enough.
Branch force-pushed from 1b140ba to 17529c9.
@DomHeinzeller @grantfirl I have updated the new modules to all use spack-stack 1.5.1, and also added one for Derecho GNU. I did re-run the regression tests for Hera Intel/GNU and they all passed, but I have not done a comparison with the main branch baseline for Hera to ensure close-ish results; I can do that if you'd like, but I just haven't had time yet. I also rebased my branch on the latest develop to fix the CI tests; all now seem to be passing.
@mkavulich I've tried loading hera_intel on Hera with this code, and it works fine for me. I'm guessing that we'll need to tell folks to manually set the SCM_ROOT variable, or does it make any sense to try to set it via the lua file?
@mkavulich @dustinswales How could we use spack-stack for the CI tests? I'm guessing that if we want to switch that over too, we'll do that in a separate PR? For example, see https://github.com/JCSDA/spack-stack/blob/release/1.5.1/.github/workflows/ubuntu-ci-x86_64.yaml for setting up the environment?
I've been meaning to talk to you about that. It seems to me like using this variable is unnecessary complexity: if this is set automatically through a setup script or modulefile, why don't we just set it directly in the python script?
Ya, that should work fine. The whole idea of having SCM_ROOT in the first place was to allow for flexibility with respect to where executables are stored and where the output goes. In the run script, we could check if the
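A minimal sketch of the fallback discussed above: honor SCM_ROOT when it is set, otherwise infer the root from the run script's own location. The function name `resolve_scm_root`, the `levels_up` parameter, and the assumed layout (run script two directories below the repository root) are illustrative assumptions, not the actual SCM run script:

```python
import os
from pathlib import Path


def resolve_scm_root(script_path, levels_up=2, env=None):
    """Return SCM_ROOT from the environment if set, else infer it.

    levels_up is a hypothetical knob: how many directories sit between
    the run script's directory and the repository root.
    """
    env = os.environ if env is None else env
    root = env.get("SCM_ROOT")
    if root:
        # An explicit setting keeps the flexibility SCM_ROOT was meant to give
        return Path(root)
    # Fall back to a path derived from where the script itself lives
    return Path(os.path.abspath(script_path)).parents[levels_up]


# Typical call from inside a run script (assumed layout):
#   scm_root = resolve_scm_root(__file__)
```

With this shape, users who never set SCM_ROOT get a working default, while a modulefile or setup script can still override it.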
You could try to pull the containers we create for JEDI CI, they should have all the dependencies you need (but I agree that making this or any other solution a separate PR is better)
@mkavulich I'm running into issues on Derecho. It apparently can't find NetCDF-fortran. You don't get this error?
CMake Error at /glade/work/grantf/ccpp-scm/CMakeModules/Modules/FindNetCDF.cmake:246 (message):
CMake Error at /glade/u/apps/derecho/23.09/spack/opt/spack/cmake/3.26.3/gcc/7.5.0/k34x/share/cmake-3.26/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
I see that the Hera module files have: but the Derecho ones do not. Is there a reason?
@mkavulich FYI, if I add the netCDF load commands to the Derecho lua files, everything works fine for me.
Branch force-pushed from 140bfc3 to 83d6e5b.
I'm fine with everything except NetCDF on Derecho (see comments).
@grantfirl Thanks for testing this out, I have had some testing frustrations because
To me this is a bit suspicious. The spack
@climbfuji I agree that it is suspicious that this issue is occurring. There appears to be something going on with different hdf5 versions compared to the system default. When you don't run a
Running
Now, both of those do work, but maybe the warning does give some hint as to why those netcdf modules need to be explicitly loaded.
You have to follow exactly the steps in https://spack-stack.readthedocs.io/en/latest/PreConfiguredSites.html#ncar-wyoming-derecho unless you want to set yourself up for trouble:
@climbfuji so I guess that means I have omitted ecflow and mysql because we don't use those applications. The new modulefiles appear to be working much better (along with doing a purge first); I pushed the updated Derecho files, and I'll make and test those changes for Hera later. @grantfirl can you try again with the latest files on Derecho (remembering to
Yes -
Branch force-pushed from 214a4a7 to 800d9b6.
@climbfuji @grantfirl I am still waiting on help installing LaTeX tools for updating the users guide, but aside from that I think this PR is ready for re-review. I also added modulefiles for Jet and Orion while I was at it since it was simple to add based on the spack-stack instructions Dom sent (I don't have access to any of the other machines).
what tools are you missing?
… On Nov 27, 2023, at 2:10 PM, Michael Kavulich ***@***.***> wrote:
@climbfuji <https://github.com/climbfuji> @grantfirl <https://github.com/grantfirl> I am still waiting on help installing LaTeX tools for updating the users guide, but aside from that I think this PR is ready for re-review. I also added modulefiles for Jet and Orion while I was at it since it was simple to add based on the spack-stack instructions Dom sent (I don't have access to any of the other machines).
@mkavulich Here is the PDF of the updated docs if you want to include it in this PR:
@mkavulich I'd like to re-test this on Hera and Derecho so that we can maybe get this merged today.
@mkavulich Can you merge in the latest NCAR/main commit: 1d8894f
Everything works with Intel/GNU on Hera/Derecho. @mkavulich I'll approve/merge once this is updated to the latest NCAR/main commit.
- Update hera_gnu for spack-stack 1.5.1
…ser forgot to clone recursively they don't get silent failures
…r me without them earlier...
…ecause for some reason calling logging prior to setup was messing up all the subsequent logging.
… been re-generated: my laptop currently lacks the necessary software (hopefully will get it soon).
Branch force-pushed from 800d9b6 to 6a7eb94.
@grantfirl The branch should now be updated, and I tested one more time on Derecho with Intel. I think it's ready to go 👍
Yay! Welcome to the spack-stack user community :-)
This PR introduces modulefiles for building SCM for Derecho (Intel) and Hera (Intel, GNU). It should be fairly easy to add analogous modulefiles for other EPIC-supported platforms, so let me know if that's desired.
I ran the regression test suite and there were some differences on Hera, as expected. These differences were almost entirely at the precision noise level (<1e-10), except for some tests that had isolated significant differences. The vast majority of diffs across all fields and all tests were exactly 0. Differences from the baseline (compiled from top of develop with the old shell environment files) can be found in the files in the following directories if anyone wants to take a closer look:
Documentation has been updated in the .tex files, but I haven't been able to re-build the PDF yet. For now, instructions for building on Derecho are here: https://docs.google.com/document/d/1Wg5dBIzwhjoYf6BhgmsUJczPEfxD3dA2yTRS5ftSICk/edit
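For anyone repeating the baseline comparison described in this PR, the three kinds of differences mentioned (exactly 0, precision noise, significant) could be counted along these lines. This is only a sketch: the function name `classify_diffs` is hypothetical, plain Python sequences stand in for the real output fields, and it assumes the 1e-10 threshold applies to absolute differences, which the description does not state:

```python
def classify_diffs(baseline, candidate, noise_tol=1e-10):
    """Count identical, noise-level, and significant pointwise differences."""
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must have the same length")
    counts = {"identical": 0, "noise": 0, "significant": 0}
    for b, c in zip(baseline, candidate):
        diff = abs(c - b)  # assumed: absolute, not relative, difference
        if diff == 0.0:
            counts["identical"] += 1
        elif diff < noise_tol:
            counts["noise"] += 1
        else:
            counts["significant"] += 1
    return counts


if __name__ == "__main__":
    # Made-up values purely to exercise the three buckets
    print(classify_diffs([1.0, 2.0, 3.0], [1.0, 2.0 + 1e-12, 3.5]))
```

A relative tolerance may be more appropriate for fields spanning many orders of magnitude; the absolute form is used here only because it is the simplest reading of "<1e-10".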