Gaea C6 support for UFSWM #2448
Draft: BrianCurtis-NOAA wants to merge 25 commits into ufs-community:develop from BrianCurtis-NOAA:gaeac6
Changes from 6 commits
b968b96 initial testing to get UFSWM working on Gaea C6 (BrianCurtis-NOAA)
efe342e Merge branch 'develop' of github.com:ufs-community/ufs-weather-model … (BrianCurtis-NOAA)
7476837 gaea->gaea-c5 and gaeac6->gaea-c6 (BrianCurtis-NOAA)
742a7c2 Fixed linter issue (BrianCurtis-NOAA)
5bee5b2 Update to 192 cores on Gaea-c6 (BrianCurtis-NOAA)
63a56ac Update tests to gaea-c5 and added gaea-c6 where necessary (BrianCurtis-NOAA)
bb83396 Remove MOM6SOLO from compile.sh (BrianCurtis-NOAA)
532f418 Merge branch 'develop' into gaeac6 (BrianCurtis-NOAA)
4113fea gaea-c5 --> gaeac5 and gaea-c6 --> gaeac6 (BrianCurtis-NOAA)
fc0d9e6 Merge branch 'gaeac6' of github.com:BrianCurtis-NOAA/ufs-weather-mode… (BrianCurtis-NOAA)
0c4790e Bring in c5 changes from @RatkoVasic-NOAA and testing export FI_VERBS… (BrianCurtis-NOAA)
e1de81e make identical but separate c5/c6 intelllvm lua for testing, fix miss… (BrianCurtis-NOAA)
402c05b update wording in intelllvm lua (BrianCurtis-NOAA)
f55fc8a Update fv3_slurm.IN_gaeac6 (jkbk2004)
ece273d Update rt.sh (jkbk2004)
103bd7c Merge remote-tracking branch 'upstream/develop' into gaeac6 (RatkoVasic-NOAA)
1d6908c Update rocoto and ecflow module loading for Gaea-C6 (RatkoVasic-NOAA)
fa68c07 Fix HAFS runtime errors. (RatkoVasic-NOAA)
8942489 Change work-dir to open-for-read space. (RatkoVasic-NOAA)
7b17f0f Gaea C6 additions (RatkoVasic-NOAA)
6107caa Increase number of nodes for some cases on Gaea C6 (RatkoVasic-NOAA)
32b8418 Correct errors (variable MACHINE_ID to RT_COMPILER) (RatkoVasic-NOAA)
3dcee6e Adjust TPN for some test cases. (RatkoVasic-NOAA)
a035799 Merge branch 'develop' into gaeac6 (RatkoVasic-NOAA)
29cef49 Fix AND to OR in if statement. (RatkoVasic-NOAA)
@@ -0,0 +1,33 @@
help([[
  This module loads libraries required for building and running UFS Weather Model
  on the NOAA RDHPC machine Gaea C6 using Intel-2023.2.0.
]])

whatis([===[Loads libraries needed for building the UFS Weather Model on Gaea C6]===])

prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/Core")

stack_intel_ver=os.getenv("stack_intel_ver") or "2023.2.0"
load(pathJoin("stack-intel", stack_intel_ver))

stack_cray_mpich_ver=os.getenv("stack_cray_mpich_ver") or "8.1.29"
load(pathJoin("stack-cray-mpich", stack_cray_mpich_ver))

stack_python_ver=os.getenv("stack_python_ver") or "3.10.13"
load(pathJoin("stack-python", stack_python_ver))

cmake_ver=os.getenv("cmake_ver") or "3.23.1"
load(pathJoin("cmake", cmake_ver))

load("ufs_common")

nccmp_ver=os.getenv("nccmp_ver") or "1.9.0.1"
load(pathJoin("nccmp", nccmp_ver))

unload("darshan-runtime")
unload("cray-libsci")

setenv("CC","cc")
setenv("CXX","CC")
setenv("FC","ftn")
setenv("CMAKE_Platform","gaea.intel")
File renamed without changes.
@@ -0,0 +1,22 @@
#!/bin/bash -l
#SBATCH -e err
#SBATCH -o out
#SBATCH --account=@[ACCNR]
##SBATCH --qos=@[QUEUE]
#SBATCH --clusters=es
#SBATCH --partition=eslogin_c6
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=4G
#SBATCH --time=180
#SBATCH --job-name="@[JBNME]"

set -eux

echo -n " $( date +%s )," > job_timestamp.txt
echo "Compile started: " `date`

@[PATHRT]/compile.sh @[MACHINE_ID] "@[MAKE_OPT]" @[COMPILE_ID] @[RT_COMPILER]

echo "Compile ended: " `date`
echo -n " $( date +%s )," >> job_timestamp.txt
File renamed without changes.
@@ -0,0 +1,45 @@
#!/bin/bash -l
#SBATCH -e err
#SBATCH -o out
#SBATCH --job-name="@[JBNME]"
#SBATCH --account=@[ACCNR]
#SBATCH --qos=@[QUEUE]
#SBATCH --clusters=c6
#SBATCH --partition=batch
#SBATCH --nodes=@[NODES]
#SBATCH --ntasks-per-node=@[TPN]
#SBATCH --time=@[WLCLK]

set -eux
echo -n " $( date +%s )," > job_timestamp.txt

set +x
MACHINE_ID=gaeac6
source ./module-setup.sh
module use --prepend $PWD/modulefiles
module load modules.fv3
module list
set -x

echo "Model started: " `date`

export OMP_NUM_THREADS=@[THRD]
export OMP_STACKSIZE=1024M
export NC_BLKSZ=1M
export ESMF_RUNTIME_PROFILE=ON
export ESMF_RUNTIME_PROFILE_OUTPUT="SUMMARY"

# Avoid job errors because of filesystem synchronization delays
sync && sleep 1

# This "if" block is part of the rt.sh self-tests in error-test.conf. It emulates the model failing to run.
if [ "${JOB_SHOULD_FAIL:-NO}" = WHEN_RUNNING ] ; then
  echo "The job should abort now, with exit status 1." 1>&2
  echo "If error checking is working, the metascheduler should mark the job as failed." 1>&2
  false
fi

srun --label -n @[TASKS] ./fv3.exe

echo "Model ended: " `date`
echo -n " $( date +%s )," >> job_timestamp.txt
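The two job templates above carry `@[NAME]` placeholders (for example `@[ACCNR]`, `@[NODES]`, `@[TPN]`) that have to be filled in before the script is submitted. As a rough illustration of that idea only, here is a minimal bash sketch. The function name `render_template` and the example values are hypothetical, and this is not claimed to be how the UFSWM test scripts perform the substitution.

```shell
#!/bin/bash
# Hypothetical helper (not the project's own implementation): read a template
# on stdin and replace each @[NAME] token with the value of shell variable NAME.
# Tokens whose variable is unset are passed through unchanged.
render_template() {
  local _re='^(.*)@\[([A-Za-z_][A-Za-z0-9_]*)\](.*)$'
  local _line _rest _out _name _value
  while IFS= read -r _line || [[ -n "${_line}" ]]; do
    _rest="${_line}"
    _out=""
    # The greedy (.*) prefix makes each match pick the right-most token,
    # so the line is rebuilt from right to left.
    while [[ "${_rest}" =~ ${_re} ]]; do
      _name="${BASH_REMATCH[2]}"
      if [[ -n "${!_name+set}" ]]; then
        _value="${!_name}"
      else
        _value="@[${_name}]"
      fi
      _out="${_value}${BASH_REMATCH[3]}${_out}"
      _rest="${BASH_REMATCH[1]}"
    done
    printf '%s\n' "${_rest}${_out}"
  done
}

# Example values, made up for illustration.
ACCNR=myproject
NODES=2
TPN=192
printf '%s\n' '#SBATCH --account=@[ACCNR]' \
              '#SBATCH --nodes=@[NODES]' \
              '#SBATCH --ntasks-per-node=@[TPN]' | render_template
```

Leaving unknown tokens in place, rather than replacing them with an empty string, makes a forgotten variable easy to spot in the rendered script.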
@@ -0,0 +1,68 @@
====START OF GAEAC6 REGRESSION TESTING LOG====

UFSWM hash used in testing:
2ccc549348da37aac51ab44482174dff2bb2912d

Submodule hashes used in testing:
37cbb7d6840ae7515a9a8f0dfd4d89461b3396d1 AQM (v0.2.0-37-g37cbb7d)
be5d28fd1b60522e6fc98aefeead20e6aac3530b AQM/src/model/CMAQ (CMAQv5.2.1_07Feb2018-198-gbe5d28fd1)
1f9eaaa142c8b07ed6b788c9f44ea02cc86d0bae CDEPS-interface/CDEPS (cdeps0.4.17-42-g1f9eaaa)
635d9a100a736bd8d14ad091e879d5da6e4eb2bd CICE-interface/CICE (CICE6.0.0-373-g635d9a1)
4c87095256c1c599c3ccaa857a95744158751a60 CICE-interface/CICE/icepack (Icepack1.1.0-191-g4c87095)
dc977bcadd1ade1a528dee75f1ad45e8bd80ca0a CMEPS-interface/CMEPS (cmeps_v0.4.1-2310-gdc977bc)
cabd7753ae17f7bfcc6dad56daf10868aa51c3f4 CMakeModules (v1.0.0-28-gcabd775)
a9364591091c836984a40107729720705847c195 FV3 (heads/develop)
ac3055eff06099d61cd65e18bc4f0353ffd83f46 FV3/atmos_cubed_sphere (201912_public_release-405-gac3055e)
0f8232724975c13289cad390c9a71fa2c6a9bff4 FV3/ccpp/framework (2024-07-11-dev)
b6c433354394bd8ed5e46692a81149441ff4ae38 FV3/ccpp/physics (EP4-873-gb6c43335)
74a0e098b2163425e4b5466c2dfcf8ae26d560a5 FV3/ccpp/physics/physics/Radiation/RRTMGP/rte-rrtmgp (v1.6)
81b38a88d860ce7e34e8507c2246151a54d96a39 FV3/upp (upp_v10.2.0-218-g81b38a88)
-1ba8270870947b583cd51bc72ff8960f4c1fb36e FV3/upp/sorc/libIFI.fd
-567edcc94bc418d0dcd6cdaafed448eeb5aab570 FV3/upp/sorc/ncep_post.fd/post_gtg.fd
041422934cae1570f2f0e67239d5d89f11c6e1b7 GOCART (sdr_v2.1.2.6-119-g0414229)
bcf7777bb037ae2feb2a8a8ac51aacb3511b52d9 HYCOM-interface/HYCOM (2.3.00-122-gbcf7777)
5e0c21f64fa5b20efc8f29f8709766e1e6793a79 MOM6-interface/MOM6 (dev/master/repository_split_2014.10.10-10230-g5e0c21f64)
9423197f894112edfcb1502245f7d7b873d551f9 MOM6-interface/MOM6/pkg/CVMix-src (9423197)
29e64d652786e1d076a05128c920f394202bfe10 MOM6-interface/MOM6/pkg/GSW-Fortran (29e64d6)
3ac32f0db7a2a97d930f44fa5f060c983ff31ee8 NOAHMP-interface/noahmp (v3.7.1-436-g3ac32f0)
7f548c795a348bbb0fe4967dd25692c79036dc73 WW3 (6.07.1-346-g7f548c79)
05cad173feeb598431e3ef5f17c2df6562c8d101 fire_behavior (v0.2.0-1-g05cad17)
fad2fe9f42f6b7f744b128b4a2a9433f91e4296f stochastic_physics (ufs-v2.0.0-219-gfad2fe9)


NOTES:
[Times](Memory) are at the end of each compile/test in format [MM:SS](Size).
The first time is for the full script (prep+run+finalize).
The second time is specifically for the run phase.
Times/Memory will be empty for failed tests.

BASELINE DIRECTORY: /gpfs/f6/drsa-fire2/world-shared/Brian.Curtis/NEMSfv3gfs/develop-20240909
COMPARISON DIRECTORY: /gpfs/f6/drsa-fire2/scratch/Brian.Curtis/RT_RUNDIRS/Brian.Curtis/FV3_RT/rt_2186049

RT.SH OPTIONS USED:
* (-a) - HPC PROJECT ACCOUNT: drsa-fire2
* (-c) - CREATE NEW BASELINES
* (-n) - RUN SINGLE TEST: cpld_control_p8
* (-e) - USE ECFLOW

PASS -- COMPILE 's2swa_intel' [09:12, 08:08] ( 6 warnings 10 remarks )
FAILED: TEST TIMED OUT -- TEST 'cpld_control_p8_intel' [, ]( MB)

SYNOPSIS:
Starting Date/Time: 20241002 11:56:43
Ending Date/Time: 20241002 13:19:28
Total Time: 01h:23m:40s
Compiles Completed: 1/1
Tests Completed: 0/1
Failed Tests:
* TEST cpld_control_p8_intel: FAILED: TEST TIMED OUT
-- LOG: /gpfs/f6/drsa-fire2/scratch/Brian.Curtis/RT_RUNDIRS/Brian.Curtis/FV3_RT/rt_2186049/cpld_control_p8_intel/err

NOTES:
A file 'test_changes.list' was generated with list of all failed tests.
You can use './rt.sh -c -b test_changes.list' to create baselines for the failed tests.
If you are using this log as a pull request verification, please commit 'test_changes.list'.

Result: FAILURE

====END OF GAEAC6 REGRESSION TESTING LOG====
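The NOTES section of the log describes the `[MM:SS, MM:SS]` pair that ends each compile/test line and says it is empty for failed tests. As a small illustration of reading that pair, the bash sketch below converts both times to seconds. The function names `mmss_to_seconds` and `parse_times` are hypothetical and are not part of rt.sh; the two sample lines are copied from the log above.

```shell
#!/bin/bash
# Convert a MM:SS string to seconds. The 10# prefix forces base 10 so that
# values such as "09" are not rejected as invalid octal numbers.
mmss_to_seconds() {
  local mm="${1%%:*}" ss="${1##*:}"
  echo $(( 10#${mm} * 60 + 10#${ss} ))
}

# Hypothetical parser: print "<first_seconds> <second_seconds>" for a result
# line, or return 1 when the bracketed times are empty (as on failed tests).
parse_times() {
  local re='\[([0-9]+:[0-9]+), ([0-9]+:[0-9]+)\]'
  if [[ "$1" =~ ${re} ]]; then
    echo "$(mmss_to_seconds "${BASH_REMATCH[1]}") $(mmss_to_seconds "${BASH_REMATCH[2]}")"
  else
    return 1
  fi
}

parse_times "PASS -- COMPILE 's2swa_intel' [09:12, 08:08] ( 6 warnings 10 remarks )"
parse_times "FAILED: TEST TIMED OUT -- TEST 'cpld_control_p8_intel' [, ]( MB)" \
  || echo "no timing recorded"
```

For the two sample lines this prints `552 488` and then `no timing recorded`.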
@@ -709,8 +709,8 @@ case ${MACHINE_ID} in
     PTMP="/lfs/h2/emc/ptmp"
     SCHEDULER="pbs"
     ;;
-  gaea)
-    echo "rt.sh: Setting up gaea..."
+  gaea-c5)
+    echo "rt.sh: Setting up gaea-c5..."
     if [[ "${ROCOTO:-false}" == true ]] ; then
       module use /ncrc/proj/epic/rocoto/modulefiles
       module load rocoto

@@ -739,6 +739,41 @@ case ${MACHINE_ID} in
     STMP=${STMP:-${dprefix}/RT_BASELINE}
     PTMP=${PTMP:-${dprefix}/RT_RUNDIRS}
 
     SCHEDULER="slurm"

Review comment: You can replace these couple of lines with the suggested change. Natalie installed rocoto on C6 (included in the code change suggestion).

+    ;;
+  gaea-c6)
+    echo "rt.sh: Setting up gaea-c6..."
+    if [[ "${ROCOTO:-false}" == true ]] ; then
+      # module use /ncrc/proj/epic/rocoto/modulefiles
+      module load rocoto
+      ROCOTO_SCHEDULER="slurm"
+    fi
+
+    export LD_PRELOAD=/usr/lib64/libstdc++.so.6
+    module use /ncrc/proj/epic/spack-stack/c6/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/Core
+    #module load PrgEnv-intel/8.5.0
+    module load stack-intel/2023.2.0
+    #module load cray-mpich/8.1.29
+    module load python/3.10.13
+    module use /ncrc/proj/epic/spack-stack/modulefiles
+    #module load gcc-native/12.3
+    if [[ "${ECFLOW:-false}" == true ]] ; then
+      #module load ecflow/5.8.4
+      module load ecflow
+      ECF_HOST=$(hostname)
+      ECF_PORT=$(( $(id -u) + 1500 ))
+      export ECF_PORT ECF_HOST
+    fi
+
+    #DISKNM=/gpfs/f5/epic/world-shared/UFS-WM_RT
+    DISKNM=/gpfs/f6/drsa-fire2/world-shared/Brian.Curtis
+    QUEUE=normal
+    COMPILE_QUEUE=normal
+    PARTITION=c6
+    dprefix=${dprefix:-/gpfs/f6/${ACCNR}/scratch/${USER}}
+    STMP=${STMP:-${dprefix}/RT_BASELINE}
+    PTMP=${PTMP:-${dprefix}/RT_RUNDIRS}
+
+    SCHEDULER="slurm"
     ;;
   hera)
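The gaea-c6 block leans on two shell idioms: an arithmetic expansion that derives a per-user ecFlow port from the numeric user ID, and `${VAR:-default}` expansions that let a preset value override a computed path. The bash sketch below isolates both so they can be tried outside rt.sh. The function names `ecf_port_for_uid` and `scratch_prefix` and the sample account and user names are hypothetical.

```shell
#!/bin/bash
# Derive a per-user port by adding a fixed offset to the numeric user ID,
# mirroring ECF_PORT=$(( $(id -u) + 1500 )) in the hunk above.
ecf_port_for_uid() {
  echo $(( $1 + 1500 ))
}

# Hypothetical wrapper around the dprefix default: a preset, non-empty dprefix
# wins; otherwise the path is built from the account and user names.
scratch_prefix() {
  local account="$1" user="$2"
  echo "${dprefix:-/gpfs/f6/${account}/scratch/${user}}"
}

ecf_port_for_uid "$(id -u)"          # port for the current user
unset dprefix
scratch_prefix myproject some.user   # falls back to the built path
dprefix=/tmp/override
scratch_prefix myproject some.user   # uses the preset value
```

Because the port depends only on the user ID, two users on the same host get different ports, while the same user always gets the same one.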
Why do we even need this logic here, adding or not adding -DMOM6SOLO=ON? As far as I know, we do not regression test MOM6SOLO. Can we remove this block of code entirely from this script?

Good question. It was added there for a reason, and I don't recall if we ever RT'd MOM6SOLO. @junwang-noaa do you recall what this block of code was used for?

If I remember correctly, this is to support standalone MOM testing. @jiandewang Do you know why MOM6 SOLO does not work on gaea?

Hi! I'm new to the UFS, but AFAIK, nobody seems to use -DMOM6SOLO=ON, though I would defer to @junwang-noaa.

@junwang-noaa My understanding from @jiandewang is that he (and others) are no longer routinely testing the MOM solo config; I have always built using the instructions at MOM6-examples.

It was added here many years ago and we never tried this SOLO on any platform. My understanding is that with nuopc_cap it has to be coupled with something.

Yes, we use MOM-example to do standalone tests when some debug work is needed (to help GFDL narrow down the issue when their big PR is not working as expected in UWM).

So you do not use tests/compile.sh to build the standalone test, is that correct?

Correct.

Then we should remove it from compile.sh.