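While the CCPP team is investigating this issue, here is a temporary solution that you may try: In your Hera login .cshrc file, add this line:
setenv LD_LIBRARY_PATH "/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-2022.1.2/zstd/1.5.0/lib:$LD_LIBRARY_PATH"
Source .cshrc and rerun your model.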
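Since the error reported in this thread is a missing libzstd.so.1, it may help to confirm which directories on LD_LIBRARY_PATH actually provide that file before rerunning. The following is a minimal Python sketch, not part of the original workaround; the helper name find_library is hypothetical, and it only inspects LD_LIBRARY_PATH (the real loader also consults rpath entries, the ld.so cache, and default system directories).

```python
import os
from pathlib import Path


def find_library(lib_name, search_path):
    """Return the directories in a colon-separated search path that contain lib_name."""
    hits = []
    for entry in search_path.split(":"):
        # The real loader treats an empty entry as the current directory;
        # this sketch simply ignores empty entries.
        if entry and (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    dirs = find_library("libzstd.so.1", os.environ.get("LD_LIBRARY_PATH", ""))
    if dirs:
        print("libzstd.so.1 found in:", ", ".join(dirs))
    else:
        print("libzstd.so.1 not found on LD_LIBRARY_PATH")
```

Running this in the same shell that launches the model (after sourcing .cshrc) should show whether the added directory is being picked up.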
-
I did what you recommended, and the model that I previously compiled now begins to run, but eventually crashes on writing the initial “atmf000.nc” file with the following error:
+ srun --label -n 160 ./fv3.exe
0: Using aerosol-aware version of Thompson microphysics
0: calling table_ccnAct routine
0: creating qc collision eff tables
0: creating rain evap table
0: creating ice converting to snow table
0: creating rain collecting graupel table
0: creating freezing of water drops table
0: ... DONE microphysical lookup tables
144: file: module_write_netcdf.F90 line: 917
144: NetCDF: Name contains illegal characters
144: Abort(1) on node 144 (rank 144 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 144
srun: error: h23c04: tasks 76-77,79: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=37444524.0
0: slurmstepd: error: *** STEP 37444524.0 ON h23c02 CANCELLED AT 2022-11-04T18:35:06 ***
srun: error: h23c02: tasks 1,3-4: Exited with exit code 1
srun: error: h23c06: tasks 82-83,94-95,97,100-101,119: Exited with exit code 1
srun: error: h23c02: tasks 19-20,22-23: Exited with exit code 1
srun: error: h23c06: tasks 80-81,84-93,96,98-99,102-118: Exited with exit code 1
srun: error: h23c02: tasks 0,2,5-18,21,24-39: Exited with exit code 1
srun: error: h25c35: tasks 120-159: Exited with exit code 1
srun: error: h23c04: tasks 40-75,78: Exited with exit code 1
On Nov 4, 2022, at 12:00 PM, mzhangw wrote:
While the CCPP team is investigating this issue, here is a temporary solution that you may use: In your Hera login .cshrc file, add this line:
setenv LD_LIBRARY_PATH "/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-2022.1.2/zstd/1.5.0/lib:$LD_LIBRARY_PATH"
Source .cshrc and rerun your model.
-
Hi Mike, it looks like the hpc-stack installation on Hera was somehow corrupted. This isn't a CCPP issue -- it affects any code that pulls in the hpc-stack libraries from /scratch2/NCEPDEV/nwprod/hpc-stack. The SCM was using this installation on Hera too, which is why the problem showed up there. There appears to be an open PR in ufs-weather-model that switches to a different, newer hpc-stack installation from EPIC, which should fix your issue. If you don't want to wait for that PR to be merged, you can try bringing its branch into your code; otherwise, open a discussion under ufs-weather-model to get more help.
-
It might be worth taking a look at ufs-wm issue 1450.
-
Just tested. The ufs-wm develop branch works OK: /scratch1/NCEPDEV/stmp2/Jong.Kim/FV3_RT/rt_19377/cpld_control_p8
-
Hello,
I ran a single-test Regression Test to create a run directory for running canned test cases. The executable that was created runs fine in the "canned" run directory that was created. However, when I compile the same code with tests/compile.sh, the executable crashes with the error message:
./fv3.exe: error while loading shared libraries: libzstd.so.1: cannot open shared object file: No such file or directory
It seems that the modules that are loaded for compilation are different when run from the regression test versus directly from tests/compile.sh.
My canned run directory is on Hera at:
/scratch1/BMC/wrfruc/mtoy/git_local/ufs-weather-model/rt_23531/rap_unified_ugwp_debug
My "tests" directory is at:
/scratch1/BMC/wrfruc/mtoy/git_local/ufs-weather-model/tests
This seems to be a new issue -- with earlier versions of the code, this was not a problem.
Thanks,
Mike
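One way to compare the two build environments described above is to list which shared libraries each executable fails to resolve. The sketch below is an illustration only: it assumes the standard ldd tool is available on the system, and the function names missing_libraries and check_executable are hypothetical.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libzstd.so.1 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_executable(path):
    """Run ldd on an executable and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

Calling check_executable("./fv3.exe") once in the regression-test run directory and once for the executable built with tests/compile.sh, each with its own modules loaded, would show whether libzstd.so.1 is the only library that differs between the two setups.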