p8 / p5 tag issue on gaea: CPC experiment support #1755
Comments
Thanks in advance for your help. I cloned tags/Prototype-P8 on Gaea. In tests/, I replaced module-setup.sh with the one from the develop branch; the main difference is the line "source /lustre/f2/dev/role.epic/contrib/Lmod_init.sh". But when I compiled, I got an error. My directory is /lustre/f2/scratch/ncep/JieShun.Zhu/FV3_RT/rt_22995/compile_001. Also, my default shell is tcsh; before compiling, I switched to bash by typing "bash".
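A minimal sketch of that sequence, assuming the usual ufs-weather-model regression-test layout (the rt.sh invocation is illustrative, not the exact command used here):

```bash
# Interactive steps as described above; the Lmod path comes from the comment itself.
bash                       # switch from the default tcsh to bash
cd ufs-weather-model/tests
# replace module-setup.sh with the develop-branch copy, whose key addition is:
#   source /lustre/f2/dev/role.epic/contrib/Lmod_init.sh
./rt.sh -l rt.conf         # compile and run the regression suite
```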
@jieshunzhu -
@natalie-perlin Thanks for that. I am looking at #1753.
I replaced modulefiles/ with the one from the develop branch. When compiling, I got an error about w3nco (/lustre/f2/scratch/ncep/JieShun.Zhu/FV3_RT/rt_12517/compile_001/err):

```
Could not find a package configuration file provided by "w3nco" (requested ...
Add the installation prefix of "w3nco" to CMAKE_PREFIX_PATH or set ...
```

In the CMakeLists.txt of P8, I found `find_package(w3nco 2.4.0 REQUIRED)`, but in the same file on the develop branch, it is `find_package(w3emc 2.9.2 REQUIRED)`. Are w3nco and w3emc interchangeable?
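For what it's worth, the error itself points at the usual fix: make a w3nco installation visible to CMake. A sketch, assuming an hpc-stack that still provides a w3nco module (the module name and version here are assumptions):

```bash
module load w3nco/2.4.1   # hypothetical module; hpc-stack modules export <pkg>_ROOT
export CMAKE_PREFIX_PATH="${w3nco_ROOT}:${CMAKE_PREFIX_PATH}"
```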
@jkbk2004 - The main difference is that before the C3 upgrade, the UFS weather-model compile jobs in the regression tests were built on a Gaea C3 login node, which used the same compilers and Cray programming environment as at hpc-stack build time; only the RT test binaries were run on C4. After the C3 upgrade, the RT weather-model compile jobs use modules and a programming environment different from those in place when ./hpc-stack/intel-2022.0.2/ was built. (This may or may not create issues at runtime.)
An updated stack has been prepared with the same compilers as for ./hpc-stack/intel-2022.0.2/. A subset of regression tests (from ...) ... Closing issue #1753 for now, which was for the stack with the higher-version compilers.
@jkbk2004 @zach1221 The logs from the remaining set of regression tests (coupled) are attached.
@natalie-perlin can you add yafyaml/v0.5.1 to /lustre/f2/dev/role.epic/contrib/hpc-stack/intel-2022.0.2/modulefiles/stack? We used to use yafyaml with the p8 tag that @jieshunzhu is trying to use.
By using intel-2022.0.2 and making some other minor changes, I was able to compile the P8 tag. For the regression tests, however, the baseline is missing. The same thing happened for the GFSv17.HR1 tag. I will try the intel-classic-2022.0.2 stack that @natalie-perlin pointed to.
I agree the baselines for those tags might have gone missing during the OS transition, but we can compare a few cases by creating new baselines with the tag. The compiler change is likely to cause changes only at the white-noise level; we can confirm manually.
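For reference, the regression-test driver supports exactly this workflow; a sketch using rt.sh options as documented in the ufs-weather-model tests directory (the test list is whatever subset you want to compare):

```bash
cd tests
./rt.sh -c -l rt.conf   # -c creates a new baseline from the tag's own code
./rt.sh -l rt.conf      # rerun to verify against the freshly created baseline
```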
@jkbk2004 This is what you find when loading the ./intel-2022.0.2/ stack:
... and when loading the ./intel-classic-2022.0.2/ stack:
UPD: Links created; the modules are now loadable either way, as yafyaml/v0.5.1 or yafyaml/0.5.1.
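A sketch of how such a dual name could be set up; the modulefile layout below is an assumption about this stack's directory structure:

```bash
cd /lustre/f2/dev/role.epic/contrib/hpc-stack/intel-2022.0.2/modulefiles/stack/yafyaml
ln -s 0.5.1.lua v0.5.1.lua    # one modulefile, loadable under either name
module load yafyaml/v0.5.1    # or: module load yafyaml/0.5.1
```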
Even though it may no longer matter for me (I have P8 and HR1 compiled using intel-2022.0.2), I want to give you an update on the HR1 compilation with intel-classic-2022.0.2. I got an error related to the ESMF library:

```
CMake Error at /ncrc/sw/gaea-cle7/uasw/ncrc/envs/20200417/opt/linux-sles15-x86_64/gcc-7.5.0/cmake-3.20.1-w7tkahac22qulhhbcbi6io54u5dfr36zs/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
```

More details can be seen in /lustre/f2/scratch/ncep/JieShun.Zhu/FV3_RT/rt_7661/compile_001.
@jieshunzhu - looking into it now!
@jieshunzhu - It needs to have the following:

```lua
load(pathJoin("cray-mpich", os.getenv("cray_mpich_ver") or "7.7.20"))
load(pathJoin("hpc-cray-mpich", os.getenv("hpc_cray_mpich_ver") or "7.7.20"))
```
@natalie-perlin Thanks for the quick response. I see what you mean. Let me try it again; I will update soon.
@natalie-perlin Both compilation and regression tests now complete, but the regression tests are missing a baseline.
Do you want us to create a baseline with the code you are testing, so that we can continue to follow along as you move forward?
@jkbk2004 Not necessary if you are busy with other projects. Thanks for the help; really appreciate it.
@jkbk2004 @natalie-perlin Could you please reopen the issue? It looks like someone removed the hpc-stack that I used for building P8 months ago: /lustre/f2/dev/role.epic/contrib/hpc-stack/intel-2022.0.2/modulefiles/stack. I then tried to rebuild with /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/spack-stack-1.4.1-c4/envs/ufs-pio-2.5.10/install/modulefiles/Core. With spack-stack-1.4.1-c4 I can compile the develop branch, but when building P8 I get errors about "PIO". Could you please help me take a look?
Forgot to mention my directory with the error message: /lustre/f2/scratch/ncep/JieShun.Zhu/FV3_RT/rt_42113/compile_001.
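For context, a sketch of how a build would be pointed at that spack-stack environment; the meta-module and PIO module names are assumptions (in spack-stack the PIO package is named parallelio, while P8-era CMake looks for a package named "PIO" from hpc-stack, which is one plausible source of the errors):

```bash
module use /lustre/f2/dev/wpo/role.epic/contrib/spack-stack/spack-stack-1.4.1-c4/envs/ufs-pio-2.5.10/install/modulefiles/Core
module load stack-intel stack-cray-mpich   # spack-stack meta-modules (names assumed)
module load parallelio/2.5.10              # spack-stack's name for the PIO library
```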
@natalie-perlin I have no clear idea which way is more efficient. The P5 version was set up at CPC two years ago by someone who has since retired.
@jieshunzhu - yes, I'm looking into these scripts, too.
@natalie-perlin The person who built it at CPC is Weiyu Yang. I don't know whether he followed EMC's structure or used entirely his own style. He has retired, but let me try contacting him; if I find any useful information, I will share it with you. Really appreciate your help, Natalie.
@natalie-perlin I called Weiyu and did not get any useful information. As I mentioned earlier, Weiyu put in lots of hard-coded modifications, and we have to modify them one by one when testing any new stack. In addition, I compiled the system around two years ago, and the associated log files are still here: /lustre/f2/dev/ncep/JieShun.Zhu/ufsp5/ufs-s2s-model_zbot/tests/log_gaea.intel/compile_1.log. That may help you follow the scripts.
Thank you for clarifying what needs to be done, and for the log files.
@jkbk2004 @natalie-perlin I am able to compile P5 using the libraries that were built for P8 with hpc-stack. But when running the executable, I got errors saying "Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, AVX, F16C, FMA, BMI, LZCNT and AVX2 instructions." Have you seen this error before? Thanks.
Sounds like it is not capturing the processor information at the compiler level. What about running cpuinfo?
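A quick way to check which instruction sets the compute node actually offers, using plain Linux with no site-specific tools assumed:

```bash
# run on the failing node, e.g. from inside the job script:
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E 'avx2|fma|bmi'
```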
@jkbk2004 Can you point me to where the processor information is specified in UFS? Here is the job card for my experiment: /lustre/f2/scratch/ncep/JieShun.Zhu/UFS_zbot/fcst_25e1/cpld_fv3_ccpp_mom6_cice_cmeps_cold_2023102500/job_card. The error is shown in the "out" file.
Somewhere at the cmake level, I think: https://github.com/ufs-community/ufs-weather-model/blob/develop/cmake/configure_s4.intel.cmake, or add it directly to the compile flags, as in ./CICE-interface/CICE/configuration/scripts/machines/Macros.derecho_intel:

```
FFLAGS := -fp-model precise -convert big_endian ...
```
@jkbk2004 My original compilation flags included the option -xcore-avx2, which is specific to Intel processors. After removing it, the model ran a bit further, but stopped after "COMPLETED MOM INITIALIZATION"; it just hung there until reaching the wall clock. Have you or @natalie-perlin seen a similar problem before?
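One note on that flag, for context: -xCORE-AVX2 makes the Intel compiler emit AVX2-only code, which aborts with exactly the "processor does not support ... AVX2" message above on hardware without AVX2. An alternative, not something tried in this thread, is -axCORE-AVX2, which adds a generic baseline code path:

```bash
# hypothetical edit to the machine's compiler-flags file (the filename is assumed):
sed -i 's/-xCORE-AVX2/-axCORE-AVX2/' Macros.gaea_intel
```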
@jieshunzhu It might be worth building MOM6 with debug, or adding some print statements at the MOM6 main-driver level.
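For what it's worth, DEBUG is a real CMake option in the current ufs-weather-model build (the P5-era build scripts may expose it differently); a minimal sketch, with the APP value assumed:

```bash
# a debug build lowers optimization and adds runtime checks, which often turns
# a silent hang after "COMPLETED MOM INITIALIZATION" into a usable traceback
cmake .. -DAPP=S2S -DDEBUG=ON
make -j8
```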
@jkbk2004 Thanks for the suggestions. I tried building MOM6 with debug, but interestingly I did not see any additional log information. I am actually already working along the lines of your second idea; I will let you know if I find anything.
@jieshunzhu I am not sure whether DDT (a debugger) is available on Gaea. I will check just in case.
Thanks @jkbk2004. I have not tried DDT before; I may ask you questions about it later.
@jieshunzhu @jkbk2004 I'm getting close to having the P5 code compiled on my end on Gaea C5 with spack-stack/1.4.1, which corresponds to the same compiler version as the EPIC-built hpc-stack (intel-classic-2023.1.0), plus higher versions of hdf5/1.14.0, netcdf-c/4.9.2, and esmf/8.4.2. There are a couple of relatively simple errors/paths that still need fixing for the fms build.
@natalie-perlin Thanks for the update. I think I have almost fixed the problem by using hpc-stack, so you can hold off on your side (I do not want to waste your time). But I may need to ask you about how to build spack-stack, which I need for jedi-soca. Since the jedi-soca version is not the develop branch, I may need to build an older spack-stack. Thanks again to you and @jkbk2004 Jong for your persistent support of and help with our projects at CPC. Really appreciate it!
@jkbk2004 @natalie-perlin Just want to give you an update on transitioning P5 to C5: it works now. The key point is still the version of ESMF; I need to use an old version for P5. Thanks again for all your support!
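Illustration only, since the comment does not say which ESMF version finally worked; pinning the module before the build is the usual mechanism:

```bash
module swap esmf esmf/8.0.1   # placeholder version; P5-era code predates newer ESMF APIs
```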
Thank you so much for letting us know that this works for you!
As for an older spack-stack: if the packages and versions you need for jedi-soca have been made available in the central spack repository, there should be no issue building them as part of a custom spack-stack. The key is knowing the exact list of packages to specify in the spack-stack configuration.
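A sketch of that pinning step inside a custom spack-stack environment; the environment name, package names, and versions below are placeholders for whatever jedi-soca actually requires:

```bash
spack env activate envs/soca-old        # hypothetical custom environment
spack add esmf@8.3.0b09 ecbuild@3.6.5   # hypothetical exact pins
spack concretize -f && spack install
```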
Sure. It will be my pleasure.
Thanks for sharing the information. I need to finish some other, more urgent projects before getting into spack-stack. When I start on it, I may ask you questions. Thanks in advance.
@jieshunzhu Congrats! It will be beneficial to continue the support for CPC's p5/p8 operational runs on C5: stack, ufs-wm version updates, etc. I will tag you later.
@jkbk2004 @natalie-perlin Do you have time to help me with another small tool? It converts CFSR atmospheric states to FV3 initial conditions and uses many of the UFS/FV3 libraries, i.e., hpc-stack. I need to compile it on C5 as well.
Never mind; I got the problem fixed. Thanks anyway.
@jieshunzhu we can extend #2005 a bit on our side.
@jkbk2004 @natalie-perlin Happy New Year! I am now trying to transition the JEDI soca-science to C5. Similar to my UFS problem, on C5 I failed to run the version of soca-science I need using spack-stack 1.5.1 (which works for the develop branch of soca-science). On C4, I can run it with spack-stack 1.4.0, so I tried to install spack-stack 1.4.0 in my own directory (/lustre/f2/dev/ncep/JieShun.Zhu/util/spack-stack/c5/spack-stack-1.4.0). I cloned spack-stack 1.4.0 directly from the JCSDA repository and did not make any changes. After installation, I cannot see Core/ under envs/unified-dev/install/modulefiles/. Could you please give me some hints about the problem? I saved the installation log files in my directory. Thanks in advance.
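For what it's worth, here is a sketch of the documented spack-stack 1.x install sequence; in that workflow the Core/ tree under install/modulefiles/ is generated by the final meta-modules step, so a missing Core/ usually means that step did not run. Site and template names below follow the spack-stack documentation and may not match this exact setup:

```bash
git clone -b 1.4.0 --recursive https://github.com/JCSDA/spack-stack.git
cd spack-stack && source setup.sh
spack stack create env --site gaea --template unified-dev --name unified-dev
cd envs/unified-dev && spack env activate .
spack concretize 2>&1 | tee log.concretize
spack install 2>&1 | tee log.install
spack module lmod refresh -y
spack stack setup-meta-modules   # this is the step that creates install/modulefiles/Core
```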
@jieshunzhu I'm going to mark this ticket as resolved. Please let me know if you feel it should be kept open.
@zach1221 Sure, it can be closed. Thanks!