Update Intel/impi version to Intel 2021/2022 #1221

Merged
merged 24 commits on May 22, 2022

Conversation

junwang-noaa
Collaborator

@junwang-noaa junwang-noaa commented May 16, 2022

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

In this PR, the Intel compiler is updated to Intel 2021/2022 on hera, orion, jet, gaea, and cheyenne. The wcoss dell/cray systems will go down in two months, and installing the hpc-stack with the new Intel compiler there would require additional disk space cleanup, so the Intel compiler is not updated on these two platforms.

Additional changes are:

  1. The environment variable I_MPI_DAPL_UD is turned off and all the GOCART tests passed.
  2. The png library is changed to libpng in ufs_common/ufs_common_debug.

The branch was tested on hera, orion, jet, and gaea. Butterfly tests were conducted on orion and hera. The results are summarized in issue #1113.

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first?
If so add the "waiting for other repos" label and list the upstream PRs

  • waiting on noaa-emc/nems/pull/<pr_number>
  • waiting on noaa-emc/fv3atm/pull/<pr_number>

@jkbk2004
Collaborator

@junwang-noaa I am testing cpld_control_p8 on cheyenne. It seems the results change. Maybe I need a butterfly test for validation.

@junwang-noaa
Collaborator Author

@jkbk2004 Updating the Intel/impi version will change the results, but the library updates (g2 (3.4.5), Jasper (2.0.25), PIO (2.5.3), libpng (1.6.37)) will not. Please let me know which libraries you found to change the results.

@junwang-noaa added the Baseline Updates (Current baselines will be updated.) and hera-gnu-BL labels on May 20, 2022

@BrianCurtis-NOAA
Collaborator

I don't see hera.gnu and I don't see a label that would have started it after the previous failure. Is this being worked on?

@jkbk2004
Collaborator

@MinsukJi-NOAA @BrianCurtis-NOAA Do you want me to run RT on hera.gnu to confirm, so that we can merge?

@jkbk2004
Collaborator

@BrianCurtis-NOAA @MinsukJi-NOAA hera.gnu should work, since this PR has little to do with GNU.

@BrianCurtis-NOAA
Collaborator

@MinsukJi-NOAA Would you rather copy over the GNU baselines from the previous date (I assume they won't change) and add hera-gnu-RT, or just add hera-gnu-BL?

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: gnu
Job: RT
[RT] Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/937515199/20220520180014/ufs-weather-model
Please make changes and add the following label back: hera-gnu-RT

@BrianCurtis-NOAA
Collaborator

@MinsukJi-NOAA

'cat: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/937515199/20220520180014/ufs-weather-model/tests/log_hera.gnu/compile_*_time.log: No such file or directory', "++ echo 'rt.sh error on line 858'", 'rt.sh error on line 858'

@BrianCurtis-NOAA
Collaborator

Project: stmp1 
		Directory: /scratch2/NCEPDEV/stmp1 DiskInUse=420724 GB, Quota=400000 GB, Files=27611254, FileQUota=80000000
		Directory: /scratch1/NCEPDEV/stmp2 DiskInUse=418850 GB, Quota=400000 GB, Files=40878708, FileQUota=80000000
		Directory: /scratch2/NCEPDEV/stmp3 DiskInUse=420724 GB, Quota=400000 GB, Files=27611254, FileQUota=80000000
		Directory: /scratch1/NCEPDEV/stmp4 DiskInUse=418850 GB, Quota=400000 GB, Files=40878708, FileQUota=80000000

We're full; maybe that's it?
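
The over-quota condition in the report above can be checked mechanically. The following is a minimal sketch, assuming the report keeps the `Directory: ... DiskInUse=... GB, Quota=... GB` layout shown in this comment; the function name `over_quota` and the regular expression are illustrative and not part of any Hera tooling.

```python
import re

# Sample text copied from the quota report in this comment.
REPORT = """\
Directory: /scratch2/NCEPDEV/stmp1 DiskInUse=420724 GB, Quota=400000 GB, Files=27611254, FileQUota=80000000
Directory: /scratch1/NCEPDEV/stmp2 DiskInUse=418850 GB, Quota=400000 GB, Files=40878708, FileQUota=80000000
Directory: /scratch2/NCEPDEV/stmp3 DiskInUse=420724 GB, Quota=400000 GB, Files=27611254, FileQUota=80000000
Directory: /scratch1/NCEPDEV/stmp4 DiskInUse=418850 GB, Quota=400000 GB, Files=40878708, FileQUota=80000000
"""

# Assumed line layout; adjust the pattern if the real report differs.
LINE_RE = re.compile(
    r"Directory:\s+(?P<path>\S+)\s+DiskInUse=(?P<used>\d+)\s*GB,\s*Quota=(?P<quota>\d+)\s*GB"
)


def over_quota(report):
    """Return (path, used_gb, quota_gb) for each directory whose usage exceeds its quota."""
    hits = []
    for match in LINE_RE.finditer(report):
        used = int(match.group("used"))
        quota = int(match.group("quota"))
        if used > quota:
            hits.append((match.group("path"), used, quota))
    return hits


for path, used, quota in over_quota(REPORT):
    print(f"{path}: {used} GB used of {quota} GB ({used - quota} GB over)")
```

On the report quoted here, all four stmp directories are reported as over their disk quota, which is consistent with the suspicion that a full disk contributed to the failure.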

@MinsukJi-NOAA
Contributor

@jkbk2004 @BrianCurtis-NOAA This is the hera.gnu compile error message.

Lmod has detected the following error: These module(s) or extension(s) exist
but cannot be loaded as requested: "libpng/1.6.37"
   Try: "module spider libpng/1.6.37" to see how to load the module(s).

@BrianCurtis-NOAA
Collaborator

BrianCurtis-NOAA commented May 20, 2022

@MinsukJi-NOAA

It loaded fine when I did it manually. I wonder why it had an issue? Maybe something temporary on Hera?

@MinsukJi-NOAA
Contributor

MinsukJi-NOAA commented May 20, 2022

@BrianCurtis-NOAA I am not able to load libpng manually:

[Minsuk.Ji@hecflow01 tests]$ module use ../modulefiles/
[Minsuk.Ji@hecflow01 tests]$ pwd
/scratch1/NCEPDEV/stmp4/Minsuk.Ji/PR1221/tests
[Minsuk.Ji@hecflow01 tests]$ module use /scratch1/NCEPDEV/nems/emc.nemspara/soft/modulefiles
[Minsuk.Ji@hecflow01 tests]$ module load miniconda3/3.7.3
[Minsuk.Ji@hecflow01 tests]$ 
[Minsuk.Ji@hecflow01 tests]$ module use /contrib/sutils/modulefiles
[Minsuk.Ji@hecflow01 tests]$ module load sutils
[Minsuk.Ji@hecflow01 tests]$ 
[Minsuk.Ji@hecflow01 tests]$ module load cmake/3.20.1
[Minsuk.Ji@hecflow01 tests]$ 
[Minsuk.Ji@hecflow01 tests]$ module use /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/modulefiles/stack
[Minsuk.Ji@hecflow01 tests]$ 
[Minsuk.Ji@hecflow01 tests]$ module load hpc/1.1.0
[Minsuk.Ji@hecflow01 tests]$ 
[Minsuk.Ji@hecflow01 tests]$ module load hpc-gnu/9.2.0
[Minsuk.Ji@hecflow01 tests]$ module load hpc-mpich/3.3.2
[Minsuk.Ji@hecflow01 tests]$ module load ufs_common
Lmod has detected the following error:  These module(s) or extension(s) exist but cannot be loaded as requested: "libpng/1.6.37"
   Try: "module spider libpng/1.6.37" to see how to load the module(s).

@jkbk2004
Collaborator

@BrianCurtis-NOAA @MinsukJi-NOAA @junwang-noaa It seems there is no libpng module for GNU there; only png/1.6.35 is listed. We have to follow up with the nceplibs team.
------------------ /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/modulefiles/compiler/gnu/9.2.0 -------------------
bacio/2.4.1 g2c/1.6.2 grib_util/1.2.4 (D) png/1.6.35 w3nco/2.4.1
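
A quick way to confirm this kind of mismatch is to compare the module name requested by the modulefile against the names in the listing. The sketch below assumes the listing is plain `module avail`-style text like the excerpt quoted above; the helper `available_modules` is hypothetical and does not call Lmod itself.

```python
def available_modules(listing):
    """Collect name/version tokens from `module avail`-style text."""
    mods = set()
    for line in listing.splitlines():
        stripped = line.strip()
        # Skip blank lines and the dashed header that names the module directory.
        if not stripped or stripped.startswith("-"):
            continue
        for token in stripped.split():
            # Keep only name/version tokens; this drops markers such as "(D)".
            if "/" in token and not token.startswith("("):
                mods.add(token)
    return mods


# Excerpt copied from the listing quoted in this comment.
LISTING = """\
------------------ /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/modulefiles/compiler/gnu/9.2.0 -------------------
bacio/2.4.1 g2c/1.6.2 grib_util/1.2.4 (D) png/1.6.35 w3nco/2.4.1
"""

mods = available_modules(LISTING)
for wanted in ("libpng/1.6.37", "png/1.6.35"):
    print(wanted, "found" if wanted in mods else "missing")
```

Exact set membership is used on purpose, so that `png/1.6.35` is not mistaken for a match on `libpng`. The check only covers the excerpt pasted here, not the full module tree on Hera.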

@jkbk2004
Collaborator

@BrianCurtis-NOAA @MinsukJi-NOAA I sent a message on the hpc-stack Slack and asked Kyle and Hang to install one.

@jkbk2004
Collaborator

@MinsukJi-NOAA @BrianCurtis-NOAA I checked: Hang just installed libpng/1.6.37. It's available on hera now. So nice to see his quick installation.

@MinsukJi-NOAA
Contributor

Thanks so much @jkbk2004! hera.gnu RT is being rerun.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: gaea
Compiler: intel
Job: BL
[BL] Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/937515199/20220520033006/ufs-weather-model
[BL] Baseline creation and move successful
[RT] Repo location: /lustre/f2/pdata/ncep/emc.nemspara/autort/pr/937515199/20220520104410/ufs-weather-model
[RT] Error: Test cpld_restart_c384_p8 011 failed in run_test failed
Please make changes and add the following label back: gaea-intel-BL

@junwang-noaa
Collaborator Author

@MinsukJi-NOAA @jkbk2004 @BrianCurtis-NOAA Thank you very much for helping to run the RTs.

Labels
Baseline Updates Current baselines will be updated.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update intel/impi compiler to 2022
6 participants