Configuration issue with cray-mpich and CMake 3.22 or higher #517
@jayeshkrishna This issue can also be reproduced on Frontier without using SCORPIO:
CMake errors:
Even when CMAKE_SYSTEM_NAME is not set to Catamount, CMake 3.22 or higher also fails on Frontier with PrgEnv-cray if the craype-accel-amd-gfx90a and rocm/5.4.0 modules are loaded and the -fopenmp flag is added to LDFLAGS:
CMake errors:
@jayeshkrishna A related issue has been reported to the CMake developers.
This PR applies a temporary but necessary workaround for a CMake regression (3.22.0 or higher) that causes find_package(MPI) to fail or hang when it is called in subprojects on Cray systems. The workaround ensures that MPI is properly detected in subprojects until the regression is resolved by the CMake developers. Fixes #517
* dqwu/cmake_regression_workaround: Applying a workaround for CMake FindMPI regression on Cray systems
[Summary]
This appears to be a regression in CMake 3.22 or higher on Cray systems:
Not reproducible with 3.21.6; reproducible with 3.22.0 and with the latest release at the time of writing, 3.31.1.
Reproducible on several E3SM machines with Cray MPICH available, including Perlmutter, Crusher/Frontier, and Sunspot/Aurora.
[Steps to reproduce the CMake error of case 1]
On Frontier, run the commands below:
CMake output:
[Steps to reproduce the hanging issue of case 2]
On Frontier, run the commands below:
CMake output:
[Steps to reproduce the CMake error of case 3]
On Frontier, run the commands below:
CMake output:
[Comments]
E3SM previously set CMAKE_SYSTEM_NAME to Catamount on Crusher/Frontier, but this is no longer the case.
E3SM uses non-Cray MPI wrappers (e.g., mpicxx) on Frontier to enable the use of hipcc.
For some E3SM cases running on Frontier GPU nodes, the craype-accel-amd-gfx90a and rocm modules are loaded, often with the -fopenmp flag added to the build configuration.
PR #439 moved MPI detection from the root level into the subprojects, and subproject-level detection is affected by this regression in CMake 3.22.0 or higher. A confirmed workaround is to add a redundant find_package(MPI) call at the root level.
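A minimal sketch of the root-level workaround, assuming a root CMakeLists.txt that pulls in subprojects via add_subdirectory(). The project name and the `src` subdirectory are hypothetical placeholders, and the actual change in the PR may look different:

```cmake
cmake_minimum_required(VERSION 3.7)
project(example_root LANGUAGES C CXX)  # hypothetical project name

# Workaround for the FindMPI regression in CMake 3.22.0 or higher on Cray systems:
# call find_package(MPI) once at the root level, even though the subprojects
# added below also call it themselves.
if (CMAKE_VERSION VERSION_GREATER_EQUAL 3.22)
  find_package(MPI)
endif ()

# Hypothetical subproject whose own CMakeLists.txt calls find_package(MPI)
add_subdirectory(src)
```

The version guard is an optional choice in this sketch that leaves older CMake versions untouched. An unconditional root-level find_package(MPI) call would also match the workaround described above.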
A related issue has been reported to the CMake developers:
https://discourse.cmake.org/t/regression-cmake-3-22-fails-to-find-mpi-on-cray-systems-reproducible-on-frontier-supercomputer/13045
A possibly related (unconfirmed) change in CMake that might have caused this regression:
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6264