Configuration issue with cray-mpich and CMake 3.22 or higher #517

Open
dqwu opened this issue May 27, 2023 · 4 comments · May be fixed by #623
Comments

dqwu commented May 27, 2023

[Summary]
This appears to be an issue with CMake 3.22 or higher on Cray systems:
it is not reproducible with 3.21.6, but it is reproducible with 3.22.0 and with the latest 3.31.1.

  1. With the Cray wrappers (cc, CC, ftn) and "-DCMAKE_SYSTEM_NAME=Catamount", CMake fails with a "Could NOT find MPI" error. Not reproducible if CMAKE_SYSTEM_NAME is not set.
  2. With the non-Cray MPI wrappers (mpicc, mpicxx, mpifort), configuration hangs, regardless of whether CMAKE_SYSTEM_NAME is set to Catamount.
  3. With the Cray wrappers (cc, CC, ftn) of PrgEnv-cray, if the craype-accel-amd-gfx90a and rocm modules are loaded and -fopenmp is added to LDFLAGS, CMake fails with a "Could NOT find MPI" error. Not reproducible with PrgEnv-gnu.

Reproducible on some E3SM machines where Cray MPICH is available, including Perlmutter, Crusher/Frontier, and Sunspot/Aurora.

[Steps to reproduce the CMake error of case 1]
On Frontier, run the commands below:

module load PrgEnv-gnu
module load cmake/3.27.9

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CC=cc CXX=CC FC=ftn \
cmake -Wno-dev \
-DCMAKE_SYSTEM_NAME=Catamount \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 \
..

CMake output:

...
-- ===== Configuring SCORPIO File info tool... =====
-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
-- Found MPI_CXX: /opt/cray/pe/craype/2.7.31.11/bin/CC (found version "3.1") 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_Fortran_FOUND) (found version
  "3.1")
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  tools/spio_finfo/CMakeLists.txt:21 (find_package)

[Steps to reproduce the hanging issue of case 2]
On Frontier, run the commands below:

module load PrgEnv-gnu
module load cmake/3.27.9

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CC=mpicc CXX=mpicxx FC=mpifort \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.3/gnu/9.1 \
..

CMake output:

...
-- Looking for gettimeofday - found
-- ===== Configuring SCORPIO C library... =====
(configuration hangs here)

[Steps to reproduce the CMake error of case 3]
On Frontier, run the commands below:

module load PrgEnv-cray
module load cmake/3.27.9
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CC=cc CXX=CC FC=ftn \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

CMake output:

...
-- ===== Configuring SCORPIO File info tool... =====
-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
-- Found MPI_CXX: /opt/cray/pe/craype/2.7.31.11/bin/CC (found version "3.1") 
-- Could NOT find MPI_Fortran (missing: MPI_Fortran_WORKS) 
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND MPI_Fortran_FOUND) (found version
  "3.1")
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  tools/spio_finfo/CMakeLists.txt:21 (find_package)

[Comments]
E3SM previously set CMAKE_SYSTEM_NAME to Catamount on Crusher/Frontier, but this is no longer the case.

E3SM uses non-Cray MPI wrappers (e.g., mpicxx) on Frontier to enable the use of hipcc.

For some E3SM cases running on Frontier GPU nodes, the craype-accel-amd-gfx90a and rocm modules are loaded, often with the -fopenmp flag added to the build configuration.

PR #439 moved MPI detection from the root level to subprojects, and that detection is now affected by this regression in CMake (version 3.22.0 or higher). A confirmed workaround is to add a redundant find_package(MPI) call at the root level, as sketched below.
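
A minimal sketch of that root-level workaround (the component list here is an assumption; adapt it to the components the project actually needs):

# Top-level CMakeLists.txt, before any add_subdirectory() call.
# This redundant call runs FindMPI's compile checks once in the root
# scope, so the find_package(MPI) calls inside subprojects reuse the
# cached results instead of re-running the checks that trip the
# CMake >= 3.22 regression.
find_package(MPI REQUIRED COMPONENTS C Fortran)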

A related issue has been reported to the CMake developers:
https://discourse.cmake.org/t/regression-cmake-3-22-fails-to-find-mpi-on-cray-systems-reproducible-on-frontier-supercomputer/13045

Potential unconfirmed changes in CMake that might have caused this regression:
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6264

dqwu commented Jun 5, 2023

@jayeshkrishna
FYI, E3SM developers have decided to remove the CMake macro file used on old Cray supercomputers running the Catamount OS; see E3SM-Project/E3SM#5745

dqwu commented Nov 22, 2024

This issue can also be reproduced on Frontier without using SCORPIO:

module load PrgEnv-gnu
module load cmake/3.27.9

mkdir src1
mkdir src2

cat <<EOF >> CMakeLists.txt
project (MY_PROJECT C)
message(STATUS "Configuring src1")
add_subdirectory(src1)
message(STATUS "Configuring src2")
add_subdirectory(src2)
EOF

cd src1
mkdir src1_subdir1
mkdir src1_subdir2

cat <<EOF >> CMakeLists.txt
add_subdirectory(src1_subdir1)
add_subdirectory(src1_subdir2)
EOF

cd src1_subdir1
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir1")
find_package(MPI REQUIRED COMPONENTS C)
EOF

cd ../src1_subdir2
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir2")
find_package(MPI REQUIRED COMPONENTS C)
EOF

cd ../../src2
mkdir src2_subdir

cat <<EOF >> CMakeLists.txt
add_subdirectory(src2_subdir)
EOF

cd src2_subdir
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src2_subdir")
find_package(MPI REQUIRED COMPONENTS C)
EOF

cd ../..

mkdir build
cd build

CC=cc \
cmake -Wno-dev \
-DCMAKE_SYSTEM_NAME=Catamount \
..

CMake errors:

-- The C compiler identification is GNU 12.3.0
-- Cray Programming Environment 2.7.31.11 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.31.11/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring src1
-- Configuring src1_subdir1
-- Found MPI_C: /opt/cray/pe/craype/2.7.31.11/bin/cc (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components: C 
-- Configuring src1_subdir1
-- Configuring src1_subdir1
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir1
-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND C)
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  src1/src1_subdir1/CMakeLists.txt:2 (find_package)

dqwu commented Nov 23, 2024

Even when CMAKE_SYSTEM_NAME is not set to Catamount, CMake 3.22 or higher also fails on Frontier with PrgEnv-cray if the craype-accel-amd-gfx90a and rocm/5.4.0 modules are loaded and -fopenmp is added to LDFLAGS:

module load PrgEnv-cray
module load craype-accel-amd-gfx90a rocm/5.4.0
module load cmake/3.27.9

mkdir src1
mkdir src2

cat <<EOF >> CMakeLists.txt
project (MY_PROJECT C)
message(STATUS "Configuring src1")
add_subdirectory(src1)
message(STATUS "Configuring src2")
add_subdirectory(src2)
EOF

cd src1
mkdir src1_subdir1
mkdir src1_subdir2

cat <<EOF >> CMakeLists.txt
add_subdirectory(src1_subdir1)
add_subdirectory(src1_subdir2)
EOF

cd src1_subdir1
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir1")
find_package(MPI REQUIRED COMPONENTS C)
EOF

cd ../src1_subdir2
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src1_subdir2")
find_package(MPI REQUIRED COMPONENTS C)
EOF

cd ../../src2
mkdir src2_subdir

cat <<EOF >> CMakeLists.txt
add_subdirectory(src2_subdir)
EOF

cd src2_subdir
cat <<EOF >> CMakeLists.txt
message(STATUS "Configuring src2_subdir")
find_package(MPI REQUIRED COMPONENTS C)
EOF

cd ../..

mkdir build
cd build

CC=cc \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
..

CMake errors:

-- The C compiler identification is Clang 17.0.3
-- Cray Programming Environment 2.7.31.11 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/cray/pe/craype/2.7.31.11/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Configuring src1
-- Configuring src1_subdir1
-- Found MPI_C: /opt/cray/pe/craype/2.7.31.11/bin/cc (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components: C 
-- Configuring src1_subdir1
-- Configuring src1_subdir2
-- Configuring src1_subdir2
-- Configuring src1_subdir1
-- Could NOT find MPI_C (missing: MPI_C_WORKS) 
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_C_FOUND C)
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  src1/src1_subdir1/CMakeLists.txt:2 (find_package)

dqwu commented Nov 23, 2024

dqwu added a commit that referenced this issue Nov 27, 2024
This PR applies a temporary but necessary workaround for a CMake
regression (3.22.0 or higher) that causes find_package(MPI) to fail
or hang when it is called in subprojects on Cray systems.

The workaround ensures that MPI is properly detected in subprojects
until the regression is resolved by the CMake developers.

Fixes #517

* dqwu/cmake_regression_workaround:
  Applying a workaround for CMake FindMPI regression on Cray systems
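
For reference, a guarded variant of the same idea (a hypothetical sketch, not taken from the PR; the craype module sets CRAYPE_VERSION on Cray PE systems):

# Apply the redundant root-level call only when the Cray PE compiler
# wrappers are in use, leaving other platforms untouched.
if(DEFINED ENV{CRAYPE_VERSION})
  find_package(MPI COMPONENTS C)
endif()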