
Fix missing MPI in custom-matrix-format example #1370

Closed
wants to merge 4 commits into from

Conversation

MarcelKoch
Member

This PR fixes the missing MPI dependency in the custom-matrix-format example when the MPI compiler wrapper is used as the CXX compiler. Two fixes are implemented here:

  1. Remove the <ginkgo/ginkgo.hpp> include from the CUDA file, so the MPI headers are no longer required when compiling the CUDA code.
  2. Use the CMake option `MPI_ASSUME_NO_BUILTIN_MPI` to disable using the MPI support built into the compiler wrapper.

Both of these fix the issue. I think having both is a good idea, because (1) the include is simply not necessary, and (2) the CMake option would allow users to include <ginkgo/ginkgo.hpp> in device code.

Using MPI from the MPI compiler wrapper directly means that no include paths or libraries are added for MPI by CMake. This causes errors when a file that is not compiled by the wrapper, but includes or links against Ginkgo, cannot find the MPI headers or implementation. This mostly affects device code, since it will most likely not be compiled with the MPI wrapper compilers.
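
As a rough illustration of the second fix, here is a minimal CMake sketch (variable placement, target, and file names are assumptions, not the exact PR diff):

```cmake
# Sketch only: tell FindMPI to ignore any MPI support built into the compiler
# wrapper, so that explicit include directories and libraries are attached to
# the imported MPI::MPI_CXX target.
set(MPI_ASSUME_NO_BUILTIN_MPI ON)
find_package(MPI REQUIRED COMPONENTS CXX)

# A CUDA file compiled by nvcc (rather than mpicxx) now still finds <mpi.h>
# when it links against the imported target. The target and file names below
# are hypothetical.
add_executable(custom-matrix-format custom-matrix-format.cpp kernels.cu)
target_link_libraries(custom-matrix-format PRIVATE Ginkgo::ginkgo MPI::MPI_CXX)
```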
@MarcelKoch MarcelKoch self-assigned this Jul 12, 2023
@ginkgo-bot ginkgo-bot added the reg:build (related to the build system) and reg:example (related to the examples) labels Jul 12, 2023
@MarcelKoch MarcelKoch requested a review from a team July 12, 2023 08:10
@MarcelKoch
Member Author

Fixes part of #1367

@yhmtsai
Member

yhmtsai commented Jul 12, 2023

Sorry, I do not get the usage of MPI_ASSUME_NO_BUILTIN_MPI.
Could you elaborate on the two cases, gcc vs. mpicc?

upsj
upsj previously approved these changes Jul 12, 2023
Member

@upsj upsj left a comment

LGTM! One minor nit (we already have way too many configuration options)

Comment on lines +18 to +19
set(GINKGO_MPI_ASSUME_NO_BUILTIN_MPI ON CACHE BOOL "Disables using MPI wrapper compiler directly.")
mark_as_advanced(GINKGO_MPI_ASSUME_NO_BUILTIN_MPI)
Member

Do we need to make this controllable at all? It seems to be relevant only in very limited circumstances, where either mpiexec is missing from the environment, or the user has an MPI compiler wrapper loaded that is not the one specified by CXX_COMPILER.

Member Author

TBH, I'm not 100% sure how CMake finds MPI with this option on. Perhaps there are some weird cases where it doesn't use the same MPI as the MPI wrapper would have, so I added the option to turn it off. I also marked it as advanced to indicate that users generally need not worry about it.

@MarcelKoch
Member Author

@yhmtsai If you set CXX=mpicxx, then compiling with it doesn't require adding include paths or libraries for MPI. CMake picks that up when it finds the MPI package, so on the CMake level, linking to MPI doesn't require any additional flags. This leads to errors when something is compiled without mpicxx but still includes or links against MPI. The MPI_ASSUME_NO_BUILTIN_MPI flag tells CMake to ignore the MPI compiler wrapper and add the needed include paths and libraries regardless.
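
To illustrate the two cases from the question above, a hedged sketch (the exact variable contents depend on the CMake version and MPI installation):

```cmake
# Case 1, CXX=g++: FindMPI has to resolve the MPI flags itself, so
# MPI_CXX_INCLUDE_DIRS and MPI_CXX_LIBRARIES are populated and attached to the
# MPI::MPI_CXX target.
# Case 2, CXX=mpicxx: FindMPI may assume the wrapper already handles MPI and
# leave these variables empty, which breaks translation units compiled by a
# different compiler (e.g. nvcc).
find_package(MPI REQUIRED COMPONENTS CXX)
message(STATUS "MPI include dirs: '${MPI_CXX_INCLUDE_DIRS}'")  # may be empty with a wrapper
message(STATUS "MPI libraries:    '${MPI_CXX_LIBRARIES}'")     # may be empty with a wrapper

# Setting the hint before find_package forces explicit flags in both cases:
# set(MPI_ASSUME_NO_BUILTIN_MPI ON)
```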

@upsj
Member

upsj commented Jul 12, 2023

This whole setting has me a bit confused: theoretically, CMake should interrogate the compiler wrapper for the location of the mpi.h file and add the include flag to the MPI target, but that doesn't seem to be happening here? From FindMPI.cmake: "# Given the new design of FindMPI, INCLUDE_DIRS will always be located, even under implicit linking."

@MarcelKoch
Member Author

@upsj The comment from FindMPI.cmake is not in the official documentation, so I don't think we can rely on it. I tried again (without MPI_ASSUME_NO_BUILTIN_MPI) with the latest CMake release (3.26.4), and it still does not set the include directories.
So maybe that is a bug in CMake, or the comment is just not very clear.

@upsj
Member

upsj commented Jul 13, 2023

@MarcelKoch yes, I'd really like to get a full trace of the execution, because this seems like a bug to me. For this to be fixed, we need MPI_CUDA_INCLUDE_DIRS/MPI_CXX_COMPILER_INCLUDE_DIRS to be set, which seems to be missing.

@MarcelKoch
Member Author

@upsj I will open a CMake issue to see what's going on. From a brief look at the source, you're right: it always sets the include directories, but if a compiler wrapper is used, the value of the variable it uses is empty. So the source code comment is moot.

@MarcelKoch
Member Author

Here is the issue, let's see what happens: https://gitlab.kitware.com/cmake/cmake/-/issues/25085

@upsj
Member

upsj commented Jul 17, 2023

I no longer think this fix is necessary; instead, we should be propagating the C++ compiler as the CUDA host compiler correctly.
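
A possible sketch of that more general approach, assuming the host compiler is not already set elsewhere (this is not the actual patch):

```cmake
# Sketch only: make nvcc use the same (possibly MPI-aware) host compiler as the
# rest of the build, so CUDA translation units inherit the wrapper's implicit
# MPI include paths and link flags. This has to happen before the CUDA language
# is enabled.
if(NOT DEFINED CMAKE_CUDA_HOST_COMPILER AND NOT DEFINED ENV{CUDAHOSTCXX})
    set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE FILEPATH
        "Host compiler used by the CUDA compiler")
endif()
```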

@upsj upsj dismissed their stale review July 17, 2023 08:37

there is a more general fix

@MarcelKoch
Member Author

Fixed by #1368 and #1205

@MarcelKoch MarcelKoch closed this Jul 21, 2023
Labels
reg:build (related to the build system), reg:example (related to the examples), type:distributed-functionality