simpleBuildAgainstTrilinos: install testing issues with Cuda builds #11955
Comments
@ndellingwood, why are we not seeing this with the Trilinos PR CUDA build? What is that build doing differently from this build? What happens when you use the versions of Kokkos and KokkosKernels that are in the Trilinos 'develop' branch with this exact same build and test process? |
@ndellingwood, can you run |
I'm not certain
I'm in progress with this install now for testing, will post back shortly
I don't have the nightly properly configured to post to CDash; this is on my list of TODOs. I haven't set this up before, so hopefully you don't mind if I follow up with you by email as questions arise? |
@ndellingwood, every Trilinos build has the ability to post an experimental submission for the current Trilinos configuration to CDash (as long as it can reach https://testing.sandia.gov/cdash). There is nothing you should have to do (except perhaps mess with the proxy). See: Just try P.S. This requires that you have a working configuration of Trilinos that writes the build files to the build dir. This feature can't post results if there are configure failures. |
@bartlettroscoe The issue is also present with Trilinos develop branch using the kokkos and kokkos-kernels packages (no source override) |
Oh, that's convenient! I hadn't seen this document before; I had assumed I would need to write a ctest driver script. I'll test this out to get it added to the Jenkins nightly builds. As far as running |
@ndellingwood, no, you can drive the install as well with posting to CDash. Just run with:
See: Just make sure that you set |
I tried running
|
@ndellingwood, can you attach the full STDOUT from above and the generated configure output file |
@ndellingwood, then the difference is likely in how the environment is set up and how the configuration is being done. Is the right CUDA version being found, as shown in:
That looks to be an env problem. (This is an error coming from inside of the standard CMake module |
@bartlettroscoe the configure output I posted was from a nightly build using cuda/11.2.2 on Weaver (sems modules not available); the reproducer notes I posted were from a different machine using sems modules (using cuda/11.4.2), that is the source of the discrepancy. I posted the sems-based reproducer because I thought that would be more portable for testing on various machines |
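Because the discrepancy here came down to environment differences between machines, it can help to confirm what the MPI compiler wrapper will actually invoke before configuring. The following is a minimal sketch, assuming Open MPI (which honors `OMPI_CXX`); the helper name `describe_ompi_cxx` is hypothetical and not part of Trilinos or TriBITS.

```shell
# Hypothetical helper: report the state of OMPI_CXX, the Open MPI variable
# that overrides the C++ compiler invoked by the mpicxx wrapper.
describe_ompi_cxx() {
  if [ -z "${OMPI_CXX:-}" ]; then
    echo "OMPI_CXX is unset or empty; mpicxx will use its default compiler"
  elif [ -f "$OMPI_CXX" ] && [ -x "$OMPI_CXX" ]; then
    echo "OMPI_CXX points at an executable: $OMPI_CXX"
  else
    echo "OMPI_CXX is set but not executable: $OMPI_CXX"
  fi
}

describe_ompi_cxx

# If an MPI wrapper is on PATH, ask it which compiler it would run.
# (--showme:command is an Open MPI wrapper option; other MPIs may reject it.)
if command -v mpicxx >/dev/null 2>&1; then
  mpicxx --showme:command 2>/dev/null || echo "mpicxx did not accept --showme:command"
fi
```

Running this on each machine (for example, Weaver and the sems-based machine) gives a quick side-by-side of what each environment resolves to.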
Here's the stdout output in collapsible block (lots of stuff) Summary
|
My |
@bartlettroscoe I saw this line in the stdout output:
The name of my configuration script is configure.sh, should I rename to do-configure? |
@bartlettroscoe here is my LastConfigure_*.log file (I was distracted and looked in the wrong build dir before) |
I experimented by copying my configure.sh to do-configure, blew away the CMake* directories, manually reconfigured then tried running make dashboard snip:
(Lots of stuff enabled that I don't enable in my script.)
Manually running configure.sh snip:
|
@ndellingwood, if you look at that file, you will see:
This looks to be an env problem. How is it that Can you please provide the STDOUT from the from-scratch configure?
Then, what happens when you just run:
instead of |
@bartlettroscoe in my configuration I'm not enabling SEACAS or any SEACAS sub-packages, I don't understand why that is being triggered when I call |
export TRILINOS_DIR=$PWD/../..
export KOKKOS_PATH=$TRILINOS_DIR/packages/kokkos
module purge
module load sems-cmake/3.24.3 sems-gcc/8.3.0 sems-cuda/11.4.2 sems-openmpi/4.0.5 sems-python/3.7.9 sems-ninja/1.10.1
export OMPI_CXX=$KOKKOS_PATH/bin/nvcc_wrapper
# configure
cmake \
-GNinja \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_INSTALL_PREFIX="${PWD}/install" \
-DTPL_ENABLE_MPI=ON \
-DTrilinos_ENABLE_TESTS=OFF \
-DTrilinos_ENABLE_Kokkos=ON \
-DKokkos_ENABLE_CUDA=ON \
-DKokkos_ENABLE_CUDA_UVM=ON \
-DKokkos_ENABLE_CUDA_LAMBDA=ON \
-DTrilinos_ENABLE_KokkosKernels=ON \
-DKokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON \
-DTrilinos_ENABLE_Tpetra=ON \
$TRILINOS_DIR
Configuration output
|
@ndellingwood, well, that is the issue then. Let me look into what is happening and will let you know if I need any more info. (I need to see if there is a missing use case in TriBITS testing of this feature.) |
@ndellingwood, looks like there is a defect in
Above, note the empty lists The This use case can be supported for the all-at-once mode for when there are no tests enabled and the For now, could you please add I can add a test case to TriBITS where no tests are enabled and the |
Thanks, that explains it!
Will do this now |
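The workaround discussed above (enabling a package's tests so the dashboard run has tests to work with) might be sketched as follows. This assumes the standard TriBITS per-package option `<Package>_ENABLE_TESTS`; the exact option requested in this thread is not shown, and the helper name `print_configure_with_tests` is hypothetical. The helper only prints the command so it can be inspected before running.

```shell
# Hypothetical helper: print (not run) a cmake command that turns on the
# per-package TriBITS test option for an existing configuration.
# Arguments: <package-name> <source-dir>
print_configure_with_tests() {
  printf 'cmake -D%s_ENABLE_TESTS=ON %s\n' "$1" "$2"
}

# The single quotes keep $TRILINOS_DIR literal in the printed command.
print_configure_with_tests Tpetra '$TRILINOS_DIR'

# After reconfiguring, the dashboard target mentioned in the thread can be
# driven through cmake's generator-independent build front end:
echo 'cmake --build . --target dashboard'
```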
Nice, enabling the Tpetra tests worked successfully to post the configure, build, and unit test results to the cdash experimental track (this is from my local build on kokkos-dev-2): I forgot to set |
@bartlettroscoe it looks like setting |
CCing @sebrowne to inform him about this as well ... @ndellingwood, please set the following configure options:
and configure and run |
@bartlettroscoe thanks, I wasn't aware of the The install tests passed when run this way. I reran them locally following the process in my OP:
and I am still encountering the same problems. Are there additional cmake options I should be passing on the cmake line to get this to work? What I have posted worked prior to #11863 |
To clarify, one of the problems is that the correct compiler is not being selected during configuration - g++ is being selected, rather than nvcc_wrapper (which was pointed at by Edit: Adding the configuration output and compilation error:
|
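One way to confirm which compiler a configure step actually selected is to read it back from the build directory's `CMakeCache.txt`. A small sketch, assuming the usual cache entry format `CMAKE_<LANG>_COMPILER:<TYPE>=<path>`; the helper name `cached_compiler` is hypothetical.

```shell
# Hypothetical helper: print the compiler recorded for a language in a
# build directory's CMakeCache.txt.
# Arguments: <build-dir> [language, default CXX]
cached_compiler() {
  sed -n "s/^CMAKE_${2:-CXX}_COMPILER:[A-Z]*=//p" "$1/CMakeCache.txt"
}

# Example: inspect the current directory if it is a configured build dir.
if [ -f ./CMakeCache.txt ]; then
  cached_compiler . CXX
fi
```

If this prints a plain `g++` path for a build that was expected to go through `nvcc_wrapper`, the compiler was not picked up the way the environment intended.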
@ndellingwood, you now have to pass the compilers to use explicitly as shown in the test TrilinosInstallTests_simpleBuildAgainstTrilinos showing:
You can't get the compilers automatically from |
Thanks @bartlettroscoe, I missed that comment but confirmed that including the compilers on the cmake line resolved the issue I was seeing (compilers are now properly set, code compiles, test passes)
Re: README.md updates, I think it would help to have step 3 updated |
I changed the labeling from bug to question, thanks @bartlettroscoe for your help! |
Added explicit mention of needing to pass in compatible compilers to CMake. This example can't get compilers from `find_package(Trilinos)` for CUDA builds (see #11955 (comment)).
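Passing the compilers explicitly when configuring a downstream project could look like the sketch below. `CMAKE_PREFIX_PATH`, `CMAKE_C_COMPILER`, and `CMAKE_CXX_COMPILER` are standard CMake variables; the helper name, the install prefix, the source path, and the choice of `mpicc`/`mpicxx` are assumptions for illustration and are not taken from the install test referenced in this thread. The helper prints the command rather than running it.

```shell
# Hypothetical helper: print (not run) a configure command for a downstream
# project that passes the compilers explicitly.
# Arguments: <trilinos-install-prefix> <c-compiler> <cxx-compiler> <source-dir>
print_downstream_configure() {
  printf 'cmake -DCMAKE_PREFIX_PATH=%s -DCMAKE_C_COMPILER=%s -DCMAKE_CXX_COMPILER=%s %s\n' \
    "$1" "$2" "$3" "$4"
}

# Placeholder values; substitute the wrappers the Trilinos install was built with.
print_downstream_configure "$PWD/install" mpicc mpicxx path/to/simpleBuildAgainstTrilinos
```

For a Cuda build like the one in this issue, the C++ compiler passed here needs to be compatible with the one used to build Trilinos (for example, an MPI wrapper that resolves to `nvcc_wrapper`).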
See the new PR: Can you please review? But I think some positive things came out of this issue anyway:
|
Bug Report
After the merge of #11863 I am encountering problems when testing a Trilinos install with the `simpleBuildAgainstTrilinos` demo with Cuda builds.
I'm adding the @trilinos/tribits label for now, can re-label as needed, @bartlettroscoe
Here are some notes, reproducer details will follow below
`simpleBuildAgainstTrilinos` configuration output (leading to failed compilation):
The configuration above leads to compilation errors, as g++ is being picked up as the compiler rather than nvcc_wrapper through the OMPI_CXX environment variable.
Compilation error:
My cmake line for configuring `simpleBuildAgainstTrilinos` is as follows, which worked prior to the #11863 updates:
For comparison, this is the configuration output from the previous round of passing tests:
In case this is relevant, these failures began showing in nightly tests against Kokkos Core+Kernels develop branches where I use source override to build using the source code from their respective repos (rather than the Trilinos packages). I still need to reproduce with just Trilinos' develop branch
Steps to Reproduce
2. Reproducer using sems modules: