Kokkos Caveats
Mostly, due to the magic of CUDA 10+ (or SYCL), you can write device-side functions as if they were host-side functions, with full C++14 or C++17 syntax. There are a couple of limitations and conventions to note, though.
- You can't necessarily access host pointers -- see below for particular issues. Most importantly, this includes all print statements (technically printf is hacked into CUDA, but SYCL sensibly bans them entirely). If you desperately need to print inside a kernel for debugging, you can always compile the code for OpenMP, which will still compile and run just fine. But first, consider writing a function to check the results after the kernel has run, as in the examples in debug.cpp. That way, the debugging code can be preserved at no cost to speed, and used later when similar problems inevitably crop up.
- Passing large objects to the device by value costs time. This limits access to some large Parthenon objects like parameters & containers -- instead, options are pulled out of the package/global parameters before invoking a kernel, and passed to device-side functions as arguments.
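The two conventions above can be sketched together in plain host-side C++. This is only an illustration under assumptions: Params, apply_gamma, count_bad_values and run_example are hypothetical names, not Parthenon or KHARMA types, and the ordinary loop stands in for a real kernel. The point is that the loop body only ever sees a scalar pulled out beforehand, and that checking happens in a separate function after the loop rather than by printing inside it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a large host-side parameter container.
struct Params {
    std::map<std::string, double> values;
    double Get(const std::string &key) const { return values.at(key); }
};

// Stand-in for a kernel. In real code the loop body would be a KOKKOS_LAMBDA.
// It receives only the scalar `gam`, never the whole Params object.
void apply_gamma(std::vector<double> &u, const double gam) {
    for (std::size_t i = 0; i < u.size(); ++i) {
        u[i] = (gam - 1.0) * u[i];
    }
}

// Host-side check run after the kernel instead of printing inside it.
// Returns the number of entries that are non-finite or negative.
int count_bad_values(const std::vector<double> &u) {
    int nbad = 0;
    for (const double x : u) {
        if (!std::isfinite(x) || x < 0.0) ++nbad;
    }
    return nbad;
}

// Pull the option out first, run the "kernel", then check the result.
int run_example() {
    Params params;
    params.values["gamma"] = 5.0 / 3.0;
    const double gam = params.Get("gamma");  // extracted before the loop
    std::vector<double> u = {1.0, 2.0, 3.0};
    apply_gamma(u, gam);
    return count_bad_values(u);
}
```

Because the check lives in its own function, it can stay in the code base and be called only when debugging.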
If you get nonsensical template errors, it's likely something in Kokkos, most likely related to the lambda functions. The simplest of these is when you've written the wrong lambda function into a par_for loop. Consider:
vs the correct version
Using the new clang-based Intel compiler, for example, this produces a template error. Other compilers can be even worse. Type errors concerning function types are likely either this issue, or an issue with templates lacking a concrete instantiation as used.
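As a hedged illustration of one way a lambda can be "wrong" for a loop, the sketch below uses a hypothetical host-only par_for_3d (not the real Parthenon par_for) that calls its body with three indices. Handing it a two-index lambda is an assumed example of the mistake, not necessarily the case the page originally showed. The compiler then reports the failure at the call inside the template rather than at the lambda itself.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3D par_for: it calls body(k, j, i) for every
// index in the inclusive ranges, so `body` must accept exactly three ints.
template <typename Function>
void par_for_3d(int ks, int ke, int js, int je, int is, int ie,
                const Function &body) {
    for (int k = ks; k <= ke; ++k)
        for (int j = js; j <= je; ++j)
            for (int i = is; i <= ie; ++i)
                body(k, j, i);
}

// Sums k + j + i over a 2 x 3 x 4 index range.
int sum_indices() {
    int sum = 0;

    // Wrong: a two-index lambda passed to a three-index loop. Uncommenting this
    // makes the build fail at `body(k, j, i)` inside the template above.
    // par_for_3d(0, 1, 0, 2, 0, 3,
    //            [&](const int &j, const int &i) { sum += j + i; });

    // Correct: the lambda signature matches what the loop invokes.
    // (Capturing by reference is fine here only because this sketch runs on the
    // host; a KOKKOS_LAMBDA captures by value.)
    par_for_3d(0, 1, 0, 2, 0, 3,
               [&](const int &k, const int &j, const int &i) { sum += k + j + i; });
    return sum;
}
```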
If you've tried to be clever and the code is segfaulting for no reason, you may have done one of these things.
One of the most common but hard-to-debug errors possible with Kokkos is the inadequate capture of pointers in C++ lambda functions.
Relevant Kokkos page: Lambda Dispatch
Basically, the C++14 Lambda captures pointers, but not the things they point to. For Kokkos, this means that any pointers used in a KOKKOS_LAMBDA function will still point to the host memory address, which at best crashes the code in place, and at worst corrupts device memory so that it crashes later, in a perfectly benign kernel.
This issue is most pernicious with the this-> pointer, which is added implicitly to member functions & member variables of objects. Be very careful writing or modifying object-oriented code which calls device-side code, or is intended to run device-side itself. Basically, the cardinal rule is that member functions can only access member View objects, not regular member variables.
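The implicit this-> capture can be seen even in host-only C++. The sketch below is a minimal illustration with hypothetical names (Scaler, make_kernel_via_this, make_kernel_via_local, demo_difference); it uses plain [=] lambdas in place of KOKKOS_LAMBDA and assumes a C++14 or C++17 compiler (newer standards may warn about the implicit capture). On the host the pointer stays valid, so nothing crashes, but the lambda visibly follows later changes to the object, which shows it holds a pointer rather than a copy of the member.

```cpp
#include <cassert>

// Hypothetical host-side object with a regular (non-View) member variable.
struct Scaler {
    double scale = 2.0;

    // Risky: [=] does not copy `scale`. It copies `this`, and the body reads
    // this->scale through that host pointer every time it runs.
    auto make_kernel_via_this() {
        return [=](double x) { return scale * x; };
    }

    // Safer: copy the member into a local first, so the lambda owns the value.
    auto make_kernel_via_local() const {
        const double scale_local = scale;
        return [=](double x) { return scale_local * x; };
    }
};

// Builds both lambdas, then changes the object. Only the lambda that captured
// `this` sees the change, so the two results differ.
double demo_difference() {
    Scaler s;
    auto via_this = s.make_kernel_via_this();
    auto via_local = s.make_kernel_via_local();
    s.scale = 10.0;
    return via_this(1.0) - via_local(1.0);
}
```

On a device the first lambda would dereference a host address instead, which is the failure described above.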
A decent (or at least working) example of a host-side object with device-side functions and calls is the Grid object. When it is captured by value, ideally all of its members should be accessible device-side and host-side equivalently.
If you still want to write new objects for use on the host or device, be careful of inheritance. There is no virtual function map on the device, and even if you can avoid the use of virtual functions, requiring device-side function calls can (ironically enough) be very slow on CPUs, as it prevents inlining and thereby vectorization.
See the relevant Kokkos page for details and an example.
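One common way to get interchangeable behavior without virtual functions is to make the concrete type a template parameter. The sketch below is an assumption about how that could look, not a description of how this code base or the Kokkos page does it; IdealGas, Isothermal and total_pressure are hypothetical names, and the loop is plain host code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical types with the same interface and no common virtual base.
struct IdealGas {
    double gam;
    double pressure(double u) const { return (gam - 1.0) * u; }
};

struct Isothermal {
    double cs2;
    double pressure(double rho) const { return cs2 * rho; }
};

// The concrete type is a template parameter, so eos.pressure() is resolved at
// compile time: no vtable lookup is needed and the call can be inlined.
// The small object is taken by value, as it would be when captured in a kernel.
template <typename EOS>
double total_pressure(const EOS eos, const std::vector<double> &q) {
    double total = 0.0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        total += eos.pressure(q[i]);
    }
    return total;
}
```

A call such as total_pressure(IdealGas{2.0}, q) then instantiates a separate, fully inlinable loop for each type, at the cost of the type having to be known at compile time.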