Kokkos Caveats

Ben Prather edited this page Nov 16, 2021 · 3 revisions

Limitations to device-side functions

Mostly, due to the magic of CUDA 10+ (or SYCL), you can write device-side functions as if they were host-side functions, with full C++14 or C++17 syntax. There are a couple of limitations and conventions to note, though.

  1. You can't necessarily access host pointers -- see below for particular issues. Most importantly, this includes all print statements (technically printf is hacked into CUDA, but SYCL sensibly bans them entirely). If you desperately need to print inside a kernel for debugging, you can always compile the code for OpenMP, which will still compile and run just fine. But first, consider writing a function to check the results after the kernel has run, as in the examples in debug.cpp. That way, the debugging code can be preserved at no cost to speed, and used later when similar problems inevitably crop up.

  2. Passing large objects to the device by value costs time. This limits access to some large Parthenon objects like parameters & containers -- instead, options are pulled out of the package/global parameters before invoking a kernel, and passed to device-side functions as arguments.
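As a rough illustration of the second point, the sketch below pulls two scalars out of a parameter container on the host and hands only those to the kernel body. Everything here is a hypothetical stand-in written in plain C++ so it runs without Kokkos: `Params`, `par_for_1d`, `pressure_with_floor`, and the `"gamma"`/`"floor"` keys are not Parthenon or KHARMA names, and the raw pointers take the place of what would be `View` objects in real code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a large host-side parameter container.
struct Params {
    std::map<std::string, double> values;
    double Get(const std::string& key) const { return values.at(key); }
};

// Hypothetical device-side function: it takes plain scalars only,
// never the Params object itself.
inline double pressure_with_floor(double u, double gam, double p_floor) {
    const double p = (gam - 1.0) * u;
    return (p > p_floor) ? p : p_floor;
}

// Hypothetical stand-in for a 1D par_for. A real launch would copy the
// lambda, and everything it captures by value, to the device.
template <typename Functor>
void par_for_1d(int is, int ie, const Functor& f) {
    for (int i = is; i <= ie; ++i) f(i);
}

std::vector<double> apply_pressure(const Params& params,
                                   const std::vector<double>& u) {
    // Pull the options out on the host, before the kernel is invoked.
    const double gam = params.Get("gamma");
    const double p_floor = params.Get("floor");

    std::vector<double> p(u.size());
    // Raw pointers are used only because this sketch runs on the host;
    // in Kokkos code these would be Views.
    double* p_ptr = p.data();
    const double* u_ptr = u.data();
    par_for_1d(0, static_cast<int>(u.size()) - 1, [=](int i) {
        // Only two doubles and two pointers are captured, not `params`.
        p_ptr[i] = pressure_with_floor(u_ptr[i], gam, p_floor);
    });
    return p;
}
```

The lambda ends up carrying a handful of bytes rather than a whole container, which is the property the rule above is after.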

Template Gotchas

If you get nonsensical template errors, it's likely something in Kokkos, most likely related to the lambda functions. The simplest of these is when you've written the wrong lambda function into a par_for loop. Consider:

vs the correct version

Using the new clang-based Intel compiler, for example, this produces exactly the kind of nonsensical template error described above. Other compilers can be even worse. Type errors concerning function types are likely either this issue, or a template that lacks a concrete instantiation for the way it is used.
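To illustrate the kind of mismatch this section describes, here is a minimal sketch that assumes the mistake is a lambda whose index arguments do not match the loop it is passed to. `par_for_3d` is a hypothetical host-only stand-in, not the Parthenon or Kokkos function, and the `static_assert` guard is an optional addition rather than something either library is claimed to do.

```cpp
#include <type_traits>

// Hypothetical stand-in for a 3D par_for: it expects a callable taking
// exactly three integer indices (k, j, i).
template <typename Functor>
void par_for_3d(int ks, int ke, int js, int je, int is, int ie,
                const Functor& f) {
    // Optional guard: reports the mismatch in one readable line.
    static_assert(std::is_invocable_v<const Functor&, int, int, int>,
                  "par_for_3d needs a lambda taking (int k, int j, int i)");
    for (int k = ks; k <= ke; ++k)
        for (int j = js; j <= je; ++j)
            for (int i = is; i <= ie; ++i)
                f(k, j, i);
}

int count_cells(int nk, int nj, int ni) {
    int count = 0;

    // Wrong: a four-index lambda handed to a three-index loop.
    // Uncommenting this makes the file fail to compile.
    // par_for_3d(0, nk - 1, 0, nj - 1, 0, ni - 1,
    //            [&](int, int, int, int) { ++count; });

    // Correct: the lambda's arguments match the loop's indices.
    par_for_3d(0, nk - 1, 0, nj - 1, 0, ni - 1,
               [&](int, int, int) { ++count; });
    return count;
}
```

Without a guard like the `static_assert`, the failure surfaces deep inside the templated loop body rather than at the call site, which is why the message tends to look unrelated to the line you actually wrote.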

Fancy Segfaults

If you've tried to be clever and the code is segfaulting for no reason, you may have done one of these things.

Lambda capture

A very common but hard-to-debug error with Kokkos is the inadequate capture of pointers in C++ lambda functions.

Relevant Kokkos page: Lambda Dispatch

Basically, the C++14 Lambda captures pointers, but not the things they point to. For Kokkos, this means that any pointers used in a KOKKOS_LAMBDA function will still point to the host memory address, which at best crashes the code in place, and at worst corrupts device memory so that it crashes later, in a perfectly benign kernel.

This issue is most pernicious with the this-> pointer, which is implicitly prepended to every member function call and member variable access inside a class. Be very careful writing or modifying object-oriented code which calls device-side code, or is intended to run device-side itself. Basically, the cardinal rule is that member functions can only access member View objects, not regular member variables.

A decent (or at least working) example of a host-side object with device-side functions and calls is the Grid object. When it is captured by value, ideally all of its members should be accessible device-side and host-side equivalently.
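The implicit this-> capture can be reproduced on the host without Kokkos. The sketch below uses a hypothetical `Solver` class (not a KHARMA type) and spells the capture as `[this]`. A `[=]` capture, which is what `KOKKOS_LAMBDA` uses, picks up `this` the same way whenever a member is named in the body. On the host the captured pointer is still valid, so the symptom is only a stale-versus-live value. On a device the same pointer would refer to host memory.

```cpp
#include <functional>

// Hypothetical host-side class with a plain (non-View) member variable.
class Solver {
  public:
    double gam = 1.5;

    // Risky: `gam` in the lambda body really means `this->gam`, so the
    // lambda holds the host address of the object, not a copy of the number.
    std::function<double(double)> make_kernel_via_this() {
        return [this](double u) { return (gam - 1.0) * u; };
    }

    // Safer: copy the member into a local first, so the lambda captures
    // the value itself and never touches `this`.
    std::function<double(double)> make_kernel_local_copy() {
        const double gam_local = gam;
        return [gam_local](double u) { return (gam_local - 1.0) * u; };
    }
};

// Returns true if changing the member after the lambdas are built
// affects only the lambda that went through `this`.
bool this_capture_is_by_pointer() {
    Solver s;
    auto via_this = s.make_kernel_via_this();
    auto via_copy = s.make_kernel_local_copy();
    s.gam = 3.0;  // modify the host object after the "capture"
    return via_this(1.0) == 2.0 && via_copy(1.0) == 0.5;
}
```

Copying members into locals just before the kernel is the same habit as pulling options out of the parameters: the lambda should only ever capture values and Views.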

Device objects and inheritance

If you still want to write new objects for use on the host or device, be careful with inheritance. An object constructed on the host carries a virtual function table pointer that is not valid on the device, so virtual calls on it will not work there. Even if you can avoid the use of virtual functions, a design that forces real (non-inlined) function calls in device-side code can (ironically enough) be very slow on CPUs, as it prevents inlining and thereby vectorization.

See the relevant Kokkos page for details and an example.
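One common way to sidestep virtual functions is to choose the implementation at compile time through a template parameter. This is a minimal sketch of that general technique, not the approach the Kokkos page describes. `IdealGas`, `Isothermal`, and `sum_pressure` are hypothetical names made up for the example.

```cpp
// Two hypothetical policies with the same interface. There is no base
// class and no virtual function: the choice is made at compile time.
struct IdealGas {
    double gam;
    double pressure(double u) const { return (gam - 1.0) * u; }
};

struct Isothermal {
    double cs2;
    double pressure(double rho) const { return cs2 * rho; }
};

// The loop is a template over the policy type, so the call to pressure()
// is resolved statically and the compiler is free to inline it.
template <typename EOS>
double sum_pressure(const EOS& eos, const double* x, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += eos.pressure(x[i]);
    return total;
}
```

Calling `sum_pressure(IdealGas{1.5}, x, n)` and `sum_pressure(Isothermal{2.0}, x, n)` produces two separate instantiations. The policy objects are small and trivially copyable, so they can also be captured by value like any other scalar.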