Creating tool to extract kernel launch configuration (block, grid, launch mechanism, ...) #238
Comments
Thanks for this. It is a good point. I don't know how easy it would be to modify Kokkos core and extract the launch configuration code from the execute() function. I think the solution should involve a new Kokkos Tools callback. I haven't sketched it out in detail, but you would need to make changes in profiling/all/ to add this new callback.
@maartenarnst what do tools like ncu and rocprof need as inputs to extract this information (e.g. a pointer to the kernel function)? If it's runtime information like that, my first thought is that Kokkos should pass that information to Tools through an appropriate interface, and Tools can then use it as needed. One issue I see is that Core decides things like the launch mechanism and parameters before actually launching the kernel, so we would have to work out how to give Tools enough information to correlate that with the kernel launch that follows, and plumb that through Core. If it's static information, perhaps it can be integrated with the PR @dalg24 referenced above.
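To make the correlation question concrete, here is a minimal sketch of the tool side, assuming Core could report the launch configuration under the same kernel ID that the tool hands out in the begin callback. kokkosp_begin_parallel_for and kokkosp_end_parallel_for follow the existing Kokkos Tools callback signatures; kokkosp_report_launch_config and the LaunchConfig struct are hypothetical and do not exist in Kokkos today.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// HYPOTHETICAL payload: not part of the Kokkos Tools interface.
struct LaunchConfig {
  uint32_t grid[3];
  uint32_t block[3];
  uint32_t shmem_bytes;
};

// What the tool remembers about each kernel that is currently in flight.
struct KernelRecord {
  std::string name;
  bool has_config = false;
  LaunchConfig config{};
};

static std::map<uint64_t, KernelRecord> g_records;
static uint64_t g_next_id = 0;

// Existing callback signature: the tool assigns the kernel ID through kID.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const uint32_t /*devID*/,
                                           uint64_t* kID) {
  *kID = g_next_id++;
  g_records[*kID].name = name;
}

// HYPOTHETICAL callback: Core would call this once the configuration is
// known, passing back the ID it received from the begin callback.
extern "C" void kokkosp_report_launch_config(const uint64_t kID,
                                             const LaunchConfig* cfg) {
  auto it = g_records.find(kID);
  if (it == g_records.end()) return;  // unknown kernel: ignore the report
  it->second.config = *cfg;
  it->second.has_config = true;
}

// Existing callback signature: the kernel is done, so drop its record.
extern "C" void kokkosp_end_parallel_for(const uint64_t kID) {
  g_records.erase(kID);
}
```

Keying the report on the kernel ID is only one possible design; it assumes the ID is still available at the point inside Core where the configuration is chosen.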
With @maartenarnst, we think there is a bigger picture question we should answer before we go on. What should
tl;dr to answer @romintomasetti's question: I also think launch grid configuration is in scope, assuming a Kokkos user can gain some insight from an apples-to-apples comparison of launch configurations across different vendor tools. I elaborate below, though we may want to move this elaboration to another Kokkos Tools GitHub issue. You can take a look at the Kokkos Tools documentation README.md and the wiki for the scope and purpose of Kokkos Tools, but let me summarize and target it in the context of your question:
Consider the problem of Kokkos function name demangling that one would have without Kokkos Tools. The problem is not (just) that reading the function name is hard for a Kokkos user running on one particular backend. I think the more fundamental problem is portable tooling: how does one compare the timings of a particular Kokkos::parallel_for run on an AMD GPU (with the HIP backend) with those on an NVIDIA GPU (with the CUDA backend)? Kokkos Tools provides an apples-to-apples, portable comparison of a labeled Kokkos kernel across the two vendors' GPUs. Otherwise, the Kokkos programmer has to spend time doing such a comparison on their own (note how this directly mirrors the effort of programming and maintaining separate CUDA and HIP versions that they would face without Kokkos). So, to answer the question: I think launch grid configuration is in scope, assuming a Kokkos user can gain some insight from an apples-to-apples comparison of launch configurations across different vendor tools. More generally, any tooling for Kokkos programs is in scope for Kokkos Tools if it has meaning across different Kokkos backends.
It can be of interest to developers of Kokkos applications to have some insight into the configuration that Kokkos uses to launch kernels (block, grid, launch mechanism, ...).
However, in Kokkos the determination of such a launch configuration is currently implemented, typically, inside the body of an execute() function. Hence, it cannot be accessed directly. To access launch configurations, developers currently seem to be led to use tools like ncu and rocprof. Another option (not sustainable) is to copy-paste pieces of the bodies of the execute() functions into custom functions.
One option may be to extract this functionality from the execute() functions in Kokkos and put it into dedicated functions that could become part of the Kokkos API. However, implementing the launch configurations inside the bodies of execute() functions, and thus choosing not to expose them, may have been a deliberate design decision in Kokkos (?).
Hence the question: how best to proceed to make it possible to extract launch configuration properties?
If exposing them in Kokkos itself is not an option, it would be interesting to explore whether gaining insight into launch configurations could be made a part of Kokkos Tools, i.e., whether it would be of interest to define new callbacks that provide the launch configuration and to develop a new Kokkos Tools connector to collect such information.
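As a rough illustration of what such a callback could hand to a connector, the sketch below defines a backend-neutral record and a helper that turns it into one report line. Everything in it is hypothetical: the LaunchInfo struct, the LaunchMechanism enumeration and its values, and the describe() helper are illustrative names, not existing Kokkos or Kokkos Tools API.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// HYPOTHETICAL enumeration of launch mechanisms; the values are illustrative.
enum class LaunchMechanism { Unknown, LocalMemory, ConstantMemory, GlobalMemory };

// HYPOTHETICAL record that a new callback could pass to a connector.
struct LaunchInfo {
  const char* kernel_label;   // label given by the user to the Kokkos kernel
  uint32_t grid[3];           // number of blocks per dimension
  uint32_t block[3];          // number of threads per block per dimension
  LaunchMechanism mechanism;  // how the functor reaches the device
};

inline const char* to_string(LaunchMechanism m) {
  switch (m) {
    case LaunchMechanism::LocalMemory: return "LocalMemory";
    case LaunchMechanism::ConstantMemory: return "ConstantMemory";
    case LaunchMechanism::GlobalMemory: return "GlobalMemory";
    default: return "Unknown";
  }
}

// Format one record as a single line that a connector could write to a log.
inline std::string describe(const LaunchInfo& info) {
  std::ostringstream os;
  os << info.kernel_label << ": grid=(" << info.grid[0] << ',' << info.grid[1]
     << ',' << info.grid[2] << ") block=(" << info.block[0] << ','
     << info.block[1] << ',' << info.block[2]
     << ") mechanism=" << to_string(info.mechanism);
  return os.str();
}
```

Keeping the record free of vendor-specific types is a deliberate choice in this sketch, so that the same connector output could be compared across backends.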
@romintomasetti
@dalg24, @masterleinad, @vlkale