Home
Kokkos Tools provides a lightweight set of profiling and debugging tools and utilities that help application programmers be more productive when developing performance-portable parallel programs with Kokkos. Kokkos Tools provides a tools interface with instrumentation hooks built directly into the Kokkos runtime. Compared to third-party tools, Kokkos Tools provides much cleaner and more context-specific information: in particular, the tools allow kernel-centric analysis and they use the labels provided to Kokkos objects, e.g., Kokkos kernel launches and Kokkos Views.
Under most circumstances, the profiling hooks are compiled into Kokkos executables by default. That means the set of tools works for your existing Kokkos application programs, assuming that the version for the profiling hooks is compatible with the tools version. No recompilation or changes to your build procedures are required.
Note: Kokkos_ENABLE_LIBDL must be on to load profiling hooks dynamically. It should be on by default, however.
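As a rough illustration of what sits on the other side of these hooks, the sketch below shows a minimal tool library that counts parallel_for launches per kernel label. The kokkosp_* entry points follow the callback naming used by the Kokkos profiling interface; the counters, the launch_count helper, and the printed format are illustrative assumptions, not code taken from any tool in the repository.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Illustrative state (not part of the Kokkos interface): launches per kernel label.
static std::map<std::string, std::uint64_t> g_launch_counts;
static std::uint64_t g_next_kernel_id = 0;

// Hypothetical helper for inspecting the counters.
static std::uint64_t launch_count(const std::string& label) {
  const auto it = g_launch_counts.find(label);
  return it == g_launch_counts.end() ? 0 : it->second;
}

// Called once when the Kokkos runtime loads the tool library.
extern "C" void kokkosp_init_library(const int loadSeq,
                                     const std::uint64_t interfaceVer,
                                     const std::uint32_t devInfoCount,
                                     void* deviceInfo) {
  (void)loadSeq;
  (void)devInfoCount;
  (void)deviceInfo;
  g_launch_counts.clear();
  g_next_kernel_id = 0;
  std::printf("launch-counter: interface version %llu\n",
              static_cast<unsigned long long>(interfaceVer));
}

// Called before each parallel_for; the tool hands back an id through kID.
extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const std::uint32_t devID,
                                           std::uint64_t* kID) {
  (void)devID;
  assert(kID != nullptr);
  *kID = g_next_kernel_id++;
  ++g_launch_counts[name != nullptr ? name : "(unnamed)"];
}

// Called after each parallel_for; this sketch has nothing to do here.
extern "C" void kokkosp_end_parallel_for(const std::uint64_t kID) { (void)kID; }

// Called once at shutdown: print one line per kernel label.
extern "C" void kokkosp_finalize_library() {
  for (const auto& entry : g_launch_counts) {
    std::printf("%s: %llu launches\n", entry.first.c_str(),
                static_cast<unsigned long long>(entry.second));
  }
}
```

Compiled as a shared library (for example with -shared -fPIC), a file like this could be listed in KOKKOS_TOOLS_LIBS like any other tool.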
To use one of the tools, you have to compile it, which generates a dynamic library. Then, just before executing the Kokkos application, set the environment variable KOKKOS_TOOLS_LIBS to point to the dynamic library. As an example, suppose you are in your application's directory and want a reproducible run that takes a random sample of half the memory events for each Kokkos kernel library function. You first need to build two libraries from Kokkos Tools: the sampler utility and the memory events tool.
To build the libraries with cmake, you must go to the source directory of Kokkos Tools, say, YOUR_KTO_SRC_DIR, and then go into the subdirectory of that particular utility and the tool, and type cmake .. with the appropriate place that you want the libraries to be installed, say, YOUR_KTO_INSTALL, to configure the build and install of the libraries.
cd ${YOUR_KTO_SRC_DIR}; mkdir mybuild; cd mybuild; cmake .. -DCMAKE_INSTALL_PREFIX=${YOUR_KTO_INSTALL}
If the above is successful, you then type make; make install to actually build and install Kokkos Tools on your machine.
Then, go back to your application code directory, and set KOKKOS_TOOLS_LIBS to the sampling utility library followed by the memory events library, separated by a semicolon.
cd -; export KOKKOS_TOOLS_LIBS="${YOUR_KTO_INSTALL}/libkp_kokkos_sampler.so;${YOUR_KTO_INSTALL}/libkp_memory_events.so"
Next, set any options for the tool and utility, such as a sampling skip rate of 2 (every other Kokkos kernel invocation) and global fencing to capture the state of memory events:
export KOKKOS_TOOLS_SAMPLER_SKIP=2; export KOKKOS_TOOLS_GLOBALFENCES=1;
and then run your application normally, e.g., in Bash:
myKokkosApp.exe;
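To make the skip rate concrete, here is a minimal sketch of skip-based sampling, where only every N-th kernel invocation is forwarded to the analysis tool. This is not the sampler utility's actual implementation: SkipSampler and parse_skip_rate are hypothetical names, and the choice to sample the second invocation of each pair (rather than the first) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdlib>

// Parse a positive skip rate from an environment-variable value.
// Returns `fallback` when the value is unset, empty, non-numeric, or zero.
std::uint64_t parse_skip_rate(const char* value, std::uint64_t fallback) {
  if (value == nullptr || !std::isdigit(static_cast<unsigned char>(value[0]))) {
    return fallback;
  }
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(value, &end, 10);
  if (*end != '\0' || parsed == 0) {
    return fallback;
  }
  return parsed;
}

// Decides which kernel invocations are sampled (hypothetical helper class).
class SkipSampler {
 public:
  explicit SkipSampler(std::uint64_t skip) : skip_(skip) { assert(skip > 0); }

  // True for every skip_-th call: with skip_ == 2, the 2nd, 4th, 6th, ...
  bool should_sample() { return (++count_ % skip_) == 0; }

 private:
  std::uint64_t skip_;
  std::uint64_t count_ = 0;
};
```

A tool built along these lines could obtain the rate with parse_skip_rate(std::getenv("KOKKOS_TOOLS_SAMPLER_SKIP"), 1) and forward a kernel event only when should_sample() returns true.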
To use a particular Kokkos library that is at YOUR_KOKKOS_INSTALL for building your Kokkos Tools library, add the flag -DKokkos_ROOT=${YOUR_KOKKOS_INSTALL} to the cmake .. command shown above. This is particularly important when one wants to use third-party Kokkos Tools connector libraries like nvtx-connector.
As an alternative to the environment variable, one can pass the flag --kokkos-tools-library to a Kokkos program executable, e.g., in Bash:
myApp.exe --kokkos-tools-library="${YOUR_KTO_SRC_DIR}/kokkos-tools/src/tools/memory-events/kp_memory_event.so"
Many of the tools of Kokkos Tools will produce an output file which uses the hostname as well as the process id as part of the filename.
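The sketch below shows one way such a filename could be assembled on a POSIX system, so that output from different nodes and processes does not collide. The "host-pid" pattern, the ".dat" extension used in the usage note, and both function names are assumptions for illustration; each tool chooses its own exact naming scheme.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // gethostname, getpid (POSIX)

// Build "<host>-<pid><extension>" from explicit parts (hypothetical pattern).
std::string make_output_name(const std::string& host, long pid,
                             const std::string& extension) {
  assert(!host.empty());
  return host + "-" + std::to_string(pid) + extension;
}

// Build the name for the current process, falling back to a placeholder
// hostname if the lookup fails or returns an empty string.
std::string default_output_name(const std::string& extension) {
  char host[256] = {};
  const bool ok = gethostname(host, sizeof(host) - 1) == 0 && host[0] != '\0';
  return make_output_name(ok ? host : "unknown-host",
                          static_cast<long>(getpid()), extension);
}
```

For example, make_output_name("node01", 4242, ".dat") yields "node01-4242.dat".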
Though using cmake is recommended to build the dynamic libraries for tool utilities and tools, you can also use the Makefiles if you wish. Instead of using the cmake commands mentioned above, you would do the following:
cd ${YOUR_KTO_SRC_DIR}; cd profiling/memory-events; make; cd ../../common/kokkos-sampler/; make;
Then, you specify the resulting dynamic library files in the source directory in KOKKOS_TOOLS_LIBS, in a similar fashion to that shown above.
One can explicitly add instrumentation to a library or an application. Currently, the only hooks intended for explicit programmer use are the region-related and section-related hooks. These use a push/pop model to mark coarser regions in your code.
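#include <Kokkos_Core.hpp>  // makes Kokkos::Profiling::pushRegion/popRegion available

void foo() {
  Kokkos::Profiling::pushRegion("foo");  // open a region labeled "foo"
  bar();
  stool();
  Kokkos::Profiling::popRegion();  // close the most recently pushed region
}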
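On the tool side, these calls arrive as kokkosp_push_profile_region and kokkosp_pop_profile_region callbacks. The sketch below shows how a tool might keep a stack of open regions and report the current nesting as a "/"-separated path. The stack, the path format, and current_region_path are illustrative assumptions rather than the behavior of any particular tool.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative state: labels of the regions that are currently open.
static std::vector<std::string> g_region_stack;

// Hypothetical helper: join the open regions into a path such as "foo/bar".
static std::string current_region_path() {
  std::string path;
  for (const auto& region : g_region_stack) {
    if (!path.empty()) path += '/';
    path += region;
  }
  return path;
}

// Invoked when the application calls Kokkos::Profiling::pushRegion(name).
extern "C" void kokkosp_push_profile_region(const char* name) {
  g_region_stack.emplace_back(name != nullptr ? name : "(unnamed)");
}

// Invoked when the application calls Kokkos::Profiling::popRegion().
extern "C" void kokkosp_pop_profile_region() {
  // Flag an unbalanced pop in debug builds; ignore it otherwise.
  assert(!g_region_stack.empty() && "popRegion without matching pushRegion");
  if (!g_region_stack.empty()) g_region_stack.pop_back();
}
```

With this model, a region pushed inside foo() while "foo" is still open would be reported as nested under it, which is why every pushRegion needs a matching popRegion.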
-
A tool utility which is used in conjunction with analysis tools, to restrict the tooling to a subset of the application's Kokkos library functions.
-
A tool utility which is used in conjunction with analysis tools, to restrict the tooling to samples of Kokkos kernel invocations.
-
Outputs high water mark of memory usage of the application.
-
Generates a timeline of memory utilization for each Memory Space and data transferred between Memory Spaces.
-
Tool to track memory events such as allocation and deallocation. It also provides the information of the MemoryUsage tool.
-
Captures basic timing information for Kernels.
-
Prints Kernel and Region events during runtime.
-
Provides Kokkos Kernel Names to nvtx so that analysis can be performed on a per-kernel basis. This was previously called the nvprof-connector.
-
Like nvtxConnector but it turns profiling off for those kernels filtered out. It should be used in conjunction with the KernelFilter utility. This was previously called the nvprof-focused-connector.
-
Provides Kokkos Kernel Names to Roctx so that analysis can be performed on a per-kernel basis.
-
Provides Kokkos Kernel Names to VTune so that analysis can be performed on a per-kernel basis.
-
Like vTuneConnector but it turns profiling off for those kernels filtered out. It should be used in conjunction with the KernelFilter utility.
-
Modular connector for accumulating timing, memory usage, hardware counters, and other various metrics. Supports controlling VTune, CUDA profilers, and TAU + kernel name forwarding to VTune, NVTX, TAU, Caliper, and LIKWID.
Defining a timemory component will enable your plug-in to output to stdout, text, and JSON, accumulate statistics, and utilize various portable function calls for common needs w.r.t. timers, resource usage, etc.
-
This is a tool for automated tuning of a large variety of programming models and languages. It is available as a submodule in the Kokkos Tools git repository. It has a hook for Kokkos and sophisticated mechanisms to tune the parameters of Kokkos functions, e.g., the team size in a Kokkos parallel_for. See Apex for more information.
This is a tool that complements Apex. Apollo provides a framework with ML-guided auto-tuning capabilities to tune arbitrary performance parameters. The Apollo Kokkos Tools connector makes this capability available to Kokkos programs.
-
This is a tool for automated tuning of a large variety of programming models and languages. It is available as a git submodule in the Kokkos Tools git repository. It has a hook for Kokkos and sophisticated mechanisms to tune the parameters of Kokkos functions. See Caliper for more information.
-
LDMS, which stands for Lightweight Data Monitoring System, is software for performance monitoring of HPC systems. The LDMS Kokkos Tools connector invokes functions of LDMS to extract profiling data samples from a Kokkos application program.
The success of Kokkos Tools comes from having a collection of libraries built in-house and by the broader Kokkos community. The Kokkos Tools developers from the Kokkos team welcome contributions from other developers and from users of Kokkos alike, particularly in the form of:
- Developing and improving on existing tools in the set of Kokkos Tools (each tool is a connector)
- Creating and contributing a new tool (connector) to add to the set of tools
- Improving documentation for Kokkos Tools
- Sharing your experiences of using Kokkos Tools with your Kokkos programs, including success stories and/or failures.
The Kokkos Tools developers have (1) general guidelines and overview of development in Kokkos Tools and (2) specific guidelines and tips for contributing via each of the four ways listed. The wiki page for the general guidelines is here and the wiki pages for specific guidelines are referenced in the paragraphs below.
For items 1 and 2, contribute by first identifying the problem in a GitHub Issue in the Kokkos Tools repo at github.com/kokkos/kokkos-tools/issues/ and then providing a suggested solution to that issue via a PR against the develop branch of the Kokkos Tools repo.
For 3 and 4, please email [email protected] and [email protected] or, if you can, provide suggestions to the Kokkos Team slack channel. For 3, you can also submit a PR for the files involving documentation in the Kokkos Tools repo, e.g., README.md, Build.md. For 4, if appropriate, the Kokkos Tools developers will showcase them in Kokkos Tools tutorials and other presentations, with your permission.
If you would like to learn about Kokkos Tools in a formal format, tutorials are available. You can find a good overview of all tooling support for Kokkos (including Kokkos Tools) in the slides and recording of the NERSC 2024 Advanced Kokkos Tutorial: https://www.nersc.gov/users/training/past-training-events/2024/portability-series-kokkos-apr2024/
You can also look at https://github.com/kokkos/kokkos-tutorials/blob/main/Profiling/Kokkos-Profiling.pdf for slides that give an overview of Kokkos Tools. A more in-depth discussion is in the YouTube video here: https://www.youtube.com/watch?v=MH6zFYGw0HU. See the latest tutorials at https://github.com/kokkos/kokkos-tutorials.
Kokkos Tools provides the gateway to analysis and improvement of Kokkos Programs. Each of the libraries of Kokkos Tools hooks directly into the Kokkos runtime library. There are other projects that also support tooling for Kokkos. These are laid out here.
- HPCToolkit for Kokkos - POC: John Mellor-Crummey ([email protected])
- Tau support for Kokkos - POC: Sameer Shende ([email protected])
- Automated Testing of Kokkos Programs - POC: Vivek Kale ([email protected])
- Examples: You can find examples of Kokkos Tools being used at https://github.com/DavidPoliakoff/kokkos-tools-examples.
- Kokkos.org has a top-level view of the Kokkos project, with a page on Kokkos Tools.
- Issue on using nsys and ncu for TensorRT: https://github.com/NVIDIA/TensorRT-LLM/issues/183
SAND2017-3786