Kernel Performance Benchmarking/Reporting #18521
Labels: `infrastructure/benchmark` (relating to benchmarking infrastructure), `infrastructure` (relating to build systems, CI, or testing)
Initial V0 Goal
Main Requirements:
- Must be easy for a developer to run locally, without pulling in unnecessary dependencies such as rocBLAS, hipBLASLt, etc.
- Must support testing any combination of problems we need, including letting a developer manually dispatch specific tests to specific architectures. The main task here is to generalize/rework Surya's existing benchmarking infrastructure (https://github.com/nod-ai/rocm-gemm-benchmark) so that anyone can easily add different problem configurations and run performance benchmarking on the kernels of interest (see the configuration sketch after this list).
- Add full support for convolution benchmarking, and for fp8 attention benchmarking if it is ready for perf testing. An initial version of convolution benchmarking exists on this branch (https://github.com/nod-ai/rocm-gemm-benchmark/tree/conv-benchmark), but I didn't leverage the rotating buffer, icache flush, or simulated GPU strain, because I use IREE's Python iree-benchmark-module (see the invocation sketch after this list). Once the bullet above is done and convolution problems are easy to integrate into the C++ project, we get all of that control during the benchmarking phase.
- Must let someone see the results measured against a roofline online, without running anything themselves. We want nightly CI runs that generate roofline plot artifacts. CI is currently set up and generates CSV data files with plenty of raw data; the goal here is easy access to visuals with a few clicks (see the plotting sketch after this list). Eventually, we have ideas to incorporate this into a neat metrics dashboard UI.
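
To make the second bullet concrete, here is a minimal sketch of what a generalized problem-configuration registry could look like. All names here (`GemmConfig`, `GEMM_PROBLEMS`, `select_for_target`, the gfx gating rule) are hypothetical illustrations, not the existing rocm-gemm-benchmark API; the point is that adding a new shape or dtype should be one entry in a list, and per-architecture dispatch should be a filter over that list.

```python
# Hypothetical sketch of a generalized problem-configuration registry;
# names and gating rules are illustrative, not the rocm-gemm-benchmark API.
from dataclasses import dataclass

@dataclass(frozen=True)
class GemmConfig:
    """One benchmark case: C[M,N] = A[M,K] @ B[K,N] in the given dtype."""
    M: int
    N: int
    K: int
    dtype: str          # e.g. "f16", "bf16", "fp8"
    transpose_a: bool = False
    transpose_b: bool = False

# Developers extend coverage by appending entries here.
GEMM_PROBLEMS = [
    GemmConfig(M=4096, N=4096, K=4096, dtype="f16"),
    GemmConfig(M=2048, N=2048, K=8192, dtype="fp8"),
]

def select_for_target(problems, target: str):
    """Hypothetical per-architecture filter; real gating rules would live here."""
    if target == "gfx90a":
        # Assumption for illustration: skip fp8 cases on MI200-class targets.
        return [p for p in problems if p.dtype != "fp8"]
    return list(problems)
```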
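For the convolution bullet, this is roughly how the current branch drives a compiled module through IREE's benchmark tooling from Python. The `--module`, `--device`, `--function`, and `--input` flags are standard iree-benchmark-module flags; the file names, shapes, and device string are placeholders, and the exact device name (`hip` vs. `rocm`) depends on the IREE build. Note this path is exactly what skips the rotating buffer / icache flush control mentioned above.

```python
# Hedged sketch: invoking the iree-benchmark-module CLI for a conv workload.
# File names and tensor shapes are placeholders.
import subprocess

cmd = [
    "iree-benchmark-module",
    "--module=conv2d.vmfb",      # placeholder: a precompiled convolution module
    "--device=hip",              # may be "rocm" on older IREE builds
    "--function=main",
    "--input=1x224x224x3xf16",   # placeholder input tensor (zero-filled)
    "--input=3x3x3x64xf16",      # placeholder filter tensor
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)  # Google Benchmark-style timing output
```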
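For the reporting bullet, a nightly CI job could turn the existing CSV artifacts into a roofline plot with a short script like the one below. The CSV schema (`name`, `flops`, `bytes`, `seconds`) and the peak compute/bandwidth numbers are assumptions for illustration only.

```python
# Minimal roofline-plot sketch; CSV schema and peak numbers are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

PEAK_TFLOPS = 1300.0   # placeholder: peak fp16 throughput of the target GPU
PEAK_BW_TBPS = 5.3     # placeholder: HBM bandwidth in TB/s

df = pd.read_csv("results.csv")  # assumed columns: name, flops, bytes, seconds
df["intensity"] = df["flops"] / df["bytes"]          # FLOP per byte moved
df["tflops"] = df["flops"] / df["seconds"] / 1e12    # achieved throughput

# Roofline: min(peak compute, peak bandwidth * arithmetic intensity).
xs = [2**i for i in range(-2, 12)]
roof = [min(PEAK_TFLOPS, PEAK_BW_TBPS * x) for x in xs]

fig, ax = plt.subplots()
ax.loglog(xs, roof, label="roofline")
ax.scatter(df["intensity"], df["tflops"], label="measured kernels")
ax.set_xlabel("Arithmetic intensity (FLOP/byte)")
ax.set_ylabel("Throughput (TFLOP/s)")
ax.legend()
fig.savefig("roofline.png")  # uploaded as a nightly CI artifact
```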