[Codegen][GPU] Keep range and divisibility annotations on push constants #19348

Open: wants to merge 2 commits into base: main
Conversation

@krzysz00 (Contributor) commented Dec 3, 2024

IREE has useful information indicating the minimum values, maximum values, and divisibility of push constants encoded in util.assume.int ops. This information was being thrown away when, in some cases, it could be profitably communicated to compiler backends.

This commit:

  • Changes drop-compiler-hints to have an option that keeps util.assume.int ops
  • Adds rewrites to the LLVMGPU and SPIRV lowerings that erase these ops
  • Changes the rewrites for hal.interface.constant.load to look for util.assume.int ops in the input IR and use them to add annotations to the loaded constant
    • In the LLVM case, these annotations take the form of a `range(iN lb, ub)` attribute on the corresponding function parameter
    • For SPIR-V, these annotations are calls to KHR_AssumeTrue if the capability is available
  • This commit also adds a case for integer assumption operations to the SPIR-V i64 emulation pass

While I was here, I converted some of the LLVM lowering patterns to use ConvertOpToLLVMPattern<>.
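Concretely, the intent on the LLVM path is roughly the following (a hand-written sketch, not output from this patch; the kernel name, argument names, and bounds are invented for illustration). A push constant that `util.assume.int` asserts to be in [0, 4080] and divisible by 16 would surface as a `range` parameter attribute on the kernel, where `range(iN lb, ub)` denotes the half-open interval [lb, ub):

```llvm
; Hypothetical kernel signature after lowering: push constant 0 carries
; its known range as a parameter attribute (hence the ub of 4081).
define void @kernel(i32 range(i32 0, 4081) %push_const0, ptr %buf) {
  ret void
}
```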

@MaheshRavishankar (Contributor) left a comment:

Do we have any idea of the performance impact of the changes here? Basically, are there some things that we know LLVM can pick up to make better use of this information?

// If the constant has non-trivial assumptions placed on it about
// its min and max values or divisibility, use that information to
// annotate the corresponding arguments.
if (op.getResult().hasOneUse()) {

Why the one-use condition?

@@ -28,8 +29,10 @@ class DropCompilerHintsPass
       op.replaceAllUsesWith(op.getOperands());
       op.erase();
     } else if (auto op = dyn_cast<IREE::Util::AssumeIntOp>(genericOp)) {
-      op.replaceAllUsesWith(op.getOperands());
-      op.erase();
+      if (!keepAssumeInt) {
Nit: Convert this to

if (auto op = dyn_cast<IREE::Util::OptimizationBarrierOp>(...)) {
  ...
  return;
}
if (auto op = dyn_cast<IREE::Util::AssumeIntOp>(...)) {
  if (keepAssumeInt) return;
  .....
}

@krzysz00 (Contributor, Author) commented Dec 5, 2024

I'll try to find a way to perf this, especially since it looks like I need to get index narrowing running on the MLIR side. But an eyeball of the IR (when combined with my linearize-mma changes, which may be related) revealed a lot more `or disjoint`, `nuw`, and `nsw` than before, and I do know that inferring those is codegen-relevant.
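To illustrate why those flags matter (a hand-written LLVM IR sketch, not output from this patch; the function, names, and bounds are invented): once a push constant carries a `range` fact, LLVM can prove that small index arithmetic cannot wrap and mark it `nuw nsw`, which feeds later addressing and aliasing optimizations:

```llvm
; %i is known to be in [0, 65536), so %i * 4 is at most 262140 and
; cannot overflow i32: the multiply can be marked nuw nsw. Without the
; range fact, it must be treated as potentially wrapping.
define i32 @load_elem(ptr %base, i32 range(i32 0, 65536) %i) {
  %byte_off = mul nuw nsw i32 %i, 4
  %addr = getelementptr i8, ptr %base, i32 %byte_off
  %v = load i32, ptr %addr
  ret i32 %v
}
```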

@krzysz00 krzysz00 force-pushed the keep-range-divisibility-annotations branch 2 times, most recently from d567cd2 to e26a966 on December 12, 2024 at 17:11
@krzysz00 krzysz00 force-pushed the keep-range-divisibility-annotations branch from e26a966 to d240e2b on December 13, 2024 at 20:18
@krzysz00 (Contributor, Author) commented:

Ok, so, on the performance front, I ran the IREE gemm kernel benchmarks with both my patch stack (this plus the integer range narrowing PR plus linearize-mma) and a recent main commit and then looked at the TFlops changes.

So, here are the summary statistics for the percent increase from my patches:

In [32]: ((branch['tflops'] - mainline['tflops']) / mainline['tflops'] * 100).describe()
    ...:
Out[32]:
count    582.000000
mean       0.820292
std        7.603068
min      -21.561822
25%       -2.335348
50%        0.000000
75%        3.215377
max       74.300112
Name: tflops, dtype: float64

However, that included a bunch of matvecs and other things that are really susceptible to benchmark noise. So, if I restrict myself to only configs where main got at least 10 TFLOP/s, I get:

In [33]: ((branch_fast['tflops'] - mainline_fast['tflops']) / mainline_fast['tflops'] *
    ...: 100).describe()
Out[33]:
count    212.000000
mean       1.270069
std        4.050435
min      -11.428650
25%       -1.167054
50%        0.817124
75%        3.703665
max       12.999693
Name: tflops, dtype: float64

I haven't yet benchmarked the slow Llama dispatch specifically; that's waiting on fixes to this and other related PRs, where I realized I missed a few opportunities.

I think the conclusion here is that I've dug up a marginal, but real, series of performance improvements.
