[Codegen][GPU] Keep range and divisibility annotations on push constants #19348
Do we have any idea of the performance impact of the changes here? Basically, are there some things that we know LLVM can pick up to make better use of this?
// If the constant has non-trivial assumptions placed on it about
// its min and max values or divisibility, use that information to
// annotate the corresponding arguments.
if (op.getResult().hasOneUse()) {
Why the one use condition?
@@ -28,8 +29,10 @@ class DropCompilerHintsPass
op.replaceAllUsesWith(op.getOperands());
op.erase();
} else if (auto op = dyn_cast<IREE::Util::AssumeIntOp>(genericOp)) {
op.replaceAllUsesWith(op.getOperands());
op.erase();
if (!keepAssumeInt) {
Nit: Convert this to
if (auto op = dyn_cast<IREE::Util::OptimizationBarrierOp>(...)) {
...
return;
}
if (auto op = dyn_cast<IREE::Util::AssumeIntOp>(...)) {
if (keepAssumeInt) return;
.....
}
I'll try and find a way to perf this - especially since I need to get index narrowing run MLIR-side, it looks like, but an eyeball of the IR (when combined with my linearize-mma changes, which may be related) revealed a lot more
IREE has useful information indicating the minimum values, maximum values, and divisibility of push constants encoded in util.assume.int ops. This information was being thrown away when, in some cases, it could be profitably communicated to compiler backends.

This commit:
- Changes drop-compiler-hints to have an option that keeps util.assume.int ops
- Adds rewrites to the LLVMGPU and SPIRV lowerings that erase these ops
- Changes the rewrites for hal.interface.constant.load to look for util.assume.int ops in the input IR and use them to add annotations to the loaded constant
  - In the LLVM case, these annotations take the form of a `range(iN lb, ub)` attribute on the corresponding function parameter
  - For SPIR-V, these annotations are calls to KHR_AssumeTrue if the capability is available
- This commit also adds a case for integer assumption operations to the SPIR-V i64 emulation pass

While I was here, I converted some of the LLVM lowering patterns to use ConvertOpToLLVMPattern<>.
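To make the mapping from assumptions to annotations concrete, here is a minimal standalone sketch; it is not the code in this PR. `IntAssumption`, `formatRangeAttr`, and `satisfies` are hypothetical names invented for illustration, and the sketch assumes the assumption carries inclusive unsigned bounds plus a divisor. It shows the one detail that is easy to get wrong: LLVM's `range` attribute is half-open, `[lb, ub)`, so an inclusive maximum has to be bumped by one.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the facts an integer assumption carries about one
// value: inclusive unsigned bounds and a divisor. Not IREE's actual data type.
struct IntAssumption {
  std::optional<uint64_t> umin;  // smallest value, inclusive
  std::optional<uint64_t> umax;  // largest value, inclusive
  std::optional<uint64_t> udiv;  // value is a multiple of this
};

// Renders an LLVM-style `range(iN lb, ub)` parameter attribute as text.
// The range is half-open, [lb, ub), so the inclusive umax is bumped by one.
// Bounds are printed as unsigned decimals for simplicity.
// Returns an empty string when there is no usable bound.
std::string formatRangeAttr(const IntAssumption &a, unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64 && "sketch only handles up to i64");
  if (!a.umin && !a.umax)
    return "";
  const uint64_t typeMax =
      bitWidth == 64 ? UINT64_MAX : ((uint64_t{1} << bitWidth) - 1);
  const uint64_t lb = a.umin.value_or(0);
  const uint64_t maxInclusive = a.umax.value_or(typeMax);
  if (lb > maxInclusive || maxInclusive > typeMax)
    return "";  // contradictory or out-of-type bounds: emit nothing
  if (lb == 0 && maxInclusive == typeMax)
    return "";  // the full range carries no information
  // When umax is the type's maximum, the exclusive bound wraps around to 0.
  const uint64_t ub = maxInclusive == typeMax ? 0 : maxInclusive + 1;
  return "range(i" + std::to_string(bitWidth) + " " + std::to_string(lb) +
         ", " + std::to_string(ub) + ")";
}

// The same facts as a boolean predicate, i.e. the kind of condition an
// assume-style hint would assert about a runtime value.
bool satisfies(const IntAssumption &a, uint64_t value) {
  if (a.umin && value < *a.umin)
    return false;
  if (a.umax && value > *a.umax)
    return false;
  if (a.udiv && *a.udiv != 0 && value % *a.udiv != 0)
    return false;
  return true;
}
```

For a 32-bit constant assumed to lie in 0..4095 and be a multiple of 16, `formatRangeAttr` yields `range(i32 0, 4096)`. Divisibility has no slot in a range attribute, which is why the sketch models it separately as a predicate.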
Ok, so, on the performance front, I ran the IREE gemm kernel benchmarks with both my patch stack (this plus the integer range narrowing PR plus linearize-mma) and a recent So, here're the summary statistics for the percent increase from my patches:
However, that included a bunch of matvecs and other things that're really susceptible to benchmark noise. So, if I restrict myself to only configs where main got at least 10 tflop/s, I get
I haven't yet benchmarked the slow LLama dispatch, specifically, but that's waiting on fixes to this and other related PRs where I realized I missed a few opportunities. I think the conclusion here is that I've dug up a marginal, but real, series of performance improvements.