[Codegen][GPU] Keep range and divisibility annotations on push constants #19348

Open: wants to merge 2 commits into base: main
Conversation

@krzysz00 (Contributor) commented Dec 3, 2024

IREE has useful information indicating the minimum values, maximum values, and divisibility of push constants encoded in util.assume.int ops. This information was being thrown away when, in some cases, it could be profitably communicated to compiler backends.

This commit:

  • Changes drop-compiler-hints to have an option that keeps util.assume.int ops
  • Adds rewrites to the LLVMGPU and SPIRV lowerings that erase these ops
  • Changes the rewrites for hal.interface.constant.load to look for util.assume.int ops in the input IR and use them to add annotations to the loaded constant
    • In the LLVM case, these annotations take the form of a `range(iN lb, ub)` attribute on the corresponding function parameter
    • For SPIR-V, these annotations are calls to KHR_AssumeTrue if the capability is available
  • This commit also adds a case for integer assumption operations to the SPIR-V i64 emulation pass

While I was here, I converted some of the LLVM lowering patterns to use ConvertOpToLLVMPattern<>.
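Concretely, the intent on the LLVM path is roughly the following (a hand-written sketch, not output from this patch; the kernel name, argument names, and bounds are invented for illustration). A push constant that `util.assume.int` asserts to be in [0, 4080] and divisible by 16 would surface as a `range` parameter attribute on the kernel, where `range(iN lb, ub)` denotes the half-open interval [lb, ub):

```llvm
; Hypothetical kernel signature after lowering: push constant 0 carries
; its known range as a parameter attribute (hence the ub of 4081).
define void @kernel(i32 range(i32 0, 4081) %push_const0, ptr %buf) {
  ret void
}
```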

@MaheshRavishankar (Contributor) left a comment:

Do we have any idea of the performance impact of the changes here? Basically, are there some things that we know LLVM can pick up to make better use of this information?

// If the constant has non-trivial assumptions placed on it about
// its min and max values or divisibility, use that information to
// annotate the corresponding arguments.
if (op.getResult().hasOneUse()) {

Why the one-use condition?

@@ -28,8 +29,10 @@ class DropCompilerHintsPass
       op.replaceAllUsesWith(op.getOperands());
       op.erase();
     } else if (auto op = dyn_cast<IREE::Util::AssumeIntOp>(genericOp)) {
-      op.replaceAllUsesWith(op.getOperands());
-      op.erase();
+      if (!keepAssumeInt) {
Nit: Convert this to

if (auto op = dyn_cast<IREE::Util::OptimizationBarrierOp>(...)) {
  ...
  return;
}
if (auto op = dyn_cast<IREE::Util::AssumeIntOp>(...)) {
  if (keepAssumeInt) return;
  .....
}

@krzysz00 (Contributor, Author) commented Dec 5, 2024

I'll try to find a way to perf this, especially since it looks like I need to get index narrowing running on the MLIR side. But an eyeball of the IR (when combined with my linearize-mma changes, which may be related) revealed a lot more `or disjoint`, `nuw`, and `nsw` than before, and I do know that inferring those is codegen-relevant.
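To illustrate why those flags matter (a hand-written LLVM IR sketch, not output from this patch; the function, names, and bounds are invented): once a push constant carries a `range` fact, LLVM can prove that small index arithmetic cannot wrap and mark it `nuw nsw`, which feeds later addressing and aliasing optimizations:

```llvm
; %i is known to be in [0, 65536), so %i * 4 is at most 262140 and
; cannot overflow i32: the multiply can be marked nuw nsw. Without the
; range fact, it must be treated as potentially wrapping.
define i32 @load_elem(ptr %base, i32 range(i32 0, 65536) %i) {
  %byte_off = mul nuw nsw i32 %i, 4
  %addr = getelementptr i8, ptr %base, i32 %byte_off
  %v = load i32, ptr %addr
  ret i32 %v
}
```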

@krzysz00 krzysz00 force-pushed the keep-range-divisibility-annotations branch 2 times, most recently from d567cd2 to e26a966 on December 12, 2024 at 17:11
@krzysz00 krzysz00 force-pushed the keep-range-divisibility-annotations branch from e26a966 to d240e2b on December 13, 2024 at 20:18
@krzysz00 (Contributor, Author) commented:

Ok, so, on the performance front, I ran the IREE gemm kernel benchmarks with both my patch stack (this plus the integer range narrowing PR plus linearize-mma) and a recent main commit and then looked at the TFlops changes.

So, here are the summary statistics for the percent increase from my patches:

In [32]: ((branch['tflops'] - mainline['tflops']) / mainline['tflops'] * 100).describe()
    ...:
Out[32]:
count    582.000000
mean       0.820292
std        7.603068
min      -21.561822
25%       -2.335348
50%        0.000000
75%        3.215377
max       74.300112
Name: tflops, dtype: float64

However, that included a bunch of matvecs and other things that are really susceptible to benchmark noise. So, if I restrict myself to only configs where main got at least 10 TFLOP/s, I get:

In [33]: ((branch_fast['tflops'] - mainline_fast['tflops']) / mainline_fast['tflops'] *
    ...: 100).describe()
Out[33]:
count    212.000000
mean       1.270069
std        4.050435
min      -11.428650
25%       -1.167054
50%        0.817124
75%        3.703665
max       12.999693
Name: tflops, dtype: float64

I haven't yet benchmarked the slow Llama dispatch specifically; that's waiting on fixes to this and other related PRs, where I realized I missed a few opportunities.

I think the conclusion here is that I've dug up a marginal, but real, series of performance improvements.
