[Docs][Tuner] Add initial tuning spec docs (iree-org#19462)
General intro, list the main flags, show an example.

Issue: iree-org#19214
kuhar authored Dec 11, 2024
1 parent dea512b commit 52ce3d3
Showing 3 changed files with 138 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/website/docs/reference/index.md
@@ -18,4 +18,5 @@ repository.

* [Glossary](./glossary.md)
* [Optimization options](./optimization-options.md)
* [Tuning](./tuning.md)
* [Extensions](./extensions.md)
136 changes: 136 additions & 0 deletions docs/website/docs/reference/tuning.md
@@ -0,0 +1,136 @@
---
icon: octicons/meter-16
---

# Tuning

This page documents support for IREE dispatch tuning. The compiler supports
both default and user-provided tuning specs (specifications) that override the
compiler heuristics guiding dispatch code generation. In our experience, tuning
specs can provide meaningful speedups in model execution; for example, we
achieved a ~10% improvement on the Stable Diffusion XL (SDXL) model on the
MI300X GPU.

## Tuning specs

The default specs are shipped with the IREE compiler and are target-specific.
We aim to provide default tuning specs that cover the most in-demand hardware
and dispatches from the most popular ML models, although we do not guarantee
completeness.

User-provided tuning specs allow users to get the best performance on custom
models and hardware targets without having to modify the compiler source code
or needlessly special-case compiler heuristics.

The dispatch tuner that generates tuning specs is currently experimental and is
hosted
[in an external repo](https://github.com/nod-ai/shark-ai/tree/main/tuner). This
document describes how to work with tuning specs generated by the SHARK Tuner
or produced manually, but does not go into detail on how to generate these
specs.

## Flags

The use of tuning specs in `iree-compile` is controlled with the following
flags:

* `--iree-codegen-enable-default-tuning-specs` -- enables or disables the
default tuning specs shipped with the compiler.
* `--iree-codegen-tuning-spec-path` -- loads a user-specified tuning spec.
* `--iree-codegen-dump-tuning-specs-to` -- dumps final tuning specs to a
directory or standard output.

Note that both default and user-provided specs can be enabled at the same time.
The compiler will link them together and invoke the user-provided spec before
attempting the default one.
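
For illustration, an `iree-compile` invocation that enables the default specs,
loads a user-provided spec, and dumps the final linked spec might look like the
sketch below. The input, output, spec, and dump paths are placeholders, and any
target-selection flags required by your configuration are omitted:

```shell
iree-compile model.mlir -o model.vmfb \
    --iree-codegen-enable-default-tuning-specs=true \
    --iree-codegen-tuning-spec-path=my_spec.mlir \
    --iree-codegen-dump-tuning-specs-to=tuning_dump
```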

## Anatomy of a tuning spec

### Example

```mlir
module @my_spec attributes { transform.with_named_sequence } {
  transform.named_sequence @apply_op_config(%op: !transform.any_op {transform.readonly},
                                            %config: !transform.any_param {transform.readonly}) {
    transform.annotate %op "compilation_info" = %config : !transform.any_op, !transform.any_param
    transform.yield
  }

  transform.named_sequence
  @match_mmt_f16_f16_f32(%root: !transform.any_op {transform.readonly}) -> !transform.any_op {
    transform.match.operation_name %root ["linalg.generic"] : !transform.any_op
    %ins, %outs = transform.iree.match.cast_compatible_dag_from_root %root {
    ^bb0(%lhs: tensor<?x?xf16>, %rhs: tensor<?x?xf16>, %out: tensor<?x?xf32>):
      %7 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                                            affine_map<(d0, d1, d2) -> (d1, d2)>,
                                            affine_map<(d0, d1, d2) -> (d0, d1)>],
                           iterator_types = ["parallel", "parallel", "reduction"]}
        ins(%lhs, %rhs : tensor<?x?xf16>, tensor<?x?xf16>) outs(%out : tensor<?x?xf32>) {
      ^bb0(%in: f16, %in_0: f16, %acc: f32):
        %8 = arith.extf %in : f16 to f32
        %9 = arith.extf %in_0 : f16 to f32
        %10 = arith.mulf %8, %9 : f32
        %11 = arith.addf %acc, %10 : f32
        linalg.yield %11 : f32
      } -> tensor<?x?xf32>
    } : (!transform.any_op) -> (!transform.any_value, !transform.any_value)
    transform.yield %root : !transform.any_op
  }

  transform.named_sequence
  @match_mmt_2048x1280x5120_f16_f16_f32(%matmul: !transform.any_op {transform.readonly})
      -> (!transform.any_op, !transform.any_param) {
    %mmt = transform.include @match_mmt_f16_f16_f32 failures(propagate) (%matmul)
      : (!transform.any_op) -> !transform.any_op
    %lhs = transform.get_operand %matmul[0] : (!transform.any_op) -> !transform.any_value
    %rhs = transform.get_operand %matmul[1] : (!transform.any_op) -> !transform.any_value
    transform.iree.match.cast_compatible_type %lhs = tensor<2048x5120xf16> : !transform.any_value
    transform.iree.match.cast_compatible_type %rhs = tensor<1280x5120xf16> : !transform.any_value
    %config = transform.param.constant #iree_codegen.compilation_info<
      lowering_config = #iree_gpu.lowering_config<{promote_operands = [0, 1],
                                                   mma_kind = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
                                                   subgroup_m_count = 2, subgroup_n_count = 2,
                                                   reduction = [0, 0, 64],
                                                   workgroup = [64, 128, 0]}>,
      translation_info = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute
        workgroup_size = [256, 1, 1] subgroup_size = 64,
        {gpu_pipeline_options = #iree_gpu.pipeline_options<prefetch_shared_memory = true>}>
      > -> !transform.any_param
    transform.yield %matmul, %config : !transform.any_op, !transform.any_param
  }

  transform.named_sequence
  @__kernel_config(%variant_op: !transform.any_op {transform.consumed}) -> !transform.any_op
      attributes { iree_codegen.tuning_spec_entrypoint } {
    %res = transform.foreach_match in %variant_op
      @match_mmt_2048x1280x5120_f16_f16_f32 -> @apply_op_config
      : (!transform.any_op) -> !transform.any_op
    transform.yield %res : !transform.any_op
  }
}
```

### Explanation

Tuning specs are
[transform dialect](https://mlir.llvm.org/docs/Dialects/Transform/) libraries
that conform to the following format:

* All tuning spec entry points (named sequence ops) are marked with the
`iree_codegen.tuning_spec_entrypoint` attribute. They have a single argument
of type `!transform.any_op` and return a single value of type
`!transform.any_op`.
* All entry points in the final tuning specs must either read
(`transform.readonly`) or consume (`transform.consumed`) the argument.
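
As a minimal sketch of these requirements, a spec whose entry point performs no
matching and simply returns its argument could look as follows; the module name
is arbitrary, and the entry-point name mirrors the example above:

```mlir
module @minimal_spec attributes { transform.with_named_sequence } {
  // The entry point carries the `iree_codegen.tuning_spec_entrypoint` attribute
  // and takes/returns a single `!transform.any_op`.
  transform.named_sequence
  @__kernel_config(%op: !transform.any_op {transform.readonly}) -> !transform.any_op
      attributes { iree_codegen.tuning_spec_entrypoint } {
    // No matchers are registered here; the op is returned unchanged.
    transform.yield %op : !transform.any_op
  }
}
```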

The full example spec above attempts to match `linalg.generic` ops that
correspond to a matmul with the RHS operand transposed (a.k.a. mmt) of shape
`2048x1280x5120`, with `f16` operand element types and an `f32` result element
type.

If the match succeeds, the tuning spec applies the `compilation_info` attribute
that will drive the code generation. This attribute is considered a compiler
implementation detail; in general, each codegen pipeline has its own
requirements as to what is considered a valid compilation info and how to
interpret it.

Tuning specs get executed by the `Materialize User Configs` pass.
1 change: 1 addition & 0 deletions docs/website/mkdocs.yml
@@ -188,6 +188,7 @@ nav:
- "Other topics":
- Glossary: "reference/glossary.md"
- Optimization options: "reference/optimization-options.md"
- Tuning: "reference/tuning.md"
- Extensions: "reference/extensions.md"
- "Developers":
- "developers/index.md"
