diff --git a/docs/website/docs/reference/index.md b/docs/website/docs/reference/index.md
index a5d1a50b3130..c8e361504d38 100644
--- a/docs/website/docs/reference/index.md
+++ b/docs/website/docs/reference/index.md
@@ -18,4 +18,5 @@ repository.

 * [Glossary](./glossary.md)
 * [Optimization options](./optimization-options.md)
+* [Tuning](./tuning.md)
 * [Extensions](./extensions.md)
diff --git a/docs/website/docs/reference/tuning.md b/docs/website/docs/reference/tuning.md
new file mode 100644
index 000000000000..fe4a7ff78ab8
--- /dev/null
+++ b/docs/website/docs/reference/tuning.md
@@ -0,0 +1,136 @@
---
icon: octicons/meter-16
---

# Tuning

This page documents support for IREE dispatch tuning. The compiler supports
both default and user-provided tuning specs (specifications) that override the
compiler heuristics guiding dispatch code generation. In our experience, tuning
specs can provide a meaningful speedup in model execution. For example, we
achieved a ~10% improvement on the Stable Diffusion XL (SDXL) model on the
MI300X GPU.

## Tuning specs

The default specs ship with the IREE compiler and are target-specific. We aim
to provide default tuning specs that cover the most in-demand hardware and the
dispatches from the most popular ML models, although we do not guarantee
completeness.

User-provided tuning specs are a mechanism that allows users to get the best
performance on custom models and hardware targets without having to modify the
compiler source code or needlessly special-case compiler heuristics.

Currently, the dispatch tuner that generates tuning specs is still experimental
and hosted
[in an external repository](https://github.com/nod-ai/shark-ai/tree/main/tuner).
This document describes how to work with tuning specs generated by the SHARK
Tuner or produced manually, but it does not go into detail on how to generate
these specs.

## Flags

The use of tuning specs in `iree-compile` is controlled with the following
flags:

* `--iree-codegen-enable-default-tuning-specs` -- enables or disables the
  default tuning specs shipped with the compiler.
* `--iree-codegen-tuning-spec-path` -- loads a user-specified tuning spec.
* `--iree-codegen-dump-tuning-specs-to` -- dumps final tuning specs to a
  directory or standard output.

Note that both default and user-provided specs can be enabled at the same time.
The compiler will link them together and invoke the user-provided spec before
attempting the default one.
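For example, an invocation that enables both kinds of specs might look like the
following sketch. The model, output, and spec file names as well as the target
selection flags are placeholders; substitute them for your own model and
hardware:

```shell
# Hypothetical invocation; assumes an MI300-class ROCm/HIP target (gfx942).
iree-compile model.mlir -o model.vmfb \
    --iree-hal-target-backends=rocm \
    --iree-hip-target=gfx942 \
    --iree-codegen-enable-default-tuning-specs=true \
    --iree-codegen-tuning-spec-path=my_spec.mlir \
    --iree-codegen-dump-tuning-specs-to=tuning_specs_dump  # Dump the final linked specs for inspection.
```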
## Anatomy of a tuning spec

### Example

```mlir
module @my_spec attributes { transform.with_named_sequence } {
transform.named_sequence @apply_op_config(%op: !transform.any_op {transform.readonly},
                                          %config: !transform.any_param {transform.readonly}) {
  transform.annotate %op "compilation_info" = %config : !transform.any_op, !transform.any_param
  transform.yield
}

transform.named_sequence
@match_mmt_f16_f16_f32(%root: !transform.any_op {transform.readonly}) -> !transform.any_op {
  transform.match.operation_name %root ["linalg.generic"] : !transform.any_op
  %ins, %outs = transform.iree.match.cast_compatible_dag_from_root %root {
  ^bb0(%lhs: tensor<?x?xf16>, %rhs: tensor<?x?xf16>, %out: tensor<?x?xf32>):
    %7 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                                          affine_map<(d0, d1, d2) -> (d1, d2)>,
                                          affine_map<(d0, d1, d2) -> (d0, d1)>],
                         iterator_types = ["parallel", "parallel", "reduction"]}
        ins(%lhs, %rhs : tensor<?x?xf16>, tensor<?x?xf16>) outs(%out : tensor<?x?xf32>) {
    ^bb0(%in: f16, %in_0: f16, %acc: f32):
      %8 = arith.extf %in : f16 to f32
      %9 = arith.extf %in_0 : f16 to f32
      %10 = arith.mulf %8, %9 : f32
      %11 = arith.addf %acc, %10 : f32
      linalg.yield %11 : f32
    } -> tensor<?x?xf32>
  } : (!transform.any_op) -> (!transform.any_value, !transform.any_value)
  transform.yield %root : !transform.any_op
}

transform.named_sequence
@match_mmt_2048x1280x5120_f16_f16_f32(%matmul: !transform.any_op {transform.readonly})
    -> (!transform.any_op, !transform.any_param) {
  %mmt = transform.include @match_mmt_f16_f16_f32 failures(propagate) (%matmul)
    : (!transform.any_op) -> !transform.any_op
  %lhs = transform.get_operand %matmul[0] : (!transform.any_op) -> !transform.any_value
  %rhs = transform.get_operand %matmul[1] : (!transform.any_op) -> !transform.any_value
  transform.iree.match.cast_compatible_type %lhs = tensor<2048x5120xf16> : !transform.any_value
  transform.iree.match.cast_compatible_type %rhs = tensor<1280x5120xf16> : !transform.any_value
  %config = transform.param.constant #iree_codegen.compilation_info<
    lowering_config = #iree_gpu.lowering_config<{promote_operands = [0, 1],
                                                 mma_kind = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
                                                 subgroup_m_count = 2, subgroup_n_count = 2,
                                                 reduction = [0, 0, 64],
                                                 workgroup = [64, 128, 0]}>,
    translation_info = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute
                                                      workgroup_size = [128, 2, 1] subgroup_size = 64,
                                                      {gpu_pipeline_options = #iree_gpu.pipeline_options<prefetch_shared_memory = true>}>
  > -> !transform.any_param
  transform.yield %matmul, %config : !transform.any_op, !transform.any_param
}

transform.named_sequence
@__kernel_config(%variant_op: !transform.any_op {transform.consumed}) -> !transform.any_op
    attributes { iree_codegen.tuning_spec_entrypoint } {
  %res = transform.foreach_match in %variant_op
      @match_mmt_2048x1280x5120_f16_f16_f32 -> @apply_op_config
    : (!transform.any_op) -> !transform.any_op
  transform.yield %res : !transform.any_op
}
}
```

### Explanation

Tuning specs are
[transform dialect](https://mlir.llvm.org/docs/Dialects/Transform/) libraries
that conform to the following format:

* All tuning spec entry points (named sequence ops) are marked with the
  `iree_codegen.tuning_spec_entrypoint` attribute. They have a single argument
  of type `!transform.any_op` and return a single value of type
  `!transform.any_op`.
* All entry points in the final tuning spec must either read
  (`transform.readonly`) or consume (`transform.consumed`) the argument.
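As a minimal illustration of this contract, the following hypothetical spec
(not shipped with IREE) is a valid no-op: it declares a single entry point with
the required attribute, argument, and result types, but applies no
configuration.

```mlir
// A minimal, do-nothing tuning spec (illustrative only). It satisfies the
// entry point contract -- one !transform.any_op argument, one
// !transform.any_op result, and the iree_codegen.tuning_spec_entrypoint
// attribute -- but matches nothing and yields its argument unchanged.
module @minimal_spec attributes { transform.with_named_sequence } {
transform.named_sequence
@__kernel_config(%op: !transform.any_op {transform.readonly}) -> !transform.any_op
    attributes { iree_codegen.tuning_spec_entrypoint } {
  transform.yield %op : !transform.any_op
}
}
```

A practical spec replaces the body with matchers and a
`transform.foreach_match`, as in the full example at the beginning of this
section.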
The full example spec above attempts to match `linalg.generic` ops that
implement a matmul with a transposed RHS operand (a.k.a. mmt) of shape
`2048x1280x5120`, with `f16` operand element types and an `f32` result element
type.

If the match succeeds, the tuning spec applies the `compilation_info`
attribute that drives code generation. This attribute is considered a compiler
implementation detail; in general, each codegen pipeline has its own
requirements as to what constitutes valid compilation info and how to
interpret it.

Tuning specs are executed by the `Materialize User Configs` pass.
diff --git a/docs/website/mkdocs.yml b/docs/website/mkdocs.yml
index 20ce3f014db0..e43683075333 100644
--- a/docs/website/mkdocs.yml
+++ b/docs/website/mkdocs.yml
@@ -188,6 +188,7 @@ nav:
     - "Other topics":
         - Glossary: "reference/glossary.md"
         - Optimization options: "reference/optimization-options.md"
+        - Tuning: "reference/tuning.md"
         - Extensions: "reference/extensions.md"
 - "Developers":
     - "developers/index.md"