[tuner] Refactor tuner flow to use unified functions for models and dispatches #577

Draft
Max191 wants to merge 1 commit into main

Conversation

@Max191 Max191 commented Nov 20, 2024

This PR refactors the tuner flow into a unified compilation + benchmark path, and adds an example test showing how to use the new flow. The PR does not remove any code from the old paths, because the new path requires a newer version of iree-compiler that is not yet compatible with most of the tuner code. This PR is quite lengthy, so the new flow is described below:

Goals and Motivations

The purpose of this PR is first to unify the model and dispatch compilation and benchmark flow, and second to remove much of the spooky action at a distance between tuner clients and the internal tuner implementation. The first is accomplished by having a single compile and benchmark function, and the second is greatly improved by attempting to hide as much of the CandidateTracker information from the client as possible.

Candidate Generation

Candidate generation uses the same constraint logic as before to determine a set of potential configurations, and then directly generates transform dialect (TD) specs based on the input MLIR file and the generated constraints. The TD specs are created by matching specific operation types in the input MLIR file and creating a transform.iree.match.cast_compatible_dag_from_root op in the TD named sequence matcher. This currently uses a mix of Python bindings for matching root ops and string formatting for building the TD spec, but can eventually shift to using exclusively Python bindings.
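
As a rough illustration of the string-formatting half of this, a sketch is below. The helper name is hypothetical and the spec skeleton is simplified; only transform.iree.match.cast_compatible_dag_from_root comes from the actual flow described above.

def format_td_spec(root_op: str) -> str:
    # root_op is the printed form of the root op found via the Python
    # bindings; it is spliced into a named sequence matcher that checks for
    # a cast-compatible DAG rooted at that op.
    return f"""
module attributes {{ transform.with_named_sequence }} {{
  transform.named_sequence @match_op(%op: !transform.any_op {{transform.readonly}}) -> !transform.any_op {{
    transform.iree.match.cast_compatible_dag_from_root %op {{
      {root_op}
    }} : (!transform.any_op) -> ()
    transform.yield %op : !transform.any_op
  }}
}}
"""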

The main difference in the new candidate generation is that only TD specs are created, instead of generating the full configured source for later compilation. The source is stripped of compilation info and compiled with the TD spec during the compile step.
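
For example, a minimal sketch of the strip step, assuming the iree-codegen-strip-compilation-info pass can be run through iree-opt (the tuner's actual plumbing may differ):

import subprocess

def strip_compilation_info(input_path: str, output_path: str) -> None:
    # Remove any existing compilation info from the dispatch source so that
    # it compiles the same way a full model does; the TD spec then supplies
    # the tuned configuration at compile time.
    subprocess.run(
        ["iree-opt", "--iree-codegen-strip-compilation-info", input_path, "-o", output_path],
        check=True,
    )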

Note: Candidate generation with the new path was only implemented for MMT and Conv in this initial PR. The other operations can be supported fairly easily after this PR is merged.

Candidate Compilation

Candidates are now compiled with a single function for both models and dispatches. There are a few primary differences here:

  • Candidates are stripped of their compilation info with the iree-codegen-strip-compilation-info pass before compilation. This allows dispatches to be compiled in the same way as models.
  • The TuningClient has a new function, get_iree_compile_flags, which replaces get_dispatch_compile_command and get_model_compile_command. The new function hides the candidate trackers from the client and is only meant to return additional iree-compile flags (not including the source file path or iree-hal-target-backends). This simplifies TuningClient implementations significantly and makes it much easier to write one (see the sketch after this list).
  • The compile function takes an optional argument to use as the compilation input. This overrides, at compile time, whatever benchmark file was used during candidate generation, which allows reusing the same compile path for the full model by substituting the model IR as the input.
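
A minimal sketch of a client under the new interface (the class name and import path are hypothetical; the method name and the gfx942 target flag come from this PR and its example test):

import libtuner  # the tuner library; exact import path may differ

class ExampleTuner(libtuner.TuningClient):
    def get_iree_compile_flags(self) -> list[str]:
        # Return only the extra iree-compile flags; the tuner supplies the
        # (stripped) source file and --iree-hal-target-backends itself.
        return ["--iree-hip-target=gfx942"]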

Candidate Benchmarking

Similarly to compilation, get_dispatch_benchmark_command and get_model_benchmark_command are replaced by a single get_iree_benchmark_module_flags function. This function hides the candidate trackers from the client and simply expects a list of extra flags for iree-benchmark-module (not including --device, --module, or --benchmark_format). The benchmark output parsing also had to be rewritten to match the new benchmarking command.
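
Extending the hypothetical sketch above to the benchmark side, assuming iree-benchmark-module emits Google Benchmark's JSON schema (which is what --benchmark_format=json produces):

import json

import libtuner  # the tuner library; exact import path may differ

class ExampleTuner(libtuner.TuningClient):
    def get_iree_benchmark_module_flags(self) -> list[str]:
        # Return only the extra flags; the tuner supplies --device, --module,
        # and --benchmark_format itself.
        return ["--benchmark_repetitions=3"]

def mean_benchmark_time(json_output: str) -> float:
    # With --benchmark_repetitions, per-repetition entries carry
    # run_type == "iteration"; aggregate entries (mean/median/stddev) are
    # skipped here and the mean is recomputed from the raw repetitions.
    runs = [
        b["real_time"]
        for b in json.loads(json_output)["benchmarks"]
        if b.get("run_type") == "iteration"
    ]
    return sum(runs) / len(runs)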

@Max191 Max191 (Author) commented Nov 20, 2024

This PR is not really ready to land yet, since it requires an updated iree-compiler version. I'm mainly sending this out right now for visibility.

EDIT: I also realized I forgot to add the actual TD spec application function in the TD spec generation. I will add that in.

@kuhar kuhar (Member) left a comment

Overall looks great, thanks for the detailed description and breakdown of the plan!

We should split it into smaller PRs and start landing 😸

class TestTuner(libtuner.TuningClient):
    def __init__(self):
        self.compile_flags = [
            "--iree-hip-target=gfx942",
Member

For dispatches, we don't need to provide the target as dispatches are already configured.

But we will need this for the full model.

Comment on lines +36 to +38
        self.benchmark_flags = [
            "--benchmark_repetitions=3",
            "--benchmark_format=json",
Member

Eventually we should use the bindings for benchmarking too. Just FYI, this is definitely outside the scope of this PR.

Author

Yeah, that's actually what I did at first, but the compiler bindings did not seem to recognize many of the compiler codegen flags. I didn't dig too deep into it, but I think we may need to expose more flags to the compiler API in IREE before we can do this for compilation. And since I was already using subprocess for compilation, I just did the same for benchmarking for consistency.
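
For reference, the subprocess-based compile invocation mentioned here is roughly of this shape (the function name, paths, and flag list are placeholders, not the PR's exact code):

import subprocess

def compile_candidate(source_path: str, output_path: str, extra_flags: list[str]) -> bool:
    # Invoke iree-compile as a subprocess instead of the Python compiler
    # bindings, since some codegen flags are not exposed through the API yet.
    cmd = ["iree-compile", source_path, "-o", output_path, *extra_flags]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0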

@@ -1,5 +1,5 @@
 [project]
-name = "SHARK Tuner"
+name = "SHARK-Tuner"
Member

Why?

@Max191 Max191 (Author) Nov 21, 2024

The space was causing my pip install of the tuner to fail for some reason, so I just made it a dash. I forgot to remove this change before pushing. I'm not sure why it fails to pip install on my machine (and evidently not on the CI), but if there is no preference for having a space then I think it's probably safer to use a dash.

Member

Oh, I've never tried installing it, so I don't know if it actually works. I mostly copy-pasted from another project file in this repo.

Comment on lines +216 to +222
        tile_sizes = [0, 0]
        tile_sizes.append(configuration.tile_sizes[2])
        return tile_sizes

    def get_workgroup_tile_sizes(self, configuration: Configuration):
        tile_sizes = configuration.tile_sizes[:2]
        tile_sizes.append(0)
Member

We will probably also have to refactor how we store tile sizes to support other pipelines.

Author

Yeah, there are a number of additional things that could probably use refactoring, but I left that out of the scope of this PR since it is already very big. I think it makes sense to refactor this when we move the TD generation to use purely Python bindings.
