[tuner] Refactor tuner flow to use unified functions for models and dispatches #577

Draft
Max191 wants to merge 1 commit into main

Conversation

@Max191 Max191 commented Nov 20, 2024

This PR refactors the tuner flow into a unified compilation + benchmark path, and adds an example test showing how to use the new flow. The PR does not remove any code from the old paths, because the new path requires a newer version of iree-compiler that is not yet compatible with most of the tuner code. This PR is quite lengthy, so the new flow is described below:

Goals and Motivations

The purpose of this PR is first to unify the model and dispatch compilation and benchmark flow, and second to remove much of the spooky action at a distance between tuner clients and the internal tuner implementation. The first is accomplished by having a single compile and benchmark function, and the second is greatly improved by attempting to hide as much of the CandidateTracker information from the client as possible.

Candidate Generation

Candidate generation uses the same constraint logic as before to determine a set of potential configurations, and then directly generates transform dialect (TD) specs based on the input MLIR file and the generated constraints. The TD specs are created by matching specific operation types in the input MLIR file and creating a transform.iree.match.cast_compatible_dag_from_root op in the TD named sequence matcher. This currently uses a mix of Python bindings for matching root ops and string formatting for building the TD spec, but can eventually shift to using exclusively Python bindings.
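
As a rough illustration of the string-formatting half of this, a sketch is below. The helper name is hypothetical and the spec skeleton is simplified; only transform.iree.match.cast_compatible_dag_from_root comes from the actual flow described above.

def format_td_spec(root_op: str) -> str:
    # root_op is the printed form of the root op found via the Python
    # bindings; it is spliced into a named sequence matcher that checks for
    # a cast-compatible DAG rooted at that op.
    return f"""
module attributes {{ transform.with_named_sequence }} {{
  transform.named_sequence @match_op(%op: !transform.any_op {{transform.readonly}}) -> !transform.any_op {{
    transform.iree.match.cast_compatible_dag_from_root %op {{
      {root_op}
    }} : (!transform.any_op) -> ()
    transform.yield %op : !transform.any_op
  }}
}}
"""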

The main difference in the new candidate generation is that only TD specs are created, instead of generating the full configured source for later compilation. The source is stripped of compilation info and compiled with the TD spec during the compile step.
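
For example, a minimal sketch of the strip step, assuming the iree-codegen-strip-compilation-info pass can be run through iree-opt (the tuner's actual plumbing may differ):

import subprocess

def strip_compilation_info(input_path: str, output_path: str) -> None:
    # Remove any existing compilation info from the dispatch source so that
    # it compiles the same way a full model does; the TD spec then supplies
    # the tuned configuration at compile time.
    subprocess.run(
        ["iree-opt", "--iree-codegen-strip-compilation-info", input_path, "-o", output_path],
        check=True,
    )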

Note: Candidate generation with the new path was only implemented for MMT and Conv in this initial PR. The other operations can be supported fairly easily after this PR is merged.

Candidate Compilation

Candidates are now compiled with a single function for both models and dispatches. There are a few primary differences here:

  • Candidates are stripped of their compilation info with the iree-codegen-strip-compilation-info pass before compilation. This allows dispatches to be compiled in the same way as models.
  • The TuningClient has a new function, get_iree_compile_flags, which replaces get_dispatch_compile_command and get_model_compile_command. The new function hides the candidate trackers from the client and is only meant to return additional iree-compile flags (not including the source file path or iree-hal-target-backends). This simplifies TuningClient implementations significantly and makes it much easier to write one (see the sketch after this list).
  • The compile function takes an optional argument to use as the compilation input. This overrides, at compile time, whatever benchmark file was used during candidate generation, which allows reusing the same compile path for the full model by substituting the model IR as the input.
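
A minimal sketch of a client under the new interface (the class name and import path are hypothetical; the method name and the gfx942 target flag come from this PR and its example test):

import libtuner  # the tuner library; exact import path may differ

class ExampleTuner(libtuner.TuningClient):
    def get_iree_compile_flags(self) -> list[str]:
        # Return only the extra iree-compile flags; the tuner supplies the
        # (stripped) source file and --iree-hal-target-backends itself.
        return ["--iree-hip-target=gfx942"]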

Candidate Benchmarking

Similarly to compilation, get_dispatch_benchmark_command and get_model_benchmark_command are replaced by a single get_iree_benchmark_module_flags function. This function hides the candidate trackers from the client and simply expects a list of extra flags for iree-benchmark-module (not including --device, --module, or --benchmark_format). The benchmark output parsing also had to be rewritten to match the new benchmarking command.
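
Extending the hypothetical sketch above to the benchmark side, assuming iree-benchmark-module emits Google Benchmark's JSON schema (which is what --benchmark_format=json produces):

import json

import libtuner  # the tuner library; exact import path may differ

class ExampleTuner(libtuner.TuningClient):
    def get_iree_benchmark_module_flags(self) -> list[str]:
        # Return only the extra flags; the tuner supplies --device, --module,
        # and --benchmark_format itself.
        return ["--benchmark_repetitions=3"]

def mean_benchmark_time(json_output: str) -> float:
    # With --benchmark_repetitions, per-repetition entries carry
    # run_type == "iteration"; aggregate entries (mean/median/stddev) are
    # skipped here and the mean is recomputed from the raw repetitions.
    runs = [
        b["real_time"]
        for b in json.loads(json_output)["benchmarks"]
        if b.get("run_type") == "iteration"
    ]
    return sum(runs) / len(runs)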

@Max191 Max191 (Author) commented Nov 20, 2024

This PR is not really ready to land yet, since it requires an updated iree-compiler version. I'm mainly sending this out right now for visibility.

EDIT: I also realized I forgot to add the actual TD spec application function in the TD spec generation. I will add that in.

@kuhar kuhar (Member) left a comment

Overall looks great, thanks for the detailed description and breakdown of the plan!

We should split it into smaller PRs and start landing 😸

class TestTuner(libtuner.TuningClient):
    def __init__(self):
        self.compile_flags = [
            "--iree-hip-target=gfx942",
Member

For dispatches, we don't need to provide the target as dispatches are already configured.

But we will need this for the full model.

Comment on lines +36 to +38
        self.benchmark_flags = [
            "--benchmark_repetitions=3",
            "--benchmark_format=json",
Member

Eventually we should use the bindings for benchmarking too. Just FYI, this is definitely outside the scope of this PR.

Author

Yeah, that's actually what I did at first, but the compiler bindings did not seem to recognize many of the compiler codegen flags. I didn't dig too deep into it, but I think we may need to expose more flags to the compiler API in IREE before we can do this for compilation. And since I was already using subprocess for compilation, I just did the same for benchmarking for consistency.
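
For reference, the subprocess-based compile invocation mentioned here is roughly of this shape (the function name, paths, and flag list are placeholders, not the PR's exact code):

import subprocess

def compile_candidate(source_path: str, output_path: str, extra_flags: list[str]) -> bool:
    # Invoke iree-compile as a subprocess instead of the Python compiler
    # bindings, since some codegen flags are not exposed through the API yet.
    cmd = ["iree-compile", source_path, "-o", output_path, *extra_flags]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0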

@@ -1,5 +1,5 @@
 [project]
-name = "SHARK Tuner"
+name = "SHARK-Tuner"
Member

Why?

@Max191 Max191 (Author) Nov 21, 2024

The space was causing my pip install of the tuner to fail for some reason, so I just made it a dash. I forgot to remove this change before pushing. I'm not sure why it fails to pip install on my machine (and evidently not on the CI), but if there is no preference for having a space then I think it's probably safer to use a dash.

Member

Oh, I've never tried installing it, so I don't know if it actually works. I mostly copy-pasted from another project file in this repo.

Comment on lines +216 to +222
        tile_sizes = [0, 0]
        tile_sizes.append(configuration.tile_sizes[2])
        return tile_sizes

    def get_workgroup_tile_sizes(self, configuration: Configuration):
        tile_sizes = configuration.tile_sizes[:2]
        tile_sizes.append(0)
Member

We will probably also have to refactor how we store tile sizes to support other pipelines.

Author

Yeah, there are a number of additional things that could probably use refactoring, but I left that out of the scope of this PR since it is already very big. I think it makes sense to refactor this when we move the TD generation to use purely Python bindings.
