[TileAndFuse] Add thread groups for convolution ops #695
Merged
@@ -1,5 +1,5 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(func.func(iree-amdaie-tile-and-fuse{tiling-level=0}))' --split-input-file %s | FileCheck %s --check-prefix=TILE-LEVEL-0
-// RUN: iree-opt --pass-pipeline='builtin.module(func.func(iree-amdaie-tile-and-fuse{tiling-level=1}))' --split-input-file %s | FileCheck %s --check-prefix=TILE-LEVEL-1
+// RUN: iree-opt --pass-pipeline='builtin.module(func.func(iree-amdaie-tile-and-fuse{tiling-level=1}))' --split-input-file --verify-diagnostics %s | FileCheck %s --check-prefix=TILE-LEVEL-1
 // RUN: iree-opt --pass-pipeline='builtin.module(func.func(iree-amdaie-tile-and-fuse{tiling-level=0 tile-elementwise=false}))' --split-input-file %s | FileCheck %s --check-prefix=TILE-MATMUL-ONLY

 func.func @matmul_static(%arg0: tensor<8x16xi32>, %arg1 : tensor<16x8xi32>) -> tensor<8x8xi32> {
@@ -32,7 +32,25 @@ func.func @conv_2d_nhwc_hwcf(%arg0: tensor<2x14x14x32xbf16>, %arg1: tensor<3x3x3
 // TILE-LEVEL-0-SAME: {
 // TILE-LEVEL-0: linalg.fill
 // TILE-LEVEL-0: linalg.conv_2d_nhwc_hwcf
-// TILE-LEVEL-0: }
+// TILE-LEVEL-0: } {mapping = [#gpu.block<y>, #gpu.block<x>, #gpu.block<z>]}
+
+// TILE-LEVEL-1: @conv_2d_nhwc_hwcf
+// TILE-LEVEL-1: scf.forall
+// TILE-LEVEL-1-SAME: {
+// TILE-LEVEL-1: linalg.fill
+// TILE-LEVEL-1: linalg.conv_2d_nhwc_hwcf
+// TILE-LEVEL-1: } {mapping = [#gpu.thread<z>, #gpu.thread<linear_dim_0>, #gpu.thread<y>, #gpu.thread<x>]}

This looks good to me.

 // -----

+func.func @conv_2d_nhwc_hwcf_unsupported_tiling(%arg0: tensor<2x14x14x32xbf16>, %arg1: tensor<3x3x32x64xbf16>) -> tensor<2x12x12x64xf32> {
+  %cst = arith.constant 0.000000e+00 : f32
+  %0 = tensor.empty() : tensor<2x12x12x64xf32>
+  %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<2x12x12x64xf32>) -> tensor<2x12x12x64xf32>
+  // expected-error @+1 {{'linalg.conv_2d_nhwc_hwcf' op has requested tile sizes [1, 4, 4, 4, 0, 0, 0]. Currently we only support tiling thread dimensions with at most 2 dimensions having a tile size greater than 1, there are 3 here.}}
+  %2 = linalg.conv_2d_nhwc_hwcf {dilations = dense<1> : vector<2xi64>, lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0, 4, 4, 4, 0, 0, 0], [1, 4, 4, 4, 0, 0, 0], [0, 0, 0, 0, 1, 1, 8]]>, strides = dense<1> : vector<2xi64>} ins(%arg0, %arg1 : tensor<2x14x14x32xbf16>, tensor<3x3x32x64xbf16>) outs(%1 : tensor<2x12x12x64xf32>) -> tensor<2x12x12x64xf32>
+  return %2 : tensor<2x12x12x64xf32>
+}
+
+// -----
In my opinion, the order of mapping attributes for block and thread should correspond to each other. Although we are not using the block attributes at the moment, it's good to keep the attributes in the same order.
Yeah, so the logic is actually identical for threads and blocks. What's happening in this example is that the tiling sizes at thread and block levels are different:
For blocks: [0, 4, 4, 4, 0, 0, 0]
For threads: [1, 1, 4, 4, 0, 0, 0]
The logic implemented in this PR works as follows:
First, assign attributes to dimensions with tile size greater than 1. For threads, that is dimensions 2 and 3.
Second, assign attributes to dimensions with tile size equal to 1. For threads, that is dimensions 0 and 1.
The attributes are assigned in the order y, then x, then z, then linear_dim_0, linear_dim_1, etc. For [1, 1, 4, 4, 0, 0, 0], after step 1 the assigned dimensions are [none, none, y, x, none, none, none], and after step 2 they are [z, linear_dim_0, y, x, none, none, none]. And that is why at the thread level we end up with [z, linear_dim_0, y, x].
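To make the two steps concrete, here is a small Python sketch of that assignment order. It is illustrative only, not the actual pass implementation (which lives in C++); the function name assign_thread_mapping is hypothetical, and the "at most 2 dimensions with tile size greater than 1" check mirrors the diagnostic exercised in the new unsupported-tiling test.

def assign_thread_mapping(tile_sizes):
    # Attributes are handed out in the order y, x, z, linear_dim_0, linear_dim_1, ...
    attrs = ["y", "x", "z"] + [f"linear_dim_{i}" for i in range(len(tile_sizes))]
    mapping = ["none"] * len(tile_sizes)

    large = [i for i, s in enumerate(tile_sizes) if s > 1]
    unit = [i for i, s in enumerate(tile_sizes) if s == 1]

    # Mirrors the diagnostic in the new test: at most 2 thread dimensions
    # may have a tile size greater than 1.
    if len(large) > 2:
        raise ValueError(
            f"requested tile sizes {tile_sizes}: at most 2 dimensions may have "
            f"a tile size greater than 1, there are {len(large)} here")

    it = iter(attrs)
    for i in large:  # step 1: dims with tile size > 1 get attributes first
        mapping[i] = next(it)
    for i in unit:   # step 2: dims with tile size == 1 get the remaining attributes
        mapping[i] = next(it)
    return mapping   # dims with tile size 0 are left unmapped ("none")

print(assign_thread_mapping([1, 1, 4, 4, 0, 0, 0]))
# -> ['z', 'linear_dim_0', 'y', 'x', 'none', 'none', 'none']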
Thanks for the explanation. Yeah, I can follow the steps to set up the mapping attributes. It's just not typical that the attributes are different for the same dim in block and thread mapping. I don't have a good way to solve this other than hardcoding the dimensions. On the other hand, since the block attributes are not used anyway, maybe we could remove these block attributes?
I just tried removing block dimension tiling, but for small matmul examples where the block tile sizes are all 1, there's a problem: an scf.forall without mapping attributes gets canonicalized away when its iteration space is size 1, i.e. the block-level scf.forall gets removed completely. Ideally we'd be able to work without this outer scf.forall, but currently the passes aren't set up to handle this, I guess. So removing block dimensions for matmuls isn't something we can immediately do.
I could remove them for convolution, but we might have the same issue when we have small convolutions.
So yeah, not sure. Maybe, for now, it's ok to keep the block dimensions as they are for convolution?
I see... We should find a way to solve the scf.forall canonicalization problem. Could you add a TODO comment for the block mapping attributes?
Ok, I'll add a comment. Thanks for accepting :)