-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Depthwise convolution (no new pipeline) #565
Conversation
Does aievec support |
I've asked @jsetoain about this IR, hoping for some feedback tomorrow |
I was going to say that aievec doesn't need to support |
tileSizeLevel0 = { | ||
/* N */ 0, /* output width */ 4, /* output height */ 1, | ||
/* channel */ 16, /* kernel height */ 0, /* kernel width */ 0}; | ||
|
||
// Inner-most scf.forall tiling. The iteration space corresponds to | ||
// individual AIE cores. TODO(newling) | ||
// 1) check that the core array of shape Nx(16/4) is valid. N columns? | ||
tileSizeLevel1 = { | ||
/* N */ 1, /* output width */ 0, /* output height */ 0, | ||
/* channel */ 4, /* kernel height */ 0, /* kernel width */ 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to Conv2d channel last, I think we should first try tile sizes as
tileSizeLevel0 = {0, 4, OW, OC, 0, 0};
tileSizeLevel1 = {1, 1, OW, OC, 0, 0};
tileSizeLevel2 = {0, 0, 0, 0, 1, 1};
If the batch size = 1, we could use 4x4 array, which means OW (OC) in the first level can be 4OW (4OC).
With current mlir-air head (8727a53ccd5ad771deb15b404d7a81db07d4c7ba) depthwise with i32 inputs and outputs shall pass the test. |
@jsetoain for bf16->f32 vectorization, I got error
The IR snippet
|
I don't understand why you're getting that error. When I try to lower your code snippet, I find issues with this:
I'm not entirely sure how the That said, the |
Update: I've noticed an easy-to-fix issue ( |
feca6f4
to
9cca226
Compare
c1a7d10
to
23ff9ef
Compare
Below are the IR evolutions for bf16->f32 and i8->i32 depthwise convolutions. Compilation 1:With this tiling strategy, there is an elementwise multiply-add on 16 elements (which for i8 I think is incorrect for AIE?)
which I think is because there was a problem earlier in lowering. Compilation 2:This does the elementwise multiply-add on 32 elements (which for i8 I think is correct for AIE?)
in aievec-to-llvm. Compilation 3:This is for bf16... added here for completeness but I'd like to focus on i8 for now.
@jsetoain can you please take a look at the file attached (compilation 2 I think is most relevant) and see if the lowering through aievec looks sensible? Note that is possible that there is something which Maks didn't port in c1f4984 (I hope we don't have to be doing this porting for too long!) |
WIP: need to resolve lowering issues. Example IR: