[Dispatch] Don't sink collapse_shape through k2 dims #19379
Conversation
Force-pushed from 9f807c5 to 15172ac
Signed-off-by: Ian Wood <[email protected]>
Thanks!
Can you explain why we need this? I don't understand why it's not good to do this.
It's to deal with the fact that GPU codegen can't handle unit-length dimensions on the reduction dims. For some more context:
It's just a workaround for now until either can be implemented (or the unit dims get folded out during DropUnitExtentDims).
Right, I think passing inner unit dims to a dispatch is always bad. Shouldn't this be handled by the CollapseDimensions pass, though? My understanding was that reshapes are propagated to enable fusion opportunities, and then CollapseDimensions collapses the extra reshapes that weren't required for the dispatch. I'll unblock, but this seems like a hack because CollapseDimensions doesn't understand reduction dimensions well. Can you confirm if my thinking is right?
Yes, this is very hacky and should be handled by CollapseDimensions. The problem is that the attention op has dynamic dims, which need some work to support. We also need to figure out how to handle bitcast-like ops; this is where the unit dims are coming from. For example:

```mlir
%expanded_24 = tensor.expand_shape %214 [[0, 1, 2], [3]] output_shape [4, %21, 1, 128] : tensor<?x128xf16> into tensor<4x?x1x128xf16>
%323 = flow.tensor.bitcast %expanded_24 : tensor<4x?x1x128xf16>{%21} -> tensor<4x?x1x64xcomplex<f16>>{%21}
%collapsed_40 = tensor.collapse_shape %323 [[0], [1, 2], [3]] : tensor<4x?x1x64xcomplex<f16>> into tensor<4x?x64xcomplex<f16>>
```

But since …
Okay, I'll merge this so that Llama compilation is unblocked after Mahesh's PR, but I'll start working on a non-hack solution for this.
This change adds `linalgExtExpansionFn` to limit sinking of `collapse_shape` ops through `iree_linalg_ext.attention` to only the cases where the k2 dimensions are not expanded by the reshape fusion. Currently, GPU codegen cannot support unit dims on the k2 dimensions, so any `collapse_shape` that expands out unit dimensions on these dims will cause compilation errors.

This fixes the unit dim error in #19263, but it uncovered further, unrelated compilation errors tracked in #19377.