Enable MmaOp to receive unbroadcasted inputs #3372

Open
jacobhinkle opened this issue Nov 7, 2024 · 0 comments

This is a proposal to enable MmaOp to receive inputs shaped like [M, K] and [N, K] instead of [M, 1, K] and [1, N, K].

This is an alternative to #3366.

Motivation

Currently, MmaOp requires inputs that are at least 3D and whose dimensions all "line up". For example, the M dimension must be an Iteration domain in the A operand and a Broadcast domain in the B operand. This lets us use the default exact domain mapping between the operands and the MmaOp output. However, when translating a Fusion that contains MatmulOp or LinearOp to use MmaOp, we must introduce BroadcastOp nodes, which interferes with the optimal gmem->smem->mma pipeline on Hopper.
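To make the shape difference concrete, here is a minimal pure-Python sketch (not nvFuser code; the function names `mma_broadcasted` and `mma_unbroadcasted` are hypothetical and used only for illustration). It shows that multiplying `[M, 1, K]` by `[1, N, K]` with broadcasting and summing over K gives the same result as contracting `[M, K]` and `[N, K]` directly, so the broadcasts carry no extra information.

```python
def mma_broadcasted(a3, b3):
    """a3 has shape [M, 1, K] and b3 has shape [1, N, K].

    The size-1 axes stand in for Broadcast domains: every index along
    them reads element 0.
    """
    M, N, K = len(a3), len(b3[0]), len(a3[0][0])
    return [
        [sum(a3[m][0][k] * b3[0][n][k] for k in range(K)) for n in range(N)]
        for m in range(M)
    ]


def mma_unbroadcasted(a2, b2):
    """a2 has shape [M, K] and b2 has shape [N, K]; contract over K."""
    return [
        [sum(x * y for x, y in zip(row_a, row_b)) for row_b in b2]
        for row_a in a2
    ]


a2 = [[1, 2, 3], [4, 5, 6]]                         # M=2, K=3
b2 = [[1, 0, 1], [0, 1, 0], [2, 2, 2], [1, 1, 1]]   # N=4, K=3

a3 = [[row] for row in a2]   # [M, 1, K]: insert a size-1 N axis
b3 = [b2]                    # [1, N, K]: insert a size-1 M axis

assert mma_broadcasted(a3, b3) == mma_unbroadcasted(a2, b2)
```

In this toy model the only thing the extra axes provide is positional alignment between operands, which is the role the proposal below moves into the domain mapping instead.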

Proposed Approach

I propose to do the following:

  • Add attributes to the MmaOp
  • Add a special case in PairwiseLogicalDomainMap that will map the output domains to domains in the inputs. This is similar to what we do for SdpaFwdOp and SdpaBwdOp currently.
  • Update mma_utils::MatmulPattern::translateToMmaOp to skip inserting broadcasts.
  • Update the Ampere and Hopper matmul schedulers to not assume there is a broadcast M or N dimension in the ab and bb tensors.
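The mapping step in the second bullet can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not the `PairwiseLogicalDomainMap` implementation: the function `map_output_to_inputs` and the role labels are hypothetical, and the output is assumed to be laid out as `[M, N, K]` with K as the reduction axis. The point is that once each axis carries a role, output axes can be matched to input axes by role rather than by position, so no broadcast axes are needed to line the operands up.

```python
def map_output_to_inputs(a_roles, b_roles, out_roles):
    """Map each output axis position to the matching axis of each operand.

    Roles are plain strings ("M", "N", "K") in this sketch. An operand that
    does not carry a given role is simply absent from that axis's entry,
    which is where a Broadcast axis would otherwise have been required.
    """
    mapping = {}
    for out_pos, role in enumerate(out_roles):
        mapping[out_pos] = {
            name: roles.index(role)
            for name, roles in (("A", a_roles), ("B", b_roles))
            if role in roles
        }
    return mapping


# Unbroadcasted operands: A is [M, K], B is [N, K]; output assumed [M, N, K].
mapping = map_output_to_inputs(["M", "K"], ["N", "K"], ["M", "N", "K"])

assert mapping[0] == {"A": 0}           # output M maps only to A's axis 0
assert mapping[1] == {"B": 0}           # output N maps only to B's axis 0
assert mapping[2] == {"A": 1, "B": 1}   # K maps to axis 1 of both operands
```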