[Proposal] Support root->logical transforms in Fusion inputs #3366
Comments
My general concern is that these approaches would not retain the same information. I'd feel more comfortable if this scheduling were done only by a scheduler rather than more globally as a preseg pass. I'm doing something similar for slice and concat.
FYI: Instead of root, I think @wujingyue was thinking about binding the allocation domain, for distributed support: #3282
Thanks for tagging me! I think we are trying to overload this poor at::Tensor with too many meanings :) I was thinking of letting at::Tensor match allocation because it has limitations representing more "abstract" tensor domains like logical. I suspect allocation would also work for this case as long as transforms don't have to go in one direction (today, they typically flow from logical to allocation). Wdyt?
We can revisit this later if needed. For now, because of simplicity and smaller scope, I'm going to pursue #3372 instead.
Yeah I like that. The allocation domain is really telling us how the input should look in memory which is all we need. Really once the fusion is defined I think the only reason we care at all about the logical size of input |
NOTICE: See #3372
This is a proposal to fully support Fusion input TensorViews that contain non-trivial root domains. The ATen tensor passed in should then match the root domain of the fusion input, not the logical domain.
Motivation
The primary motivation for this proposal is basically #1628. Usually for Hopper matmul we will want to load both operands to smem using TMA, then directly call the mma instruction using those smem operands. If the Fusion inputs are [M, K] and [N, K], they must be broadcast to [M, 1, K] and [1, N, K] before they can pass through the MmaOp, which we do using a BroadcastOp in mma_utils::MatmulPattern::translateToMmaOp(). This introduces a tensor that we can't get rid of in our current system.
Approach
I propose that we do the following:
- [...] CatOp.
- [...] ExpressionEvaluator and bind tv->getMaybeRootDomain() instead of tv->getLogicalDomain() to the received shapes of input tensors.
- [...] SchedulerRuntimeInfo, which also handles the at::Tensor and needs to know about the root and/or logical domain.
I believe this is all that is needed. We don't currently use the root domain for input tensors, and broadcasts should not affect the actual memory layout, so it is not a problem that the allocation domain matches the logical domain rather than the shape of the ATen tensor.
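To make the binding change concrete, here is a minimal toy model in Python. It is not nvFuser code: the ToyInput class and its bind_shape and logical_shape methods are hypothetical stand-ins for a fusion input TensorView and the shape-binding step, and the sketch assumes the only root->logical transform is the insertion of broadcast axes of extent 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyInput:
    """Hypothetical stand-in for a fusion input with root and logical domains."""
    name: str
    root: List[str]               # symbolic extents of the root domain, e.g. ["M", "K"]
    logical: List[Optional[str]]  # logical domain; None marks a broadcast axis

    def bind_shape(self, shape: Tuple[int, ...]) -> Dict[str, int]:
        """Bind the received tensor shape to the root domain, not the logical one."""
        if len(shape) != len(self.root):
            raise ValueError(
                f"{self.name}: expected rank {len(self.root)}, got {len(shape)}"
            )
        return dict(zip(self.root, shape))

    def logical_shape(self, bindings: Dict[str, int]) -> Tuple[int, ...]:
        """Derive the logical shape from bound extents; broadcast axes have extent 1."""
        return tuple(1 if ax is None else bindings[ax] for ax in self.logical)


# Matmul operands as received: [M, K] and [N, K].
a = ToyInput("A", root=["M", "K"], logical=["M", None, "K"])
b = ToyInput("B", root=["N", "K"], logical=[None, "N", "K"])

bindings: Dict[str, int] = {}
bindings.update(a.bind_shape((4, 8)))
bindings.update(b.bind_shape((6, 8)))

print(a.logical_shape(bindings))  # (4, 1, 8)
print(b.logical_shape(bindings))  # (1, 6, 8)
```

In this toy model the received tensor keeps its 2D shape, and the broadcast axis only appears when the logical shape is derived, so no separate broadcast tensor is materialized.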
Details
Suppose we have
We can translate this to the following:
Specifically, what was done:
Possible challenges
Allreduce
One challenge is "allreduce", which is a pattern we detect at lowering/codegen where we reduce a dimension then broadcast a new dimension in its place immediately.
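As a rough illustration of the pattern, the sketch below is a toy model in Python, not nvFuser IR. It represents a fusion as a list of hypothetical Op tuples and flags any reduction whose output is broadcast back along the same axis. The tuple layout and the find_allreduce helper are assumptions made for this example only.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    kind: str  # "reduce", "broadcast", or any other op name
    inp: str   # name of the input tensor
    out: str   # name of the output tensor
    axis: int  # axis reduced or inserted (-1 if not applicable)


def find_allreduce(ops: List[Op]) -> List[Tuple[Op, Op]]:
    """Return (reduce, broadcast) pairs where the broadcast re-inserts the reduced axis."""
    pairs = []
    for red in ops:
        if red.kind != "reduce":
            continue
        for bc in ops:
            # The broadcast must consume the reduction output directly
            # and insert a new axis where the reduced one used to be.
            if bc.kind == "broadcast" and bc.inp == red.out and bc.axis == red.axis:
                pairs.append((red, bc))
    return pairs


ops = [
    Op("reduce", "tv0", "tv1", axis=1),     # [M, N] -> [M]
    Op("broadcast", "tv1", "tv2", axis=1),  # [M] -> [M, 1]
    Op("sub", "tv0", "tv3", axis=-1),       # tv3 = tv0 - tv2 (second input omitted)
    Op("broadcast", "tv4", "tv5", axis=0),  # broadcast of a fusion input, not an allreduce
]

for red, bc in find_allreduce(ops):
    print(f"allreduce: {red.inp} -> {red.out} -> {bc.out} on axis {red.axis}")
```

Only the first broadcast is flagged, because it is the only one that consumes a reduction output on the reduced axis. The broadcast of tv4 is the kind that could be folded into the input.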
If we ignore this pattern while zipping up BroadcastOp then we might translate this to
I think patterns like this are easy to detect and we can leave the BroadcastOp in place in these cases, but we should be careful.
I think this is the only way we could actually have a BroadcastOp in the fusion if we implement this proposal as a preseg pass. In that case, we could also go ahead and be done with BroadcastOp once and for all if we did something like introduce IterType::AllReduce to replace the reduced+broadcasted axis.
Aliasing
If an input tensor has a root domain and it is aliased with an output tensor, should this be allowed? I think so but I haven't thought very deeply about it, so I'd probably refuse to do such aliasing until needed.
Summary
Initially we can make light use of this and apply it only to the prologue of translated matmuls. However, if it works well, it might be a nice simplifying step that we could run as a preseg pass.
Related: