Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sharktank] Use reshape instead of view #681

Merged
merged 2 commits into from
Dec 13, 2024
Merged

[sharktank] Use reshape instead of view #681

merged 2 commits into from
Dec 13, 2024

Conversation

marbre
Copy link
Collaborator

@marbre marbre commented Dec 12, 2024

Fixes RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. by using reshape which occurs when running tests/layers/mmdit_test.py with Python 3.11.10 and torch==2.5.1+cpu.

Fixes `RuntimeError: view size is not compatible with input tensor's
size and stride (at least one dimension spans across two contiguous
subspaces). Use .reshape(...) instead.` by using `reshape`. which occurs
when running `tests/layers/mmdit_test.py` with Python 3.11.10 and
`torch==2.5.1+cpu`.
@ScottTodd ScottTodd requested review from sogartar and archana-ramalingam and removed request for ScottTodd December 12, 2024 18:30
@@ -41,7 +41,7 @@ def attention(q, k, v, pe):
q=q, k=k, v=v, a=None, is_causal=True, scale=None
)
x = ops.permute(x, (0, 2, 1, 3))
x = x.view(x.shape[0], x.shape[1], -1)
x = x.reshape(x.shape[0], x.shape[1], -1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played a bit with this example. What is going on is that the scaled dot product attention is returning an non-contiguous tensor.
In torch 2.5.1

x.shape = torch.Size([3, 24, 1536, 128]), x.stride() = (4718592, 196608, 128, 1)

While in torch 2.3.0 it produces

x.shape = torch.Size([3, 24, 1536, 128]), x.stride() = (4718592, 128, 3072, 1)

I guess by accident in 2.3.0 the stride was such that it allowed for the view.
I think on a lot of operations we implicitly assume that they will produce contiguous tensors. It seems that the doc does not state this, but I would bet that even if this was the intention on a lot of ops, it would not be stated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing and testing!

@marbre marbre merged commit 0bec8f1 into nod-ai:main Dec 13, 2024
8 checks passed
@marbre marbre deleted the mmdit branch December 13, 2024 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants