Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update to rocm 6.3 wheels #1192

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

johnnynunez
Copy link

@johnnynunez johnnynunez commented Jan 10, 2025

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: rocm labels Jan 10, 2025
@johnnynunez johnnynunez marked this pull request as ready for review January 10, 2025 18:28
@tenpercent
Copy link
Contributor

cc @bottler @danthe3rd

@bottler
Copy link
Contributor

bottler commented Jan 15, 2025

@tenpercent I think you have much more context on a PR like this than me. If you say you want it merged, I can merge it! But I can't really review it.

Copy link
Contributor

@tenpercent tenpercent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

win build is failing and torch version should be same between building and uploading flows

Copy link
Contributor

@tenpercent tenpercent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, let's wait for CI and merge

@bottler
Copy link
Contributor

bottler commented Jan 16, 2025

@tenpercent @johnnynunez Not sure if you see the errors on the CI? Like this on 6.1 and 6.2

cp: cannot stat 'all-dist/*torch2.5.1+rocm6.2*_facebookresearch/*.whl': No such file or directory

image

@tenpercent
Copy link
Contributor

@johnnynunez also it looks like rocm6.3 build hasn't actually built the flash attention extension
https://github.com/facebookresearch/xformers/actions/runs/12795689183/job/35728108356?pr=1192
typically the number of sources is around 600

@tenpercent
Copy link
Contributor

@johnnynunez I think the root cause is there is no torch wheel uploaded for rocm6.3 on download.pytorch.org. It'll appear there in some time. I think, for a meantime, we can build at least for rocm6.2

@johnnynunez
Copy link
Author

johnnynunez commented Jan 16, 2025

@johnnynunez I think the root cause is there is no torch wheel uploaded for rocm6.3 on download.pytorch.org. It'll appear there in some time. I think, for a meantime, we can build at least for rocm6.2

I ask to pytorch team, and I can confirm that:
https://dev-discuss.pytorch.org/t/pytorch-2-6-release-branch-cut-for-pytorch-core-is-completed/2656/5?u=johnnynunez

rocm6.3 is coming with pytorch 2.7
I think that we can keep it for build but not for upload wheels.
What do you think?

It's coming with pytorch 2.7
@tenpercent
Copy link
Contributor

tenpercent commented Jan 16, 2025

@johnnynunez I think the root cause is there is no torch wheel uploaded for rocm6.3 on download.pytorch.org. It'll appear there in some time. I think, for a meantime, we can build at least for rocm6.2

I ask to pytorch team, and I can confirm that: https://dev-discuss.pytorch.org/t/pytorch-2-6-release-branch-cut-for-pytorch-core-is-completed/2656/5?u=johnnynunez

rocm6.3 is coming with pytorch 2.7 I think that we can keep it for build but not for upload wheels. What do you think?

I think right now we need the torch wheel built for rocm6.3 as a dependency to build the xformers wheel for rocm6.3. Even though the cpp extension for ROCm doesn't use any API specific to rocm6.3
The alternative might be to use a torch wheel built for rocm6.2 but then there would be runtime warning about torch version mismatch
upd: in that place

msg = f"""xFormers was built for:
PyTorch {self.build_info.torch_version} with CUDA {self.build_info.cuda_version} (you have {torch.__version__})
Python {self.build_info.python_version} (you have {platform.python_version()})"""

@johnnynunez
Copy link
Author

johnnynunez commented Jan 16, 2025

@johnnynunez I think the root cause is there is no torch wheel uploaded for rocm6.3 on download.pytorch.org. It'll appear there in some time. I think, for a meantime, we can build at least for rocm6.2

I ask to pytorch team, and I can confirm that: https://dev-discuss.pytorch.org/t/pytorch-2-6-release-branch-cut-for-pytorch-core-is-completed/2656/5?u=johnnynunez
rocm6.3 is coming with pytorch 2.7 I think that we can keep it for build but not for upload wheels. What do you think?

I think right now we need the torch wheel built for rocm6.3 as a dependency to build the xformers wheel for rocm6.3. Even though the cpp extension for ROCm doesn't use any API specific to rocm6.3 The alternative might be to use a torch wheel built for rocm6.2 but then there would be a runtime warning about torch version mismatch

Maybe maintain this philosophy. Why did I do this? In my case, rocm is better and better in each version supporting new features and fewer errors builds. For example, I went from 364 compilation errors with rocm6.1 to 20 with rocm6.2 to 7 with rocm6.3 in flash attention for my w7900 dual slot

PY_ROCM = [(PY_VERSIONS[-1], ROCM_VERSIONS[-2])] # Always last version is upcoming version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: rocm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants