[Distributed] Bump torch version #1225

Open · wants to merge 3 commits into main
Conversation

kwen2501 (Contributor)

Distributed inference (the pipeline-parallel part, to be specific) requires two features that landed in PyTorch nightly, so this bumps the torch version to 2.6.0.dev20240925.

Tested locally.

Cc: @Jack-Khuu @lessw2020
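
For anyone reproducing the local test, a minimal sketch of installing this exact nightly pin; the CPU nightly index URL below is an assumption, adjust it for your platform:

    # Install the pinned torch nightly from the public PyTorch nightly index (CPU variant assumed)
    pip install --pre torch==2.6.0.dev20240925 --index-url https://download.pytorch.org/whl/nightly/cpu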

pytorch-bot bot commented Sep 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5e3c3ce with merge base 24d00ea:

NEW FAILURE - The following job has failed:

  • pull / test-torchao-experimental (macos-14-xlarge) (gh)
    torch._dynamo.exc.Unsupported: 'skip function dequantize_per_channel_group in file /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/ao/quantization/fx/_decomposed.py'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 28, 2024
@lessw2020 (Contributor) left a comment

We need to bump the nightly vision version to avoid an error, i.e. this pin:
VISION_NIGHTLY_VERSION=dev20240901

Bumping it to the same date as PYTORCH_NIGHTLY_VERSION fixes the error for me.
See inline comment.
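
For concreteness, a sketch of the aligned pins; the exact install script that defines these variables in torchchat is an assumption here, only the two variable names come from this thread:

    # Keep both nightly pins on the same date so the torchvision build matches the torch build
    PYTORCH_NIGHTLY_VERSION=dev20240925
    VISION_NIGHTLY_VERSION=dev20240925   # previously dev20240901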

@Jack-Khuu (Contributor)

Vision and PT being on the same nightly pin usually solves this.

@Jack-Khuu (Contributor)

I'll also be bumping the version over on ExecuTorch pytorch/executorch#5549

I can match you at 925

@kwen2501 (Contributor, Author)

@metascroy Wondering if you could have a look at the CI issue? Thanks!

@metascroy (Contributor)

> @metascroy Wondering if you could have a look at the CI issue? Thanks!

At an initial glance, I'm not sure why it is failing. I'll investigate more after lunch, but I'm wondering if PyTorch changed how custom ops work in the pin bump.

@kwen2501 (Contributor, Author)

Cc @zou3519 any recent changes in custom op support that may be relevant to the CI failure here?

@zou3519 commented Sep 30, 2024

Maybe @jerryzh168?

@metascroy (Contributor)

> @metascroy Wondering if you could have a look at the CI issue? Thanks!
>
> At an initial glance, I'm not sure why it is failing. I'll investigate more after lunch, but I'm wondering if PyTorch changed how custom ops work in the pin bump.

@kwen2501 I figured out the issue.

ExecuTorch pins PyTorch to the 9/1 nightly, so when ExecuTorch installs it pulls in 9/1, and the two different PyTorch versions conflict. In #1235, I bump the PT pin but remove the ET parts of the test, and it passes.

So before the PT pin can be updated here, it needs to be updated in ET first, and then the PT/ET pins need to be updated together in torchchat. cc @Jack-Khuu
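
A quick way to confirm this kind of mismatch locally (just a sketch, not part of the PR) is to check which nightly actually survives the ExecuTorch install step:

    # If this prints a dev20240901 torch after the ET install, the ET pin has
    # overwritten the dev20240925 torch this PR expects.
    python -c 'import torch, torchvision; print(torch.__version__, torchvision.__version__)'
    pip show torch | grep -i version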

@kwen2501 (Contributor, Author)

Thanks @metascroy !

> I'll also be bumping the version over on ExecuTorch pytorch/executorch#5549
>
> I can match you at 925

@Jack-Khuu Looks like this bump here needs to wait for your bump in ET to land first :)

Labels: CLA Signed (managed by the Meta Open Source bot)
6 participants