fix(backend): mps should not use non_blocking
#6549
Merged
Conversation
We can get black outputs when moving tensors from CPU to MPS. It appears MPS to CPU is fine. See:

- pytorch/pytorch#107455
- https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/28

Changes:

- Add properties for each device on `TorchDevice` as a convenience.
- Add `get_non_blocking` static method on `TorchDevice`. This utility takes a torch device and returns the flag to be used for `non_blocking` when moving a tensor to the device provided.
- Update model patching and caching APIs to use this new utility.

Fixes: #6545
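A minimal sketch of what such a helper could look like, assuming only what the description above states; the actual `TorchDevice` implementation in the repository may differ in its details:

```python
import torch


class TorchDevice:
    """Illustrative sketch of the utility described above.

    The real class also exposes convenience properties for each device type;
    only the non_blocking helper is sketched here.
    """

    @staticmethod
    def get_non_blocking(to_device: torch.device) -> bool:
        """Return the non_blocking flag to use when moving a tensor to `to_device`.

        Non-blocking transfers onto MPS can intermittently produce black outputs,
        so the flag is only enabled when the target device is not MPS.
        """
        return False if to_device.type == "mps" else True


# Hypothetical call site: replaces a hard-coded non_blocking=True.
# tensor = tensor.to(device, non_blocking=TorchDevice.get_non_blocking(device))
```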
psychedelicious requested review from lstein, blessedcoolant, brandonrising, RyanJDick and hipsterusername as code owners
June 27, 2024 09:18
github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels
Jun 27, 2024
RyanJDick
approved these changes
Jun 27, 2024
The changes make sense to me.

I ran the following tests:

- On CUDA
  - Generated a handful of images with LoRA and IP-Adapter. Everything looked good. I confirmed that `non_blocking=True` was still being applied.
- On MPS
  - Confirmed that I could reproduce the problem on `main`.
  - Upgraded to this branch. Generated 5 images with LoRA + IP-Adapter. No issues.
psychedelicious added a commit that referenced this pull request
Jul 15, 2024
In #6490 we enabled non-blocking torch device transfers throughout the model manager's memory management code. When using this torch feature, torch attempts to wait until the tensor transfer has completed before allowing any access to the tensor. Theoretically, that should make this a safe feature to use.

This provides a small performance improvement but causes race conditions in some situations. Specific platforms/systems are affected, and complicated data dependencies can make this unsafe:

- Intermittent black images on MPS devices - reported on discord and #6545, fixed with special handling in #6549.
- Intermittent OOMs and black images on a P4000 GPU on Windows - reported in #6613, fixed in this commit.

On my system, I haven't experienced any issues with generation, but targeted testing of non-blocking ops did expose a race condition when moving tensors from CUDA to CPU. One workaround is to use torch streams with manual sync points. Our application logic is complicated enough that this would be a lot of work and feels ripe for edge cases and missed spots. Much safer is to fully revert non-blocking - which is what this change does.
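As an illustration of the race described in the commit message (not code from this repository): a `non_blocking=True` copy from CUDA into pinned CPU memory returns before the transfer finishes, so the host must synchronize before reading the destination tensor.

```python
import torch

if torch.cuda.is_available():
    src = torch.randn(4096, 4096, device="cuda")
    # non_blocking only takes effect (and only races) when the CPU tensor is pinned.
    dst = torch.empty(4096, 4096, pin_memory=True)

    dst.copy_(src, non_blocking=True)
    # The DMA transfer may still be in flight here; reading `dst` now is a race
    # and can observe stale or garbage values.
    torch.cuda.synchronize()  # manual sync point; after this, `dst` is safe to read
    print(dst.sum())
```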
Summary

We can get black outputs when moving tensors from CPU to MPS. It appears MPS to CPU is fine. See:

- pytorch/pytorch#107455
- https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/28

Changes:

- Add properties for each device on `TorchDevice` as a convenience.
- Add `get_non_blocking` static method on `TorchDevice`. This utility takes a torch device and returns the flag to be used for `non_blocking` when moving a tensor to the device provided.
- Update model patching and caching APIs to use this new utility.

Related Issues / Discussions

Fixes: #6545

QA Instructions

For both MPS and CUDA:

Merge Plan

We have pagination merged into `main` but aren't ready for that to be released. Once this fix is tested and merged, we will probably want to create a `v4.2.5post1` branch off the `v4.2.5` tag, cherry-pick the fix, and do a release from the hotfix branch.

Checklist