
fix(backend): mps should not use non_blocking #6549

Merged
merged 2 commits into main from psyche/fix/mps-black-image on Jun 27, 2024

Conversation

psychedelicious
Collaborator

Summary

We can get black outputs when moving tensors from CPU to MPS. It appears MPS to CPU is fine. See:

  • pytorch/pytorch#107455
  • https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/28

Changes:

  • Add properties for each device on TorchDevice as a convenience.
  • Add a `get_non_blocking` static method on `TorchDevice`. This utility takes a torch device and returns the `non_blocking` flag to use when moving a tensor to that device (a rough sketch follows this list).
  • Update model patching and caching APIs to use this new utility.
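For context, a minimal sketch of what such a utility could look like, assuming it simply disables non-blocking transfers for MPS targets. The class and method names are taken from the description above; the actual implementation in this PR may differ:

```python
import torch


class TorchDevice:
    """Sketch only: assumed shape of the device utility described above."""

    @staticmethod
    def get_non_blocking(to_device: torch.device) -> bool:
        """Return the non_blocking flag to use when moving a tensor to `to_device`.

        CPU -> MPS copies with non_blocking=True can produce black outputs,
        so non-blocking transfers are only allowed for non-MPS targets.
        """
        return to_device.type != "mps"
```

A caller would then write something like `tensor.to(device, non_blocking=TorchDevice.get_non_blocking(device))` instead of hard-coding `non_blocking=True`.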

Related Issues / Discussions

Fixes: #6545

QA Instructions

For both MPS and CUDA:

  • Generate at least 5 images using LoRAs
  • Generate at least 5 images using IP Adapters

Merge Plan

Pagination is already merged into main, but we aren't ready for it to be released.

Once this fix is tested and merged, we will probably want to create a v4.2.5post1 branch off the v4.2.5 tag, cherry-pick the fix and do a release from the hotfix branch.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable). @RyanJDick @lstein: this feels testable, but I'm not sure how.
  • Documentation added / updated (if applicable)

github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels on Jun 27, 2024
Collaborator

@RyanJDick RyanJDick left a comment


The changes make sense to me.

I ran the following tests:

  • On CUDA
    • Generated a handful of images with LoRA and IP-Adapter. Everything looked good. I confirmed that non_blocking=True was still being applied.
  • On MPS
    • Confirmed that I could reproduce the problem on main
    • Upgraded to this branch. Generated 5 images with LoRA + IP-Adapter. No issues.

@RyanJDick RyanJDick merged commit aba1608 into main Jun 27, 2024
14 checks passed
@RyanJDick RyanJDick deleted the psyche/fix/mps-black-image branch June 27, 2024 14:11
psychedelicious added a commit that referenced this pull request Jul 15, 2024
In #6490 we enabled non-blocking torch device transfers throughout the model manager's memory management code. When using this torch feature, torch attempts to wait until the tensor transfer has completed before allowing any access to the tensor. Theoretically, that should make this a safe feature to use.

In practice, it provides a small performance improvement but causes race conditions in some situations; specific platforms/systems are affected, and complicated data dependencies can make it unsafe:

- Intermittent black images on MPS devices: reported on Discord and in #6545, fixed with special handling in #6549.
- Intermittent OOMs and black images on a P4000 GPU on Windows: reported in #6613, fixed in this commit.

On my system, I haven't experienced any issues with generation, but targeted testing of non-blocking ops did expose a race condition when moving tensors from CUDA to CPU.

One workaround is to use torch streams with manual sync points. Our application logic is complicated enough that this would be a lot of work and feels ripe for edge cases and missed spots.

Much safer is to fully revert non-blocking transfers, which is what this change does.
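The kind of race described above can be sketched in isolation. The snippet below is an assumed repro, not taken from the PR: it shows why a CUDA-to-CPU copy with `non_blocking=True` into pinned host memory must be synchronized before the destination is read, and why a plain blocking copy (what this revert restores) side-steps the problem:

```python
import torch

if torch.cuda.is_available():
    src = torch.randn(4096, 4096, device="cuda")

    # Unsafe pattern: the copy into pinned host memory is asynchronous, so
    # reading `dst` immediately may observe stale or partially written data.
    dst = torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=True)
    dst.copy_(src, non_blocking=True)
    maybe_stale = dst.sum()  # may run before the device-to-host copy finishes

    # Safe pattern 1: synchronize (or use an explicit stream with sync points)
    # before touching the destination tensor.
    dst.copy_(src, non_blocking=True)
    torch.cuda.synchronize()
    correct = dst.sum()

    # Safe pattern 2: a blocking copy, which is what reverting non_blocking does.
    also_correct = src.to("cpu", non_blocking=False).sum()
```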
Successfully merging this pull request may close these issues.

[bug]: black image when using IP adapter after updating developer install