fix(backend): revert non-blocking device transfer #6624
Merged
Summary
In #6490 we enabled non-blocking torch device transfers throughout the model manager's memory management code. When using this torch feature, torch attempts to wait until the tensor transfer has completed before allowing any access to the tensor. Theoretically, that should make this a safe feature to use.
This provides a small performance improvement but causes race conditions in some situations. Specific platforms/systems are affected, and complicated data dependencies can make this unsafe.
A related `non_blocking` issue was previously addressed in #6549. On my system, I haven't experienced any issues with generation, but targeted testing of non-blocking ops did expose a race condition when moving tensors from CUDA to CPU.
One workaround is to use torch streams with manual sync points. Our application logic is complicated enough that this would be a lot of work and feels ripe for edge cases and missed spots.
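For illustration only (not code from this repository), a manual sync point around a non-blocking device-to-host copy might look roughly like the sketch below, assuming a CUDA device; the helper name is made up:

```python
import torch

def to_cpu_with_sync(gpu_tensor: torch.Tensor) -> torch.Tensor:
    # Queue an asynchronous device -> host copy on the current CUDA stream.
    cpu_tensor = gpu_tensor.to("cpu", non_blocking=True)
    # Manual sync point: block until the queued copy has actually finished
    # before anyone reads cpu_tensor. Every transfer call site would need
    # something like this.
    torch.cuda.current_stream().synchronize()
    return cpu_tensor
```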
Much safer is to fully revert non-blocking, which is what this change does.
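For context, the revert boils down to dropping the flag from transfer calls. An illustrative before/after (not the actual diff from this PR; variable names are made up):

```python
# Before (#6490): asynchronous transfer, small speedup, possible races.
tensor = tensor.to(device, non_blocking=True)

# After this PR: synchronous transfer; the data is valid when the call returns.
tensor = tensor.to(device)
```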
Test script demonstrating CUDA -> CPU race condition
This script induces the race condition: the tensor is different when read immediately after a device transfer versus after waiting a couple of seconds for torch to sync. For me, I reliably get an inconsistency on the second GPU -> CPU transfer with `non_blocking=True`.
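The original script isn't included here; a minimal sketch of the same idea, assuming a CUDA device and with illustrative tensor sizes and timings, could look like this:

```python
import time

import torch

def check_transfer(attempt: int) -> None:
    # Large tensor so the device -> host copy takes a measurable amount of time.
    src = torch.randn(64, 1024, 1024, device="cuda")
    dst = src.to("cpu", non_blocking=True)

    # Read the destination immediately after the transfer call returns...
    before = dst.clone()
    # ...then wait a couple of seconds so any pending async copy can finish.
    time.sleep(2)
    after = dst.clone()

    # With non_blocking=True the two reads can differ: the first read observed
    # a partially-copied tensor.
    print(f"attempt {attempt}: consistent={torch.equal(before, after)}")

if __name__ == "__main__":
    for i in range(3):
        check_transfer(i)
```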
Related Issues / Discussions
Closes #6613
QA Instructions
I have tried these combinations of models and had no issues. I don't think this change alters any behaviour; it's a one-shot revert of #6490 and #6549.
SDXL
SD1.5
Merge Plan
We'll do a bugfix release with this once merged.
Checklist