Recover gracefully from VRAM out of memory errors #5793
Closed
What type of PR is this? (check all applicable)
Have you discussed this change with the InvokeAI team?
Have you updated all relevant documentation?
Description
At least on my system, if the model manager runs out of VRAM while moving a model into the GPU, the partially loaded model gets stuck in VRAM and can't easily be removed. This makes the model unusable and ties up precious VRAM.
I encountered this when playing with large language models on the same system, but I suspect it will also happen if a video game is running. I tried various approaches to recover from this state, including clearing the VRAM cache, deleting the model object, and running garbage collection (roughly as sketched below), but without success.
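For reference, the recovery attempts were along these lines (a sketch, not InvokeAI code; `model` stands in for the stuck model object), and none of them released the partially loaded weights:

```python
import gc
import torch

# `model` stands in for the partially copied model object held by the cache.
# Dropping the reference, collecting garbage, and emptying the allocator cache
# is the usual way to hand VRAM back, but it did not help in this case.
model = None                 # drop the Python reference (equivalent to `del model`)
gc.collect()                 # collect any now-unreachable tensors
torch.cuda.empty_cache()     # return cached, unallocated VRAM blocks to the driver
```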
This PR avoids the issue by checking for sufficient free VRAM before trying to move a model onto a CUDA GPU. If there is not enough room, it raises a `torch.cuda.OutOfMemoryError`, and the message is propagated to the front end. If more VRAM becomes available later, invocations will begin to work again.

Note: this pull request is against `main`. The model manager code has changed a bit, so I'm making a separate PR for `next`.
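As an illustration of the idea (a minimal sketch with hypothetical names, not the code in this PR), the pre-move check amounts to comparing an estimate of the model's size against the free memory reported by `torch.cuda.mem_get_info()` and failing early:

```python
import torch

def move_model_to_gpu(model: torch.nn.Module, device: torch.device) -> torch.nn.Module:
    """Illustrative sketch only: refuse to start copying a model onto a CUDA
    device unless the reported free VRAM can plausibly hold it."""
    if device.type == "cuda":
        # Rough size estimate: bytes occupied by parameters and buffers.
        needed = sum(p.numel() * p.element_size() for p in model.parameters())
        needed += sum(b.numel() * b.element_size() for b in model.buffers())

        free, _total = torch.cuda.mem_get_info(device)
        if needed > free:
            # Failing up front leaves the model intact in RAM instead of
            # half-copied into VRAM; the error propagates to the frontend.
            raise torch.cuda.OutOfMemoryError(
                f"Not enough free VRAM on {device}: need ~{needed / 2**30:.2f} GiB, "
                f"have {free / 2**30:.2f} GiB free."
            )
    return model.to(device)
```

Because the failure happens before the copy starts, nothing is left half-resident in VRAM, so a later invocation can simply retry once memory is available.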
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Launch the InvokeAI web service together with another application that uses a lot of GPU VRAM. For my testing, I used ollama with a large model loaded. Run a generation and confirm it fails with an out-of-memory error. Repeat a few times; you should get the same error each time. Now kill the other application to free up VRAM and try to generate an image. It should work!
Merge Plan
Can merge when approved.
Added/updated tests?
No, tests have not been included.
[optional] Are there any post deployment tasks we need to perform?