Recover gracefully from VRAM out of memory errors #5793

Closed · lstein wants to merge 1 commit into main

Conversation

lstein (Collaborator) commented on Feb 24, 2024

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Community Node Submission

Have you discussed this change with the InvokeAI team?

  • Yes
  • No, because: straightforward fix

Have you updated all relevant documentation?

  • Yes
  • No

Description

At least on my system, if the model manager runs out of VRAM while moving a model into the GPU, the partially loaded model gets stuck in VRAM and can't easily be removed. This makes the model unusable and ties up precious VRAM.

I encountered this while experimenting with large language models on the same system, but I suspect it would also happen if a video game were running. I tried various approaches to recover from this state, including clearing the VRAM cache, deleting the model object, and running garbage collection (roughly as in the sketch below), but without success.
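Roughly, those attempts looked like the following (a minimal sketch for illustration; the exact calls and ordering may have differed, and the helper name is made up):

```python
# Sketch of the recovery attempts described above; none of them freed the
# stuck allocation in my testing.
import gc

import torch


def attempt_vram_recovery(model: torch.nn.Module) -> None:
    torch.cuda.empty_cache()  # release unused blocks held by the CUDA caching allocator
    del model                 # drop this reference to the partially moved model
    gc.collect()              # collect any now-unreachable Python objects
    torch.cuda.empty_cache()  # retry the cache clear after collection
```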

This PR avoids the issue by checking for sufficient free VRAM before trying to move a model onto a CUDA GPU. If there is not enough room, it raises a torch.cuda.OutOfMemoryError, which is propagated to the front end. If more VRAM becomes available later, invocations begin to work again.
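The check looks roughly like this (a simplified sketch, not the exact model manager code; the function name and size estimate are illustrative):

```python
# Minimal sketch of the pre-flight VRAM check, not the actual InvokeAI code.
import torch


def move_model_to_cuda(model: torch.nn.Module, device: torch.device) -> torch.nn.Module:
    # Estimate how much VRAM the model will occupy once resident on the GPU.
    needed = sum(p.numel() * p.element_size() for p in model.parameters())
    needed += sum(b.numel() * b.element_size() for b in model.buffers())

    # Ask CUDA how much memory is currently free on the target device.
    free, _total = torch.cuda.mem_get_info(device)

    if needed > free:
        # Fail before the copy starts, so no partially moved model is left
        # stranded in VRAM; the error propagates up to the front end.
        raise torch.cuda.OutOfMemoryError(
            f"Model needs ~{needed / 2**20:.0f} MiB but only "
            f"{free / 2**20:.0f} MiB of VRAM is free on {device}"
        )

    return model.to(device)
```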

Note: This pull request is against main. The model manager code has changed a bit, so I'm making a separate PR for next.

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

Launch the InvokeAI web service alongside another application that uses a lot of GPU VRAM; for my testing, I used ollama with a large model loaded. Run a generation and confirm that it produces an out-of-memory error. Try this repeatedly; you should get the same error each time. Now kill the other application to free up VRAM and try to generate an image again. It should work!
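If it helps during testing, you can confirm how much VRAM is actually free at each step with a quick check (optional, not part of this PR):

```python
# Print how much VRAM the other application is currently leaving free.
import torch

free, total = torch.cuda.mem_get_info()
print(f"free VRAM: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```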

Merge Plan

Can merge when approved.

Added/updated tests?

  • Yes
  • No : please replace this line with details on why tests
    have not been included

[optional] Are there any post deployment tasks we need to perform?

The github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels on Feb 24, 2024.
hipsterusername (Member) left a comment


Should we not merge this against next only?

lstein (Collaborator, Author) commented on Feb 24, 2024

There's a separate PR for next because the model manager code has changed a bit there. If nobody has complained about this bug, it's probably safe to leave it unmerged in main.

psychedelicious (Collaborator) commented

Closing as we aren't merging anything to main if we can help it
