[Feature Request]: Manually load/unload checkpoints into GPU #16129

Open
AnthoneoJ opened this issue Jul 2, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@AnthoneoJ

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do?

I want to achieve the following either programmatically or via API:

  • List of all checkpoints and their status (loaded or unloaded)
  • Load a checkpoint
  • Unload a checkpoint

Proposed workflow

  1. Retrieve the available checkpoints and their status via HTTP request, e.g. http://0.0.0.0:7860/sdapi/v1/checkpoint-status
  2. Load a specified checkpoint via HTTP request, e.g. http://0.0.0.0:7860/sdapi/v1/load-checkpoint?checkpointid=abc123
  3. Unload a specified checkpoint via HTTP request, e.g. http://0.0.0.0:7860/sdapi/v1/unload-checkpoint?checkpointid=abc123
    The checkpoint parameter passed in 2 and 3 should be obtained from 1. For example, the object returned in 1 could contain a "uniqueid" key. A client-side sketch follows below.
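A rough sketch of how a client might drive this proposed workflow. Note that all three endpoints and the "uniqueid" field are part of this proposal, not the existing API, and the HTTP methods are assumptions:

import requests

BASE = "http://0.0.0.0:7860"  # adjust to your host/port

# 1. List all checkpoints and whether each is loaded (proposed endpoint)
checkpoints = requests.get(f"{BASE}/sdapi/v1/checkpoint-status").json()
ckpt_id = checkpoints[0]["uniqueid"]  # proposed unique identifier

# 2. Load the chosen checkpoint (proposed endpoint)
requests.post(f"{BASE}/sdapi/v1/load-checkpoint", params={"checkpointid": ckpt_id})

# 3. Unload it again when memory is needed elsewhere (proposed endpoint)
requests.post(f"{BASE}/sdapi/v1/unload-checkpoint", params={"checkpointid": ckpt_id})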

Additional information

I've made a fork, but it only loads and unloads the currently selected checkpoint. The relevant endpoints are unloadmodel, loadmodel, and get_model_status in api.py.
https://github.com/AnthoneoJ/stable-diffusion-webui

AnthoneoJ added the enhancement label Jul 2, 2024
@w-e-w
Collaborator

w-e-w commented Jul 2, 2024

why?

@AnthoneoJ
Author

why?

In my use case, the machine is running multiple AI services (one of them being this webui), and several machines do the same. So the checkpoints should be loaded on machine boot-up and unloaded when memory is needed for another AI service, etc.

@w-e-w
Collaborator

w-e-w commented Jul 2, 2024

Model loading is a mess in webui. I suggest you just settle with setting Maximum number of checkpoints loaded at the same time to 1 and Only keep one model on device to True.
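If you'd rather set those options over the API than in the UI, a sketch; the internal option keys sd_checkpoints_limit and sd_checkpoints_keep_in_cpu are my best guess at the names backing those two settings, so verify them against your build's /sdapi/v1/options output:

import requests

requests.post(
    "http://127.0.0.1:7860/sdapi/v1/options",
    json={
        "sd_checkpoints_limit": 1,           # Maximum number of checkpoints loaded at the same time
        "sd_checkpoints_keep_in_cpu": True,  # Only keep one model on device
    },
)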


To be honest, the two API endpoints /sdapi/v1/unload-checkpoint and /sdapi/v1/reload-checkpoint work more like putting the web UI to sleep and waking it up from sleep.

You can put it to sleep to save VRAM, but you need to manually wake it before use (bad design on our part).
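A minimal sleep/wake sketch with Python requests, assuming the default local address and that both endpoints accept an empty POST:

import requests

BASE = "http://127.0.0.1:7860"

# "sleep": move the current main model out of VRAM
requests.post(f"{BASE}/sdapi/v1/unload-checkpoint")

# ... VRAM is now free for other services ...

# "wake": must be called manually before the next generation request
requests.post(f"{BASE}/sdapi/v1/reload-checkpoint")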

There is an issue with /sdapi/v1/unload-checkpoint: if Maximum number of checkpoints loaded at the same time is > 1, the "sleep" will only send the current main model to RAM. It doesn't distinguish between models; it just unloads the main model and cares about nothing else.

For example, if you have Maximum number of checkpoints loaded at the same time set to 3 and Only keep one model on device set to False, then after switching models 3 or more times there will be 3 models loaded. Now if you use /sdapi/v1/unload-checkpoint, only 1 model will be unloaded; 2 will still be loaded.


Changing the model (loading a model) can be done with a POST to /sdapi/v1/options with:

{
    "sd_model_checkpoint": "YOUR model"
}
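For example, a sketch; the checkpoint title should come from /sdapi/v1/sd-models, shown further below:

import requests

requests.post(
    "http://127.0.0.1:7860/sdapi/v1/options",
    json={"sd_model_checkpoint": "YOUR model"},
)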

Or you can add the override_settings arg to the payload of a txt2img / img2img API call. This method is generally more reliable when dealing with multiple users:

"override_settings": {
     "sd_model_checkpoint": "YOUR model"
}
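A minimal txt2img sketch that switches the model only for this one request; all other payload fields are left at their defaults:

import requests

payload = {
    "prompt": "a photo of a cat",
    "override_settings": {"sd_model_checkpoint": "YOUR model"},
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
images = r.json()["images"]  # base64-encoded results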

You can get a list of all models by using /sdapi/v1/sd-models.
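For instance, to pull out the titles that sd_model_checkpoint accepts (a sketch):

import requests

models = requests.get("http://127.0.0.1:7860/sdapi/v1/sd-models").json()
print([m["title"] for m in models])  # "title" is what sd_model_checkpoint expects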


It should be possible to improve this, but someone needs to want it enough to work on that feature. It might even be possible to implement this as an extension.

I might try to work on this, but no guarantees.



Initially I was confused because I somehow misread your request as wanting to load every model in sequence and then unload them for no apparent reason.

@w-e-w
Collaborator

w-e-w commented Jul 2, 2024

These can also help:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings
--nowebui --skip-load-model-at-start

@AnthoneoJ
Author

Ah, thanks! I knew bits and pieces from inspecting the codebase. This puts them all together. One more thing before I can go off on my own: how do I know whether a model is currently loaded or not? At the moment, I'm inferring this from sd_models.model_data.sd_model. If it's None, the model is unloaded, and vice versa.

@w-e-w
Collaborator

w-e-w commented Jul 3, 2024

sd_models.model_data.sd_model

Yeah, I think that's pretty much the place you want to look.

However, if you also use Checkpoints to cache in RAM > 0, then I think you also want to inspect shared.opts.sd_checkpoint_cache and checkpoints_loaded.
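Putting both checks together, a rough sketch of a status helper run inside the webui process (get_model_status is a hypothetical name, and the exact shape of checkpoints_loaded's keys may vary between versions):

from modules import sd_models, shared

def get_model_status():
    # None means no main model is currently loaded
    main_loaded = sd_models.model_data.sd_model is not None

    # the RAM cache is only populated when "Checkpoints to cache in RAM" > 0
    cached = []
    if shared.opts.sd_checkpoint_cache > 0:
        cached = [getattr(info, "title", str(info)) for info in sd_models.checkpoints_loaded]

    return {"main_model_loaded": main_loaded, "ram_cached_checkpoints": cached}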


If you have improvements that you think can benefit everyone, don't hesitate to contribute.

@randelreiss

The Ollama framework has a really handy environment- and API-accessible variable:

OLLAMA_KEEP_ALIVE=[# of seconds] | [xM] | 0

I think it's mostly used by people who want the last loaded chat model to stay loaded longer, but I use it set to zero to keep the GPU VRAM as empty as possible as soon as possible. This is because I have many users who mostly use the GPU for chat and only occasionally for text-to-speech and SD image creation, which loads up the GPU VRAM. Unfortunately, SDWeb keeps its last model loaded indefinitely. It would be great if SDWeb had a similar keep-alive option to let us decide how long to keep the last model loaded.
