Problem

I'm trying to generate images with flux-dev fp16 safetensors and running into unexpected memory usage. From what I understand, flux-dev should be able to run in under 24 GB of VRAM; instead I'm seeing close to 60 GB allocated (before any image generation takes place) on my 36 GB M3 MacBook with unified memory (24 GB or so of it in swap). I could have some bad assumptions about the RAM requirements, though. Also, I'm running Kobold with no GGUF model for text completion, only Flux.
loading tensors from /Users/username/flux/clip_l.safetensors
loading tensors from /Users/username/flux/t5xxl_fp16.safetensors
unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
loading tensors from /Users/username/flux/ae.safetensors
loading tensors from /Users/username/flux/flux1-dev.safetensors
total params memory size = 54879.10MB (VRAM 45560.27MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 45400.27MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
loading model from '/Users/username/flux/flux1-dev.safetensors' completed, taking 16.25s
running in Flux FLOW mode
finished loaded file
Load Image Model OK: True
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.
Embedded SDUI loaded.
Perhaps this has something to do with the way stable-diffusion.cpp handles model initialization?

Flux-dev's original weights are not fp16 but bf16. Bf16 is not supported by sdcpp, so it is converted to fp32 (a lossless conversion), which takes twice the memory.
You can either pre-convert the weights to a smaller GGUF quant, or load with the "Compress Weights" toggle enabled (that does a runtime quant to q4; caution, it will be slow).
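As a sanity check on the numbers in the log, here is a rough back-of-the-envelope sketch. The parameter counts are approximate, commonly cited figures and are assumptions on my part, not values from the log: ~11.9B for the Flux.1-dev transformer, ~4.76B for the T5-XXL encoder, ~123M for CLIP-L, and ~84M for the VAE. With the transformer promoted to fp32 the totals land very close to what sdcpp reports, and the last line shows roughly what the transformer alone shrinks to at q4_0:

```python
# Rough memory estimate for the breakdown printed in the log above.
# Parameter counts are approximate public figures (assumptions, not from the log).
MiB = 1024 ** 2

unet_fp32   = 11.9e9  * 4 / MiB   # bf16 weights promoted to fp32 -> 4 bytes/param
t5xxl_fp16  = 4.76e9  * 2 / MiB   # text encoders stay fp16 -> 2 bytes/param
clip_l_fp16 = 0.123e9 * 2 / MiB
vae_fp16    = 0.084e9 * 2 / MiB

print(f"unet  ~{unet_fp32:.0f} MiB")                 # ~45396 MiB, log: 45400 MiB
print(f"text  ~{t5xxl_fp16 + clip_l_fp16:.0f} MiB")  # ~9313 MiB,  log: 9319 MiB
print(f"vae   ~{vae_fp16:.0f} MiB")                  # ~160 MiB,   log: 160 MiB
print(f"total ~{unet_fp32 + t5xxl_fp16 + clip_l_fp16 + vae_fp16:.0f} MiB")
                                                     # ~54869 MiB, log: 54879 MiB

# Same transformer at ggml q4_0 (18 bytes per 32-weight block, i.e. 4.5 bits/param):
print(f"unet at q4_0 ~{11.9e9 * 18 / 32 / MiB:.0f} MiB")  # ~6384 MiB
```

With the transformer quantized to q4 and the text encoders left at fp16, the total weight footprint drops to roughly 16 GB, which is why the GGUF or "Compress Weights" route fits comfortably in 24 GB.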