
flux-dev Double RAM usage on Apple Silicon #1220

Open
d-z-m opened this issue Nov 17, 2024 · 2 comments

Comments


d-z-m commented Nov 17, 2024

Problem

I'm trying to generate images with the flux-dev fp16 safetensors and running into unexpected memory usage. From what I understand, flux-dev should be able to run in under 24 GB of VRAM; instead I'm seeing close to 60 GB allocated (before any image generation takes place) on my 36 GB M3 MacBook with unified memory (24 GB or so of that in swap). I could have some bad assumptions about the RAM requirements, though. Also, I'm running Kobold with no GGUF model for text completion, only flux.

loading tensors from /Users/username/flux/clip_l.safetensors
loading tensors from /Users/username/flux/t5xxl_fp16.safetensors
unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
loading tensors from /Users/username/flux/ae.safetensors
loading tensors from /Users/username/flux/flux1-dev.safetensors
total params memory size = 54879.10MB (VRAM 45560.27MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 45400.27MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
loading model from '/Users/username/flux/flux1-dev.safetensors' completed, taking 16.25s
running in Flux FLOW mode
finished loaded fileLoad Image Model OK: True
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.
Embedded SDUI loaded.

Perhaps it has something to do with the way stable-diffusion.cpp handles model initialization?


stduhpf commented Nov 17, 2024

The original flux-dev weights are not fp16 but bf16. bf16 is not supported by sdcpp, so it gets converted to fp32 (a lossless conversion), which takes twice the amount of memory.

https://github.com/leejet/stable-diffusion.cpp/blob/ac54e0076052a196b7df961eb1f792c9ff4d7f22/model.cpp#L1626C21-L1629C22
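
For reference: bf16 is just the top 16 bits of an IEEE-754 fp32 value, so widening it back to fp32 is a simple bit shift and loses nothing, but every weight then occupies 4 bytes instead of 2. A minimal illustrative sketch (an assumption for illustration, not the actual sdcpp code behind the link above):

// bf16 -> fp32 widening: shift the 16 stored bits into the top half of a 32-bit word.
#include <cstdint>
#include <cstdio>
#include <cstring>

float bf16_to_fp32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;  // low 16 mantissa bits become zero
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

int main() {
    // Approximate parameter count for flux1-dev (~12B); ballpark only.
    const double n = 12e9;
    std::printf("bf16 on disk: ~%.0f GB\n", n * 2 / 1e9);  // ~24 GB
    std::printf("fp32 in RAM:  ~%.0f GB\n", n * 4 / 1e9);  // ~48 GB, consistent with the 45400 MB unet figure in the log above
    return 0;
}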

LostRuins (Owner) commented Nov 18, 2024

You can either pre-convert the weights to a smaller GGUF quant, or load with the "Compress Weights" toggle enabled (that does a runtime quant to q4; caution, it will be slow).

This is an all-in-one flux model which includes T5 and CLIP L: https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors

Alternatively, you can grab the text encoders separately and use https://huggingface.co/leejet/FLUX.1-dev-gguf/tree/main
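
A rough back-of-the-envelope comparison of what those options mean for the unet footprint, assuming the commonly cited ~12B parameter count (ggml block quants also carry per-block scales, hence the fractional bits per weight; treat all of these as estimates):

// Ballpark unet sizes for the formats mentioned above (illustrative sketch, approximate figures).
#include <cstdio>

int main() {
    const double n_params = 12e9;
    const struct { const char* fmt; double bits_per_weight; } opts[] = {
        {"fp32 (bf16 widened at load, current behaviour)", 32.0},
        {"fp16 / bf16 original weights",                    16.0},
        {"fp8 safetensors",                                   8.0},
        {"q8_0 GGUF",                                         8.5},
        {"q4_0 GGUF / runtime q4 compress",                   4.5},
    };
    for (const auto& o : opts)
        std::printf("%-48s ~%4.1f GB\n", o.fmt, n_params * o.bits_per_weight / 8.0 / 1e9);
    return 0;
}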
