Vladimir Mandic edited this page Sep 2, 2024 · 34 revisions

Black Forest Labs FLUX.1

FLUX.1 family consists of 3 variations:

  • Pro
    Model weights are NOT released, model is available only via Black Forest Labs
  • Dev
    Open-weight, guidance-distilled from Pro variation, available for non-commercial applications
  • Schnell
    Open-weight, timestep-distilled from Dev variation, available under Apache2.0 license

Additionally, SD.Next includes pre-quantized variations of the FLUX.1 Dev variation: qint8, qint4 and nf4

To use any of the variations or quantizations, simply select it from Networks -> Reference
and the model will be auto-downloaded on first use
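Under the hood, this maps onto a standard Diffusers pipeline load. A minimal sketch of the equivalent manual load for the Schnell variation (the helper name is ours; the model id and dtype are the published defaults):

```python
def load_flux_schnell():
    """Sketch: load the FLUX.1 Schnell variation via Diffusers.
    Heavy imports are deferred so the function is cheap to define."""
    import torch
    from diffusers import FluxPipeline

    # Schnell is Apache-2.0 and needs no access token;
    # Dev is gated and requires accepting its license on Hugging Face first
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",
        torch_dtype=torch.bfloat16,
    )
    return pipe
```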

Notes:

  • FLUX.1 Dev variant is a gated model, you need to accept the terms and conditions to use it
  • Do not download any of the base model manually, use built-in downloader!

Tip

  • Pick a variant that uses less memory, as the model in its original form has very high requirements
  • Set the appropriate offloading setting before loading the model to avoid out-of-memory errors

Notes

  • FLUX.1 is based on flow-matching scheduling; the only supported sampler is Euler Flow Match (Default)
    Setting any other sampler will be ignored
  • FLUX.1 LoRA support is included, but limited
    Not all LoRAs are supported yet; support for more variations is coming soon
  • To enable image previews during generate, set
    Settings -> Live Preview -> Method to TAESD
  • To further speed up generation, you can disable "full quality"
    which triggers use of TAESD instead of full VAE to decode final image
  • To use prompt attention syntax with FLUX.1, set
    Settings -> Execution -> Prompt attention to xhinker

Quantization

Quantization can significantly reduce memory requirements, but it can also slightly reduce quality of outputs
Also, different quantization options are very platform and GPU dependent and are not supported on all platforms

  • qint8 and qint4 quantization require optimum-quanto which will be auto-installed on first use
    note: qint quantization requires torch==2.4.0
    note: is not compatible with balanced offload
  • nf4 quantization requires bitsandbytes which will be auto-installed on first use
    note: bitsandbytes package is not compatible with all platforms and gpus
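For reference, a rough sketch of what loading the transformer with on-the-fly nf4 quantization looks like at the Diffusers level (assumes diffusers>=0.31, which added quantization support, plus bitsandbytes; the helper name is ours):

```python
def load_flux_transformer_nf4():
    """Sketch: load the FLUX.1 Dev transformer quantized to nf4.
    Assumes diffusers>=0.31 and bitsandbytes are installed."""
    import torch
    from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",            # 4-bit NormalFloat
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    return FluxTransformer2DModel.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        subfolder="transformer",
        quantization_config=quant,
        torch_dtype=torch.bfloat16,
    )
```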

Another option is NNCF, which performs quantization during model load (instead of requiring a pre-quantized model)
The advantage of NNCF is that it works with both sequential and balanced offload, and on any platform

Example image with both dev and schnell variations and different transformer quantization options
flux-transformer

Offloading

FLUX.1 is a massive model at ~32GB and as such it is recommended to use offloading
To set offloading, see Settings -> Diffusers -> Model offload mode:

  • Balanced
    Recommended for compatible high VRAM GPUs
    Faster but requires compatible platform and sufficient VRAM
    Not compatible with Quanto qint quantization
  • Sequential
    Recommended for low VRAM GPUs
    Much slower, but allows FLUX.1 to run on GPUs with 6GB VRAM
    Not compatible with Quanto qint or BitsAndBytes nf4 quantization
  • Model
    Higher compatibility than either balanced or sequential, but smaller memory savings
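The model and sequential modes map directly onto standard Diffusers calls; balanced offload is SD.Next's own scheme with no one-line Diffusers equivalent, so it is omitted from this sketch (the helper name is ours):

```python
def apply_offload(pipe, mode: str = "model"):
    """Sketch: map SD.Next offload settings onto Diffusers calls."""
    if mode == "model":
        # moves whole sub-models (transformer, encoders, VAE) on demand
        pipe.enable_model_cpu_offload()
    elif mode == "sequential":
        # moves individual layers; slowest but lowest VRAM usage
        pipe.enable_sequential_cpu_offload()
    return pipe
```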

Performance

Performance and memory usage of different FLUX.1 variations:

| dtype | time (sec) | performance | memory   | offload    | note |
|-------|------------|-------------|----------|------------|------|
| bf16  |            |             | >32 GB   | none       | *1   |
| bf16  | 50.47      | 0.40 it/s   |          | balanced   | *2   |
| bf16  | 94.28      | 0.21 it/s   | 1.89 GB  | sequential |      |
| nf4   | 14.69      | 1.36 it/s   | 17.92 GB | none       |      |
| nf4   | 21.02      | 0.95 it/s   |          | balanced   | *2   |
| nf4   |            |             |          | sequential | *3   |
| qint8 | 15.42      | 1.30 it/s   | 18.85 GB | none       |      |
| qint8 |            |             |          | balanced   | *4   |
| qint8 |            |             |          | sequential | *5   |
| qint4 | 18.37      | 1.09 it/s   | 11.38 GB | none       |      |
| qint4 |            |             |          | balanced   | *4   |
| qint4 |            |             |          | sequential | *5   |

Notes:

  • *1: Memory usage exceeds 32GB and is not recommended
  • *2: Balanced offload VRAM usage is not included since it depends on desired threshold
  • *3: BitsAndBytes nf4 quantization is not compatible with sequential offload

    Error: Blockwise quantization only supports 16/32-bit floats

  • *4: Quanto qint quantization is not compatible with balanced offload

    Error: QBytesTensor.new() missing 5 required positional arguments

  • *5: Quanto qint quantization is not compatible with sequential offload

    Error: Expected all tensors to be on the same device

Fine-tunes

Diffusers

There are already many unofficial FLUX.1 variations available
Any Diffusers-based variation can be downloaded and loaded into SD.Next using Models -> Huggingface -> Download
For example, an interesting variation is a merge of the Dev and Schnell variations by sayakpaul: sayakpaul/FLUX.1-merged

LoRAs

SD.Next includes support for FLUX.1 LoRAs

Since LoRA keys vary significantly between the tools used to train a LoRA as well as between LoRA types,
support for additional LoRAs will be added as needed - please report any non-functional LoRAs!

All-in-one

Note: Loading of all-in-one single-file safetensors requires SD.Next.dev branch AND Diffusers.dev

Typical all-in-one safetensors file is over 20GB in size and contains full model with transformer, both text-encoders and VAE
Since the text encoders and VAE are the same across all FLUX.1 models, using all-in-one safetensors is not recommended

Unet/Transformer

The Unet/Transformer component of FLUX.1 is the typical target of model fine-tunes and is around 11GB in size

To load a Unet/Transformer safetensors file:

  1. Download safetensors file from desired source and place it in models/UNET folder
    example: FastFlux Unchained
  2. Load FLUX.1 model as usual and then
  3. Replace transformer with one in desired safetensors file using:
    Settings -> Execution & Models -> UNet
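The steps above amount to swapping the pipeline's transformer for one loaded from a single safetensors file; a sketch of the equivalent Diffusers-level operation (the helper name is ours):

```python
def swap_transformer(pipe, safetensors_path: str):
    """Sketch: replace the transformer of a loaded FLUX.1 pipeline
    with one from a fine-tuned single-file safetensors checkpoint."""
    import torch
    from diffusers import FluxTransformer2DModel

    pipe.transformer = FluxTransformer2DModel.from_single_file(
        safetensors_path, torch_dtype=torch.bfloat16
    )
    return pipe
```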

Tip

For convenience, you can add that setting to your quicksettings by adding Settings -> User Interface -> Quicksettings list -> sd_unet

Text Encoder

SD.Next allows changing the optional text encoder on-the-fly

Go to Settings -> Models -> Text encoder and select the desired text encoder
T5 enhances text rendering and some details, but it is otherwise very lightly used and optional
Loading a lighter T5 greatly decreases model resource usage, but may not be compatible with all offloading modes

Note: T5 can only be loaded from predefined list, it does not support loading from manually downloaded file(s)
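In Diffusers terms, the T5 encoder is the `text_encoder_2` component of `FluxPipeline`; a sketch of swapping it for a variant from another repo (the helper name and `repo_id` parameter are illustrative):

```python
def swap_t5(pipe, repo_id: str):
    """Sketch: replace the T5 text encoder (text_encoder_2 in
    FluxPipeline) with a variant loaded from another repo."""
    import torch
    from transformers import T5EncoderModel

    pipe.text_encoder_2 = T5EncoderModel.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16
    )
    return pipe
```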

Example image with different encoder quantization options
flux-encoder

Tip

If you want to frequently switch between text encoders, you can add that setting to quicksettings by adding Settings -> User Interface -> Quicksettings list -> sd_text_encoder

VAE

SD.Next allows changing VAE model used by FLUX.1 on-the-fly
There are no alternative VAE models released, so this setting is mostly for future use

Scheduler

As mentioned, FLUX.1 currently supports only the Euler FlowMatch scheduler; additional schedulers will be added in the future
Due to the specifics of flow-matching methods, the number of steps also strongly influences image composition, not just how the image is resolved

Example image at different steps
flux-steps

Additionally, the sampler can be tuned with the shift parameter, which roughly controls how long the model spends on composition vs actual diffusion
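The effect of shift is visible in the sigma schedule itself: Diffusers' FlowMatchEulerDiscreteScheduler applies the static shift as sigma' = shift * sigma / (1 + (shift - 1) * sigma). A pure-Python sketch of that remapping:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Static sigma shift used by flow-match schedulers: higher shift
    keeps sigmas larger for longer, so the model spends more steps on
    composition before resolving detail."""
    return shift * sigma / (1 + (shift - 1) * sigma)

# shift=1 leaves the schedule unchanged
assert shift_sigma(0.5, 1.0) == 0.5
# shift=3 remaps the same point to a higher noise level
assert shift_sigma(0.5, 3.0) == 0.75
```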

Example image with different sampler shift values flux-shift

Tip

If you want to frequently switch between VAE models, you can add that setting to quicksettings by adding Settings -> User Interface -> Quicksettings list -> sd_vae

Example quicksettings

image

ToDo / Future

Additional core support will be added in diffusers==0.31 and subsequently included in SD.Next
