Vladimir Mandic edited this page Jun 13, 2024 · 4 revisions



Note: SD3 is under gated access: go to and fill the form to get access

Load Model


The SD3 model consists of:

  • MMDiT (multi-modal diffusion transformer)
    Note: "medium" primarily refers to the number of parameters in the MMDiT component: 2B
    StabilityAI may release smaller and/or larger variations, as the full SD3 has 8B parameters
  • VAE (variational autoencoder)
  • Multiple text encoders: CLIP-ViT/L, OpenCLIP-ViT/G, T5 Version 1.1
    TE3 (T5) is optional and used primarily to render text
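
To make the component list concrete, here is a minimal sketch that maps each component to the attribute name used by the diffusers `StableDiffusion3Pipeline` and reports which ones are present on a loaded pipeline. The helper name `component_status` is hypothetical, and how SD.Next holds the pipeline internally is not covered by this page.

```python
# Attribute names follow the diffusers StableDiffusion3Pipeline layout.
# The descriptions are taken from the component list above.
SD3_COMPONENTS = {
    "transformer": "MMDiT (multi-modal diffusion transformer)",
    "vae": "VAE (variational autoencoder)",
    "text_encoder": "CLIP-ViT/L",
    "text_encoder_2": "OpenCLIP-ViT/G",
    "text_encoder_3": "T5 Version 1.1 (optional)",
}


def component_status(pipe):
    """Return {attribute_name: True/False} telling which components are loaded.

    `pipe` can be any object; a missing attribute or an attribute set to None
    is reported as not loaded (hypothetical helper, for illustration only).
    """
    return {attr: getattr(pipe, attr, None) is not None for attr in SD3_COMPONENTS}
```

On a pipeline loaded without T5, `component_status(pipe)["text_encoder_3"]` would be `False` while the other entries stay `True`.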

Load using Reference Models

Select: Networks -> Models -> Reference -> StabilityAI Stable Diffusion 3 Medium


To allow the SD.Next server to access the models, get your Huggingface token from your Huggingface profile -> settings -> access tokens and enter it in SD.Next -> settings -> diffusers -> huggingface token


Alternatively, log in with the Huggingface CLI and use the token from there

source venv/bin/activate
venv/bin/huggingface-cli login
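
To check which token would be picked up after logging in, a small sketch like the one below can help. It assumes the usual Huggingface conventions, which this page does not state: the `HF_TOKEN` environment variable takes priority, and the CLI stores the token in a file named `token` under `HF_HOME` (default `~/.cache/huggingface`). The function name `find_hf_token` is hypothetical.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return the Huggingface token if one can be found, else None.

    Assumptions (not taken from this page): HF_TOKEN wins if set, otherwise
    the token file written by `huggingface-cli login` is read from
    $HF_HOME/token or ~/.cache/huggingface/token.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    if env.get("HF_HOME"):
        base = Path(env["HF_HOME"])
    else:
        base = Path(home or Path.home()) / ".cache" / "huggingface"
    token_file = base / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

If this returns `None`, neither the environment variable nor the CLI login is visible to the process, and the token would need to be entered in the SD.Next settings instead.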

Load using a manually provided single file

Download SD3 models from Huggingface


  • sd3_medium.safetensors: includes the MMDiT and VAE weights only, SD.Next will automatically load CLIP models as needed
  • sd3_medium_incl_clips.safetensors: includes all necessary weights except for the T5 text encoder
  • sd3_medium_incl_clips_t5xxlfp8.safetensors: contains all necessary weights, including the T5 FP8 variant
    Support for this version is not yet available and is planned for the near future, due to the nature of the FP8 quantization packaged in the file
    T5 can be loaded/unloaded separately
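
The differences between the three files can be summarised as a lookup table. This is an illustrative sketch only: the table restates the list above, and the helper `encoders_to_load_separately` is a hypothetical name, not part of SD.Next.

```python
import os

# What each single-file download bundles, per the list above.
# MMDiT and VAE weights are present in all three files.
SINGLE_FILE_CONTENTS = {
    "sd3_medium.safetensors": {"clip": False, "t5": False},
    "sd3_medium_incl_clips.safetensors": {"clip": True, "t5": False},
    # Bundles T5 in FP8; the page notes support for this file is still planned.
    "sd3_medium_incl_clips_t5xxlfp8.safetensors": {"clip": True, "t5": True},
}


def encoders_to_load_separately(path):
    """Return the text encoder groups that are NOT bundled in the given file."""
    name = os.path.basename(path)
    bundled = SINGLE_FILE_CONTENTS.get(name)
    if bundled is None:
        raise ValueError(f"unknown SD3 single-file variant: {name}")
    return [encoder for encoder, included in bundled.items() if not included]
```

For example, `encoders_to_load_separately("models/sd3_medium_incl_clips.safetensors")` returns `["t5"]`, which matches the note that T5 is the only missing piece in that file and stays optional.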

Load Text Encoder

SD.Next allows changing the optional text encoder on-the-fly

Go to settings -> models -> text encoder and select the desired text encoder
Default is None, supported are T5 FP8 and T5 FP16 (not recommended due to size)
T5 enhances text rendering and some details, but it is otherwise very lightly used and optional
Loading T5 greatly increases model resource usage and automatically enables sequential offloading


If you want to frequently switch between text encoders, you can add that setting to quicksettings


  • Mandatory parameters:
    Sampler: Default
    Note: SD3 uses a custom sampler, FlowMatchEulerDiscreteScheduler
    You can experiment with different samplers, but results are not guaranteed
  • StabilityAI recommended parameters:
    Resolution: 1024x1024, CFG scale: 7.0, Steps: 28
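
For readers who want to try the same parameters outside the SD.Next UI, the sketch below expresses them as keyword arguments for the diffusers `StableDiffusion3Pipeline`. Assumptions not taken from this page: the repository id `stabilityai/stable-diffusion-3-medium-diffusers`, the use of fp16, and the helper names. The pipeline's default scheduler is left untouched, which corresponds to "Sampler: Default". Heavy imports are deferred so the parameter helper can be used without a GPU.

```python
def sd3_recommended_kwargs(prompt, **overrides):
    """Build pipeline call arguments from the StabilityAI recommended parameters."""
    kwargs = {
        "prompt": prompt,
        "width": 1024,
        "height": 1024,
        "guidance_scale": 7.0,       # CFG scale
        "num_inference_steps": 28,   # Steps
    }
    kwargs.update(overrides)  # caller-supplied values take precedence
    return kwargs


def generate(prompt, model_id="stabilityai/stable-diffusion-3-medium-diffusers"):
    """Generate one image; requires torch, diffusers, and gated model access."""
    import torch
    from diffusers import StableDiffusion3Pipeline

    # model_id is an assumed repository name, not stated on this page
    pipe = StableDiffusion3Pipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # reduces VRAM use at some speed cost
    return pipe(**sd3_recommended_kwargs(prompt)).images[0]
```

Calling `sd3_recommended_kwargs("a photo of a cat", num_inference_steps=20)` keeps the recommended resolution and CFG scale while overriding only the step count.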


  • Add prompt attention parser
  • Add inpainting

