black-forest-labs/FLUX.1-dev runs very slowly: it takes about 15 minutes to generate a 1344x768 (w x h) image. Has anyone experienced the same, or is it just me?
import torch
from pathlib import Path
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(args.model, torch_dtype=torch.bfloat16)
# pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
pipe.to(torch.float16)  # casting here instead of in the pipeline constructor because doing so in the constructor loads all models into CPU memory at once
prompt = args.prompt
image = pipe(
prompt,
height=args.height,
width=args.width,
guidance_scale=0.0,
num_inference_steps=args.num_inference_steps,
max_sequence_length=512,
generator=torch.Generator("cpu").manual_seed(0)
).images[0]
Path(args.output).parent.mkdir(parents=True, exist_ok=True)
image.save(args.output)
For reference, args.num_inference_steps = 50.
The 24 GB of VRAM should be just enough to keep the transformer model fully in VRAM, which means you can use pipe.enable_model_cpu_offload() instead of pipe.enable_sequential_cpu_offload(). Sequential offload streams the weights to the GPU one submodule at a time, and that transfer overhead is what makes each step so slow. You may not even need the VAE slicing/tiling.
I.e.:
pipe=FluxPipeline.from_pretrained(args.model, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
prompt = args.prompt
image = pipe(
prompt,
height=args.height,
width=args.width,
guidance_scale=0.0,
num_inference_steps=args.num_inference_steps,
max_sequence_length=512,
generator=torch.Generator("cpu").manual_seed(0)
).images[0]
Path(args.output).parent.mkdir(parents=True, exist_ok=True)
image.save(args.output)
If that still uses more VRAM than is available (check Task Manager), you can look into quantizing the model.
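For example, the transformer (by far the largest component of FLUX.1-dev) can be loaded in 4-bit NF4 through the diffusers bitsandbytes integration. This is a minimal sketch, assuming a recent diffusers release with bitsandbytes installed and reusing the args.model path from your script; the T5 text encoder can be quantized the same way if you need more headroom:

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, BitsAndBytesConfig

# Quantize only the transformer; it dominates memory usage.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    args.model,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(args.model, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # text encoders and VAE can still be offloaded to CPU between uses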