FLUX.1-dev runs very slow on 3090 #138

Open · jerrymatjila opened this issue Sep 3, 2024 · 3 comments

@jerrymatjila

black-forest-labs/FLUX.1-dev runs very slowly: it takes about 15 minutes to generate a 1344x768 (w×h) image. Has anyone experienced the same, or is it just me?

    import torch
    from pathlib import Path
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(args.model, torch_dtype=torch.bfloat16)
    #pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    pipe.to(torch.float16) # casting here instead of in the pipeline constructor because doing so in the constructor loads all models into CPU memory at once

    prompt = args.prompt
    image = pipe(
        prompt,
        height=args.height,
        width=args.width,
        guidance_scale=0.0,
        num_inference_steps=args.num_inference_steps,
        max_sequence_length=512,
        generator=torch.Generator("cpu").manual_seed(0)
    ).images[0]
    Path(args.output).parent.mkdir(parents=True, exist_ok=True)
    image.save(args.output)

args.num_inference_steps=50

hungho77 commented Sep 5, 2024

If your GPU has enough VRAM, try commenting out this line: pipe.enable_sequential_cpu_offload()
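
For reference, here is a minimal sketch for checking how much of the 3090's 24 GB is actually in use before deciding to drop the offloading (the report_vram helper is hypothetical, not part of diffusers; it only uses standard torch.cuda calls):

    import torch

    def report_vram(device_index: int = 0) -> None:
        # Compare the card's total memory with what PyTorch has allocated/reserved.
        props = torch.cuda.get_device_properties(device_index)
        total_gb = props.total_memory / 1024**3
        allocated_gb = torch.cuda.memory_allocated(device_index) / 1024**3
        reserved_gb = torch.cuda.memory_reserved(device_index) / 1024**3
        print(f"{props.name}: {total_gb:.1f} GB total, "
              f"{allocated_gb:.1f} GB allocated, {reserved_gb:.1f} GB reserved")

    # Call this e.g. right after building the pipeline and again after a generation.
    report_vram()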

@aproust08

@jerrymatjila, I'm having the same issue with my 3090 card. Were you able to fix it?
Thx

@JonasLoos

24 GB of VRAM should be just enough to keep the transformer model fully on the GPU, which means you can use pipe.enable_model_cpu_offload() instead of pipe.enable_sequential_cpu_offload(). You may not even need the VAE slicing/tiling.

I.e.:

import torch
from pathlib import Path
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(args.model, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power

prompt = args.prompt
image = pipe(
    prompt,
    height=args.height,
    width=args.width,
    guidance_scale=0.0,
    num_inference_steps=args.num_inference_steps,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
Path(args.output).parent.mkdir(parents=True, exist_ok=True)
image.save(args.output)

If that still needs more VRAM than is available (check the task manager or nvidia-smi), you can look into quantizing the model.
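
For example, here is a minimal sketch of quantizing the transformer to 4-bit, assuming a diffusers version with bitsandbytes quantization support (check the exact config fields against the diffusers docs for your version):

    import torch
    from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

    model_id = "black-forest-labs/FLUX.1-dev"

    # Quantize only the transformer; it is by far the largest component.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = FluxTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )

    # Text encoders and VAE stay in bfloat16; keep model offload as a fallback.
    pipe = FluxPipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
    pipe.enable_model_cpu_offload()

4-bit NF4 roughly quarters the transformer's weight footprint compared to bfloat16, which should leave comfortable headroom for it to stay resident on a 24 GB card.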
