
Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder (4090) #407

Open
WeiboXu opened this issue Dec 24, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@WeiboXu

WeiboXu commented Dec 24, 2024

command:
CUDA_VISIBLE_DEVICES=4,5 torchrun --nproc_per_node=2 examples/flux_example.py --model /models/sfast_model/FLUX.1-dev --height 512 --width 512 --no_use_resolution_binning --pipefusion_parallel_degree 2 --ulysses_degree 1 --num_inference_steps 2 --warmup_steps 0 --prompt "A small dog" --tensor_parallel_degree 1 --use_fp8_t5_encoder --enable_sequential_cpu_offload

environment:
cuda: 12.2
Driver Version: 535.146.02
python: 3.10.12
torch: 2.5.1
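
Roughly, the two flags translate to the following in the example script (simplified sketch; only the enable_sequential_cpu_offload(gpu_id=local_rank) call is verbatim from the traceback below, the optimum-quanto calls and variable names are approximations of how --use_fp8_t5_encoder is handled):

from optimum.quanto import freeze, qfloat8, quantize

if engine_args.use_fp8_t5_encoder:
    # After this, the T5 weights are quanto WeightQBytesTensor subclasses.
    quantize(pipe.text_encoder_2, weights=qfloat8)
    freeze(pipe.text_encoder_2)

if engine_args.enable_sequential_cpu_offload:
    # accelerate later tries to rebuild those quantized parameters on "meta"
    # and fails with the TypeError shown in the log.
    pipe.enable_sequential_cpu_offload(gpu_id=local_rank)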

log:
W1224 02:55:39.414000 5378 torch/distributed/run.py:793]
W1224 02:55:39.414000 5378 torch/distributed/run.py:793] *****************************************
W1224 02:55:39.414000 5378 torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1224 02:55:39.414000 5378 torch/distributed/run.py:793] *****************************************
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.5.1+cu124)
Python 3.10.13 (you have 3.10.12)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.5.1+cu124)
Python 3.10.13 (you have 3.10.12)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:30: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:87: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:107: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:128: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(cls, ctx, dx5):
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:30: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:87: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:107: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:128: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(cls, ctx, dx5):
WARNING 12-24 02:55:43 [args.py:326] Distributed environment is not initialized. Initializing...
DEBUG 12-24 02:55:43 [parallel_state.py:179] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
WARNING 12-24 02:55:43 [args.py:326] Distributed environment is not initialized. Initializing...
DEBUG 12-24 02:55:43 [parallel_state.py:179] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
INFO 12-24 02:55:43 [config.py:120] Ring degree not set, using default value 1
INFO 12-24 02:55:43 [config.py:120] Ring degree not set, using default value 1
INFO 12-24 02:55:43 [config.py:164] Pipeline patch number not set, using default value 2
INFO 12-24 02:55:43 [config.py:164] Pipeline patch number not set, using default value 2
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.85s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.86s/it]
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 43%|█████████████████████▊ | 3/7 [00:00<00:01, 3.20it/s]=====checkpoint_file: /models/sfast_model/FLUX.1-dev/vae/diffusion_pytorch_model.safetensors
Loading pipeline components...: 100%|███████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.72it/s]
WARNING 12-24 02:56:12 [runtime_state.py:63] Model parallel is not initialized, initializing...
Loading pipeline components...: 57%|█████████████████████████████▏ | 4/7 [00:01<00:00, 3.12it/s]=====checkpoint_file: /models/sfast_model/FLUX.1-dev/vae/diffusion_pytorch_model.safetensors
Loading pipeline components...: 100%|███████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.19it/s]
WARNING 12-24 02:56:12 [runtime_state.py:63] Model parallel is not initialized, initializing...
INFO 12-24 02:56:12 [base_pipeline.py:292] Transformer backbone found, paralleling transformer...
INFO 12-24 02:56:12 [base_pipeline.py:292] Transformer backbone found, paralleling transformer...
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.0.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.1.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.2.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.3.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.4.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.0.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.5.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.1.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.6.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.2.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.7.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.3.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.4.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.8.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.5.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.9.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.6.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.7.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.10.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.8.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.11.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.9.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.12.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.10.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.11.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.13.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.12.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.14.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.13.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.14.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.15.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.15.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.16.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.16.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.17.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.17.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.18.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.18.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.19.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.20.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.0.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.21.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.1.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.22.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.2.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.23.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.3.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.24.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.4.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.25.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.5.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.26.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.6.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.27.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.7.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.8.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.9.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_pipeline.py:343] Scheduler found, paralleling scheduler...
INFO 12-24 02:56:12 [base_pipeline.py:343] Scheduler found, paralleling scheduler...
[rank1]: Traceback (most recent call last):
[rank1]: File "/workspace/xDiT/examples/flux_example.py", line 96, in
[rank1]: main()
[rank1]: File "/workspace/xDiT/examples/flux_example.py", line 46, in main
[rank1]: pipe.enable_sequential_cpu_offload(gpu_id=local_rank)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py", line 1151, in enable_sequential_cpu_offload
[rank1]: cpu_offload(model, device, offload_buffers=offload_buffers)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 205, in cpu_offload
[rank1]: attach_align_device_hook(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank1]: attach_align_device_hook(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank1]: attach_align_device_hook(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank1]: attach_align_device_hook(
[rank1]: [Previous line repeated 4 more times]
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 509, in attach_align_device_hook
[rank1]: add_hook_to_module(module, hook, append=True)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 161, in add_hook_to_module
[rank1]: module = hook.init_hook(module)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 308, in init_hook
[rank1]: set_module_tensor_to_device(module, name, "meta")
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 365, in set_module_tensor_to_device
[rank1]: new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
[rank1]: TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/xDiT/examples/flux_example.py", line 96, in
[rank0]: main()
[rank0]: File "/workspace/xDiT/examples/flux_example.py", line 46, in main
[rank0]: pipe.enable_sequential_cpu_offload(gpu_id=local_rank)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py", line 1151, in enable_sequential_cpu_offload
[rank0]: cpu_offload(model, device, offload_buffers=offload_buffers)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 205, in cpu_offload
[rank0]: attach_align_device_hook(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank0]: attach_align_device_hook(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank0]: attach_align_device_hook(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank0]: attach_align_device_hook(
[rank0]: [Previous line repeated 4 more times]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 509, in attach_align_device_hook
[rank0]: add_hook_to_module(module, hook, append=True)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 161, in add_hook_to_module
[rank0]: module = hook.init_hook(module)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 308, in init_hook
[rank0]: set_module_tensor_to_device(module, name, "meta")
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 365, in set_module_tensor_to_device
[rank0]: new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
[rank0]: TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
[rank0]:[W1224 02:56:13.235291809 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E1224 02:56:13.579000 5378 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 5443) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 355, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

examples/flux_example.py FAILED

Failures:
[1]:
time : 2024-12-24_02:56:13
host : l117-11-p-ga
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 5444)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-12-24_02:56:13
host : l117-11-p-ga
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 5443)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
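
The failure pattern can be reproduced without xDiT or quanto. accelerate's set_module_tensor_to_device rebuilds each parameter with param_cls(new_value, requires_grad=...), which assumes the parameter class takes only (data, requires_grad); a quantized parameter subclass such as quanto's WeightQBytesTensor needs extra constructor arguments, so the call raises the TypeError above. A minimal standalone illustration (the class below is a stand-in, not quanto code):

import torch

class FakeQuantizedParameter(torch.nn.Parameter):
    # Stand-in for a quantized tensor subclass whose constructor needs
    # extra metadata (scale, axis, ...), as WeightQBytesTensor does.
    def __new__(cls, data, scale, requires_grad=False):
        t = super().__new__(cls, data, requires_grad=requires_grad)
        t.scale = scale
        return t

p = FakeQuantizedParameter(torch.zeros(2, 2), scale=0.1)
param_cls = type(p)
try:
    # Same reconstruction pattern as accelerate/utils/modeling.py line 365.
    param_cls(p.data, requires_grad=p.requires_grad).to("meta")
except TypeError as err:
    print("reconstruction fails:", err)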

@feifeibear
Collaborator

This issue might be caused by a mismatch between the versions of the libraries you're using, or by an incompatibility in the model's configuration.

Could you please check your diffusers version? We recommend diffusers>=0.32.0.dev for Flux.

@WeiboXu
Author

WeiboXu commented Dec 26, 2024

Here is my diffusers version:

diffusers 0.32.1

@feifeibear feifeibear changed the title Failed to load flux.1-dev with enable_sequential_cpu_offload using 4090 Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder Dec 26, 2024
@feifeibear feifeibear changed the title Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder (4090) Dec 26, 2024
@feifeibear
Collaborator

I am using Diffusers version 0.31.0.

I have successfully managed to run pp=2 with CPU offloading, but I hit exactly the same error when using both enable_sequential_cpu_offload and use_fp8_t5_encoder at the same time.

In short, the combination of these two features appears to trigger this error. I would appreciate any insights or suggestions on how to resolve the compatibility issue.
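
One workaround that might be worth trying (untested sketch, assuming the wrapped pipe exposes the underlying diffusers components and that the fp8 T5 encoder fits in GPU memory) is to offload only the non-quantized components with accelerate's cpu_offload and keep text_encoder_2 resident on the GPU instead of hooking it:

from accelerate import cpu_offload

device = f"cuda:{local_rank}"
# Offload only the components that are not quanto-quantized.
for component in (pipe.transformer, pipe.vae, pipe.text_encoder):
    cpu_offload(component, execution_device=device)
# Leave the fp8-quantized T5 encoder on the GPU so accelerate never tries
# to rebuild its WeightQBytesTensor parameters.
pipe.text_encoder_2.to(device)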

@feifeibear feifeibear added the bug Something isn't working label Dec 26, 2024