feifeibear changed the title from "Failed to load flux.1-dev with enable_sequential_cpu_offload using 4090" to "Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder" on Dec 26, 2024
feifeibear changed the title from "Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder" to "Failed to load flux.1-dev with enable_sequential_cpu_offload and use_fp8_t5_encoder (4090)" on Dec 26, 2024
While I have successfully managed to run pp=2 with CPU offloading, I hit exactly the same issue when attempting to use enable_sequential_cpu_offload and use_fp8_t5_encoder at the same time.
In short, the combination of these two features triggers the TypeError shown in the log below. I would appreciate any insights or suggestions on how to resolve this compatibility issue.
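For reference, this is roughly the workaround I am considering: offload only the non-quantized components and keep the fp8 T5 encoder resident on the GPU, instead of calling enable_sequential_cpu_offload on the whole pipeline. It is only a sketch, assuming --use_fp8_t5_encoder quantizes text_encoder_2 with optimum-quanto (which is where WeightQBytesTensor appears to come from); I have not verified it against xDiT's pipeline wrapper. The exact command, environment, and log follow below.

```python
# Sketch only (unverified): selective CPU offload that skips the fp8-quantized
# T5 encoder. Assumes `pipe` is the parallelized Flux pipeline and `local_rank`
# the GPU index, as in examples/flux_example.py.
import torch
from accelerate import cpu_offload


def selective_cpu_offload(pipe, local_rank: int) -> None:
    device = torch.device(f"cuda:{local_rank}")
    # Offload only the large, non-quantized modules; the quantized T5 stays on
    # the GPU, so accelerate never has to rebuild WeightQBytesTensor parameters
    # on the meta device.
    for name in ("transformer", "vae", "text_encoder"):
        module = getattr(pipe, name, None)
        if isinstance(module, torch.nn.Module):
            cpu_offload(module, execution_device=device)
    if getattr(pipe, "text_encoder_2", None) is not None:
        pipe.text_encoder_2.to(device)
```

i.e. calling selective_cpu_offload(pipe, local_rank) in place of pipe.enable_sequential_cpu_offload(gpu_id=local_rank).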
command:
CUDA_VISIBLE_DEVICES=4,5 torchrun --nproc_per_node=2 examples/flux_example.py --model /models/sfast_model/FLUX.1-dev --height 512 --width 512 --no_use_resolution_binning --pipefusion_parallel_degree 2 --ulysses_degree 1 --num_inference_steps 2 --warmup_steps 0 --prompt "A small dog" --tensor_parallel_degree 1 --use_fp8_t5_encoder --enable_sequential_cpu_offload
environment:
cuda: 12.2
Driver Version: 535.146.02
python: 3.10.12
torch: 2.5.1
log:
W1224 02:55:39.414000 5378 torch/distributed/run.py:793]
W1224 02:55:39.414000 5378 torch/distributed/run.py:793] *****************************************
W1224 02:55:39.414000 5378 torch/distributed/run.py:793] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1224 02:55:39.414000 5378 torch/distributed/run.py:793] *****************************************
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.5.1+cu124)
Python 3.10.13 (you have 3.10.12)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.2+cu121 with CUDA 1201 (you have 2.5.1+cu124)
Python 3.10.13 (you have 3.10.12)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:30: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:87: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:107: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:128: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(cls, ctx, dx5):
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:30: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/usr/local/lib/python3.10/dist-packages/xformers/triton/softmax.py:87: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:107: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/usr/local/lib/python3.10/dist-packages/xformers/ops/swiglu_op.py:128: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
def backward(cls, ctx, dx5):
WARNING 12-24 02:55:43 [args.py:326] Distributed environment is not initialized. Initializing...
DEBUG 12-24 02:55:43 [parallel_state.py:179] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
WARNING 12-24 02:55:43 [args.py:326] Distributed environment is not initialized. Initializing...
DEBUG 12-24 02:55:43 [parallel_state.py:179] world_size=-1 rank=-1 local_rank=-1 distributed_init_method=env:// backend=nccl
INFO 12-24 02:55:43 [config.py:120] Ring degree not set, using default value 1
INFO 12-24 02:55:43 [config.py:120] Ring degree not set, using default value 1
INFO 12-24 02:55:43 [config.py:164] Pipeline patch number not set, using default value 2
INFO 12-24 02:55:43 [config.py:164] Pipeline patch number not set, using default value 2
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.85s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.86s/it]
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 43%|█████████████████████▊ | 3/7 [00:00<00:01, 3.20it/s]=====checkpoint_file: /models/sfast_model/FLUX.1-dev/vae/diffusion_pytorch_model.safetensors
Loading pipeline components...: 100%|███████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.72it/s]
WARNING 12-24 02:56:12 [runtime_state.py:63] Model parallel is not initialized, initializing...
Loading pipeline components...: 57%|█████████████████████████████▏ | 4/7 [00:01<00:00, 3.12it/s]=====checkpoint_file: /models/sfast_model/FLUX.1-dev/vae/diffusion_pytorch_model.safetensors
Loading pipeline components...: 100%|███████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.19it/s]
WARNING 12-24 02:56:12 [runtime_state.py:63] Model parallel is not initialized, initializing...
INFO 12-24 02:56:12 [base_pipeline.py:292] Transformer backbone found, paralleling transformer...
INFO 12-24 02:56:12 [base_pipeline.py:292] Transformer backbone found, paralleling transformer...
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.0.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.1.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.2.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.3.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.4.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.0.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.5.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.1.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.6.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.2.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.7.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.3.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.4.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.8.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.5.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.9.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.6.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.7.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.10.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.8.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.11.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.9.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.12.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.10.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.11.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.13.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.12.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.14.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.13.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.14.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.15.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.15.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.16.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.16.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.17.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.17.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.18.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping transformer_blocks.18.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.19.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.20.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.0.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.21.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.1.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.22.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.2.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.23.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.3.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.24.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.4.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.25.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.5.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.26.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.6.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 1] Wrapping single_transformer_blocks.27.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.7.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.8.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_model.py:83] [RANK 0] Wrapping single_transformer_blocks.9.attn in model class FluxTransformer2DModel with xFuserAttentionWrapper
INFO 12-24 02:56:12 [base_pipeline.py:343] Scheduler found, paralleling scheduler...
INFO 12-24 02:56:12 [base_pipeline.py:343] Scheduler found, paralleling scheduler...
[rank1]: Traceback (most recent call last):
[rank1]: File "/workspace/xDiT/examples/flux_example.py", line 96, in
[rank1]: main()
[rank1]: File "/workspace/xDiT/examples/flux_example.py", line 46, in main
[rank1]: pipe.enable_sequential_cpu_offload(gpu_id=local_rank)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py", line 1151, in enable_sequential_cpu_offload
[rank1]: cpu_offload(model, device, offload_buffers=offload_buffers)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 205, in cpu_offload
[rank1]: attach_align_device_hook(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank1]: attach_align_device_hook(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank1]: attach_align_device_hook(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank1]: attach_align_device_hook(
[rank1]: [Previous line repeated 4 more times]
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 509, in attach_align_device_hook
[rank1]: add_hook_to_module(module, hook, append=True)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 161, in add_hook_to_module
[rank1]: module = hook.init_hook(module)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 308, in init_hook
[rank1]: set_module_tensor_to_device(module, name, "meta")
[rank1]: File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 365, in set_module_tensor_to_device
[rank1]: new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
[rank1]: TypeError: WeightQBytesTensor.new() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
[rank0]: Traceback (most recent call last):
[rank0]: File "/workspace/xDiT/examples/flux_example.py", line 96, in
[rank0]: main()
[rank0]: File "/workspace/xDiT/examples/flux_example.py", line 46, in main
[rank0]: pipe.enable_sequential_cpu_offload(gpu_id=local_rank)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py", line 1151, in enable_sequential_cpu_offload
[rank0]: cpu_offload(model, device, offload_buffers=offload_buffers)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 205, in cpu_offload
[rank0]: attach_align_device_hook(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank0]: attach_align_device_hook(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank0]: attach_align_device_hook(
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 518, in attach_align_device_hook
[rank0]: attach_align_device_hook(
[rank0]: [Previous line repeated 4 more times]
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 509, in attach_align_device_hook
[rank0]: add_hook_to_module(module, hook, append=True)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 161, in add_hook_to_module
[rank0]: module = hook.init_hook(module)
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 308, in init_hook
[rank0]: set_module_tensor_to_device(module, name, "meta")
[rank0]: File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 365, in set_module_tensor_to_device
[rank0]: new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
[rank0]: TypeError: WeightQBytesTensor.new() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
[rank0]:[W1224 02:56:13.235291809 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
E1224 02:56:13.579000 5378 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 5443) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 355, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
examples/flux_example.py FAILED
Failures:
[1]:
time : 2024-12-24_02:56:13
host : l117-11-p-ga
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 5444)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2024-12-24_02:56:13
host : l117-11-p-ga
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 5443)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
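From the traceback, the failure appears to come from the generic parameter rebuild in accelerate's set_module_tensor_to_device: it reconstructs each parameter as param_cls(new_value, requires_grad=old_value.requires_grad) when moving the module to the meta device, which works for a plain torch.nn.Parameter but not for optimum-quanto's WeightQBytesTensor, whose __new__ needs additional quantization metadata. The toy class below is hypothetical and only mirrors that failure mode:

```python
# Toy reproduction of the failure pattern; QuantizedParam is a made-up stand-in
# for optimum-quanto's WeightQBytesTensor.
import torch


class QuantizedParam(torch.nn.Parameter):
    def __new__(cls, data, scale, requires_grad=False):
        param = super().__new__(cls, data, requires_grad)
        param.scale = scale  # extra metadata a plain rebuild cannot supply
        return param


old_value = QuantizedParam(torch.zeros(4, 4), scale=0.1)
param_cls = type(old_value)

# accelerate/utils/modeling.py effectively does this during sequential CPU
# offload, and it has no way to pass the extra constructor arguments:
try:
    new_value = param_cls(old_value.data, requires_grad=old_value.requires_grad)
except TypeError as exc:
    print(exc)  # missing required positional argument: 'scale'
```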