Hi, I'm trying to run Stable Diffusion 3 with xDiT on a single 4090 via the following command. I set all parallelism-related parameters to 1 to disable parallelism, and I got: epoch time: 7.96 sec, parameter memory: 17.58 GB, peak memory: 20.16 GB.
torchrun --nproc_per_node=1 examples/sd3_example.py \
--model "stabilityai/stable-diffusion-3-medium-diffusers" \
--pipefusion_parallel_degree 1 --ulysses_degree 1 \
--data_parallel_degree 1 --ring_degree 1 --tensor_parallel_degree 1 \
--num_inference_steps 50 --warmup_steps 0 \
--prompt "A sign that reads Raining Cats and Dogs with a dog smiling and wagging its tail."
However, I found that inference with plain diffusers is slower than with xDiT. I wrote the code below and got a result of 9.28 s. This makes me wonder whether some optimization is applied even in the single-GPU, non-parallel case. I also tried different numbers of inference steps, but there is always a gap of around 1.4 seconds. So my question is: what is the reason for this difference?
import torch
from diffusers import StableDiffusion3Pipeline
import time

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

start_time = time.time()
image = pipe(
    "A sign that reads Raining Cats and Dogs with a dog smiling and wagging its tail.",
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=7.0,
).images[0]
end_time = time.time()

print(f"inference time: {end_time - start_time:.2f} s")
image.save("sign.png")
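One thing worth checking before comparing the two numbers: CUDA execution is asynchronous, and the very first pipeline call also pays one-time costs (kernel selection, memory-pool growth), so timing a single cold run with `time.time()` can inflate the diffusers figure. A minimal timing harness, sketched below under the assumption that you pass `torch.cuda.synchronize` as the `sync` hook when measuring GPU work (the `benchmark` helper name is mine, not from either library):

```python
import time

def benchmark(fn, warmup=1, iters=3, sync=None):
    """Time fn() after warmup runs. `sync` (e.g. torch.cuda.synchronize)
    is called before each clock read so queued GPU work is included."""
    for _ in range(warmup):
        fn()                      # absorb one-time setup costs
    if sync:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()                # wait for the GPU before stopping the clock
        times.append(time.perf_counter() - start)
    return min(times)             # best-of-n reduces scheduling noise

# hypothetical usage against the pipeline above:
# t = benchmark(lambda: pipe(prompt, num_inference_steps=50, guidance_scale=7.0),
#               sync=torch.cuda.synchronize)
```

If the gap persists after a warmup run and synchronized timing, it more likely reflects a real implementation difference rather than measurement noise.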
Yeah, I know it "should be the same", but I tested on multiple machines with different GPUs and the gap still exists. I don't know why this happens. Is there something I did wrong? I have cleaned up my script's formatting above. Could you help me?