Flux.1 Hopper Performance #378

Open
thomasbtnfr opened this issue Dec 4, 2024 · 3 comments

thomasbtnfr commented Dec 4, 2024

Hello everyone,
I'm benchmarking FLUX.1 [dev] with xDiT. Compared with the results presented here, I see significant differences: my results are much worse.

Here is the table with two new columns corresponding to the results of my experiments:

| Configuration | Reference: PyTorch (s) | Reference: torch.compile (s) | Mine: PyTorch (s) | Mine: torch.compile (s) |
|---|---|---|---|---|
| 1 GPU | 6.71 | 4.30 | 6.27 | 3.88 |
| Ulysses-2 | 4.38 | 2.68 | 5.05 | 4.01 |
| Ring-2 | 5.31 | 2.60 | 5.01 | 3.7 |
| Ulysses-2 x Ring-2 | 5.19 | 1.80 | 3.23 | 2.85 |
| Ulysses-4 | 4.24 | 1.63 | 2.96 | 2.21 |
| Ring-4 | 5.11 | 1.98 | 3.7 | 3.05 |

My results are quite different, especially when using torch.compile. Do you have any ideas?
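To make the torch.compile gap easier to see, here is a minimal sketch that turns the torch.compile columns of the table above into speedups over the 1 GPU row. The names `COMPILE_SECONDS`, `speedup`, and `scaling_table` are made up for this illustration and are not part of xDiT.

```python
# Seconds per run, copied from the torch.compile columns of the table:
# (reference result, my result).
COMPILE_SECONDS = {
    "1 GPU": (4.30, 3.88),
    "Ulysses-2": (2.68, 4.01),
    "Ring-2": (2.60, 3.7),
    "Ulysses-2 x Ring-2": (1.80, 2.85),
    "Ulysses-4": (1.63, 2.21),
    "Ring-4": (1.98, 3.05),
}


def speedup(baseline_s, parallel_s):
    """Speedup of a parallel run relative to the single-GPU baseline."""
    return baseline_s / parallel_s


def scaling_table(rows, baseline="1 GPU"):
    """Map each configuration to (reference speedup, measured speedup)."""
    ref_base, my_base = rows[baseline]
    return {
        name: (speedup(ref_base, ref), speedup(my_base, mine))
        for name, (ref, mine) in rows.items()
    }


if __name__ == "__main__":
    for name, (ref_x, my_x) in scaling_table(COMPILE_SECONDS).items():
        print(f"{name:<20} reference {ref_x:.2f}x   measured {my_x:.2f}x")
```

A measured speedup below 1.0 means the parallel configuration ran slower than a single GPU, which is the case for my Ulysses-2 run with torch.compile.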

Environment:

  • I time only the pipe call. The table reports the average time per GPU.
  • The H100s (SXM5) I use provide an intra-node NVLink bandwidth of ~900 GB/s.
  • xfuser==0.3.4
  • torch==2.5.0

Options:

  • num_inference_steps 28
  • height 1024
  • width 1024
  • no_use_resolution_binning
  • warmup_steps 1
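
For reference, the measurement described above (time only the pipe call, after warmup) can be sketched roughly as follows. This is an illustration under assumptions, not the actual benchmark script: `time_call` and its parameters are hypothetical names, and the optional `sync` hook stands in for a device synchronization such as `torch.cuda.synchronize`, without which GPU work may still be in flight when the timer stops.

```python
import time
from statistics import mean


def time_call(fn, warmup_steps=1, repeats=3, sync=None):
    """Return the mean wall-clock seconds of fn() over `repeats` timed runs.

    `sync`, if given, is called before starting and before stopping the
    timer so that queued asynchronous work is included in the measurement.
    """
    # Warmup runs are executed but not timed (compilation, caches, etc.).
    for _ in range(warmup_steps):
        fn()

    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

With PyTorch this would be called along the lines of `time_call(lambda: pipe(...), warmup_steps=1, sync=torch.cuda.synchronize)`, where the arguments to `pipe` are whatever the benchmark uses. Note that with torch.compile a single warmup step may not cover every recompilation, so a too-short warmup can inflate the timed runs.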

feifeibear (Collaborator) commented

Our torch version is 2.5.1 with CUDA 12.6.

Could you upgrade your libraries?

eppane commented Dec 20, 2024

@feifeibear Does torch 2.5.1 support CUDA 12.6? Is your torch built with CUDA 12.6, or is it e.g. 2.5.1+cu124? (What is the output of torch.version.cuda?)

feifeibear (Collaborator) commented

It's this one: 2.5.1+cu124.
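
For anyone comparing environments, a small sketch of how the build tag in such a version string can be read. `parse_torch_version` is a hypothetical helper, and it assumes the usual wheel naming where the last digit after `cu` is the CUDA minor version (so `cu124` means CUDA 12.4). The CUDA version torch was built with is also available directly as `torch.version.cuda`.

```python
import re


def parse_torch_version(version):
    """Split '2.5.1+cu124' into ('2.5.1', '12.4').

    The CUDA part is None when there is no 'cuNNN' local tag
    (for example a CPU-only build or a plain release string).
    """
    release, _, local = version.partition("+")
    # Assumption: the last digit is the minor version, the rest the major.
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    if match is None:
        return release, None
    return release, f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print(torch.__version__, parse_torch_version(torch.__version__))
        print(torch.version.cuda)  # CUDA version the wheel was built against
```

The build tag describes the CUDA toolkit the wheel was compiled against, which can differ from the CUDA version installed on the machine.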
