UniPC for diffusion sampling #2684

Merged (9 commits) on Jan 1, 2025

nicksenger
Contributor

Hi, thanks for the awesome library! It's great having these tools available in the Rust ecosystem.

I was interested in low step-count inference for some experiments, so I ported over the UniPC scheduler. I figured I'd open a PR here in case this work is useful to others. Here is a comparison with Euler A and DDIM at 5 steps, which demonstrates UniPC's benefits for quick convergence:

[Image: comparison]

@LaurentMazare
Collaborator

Would you have some examples with good quality where this makes a difference?

@nicksenger
Contributor Author

Sure, here is a comparison of DDIM (default cfg) and UniPC (corrector enabled past step 2, Bh2 solver type, otherwise defaults) on sd1.5 across 50 steps, using seed 1984, guidance scale 15, and the following prompt:

a rusty robot holding a fire torch in its hand,android scrapyard,oxidized,intricate exposed wires,arcing,neon lights,sci-fi,dystopia,futuristic city background,silhouettes,strange illuminated mannequins,geisha billboard,flying cars,4k,cinematic lighting,photo-realistic,extremely detailed,high quality,epic,lfg

negative prompt:

reduction,reducing agents,electron rich,drawing,illustrated,low quality,out of focus,blurry,cut off,simple,clean,organized,daytime,utopia,peaceful,low contrast,lowres,worst quality

[Image: steps]

6 steps is about where the output starts becoming clear from both schedulers, but the composition is quite different:

[Image: 1984-15-6-ddim-512]
[Image: 1984-15-6-unipc-512]

At 50 steps both have converged to a similar output with some differences. Which one looks better is a bit subjective, but I think most would agree the composition is closer to the 6-step output from UniPC than that from DDIM:

[Image: 1984-15-50-ddim-512]
[Image: 1984-15-50-unipc-512]

Just for fun, here's the 50-step output from UniPC with the same settings as above but with the solver order set to 3:

[Image: 1984-15-50-order3-512]

Somewhat different output again, but overall the composition pretty much resembles what it produces at 5 steps:

[Image: 1984-15-5-order3-512]

candle-transformers/src/models/stable_diffusion/uni_pc.rs
}

#[derive(Clone, Copy)]
struct FloatOrd(f64);
Collaborator

Could you give some insights on the ordering that this provides?

Contributor Author

I added a comment in fedce6287405fe04826344c3b5537375b317760a. The ordering is:

NaN | -Infinity | x < 0 | -0 | +0 | x > 0 | +Infinity | NaN

It's the same strategy used by float-ord, which is in turn a dependency of the average crate where this quantile computation comes from. These are only used for the dynamic thresholding, so I thought it'd be better to hide the logic within this module versus introducing additional dependencies crate-wide.
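For readers unfamiliar with this ordering, here is a minimal sketch of how a wrapper like `FloatOrd` can produce it. This is an illustration of the general bit-level technique, not the code from this PR: the `key` helper and the `sorted` function are hypothetical names introduced here. The idea is to map each `f64` bit pattern to a `u64` whose unsigned order matches the desired float order: non-negative values get their sign bit set, and negative values have all bits flipped.

```rust
use std::cmp::Ordering;

/// Wrapper giving `f64` a total order (illustrative sketch only).
#[derive(Clone, Copy, Debug)]
struct FloatOrd(f64);

impl FloatOrd {
    /// Hypothetical helper: maps the float to a `u64` sort key.
    fn key(self) -> u64 {
        let bits = self.0.to_bits();
        let sign = 1u64 << 63;
        if bits & sign == 0 {
            // Sign bit clear: set it so these keys sort above all negatives.
            bits | sign
        } else {
            // Sign bit set: flip every bit so larger magnitudes sort lower.
            !bits
        }
    }
}

impl PartialEq for FloatOrd {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}

impl Eq for FloatOrd {}

impl PartialOrd for FloatOrd {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for FloatOrd {
    fn cmp(&self, other: &Self) -> Ordering {
        self.key().cmp(&other.key())
    }
}

/// Hypothetical helper: sorts a slice of floats using the wrapper's order.
fn sorted(values: &[f64]) -> Vec<f64> {
    let mut wrapped: Vec<FloatOrd> = values.iter().copied().map(FloatOrd).collect();
    wrapped.sort();
    wrapped.into_iter().map(|w| w.0).collect()
}
```

With this key, a NaN whose sign bit is set sorts below `-Infinity` and a NaN whose sign bit is clear sorts above `+Infinity`, which is why NaN appears at both ends of the ordering quoted above.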

Collaborator

Agreed that it's better not to introduce a dependency, but why not use the total ordering provided by f64::total_cmp? Do you expect some differences with it that would be helpful here?

Contributor Author

Ah, nope. This was just an oversight on my part: I thought the MSRV for the project was lower for some reason, but now I see that total_cmp is used in quite a few other places. It looks like the standard library uses the same method: https://doc.rust-lang.org/src/core/num/f64.rs.html#1350

Updated in f0384fa
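As a rough illustration of the approach settled on in this thread, the sketch below sorts with the standard library's `f64::total_cmp` and uses the result for a quantile and a simple dynamic-thresholding step. It is not the code from this PR: the function names, the linear-interpolation quantile, and the `ratio` and `max_value` parameters are assumptions made for the example, and the thresholding follows the commonly described recipe (take a quantile of the absolute values, clamp it to `[1, max_value]`, then clip and rescale).

```rust
/// Linearly interpolated quantile of `values` for `q` in [0, 1].
/// Returns `None` for empty input or an out-of-range `q`.
fn quantile(values: &[f64], q: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = values.to_vec();
    // `total_cmp` is a total order on f64, so no wrapper type is needed.
    sorted.sort_by(f64::total_cmp);
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Sketch of dynamic thresholding over a flat sample.
/// `ratio` and `max_value` are hypothetical parameter names;
/// `max_value` is assumed to be finite and at least 1.0.
fn dynamic_threshold(sample: &[f64], ratio: f64, max_value: f64) -> Vec<f64> {
    let abs: Vec<f64> = sample.iter().map(|x| x.abs()).collect();
    let s = match quantile(&abs, ratio) {
        // Never shrink the range below [-1, 1], and cap it at max_value.
        Some(s) if s.is_finite() => s.clamp(1.0, max_value),
        // Empty or non-finite input: leave the sample unchanged.
        _ => return sample.to_vec(),
    };
    // Clip to [-s, s], then rescale back into [-1, 1].
    sample.iter().map(|x| x.clamp(-s, s) / s).collect()
}
```

For example, `dynamic_threshold(&[0.5, -3.0, 2.0], 1.0, 2.0)` picks the largest absolute value (3.0), caps it at 2.0, and then clips and rescales every element by that threshold.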

@LaurentMazare
Collaborator

Thanks for the detailed analysis. I've put a bunch of comments inline; please also look at the clippy and rustfmt failures in the CI.

@nicksenger
Contributor Author

> Thanks for the detailed analysis, I've put a bunch of comments inline, please also look at the clippy and rustfmt failures in the CI.

Thanks, these should be addressed now.

@LaurentMazare LaurentMazare merged commit cbaa0ad into huggingface:main Jan 1, 2025
10 checks passed
@LaurentMazare
Collaborator

Thanks!

@super-fun-surf

stoked!


3 participants