Prompt to prompt editing #7
I am trying to understand to what extent your method replaces prompt-to-prompt. It seems to me that EDICT is a clever way to invert DDIM diffusion. If so, once we have our latents, we should be able to apply prompt-to-prompt editing techniques. Instead, what you propose is simply to run DDIM denoising conditioned on the target prompt to obtain the edited image.

It has been observed that (on generated images) prompt-to-prompt obtains more realistic and semantically meaningful edits. I would guess the technique is readily applicable to a latent obtained by EDICT inversion, and the code seems to support it, but the paper does not mention this combination; in fact, setting `use_p2p=True` gives me inferior results. Do you have an explanation for why using prompt-to-prompt is not beneficial?
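For concreteness, this is the distinction being drawn: after inversion, the paper's recipe just swaps the conditioning text, while prompt-to-prompt additionally constrains the cross-attention maps of the target-prompt run to match those recorded from the source-prompt run. A minimal, self-contained sketch of that injection with toy tensors (not the repo's API; in practice this happens inside every cross-attention layer of the UNet, across timesteps):

```python
import torch

def cross_attention(q, k, v, attn_override=None):
    # Plain scaled dot-product cross-attention. Prompt-to-prompt edits by
    # recording the softmax maps from the source-prompt run and injecting
    # them during the target-prompt run, locking the spatial layout while
    # the token content (k, v) comes from the new prompt.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax((q @ k.transpose(-1, -2)) * scale, dim=-1)
    if attn_override is not None:
        attn = attn_override  # P2P: lock maps to the source run
    return attn @ v, attn

# Toy shapes: 4 latent positions attending over 3 prompt tokens, dim 8.
q = torch.randn(4, 8)
k_src, v_src = torch.randn(3, 8), torch.randn(3, 8)
k_tgt, v_tgt = torch.randn(3, 8), torch.randn(3, 8)

_, src_maps = cross_attention(q, k_src, v_src)       # source pass: record maps
out, _ = cross_attention(q, k_tgt, v_tgt, src_maps)  # target pass: inject them
```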
Hi, thanks for the question! This surprised us too: I love P2P and thought it would boost our results. We don't have a full explanation (we haven't focused on this heavily, but I have dug through the code to double-check that everything is wired correctly). Typically what I see with EDICT+P2P is some combination of:
The puzzling thing about (1) is that P2P clearly works in something like null-text inversion, so it must be something EDICT-specific. One hypothesis is that the combination of the averaging layers with predictions operating on the counterpart sequence (e.g. x predicted from y) dampens how much change can be made when the attention maps are constrained; it also makes the notion of self-attention more awkward. It's possible that softening the hard locking of attention maps to a re-weighting, being more selective in where they are applied, or customizing them to EDICT could work. This is definitely an area we want to keep thinking about, so I'm curious if you have any further insight (experimental or otherwise). Happy to have follow-up discussions!
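To make that hypothesis concrete, here is a toy sketch of one EDICT denoising step as described in the paper: each sequence is updated with a noise prediction computed on its counterpart, and the step ends with averaging ("mixing") layers. `eps` stands in for the prompt-conditioned UNet call, and `a_t`/`b_t` are illustrative scalars (the paper's mixing coefficient is p = 0.93):

```python
import torch

def edict_step(x, y, eps, a_t, b_t, p=0.93):
    # Each sequence is denoised with the noise prediction computed on its
    # *counterpart* (x from y, then y from the intermediate x)...
    x_inter = a_t * x + b_t * eps(y)
    y_inter = a_t * y + b_t * eps(x_inter)
    # ...followed by the averaging ("mixing") layers that keep the two
    # sequences from drifting apart.
    x_new = p * x_inter + (1 - p) * y_inter
    y_new = p * y_inter + (1 - p) * x_new
    return x_new, y_new

# Dummy predictor for illustration. With P2P enabled, eps would run with
# locked attention maps, so both counterpart predictions above are
# constrained, and the mixing then pulls x and y back toward each other,
# possibly limiting how far an edit can move either sequence.
eps = lambda z: torch.zeros_like(z)
x = y = torch.randn(1, 4, 64, 64)
x, y = edict_step(x, y, eps, a_t=0.99, b_t=0.01)
```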
After some more experiments, I'm starting to find the P2P interface too restrictive for general use, so I'm not sure I would use it with EDICT even if it were available; simply providing an arbitrary target prompt is much more convenient. Anyway, I don't have a good explanation either. I actually rewrote the P2P part to use the official Prompt-to-Prompt implementation, but I never got any good results with that.
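As a toy illustration of that restriction (not the official P2P API; real P2P works on CLIP token sequences and has refinement and re-weighting modes that relax this somewhat), the basic attention-replacement edit copies maps per token position and therefore wants aligned prompts, whereas EDICT accepts any target prompt:

```python
def token_aligned(src: str, tgt: str) -> bool:
    # Toy whitespace tokenizer; real P2P aligns CLIP tokens.
    return len(src.split()) == len(tgt.split())

assert token_aligned("a photo of a cat", "a photo of a dog")       # maps transfer
assert not token_aligned("a photo of a cat",
                         "an oil painting of two dogs playing")    # no per-token match
```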
I'm sorry, I can't; it is part of a proprietary codebase.