Why do we use diffusion for the prior model? #313

Open
xiaotingxuan opened this issue Dec 5, 2022 · 1 comment

Comments

xiaotingxuan commented Dec 5, 2022

Hi, I am a newcomer to diffusion models.

According to the DALL-E 2 paper, the prior model is used to predict CLIP image embeddings from CLIP text embeddings. I think they designed this model to minimize the modality gap.

I just don't know why we need to use a diffusion model for the prior. I know we can, but why don't we use a simpler network (maybe just an MLP) to implement the mapping from text embeddings to image embeddings? A diffusion model is quite expensive in terms of time and compute.
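To make the suggestion concrete, here is a minimal sketch of the MLP alternative described above: a plain regression network trained with MSE to map CLIP text embeddings to CLIP image embeddings. The embedding size, layer widths, and placeholder tensors are illustrative assumptions, not values from the DALL-E 2 paper or this repo.

```python
# Hypothetical sketch of the "simpler" alternative: an MLP regressing
# CLIP image embeddings from CLIP text embeddings with an MSE loss.
# Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class MLPPrior(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, text_embed: torch.Tensor) -> torch.Tensor:
        # Deterministic point estimate of the image embedding.
        return self.net(text_embed)

prior = MLPPrior()
optimizer = torch.optim.Adam(prior.parameters(), lr=1e-4)

text_embed = torch.randn(8, 512)   # placeholder CLIP text embeddings
image_embed = torch.randn(8, 512)  # placeholder CLIP image embeddings

pred = prior(text_embed)
loss = nn.functional.mse_loss(pred, image_embed)
loss.backward()
optimizer.step()
```

One practical difference worth noting: this regression produces a single deterministic image embedding per caption, whereas a diffusion prior can sample several plausible embeddings for the same text.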

borisdayma (Owner) commented

In the paper they claim it trains faster, but you can probably get similar results without it.
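For context, the DALL-E 2 paper trains its diffusion prior to predict the unnoised image embedding directly with an MSE loss, rather than predicting the added noise. Below is a minimal sketch of that objective; the tiny MLP stands in for the paper's decoder-only Transformer, and the noise schedule, dimensions, and batch sizes are illustrative assumptions.

```python
# Minimal sketch of the diffusion-prior objective: noise the target CLIP
# image embedding, then train a network conditioned on the timestep and
# the text embedding to recover the clean embedding with an MSE loss.
# The network is a stand-in MLP, not the paper's Transformer.
import torch
import torch.nn as nn

class TinyDiffusionPrior(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden_dim: int = 2048, timesteps: int = 1000):
        super().__init__()
        self.time_embed = nn.Embedding(timesteps, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(embed_dim * 3, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, noised_image_embed, t, text_embed):
        # Predict the clean image embedding from its noised version,
        # the timestep, and the text embedding.
        h = torch.cat([noised_image_embed, self.time_embed(t), text_embed], dim=-1)
        return self.net(h)

timesteps = 1000
betas = torch.linspace(1e-4, 0.02, timesteps)          # assumed linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

prior = TinyDiffusionPrior(timesteps=timesteps)
optimizer = torch.optim.Adam(prior.parameters(), lr=1e-4)

text_embed = torch.randn(8, 512)   # placeholder CLIP text embeddings
image_embed = torch.randn(8, 512)  # placeholder CLIP image embeddings

# Forward diffusion: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
t = torch.randint(0, timesteps, (8,))
eps = torch.randn_like(image_embed)
a_bar = alphas_cumprod[t].unsqueeze(-1)
noised = a_bar.sqrt() * image_embed + (1 - a_bar).sqrt() * eps

pred = prior(noised, t, text_embed)
loss = nn.functional.mse_loss(pred, image_embed)  # predict the unnoised embedding
loss.backward()
optimizer.step()
```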
