
Poor results in CIRR #9

Open
neouyghur opened this issue Jan 26, 2025 · 2 comments

Comments

@neouyghur

Hi, I tried to evaluate your model on the CIRR test set, and these are the results I got for composed features:

| Recall@K | Score (Accuracy %) |
|---------:|-------------------:|
| 1        | 14.193             |
| 2        | 21.133             |
| 5        | 32.651             |
| 10       | 42.386             |
| 50       | 63.711             |

and these for image features:

| Recall@K | Score (Accuracy %) |
|---------:|-------------------:|
| 1        | 7.663              |
| 2        | 12.819             |
| 5        | 23.831             |
| 10       | 34.169             |
| 50       | 57.566             |

Could you explain these results, or share your evaluation code? Thanks.
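In case it helps others reproduce the numbers above: this is not the authors' evaluation code, just a minimal Recall@K sketch assuming precomputed, row-wise features and one ground-truth gallery image per query (the function and argument names are mine).

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, gt_indices, ks=(1, 2, 5, 10, 50)):
    """Recall@K for cosine-similarity retrieval.

    query_feats:   (N, D) query features (e.g. composed image+text features)
    gallery_feats: (M, D) candidate image features
    gt_indices:    (N,) index of the ground-truth gallery image for each query
    Returns a dict {K: recall in percent}.
    """
    # L2-normalize so the dot product equals cosine similarity
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                      # (N, M) similarity matrix
    ranking = np.argsort(-sims, axis=1) # gallery indices, best match first
    # 0-based rank at which each query's ground truth appears
    ranks = np.argmax(ranking == np.asarray(gt_indices)[:, None], axis=1)
    return {k: float((ranks < k).mean() * 100) for k in ks}
```

Note that the official CIRR test split is evaluated on a server, so numbers computed locally this way only apply to the validation split.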

@neouyghur
Author

Could it be related to this warning?

```
Some weights of the model checkpoint at navervision/CompoDiff-Aesthetic were not used when initializing CompoDiffModel: ['model.noise_scheduler.biass']
- This IS expected if you are initializing CompoDiffModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CompoDiffModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CompoDiffModel were not initialized from the model checkpoint at navervision/CompoDiff-Aesthetic and are newly initialized: ['model.noise_scheduler.betas']
```
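The warning pattern (one key unused in the checkpoint, a similarly named key newly initialized in the model) looks like a renamed buffer. A quick way to confirm which keys differ is to diff the key sets yourself; this is a sketch, not code from the repo, and `diff_state_dicts` is a name I made up:

```python
def diff_state_dicts(model_state, ckpt_state):
    """Compare parameter/buffer names between a model's state_dict
    and a loaded checkpoint's state_dict (both behave like dicts)."""
    model_keys, ckpt_keys = set(model_state), set(ckpt_state)
    return {
        # keys the model expects but the checkpoint lacks -> randomly re-initialized
        "newly_initialized": sorted(model_keys - ckpt_keys),
        # keys in the checkpoint that the model does not use -> silently dropped
        "unused": sorted(ckpt_keys - model_keys),
    }
```

For example, `diff_state_dicts(model.state_dict(), checkpoint_state_dict)` would show `'model.noise_scheduler.betas'` under `newly_initialized` and `'model.noise_scheduler.biass'` under `unused`, matching the log above.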

@geonm
Contributor

geonm commented Feb 2, 2025

@neouyghur Sorry for the late reply.

I'd like to clarify that the model we made public is distinct from the one discussed in the CompoDiff paper.

This version is termed CompoDiff-Aesthetic.

This model is integral to our diffusion-based text-to-image generator, Graphit, which you can explore further on GitHub: https://github.com/navervision/Graphit

Here are the key modifications we applied:

  1. We transitioned to using a different textual feature extractor, moving away from the OpenCLIP-Giga model, while retaining the CLIP-Large model for visual feature extraction.
  2. We followed the initial two stages outlined in our paper and added a further step: training the model on an additional dataset of over 10 million images from the LAION-5B collection. This enhancement ensures that CompoDiff produces superior aesthetic visual features for image generation in Graphit.

Thanks.

Also, you can find SynthTriplets18M, which you can use to train CompoDiff yourself, here: https://huggingface.co/datasets/navervision/SynthTriplets18M
