
Poor results in CIRR #9

Open
neouyghur opened this issue Jan 26, 2025 · 2 comments

Comments

@neouyghur

Hi, I tried to evaluate your model on the CIRR test set, and these are the results I got for composed features:

| Recall@K | Score (Accuracy %) |
|---------:|-------------------:|
| 1        | 14.193             |
| 2        | 21.133             |
| 5        | 32.651             |
| 10       | 42.386             |
| 50       | 63.711             |

and these for image features:

| Recall@K | Score (Accuracy %) |
|---------:|-------------------:|
| 1        | 7.663              |
| 2        | 12.819             |
| 5        | 23.831             |
| 10       | 34.169             |
| 50       | 57.566             |

Could you explain these results, or share your evaluation code? Thanks.
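In case it helps others reproduce the numbers above: this is not the authors' evaluation code, just a minimal Recall@K sketch assuming precomputed, row-wise features and one ground-truth gallery image per query (the function and argument names are mine).

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, gt_indices, ks=(1, 2, 5, 10, 50)):
    """Recall@K for cosine-similarity retrieval.

    query_feats:   (N, D) query features (e.g. composed image+text features)
    gallery_feats: (M, D) candidate image features
    gt_indices:    (N,) index of the ground-truth gallery image for each query
    Returns a dict {K: recall in percent}.
    """
    # L2-normalize so the dot product equals cosine similarity
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                      # (N, M) similarity matrix
    ranking = np.argsort(-sims, axis=1) # gallery indices, best match first
    # 0-based rank at which each query's ground truth appears
    ranks = np.argmax(ranking == np.asarray(gt_indices)[:, None], axis=1)
    return {k: float((ranks < k).mean() * 100) for k in ks}
```

Note that the official CIRR test split is evaluated on a server, so numbers computed locally this way only apply to the validation split.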

@neouyghur
Author

Could it be related to this warning?

```
Some weights of the model checkpoint at navervision/CompoDiff-Aesthetic were not used when initializing CompoDiffModel: ['model.noise_scheduler.biass']
- This IS expected if you are initializing CompoDiffModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CompoDiffModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of CompoDiffModel were not initialized from the model checkpoint at navervision/CompoDiff-Aesthetic and are newly initialized: ['model.noise_scheduler.betas']
```
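The warning pattern (one key unused in the checkpoint, a similarly named key newly initialized in the model) looks like a renamed buffer. A quick way to confirm which keys differ is to diff the key sets yourself; this is a sketch, not code from the repo, and `diff_state_dicts` is a name I made up:

```python
def diff_state_dicts(model_state, ckpt_state):
    """Compare parameter/buffer names between a model's state_dict
    and a loaded checkpoint's state_dict (both behave like dicts)."""
    model_keys, ckpt_keys = set(model_state), set(ckpt_state)
    return {
        # keys the model expects but the checkpoint lacks -> randomly re-initialized
        "newly_initialized": sorted(model_keys - ckpt_keys),
        # keys in the checkpoint that the model does not use -> silently dropped
        "unused": sorted(ckpt_keys - model_keys),
    }
```

For example, `diff_state_dicts(model.state_dict(), checkpoint_state_dict)` would show `'model.noise_scheduler.betas'` under `newly_initialized` and `'model.noise_scheduler.biass'` under `unused`, matching the log above.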

@geonm
Contributor

geonm commented Feb 2, 2025

@neouyghur Sorry for the late reply.

I'd like to clarify that the model we made public is distinct from the one discussed in the CompoDiff paper.

This version is termed CompoDiff-Aesthetic.

This model is integral to our diffusion-based text-to-image generator, Graphit, which you can explore further on GitHub: https://github.com/navervision/Graphit

Here are the key modifications we applied:

  1. We transitioned to using a different textual feature extractor, moving away from the OpenCLIP-Giga model, while retaining the CLIP-Large model for visual feature extraction.
  2. We followed the initial two stages outlined in our paper and added a further step: training the model on an additional dataset of over 10 million images from the LAION-5B collection. This enhancement ensures that CompoDiff produces superior aesthetic visual features for image generation in Graphit.

Thanks.

Also, you can find SynthTriplets18M, which you can use to train CompoDiff yourself, here: https://huggingface.co/datasets/navervision/SynthTriplets18M
