Training recipe for cc3m and cc12m #9
Thanks for trying to reproduce our results! We used the same hyperparameters for CC3M/CC12M as YFCC15M: 4096 batch size, 3e-3/5e-4 lr, and 0.1/0.5 wd (CLIP/SLIP). Are the 14.1 and 18.4 numbers measured at the end of training or the maximum measured throughout training? And are you using --update-freq > 1?
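For reference, assuming --update-freq behaves as gradient accumulation, the 4096 figure is the effective global batch size per optimizer step. A minimal sketch with hypothetical names (not code from this repo):

```python
# Sketch only, with hypothetical names (not code from this repo): how the
# effective global batch size is typically computed when --update-freq is
# used as gradient accumulation.
def effective_batch_size(per_gpu_batch: int, num_gpus: int, update_freq: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_gpu_batch * num_gpus * update_freq

# e.g. 128 images per GPU on 8 GPUs with --update-freq 4 gives the 4096 batch:
print(effective_batch_size(per_gpu_batch=128, num_gpus=8, update_freq=4))  # 4096
```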
They are measured at the end of training, and I'm not using --update-freq > 1. Can you share some details on the training data (e.g. image resolution, format, downloader you used)? Also, do you still have the training log file? I might be able to glean some clues from it. Thanks!
That sounds about right to me then. Both SLIP and CLIP overfit pretty heavily on the CC3M dataset, even when training for just 40 epochs. The best results are achieved well before the end of training. |
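One simple way to report the best rather than the final number is to run the zero-shot evaluation after every epoch and keep the maximum; a rough sketch (the training/eval helpers and loaders below are hypothetical placeholders, not functions from this repo):

```python
# Rough sketch: track the best zero-shot accuracy across epochs instead of the
# final-epoch value. train_one_epoch, zero_shot_eval, and the loaders are
# hypothetical placeholders.
best_acc, best_epoch = 0.0, -1
for epoch in range(epochs):
    train_one_epoch(model, train_loader, optimizer)
    acc = zero_shot_eval(model, imagenet_val_loader)  # ImageNet zero-shot top-1
    if acc > best_acc:
        best_acc, best_epoch = acc, epoch
print(f"best zero-shot top-1: {best_acc:.1f} (epoch {best_epoch})")
```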
Thanks for the reply! However, even the best result during training never exceeds 20.0/15.0 for SLIP/CLIP, which is still several points away from the reported result. I have repeated each experiment twice. I'll try the CC12M dataset and report the results later. In the meantime, can you share some details on how the training data is processed (e.g. resolution, format, downloader, etc.)?
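For what it's worth, one common way to fetch CC3M/CC12M at a fixed resolution is img2dataset; the snippet below only illustrates that route, with placeholder paths, column names, and resolution, and is not necessarily the pipeline used for the paper.

```python
# Illustrative only: downloading CC3M with img2dataset into webdataset shards.
# The TSV path, column names, and image_size are placeholders/assumptions,
# not the authors' actual preprocessing.
from img2dataset import download

download(
    url_list="cc3m_train.tsv",   # placeholder path to the (caption, url) TSV
    input_format="tsv",
    url_col="url",
    caption_col="caption",
    image_size=256,              # assumed storage resolution; training crops to 224
    output_format="webdataset",
    output_folder="cc3m_train_shards",
    processes_count=16,
    thread_count=64,
)
```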
Have you tried replacing the dataset for the SSL stream with ImageNet? I also cannot reproduce the results with only the CC3M dataset. I tried using ImageNet data for the SSL stream and CC3M data for the CLIP stream, and successfully reached the linear probing results reported in the paper. I guess this could help you reproduce the zero-shot results as well.
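A minimal sketch of that setup, assuming two standard PyTorch DataLoaders (the loader names below are hypothetical; this is not code from the SLIP repo):

```python
# Sketch: draw SSL views from ImageNet and image-text pairs from CC3M each step.
# imagenet_ssl_loader yields two augmented views per image; cc3m_clip_loader
# yields (image, tokenized caption) pairs. Both names are hypothetical.
def cycle(loader):
    """Re-iterate a DataLoader indefinitely without caching batches in memory."""
    while True:
        for batch in loader:
            yield batch

def mixed_batches(imagenet_ssl_loader, cc3m_clip_loader):
    """Pair one SSL batch (ImageNet) with one CLIP batch (CC3M) per training step."""
    ssl_iter = cycle(imagenet_ssl_loader)
    for images, texts in cc3m_clip_loader:    # epoch length follows the CC3M loader
        view1, view2 = next(ssl_iter)         # two crops for the SimCLR-style loss
        yield (view1, view2), (images, texts)
```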
Sorry for the late reply. I don't have the training logs available anymore. Please see my comment and discussion in #15 (comment). |
Hi, thanks for the great work!
I was wondering if you could provide the training recipe for CC3M and CC12M (lr, wd, batch size, etc.)?
I am trying to reproduce the reported zero-shot results on CC3M for CLIP and SLIP with the same hyperparameters as YFCC-100M, but I am getting 14.1 and 18.4 (compared to the reported 17.1 and 23.0). The environment (OS, PyTorch, CUDA versions) has been double checked.
Thanks!