Training recipe for cc3m and cc12m #9
Thanks for trying to reproduce our results! We used the same hyperparameters for CC3M/CC12M as YFCC15M: 4096 batch size, 3e-3/5e-4 lr, and 0.1/0.5 wd (CLIP/SLIP). Are the 14.1 and 18.4 numbers measured at the end of training or the maximum measured throughout training? And are you using --update-freq > 1?
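For reference, assuming --update-freq behaves as gradient accumulation, the 4096 figure is the effective global batch size per optimizer step. A minimal sketch with hypothetical names (not code from this repo):

```python
# Sketch only, with hypothetical names (not code from this repo): how the
# effective global batch size is typically computed when --update-freq is
# used as gradient accumulation.
def effective_batch_size(per_gpu_batch: int, num_gpus: int, update_freq: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_gpu_batch * num_gpus * update_freq

# e.g. 128 images per GPU on 8 GPUs with --update-freq 4 gives the 4096 batch:
print(effective_batch_size(per_gpu_batch=128, num_gpus=8, update_freq=4))  # 4096
```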
They are measured at the end of training, and I'm not using --update-freq > 1. Can you share some details on the training data (e.g. image resolution, format, downloader you used)? Also, do you still have the training log file? I might be able to glean some clues from it. Thanks!
That sounds about right to me then. Both SLIP and CLIP overfit pretty heavily on the CC3M dataset, even when training for just 40 epochs. The best results are achieved well before the end of training. |
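One simple way to report the best rather than the final number is to run the zero-shot evaluation after every epoch and keep the maximum; a rough sketch (the training/eval helpers and loaders below are hypothetical placeholders, not functions from this repo):

```python
# Rough sketch: track the best zero-shot accuracy across epochs instead of the
# final-epoch value. train_one_epoch, zero_shot_eval, and the loaders are
# hypothetical placeholders.
best_acc, best_epoch = 0.0, -1
for epoch in range(epochs):
    train_one_epoch(model, train_loader, optimizer)
    acc = zero_shot_eval(model, imagenet_val_loader)  # ImageNet zero-shot top-1
    if acc > best_acc:
        best_acc, best_epoch = acc, epoch
print(f"best zero-shot top-1: {best_acc:.1f} (epoch {best_epoch})")
```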
Thanks for the reply! However, even the best result during training never exceeds 20.0/15.0 for SLIP/CLIP, which is still several points away from the reported result. I have repeated each experiment twice. I'll try the CC12M dataset and report the results later. In the meantime, can you share some details on how the training data is processed (e.g. resolution, format, downloader, etc.)?
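For what it's worth, one common way to fetch CC3M/CC12M at a fixed resolution is img2dataset; the snippet below only illustrates that route, with placeholder paths, column names, and resolution, and is not necessarily the pipeline used for the paper.

```python
# Illustrative only: downloading CC3M with img2dataset into webdataset shards.
# The TSV path, column names, and image_size are placeholders/assumptions,
# not the authors' actual preprocessing.
from img2dataset import download

download(
    url_list="cc3m_train.tsv",   # placeholder path to the (caption, url) TSV
    input_format="tsv",
    url_col="url",
    caption_col="caption",
    image_size=256,              # assumed storage resolution; training crops to 224
    output_format="webdataset",
    output_folder="cc3m_train_shards",
    processes_count=16,
    thread_count=64,
)
```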
Have you tried replacing the dataset for the SSL stream with ImageNet? I also cannot reproduce the results with only the CC3M dataset. I tried using ImageNet data for the SSL stream and CC3M data for the CLIP stream, and successfully reached the linear probing results reported in the paper. I guess this could help you reproduce the zero-shot results as well.
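A minimal sketch of that setup, assuming two standard PyTorch DataLoaders (the loader names below are hypothetical; this is not code from the SLIP repo):

```python
# Sketch: draw SSL views from ImageNet and image-text pairs from CC3M each step.
# imagenet_ssl_loader yields two augmented views per image; cc3m_clip_loader
# yields (image, tokenized caption) pairs. Both names are hypothetical.
def cycle(loader):
    """Re-iterate a DataLoader indefinitely without caching batches in memory."""
    while True:
        for batch in loader:
            yield batch

def mixed_batches(imagenet_ssl_loader, cc3m_clip_loader):
    """Pair one SSL batch (ImageNet) with one CLIP batch (CC3M) per training step."""
    ssl_iter = cycle(imagenet_ssl_loader)
    for images, texts in cc3m_clip_loader:    # epoch length follows the CC3M loader
        view1, view2 = next(ssl_iter)         # two crops for the SimCLR-style loss
        yield (view1, view2), (images, texts)
```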
Sorry for the late reply. I don't have the training logs available anymore. Please see my comment and discussion in #15 (comment). |
Hi, thanks for the great work!
I was wondering if you could provide the training recipe for CC3M and CC12M (lr, wd, batch size, etc.)?
I am trying to reproduce the reported zero-shot results on CC3M for CLIP and SLIP with the same hyperparameters as YFCC-100M, but I am getting 14.1 and 18.4 (compared to the reported 17.1 and 23.0). The environment (OS, PyTorch, CUDA versions) has been double checked.
Thanks!