How to replicate DeiT-SimA result #4
Hi, thanks for your interest in our work! I explained how to run DeiT-S -> SimA here (link). We use the default hyperparameters (our code), except that we train for 300 epochs so the results are comparable to the original DeiT-S numbers in their paper.
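For readers trying to reproduce this, here is a rough sketch of what "default hyperparameters, 300 epochs" typically corresponds to in the DeiT recipe. The values below are standard DeiT defaults stated as assumptions, not copied from the SimA repository, and timm's DeiT-S is used only as a stand-in for the DeiT-S -> SimA model:

```python
import torch
import timm  # timm's DeiT-S stands in for the actual DeiT-S -> SimA model

# Standard DeiT-style recipe (assumed defaults, not the repository's exact config)
model = timm.create_model("deit_small_patch16_224", drop_path_rate=0.1)

epochs = 300
batch_size = 1024                    # total batch size across all GPUs
lr = 5e-4 * batch_size / 512         # DeiT's linear learning-rate scaling rule
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
```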
Thanks for your reply! I'm actually trying to use DeiT-Tiny for my experiments because of GPU memory limits (mostly 4 or 8 2080 Tis). I replaced MHSA in DeiT-Tiny with SimA in exactly the same way, but the accuracy dropped drastically, by about 4-5%. Do you have any suggestions for boosting the performance of small models?
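For context, here is a minimal sketch of what the MHSA-to-SimA swap being discussed looks like, based on the paper's description of ℓ1-normalized, softmax-free attention. It is an illustration only, not the repository's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimAAttention(nn.Module):
    """Rough sketch of SimA-style attention (paper idea, not the repo's code):
    l1-normalize Q and K over the token dimension and drop the softmax, so the
    attention can be computed in either (QK^T)V or Q(K^T V) order."""

    def __init__(self, dim, num_heads=6):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)        # each: (B, heads, N, head_dim)
        q = F.normalize(q, p=1, dim=-2)             # l1 norm across tokens
        k = F.normalize(k, p=1, dim=-2)
        out = q @ (k.transpose(-2, -1) @ v)         # linear order, no softmax
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```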
Hi, please use our settings for DeiT-Tiny training. Since DeiT-Tiny has less capacity, you might want to reduce the regularization parameters (e.g., the drop path rate).
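For example, lowering the stochastic-depth (drop path) rate when building the tiny model. The 0.05 below is only an illustrative lower value, not an author-confirmed setting, and timm's DeiT-Tiny again stands in for the SimA variant:

```python
import timm

# Illustrative: a reduced drop-path rate for the lower-capacity DeiT-Tiny
model = timm.create_model("deit_tiny_patch16_224", drop_path_rate=0.05)
```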
Thank you for your reply, I will try a lower drop path rate. When you trained DeiT-S -> SimA (79.8, as reported in your paper), was the drop-path rate 0.1 or 0.05?
Also, in this code the original PatchEmbed module is replaced by the ConvPatchEmbed module, which costs slightly more parameters and FLOPs. With the original PatchEmbed and your default settings, the accuracy of DeiT-S -> SimA also drops drastically compared to the original DeiT-S... The FLOPs of DeiT-S -> SimA reported in your paper should actually be 5.0B rather than 4.6B.
And the parameter count of DeiT-S -> SimA should be 23M instead of 22M.
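For a rough sense of where the extra parameters come from, one can compare a plain 16x16 patch-embedding convolution against an XCiT-style conv stem. The stem layout below is an assumption for illustration, not the repository's ConvPatchEmbed:

```python
import torch.nn as nn

# Plain DeiT-style patch embedding: one 16x16 strided conv to 384 channels
plain_embed = nn.Conv2d(3, 384, kernel_size=16, stride=16)

# Assumed XCiT-style conv stem: a stack of strided 3x3 convolutions
conv_embed = nn.Sequential(
    nn.Conv2d(3, 48, 3, stride=2, padding=1), nn.BatchNorm2d(48), nn.GELU(),
    nn.Conv2d(48, 96, 3, stride=2, padding=1), nn.BatchNorm2d(96), nn.GELU(),
    nn.Conv2d(96, 192, 3, stride=2, padding=1), nn.BatchNorm2d(192), nn.GELU(),
    nn.Conv2d(192, 384, 3, stride=2, padding=1),
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(f"plain PatchEmbed params: {count(plain_embed):,}")
print(f"conv-stem params:        {count(conv_embed):,}")
```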
Hi, thanks! Our measured FLOPs: 4594733568.0. Can you please elaborate on how you calculated the numbers above? We use thop to calculate the FLOPs, and we used ConvPatchEmbed in our code. Note that there are various approaches to computing the input tokens from the original image, which is orthogonal to the main transformer architecture, but thanks for pointing out this difference! We will clarify it in the next arXiv version. Please let me know if you have any other questions. Thanks, and have a good day!
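For reference, a minimal sketch of FLOP counting with thop; timm's DeiT-S is used as a stand-in for the DeiT-S -> SimA model, so the printed number is only illustrative:

```python
import torch
import timm
from thop import profile

# timm's DeiT-S as a stand-in for the DeiT-S -> SimA network
model = timm.create_model("deit_small_patch16_224")
inputs = torch.randn(1, 3, 224, 224)
macs, params = profile(model, inputs=(inputs,))
print(f"thop MACs: {macs:,.0f}  params: {params:,.0f}")
```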
Below is the full architecture for DeiT-S -> SimA. (Model printout omitted.)
Thanks for your reply! I used fvcore to calculate the FLOPs and params. I also agree that the various approaches to computing the input tokens are orthogonal. However, if you replace PatchEmbed with ConvPatchEmbed in the original DeiT-S, it improves performance as well (I ran this experiment), so the comparison in your ablation study may not be entirely fair.
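For comparison, a minimal fvcore-based sketch (again with timm's DeiT-S as a stand-in):

```python
import torch
import timm
from fvcore.nn import FlopCountAnalysis, parameter_count

model = timm.create_model("deit_small_patch16_224")
model.eval()
flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224))
print(f"fvcore GFLOPs: {flops.total() / 1e9:.2f}")
print(f"params (M): {parameter_count(model)[''] / 1e6:.1f}")
```

One possible source of the 4.6 vs 5.0 GFLOP gap, stated here only as a guess: thop counts operations via nn.Module hooks, so functional matmuls inside the attention blocks can go uncounted, whereas fvcore's tracer includes them.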
Can you please use the code above and calculate the FLOPs again? I wonder why your numbers are inconsistent with ours (yours: 5.0 GFLOPs vs ours: 4.6 GFLOPs). Also, could you share the checkpoint for DeiT-S + ConvPatchEmbed, and how large is the improvement? I can evaluate it and add it to the next version of the arXiv paper.
Thanks for running this experiment! Let's see the final converged accuracy.
Thanks for your wonderful work on SimA! I'm trying to replicate your DeiT-S -> SimA result, but I can't find the hyperparameter settings. Are all hyperparameters, including drop-path, the same as for the original DeiT-S? Thank you.