Hi,
First, like many before me, I'm very impressed by how fast you reimplemented NFNets. Thanks a lot for your work!
I experimented a lot with training models on ImageNet in November-December, but the majority of my experiments were with the GENet family of models. The best idea from their paper is to use basic blocks at the beginning of the network and bottlenecks later on.
As far as I can see, NFNets do not use DW convs because standardisation can't be applied to them. But this leads to a huge increase in the number of parameters and strong overfitting. I would like to hear your thoughts about using DW bottleneck blocks where standardisation is applied only to the first and last 1x1 convs. This would lead to a slight mean shift from the non-standardised DW conv, which would be fixed by the following standardised conv.
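For concreteness, here is a minimal PyTorch sketch of the kind of block I mean (all names are my own; it omits the NFNet alpha/beta signal-propagation scalars and the skip connection, and `ScaledStdConv2d` below is a simplified stand-in for the paper's standardised conv, not your implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledStdConv2d(nn.Conv2d):
    """Conv2d with Scaled Weight Standardization (simplified, NFNet-style).

    Each filter is standardized to zero mean / unit variance over its
    fan-in, scaled by 1/sqrt(fan_in), and multiplied by a learnable
    per-channel gain.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gain = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1))
        # fan-in = (in_channels / groups) * kH * kW
        self.scale = self.weight[0].numel() ** -0.5

    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-6
        w = self.gain * self.scale * (w - mean) / std
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


class DWBottleneck(nn.Module):
    """Hypothetical bottleneck: standardised 1x1 convs around a plain
    (non-standardised) depthwise 3x3, as proposed above."""

    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = ScaledStdConv2d(in_ch, mid_ch, 1)            # WS applied
        self.dw = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride,
                            padding=1, groups=mid_ch)             # no WS
        self.conv2 = ScaledStdConv2d(mid_ch, out_ch, 1)           # WS applied
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.act(self.dw(out))   # any mean shift introduced here...
        return self.conv2(out)         # ...is re-centered by this WS conv
```

So e.g. `DWBottleneck(64, 128, 256)` would give a 3x3 receptive field for the cost of a depthwise conv plus two standardised pointwise convs, which is where the parameter savings over a full 3x3 bottleneck would come from.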
Something similar has been applied in "Understanding the Disharmony between Weight Normalization Family and Weight Decay: ε-shifted L2 Regularizer".
This seems like a promising improvement for the NFNet family.