A transformer model based on a sliding-kernel self-attention mechanism. This model builds on an implementation of the Swin Transformer; see the Swin Transformer repository for the original implementation.
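As a rough illustration of the idea (not the code in this repository), the sketch below assumes that "sliding-kernel self-attention" means each token attends only to a `kernel_size × kernel_size` neighbourhood centred on it, so the window slides with the query rather than being a fixed, non-overlapping window as in Swin. The function name `sliding_kernel_attention`, the single-head setup, and the boundary handling (neighbours outside the feature map are dropped) are all assumptions made for this example.

```python
import numpy as np


def sliding_kernel_attention(x, w_q, w_k, w_v, kernel_size=3):
    """Hypothetical single-head sliding-kernel self-attention.

    x: (H, W, C) feature map; w_q, w_k, w_v: (C, D) projection matrices.
    Each position attends to the kernel_size x kernel_size window centred
    on it; positions outside the feature map are excluded from the softmax.
    """
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    H, W, _ = x.shape
    r = kernel_size // 2
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # each (H, W, D)
    d = q.shape[-1]
    out = np.zeros(v.shape)
    for i in range(H):
        for j in range(W):
            # Clip the window to the feature-map boundary (equivalent to masking).
            i0, i1 = max(i - r, 0), min(i + r + 1, H)
            j0, j1 = max(j - r, 0), min(j + r + 1, W)
            keys = k[i0:i1, j0:j1].reshape(-1, d)
            vals = v[i0:i1, j0:j1].reshape(-1, d)
            # Scaled dot-product scores, then a numerically stable softmax.
            scores = keys @ q[i, j] / np.sqrt(d)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[i, j] = weights @ vals
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 16)) for _ in range(3))
y = sliding_kernel_attention(x, w_q, w_k, w_v, kernel_size=3)
print(y.shape)  # (8, 8, 16)
```

With `kernel_size=1` each position sees only itself, and once the kernel covers the whole feature map the result equals ordinary global self-attention; the cost per position grows with `kernel_size**2` rather than with the image size.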
Model | Params | Epochs | Val. Acc. |
---|---|---|---|
Swin Transformer (tiny) | 26,598,166 | 200 | 82.19% |
Swin Transformer (tiny) | 26,598,166 | 300 | 83.34% |
Kernel Transformer (tiny) | 26,600,362 | 300 | 85.83% |