We provide RetinaNet and Deformable DETR results for object detection and Mask R-CNN results for instance segmentation.
We implement RetinaNet and Mask R-CNN on top of Detectron2, and Deformable DETR on top of the official Deformable DETR code.
🚀 All models are trained using ImageNet-1K pretrained weights.
☀️ MS denotes the same multi-scale training augmentation as in Swin-Transformer, which in turn follows the MS augmentation used in DETR and Sparse-RCNN. We therefore follow the official implementations of DETR and Sparse-RCNN, which are also based on Detectron2.
Please refer to detectron2/ for the details.
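As a rough illustration of the resize step in MS training, here is a minimal sketch in plain Python. The concrete values (shorter side drawn from 480 to 800 in steps of 32, longer side capped at 1333) are assumptions taken from the commonly used DETR-style settings and are not stated in this README. The helper names `resized_shape` and `sample_ms_shape` are hypothetical. Check the configs under detectron2/ for the settings actually used.

```python
import random

# Assumed values (DETR-style convention, not confirmed by this README).
MS_SHORT_SIDES = tuple(range(480, 801, 32))  # 480, 512, ..., 800
MAX_LONG_SIDE = 1333


def resized_shape(height, width, short_side, max_long_side=MAX_LONG_SIDE):
    """Return (new_h, new_w) after scaling the shorter side to `short_side`,
    shrinking further if the longer side would exceed `max_long_side`."""
    scale = short_side / min(height, width)
    if max(height, width) * scale > max_long_side:
        scale = max_long_side / max(height, width)
    return int(height * scale + 0.5), int(width * scale + 0.5)


def sample_ms_shape(height, width, rng=random):
    """Pick one shorter-side length at random, as done once per training image."""
    return resized_shape(height, width, rng.choice(MS_SHORT_SIDES))


if __name__ == "__main__":
    # A 480x640 image resized with the largest assumed shorter side.
    print(resized_shape(480, 640, 800))
    # A random multi-scale draw for the same image.
    print(sample_ms_shape(480, 640))
```

The sketch covers only the aspect-ratio-preserving resize. Any additional augmentation in the actual training pipeline is outside its scope.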
Backbone | Method | LR schedule | box mAP | mask mAP | #params | FLOPs | weight | metrics |
---|---|---|---|---|---|---|---|---|
MPViT-T | RetinaNet | 1x | 41.8 | - | 17M | 196G | model | metrics |
MPViT-XS | RetinaNet | 1x | 43.8 | - | 20M | 211G | model | metrics |
MPViT-S | RetinaNet | 1x | 45.7 | - | 32M | 248G | model | metrics |
MPViT-B | RetinaNet | 1x | 47.0 | - | 85M | 482G | model | metrics |
MPViT-T | RetinaNet | MS+3x | 44.4 | - | 17M | 196G | model | metrics |
MPViT-XS | RetinaNet | MS+3x | 46.1 | - | 20M | 211G | model | metrics |
MPViT-S | RetinaNet | MS+3x | 47.6 | - | 32M | 248G | model | metrics |
MPViT-B | RetinaNet | MS+3x | 48.3 | - | 85M | 482G | model | metrics |
MPViT-T | Mask R-CNN | 1x | 42.2 | 39.0 | 28M | 216G | model | metrics |
MPViT-XS | Mask R-CNN | 1x | 44.2 | 40.4 | 30M | 231G | model | metrics |
MPViT-S | Mask R-CNN | 1x | 46.4 | 42.4 | 43M | 268G | model | metrics |
MPViT-B | Mask R-CNN | 1x | 48.2 | 43.5 | 95M | 503G | model | metrics |
MPViT-T | Mask R-CNN | MS+3x | 44.8 | 41.0 | 28M | 216G | model | metrics |
MPViT-XS | Mask R-CNN | MS+3x | 46.6 | 42.3 | 30M | 231G | model | metrics |
MPViT-S | Mask R-CNN | MS+3x | 48.4 | 43.9 | 43M | 268G | model | metrics |
MPViT-B | Mask R-CNN | MS+3x | 49.5 | 44.5 | 95M | 503G | model | metrics |
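For readers unfamiliar with the 1x and 3x schedule labels in the table above, the sketch below converts a schedule multiplier into approximate COCO epochs. It assumes the usual Detectron2 convention of 90,000 iterations per 1x at a total batch size of 16, and a COCO train2017 size of 118,287 images. These numbers are assumptions for illustration and are not stated in this README. The function name `schedule_to_epochs` is hypothetical.

```python
# Assumed constants (common Detectron2 / COCO convention, not from this README).
COCO_TRAIN2017_IMAGES = 118_287
ITERS_PER_1X = 90_000
BATCH_SIZE = 16


def schedule_to_epochs(multiplier,
                       iters_per_1x=ITERS_PER_1X,
                       batch_size=BATCH_SIZE,
                       dataset_size=COCO_TRAIN2017_IMAGES):
    """Approximate number of passes over the dataset for an `Nx` schedule."""
    images_seen = multiplier * iters_per_1x * batch_size
    return images_seen / dataset_size


if __name__ == "__main__":
    for mult in (1, 3):
        print(f"{mult}x schedule: about {schedule_to_epochs(mult):.1f} epochs")
```

Under these assumptions, 1x corresponds to roughly 12 epochs and 3x to roughly 37 epochs.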
All models are trained using the same training recipe.
Please refer to deformable_detr/ for the details.
backbone | box mAP | epochs | link |
---|---|---|---|
ResNet-50 | 44.5 | 50 | - |
CoaT-lite S | 47.0 | 50 | link |
CoaT-S | 48.4 | 50 | link |
MPViT-S | 49.0 | 50 | link |