This folder contains the object detection experiments that use MPViT as the backbone of the Deformable DETR framework. For a fair comparison with CoaT, we use the same official implementation as CoaT and follow its default settings (with multi-scale) in our experiments.
backbone | box mAP | epochs | link |
---|---|---|---|
ResNet-50 | 44.5 | 50 | - |
CoaT-lite S | 47.0 | 50 | link |
CoaT-S | 48.4 | 50 | link |
MPViT-S | 49.0 | 50 | link |
Install required packages. See Deformable DETR's original readme for more details.
# Install the required packages.
cd MPViT/deformable_detr
pip install -r ./requirements.txt
# Build and install the MultiScaleDeformableAttention operator.
# Note: 1. It may require a CUDA installation. In our environment, we installed CUDA 11.3,
#          which is compatible with the CUDA 11.0 bundled with PyTorch and with RTX 30 series graphics cards.
#       2. If you encounter the error "no kernel image is available for execution on the device" during training,
#          please use `pip uninstall MultiScaleDeformableAttention` to remove the installed package,
#          delete all build folders (e.g. ./build, ./dist and ./*.egg-info), and then re-run `./make.sh`.
cd ./models/ops
sh ./make.sh
cd ../../
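The recovery steps in note 2 can be sketched as a small helper. This is a minimal sketch, not part of the repository: `clean_ops_build` is a hypothetical function name, and it assumes the build folders sit directly inside the operator directory (`./models/ops` by default).

```shell
# Hypothetical helper: remove stale build artifacts of the operator.
# The directory to clean is the first argument (default: ./models/ops).
clean_ops_build() {
    ops_dir="${1:-./models/ops}"
    # Delete the build folders named in the note; other files are left untouched.
    rm -rf "$ops_dir/build" "$ops_dir/dist" "$ops_dir"/*.egg-info
}

# Typical sequence, run from MPViT/deformable_detr:
#   pip uninstall -y MultiScaleDeformableAttention
#   clean_ops_build ./models/ops
#   (cd ./models/ops && sh ./make.sh)
```

Keeping the uninstall and rebuild steps as explicit commands, rather than hiding them inside the helper, makes it easier to see which step fails if the error persists.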
Please follow the steps in CoaT's guide to download and extract the COCO 2017 dataset.
Here we simply create symbolic links for models and the dataset folder.
# Create symbolic links.
# Note: Here we directly create a symbolic link to the COCO dataset that has already been set up for detectron2/.
#       You may refer to the [corresponding readme](../detectron2/README.md) to download the COCO dataset first.
mkdir -p ./data
ln -sfT ../../detectron2/datasets/coco ./data/coco
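Before launching a run, it can help to confirm that the link resolves to a complete dataset. The sketch below assumes the usual COCO 2017 detection layout (`train2017/`, `val2017/`, and the two `instances_*2017.json` annotation files); `check_coco_layout` is a hypothetical helper, so adjust the paths if your copy is organized differently.

```shell
# Hypothetical helper: report any missing piece of the assumed COCO 2017 layout.
# Returns 0 if everything is present, 1 otherwise.
check_coco_layout() {
    coco_dir="${1:-./data/coco}"
    status=0
    for path in train2017 val2017 \
        annotations/instances_train2017.json annotations/instances_val2017.json; do
        # -e follows symbolic links, so a dangling link is reported as missing.
        if [ ! -e "$coco_dir/$path" ]; then
            echo "missing: $coco_dir/$path"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_coco_layout ./data/coco` from `MPViT/deformable_detr` prints one line per missing entry and nothing when the layout is complete.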
We provide the MPViT-Small checkpoint pre-trained on the ImageNet-1K dataset.
We compare MPViT-Small with CoaT-Lite Small and CoaT Small, both taken from CoaT's official repo.
Name | AP | AP50 | AP75 | APS | APM | APL | URL |
---|---|---|---|---|---|---|---|
CoaT-Lite Small | 47.0 | 66.5 | 51.2 | 28.8 | 50.3 | 63.3 | model / log |
CoaT Small | 48.4 | 68.5 | 52.4 | 30.1 | 51.8 | 63.8 | model / log |
MPViT-Small | 49.0 | 68.7 | 53.7 | 31.7 | 52.4 | 64.5 | model / log |
The following commands provide an example (MPViT Small) to evaluate the pre-trained checkpoint.
# Usage: Please see Deformable DETR's document for more details.
cd MPViT/deformable_detr
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/mpvit_small_deformable_detr.sh --resume https://dl.dropbox.com/s/omzvc4jaqcag540/deformable_detr_mpvit_small.pth --eval
This should give the following result:
# IoU metric: bbox
# Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.490
# Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.687
# Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.537
# Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.317
# Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.524
# Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.645
# Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.373
# Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.623
# Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.667
# Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.465
# Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.714
# Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.848
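To compare a run against the table, the overall box AP can be pulled out of a saved log. This is a minimal sketch that assumes the evaluation output was captured to a file (for example by appending `2>&1 | tee eval.log` to the command) and that the summary lines look like the ones shown above; `extract_box_ap` and `eval.log` are hypothetical names.

```shell
# Hypothetical helper: print the overall box AP (IoU=0.50:0.95, all areas)
# from a log file that contains the COCO-style summary lines.
extract_box_ap() {
    grep 'Average Precision' "$1" \
        | grep 'IoU=0.50:0.95' \
        | grep -E 'area= *all' \
        | head -n 1 \
        | awk '{ print $NF }'
}
```

For the MPViT-Small checkpoint, `extract_box_ap eval.log` should print `0.490` if the log matches the result listed above.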
The following commands provide an example (MPViT-Small, 8-GPU) to train Deformable DETR with the MPViT backbone.
# Usage: Please see Deformable DETR's document for more details.
cd MPViT/deformable_detr
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/mpvit_small_deformable_detr.sh
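If you adapt the launch line to a different single-node GPU count, a dry-run helper that only prints the command can help avoid typos. This sketch assumes the launcher takes the total GPU count as its first argument, as in the 8-GPU example; `print_launch_cmd` is a hypothetical helper, and a different GPU count may change the effective batch size, so results may not match the table.

```shell
# Hypothetical helper: print (do not run) the single-node launch command.
# First argument: number of GPUs (default 8). Remaining arguments are appended as-is.
print_launch_cmd() {
    n_gpus="${1:-8}"
    if [ "$#" -gt 0 ]; then shift; fi
    cmd="GPUS_PER_NODE=${n_gpus} ./tools/run_dist_launch.sh ${n_gpus} ./configs/mpvit_small_deformable_detr.sh"
    # Append any extra arguments, such as --eval, after the config script.
    if [ "$#" -gt 0 ]; then cmd="${cmd} $*"; fi
    echo "$cmd"
}
```

For example, `print_launch_cmd 4 --eval` prints the 4-GPU evaluation command so it can be inspected before being run.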
Thanks to Deformable DETR for its official implementation, and to CoaT, from which we borrow some code.