Zirui Wang
- PyTorch implementation of Faster R-CNN based on VGG16.
- Supports Feature Pyramid Network (FPN).
- Supports Deformable Convolution (DCNv1).
- A pretrained model can be found here.
This is an implementation of a framework that combines a Feature Pyramid Network (FPN) with Deformable Convolution (DCNv1) to improve Faster R-CNN on object detection tasks. The whole model is implemented in PyTorch, trained on the VOC 2007 training set, and evaluated on the VOC 2007 test set, where it gains 1.1% in mAP@[.5,.95] and 3.95% in mAP@[.75,.95], which demonstrates the effectiveness of the model. The model also supports the KITTI 2D Object Detection dataset: trained on the KITTI 2D training set and evaluated on the validation set, it shows a surprising 11.96% increase in mAP@[.5,.95] and a 23.35% increase in mAP@[.75,.95].
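For readers unfamiliar with DCNv1, the key idea is to let the convolution learn per-location sampling offsets instead of sampling on a fixed grid. The block below is a minimal illustrative sketch built on torchvision.ops.DeformConv2d; it is not the exact layer used in this repository, and the class and variable names are made up for the example.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

# Minimal DCNv1-style block: a plain conv predicts (dx, dy) offsets for every
# kernel sampling location, and DeformConv2d samples the input at those offsets.
class DeformableConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=padding)
        nn.init.zeros_(self.offset_conv.weight)   # start from a regular grid (zero offsets)
        nn.init.zeros_(self.offset_conv.bias)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=padding)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.deform_conv(x, offsets)

feat = torch.randn(1, 512, 38, 50)            # e.g. a VGG16 conv5 feature map
out = DeformableConvBlock(512, 512)(feat)     # same spatial size, learned sampling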
- Detection results on PASCAL VOC 2007 test set
- All models were evaluated using COCO-style detection evaluation metrics.
Training dataset | Model | mAP@[.5,.95] | mAP@[.75,.95] |
---|---|---|---|
VOC 07 | Faster RCNN | 69.65 | 31.14 |
VOC 07 | FPN+ Faster RCNN | 69.83 | 34.02 |
VOC 07 | Deform+ Faster RCNN | 69.93 | 30.85 |
- Detection results on KITTI 2D Object Detection validation set
- All models were evaluated using COCO-style detection evaluation metrics.
Training dataset | Model | mAP@[.5,.95] | mAP@[.75,.95] |
---|---|---|---|
KITTI 2d | Faster RCNN | 71.58 | 32.40 |
KITTI 2d | FPN+ Faster RCNN | 82.76 | 56.02 |
KITTI 2d | Deform+ Faster RCNN | 71.73 | 33.16 |
KITTI 2d | FPN+ Deform+ Faster RCNN | 82.59 | 56.30 |
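For reference, the COCO-style mAP@[.5,.95] metric averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; mAP@[.75,.95] in the tables above presumably averages only over the stricter thresholds from 0.75 upwards. A tiny sketch of the two threshold sets:

import numpy as np
# COCO-style IoU thresholds: AP is computed at each threshold and then averaged.
thresholds_50_95 = np.linspace(0.50, 0.95, 10)   # mAP@[.5,.95]
thresholds_75_95 = np.linspace(0.75, 0.95, 5)    # mAP@[.75,.95] (assumed interpretation)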
- numpy
- six
- torch
- torchvision
- tqdm
- cv2
- defaultdict
- itertools
- namedtuple
- skimage
- xml
- pascal_voc_writer
- PIL
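The third-party packages above can typically be installed with pip; defaultdict, namedtuple, itertools, and xml ship with the Python standard library and need no separate install. Version pins are not given here, so treat this as a starting point:

# cv2 -> opencv-python, skimage -> scikit-image, PIL -> Pillow on PyPI
pip install numpy six torch torchvision tqdm opencv-python scikit-image pascal_voc_writer Pillow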
- Download the training, validation, and test data.
# VOC 2007 trainval and test datasets
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# KITTI 2d Object Detection training set and groundtruth labels
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip
- Untar the files into two separate directories named `VOCdevkit` and `KITTIdevkit`.
# VOC 2007 trainval and test datasets
mkdir VOCdevkit && cd VOCdevkit
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
# KITTI 2d Object Detection training set and labels (continuing from the commands above)
cd ..
mkdir KITTIdevkit && cd KITTIdevkit
unzip data_object_image_2.zip
unzip data_object_label_2.zip
- The KITTI dataset needs to be reorganized to match the following structure.
dataset
├── KITTIdevkit
│ ├── training
│ ├── image_2
│ └── label_2
└── VOCdevkit
├── VOC2007
├── Annotations
├── ImageSets
├── JPEGImages
├── SegmentationClass
└── SegmentationObject
- Convert the KITTI dataset into the PASCAL VOC 2007 format using the dataset format conversion script (a simplified sketch of the conversion logic appears after the directory tree below).
# go back to the root directory of the project code
cd ./improved_faster_rcnn
# change directory to find the format conversion script
cd data
# run the dataset format conversion script
python kitti2voc.py
- After running the above command, you should have the same dataset structure for KITTI as for VOC 2007, and it is now ready to be loaded into the model.
dataset
├── KITTI2VOC
│ ├── Annotations
│ ├── ImageSets
│ ├── JPEGImages
└── VOCdevkit
└── VOC2007
├── Annotations
├── ImageSets
├── JPEGImages
├── SegmentationClass
└── SegmentationObject
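For reference, the conversion essentially rewrites each KITTI label file, one object per line (class, truncation, occlusion, alpha, then the 2D box as left/top/right/bottom, followed by 3D fields), into a Pascal VOC XML annotation. The sketch below illustrates that mapping with the pascal_voc_writer package; it is a simplified stand-in for data/kitti2voc.py, and the paths and split handling are assumptions (the real script also has to populate JPEGImages and ImageSets).

# Simplified sketch of the KITTI -> VOC annotation conversion (not the repo's exact script).
import os
from PIL import Image
from pascal_voc_writer import Writer

kitti_root = './dataset/KITTIdevkit/training'   # assumed input layout (image_2/, label_2/)
out_root = './dataset/KITTI2VOC'                # assumed output layout
os.makedirs(os.path.join(out_root, 'Annotations'), exist_ok=True)

for label_file in sorted(os.listdir(os.path.join(kitti_root, 'label_2'))):
    stem = os.path.splitext(label_file)[0]
    img_path = os.path.join(kitti_root, 'image_2', stem + '.png')
    width, height = Image.open(img_path).size

    writer = Writer(img_path, width, height)
    with open(os.path.join(kitti_root, 'label_2', label_file)) as f:
        for line in f:
            fields = line.split()
            cls = fields[0]                      # e.g. Car, Pedestrian, Cyclist, DontCare
            if cls == 'DontCare':
                continue
            left, top, right, bottom = (float(v) for v in fields[4:8])
            writer.addObject(cls, int(left), int(top), int(right), int(bottom))
    writer.save(os.path.join(out_root, 'Annotations', stem + '.xml'))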
- You can easily modify the parameters for training in `utils/config.py` and run the following script for model training:
# if you are in a local environment, run:
python3 ./train.py
# if you are in conda environment, run:
python ./train.py
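The exact parameters exposed by utils/config.py depend on this repository, but a configuration module for this kind of Faster R-CNN codebase usually looks roughly like the sketch below. Every field name here is hypothetical except `visualize` and `save_dir`, which are referenced in the testing notes further down.

# Hypothetical sketch -- consult utils/config.py for the real field names and defaults.
class Config:
    voc_data_dir = './dataset/VOCdevkit/VOC2007'   # hypothetical dataset path
    kitti_data_dir = './dataset/KITTI2VOC'         # hypothetical dataset path
    epoch = 14                                     # hypothetical number of training epochs
    lr = 1e-3                                      # hypothetical learning rate
    use_fpn = True                                 # hypothetical switch for the FPN branch
    use_deform = True                              # hypothetical switch for deformable conv
    visualize = False                              # set True to dump detection visualizations
    save_dir = './checkpoints'                     # outputs saved here (visuals in save_dir/visuals)

opt = Config()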
- You can easily modify the parameters for testing in `utils/config.py` and run the following script for model testing.
- You can visualize the test images by setting `visualize=True` in the configuration file.
- The output images will be placed under `save_dir/visuals`, as specified in the configuration file.
- Ex 1) FPN based on VGG16 (file name: "fpn_vgg16_1.pth")
# if you are in a local environment, run:
python3 ./test.py
# if you are in conda environment, run:
python ./test.py
- Below are some of the resulting images from the visualization.