(COCO mAP calculation with cocoapi is covered at the bottom ↓↓↓)
Implementation of the YOLO v3 object detector in TensorFlow (TF-Slim). This repository is inspired by Paweł Kapica. The full details are in this paper. This project covers the following parts:
- YOLO v3 architecture
- Weights converter (util for exporting loaded COCO weights as TF checkpoint)
- Basic working demo
- Non-max suppression on both GPU and CPU
- Training pipeline
- Compute COCO mAP with cocoapi
The YOLO paper is hard to understand; read this repo alongside it to get a quick grasp of the YOLO algorithm.
- Clone this repository
$ git clone https://github.com/YunYang1994/tensorflow-yolov3.git
- You should install some dependencies before getting your hands on the code.
$ cd tensorflow-yolov3
$ pip install -r ./docs/requirements.txt
- Export the loaded COCO weights as a TF checkpoint (`yolov3.ckpt`) and a frozen graph (`yolov3_gpu_nms.pb`). If you don't have `yolov3.weights`, download it and put it in the directory `./checkpoint`
$ python convert_weight.py --convert --freeze
- Then you will get some `.pb` files in the directory `./checkpoint`, and you can run the demo scripts
$ python nms_demo.py
$ python video_demo.py # if use camera, set video_path = 0
Three files are required as follows:
`dataset.txt`:
xxx/xxx.jpg 18.19 6.32 424.13 421.83 20 323.86 2.65 640.0 421.94 20
xxx/xxx.jpg 55.38 132.63 519.84 380.4 16
# image_path x_min y_min x_max y_max class_id x_min y_min ... class_id
`anchors.txt`:
0.10,0.13, 0.16,0.30, 0.33,0.23, 0.40,0.61, 0.62,0.45, 0.69,0.59, 0.76,0.60, 0.86,0.68, 0.91,0.76
`class.names`:
person
bicycle
car
...
toothbrush
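To make the `dataset.txt` format concrete, here is a minimal sketch of a parser for one annotation line (this is my own illustrative helper, not the repo's actual loader; the function name is hypothetical):

```python
# Sketch: parse one dataset.txt line into (image_path, boxes, class_ids),
# assuming the layout: image_path x_min y_min x_max y_max class_id ...
def parse_annotation_line(line):
    parts = line.strip().split()
    image_path = parts[0]
    values = list(map(float, parts[1:]))
    boxes, class_ids = [], []
    # every group of 5 numbers is: x_min y_min x_max y_max class_id
    for i in range(0, len(values), 5):
        boxes.append(values[i:i + 4])
        class_ids.append(int(values[i + 4]))
    return image_path, boxes, class_ids
```

For example, the first sample line above yields two boxes, both with class id 20.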
To help you understand my training process, I made this training-pipeline demo. The raccoon dataset has only one class and 200 images (180 for train, 20 for test). I have prepared a shell script in `./scripts` which enables you to get the data and train the model!
$ sh scripts/make_raccoon_tfrecords.sh
$ python show_input_image.py # show your input image (optional)
$ python kmeans.py # get prior anchors and rescale the values to the range [0,1]
$ python convert_weight.py --convert # get pretrained weights
$ python quick_train.py
$ tensorboard --logdir ./data
As you can see in TensorBoard, if your dataset is too small or you train for too long, the model starts to overfit and learn patterns from the training data that do not generalize to the test data.
$ python convert_weight.py -cf ./checkpoint/yolov3.ckpt-2500 -nc 1 -ap ./data/raccoon_anchors.txt --freeze
$ python quick_test.py
$ python evaluate.py
If you are still unfamiliar with the training pipeline, you can join here to discuss with us.
Example detection results: `raccoon-181.jpg` and `raccoon-55.jpg`.
Download VOC PASCAL trainval and test data
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
Download COCO trainval and test data
$ wget http://images.cocodataset.org/zips/train2017.zip
$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
$ wget http://images.cocodataset.org/zips/test2017.zip
$ wget http://images.cocodataset.org/annotations/image_info_test2017.zip
YOLO stands for You Only Look Once. It is an object detector that uses features learned by a deep convolutional neural network to detect objects. Although we have successfully run the code, we should still understand how YOLO works.
The paper suggests using clustering on bounding-box shapes to find good anchor-box specializations suited to the data; for more details, see here
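A rough sketch of this clustering idea (my own simplified NumPy version, not the repo's `kmeans.py`): run k-means over box (width, height) pairs, using 1 − IoU as the distance so that box shape, not absolute size, drives the grouping. The deterministic initialization is an assumption made here for reproducibility.

```python
import numpy as np

def iou_wh(boxes, anchors):
    # IoU between (w, h) pairs, assuming all boxes share the same center
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100):
    boxes = np.asarray(boxes, dtype=float)
    # deterministic init: pick k boxes spread evenly across the dataset
    anchors = boxes[np.linspace(0, len(boxes) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each box to the anchor it overlaps most (highest IoU)
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors[:, 0])]  # sorted by width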
In this project, I use the pretrained weights, which cover 80 trained YOLO classes (the COCO dataset), for recognition. The class label is represented as `c`, an integer from 1 to 80, each number representing a class label accordingly. If `c=3`, the classified object is a car. The image features learned by the deep convolutional layers are passed to a classifier and regressor which makes the detection prediction (coordinates of the bounding boxes, the class label, etc.). For details, also see the picture below. (Thanks Levio for your great image!)
- input : [None, 416, 416, 3]
- output : the confidence of an object being present in the rectangle, plus a list of rectangle positions, sizes, and classes of the objects being detected. Each bounding box is represented by the numbers `(bx, by, bw, bh, Pc, C1..Cn)` as explained above. In this case n=80, which means `c` is an 80-dimensional vector, so the final size of a bounding-box representation is 85. The first four numbers `bx, by, bw, bh` describe the bounding box, the next number `Pc` is the confidence that an object is present, and the last 80 numbers are the output probabilities of the corresponding classes.
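As a sketch of how one such 85-number vector can be read (the layout is assumed from the description above, and the helper name is my own, not the repo's):

```python
import numpy as np

def decode_prediction(vec, class_names):
    # vec: one 85-number prediction, assumed layout [bx, by, bw, bh, Pc, C1..C80]
    box = vec[:4]
    objectness = vec[4]
    class_probs = vec[5:]
    scores = objectness * class_probs   # per-class detection scores
    best = int(np.argmax(scores))
    return box, class_names[best], float(scores[best])
```

Multiplying `Pc` by each class probability gives the per-class scores that the filtering step below thresholds.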
The output may contain several rectangles that are false positives or overlap each other. With an input image of size `[416, 416, 3]`, you get `(52x52 + 26x26 + 13x13) x 3 = 10647` boxes, since YOLO v3 uses 9 anchor boxes in total (three per scale). So it is time to find a way to reduce them. The first step is to filter the boxes by a score threshold.
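The box count follows directly from the three output grids; a one-line sanity check:

```python
# 416x416 input -> grids of 52x52, 26x26, 13x13 (strides 8, 16, 32),
# with 3 anchor boxes predicted per grid cell
grid_sizes = [52, 26, 13]
num_boxes = sum(g * g * 3 for g in grid_sizes)
print(num_boxes)  # 10647
```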
Input arguments:
- `boxes`: tensor of shape `[10647, 4]`
- `scores`: tensor of shape `[10647, 80]` containing the detection scores for the 80 classes
- `score_thresh`: float value used to get rid of boxes with low scores
# Step 1: Create a filtering mask based on "box_class_scores" by using "threshold".
score_thresh=0.4
mask = tf.greater_equal(scores, tf.constant(score_thresh))
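To make the masking step concrete, here is the same idea rendered in NumPy on made-up toy data (the shapes and values are illustrative only, not the repo's):

```python
import numpy as np

score_thresh = 0.4
scores = np.array([[0.1, 0.9],    # box 0: only class 1 passes
                   [0.3, 0.2],    # box 1: nothing passes
                   [0.5, 0.6]])   # box 2: both classes pass
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [2, 2, 12, 12]])

mask = scores >= score_thresh            # same role as tf.greater_equal
box_idx, class_idx = np.nonzero(mask)    # surviving (box, class) pairs
filtered_boxes = boxes[box_idx]
filtered_scores = scores[box_idx, class_idx]
```

In TensorFlow, `tf.boolean_mask` plays the role of the fancy indexing used here.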
Even after filtering by score threshold, we still have a lot of overlapping boxes. The second filtering approach is the non-maximum suppression (NMS) algorithm.
- Discard all boxes with `Pc <= 0.4`
- While there are any remaining boxes:
  - Pick the box with the largest `Pc` and output it as a prediction
  - Discard any remaining boxes with `IoU >= 0.5` with the box output in the previous step
In TensorFlow, we can simply implement the non-maximum suppression algorithm like this; for more details, see here
for i in range(num_classes):
    # max_output_size (the maximum number of boxes to keep) is a required argument
    indices = tf.image.non_max_suppression(boxes, scores[:, i], max_output_size=20, iou_threshold=0.5)
Non-max suppression uses a very important function called "Intersection over Union", or IoU. Here is an example of the non-maximum suppression algorithm: on input the algorithm receives 4 overlapping bounding boxes, and the output returns only one.
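As a plain-NumPy illustration (my own sketch, not the repo's TF implementation), IoU and the greedy loop described above can be written as:

```python
import numpy as np

def iou(box_a, box_b):
    # boxes given as [x_min, y_min, x_max, y_max]
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]      # highest score first
    keep = []
    while len(order) > 0:
        best = order[0]                   # pick the box with the largest score
        keep.append(int(best))
        # discard remaining boxes that overlap it too much
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_thresh])
    return keep
```

With four heavily overlapping boxes, only the highest-scoring one survives.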
Run the command `python coco_predict_gpu.py` to get the final evaluation result with cocoapi.