# Pedestrian Detection with YOLOv3 on INRIA
Pedestrian detection using YOLOv3 (based on Darknet) on the INRIA Person Dataset. NMS (non-maximum suppression) is applied to the detector's predicted bounding boxes to remove redundant and overlapping ones.
The model I trained is available on pan.baidu.com and drive.google.com; the weights file (234.9 MB) was trained for over 135,000 iterations.
Download links:

- Baidu Cloud Disk (Chinese): https://pan.baidu.com/s/1kDJqaa6NeWalzxzOhnorrQ (password: zwku)
- Google Drive: https://drive.google.com/file/d/1MXOqmZH7OtUpNWu60GBgxpkikgfc70My/view?usp=sharing

INRIA Person Dataset links:

- Official link: http://pascal.inrialpes.fr/data/human/down
- Baidu Cloud Disk (Chinese): https://pan.baidu.com/s/12TYw-8U9sxz9cUu2vxzvGQ (password: jxqu)
- Google Drive: https://drive.google.com/file/d/1wTxod2BhY_HUkEdDYRVSuw-nDuqrgCu7/view?usp=sharing
Here is my workflow: preparing Darknet on Ubuntu, training YOLOv3 on INRIA, and evaluating the trained detector.

NOTE: all Python scripts below are run separately.
```
git clone https://github.com/pjreddie/darknet
```

In the Makefile, my settings are:

```
GPU=1
CUDNN=1
OPENCV=1
```

Then build:

```
make
```
Copy the images (both /pos and /neg) from ./INRIAPerson/Train and ./INRIAPerson/Test into a new folder ./data/. After running the following command, you get this file structure:

- ./data
  - /Train -- 1832 images
  - /Test -- 741 images

```
python ./make_YOLO_data(YOLO_data).py
```
For the '/pos' images of the original dataset, a regular expression reads each image's ground-truth information from the annotation. For every target in an image, the real bounding box (ground truth) is given as four integers: (Xmin, Ymin) - (Xmax, Ymax).

The labels used for training must be stored in a .txt file with the same name as the image, and all ground-truth boxes of an image go into that one file, one per line, in the format `<object-class> <x> <y> <width> <height>`, where x, y, width, and height are relative to the image's width and height, and object-class is always 0 (there is only the person class).
```
python ./take_YOLO_label(YOLO_data).py
```
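The corner-to-YOLO conversion described above can be sketched as follows (a minimal helper, not the script's actual code):

```python
def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a (Xmin, Ymin) - (Xmax, Ymax) box to YOLO's normalized format."""
    # center coordinates and box size, all normalized to [0, 1]
    x = (xmin + xmax) / 2.0 / img_w
    y = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x, y, w, h

def label_line(xmin, ymin, xmax, ymax, img_w, img_h, object_class=0):
    """One label-file line: <object-class> <x> <y> <width> <height>."""
    x, y, w, h = to_yolo(xmin, ymin, xmax, ymax, img_w, img_h)
    return f"{object_class} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"
```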
PS: In YOLOv3, each image's label file must be in the same directory as the image; otherwise training will complain that the label file cannot be found.
Darknet also needs TXT files listing the paths of all training and test images.

```
python ./Darknet_list_image_files(.txt).py
```
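Generating such a list file can be sketched like this (assuming the images are .png files, as in INRIA; the function name is hypothetical):

```python
import glob
import os

def write_image_list(image_dir, out_txt):
    """Write the absolute path of every .png image in image_dir, one per line."""
    paths = sorted(glob.glob(os.path.join(image_dir, "*.png")))
    with open(out_txt, "w") as f:
        f.write("\n".join(os.path.abspath(p) for p in paths))
    return len(paths)
```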
Copy yolov3.cfg to ./cfg/yolo-inria.cfg, set classes to 1 in every [yolo] layer, and change the number of convolutional kernels (filters) in each [convolutional] layer immediately before a [yolo] layer to 18, computed as 3*(5+classes) = 3*(5+1) = 18.
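For reference, the edited region of the cfg looks roughly like this (a sketch showing only the keys that change, not the full layer definitions):

```
[convolutional]
filters=18    # 3*(5+classes) = 3*(5+1) = 18

[yolo]
classes=1
```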
Create inria.names, a file with one class name per line; here it contains the single line: person.
Copy coco.data to ./data/inria.data, change classes to 1, fill in the paths of the previously generated list files and of inria.names, and set the model save path (./backup) as shown below.

```
classes = 1
train = data/Train.txt
valid = data/Test.txt
names = data/inria.names
backup = backup/
```
Download the pretrained convolutional weights:

```
wget https://pjreddie.com/media/files/darknet53.conv.74
```

Copy Darknet's executable (darknet) into this folder, then start training. Don't forget to save the log!

```
./darknet detector train data/inria.data cfg/yolo-inria.cfg darknet53.conv.74 | tee training.log
```
Extract the loss values from the log and plot the loss curve. The raw log contains nan losses and lines that cannot be parsed, so first extract the training log, remove the unparseable entries, and write a cleaned log file for the Python visualization script to plot.

```
python ./extract_log.py
python ./visualization_loss.py
```
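The extraction step can be sketched as below, assuming the usual Darknet per-iteration summary line format (this is a simplified stand-in for extract_log.py, not its actual code):

```python
import re

# Matches Darknet summary lines such as:
# "9798: 0.370096, 0.451929 avg, 0.001000 rate, 3.3 seconds, 627072 images"
LINE = re.compile(r"^\s*(\d+):\s*([\d.]+|nan),\s*([\d.]+|nan)\s+avg")

def extract_loss(lines):
    """Return (iteration, average loss) pairs, skipping nan and unparseable lines."""
    points = []
    for line in lines:
        m = LINE.match(line)
        if not m or "nan" in (m.group(2), m.group(3)):
            continue  # unparseable log line or nan loss: drop it
        points.append((int(m.group(1)), float(m.group(3))))
    return points
```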
Use regular expressions to get the ground truth from the annotations.

```
python ./get_ground_truth(annotation2Ground_Truth.npy).py
```
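A sketch of such a regex parser, assuming the standard INRIA annotation line format shown in the comment (a simplified stand-in, not the script's actual code):

```python
import re

# INRIA annotation files contain lines such as:
# Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (194, 127) - (413, 647)
BOX = re.compile(r"\((\d+),\s*(\d+)\)\s*-\s*\((\d+),\s*(\d+)\)\s*$")

def parse_boxes(text):
    """Return all (xmin, ymin, xmax, ymax) ground-truth boxes in an annotation file."""
    boxes = []
    for line in text.splitlines():
        if "Bounding box" not in line:
            continue
        m = BOX.search(line)
        if m:
            boxes.append(tuple(int(v) for v in m.groups()))
    return boxes
```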
Use the trained YOLO detector to predict pedestrian bounding boxes on ./data/Test/ (prepared in 2.1).

```
python ./predict_bounding_boxes(predict).py
```
A predicted bounding box is counted as correct when its IoU with a ground-truth box is greater than 0.5.

```
python ./evaluate_the_detector.py
```

PS: detect_bounding_boxes.py provides the detector method (it does not need to be run).
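The IoU test used for matching can be sketched as the standard intersection-over-union computation on corner-format boxes:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```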
Here are some results to share.
For the trained model, the loss curve is plotted as follows; it covers iterations 3,000 to 135,249 because the loss is very high at the start.
A predicted bounding box with IoU > 0.5 against a ground-truth box counts as correct. There are 589 ground-truth boxes and 477 predicted boxes, of which 474 are correct. Precision, recall, and related metrics are shown in the following table.
Precision | Recall | False Positive Rate | Miss Rate |
---|---|---|---|
99.37% | 80.48% | 0.63% | 19.52% |
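The table's figures can be reproduced from the counts above (assuming "false positive rate" here means the fraction of predictions that are wrong, which matches 3/477):

```python
gt, preds, correct = 589, 477, 474  # counts reported above

precision = correct / preds         # fraction of predictions that are correct
recall = correct / gt               # fraction of ground-truth boxes found
fpr = (preds - correct) / preds     # fraction of predictions that are wrong
miss_rate = 1 - recall              # fraction of ground-truth boxes missed
```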
Pedestrian detection based on YOLOv3 works very well: the detection rate (recall) is high, and the precision reaches 99.37%, so almost all predicted bounding boxes are correct.