Release mAP Improvements Past Darknet, Multithreaded DataLoader · ultralytics/yolov3

This release requires PyTorch >= v1.0.0 to function properly. Please install the latest version from https://github.com/pytorch/pytorch/releases

Breaking Changes

There are no breaking changes in this release.

Bug Fixes

Multi GPU support is now working correctly #21.
test.py now natively outputs the same results as pycocotools to within 1% under most circumstances #2

Added Functionality

Dataloader is now multithread. #141
mAP improved by smarter NMS. mAP now exceeds darknet mAP by a small amount in all image sizes 320-608.

	ultralytics/yolov3 with `pycocotools`	darknet/yolov3
YOLOv3-320	51.8	51.5
YOLOv3-416	55.4	55.3
YOLOv3-608	58.2	57.9

sudo rm -rf yolov3 && git clone https://github.com/ultralytics/yolov3
# bash yolov3/data/get_coco_dataset.sh
sudo rm -rf cocoapi && git clone https://github.com/cocodataset/cocoapi && cd cocoapi/PythonAPI && make && cd ../.. && cp -r cocoapi/PythonAPI/pycocotools yolov3
cd yolov3

python3 test.py --save-json --conf-thres 0.001 --img-size 416
Namespace(batch_size=32, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=416, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
      Image      Total          P          R        mAP
Calculating mAP: 100%|█████████████████████████████████| 157/157 [08:34<00:00,  2.53s/it]
       5000       5000     0.0896      0.756      0.555
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.312
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.554
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.317
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.145
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.343
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.452
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.268
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.411
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.435
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.244
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.477
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.587
 
python3 test.py --save-json --conf-thres 0.001 --img-size 608 --batch-size 16
Namespace(batch_size=16, cfg='cfg/yolov3.cfg', conf_thres=0.001, data_cfg='cfg/coco.data', img_size=608, iou_thres=0.5, nms_thres=0.5, save_json=True, weights='weights/yolov3.weights')
Using cuda _CudaDeviceProperties(name='Tesla V100-SXM2-16GB', major=7, minor=0, total_memory=16130MB, multi_processor_count=80)
      Image      Total          P          R        mAP
Calculating mAP: 100%|█████████████████████████████████| 313/313 [08:54<00:00,  1.55s/it]
       5000       5000     0.0966      0.786      0.579
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.331
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.582
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.344
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.198
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.362
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.427
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.437
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.463
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.577

Performance

mAP computation is much slower now than before using default settings, as --conf-thres 0.001 captures many boxes that all must be passed through NMS. On a V100 test.py runs in about 8 minutes
Training speed is improved substantially compared to v3.0 due to the addition of the multithreaded PyTorch dataloader.

https://cloud.google.com/deep-learning-vm/
Machine type: n1-standard-8 (8 vCPUs, 30 GB memory)
CPU platform: Intel Skylake
GPUs: K80 ($0.198/hr), P4 ($0.279/hr), T4 ($0.353/hr), P100 ($0.493/hr), V100 ($0.803/hr)
HDD: 100 GB SSD
Dataset: COCO train 2014

GPUs	`batch_size`	batch time	epoch time	epoch cost
	(images)	(s/batch)
1 K80	16	1.43s	175min	$0.58
1 P4	8	0.51s	125min	$0.58
1 T4	16	0.78s	94min	$0.55
1 P100	16	0.39s	48min	$0.39
2 P100	32	0.48s	29min	$0.47
4 P100	64	0.65s	20min	$0.65
1 V100	16	0.25s	31min	$0.41
2 V100	32	0.29s	18min	$0.48
4 V100	64	0.41s	13min	$0.70
8 V100	128	0.49s	7min	$0.80

TODO (help and PR's welcome!)

Video Inference. Pass a video file to detect.py.
YAPF linting (including possible wrap to PEP8 79 character-line standard) #88.
Add iOS App inference to photos and videos in Camera Roll.
Add parameter to switch between 'darknet' and 'power' wh methods. #168
Hyperparameter search for loss function constants.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mAP Improvements Past Darknet, Multithreaded DataLoader

Breaking Changes

Bug Fixes

Added Functionality

Performance

TODO (help and PR's welcome!)