We use TensorRT's PyTorch Quantization Toolkit (pytorch-quantization) to fine-tune YOLOv7 with quantization-aware training (QAT), starting from the pre-trained weights. We then export the model to ONNX and deploy it with TensorRT. The accuracy and performance are listed in the table below.
Method | Calibration method | mAPval 0.5 | mAPval 0.5:0.95 | batch-1 FPS (Jetson Orin-X) | batch-16 FPS (Jetson Orin-X) | Weights |
---|---|---|---|---|---|---|
pytorch FP16 | - | 0.6972 | 0.5120 | - | - | yolov7.pt |
pytorch PTQ-INT8 | Histogram(MSE) | 0.6957 | 0.5100 | - | - | yolov7_ptq.pt yolov7_ptq_640.onnx |
pytorch QAT-INT8 | Histogram(MSE) | 0.6961 | 0.5111 | - | - | yolov7_qat.pt |
TensorRT FP16 | - | 0.6973 | 0.5124 | 140 | 168 | yolov7.onnx |
TensorRT PTQ-INT8 | TensorRT built-in EntropyCalibratorV2 | 0.6317 | 0.4573 | 207 | 264 | - |
TensorRT QAT-INT8 | Histogram(MSE) | 0.6962 | 0.5113 | 207 | 266 | yolov7_qat_640.onnx |
- network input resolution: 3x640x640
- note: trtexec was run with CUDA Graph enabled
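The trade-off the table shows can be made explicit with a little arithmetic. This snippet only reuses numbers copied from the TensorRT rows of the table (nothing new is measured); `compare` is a hypothetical helper that returns the speedup over TensorRT FP16 at batch 1 and batch 16, plus the change in mAPval 0.5:0.95.

```python
# mAPval 0.5:0.95 and Jetson Orin-X FPS, copied from the TensorRT rows of the table.
results = {
    "TensorRT FP16":     {"map": 0.5124, "fps_b1": 140, "fps_b16": 168},
    "TensorRT PTQ-INT8": {"map": 0.4573, "fps_b1": 207, "fps_b16": 264},
    "TensorRT QAT-INT8": {"map": 0.5113, "fps_b1": 207, "fps_b16": 266},
}


def compare(name, baseline="TensorRT FP16"):
    """Return (batch-1 speedup, batch-16 speedup, mAP change) relative to the baseline."""
    base, r = results[baseline], results[name]
    return (
        round(r["fps_b1"] / base["fps_b1"], 2),
        round(r["fps_b16"] / base["fps_b16"], 2),
        round(r["map"] - base["map"], 4),
    )


for name in ("TensorRT PTQ-INT8", "TensorRT QAT-INT8"):
    print(name, compare(name))
```

Both INT8 engines run about 1.48x faster than FP16 at batch 1 (1.57x and 1.58x at batch 16), but only the QAT model stays within about 0.001 mAP of FP16; TensorRT's built-in PTQ loses about 0.055.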
We suggest using the Docker environment:
$ docker pull nvcr.io/nvidia/pytorch:22.09-py3
- Clone and apply patch
# use this YoloV7 as a sample base
git clone https://github.com/WongKinYiu/yolov7.git
cp -r yolov_deepstream/yolov7_qat/* yolov7/
- Install dependencies
$ pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
- Download dataset and pretrained model
$ bash scripts/get_coco.sh
$ wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt
- Run PTQ calibration and QAT fine-tuning (also evaluating the original and PTQ models)
$ python scripts/qat.py quantize yolov7.pt --ptq=ptq.pt --qat=qat.pt --eval-ptq --eval-origin
This script performs the following steps:
- Insert Q&DQ nodes to get a fake-quant PyTorch model
  The PyTorch Quantization Toolkit can insert QDQ nodes automatically, but for YOLOv7 the automatically quantized model cannot reach the same performance as PTQ. In explicit mode (QAT mode), TensorRT uses the placement of the Q/DQ nodes to decide the precision of each layer, and some of the automatically added Q&DQ nodes cannot be fused with other layers, which causes extra, unnecessary precision conversions. Our script therefore applies a set of rules and restrictions we found for YOLOv7: the QDQ nodes are analyzed and configured automatically in a rule-based manner so that their placement is optimal under TensorRT and all nodes run in INT8 (confirmed with the tool trt-engine-explorer; see scripts/draw-engine.py). For details of this part, please refer to quantization/rules.py. For guidance on Q&DQ insertion, please refer to Guidance_of_QAT_performance_optimization.
- PTQ calibration
  After inserting the Q&DQ nodes, we recommend running PTQ calibration first. In our experiments, Histogram(MSE) is the best PTQ calibration method for YOLOv7. Note: if you are satisfied with the PTQ result, you can skip QAT.
- QAT training
  During QAT, the model is fine-tuned with the Q&DQ nodes in place. Once the accuracy is satisfactory, the weights are saved to files.
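The three steps above rest on a few simple operations, sketched here in pure Python. This is a simplified model under stated assumptions, not the code in scripts/qat.py or pytorch-quantization: `fake_quant` is what a Q/DQ pair computes under an assumed symmetric per-tensor INT8 scheme (scale = amax / 127); `ste_grad` is the clipped straight-through estimator commonly used to backpropagate through rounding during QAT; and `histogram_mse_amax` shows the idea behind Histogram(MSE) calibration (histogram |x|, try candidate clipping ranges, keep the one with the lowest quantization MSE). The function names, bin count, and candidate count are illustrative.

```python
def fake_quant(x, amax, num_bits=8):
    """What a Q/DQ pair computes: quantize to signed integers, then dequantize."""
    max_bound = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = amax / max_bound                      # symmetric, per-tensor scale (assumed)
    q = round(x / scale)                          # Q: float -> integer grid
    q = max(-max_bound - 1, min(max_bound, q))    # saturate to [-128, 127]
    return q * scale                              # DQ: integer -> float


def ste_grad(x, amax, grad_out):
    """Clipped straight-through estimator: treat rounding as identity,
    but zero the gradient where x was clipped by amax."""
    return grad_out if abs(x) <= amax else 0.0


def histogram_mse_amax(values, num_bins=2048, num_candidates=128):
    """Simplified Histogram(MSE) calibration: choose the clipping range amax
    that minimizes the mean squared fake-quantization error of |values|."""
    abs_vals = [abs(v) for v in values]
    max_abs = max(abs_vals)
    if max_abs == 0.0:
        raise ValueError("cannot calibrate an all-zero tensor")

    # 1. Histogram of the absolute values seen during calibration.
    bin_width = max_abs / num_bins
    counts = [0] * num_bins
    for a in abs_vals:
        counts[min(int(a / bin_width), num_bins - 1)] += 1
    centers = [(i + 0.5) * bin_width for i in range(num_bins)]

    # 2. Quantization MSE of the histogram for one clipping range.
    def mse(amax):
        err = sum(n * (c - fake_quant(c, amax)) ** 2
                  for c, n in zip(centers, counts) if n)
        return err / len(abs_vals)

    # 3. Try evenly spaced candidates up to max_abs and keep the best one.
    candidates = [max_abs * k / num_candidates for k in range(1, num_candidates + 1)]
    return min(candidates, key=mse)


# Toy activations: many small values plus one large outlier.
data = [i / 100 for i in range(-300, 301)] + [50.0]
amax = histogram_mse_amax(data)
print(f"calibrated amax = {amax:.4f}, max |x| = {max(abs(v) for v in data)}")
```

A smaller amax trades saturation error on rare large values for a finer integer grid on the common small ones; MSE calibration picks the range where that combined error is lowest, whereas plain max calibration always uses the largest observed value.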
- Export the QAT model to ONNX
$ python scripts/qat.py export qat.pt --size=640 --save=qat.onnx --dynamic
- Evaluate the QAT model with TensorRT
$ bash scripts/eval-trt.sh qat.pt
- Build and benchmark the TensorRT engine with trtexec
$ /usr/src/tensorrt/bin/trtexec --onnx=qat.onnx --int8 --fp16 --workspace=1024000 --minShapes=images:4x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:4x3x640x640
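Because the model is exported with --dynamic, the ONNX input `images` (the name used in the trtexec command above) presumably has a dynamic batch dimension, so trtexec is given an optimization profile; setting min, opt, and max to the same shape builds an engine for one fixed batch size. The hypothetical helper `trtexec_cmd` below reassembles that command for other batch sizes, for example to approximate the batch-16 column of the table. It uses only the flags from the command above, plus trtexec's --useCudaGraph because the table note says CUDA Graph was enabled; whether this reproduces the exact benchmark setup is an assumption.

```python
import shlex


def trtexec_cmd(onnx_path, batch, size=640, workspace=1024000, cuda_graph=False):
    """Build a trtexec command line with one fixed shape for the `images` input.

    min/opt/max are identical, so the engine only accepts batch x 3 x size x size.
    """
    shape = f"images:{batch}x3x{size}x{size}"
    cmd = [
        "/usr/src/tensorrt/bin/trtexec",
        f"--onnx={onnx_path}",
        "--int8",                     # enable INT8 precision
        "--fp16",                     # also enable FP16 precision
        f"--workspace={workspace}",   # value copied from the command above
        f"--minShapes={shape}",
        f"--optShapes={shape}",
        f"--maxShapes={shape}",
    ]
    if cuda_graph:
        cmd.append("--useCudaGraph")  # capture inference in a CUDA Graph
    return cmd


# Example: a batch-16 engine, printed as a copy-pasteable shell command.
print(shlex.join(trtexec_cmd("qat.onnx", batch=16, cuda_graph=True)))
```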
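For YOLOv7-tiny, run the quantization with an ignore policy; modules whose names match --ignore-policy are excluded from quantization:
$ python scripts/qat.py quantize yolov7-tiny.pt --qat=qat.pt --ptq=ptq.pt --ignore-policy="model\.77\.m\.(.*)|model\.0\.(.*)" --supervision-stride=1 --eval-ptq --eval-origin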
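To check which layers a given --ignore-policy excludes, the pattern can be tested against module names offline. This sketch assumes the script compares each module's dotted name (as returned by PyTorch's `named_modules()`) against the policy with Python's `re.match`; that matching rule, the helper `is_ignored`, and the sample names are assumptions. In YOLOv7-tiny, `model.0` is presumably the first convolution and `model.77.m` the Detect head's output convolutions.

```python
import re

# Pattern copied from the --ignore-policy argument of the YOLOv7-tiny command.
IGNORE_POLICY = r"model\.77\.m\.(.*)|model\.0\.(.*)"


def is_ignored(module_name, policy=IGNORE_POLICY):
    """Assumed rule: skip quantization if the policy matches at the start of the name."""
    return re.match(policy, module_name) is not None


# Hypothetical module names in the dotted style used by named_modules().
names = ["model.0.conv", "model.1.conv", "model.10.conv", "model.77.m.0", "model.77.m.2"]
for name in names:
    print(f"{name:15s} ignored={is_ignored(name)}")
```

Because `re.match` anchors only at the start of the name and the dots are escaped, `model.10.conv` is not caught by the `model\.0\.` alternative.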
- For YOLOv5, please use the script scripts/qat-yolov5.py. It adds QAT support for the Add operator, which makes the quantized model more performant.
- Please refer to the quantize.replace_bottleneck_forward function to see how the Add operator is handled.
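The idea of routing a residual Add through quantizers, and swapping a block's forward to do so, can be sketched in pure Python. Everything here is hypothetical: `Bottleneck` is a toy stand-in for YOLOv5's residual block (its `cv1`/`cv2` are simple arithmetic, not convolutions), `QuantAdd` and `replace_forward` are illustrative names, and a fixed amax replaces calibrated ranges; quantize.replace_bottleneck_forward remains the reference for the actual behavior. Putting a fake-quant step on both Add inputs presumably lets TensorRT run the Add in INT8 instead of converting back to higher precision.

```python
from types import MethodType


def fake_quant(x, amax, num_bits=8):
    """Symmetric per-tensor fake quantization (quantize, then dequantize)."""
    max_bound = 2 ** (num_bits - 1) - 1
    scale = amax / max_bound
    q = max(-max_bound - 1, min(max_bound, round(x / scale)))
    return q * scale


class QuantAdd:
    """Hypothetical Add whose two inputs each pass through a fake-quant step."""

    def __init__(self, amax_x, amax_y):
        self.amax_x = amax_x
        self.amax_y = amax_y

    def __call__(self, x, y):
        return fake_quant(x, self.amax_x) + fake_quant(y, self.amax_y)


class Bottleneck:
    """Toy residual block: returns x + cv2(cv1(x)) when `add` is True."""

    def __init__(self, add=True):
        self.add = add

    def cv1(self, x):
        return 0.5 * x

    def cv2(self, x):
        return x + 0.1

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y


def quant_forward(self, x):
    """Replacement forward: the residual sum goes through self.addop."""
    y = self.cv2(self.cv1(x))
    return self.addop(x, y) if self.add else y


def replace_forward(blocks, amax=4.0):
    """Attach a QuantAdd to every residual block and bind quant_forward to it."""
    for block in blocks:
        if block.add:
            block.addop = QuantAdd(amax, amax)
            block.forward = MethodType(quant_forward, block)


blocks = [Bottleneck(add=True), Bottleneck(add=False)]
replace_forward(blocks)
print([b.forward(1.0) for b in blocks])
```

Only blocks that actually have a residual connection are patched; here the patch is per instance, which leaves other uses of the class untouched.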