Add use_trt and use_int8 in infer.py
nepeplwu committed Feb 25, 2021
1 parent ae5c137 commit a6a68fa
Showing 3 changed files with 68 additions and 4 deletions.
6 changes: 6 additions & 0 deletions deploy/python/README.md
@@ -22,8 +22,14 @@ python deploy/python/infer.py --config /path/to/deploy.yaml --image_path
|Parameter|Description|Required|Default|
|-|-|-|-|
|config|**The configuration file generated when exporting the model**, not a configuration file under the configs directory||-|
|image_path|Path to an image, or to a directory of images, to predict||-|
|use_trt|Whether to enable TensorRT to accelerate prediction|||
|use_int8|Whether to run in int8 mode when TensorRT prediction is enabled|||
|batch_size|Batch size per card||The value specified in the configuration file|
|save_dir|Directory for saving prediction results||output|

*A test example and its prediction result are shown below:*
![cityscape_predict_demo.png](../../docs/images/cityscapes_predict_demo.png)

*Notes:*
*1. When predicting with a quantized model, TensorRT prediction and int8 prediction must both be enabled for there to be any speedup.*
*2. Using TensorRT requires a Paddle build with TRT support. Refer to the [appendix](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release) to download the corresponding PaddlePaddle package, or build it yourself following [compile from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/fromsource.html).*
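For reference, here is a minimal Python sketch of what these two flags switch on inside Paddle Inference, following the `enable_tensorrt_engine` call added to `deploy/python/infer.py` in this commit (the model file names are placeholders for your exported model):

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths: point these at your exported (optionally quantized) model.
pred_cfg = Config('output/model.pdmodel', 'output/model.pdiparams')
pred_cfg.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0

use_trt, use_int8 = True, True   # what --use_trt / --use_int8 toggle
if use_trt:
    # int8 only pays off for quantized models; otherwise stay in FP32.
    ptype = PrecisionType.Int8 if use_int8 else PrecisionType.Float32
    pred_cfg.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=ptype,
        use_static=False,
        use_calib_mode=False)

predictor = create_predictor(pred_cfg)
```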
17 changes: 13 additions & 4 deletions deploy/python/infer.py
@@ -68,11 +68,12 @@ def __init__(self, args):
         pred_cfg.enable_use_gpu(100, 0)
 
         if self.args.use_trt:
+            ptype = PrecisionType.Int8 if args.use_int8 else PrecisionType.Float32
             pred_cfg.enable_tensorrt_engine(
                 workspace_size=1 << 30,
                 max_batch_size=1,
                 min_subgraph_size=3,
-                precision_mode=PrecisionType.Int8,
+                precision_mode=ptype,
                 use_static=False,
                 use_calib_mode=False)
 
@@ -96,7 +97,6 @@ def run(self, imgs):
             ])
             input_handle.reshape(data.shape)
             input_handle.copy_from_cpu(data)
-
         self.predictor.run()
 
         output_names = self.predictor.get_output_names()
@@ -106,11 +106,16 @@
         self.postprocess(results, imgs)
 
     def postprocess(self, results, imgs):
+        if not os.path.exists(self.args.save_dir):
+            os.makedirs(self.args.save_dir)
+
         results = np.concatenate(results, axis=0)
         for i in range(results.shape[0]):
             result = np.argmax(results[i], axis=0)
             result = get_pseudo_color_map(result)
             basename = os.path.basename(imgs[i])
+            basename, _ = os.path.splitext(basename)
+            basename = f'{basename}.png'
             result.save(os.path.join(self.args.save_dir, basename))
 
 
@@ -147,8 +152,12 @@ def parse_args():
         '--use_trt',
         dest='use_trt',
         help='Whether to use Nvidia TensorRT to accelerate prediction.',
-        type=bool,
-        default=False)
+        action='store_true')
+    parser.add_argument(
+        '--use_int8',
+        dest='use_int8',
+        help='Whether to use Int8 prediction when using TensorRT prediction.',
+        action='store_true')
 
     return parser.parse_args()
 
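The switch from `type=bool` to `action='store_true'` in `parse_args` fixes a classic argparse pitfall: `type=bool` applies Python's `bool()` to the raw string, and every non-empty string (including `'False'`) is truthy. A small self-contained demonstration:

```python
import argparse

# Old style: type=bool converts the raw string with bool(), so any
# non-empty value -- including the string 'False' -- becomes True.
old = argparse.ArgumentParser()
old.add_argument('--use_trt', dest='use_trt', type=bool, default=False)
print(old.parse_args(['--use_trt', 'False']).use_trt)  # True (!)

# New style: the option is a plain on/off switch, as in this commit.
new = argparse.ArgumentParser()
new.add_argument('--use_trt', dest='use_trt', action='store_true')
new.add_argument('--use_int8', dest='use_int8', action='store_true')
print(new.parse_args([]).use_trt)             # False
print(new.parse_args(['--use_trt']).use_trt)  # True
```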
49 changes: 49 additions & 0 deletions slim/README.md
@@ -14,6 +14,8 @@ pip install paddleslim==2.0.0

Model quantization is a technique that converts the parameter computations in a model from floating-point to low-bit fixed-point arithmetic, which effectively reduces the model's computational intensity, parameter size, and memory consumption. Building on the PaddleSlim library, PaddleSeg provides online quantization of models.

*Note: for small models, which already run very fast on their own, adding quantization may actually slow inference down.*
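To make the float-to-fixed-point idea concrete, here is a generic symmetric int8 quantization sketch in NumPy (an illustration only, not PaddleSlim's exact scheme):

```python
import numpy as np

def quantize_int8(x):
    # Map [-max|x|, +max|x|] linearly onto the int8 range [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original floats.
    return q.astype(np.float32) * scale

w = np.array([0.31, -1.24, 0.07, 2.05], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                 # int8 values: [ 19 -77   4 127]
print(dequantize(q, s))  # close to w; error is at most about scale/2
```

Each parameter is stored in 1 byte instead of 4, and the arithmetic can run in integer units, which is the source of the speedups reported in the tables at the end of this README.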

### step 1. Model training

You can train a model with the scripts provided by PaddleSeg. Make sure PaddleSeg is installed, then run the following script from the PaddleSeg directory:
@@ -45,6 +47,11 @@ python train.py \
|save_dir|Directory for saving the quantized model||output|

```shell
# Run from the PaddleSeg root directory
export PYTHONPATH=`pwd`
# On Windows, run the following command instead
# set PYTHONPATH=%cd%

python slim/quant.py \
    --config configs/quick_start/bisenet_optic_disc_512x512_1k.yml \
    --retraining_iters 10 \
@@ -93,6 +100,11 @@ python train.py \
|save_dir|Directory for saving the pruned model||output|

```shell
# Run from the PaddleSeg root directory
export PYTHONPATH=`pwd`
# On Windows, run the following command instead
# set PYTHONPATH=%cd%

python slim/prune.py \
    --config configs/quick_start/bisenet_optic_disc_512x512_1k.yml \
    --pruning_ratio 0.2 \
@@ -104,3 +116,40 @@ python slim/prune.py \
## Deployment

A model obtained through `quantization` or `pruning` can be deployed directly. For the related tutorial, see [Model Deployment](../docs/model_export.md).


## Quantization & pruning speedup

Test environment:
GPU: V100
CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
CUDA: 10.2
cuDNN: 7.6
TensorRT: 6.0.1.5

Test method:
1. Runtime is the pure model inference time, measured on Cityscapes images (1024x512).
2. 10 predictions are run as warm-up, then the average over 50 consecutive predictions is taken as the inference time.
3. Tested with GPU + TensorRT.
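The Speedup column in both tables matches the relative latency reduction, (t_baseline - t_optimized) / t_baseline; a quick check against the reported numbers (assuming this formula):

```python
def speedup(t_baseline_ms, t_optimized_ms):
    # Relative latency reduction, matching the Speedup columns below.
    return (t_baseline_ms - t_optimized_ms) / t_baseline_ms

print(f'{speedup(204.2, 150.1):.2%}')  # 26.49% -- deeplabv3_resnet50_os8, quantized
print(f'{speedup(7.0, 5.9):.2%}')      # 15.71% -- fastscnn at pruning ratio 0.1
```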

|Model|Runtime without quantization (ms)|Runtime with quantization (ms)|Speedup|
|-|-|-|-|
|deeplabv3_resnet50_os8|204.2|150.1|26.49%|
|deeplabv3p_resnet50_os8|147.2|89.5|39.20%|
|gcnet_resnet50_os8|201.8|126.1|37.51%|
|pspnet_resnet50_os8|266.8|206.8|22.49%|

|Model|Pruning ratio|Runtime (ms)|Speedup|
|-|-|-|-|
|fastscnn|-|7.0|-|
||0.1|5.9|15.71%|
||0.2|5.7|18.57%|
||0.3|5.6|20.00%|
|fcn_hrnetw18|-|43.28|-|
||0.1|40.46|6.51%|
||0.2|40.41|6.63%|
||0.3|38.84|10.25%|
|unet|-|76.04|-|
||0.1|74.39|2.16%|
||0.2|72.10|5.18%|
||0.3|66.96|11.94%|
