Add use_trt and use_int8 in infer.py
nepeplwu committed Feb 25, 2021
1 parent ae5c137 commit a6a68fa
Showing 3 changed files with 68 additions and 4 deletions.
6 changes: 6 additions & 0 deletions deploy/python/README.md
@@ -22,8 +22,14 @@ python deploy/python/infer.py --config /path/to/deploy.yaml --image_path
|Parameter|Description|Required|Default|
|-|-|-|-|
|config|**The configuration file generated when exporting the model**, not a configuration file under the configs directory||-|
|image_path|Path to an image, or to a directory of images, to predict||-|
|use_trt|Whether to enable TensorRT to accelerate prediction|||
|use_int8|Whether to run in int8 mode when TensorRT prediction is enabled|||
|batch_size|Batch size per card||The value specified in the configuration file|
|save_dir|Directory for saving prediction results||output|

*A test example and its prediction result are shown below:*
![cityscape_predict_demo.png](../../docs/images/cityscapes_predict_demo.png)

*Notes:*
*1. When predicting with a quantized model, TensorRT prediction and int8 prediction must both be enabled for there to be any speedup.*
*2. Using TensorRT requires a Paddle build with TRT support. Refer to the [appendix](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release) to download the corresponding PaddlePaddle package, or build it yourself following [compile from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/fromsource.html).*
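For reference, here is a minimal Python sketch of what these two flags switch on inside Paddle Inference, following the `enable_tensorrt_engine` call added to `deploy/python/infer.py` in this commit (the model file names are placeholders for your exported model):

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths: point these at your exported (optionally quantized) model.
pred_cfg = Config('output/model.pdmodel', 'output/model.pdiparams')
pred_cfg.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0

use_trt, use_int8 = True, True   # what --use_trt / --use_int8 toggle
if use_trt:
    # int8 only pays off for quantized models; otherwise stay in FP32.
    ptype = PrecisionType.Int8 if use_int8 else PrecisionType.Float32
    pred_cfg.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=ptype,
        use_static=False,
        use_calib_mode=False)

predictor = create_predictor(pred_cfg)
```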
17 changes: 13 additions & 4 deletions deploy/python/infer.py
@@ -68,11 +68,12 @@ def __init__(self, args):
         pred_cfg.enable_use_gpu(100, 0)
 
         if self.args.use_trt:
+            ptype = PrecisionType.Int8 if args.use_int8 else PrecisionType.Float32
             pred_cfg.enable_tensorrt_engine(
                 workspace_size=1 << 30,
                 max_batch_size=1,
                 min_subgraph_size=3,
-                precision_mode=PrecisionType.Int8,
+                precision_mode=ptype,
                 use_static=False,
                 use_calib_mode=False)
 
@@ -96,7 +97,6 @@ def run(self, imgs):
             ])
             input_handle.reshape(data.shape)
             input_handle.copy_from_cpu(data)
-
         self.predictor.run()
 
         output_names = self.predictor.get_output_names()
@@ -106,11 +106,16 @@
         self.postprocess(results, imgs)
 
     def postprocess(self, results, imgs):
+        if not os.path.exists(self.args.save_dir):
+            os.makedirs(self.args.save_dir)
+
         results = np.concatenate(results, axis=0)
         for i in range(results.shape[0]):
             result = np.argmax(results[i], axis=0)
             result = get_pseudo_color_map(result)
             basename = os.path.basename(imgs[i])
+            basename, _ = os.path.splitext(basename)
+            basename = f'{basename}.png'
             result.save(os.path.join(self.args.save_dir, basename))
 
 
@@ -147,8 +152,12 @@ def parse_args():
         '--use_trt',
         dest='use_trt',
         help='Whether to use Nvidia TensorRT to accelerate prediction.',
-        type=bool,
-        default=False)
+        action='store_true')
+    parser.add_argument(
+        '--use_int8',
+        dest='use_int8',
+        help='Whether to use Int8 prediction when using TensorRT prediction.',
+        action='store_true')
 
     return parser.parse_args()
 
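The switch from `type=bool` to `action='store_true'` in `parse_args` fixes a classic argparse pitfall: `type=bool` applies Python's `bool()` to the raw string, and every non-empty string (including `'False'`) is truthy. A small self-contained demonstration:

```python
import argparse

# Old style: type=bool converts the raw string with bool(), so any
# non-empty value -- including the string 'False' -- becomes True.
old = argparse.ArgumentParser()
old.add_argument('--use_trt', dest='use_trt', type=bool, default=False)
print(old.parse_args(['--use_trt', 'False']).use_trt)  # True (!)

# New style: the option is a plain on/off switch, as in this commit.
new = argparse.ArgumentParser()
new.add_argument('--use_trt', dest='use_trt', action='store_true')
new.add_argument('--use_int8', dest='use_int8', action='store_true')
print(new.parse_args([]).use_trt)             # False
print(new.parse_args(['--use_trt']).use_trt)  # True
```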
49 changes: 49 additions & 0 deletions slim/README.md
@@ -14,6 +14,8 @@ pip install paddleslim==2.0.0

Model quantization is a technique that converts the parameter computations in a model from floating-point to low-bit fixed-point arithmetic, which effectively reduces the model's computational intensity, parameter size, and memory consumption. Building on the PaddleSlim library, PaddleSeg provides online quantization of models.

*Note: for small models, which already run very fast on their own, adding quantization may actually slow inference down.*
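To make the float-to-fixed-point idea concrete, here is a generic symmetric int8 quantization sketch in NumPy (an illustration only, not PaddleSlim's exact scheme):

```python
import numpy as np

def quantize_int8(x):
    # Map [-max|x|, +max|x|] linearly onto the int8 range [-127, 127].
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original floats.
    return q.astype(np.float32) * scale

w = np.array([0.31, -1.24, 0.07, 2.05], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                 # int8 values: [ 19 -77   4 127]
print(dequantize(q, s))  # close to w; error is at most about scale/2
```

Each parameter is stored in 1 byte instead of 4, and the arithmetic can run in integer units, which is the source of the speedups reported in the tables at the end of this README.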

### step 1. Model training

You can train a model with the scripts provided by PaddleSeg. Make sure PaddleSeg is installed, then run the following script from the PaddleSeg directory:
@@ -45,6 +47,11 @@ python train.py \
|save_dir|Directory for saving the quantized model||output|

```shell
# Run from the PaddleSeg root directory
export PYTHONPATH=`pwd`
# On Windows, run the following command instead
# set PYTHONPATH=%cd%

python slim/quant.py \
    --config configs/quick_start/bisenet_optic_disc_512x512_1k.yml \
    --retraining_iters 10 \
@@ -93,6 +100,11 @@ python train.py \
|save_dir|Directory for saving the pruned model||output|

```shell
# Run from the PaddleSeg root directory
export PYTHONPATH=`pwd`
# On Windows, run the following command instead
# set PYTHONPATH=%cd%

python slim/prune.py \
    --config configs/quick_start/bisenet_optic_disc_512x512_1k.yml \
    --pruning_ratio 0.2 \
@@ -104,3 +116,40 @@ python slim/prune.py \
## Deployment

A model obtained through `quantization` or `pruning` can be deployed directly. For the related tutorial, see [Model Deployment](../docs/model_export.md).


## Quantization & pruning speedup

Test environment:
GPU: V100
CPU: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
CUDA: 10.2
cuDNN: 7.6
TensorRT: 6.0.1.5

Test method:
1. Runtime is the pure model inference time, measured on Cityscapes images (1024x512).
2. 10 predictions are run as warm-up, then the average over 50 consecutive predictions is taken as the inference time.
3. Tested with GPU + TensorRT.
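The Speedup column in both tables matches the relative latency reduction, (t_baseline - t_optimized) / t_baseline; a quick check against the reported numbers (assuming this formula):

```python
def speedup(t_baseline_ms, t_optimized_ms):
    # Relative latency reduction, matching the Speedup columns below.
    return (t_baseline_ms - t_optimized_ms) / t_baseline_ms

print(f'{speedup(204.2, 150.1):.2%}')  # 26.49% -- deeplabv3_resnet50_os8, quantized
print(f'{speedup(7.0, 5.9):.2%}')      # 15.71% -- fastscnn at pruning ratio 0.1
```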

|Model|Runtime without quantization (ms)|Runtime with quantization (ms)|Speedup|
|-|-|-|-|
|deeplabv3_resnet50_os8|204.2|150.1|26.49%|
|deeplabv3p_resnet50_os8|147.2|89.5|39.20%|
|gcnet_resnet50_os8|201.8|126.1|37.51%|
|pspnet_resnet50_os8|266.8|206.8|22.49%|

|Model|Pruning ratio|Runtime (ms)|Speedup|
|-|-|-|-|
|fastscnn|-|7.0|-|
||0.1|5.9|15.71%|
||0.2|5.7|18.57%|
||0.3|5.6|20.00%|
|fcn_hrnetw18|-|43.28|-|
||0.1|40.46|6.51%|
||0.2|40.41|6.63%|
||0.3|38.84|10.25%|
|unet|-|76.04|-|
||0.1|74.39|2.16%|
||0.2|72.10|5.18%|
||0.3|66.96|11.94%|
