BKM: Use CSRnet to count crowded people

dahanhan edited this page Dec 19, 2019 · 2 revisions

Problem Statement

It is a frequent requirement to know how many people are in an area, indoor or outdoor, and further to understand their spatial distribution, which can give more accurate and comprehensive information for making correct decisions in high-risk situations such as stampedes and riots. Among many DNN architectures, CSRNet is one that delivers state-of-the-art results on crowd counting tasks.

Model description

CSRNet uses VGG-16 as the front-end, whose output size is 1/8 of the original input size. Dilated convolution layers are then used as the back-end to extract deeper information and generate a heatmap of the same size as the front-end output. At the final stage, bilinear interpolation with a factor of 8 scales the output back to the same resolution as the input.
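The size relationships above can be checked with simple arithmetic (a minimal sketch; the 768x1024 input shape matches the one used in the conversion command later):

```python
def csrnet_output_shape(h, w, down=8, up=8):
    """Front-end (VGG-16) downsamples by 8; the dilated back-end keeps
    that size; final bilinear interpolation upsamples by 8."""
    fh, fw = h // down, w // down      # front-end / back-end heatmap size
    return fh * up, fw * up            # resolution after interpolation

# 768x1024 input -> 96x128 heatmap -> 768x1024 output
print(csrnet_output_shape(768, 1024))  # (768, 1024)
```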

The above image shows CSRNet applied to three images from the ShanghaiTech dataset. The first row shows the original images, the second row the ground-truth density maps of each image, and the third row the CSRNet output heatmaps. Darker pixels indicate a denser crowd. To count the total number of people, sum all values of the model's output matrix.
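The counting step itself is just a summation over the density map. A minimal NumPy sketch (the array here is synthetic, standing in for the model's output heatmap):

```python
import numpy as np

# Hypothetical density map: each person contributes mass that sums to ~1,
# so the total head count is the sum over all pixels.
density_map = np.zeros((96, 128), dtype=np.float32)
density_map[10:13, 20:23] = 1.0 / 9   # one simulated person
density_map[50:53, 60:63] = 2.0 / 9   # two overlapping people

count = float(density_map.sum())
print(round(count))  # 3
```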

Convert the model to IR format

The author shares the training code, in PyTorch, at https://github.com/leeyeehoo/CSRNet-pytorch, but no pre-trained model weights are provided there. Another implementation, in Keras, is available at https://github.com/Neerajj9/CSRNet-keras; it is also trained on the ShanghaiTech dataset, and we use its weight files in our project. The OpenVINO Model Optimizer (MO) cannot convert a Keras model directly, so we first convert it to TensorFlow format with a Python script, 'h5_to_pb.py' (please note the input Keras model path). Then use the MO tool to convert the TensorFlow model (.pb) to IR files with the command below:
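The 'h5_to_pb.py' script is not reproduced on this page; a script along these lines can freeze a Keras .h5 model into a TensorFlow .pb graph (a hedged sketch using the TensorFlow 1.x freezing API; the file names are placeholders, not the project's actual paths, and it assumes the model was saved with `model.save`):

```python
# Sketch of a Keras-to-TensorFlow conversion (TF 1.x style).
# "csrnet_weights.h5" and "csrnet.pb" are placeholder names.
import tensorflow as tf
from tensorflow.python.framework import graph_util
from keras import backend as K
from keras.models import load_model

K.set_learning_phase(0)                  # inference mode, no dropout/BN updates
model = load_model("csrnet_weights.h5")  # placeholder path

sess = K.get_session()
# Freeze variables into constants so MO can consume a single .pb graph.
frozen = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(),
    [out.op.name for out in model.outputs])
tf.train.write_graph(frozen, "./", "csrnet.pb", as_text=False)
```

The frozen output node name printed by such a script is also useful later when checking the converted IR.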

#$(OPENVINO_PATH)/deployment_tools/model_optimizer # python3 mo.py --framework tf --input_model <path_to_pb> --input_shape [1,768,1024,3] --output_dir <output_path> --mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375]

The 2nd and 3rd dimensions of [input_shape], the input blob height and width, can be changed to balance performance and accuracy: a higher resolution gives better accuracy but lowers the throughput of the inference pipeline. The [mean_values] and [scale_values] settings are determined by the training process; because we do not train the model ourselves, these settings should not be changed. You can then find the converted IR model in <output_path>.
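For reference, the mean/scale values baked into the IR correspond to per-channel normalization of the form (x - mean) / scale. A NumPy sketch of what MO embeds for each input pixel (values copied from the command above):

```python
import numpy as np

# Per-channel normalization embedded by MO via --mean_values / --scale_values.
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
scale = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(image):
    """image: HxWx3 array in 0-255 range -> normalized network input."""
    return (image.astype(np.float32) - mean) / scale

# A pixel equal to the channel means normalizes to all zeros.
x = preprocess(np.array([[[123.675, 116.28, 103.53]]]))
print(x)  # ~[[[0. 0. 0.]]]
```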

Generate INT8 model

The calibration tool in OpenVINO can convert FP32/FP16 models to INT8 precision for higher performance. The calibration process has two modes; we choose the simplified mode, which gives a larger performance improvement than the standard mode. Detailed instructions for the calibration tool can be found at https://docs.openvinotoolkit.org/latest/_inference_engine_tools_calibration_tool_README.html.

#$(OPENVINO_PATH)/deployment_tools # python3 ./tools/calibrate.py -sm -m <path_to_FP32_IR> -s <path_test_images> -p INT8 -td CPU -e ./inference_engine/lib/intel64/libcpu_extension_avx512.so -o <output_path>