PP-TSM

Introduction

We optimized the TSM model and propose PP-TSM in this repo. Without increasing the number of parameters, PP-TSM significantly improves the accuracy of TSM on the UCF101 and Kinetics-400 datasets. Please refer to Tricks on PP-TSM for more details.

Version	Sampling method	Top1
Ours (distill)	Dense	76.16
Ours	Dense	75.69
mmaction2	Dense	74.55
mit-han-lab	Dense	74.1

Version	Sampling method	Top1
Ours (distill)	Uniform	75.11
Ours	Uniform	74.54
mmaction2	Uniform	71.90
mit-han-lab	Uniform	71.16
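
The two tables above differ only in the frame sampling method. As a rough illustration of the two terms, here is a minimal sketch under stated assumptions, not this repo's actual sampler: uniform sampling is taken to mean one frame from the centre of each of num_seg equal segments, and dense sampling to mean num_seg frames at a fixed stride from a start offset. The function names, the centre-frame choice, and the stride value are hypothetical.

```python
from typing import List


def uniform_indices(num_frames: int, num_seg: int = 8) -> List[int]:
    """Assumed uniform sampling: centre frame of each of num_seg equal segments."""
    seg_len = num_frames / num_seg
    return [int(seg_len * i + seg_len / 2) for i in range(num_seg)]


def dense_indices(num_frames: int, num_seg: int = 8,
                  stride: int = 8, start: int = 0) -> List[int]:
    """Assumed dense sampling: num_seg frames spaced `stride` apart.

    Indices wrap around with a modulo so short videos still yield num_seg frames.
    """
    return [(start + i * stride) % num_frames for i in range(num_seg)]


print(uniform_indices(80))  # frames spread over the whole video
print(dense_indices(80))    # frames packed into the first 64-frame window
```

Under these assumptions, uniform sampling covers the whole video with a single clip, while dense sampling looks at a short window in more temporal detail.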

Data

For Kinetics-400 data download and preparation, please refer to k400-data.

For UCF101 data download and preparation, please refer to ucf101-data.

Train

Train on Kinetics-400

Download pretrained model

Please download ResNet50_vd_ssld_v2 as the pretrained model:

wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_vd_ssld_v2_pretrained.pdparams

Then set the backbone's pretrained field (MODEL.backbone.pretrained) in the config file to the path of the downloaded weights:

MODEL:
    framework: "Recognizer2D"
    backbone:
        name: "ResNetTweaksTSM"
        pretrained: your weight path
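
To make the key path concrete, here is a minimal sketch that treats the config as a nested dict, as it would look after being loaded by a YAML parser. The helper name set_pretrained is hypothetical, and the weight path is only an example that assumes the file was downloaded into the working directory.

```python
import copy


def set_pretrained(cfg: dict, weight_path: str) -> dict:
    """Return a copy of cfg with MODEL -> backbone -> pretrained set to weight_path."""
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    new_cfg["MODEL"]["backbone"]["pretrained"] = weight_path
    return new_cfg


# Dict mirroring the YAML fragment above (only the fields shown there).
cfg = {
    "MODEL": {
        "framework": "Recognizer2D",
        "backbone": {"name": "ResNetTweaksTSM", "pretrained": ""},
    }
}

# Example path: assumes the wget command was run in the current directory.
updated = set_pretrained(cfg, "ResNet50_vd_ssld_v2_pretrained.pdparams")
print(updated["MODEL"]["backbone"]["pretrained"])
```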

To use ResNet101 as the backbone, please download ResNet101_vd_ssld_pretrained.pdparams as the pretrained model instead.

Start training

Train PP-TSM on Kinetics-400 frame data with uniform sampling:

python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=log_pptsm  main.py  --validate -c configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml

Train PP-TSM on Kinetics-400 video data:

python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=log_pptsm  main.py  --validate -c configs/recognition/pptsm/pptsm_k400_videos_uniform.yaml

AMP (automatic mixed precision) can speed up training:

export FLAGS_conv_workspace_size_limit=800 #MB
export FLAGS_cudnn_exhaustive_search=1
export FLAGS_cudnn_batchnorm_spatial_persistent=1

python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=log_pptsm  main.py  --amp --validate -c configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml

Train PP-TSM on Kinetics-400 with dense sampling:

python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=log_pptsm  main.py  --validate -c configs/recognition/pptsm/pptsm_k400_frames_dense.yaml

Train PP-TSM on Kinetics-400 with ResNet101 as the backbone, using dense sampling:

python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=log_pptsm  main.py  --validate -c configs/recognition/pptsm/pptsm_k400_frames_dense_r101.yaml
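
The training commands above differ only in the config file and the optional --amp flag. As a convenience, a small helper can assemble them. This is a minimal sketch that only rebuilds the argument list shown in this document; the function name build_train_command and its parameters are hypothetical, and it does not launch anything itself.

```python
from typing import List, Sequence


def build_train_command(config: str,
                        gpus: Sequence[int] = tuple(range(8)),
                        amp: bool = False,
                        log_dir: str = "log_pptsm") -> List[str]:
    """Assemble the launch command from this README as an argument list."""
    cmd = [
        "python3.7", "-B", "-m", "paddle.distributed.launch",
        "--gpus=" + ",".join(str(g) for g in gpus),
        "--log_dir=" + log_dir,
        "main.py",
    ]
    if amp:
        cmd.append("--amp")  # mixed-precision training, as in the AMP example
    cmd += ["--validate", "-c", config]
    return cmd


cmd = build_train_command(
    "configs/recognition/pptsm/pptsm_k400_frames_dense.yaml")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repo root. For the AMP variant, the three FLAGS_* environment variables shown above still need to be exported separately.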

Test

For uniform sampling, the test accuracy can be found in the training logs by searching for the keyword best, for example:

Already save the best model (top1 acc)0.7454
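
Searching the log by hand works, but the same lookup can be scripted. The sketch below assumes the log line always has exactly the form shown above; the function name best_top1 is hypothetical, and the log file location is not specified here, so the function takes any iterable of lines.

```python
import re
from typing import Iterable, Optional

# Matches the log line format shown above and captures the accuracy value.
BEST_RE = re.compile(
    r"Already save the best model \(top1 acc\)\s*([0-9]*\.?[0-9]+)")


def best_top1(lines: Iterable[str]) -> Optional[float]:
    """Return the last reported best top-1 accuracy, or None if none is found."""
    best = None
    for line in lines:
        match = BEST_RE.search(line)
        if match:
            best = float(match.group(1))  # later lines overwrite earlier ones
    return best


sample_log = [
    "some unrelated log line",
    "Already save the best model (top1 acc)0.7454",
]
print(best_top1(sample_log))
```

Because an open file is an iterable of lines, a log file can be passed directly, for example best_top1(open(path_to_log)).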

For dense sampling, the test accuracy can be obtained with:

python3 main.py --test -c configs/recognition/pptsm/pptsm_k400_frames_dense.yaml -w output/ppTSM/ppTSM_best.pdparams

Accuracy on Kinetics-400:

backbone	distill	Sampling method	num_seg	target_size	Top-1	checkpoints
ResNet50	False	Uniform	8	224	74.54	ppTSM_k400_uniform.pdparams
ResNet50	False	Dense	8	224	75.69	ppTSM_k400_dense.pdparams
ResNet50	True	Uniform	8	224	75.11	ppTSM_k400_uniform_distill.pdparams
ResNet50	True	Dense	8	224	76.16	ppTSM_k400_dense_distill.pdparams
ResNet101	True	Uniform	8	224	76.35	ppTSM_k400_uniform_distill_r101.pdparams
ResNet101	False	Dense	8	224	77.15	ppTSM_k400_dense_r101.pdparams
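
To read the effect of distillation off the table, the ResNet50 rows can be compared pairwise. The short sketch below does only that arithmetic on the Top-1 values listed above; the helper name distill_gain is hypothetical.

```python
# Top-1 accuracy (%) from the ResNet50 rows of the table above,
# keyed by (sampling method, distill).
top1 = {
    ("Uniform", False): 74.54,
    ("Uniform", True): 75.11,
    ("Dense", False): 75.69,
    ("Dense", True): 76.16,
}


def distill_gain(sampling: str) -> float:
    """Top-1 gain in percentage points from distillation for one sampling method."""
    return round(top1[(sampling, True)] - top1[(sampling, False)], 2)


for sampling in ("Uniform", "Dense"):
    print(sampling, distill_gain(sampling))
```

For ResNet50, distillation adds 0.57 points with uniform sampling and 0.47 points with dense sampling.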

Inference

Export inference model

To get the model architecture file ppTSM.pdmodel and the parameters file ppTSM.pdiparams, run:

python3.7 tools/export_model.py -c configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml \
                                -p data/ppTSM_k400_uniform.pdparams \
                                -o inference/ppTSM

For argument usage, please refer to Model Inference.

Infer

python3.7 tools/predict.py --input_file data/example.avi \
                           --config configs/recognition/pptsm/pptsm_k400_frames_uniform.yaml \
                           --model_file inference/ppTSM/ppTSM.pdmodel \
                           --params_file inference/ppTSM/ppTSM.pdiparams \
                           --use_gpu=True \
                           --use_tensorrt=False

Example log output:

Current video file: data/example.avi
	top-1 class: 5
	top-1 score: 0.9907386302947998

We can get the class name from the class id using the map file data/k400/Kinetics-400_label_list.txt. The top-1 prediction for data/example.avi is archery.
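
The lookup can be scripted as follows. This is a minimal sketch that assumes each line of the map file has the form "<class id> <class name>"; that format is an assumption, so check the file before relying on it. The function name load_label_map and the sample lines are illustrative only.

```python
from typing import Dict, Iterable


def load_label_map(lines: Iterable[str]) -> Dict[int, str]:
    """Parse lines of the assumed form '<class id> <class name>' into a dict."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        class_id, name = line.split(maxsplit=1)
        labels[int(class_id)] = name
    return labels


# Illustrative sample; only the id 5 -> archery pairing comes from this document.
sample_lines = ["0 some_other_class", "5 archery"]
labels = load_label_map(sample_lines)
print(labels[5])
```

With the real file, open data/k400/Kinetics-400_label_list.txt and pass the file object to load_label_map, then index the result with the top-1 class id from the log.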

Reference

TSM: Temporal Shift Module for Efficient Video Understanding, Ji Lin, Chuang Gan, Song Han
Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean
