
简体中文 | English

TSM

Contents

  • Introduction
  • Data
  • Train
  • Test
  • Inference
  • Implementation details
  • Reference

Introduction

Temporal Shift Module (TSM) is a widely used video understanding model. By shifting part of the feature channels along the temporal dimension, it greatly improves the use of temporal information without adding any parameters or computation. Thanks to its lightweight and efficient design, it is well suited for industrial deployment.


This code implements the single RGB stream of the TSM network, with a ResNet-50 backbone.

For details, please refer to the ICCV 2019 paper TSM: Temporal Shift Module for Efficient Video Understanding.
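
The core operation is easy to sketch. Below is a minimal, self-contained Paddle sketch of the channel shift (illustrative only, not the code used in this repo; the function name and the shift_ratio default are assumptions): a fraction of the channels is shifted one segment toward the past, another fraction one segment toward the future, and the remaining channels are left untouched.

    import paddle

    def temporal_shift(x, num_seg, shift_ratio=0.25):
        # x: features of shape [N * num_seg, C, H, W], i.e. sampled frames
        # stacked along the batch axis. Expose the temporal axis first.
        nt, c, h, w = x.shape
        x = x.reshape([nt // num_seg, num_seg, c, h, w])
        fold = int(c * shift_ratio) // 2
        pad = paddle.zeros([x.shape[0], 1, fold, h, w], dtype=x.dtype)
        # First `fold` channels: frame t receives features from frame t+1.
        backward = paddle.concat([x[:, 1:, :fold], pad], axis=1)
        # Next `fold` channels: frame t receives features from frame t-1.
        forward = paddle.concat([pad, x[:, :-1, fold:2 * fold]], axis=1)
        # Remaining channels pass through unchanged.
        out = paddle.concat([backward, forward, x[:, :, 2 * fold:]], axis=2)
        return out.reshape([nt, c, h, w])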

Data

For Kinetics-400 data download and preparation, please refer to k400 data preparation.

For UCF-101 data download and preparation, please refer to ucf101 data preparation.

Train

Train on the Kinetics-400 dataset

Download the pretrained model

  1. Please download ResNet50_pretrain.pdparams as the pretrained model:

    wget https://videotag.bj.bcebos.com/PaddleVideo/PretrainModel/ResNet50_pretrain.pdparams
  2. Open PaddleVideo/configs/recognition/tsm/tsm_k400_frames.yaml, and fill in the path of the downloaded weights in the pretrained field:

    MODEL:
        framework: "Recognizer2D"
        backbone:
            name: "ResNetTSM"
            pretrained: your weight path

Start training

  • Different data formats/datasets can be selected for training via different configuration files. Taking the Kinetics-400 dataset + 8 GPUs + frames format as an example, the startup command is as follows (more training commands can be found in PaddleVideo/run.sh).

    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py  --validate -c configs/recognition/tsm/tsm_k400_frames.yaml
  • To train with the Kinetics-400 dataset in videos format, use the following script:

    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py  --validate -c configs/recognition/tsm/tsm_k400_videos.yaml
  • AMP (automatic mixed precision) is useful for speeding up training; the script is as follows:

    export FLAGS_conv_workspace_size_limit=800 #MB
    export FLAGS_cudnn_exhaustive_search=1
    export FLAGS_cudnn_batchnorm_spatial_persistent=1

    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py  --amp --validate -c configs/recognition/tsm/tsm_k400_frames.yaml
  • AMP works better with the NHWC data format; the script is as follows:

    export FLAGS_conv_workspace_size_limit=800 #MB
    export FLAGS_cudnn_exhaustive_search=1
    export FLAGS_cudnn_batchnorm_spatial_persistent=1

    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" --log_dir=log_tsm main.py  --amp --validate -c configs/recognition/tsm/tsm_k400_frames_nhwc.yaml
  • For config file usage, please refer to config.

Train on the UCF-101 dataset

Download the pretrained model

  • Load the TSM model we trained on Kinetics-400, TSM_k400.pdparams, or download it from the command line:

    wget https://videotag.bj.bcebos.com/PaddleVideo-release2.1/TSM/TSM_k400.pdparams
  • Open PaddleVideo/configs/recognition/tsm/tsm_ucf101_frames.yaml, and fill in the path of the downloaded weights in the pretrained field:

    MODEL:
        framework: "Recognizer2D"
        backbone:
            name: "ResNetTSM"
            pretrained: your weight path

Start training

  • Different data formats/datasets can be selected for training via different configuration files. Taking the UCF-101 dataset + 4 GPUs + frames format as an example, the startup command is as follows (more training commands can be found in PaddleVideo/run.sh).

    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py  --validate -c configs/recognition/tsm/tsm_ucf101_frames.yaml
  • To train with the UCF-101 dataset in videos format, use the following script:

    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py  --validate -c configs/recognition/tsm/tsm_ucf101_videos.yaml
  • AMP (automatic mixed precision) is useful for speeding up training; the script is as follows:

    export FLAGS_conv_workspace_size_limit=800 #MB
    export FLAGS_cudnn_exhaustive_search=1
    export FLAGS_cudnn_batchnorm_spatial_persistent=1
    
    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py  --amp --validate -c configs/recognition/tsm/tsm_ucf101_frames.yaml
  • AMP works better with the NHWC data format; the script is as follows:

    export FLAGS_conv_workspace_size_limit=800 #MB
    export FLAGS_cudnn_exhaustive_search=1
    export FLAGS_cudnn_batchnorm_spatial_persistent=1
    
    python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3" --log_dir=log_tsm main.py  --amp --validate -c configs/recognition/tsm/tsm_ucf101_frames_nhwc.yaml

Test

Put the weights of the model to be tested into the output/TSM/ directory. The test command is as follows:

python3 main.py --test -c configs/recognition/tsm/tsm.yaml -w output/TSM/TSM_best.pdparams

With the following test configuration, the evaluation accuracy on the Kinetics-400 validation set is:

| backbone | Sampling method | Training Strategy | num_seg | target_size | Top-1 | checkpoints |
| :------: | :-------------: | :---------------: | :-----: | :---------: | :---: | :---------: |
| ResNet50 | Uniform | NCHW | 8 | 224 | 71.06 | TSM_k400.pdparams |

With the following test configuration, the evaluation accuracy on the UCF-101 validation set is:

| backbone | Sampling method | Training Strategy | num_seg | target_size | Top-1 | checkpoints |
| :------: | :-------------: | :---------------: | :-----: | :---------: | :---: | :---------: |
| ResNet50 | Uniform | NCHW | 8 | 224 | 94.42 | TSM_ucf101_nchw.pdparams |
| ResNet50 | Uniform | NCHW+AMP | 8 | 224 | 94.40 | TSM_ucf101_amp_nchw.pdparams |
| ResNet50 | Uniform | NHWC+AMP | 8 | 224 | 94.55 | TSM_ucf101_amp_nhwc.pdparams |

Inference

Export the inference model

To get the model architecture file TSM.pdmodel and the parameters file TSM.pdiparams, run:

python3.7 tools/export_model.py -c configs/recognition/tsm/tsm_k400_frames.yaml \
                                -p data/TSM_k400.pdparams \
                                -o inference/TSM

Infer

python3.7 tools/predict.py --input_file data/example.avi \
                           --config configs/recognition/tsm/tsm_k400_frames.yaml \
                           --model_file inference/TSM/TSM.pdmodel \
                           --params_file inference/TSM/TSM.pdiparams \
                           --use_gpu=True \
                           --use_tensorrt=False
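
To run the exported model programmatically instead of through tools/predict.py, a minimal Paddle Inference sketch looks like the following. The input layout [1, num_seg, 3, target_size, target_size] is an assumption; check it against the exported model's actual input spec.

    import numpy as np
    import paddle.inference as paddle_infer

    config = paddle_infer.Config("inference/TSM/TSM.pdmodel",
                                 "inference/TSM/TSM.pdiparams")
    config.enable_use_gpu(1000, 0)  # 1000 MB initial GPU memory, GPU id 0
    predictor = paddle_infer.create_predictor(config)

    input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
    # Assumed input layout: [batch, num_seg, 3, target_size, target_size].
    dummy = np.random.rand(1, 8, 3, 224, 224).astype("float32")
    input_handle.copy_from_cpu(dummy)
    predictor.run()

    output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
    scores = output_handle.copy_to_cpu()  # per-class scores as a numpy array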

Implementation details

Data processing

  • The model reads the mp4 data in the Kinetics-400 dataset. Each video is first divided into num_seg segments, and one frame is uniformly sampled from each segment, giving num_seg sparsely sampled frames. The same random data augmentation is then applied to all num_seg frames, including multi-scale random cropping, random horizontal flipping, and data normalization, and finally the frames are resized to target_size (see the sampling sketch below).
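
A minimal sketch of the sparse sampling step described above (illustrative, not the repo's loader; it assumes num_frames >= num_seg):

    import random

    def sample_indices(num_frames, num_seg):
        # Split the video into num_seg equal segments and randomly draw one
        # frame index from each segment (training-time sparse sampling;
        # validation typically takes the center frame of each segment).
        seg_len = num_frames // num_seg
        return [i * seg_len + random.randrange(seg_len) for i in range(num_seg)]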

Training strategy

  • Momentum optimizer with momentum=0.9
  • L2_Decay with a weight decay coefficient of 1e-4
  • Global gradient clipping with a clipping norm of 20.0
  • 50 epochs in total; the learning rate is decayed by a factor of 0.1 at epochs 20 and 40 (see the optimizer sketch below)
  • The learning rates of the FC layer's weight and bias are 5x and 10x the base learning rate, respectively, and no L2_Decay is applied to the bias
  • Dropout_ratio=0.5
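
With public Paddle APIs, the optimizer described above could be assembled roughly as follows (a hedged sketch: base_lr and the stand-in model are placeholders; the real values come from the yaml config):

    import paddle

    model = paddle.vision.models.resnet50()  # stand-in for the TSM network

    base_lr = 0.01  # placeholder; the actual value lives in the config file
    # Decay the learning rate by 0.1x at epochs 20 and 40, over 50 epochs.
    lr = paddle.optimizer.lr.PiecewiseDecay(
        boundaries=[20, 40],
        values=[base_lr, base_lr * 0.1, base_lr * 0.01])

    optimizer = paddle.optimizer.Momentum(
        learning_rate=lr,
        momentum=0.9,
        weight_decay=paddle.regularizer.L2Decay(1e-4),
        grad_clip=paddle.nn.ClipGradByGlobalNorm(clip_norm=20.0),
        parameters=model.parameters())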

Parameter initialization

  • The FC layer's weight is initialized from the normal distribution Normal(mean=0, std=0.001), and its bias is initialized to the constant 0 (see the ParamAttr sketch below)
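
In Paddle, this initialization, combined with the per-parameter learning rates from the training strategy above, can be expressed through ParamAttr (a sketch; the 2048 -> 400 shape assumes ResNet-50 features and Kinetics-400 classes):

    import paddle
    from paddle import ParamAttr
    from paddle.nn.initializer import Constant, Normal

    # FC head: weight ~ Normal(0, 0.001) at 5x the base LR; bias = 0 at 10x
    # the base LR, with L2Decay(0.0) to exempt the bias from weight decay.
    fc = paddle.nn.Linear(
        2048, 400,
        weight_attr=ParamAttr(initializer=Normal(mean=0.0, std=0.001),
                              learning_rate=5.0),
        bias_attr=ParamAttr(initializer=Constant(value=0.0),
                            learning_rate=10.0,
                            regularizer=paddle.regularizer.L2Decay(0.0)))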

Reference

  • TSM: Temporal Shift Module for Efficient Video Understanding, Ji Lin, Chuang Gan, Song Han