MODEL ZOO

Kinetics

| Dataset | architecture | depth | init | clips x crops | #frames x sampling rate | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| K400 | TAda2D | R50 | IN-1K | 10 x 3 | 8 x 8 | 76.7 | 92.6 | [google drive][baidu(code:p06d)] | tada2d_8x8.yaml |
| K400 | TAda2D | R50 | IN-1K | 10 x 3 | 16 x 5 | 77.4 | 93.1 | [google drive][baidu(code:6k8h)] | tada2d_16x5.yaml |
| K400 | ViViT Fact. Enc. | B16x2 | IN-21K | 4 x 3 | 32 x 2 | 79.4 | 94.0 | [google drive][baidu(code:1t51)] | vivit_fac_enc_b16x2.yaml |

Something-Something

| Dataset | architecture | depth | init | clips x crops | #frames | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSV2 | TAda2D | R50 | IN-1K | 2 x 3 | 8 | 64.2 | 88.0 | [google drive][baidu(code:dlil)] | tada2d_8f.yaml |
| SSV2 | TAda2D | R50 | IN-1K | 2 x 3 | 16 | 65.6 | 89.1 | [google drive][baidu(code:f857)] | tada2d_16f.yaml |

Epic-Kitchens Action Recognition

| architecture | init | resolution | clips x crops | #frames x sampling rate | action acc@1 | verb acc@1 | noun acc@1 | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViViT Fact. Enc.-B16x2 | K700 | 320 | 4 x 3 | 32 x 2 | 46.3 | 67.4 | 58.9 | [google drive][baidu(code:rinh)] | vivit_fac_enc.yaml |
| ir-CSN-R152 | K700 | 224 | 10 x 3 | 32 x 2 | 44.5 | 68.4 | 55.9 | [google drive][baidu(code:s0uj)] | csn.yaml |

Epic-Kitchens Temporal Action Localization

| feature | classification | type | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | Avg | checkpoint | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViViT | ViViT | Verb | 22.90 | 21.93 | 20.74 | 19.08 | 16.00 | 20.13 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
| ViViT | ViViT | Noun | 28.95 | 27.38 | 25.52 | 22.67 | 18.95 | 24.69 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
| ViViT | ViViT | Action | 20.82 | 19.93 | 18.67 | 17.02 | 15.06 | 18.30 | [google drive][baidu(code:3sud)] | vivit-os-local.yaml |
| TAda2D | TAda2D | Verb | 19.70 | 18.49 | 17.41 | 15.50 | 12.78 | 16.78 | [google drive][baidu(code:d01j)] | - |
| TAda2D | TAda2D | Noun | 20.54 | 19.32 | 17.94 | 15.77 | 13.39 | 17.39 | [google drive][baidu(code:d01j)] | - |
| TAda2D | TAda2D | Action | 15.15 | 14.32 | 13.59 | 12.18 | 10.65 | 13.18 | [google drive][baidu(code:d01j)] | - |
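
The Avg column appears to be the arithmetic mean of the five per-threshold scores in each row. The following is a minimal sketch that recomputes it from the numbers in the table; the names `ROWS`, `average_score`, and `check_rows` are illustrative and not part of the codebase.

```python
from statistics import mean

# Five per-threshold scores followed by the reported Avg, copied from the table.
ROWS = {
    ("ViViT", "Verb"): ([22.90, 21.93, 20.74, 19.08, 16.00], 20.13),
    ("ViViT", "Noun"): ([28.95, 27.38, 25.52, 22.67, 18.95], 24.69),
    ("ViViT", "Action"): ([20.82, 19.93, 18.67, 17.02, 15.06], 18.30),
    ("TAda2D", "Verb"): ([19.70, 18.49, 17.41, 15.50, 12.78], 16.78),
    ("TAda2D", "Noun"): ([20.54, 19.32, 17.94, 15.77, 13.39], 17.39),
    ("TAda2D", "Action"): ([15.15, 14.32, 13.59, 12.18, 10.65], 13.18),
}


def average_score(scores):
    """Mean of the per-threshold scores, rounded to two decimals."""
    return round(mean(scores), 2)


def check_rows(rows, tol=0.005):
    """Map each row to True if its reported Avg is within `tol` of the mean."""
    return {
        key: abs(mean(scores) - reported) <= tol
        for key, (scores, reported) in rows.items()
    }


if __name__ == "__main__":
    for key, ok in check_rows(ROWS).items():
        print(key, average_score(ROWS[key][0]), "consistent" if ok else "differs")
```

The tolerance of 0.005 allows for the reported values having been rounded or truncated to two decimals.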

MoSI

Note: the following models were trained and evaluated with decord 0.4.1 rather than 0.6.0, the default version for the codebase.
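
Before running the MoSI checkpoints, it may be worth confirming which decord version is installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `decord_status` are hypothetical, and the check assumes the package is installed under the distribution name `decord`.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_DECORD = "0.4.1"  # version named in the note above


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def decord_status(installed, required=REQUIRED_DECORD):
    """Classify the installed version as 'ok', 'mismatch', or 'missing'."""
    if installed is None:
        return "missing"
    return "ok" if installed == required else "mismatch"


if __name__ == "__main__":
    found = installed_version("decord")
    print(f"decord installed: {found}, status: {decord_status(found)}")
```

If the status is `mismatch`, pinning the version with `pip install decord==0.4.1` in a separate environment is one option, assuming that release is available for your platform.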

Pre-train (without finetuning)

| dataset | backbone | checkpoint | config |
| --- | --- | --- | --- |
| HMDB51 | R-2D3D-18 | [google drive][baidu(code:ahqg)] | papers/CVPR2021-MOSI/config/MoSI_r2d3d_hmdb.py |
| HMDB51 | R(2+1)D-10 | [google drive][baidu(code:1ktb)] | papers/CVPR2021-MOSI/config/MoSI_r2p1d_hmdb.py |

Finetuned

| dataset | backbone | acc@1 | acc@5 | checkpoint | config |
| --- | --- | --- | --- | --- | --- |
| HMDB51 | R-2D3D-18 | 46.93 | 74.71 | [google drive][baidu(code:2puu)] | papers/CVPR2021-MOSI/config/Finetune_r2d3d_hmdb.py |
| HMDB51 | R(2+1)D-10 | 51.83 | 78.63 | [google drive][baidu(code:hgnc)] | papers/CVPR2021-MOSI/config/Finetune_r2p1d_hmdb.py |