Stage | Model | ARID Top-1 | Download | Shell |
---|---|---|---|---|
Pre-train | UniFormer-B32-K600 | N/A | N/A | |
Pre-train | UniFormer-B32-SSV2 | N/A | N/A | |
Pre-train | MViT-B32-K600 | N/A | N/A | |
Pre-train | SlowFast-R101-K700 | N/A | N/A | |
Adapt BN | UniFormer-B32-K600 | 62.64 | run.sh | |
Adapt BN | UniFormer-B32-SSV2 | 58.90 | run.sh | |
Adapt BN | MViT-B32-K600 | 58.14 | run.sh | |
Adapt BN | SlowFast-R101-K700 | 57.79 | run.sh | |
Pseudo1 | UniFormer-B32-K600 | 83.50 | run.sh | |
Pseudo1 | UniFormer-B32-SSV2 | 81.04 | run.sh | |
Pseudo1 | MViT-B32-K600 | 81.68 | run.sh | |
Pseudo1 | SlowFast-R101-K700 | 80.78 | run.sh | |
Pseudo2 | UniFormer-B32-K600 | 87.84 | run.sh | |
Pseudo2 | UniFormer-B32-SSV2 | 85.95 | run.sh | |
Pseudo2 | MViT-B32-K600 | 86.63 | run.sh | |
Pseudo2 | SlowFast-R101-K700 | 85.63 | run.sh | |
Pseudo3 | UniFormer-B32-K600 | 89.48 | run.sh | |
Pseudo3 | UniFormer-B32-SSV2 | 88.74 | run.sh | |
Pseudo3 | MViT-B32-K600 | 88.75 | run.sh | |
Pseudo3 | SlowFast-R101-K700 | 88.59 | run.sh | |
Pseudo4 | UniFormer-B32-K600 | 89.91 | run.sh | |
Pseudo4 | UniFormer-B32-SSV2 | 90.25 | run.sh | |
Pseudo4 | MViT-B32-K600 | 90.30 | run.sh | |
Pseudo4 | SlowFast-R101-K700 | 89.49 | run.sh | |
Pseudo4 | UniFormer-B32-SSV2† | 89.51 | run.sh | |

Note:
- By default, all models are trained with 32 frames sampled uniformly from the raw videos. The exception is UniFormer-B32-SSV2†, which is trained with dense sampling.
- We use all 6207 videos in ARID for validation. For training, we generate pseudo labels for these same videos.
- All accuracy results are evaluated with test-time augmentation (TTA) of 1 (crop) x 1 (view) and gamma correction.
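
The uniform sampling, dense sampling, and gamma correction mentioned in the notes can be illustrated with a minimal NumPy sketch. This is not the repository's actual implementation: the function names, the segment-centre rule for uniform sampling, the dense stride of 2, and the gamma value of 2.0 are all assumptions made for illustration.

```python
import numpy as np


def uniform_indices(total_frames: int, num_samples: int = 32) -> np.ndarray:
    """Split the video into num_samples equal segments and take each segment's centre frame."""
    edges = np.linspace(0, total_frames, num_samples + 1)
    centres = (edges[:-1] + edges[1:]) / 2
    # Truncate to integer frame indices and keep them inside the video.
    return np.minimum(centres.astype(int), total_frames - 1)


def dense_indices(total_frames: int, num_samples: int = 32,
                  stride: int = 2, start: int = 0) -> np.ndarray:
    """Take num_samples frames spaced `stride` apart, beginning at `start` (assumed stride)."""
    idx = start + stride * np.arange(num_samples)
    # If the clip runs past the end of the video, repeat the last frame.
    return np.minimum(idx, total_frames - 1)


def gamma_correct(frames: np.ndarray, gamma: float = 2.0) -> np.ndarray:
    """Brighten uint8 frames with out = 255 * (in / 255) ** (1 / gamma); gamma > 1 lifts dark pixels."""
    scaled = frames.astype(np.float32) / 255.0
    corrected = 255.0 * scaled ** (1.0 / gamma)
    return np.clip(corrected, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    print(uniform_indices(64)[:4])   # frames spread across the whole video
    print(dense_indices(64)[:4])     # consecutive frames at a fixed stride
    dark = np.array([0, 64, 255], dtype=np.uint8)
    print(gamma_correct(dark))       # mid-range values are brightened
```

Uniform sampling covers the whole video with one frame per segment, whereas dense sampling covers a short contiguous window at a higher temporal resolution.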
You can reuse all these models by setting `TRAIN.CHECKPOINT_FILE_PATH` and `TEST.CHECKPOINT_FILE_PATH`.
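
As a rough illustration of setting these two keys, the sketch below builds a flat `KEY VALUE` override list of the kind that yacs-style configs accept. It assumes this repository reads overrides in that form, which the text above does not confirm. The helper name `checkpoint_overrides` and the checkpoint path are hypothetical.

```python
from typing import List


def checkpoint_overrides(checkpoint_path: str,
                         for_training: bool = True,
                         for_testing: bool = True) -> List[str]:
    """Build a flat KEY VALUE list pointing both config keys at one checkpoint (hypothetical helper)."""
    opts: List[str] = []
    if for_training:
        opts += ["TRAIN.CHECKPOINT_FILE_PATH", checkpoint_path]
    if for_testing:
        opts += ["TEST.CHECKPOINT_FILE_PATH", checkpoint_path]
    return opts


if __name__ == "__main__":
    # Hypothetical path: substitute the checkpoint you actually downloaded.
    opts = checkpoint_overrides("checkpoints/example_model.pyth")
    print(" ".join(opts))
```

If the configuration object is a yacs `CfgNode`, a list in this form can be passed to its `merge_from_list` method. Check the provided `run.sh` scripts for how the repository actually passes these options.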