This repository provides the official PyTorch implementation of Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization (CVPR 2021), the 1st place solution of the AVA-Kinetics Crossover Challenge 2020. The codebase also provides a general pipeline for training and evaluation on AVA-style datasets, as well as state-of-the-art action detection models.
Junting Pan | Siyu Chen | Zheng Shou | Jing Shao | Hongsheng Li
Some key dependencies are listed below, while others are given in requirements.txt.
- Python >= 3.6
- PyTorch >= 1.3, and a corresponding version of torchvision
- ffmpeg (used in data preparation)
- Download pre-trained models, which are listed in pretrained/README.md, to the pretrained folder.
- Prepare data. Please refer to DATA.md.
- Download annotation files to the annotations folder. See annotations/README.md for detailed information.
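The version requirements listed above (Python >= 3.6, PyTorch >= 1.3, ffmpeg) can be checked before starting data preparation. The following is a minimal sketch and is not part of this repository: the helper names `version_tuple` and `check_environment` are made up for illustration, and the check only compares the major and minor version numbers.

```python
import re
import shutil
import sys


def version_tuple(version, parts=2):
    """Turn a version string such as '1.13.1+cu117' into (1, 13).

    Only the leading digits of each dot-separated piece are used, so local
    build suffixes like '+cu117' or 'a0' are ignored.
    """
    numbers = []
    for piece in version.split("+")[0].split(".")[:parts]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    try:
        import torch  # imported lazily so the check itself runs without PyTorch
        if version_tuple(torch.__version__) < (1, 3):
            problems.append("PyTorch >= 1.3 is required, found " + torch.__version__)
    except ImportError:
        problems.append("PyTorch is not installed")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH (used in data preparation)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The sketch does not verify that torchvision matches the installed PyTorch version; that pairing still has to be checked against the official compatibility table.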
Default values for arguments nproc_per_node, backend and master_port are 8, nccl and 31114 respectively.
python main.py --config CONFIG_FILE [--nproc_per_node N_PROCESSES] [--backend BACKEND] [--master_addr MASTER_ADDR] [--master_port MASTER_PORT]
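To make the command-line interface above concrete, here is a hedged sketch of how such arguments could be declared with `argparse`. It is not the repository's actual `main.py`: the function name `build_parser` is hypothetical, and the `None` default for `master_addr` is an assumption. Only the flag names and the defaults 8, nccl and 31114 come from the text above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Launcher sketch; not the repository's main.py")
    # --config appears without brackets in the usage line, so it is required.
    parser.add_argument("--config", required=True, help="path to the config file")
    # Documented defaults: 8 processes per node, nccl backend, port 31114.
    parser.add_argument("--nproc_per_node", type=int, default=8)
    parser.add_argument("--backend", type=str, default="nccl")
    # Assumption: no default address; multi-node runs must pass one explicitly.
    parser.add_argument("--master_addr", type=str, default=None)
    parser.add_argument("--master_port", type=int, default=31114)
    return parser


# Parsing only the required flag shows the documented defaults.
args = build_parser().parse_args(["--config", "CONFIG_FILE"])
print(args.nproc_per_node, args.backend, args.master_port)  # prints: 8 nccl 31114
```

Passing an explicit list to `parse_args` keeps the example independent of the real command line, so it can be run as-is.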
When running on multiple nodes, the master_addr argument must be provided. In addition, the arguments nnodes and node_rank can be specified (as in torch.distributed.launch); otherwise the program will try to obtain their values from environment variables. See distributed_utils.py for details.
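The fallback described above (explicit arguments first, environment variables otherwise) can be sketched as follows. This is an illustration only and does not reproduce `distributed_utils.py`: the environment variable names `NNODES` and `NODE_RANK` and all function names are assumptions. The rank layout follows the convention used by `torch.distributed.launch`, where the global rank is `nproc_per_node * node_rank + local_rank`.

```python
import os


def resolve_node_info(nnodes=None, node_rank=None, env=None):
    """Prefer explicit arguments; otherwise read (hypothetical) environment variables."""
    env = os.environ if env is None else env
    if nnodes is None:
        nnodes = int(env.get("NNODES", 1))        # assumed variable name
    if node_rank is None:
        node_rank = int(env.get("NODE_RANK", 0))  # assumed variable name
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    return nnodes, node_rank


def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank of a process, as laid out by torch.distributed.launch."""
    return node_rank * nproc_per_node + local_rank


def world_size(nnodes, nproc_per_node):
    """Total number of processes across all nodes."""
    return nnodes * nproc_per_node


# Two nodes with the default 8 processes each; this is the second node.
nnodes, node_rank = resolve_node_info(env={"NNODES": "2", "NODE_RANK": "1"})
print(world_size(nnodes, 8), global_rank(node_rank, 8, 3))  # prints: 16 11
```

Passing `env` as a plain dictionary keeps the example deterministic; in a real launch the values would come from the scheduler or shell environment.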
Trained models are provided in model_zoo/README.md.
- Our detections for AVA
- More advanced backbone
- Data preparation for Kinetics dataset, and training on AVA-Kinetics
- Implementation for ACFB
ACAR-Net is released under the Apache 2.0 license.
Find the slides and video presentation of our winning solution at [Google Slides] [YouTube Video] [Bilibili Video] (starting from 18:20).
Find our work on arXiv.
Please cite our work with the following BibTeX entry:
@article{pan2020actorcontextactor,
title={Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization},
author={Junting Pan and Siyu Chen and Zheng Shou and Jing Shao and Hongsheng Li},
journal={arXiv preprint arXiv:2006.07976},
year={2020}
}
You may also want to refer to our publication with the more human-friendly Chicago style:
Junting Pan, Siyu Chen, Zheng Shou, Jing Shao, Hongsheng Li. "Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization." arXiv 2020.
If you have general questions about our work or code that may be of interest to other researchers, please use the public issues section of this repository. Alternatively, drop us an e-mail at [email protected] and [email protected].