The success of deep learning on video Action Recognition (AR) has motivated researchers to progressively promote related tasks from the coarse level to the fine-grained level. Compared with conventional AR, which only predicts an action label for the entire video, Temporal Action Detection (TAD) has been investigated to estimate the start and end time of each action in a video. Taking TAD a step further, Spatiotemporal Action Detection (SAD) has been studied to localize actions both spatially and temporally in videos. However, who performs the action is generally ignored in SAD, even though identifying the actor can also be important. To this end, we propose a novel task, Actor-identified Spatiotemporal Action Detection (ASAD), to bridge the gap between SAD and actor identification.
We create a new A-AVA dataset based on the existing AVA dataset and the TAO dataset by assigning a unique actor identity and actions to each actor.
Our A-AVA dataset is extended from the AVA dataset and the TAO dataset. For more details and copyright information, please refer to their webpages.
You can download our A-AVA dataset from the Download link.
After downloading the A-AVA dataset, you can unzip it to generate the following data:
AVA/
|-- action_anno_train.pickle
|-- action_anno_val.pickle
|-- train
| |-- AVA
| | |-- 5BDj0ow5hnA_scene_13_61290-62898
| | | |-- frame0301.jpg
|-- val
| |-- AVA
| | |-- 7YpF6DntOYw_scene_3_32470-33281
| | | |-- frame0001.jpg
The A-AVA dataset annotation has the following structure:
action_anno_val = {
 1230:  # video id
   {'video_info':
      {'video_id': 1230,
       'width': 1280,
       'height': 720,
       'video_path': 'val/AVA/keUOiCcHtoQ_scene_28_124948-125707'
      },
    'img_info':
      {47560:  # image id
         {'img_path': 'val/AVA/keUOiCcHtoQ_scene_28_124948-125707/frame0051.jpg',
          'frame': 50},
       ...
      },
    'obj_info':
      {47560:  # image id
         {0:  # object id
            {'bbox': (154, 79, 341, 281),
             'image_id': 47560,
             'action': [14]},
          ...}
       ...}
   },
 ...
}
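As a minimal sketch, the snippet below loads the validation annotation and walks through its entries; the keys follow the structure above, while the variable names and the exact pickle path are our own assumptions and should be adjusted to your local layout.

import pickle

# Load the validation annotation (adjust the path to where you unzipped the dataset).
with open('A-AVA/action_anno_val.pickle', 'rb') as f:
    action_anno_val = pickle.load(f)

for video_id, video in action_anno_val.items():
    info = video['video_info']
    print(video_id, info['video_path'], info['width'], info['height'])

    # Per-frame annotations: image id -> object (actor) id -> bbox / action labels.
    for image_id, objects in video['obj_info'].items():
        img_path = video['img_info'][image_id]['img_path']
        for object_id, obj in objects.items():
            print(f"  {img_path}: actor {object_id}, bbox {obj['bbox']}, actions {obj['action']}")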
For evaluation, your output should have the same format as the example above. Save your result as action_pred_val.pickle and put it under the A-AVA folder.
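As a rough sketch of how a prediction file could be assembled (not an official converter), the code below fills the same nested dictionary and pickles it; build_prediction_dict and my_detections are hypothetical placeholders standing in for your own model output.

import pickle

# Hypothetical helper: 'detections' is assumed to be a list of
# (video_id, image_id, actor_id, bbox, action_ids) tuples produced by your model.
def build_prediction_dict(action_anno_val, detections):
    pred = {}
    for video_id, image_id, actor_id, bbox, action_ids in detections:
        video = pred.setdefault(video_id, {
            'video_info': action_anno_val[video_id]['video_info'],
            'img_info': action_anno_val[video_id]['img_info'],
            'obj_info': {},
        })
        video['obj_info'].setdefault(image_id, {})[actor_id] = {
            'bbox': tuple(bbox),
            'image_id': image_id,
            'action': list(action_ids),
        }
    return pred

with open('A-AVA/action_anno_val.pickle', 'rb') as f:
    action_anno_val = pickle.load(f)

pred = build_prediction_dict(action_anno_val, my_detections)  # my_detections: your model's output

with open('A-AVA/action_pred_val.pickle', 'wb') as f:
    pickle.dump(pred, f)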
We provide an evaluation script, which can be run as follows:
python run_evaluation.py \
--pred_file A-AVA/action_pred_val.pickle \
--true_file A-AVA/action_anno_val.pickle
The results will be reported as:
[email protected]: xxx, IDF1: xxx, MT: xxx, ML: xxx, ID s.w.: xxx, HL: xxx
@article{yang2022actoridentified,
title = {Actor-identified Spatiotemporal Action Detection - Detecting Who Is Doing What in Videos},
author = {Fan Yang and Norimichi Ukita and Sakriani Sakti and Satoshi Nakamura},
year = {2022},
journal = {arXiv preprint arXiv:2208.12940}
}
This project is licensed under the MIT License; see the LICENSE file for details.