ASAD

The success of deep learning on video Action Recognition (AR) has motivated researchers to progressively promote related tasks from the coarse level to the fine-grained level. Compared with conventional AR, which only predicts an action label for the entire video, Temporal Action Detection (TAD) has been investigated to estimate the start and end time of each action in a video. Taking TAD a step further, Spatiotemporal Action Detection (SAD) has been studied to localize actions both spatially and temporally in videos. However, who performs the action is generally ignored in SAD, even though identifying the actor can be equally important. To this end, we propose a novel task, Actor-identified Spatiotemporal Action Detection (ASAD), to bridge the gap between SAD and actor identification.

A-AVA Dataset

We create a new A-AVA dataset based on the existing AVA dataset and the TAO dataset by assigning a unique identity and actions to each actor.

A-AVA Dataset Download

Our A-AVA dataset is extended from the AVA dataset and the TAO dataset. For more details and copyright information, please refer to their webpages.

You can download our A-AVA dataset from the Download link.

A-AVA Dataset Structure

After downloading the A-AVA dataset, unzip it to obtain the following directory structure.

AVA/
|-- action_anno_train.pickle
|-- action_anno_val.pickle
|-- train
|   |-- AVA
|   |   |-- 5BDj0ow5hnA_scene_13_61290-62898
|   |   |   |-- frame0301.jpg   
|-- val
|   |-- AVA
|   |   |-- 7YpF6DntOYw_scene_3_32470-33281
|   |   |   |-- frame0001.jpg   
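
As a quick sanity check after unzipping, here is a minimal sketch (assuming the archive was extracted into an AVA/ folder laid out as above; this script is not part of the repository) that counts the extracted frames per clip in the training split:

import os

# Root of the unzipped A-AVA data; adjust to wherever you extracted the archive.
ava_root = 'AVA'

# Count the extracted .jpg frames for every clip in the training split.
train_dir = os.path.join(ava_root, 'train', 'AVA')
for clip_name in sorted(os.listdir(train_dir)):
    clip_dir = os.path.join(train_dir, clip_name)
    frames = [f for f in os.listdir(clip_dir) if f.endswith('.jpg')]
    print(f'{clip_name}: {len(frames)} frames')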

The A-AVA dataset annotation has the following structure:

action_anno_val = {
    1230: {                                   # video id
        'video_info': {
            'video_id': 1230,
            'width': 1280,
            'height': 720,
            'video_path': 'val/AVA/keUOiCcHtoQ_scene_28_124948-125707'
        },
        'img_info': {
            47560: {                          # image id
                'img_path': 'val/AVA/keUOiCcHtoQ_scene_28_124948-125707/frame0051.jpg',
                'frame': 50
            },
            ...
        },
        'obj_info': {
            47560: {                          # image id
                0: {                          # object id
                    'bbox': (154, 79, 341, 281),
                    'image_id': 47560,
                    'action': [14]
                },
                ...
            },
            ...
        }
    },
    ...
}

To run the evaluation, your output should have the same format as the example above. Save your result as action_pred_val.pickle and put it under the A-AVA folder.
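
As a minimal sketch of that format (the structure-copying loop and the empty obj_info placeholders below are illustrative assumptions, not part of the released code), predictions can be written with the same nesting as the ground-truth annotation:

import pickle

# Load the ground-truth annotation (a nested dict keyed by video id).
with open('A-AVA/action_anno_val.pickle', 'rb') as f:
    action_anno_val = pickle.load(f)

# Build a prediction dict with the same nesting: for each video we keep
# 'video_info' and 'img_info', and fill 'obj_info' with our own detections.
# The detections are left empty here as a placeholder; a real model would
# insert {object_id: {'bbox': ..., 'image_id': ..., 'action': [...]}} entries.
action_pred_val = {}
for video_id, video_anno in action_anno_val.items():
    action_pred_val[video_id] = {
        'video_info': video_anno['video_info'],
        'img_info': video_anno['img_info'],
        'obj_info': {img_id: {} for img_id in video_anno['img_info']},
    }

# Save the predictions where run_evaluation.py expects them.
with open('A-AVA/action_pred_val.pickle', 'wb') as f:
    pickle.dump(action_pred_val, f)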

Quantitative Evaluation

We provide an evaluation script, which can be run by

python run_evaluation.py \
--pred_file A-AVA/action_pred_val.pickle \
--true_file A-AVA/action_anno_val.pickle

The results will be reported as

[email protected]: xxx,  IDF1: xxx,   MT: xxx, ML: xxx, ID s.w.: xxx,   HL: xxx

Citation

@article{yang2022actoridentified,
  title   = {Actor-identified Spatiotemporal Action Detection - Detecting Who Is Doing What in Videos},
  author  = {Fan Yang and Norimichi Ukita and Sakriani Sakti and Satoshi Nakamura},
  year    = {2022},
  journal = {arXiv preprint arXiv:2208.12940}
}

License

This project is licensed under the MIT License; see the LICENSE file for details.
