Dataset | Data type | Scenes | Annotation | Task | #Examples / #Classes | SOTA / benchmark |
---|---|---|---|---|---|---|
KTH[1] | Trimmed-video | Daily Living | Video-level | Action Recognition | 2,391 / 6 | 98.9%[2] |
Collective Activity[3] | Trimmed-video | Daily Living | Person/Group-level | Group Activity Recognition | 44 / 5 | 91.0%[4] |
HOLLYWOOD2[5] | Trimmed-video | Movie | Video-level | Action Recognition | 3,669 / 12 | 73.7%[6] |
Daphnet Gait[7] | Signal-sequence | Sport | Signal-level | Action Recognition | 1,917,887 / 2 | 94.1%[8] |
CK[9] | Still-image | Facial Expression | Image-level | Facial Expression Recognition | 327 / 7 | 88.7%[10] |
MMI[11] | Video/Still-image | Facial Expression | Action Unit | Facial Expression Recognition | 2,900 / 6 | 98.6%[12] |
Pascal VOC Actions[13] | Still-image | Comprehensive | Image-level | Action Recognition | 11,530 / 20 | 90.2%[14] |
WISDM[15] | Signal-sequence | Daily Living | Signal-level | Action Recognition | 1,098,213 / 6 | 98.2%[16] |
HMDB51[17] | Trimmed-video | Daily Living | Video-level | Action Recognition | 6,766 / 51 | 82.1%[18] |
UCF101[19] | Trimmed-video | Sport | Video-level | Action Recognition | 13,320 / 101 | 98.2%[20] |
Opportunity[21] | Signal-sequence | Daily Living | Signal-level | Action Recognition | 701,366 / 16 | 91.8%[22] |
PAMAP2[23] | Signal-sequence | Daily Living | Signal-level | Action Recognition | 2,844,868 / 18 | 91.0%[24] |
SFEW-2.0[25],[26] | Still-image | Facial Expression | Image-level | Facial Expression Recognition | 1,394 / 7 | 58.1%[27] |
MPII[28] | Still-image | Comprehensive | Image-level | Pose Estimation | 24,920 / 410 | 92.1%[29] |
Breakfast Dataset[30] | Trimmed-video | Daily Living | Video-level | Action Recognition | 1,989 / 10 | 45.7%[31] |
HICO[32] | Still-image | Comprehensive | Image-level | Human-Object Interaction Recognition | 47,774 / 117 | 47.1%[33] |
ACTIVITYNET-200[34] | Untrimmed-video | Daily Living | Time-interval | Video Understanding | 19,994 / 200 | 91.3%[35] |
Volleyball[36] | Trimmed-video | Sport | Video-level | Group Activity Recognition | 4,830 / 8 | 92.6%[4] |
Charades[38] | Trimmed-video | Daily Living | Video-level | Action Recognition | 9,848 / 157 | 43.4%[39] |
YouTube-8M[40] | Untrimmed-video | Comprehensive | Time-interval | Video Understanding | 6,100,000 / 3,862 | 85.0%[40] |
THUMOS14[42] | Untrimmed-video | Comprehensive | Time-interval | Video Understanding | 18,404 / 101 | 82.2%[35] |
Kinetics[44] | Trimmed-video | Comprehensive | Video-level | Action Recognition | 300,000 / 700 | 82.8%[45] |
Something-Something[46] | Trimmed-video | Daily Living | Video-level | Action Recognition | 220,847 / 174 | 51.6%[45] |
FCVID[48] | Untrimmed-video | Comprehensive | Video-level | Action Recognition | 91,223 / 239 | 77.6%[49] |
20BN-JESTER[50] | Trimmed-video | Hand Gesture | Video-level | Action Recognition | 148,000 / 27 | 94.8%[50] |
Infrared Visible[52] | Trimmed-video | Daily Living | Video-level | Action Recognition | 1,200 / 12 | 80.2%[52] |
AVA[54] | Untrimmed-video | Movie | Time-interval | Video Understanding | 57,600 / 80 | 27.2%[39] |
EPIC-KITCHENS[56] | Trimmed-video | Daily Living | Video-level | Action Recognition | 432 / 149 | 34.5%[45] |
COIN[58] | Untrimmed-video | Daily Living | Time-interval | Video Understanding | 11,827 / 180 | 88.0%[58] |
Moments in Time[60] | Trimmed-video | Comprehensive | Video-level | Action Recognition | 1,000,000 / 339 | 32.4%[61] |
- Recognizing human actions: a local SVM approach | 2004
- Human actions recognition based on 3D deep neural network | 2017
- What are they doing?: Collective activity classification using spatio-temporal relationship among people | 2009
- Learning Actor Relation Graphs for Group Activity Recognition | 2019
- Actions in Context | 2009
- Modeling video evolution for action recognition | 2015
- Potentials of enhanced context awareness in wearable assistants for Parkinson's disease patients with the freezing of gait syndrome | 2009
- Deep recurrent neural networks for human activity recognition | 2017
- The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression | 2010
- Greedy search for descriptive spatial face features | 2017
- Induced disgust, happiness and surprise: an addition to the MMI facial expression database | 2010
- DeXpression: Deep convolutional neural network for expression recognition | 2015
- The PASCAL Visual Object Classes (VOC) challenge | 2010
- Contextual action recognition with R*CNN | 2015
- Activity recognition using cell phone accelerometers | 2011
- Deep activity recognition models with triaxial accelerometers | 2016
- HMDB: a large video database for human motion recognition | 2011
- End-to-end video-level representation learning for action recognition | 2018
- UCF101: A dataset of 101 human actions classes from videos in the wild | 2012
- PoTion: Pose MoTion representation for action recognition | 2018
- The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition | 2013
- Comparison of feature learning methods for human activity recognition using wearable sensors | 2018
- Time series classification using multi-channels deep convolutional neural networks | 2014
- A comprehensive study of activity recognition using accelerometers | 2018
- Collecting large, richly annotated facial-expression databases from movies | 2012
- Emotion recognition in the wild challenge 2014: Baseline, data and protocol | 2014
- Covariance pooling for facial expression recognition | 2018
- 2D human pose estimation: New benchmark and state of the art analysis | 2014
- Multi-scale structure-aware network for human pose estimation | 2018
- The language of actions: Recovering the syntax and semantics of goal-directed human activities | 2014
- D3TW: Discriminative differentiable dynamic time warping for weakly supervised action alignment and segmentation | 2019
- HICO: A benchmark for recognizing human-object interactions in images | 2015
- HAKE: Human Activity Knowledge Engine | 2019
- ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding | 2015
- UntrimmedNets for weakly supervised action recognition and detection | 2017
- A hierarchical deep temporal model for group activity recognition | 2016
- Learning Actor Relation Graphs for Group Activity Recognition | 2019
- Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding | 2016
- Long-term feature banks for detailed video understanding | 2019
- YouTube-8M: A large-scale video classification benchmark | 2016
- YouTube-8M: A large-scale video classification benchmark | 2016
- The THUMOS challenge on action recognition for videos “in the wild” | 2017
- UntrimmedNets for weakly supervised action recognition and detection | 2017
- The Kinetics human action video dataset | 2017
- Large-scale weakly-supervised pre-training for video action recognition | 2019
- The "Something Something" Video Database for Learning and Evaluating Visual Common Sense | 2017
- Large-scale weakly-supervised pre-training for video action recognition | 2019
- Exploiting feature and class relationships in video categorization with regularized deep neural networks | 2017
- Pivot correlational neural network for multimodal video categorization | 2018
- Temporal Relational Reasoning in Videos | 2018
- Temporal Relational Reasoning in Videos | 2018
- PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities | 2018
- PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities | 2018
- AVA: A video dataset of spatio-temporally localized atomic visual actions | 2018
- Long-term feature banks for detailed video understanding | 2019
- Scaling Egocentric Vision: The EPIC-KITCHENS Dataset | 2018
- Large-scale weakly-supervised pre-training for video action recognition | 2019
- COIN: A large-scale dataset for comprehensive instructional video analysis | 2019
- COIN: A large-scale dataset for comprehensive instructional video analysis | 2019
- Moments in Time Dataset: one million videos for event understanding | 2019
- Collaborative Spatiotemporal Feature Learning for Video Action Recognition | 2019