
Commit

Merge branch 'main' of github.com:kennymckormick/pyskl into skeletr
kennymckormick committed Sep 6, 2023
2 parents b4d809d + e36e39c commit 9577afd
Showing 30 changed files with 910 additions and 267 deletions.
3 changes: 0 additions & 3 deletions .gitignore
@@ -108,9 +108,6 @@ benchlist.txt
work_dirs/
.cache/

# Pytorch
*.pth

# Profile
*.prof

2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -11,7 +11,7 @@ repos:
args: ["--max-line-length=120"]
exclude: ^configs/
- repo: https://github.com/PyCQA/isort
rev: 5.10.1
rev: 5.11.5
hooks:
- id: isort
- repo: https://github.com/pre-commit/mirrors-yapf
74 changes: 33 additions & 41 deletions README.md
@@ -8,69 +8,61 @@ PYSKL is a toolbox focusing on action recognition based on **SK**e**L**eton data

This repo is the official implementation of [PoseConv3D](https://arxiv.org/abs/2104.13586) and [STGCN++](https://github.com/kennymckormick/pyskl/tree/main/configs/stgcn%2B%2B).

<div align="center">
<img src="https://user-images.githubusercontent.com/34324155/123989146-2ecae680-d9fb-11eb-916b-b9db5563a9e5.gif" width="500px"><br>
<p style="font-size:1.2vw;">Skeleton-base Action Recognition Results on NTU-RGB+D-120</p>
<div id="wrapper" align="center">
<figure>
<img src="https://user-images.githubusercontent.com/34324155/123989146-2ecae680-d9fb-11eb-916b-b9db5563a9e5.gif" width="520px">&emsp;
<img src="https://user-images.githubusercontent.com/34324155/218010909-ccfc89f0-9ed4-4b04-b38d-af7ffe49d2cd.gif" width="290px"><br>
<p style="font-size:1.2vw;">Left: Skeleton-based Action Recognition Results on NTU-RGB+D-120; Right: Real-time CPU Skeleton-based Gesture Recognition Results</p>
</figure>
</div>

## News
## Change Log

- Improve the skeleton extraction script ([PR](https://github.com/kennymckormick/pyskl/pull/150)). It now supports non-distributed skeleton extraction and k400-style data (**2023-03-20**).
- Support PyTorch 2.0: when `--compile` is passed to the training/testing scripts and `torch.__version__ >= '2.0.0'` is detected, the model is compiled with `torch.compile` before training/testing (a minimal sketch follows this list). This is an experimental feature with no performance guarantee (**2023-03-16**).
- Provide a real-time gesture recognition demo based on skeleton-based action recognition with ST-GCN++; check [Demo](/demo/demo.md) for details and instructions (**2023-02-10**).
- Provide [scripts](/examples/inference_speed.ipynb) to estimate the inference speed of each model (**2022-12-30**).
- Support [RGBPoseConv3D](https://arxiv.org/abs/2104.13586), a two-stream 3D-CNN for action recognition based on RGB & Human Skeleton. Follow the [guide](/configs/rgbpose_conv3d/README.md) to train and test RGBPoseConv3D on NTURGB+D (**2022-12-29**).
- We provide a script ([ntu_preproc.py](/tools/data/ntu_preproc.py)) to generate PYSKL-style annotation files from the official NTURGB+D skeleton files (**2022-12-20**).
- Support [DG-STGCN](https://arxiv.org/abs/2210.05895), a state-of-the-art skeleton-based action recognition algorithm that does not rely on a pre-defined graph (**2022-12-12**).
- The [tech report](https://arxiv.org/abs/2205.09443) of PYSKL has been accepted by ACM MM 2022 (**2022-06-28**).
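
For reference, a minimal sketch of the version-gated compilation mentioned in the `--compile` entry above (the flag handling in the actual training/testing scripts may differ; `maybe_compile`, `model`, and `enable_compile` are placeholder names, and the `packaging` dependency is assumed to be available):

```python
# Minimal sketch of opt-in torch.compile usage: compile only when the flag is
# set and the installed PyTorch is >= 2.0. Not the exact logic of the PYSKL scripts.
import torch
from packaging.version import parse as parse_version


def maybe_compile(model: torch.nn.Module, enable_compile: bool) -> torch.nn.Module:
    """Return a compiled model when requested and supported, else the original model."""
    if enable_compile and parse_version(torch.__version__).release >= (2, 0):
        model = torch.compile(model)  # available from PyTorch 2.0 onwards
    return model


# Usage sketch: model = maybe_compile(model, args.compile)
```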

## Supported Algorithms

- [x] DG-STGCN (Arxiv): https://arxiv.org/abs/2210.05895 [[MODELZOO](/configs/dgstgcn/README.md)]
- [x] ST-GCN (AAAI 2018): https://arxiv.org/abs/1801.07455 [[MODELZOO](/configs/stgcn/README.md)]
- [x] ST-GCN++ (PYSKL, Tech Report): https://arxiv.org/abs/2205.09443 [[MODELZOO](/configs/stgcn++/README.md)]
- [x] PoseConv3D (CVPR 2022 Oral): https://arxiv.org/abs/2104.13586 [[MODELZOO](/configs/posec3d/README.md)]
- [x] AAGCN (TIP): https://arxiv.org/abs/1912.06971 [[MODELZOO](/configs/aagcn/README.md)]
- [x] MS-G3D (CVPR 2020 Oral): https://arxiv.org/abs/2003.14111 [[MODELZOO](/configs/msg3d/README.md)]
- [x] CTR-GCN (ICCV 2021): https://arxiv.org/abs/2107.12213 [[MODELZOO](/configs/ctrgcn/README.md)]
- [x] [DG-STGCN (Arxiv)](https://arxiv.org/abs/2210.05895) [[MODELZOO](/configs/dgstgcn/README.md)]
- [x] [ST-GCN (AAAI 2018)](https://arxiv.org/abs/1801.07455) [[MODELZOO](/configs/stgcn/README.md)]
- [x] [ST-GCN++ (ACMMM 2022)](https://arxiv.org/abs/2205.09443) [[MODELZOO](/configs/stgcn++/README.md)]
- [x] [PoseConv3D (CVPR 2022 Oral)](https://arxiv.org/abs/2104.13586) [[MODELZOO](/configs/posec3d/README.md)]
- [x] [AAGCN (TIP)](https://arxiv.org/abs/1912.06971) [[MODELZOO](/configs/aagcn/README.md)]
- [x] [MS-G3D (CVPR 2020 Oral)](https://arxiv.org/abs/2003.14111) [[MODELZOO](/configs/msg3d/README.md)]
- [x] [CTR-GCN (ICCV 2021)](https://arxiv.org/abs/2107.12213) [[MODELZOO](/configs/ctrgcn/README.md)]

## Supported Skeleton Datasets

- [x] NTURGB+D (CVPR 2016): [NTU RGB+D: A large scale dataset for 3D human activity analysis](https://openaccess.thecvf.com/content_cvpr_2016/papers/Shahroudy_NTU_RGBD_A_CVPR_2016_paper.pdf)
- [x] NTURGB+D 120 (TPAMI 2019): [Ntu rgb+ d 120: A large-scale benchmark for 3d human activity understanding](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8713892)
- [x] Kinetics 400 (CVPR 2017): [Quo vadis, action recognition? a new model and the kinetics dataset](https://openaccess.thecvf.com/content_cvpr_2017/papers/Carreira_Quo_Vadis_Action_CVPR_2017_paper.pdf)
- [x] UCF101 (ArXiv 2012): [UCF101: A dataset of 101 human actions classes from videos in the wild](https://arxiv.org/pdf/1212.0402.pdf)
- [x] HMDB51 (ICCV 2021): [HMDB: a large video database for human motion recognition](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6126543)
- [x] FineGYM (CVPR 2020): [Finegym: A hierarchical video dataset for fine-grained action understanding](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shao_FineGym_A_Hierarchical_Video_Dataset_for_Fine-Grained_Action_Understanding_CVPR_2020_paper.pdf)
- [x] Diving48 (ECCV 2018): [Resound: Towards action recognition without representation bias](https://openaccess.thecvf.com/content_ECCV_2018/papers/Yingwei_Li_RESOUND_Towards_Action_ECCV_2018_paper.pdf)
- [x] [NTURGB+D (CVPR 2016)](https://arxiv.org/abs/1604.02808) and [NTURGB+D 120 (TPAMI 2019)](https://arxiv.org/abs/1905.04757)
- [x] [Kinetics 400 (CVPR 2017)](https://arxiv.org/abs/1705.06950)
- [x] [UCF101 (ArXiv 2012)](https://arxiv.org/pdf/1212.0402.pdf)
- [x] [HMDB51 (ICCV 2011)](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6126543)
- [x] [FineGYM (CVPR 2020)](https://arxiv.org/abs/2004.06704)
- [x] [Diving48 (ECCV 2018)](https://openaccess.thecvf.com/content_ECCV_2018/papers/Yingwei_Li_RESOUND_Towards_Action_ECCV_2018_paper.pdf)

## Installation
```shell
git clone https://github.com/kennymckormick/pyskl.git
cd pyskl
# Please first install PyTorch following the official instructions (https://pytorch.org/get-started/locally/). Use a PyTorch version >= 1.5.0 and < 1.11.0.
# The following command installs mmcv-full 1.5.0 from source, which may take ~10 minutes. You can also follow the instructions at https://github.com/open-mmlab/mmcv to install mmcv-full from pre-built wheels, which is much faster.
pip install -r requirements.txt
# This command runs well with conda 22.9.0; if you are running an earlier conda version and encounter errors, try updating conda first
conda env create -f pyskl.yaml
conda activate pyskl
pip install -e .
```
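
As a quick post-install sanity check, the following minimal sketch verifies that the core packages import (it assumes the `pyskl` environment is active and that `pyskl` exposes a `__version__` attribute in the usual OpenMMLab fashion):

```python
# Hedged post-install check: import the core packages and report their versions.
import mmcv
import torch

import pyskl  # assumes `pip install -e .` has been run in $PYSKL

print('torch :', torch.__version__)   # expected: >= 1.5.0 and < 1.11.0
print('mmcv  :', mmcv.__version__)    # mmcv-full, 1.5.0 if built from requirements.txt
print('pyskl :', getattr(pyskl, '__version__', 'unknown'))
print('cuda  :', torch.cuda.is_available())
```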

## Demo

```shell
# Before running the demo, make sure you have installed mmcv-full, mmpose and mmdet. You should first install mmcv-full, and then install mmpose, mmdet.
# You should run the following scripts under the directory `$PYSKL`
# Running the demo with PoseC3D trained on NTURGB+D 120 (Joint Modality), which is the default option. The input file is demo/ntu_sample.avi, the output file is demo/demo.mp4
python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4
# Running the demo with STGCN++ trained on NTURGB+D 120 (Joint Modality). The input file is demo/ntu_sample.avi, the output file is demo/demo.mp4
python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4 --config configs/stgcn++/stgcn++_ntu120_xsub_hrnet/j.py --checkpoint http://download.openmmlab.com/mmaction/pyskl/ckpt/stgcnpp/stgcnpp_ntu120_xsub_hrnet/j.pth
```

Note that for running demo on an arbitrary input video, you need a tracker to formulate pose estimation results for each frame into multiple skeleton sequences. Currently we are using a [naive tracker](https://github.com/kennymckormick/pyskl/blob/4ddb7ac384e231694fd2b4b7774144e5762862ab/demo/demo_skeleton.py#L192) based on inter-frame pose similarities. You can also try to write your own tracker.
Check [demo.md](/demo/demo.md).

## Data Preparation

We provide HRNet 2D skeletons for every supported dataset and Kinect 3D skeletons for the NTURGB+D and NTURGB+D 120 datasets. To obtain the human skeleton annotations, you can:

1. Use our pre-processed skeleton annotations: we directly provide the processed skeleton data for all supported datasets as pickle files, which can be used for training and testing as-is. Check the [Data Doc](/tools/data/README.md) for download links and a description of the annotation format (a minimal loading sketch follows this section).
2. For NTURGB+D 3D skeletons, you can download the official annotations from https://github.com/shahroudy/NTURGB-D and use our [provided script](/tools/data/ntu_preproc.py) to generate the processed pickle files. The generated files are the same as the provided `ntu60_3danno.pkl` and `ntu120_3danno.pkl`. For detailed instructions, follow the [Data Doc](/tools/data/README.md).
3. We also provide scripts to extract 2D HRNet skeletons from RGB videos; you can follow the [diving48_example](/examples/extract_diving48_skeleton/diving48_example.ipynb) to extract 2D skeletons from an arbitrary RGB video dataset.

You can use [vis_skeleton](/demo/vis_skeleton.ipynb) to visualize the provided skeleton data.
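
For a quick look at the annotation format, here is a minimal loading sketch (the file path and the field names `split`, `annotations`, `frame_dir`, `label`, and `keypoint` are assumptions based on the [Data Doc](/tools/data/README.md), which remains the authoritative reference):

```python
# Hedged sketch: inspect a downloaded PYSKL pickle annotation file.
# Replace the path with wherever you placed the downloaded pickle.
import pickle

with open('ntu60_3danno.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())                 # e.g. dict_keys(['split', 'annotations'])
print(list(data['split']))         # e.g. ['xsub_train', 'xsub_val', ...]
sample = data['annotations'][0]
print(sample['frame_dir'], sample['label'], sample['keypoint'].shape)
```
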
@@ -91,12 +83,12 @@ For specific examples, please go to the README for each specific algorithm we supported
If you use PYSKL in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry and the BibTeX entry corresponding to the specific algorithm you used.

```BibTeX
@misc{duan2022PYSKL,
url = {https://arxiv.org/abs/2205.09443},
author = {Duan, Haodong and Wang, Jiaqi and Chen, Kai and Lin, Dahua},
title = {PYSKL: Towards Good Practices for Skeleton Action Recognition},
publisher = {arXiv},
year = {2022}
@inproceedings{duan2022pyskl,
title={Pyskl: Towards good practices for skeleton action recognition},
author={Duan, Haodong and Wang, Jiaqi and Chen, Kai and Lin, Dahua},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={7351--7354},
year={2022}
}
```

12 changes: 6 additions & 6 deletions configs/stgcn++/README.md
@@ -7,12 +7,12 @@ STGCN++ is a variant of STGCN we developed in PYSKL with some modifications in t
## Citation

```BibTeX
@misc{duan2022PYSKL,
url = {https://arxiv.org/abs/2205.09443},
author = {Duan, Haodong and Wang, Jiaqi and Chen, Kai and Lin, Dahua},
title = {PYSKL: Towards Good Practices for Skeleton Action Recognition},
publisher = {arXiv},
year = {2022}
@inproceedings{duan2022pyskl,
title={Pyskl: Towards good practices for skeleton action recognition},
author={Duan, Haodong and Wang, Jiaqi and Chen, Kai and Lin, Dahua},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={7351--7354},
year={2022}
}
```

41 changes: 41 additions & 0 deletions demo/demo.md
@@ -0,0 +1,41 @@
# Demo

We currently provide an offline GPU demo for skeleton-based action recognition and an online CPU demo for gesture recognition. Details are provided below.

## Preparation

- Before running the skeleton action recognition demo, make sure you have installed `mmcv-full`, `mmpose` and `mmdet`. We recommend directly using the provided conda environment, which includes all necessary dependencies:
```bash
# Following commands assume you are in the root directory of pyskl (indicated as `$PYSKL`)
# This command runs well with conda 22.9.0; if you are running an earlier conda version and encounter errors, try updating conda first
conda env create -f pyskl.yaml # Create the conda environment (named `pyskl`) for this project, run it if you haven't created one yet.
conda activate pyskl # Activate the `pyskl` environment
pip install -e . # Install this project
```
- Before running the gesture recognition demo, you need to install `mediapipe` first. This can be done simply with `pip install mediapipe`.

## Skeleton Action Recognition Demo (GPU, offline)

The provided skeleton-based action recognition demo is offline: it takes a video clip as input and returns the recognized action. The demo runs on GPU. By default, it recognizes the 120 action categories defined in [NTURGB+D 120](https://arxiv.org/abs/1905.04757).

For human skeleton extraction, we use [Faster-RCNN (R50 backbone)](/demo/faster_rcnn_r50_fpn_2x_coco.py) for human detection and [HRNet_w32](/demo/hrnet_w32_coco_256x192.py) for human pose estimation, both based on OpenMMLab implementations.
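
For illustration, a rough sketch of this two-stage pipeline using the mmdet/mmpose 0.x-style APIs the demo builds on (the checkpoint paths, frame path, and score threshold are placeholders; the actual logic lives in `demo/demo_skeleton.py`):

```python
# Hedged sketch of per-frame skeleton extraction: detect people, then estimate their poses.
import cv2
from mmdet.apis import inference_detector, init_detector
from mmpose.apis import inference_top_down_pose_model, init_pose_model

det_model = init_detector('demo/faster_rcnn_r50_fpn_2x_coco.py', 'det_ckpt.pth', device='cuda:0')
pose_model = init_pose_model('demo/hrnet_w32_coco_256x192.py', 'pose_ckpt.pth', device='cuda:0')

frame = cv2.imread('frame_0001.jpg')                        # one video frame (BGR ndarray)
det_results = inference_detector(det_model, frame)          # per-class arrays of (x1, y1, x2, y2, score)
persons = [dict(bbox=b) for b in det_results[0] if b[4] >= 0.5]   # class 0 = person
poses, _ = inference_top_down_pose_model(pose_model, frame, persons, format='xyxy')
# Each entry of `poses` holds a `keypoints` array of shape (num_keypoints, 3): (x, y, score).
```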

```bash
# Running the demo with PoseC3D trained on NTURGB+D 120 (Joint Modality), which is the default option. The input file is demo/ntu_sample.avi; the output file is demo/demo.mp4
python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4
# Running the demo with STGCN++ trained on NTURGB+D 120 (Joint Modality). The input file is demo/ntu_sample.avi; the output file is demo/demo.mp4
python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4 --config configs/stgcn++/stgcn++_ntu120_xsub_hrnet/j.py --checkpoint http://download.openmmlab.com/mmaction/pyskl/ckpt/stgcnpp/stgcnpp_ntu120_xsub_hrnet/j.pth
```

Note that to run the demo on an arbitrary input video, you need a tracker to group per-frame pose estimation results into skeleton sequences. Currently we use a [naive tracker](https://github.com/kennymckormick/pyskl/blob/4ddb7ac384e231694fd2b4b7774144e5762862ab/demo/demo_skeleton.py#L192) based on inter-frame pose similarity. You can also write your own tracker.
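
The idea behind such a tracker can be sketched as follows (this is not the code in `demo_skeleton.py`; the distance metric and threshold are illustrative placeholders):

```python
# Toy greedy tracker: append each detected pose to the most similar existing track
# (measured by mean joint distance to the track's last pose), or start a new track.
import numpy as np


def pose_distance(pose_a: np.ndarray, pose_b: np.ndarray) -> float:
    """Mean Euclidean distance between two (num_joints, 2) pose arrays."""
    return float(np.linalg.norm(pose_a - pose_b, axis=-1).mean())


def update_tracks(tracks: list, frame_poses: list, max_dist: float = 30.0) -> list:
    for pose in frame_poses:
        dists = [pose_distance(track[-1], pose) for track in tracks]
        if dists and min(dists) < max_dist:
            tracks[int(np.argmin(dists))].append(pose)   # continue an existing skeleton sequence
        else:
            tracks.append([pose])                        # start a new skeleton sequence
    return tracks
```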

## Gesture Recognition Demo (CPU, Real-time)

We provide an online gesture recognition demo that runs in real time on CPU. The demo takes a video stream as input and predicts the current gesture (it currently supports only the single-hand scenario). By default, this demo recognizes 15 gestures defined in [HaGRID](https://github.com/hukenovs/hagrid), including: Call, Dislike, Fist, Four, Like, Mute, OK, One, Palm, Peace, Rock, Stop, Three [Middle 3 Fingers], Three [Left 3 Fingers], Two Up.

For hand keypoint extraction, we use the open-source solution [mediapipe](https://google.github.io/mediapipe/). For skeleton-based gesture recognition, we currently adopt a lightweight variant of the [ST-GCN++](/demo/stgcnpp_gesture.py) model trained on the [HaGRID](https://github.com/hukenovs/hagrid) gesture recognition dataset.

```bash
# Run the real time skeleton-based gesture recognition demo
python demo/demo_gesture.py
```
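
For illustration, a rough sketch of the mediapipe-based hand keypoint extraction stage mentioned above (not the actual code of `demo/demo_gesture.py`; the webcam index and `max_num_hands=1` are assumptions):

```python
# Hedged sketch: read one webcam frame and extract 21 hand keypoints with mediapipe.
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)                          # default webcam
hands = mp.solutions.hands.Hands(max_num_hands=1)

ok, frame = cap.read()
if ok:
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        landmarks = results.multi_hand_landmarks[0].landmark
        keypoints = [(lm.x, lm.y) for lm in landmarks]   # 21 normalized (x, y) coordinates
        print(len(keypoints))                            # -> 21
cap.release()
hands.close()
```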
