Commit 4e9e13b ("update")
yjh0410 committed Feb 6, 2023 (initial commit, 0 parents)
Showing 85 changed files with 12,480 additions and 0 deletions.
8 changes: 8 additions & 0 deletions .gitignore
*.pt
*.pth
*.pkl
*.pyc
*.txt
__pycache__
det_results
.vscode
167 changes: 167 additions & 0 deletions README.md
# YOWOv2

## Requirements
- We recommend using Anaconda to create a conda environment:
```Shell
conda create -n yowo python=3.6
```

- Then, activate the environment:
```Shell
conda activate yowo
```

- Install the required packages:
```Shell
pip install -r requirements.txt
```

## Visualization

Coming soon ...

# Dataset
You can download the **UCF101-24** dataset from the following links:

## UCF101-24:
* Google drive

Link: https://drive.google.com/file/d/1Dwh90pRi7uGkH5qLRjQIFiEmMJrAog5J/view?usp=sharing

* BaiduYun Disk

Link: https://pan.baidu.com/s/11GZvbV0oAzBhNDVKXsVGKg

Password: hmu6

## AVA
You can follow the instructions [here](https://github.com/yjh0410/AVA_Dataset) to prepare the **AVA** dataset.

# Experiment
* UCF101-24

| Model | Clip | GFLOPs | Params | F-mAP | V-mAP | FPS | Weight |
|----------------|--------|--------|---------|-------|-------|---------|--------------|
| YOWOv2-Nano | 16 | 2.6 | 3.5 M | 78.8 | 48.0 | 42 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_nano_ucf24.pth) |
| YOWOv2-Tiny | 16 | 5.8 | 10.9 M | 80.5 | 51.3 | 50 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_tiny_ucf24.pth) |
| YOWOv2-Medium | 16 | 24.1 | 52.0 M | 83.1 | 50.7 | 42 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_medium_ucf24.pth) |
| YOWOv2-Large | 16 | 107.1 | 109.7 M | 85.2 | 52.0 | 30 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_large_ucf24.pth) |
| YOWOv2-Nano | 32 | 4.0 | 3.5 M | 79.4 | 49.0 | 42 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_nano_ucf24_k32.pth) |
| YOWOv2-Tiny | 32 | 9.0 | 10.9 M | 83.0 | 51.2 | 50 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_tiny_ucf24_k32.pth) |
| YOWOv2-Medium | 32 | 27.3 | 52.0 M | 83.7 | 52.5 | 40 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_medium_ucf24_k32.pth) |
| YOWOv2-Large | 32 | 183.9 | 109.7 M | 87.0 | 52.8 | 22 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_large_ucf24_k32.pth) |

* AVA v2.2

| Model | Clip | mAP | FPS | weight |
|----------------|------------|-----------|---------|--------------|
| YOWOv2-Nano | 16 | 12.6 | 40 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_nano_ava.pth) |
| YOWOv2-Tiny | 16 | 14.9 | 49 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_tiny_ava.pth) |
| YOWOv2-Medium | 16 | 18.4 | 41 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_medium_ava.pth) |
| YOWOv2-Large | 16 | 20.2 | 29 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_large_ava.pth) |
| YOWOv2-Nano | 32 | | | |
| YOWOv2-Tiny | 32 | 15.6 | 49 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_tiny_ava_k32.pth) |
| YOWOv2-Medium | 32 | 18.4 | 40 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_medium_ava_k32.pth) |
| YOWOv2-Large | 32 | 21.7 | 22 | [ckpt](https://github.com/yjh0410/YOWOv2/releases/download/yowo_v2_weight/yowo_v2_large_ava_k32.pth) |


## Train YOWOv2
* UCF101-24

```Shell
python train.py --cuda -d ucf24 --root path/to/dataset -v yowo_v2_nano --num_workers 4 --eval_epoch 1 --max_epoch 8 --lr_epoch 2 3 4 5 --lr 0.0001 -ldr 0.5 -bs 8 -accu 16
```

or you can just run the script:

```Shell
sh train_ucf.sh
```
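The `-bs 8 -accu 16` flags combine a micro-batch of 8 with 16 accumulation steps, i.e. an effective batch size of 128 per optimizer update. A pure-Python sketch of the accumulation pattern, independent of the repo's trainer (function name and values are illustrative):

```python
def accumulate_steps(grads, accu):
    """Yield one averaged update per `accu` micro-batch gradients.

    Gradients from micro-batches are buffered and averaged, so one
    parameter update behaves like a single larger batch of size
    micro_batch * accu.
    """
    buf = []
    for g in grads:
        buf.append(g)
        if len(buf) == accu:
            yield sum(buf) / accu
            buf = []

updates = list(accumulate_steps([1.0, 3.0, 2.0, 6.0], accu=2))
print(updates)  # [2.0, 4.0]
```

In a real training loop the same effect comes from scaling the loss by `1/accu`, calling `backward()` each micro-batch, and stepping the optimizer only every `accu`-th iteration.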

* AVA
```Shell
python train.py --cuda -d ava_v2.2 --root path/to/dataset -v yowo_v2_nano --num_workers 4 --eval_epoch 1 --max_epoch 10 --lr_epoch 3 4 5 6 --lr 0.0001 -ldr 0.5 -bs 8 -accu 16 --eval
```

or you can just run the script:

```Shell
sh train_ava.sh
```

## Test YOWOv2
* UCF101-24
For example:

```Shell
python test.py --cuda -d ucf24 -v yowo_v2_nano --weight path/to/weight -size 224 --show
```

* AVA
For example:

```Shell
python test.py --cuda -d ava_v2.2 -v yowo_v2_nano --weight path/to/weight -size 224 --show
```

## Test YOWOv2 on AVA video
For example:

```Shell
python test_video_ava.py --cuda -d ava_v2.2 -v yowo_v2_nano --weight path/to/weight --video path/to/video --show
```

Note that ```path/to/video``` can point to any video on your local device, not only an AVA video.

## Evaluate YOWOv2
* UCF101-24
For example:

```Shell
# Frame mAP
python eval.py \
--cuda \
-d ucf24 \
-v yowo_v2_nano \
-bs 16 \
-size 224 \
--weight path/to/weight \
--cal_frame_mAP
```

```Shell
# Video mAP
python eval.py \
--cuda \
-d ucf24 \
-v yowo_v2_nano \
-bs 16 \
-size 224 \
--weight path/to/weight \
--cal_video_mAP
```

* AVA

Run the following command to calculate frame mAP@0.5 IoU:

```Shell
python eval.py \
--cuda \
-d ava_v2.2 \
-v yowo_v2_nano \
-bs 16 \
--weight path/to/weight
```

## Demo
```Shell
# run demo
python demo.py --cuda -d ucf24 -v yowo_v2_nano -size 224 --weight path/to/weight --video path/to/video --show
```

To run the demo with AVA classes, replace ```-d ucf24``` with ```-d ava_v2.2```.

## References
If you are using our code, please consider citing our paper.

Coming soon ...
21 changes: 21 additions & 0 deletions config/__init__.py
from .dataset_config import dataset_config
from .yowo_v2_config import yowo_v2_config


def build_model_config(args):
    print('==============================')
    print('Model Config: {} '.format(args.version.upper()))

    if 'yowo_v2_' in args.version:
        m_cfg = yowo_v2_config[args.version]
    else:
        # Fail loudly instead of returning an undefined m_cfg.
        raise ValueError('Unknown model version: {}'.format(args.version))

    return m_cfg


def build_dataset_config(args):
    print('==============================')
    print('Dataset Config: {} '.format(args.dataset.upper()))

    d_cfg = dataset_config[args.dataset]

    return d_cfg
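A minimal sketch of how these builders are typically invoked. The `args` namespace and the trimmed-down config dicts below are illustrative stand-ins, not the repo's actual objects:

```python
from types import SimpleNamespace

# Trimmed-down stand-ins for the dicts defined in config/*.py.
yowo_v2_config = {'yowo_v2_nano': {'head_dim': 64}}
dataset_config = {'ucf24': {'train_size': 224}}

# Hypothetical stand-in for the argparse namespace that train.py passes in.
args = SimpleNamespace(version='yowo_v2_nano', dataset='ucf24')

# Same lookup the builders perform: each dict is keyed by the CLI name.
m_cfg = yowo_v2_config[args.version]
d_cfg = dataset_config[args.dataset]
print(m_cfg['head_dim'], d_cfg['train_size'])  # 64 224
```

Keying the configs by the `-v`/`-d` command-line strings keeps the training scripts free of per-model branching.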
92 changes: 92 additions & 0 deletions config/dataset_config.py
# Dataset configuration


dataset_config = {
    'ucf24': {
        # dataset
        'gt_folder': './evaluator/groundtruths_ucf_jhmdb/groundtruths_ucf/',
        # input size
        'train_size': 224,
        'test_size': 224,
        # transform
        'jitter': 0.2,
        'hue': 0.1,
        'saturation': 1.5,
        'exposure': 1.5,
        'sampling_rate': 1,
        # cls label
        'multi_hot': False,  # one hot
        # optimizer
        'optimizer': 'adamw',
        'momentum': 0.9,
        'weight_decay': 5e-4,
        # warmup strategy
        'warmup': 'linear',
        'warmup_factor': 0.00066667,
        'wp_iter': 500,
        # class names
        'valid_num_classes': 24,
        'label_map': (
            'Basketball', 'BasketballDunk', 'Biking', 'CliffDiving',
            'CricketBowling', 'Diving', 'Fencing', 'FloorGymnastics',
            'GolfSwing', 'HorseRiding', 'IceDancing', 'LongJump',
            'PoleVault', 'RopeClimbing', 'SalsaSpin', 'SkateBoarding',
            'Skiing', 'Skijet', 'SoccerJuggling', 'Surfing',
            'TennisSwing', 'TrampolineJumping', 'VolleyballSpiking', 'WalkingWithDog'
        ),
    },

    'ava_v2.2': {
        # dataset
        'frames_dir': 'frames/',
        'frame_list': 'frame_lists/',
        'annotation_dir': 'annotations/',
        'train_gt_box_list': 'ava_v2.2/ava_train_v2.2.csv',
        'val_gt_box_list': 'ava_v2.2/ava_val_v2.2.csv',
        'train_exclusion_file': 'ava_v2.2/ava_train_excluded_timestamps_v2.2.csv',
        'val_exclusion_file': 'ava_v2.2/ava_val_excluded_timestamps_v2.2.csv',
        'labelmap_file': 'ava_v2.2/ava_action_list_v2.2_for_activitynet_2019.pbtxt',  # 'ava_v2.2/ava_action_list_v2.2.pbtxt',
        'class_ratio_file': 'config/ava_categories_ratio.json',
        'backup_dir': 'results/',
        # input size
        'train_size': 224,
        'test_size': 224,
        # transform
        'jitter': 0.2,
        'hue': 0.1,
        'saturation': 1.5,
        'exposure': 1.5,
        'sampling_rate': 1,
        # cls label
        'multi_hot': True,  # multi hot
        # train config
        'optimizer': 'adamw',
        'momentum': 0.9,
        'weight_decay': 5e-4,
        # warmup strategy
        'warmup': 'linear',
        'warmup_factor': 0.00066667,
        'wp_iter': 500,
        # class names
        'valid_num_classes': 80,
        'label_map': (
            'bend/bow(at the waist)', 'crawl', 'crouch/kneel', 'dance', 'fall down',  # 1-5
            'get up', 'jump/leap', 'lie/sleep', 'martial art', 'run/jog',  # 6-10
            'sit', 'stand', 'swim', 'walk', 'answer phone',  # 11-15
            'brush teeth', 'carry/hold (an object)', 'catch (an object)', 'chop', 'climb (e.g. a mountain)',  # 16-20
            'clink glass', 'close (e.g., a door, a box)', 'cook', 'cut', 'dig',  # 21-25
            'dress/put on clothing', 'drink', 'drive (e.g., a car, a truck)', 'eat', 'enter',  # 26-30
            'exit', 'extract', 'fishing', 'hit (an object)', 'kick (an object)',  # 31-35
            'lift/pick up', 'listen (e.g., to music)', 'open (e.g., a window, a car door)', 'paint', 'play board game',  # 36-40
            'play musical instrument', 'play with pets', 'point to (an object)', 'press', 'pull (an object)',  # 41-45
            'push (an object)', 'put down', 'read', 'ride (e.g., a bike, a car, a horse)', 'row boat',  # 46-50
            'sail boat', 'shoot', 'shovel', 'smoke', 'stir',  # 51-55
            'take a photo', 'text on/look at a cellphone', 'throw', 'touch (an object)', 'turn (e.g., a screwdriver)',  # 56-60
            'watch (e.g., TV)', 'work on a computer', 'write', 'fight/hit (a person)', 'give/serve (an object) to (a person)',  # 61-65
            'grab (a person)', 'hand clap', 'hand shake', 'hand wave', 'hug (a person)',  # 66-70
            'kick (a person)', 'kiss (a person)', 'lift (a person)', 'listen to (a person)', 'play with kids',  # 71-75
            'push (another person)', 'sing to (e.g., self, a person, a group)', 'take (an object) from (a person)',  # 76-78
            'talk to (e.g., self, a person, a group)', 'watch (a person)'  # 79-80
        ),
    }
}
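The `multi_hot` flag marks the key difference between the two label schemes: a UCF24 box carries exactly one action, while an AVA actor can perform several actions at once. A minimal sketch of the two encodings (helper names are illustrative, not from the repo):

```python
def one_hot(idx, num_classes):
    # Single-label encoding (UCF24): exactly one active class.
    v = [0] * num_classes
    v[idx] = 1
    return v

def multi_hot(indices, num_classes):
    # Multi-label encoding (AVA): any subset of classes may be active.
    v = [0] * num_classes
    for i in indices:
        v[i] = 1
    return v

print(one_hot(2, 5))         # [0, 0, 1, 0, 0]
print(multi_hot([0, 3], 5))  # [1, 0, 0, 1, 0]
```

The choice also dictates the classification loss: one-hot targets pair with a softmax over classes, multi-hot targets with per-class sigmoids.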
84 changes: 84 additions & 0 deletions config/yowo_v2_config.py
# Model configuration


yowo_v2_config = {
    'yowo_v2_nano': {
        # backbone
        ## 2D
        'backbone_2d': 'yolo_free_nano',
        'pretrained_2d': True,
        'stride': [8, 16, 32],
        ## 3D
        'backbone_3d': 'shufflenetv2',
        'model_size': '1.0x',
        'pretrained_3d': True,
        'memory_momentum': 0.9,
        # head
        'head_dim': 64,
        'head_norm': 'BN',
        'head_act': 'lrelu',
        'num_cls_heads': 2,
        'num_reg_heads': 2,
        'head_depthwise': True,
    },

    'yowo_v2_tiny': {
        # backbone
        ## 2D
        'backbone_2d': 'yolo_free_tiny',
        'pretrained_2d': True,
        'stride': [8, 16, 32],
        ## 3D
        'backbone_3d': 'shufflenetv2',
        'model_size': '2.0x',
        'pretrained_3d': True,
        'memory_momentum': 0.9,
        # head
        'head_dim': 64,
        'head_norm': 'BN',
        'head_act': 'lrelu',
        'num_cls_heads': 2,
        'num_reg_heads': 2,
        'head_depthwise': False,
    },

    'yowo_v2_medium': {
        # backbone
        ## 2D
        'backbone_2d': 'yolo_free_large',
        'pretrained_2d': True,
        'stride': [8, 16, 32],
        ## 3D
        'backbone_3d': 'shufflenetv2',
        'model_size': '2.0x',
        'pretrained_3d': True,
        'memory_momentum': 0.9,
        # head
        'head_dim': 128,
        'head_norm': 'BN',
        'head_act': 'silu',
        'num_cls_heads': 2,
        'num_reg_heads': 2,
        'head_depthwise': False,
    },

    'yowo_v2_large': {
        # backbone
        ## 2D
        'backbone_2d': 'yolo_free_large',
        'pretrained_2d': True,
        'stride': [8, 16, 32],
        ## 3D
        'backbone_3d': 'resnext101',
        'pretrained_3d': True,
        'memory_momentum': 0.9,
        # head
        'head_dim': 256,
        'head_norm': 'BN',
        'head_act': 'silu',
        'num_cls_heads': 2,
        'num_reg_heads': 2,
        'head_depthwise': False,
    },

}
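Each variant pairs a 2D backbone (per-frame detection features) with a 3D backbone (clip-level motion features), and the head width grows with model size. A trimmed-down illustration of reading those pairings (the two-entry dict below is a stand-in for the full config above):

```python
# Trimmed-down stand-in for yowo_v2_config: each variant names a
# 2D backbone, a 3D backbone, and a detection-head width.
yowo_v2_config = {
    'yowo_v2_nano':  {'backbone_2d': 'yolo_free_nano',  'backbone_3d': 'shufflenetv2', 'head_dim': 64},
    'yowo_v2_large': {'backbone_2d': 'yolo_free_large', 'backbone_3d': 'resnext101',   'head_dim': 256},
}

for name, cfg in sorted(yowo_v2_config.items()):
    print('{}: 2D={}, 3D={}, head_dim={}'.format(
        name, cfg['backbone_2d'], cfg['backbone_3d'], cfg['head_dim']))
```

Keeping both backbones behind one config key is what lets a single `-v` flag swap the entire model family at the command line.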
Empty file added dataset/__init__.py
Empty file.