init

csuhan · Mar 22, 2022 · 2373d1b
commit 2373d1b
Showing 61 changed files with 119,035 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,60 @@
+# output dir
+output*
+instant_test_output
+inference_test_output
+
+
+# *.png
+# *.json
+*.diff
+# *.jpg
+!/projects/DensePose/doc/images/*.jpg
+
+# compilation and distribution
+__pycache__
+_ext
+*.pyc
+*.pyd
+*.so
+*.dll
+*.egg-info/
+build/
+dist/
+wheels/
+
+# pytorch/python/numpy formats
+*.pth
+*.pkl
+*.npy
+*.ts
+model_ts*.txt
+
+# ipython/jupyter notebooks
+*.ipynb
+**/.ipynb_checkpoints/
+
+# Editor temporaries
+*.swn
+*.swo
+*.swp
+*~
+
+# editor settings
+.idea
+.vscode
+_darcs
+
+# project dirs
+/detectron2/model_zoo/configs
+# /datasets/*
+!/datasets/*.*
+/projects/*/datasets
+/models
+/snippet
+res*
+/checkpoints
+detectron2
+results
+checkpoints
+demo_images
+exps
diff --git a/README.md b/README.md
@@ -0,0 +1,109 @@
+## OpenDet
+
+<img src="./docs/opendet2.png" width="78%"/>
+
+> **Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022)**<br>
+> [Jiaming Han](https://csuhan.com), [Yuqiang Ren](https://github.com/Anymake), [Jian Ding](https://dingjiansw101.github.io), [Xingjia Pan](https://scholar.google.com.hk/citations?user=NaSU3eIAAAAJ&hl=zh-CN), Ke Yan, [Gui-Song Xia](http://www.captain-whu.com/xia_En.html).<br>
+> [arXiv preprint](https://csuhan.com/attaches/cvpr_3605_final.pdf).
+
+OpenDet2 is an implementation of OpenDet based on [detectron2](https://github.com/facebookresearch/detectron2).
+
+### Setup
+
+The code is based on [detectron2 v0.5](https://github.com/facebookresearch/detectron2/tree/v0.5). 
+
+* **Installation** 
+
+Here is a from-scratch setup script.
+
+```
+conda create -n opendet2 python=3.8 -y
+conda activate opendet2
+
+conda install pytorch=1.8.1 torchvision cudatoolkit=10.1 -c pytorch -y
+pip install detectron2==0.5 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
+git clone https://github.com/csuhan/opendet2.git
+cd opendet2
+pip install -v -e .
+```
+
+* **Prepare datasets** 
+
+Follow [datasets/README.md](datasets/README.md) to prepare the datasets, then generate the VOC-COCO datasets:
+
+```
+bash datasets/opendet2_utils/prepare_openset_voc_coco.sh
+# use the data splits we provide
+cp -rf datasets/voc_coco_ann datasets/voc_coco
+```
+
+### Model Zoo
+
+We report the results on VOC and VOC-COCO-20, and provide pretrained models. Please refer to the corresponding log file for full results.
+
+* **Faster R-CNN**
+
+| Method  | backbone | mAP<sub>K&uarr;</sub>(VOC) | WI<sub>&darr;</sub> | AOSE<sub>&darr;</sub> | mAP<sub>K&uarr;</sub> | AP<sub>U&uarr;</sub> |   Download   |
+|---------|:--------:|:--------------------------:|:-------------------:|:---------------------:|:---------------------:|:--------------------:|:------------:|
+| FR-CNN  |   R-50   |            80.06           |        19.50        |         16518         |         58.36         |           0          | [config](configs/faster_rcnn_R_50_FPN_3x_baseline.yaml) [model](https://drive.google.com/drive/folders/10uFOLLCK4N8te08-C-olRyDV-cJ-L6lU?usp=sharing) |
+| PROSER  |   R-50   |            79.42           |        20.44        |         14266         |         56.72         |         16.99        | [config](configs/faster_rcnn_R_50_FPN_3x_proser.yaml) [model](https://drive.google.com/drive/folders/1_L85gisyvDtBXPe2UbI49vrd5FoBIOI_?usp=sharing) |
+| ORE     |   R-50   |            79.80           |        18.18        |         12811         |         58.25         |         2.60         | [config]() [model]() |
+| DS      |   R-50   |            79.70           |        16.76        |         13062         |         58.46         |         8.75         | [config](configs/faster_rcnn_R_50_FPN_3x_ds.yaml) [model](https://drive.google.com/drive/folders/1OWDjL29E2H-_lSApXqM2r8PS7ZvUNtiv?usp=sharing) |
+| OpenDet |   R-50   |            80.02           |        12.50        |         10758         |         58.64         |         14.38        | [config](configs/faster_rcnn_R_50_FPN_3x_opendet.yaml) [model](https://drive.google.com/drive/folders/10uFOLLCK4N8te08-C-olRyDV-cJ-L6lU?usp=sharing) |
+| OpenDet |  Swin-T  |            83.29           |        10.76        |          9149         |         63.42         |         16.35        | [config](configs/faster_rcnn_Swin_T_FPN_3x_opendet.yaml) [model](https://drive.google.com/drive/folders/1j5SkEzeqr0ZnGVVZ4mzXSOvookHfvVvm?usp=sharing) |
+
+* **RetinaNet**
+
+| Method         | mAP<sub>K&uarr;</sub>(VOC) | WI<sub>&darr;</sub> | AOSE<sub>&darr;</sub> | mAP<sub>K&uarr;</sub> | AP<sub>U&uarr;</sub> |     Download     |
+|----------------|:--------------------------:|:-------------------:|:---------------------:|:---------------------:|:--------------------:|:----------------:|
+| RetinaNet      |            79.63           |        14.16        |         36531         |         57.32         |           0          | [config](configs/retinanet_R_50_FPN_3x_baseline.yaml) [model](https://drive.google.com/drive/folders/15fHfyA2HuXp6LfdTMBuHG6ZwtLcgvD-p?usp=sharing) |
+| Open-RetinaNet |            79.64           |        10.74        |         17208         |         57.32         |         10.55        | [config](configs/retinanet_R_50_FPN_3x_opendet.yaml) [model](https://drive.google.com/drive/folders/1uLRZ5bdGaoORWaP2huiyL_WyLicmWT4G?usp=sharing) |
+
+
+**Note**:
+* If you cannot access Google Drive, a BaiduYun download link is available [here](https://pan.baidu.com/s/1I4Pp40pM84aeYTNeGc0kPA) with extraction code ABCD.
+* The results above come from a reimplementation, so they differ slightly from those reported in our paper.
+* The official code of ORE is at [OWOD](https://github.com/JosephKJ/OWOD). We do not plan to include ORE in our code. 
+
+### Online Demo
+
+Try our online demo at [huggingface space](https://huggingface.co/spaces/csuhan/opendet2).
+
+### Train and Test
+
+* **Testing**
+
+First, download the pretrained weights from the model zoo, e.g., [OpenDet](https://drive.google.com/drive/folders/10uFOLLCK4N8te08-C-olRyDV-cJ-L6lU?usp=sharing).
+
+Then, run the following command:
+```
+python tools/train_net.py --num-gpus 8 --config-file configs/faster_rcnn_R_50_FPN_3x_opendet.yaml \
+        --eval-only MODEL.WEIGHTS output/faster_rcnn_R_50_FPN_3x_opendet/model_final.pth
+```
+
+* **Training**
+
+The training process is the same as in detectron2.
+```
+python tools/train_net.py --num-gpus 8 --config-file configs/faster_rcnn_R_50_FPN_3x_opendet.yaml
+```
+
+To train with the Swin-T backbone, please download [swin_tiny_patch4_window7_224.pth](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth) and convert it to detectron2's format using [tools/convert_swin_to_d2.py](tools/convert_swin_to_d2.py).
+```
+wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth
+python tools/convert_swin_to_d2.py swin_tiny_patch4_window7_224.pth swin_tiny_patch4_window7_224_d2.pth
+```
+
+
+### Citation
+
+If you find our work useful for your research, please consider citing:
+
+```BibTeX
+@InProceedings{han2022opendet,
+    author    = {Han, Jiaming and Ren, Yuqiang and Ding, Jian and Pan, Xingjia and Yan, Ke and Xia, Gui-Song},
+    title     = {Expanding Low-Density Latent Regions for Open-Set Object Detection},
+    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+    year      = {2022}
+}
+```
diff --git a/app.py b/app.py
@@ -0,0 +1,56 @@
+"""
+Online demo at huggingface.
+The link is: https://huggingface.co/spaces/csuhan/opendet2
+"""
+import os
+os.system('pip install torch==1.9 torchvision')
+os.system('pip install detectron2==0.5 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html')
+os.system('pip install timm opencv-python-headless')
+
+
+import gradio as gr
+
+from demo.predictor import VisualizationDemo
+from detectron2.config import get_cfg
+from opendet2 import add_opendet_config
+
+
+model_cfgs = {
+    "FR-CNN": ["configs/faster_rcnn_R_50_FPN_3x_baseline.yaml", "frcnn_r50.pth"],
+    "OpenDet-R50": ["configs/faster_rcnn_R_50_FPN_3x_opendet.yaml", "opendet2_r50.pth"],
+    "OpenDet-SwinT": ["configs/faster_rcnn_Swin_T_FPN_18e_opendet_voc.yaml", "opendet2_swint.pth"],
+}
+
+
+def setup_cfg(model):
+    cfg = get_cfg()
+    add_opendet_config(cfg)
+    model_cfg = model_cfgs[model]
+    cfg.merge_from_file(model_cfg[0])
+    cfg.MODEL.WEIGHTS = model_cfg[1]
+    cfg.MODEL.DEVICE = "cpu"
+    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
+    cfg.MODEL.ROI_HEADS.VIS_IOU_THRESH = 0.8
+    cfg.freeze()
+    return cfg
+
+
+def inference(input, model):
+    cfg = setup_cfg(model)
+    demo = VisualizationDemo(cfg)
+    # `input` is the image array passed in by gradio
+    predictions, visualized_output = demo.run_on_image(input)
+    output = visualized_output.get_image()[:, :, ::-1]
+    return output
+
+
+iface = gr.Interface(
+    inference,
+    [
+        "image",
+        gr.inputs.Radio(
+            ["FR-CNN", "OpenDet-R50", "OpenDet-SwinT"], default='OpenDet-R50'),
+    ],
+    "image")
+
+iface.launch()
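Note on `app.py`: `setup_cfg` looks up the chosen model name in `model_cfgs` and takes the config path and weights file by list index. The sketch below shows one way to make that lookup self-describing and to fail with a readable message on an unknown name. `ModelEntry` and `resolve_model` are hypothetical names used for illustration only; they are not part of this commit.

```python
from typing import Dict, NamedTuple


class ModelEntry(NamedTuple):
    """Hypothetical record: config path plus weights file for one demo model."""
    config_file: str
    weights: str


# Same names and paths as the model_cfgs dict in app.py.
MODEL_CFGS: Dict[str, ModelEntry] = {
    "FR-CNN": ModelEntry(
        "configs/faster_rcnn_R_50_FPN_3x_baseline.yaml", "frcnn_r50.pth"),
    "OpenDet-R50": ModelEntry(
        "configs/faster_rcnn_R_50_FPN_3x_opendet.yaml", "opendet2_r50.pth"),
    "OpenDet-SwinT": ModelEntry(
        "configs/faster_rcnn_Swin_T_FPN_18e_opendet_voc.yaml", "opendet2_swint.pth"),
}


def resolve_model(name: str) -> ModelEntry:
    """Return the entry for a demo model name, or raise a descriptive error."""
    try:
        return MODEL_CFGS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_CFGS))
        raise ValueError(
            f"unknown model {name!r}; expected one of: {known}") from None
```

With this shape, `setup_cfg` could read `entry.config_file` and `entry.weights` instead of `model_cfg[0]` and `model_cfg[1]`.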
diff --git a/configs/Base-RCNN-FPN-OPENDET.yaml b/configs/Base-RCNN-FPN-OPENDET.yaml
@@ -0,0 +1,25 @@
+_BASE_: "./Base-RCNN-FPN.yaml"
+MODEL:
+  MASK_ON: False
+  ROI_HEADS:
+    NAME: "OpenSetStandardROIHeads"
+    NUM_CLASSES: 81
+    NUM_KNOWN_CLASSES: 20
+  ROI_BOX_HEAD:
+    NAME: "FastRCNNSeparateConvFCHead"
+    OUTPUT_LAYERS: "OpenDetFastRCNNOutputLayers"
+    CLS_AGNOSTIC_BBOX_REG: True
+UPLOSS:
+  START_ITER: 100
+  SAMPLING_METRIC: "min_score"
+  TOPK: 3
+  ALPHA: 1.0
+  WEIGHT: 1.0
+ICLOSS:
+  OUT_DIM: 128
+  QUEUE_SIZE: 256
+  IN_QUEUE_SIZE: 16
+  BATCH_IOU_THRESH: 0.5
+  QUEUE_IOU_THRESH: 0.7
+  TEMPERATURE: 0.1
+  WEIGHT: 0.1
diff --git a/configs/Base-RCNN-FPN.yaml b/configs/Base-RCNN-FPN.yaml
@@ -0,0 +1,44 @@
+# The same as detectron2/configs/Base-RCNN-FPN.yaml
+MODEL:
+  META_ARCHITECTURE: "GeneralizedRCNN"
+  BACKBONE:
+    NAME: "build_resnet_fpn_backbone"
+  RESNETS:
+    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
+  FPN:
+    IN_FEATURES: ["res2", "res3", "res4", "res5"]
+  ANCHOR_GENERATOR:
+    SIZES: [[32], [64], [128], [256], [512]]  # One size for each in feature map
+    ASPECT_RATIOS: [[0.5, 1.0, 2.0]]  # Three aspect ratios (same for all in feature maps)
+  RPN:
+    IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"]
+    PRE_NMS_TOPK_TRAIN: 2000  # Per FPN level
+    PRE_NMS_TOPK_TEST: 1000  # Per FPN level
+    # Detectron1 uses 2000 proposals per-batch,
+    # (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
+    # which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
+    POST_NMS_TOPK_TRAIN: 1000
+    POST_NMS_TOPK_TEST: 1000
+  ROI_HEADS:
+    NAME: "StandardROIHeads"
+    IN_FEATURES: ["p2", "p3", "p4", "p5"]
+  ROI_BOX_HEAD:
+    NAME: "FastRCNNConvFCHead"
+    NUM_FC: 2
+    POOLER_RESOLUTION: 7
+  ROI_MASK_HEAD:
+    NAME: "MaskRCNNConvUpsampleHead"
+    NUM_CONV: 4
+    POOLER_RESOLUTION: 14
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+SOLVER:
+  IMS_PER_BATCH: 16
+  BASE_LR: 0.02
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+INPUT:
+  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
+  MIN_SIZE_TEST: 800
+VERSION: 2
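Note on `ANCHOR_GENERATOR` in the config above: each FPN level gets one anchor size, and the single list of three aspect ratios is shared by all levels, giving three anchors per location. The sketch below expands those settings into (width, height) pairs. It assumes the aspect ratio means height divided by width and that the anchor area equals the size squared, which is my understanding of detectron2's default anchor generator; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

SIZES = [[32], [64], [128], [256], [512]]   # one size per feature map
ASPECT_RATIOS = [[0.5, 1.0, 2.0]]           # shared by every feature map


def anchor_shapes(sizes: Sequence[float],
                  ratios: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (width, height) for every size/ratio pair on one level."""
    shapes = []
    for size in sizes:
        area = size ** 2
        for ratio in ratios:             # assumed: ratio = height / width
            width = math.sqrt(area / ratio)
            height = ratio * width       # keeps width * height == area
            shapes.append((width, height))
    return shapes


def anchors_per_level(sizes_per_level, ratios_per_level):
    """Expand the config lists; a single ratio list is reused for all levels."""
    if len(ratios_per_level) == 1:
        ratios_per_level = ratios_per_level * len(sizes_per_level)
    return [anchor_shapes(s, r)
            for s, r in zip(sizes_per_level, ratios_per_level)]


levels = anchors_per_level(SIZES, ASPECT_RATIOS)
```

Here `levels` holds five lists of three shapes each, and every shape on a level covers the same area.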
diff --git a/configs/Base-RetinaNet.yaml b/configs/Base-RetinaNet.yaml
@@ -0,0 +1,26 @@
+# The same as detectron2/configs/Base-RetinaNet.yaml
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  BACKBONE:
+    NAME: "build_retinanet_resnet_fpn_backbone"
+  RESNETS:
+    OUT_FEATURES: ["res3", "res4", "res5"]
+  ANCHOR_GENERATOR:
+    SIZES: !!python/object/apply:eval ["[[x, x * 2**(1.0/3), x * 2**(2.0/3) ] for x in [32, 64, 128, 256, 512 ]]"]
+  FPN:
+    IN_FEATURES: ["res3", "res4", "res5"]
+  RETINANET:
+    IOU_THRESHOLDS: [0.4, 0.5]
+    IOU_LABELS: [0, -1, 1]
+    SMOOTH_L1_LOSS_BETA: 0.0
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+SOLVER:
+  IMS_PER_BATCH: 16
+  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+INPUT:
+  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
+VERSION: 2
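Note on the RetinaNet config above: `ANCHOR_GENERATOR.SIZES` is an expression evaluated at load time, and `IOU_THRESHOLDS` with `IOU_LABELS` splits anchor-to-box IoU values into background (0), ignored (-1), and foreground (1). The sketch below spells out both in plain Python. The boundary handling (each threshold belongs to the higher interval) is an assumption based on my reading of detectron2's matcher; the function names are illustrative.

```python
import bisect
from typing import List, Sequence

IOU_THRESHOLDS = [0.4, 0.5]
IOU_LABELS = [0, -1, 1]   # background, ignored, foreground


def retinanet_anchor_sizes(
        base_sizes: Sequence[int] = (32, 64, 128, 256, 512)) -> List[List[float]]:
    """Three sizes per level, spaced a third of an octave apart."""
    return [[x, x * 2 ** (1.0 / 3), x * 2 ** (2.0 / 3)] for x in base_sizes]


def label_for_iou(iou: float,
                  thresholds: Sequence[float] = IOU_THRESHOLDS,
                  labels: Sequence[int] = IOU_LABELS) -> int:
    """Map an IoU value to its label: [0, 0.4) -> 0, [0.4, 0.5) -> -1, else 1."""
    return labels[bisect.bisect_right(thresholds, iou)]


sizes = retinanet_anchor_sizes()
```

Combined with three aspect ratios per size, this would give nine anchors per location, though the aspect ratios are not set in this file.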
diff --git a/configs/faster_rcnn_R_50_FPN_3x_baseline.yaml b/configs/faster_rcnn_R_50_FPN_3x_baseline.yaml
@@ -0,0 +1,16 @@
+_BASE_: "./Base-RCNN-FPN-OPENDET.yaml"
+MODEL:
+  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
+  RESNETS:
+    DEPTH: 50
+  ROI_BOX_HEAD:
+    OUTPUT_LAYERS: "CosineFastRCNNOutputLayers" # the baseline uses a simple cosine FRCNN
+DATASETS:
+  TRAIN: ('voc_2007_train', 'voc_2012_trainval')
+  TEST: ('voc_2007_test', 'voc_coco_20_40_test', 'voc_coco_20_60_test', 'voc_coco_20_80_test', 'voc_coco_2500_test', 'voc_coco_5000_test', 'voc_coco_10000_test', 'voc_coco_20000_test')
+SOLVER:
+  STEPS: (21000, 29000)
+  MAX_ITER: 32000
+  WARMUP_ITERS: 100
+  AMP:
+    ENABLED: True
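Note on `_BASE_`: the baseline config above only lists what differs from `Base-RCNN-FPN-OPENDET.yaml`. The base file is loaded first and the child file is merged on top, key by key, so sibling keys that the child does not mention keep their base values. The sketch below imitates that recursive merge with plain dictionaries. `merge_config` is an illustrative helper, not a detectron2 function, and the dictionaries contain only a few of the keys from the two files.

```python
import copy


def merge_config(base: dict, child: dict) -> dict:
    """Return base overlaid with child; nested dicts are merged recursively."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Excerpt of Base-RCNN-FPN-OPENDET.yaml
base = {
    "MODEL": {
        "ROI_BOX_HEAD": {
            "NAME": "FastRCNNSeparateConvFCHead",
            "OUTPUT_LAYERS": "OpenDetFastRCNNOutputLayers",
            "CLS_AGNOSTIC_BBOX_REG": True,
        },
    },
}

# Excerpt of faster_rcnn_R_50_FPN_3x_baseline.yaml
child = {
    "MODEL": {
        "RESNETS": {"DEPTH": 50},
        "ROI_BOX_HEAD": {"OUTPUT_LAYERS": "CosineFastRCNNOutputLayers"},
    },
}

merged = merge_config(base, child)
```

After the merge, `OUTPUT_LAYERS` comes from the child while `NAME` and `CLS_AGNOSTIC_BBOX_REG` still come from the base.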
diff --git a/configs/faster_rcnn_R_50_FPN_3x_ds.yaml b/configs/faster_rcnn_R_50_FPN_3x_ds.yaml
@@ -0,0 +1,18 @@
+_BASE_: "./Base-RCNN-FPN-OPENDET.yaml"
+MODEL:
+  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
+  RESNETS:
+    DEPTH: 50
+  ROI_HEADS:
+    NAME: "DropoutStandardROIHeads"
+  ROI_BOX_HEAD:
+    OUTPUT_LAYERS: "DropoutFastRCNNOutputLayers"
+DATASETS:
+  TRAIN: ('voc_2007_train', 'voc_2012_trainval')
+  TEST: ('voc_2007_test', 'voc_coco_20_40_test', 'voc_coco_20_60_test', 'voc_coco_20_80_test', 'voc_coco_2500_test', 'voc_coco_5000_test', 'voc_coco_10000_test', 'voc_coco_20000_test')
+SOLVER:
+  STEPS: (21000, 29000)
+  MAX_ITER: 32000
+  WARMUP_ITERS: 100
+  AMP:
+    ENABLED: True
diff --git a/configs/faster_rcnn_R_50_FPN_3x_opendet.yaml b/configs/faster_rcnn_R_50_FPN_3x_opendet.yaml
@@ -0,0 +1,16 @@
+_BASE_: "./Base-RCNN-FPN-OPENDET.yaml"
+MODEL:
+  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
+  RESNETS:
+    DEPTH: 50
+DATASETS:
+  TRAIN: ('voc_2007_train', 'voc_2012_trainval')
+  TEST: ('voc_2007_test', 'voc_coco_20_40_test', 'voc_coco_20_60_test', 'voc_coco_20_80_test', 'voc_coco_2500_test', 'voc_coco_5000_test', 'voc_coco_10000_test', 'voc_coco_20000_test')
+SOLVER:
+  STEPS: (21000, 29000)
+  MAX_ITER: 32000
+  WARMUP_ITERS: 100
+  AMP:
+    ENABLED: True
+
+# UPLOSS.WEIGHT: former two are 0.5, the last is 1.0
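Note on the `SOLVER` block shared by the three Faster R-CNN configs above: `STEPS: (21000, 29000)`, `MAX_ITER: 32000`, and `WARMUP_ITERS: 100`, together with `BASE_LR: 0.02` inherited from `Base-RCNN-FPN.yaml`, describe a multi-step schedule with a short warmup. The sketch below computes the learning rate at a given iteration. The decay factor of 0.1, the warmup start factor of 1/1000, and the linear warmup shape are not set in this diff; they are assumed from detectron2's defaults as I understand them. `lr_at` is an illustrative name.

```python
import bisect
from typing import Sequence


def lr_at(iteration: int,
          base_lr: float = 0.02,                  # BASE_LR from Base-RCNN-FPN.yaml
          steps: Sequence[int] = (21000, 29000),  # SOLVER.STEPS
          warmup_iters: int = 100,                # SOLVER.WARMUP_ITERS
          gamma: float = 0.1,                     # assumed default decay factor
          warmup_factor: float = 1.0 / 1000,      # assumed default warmup start
          ) -> float:
    """Learning rate under linear warmup followed by step decay."""
    if iteration < warmup_iters:
        alpha = iteration / warmup_iters
        warmup = warmup_factor * (1 - alpha) + alpha
    else:
        warmup = 1.0
    # Count how many decay steps have been reached at this iteration.
    decays = bisect.bisect_right(steps, iteration)
    return base_lr * warmup * gamma ** decays


schedule = {it: lr_at(it) for it in (0, 50, 100, 21000, 29000, 31999)}
```

Under these assumptions the rate climbs from 0.00002 to 0.02 over the first 100 iterations, then drops to 0.002 at iteration 21000 and to 0.0002 at iteration 29000.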