roytseng-tw · adityaarun1 · May 23, 2018 · May 24, 2018 · May 24, 2018 · May 24, 2018
diff --git a/.gitignore b/.gitignore
@@ -9,6 +9,8 @@ data/*
 notebooks/*.pkl
 
 /Outputs
+lib/build
+lib/detectron_pytorch.egg-info
 
 # ------------------------------
 

diff --git a/README.md b/README.md
@@ -18,7 +18,7 @@
 
 </div>
 
-**This code follows the implementation architecture of Detectron.** Only part of the functionality is supported. Check [this section](#supported-network-modules) for more information.
+**This code follows the implementation architecture of Detectron.** Only part of the functionality is supported. Check [this section](#supported-network-modules) for more information. This code now supports **PyTorch 1.0** and **TorchVision 0.3**.
 
 With this code, you can...
 
@@ -35,7 +35,7 @@ This implementation has the following features:
 
 - **It supports multiple GPUs training**.
 
-- **It supports three pooling methods**. Notice that only **roi align** is revised to match the implementation in Caffe2. So, use it.
+- **It supports two pooling methods**. Notice that only **roi align** is revised to match the implementation in Caffe2. So, use it.
 
 - **It is memory efficient**. For data batching, there are two techiniques available to reduce memory usage: 1) *Aspect grouping*: group images with similar aspect ratio in a batch 2) *Aspect cropping*: crop images that are too long. Aspect grouping is implemented in Detectron, so it's used for default. Aspect cropping is the idea from [jwyang/faster-rcnn.pytorch](https://github.com/jwyang/faster-rcnn.pytorch), and it's not used for default.
 
@@ -46,6 +46,9 @@ This implementation has the following features:
 - (2018/05/25) Support ResNeXt backbones.
 - (2018/05/22) Add group normalization baselines.
 - (2018/05/15) PyTorch0.4 is supported now !
+- (2019/08/28) Support PASCAL VOC and Custom Dataset
+- (2019/01/17) **PyTorch 1.0 Supported now!**
+- (2019/05/30) Code rebased on **TorchVision 0.3**. Compilation is now optional!
 
 ## Getting Started
 Clone the repo:
@@ -59,9 +62,9 @@ git clone https://github.com/roytseng-tw/mask-rcnn.pytorch.git
 Tested under python3.
 
 - python packages
-  - pytorch>=0.3.1
-  - torchvision>=0.2.0
-  - cython
+  - pytorch>=1.0.0
+  - torchvision>=0.3.0
+  - cython>=0.29.2
   - matplotlib
   - numpy
   - scipy
@@ -70,10 +73,10 @@ Tested under python3.
   - packaging
   - [pycocotools](https://github.com/cocodataset/cocoapi)  — for COCO dataset, also available from pip.
   - tensorboardX  — for logging the losses in Tensorboard
-- An NVIDAI GPU and CUDA 8.0 or higher. Some operations only have gpu implementation.
+- An NVIDIA GPU and CUDA 8.0 or higher. Some operations only have gpu implementation.
 - **NOTICE**: different versions of Pytorch package have different memory usages.
 
-### Compilation
+### Compilation [Optional]
 
 Compile the CUDA code:
 
@@ -82,9 +85,7 @@ cd lib  # please change to this directory
 sh make.sh
 ```
 
-If your are using Volta GPUs, uncomment this [line](https://github.com/roytseng-tw/mask-rcnn.pytorch/tree/master/lib/make.sh#L15) in `lib/mask.sh` and remember to postpend a backslash at the line above. `CUDA_PATH` defaults to `/usr/loca/cuda`. If you want to use a CUDA library on different path, change this [line](https://github.com/roytseng-tw/mask-rcnn.pytorch/tree/master/lib/make.sh#L3) accordingly.
-
-It will compile all the modules you need, including NMS, ROI_Pooing, ROI_Crop and ROI_Align. (Actually gpu nms is never used ...)
+It will compile all the modules you need, including NMS. (Actually gpu nms is never used ...)
 
 Note that, If you use `CUDA_VISIBLE_DEVICES` to set gpus, **make sure at least one gpu is visible when compile the code.**
 
@@ -116,7 +117,7 @@ mkdir data
       ├── train2014
       ├── train2017
       ├── val2014
-      ├──val2017
+      ├── val2017
       ├── ...
   ```
   Download coco mini annotations from [here](https://s3-us-west-2.amazonaws.com/detectron/coco/coco_annotations_minival.tgz).
@@ -127,8 +128,40 @@ mkdir data
    ```
    ln -s path/to/coco data/coco
    ```
+
+- **PASCAL VOC 2007 + 12**
+  Please follow the instructions in [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn#beyond-the-demo-installation-for-training-and-testing-models) to prepare the VOC datasets; any equivalent guide works as well. After downloading the data, create symlinks in the `data/VOC<year>` folder as follows:
+  ```
+  VOCdevkitPATH=/path/to/voc_devkit
+  mkdir -p $DETECTRON.PYTORCH/data/VOC<year>
+  ln -s ${VOCdevkitPATH}/VOC<year>/JPEGImages $DETECTRON.PYTORCH/data/VOC<year>/JPEGImages
+  ln -s ${VOCdevkitPATH}/VOC<year>/json_annotations $DETECTRON.PYTORCH/data/VOC<year>/annotations
+  ln -s ${VOCdevkitPATH} $DETECTRON.PYTORCH/data/VOC<year>/VOCdevkit<year>
+  ```
+  The directory structure of `JPEGImages` and `annotations` should be as follows,
+  ```
+  VOC<year>
+  ├── annotations
+  │   ├── train.json
+  │   ├── trainval.json
+  │   ├── test.json
+  │   ├── ...
+  │
+  └── JPEGImages
+      ├── <im-1-name>.jpg
+      ├── ...
+      ├── <im-N-name>.jpg
+  ```
+  **NOTE:** The `annotations` folder must contain the PASCAL VOC annotations in COCO JSON format, which are available for download [here](https://storage.googleapis.com/coco-dataset/external/PASCAL_VOC.zip). You can also convert the XML annotation files to JSON by running the following script:
+  ```
+  python tools/pascal_voc_xml2coco_json_converter.py $VOCdevkitPATH $year
+  ```
+  (To run the script above successfully, you need to update the full paths to the respective folders inside the script.)
+
+- **Custom Dataset**
+  As above, create a directory named `CustomData` in the `data` folder and add symlinks to the `annotations` and `JPEGImages` directories, as shown for the PASCAL VOC dataset. You also need to link the custom dataset devkit to `CustomDataDevkit`.
 
-  Recommend to put the images on a SSD for possible better training performance
+We recommend putting the images on an SSD for potentially better training performance.
 
 ### Pretrained Model
 
@@ -200,7 +233,11 @@ Use `--bs` to overwrite the default batch size to a proper value that fits into
 
 Specify `—-use_tfboard` to log the losses on Tensorboard.
 
-**NOTE**: use `--dataset keypoints_coco2017` when training for keypoint-rcnn.
+**NOTE**: 
+  - use `--dataset keypoints_coco2017` when training for keypoint-rcnn.
+  - use `--dataset voc2007` when training for PASCAL VOC 2007.
+  - use `--dataset voc2012` when training for PASCAL VOC 2012.
+  - use `--dataset custom_dataset --num_classes $NUM_CLASSES` when training for your custom dataset. Here, `$NUM_CLASSES` is the number of object classes in your custom dataset **+ 1** (for the background class).
 
 ### The use of `--iter_size`
 As in Caffe, update network once (`optimizer.step()`) every `iter_size` iterations (forward + backward). This way to have a larger effective batch size for training. Notice that, step count is only increased after network update.
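Note on `--iter_size`: the accumulation scheme described in the README can be illustrated without PyTorch. The toy sketch below is not code from this repository; the function names, the scalar model, and the division by `iter_size` are assumptions made for illustration. It shows that accumulating gradients over several equally sized mini-batches before a single update matches one update on the combined batch, and that the step counter advances only on updates.

```python
# Toy illustration of gradient accumulation (no PyTorch required).
# Model: y = w * x, loss = mean squared error. All names are hypothetical.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over `batch`."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, batch_size, iter_size, lr=0.01, w=0.0):
    """One pass over `data`, updating w once every `iter_size` mini-batches."""
    step = 0            # counts network updates, not forward/backward passes
    accumulated = 0.0
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for i, batch in enumerate(batches, start=1):
        # Assumption: scale by 1/iter_size so the accumulated value is an average.
        accumulated += grad_mse(w, batch) / iter_size
        if i % iter_size == 0:
            w -= lr * accumulated   # the analogue of optimizer.step()
            accumulated = 0.0
            step += 1
    return w, step

data = [(float(x), 3.0 * x) for x in range(1, 9)]   # 8 samples of y = 3x

# Four mini-batches of 2 accumulated into one update ...
w_accum, steps_accum = train(data, batch_size=2, iter_size=4)
# ... versus a single batch of 8 with an update after every batch.
w_big, steps_big = train(data, batch_size=8, iter_size=1)
```

Both runs perform exactly one update and arrive at the same weight up to floating-point rounding, which is why the effective batch size is `batch_size * iter_size`.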

diff --git a/lib/datasets/dataset_catalog.py b/lib/datasets/dataset_catalog.py
@@ -193,6 +193,14 @@
         ANN_FN:
             _DATA_DIR + '/coco/annotations/image_info_test2017.json'
     },
+    'voc_2007_train': {
+        IM_DIR:
+            _DATA_DIR + '/VOC2007/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/VOC2007/annotations/voc_2007_train.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/VOC2007/VOCdevkit2007'
+    },
     'voc_2007_trainval': {
         IM_DIR:
             _DATA_DIR + '/VOC2007/JPEGImages',
@@ -209,12 +217,44 @@
         DEVKIT_DIR:
             _DATA_DIR + '/VOC2007/VOCdevkit2007'
     },
+    'voc_2012_train': {
+        IM_DIR:
+            _DATA_DIR + '/VOC2012/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/VOC2012/annotations/train.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/VOC2012/VOCdevkit2012'
+    },
     'voc_2012_trainval': {
         IM_DIR:
             _DATA_DIR + '/VOC2012/JPEGImages',
         ANN_FN:
-            _DATA_DIR + '/VOC2012/annotations/voc_2012_trainval.json',
+            _DATA_DIR + '/VOC2012/annotations/trainval.json',
         DEVKIT_DIR:
             _DATA_DIR + '/VOC2012/VOCdevkit2012'
+    },
+    'custom_data_train': {
+        IM_DIR:
+            _DATA_DIR + '/CustomData/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/CustomData/annotations/train.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/CustomData/CustomDataDevkit'
+    },
+    'custom_data_trainval': {
+        IM_DIR:
+            _DATA_DIR + '/CustomData/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/CustomData/annotations/trainval.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/CustomData/CustomDataDevkit'
+    },
+    'custom_data_test': {
+        IM_DIR:
+            _DATA_DIR + '/CustomData/JPEGImages',
+        ANN_FN:
+            _DATA_DIR + '/CustomData/annotations/test.json',
+        DEVKIT_DIR:
+            _DATA_DIR + '/CustomData/CustomDataDevkit'
     }
 }
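Note on the new catalog entries: each entry points at an image directory, an annotation file, and a devkit directory, so a missing symlink only shows up once training starts. The sketch below is a hypothetical pre-flight check, not part of this change. The string values of `IM_DIR`, `ANN_FN`, and `DEVKIT_DIR`, as well as the helpers `make_entry` and `missing_paths`, are assumptions for illustration.

```python
import os
import tempfile

# Assumed key values; the diff only shows the constant names.
IM_DIR = 'image_directory'
ANN_FN = 'annotation_file'
DEVKIT_DIR = 'devkit_directory'

def make_entry(data_dir, folder, split):
    """Build an entry shaped like the 'custom_data_*' entries in the diff."""
    return {
        IM_DIR: os.path.join(data_dir, folder, 'JPEGImages'),
        ANN_FN: os.path.join(data_dir, folder, 'annotations', split + '.json'),
        DEVKIT_DIR: os.path.join(data_dir, folder, 'CustomDataDevkit'),
    }

def missing_paths(entry):
    """Return the keys of `entry` whose paths do not exist on disk."""
    return sorted(key for key, path in entry.items() if not os.path.exists(path))

# Demo: only the image directory exists, so the other two keys are reported.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, 'CustomData', 'JPEGImages'))
    entry = make_entry(data_dir, 'CustomData', 'train')
    missing = missing_paths(entry)
```

Running such a check against the real catalog before training would flag a mistyped folder name (for example `CustomDataset` instead of `CustomData`) early.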
diff --git a/lib/datasets/voc_eval.py b/lib/datasets/voc_eval.py
@@ -136,11 +136,11 @@ def voc_eval(detpath,
                         i + 1, len(imagenames)))
         # save
         logger.info('Saving cached annotations to {:s}'.format(cachefile))
-        with open(cachefile, 'w') as f:
+        with open(cachefile, 'wb') as f:
             cPickle.dump(recs, f)
     else:
         # load
-        with open(cachefile, 'r') as f:
+        with open(cachefile, 'rb') as f:
             recs = cPickle.load(f)
 
     # extract gt objects for this class
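Note on the `'wb'` / `'rb'` change: under Python 3, pickle reads and writes bytes, so the cache file has to be opened in binary mode. The sketch below is a standalone illustration using the standard `pickle` module (the assumption being that `cPickle` in `voc_eval.py` resolves to it on Python 3); the record contents and file name are made up.

```python
import os
import pickle
import tempfile

# Made-up annotation record, shaped loosely like a per-image list of objects.
recs = {'000001': [{'name': 'dog', 'bbox': [48, 240, 195, 371]}]}

with tempfile.TemporaryDirectory() as tmp:
    cachefile = os.path.join(tmp, 'annots.pkl')

    # Binary mode round-trips the data.
    with open(cachefile, 'wb') as f:
        pickle.dump(recs, f)
    with open(cachefile, 'rb') as f:
        loaded = pickle.load(f)

    # Text mode fails on Python 3 because pickle writes bytes, not str.
    try:
        with open(cachefile, 'w') as f:
            pickle.dump(recs, f)
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = exc
```

A cache file left behind by a failed text-mode write may be empty or truncated, so deleting any stale cache before re-running evaluation is a reasonable precaution.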

diff --git a/lib/make.sh b/lib/make.sh
@@ -1,60 +1,6 @@
 #!/usr/bin/env bash
 
-CUDA_PATH=/usr/local/cuda/
 
 python setup.py build_ext --inplace
 rm -rf build
 
-# Choose cuda arch as you need
-CUDA_ARCH="-gencode arch=compute_30,code=sm_30 \
-           -gencode arch=compute_35,code=sm_35 \
-           -gencode arch=compute_50,code=sm_50 \
-           -gencode arch=compute_52,code=sm_52 \
-           -gencode arch=compute_60,code=sm_60 \
-           -gencode arch=compute_61,code=sm_61 "
-#          -gencode arch=compute_70,code=sm_70 "
-
-# compile NMS
-cd model/nms/src
-echo "Compiling nms kernels by nvcc..."
-nvcc -c -o nms_cuda_kernel.cu.o nms_cuda_kernel.cu \
-	 -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CUDA_ARCH
-
-cd ../
-python build.py
-
-# compile roi_pooling
-cd ../../
-cd model/roi_pooling/src
-echo "Compiling roi pooling kernels by nvcc..."
-nvcc -c -o roi_pooling.cu.o roi_pooling_kernel.cu \
-	 -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CUDA_ARCH
-cd ../
-python build.py
-
-# # compile roi_align
-# cd ../../
-# cd model/roi_align/src
-# echo "Compiling roi align kernels by nvcc..."
-# nvcc -c -o roi_align_kernel.cu.o roi_align_kernel.cu \
-# 	 -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CUDA_ARCH
-# cd ../
-# python build.py
-
-# compile roi_crop
-cd ../../
-cd model/roi_crop/src
-echo "Compiling roi crop kernels by nvcc..."
-nvcc -c -o roi_crop_cuda_kernel.cu.o roi_crop_cuda_kernel.cu \
-	 -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CUDA_ARCH
-cd ../
-python build.py
-
-# compile roi_align (based on Caffe2's implementation)
-cd ../../
-cd modeling/roi_xfrom/roi_align/src
-echo "Compiling roi align kernels by nvcc..."
-nvcc -c -o roi_align_kernel.cu.o roi_align_kernel.cu \
-	 -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CUDA_ARCH
-cd ../
-python build.py
diff --git a/lib/model/__init__.py b/lib/model/__init__.py
diff --git a/lib/model/nms/.gitignore b/lib/model/nms/.gitignore
diff --git a/lib/model/nms/__init__.py b/lib/model/nms/__init__.py
diff --git a/lib/model/nms/_ext/__init__.py b/lib/model/nms/_ext/__init__.py
diff --git a/lib/model/nms/_ext/nms/__init__.py b/lib/model/nms/_ext/nms/__init__.py
diff --git a/lib/model/nms/build.py b/lib/model/nms/build.py
diff --git a/lib/model/nms/make.sh b/lib/model/nms/make.sh
diff --git a/lib/model/nms/nms_gpu.py b/lib/model/nms/nms_gpu.py