diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 086f417804f..fcc3af1fce3 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -13,17 +13,19 @@ All kinds of contributions are welcome, including but not limited to the followi 4. create a PR Note + - If you plan to add some new features that involve large changes, it is encouraged to open an issue for discussion first. -- If you are the author of some papers and would like to include your method to mmdetection, -please let us know (open an issue or contact the maintainers). We will much appreciate your contribution. +- If you are the author of some papers and would like to include your method in MMDetection, please let us know (open an issue or contact the maintainers). We would much appreciate your contribution. - For new features and new modules, unit tests are required to improve the code's robustness. ## Code style ### Python + We adopt [PEP8](https://www.python.org/dev/peps/pep-0008/) as the preferred code style. We use the following tools for linting and formatting: + - [flake8](http://flake8.pycqa.org/en/latest/): linter - [yapf](https://github.com/google/yapf): formatter - [isort](https://github.com/timothycrosley/isort): sort imports @@ -36,19 +38,33 @@ The config for a pre-commit hook is stored in [.pre-commit-config](../.pre-commi After you clone the repository, you will need to install initialize pre-commit hook. -``` +```shell pip install -U pre-commit ``` From the repository folder -``` + +```shell pre-commit install ``` -After this on every commit check code linters and formatter will be enforced. +If you run into issues when installing markdownlint, you may install Ruby for markdownlint as follows: + +```shell +# install rvm +curl -L https://get.rvm.io | bash -s -- --autolibs=read-fail +# set up environment +echo 'source $HOME/.bash_profile' >> ~/.bashrc +source ~/.profile +rvm autolibs disable +# install ruby +rvm install 2.7.1 +``` +After this, the code linters and formatter will be enforced on every commit. >Before you create a PR, make sure that your code lints and is formatted by yapf. ### C++ and CUDA + We follow the [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). diff --git a/.github/ISSUE_TEMPLATE/error-report.md b/.github/ISSUE_TEMPLATE/error-report.md index acdfd352148..ec28669514c 100644 --- a/.github/ISSUE_TEMPLATE/error-report.md +++ b/.github/ISSUE_TEMPLATE/error-report.md @@ -10,6 +10,7 @@ assignees: '' Thanks for your error report and we appreciate it a lot. **Checklist** + 1. I have searched related issues but cannot get the expected help. 2. The bug has not been fixed in the latest version. @@ -17,10 +18,13 @@ Thanks for your error report and we appreciate it a lot. A clear and concise description of what the bug is. **Reproduction** + 1. What command or script did you run? -``` + +```none A placeholder for the command. ``` + 2. Did you make any modifications on the code or config? Did you understand what you have modified? 3. What dataset did you use? @@ -33,7 +37,8 @@ A placeholder for the command. **Error traceback** If applicable, paste the error trackback here. -``` + +```none A placeholder for trackback. 
``` diff --git a/.github/ISSUE_TEMPLATE/reimplementation_questions.md b/.github/ISSUE_TEMPLATE/reimplementation_questions.md index 58ffdeb3f0f..6b358387701 100644 --- a/.github/ISSUE_TEMPLATE/reimplementation_questions.md +++ b/.github/ISSUE_TEMPLATE/reimplementation_questions.md @@ -10,17 +10,20 @@ assignees: '' **Notice** There are several common situations in the reimplementation issues as below + 1. Reimplement a model in the model zoo using the provided configs 2. Reimplement a model in the model zoo on other dataset (e.g., custom datasets) 3. Reimplement a custom model but all the components are implemented in MMDetection 4. Reimplement a custom model with new modules implemented by yourself There are several things to do for different cases as below. + - For case 1 & 3, please follow the steps in the following sections thus we could help to quick identify the issue. - For case 2 & 4, please understand that we are not able to do much help here because we usually do not know the full code and the users should be responsible to the code they write. - One suggestion for case 2 & 4 is that the users should first check whether the bug lies in the self-implemented code or the original code. For example, users can first make sure that the same model runs well on supported datasets. If you still need help, please describe what you have done and what you obtain in the issue, and follow the steps in the following sections and try as clear as possible so that we can better help you. **Checklist** + 1. I have searched related issues but cannot get the expected help. 2. The issue has not been fixed in the latest version. @@ -29,14 +32,19 @@ There are several things to do for different cases as below. A clear and concise description of what the problem you meet and what have you done. **Reproduction** + 1. What command or script did you run? -``` + +```none A placeholder for the command. ``` + 2. What config dir you run? -``` + +```none A placeholder for the config. ``` + 3. Did you make any modifications on the code or config? Did you understand what you have modified? 4. What dataset did you use? @@ -44,13 +52,14 @@ A placeholder for the config. 1. Please run `python mmdet/utils/collect_env.py` to collect necessary environment information and paste it here. 2. You may add addition that may be helpful for locating the problem, such as - - How you installed PyTorch [e.g., pip, conda, source] - - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.) + 1. How you installed PyTorch [e.g., pip, conda, source] + 2. Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.) **Results** If applicable, paste the related results here, e.g., what you expect and what you get. 
-``` + +```none A placeholder for results comparison ``` diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 9e6d30895b0..35566b35337 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -28,6 +28,11 @@ repos: args: ["--remove"] - id: mixed-line-ending args: ["--fix=lf"] + - repo: https://github.com/jumanjihouse/pre-commit-hooks + rev: 2.1.4 + hooks: + - id: markdownlint + args: ["-r", "~MD002,~MD013,~MD024,~MD029,~MD033,~MD034,~MD036"] - repo: https://github.com/myint/docformatter rev: v1.3.1 hooks: diff --git a/README.md b/README.md index 1a01c4e2c16..7c698d77868 100644 --- a/README.md +++ b/README.md @@ -51,6 +51,7 @@ A comparison between v1.x and v2.0 codebases can be found in [compatibility.md]( Results and models are available in the [model zoo](docs/model_zoo.md). Supported backbones: + - [x] ResNet - [x] ResNeXt - [x] VGG @@ -60,6 +61,7 @@ Supported backbones: - [x] ResNeSt Supported methods: + - [x] [RPN](configs/rpn) - [x] [Fast R-CNN](configs/fast_rcnn) - [x] [Faster R-CNN](configs/faster_rcnn) diff --git a/configs/atss/README.md b/configs/atss/README.md index b34307f436d..99f571652c5 100644 --- a/configs/atss/README.md +++ b/configs/atss/README.md @@ -1,9 +1,8 @@ # Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection - ## Introduction -``` +```latex @article{zhang2019bridging, title = {Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection}, author = {Zhang, Shifeng and Chi, Cheng and Yao, Yongqiang and Lei, Zhen and Li, Stan Z.}, @@ -12,7 +11,6 @@ } ``` - ## Results and Models | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | diff --git a/configs/cascade_rcnn/README.md b/configs/cascade_rcnn/README.md index 3ad625a8e4a..74c99066114 100644 --- a/configs/cascade_rcnn/README.md +++ b/configs/cascade_rcnn/README.md @@ -1,7 +1,8 @@ # Cascade R-CNN: High Quality Object Detection and Instance Segmentation ## Introduction -``` + +```latex @article{Cai_2019, title={Cascade R-CNN: High Quality Object Detection and Instance Segmentation}, ISSN={1939-3539}, diff --git a/configs/centripetalnet/README.md b/configs/centripetalnet/README.md index 5c83422291a..ca502e5d7ab 100644 --- a/configs/centripetalnet/README.md +++ b/configs/centripetalnet/README.md @@ -1,7 +1,8 @@ # CentripetalNet ## Introduction -``` + +```latex @InProceedings{Dong_2020_CVPR, author = {Dong, Zhiwei and Li, Guoxuan and Liao, Yue and Wang, Fei and Ren, Pengju and Qian, Chen}, title = {CentripetalNet: Pursuing High-Quality Keypoint Pairs for Object Detection}, @@ -18,5 +19,6 @@ year = {2020} | HourglassNet-104 | [16 x 6](./centripetalnet_hourglass104_mstest_16x6_210e_coco.py) | 190/210 | 16.7 | 3.7 | 44.8 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/centripetalnet/centripetalnet_hourglass104_mstest_16x6_210e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/centripetalnet/centripetalnet_hourglass104_mstest_16x6_210e_coco/centripetalnet_hourglass104_mstest_16x6_210e_coco_20200915_204804-3ccc61e5.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/centripetalnet/centripetalnet_hourglass104_mstest_16x6_210e_coco/centripetalnet_hourglass104_mstest_16x6_210e_coco_20200915_204804.log.json) | Note: + - TTA setting is single-scale and `flip=True`. - The model we released is the best checkpoint rather than the latest checkpoint (box AP 44.8 vs 44.6 in our experiment). 
diff --git a/configs/cityscapes/README.md b/configs/cityscapes/README.md index 80ce589c562..146ec0d07c1 100644 --- a/configs/cityscapes/README.md +++ b/configs/cityscapes/README.md @@ -9,7 +9,6 @@ - A conversion [script](../../tools/convert_datasets/cityscapes.py) is provided to convert Cityscapes into COCO format. Please refer to [install.md](../../docs/install.md#prepare-datasets) for details. - `CityscapesDataset` implemented three evaluation methods. `bbox` and `segm` are standard COCO bbox/mask AP. `cityscapes` is the cityscapes dataset official evaluation, which may be slightly higher than COCO. - ### Faster R-CNN | Backbone | Style | Lr schd | Scale | Mem (GB) | Inf time (fps) | box AP | Config | Download | diff --git a/configs/cornernet/README.md b/configs/cornernet/README.md index 457200c8088..65a7eda2ff0 100644 --- a/configs/cornernet/README.md +++ b/configs/cornernet/README.md @@ -1,7 +1,8 @@ # CornerNet ## Introduction -``` + +```latex @inproceedings{law2018cornernet, title={Cornernet: Detecting objects as paired keypoints}, author={Law, Hei and Deng, Jia}, @@ -21,9 +22,10 @@ | HourglassNet-104 | [32 x 3](./cornernet_hourglass104_mstest_32x3_210e_coco.py) | 180/210 | 9.5 | 3.9 | 40.4 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/cornernet/cornernet_hourglass104_mstest_32x3_210e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/cornernet/cornernet_hourglass104_mstest_32x3_210e_coco/cornernet_hourglass104_mstest_32x3_210e_coco_20200819_203110-1efaea91.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/cornernet/cornernet_hourglass104_mstest_32x3_210e_coco/cornernet_hourglass104_mstest_32x3_210e_coco_20200819_203110.log.json) | Note: + - TTA setting is single-scale and `flip=True`. - Experiments with `images_per_gpu=6` are conducted on Tesla V100-SXM2-32GB, `images_per_gpu=3` are conducted on GeForce GTX 1080 Ti. - Here are the descriptions of each experiment setting: - - 10 x 5: 10 GPUs with 5 images per gpu. This is the same setting as that reported in the original paper. - - 8 x 6: 8 GPUs with 6 images per gpu. The total batchsize is similar to paper and only need 1 node to train. - - 32 x 3: 32 GPUs with 3 images per gpu. The default setting for 1080TI and need 4 nodes to train. + - 10 x 5: 10 GPUs with 5 images per gpu. This is the same setting as that reported in the original paper. + - 8 x 6: 8 GPUs with 6 images per gpu. The total batchsize is similar to paper and only need 1 node to train. + - 32 x 3: 32 GPUs with 3 images per gpu. The default setting for 1080TI and need 4 nodes to train. 
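Relating to the `CityscapesDataset` evaluation note in configs/cityscapes/README.md above, a minimal sketch of selecting among the three evaluation methods in a config; the field names follow the usual MMDetection evaluation-hook convention and the exact values are illustrative, not taken from this diff.

```python
# Illustrative Cityscapes evaluation settings: 'bbox'/'segm' are the COCO-style APs,
# while 'cityscapes' runs the official Cityscapes evaluation mentioned in the README
# (which may report slightly higher numbers than the COCO-style metrics).
evaluation = dict(interval=1, metric=['bbox', 'segm'])
# evaluation = dict(interval=1, metric='cityscapes')  # official Cityscapes protocol instead
```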
diff --git a/configs/dcn/README.md b/configs/dcn/README.md index 94bec674ed0..9c42f94d791 100644 --- a/configs/dcn/README.md +++ b/configs/dcn/README.md @@ -1,8 +1,8 @@ # Deformable Convolutional Networks -# Introduction +## Introduction -``` +```latex @inproceedings{dai2017deformable, title={Deformable Convolutional Networks}, author={Dai, Jifeng and Qi, Haozhi and Xiong, Yuwen and Li, Yi and Zhang, Guodong and Hu, Han and Wei, Yichen}, diff --git a/configs/deepfashion/README.md b/configs/deepfashion/README.md index c087575b73a..fa31cca8eef 100644 --- a/configs/deepfashion/README.md +++ b/configs/deepfashion/README.md @@ -1,6 +1,6 @@ # DeepFashion -MMFashion(https://github.com/open-mmlab/mmfashion) develops "fashion parsing and segmentation" module +[MMFashion](https://github.com/open-mmlab/mmfashion) develops "fashion parsing and segmentation" module based on the dataset [DeepFashion-Inshop](https://drive.google.com/drive/folders/0B7EVK8r0v71pVDZFQXRsMDZCX1E?usp=sharing). Its annotation follows COCO style. @@ -38,6 +38,7 @@ After that you can train the Mask RCNN r50 on DeepFashion-In-shop dataset by lau or creating your own config file. ## Model Zoo + | Backbone | Model type | Dataset | bbox detection Average Precision | segmentation Average Precision | Config | Download (Google) | | :---------: | :----------: | :-----------------: | :--------------------------------: | :----------------------------: | :---------:| :-------------------------: | | ResNet50 | Mask RCNN | DeepFashion-In-shop | 0.599 | 0.584 |[config](https://github.com/open-mmlab/mmdetection/blob/master/configs/deepfashion/mask_rcnn_r50_fpn_15e_deepfashion.py)| [model](https://drive.google.com/open?id=1q6zF7J6Gb-FFgM87oIORIt6uBozaXp5r) | [log](https://drive.google.com/file/d/1qTK4Dr4FFLa9fkdI6UVko408gkrfTRLP/view?usp=sharing) | diff --git a/configs/double_heads/README.md b/configs/double_heads/README.md index 049dad5dca8..6c031d0b856 100644 --- a/configs/double_heads/README.md +++ b/configs/double_heads/README.md @@ -1,7 +1,8 @@ # Rethinking Classification and Localization for Object Detection ## Introduction -``` + +```latex @article{wu2019rethinking, title={Rethinking Classification and Localization for Object Detection}, author={Yue Wu and Yinpeng Chen and Lu Yuan and Zicheng Liu and Lijuan Wang and Hongzhi Li and Yun Fu}, diff --git a/configs/empirical_attention/README.md b/configs/empirical_attention/README.md index 1e737ea0053..ed151178503 100644 --- a/configs/empirical_attention/README.md +++ b/configs/empirical_attention/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @article{zhu2019empirical, title={An Empirical Study of Spatial Attention Mechanisms in Deep Networks}, author={Zhu, Xizhou and Cheng, Dazhi and Zhang, Zheng and Lin, Stephen and Dai, Jifeng}, @@ -11,7 +11,6 @@ } ``` - ## Results and Models | Backbone | Attention Component | DCN | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | diff --git a/configs/fast_rcnn/README.md b/configs/fast_rcnn/README.md index b01c4b5956d..1c9da5079d1 100644 --- a/configs/fast_rcnn/README.md +++ b/configs/fast_rcnn/README.md @@ -1,7 +1,8 @@ # Fast R-CNN ## Introduction -``` + +```latex @inproceedings{girshick2015fast, title={Fast r-cnn}, author={Girshick, Ross}, diff --git a/configs/faster_rcnn/README.md b/configs/faster_rcnn/README.md index a331ccd3bf4..5152317b380 100644 --- a/configs/faster_rcnn/README.md +++ b/configs/faster_rcnn/README.md @@ -1,7 +1,8 @@ # Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks ## 
Introduction -``` + +```latex @article{Ren_2017, title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, @@ -29,6 +30,7 @@ | X-101-64x4d-FPN | pytorch | 2x | - | - | 41.6 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/faster_rcnn/faster_rcnn_x101_64x4d_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_x101_64x4d_fpn_2x_coco/faster_rcnn_x101_64x4d_fpn_2x_coco_20200512_161033-5961fa95.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_x101_64x4d_fpn_2x_coco/faster_rcnn_x101_64x4d_fpn_2x_coco_20200512_161033.log.json) | ## Different regression loss + We trained with R-50-FPN pytorch style backbone for 1x schedule. | Backbone | Loss type | Mem (GB) | Inf time (fps) | box AP | Config | Download | @@ -39,6 +41,7 @@ We trained with R-50-FPN pytorch style backbone for 1x schedule. | R-50-FPN | BoundedIoULoss | | | 37.4 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_bounded_iou_1x_coco-98ad993b.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_bounded_iou_1x_coco_20200505_160738.log.json) | ## Pre-trained Models + We also train some models with longer schedules and multi-scale training. The users could finetune them for downstream tasks. | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | diff --git a/configs/fcos/README.md b/configs/fcos/README.md index d7c0d419736..84b3fbfaaf5 100644 --- a/configs/fcos/README.md +++ b/configs/fcos/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @article{tian2019fcos, title={FCOS: Fully Convolutional One-Stage Object Detection}, author={Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong}, @@ -23,7 +23,6 @@ | R-101 | caffe | Y | N | N | N | 1x | 10.2 | 17.3 | 39.2 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fcos/fcos_r101_caffe_fpn_gn-head_4x4_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_4x4_1x_coco/fcos_r101_caffe_fpn_gn_1x_4gpu_20200218-13e2cc55.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_4x4_1x_coco/20200130_004231.log.json) | | R-101 | caffe | Y | N | N | N | 2x | - | - | 39.1 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fcos/fcos_r101_caffe_fpn_gn-head_4x4_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_4x4_2x_coco/fcos_r101_caffe_fpn_gn_2x_4gpu_20200218-d2261033.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_r101_caffe_fpn_gn-head_4x4_2x_coco/20200130_004231.log.json) | - | Backbone | Style | GN | MS train | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | |:---------:|:-------:|:-------:|:--------:|:-------:|:--------:|:--------------:|:------:|:------:|:--------:| | R-50 | caffe | Y | Y | 2x | 6.5 | 22.9 | 38.7 | | | @@ -31,6 +30,7 @@ | X-101 | pytorch | Y | Y | 2x | 10.0 | 9.3 | 42.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco.py) | 
[model](http://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco_20200229-11f8c079.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco_20200229_222104.log.json) | **Notes:** + - To be consistent with the author's implementation, we use 4 GPUs with 4 images/GPU for R-50 and R-101 models, and 8 GPUs with 2 image/GPU for X-101 models. - The X-101 backbone is X-101-64x4d. - Tricks means setting `norm_on_bbox`, `centerness_on_reg`, `center_sampling` as `True`. diff --git a/configs/foveabox/README.md b/configs/foveabox/README.md index 4ea751e930b..7b69178d4c8 100644 --- a/configs/foveabox/README.md +++ b/configs/foveabox/README.md @@ -4,6 +4,7 @@ FoveaBox is an accurate, flexible and completely anchor-free object detection sy Different from previous anchor-based methods, FoveaBox directly learns the object existing possibility and the bounding box coordinates without anchor reference. This is achieved by: (a) predicting category-sensitive semantic maps for the object existing possibility, and (b) producing category-agnostic bounding box for each position that potentially contains an object. ## Main Results + ### Results on R50/101-FPN | Backbone | Style | align | ms-train| Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | @@ -25,8 +26,10 @@ Different from previous anchor-based methods, FoveaBox directly learns the objec Any pull requests or issues are welcome. ## Citations + Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows. -``` + +```latex @article{kong2019foveabox, title={FoveaBox: Beyond Anchor-based Object Detector}, author={Kong, Tao and Sun, Fuchun and Liu, Huaping and Jiang, Yuning and Shi, Jianbo}, diff --git a/configs/fp16/README.md b/configs/fp16/README.md index e8ec8721084..bca4fb9cda3 100644 --- a/configs/fp16/README.md +++ b/configs/fp16/README.md @@ -1,7 +1,8 @@ # Mixed Precision Training ## Introduction -``` + +```latex @article{micikevicius2017mixed, title={Mixed precision training}, author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others}, diff --git a/configs/free_anchor/README.md b/configs/free_anchor/README.md index 85a675e92c3..0cbb7afeaad 100644 --- a/configs/free_anchor/README.md +++ b/configs/free_anchor/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @inproceedings{zhang2019freeanchor, title = {{FreeAnchor}: Learning to Match Anchors for Visual Object Detection}, author = {Zhang, Xiaosong and Wan, Fang and Liu, Chang and Ji, Rongrong and Ye, Qixiang}, @@ -20,5 +20,6 @@ | X-101-32x4d | pytorch | 1x | 8.1 | 11.1 | 41.9 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco/retinanet_free_anchor_x101_32x4d_fpn_1x_coco_20200130-d4846968.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/free_anchor/retinanet_free_anchor_x101_32x4d_fpn_1x_coco/retinanet_free_anchor_x101_32x4d_fpn_1x_coco_20200130_095627.log.json) | **Notes:** + - We use 8 GPUs with 2 images/GPU. 
- For more settings and models, please refer to the [official repo](https://github.com/zhangxiaosong18/FreeAnchor). diff --git a/configs/fsaf/README.md b/configs/fsaf/README.md index 8039bfb36ed..a07fe648f1c 100644 --- a/configs/fsaf/README.md +++ b/configs/fsaf/README.md @@ -9,6 +9,7 @@ In the original paper, feature maps within the central 0.2-0.5 area of a gt box it is empirically found that a hard threshold (0.2-0.2) gives a further gain on the performance. (see the table below) ## Main Results + ### Results on R50/R101/X101-FPN | Backbone | ignore range | ms-train| Lr schd |Train Mem (GB)| Train time (s/iter) | Inf time (fps) | box AP | Config | Download | @@ -19,16 +20,19 @@ it is empirically found that a hard threshold (0.2-0.2) gives a further gain on | X-101 | 0.2-0.2 | N | 1x | 9.38 | 1.23 | 5.6 | 42.4 (41.0) | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/fsaf/fsaf_x101_64x4d_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_x101_64x4d_fpn_1x_coco/fsaf_x101_64x4d_fpn_1x_coco-e3f6e6fd.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/fsaf/fsaf_x101_64x4d_fpn_1x_coco/fsaf_x101_64x4d_fpn_1x_coco_20200428_160424.log.json)| **Notes:** - - *1x means the model is trained for 12 epochs.* - - *AP values in the brackets represent those reported in the original paper.* - - *All results are obtained with a single model and single-scale test.* - - *X-101 backbone represents ResNext-101-64x4d.* - - *All pretrained backbones use pytorch style.* - - *All models are trained on 8 Titan-XP gpus and tested on a single gpu.* + +- *1x means the model is trained for 12 epochs.* +- *AP values in the brackets represent those reported in the original paper.* +- *All results are obtained with a single model and single-scale test.* +- *X-101 backbone represents ResNext-101-64x4d.* +- *All pretrained backbones use pytorch style.* +- *All models are trained on 8 Titan-XP gpus and tested on a single gpu.* ## Citations + BibTeX reference is as follows. -``` + +```latex @inproceedings{zhu2019feature, title={Feature Selective Anchor-Free Module for Single-Shot Object Detection}, author={Zhu, Chenchen and He, Yihui and Savvides, Marios}, diff --git a/configs/gcnet/README.md b/configs/gcnet/README.md index 7c9e29c1ba9..0fe0fc101d1 100644 --- a/configs/gcnet/README.md +++ b/configs/gcnet/README.md @@ -11,7 +11,7 @@ We provide config files to reproduce the results in the paper for ## Citing GCNet -``` +```latex @article{cao2019GCNet, title={GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond}, author={Cao, Yue and Xu, Jiarui and Lin, Stephen and Wei, Fangyun and Hu, Han}, @@ -21,6 +21,7 @@ We provide config files to reproduce the results in the paper for ``` ## Results and models + The results on COCO 2017val are shown in the below table. 
| Backbone | Model | Context | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | diff --git a/configs/gfl/README.md b/configs/gfl/README.md index b379f6850ed..7ca72cc574d 100644 --- a/configs/gfl/README.md +++ b/configs/gfl/README.md @@ -1,11 +1,10 @@ # Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection - ## Introduction We provide config files to reproduce the object detection results in the paper [Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection](https://arxiv.org/abs/2006.04388) -``` +```latex @article{li2020generalized, title={Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection}, author={Li, Xiang and Wang, Wenhai and Wu, Lijun and Chen, Shuo and Hu, Xiaolin and Li, Jun and Tang, Jinhui and Yang, Jian}, @@ -14,7 +13,6 @@ We provide config files to reproduce the object detection results in the paper [ } ``` - ## Results and Models | Backbone | Style | Lr schd | Multi-scale Training| Inf time (fps) | box AP | Config | Download | diff --git a/configs/gn/README.md b/configs/gn/README.md index 205892afcd7..d6db55ea05b 100644 --- a/configs/gn/README.md +++ b/configs/gn/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @inproceedings{wu2018group, title={Group Normalization}, author={Wu, Yuxin and He, Kaiming}, @@ -23,6 +23,7 @@ | R-50-FPN (c) | Mask R-CNN | 3x | 7.1 | - | 40.1 | 36.2 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/gn/mask_rcnn_r50_fpn_gn-all_contrib_3x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/gn/mask_rcnn_r50_fpn_gn-all_contrib_3x_coco/mask_rcnn_r50_fpn_gn-all_contrib_3x_coco_20200225-542aefbc.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/gn/mask_rcnn_r50_fpn_gn-all_contrib_3x_coco/mask_rcnn_r50_fpn_gn-all_contrib_3x_coco_20200225_235135.log.json) | **Notes:** + - (d) means pretrained model converted from Detectron, and (c) means the contributed model pretrained by [@thangvubk](https://github.com/thangvubk). - The `3x` schedule is epoch [28, 34, 36]. - **Memory, Train/Inf time is outdated.** diff --git a/configs/grid_rcnn/README.md b/configs/grid_rcnn/README.md index d6f4966511a..96b598f881b 100644 --- a/configs/grid_rcnn/README.md +++ b/configs/grid_rcnn/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @inproceedings{lu2019grid, title={Grid r-cnn}, author={Lu, Xin and Li, Buyu and Yue, Yuxin and Li, Quanquan and Yan, Junjie}, @@ -28,5 +28,6 @@ | X-101-64x4d | 2x | 11.3 | 7.7 | 43.0 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/grid_rcnn/grid_rcnn_x101_64x4d_fpn_gn-head_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/grid_rcnn/grid_rcnn_x101_64x4d_fpn_gn-head_2x_coco/grid_rcnn_x101_64x4d_fpn_gn-head_2x_coco_20200204-ec76a754.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/grid_rcnn/grid_rcnn_x101_64x4d_fpn_gn-head_2x_coco/grid_rcnn_x101_64x4d_fpn_gn-head_2x_coco_20200204_080641.log.json) | **Notes:** + - All models are trained with 8 GPUs instead of 32 GPUs in the original paper. - The warming up lasts for 1 epoch and `2x` here indicates 25 epochs. diff --git a/configs/groie/README.md b/configs/groie/README.md index 9ccb9128599..05385618c16 100644 --- a/configs/groie/README.md +++ b/configs/groie/README.md @@ -42,12 +42,11 @@ the trained models. 
| R-101-FPN | GC-Net | 1x | 42.2 | 37.8 | [config](../configs/gcnet/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/gcnet/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_1x_coco/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_1x_coco_20200206-8407a3f0.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/gcnet/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_1x_coco/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_1x_coco_20200206_142508.log.json) | | R-101-FPN | + GRoIE | 1x | | | [config](./mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_groie_1x_coco.py)| [model](http://download.openmmlab.com/mmdetection/v2.0/groie/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_groie_1x_coco/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_groie_1x_coco_20200607_224507-8daae01c.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/groie/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_groie_1x_coco/mask_rcnn_r101_fpn_syncbn-backbone_r4_gcb_c3-c5_groie_1x_coco_20200607_224507.log.json) | - ## Citation If you use this work or benchmark in your research, please cite this project. -``` +```latex @misc{rossi2020novel, title={A novel Region of Interest Extraction Layer for Instance Segmentation}, author={Leonardo Rossi and Akbar Karimi and Andrea Prati}, diff --git a/configs/guided_anchoring/README.md b/configs/guided_anchoring/README.md index 3bd1121b950..e8b415b79a4 100644 --- a/configs/guided_anchoring/README.md +++ b/configs/guided_anchoring/README.md @@ -4,7 +4,7 @@ We provide config files to reproduce the results in the CVPR 2019 paper for [Region Proposal by Guided Anchoring](https://arxiv.org/abs/1901.03278). -``` +```latex @inproceedings{wang2019region, title={Region Proposal by Guided Anchoring}, author={Jiaqi Wang and Kai Chen and Shuo Yang and Chen Change Loy and Dahua Lin}, @@ -24,7 +24,6 @@ The results on COCO 2017 val is shown in the below table. 
(results on test-dev a | GA-RPN | X-101-32x4d-FPN | pytorch | 1x | 8.5 | 10.0 | 70.6 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/guided_anchoring/ga_rpn_x101_32x4d_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_rpn_x101_32x4d_fpn_1x_coco/ga_rpn_x101_32x4d_fpn_1x_coco_20200220-c28d1b18.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_rpn_x101_32x4d_fpn_1x_coco/ga_rpn_x101_32x4d_fpn_1x_coco_20200220_221326.log.json) | | GA-RPN | X-101-64x4d-FPN | pytorch | 1x | 7.1 | 7.5 | 71.2 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/guided_anchoring/ga_rpn_x101_64x4d_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_rpn_x101_64x4d_fpn_1x_coco/ga_rpn_x101_64x4d_fpn_1x_coco_20200225-3c6e1aa2.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_rpn_x101_64x4d_fpn_1x_coco/ga_rpn_x101_64x4d_fpn_1x_coco_20200225_152704.log.json) | - | Method | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :------------: | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | | GA-Faster RCNN | R-50-FPN | caffe | 1x | 5.5 | | 39.6 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/guided_anchoring/ga_faster_r50_caffe_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_faster_r50_caffe_fpn_1x_coco/ga_faster_r50_caffe_fpn_1x_coco_20200702_000718-a11ccfe6.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_faster_r50_caffe_fpn_1x_coco/ga_faster_r50_caffe_fpn_1x_coco_20200702_000718.log.json) | @@ -36,13 +35,10 @@ The results on COCO 2017 val is shown in the below table. (results on test-dev a | GA-RetinaNet | X-101-32x4d-FPN | pytorch | 1x | 6.9 | 10.6 | 40.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/guided_anchoring/ga_retinanet_x101_32x4d_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_retinanet_x101_32x4d_fpn_1x_coco/ga_retinanet_x101_32x4d_fpn_1x_coco_20200219-40c56caa.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_retinanet_x101_32x4d_fpn_1x_coco/ga_retinanet_x101_32x4d_fpn_1x_coco_20200219_223025.log.json) | | GA-RetinaNet | X-101-64x4d-FPN | pytorch | 1x | 9.9 | 7.7 | 41.3 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/guided_anchoring/ga_retinanet_x101_64x4d_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_retinanet_x101_64x4d_fpn_1x_coco/ga_retinanet_x101_64x4d_fpn_1x_coco_20200226-ef9f7f1f.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/guided_anchoring/ga_retinanet_x101_64x4d_fpn_1x_coco/ga_retinanet_x101_64x4d_fpn_1x_coco_20200226_221123.log.json) | - - - In the Guided Anchoring paper, `score_thr` is set to 0.001 in Fast/Faster RCNN and 0.05 in RetinaNet for both baselines and Guided Anchoring. - Performance on COCO test-dev benchmark are shown as follows. 
- | Method | Backbone | Style | Lr schd | Aug Train | Score thr | AP | AP_50 | AP_75 | AP_small | AP_medium | AP_large | Download | | :------------: | :-------: | :---: | :-----: | :-------: | :-------: | :---: | :---: | :---: | :------: | :-------: | :------: | :------: | | GA-Faster RCNN | R-101-FPN | caffe | 1x | F | 0.05 | | | | | | | | diff --git a/configs/hrnet/README.md b/configs/hrnet/README.md index 450516658ae..94018380baa 100644 --- a/configs/hrnet/README.md +++ b/configs/hrnet/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @inproceedings{SunXLW19, title={Deep High-Resolution Representation Learning for Human Pose Estimation}, author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang}, @@ -22,7 +22,6 @@ ## Results and Models - ### Faster R-CNN | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | @@ -45,7 +44,6 @@ | HRNetV2p-W40 | pytorch | 1x | 10.9 | | 42.1 | 37.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/mask_rcnn_hrnetv2p_w40_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/mask_rcnn_hrnetv2p_w40_1x_coco/mask_rcnn_hrnetv2p_w40_1x_coco_20200511_015646-66738b35.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/mask_rcnn_hrnetv2p_w40_1x_coco/mask_rcnn_hrnetv2p_w40_1x_coco_20200511_015646.log.json) | | HRNetV2p-W40 | pytorch | 2x | 10.9 | | 42.8 | 38.2 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/mask_rcnn_hrnetv2p_w40_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/mask_rcnn_hrnetv2p_w40_2x_coco/mask_rcnn_hrnetv2p_w40_2x_coco_20200512_163732-aed5e4ab.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/mask_rcnn_hrnetv2p_w40_2x_coco/mask_rcnn_hrnetv2p_w40_2x_coco_20200512_163732.log.json) | - ### Cascade R-CNN | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | @@ -54,7 +52,6 @@ | HRNetV2p-W32 | pytorch | 20e | 9.4 | 11.0 | 43.3 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/cascade_rcnn_hrnetv2p_w32_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/cascade_rcnn_hrnetv2p_w32_20e_coco/cascade_rcnn_hrnetv2p_w32_20e_coco_20200208-928455a4.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/cascade_rcnn_hrnetv2p_w32_20e_coco/cascade_rcnn_hrnetv2p_w32_20e_coco_20200208_160511.log.json) | | HRNetV2p-W40 | pytorch | 20e | 10.8 | | 43.8 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/cascade_rcnn_hrnetv2p_w40_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/cascade_rcnn_hrnetv2p_w40_20e_coco/cascade_rcnn_hrnetv2p_w40_20e_coco_20200512_161112-75e47b04.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/cascade_rcnn_hrnetv2p_w40_20e_coco/cascade_rcnn_hrnetv2p_w40_20e_coco_20200512_161112.log.json) | - ### Cascade Mask R-CNN | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | @@ -71,7 +68,6 @@ | HRNetV2p-W32 | pytorch | 20e | 13.1 | 4.9 | 45.4 | 39.9 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/htc_hrnetv2p_w32_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/htc_hrnetv2p_w32_20e_coco/htc_hrnetv2p_w32_20e_coco_20200207-7639fa12.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/htc_hrnetv2p_w32_20e_coco/htc_hrnetv2p_w32_20e_coco_20200207_193153.log.json) | | HRNetV2p-W40 | pytorch | 20e | 14.6 | | 
46.4 | 40.8 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/htc_hrnetv2p_w40_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/htc_hrnetv2p_w40_20e_coco/htc_hrnetv2p_w40_20e_coco_20200529_183411-417c4d5b.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/htc_hrnetv2p_w40_20e_coco/htc_hrnetv2p_w40_20e_coco_20200529_183411.log.json) | - ### FCOS | Backbone | Style | GN | MS train | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | @@ -84,8 +80,6 @@ |HRNetV2p-W32| pytorch | Y | Y | 2x | 17.5 | 12.4 | 41.8 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/fcos_hrnetv2p_w32_gn-head_mstrain_640-800_4x4_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/fcos_hrnetv2p_w32_gn-head_mstrain_640-800_4x4_2x_coco/fcos_hrnetv2p_w32_gn-head_mstrain_640-800_4x4_2x_coco_20200314-065d37a6.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/fcos_hrnetv2p_w32_gn-head_mstrain_640-800_4x4_2x_coco/fcos_hrnetv2p_w32_gn-head_mstrain_640-800_4x4_2x_coco_20200314_145356.log.json) | |HRNetV2p-W48| pytorch | Y | Y | 2x | 20.3 | 10.8 | 42.8 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/hrnet/fcos_hrnetv2p_w40_gn-head_mstrain_640-800_4x4_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/hrnet/fcos_hrnetv2p_w40_gn-head_mstrain_640-800_4x4_2x_coco/fcos_hrnetv2p_w40_gn-head_mstrain_640-800_4x4_2x_coco_20200314-e201886d.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/hrnet/fcos_hrnetv2p_w40_gn-head_mstrain_640-800_4x4_2x_coco/fcos_hrnetv2p_w40_gn-head_mstrain_640-800_4x4_2x_coco_20200314_150607.log.json) | - - **Note:** - The `28e` schedule in HTC indicates decreasing the lr at 24 and 27 epochs, with a total of 28 epochs. diff --git a/configs/htc/README.md b/configs/htc/README.md index 618dcbae10f..d0fa59d9acb 100644 --- a/configs/htc/README.md +++ b/configs/htc/README.md @@ -4,7 +4,7 @@ We provide config files to reproduce the results in the CVPR 2019 paper for [Hybrid Task Cascade](https://arxiv.org/abs/1901.07518). -``` +```latex @inproceedings{chen2019hybrid, title={Hybrid task cascade for instance segmentation}, author={Chen, Kai and Pang, Jiangmiao and Wang, Jiaqi and Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and Liu, Ziwei and Shi, Jianping and Ouyang, Wanli and Chen Change Loy and Dahua Lin}, @@ -18,7 +18,7 @@ We provide config files to reproduce the results in the CVPR 2019 paper for [Hyb HTC requires COCO and COCO-stuff dataset for training. You need to download and extract it in the COCO dataset path. The directory should be like this. -``` +```none mmdetection ├── mmdet ├── tools @@ -46,7 +46,7 @@ The results on COCO 2017val are shown in the below table. (results on test-dev a - In the HTC paper and COCO 2018 Challenge, `score_thr` is set to 0.001 for both baselines and HTC. - We use 8 GPUs with 2 images/GPU for R-50 and R-101 models, and 16 GPUs with 1 image/GPU for X-101 models. -If you would like to train X-101 HTC with 8 GPUs, you need to change the lr from 0.02 to 0.01. + If you would like to train X-101 HTC with 8 GPUs, you need to change the lr from 0.02 to 0.01. We also provide a powerful HTC with DCN and multi-scale training model. No testing augmentation is used. 
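For the 8-GPU note on X-101 HTC just above, a minimal config-override sketch of the linear learning-rate scaling; the base config file name is an assumed example and the optimizer fields follow the standard MMDetection config schema, so treat this as an illustration rather than a config taken from this diff.

```python
# Hypothetical override: train X-101 HTC on 8 GPUs (1 image/GPU) instead of 16,
# halving the base learning rate from 0.02 to 0.01 as the README note suggests.
_base_ = './htc_x101_64x4d_fpn_16x1_20e_coco.py'  # assumed base config name
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
```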
diff --git a/configs/instaboost/README.md b/configs/instaboost/README.md index 1d4dbe5951c..1017fb96303 100644 --- a/configs/instaboost/README.md +++ b/configs/instaboost/README.md @@ -2,7 +2,7 @@ Configs in this directory is the implementation for ICCV2019 paper "InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting" and provided by the authors of the paper. InstaBoost is a data augmentation method for object detection and instance segmentation. The paper has been released on [`arXiv`](https://arxiv.org/abs/1908.07801). -``` +```latex @inproceedings{fang2019instaboost, title={Instaboost: Boosting instance segmentation via probability map guided copy-pasting}, author={Fang, Hao-Shu and Sun, Jianhua and Wang, Runzhong and Gou, Minghao and Li, Yong-Lu and Lu, Cewu}, @@ -18,7 +18,7 @@ Configs in this directory is the implementation for ICCV2019 paper "InstaBoost: You need to install `instaboostfast` before using it. -``` +```shell pip install instaboostfast ``` @@ -30,10 +30,9 @@ InstaBoost have been already integrated in the data pipeline, thus all you need ## Results and Models - - All models were trained on `coco_2017_train` and tested on `coco_2017_val` for conveinience of evaluation and comparison. In the paper, the results are obtained from `test-dev`. - - To balance accuracy and training time when using InstaBoost, models released in this page are all trained for 48 Epochs. Other training and testing configs strictly follow the original framework. - - For results and models in MMDetection V1.x, please refer to [Instaboost](https://github.com/GothicAi/Instaboost). - +- All models were trained on `coco_2017_train` and tested on `coco_2017_val` for convenience of evaluation and comparison. In the paper, the results are obtained from `test-dev`. +- To balance accuracy and training time when using InstaBoost, models released in this page are all trained for 48 Epochs. Other training and testing configs strictly follow the original framework. +- For results and models in MMDetection V1.x, please refer to [Instaboost](https://github.com/GothicAi/Instaboost). | Network | Backbone | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :-------------: | :--------: | :-----: | :------: | :------------: | :------:| :-----: | :------: | :-----------------: | diff --git a/configs/legacy_1.x/README.md b/configs/legacy_1.x/README.md index 9a0bb477a1e..ae751d61d42 100644 --- a/configs/legacy_1.x/README.md +++ b/configs/legacy_1.x/README.md @@ -10,6 +10,7 @@ Due to the BC-breaking changes in MMDetection V2.0 from MMDetection V1.x, runnin To upgrade the model version, the users need to do the following steps. ### 1. Convert model weights + There are three main difference in the model weights between V1.x and V2.0 codebases. 1. Since the class order in all the detector's classification branch is reordered, all the legacy model weights need to go through the conversion process. @@ -23,10 +24,11 @@ detectors. We provide a scripts `tools/upgrade_model_version.py` to convert the python tools/upgrade_model_version.py ${OLD_MODEL_PATH} ${NEW_MODEL_PATH} --num-classes ${NUM_CLASSES} ``` + - OLD_MODEL_PATH: the path to load the model weights in 1.x version. - NEW_MODEL_PATH: the path to save the converted model weights in 2.0 version. - NUM_CLASSES: number of classes of the original model weights. Usually it is 81 for COCO dataset, 21 for VOC dataset. -The number of classes in V2.0 models should be equal to that in V1.x models - 1. 
+ The number of classes in V2.0 models should be equal to that in V1.x models - 1. ### 2. Use configs with legacy settings diff --git a/configs/lvis/README.md b/configs/lvis/README.md index d7c106b5e25..a7d7850bb42 100644 --- a/configs/lvis/README.md +++ b/configs/lvis/README.md @@ -1,7 +1,8 @@ # LVIS dataset ## Introduction -``` + +```latex @inproceedings{gupta2019lvis, title={{LVIS}: A Dataset for Large Vocabulary Instance Segmentation}, author={Gupta, Agrim and Dollar, Piotr and Girshick, Ross}, @@ -11,16 +12,21 @@ ``` ## Common Setting + * Please follow [install guide](../../docs/install.md#install-mmdetection) to install open-mmlab forked cocoapi first. * Run following scripts to install our forked lvis-api. - ``` + + ```shell # mmlvis is fully compatible with official lvis pip install mmlvis ``` + or - ``` + + ```shell pip install -r requirements/optional.txt ``` + * All experiments use oversample strategy [here](../../docs/tutorials/new_dataset.md#class-balanced-dataset) with oversample threshold `1e-3`. * The size of LVIS v0.5 is half of COCO, so schedule `2x` in LVIS is roughly the same iterations as `1x` in COCO. diff --git a/configs/mask_rcnn/README.md b/configs/mask_rcnn/README.md index 40533b7182a..d65f170735f 100644 --- a/configs/mask_rcnn/README.md +++ b/configs/mask_rcnn/README.md @@ -1,7 +1,8 @@ # Mask R-CNN ## Introduction -``` + +```latex @article{He_2017, title={Mask R-CNN}, journal={2017 IEEE International Conference on Computer Vision (ICCV)}, @@ -28,8 +29,8 @@ | X-101-64x4d-FPN | pytorch | 2x | - | - | 42.7 | 38.1 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn/mask_rcnn_x101_64x4d_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_x101_64x4d_fpn_2x_coco/mask_rcnn_x101_64x4d_fpn_2x_coco_20200509_224208-39d6f70c.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_x101_64x4d_fpn_2x_coco/mask_rcnn_x101_64x4d_fpn_2x_coco_20200509_224208.log.json)| | X-101-32x8d-FPN | pytorch | 1x | - | - | 42.8 | 38.3 | | - ## Pre-trained Models + We also train some models with longer schedules and multi-scale training. The users could finetune them for downstream tasks. | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | diff --git a/configs/nas_fcos/README.md b/configs/nas_fcos/README.md index 87c58dcb9f1..420121fc00d 100644 --- a/configs/nas_fcos/README.md +++ b/configs/nas_fcos/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @article{wang2019fcos, title={Nas-fcos: Fast neural architecture search for object detection}, author={Wang, Ning and Gao, Yang and Chen, Hao and Wang, Peng and Tian, Zhi and Shen, Chunhua}, @@ -19,4 +19,5 @@ | FCOSHead | R-50 | caffe | Y | 1x | | | 38.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/nas_fcos/nas_fcos_fcoshead_r50_caffe_fpn_gn-head_4x4_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/nas_fcos/nas_fcos_fcoshead_r50_caffe_fpn_gn-head_4x4_1x_coco/nas_fcos_fcoshead_r50_caffe_fpn_gn-head_4x4_1x_coco_20200521-7fdcbce0.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/nas_fcos/nas_fcos_fcoshead_r50_caffe_fpn_gn-head_4x4_1x_coco/nas_fcos_fcoshead_r50_caffe_fpn_gn-head_4x4_1x_coco_20200521.log.json) | **Notes:** + - To be consistent with the author's implementation, we use 4 GPUs with 4 images/GPU. 
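Going back to the usage note in configs/instaboost/README.md above (add or change the InstaBoost configuration right after `LoadImageFromFile`), a minimal sketch of such a train pipeline; the transform name follows the MMDetection pipeline convention and the parameter values shown are illustrative defaults, not taken from this diff.

```python
# Sketch of a train pipeline with InstaBoost inserted directly after LoadImageFromFile;
# parameter values are illustrative, not prescriptive.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='InstaBoost',                                  # probability-map guided copy-pasting
        action_candidate=('normal', 'horizontal', 'skip'),  # candidate augmentation actions
        action_prob=(1, 0, 0),                              # sampling probability of each action
        aug_ratio=0.5),                                     # fraction of images to augment
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    # ... the rest of the usual pipeline (Resize, RandomFlip, Normalize, Pad, etc.)
]
```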
diff --git a/configs/nas_fpn/README.md b/configs/nas_fpn/README.md index c6e0a0c9c93..d5faecf78f5 100644 --- a/configs/nas_fpn/README.md +++ b/configs/nas_fpn/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @inproceedings{ghiasi2019fpn, title={Nas-fpn: Learning scalable feature pyramid architecture for object detection}, author={Ghiasi, Golnaz and Lin, Tsung-Yi and Le, Quoc V}, @@ -21,5 +21,4 @@ We benchmark the new training schedule (crop training, large batch, unfrozen BN, | R-50-FPN | 50e | 12.9 | 22.9 | 37.9 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/nas_fpn/retinanet_r50_fpn_crop640_50e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/nas_fpn/retinanet_r50_fpn_crop640_50e_coco/retinanet_r50_fpn_crop640_50e_coco-9b953d76.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/nas_fpn/retinanet_r50_fpn_crop640_50e_coco/retinanet_r50_fpn_crop640_50e_coco_20200529_095329.log.json) | | R-50-NASFPN | 50e | 13.2 | 23.0 | 40.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/nas_fpn/retinanet_r50_nasfpn_crop640_50e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/nas_fpn/retinanet_r50_nasfpn_crop640_50e_coco/retinanet_r50_nasfpn_crop640_50e_coco-0ad1f644.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/nas_fpn/retinanet_r50_nasfpn_crop640_50e_coco/retinanet_r50_nasfpn_crop640_50e_coco_20200528_230008.log.json) | - **Note**: We find that it is unstable to train NAS-FPN and there is a small chance that results can be 3% mAP lower. diff --git a/configs/paa/README.md b/configs/paa/README.md index 19b2b4740f1..38abe0ba8b6 100644 --- a/configs/paa/README.md +++ b/configs/paa/README.md @@ -1,8 +1,7 @@ # Probabilistic Anchor Assignment with IoU Prediction for Object Detection - - ## Results and Models + We provide config files to reproduce the object detection results in the ECCV 2020 paper for Probabilistic Anchor Assignment with IoU Prediction for Object Detection. @@ -19,4 +18,5 @@ Prediction for Object Detection. | R-101-FPN | 24e | 6.2 | True | 43.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/paa/paa_r101_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/paa/paa_r101_fpn_2x_coco/paa_r101_fpn_2x_coco_20200821-6829f96b.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/paa/paa_r101_fpn_2x_coco/paa_r101_fpn_2x_coco_20200821-6829f96b.log.json) | **Note**: + 1. We find that the performance is unstable with 1x setting and may fluctuate by about 0.2 mAP. We report the best results. 
diff --git a/configs/pisa/README.md b/configs/pisa/README.md index 75e58b7acef..b03ac7ad9bd 100644 --- a/configs/pisa/README.md +++ b/configs/pisa/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @inproceedings{cao2019prime, title={Prime sample attention in object detection}, author={Cao, Yuhang and Chen, Kai and Loy, Chen Change and Lin, Dahua}, @@ -13,7 +13,6 @@ ## Results and models - | PISA | Network | Backbone | Lr schd | box AP | mask AP | Config | Download | |:----:|:-------:|:-------------------:|:-------:|:------:|:-------:|:------:|:--------:| | × | Faster R-CNN | R-50-FPN | 1x | 36.4 | | - | @@ -34,5 +33,6 @@ | √ | SSD300 | VGG16 | 1x | 31.8 | | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/pisa/pisa_ssd512_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/pisa/pisa_ssd512_coco/pisa_ssd512_coco-247addee.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/pisa/pisa_ssd512_coco/pisa_ssd512_coco_20200508_131030.log.json) | **Notes:** + - In the original paper, all models are trained and tested on mmdet v1.x, thus results may not be exactly the same with this release on v2.0. - It is noted PISA only modifies the training pipeline so the inference time remains the same with the baseline. diff --git a/configs/point_rend/README.md b/configs/point_rend/README.md index e946973e700..0120185f07d 100644 --- a/configs/point_rend/README.md +++ b/configs/point_rend/README.md @@ -1,7 +1,8 @@ # PointRend ## Introduction -``` + +```latex @InProceedings{kirillov2019pointrend, title={{PointRend}: Image Segmentation as Rendering}, author={Alexander Kirillov and Yuxin Wu and Kaiming He and Ross Girshick}, diff --git a/configs/regnet/README.md b/configs/regnet/README.md index 67ba03d2de9..a3d332cda03 100644 --- a/configs/regnet/README.md +++ b/configs/regnet/README.md @@ -6,7 +6,7 @@ We implement RegNetX and RegNetY models in detection systems and provide their f The pre-trained modles are converted from [model zoo of pycls](https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md). -``` +```latex @article{radosavovic2020designing, title={Designing Network Design Spaces}, author={Ilija Radosavovic and Raj Prateek Kosaraju and Ross Girshick and Kaiming He and Piotr Dollár}, @@ -20,6 +20,7 @@ The pre-trained modles are converted from [model zoo of pycls](https://github.co ## Usage To use a regnet model, there are two steps to do: + 1. Convert the model to ResNet-style supported by MMDetection 2. Modify backbone and neck in config accordingly @@ -33,8 +34,8 @@ ResNet-style checkpoints used in MMDetection. ```bash python -u tools/regnet2mmdet.py ${PRETRAIN_PATH} ${STORE_PATH} ``` -This script convert model from `PRETRAIN_PATH` and store the converted model in `STORE_PATH`. +This script convert model from `PRETRAIN_PATH` and store the converted model in `STORE_PATH`. 
### Modify config @@ -48,6 +49,7 @@ For other pre-trained models or self-implemented regnet models, the users are re ## Results ### Mask R-CNN + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :---------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :------: | :--------: | | [R-50-FPN](../mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py)| pytorch | 1x | 4.4 | 12.0 | 38.2 | 34.7 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_fpn_1x_coco/mask_rcnn_r50_fpn_1x_coco_20200205_050542.log.json) | @@ -61,6 +63,7 @@ For other pre-trained models or self-implemented regnet models, the users are re |[RegNetX-3.2GF-FPN-DCN-C3-C5](./mask_rcnn_regnetx-3.2GF_fpn_mdconv_c3-c5_1x_coco.py)| pytorch | 1x |5.0 ||40.3|36.6|[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/regnet/mask_rcnn_regnetx-3.2GF_fpn_mdconv_c3-c5_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/regnet/mask_rcnn_regnetx-3.2GF_fpn_mdconv_c3-c5_1x_coco/mask_rcnn_regnetx-3.2GF_fpn_mdconv_c3-c5_1x_coco_20200520_172726-75f40794.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/regnet/mask_rcnn_regnetx-3.2GF_fpn_mdconv_c3-c5_1x_coco/mask_rcnn_regnetx-3.2GF_fpn_mdconv_c3-c5_1x_coco_20200520_172726.log.json) | ### Faster R-CNN + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :---------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | | [R-50-FPN](../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py)| pytorch | 1x | 4.0 | 18.2 | 37.4 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130_204655.log.json) | @@ -68,6 +71,7 @@ For other pre-trained models or self-implemented regnet models, the users are re |[RegNetX-3.2GF-FPN](./faster_rcnn_regnetx-3.2GF_fpn_2x_coco.py)| pytorch | 2x | 4.5||41.1|[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/regnet/faster_rcnn_regnetx-3.2GF_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/regnet/faster_rcnn_regnetx-3.2GF_fpn_2x_coco/faster_rcnn_regnetx-3.2GF_fpn_2x_coco_20200520_223955-e2081918.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/regnet/faster_rcnn_regnetx-3.2GF_fpn_2x_coco/faster_rcnn_regnetx-3.2GF_fpn_2x_coco_20200520_223955.log.json) | ### RetinaNet + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :---------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | | [R-50-FPN](../retinanet/retinanet_r50_fpn_1x_coco.py) | pytorch | 1x | 3.8 | 16.6 | 36.5 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/retinanet/retinanet_r50_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth) | 
[log](http://download.openmmlab.com/mmdetection/v2.0/retinanet/retinanet_r50_fpn_1x_coco/retinanet_r50_fpn_1x_coco_20200130_002941.log.json) | @@ -85,6 +89,6 @@ We also train some models with longer schedules and multi-scale training. The us |Mask RCNN |[RegNetX-3.2GF-FPN](./mask_rcnn_regnetx-3.2GF_fpn_mstrain_3x_coco.py)| pytorch | 3x |5.0 ||43.1|38.7|[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/regnet/mask_rcnn_regnetx-3.2GF_fpn_mstrain_3x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/regnet/mask_rcnn_regnetx-3.2GF_fpn_mstrain_3x_coco/mask_rcnn_regnetx-3.2GF_fpn_mstrain_3x_coco_20200521_202221-99879813.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/regnet/mask_rcnn_regnetx-3.2GF_fpn_mstrain_3x_coco/mask_rcnn_regnetx-3.2GF_fpn_mstrain_3x_coco_20200521_202221.log.json) | ### Notice + 1. The models are trained using a different weight decay, i.e., `weight_decay=5e-5` according to the setting in ImageNet training. This brings improvement of at least 0.7 AP absolute but does not improve the model using ResNet-50. -2. RetinaNets using RegNets are trained with learning rate 0.02 with gradient clip. We find that using learning rate 0.02 could improve the results by at least 0.7 AP absolute and gradient clip is necessary to stabilize the training. -However, this does not improve the performance of ResNet-50-FPN RetinaNet. +2. RetinaNets using RegNets are trained with learning rate 0.02 with gradient clip. We find that using learning rate 0.02 could improve the results by at least 0.7 AP absolute and gradient clip is necessary to stabilize the training. However, this does not improve the performance of ResNet-50-FPN RetinaNet. diff --git a/configs/res2net/README.md b/configs/res2net/README.md index b326ba4a5ee..3275fdfbc9d 100644 --- a/configs/res2net/README.md +++ b/configs/res2net/README.md @@ -14,9 +14,10 @@ We propose a novel building block for CNNs, namely Res2Net, by constructing hier Compared with other backbone networks, Res2Net requires fewer parameters and FLOPs. **Note:** + - GFLOPs for classification are calculated with image size (224x224). 
-``` +```latex @article{gao2019res2net, title={Res2Net: A New Multi-scale Backbone Architecture}, author={Gao, Shang-Hua and Cheng, Ming-Ming and Zhao, Kai and Zhang, Xin-Yu and Yang, Ming-Hsuan and Torr, Philip}, @@ -25,28 +26,38 @@ Compared with other backbone networks, Res2Net requires fewer parameters and FLO doi={10.1109/TPAMI.2019.2938758}, } ``` + ## Results and Models + ### Faster R-CNN + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | -|R2-101-FPN | pytorch | 2x | 7.4 | - | 43.0 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/faster_rcnn_r2_101_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/faster_rcnn_r2_101_fpn_2x_coco/faster_rcnn_r2_101_fpn_2x_coco-175f1da6.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/faster_rcnn_r2_101_fpn_2x_coco/faster_rcnn_r2_101_fpn_2x_coco_20200514_231734.log.json) | +|R2-101-FPN | pytorch | 2x | 7.4 | - | 43.0 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/faster_rcnn_r2_101_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/faster_rcnn_r2_101_fpn_2x_coco/faster_rcnn_r2_101_fpn_2x_coco-175f1da6.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/faster_rcnn_r2_101_fpn_2x_coco/faster_rcnn_r2_101_fpn_2x_coco_20200514_231734.log.json) | + ### Mask R-CNN + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :------: | :--------: | -|R2-101-FPN | pytorch | 2x | 7.9 | - | 43.6 | 38.7 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/mask_rcnn_r2_101_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/mask_rcnn_r2_101_fpn_2x_coco/mask_rcnn_r2_101_fpn_2x_coco-17f061e8.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/mask_rcnn_r2_101_fpn_2x_coco/mask_rcnn_r2_101_fpn_2x_coco_20200515_002413.log.json) | +|R2-101-FPN | pytorch | 2x | 7.9 | - | 43.6 | 38.7 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/mask_rcnn_r2_101_fpn_2x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/mask_rcnn_r2_101_fpn_2x_coco/mask_rcnn_r2_101_fpn_2x_coco-17f061e8.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/mask_rcnn_r2_101_fpn_2x_coco/mask_rcnn_r2_101_fpn_2x_coco_20200515_002413.log.json) | + ### Cascade R-CNN + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | -|R2-101-FPN | pytorch | 20e | 7.8 | - | 45.7 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/cascade_rcnn_r2_101_fpn_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_rcnn_r2_101_fpn_20e_coco/cascade_rcnn_r2_101_fpn_20e_coco-f4b7b7db.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_rcnn_r2_101_fpn_20e_coco/cascade_rcnn_r2_101_fpn_20e_coco_20200515_091644.log.json) | +|R2-101-FPN | pytorch | 20e | 7.8 | - | 45.7 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/cascade_rcnn_r2_101_fpn_20e_coco.py) | 
[model](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_rcnn_r2_101_fpn_20e_coco/cascade_rcnn_r2_101_fpn_20e_coco-f4b7b7db.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_rcnn_r2_101_fpn_20e_coco/cascade_rcnn_r2_101_fpn_20e_coco_20200515_091644.log.json) | + ### Cascade Mask R-CNN + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :------: | :--------: | -R2-101-FPN | pytorch | 20e | 9.5 | - | 46.4 | 40.0 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/cascade_mask_rcnn_r2_101_fpn_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_mask_rcnn_r2_101_fpn_20e_coco/cascade_mask_rcnn_r2_101_fpn_20e_coco-8a7b41e1.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_mask_rcnn_r2_101_fpn_20e_coco/cascade_mask_rcnn_r2_101_fpn_20e_coco_20200515_091645.log.json) | +R2-101-FPN | pytorch | 20e | 9.5 | - | 46.4 | 40.0 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/cascade_mask_rcnn_r2_101_fpn_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_mask_rcnn_r2_101_fpn_20e_coco/cascade_mask_rcnn_r2_101_fpn_20e_coco-8a7b41e1.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/cascade_mask_rcnn_r2_101_fpn_20e_coco/cascade_mask_rcnn_r2_101_fpn_20e_coco_20200515_091645.log.json) | + ### Hybrid Task Cascade (HTC) + | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :------: | :--------: | -| R2-101-FPN | pytorch | 20e | - | - | 47.5 | 41.6 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/htc_r2_101_fpn_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/htc_r2_101_fpn_20e_coco/htc_r2_101_fpn_20e_coco-3a8d2112.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/htc_r2_101_fpn_20e_coco/htc_r2_101_fpn_20e_coco_20200515_150029.log.json) | - +| R2-101-FPN | pytorch | 20e | - | - | 47.5 | 41.6 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/res2net/htc_r2_101_fpn_20e_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/res2net/htc_r2_101_fpn_20e_coco/htc_r2_101_fpn_20e_coco-3a8d2112.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/res2net/htc_r2_101_fpn_20e_coco/htc_r2_101_fpn_20e_coco_20200515_150029.log.json) | - Res2Net ImageNet pretrained models are in [Res2Net-PretrainedModels](https://github.com/Res2Net/Res2Net-PretrainedModels). - More applications of Res2Net are in [Res2Net-Github](https://github.com/Res2Net/). 
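As context for how the Res2Net results above are produced, switching to the Res2Net backbone is typically a small override on top of an existing detector config. The snippet below is a sketch assuming the Res2Net-101 (26w x 4s) ImageNet checkpoint linked above; the maintained files under configs/res2net/ are authoritative.

```python
# Sketch: reuse a Faster R-CNN config and swap in the Res2Net-101 backbone.
# The base config path, pretrained key, and backbone arguments follow the
# Res2Net-101 26w x 4s setting but should be verified against configs/res2net/.
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_2x_coco.py'

model = dict(
    pretrained='open-mmlab://res2net101_v1d_26w_4s',  # ImageNet-pretrained Res2Net weights
    backbone=dict(type='Res2Net', depth=101, scales=4, base_width=26))
```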
diff --git a/configs/resnest/README.md b/configs/resnest/README.md index 07c916407e8..4d29c2a0c12 100644 --- a/configs/resnest/README.md +++ b/configs/resnest/README.md @@ -17,26 +17,26 @@ year={2020} | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | -|S-50-FPN | pytorch | 1x | 4.8 | - | 42.0 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20200926_125502-20289c16.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20200926_125502.log.json) | -|S-101-FPN | pytorch | 1x | 7.1 | - | 44.5 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20201006_021058-421517f1.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20201006_021058.log.json) | +|S-50-FPN | pytorch | 1x | 4.8 | - | 42.0 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20200926_125502-20289c16.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20200926_125502.log.json) | +|S-101-FPN | pytorch | 1x | 7.1 | - | 44.5 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20201006_021058-421517f1.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/faster_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20201006_021058.log.json) | ### Mask R-CNN | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :------: | :--------: | -|S-50-FPN | pytorch | 1x | 5.5 | - | 42.6 | 38.1 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco_20200926_125503-8a2c3d47.pth) | 
[log](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco-20200926_125503.log.json) | -|S-101-FPN | pytorch | 1x | 7.8 | - | 45.2 | 40.2 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco_20201005_215831-af60cdf9.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco-20201005_215831.log.json) | +|S-50-FPN | pytorch | 1x | 5.5 | - | 42.6 | 38.1 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco_20200926_125503-8a2c3d47.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco-20200926_125503.log.json) | +|S-101-FPN | pytorch | 1x | 7.8 | - | 45.2 | 40.2 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco_20201005_215831-af60cdf9.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco-20201005_215831.log.json) | ### Cascade R-CNN | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | -|S-50-FPN | pytorch | 1x | - | - | 44.5 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20201122_213640-763cc7b5.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20201005_113242.log.json) | -|S-101-FPN | pytorch | 1x | 8.4 | - | 46.8 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20201005_113242-b9459f8f.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20201122_213640.log.json) | +|S-50-FPN | pytorch | 1x | - | - | 44.5 
|[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20201122_213640-763cc7b5.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20201005_113242.log.json) | +|S-101-FPN | pytorch | 1x | 8.4 | - | 46.8 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco_20201005_113242-b9459f8f.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco/cascade_rcnn_s50_fpn_syncbn-backbone+head_mstrain-range_1x_coco-20201122_213640.log.json) | ### Cascade Mask R-CNN | Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP | mask AP | Config | Download | | :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :-----: | :------: | :--------: | -|S-50-FPN | pytorch | 1x | - | - | 45.4 | 39.5 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco_20201122_104428-99eca4c7.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco-20201122_104428.log.json) | -|S-101-FPN | pytorch | 1x | 10.5 | - | 47.7 | 41.4 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco_20201005_113243-42607475.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco-20201005_113243.log.json) | +|S-50-FPN | pytorch | 1x | - | - | 45.4 | 39.5 |[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco_20201122_104428-99eca4c7.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s50_fpn_syncbn-backbone+head_mstrain_1x_coco-20201122_104428.log.json) | +|S-101-FPN | pytorch | 1x | 10.5 | - | 47.7 | 41.4 
|[config](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco_20201005_113243-42607475.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/resnest/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco/cascade_mask_rcnn_s101_fpn_syncbn-backbone+head_mstrain_1x_coco-20201005_113243.log.json) | diff --git a/configs/retinanet/README.md b/configs/retinanet/README.md index b7953ffc9f7..ffb7b9f949d 100644 --- a/configs/retinanet/README.md +++ b/configs/retinanet/README.md @@ -1,7 +1,8 @@ # Focal Loss for Dense Object Detection ## Introduction -``` + +```latex @inproceedings{lin2017focal, title={Focal loss for dense object detection}, author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr}, diff --git a/configs/rpn/README.md b/configs/rpn/README.md index 773d5e3a3e5..09aff132c54 100644 --- a/configs/rpn/README.md +++ b/configs/rpn/README.md @@ -1,7 +1,8 @@ # Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks ## Introduction -``` + +```latex @inproceedings{ren2015faster, title={Faster r-cnn: Towards real-time object detection with region proposal networks}, author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian}, diff --git a/configs/sabl/README.md b/configs/sabl/README.md index 495324c7667..85c3c57d42b 100644 --- a/configs/sabl/README.md +++ b/configs/sabl/README.md @@ -4,7 +4,7 @@ We provide config files to reproduce the object detection results in the ECCV 2020 Spotlight paper for [Side-Aware Boundary Localization for More Precise Object Detection](https://arxiv.org/abs/1912.04260). -``` +```latex @inproceedings{Wang_2020_ECCV, title = {Side-Aware Boundary Localization for More Precise Object Detection}, author = {Jiaqi Wang and Wenwei Zhang and Yuhang Cao and Kai Chen and Jiangmiao Pang and Tao Gong and Jianping Shi and Chen Change Loy and Dahua Lin}, @@ -18,7 +18,6 @@ We provide config files to reproduce the object detection results in the ECCV 20 The results on COCO 2017 val are shown in the table below (results on test-dev are usually slightly higher than val). Single-scale testing (1333x800) is adopted in all results.
- | Method | Backbone | Lr schd | ms-train | box AP | Config | Download | | :----------------: | :-------: | :-----: | :------: | :----: | :----------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | | SABL Faster R-CNN | R-50-FPN | 1x | N | 39.9 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/sabl/sabl_faster_rcnn_r50_fpn_1x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/sabl/sabl_faster_rcnn_r50_fpn_1x_coco/sabl_faster_rcnn_r50_fpn_1x_coco-e867595b.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/sabl/sabl_faster_rcnn_r50_fpn_1x_coco/20200830_130324.log.json) | diff --git a/configs/scratch/README.md b/configs/scratch/README.md index a47ed52af08..18f638203c9 100644 --- a/configs/scratch/README.md +++ b/configs/scratch/README.md @@ -2,7 +2,7 @@ ## Introduction -``` +```latex @article{he2018rethinking, title={Rethinking imagenet pre-training}, author={He, Kaiming and Girshick, Ross and Doll{\'a}r, Piotr}, @@ -19,4 +19,5 @@ | Mask R-CNN | R-50-FPN | pytorch | 6x | 41.2 | 37.4 | [config](https://github.com/open-mmlab/mmdetection/tree/master/configs/scratch/mask_rcnn_r50_fpn_gn-all_scratch_6x_coco.py) | [model](http://download.openmmlab.com/mmdetection/v2.0/scratch/mask_rcnn_r50_fpn_gn-all_scratch_6x_coco/scratch_mask_rcnn_r50_fpn_gn_6x_bbox_mAP-0.412__segm_mAP-0.374_20200201_193051-1e190a40.pth) | [log](http://download.openmmlab.com/mmdetection/v2.0/scratch/mask_rcnn_r50_fpn_gn-all_scratch_6x_coco/scratch_mask_rcnn_r50_fpn_gn_6x_20200201_193051.log.json) | Note: + - The above models are trained with 16 GPUs. diff --git a/configs/ssd/README.md b/configs/ssd/README.md index 582292f2dbb..e0d17744c71 100644 --- a/configs/ssd/README.md +++ b/configs/ssd/README.md @@ -1,7 +1,8 @@ # SSD: Single Shot MultiBox Detector ## Introduction -``` + +```latex @article{Liu_2016, title={SSD: Single Shot MultiBox Detector}, journal={ECCV}, diff --git a/configs/vfnet/README.md b/configs/vfnet/README.md index 3d6aef3fbc8..f5cc22ecfa1 100644 --- a/configs/vfnet/README.md +++ b/configs/vfnet/README.md @@ -1,6 +1,7 @@ # VarifocalNet: An IoU-aware Dense Object Detector ## Introduction + **VarifocalNet (VFNet)** learns to predict the IoU-aware classification score which mixes the object presence confidence and localization accuracy together as the detection score for a bounding box. The learning is supervised by the proposed Varifocal Loss (VFL), based on a new star-shaped bounding box feature representation (the features at nine yellow sampling points). Given the new representation, the object localization accuracy is further improved by refining the initially regressed bounding box. The full paper is available at: [https://arxiv.org/abs/2008.13367](https://arxiv.org/abs/2008.13367).
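Since the VarifocalNet introduction above describes the Varifocal Loss only in words, a compact sketch may help. The following assumes sigmoid classification outputs with the gt IoU as the target for positive samples and 0 for negatives; it illustrates the form of the loss and is not the exact MMDetection implementation.

```python
import torch
import torch.nn.functional as F


def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Sketch of the Varifocal Loss for IoU-aware classification scores.

    target_score is the gt IoU for positive samples and 0 for negatives.
    """
    pred = pred_logits.sigmoid()
    pos = (target_score > 0).float()
    # Positives are weighted by their IoU target; negatives are down-weighted
    # by alpha * p^gamma, as in focal loss.
    weight = target_score * pos + alpha * pred.pow(gamma) * (1 - pos)
    bce = F.binary_cross_entropy_with_logits(
        pred_logits, target_score, reduction='none')
    return (weight * bce).sum()


# Toy usage: one positive anchor (gt IoU 0.9) and two negatives.
logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([0.9, 0.0, 0.0])
print(varifocal_loss(logits, targets))
```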
@@ -10,7 +11,7 @@ ## Citing VarifocalNet -``` +```latex @article{zhang2020varifocalnet, title={VarifocalNet: An IoU-aware Dense Object Detector}, author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko}, @@ -32,8 +33,8 @@ | X-101-32x4d | pytorch | Y | Y | 2x | - | 49.7 | 50.0 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/vfnet_x101_32x4d_fpn_mdconv_c3-c5_mstrain_2x_coco.py) | [model](https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmdetection/v2.0/vfnet/vfnet_x101_32x4d_fpn_mdconv_c3-c5_mstrain_2x_coco/vfnet_x101_32x4d_fpn_mdconv_c3-c5_mstrain_2x_coco_20201027pth-d300a6fc.pth) | [log](https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmdetection/v2.0/vfnet/vfnet_x101_32x4d_fpn_mdconv_c3-c5_mstrain_2x_coco/vfnet_x101_32x4d_fpn_mdconv_c3-c5_mstrain_2x_coco.json)| | X-101-64x4d | pytorch | Y | Y | 2x | - | 50.4 | 50.8 | [config](https://github.com/open-mmlab/mmdetection/blob/master/configs/vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco.py) | [model](https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmdetection/v2.0/vfnet/vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco/vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco_20201027pth-b5f6da5e.pth) | [log](https://openmmlab.oss-cn-hangzhou.aliyuncs.com/mmdetection/v2.0/vfnet/vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco/vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco.json)| - **Notes:** + - The MS-train scale range is 1333x[480:960] (`range` mode) and the inference scale keeps 1333x800. - DCN means using `DCNv2` in both backbone and head. - Inference time will be updated soon. diff --git a/configs/yolact/README.md b/configs/yolact/README.md index 37d2b46e26b..fea128e3f78 100644 --- a/configs/yolact/README.md +++ b/configs/yolact/README.md @@ -1,4 +1,5 @@ # **Y**ou **O**nly **L**ook **A**t **C**oefficien**T**s + ``` ██╗ ██╗ ██████╗ ██╗ █████╗ ██████╗████████╗ ╚██╗ ██╔╝██╔═══██╗██║ ██╔══██╗██╔════╝╚══██╔══╝ @@ -9,13 +10,15 @@ ``` A simple, fully convolutional model for real-time instance segmentation. This is the code for our paper: - - [YOLACT: Real-time Instance Segmentation](https://arxiv.org/abs/1904.02689) + +- [YOLACT: Real-time Instance Segmentation](https://arxiv.org/abs/1904.02689) -#### For a real-time demo, check out our ICCV video: +For a real-time demo, check out our ICCV video: [![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/0pMfmo8qfpQ/0.jpg)](https://www.youtube.com/watch?v=0pMfmo8qfpQ) -# Evaluation +## Evaluation + Here are our YOLACT models along with their FPS on a Titan Xp and mAP on COCO's `val`: | Image Size | GPU x BS | Backbone | *FPS | mAP | Weights | Configs | Download | @@ -26,19 +29,24 @@ Here are our YOLACT models along with their FPS on a Titan Xp and mAP on COCO's *Note: The FPS is evaluated by the [original implementation](https://github.com/dbolya/yolact). When calculating FPS, only the model inference time is taken into account. Data loading and post-processing operations such as converting masks to RLE code, generating COCO JSON results, image rendering are not included. -# Training +## Training + All the aforementioned models are trained with a single GPU. It typically takes ~12GB VRAM when using resnet-101 as the backbone. If you want to try multiple GPUs training, you may have to modify the configuration files accordingly, such as adjusting the training schedule and freezing batch norm. + ```Shell # Trains using the resnet-101 backbone with a batch size of 8 on a single GPU. 
./tools/dist_train.sh configs/yolact/yolact_r101.py 1 ``` -# Testing +## Testing + Please refer to [mmdetection/docs/getting_started.md](https://github.com/open-mmlab/mmdetection/blob/master/docs/getting_started.md#inference-with-pretrained-models). -# Citation +## Citation + If you use YOLACT or this code base in your work, please cite -``` + +```latex @inproceedings{yolact-iccv2019, author = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee}, title = {YOLACT: {Real-time} Instance Segmentation}, @@ -48,7 +56,8 @@ If you use YOLACT or this code base in your work, please cite ```
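One follow-up on the YOLACT Training note above: if YOLACT is scaled from one GPU to several, the usual adjustments are a linearly scaled learning rate and, optionally, freezing batch-norm statistics in the backbone. The override below is purely hypothetical — the base learning rate is an assumption and only the config name from the Training section is reused — and is meant to show where such changes would go, not to report validated settings.

```python
# Hypothetical multi-GPU override for YOLACT; values are illustrative, not validated.
_base_ = './yolact_r101.py'  # the config referenced in the Training section above

# Linear scaling rule: with 8x the total batch size, scale the base lr by 8.
# The base lr of 1e-3 is an assumption; check the base config before using this.
optimizer = dict(type='SGD', lr=1e-3 * 8, momentum=0.9, weight_decay=5e-4)

# Optionally freeze batch-norm statistics when the per-GPU batch stays small.
model = dict(backbone=dict(norm_eval=True))
```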