diff --git a/README.cn.md b/README.cn.md
index ec4bc390..d0b5554d 100644
--- a/README.cn.md
+++ b/README.cn.md
@@ -9,12 +9,13 @@
---
-**Vega ver1.6.1 发布**
+**Vega ver1.7.0 发布**
-- Bug Fixes
+- 特性增强
- - 日志打印中的评估时间错误。
- - 更新Record时错误更新了模型描述。
+ - 提供用于Ascend MindStudio的发布版本。
+ - 提供Horovod(GPU)和HCCL(NPU)的数据并行训练能力。
+ - 修复BUG:BOHB算法在超过3轮后可能会无法自动停止。
---
diff --git a/README.md b/README.md
index 823d3b7d..b316b92d 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,13 @@
---
-**Vega ver1.6.1 released**
+**Vega ver1.7.0 released**
-- Bug Fixes:
+- Feature enhancements:
- - Evaluation time error in log.
- - Updating error model description while updating record.
+  - Provides a release version for Ascend MindStudio.
+ - Provides data parallel training capabilities for Horovod (GPU) and HCCL (NPU).
+  - Fixed a bug where the BOHB algorithm might fail to stop automatically after more than three rounds.
---
diff --git a/RELEASE.md b/RELEASE.md
index a4506221..62012f85 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,4 +1,4 @@
-**Vega ver1.6.1 released:**
+**Vega ver1.7.0 released:**
**Introduction**
diff --git a/docs/cn/developer/developer_guide.md b/docs/cn/developer/developer_guide.md
index b59c7b1e..4d5e5db2 100644
--- a/docs/cn/developer/developer_guide.md
+++ b/docs/cn/developer/developer_guide.md
@@ -289,8 +289,6 @@ trainer的主要函数是train_process(),该函数定义如下:
self._valid_epoch()
self.callbacks.after_epoch(epoch)
self.callbacks.after_train()
- if self.distributed:
- self._shutdown_distributed()
def _train_epoch(self):
if vega.is_torch_backend():
@@ -707,28 +705,3 @@ class PipeStep(object):
"""Do the main task in this pipe step."""
pass
```
-
-## 8. Fully Train
-
-在`Fully Train`上,我们支持单卡训练和基于`Horovod`的多机多卡分布式训练,`Fully Train`对应于`pipeline`的`TrainPipeStep`部分。
-
-### 8.1 配置
-
-如果需要进行`Horovod`分布式训练,需要在`TrainPipeStep`的`trainer`部分的配置文件里加上一个配置项`distributed`,并设置成`True`,如果没有这一项,默认是False,即不使用分布式训练。
-
-```yaml
-fullytrain:
- pipe_step:
- type: TrainPipeStep
- trainer:
- type: trainer
- distributed: True
-```
-
-我们通过`shell`启动`Horovod`分布式训练,已经在镜像里完成不同节点之间的通信配置,开发者可以不用关心`vega`内部是如何启动的。
-
-### 8.2 Trainer支持Horovod分布式
-
-在使用分布式训练时,相对于单卡的训练,`trainer`的网络模型、优化器、数据加载等需要使用`Horovod`封装成分布式的对象。
-
-在训练的过程中,单卡和分布式训练的代码几乎是一致的,只是在最后计算验证指标时,需要将不同卡上的指标值综合起来,计算总的平均值。
diff --git a/docs/cn/developer/quick_start.md b/docs/cn/developer/quick_start.md
index 2d109d42..9aeb3c82 100644
--- a/docs/cn/developer/quick_start.md
+++ b/docs/cn/developer/quick_start.md
@@ -171,7 +171,6 @@ nas:
type: accuracy
epochs: 3
save_steps: 250
- distributed: False
num_class: 10
dataset:
type: Cifar10
diff --git a/docs/cn/user/config_reference.md b/docs/cn/user/config_reference.md
index f6aca3a1..0ece95f3 100644
--- a/docs/cn/user/config_reference.md
+++ b/docs/cn/user/config_reference.md
@@ -81,20 +81,13 @@ general:
## 2.1 并行和分布式
-涉及到分布式的配置项有:general.parallel_search, general.parallel_fully_train 和 trainer.distributed,若有多张GPU|NUP,可根据需要选择合适的并行和分布式设置。
+在NAS/HPO搜索过程中,一般一个Trainer对应一个GPU/NPU,若需要一个Trainer对应多个GPU/NPU,可通过设置`general.devices_per_trainer`参数实现。
-| general.parallel_search or general.parallel_fully_train | general.devices_per_trainer | trainer.distributed | 分布式和并行方式 |
-| :--: | :--: | :--: | :-- |
-| False | 1 | False | (缺省设置)使用一张卡串行搜索和训练 |
-| False | >1 | False | 使用多张卡串行搜索和训练 |
-| False | >=1 (分配给每个模型的加速卡数量) | True | 使用Horovod/HCCL进行训练 |
-| True | 1 | 任意值 | 并行搜索和训练,每个模型使用一张卡 |
-| True | >1 (分配给每个模型的加速卡数量) | 任意值 | 并行搜索和训练,每个模型使用多张卡 |
-
-如以下是搜索阶段使用2张卡训练一个模型,在完整训练阶段使用Horovod进行训练。
+目前该配置仅支持PyTorch/GPU场景,如下所示。
```yaml
general:
+ backend: pytorch
parallel_search: True
parallel_fully_train: False
devices_per_trainer: 2
@@ -143,6 +136,30 @@ fully_train:
type: Cifar10
```
+在完整训练阶段,可使用Horovod(GPU)或者HCCL(NPU)进行数据并行的分布式模型训练。
+
+如下所示:
+
+```yaml
+pipeline: [fully_train]
+
+fully_train:
+ pipe_step:
+ type: HorovodTrainStep # HorovodTrainStep(GPU), HcclTrainStep(NPU)
+ trainer:
+ epochs: 160
+ model:
+ model_desc:
+ modules: ['backbone']
+ backbone:
+ type: ResNet
+ num_class: 10
+ dataset:
+ type: Cifar10
+ common:
+ data_path: /cache/datasets/cifar10/
+```
+
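+若训练结束后需要评估模型,可在该pipe step中追加可选的evaluator配置。以下为示意片段(参考本版本 examples/features/data_parallel_train/hccl.yml):
+
+```yaml
+fully_train:
+  pipe_step:
+    type: HcclTrainStep
+  evaluator:
+    type: Evaluator
+    host_evaluator:
+      type: HostEvaluator
+      metric:
+        type: accuracy
+```
+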
## 3. NAS和HPO配置项
HPO / NAS的配置项有如下几个主要部分:
diff --git a/docs/cn/user/security_configure.md b/docs/cn/user/security_configure.md
new file mode 100644
index 00000000..9e69b809
--- /dev/null
+++ b/docs/cn/user/security_configure.md
@@ -0,0 +1,41 @@
+# vega 安全配置
+
+## 评估服务器
+### 评估服务器 https 安全配置
+待补充
+### 评估服务器 其他安全配置建议
+#### 评估服务器配置白名单,仅可信的服务器连接评估服务器
+1. linux 白名单配置
+ * 配置白名单:
+ ```
+ sudo iptables -I INPUT -p tcp --dport 评估端口 -j DROP
+ sudo iptables -I INPUT -s 白名单IP地址1 -p tcp --dport 评估端口 -j ACCEPT
+ sudo iptables -I INPUT -s 白名单IP地址2 -p tcp --dport 评估端口 -j ACCEPT
+ sudo iptables -I INPUT -s 白名单IP地址3 -p tcp --dport 评估端口 -j ACCEPT
+ sudo iptables -I INPUT -s 白名单IP地址4 -p tcp --dport 评估端口 -j ACCEPT
+ ```
+ * 如果需要从白名单中删除某一项
+ 1. 查询白名单 ```sudo iptables -L -n --line-number```
+ 2. 删除白名单 ```sudo iptables -D INPUT 查询的对应行编号```
+
+2. 配置文件 `.vega/vega.ini` 配置白名单
+ * 在配置中的 limit.white_list 中配置白名单,用逗号分隔
+ ```ini
+ [limit]
+ white_list=127.0.0.1,10.174.183.95
+ ```
+
+#### 评估服务器配置访问频率
+配置文件`.vega/vega.ini` 配置访问频率,默认限制每分钟最大100次访问
+```ini
+[limit]
+request_frequency_limit=5/minute # 配置为每分钟最大5次访问
+```
+
+#### 评估服务器配置请求大小限制
+配置文件`.vega/vega.ini` 配置请求大小限制,可以控制上传文件大小,默认配置 1G
+```ini
+[limit]
+max_content_length=100000 # 配置请求大小最大100K
+```
+
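+#### 配置示例汇总
+以上各限制项可合并在同一个 `[limit]` 配置节中,示例如下(取值为默认值或上文示例值,仅供参考):
+```ini
+[limit]
+white_list=127.0.0.1,10.174.183.95
+request_frequency_limit=100/minute
+max_content_length=1000000000  # 约1G,与默认值相当
+```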
diff --git a/docs/en/developer/developer_guide.md b/docs/en/developer/developer_guide.md
index 0472c7bc..2a4dd3a5 100644
--- a/docs/en/developer/developer_guide.md
+++ b/docs/en/developer/developer_guide.md
@@ -293,8 +293,6 @@ The standard trainer training process is implemented in the train_process interf
self._valid_epoch()
self.callbacks.after_epoch(epoch)
self.callbacks.after_train()
- if self.distributed:
- self._shutdown_distributed()
def _train_epoch(self):
if vega.is_torch_backend():
@@ -712,28 +710,3 @@ class PipeStep(object):
"""Do the main task in this pipe step."""
pass
```
-
-## 8. Fully Train
-
-On `Fully Train`, we support single-card training and multi-device multi-card distributed training based on `Horovod`. `Fully Train` corresponds to `TrainPipeStep` in `pipeline`.
-
-### 8.1 Configuration
-
-If you need to perform `Horovod` distributed training, add the configuration item `distributed` to the `trainer` configuration file of `TrainPipeStep` and set it to `True`. If this configuration item is not added, the default value is False, indicating that distributed training is not used.
-
-```yaml
-fullytrain:
- pipe_step:
- type: TrainPipeStep
- trainer:
- type: trainer
- distributed: True
-```
-
-The `shell` is used to start the `Horovod` distributed training. The communication between different nodes has been configured in the image. Developers do not need to care about how the `vega` is started internally.
-
-### 8.2 Distributed Horovod Supported by Trainers
-
-In distributed training, the network model, optimizer, and data loading of the `trainer` need to be encapsulated into distributed objects using the `Horovod`.
-
-During the training, the code of single-card training is almost the same as that of distributed training. However, during the final calculation of verification indicators, the indicator values on different cards need to be combined to calculate the total average value.
diff --git a/docs/en/user/config_reference.md b/docs/en/user/config_reference.md
index 998c6edc..c6e61b34 100644
--- a/docs/en/user/config_reference.md
+++ b/docs/en/user/config_reference.md
@@ -80,17 +80,9 @@ general:
## 2.1 Parallel and distributed
-If there are multiple GPU|NUPs in the running environment, select a proper parallel or distributed configuration as required. The configuration items related to distributed deployment are general.parallel_search, general.parallel_fully_train, and trainer.distributed.
+During the NAS/HPO search, one trainer generally corresponds to one GPU/NPU. To assign multiple GPUs/NPUs to a single trainer, set the `general.devices_per_trainer` parameter.
-| general.parallel_search or general.parallel_fully_train | general.devices_per_trainer | trainer.distributed | Distributed and parallel modes |
-| :--: | :--: | :--: | :-- |
-| False | 1 | False | (default) Serial search and training with one card |
-| False | >1 | False | Serial Search and Training Using Multiple Cards |
-| False | >=1 (Number of cards assigned to each model) | True | Training with Horovod/HCCL |
-| True | 1 | Any value | Parallel search and training with one card per model |
-| True | >1 (Number of cards assigned to each model) | Any value | Parallel search and training with multiple cards per model |
-
-Here's how to train a model using 2 cards during the search phase and Horovod during the full training phase:
+Currently, this configuration is supported only in the PyTorch/GPU scenario, as shown below:
```yaml
general:
@@ -142,6 +134,30 @@ fully_train:
type: Cifar10
```
+In the fully train phase, Horovod (GPU) or HCCL (NPU) can be used for data-parallel distributed model training.
+
+An example is as follows:
+
+```yaml
+pipeline: [fully_train]
+
+fully_train:
+ pipe_step:
+ type: HorovodTrainStep # HorovodTrainStep(GPU), HcclTrainStep(NPU)
+ trainer:
+ epochs: 160
+ model:
+ model_desc:
+ modules: ['backbone']
+ backbone:
+ type: ResNet
+ num_class: 10
+ dataset:
+ type: Cifar10
+ common:
+ data_path: /cache/datasets/cifar10/
+```
+
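+If the model needs to be evaluated after training, an optional evaluator section can be appended to the pipe step. The following sketch is based on examples/features/data_parallel_train/hccl.yml shipped with this release:
+
+```yaml
+fully_train:
+  pipe_step:
+    type: HcclTrainStep
+  evaluator:
+    type: Evaluator
+    host_evaluator:
+      type: HostEvaluator
+      metric:
+        type: accuracy
+```
+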
## 3. NAS and HPO configuration items
HPO and NAS configuration items include:
diff --git a/evaluate_service/hardwares/davinci/davinci.py b/evaluate_service/hardwares/davinci/davinci.py
index bf872d60..f8c7cecb 100644
--- a/evaluate_service/hardwares/davinci/davinci.py
+++ b/evaluate_service/hardwares/davinci/davinci.py
@@ -39,10 +39,11 @@ def convert_model(self, backend, model, weight, **kwargs):
"""
om_save_path = kwargs["save_dir"]
input_shape = kwargs["input_shape"]
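+ # "precision" selects the output precision of the converted OM model (e.g. FP32/FP16);
+ # it is passed through model_convert.sh to atc's --output_type option.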
+ precision = kwargs['precision']
log_save_path = os.path.dirname(model)
command_line = ["bash", self.current_path + "/model_convert.sh", self.davinci_environment_type, backend, model,
- weight, om_save_path, log_save_path, input_shape]
+ weight, om_save_path, log_save_path, input_shape, precision]
try:
subprocess.check_output(command_line)
except subprocess.CalledProcessError as exc:
diff --git a/evaluate_service/hardwares/davinci/model_convert.sh b/evaluate_service/hardwares/davinci/model_convert.sh
index 818a89a5..edeb6805 100644
--- a/evaluate_service/hardwares/davinci/model_convert.sh
+++ b/evaluate_service/hardwares/davinci/model_convert.sh
@@ -5,6 +5,7 @@ WEIGHT=$4
OM_SAVE_PATH=$5
LOG_SAVE_PATH=$6
INPUT_SHAPE=$7
+PRECISION=$8
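+# Example invocation (illustrative values only):
+#   bash model_convert.sh ATLAS300 tensorflow frozen.pb "" /tmp/om /tmp/log "input:1,3,224,224" FP16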
if [ $DAVINCI_ENV_TYPE == "ATLAS200DK" ]; then
if [ $BACKEND == "tensorflow" ]; then
@@ -16,13 +17,13 @@ if [ $DAVINCI_ENV_TYPE == "ATLAS200DK" ]; then
fi
else
if [ $BACKEND == "tensorflow" ]; then
- atc --model=$MODEL --framework=3 --input_format='NCHW' --disable_reuse_memory=1 --input_shape=$INPUT_SHAPE --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore >$LOG_SAVE_PATH/omg.log 2>&1
+ atc --model=$MODEL --framework=3 --input_format='NCHW' --disable_reuse_memory=1 --input_shape=$INPUT_SHAPE --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore --output_type=$PRECISION >$LOG_SAVE_PATH/omg.log 2>&1
elif [ $BACKEND == "caffe" ]; then
atc --model=$MODEL --weight=$WEIGHT --framework=0 --input_format='NCHW' --disable_reuse_memory=1 --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore >$LOG_SAVE_PATH/omg.log 2>&1
elif [ $BACKEND == "mindspore" ]; then
- atc --model=$MODEL --framework=1 --disable_reuse_memory=1 --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore >$LOG_SAVE_PATH/omg.log 2>&1
+ atc --model=$MODEL --framework=1 --disable_reuse_memory=1 --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore --output_type=$PRECISION >$LOG_SAVE_PATH/omg.log 2>&1
elif [ $BACKEND == "onnx" ]; then
- atc --model=$MODEL --framework=5 --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore >$LOG_SAVE_PATH/omg.log 2>&1
+ atc --model=$MODEL --framework=5 --output=$OM_SAVE_PATH/davinci_model --soc_version=Ascend310 --core_type=AiCore --output_type=$PRECISION >$LOG_SAVE_PATH/omg.log 2>&1
else
echo "[ERROR] Davinci model convert: The backend must be tensorflow, caffe, mindspore or onnx."
fi
diff --git a/evaluate_service/main.py b/evaluate_service/main.py
index ab6e1115..52d09001 100644
--- a/evaluate_service/main.py
+++ b/evaluate_service/main.py
@@ -42,6 +42,7 @@
import traceback
import argparse
+
app = Flask(__name__)
api = Api(app)
@@ -50,7 +51,7 @@ class Evaluate(Resource):
"""Evaluate Service for service."""
def __init__(self):
- self.result = {"latency": "9999", "out_data": [], "status": "sucess", "timestamp": ""}
+ self.result = {"latency": "9999", "out_data": [], "status": "sucess", "timestamp": "", "error_message": ""}
@classmethod
def _add_params(cls, work_path, optional_params):
@@ -70,9 +71,10 @@ def post(self):
try:
self.hardware_instance.convert_model(backend=self.backend, model=self.model, weight=self.weight,
save_dir=self.share_dir, input_shape=self.input_shape,
- out_nodes=self.out_nodes)
+ out_nodes=self.out_nodes, precision=self.precision)
except Exception:
self.result["status"] = "Model convert failed."
+ self.result["error_message"] = traceback.format_exc()
logging.error("[ERROR] Model convert failed!")
traceback.print_exc()
try:
@@ -85,6 +87,7 @@ def post(self):
self.result["out_data"] = output
except Exception:
self.result["status"] = "Inference failed."
+ self.result["error_message"] = traceback.format_exc()
logging.error("[ERROR] Inference failed! ")
traceback.print_exc()
@@ -99,6 +102,7 @@ def parse_paras(self):
self.input_shape = request.form.get("input_shape", type=str, default="")
self.out_nodes = request.form.get("out_nodes", type=str, default="")
self.repeat_times = int(request.form.get("repeat_times"))
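+ # The optional "precision" form field selects the converted model's output precision (FP32 by default).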
+ self.precision = request.form.get("precision", type=str, default="FP32")
def upload_files(self):
"""Upload the files from the client to the service."""
@@ -151,7 +155,7 @@ def _parse_args():
parser.add_argument("-w", "--work_path", type=str, required=True, help="the work dir to save the file")
parser.add_argument("-t", "--davinci_environment_type", type=str, required=False, default="ATLAS300",
help="the type the davinci hardwares")
- parser.add_argument("-c", "--clean_interval", type=int, required=False, default=1 * 24 * 3600,
+ parser.add_argument("-c", "--clean_interval", type=int, required=False, default=1 * 6 * 3600,
help="the time interval to clean the temp folder")
parser.add_argument("-u", "--ddk_user_name", type=str, required=False, default="user",
help="the user to acess ATLAS200200 DK")
diff --git a/examples/compression/prune_ea/prune_finetune_ms.yml b/examples/compression/prune_ea/prune_finetune_ms.yml
index ccff2147..e9a8898c 100644
--- a/examples/compression/prune_ea/prune_finetune_ms.yml
+++ b/examples/compression/prune_ea/prune_finetune_ms.yml
@@ -12,7 +12,7 @@ fine_tune:
type: ResNetMs
resnet_size: 50
num_classes: 10
- need_adjust: True
+ need_adjust: True
pretrained_model_file: "/cache/models/resnet50-19c8e357.pth"
trainer:
type: Trainer
diff --git a/examples/data_augmentation/cyclesr/cyclesr.yml b/examples/data_augmentation/cyclesr/cyclesr.yml
index 055068b8..c5602622 100644
--- a/examples/data_augmentation/cyclesr/cyclesr.yml
+++ b/examples/data_augmentation/cyclesr/cyclesr.yml
@@ -24,7 +24,6 @@ fully_train:
save_in_memory: False
pin_memory: False
shuffle: True
- distributed: False
imgs_per_gpu: 4
drop_last: True
test:
@@ -34,7 +33,6 @@ fully_train:
num_workers: 8
shuffle: False
pin_memory: False
- distributed: False
imgs_per_gpu: 4
val_ps_offset: 10
drop_last: False
@@ -51,7 +49,6 @@ fully_train:
val_ps_offset: 10
continue_train: !!null
lr_policy: linear
- distributed: False
model_desc:
modules: ["custom"]
custom:
diff --git a/examples/developer/train.py b/examples/developer/train.py
deleted file mode 100644
index 02e732a9..00000000
--- a/examples/developer/train.py
+++ /dev/null
@@ -1,90 +0,0 @@
-# -*- coding:utf-8 -*-
-
-# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the MIT License.
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# MIT License for more details.
-
-"""The example of training model."""
-
-import vega
-
-
-# ===================== backend =========================
-
-# set backend, {pytorch, tensorflow, mindspore}, {GPU, NPU}
-vega.set_backend("pytorch", "GPU")
-
-
-# ===================== model =========================
-
-# using vega's model directly
-# vega's network can work on pytorch, tensorflwo or mindspore
-model = vega.network("ResNet", depth=18).cuda()
-
-# # or using vega's model zoo with model desc
-# desc = {
-# "modules": ["backbone"],
-# "backbone": {
-# "type": "ResNet",
-# "depth": 18,
-# "num_class": 10,
-# }
-# }
-# model = ModelZoo.get_model(model_desc=desc).cuda()
-
-# # or using torchvision model
-# from torchvision.models import resnet18, resnet34
-# model = resnet18().cuda()
-
-
-# ===================== dataset =========================
-
-# using vega's dataset, vega's dataset can work on pytorch, tensorflwo or mindspore
-train_loader = vega.dataset("Cifar10", data_path="/cache/datasets/cifar10", mode="train", batch_size=256).loader
-test_loader = vega.dataset("Cifar10", data_path="/cache/datasets/cifar10", mode="val", batch_size=256).loader
-
-# # or using torchvision dataset
-# import torchvision
-# import torchvision.transforms as transforms
-# train_dataset = torchvision.datasets.CIFAR10(
-# root="/cache/datasets/cifar10", train=True, transform=transforms.ToTensor())
-# test_dataset = torchvision.datasets.CIFAR10(
-# root="/cache/datasets/cifar10", train=False, transform=transforms.ToTensor())
-# train_loader = torch.utils.data.DataLoader(
-# dataset=train_dataset, batch_size=256, shuffle=True)
-# test_loader = torch.utils.data.DataLoader(
-# dataset=test_dataset, batch_size=256, shuffle=False)
-
-
-# ===================== trainer =========================
-
-trainer = vega.trainer(model=model)
-trainer.config.epochs = 2
-trainer.config.mixup = True
-trainer.train_loader = train_loader
-trainer.valid_loader = test_loader
-trainer.train_process()
-print("Training is complete. Please check folder: {}".format(trainer.get_local_worker_path()))
-
-
-# ===================== cluster =========================
-
-# from vega.core.scheduler import create_master, shutdown_cluster
-# from vega.core.run import init_cluster_args
-# from vega.common.general import General
-
-# General._parallel = True
-# init_cluster_args()
-# master = create_master()
-
-# master.run(trainer)
-
-# # master.run(other trainers)
-
-# master.join()
-# shutdown_cluster()
-# print("Training is complete. Please check the folder: {}".format(trainer.get_local_worker_path()))
diff --git a/examples/features/data_parallel_train/hccl.yml b/examples/features/data_parallel_train/hccl.yml
new file mode 100644
index 00000000..92220ac0
--- /dev/null
+++ b/examples/features/data_parallel_train/hccl.yml
@@ -0,0 +1,27 @@
+# Example: data parallel model training with HCCL (NPU).
+# The configuration is as follows:
+
+
+pipeline: [fully_train]
+
+fully_train:
+ pipe_step:
+ type: HcclTrainStep
+ trainer:
+ epochs: 1
+ model:
+ model_desc:
+ modules: ['backbone']
+ backbone:
+ type: ResNet
+ num_class: 10
+ dataset:
+ type: Cifar10
+ common:
+ data_path: /cache/datasets/cifar10/
+ evaluator:
+ type: Evaluator
+ host_evaluator: # optional, evaluate the accuracy of the model on the host side
+ type: HostEvaluator
+ metric:
+ type: accuracy # accuracy (classification) | psnr (super resolution)
diff --git a/examples/features/data_parallel_train/horovod.yml b/examples/features/data_parallel_train/horovod.yml
new file mode 100644
index 00000000..2f50c3b1
--- /dev/null
+++ b/examples/features/data_parallel_train/horovod.yml
@@ -0,0 +1,21 @@
+# Example: data parallel model training with Horovod (GPU).
+# The configuration is as follows:
+
+
+pipeline: [fully_train]
+
+fully_train:
+ pipe_step:
+ type: HorovodTrainStep
+ trainer:
+ epochs: 1
+ model:
+ model_desc:
+ modules: ['backbone']
+ backbone:
+ type: ResNet
+ num_class: 10
+ dataset:
+ type: Cifar10
+ common:
+ data_path: /cache/datasets/cifar10/
diff --git a/examples/features/data_parallel_train/nas_hpo_ddp.yml b/examples/features/data_parallel_train/nas_hpo_ddp.yml
new file mode 100644
index 00000000..d9598575
--- /dev/null
+++ b/examples/features/data_parallel_train/nas_hpo_ddp.yml
@@ -0,0 +1,50 @@
+general:
+ backend: pytorch
+ parallel_search: True
+ devices_per_trainer: 2
+
+pipeline: [hpo]
+
+hpo:
+ pipe_step:
+ type: SearchPipeStep
+ dataset:
+ type: Cifar10
+ common:
+ data_path: /cache/datasets/cifar10/
+ batch_size: 256
+ search_algorithm:
+ type: BohbHpo
+ policy:
+ total_epochs: 100
+ search_space:
+ type: SearchSpace
+ hyperparameters:
+ - key: dataset.batch_size
+ type: CATEGORY
+ range: [64, 128, 256]
+ - key: trainer.optimizer.params.lr
+ type: CATEGORY
+ range: [0.001, 0.003, 0.007, 0.01, 0.03, 0.07, 0.1]
+ - key: trainer.optimizer.type
+ type: CATEGORY
+ range: ['Adam', 'SGD']
+ - key: trainer.optimizer.params.momentum
+ type: FLOAT
+ range: [0.8, 0.99]
+ condition:
+ - key: condition_for_sgd_momentum
+ child: trainer.optimizer.params.momentum
+ parent: trainer.optimizer.type
+ type: EQUAL
+ range: ["SGD"]
+ model:
+ model_desc:
+ modules: ["backbone"]
+ backbone:
+ type: ResNet
+ depth: 18
+ num_class: 10
+ trainer:
+ type: Trainer
+ epochs: 1
diff --git a/examples/features/multi_task/multi_head.yml b/examples/features/multi_task/multi_head.yml
index fe87b806..d91c9646 100644
--- a/examples/features/multi_task/multi_head.yml
+++ b/examples/features/multi_task/multi_head.yml
@@ -92,7 +92,6 @@ fully_train:
type: Trainer
multi_task: True
cuda: true
- distributed: False
seed: 0
epochs: 1
optimizer:
diff --git a/examples/fully_train/efficientnet/efficientnet_b0.yml b/examples/fully_train/efficientnet/efficientnet_b0.yml
index cb9c2fa5..997d567d 100644
--- a/examples/fully_train/efficientnet/efficientnet_b0.yml
+++ b/examples/fully_train/efficientnet/efficientnet_b0.yml
@@ -60,7 +60,6 @@ fully_train:
std: [0.229, 0.224, 0.225]
workers: 8
epochs: 500
- distributed: False
prefetcher: True
model_name: effcientnet.pickle
ckpt_name: efficientnet_fully_train.pth
diff --git a/examples/fully_train/efficientnet/efficientnet_b4.yml b/examples/fully_train/efficientnet/efficientnet_b4.yml
index 9281cd0a..6c3efa79 100644
--- a/examples/fully_train/efficientnet/efficientnet_b4.yml
+++ b/examples/fully_train/efficientnet/efficientnet_b4.yml
@@ -63,7 +63,6 @@ fully_train:
std: [0.229, 0.224, 0.225]
workers: 8
epochs: 300
- distributed: True
prefetcher: True
model_name: effcientnet.pickle
ckpt_name: efficientnet_fully_train.pth
diff --git a/examples/fully_train/faster_rcnn/asha_fasterrcnn.yml b/examples/fully_train/faster_rcnn/asha_fasterrcnn.yml
deleted file mode 100644
index 85c67a60..00000000
--- a/examples/fully_train/faster_rcnn/asha_fasterrcnn.yml
+++ /dev/null
@@ -1,69 +0,0 @@
-general:
- backend: pytorch #pytorch
-
-pipeline: [hpo]
-
-hpo:
- pipe_step:
- type: SearchPipeStep
-
- search_algorithm:
- type: AshaHpo
- objective_keys: mAP
- policy:
- total_epochs: 200
-
- search_space:
- type: SearchSpace
- hyperparameters:
- - key: network.box_score_thresh
- type: FLOAT_EXP
- range: [0.0001, 1]
- - key: trainer.optimizer.params.lr
- type: FLOAT_EXP
- range: [0.0001, 0.01]
- - key: trainer.optimizer.type
- type: CATEGORY
- range: ['Adam', 'SGD']
- - key: trainer.optimizer.params.momentum
- type: FLOAT
- range: [0.0, 0.99]
- condition:
- - key: condition_for_sgd_momentum
- child: trainer.optimizer.params.momentum
- parent: trainer.optimizer.type
- type: EQUAL
- range: ["SGD"]
- model:
- model_desc:
- type: torchvision_fasterrcnn_resnet50_fpn
-
- trainer:
- type: Trainer
- get_train_metric_after_epoch: False
- model_statistics: False
- is_detection_trainer: True
- optimizer:
- type: SGD
- params:
- lr: 0.003
- momentum: 0.9
- weight_decay: 0.0001
- lr_scheduler:
- type: CosineAnnealingLR
- params:
- T_max: 30000
- eta_min: 0.0001
- loss:
- type: SumLoss
- metric:
- type: coco
- params:
- anno_path: /cache/datasets/COCO2017/annotations/instances_val2017.json
-
- dataset:
- type: CocoDataset
- common:
- data_root: /cache/datasets/COCO2017/
- img_prefix: 2017
- ann_prefix: instances
\ No newline at end of file
diff --git a/examples/fully_train/faster_rcnn/autoloss_fasterrcnn.yml b/examples/fully_train/faster_rcnn/autoloss_fasterrcnn.yml
deleted file mode 100644
index 30a34841..00000000
--- a/examples/fully_train/faster_rcnn/autoloss_fasterrcnn.yml
+++ /dev/null
@@ -1,53 +0,0 @@
-general:
- backend: pytorch #pytorch
-# parallel_search: True
-# parallel_fully_train: True
-
-pipeline: [auto_loss]
-
-auto_loss:
- pipe_step:
- type: SearchPipeStep
-
- search_algorithm:
- type: Autoloss
- objective_keys: mAP
- policy:
- config_count: 8
- total_rungs: 200
- loss_num: 4
-
- model:
- model_desc:
- type: torchvision_fasterrcnn_resnet50_fpn
-
- trainer:
- type: Trainer
- get_train_metric_after_epoch: False
- model_statistics: False
- is_detection_trainer: True
- adaptive_muti_loss: True
- optimizer:
- type: SGD
- params:
- lr: 0.003
- momentum: 0.9
- weight_decay: 0.0001
- lr_scheduler:
- type: CosineAnnealingLR
- params:
- T_max: 30000
- eta_min: 0.0001
- loss:
- type: SumLoss
- metric:
- type: coco
- params:
- anno_path: /cache/datasets/COCO2017/annotations/instances_val2017.json
-
- dataset:
- type: CocoDataset
- common:
- data_root: /cache/datasets/COCO2017/
- img_prefix: 2017
- ann_prefix: instances
\ No newline at end of file
diff --git a/examples/fully_train/faster_rcnn/faster_rcnn_finetune.yml b/examples/fully_train/faster_rcnn/faster_rcnn_finetune.yml
deleted file mode 100644
index 11cf0998..00000000
--- a/examples/fully_train/faster_rcnn/faster_rcnn_finetune.yml
+++ /dev/null
@@ -1,47 +0,0 @@
-pipeline: [finetune]
-
-finetune:
- pipe_step:
- type: TrainPipeStep
-
- model:
- pretrained_model_file: /cache/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
- head: roi_heads
- model_desc:
- type: FasterRCNN
- backbone:
- type: ResNetBackbone
- num_classes: 7
-
- trainer:
- type: Trainer
- epochs: 12
- get_train_metric_after_epoch: False
- model_statistics: False
- load_checkpoint: False
- is_detection_trainer: True
- perfs_cmp_key: AP50
- optimizer:
- type: SGD
- params:
- lr: 0.003
- momentum: 0.9
- weight_decay: !!float 1e-4
- lr_scheduler:
- type: MultiStepLR
- params:
- milestones: [10]
- gamma: 0.1
- loss:
- type: SumLoss
- metric:
- type: coco
- params:
- anno_path: /cache/datasets/naie_coco/annotations/instances_val2017.json
- dataset:
- type: CocoDataset
- common:
- batch_size: 1
- data_root: /cache/datasets/naie_coco
- img_prefix: "2017"
- ann_prefix: instances
\ No newline at end of file
diff --git a/examples/fully_train/faster_rcnn/faster_rcnn_replace_backbone.yml b/examples/fully_train/faster_rcnn/faster_rcnn_replace_backbone.yml
deleted file mode 100644
index 70f85e8e..00000000
--- a/examples/fully_train/faster_rcnn/faster_rcnn_replace_backbone.yml
+++ /dev/null
@@ -1,57 +0,0 @@
-general:
- backend: pytorch
-
-
-pipeline: [fullytrain]
-
-fullytrain:
- pipe_step:
- type: TrainPipeStep
-
- model:
- model_desc:
- type: FasterRCNN
- num_classes: 91
- backbone:
- type: SerialBackbone
- weight_file: /cache/models/fasterrcnn_serialnet_backbone.pth
- neck:
- type: FPN
- trainer:
- type: Trainer
- get_train_metric_after_epoch: False
- model_statistics: False
- is_detection_trainer: True
- epochs: 1
- optimizer:
- type: SGD
- params:
- lr: 0.003
- momentum: 0.9
- weight_decay: 0.0001
- lr_scheduler:
- type: WarmupScheduler
- params:
- by_epoch: False
- warmup_type: linear
- warmup_iters: 2000
- warmup_ratio: 0.1
- after_scheduler_config:
- type: CosineAnnealingLR
- T_max: 30000
- eta_min: 0.0001
- after_scheduler_by_epoch: False
- loss:
- type: SumLoss
- metric:
- type: coco
- params:
- anno_path: /cache/datasets/COCO2017/annotations/instances_val2017.json
-
- dataset:
- type: CocoDataset
- common:
- batch_size: 1
- data_root: /cache/datasets/COCO2017/
- img_prefix: 2017
- ann_prefix: instances
diff --git a/examples/nas/adelaide_ea/adelaide_ea.yml b/examples/nas/adelaide_ea/adelaide_ea.yml
index 3c01df91..ee07f5ca 100644
--- a/examples/nas/adelaide_ea/adelaide_ea.yml
+++ b/examples/nas/adelaide_ea/adelaide_ea.yml
@@ -2,6 +2,7 @@ general:
backend: pytorch # pytorch | tensorflow
parallel_search: True
parallel_fully_train: True
+ ms_execute_mode: 1 # for mindspore
pipeline: [random, mutate, fully_train]
@@ -64,8 +65,6 @@ random:
type: Trainer
callbacks: AdelaideEATrainerCallback
cuda: true
- distributed: False
- execute_mode: PYNATIVE_MODE # for mindspore
seed: 0
epochs: 6
optimizer:
@@ -112,7 +111,7 @@ fully_train:
dataset:
ref: random.dataset
train:
- batch_size: 24
+ batch_size: 16
trainer:
ref: random.trainer
load_checkpoint: False
diff --git a/examples/nas/backbone_nas/backbone_nas_finetune_ms.yml b/examples/nas/backbone_nas/backbone_nas_finetune_ms.yml
index 3d3ad739..dc2af73e 100644
--- a/examples/nas/backbone_nas/backbone_nas_finetune_ms.yml
+++ b/examples/nas/backbone_nas/backbone_nas_finetune_ms.yml
@@ -9,6 +9,8 @@ fine_tune:
pipe_step:
type: TrainPipeStep
model:
+ pretrained_model_file: "/cache/models/resnet50-19c8e357.pth"
+ need_adjust: True
model_desc:
modules: ['backbone']
backbone:
@@ -16,8 +18,6 @@ fine_tune:
depth: 50
num_class: 10
small_input: False
- need_adjust: True
- pretrained_model_file: "/cache/models/resnet50-19c8e357.pth"
trainer:
type: Trainer
@@ -50,31 +50,31 @@ fine_tune:
batch_size: 128
train:
transforms:
- - type: Resize
- size: [256, 256]
- - type: RandomCrop
- size: [224, 224]
- - type: RandomHorizontalFlip
- - type: ToTensor
- - type: Normalize
- mean: [0.50, 0.5, 0.5]
- std: [0.50, 0.5, 0.5]
+ - type: Resize
+ size: [256, 256]
+ - type: RandomCrop
+ size: [224, 224]
+ - type: RandomHorizontalFlip
+ - type: ToTensor
+ - type: Normalize
+ mean: [0.50, 0.5, 0.5]
+ std: [0.50, 0.5, 0.5]
val:
transforms:
- - type: Resize
- size: [224, 224]
- - type: ToTensor
- - type: Normalize
- mean: [0.50, 0.5, 0.5]
- std: [0.50, 0.5, 0.5]
+ - type: Resize
+ size: [224, 224]
+ - type: ToTensor
+ - type: Normalize
+ mean: [0.50, 0.5, 0.5]
+ std: [0.50, 0.5, 0.5]
test:
transforms:
- - type: Resize
- size: [224, 224]
- - type: ToTensor
- - type: Normalize
- mean: [0.50, 0.5, 0.5]
- std: [0.50, 0.5, 0.5]
+ - type: Resize
+ size: [224, 224]
+ - type: ToTensor
+ - type: Normalize
+ mean: [0.50, 0.5, 0.5]
+ std: [0.50, 0.5, 0.5]
nas:
@@ -92,15 +92,15 @@ nas:
min_sample: 10
search_space:
hyperparameters:
- - key: network.backbone.depth
- type: CATEGORY
- range: [50]
- - key: network.backbone.doublechannel
- type: CATEGORY
- range: [3, 4]
- - key: network.backbone.downsample
- type: CATEGORY
- range: [3, 4]
+ - key: network.backbone.depth
+ type: CATEGORY
+ range: [50]
+ - key: network.backbone.doublechannel
+ type: CATEGORY
+ range: [3, 4]
+ - key: network.backbone.downsample
+ type: CATEGORY
+ range: [3, 4]
model:
model_desc:
modules: ['backbone']
diff --git a/examples/nas/modnas/ps.yml b/examples/nas/modnas/ps.yml
index 55191b6d..4dbc14e2 100644
--- a/examples/nas/modnas/ps.yml
+++ b/examples/nas/modnas/ps.yml
@@ -71,6 +71,7 @@ nas:
- type: DefaultTorchCheckpointLoader
args:
path: '{local_worker_path}/exp/default/chkpt/model_teacher_best.pt'
+ crit_conf: CrossEntropySoftTargetLoss
args:
stages:
- sequential: [1.0, 0.75, 0.5]
diff --git a/examples/nas/segmentation_ea/segmentation_ea.yml b/examples/nas/segmentation_ea/segmentation_ea.yml
index 6ba82ccf..7e38a6aa 100644
--- a/examples/nas/segmentation_ea/segmentation_ea.yml
+++ b/examples/nas/segmentation_ea/segmentation_ea.yml
@@ -67,7 +67,6 @@ nas:
type: Trainer
callbacks: SegmentationEATrainerCallback
cuda: true
- distributed: false
seed: 1
epochs: 5
call_metrics_on_train: false
diff --git a/examples/nas/sp_nas/spnas_md.yml b/examples/nas/sp_nas/spnas_md.yml
new file mode 100644
index 00000000..030e7dbc
--- /dev/null
+++ b/examples/nas/sp_nas/spnas_md.yml
@@ -0,0 +1,133 @@
+pipeline: [serial]
+
+serial:
+ pipe_step:
+ type: SearchPipeStep
+
+ search_algorithm:
+ type: SpNasS
+ max_sample: 1
+ objective_keys: AP50
+
+ search_space:
+ type: SearchSpace
+ hyperparameters:
+ - key: network.backbone.code
+ type: CATEGORY
+ range: ['111-2111-211111-211']
+
+ model:
+ model_desc:
+ type: Faster_Rcnn_MD
+
+ trainer:
+ type: SpNasTrainerCallback
+ epochs: 6
+ get_train_metric_after_epoch: False
+ model_statistics: False
+ is_detection_trainer: True
+ perfs_cmp_key: AP50
+ optimizer:
+ type: SGD
+ params:
+ lr: 0.03
+ momentum: 0.9
+ weight_decay: !!float 1e-4
+ lr_scheduler:
+ type: WarmupScheduler
+ by_epoch: False
+ params:
+ warmup_type: linear
+ warmup_iters: 2000
+ warmup_ratio: 0.001
+ after_scheduler_config:
+ type: MultiStepLR
+ by_epoch: True
+ params:
+ milestones: [10, 20]
+ gamma: 0.1
+ loss:
+ type: SumLoss
+ metric:
+ type: coco
+ params:
+ anno_path: /cache/datasets/mini_COCO2017/annotations/instances_val2017.json
+
+ dataset:
+ type: CocoDataset
+ common:
+ batch_size: 2
+ num_parallel_workers: 8
+ flip_ratio: 0.5
+ expand_ratio: 1.0
+ img_width: 1280
+ img_height: 768
+ keep_ratio: True
+ device_id: 0
+ device_num: 1
+ rank_id: 0
+ python_multiprocessing: True
+ coco_root: "/cache/datasets/COCO2017"
+ mindrecord_dir: "/cache/MindRecord_COCO_TRAIN"
+ instance_set: "annotations/instances_{}.json"
+ coco_classes: ['background', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
+ 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
+ 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
+ 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
+ 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
+ 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
+ 'kite', 'baseball bat', 'baseball glove', 'skateboard',
+ 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
+ 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
+ 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
+ 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
+ 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
+ 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
+ 'refrigerator', 'book', 'clock', 'vase', 'scissors',
+ 'teddy bear', 'hair drier', 'toothbrush']
+ num_classes: 81
+ train:
+ train_data_type: "train2017"
+ val:
+ val_data_type: "val2017"
+ test_batch_size: 2
+
+
+parallel:
+ pipe_step:
+ type: SearchPipeStep
+ models_folder: "{local_base_path}/output/serial/"
+ pretrained_folder: "{local_base_path}/output/serial/"
+
+ search_algorithm:
+ type: SpNasP
+
+ search_space:
+ type: SearchSpace
+ hyperparameters:
+ - key: network.neck.code
+ type: CATEGORY
+ range: [[0, 1, 2, 3]]
+
+ model:
+ model_desc:
+ type: Faster_Rcnn_MD
+
+ trainer:
+ ref: serial.trainer
+
+ dataset:
+ ref: serial.dataset
+
+fullytrain:
+ pipe_step:
+ type: TrainPipeStep
+ models_folder: "{local_base_path}/output/parallel/"
+ pretrained_folder: "{local_base_path}/output/parallel/"
+
+ trainer:
+ ref: serial.trainer
+ epochs: 24
+
+ dataset:
+ ref: serial.dataset
diff --git a/examples/nlp/bert_md.yml b/examples/nlp/bert_md.yml
new file mode 100644
index 00000000..4a30a6af
--- /dev/null
+++ b/examples/nlp/bert_md.yml
@@ -0,0 +1,19 @@
+general:
+ backend: mindspore #pytorch | tensorflow | mindspore
+ device_category: NPU
+ dft: True
+
+pipeline: [fully_train]
+
+
+fully_train:
+ pipe_step:
+ type: TrainPipeStep # distributed: HcclTrainStep
+ model:
+ model_desc:
+ modules: ['bert']
+ bert:
+ type: Bert
+ trainer:
+ type: BertTrainerCallback
+ epochs: 40
diff --git a/setup.py b/setup.py
index f6f70333..24e1638f 100644
--- a/setup.py
+++ b/setup.py
@@ -23,7 +23,7 @@
setuptools.setup(
name="noah-vega",
- version="1.6.1",
+ version="1.7.0",
packages=["vega", "evaluate_service"],
include_package_data=True,
python_requires=">=3.6",
diff --git a/vega/__init__.py b/vega/__init__.py
index 5cf9395f..4c036092 100644
--- a/vega/__init__.py
+++ b/vega/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "1.6.1"
+__version__ = "1.7.0"
import sys
diff --git a/vega/algorithms/__init__.py b/vega/algorithms/__init__.py
index dcdb7fd5..8cf4cfdf 100644
--- a/vega/algorithms/__init__.py
+++ b/vega/algorithms/__init__.py
@@ -3,3 +3,5 @@
from .data_augmentation import * # noqa: F401, F403
from .compression import * # noqa: F401, F403
from .auto_loss import * # noqa: F401, F403
+from .fully_train import *  # noqa: F401, F403
+from .nlp import *  # noqa: F401, F403
diff --git a/vega/algorithms/compression/__init__.py b/vega/algorithms/compression/__init__.py
index d30e738b..51f4b69c 100644
--- a/vega/algorithms/compression/__init__.py
+++ b/vega/algorithms/compression/__init__.py
@@ -12,9 +12,9 @@
from vega.common.class_factory import ClassFactory
-
ClassFactory.lazy_register("vega.algorithms.compression", {
"prune_ea": ["PruneCodec", "PruneEA", "PruneSearchSpace", "PruneTrainerCallback"],
"prune_ea_mobilenet": ["PruneMobilenetCodec", "PruneMobilenetTrainerCallback"],
"quant_ea": ["QuantCodec", "QuantEA", "QuantTrainerCallback"],
+ "prune_dag": ["PruneDAGSearchSpace", "AdaptiveBatchNormalizationCallback", "SCOPDAGSearchSpace"],
})
diff --git a/vega/algorithms/compression/prune_dag/__init__.py b/vega/algorithms/compression/prune_dag/__init__.py
new file mode 100644
index 00000000..6461d414
--- /dev/null
+++ b/vega/algorithms/compression/prune_dag/__init__.py
@@ -0,0 +1,3 @@
+from .prune_dag import PruneDAGSearchSpace, AdaptiveBatchNormalizationCallback, SCOPDAGSearchSpace
+
+__all__ = ["PruneDAGSearchSpace", "AdaptiveBatchNormalizationCallback", "SCOPDAGSearchSpace"]
diff --git a/vega/algorithms/compression/prune_dag/dag_relations.py b/vega/algorithms/compression/prune_dag/dag_relations.py
new file mode 100644
index 00000000..7b9574aa
--- /dev/null
+++ b/vega/algorithms/compression/prune_dag/dag_relations.py
@@ -0,0 +1,126 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Dag relations."""
+import vega
+
+if vega.is_torch_backend():
+ import torch
+ from torch.nn import Conv2d, Linear
+elif vega.is_ms_backend():
+ from mindspore.nn import Conv2d, Dense as Linear
+
+
+def is_conv2d(module):
+ """Determine Conv2d."""
+ # depth-wise convolution not in pruned search space.
+ return isinstance(module, Conv2d) and not is_depth_wise_conv(module)
+
+
+def is_depth_wise_conv(module):
+ """Determine Conv2d."""
+ if hasattr(module, "groups"):
+ return module.groups != 1 and module.in_channels == module.out_channels
+ elif hasattr(module, "group"):
+ return module.group != 1 and module.in_channels == module.out_channels
+
+
+def is_connection_node(node):
+ """Determine is connection node."""
+ return node.is_operator_conn_module or len(node.child_nodes) > 1 or node.module_type == 'torch_func_cat'
+
+
+def reset_c_out_node(node):
+ """Determine is connection node."""
+ if isinstance(node.module, Linear):
+ return None
+ else:
+ return node.c_out
+
+
+def check_and_export_model(pruned_model, dummy_input):
+ """Check and export model to onnx file."""
+ dummy_input = dummy_input or torch.ones(1, 3, 224, 224)
+ torch.onnx.export(pruned_model, dummy_input, "pruned.onnx")
+
+
+def sub_blocks_relation_search(in_node, c_out=None):
+ """Search relations of blocks."""
+ nodes_in_block = []
+ c_nodes = [in_node]
+ while c_nodes:
+ node = c_nodes.pop()
+ nodes_in_block.append(node)
+ if isinstance(node.module, Conv2d):
+ continue
+ for parent_node in node.parent_nodes:
+ if is_connection_node(parent_node):
+ c_out = parent_node.c_out
+ else:
+ c_nodes.append(parent_node)
+ for node in nodes_in_block:
+ if not isinstance(node.module, Conv2d):
+ node.c_in = c_out
+ node.c_out = c_out
+ return nodes_in_block
+
+
+def sub_cat_relation_search(in_node, c_out=None):
+ """Search relations of blocks."""
+ nodes_in_block = []
+ c_nodes = [in_node]
+ while c_nodes:
+ node = c_nodes.pop()
+ nodes_in_block.append(node)
+ for parent_node in node.parent_nodes:
+ if not is_connection_node(parent_node):
+ c_nodes.append(parent_node)
+ if vega.is_torch_backend():
+ for node in nodes_in_block:
+ if is_conv2d(node.module):
+ break
+ if not isinstance(node.module, Conv2d):
+ node.c_in = c_out
+ node.c_out = c_out
+ elif vega.is_ms_backend():
+ for node in nodes_in_block[1:]:
+ if node.child_nodes:
+ node.c_out = node.child_nodes[0].c_in
+ if not isinstance(node.module, Conv2d):
+ node.c_in = node.c_out
+ return nodes_in_block
+
+
+def node_relations_search(model, desc):
+ """Search relations of dag node."""
+ for name, node in model.named_nodes():
+ c_out = desc.get(node.name + '.out_channels')
+ if c_out and not node.c_out:
+ node.c_out = c_out
+ else:
+ for parent_node in node.parent_nodes:
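+ # A cat node concatenates its parents along the channel axis,
+ # so its output mask is the parents' masks joined in order.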
+ if node.module_type == 'torch_func_cat':
+ cat_c_outs = []
+ for n in node.parent_nodes:
+ cat_c_outs.extend(n.c_out)
+ node.c_out = cat_c_outs
+ else:
+ node.c_out = min(parent_node.c_out, node.c_out) if node.c_out else parent_node.c_out
+ if is_connection_node(parent_node):
+ break
+ node.c_out = reset_c_out_node(node)
+ for child_node in node.child_nodes:
+ if not child_node.c_in:
+ child_node.c_in = node.c_out
+ if is_connection_node(node):
+ if node.module_type == 'torch_func_cat':
+ sub_cat_relation_search(node, node.c_out)
+ else:
+ sub_blocks_relation_search(node, node.c_out)
+ return model
diff --git a/vega/algorithms/compression/prune_dag/prune_dag.py b/vega/algorithms/compression/prune_dag/prune_dag.py
new file mode 100644
index 00000000..d42776c2
--- /dev/null
+++ b/vega/algorithms/compression/prune_dag/prune_dag.py
@@ -0,0 +1,112 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""This is Operator SearchSpace."""
+import copy
+import logging
+import vega
+from vega.common import ClassFactory, ClassType
+from vega.trainer.callbacks import Callback
+from vega.core.search_space import SearchSpace
+from vega.core.pipeline.conf import PipeStepConfig
+from .prune_events import prune_dag_model
+from .dag_relations import node_relations_search, is_conv2d
+from vega.model_zoo import ModelZoo
+from vega.common.parameter_sharing import ParameterSharing
+
+
+@ClassFactory.register(ClassType.SEARCHSPACE)
+class PruneDAGSearchSpace(SearchSpace):
+ """Prune SearchSpace."""
+
+ @classmethod
+ def get_space(self, desc):
+ """Get model and input."""
+ self.model = ModelZoo().get_model(PipeStepConfig.model.model_desc, PipeStepConfig.model.pretrained_model_file)
+ arch_params_key = '{}.out_channels'
+ search_space = [dict(key=arch_params_key.format(name), type="HALF", range=[module.out_channels])
+ for name, module in self.model.named_modules() if is_conv2d(module)]
+ return {"hyperparameters": search_space}
+
+ @classmethod
+ def to_desc(self, desc):
+ """Decode to model desc."""
+ pruned_model = copy.deepcopy(self.model)
+ node_relations_search(pruned_model, desc)
+ prune_dag_model(pruned_model)
+ PipeStepConfig.model.pretrained_model_file = ParameterSharing().push(pruned_model, 'pruned_weights')
+ return pruned_model.to_desc()
+
+
+@ClassFactory.register(ClassType.SEARCHSPACE)
+class SCOPDAGSearchSpace(SearchSpace):
+ """SCOP DAG SearchSpace."""
+
+ @classmethod
+ def get_space(self, desc):
+ """Get model and input."""
+ self.model = ModelZoo().get_model(PipeStepConfig.model.model_desc, PipeStepConfig.model.pretrained_model_file)
+ if not desc.get("hyperparameters"):
+ raise ValueError("hyperparameters should be config in SCOPDAGSearchSpace.")
+ search_space = []
+ for item in desc.get("hyperparameters"):
+ arch_params_key = "{}." + item.get("key")
+ arch_type = item.get("type")
+ arch_type_range = item.get("range")
+ search_space.extend([dict(key=arch_params_key.format(name), type=arch_type, range=arch_type_range)
+ for name, module in self.model.named_modules() if is_conv2d(module)])
+ return {"hyperparameters": search_space}
+
+ @classmethod
+ def to_desc(self, desc):
+ """Decode to model desc."""
+ pruned_model = copy.deepcopy(self.model)
+ desc = self._decode_fn(pruned_model, desc)
+ node_relations_search(pruned_model, desc)
+ prune_dag_model(pruned_model)
+ PipeStepConfig.model.pretrained_model_file = ParameterSharing().push(pruned_model, 'pruned_weights')
+ return pruned_model.to_desc()
+
+ @classmethod
+ def _decode_fn(self, model, desc):
+ mask_code_desc = {}
+ for name, rate in desc.items():
+ node_name = '.'.join(name.split('.')[:-1])
+ arch_type = name.split('.')[-1]
+ if node_name not in model.module_map:
+ continue
+ node_channels = model.module_map[node_name].module.out_channels
+ if arch_type == 'prune_d_rate':
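+ # Snap the kept-channel count to a multiple of 16; if the result is 16 or fewer, keep all channels.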
+ select_idx = round(node_channels * rate / 100 / 16) * 16
+ select_idx = select_idx if select_idx > 16 else node_channels
+ else:
+ select_idx = node_channels * rate // 100
+ idx_code = [1 if idx < select_idx else 0 for idx in range(node_channels)]
+ mask_code_desc[node_name + '.out_channels'] = idx_code
+ return mask_code_desc
+
+
+@ClassFactory.register(ClassType.CALLBACK)
+class AdaptiveBatchNormalizationCallback(Callback):
+ """Adaptive Batch Normalization."""
+
+ def before_train(self, logs=None):
+ """Freeze Conv2D and BatchNorm."""
+ if not vega.is_torch_backend():
+ return
+ import torch
+ for name, module in self.trainer.model.named_modules():
+ if isinstance(module, torch.nn.Conv2d):
+ for name, parameter in module.named_parameters():
+ parameter.requires_grad_(False)
+ elif isinstance(module, torch.nn.BatchNorm2d):
+ module.weight.requires_grad = False
+ module.bias.requires_grad = False
+ learnable_params = [param for param in self.trainer.model.parameters() if param.requires_grad]
+ logging.info("Adaptive BatchNormalization learnable params size: {}".format(len(learnable_params)))
diff --git a/vega/algorithms/compression/prune_dag/prune_events.py b/vega/algorithms/compression/prune_dag/prune_events.py
new file mode 100644
index 00000000..e91b13c0
--- /dev/null
+++ b/vega/algorithms/compression/prune_dag/prune_events.py
@@ -0,0 +1,102 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Prune DAG model."""
+import vega
+
+if vega.is_torch_backend():
+ import torch
+
+
+def prune_conv2d_out_channels(module, value):
+ """Prune out channels of Conv2d."""
+ assert sum(value) == module.out_channels
+ out_channels_idx = [idx for idx, value in enumerate(value) if value == 1]
+ for name, weight in module._parameters.items():
+ if weight is None:
+ continue
+ if name == 'weight':
+ module.weight.data = weight[out_channels_idx, :, :, :]
+ elif name == 'bias':
+ module.bias.data = weight[out_channels_idx]
+
+
+def prune_conv2d_in_channels(module, value):
+ """Prune in channels of conv2d."""
+ assert sum(value) == module.in_channels
+ in_channels_idx = [idx for idx, value in enumerate(value) if value == 1]
+ for name, weight in module._parameters.items():
+ if weight is None or name != 'weight':
+ continue
+ if hasattr(module, "groups") and module.groups != 1:
+ # group and depth-wise convolution
+ # todo: not working on BINARY_CODE mode, mask code must be divisible by weight
+ module.groups = module.in_channels // weight.shape[1]
+ else:
+ prune_weight = weight[:, in_channels_idx, :, :]
+ module.weight.data = prune_weight
+
+
+def prune_linear(module, value):
+ """Prune linear."""
+ if sum(value) == module.in_features:
+ idx_in = [idx for idx, value in enumerate(value) if value == 1]
+ else:
+ idx_in = [idx for idx, value in enumerate([1] * module.in_features)]
+ module.weight.data = module.weight.data[:, idx_in]
+
+
+def prune_batch_norm(module, value):
+ """Prune Batch Norm."""
+ assert sum(value) == module.num_features
+ idx = [idx for idx, value in enumerate(value) if value == 1]
+ weights = {**module._parameters, **module._buffers}
+ if 'num_batches_tracked' in weights:
+ weights.pop('num_batches_tracked')
+ for name, weight in weights.items():
+ prune_weight = weight[idx]
+ if name == 'running_mean':
+ module.running_mean.data = prune_weight
+ elif name == 'running_var':
+ module.running_var.data = prune_weight
+ elif name == 'weight':
+ module.weight.data = prune_weight
+ elif name == 'bias':
+ module.bias.data = prune_weight
+
+
+def prune_dag_model(model):
+ """Prune Dag model."""
+ for name, node in model.named_nodes():
+ if isinstance(node.module, torch.nn.Conv2d):
+ if node.c_in:
+ node.module.in_channels = sum(node.c_in)
+ prune_conv2d_in_channels(node.module, node.c_in)
+ if node.c_out:
+ node.module.out_channels = sum(node.c_out)
+ prune_conv2d_out_channels(node.module, node.c_out)
+ elif isinstance(node.module, torch.nn.BatchNorm2d):
+ if node.c_in:
+ node.module.num_features = sum(node.c_in)
+ node.c_out = node.c_in
+ prune_batch_norm(node.module, node.c_in)
+ elif isinstance(node.module, torch.nn.Linear):
+ if node.c_in:
+ if sum(node.c_in) == len(node.c_in):
+ continue
+ if node.module.in_features == len(node.c_in):
+ node.module.in_features = sum(node.c_in)
+ else:
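+ # The Linear consumes flattened conv features; scale in_features by the kept-channel ratio.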
+ node.module.in_features = node.module.in_features // len(node.c_in) * sum(node.c_in)
+ prune_linear(node.module, node.c_in)
+ elif node.module_type == 'torch_tensor_view':
+ if node.c_in and len(node.c_in) != sum(node.c_in) and node.saved_args and len(node.saved_args) > 1:
+ node.saved_args = tuple([node.saved_args[0], node.saved_args[1] // len(node.c_in) * sum(node.c_in)])
+ return model
diff --git a/vega/algorithms/compression/prune_ea/prune_search_space.py b/vega/algorithms/compression/prune_ea/prune_search_space.py
index aea1b8c7..907bcb62 100644
--- a/vega/algorithms/compression/prune_ea/prune_search_space.py
+++ b/vega/algorithms/compression/prune_ea/prune_search_space.py
@@ -9,6 +9,7 @@
# MIT License for more details.
"""Check and Define Prune Model SearchSpace."""
+
import logging
from vega.common import ClassFactory, ClassType
from vega.core.search_space import SearchSpace
@@ -17,6 +18,14 @@
from vega.modules.operators import ops
+def is_depth_wise_conv(module):
+ """Determine Conv2d."""
+ if hasattr(module, "groups"):
+ return module.groups != 1 and module.in_channels == module.out_channels
+ elif hasattr(module, "group"):
+ return module.group != 1 and module.in_channels == module.out_channels
+
+
@ClassFactory.register(ClassType.SEARCHSPACE)
class PruneSearchSpace(SearchSpace):
"""Prune SearchSpace."""
@@ -26,7 +35,8 @@ def get_space(self, desc):
"""Get model and input."""
model = NetworkDesc(PipeStepConfig.model.model_desc).to_model()
arch_params_key = 'network._arch_params.Prune.{}.out_channels'
- search_space = [dict(key=arch_params_key.format(module.name), type="BINARY_CODE", range=[module.out_channels])
- for name, module in model.named_modules() if isinstance(module, ops.Conv2d)]
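+ # Depth-wise convolutions are excluded: their input and output channels are coupled and cannot be pruned independently.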
+ search_space = [dict(key=arch_params_key.format(module.name), type="HALF", range=[module.out_channels])
+ for name, module in model.named_modules() if
+ isinstance(module, ops.Conv2d) and not is_depth_wise_conv(module)]
logging.info("Prune Nas Search Space: {}".format(search_space))
return {"hyperparameters": search_space}
diff --git a/vega/algorithms/compression/prune_ea/prune_trainer_callback.py b/vega/algorithms/compression/prune_ea/prune_trainer_callback.py
index ba701f43..64baf28a 100644
--- a/vega/algorithms/compression/prune_ea/prune_trainer_callback.py
+++ b/vega/algorithms/compression/prune_ea/prune_trainer_callback.py
@@ -66,7 +66,7 @@ def before_train(self, logs=None):
self.flops_count * 1e-6, self.params_count * 1e-3, self.latency_count * 1000)
self.trainer.model = self._generate_init_model()
if vega.is_torch_backend():
- self.trainer.optimizer = Optimizer()(model=self.trainer.model, distributed=self.trainer.distributed)
+ self.trainer.optimizer = Optimizer()(model=self.trainer.model, distributed=self.trainer.horovod)
self.trainer.lr_scheduler = LrScheduler()(self.trainer.optimizer)
def after_epoch(self, epoch, logs=None):
diff --git a/vega/algorithms/compression/quant_ea/quant_trainer_callback.py b/vega/algorithms/compression/quant_ea/quant_trainer_callback.py
index 7a3ef32b..afb22679 100644
--- a/vega/algorithms/compression/quant_ea/quant_trainer_callback.py
+++ b/vega/algorithms/compression/quant_ea/quant_trainer_callback.py
@@ -57,7 +57,7 @@ def before_train(self, logs=None):
elif vega.is_npu_device():
model = model.to(vega.get_devices())
count_input = torch.FloatTensor(*count_input).to(vega.get_devices())
- self.trainer.optimizer = Optimizer()(model=self.trainer.model, distributed=self.trainer.distributed)
+ self.trainer.optimizer = Optimizer()(model=self.trainer.model, distributed=self.trainer.horovod)
self.trainer.lr_scheduler = LrScheduler()(self.trainer.optimizer)
elif vega.is_tf_backend():
tf.compat.v1.reset_default_graph()
diff --git a/vega/algorithms/data_augmentation/cyclesr/cyclesr_trainer_callback.py b/vega/algorithms/data_augmentation/cyclesr/cyclesr_trainer_callback.py
index 797c55a3..8a4f187e 100644
--- a/vega/algorithms/data_augmentation/cyclesr/cyclesr_trainer_callback.py
+++ b/vega/algorithms/data_augmentation/cyclesr/cyclesr_trainer_callback.py
@@ -69,7 +69,7 @@ def _init_dataloader(self, mode):
:rtype: tuple of torch.utils.data.Dataset
"""
dataset = Dataset(mode=mode)
- if self.cfg.distributed:
+ if self.trainer.horovod:
sampler = torch.utils.data.distributed.DistributedSampler(
dataset, num_replicas=hvd.size(), rank=hvd.rank())
dataset.sampler = sampler
@@ -87,7 +87,7 @@ def _init_model(self):
_file = FileOps.join_path(self.worker_path, "model_desc_{}.json".format(self._worker_id))
with open(_file, "w") as f:
json.dump(self.cfg.model_desc, f)
- if self.cfg.distributed:
+ if self.trainer.horovod:
hvd.join()
model_desc = self.cfg.model_desc
net_desc = NetworkDesc(model_desc)
@@ -292,7 +292,7 @@ def _train_loop(self):
if not vega.is_cpu_device():
self.trainer._init_setting()
self.model = self._init_model()
- if self.cfg.distributed:
+ if self.trainer.horovod:
self._horovod_init_optimizer()
self._init_horovod_setting()
self.train_data = self._init_dataloader('train')
diff --git a/vega/algorithms/data_augmentation/pba_trainer_callback.py b/vega/algorithms/data_augmentation/pba_trainer_callback.py
index aaf417e0..c5f0ec45 100644
--- a/vega/algorithms/data_augmentation/pba_trainer_callback.py
+++ b/vega/algorithms/data_augmentation/pba_trainer_callback.py
@@ -20,9 +20,8 @@ class PbaTrainerCallback(Callback):
def before_train(self, logs=None):
"""Be called before the training process."""
- self.epochs = self.trainer.epochs
self.transforms = self.trainer.hps.dataset.transforms
- self.transform_interval = self.epochs // len(self.transforms[0]['all_para'].keys())
+ self.transform_interval = self.trainer.epochs // len(self.transforms[0]['all_para'].keys())
self.hps = self.trainer.hps
def before_epoch(self, epoch, logs=None):
diff --git a/vega/algorithms/fully_train/__init__.py b/vega/algorithms/fully_train/__init__.py
new file mode 100644
index 00000000..3ccc792c
--- /dev/null
+++ b/vega/algorithms/fully_train/__init__.py
@@ -0,0 +1,6 @@
+from vega.common.class_factory import ClassFactory
+
+
+ClassFactory.lazy_register("vega.algorithms.fully_train", {
+ "resnet.resnet_trainer_callback": ["trainer:ResnetTrainer"]
+})
diff --git a/vega/algorithms/fully_train/resnet/__init__.py b/vega/algorithms/fully_train/resnet/__init__.py
new file mode 100644
index 00000000..45716d46
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/__init__.py
@@ -0,0 +1 @@
+from .resnet_trainer_callback import ResnetTrainer
diff --git a/vega/algorithms/fully_train/resnet/resnet_trainer_callback.py b/vega/algorithms/fully_train/resnet/resnet_trainer_callback.py
new file mode 100644
index 00000000..6c549ba7
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/resnet_trainer_callback.py
@@ -0,0 +1,166 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Resnet Trainer."""
+
+import os
+from mindspore import context
+from mindspore import Tensor
+from mindspore.train import Model as MsModel
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
+from mindspore.parallel import set_algo_parameters
+import vega
+from vega.trainer.trainer_base import TrainerBase
+from vega.common import ClassFactory, ClassType
+import logging
+from mindspore.communication.management import init as hccl_init
+from mindspore.context import ParallelMode
+from .src.resnet import resnet50 as resnet
+from .src.dataset import create_dataset2 as create_dataset
+from .src.CrossEntropySmooth import CrossEntropySmooth
+from .src.lr_generator import get_lr
+from mindspore.nn.optim import Momentum
+import mindspore.nn as nn
+import mindspore.common.initializer as weight_init
+from vega.datasets.conf.dataset import DatasetConfig
+from vega.trainer.callbacks.ms_callbacks import EvalCallBack
+from vega.common.general import General
+
+
+def init_weight(net):
+ """Initialize weight."""
+ for _, cell in net.cells_and_names():
+ if isinstance(cell, nn.Conv2d):
+ cell.weight.set_data(weight_init.initializer(weight_init.XavierUniform(),
+ cell.weight.shape,
+ cell.weight.dtype))
+ if isinstance(cell, nn.Dense):
+ cell.weight.set_data(weight_init.initializer(weight_init.TruncatedNormal(),
+ cell.weight.shape,
+ cell.weight.dtype))
+
+
+def init_group_params(net):
+    """Group parameters for weight decay: decay weights only, skip beta/gamma/bias."""
+ decayed_params = []
+ no_decayed_params = []
+ for param in net.trainable_params():
+ if 'beta' not in param.name and 'gamma' not in param.name and 'bias' not in param.name:
+ decayed_params.append(param)
+ else:
+ no_decayed_params.append(param)
+
+ group_params = [{'params': decayed_params, 'weight_decay': 0.0001},
+ {'params': no_decayed_params},
+ {'order_params': net.trainable_params()}]
+ return group_params
+
+
+@ClassFactory.register(ClassType.TRAINER)
+class ResnetTrainer(TrainerBase):
+ """Trainer mindspore class."""
+
+ def build(self):
+ """Build the trainer by assembling the necessary components."""
+ logging.debug("Trainer Config: {}".format(self.config))
+ self._init_hps()
+ self.do_validation = False
+ self.use_syncbn = self.config.syncbn
+ if self.use_syncbn and vega.is_torch_backend():
+ import apex
+ self.model = apex.parallel.convert_syncbn_model(self.model)
+ if not self.train_loader:
+ self.train_loader = self._init_dataloader(mode='train')
+ if not self.valid_loader:
+ self.valid_loader = self._init_dataloader(mode='val')
+ self.batch_num_train = len(self.train_loader)
+ self.batch_num_valid = len(self.valid_loader)
+ logging.debug("Trainer Config: {}".format(self.config))
+ config = DatasetConfig().to_dict()
+ self.train_config = config['_class_data'].train
+ self.valid_config = config['_class_data'].val
+ self.loss = CrossEntropySmooth(sparse=self.config.loss.params.sparse,
+ reduction=self.config.loss.params.reduction,
+ smooth_factor=self.config.loss.params.smooth_factor,
+ num_classes=self.train_config.n_class)
+ self.metric_name = self.config.metric.type
+
+ self.train_metrics = None
+ self.valid_metrics = self._init_metrics()
+        valid_metrics = self.valid_metrics()
+        self.ms_metrics = valid_metrics if isinstance(valid_metrics, dict) else {
+            self.metric_name: valid_metrics}
+
+ self.net = resnet(class_num=self.train_config.n_class)
+ init_weight(net=self.net)
+ from mindspore.train.loss_scale_manager import FixedLossScaleManager
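+        # Fixed loss scale for the O2 mixed-precision setting; drop_overflow_update=False keeps updates even on overflow.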
+ self.loss_scale = FixedLossScaleManager(self.config.loss_scale, drop_overflow_update=False)
+
+ def init_env(self):
+ """Construct the trainer of Resnet."""
+ super().init_env()
+ self._init_ms_context()
+ self._init_distributed_setting()
+
+ def _train_epoch(self):
+ """Construct the trainer of Resnet."""
+ try:
+ dataset = create_dataset(dataset_path=self.train_config.data_path + '/train', do_train=True,
+ repeat_num=1, batch_size=self.train_config.batch_size, target='Ascend',
+ distribute=True)
+ step_size = dataset.get_dataset_size()
+
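+            # The LR schedule is precomputed as a per-step array covering all training epochs.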
+ lr = Tensor(
+ get_lr(lr_init=self.config.lr_scheduler.params.lr_init, lr_end=self.config.lr_scheduler.params.lr_end,
+ lr_max=self.config.lr_scheduler.params.lr_max,
+ warmup_epochs=0, total_epochs=self.config.epochs, steps_per_epoch=step_size,
+ lr_decay_mode=self.config.lr_scheduler.params.lr_decay_mode))
+            group_params = init_group_params(self.net)
+ opt = Momentum(group_params, lr, self.config.optimizer.params.momentum, loss_scale=self.config.loss_scale)
+
+ self.ms_model = MsModel(network=self.net,
+ loss_fn=self.loss,
+ optimizer=opt,
+ loss_scale_manager=self.loss_scale,
+ amp_level="O2", keep_batchnorm_fp32=False,
+ acc_level="O0",
+ metrics=self.ms_metrics)
+ config_ck = CheckpointConfig(save_checkpoint_steps=self.config.save_steps, keep_checkpoint_max=1)
+ save_path = self.get_local_worker_path(self.step_name, self.worker_id)
+ ckpoint_cb = ModelCheckpoint(config=config_ck, directory=save_path)
+ loss_cb = LossMonitor()
+ self.valid_loader = create_dataset(dataset_path=self.valid_config.data_path + '/val', do_train=False,
+ batch_size=self.valid_config.batch_size,
+ target='Ascend')
+ eval_cb = EvalCallBack(self.ms_model, self.valid_loader, self.dataset_sink_mode, self)
+ callback_list = [ckpoint_cb, loss_cb, eval_cb]
+
+ self.ms_model.train(epoch=self.epochs,
+ train_dataset=dataset,
+ callbacks=callback_list,
+ dataset_sink_mode=False)
+ except RuntimeError as e:
+ logging.warning(f"failed to train the model, skip it, message: {str(e)}")
+
+ def _init_distributed_setting(self):
+ """Construct the trainer of Resnet."""
+ context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True)
+ set_algo_parameters(elementwise_op_strategy_follow=True)
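+        # all_reduce_fusion_config lists the gradient indices at which all-reduce operations are fused into segments.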
+ context.set_auto_parallel_context(all_reduce_fusion_config=self.config.all_reduce_fusion_config)
+
+ def _init_ms_context(self):
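+        """Set the MindSpore execution context and dataset sink mode."""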
+ mode = General.ms_execute_mode
+ logging.info(f"Run train/val in mode: {mode}.")
+ if vega.is_npu_device():
+ context.set_context(mode=mode, device_target="Ascend", device_id=int(os.environ["DEVICE_ID"]))
+ else:
+ context.set_context(mode=mode, device_target="CPU")
+
+ self.dataset_sink_mode = General.dataset_sink_mode
+ logging.info(f"Dataset_sink_mode:{self.dataset_sink_mode}.")
diff --git a/vega/algorithms/fully_train/resnet/src/CrossEntropySmooth.py b/vega/algorithms/fully_train/resnet/src/CrossEntropySmooth.py
new file mode 100644
index 00000000..372ea920
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/src/CrossEntropySmooth.py
@@ -0,0 +1,35 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Define loss function for network."""
+import mindspore.nn as nn
+from mindspore import Tensor
+from mindspore.common import dtype as mstype
+from mindspore.nn.loss.loss import _Loss as LossBase
+from mindspore.ops import functional as F
+from mindspore.ops import operations as P
+
+
+class CrossEntropySmooth(LossBase):
+ """CrossEntropy."""
+
+ def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
+ super(CrossEntropySmooth, self).__init__()
+ self.onehot = P.OneHot()
+ self.sparse = sparse
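+        # Label smoothing: the true class keeps 1 - smooth_factor; the rest is spread over the other classes.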
+ self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
+ self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
+ self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)
+
+ def construct(self, logit, label):
+ """Construct the trainer of Resnet."""
+ if self.sparse:
+ label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
+ loss = self.ce(logit, label)
+ return loss
diff --git a/vega/algorithms/fully_train/resnet/src/dataset.py b/vega/algorithms/fully_train/resnet/src/dataset.py
new file mode 100644
index 00000000..66201461
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/src/dataset.py
@@ -0,0 +1,96 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Create train or eval dataset."""
+import os
+import mindspore.common.dtype as mstype
+import mindspore.dataset as ds
+import mindspore.dataset.vision.c_transforms as C
+import mindspore.dataset.transforms.c_transforms as C2
+from mindspore.communication.management import init, get_rank, get_group_size
+
+
+def create_dataset2(dataset_path, do_train, repeat_num=1, batch_size=32, target="Ascend", distribute=False,
+ enable_cache=False, cache_session_id=None):
+ """
+ Create a train or eval imagenet2012 dataset for resnet50.
+
+ Args:
+ dataset_path(string): the path of dataset.
+ do_train(bool): whether dataset is used for train or eval.
+ repeat_num(int): the repeat times of dataset. Default: 1
+ batch_size(int): the batch size of dataset. Default: 32
+ target(str): the device target. Default: Ascend
+ distribute(bool): data for distribute or not. Default: False
+ enable_cache(bool): whether tensor caching service is used for eval. Default: False
+ cache_session_id(int): If enable_cache, cache session_id need to be provided. Default: None
+
+ Returns:
+ dataset
+ """
+ if target == "Ascend":
+ device_num = int(os.environ.get('RANK_SIZE'))
+ rank_id = int(os.environ.get('RANK_ID'))
+ else:
+ if distribute:
+ init()
+ rank_id = get_rank()
+ device_num = get_group_size()
+ else:
+ device_num = 1
+
+ if device_num == 1:
+ data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True)
+ else:
+ data_set = ds.ImageFolderDataset(dataset_path, num_parallel_workers=12, shuffle=True,
+ num_shards=device_num, shard_id=rank_id)
+
+ image_size = 224
+ mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
+ std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
+
+ # define map operations
+ if do_train:
+ trans = [
+ C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
+ C.RandomHorizontalFlip(prob=0.5),
+ C.Normalize(mean=mean, std=std),
+ C.HWC2CHW()
+ ]
+ else:
+ trans = [
+ C.Decode(),
+ C.Resize(256),
+ C.CenterCrop(image_size),
+ C.Normalize(mean=mean, std=std),
+ C.HWC2CHW()
+ ]
+
+ type_cast_op = C2.TypeCast(mstype.int32)
+
+ data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=12)
+ # only enable cache for eval
+ if do_train:
+ enable_cache = False
+ if enable_cache:
+ if not cache_session_id:
+ raise ValueError("A cache session_id must be provided to use cache.")
+ eval_cache = ds.DatasetCache(session_id=int(cache_session_id), size=0)
+ data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12,
+ cache=eval_cache)
+ else:
+ data_set = data_set.map(operations=type_cast_op, input_columns="label", num_parallel_workers=12)
+
+ # apply batch operations
+ data_set = data_set.batch(batch_size, drop_remainder=True)
+
+ # apply dataset repeat operation
+ data_set = data_set.repeat(repeat_num)
+
+ return data_set
diff --git a/vega/algorithms/fully_train/resnet/src/lr_generator.py b/vega/algorithms/fully_train/resnet/src/lr_generator.py
new file mode 100644
index 00000000..7e2924c5
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/src/lr_generator.py
@@ -0,0 +1,235 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Learning rate generator."""
+import math
+import numpy as np
+
+
+def _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps):
+ """
+ Apply three steps decay to generate learning rate array.
+
+ Args:
+ lr_init(float): init learning rate.
+ lr_max(float): max learning rate.
+ total_steps(int): all steps in training.
+ warmup_steps(int): all steps in warmup epochs.
+
+ Returns:
+ np.array, learning rate array.
+ """
+ decay_epoch_index = [0.3 * total_steps, 0.6 * total_steps, 0.8 * total_steps]
+ lr_each_step = []
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr = lr_init + (lr_max - lr_init) * i / warmup_steps
+ else:
+ if i < decay_epoch_index[0]:
+ lr = lr_max
+ elif i < decay_epoch_index[1]:
+ lr = lr_max * 0.1
+ elif i < decay_epoch_index[2]:
+ lr = lr_max * 0.01
+ else:
+ lr = lr_max * 0.001
+ lr_each_step.append(lr)
+ return lr_each_step
+
+
+def _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
+ """
+ Apply polynomial decay to generate learning rate array.
+
+ Args:
+ lr_init(float): init learning rate.
+ lr_end(float): end learning rate
+ lr_max(float): max learning rate.
+ total_steps(int): all steps in training.
+ warmup_steps(int): all steps in warmup epochs.
+
+ Returns:
+ np.array, learning rate array.
+ """
+ lr_each_step = []
+ if warmup_steps != 0:
+ inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
+ else:
+ inc_each_step = 0
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr = float(lr_init) + inc_each_step * float(i)
+ else:
+ base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
+ lr = float(lr_max) * base * base
+ if lr < 0.0:
+ lr = 0.0
+ lr_each_step.append(lr)
+ return lr_each_step
+
+
+def _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
+ """
+ Apply cosine decay to generate learning rate array.
+
+ Args:
+ lr_init(float): init learning rate.
+ lr_end(float): end learning rate
+ lr_max(float): max learning rate.
+ total_steps(int): all steps in training.
+ warmup_steps(int): all steps in warmup epochs.
+
+ Returns:
+ np.array, learning rate array.
+ """
+ decay_steps = total_steps - warmup_steps
+ lr_each_step = []
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
+ lr = float(lr_init) + lr_inc * (i + 1)
+ else:
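+            # The cosine argument sweeps up to 0.94 * pi, decaying the factor from 1 toward 0; 1e-5 keeps the rate positive.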
+ linear_decay = (total_steps - i) / decay_steps
+ cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
+ decayed = linear_decay * cosine_decay + 0.00001
+ lr = lr_max * decayed
+ lr_each_step.append(lr)
+ return lr_each_step
+
+
+def _generate_linear_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps):
+    """
+    Apply linear decay to generate learning rate array.
+
+ Args:
+ lr_init(float): init learning rate.
+ lr_end(float): end learning rate
+ lr_max(float): max learning rate.
+ total_steps(int): all steps in training.
+ warmup_steps(int): all steps in warmup epochs.
+
+ Returns:
+ np.array, learning rate array.
+ """
+ lr_each_step = []
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr = lr_init + (lr_max - lr_init) * i / warmup_steps
+ else:
+ lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
+ lr_each_step.append(lr)
+ return lr_each_step
+
+
+def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch, lr_decay_mode):
+ """
+ Generate learning rate array.
+
+ Args:
+ lr_init(float): init learning rate
+ lr_end(float): end learning rate
+ lr_max(float): max learning rate
+ warmup_epochs(int): number of warmup epochs
+ total_epochs(int): total epoch of training
+ steps_per_epoch(int): steps of one epoch
+        lr_decay_mode(string): learning rate decay mode, including steps, poly, cosine or linear (default)
+
+ Returns:
+ np.array, learning rate array
+ """
+ lr_each_step = []
+ total_steps = steps_per_epoch * total_epochs
+ warmup_steps = steps_per_epoch * warmup_epochs
+
+ if lr_decay_mode == 'steps':
+ lr_each_step = _generate_steps_lr(lr_init, lr_max, total_steps, warmup_steps)
+ elif lr_decay_mode == 'poly':
+ lr_each_step = _generate_poly_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
+ elif lr_decay_mode == 'cosine':
+ lr_each_step = _generate_cosine_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
+ else:
+        lr_each_step = _generate_linear_lr(lr_init, lr_end, lr_max, total_steps, warmup_steps)
+
+ lr_each_step = np.array(lr_each_step).astype(np.float32)
+ return lr_each_step
+
+
+def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
+ """Construct the trainer of Resnet."""
+ lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
+ lr = float(init_lr) + lr_inc * current_step
+ return lr
+
+
+def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch=120, global_step=0):
+ """
+ Generate learning rate array with cosine.
+
+ Args:
+ lr(float): base learning rate
+ steps_per_epoch(int): steps size of one epoch
+ warmup_epochs(int): number of warmup epochs
+ max_epoch(int): total epochs of training
+ global_step(int): the current start index of lr array
+ Returns:
+ np.array, learning rate array
+ """
+ base_lr = lr
+ warmup_init_lr = 0
+ total_steps = int(max_epoch * steps_per_epoch)
+ warmup_steps = int(warmup_epochs * steps_per_epoch)
+ decay_steps = total_steps - warmup_steps
+
+ lr_each_step = []
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
+ else:
+ linear_decay = (total_steps - i) / decay_steps
+ cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
+ decayed = linear_decay * cosine_decay + 0.00001
+ lr = base_lr * decayed
+ lr_each_step.append(lr)
+
+ lr_each_step = np.array(lr_each_step).astype(np.float32)
+ learning_rate = lr_each_step[global_step:]
+ return learning_rate
+
+
+def get_thor_lr(global_step, lr_init, decay, total_epochs, steps_per_epoch, decay_epochs=100):
+ """Get model_lr."""
+ lr_each_step = []
+ total_steps = steps_per_epoch * total_epochs
+ for i in range(total_steps):
+ epoch = (i + 1) / steps_per_epoch
+ base = (1.0 - float(epoch) / total_epochs) ** decay
+ lr_local = lr_init * base
+ if epoch >= decay_epochs:
+ lr_local = lr_local * 0.5
+ if epoch >= decay_epochs + 1:
+ lr_local = lr_local * 0.5
+ lr_each_step.append(lr_local)
+ current_step = global_step
+ lr_each_step = np.array(lr_each_step).astype(np.float32)
+ learning_rate = lr_each_step[current_step:]
+ return learning_rate
+
+
+def get_thor_damping(global_step, damping_init, decay_rate, total_epochs, steps_per_epoch):
+ """Get model_damping."""
+ damping_each_step = []
+ total_steps = steps_per_epoch * total_epochs
+ for step in range(total_steps):
+ epoch = (step + 1) / steps_per_epoch
+ damping_here = damping_init * (decay_rate ** (epoch / 10))
+ damping_each_step.append(damping_here)
+ current_step = global_step
+ damping_each_step = np.array(damping_each_step).astype(np.float32)
+ damping_now = damping_each_step[current_step:]
+ return damping_now
diff --git a/vega/algorithms/fully_train/resnet/src/metric.py b/vega/algorithms/fully_train/resnet/src/metric.py
new file mode 100644
index 00000000..a968a20b
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/src/metric.py
@@ -0,0 +1,120 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""evaluation metric."""
+
+from mindspore.communication.management import GlobalComm
+from mindspore.ops import operations as P
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+
+
+class ClassifyCorrectCell(nn.Cell):
+ r"""
+ Cell that returns correct count of the prediction in classification network.
+
+ Args:
+ network (Cell): The network Cell.
+
+ Inputs:
+ - **data** (Tensor) - Tensor of shape :math:`(N, \ldots)`.
+ - **label** (Tensor) - Tensor of shape :math:`(N, \ldots)`.
+
+ Outputs:
+ Tuple, containing a scalar correct count of the prediction
+
+ Examples:
+ >>> # For a defined network Net without loss function
+ >>> net = Net()
+    >>> eval_net = ClassifyCorrectCell(net)
+ """
+
+ def __init__(self, network):
+ super(ClassifyCorrectCell, self).__init__(auto_prefix=False)
+ self._network = network
+ self.argmax = P.Argmax()
+ self.equal = P.Equal()
+ self.cast = P.Cast()
+ self.reduce_sum = P.ReduceSum()
+ self.allreduce = P.AllReduce(P.ReduceOp.SUM, GlobalComm.WORLD_COMM_GROUP)
+
+ def construct(self, data, label):
+ """Construct the trainer of Resnet."""
+ outputs = self._network(data)
+ y_pred = self.argmax(outputs)
+ y_pred = self.cast(y_pred, mstype.int32)
+ y_correct = self.equal(y_pred, label)
+ y_correct = self.cast(y_correct, mstype.float32)
+ y_correct = self.reduce_sum(y_correct)
+ total_correct = self.allreduce(y_correct)
+ return (total_correct,)
+
+
+class DistAccuracy(nn.Metric):
+ r"""
+ Calculates the accuracy for classification data in distributed mode.
+
+ Args:
+        batch_size (int): batch size of the evaluation data on each device.
+        device_num (int): number of devices participating in the evaluation.
+
+ Examples:
+ >>> y_correct = Tensor(np.array([20]))
+    >>> metric = DistAccuracy(batch_size=3, device_num=8)
+ >>> metric.clear()
+ >>> metric.update(y_correct)
+ >>> accuracy = metric.eval()
+ """
+
+ def __init__(self, batch_size, device_num):
+ super(DistAccuracy, self).__init__()
+ self.clear()
+ self.batch_size = batch_size
+ self.device_num = device_num
+
+ def clear(self):
+ """Clear the internal evaluation result."""
+ self._correct_num = 0
+ self._total_num = 0
+
+ def update(self, *inputs):
+ """
+ Update the internal evaluation result :math:`y_{pred}` and :math:`y`.
+
+ Args:
+            inputs: Input `y_correct`. `y_correct` is a scalar Tensor of float type,
+                the count of correct predictions gathered from all devices.
+
+ Raises:
+ ValueError: If the number of the input is not 1.
+ """
+ if len(inputs) != 1:
+ raise ValueError('Distribute accuracy needs 1 input (y_correct), but got {}'.format(len(inputs)))
+ y_correct = self._convert_data(inputs[0])
+ self._correct_num += y_correct
+ self._total_num += self.batch_size * self.device_num
+
+ def eval(self):
+ """
+ Compute the accuracy.
+
+ Returns:
+ Float, the computed result.
+
+ Raises:
+ RuntimeError: If the sample size is 0.
+ """
+ if self._total_num == 0:
+ raise RuntimeError('Accuracy can not be calculated, because the number of samples is 0.')
+ return self._correct_num / self._total_num
diff --git a/vega/algorithms/fully_train/resnet/src/momentum.py b/vega/algorithms/fully_train/resnet/src/momentum.py
new file mode 100644
index 00000000..7e4b4991
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/src/momentum.py
@@ -0,0 +1,130 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""Momentum."""
+from mindspore.ops import functional as F, composite as C, operations as P
+from mindspore.common.parameter import Parameter
+from mindspore.common.tensor import Tensor
+import mindspore.common.dtype as mstype
+from mindspore._checkparam import Validator
+from mindspore.nn.optim.optimizer import Optimizer
+
+_momentum_opt = C.MultitypeFuncGraph("momentum_opt")
+
+
+@_momentum_opt.register("Function", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor")
+def _tensor_run_opt_ext(opt, weight_decay, scale, momentum, learning_rate, gradient, weight, moment):
+ """Apply momentum optimizer to the weight parameter using Tensor."""
+ success = F.depend(True, opt(weight_decay, scale, weight, moment, learning_rate, gradient, momentum))
+ return success
+
+
+class Momentum(Optimizer):
+ r"""
+ Implements the Momentum algorithm.
+
+ Args:
+ params (Union[list[Parameter], list[dict]]): When the `params` is a list of `Parameter` which will be updated,
+ the element in `params` must be class `Parameter`. When the `params` is a list of `dict`, the "params",
+ "lr", "weight_decay" and "order_params" are the keys can be parsed.
+
+ - params: Required. The value must be a list of `Parameter`.
+
+ - lr: Optional. If "lr" in the keys, the value of corresponding learning rate will be used.
+ If not, the `learning_rate` in the API will be used.
+
+ - weight_decay: Optional. If "weight_decay" in the keys, the value of corresponding weight decay
+ will be used. If not, the `weight_decay` in the API will be used.
+
+ - order_params: Optional. If "order_params" in the keys, the value must be the order of parameters and
+ the order will be followed in optimizer. There are no other keys in the `dict` and the parameters which
+ in the value of 'order_params' must be in one of group parameters.
+
+ learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+ When the learning_rate is an Iterable or a Tensor in a 1D dimension, use dynamic learning rate, then
+ the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
+ use dynamic learning rate, the i-th learning rate will be calculated during the process of training
+ according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor in a zero
+ dimension, use fixed learning rate. Other cases are not supported. The float learning rate must be
+ equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
+ momentum (float): Hyperparameter of type float, means momentum for the moving average.
+ It must be at least 0.0.
+ weight_decay (int, float): Weight decay (L2 penalty). It must be equal to or greater than 0.0. Default: 0.0.
+ loss_scale (int, float): A floating point value for the loss scale. It must be greater than 0.0. Default: 1.0.
+ use_nesterov (bool): Enable Nesterov momentum. Default: False.
+
+ Inputs:
+ - **gradients** (tuple[Tensor]) - The gradients of `params`, the shape is the same as `params`.
+
+ Outputs:
+ tuple[bool], all elements are True.
+
+ Raises:
+ ValueError: If the momentum is less than 0.0.
+ TypeError: If the momentum is not a float or use_nesterov is not a bool.
+
+ Supported Platforms:
+ ``GPU``
+
+ Examples:
+ >>> net = Net()
+ >>> #1) All parameters use the same learning rate and weight decay
+ >>> optim = Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9)
+ >>>
+ >>> #2) Use parameter groups and set different values
+ >>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
+ >>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
+ >>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
+ ... {'params': no_conv_params, 'lr': 0.01},
+ ... {'order_params': net.trainable_params()}]
+ >>> optim = Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
+ >>> # The conv_params's parameters will use a learning rate of default value 0.1 and a weight decay of 0.01.
+ >>> # The no_conv_params's parameters will use a learning rate of 0.01 and a weight decay of default value 0.0.
+ >>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
+ >>>
+ >>> loss = nn.SoftmaxCrossEntropyWithLogits()
+ >>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)
+ """
+
+ def __init__(self, params, learning_rate, momentum, weight_decay=0.0, loss_scale=1.0, use_nesterov=False):
+ super(Momentum, self).__init__(learning_rate, params, weight_decay, loss_scale)
+ Validator.check_value_type("momentum", momentum, [float], self.cls_name)
+ if isinstance(momentum, float) and momentum < 0.0:
+ raise ValueError("momentum should be at least 0.0, but got momentum {}".format(momentum))
+ self.momentum = Parameter(Tensor(momentum, mstype.float32), name="momentum")
+ self.params = self.parameters
+ self.use_nesterov = Validator.check_bool(use_nesterov)
+ self.moments = self.params.clone(prefix="moments", init='zeros')
+ self.hyper_map = C.HyperMap()
+ # Use FusedWeightScaleApplyMomentum to avoid extra kernel launch.
+ self.opt = P.FusedWeightScaleApplyMomentum()
+
+ def construct(self, gradients):
+ """Construct the trainer of Resnet."""
+ params = self.params
+ moments = self.moments
+ weight_decay = Tensor(0.0, mstype.float32)
+ scale = Tensor(1.0, mstype.float32)
+ if self.exec_weight_decay:
+ weight_decay = self.weight_decay_tensor
+ if self.need_scale:
+ scale = self.reciprocal_scale
+ lr = self.get_lr()
+ if self.is_group_lr:
+ success = self.hyper_map(F.partial(_momentum_opt, self.opt, weight_decay, scale, self.momentum),
+ lr, gradients, params, moments)
+ else:
+ success = self.hyper_map(F.partial(_momentum_opt, self.opt, weight_decay, scale, self.momentum, lr),
+ gradients, params, moments)
+ return success
diff --git a/vega/algorithms/fully_train/resnet/src/resnet.py b/vega/algorithms/fully_train/resnet/src/resnet.py
new file mode 100644
index 00000000..4edb4cc6
--- /dev/null
+++ b/vega/algorithms/fully_train/resnet/src/resnet.py
@@ -0,0 +1,610 @@
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""ResNet."""
+import math
+import numpy as np
+from scipy.stats import truncnorm
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.common.tensor import Tensor
+
+
+def _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size):
+ """Construct the trainer of Resnet."""
+ fan_in = in_channel * kernel_size * kernel_size
+ scale = 1.0
+ scale /= max(1., fan_in)
+ stddev = (scale ** 0.5) / .87962566103423978
+ mu, sigma = 0, stddev
+ weight = truncnorm(-2, 2, loc=mu, scale=sigma).rvs(out_channel * in_channel * kernel_size * kernel_size)
+ weight = np.reshape(weight, (out_channel, in_channel, kernel_size, kernel_size))
+ return Tensor(weight, dtype=mstype.float32)
+
+
+def _weight_variable(shape, factor=0.01):
+ """Construct the trainer of Resnet."""
+ init_value = np.random.randn(*shape).astype(np.float32) * factor
+ return Tensor(init_value)
+
+
+def calculate_gain(nonlinearity, param=None):
+ """Calculate gain."""
+ linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
+ res = 0
+ if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
+ res = 1
+ elif nonlinearity == 'tanh':
+ res = 5.0 / 3
+ elif nonlinearity == 'relu':
+ res = math.sqrt(2.0)
+ elif nonlinearity == 'leaky_relu':
+ if param is None:
+ negative_slope = 0.01
+ elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float):
+ # True/False are instances of int, hence check above
+ negative_slope = param
+ else:
+ raise ValueError("negative_slope {} not a valid number".format(param))
+ res = math.sqrt(2.0 / (1 + negative_slope ** 2))
+ else:
+ raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
+ return res
+
+
+def _calculate_fan_in_and_fan_out(tensor):
+ """Calculate fan_in_and_fan_out."""
+ dimensions = len(tensor)
+ if dimensions < 2:
+ raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")
+ if dimensions == 2: # Linear
+ fan_in = tensor[1]
+ fan_out = tensor[0]
+ else:
+ num_input_fmaps = tensor[1]
+ num_output_fmaps = tensor[0]
+ receptive_field_size = 1
+ if dimensions > 2:
+ receptive_field_size = tensor[2] * tensor[3]
+ fan_in = num_input_fmaps * receptive_field_size
+ fan_out = num_output_fmaps * receptive_field_size
+ return fan_in, fan_out
+
+
+def _calculate_correct_fan(tensor, mode):
+ """Calculate correct_fan."""
+ mode = mode.lower()
+ valid_modes = ['fan_in', 'fan_out']
+ if mode not in valid_modes:
+ raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))
+ fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
+ return fan_in if mode == 'fan_in' else fan_out
+
+
+def kaiming_normal(inputs_shape, a=0, mode='fan_in', nonlinearity='leaky_relu'):
+ """Construct the trainer of Resnet."""
+ fan = _calculate_correct_fan(inputs_shape, mode)
+ gain = calculate_gain(nonlinearity, a)
+ std = gain / math.sqrt(fan)
+ return np.random.normal(0, std, size=inputs_shape).astype(np.float32)
+
+
+def kaiming_uniform(inputs_shape, a=0., mode='fan_in', nonlinearity='leaky_relu'):
+ """Construct the trainer of Resnet."""
+ fan = _calculate_correct_fan(inputs_shape, mode)
+ gain = calculate_gain(nonlinearity, a)
+ std = gain / math.sqrt(fan)
+ bound = math.sqrt(3.0) * std # Calculate uniform bounds from standard deviation
+ return np.random.uniform(-bound, bound, size=inputs_shape).astype(np.float32)
+
+
+def _conv3x3(in_channel, out_channel, stride=1, use_se=False, res_base=False):
+ """Construct the trainer of Resnet."""
+ if use_se:
+ weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=3)
+ else:
+ weight_shape = (out_channel, in_channel, 3, 3)
+ weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu'))
+ if res_base:
+ return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride,
+ padding=1, pad_mode='pad', weight_init=weight)
+ return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride,
+ padding=0, pad_mode='same', weight_init=weight)
+
+
+def _conv1x1(in_channel, out_channel, stride=1, use_se=False, res_base=False):
+ """Construct the trainer of Resnet."""
+ if use_se:
+ weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=1)
+ else:
+ weight_shape = (out_channel, in_channel, 1, 1)
+ weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu'))
+ if res_base:
+ return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride,
+ padding=0, pad_mode='pad', weight_init=weight)
+ return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride,
+ padding=0, pad_mode='same', weight_init=weight)
+
+
+def _conv7x7(in_channel, out_channel, stride=1, use_se=False, res_base=False):
+ """Construct the trainer of Resnet."""
+ if use_se:
+ weight = _conv_variance_scaling_initializer(in_channel, out_channel, kernel_size=7)
+ else:
+ weight_shape = (out_channel, in_channel, 7, 7)
+ weight = Tensor(kaiming_normal(weight_shape, mode="fan_out", nonlinearity='relu'))
+ if res_base:
+ return nn.Conv2d(in_channel, out_channel,
+ kernel_size=7, stride=stride, padding=3, pad_mode='pad', weight_init=weight)
+ return nn.Conv2d(in_channel, out_channel,
+ kernel_size=7, stride=stride, padding=0, pad_mode='same', weight_init=weight)
+
+
+def _bn(channel, res_base=False):
+ """Construct the trainer of Resnet."""
+ if res_base:
+ return nn.BatchNorm2d(channel, eps=1e-5, momentum=0.1,
+ gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
+ return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
+ gamma_init=1, beta_init=0, moving_mean_init=0, moving_var_init=1)
+
+
+def _bn_last(channel):
+ """Construct the trainer of Resnet."""
+ return nn.BatchNorm2d(channel, eps=1e-4, momentum=0.9,
+ gamma_init=0, beta_init=0, moving_mean_init=0, moving_var_init=1)
+
+
+def _fc(in_channel, out_channel, use_se=False):
+ """Construct the trainer of Resnet."""
+ if use_se:
+ weight = np.random.normal(loc=0, scale=0.01, size=out_channel * in_channel)
+ weight = Tensor(np.reshape(weight, (out_channel, in_channel)), dtype=mstype.float32)
+ else:
+ weight_shape = (out_channel, in_channel)
+ weight = Tensor(kaiming_uniform(weight_shape, a=math.sqrt(5)))
+ return nn.Dense(in_channel, out_channel, has_bias=True, weight_init=weight, bias_init=0)
+
+
+class ResidualBlock(nn.Cell):
+ """
+ ResNet V1 residual block definition.
+
+ Args:
+ in_channel (int): Input channel.
+ out_channel (int): Output channel.
+ stride (int): Stride size for the first convolutional layer. Default: 1.
+ use_se (bool): Enable SE-ResNet50 net. Default: False.
+ se_block(bool): Use se block in SE-ResNet50 net. Default: False.
+
+ Returns:
+ Tensor, output tensor.
+
+ Examples:
+ >>> ResidualBlock(3, 256, stride=2)
+ """
+
+ expansion = 4
+
+ def __init__(self,
+ in_channel,
+ out_channel,
+ stride=1,
+ use_se=False, se_block=False):
+ super(ResidualBlock, self).__init__()
+ self.stride = stride
+ self.use_se = use_se
+ self.se_block = se_block
+ channel = out_channel // self.expansion
+ self.conv1 = _conv1x1(in_channel, channel, stride=1, use_se=self.use_se)
+ self.bn1 = _bn(channel)
+ if self.use_se and self.stride != 1:
+ self.e2 = nn.SequentialCell([_conv3x3(channel, channel, stride=1, use_se=True), _bn(channel),
+ nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same')])
+ else:
+ self.conv2 = _conv3x3(channel, channel, stride=stride, use_se=self.use_se)
+ self.bn2 = _bn(channel)
+
+ self.conv3 = _conv1x1(channel, out_channel, stride=1, use_se=self.use_se)
+ self.bn3 = _bn_last(out_channel)
+ if self.se_block:
+ self.se_global_pool = P.ReduceMean(keep_dims=False)
+ self.se_dense_0 = _fc(out_channel, int(out_channel / 4), use_se=self.use_se)
+ self.se_dense_1 = _fc(int(out_channel / 4), out_channel, use_se=self.use_se)
+ self.se_sigmoid = nn.Sigmoid()
+ self.se_mul = P.Mul()
+ self.relu = nn.ReLU()
+
+ self.down_sample = False
+
+ if stride != 1 or in_channel != out_channel:
+ self.down_sample = True
+ self.down_sample_layer = None
+
+ if self.down_sample:
+ if self.use_se:
+ if stride == 1:
+ self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel,
+ stride, use_se=self.use_se), _bn(out_channel)])
+ else:
+ self.down_sample_layer = nn.SequentialCell([nn.MaxPool2d(kernel_size=2, stride=2, pad_mode='same'),
+ _conv1x1(in_channel, out_channel, 1,
+ use_se=self.use_se), _bn(out_channel)])
+ else:
+ self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride,
+ use_se=self.use_se), _bn(out_channel)])
+
+ def construct(self, x):
+ """Construct the trainer of Resnet."""
+ identity = x
+
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+ if self.use_se and self.stride != 1:
+ out = self.e2(out)
+ else:
+ out = self.conv2(out)
+ out = self.bn2(out)
+ out = self.relu(out)
+ out = self.conv3(out)
+ out = self.bn3(out)
+ if self.se_block:
+ out_se = out
+ out = self.se_global_pool(out, (2, 3))
+ out = self.se_dense_0(out)
+ out = self.relu(out)
+ out = self.se_dense_1(out)
+ out = self.se_sigmoid(out)
+ out = F.reshape(out, F.shape(out) + (1, 1))
+ out = self.se_mul(out, out_se)
+
+ if self.down_sample:
+ identity = self.down_sample_layer(identity)
+
+ out = out + identity
+ out = self.relu(out)
+
+ return out
+
+
+class ResidualBlockBase(nn.Cell):
+ """
+ ResNet V1 residual block definition.
+
+ Args:
+ in_channel (int): Input channel.
+ out_channel (int): Output channel.
+ stride (int): Stride size for the first convolutional layer. Default: 1.
+ use_se (bool): Enable SE-ResNet50 net. Default: False.
+ se_block(bool): Use se block in SE-ResNet50 net. Default: False.
+ res_base (bool): Enable parameter setting of resnet18. Default: True.
+
+ Returns:
+ Tensor, output tensor.
+
+ Examples:
+ >>> ResidualBlockBase(3, 256, stride=2)
+ """
+
+ def __init__(self,
+ in_channel,
+ out_channel,
+ stride=1,
+ use_se=False,
+ se_block=False,
+ res_base=True):
+ super(ResidualBlockBase, self).__init__()
+ self.res_base = res_base
+ self.conv1 = _conv3x3(in_channel, out_channel, stride=stride, res_base=self.res_base)
+ self.bn1d = _bn(out_channel)
+ self.conv2 = _conv3x3(out_channel, out_channel, stride=1, res_base=self.res_base)
+ self.bn2d = _bn(out_channel)
+ self.relu = nn.ReLU()
+
+ self.down_sample = False
+ if stride != 1 or in_channel != out_channel:
+ self.down_sample = True
+
+ self.down_sample_layer = None
+ if self.down_sample:
+ self.down_sample_layer = nn.SequentialCell([_conv1x1(in_channel, out_channel, stride,
+ use_se=use_se, res_base=self.res_base),
+ _bn(out_channel, res_base)])
+
+ def construct(self, x):
+ """Construct the trainer of Resnet."""
+ identity = x
+
+ out = self.conv1(x)
+ out = self.bn1d(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.bn2d(out)
+
+ if self.down_sample:
+ identity = self.down_sample_layer(identity)
+
+ out = out + identity
+ out = self.relu(out)
+
+ return out
+
+
+class ResNet(nn.Cell):
+ """
+ ResNet architecture.
+
+ Args:
+ block (Cell): Block for network.
+ layer_nums (list): Numbers of block in different layers.
+ in_channels (list): Input channel in each layer.
+ out_channels (list): Output channel in each layer.
+ strides (list): Stride size in each layer.
+ num_classes (int): The number of classes that the training images are belonging to.
+ use_se (bool): Enable SE-ResNet50 net. Default: False.
+ se_block(bool): Use se block in SE-ResNet50 net in layer 3 and layer 4. Default: False.
+ res_base (bool): Enable parameter setting of resnet18. Default: False.
+
+ Returns:
+ Tensor, output tensor.
+
+ Examples:
+ >>> ResNet(ResidualBlock,
+ >>> [3, 4, 6, 3],
+ >>> [64, 256, 512, 1024],
+ >>> [256, 512, 1024, 2048],
+ >>> [1, 2, 2, 2],
+ >>> 10)
+ """
+
+ def __init__(self,
+ block,
+ layer_nums,
+ in_channels,
+ out_channels,
+ strides,
+ num_classes,
+ use_se=False,
+ res_base=False):
+ super(ResNet, self).__init__()
+
+ if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
+ raise ValueError("the length of layer_num, in_channels, out_channels list must be 4!")
+ self.use_se = use_se
+ self.res_base = res_base
+ self.se_block = False
+ if self.use_se:
+ self.se_block = True
+
+ if self.use_se:
+ self.conv1_0 = _conv3x3(3, 32, stride=2, use_se=self.use_se)
+ self.bn1_0 = _bn(32)
+ self.conv1_1 = _conv3x3(32, 32, stride=1, use_se=self.use_se)
+ self.bn1_1 = _bn(32)
+ self.conv1_2 = _conv3x3(32, 64, stride=1, use_se=self.use_se)
+ else:
+ self.conv1 = _conv7x7(3, 64, stride=2, res_base=self.res_base)
+ self.bn1 = _bn(64, self.res_base)
+ self.relu = P.ReLU()
+
+ if self.res_base:
+ self.pad = nn.Pad(paddings=((0, 0), (0, 0), (1, 1), (1, 1)))
+ self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="valid")
+ else:
+ self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode="same")
+
+ self.layer1 = self._make_layer(block,
+ layer_nums[0],
+ in_channel=in_channels[0],
+ out_channel=out_channels[0],
+ stride=strides[0],
+ use_se=self.use_se)
+ self.layer2 = self._make_layer(block,
+ layer_nums[1],
+ in_channel=in_channels[1],
+ out_channel=out_channels[1],
+ stride=strides[1],
+ use_se=self.use_se)
+ self.layer3 = self._make_layer(block,
+ layer_nums[2],
+ in_channel=in_channels[2],
+ out_channel=out_channels[2],
+ stride=strides[2],
+ use_se=self.use_se,
+ se_block=self.se_block)
+ self.layer4 = self._make_layer(block,
+ layer_nums[3],
+ in_channel=in_channels[3],
+ out_channel=out_channels[3],
+ stride=strides[3],
+ use_se=self.use_se,
+ se_block=self.se_block)
+
+ self.mean = P.ReduceMean(keep_dims=True)
+ self.flatten = nn.Flatten()
+ self.end_point = _fc(out_channels[3], num_classes, use_se=self.use_se)
+
+ def _make_layer(self, block, layer_num, in_channel, out_channel, stride, use_se=False, se_block=False):
+ """
+ Make stage network of ResNet.
+
+ Args:
+ block (Cell): Resnet block.
+ layer_num (int): Layer number.
+ in_channel (int): Input channel.
+ out_channel (int): Output channel.
+ stride (int): Stride size for the first convolutional layer.
+ se_block(bool): Use se block in SE-ResNet50 net. Default: False.
+ Returns:
+ SequentialCell, the output layer.
+
+ Examples:
+ >>> _make_layer(ResidualBlock, 3, 128, 256, 2)
+ """
+ layers = []
+
+ resnet_block = block(in_channel, out_channel, stride=stride, use_se=use_se)
+ layers.append(resnet_block)
+ if se_block:
+ for _ in range(1, layer_num - 1):
+ resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
+ layers.append(resnet_block)
+ resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se, se_block=se_block)
+ layers.append(resnet_block)
+ else:
+ for _ in range(1, layer_num):
+ resnet_block = block(out_channel, out_channel, stride=1, use_se=use_se)
+ layers.append(resnet_block)
+ return nn.SequentialCell(layers)
+
+ def construct(self, x):
+ """Construct the trainer of Resnet."""
+ if self.use_se:
+ x = self.conv1_0(x)
+ x = self.bn1_0(x)
+ x = self.relu(x)
+ x = self.conv1_1(x)
+ x = self.bn1_1(x)
+ x = self.relu(x)
+ x = self.conv1_2(x)
+ else:
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ if self.res_base:
+ x = self.pad(x)
+ c1 = self.maxpool(x)
+
+ c2 = self.layer1(c1)
+ c3 = self.layer2(c2)
+ c4 = self.layer3(c3)
+ c5 = self.layer4(c4)
+
+ out = self.mean(c5, (2, 3))
+ out = self.flatten(out)
+ out = self.end_point(out)
+
+ return out
+
+
+def resnet18(class_num=10):
+ """
+ Get ResNet18 neural network.
+
+ Args:
+ class_num (int): Class number.
+
+ Returns:
+ Cell, cell instance of ResNet18 neural network.
+
+ Examples:
+ >>> net = resnet18(10)
+ """
+ return ResNet(ResidualBlockBase,
+ [2, 2, 2, 2],
+ [64, 64, 128, 256],
+ [64, 128, 256, 512],
+ [1, 2, 2, 2],
+ class_num,
+ res_base=True)
+
+
+def resnet34(class_num=10):
+ """
+ Get ResNet34 neural network.
+
+ Args:
+ class_num (int): Class number.
+
+ Returns:
+ Cell, cell instance of ResNet34 neural network.
+
+ Examples:
+    >>> net = resnet34(10)
+ """
+ return ResNet(ResidualBlockBase,
+ [3, 4, 6, 3],
+ [64, 64, 128, 256],
+ [64, 128, 256, 512],
+ [1, 2, 2, 2],
+ class_num,
+ res_base=True)
+
+
+def resnet50(class_num=10):
+ """
+ Get ResNet50 neural network.
+
+ Args:
+ class_num (int): Class number.
+
+ Returns:
+ Cell, cell instance of ResNet50 neural network.
+
+ Examples:
+ >>> net = resnet50(10)
+ """
+ return ResNet(ResidualBlock,
+ [3, 4, 6, 3],
+ [64, 256, 512, 1024],
+ [256, 512, 1024, 2048],
+ [1, 2, 2, 2],
+ class_num)
+
+
+def se_resnet50(class_num=1001):
+ """
+ Get SE-ResNet50 neural network.
+
+ Args:
+ class_num (int): Class number.
+
+ Returns:
+ Cell, cell instance of SE-ResNet50 neural network.
+
+ Examples:
+    >>> net = se_resnet50(1001)
+ """
+ return ResNet(ResidualBlock,
+ [3, 4, 6, 3],
+ [64, 256, 512, 1024],
+ [256, 512, 1024, 2048],
+ [1, 2, 2, 2],
+ class_num,
+ use_se=True)
+
+
+def resnet101(class_num=1001):
+ """
+ Get ResNet101 neural network.
+
+ Args:
+ class_num (int): Class number.
+
+ Returns:
+ Cell, cell instance of ResNet101 neural network.
+
+ Examples:
+ >>> net = resnet101(1001)
+ """
+ return ResNet(ResidualBlock,
+ [3, 4, 23, 3],
+ [64, 256, 512, 1024],
+ [256, 512, 1024, 2048],
+ [1, 2, 2, 2],
+ class_num)
diff --git a/vega/algorithms/hpo/__init__.py b/vega/algorithms/hpo/__init__.py
index a0f5bcca..c3233392 100644
--- a/vega/algorithms/hpo/__init__.py
+++ b/vega/algorithms/hpo/__init__.py
@@ -21,4 +21,5 @@
"pbt_hpo": ["PBTHpo"],
"pbt_trainer_callback": ["PbtTrainerCallback"],
"sha_base.hebo_adaptor": ["HeboAdaptor"],
+ "bayes": ["BayesSearch"],
})
diff --git a/vega/algorithms/hpo/bayes.py b/vega/algorithms/hpo/bayes.py
new file mode 100644
index 00000000..1bc8b838
--- /dev/null
+++ b/vega/algorithms/hpo/bayes.py
@@ -0,0 +1,64 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Defined Bayes Search class."""
+from vega.common import ClassFactory, ClassType
+from vega.core.search_algs import SearchAlgorithm
+from vega.algorithms.hpo.bayes_conf import BayesConfig
+from vega.algorithms.hpo.ea.ga import GeneticAlgorithm
+from vega.algorithms.hpo.sha_base.tuner import TunerBuilder
+
+
+@ClassFactory.register(ClassType.SEARCH_ALGORITHM)
+class BayesSearch(SearchAlgorithm):
+ """An Hpo of Bayes optimization."""
+
+ config = BayesConfig()
+
+ def __init__(self, search_space=None, **kwargs):
+ """Init BayesSearch."""
+ super(BayesSearch, self).__init__(search_space, **kwargs)
+ self.num_samples = self.config.num_samples
+ self._all_desc_dict = {}
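+        # Multi-objective search falls back to a genetic algorithm; otherwise the configured tuner (TPE/GP/RF) is used.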
+ multi_obj = isinstance(self.config.objective_keys, list) and len(self.config.objective_keys) > 1
+ alg_name = "GA" if multi_obj else self.config.tuner
+ if alg_name == "GA":
+ self.tuner = GeneticAlgorithm(search_space, random_samples=self.config.warmup_count,
+ prob_crossover=self.config.prob_crossover,
+ prob_mutatation=self.config.prob_mutatation)
+ else:
+ self.tuner = TunerBuilder(search_space=search_space, tuner=alg_name)
+ self.sample_count = 0
+
+ def search(self, config_id=None):
+ """Search one NetworkDesc from search space."""
+ desc = self.tuner.propose()[0]
+ self.sample_count += 1
+ self._all_desc_dict[str(self.sample_count)] = desc
+ return dict(worker_id=self.sample_count, encoded_desc=desc)
+
+ @property
+ def max_samples(self):
+ """Get max samples number."""
+ return self.num_samples
+
+ @property
+ def is_completed(self):
+ """Whether to complete algorithm."""
+ return self.sample_count >= self.num_samples
+
+ def update(self, record):
+ """Update function, Not Implemented Yet.
+
+ :param record: record dict.
+ """
+ desc = self._all_desc_dict.get(str(record.get('worker_id')))
+ rewards = record.get("rewards")
+ self.tuner.add(desc, rewards)
diff --git a/vega/algorithms/hpo/bayes_conf.py b/vega/algorithms/hpo/bayes_conf.py
new file mode 100644
index 00000000..461edad4
--- /dev/null
+++ b/vega/algorithms/hpo/bayes_conf.py
@@ -0,0 +1,54 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Defined Configs."""
+from vega.common import ConfigSerializable
+
+
+class BayesPolicyConfig(ConfigSerializable):
+ """Bohb Policy Config."""
+
+ @classmethod
+ def rules(cls):
+ """Return rules for checking."""
+        rules_BayesPolicyConfig = {"num_samples": {"type": int},
+                                   "warmup_count": {"type": int},
+                                   "prob_mutatation": {"type": float},
+                                   "prob_crossover": {"type": float},
+                                   "tuner": {"type": str},
+                                   }
+        return rules_BayesPolicyConfig
+
+
+class BayesConfig(ConfigSerializable):
+ """Bayes Config."""
+
+ policy = BayesPolicyConfig
+ objective_keys = 'accuracy'
+ num_samples = 32
+ warmup_count = 16
+ prob_mutatation = 0.2
+ prob_crossover = 0.6
+ tuner = "RF" # TPE | GP | RF
+
+ @classmethod
+ def rules(cls):
+ """Return rules for checking."""
+        rules_BayesConfig = {"policy": {"type": dict},
+                             "objective_keys": {"type": (list, str)}
+                             }
+        return rules_BayesConfig
+
+ @classmethod
+ def get_config(cls):
+ """Get sub config."""
+ return {
+ "policy": cls.policy
+ }
diff --git a/vega/algorithms/hpo/evolution_search.py b/vega/algorithms/hpo/evolution_search.py
index 3214ddb5..adcf654a 100644
--- a/vega/algorithms/hpo/evolution_search.py
+++ b/vega/algorithms/hpo/evolution_search.py
@@ -76,7 +76,7 @@ def search(self):
# split codes
desc = {}
for _name, _size in each_codes_cache.items():
- desc[_name] = encoding_new[:_size][0]
+ desc[_name] = encoding_new[:_size]
encoding_new = encoding_new[_size:]
self.sample_count += 1
sample = dict(worker_id=self.sample_count, encoded_desc=desc)
diff --git a/vega/algorithms/hpo/hpo_base.py b/vega/algorithms/hpo/hpo_base.py
index 7f783683..ca07b870 100644
--- a/vega/algorithms/hpo/hpo_base.py
+++ b/vega/algorithms/hpo/hpo_base.py
@@ -9,6 +9,7 @@
# MIT License for more details.
"""Defined AshaHpo class."""
+
import logging
import copy
from threading import Lock
@@ -94,6 +95,9 @@ def next_rung(**kwargs):
return {"result": "success", "data": {"rung_id": None, "message": "do not has next_rung method"}}
record = ReportRecord().load_dict(kwargs)
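+    # A None reward means the trial has not fully reported; skip promotion to the next rung.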
+ if None in record.rewards:
+ return {"result": "success", "data": {"rung_id": None, "message": "part of rewards is missing"}}
+
_instance.update(record.serialize())
result = _instance.search(config_id=record.worker_id)
if result is not None and "encoded_desc" in result and "trainer.epochs" in result["encoded_desc"]:
diff --git a/vega/algorithms/hpo/pbt_trainer_callback.py b/vega/algorithms/hpo/pbt_trainer_callback.py
index 89b96589..7f541331 100644
--- a/vega/algorithms/hpo/pbt_trainer_callback.py
+++ b/vega/algorithms/hpo/pbt_trainer_callback.py
@@ -20,9 +20,8 @@ class PbtTrainerCallback(Callback):
def before_train(self, logs=None):
"""Be called before the training process."""
- self.epochs = self.trainer.epochs
self.params_list = self.trainer.hps.trainer.all_configs
- self.load_para_interval = self.epochs // len(self.params_list.keys())
+ self.load_para_interval = self.trainer.epochs // len(self.params_list.keys())
def before_epoch(self, epoch, logs=None):
"""Be called before epoch."""
diff --git a/vega/algorithms/hpo/sha_base/asha.py b/vega/algorithms/hpo/sha_base/asha.py
index c2840c27..8239d301 100644
--- a/vega/algorithms/hpo/sha_base/asha.py
+++ b/vega/algorithms/hpo/sha_base/asha.py
@@ -230,7 +230,7 @@ def _check_completed(self):
return False
max_rung_id = self.sieve_board['rung_id'].max()
- if max_rung_id == self.total_rungs - 1:
+ if max_rung_id == self.total_rungs:
return True
candidate_ids = self._get_top_k_config_ids(max_rung_id)
@@ -261,7 +261,7 @@ def _get_top_k_config_ids(self, rung_id):
if num_next_rung >= k:
return None
- if isinstance(df.iloc[0]["score"], float):
+        if isinstance(df.iloc[0]["score"], (int, float)):
ids = df.sort_values("score", ascending=False).iloc[:k]["config_id"].tolist()
elif isinstance(df.iloc[0]["score"], list):
data = df[["config_id", "score"]].to_numpy()
diff --git a/vega/algorithms/nas/__init__.py b/vega/algorithms/nas/__init__.py
index 19515466..421daf9c 100644
--- a/vega/algorithms/nas/__init__.py
+++ b/vega/algorithms/nas/__init__.py
@@ -30,5 +30,6 @@
"sp_nas": ["SpNasS", "SpNasP"],
"sr_ea": ["SRCodec", "SRMutate", "SRRandom"],
"mfasc": ["search_algorithm:MFASC"],
- "opt_nas": ["OperatorSearchSpace", "OperatorReplaceCallback"]
+ "opt_nas": ["OperatorSearchSpace", "OperatorReplaceCallback"],
+ "dag_mutate": ["DAGMutateSearchSpace"]
})
diff --git a/vega/algorithms/nas/adelaide_ea/adelaide_trainer_callback.py b/vega/algorithms/nas/adelaide_ea/adelaide_trainer_callback.py
index 38189fb4..a311bd77 100644
--- a/vega/algorithms/nas/adelaide_ea/adelaide_trainer_callback.py
+++ b/vega/algorithms/nas/adelaide_ea/adelaide_trainer_callback.py
@@ -22,6 +22,7 @@
import tensorflow as tf
elif vega.is_ms_backend():
import mindspore
+ from mindspore.train import Model as MsModel
logger = logging.getLogger(__name__)
@@ -45,6 +46,12 @@ def before_train(self, logs=None):
count_input = tf.random.uniform(input_shape, dtype=tf.float32)
elif vega.is_ms_backend():
count_input = mindspore.Tensor(np.random.randn(*input_shape).astype(np.float32))
+ loss_fn = ClassFactory.get_cls(ClassType.LOSS, "CustomSoftmaxCrossEntropyWithLogits")()
+ self.trainer.ms_model = MsModel(network=self.trainer.model,
+ loss_fn=loss_fn,
+ optimizer=self.trainer.optimizer,
+ metrics=self.trainer.ms_metrics)
+
flops_count, params_count = calc_model_flops_params(self.trainer.model, count_input)
self.flops_count, self.params_count = flops_count * 1e-9, params_count * 1e-3
logger.info("Flops: {:.2f} G, Params: {:.1f} K".format(self.flops_count, self.params_count))
diff --git a/vega/algorithms/nas/cars/cars_trainer_callback.py b/vega/algorithms/nas/cars/cars_trainer_callback.py
index eb88cd7b..5d94af1e 100644
--- a/vega/algorithms/nas/cars/cars_trainer_callback.py
+++ b/vega/algorithms/nas/cars/cars_trainer_callback.py
@@ -103,7 +103,7 @@ def model_fn(self, features, labels, mode):
if mode == tf.estimator.ModeKeys.TRAIN:
global_step = tf.compat.v1.train.get_global_step()
epoch = tf.cast(global_step, tf.float32) / tf.cast(len(self.trainer.train_loader), tf.float32)
- self.trainer.optimizer = Optimizer()(distributed=self.trainer.distributed)
+ self.trainer.optimizer = Optimizer()(distributed=self.trainer.horovod)
self.trainer.lr_scheduler = LrScheduler()(self.trainer.optimizer)
self.trainer.lr_scheduler.step(epoch)
self.trainer.model.training = True
diff --git a/vega/algorithms/nas/dag_mutate/__init__.py b/vega/algorithms/nas/dag_mutate/__init__.py
new file mode 100644
index 00000000..e5265b33
--- /dev/null
+++ b/vega/algorithms/nas/dag_mutate/__init__.py
@@ -0,0 +1,3 @@
+from .mutate import DAGMutateSearchSpace
+
+__all__ = ["DAGMutateSearchSpace"]
diff --git a/vega/algorithms/nas/dag_mutate/mutate.py b/vega/algorithms/nas/dag_mutate/mutate.py
new file mode 100644
index 00000000..f1a67ce0
--- /dev/null
+++ b/vega/algorithms/nas/dag_mutate/mutate.py
@@ -0,0 +1,93 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""This is DAG Cell for network."""
+import copy
+import json
+import random
+import logging
+from collections import OrderedDict
+from vega.common import ClassFactory, ClassType, Config
+from vega.core.search_space import SearchSpace
+from vega.core.pipeline.conf import PipeStepConfig
+from vega.model_zoo import ModelZoo
+from vega.algorithms.nas.dag_mutate.search_blocks import search_blocks, forward_block_out_shape
+
+
+@ClassFactory.register(ClassType.SEARCHSPACE)
+class DAGMutateSearchSpace(SearchSpace):
+    """DAG mutate SearchSpace."""
+
+ @classmethod
+    def to_desc(cls, desc):
+        """Decode to model desc."""
+        if not hasattr(cls, "model") or not cls.model:
+            cls.model = ModelZoo().get_model(PipeStepConfig.model.model_desc,
+                                             PipeStepConfig.model.pretrained_model_file)
+        model = copy.deepcopy(cls.model)
+ blocks = search_blocks(model)
+ target_desc = OrderedDict(copy.deepcopy(model.to_desc()))
+ return mutate_blocks(blocks, target_desc, desc)
+
+
+def mutate_blocks(blocks, target_desc, block_desc):
+    """Mutate all blocks matching the target block."""
+ target_block = generate_d_blocks(block_desc)
+ logging.info("generate d block: {}".format(target_block))
+ mutated_blocks = [block for block in blocks if
+ block.c_in == target_block.c_in and block.c_out == target_block.c_out]
+    if not mutated_blocks:
+ return None
+ mutated_desc = target_desc
+ for block in mutated_blocks:
+ if random.uniform(0, 1) > 0.5:
+ continue
+ mutated_desc = mutate_block(mutated_desc, block, target_block)
+ return mutated_desc
+
+
+def generate_d_blocks(block_desc):
+    """Generate the target block from a block description."""
+ block_str = block_desc.get("block_str")
+ stride = block_desc.get('stride')
+ c_in = block_desc.get('c_in')
+ ops = block_desc.get('ops') or ['conv3', 'conv1', 'conv3_grp2', 'conv3_grp4', 'conv3_base1', 'conv3_base32',
+ 'conv3_sep']
+ target_block = Config(dict(type="EncodedBlock", block_str=block_str, in_channel=c_in, op_names=ops, stride=stride))
+ target_block.c_in = c_in
+ target_block.c_out = forward_block_out_shape(target_block, [1, c_in, 32, 32], idx=1)
+ return target_block
+
+
+def mutate_block(model_desc, mutated_block, target_block=None):
+ """Mutate block."""
+ if not mutated_block.c_in or not mutated_block.c_out:
+ return None
+ if not (mutated_block.c_in == target_block.c_in and mutated_block.c_out == target_block.c_out):
+ return None
+ logging.info("Mutate blocks start module name: {}, end module name: {}".format(
+ mutated_block.start_name, mutated_block.end_name))
+ mutated_map = OrderedDict()
+ while model_desc:
+        name, node = model_desc.popitem(last=False)
+ if name != 'type':
+ node = json.loads(node) if isinstance(node, str) else node
+ if name not in mutated_block.nodes:
+ mutated_map[name] = node
+ continue
+ if name == mutated_block.end_name:
+ mutated_map[name] = dict(name=name, module=target_block, module_type=target_block.get("type"),
+ parent_node_names=[mutated_block.start_name],
+ child_node_names=node.get("child_node_names"))
+ elif name == mutated_block.start_name:
+ tmp_node = copy.deepcopy(node)
+ tmp_node["child_node_names"] = [mutated_block.end_name]
+ tmp_node["child_nodes"] = []
+ mutated_map[name] = tmp_node
+ return mutated_map
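
For reference, a hypothetical block description with the keys `generate_d_blocks` reads; the values (and the `block_str` encoding) are illustrative and defined by `EncodedBlock`:

```python
# Keys as consumed by generate_d_blocks above; values are illustrative only.
block_desc = {
    "block_str": "121-121",  # hypothetical encoding, interpreted by EncodedBlock
    "stride": 1,
    "c_in": 64,
    "ops": None,             # None falls back to the default op list
}
```
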
diff --git a/vega/algorithms/nas/dag_mutate/search_blocks.py b/vega/algorithms/nas/dag_mutate/search_blocks.py
new file mode 100644
index 00000000..7fc59a4d
--- /dev/null
+++ b/vega/algorithms/nas/dag_mutate/search_blocks.py
@@ -0,0 +1,99 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""This is Search all blocks in network."""
+from collections import OrderedDict
+from vega.common import ClassFactory, ClassType
+
+
+def is_connection_node(node):
+    """Return whether the node is a connection node."""
+ return node.is_operator_conn_module or len(node.child_nodes) > 1 or node.module_type == 'torch_func_cat'
+
+
+class BlockItems(object):
+    """Collection of nodes that form one block."""
+
+ def __init__(self):
+ self._nodes = OrderedDict()
+ self._start_name = None
+ self._end_name = None
+
+ def add(self, name, node, start_node=False, end_node=False):
+ """Add a node into items."""
+ self._nodes[name] = node
+ if start_node:
+ self._start_name = name
+ if end_node:
+ self._end_name = name
+
+ @property
+ def nodes(self):
+ """Get nodes."""
+ return self._nodes
+
+ @property
+ def c_in(self):
+        """Get the block's input channel count."""
+ convs = [node for name, node in self.nodes.items() if node.module_type == 'Conv2d']
+ if convs:
+ return convs[-1].module.in_channels
+ return None # in_node
+
+ @property
+ def c_out(self):
+        """Get the block's output channel count."""
+ convs = [node for name, node in self.nodes.items() if node.module_type == 'Conv2d']
+ if convs:
+ return convs[0].module.out_channels
+ return 256
+
+ @property
+ def start_name(self):
+ """Get start name."""
+ return self._start_name or next(iter(reversed(self._nodes)))
+
+ @property
+ def end_name(self):
+ """Get end name."""
+ return self._end_name or next(iter(self._nodes))
+
+
+def search_blocks_items(in_node):
+    """Search and collect all sub-block items reachable from a node."""
+ items = BlockItems()
+ c_nodes = [in_node]
+ while c_nodes:
+ node = c_nodes.pop()
+ items.add(node.name, node)
+ for parent_node in node.parent_nodes:
+ if not is_connection_node(parent_node):
+ c_nodes.append(parent_node)
+ else:
+ items.add(parent_node.name, parent_node, start_node=True)
+ return items
+
+
+def search_blocks(model):
+    """Search all blocks of a DAG network."""
+ blocks = []
+ for name, node in model.named_nodes():
+ if is_connection_node(node):
+ blocks.append(search_blocks_items(node))
+ return blocks
+
+
+def forward_block_out_shape(block_desc, input_shape, idx=None):
+    """Forward the block and return its output shape."""
+ import torch
+ block = ClassFactory.get_instance(ClassType.NETWORK, block_desc)
+ out_shape = block(torch.ones(*input_shape)).shape
+ if idx:
+ return out_shape[idx]
+ return out_shape
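
A minimal sketch of the `BlockItems` bookkeeping, using a hypothetical stand-in node class (real nodes come from the DAG network):

```python
from vega.algorithms.nas.dag_mutate.search_blocks import BlockItems

class _Node:
    """Hypothetical stand-in for a DAG node, for illustration only."""

    def __init__(self, name, module_type):
        self.name, self.module_type = name, module_type

items = BlockItems()
items.add("conv_out", _Node("conv_out", "Conv2d"), end_node=True)
items.add("relu", _Node("relu", "ReLU"))
items.add("conv_in", _Node("conv_in", "Conv2d"), start_node=True)
assert items.start_name == "conv_in" and items.end_name == "conv_out"
```
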
diff --git a/vega/algorithms/nas/darts_cnn/darts_trainer_callback.py b/vega/algorithms/nas/darts_cnn/darts_trainer_callback.py
index 350dad52..14b04aee 100644
--- a/vega/algorithms/nas/darts_cnn/darts_trainer_callback.py
+++ b/vega/algorithms/nas/darts_cnn/darts_trainer_callback.py
@@ -65,7 +65,10 @@ def before_train_step(self, epoch, logs=None):
except Exception:
self.valid_loader_iter = iter(self.trainer.valid_loader)
valid_input, valid_target = next(self.valid_loader_iter)
- valid_input, valid_target = valid_input.to(self.device), valid_target.to(self.device)
+ if vega.is_npu_device():
+ valid_input, valid_target = valid_input.to(int(self.device)), valid_target.to(int(self.device))
+ else:
+ valid_input, valid_target = valid_input.to(self.device), valid_target.to(self.device)
# Call arch search step
self._train_arch_step(train_input, train_target, valid_input, valid_target)
@@ -110,7 +113,7 @@ def model_fn(self, features, labels, mode):
labels, valid_labels = labels['train'], labels['valid']
# update arch
epoch = tf.cast(global_step, tf.float32) / tf.cast(len(self.trainer.train_loader), tf.float32)
- self.trainer.optimizer = Optimizer()(distributed=self.trainer.distributed)
+ self.trainer.optimizer = Optimizer()(distributed=self.trainer.horovod)
self.trainer.lr_scheduler = LrScheduler()(self.trainer.optimizer)
self.trainer.lr_scheduler.step(epoch)
update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
diff --git a/vega/algorithms/nas/fis/autogate_grda_s2_trainer_callback.py b/vega/algorithms/nas/fis/autogate_grda_s2_trainer_callback.py
index 39f5378a..2e905bfc 100644
--- a/vega/algorithms/nas/fis/autogate_grda_s2_trainer_callback.py
+++ b/vega/algorithms/nas/fis/autogate_grda_s2_trainer_callback.py
@@ -42,7 +42,7 @@ def before_train(self, logs=None):
logging.info("loading stage1_hpo_result \n{}".format(hpo_result))
self.selected_pairs = hpo_result['feature_interaction']
- logging.info('feature_interaction:', self.selected_pairs)
+ logging.info(f'feature_interaction: {self.selected_pairs}')
# add selected_pairs
setattr(ModelConfig.model_desc['custom'], 'selected_pairs', self.selected_pairs)
diff --git a/vega/algorithms/nas/mfasc/mfasc.py b/vega/algorithms/nas/mfasc/mfasc.py
index 2220447f..3e6f6312 100644
--- a/vega/algorithms/nas/mfasc/mfasc.py
+++ b/vega/algorithms/nas/mfasc/mfasc.py
@@ -115,7 +115,8 @@ def search(self):
desc = self.choices[i]
self.budget_spent += train_epochs
self.cur_i = i
- return {"worker_id": self.budget_spent, "encoded_desc": desc, 'trainer': {'epochs': train_epochs}}
+ desc["trainer.epochs"] = train_epochs
+ return {"worker_id": self.budget_spent, "encoded_desc": desc}
def update(self, report):
"""Update function.
diff --git a/vega/algorithms/nas/modnas/README.md b/vega/algorithms/nas/modnas/README.md
index 0765c351..1bcea30a 100644
--- a/vega/algorithms/nas/modnas/README.md
+++ b/vega/algorithms/nas/modnas/README.md
@@ -3,6 +3,7 @@
> [MLSys 2021] ModularNAS: Towards Modularized and Reusable Neural Architecture Search
[![Documentation Status](https://readthedocs.org/projects/modularnas/badge/?version=latest)](https://modularnas.readthedocs.io/en/latest/?badge=latest)
+[![Coverage Status](https://coveralls.io/repos/github/CreeperLin/modnas/badge.svg?branch=HEAD&t=kPmeuP)](https://coveralls.io/github/CreeperLin/modnas?branch=HEAD)
[ModularNAS Docs](https://modularnas.readthedocs.io/)
diff --git a/vega/algorithms/nas/modnas/backend/__init__.py b/vega/algorithms/nas/modnas/backend/__init__.py
index 27ef734e..a3e60afb 100644
--- a/vega/algorithms/nas/modnas/backend/__init__.py
+++ b/vega/algorithms/nas/modnas/backend/__init__.py
@@ -9,18 +9,20 @@
# MIT License for more details.
import importlib
+import traceback
from modnas.registry.backend import build
from . import predefined
+from typing import Optional
_backend = None
_backend_keys = []
-def use(backend, *args, imported=False, **kwargs):
+def use(backend: Optional[str], *args, imported=False, **kwargs) -> None:
"""Switch to backend by name."""
global _backend, _backend_keys
- if backend == _backend or backend in ['none', None]:
+ if backend == _backend or backend == 'none' or backend is None:
return
try:
if imported:
@@ -28,6 +30,7 @@ def use(backend, *args, imported=False, **kwargs):
else:
bk_mod = build(backend, *args, **kwargs)
except ImportError:
+ traceback.print_exc()
return
bk_vars = vars(bk_mod)
bk_keys = bk_vars.keys()
@@ -38,7 +41,7 @@ def use(backend, *args, imported=False, **kwargs):
if k.startswith('__'):
continue
ns[k] = bk_vars[k]
- _backend_keys = bk_keys
+ _backend_keys = list(bk_keys)
_backend = backend
@@ -47,6 +50,6 @@ def backend():
return _backend
-def is_backend(backend):
+def is_backend(backend: str) -> bool:
"""Return if the current backend is the given one."""
return _backend == backend
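
A hedged usage sketch of the backend switcher, assuming the vendored `modnas` package is importable and the predefined `torch` backend is available:

```python
import modnas.backend as backend

backend.use('torch')             # injects the torch backend's symbols into the namespace
assert backend.is_backend('torch')
backend.use(None)                # explicit no-op under the new None handling
```
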
diff --git a/vega/algorithms/nas/modnas/backend/predefined/torch/__init__.py b/vega/algorithms/nas/modnas/backend/predefined/torch/__init__.py
index 0a89cd10..79c8c412 100644
--- a/vega/algorithms/nas/modnas/backend/predefined/torch/__init__.py
+++ b/vega/algorithms/nas/modnas/backend/predefined/torch/__init__.py
@@ -2,7 +2,7 @@
from .optimizer import get_optimizer
from .lr_scheduler import get_lr_scheduler
from .data_provider import get_data_provider
-from .utils import init_device, get_dev_mem_used, model_summary,\
+from .utils import version, init_device, get_device, set_device, get_dev_mem_used, model_summary,\
clear_bn_running_statistics, recompute_bn_running_statistics
import modnas.core.params.torch
import modnas.arch_space.construct.torch
diff --git a/vega/algorithms/nas/modnas/backend/predefined/torch/criterion.py b/vega/algorithms/nas/modnas/backend/predefined/torch/criterion.py
index e51f1a70..447efebe 100644
--- a/vega/algorithms/nas/modnas/backend/predefined/torch/criterion.py
+++ b/vega/algorithms/nas/modnas/backend/predefined/torch/criterion.py
@@ -68,6 +68,18 @@ def forward(self, y_pred, y_true):
return cross_entropy_soft_target(y_pred, soft_y_true)
+class CrossEntropySoftTargetLoss(nn.Module):
+ """Cross entropy loss with label smoothing."""
+
+ def __init__(self, softmax=True):
+ super().__init__()
+ self.softmax = softmax
+
+ def forward(self, y_pred, y_true):
+ """Return loss."""
+ return cross_entropy_soft_target(y_pred, (F.softmax(y_true, dim=-1) if self.softmax else y_true))
+
+
@register
class MixUpLoss():
"""Apply MIXUP loss."""
@@ -92,29 +104,27 @@ def __call__(self, loss, estim, y_pred, X, y_true):
mixed_x = lam * X + (1 - lam) * alt_X
mixed_y_pred = estim.model_output(mixed_x)
loss = loss or 0
- return lam * self.criterion(loss, estim, mixed_y_pred, mixed_x, y_true) + (1 - lam) * self.criterion(
- loss, estim, mixed_y_pred, mixed_x, alt_y_true)
+ crit = self.criterion(loss, estim, mixed_y_pred, mixed_x, y_true)
+ crit_alt = self.criterion(loss, estim, mixed_y_pred, mixed_x, alt_y_true)
+ return lam * crit + (1 - lam) * crit_alt
@register
class AuxiliaryLoss():
"""Apply Auxiliary loss."""
- def __init__(self, aux_ratio=0.4, loss_type='ce', forward_func='forward_aux'):
+ def __init__(self, crit_conf='CrossEntropyLoss', aux_ratio=0.4, forward_func='forward_aux'):
super().__init__()
self.aux_ratio = aux_ratio
self.fwd_func = forward_func
- if loss_type == 'ce':
- self.loss_func = F.cross_entropy
- else:
- raise ValueError('unsupported loss type: {}'.format(loss_type))
+ self.criterion = build(crit_conf)
def __call__(self, loss, estim, y_pred, X, y_true):
"""Return loss."""
aux_logits = estim.model_output(X, attr=self.fwd_func)
if aux_logits is None:
return loss
- aux_loss = self.loss_func(aux_logits, y_true).to(device=X.device)
+ aux_loss = self.criterion(loss, estim, aux_logits, X, y_true).to(device=X.device)
return loss + self.aux_ratio * aux_loss
@@ -122,18 +132,13 @@ def __call__(self, loss, estim, y_pred, X, y_true):
class KnowledgeDistillLoss():
"""Apply Knowledge Distillation."""
- def __init__(self, kd_model_constructor=None, kd_model=None, kd_ratio=0.5, loss_scale=1., loss_type='ce'):
+ def __init__(self, crit_conf, kd_model_constructor=None, kd_model=None, kd_ratio=0.5, loss_scale=1.):
super().__init__()
self.kd_model_constructor = kd_model_constructor
self.kd_model = kd_model
self.kd_ratio = kd_ratio
self.loss_scale = loss_scale
- if loss_type == 'ce':
- self.loss_func = lambda y_pred, target: cross_entropy_soft_target(y_pred, F.softmax(target, dim=-1))
- elif loss_type == 'mse':
- self.loss_func = F.mse_loss
- else:
- raise ValueError('unsupported loss_type: {}'.format(loss_type))
+ self.criterion = build(crit_conf)
def _load_model(self, kd_model, kd_model_constructor):
if not isinstance(kd_model_constructor, list):
@@ -149,7 +154,7 @@ def __call__(self, loss, estim, y_pred, X, y_true):
with torch.no_grad():
self.kd_model.to(device=X.device)
soft_logits = self.kd_model(X)
- kd_loss = self.loss_func(y_pred, soft_logits).to(device=loss.device)
+ kd_loss = self.criterion(loss, estim, y_pred, X, soft_logits).to(device=loss.device)
loss = self.loss_scale * ((1 - self.kd_ratio) * loss + self.kd_ratio * kd_loss)
return loss
@@ -226,15 +231,17 @@ def __call__(self, loss, estim, y_pred, X, y_true):
return self.alpha * loss * (torch.log(mt.to(device=loss.device)) / math.log(self.target_val))**self.beta
-register(torch_criterion_wrapper(CrossEntropyLabelSmoothingLoss))
-
-module = torch.nn
+_module = torch.nn
+_loss_functions = [CrossEntropyLabelSmoothingLoss, CrossEntropySoftTargetLoss]
-for name, attr in module.__dict__.items():
+for name, attr in _module.__dict__.items():
if name.startswith('__'):
continue
if not callable(attr):
continue
if 'Loss' not in name:
continue
- register(torch_criterion_wrapper(attr))
+ _loss_functions.append(attr)
+
+for loss_fn in _loss_functions:
+ register(torch_criterion_wrapper(loss_fn))
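
The soft-target loss routed through `crit_conf` above relies on `cross_entropy_soft_target`, which is not shown in this patch; the sketch below states its assumed semantics:

```python
import torch
import torch.nn.functional as F

# Assumed semantics: cross entropy of predictions against a soft distribution.
def cross_entropy_soft_target(y_pred, soft_y_true):
    return torch.mean(torch.sum(-soft_y_true * F.log_softmax(y_pred, dim=-1), dim=-1))

student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
# What CrossEntropySoftTargetLoss(softmax=True) computes for distillation targets.
loss = cross_entropy_soft_target(student_logits, F.softmax(teacher_logits, dim=-1))
```
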
diff --git a/vega/algorithms/nas/modnas/backend/predefined/torch/data_provider.py b/vega/algorithms/nas/modnas/backend/predefined/torch/data_provider.py
index fa151e79..3b236d9d 100644
--- a/vega/algorithms/nas/modnas/backend/predefined/torch/data_provider.py
+++ b/vega/algorithms/nas/modnas/backend/predefined/torch/data_provider.py
@@ -9,7 +9,8 @@
# MIT License for more details.
"""Torch data providers."""
-from modnas.utils import merge_config
+import copy
+from modnas.utils.config import merge_config
from modnas.registry.data_provider import build
from modnas.registry.dataloader import build as build_dataloader
from modnas.registry.dataset import build as build_dataset
@@ -21,7 +22,7 @@ def get_data(configs):
for conf in configs:
if conf is None:
continue
- config = conf if config is None else merge_config(config, conf)
+ config = copy.deepcopy(conf) if config is None else merge_config(config, conf)
if config is None:
return None
return build_dataset(config)
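
Why the `deepcopy` matters: if `merge_config` merges in place (a stand-in below, since the real implementation is not in this patch), the first source config would otherwise be mutated across calls:

```python
import copy

def merge_config(base, other):
    """Stand-in for modnas.utils.config.merge_config; assumed to merge in place."""
    base.update(other)
    return base

configs = [{'type': 'CIFAR10'}, {'root': '/tmp/data'}]
config = None
for conf in configs:
    config = copy.deepcopy(conf) if config is None else merge_config(config, conf)

assert configs[0] == {'type': 'CIFAR10'}  # the source config stays pristine
```
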
diff --git a/vega/algorithms/nas/modnas/backend/predefined/torch/utils.py b/vega/algorithms/nas/modnas/backend/predefined/torch/utils.py
index e656cfc5..69901ef0 100644
--- a/vega/algorithms/nas/modnas/backend/predefined/torch/utils.py
+++ b/vega/algorithms/nas/modnas/backend/predefined/torch/utils.py
@@ -11,7 +11,19 @@
"""Torch utils."""
import numpy as np
import torch
-from modnas.utils import format_value
+from modnas.utils import format_value, format_dict
+
+
+_device = None
+
+
+def version():
+ """Return backend version information."""
+ return format_dict({
+ 'torch': torch.__version__,
+ 'cuda': torch._C._cuda_getCompiledVersion(),
+ 'cudnn': torch.backends.cudnn.version(),
+ }, sep=', ', kv_sep='=', fmt_key=False, fmt_val=False)
def init_device(device=None, seed=11235):
@@ -23,6 +35,17 @@ def init_device(device=None, seed=11235):
torch.backends.cudnn.benchmark = True
+def set_device(device):
+ """Set current device."""
+ global _device
+ _device = device
+
+
+def get_device():
+ """Return current device."""
+ return _device
+
+
def get_dev_mem_used():
"""Return memory used in device."""
return torch.cuda.memory_allocated() / 1024. / 1024.
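
A hedged sketch of the new helpers, assuming the module is importable under the vendored `modnas` package:

```python
from modnas.backend.predefined.torch.utils import get_device, set_device, version

set_device('cuda:0')
assert get_device() == 'cuda:0'
print(version())  # e.g. "torch=1.8.1, cuda=11010, cudnn=8005" (format_dict output)
```
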
diff --git a/vega/algorithms/nas/modnas/callback/base.py b/vega/algorithms/nas/modnas/callback/base.py
index 84e1af85..d663daab 100644
--- a/vega/algorithms/nas/modnas/callback/base.py
+++ b/vega/algorithms/nas/modnas/callback/base.py
@@ -11,6 +11,10 @@
"""Base callback."""
from modnas.core.event import event_on, event_off
from modnas.utils.logging import get_logger
+from typing import Callable, Dict, Optional, Tuple, Union
+
+
+_HANDLER_CONF_TYPE = Dict[str, Union[Tuple[Callable, int], Callable]]
class CallbackBase():
@@ -19,11 +23,12 @@ class CallbackBase():
logger = get_logger('callback')
priority = 0
- def __init__(self, handler_conf=None) -> None:
- self.handlers = None
- self.bind_handlers(handler_conf)
+ def __init__(self, handler_conf: Optional[_HANDLER_CONF_TYPE] = None) -> None:
+ self.handlers = {}
+ if handler_conf is not None:
+ self.bind_handlers(handler_conf)
- def bind_handlers(self, handler_conf):
+ def bind_handlers(self, handler_conf: _HANDLER_CONF_TYPE) -> None:
"""Bind event handlers."""
handlers = {}
for ev, conf in handler_conf.items():
diff --git a/vega/algorithms/nas/modnas/callback/predefined/early_stopping.py b/vega/algorithms/nas/modnas/callback/predefined/early_stopping.py
index 909f1b16..872117f6 100644
--- a/vega/algorithms/nas/modnas/callback/predefined/early_stopping.py
+++ b/vega/algorithms/nas/modnas/callback/predefined/early_stopping.py
@@ -11,6 +11,13 @@
"""Early stopping."""
from modnas.registry.callback import register
from ..base import CallbackBase
+from collections import OrderedDict
+from modnas.estim.base import EstimBase
+from modnas.optim.base import OptimBase
+from typing import Any, Dict, Optional
+
+
+_ret_type = Optional[Dict[str, Any]]
@register
@@ -19,7 +26,7 @@ class EarlyStopping(CallbackBase):
priority = -10
- def __init__(self, threshold=10):
+ def __init__(self, threshold: int = 10) -> None:
super().__init__({
'before:EstimBase.run': self.reset,
'after:EstimBase.step_done': self.on_step_done,
@@ -29,19 +36,21 @@ def __init__(self, threshold=10):
self.last_opt = -1
self.stop = False
- def reset(self, estim, optim):
+ def reset(self, estim: EstimBase, optim: OptimBase) -> None:
"""Reset callback states."""
self.last_opt = -1
self.stop = False
- def on_step_done(self, ret, estim, params, value, arch_desc=None):
+ def on_step_done(
+ self, ret: _ret_type, estim: EstimBase, params: OrderedDict, value: float, arch_desc: Optional[Any] = None
+ ) -> _ret_type:
"""Check early stop in each step."""
ret = ret or {}
if ret.get('is_opt'):
self.last_opt = -1
return ret
- def on_epoch(self, ret, estim, optim, epoch, tot_epochs):
+ def on_epoch(self, ret: _ret_type, estim: EstimBase, optim: OptimBase, epoch: int, tot_epochs: int) -> _ret_type:
"""Check early stop in each epoch."""
self.last_opt += 1
if self.last_opt >= self.threshold:
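
The threshold counts epochs since the last optimum; a step reporting `is_opt` resets the counter. A self-contained simulation of that logic (illustrative numbers):

```python
# Mirrors the reset/increment pattern of on_step_done and on_epoch above.
last_opt, threshold, stop = -1, 10, False
for epoch in range(30):
    improved = epoch < 5        # illustrative: no new optimum after epoch 4
    if improved:
        last_opt = -1
    last_opt += 1
    if last_opt >= threshold:
        stop = True
        break
assert stop and epoch == 14     # stops 10 epochs after the last improvement
```
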
diff --git a/vega/algorithms/nas/modnas/callback/predefined/optimum.py b/vega/algorithms/nas/modnas/callback/predefined/optimum.py
index 042d2061..20b377a6 100644
--- a/vega/algorithms/nas/modnas/callback/predefined/optimum.py
+++ b/vega/algorithms/nas/modnas/callback/predefined/optimum.py
@@ -9,6 +9,7 @@
# MIT License for more details.
"""Search optimum statistics reporter."""
+from functools import partial
from modnas.utils import format_value
from modnas.registry.callback import register
from modnas.callback.base import CallbackBase
@@ -34,7 +35,7 @@ class OptimumReporter(CallbackBase):
priority = 0
- def __init__(self, cmp_keys=None, cmp_fn=None, cmp_th=None, score_fn=None, stat_epoch=True):
+ def __init__(self, cmp_keys=None, cmp_fn=None, cmp_th=None, score_fn=None, format_fn=None, stat_epoch=True):
handlers = {
'before:EstimBase.run': self.reset,
'after:EstimBase.step_done': self.on_step_done,
@@ -50,6 +51,7 @@ def __init__(self, cmp_keys=None, cmp_fn=None, cmp_th=None, score_fn=None, stat_
self.cmp_fn = cmp_fn or {}
self.cmp_th = cmp_th or {}
self.score_fn = score_fn
+ self.format_fn = format_fn or partial(format_value, unit=False, factor=0, prec=4)
self.results = []
self.opt_results = []
self.ep_opt_results = []
@@ -118,7 +120,7 @@ def format_metrics(self, opts):
if not opts:
return None
met = [r[1] for r in opts]
- met = [{k: format_value(v, unit=False, factor=0, prec=4) for k, v in m.items()} for m in met]
+ met = [{k: self.format_fn(v) for k, v in m.items()} for m in met]
met = [(list(m.values())[0] if len(m) == 1 else m) for m in met]
if len(met) == 1:
met = met[0]
diff --git a/vega/algorithms/nas/modnas/callback/predefined/trainer_reporter.py b/vega/algorithms/nas/modnas/callback/predefined/trainer_reporter.py
index 531ac6bf..66e0e7d8 100644
--- a/vega/algorithms/nas/modnas/callback/predefined/trainer_reporter.py
+++ b/vega/algorithms/nas/modnas/callback/predefined/trainer_reporter.py
@@ -21,37 +21,39 @@ class TrainerReporter(CallbackBase):
priority = -1
- def __init__(self, interval=0.2, format_fn=None):
+ def __init__(self, interval=0.2, format_fn=None, stat_cls=None):
super().__init__({
'after:TrainerBase.train_step': partial(self.report_step, 'train'),
'after:TrainerBase.valid_step': partial(self.report_step, 'valid'),
- 'after:TrainerBase.train_epoch': self.report_epoch,
- 'after:TrainerBase.valid_epoch': self.report_epoch,
+ 'after:TrainerBase.train_epoch': partial(self.report_epoch, 'train'),
+ 'after:TrainerBase.valid_epoch': partial(self.report_epoch, 'valid'),
'after:TrainerBase.loss': self.on_loss,
})
self.interval = interval
self.format_fn = format_fn
self.last_batch_size = 1
- self.stats = None
+ self.stat_cls = stat_cls or AverageMeter
+ self.stats = {}
- def init_stats(self, keys):
+ def init_stats(self, proc, keys):
"""Initialize statistics."""
- self.stats = {k: AverageMeter() for k in keys}
+ self.stats[proc] = {k: self.stat_cls() for k in keys}
def reset(self):
"""Reset statistics."""
- self.stats = None
+ self.stats.clear()
self.last_batch_size = 1
def on_loss(self, ret, trainer, output, data, model):
"""Record batch size in each loss call."""
self.last_batch_size = len(data[-1])
- def report_epoch(self, ret, *args, **kwargs):
+ def report_epoch(self, proc, ret, *args, **kwargs):
"""Log statistics report in each epoch."""
ret = ret or {}
- if self.stats:
- ret.update({k: v.avg for k, v in self.stats.items()})
+ proc_stats = self.stats.get(proc)
+ if proc_stats and not ret:
+ ret.update({k: v.avg for k, v in proc_stats.items()})
self.reset()
return None if not ret else ret
@@ -68,13 +70,14 @@ def report_step(self, proc, ret, trainer, estim, model, epoch, tot_epochs, step,
stats = ret.copy() if isinstance(ret, dict) else {}
stats = {k: v for k, v in stats.items() if isinstance(v, (int, float))}
stats_len = stats.pop('N', self.last_batch_size)
- if self.stats is None and stats:
- self.init_stats(stats.keys())
+ if proc not in self.stats and stats:
+ self.init_stats(proc, stats.keys())
+ proc_stats = self.stats[proc]
writer = trainer.writer
for k, v in stats.items():
- self.stats[k].update(v, n=stats_len)
+ proc_stats[k].update(v, n=stats_len)
if writer is not None:
writer.add_scalar('/'.join(['trainer', proc, k]), v, cur_step)
if interval is None or (interval != 0 and (step + 1) % interval == 0) or step + 1 == tot_steps:
- fmt_info = format_dict({k: v.avg for k, v in self.stats.items()}, fmt_val=self.format_fn)
+ fmt_info = format_dict({k: v.avg for k, v in proc_stats.items()}, fmt_val=self.format_fn)
trainer.logger.info('{}: [{:3d}/{}] {}'.format(proc.title(), step + 1, tot_steps, fmt_info))
diff --git a/vega/algorithms/nas/modnas/compat/trainer_callback.py b/vega/algorithms/nas/modnas/compat/trainer_callback.py
index c629e16a..640723f7 100644
--- a/vega/algorithms/nas/modnas/compat/trainer_callback.py
+++ b/vega/algorithms/nas/modnas/compat/trainer_callback.py
@@ -23,7 +23,7 @@
from modnas.trainer.base import TrainerBase
from modnas.utils.wrapper import init_all
from modnas.utils.logging import get_logger
-from modnas.utils import merge_config
+from modnas.utils.config import merge_config
logger = get_logger('compat')
@@ -238,9 +238,10 @@ def init(self):
self.config['expman'] = self.config.get('expman', {})
self.config['expman']['root_dir'] = FileOps.join_path(self.trainer.get_local_worker_path(), 'exp')
self.config = merge_config(self.config, self.model.config)
- ctx = init_all(config=self.config, base_model=self.model.net)
+ ctx = init_all(config=self.config, base_model=None)
self.__dict__.update(ctx)
- self.model.net = list(self.estims.values())[0].model
+ if self.model.net is None:
+ self.model.net = list(self.estims.values())[0].model
if self.optim:
self.search_alg.set_optim(self.optim)
self.wrp_trainer = VegaTrainerWrapper(self.trainer)
@@ -255,7 +256,7 @@ def before_train(self, logs=None):
self.config = copy.deepcopy(self.trainer_config.modnas)
self.model = self.trainer.model
self.search_alg = None
- if not self.config.get('fully_train'):
+ if self.config.get('vega_train', False) is False:
self.search_alg = SearchAlgorithm(SearchSpace())
self.trainer.train_loader = self.trainer._init_dataloader(mode='train')
self.trainer.valid_loader = self.trainer._init_dataloader(mode='val')
diff --git a/vega/algorithms/nas/modnas/contrib/callback/metrics_stats.py b/vega/algorithms/nas/modnas/contrib/callback/metrics_stats.py
index 0552a571..ea8bc972 100644
--- a/vega/algorithms/nas/modnas/contrib/callback/metrics_stats.py
+++ b/vega/algorithms/nas/modnas/contrib/callback/metrics_stats.py
@@ -14,6 +14,11 @@
from modnas.registry.callback import register
from modnas.callback.base import CallbackBase
from matplotlib import pyplot as plt
+from collections import OrderedDict
+from modnas.estim.base import EstimBase
+from modnas.optim.base import OptimBase
+from typing import Dict, List, Tuple, Optional, Any
+
plt.switch_backend('Agg')
@@ -21,7 +26,7 @@
class MetricsStatsReporter(CallbackBase):
"""Metrics statistics reporter class."""
- def __init__(self, axis_list=None):
+    def __init__(self, axis_list: Optional[List[Tuple[int, int]]] = None) -> None:
super().__init__({
'after:EstimBase.step_done': self.on_step_done,
'after:EstimBase.run': self.save_stats,
@@ -29,11 +34,14 @@ def __init__(self, axis_list=None):
self.results = []
self.axis_list = axis_list
- def on_step_done(self, ret, estim, params, value, arch_desc=None):
+    def on_step_done(
+        self, ret: Optional[Dict[str, Any]], estim: EstimBase, params: Optional[OrderedDict],
+        value: Dict[str, float], arch_desc: Optional[Any] = None
+    ) -> None:
"""Record Estimator evaluation result on each step."""
self.results.append((params, value))
- def save_stats(self, ret, estim, optim):
+ def save_stats(self, ret: Dict[str, Any], estim: EstimBase, optim: OptimBase) -> Dict[str, Any]:
"""Save statistics on search end."""
results = self.results
if not results:
@@ -57,3 +65,4 @@ def save_stats(self, ret, estim, optim):
pickle.dump(results, f)
self.logger.info('metrics results saved to {}'.format(result_path))
self.results = []
+ return ret
diff --git a/vega/algorithms/nas/modnas/contrib/callback/mixedop_stats.py b/vega/algorithms/nas/modnas/contrib/callback/mixedop_stats.py
index 89edd67c..ad2b0205 100644
--- a/vega/algorithms/nas/modnas/contrib/callback/mixedop_stats.py
+++ b/vega/algorithms/nas/modnas/contrib/callback/mixedop_stats.py
@@ -16,6 +16,10 @@
from modnas.arch_space.mixed_ops import MixedOp
from modnas.callback.base import CallbackBase
from matplotlib import pyplot as plt
+from modnas.estim.base import EstimBase
+from modnas.optim.base import OptimBase
+from typing import Dict, Optional, Any
+
plt.switch_backend('Agg')
@@ -23,18 +27,20 @@
class MixedOpStatsReporter(CallbackBase):
"""Mixed operator statistics reporter class."""
- def __init__(self):
+ def __init__(self) -> None:
super().__init__({
'before:EstimBase.run_epoch': self.record_probs,
'after:EstimBase.run': self.save_stats,
})
self.probs = []
- def record_probs(self, estim, optim, epoch, tot_epochs):
+ def record_probs(
+ self, estim: EstimBase, optim: Optional[OptimBase], epoch: Optional[int], tot_epochs: Optional[int]
+ ) -> None:
"""Record mixed operator probabilities on each epoch."""
self.probs.append([F.softmax(m.alpha().detach(), dim=-1).cpu().numpy() for m in MixedOp.gen(estim.model)])
- def save_stats(self, ret, estim, optim):
+ def save_stats(self, ret: Dict[str, Any], estim: EstimBase, optim: OptimBase) -> Dict[str, Any]:
"""Save statistics on search end."""
self.record_probs(estim, None, None, None)
probs = self.probs
@@ -58,3 +64,4 @@ def save_stats(self, ret, estim, optim):
pickle.dump(save_probs, f)
self.logger.info('mixed op probs saved to {}'.format(probs_path))
self.probs = []
+ return ret
diff --git a/vega/algorithms/nas/modnas/contrib/estim/fakedata.py b/vega/algorithms/nas/modnas/contrib/estim/fakedata.py
index 2f31e2c5..65fd455e 100644
--- a/vega/algorithms/nas/modnas/contrib/estim/fakedata.py
+++ b/vega/algorithms/nas/modnas/contrib/estim/fakedata.py
@@ -12,20 +12,22 @@
import numpy as np
from modnas.core.param_space import ParamSpace
from modnas.core.params import Categorical
-from modnas.registry.estim import RegressionEstim
+from modnas.estim.predefined.regression import RegressionEstim
from modnas.registry.construct import register as register_constructor
from modnas.registry.estim import register as register_estim
+from modnas.optim.base import OptimBase
+from typing import Dict, List, Union
@register_constructor
class FakeDataSpaceConstructor():
"""Fake data space constructor class."""
- def __init__(self, n_params=2**5, dim=2**1):
+ def __init__(self, n_params: int = 2**5, dim: int = 2**1) -> None:
self.n_params = n_params
self.dim = dim
- def __call__(self, model):
+ def __call__(self, model: None) -> None:
"""Construct search space."""
del model
_ = [Categorical(list(range(self.dim))) for _ in range(self.n_params)]
@@ -34,7 +36,9 @@ def __call__(self, model):
class FakeDataPredictor():
"""Fake data regression predictor class."""
- def __init__(self, score_dim=1, seed=11235, random_score=False, noise_scale=0.01):
+ def __init__(
+ self, score_dim: int = 1, seed: int = 11235, random_score: bool = False, noise_scale: float = 0.01
+ ) -> None:
super().__init__()
self.rng = np.random.RandomState(seed)
self.score_dim = score_dim
@@ -42,7 +46,7 @@ def __init__(self, score_dim=1, seed=11235, random_score=False, noise_scale=0.01
self.noise_scale = noise_scale
self.scores = {'dim_{}'.format(i): {} for i in range(score_dim)}
- def get_score(self, params, scores):
+    def get_score(self, params: Dict[str, int], scores: Dict[str, List[float]]) -> float:
"""Return score of given parameters."""
score = 0
for pn, v in params.items():
@@ -52,7 +56,7 @@ def get_score(self, params, scores):
if pn not in scores:
if self.random_score:
p_score = self.rng.rand(dim)
- p_score = p_score / np.max(p_score)
+ p_score = (p_score / np.max(p_score)).tolist()
else:
p_score = list(range(dim))
scores[pn] = p_score
@@ -61,7 +65,7 @@ def get_score(self, params, scores):
score += 0 if self.noise_scale is None else self.rng.normal(loc=0, scale=self.noise_scale)
return score
- def predict(self, params):
+ def predict(self, params: Dict[str, int]) -> Union[float, Dict[str, float]]:
"""Return predicted evaluation results."""
scores = {k: self.get_score(params, v) for k, v in self.scores.items()}
if len(scores) == 1:
@@ -73,10 +77,10 @@ def predict(self, params):
class FakeDataEstim(RegressionEstim):
"""Fake data regression estimator class."""
- def __init__(self, *args, pred_conf=None, **kwargs):
+ def __init__(self, *args, pred_conf=None, **kwargs) -> None:
super().__init__(*args, predictor=FakeDataPredictor(**(pred_conf or {})), **kwargs)
- def run(self, optim):
+ def run(self, optim: OptimBase) -> None:
"""Run Estimator routine."""
ret = super().run(optim)
scores = self.predictor.scores
diff --git a/vega/algorithms/nas/modnas/contrib/metrics/profiler_metrics.py b/vega/algorithms/nas/modnas/contrib/metrics/profiler_metrics.py
index 7596222d..8039bf07 100644
--- a/vega/algorithms/nas/modnas/contrib/metrics/profiler_metrics.py
+++ b/vega/algorithms/nas/modnas/contrib/metrics/profiler_metrics.py
@@ -13,19 +13,21 @@
import torch
from modnas.registry.metrics import register
from modnas.metrics.base import MetricsBase
+from typing import Optional
+from rasp.profiler.tree import StatTreeNode
@register
class LocalProfilerMetrics(MetricsBase):
"""Local network hardware performance profiler metrics class."""
- def __init__(self, device=None, rep=50, warmup=10):
+ def __init__(self, device: Optional[str] = None, rep: int = 50, warmup: int = 10) -> None:
super().__init__()
self.rep = rep
self.warmup = warmup
self.device = device
- def __call__(self, node):
+ def __call__(self, node: StatTreeNode) -> float:
"""Return metrics output."""
in_shape = node['in_shape']
op = node.module
diff --git a/vega/algorithms/nas/modnas/contrib/optim/hyperopt.py b/vega/algorithms/nas/modnas/contrib/optim/hyperopt.py
new file mode 100644
index 00000000..69ef0a39
--- /dev/null
+++ b/vega/algorithms/nas/modnas/contrib/optim/hyperopt.py
@@ -0,0 +1,133 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Optim wrapper for Hyperopt."""
+import numpy as np
+from collections import OrderedDict
+from modnas.registry.optim import register
+from modnas.optim.base import OptimBase
+from modnas.core.params import Categorical, Numeric
+try:
+ import hyperopt
+ from hyperopt import tpe, hp
+ import hyperopt.utils
+ import hyperopt.base
+except ImportError:
+ hyperopt = None
+
+
+@register
+class HyperoptOptim(OptimBase):
+ """Hyperopt Optimizer class."""
+
+ MAX_RANDINT = 2 ** 31 - 1
+
+ def __init__(self, hyperopt_args=None, space=None):
+ super().__init__(space)
+ if hyperopt is None:
+ raise ValueError('Hyperopt is not installed')
+ hyperopt_dims = {}
+ for n, p in self.space.named_params():
+ if isinstance(p, Numeric):
+ args = {
+ 'low': p.bound[0],
+ 'high': p.bound[1],
+ }
+ if p.is_int():
+ sd = hp.uniformint(n, **args)
+ else:
+ sd = hp.uniform(n, **args)
+ elif isinstance(p, Categorical):
+ sd = hp.choice(n, p.choices)
+ else:
+ continue
+ hyperopt_dims[n] = sd
+ hyperopt_args = hyperopt_args or {}
+ self.algo = tpe.suggest
+ self.trials = hyperopt.base.Trials()
+ self.domain = hyperopt.base.Domain(self.hp_eval, hyperopt_dims)
+ self.rstate = np.random.RandomState()
+ self.next_pts = []
+
+ def hp_eval(self, args):
+ """Hyperopt objective wrapper."""
+ self.next_pts.append(args)
+ return 0
+
+ def has_next(self):
+ """Return True if Optimizer has the next set of parameters."""
+ return True
+
+ def convert_param(self, p):
+ """Return value converted from Hyperopt space."""
+        if isinstance(p, (float, np.float64)):
+            return float(p)
+        if isinstance(p, (int, np.int64)):
+            return int(p)
+ return p
+
+ def next(self, batch_size):
+ """Return the next batch of parameter sets."""
+ n_to_enqueue = batch_size
+ new_ids = self.trials.new_trial_ids(n_to_enqueue)
+ self.trials.refresh()
+ seed = self.rstate.randint(self.MAX_RANDINT)
+ new_trials = self.algo(new_ids, self.domain, self.trials, seed)
+ self.trials.insert_trial_docs(new_trials)
+ self.trials.refresh()
+ for trial in self.trials._dynamic_trials:
+ if trial['state'] == hyperopt.base.JOB_STATE_NEW:
+ trial['state'] = hyperopt.base.JOB_STATE_RUNNING
+ now = hyperopt.utils.coarse_utcnow()
+ trial['book_time'] = now
+ trial['refresh_time'] = now
+ spec = hyperopt.base.spec_from_misc(trial['misc'])
+ ctrl = hyperopt.base.Ctrl(self.trials, current_trial=trial)
+ self.domain.evaluate(spec, ctrl)
+ self.trials.refresh()
+
+ next_params = []
+ for pt in self.next_pts:
+ params = OrderedDict()
+ for n, p in pt.items():
+ params[n] = self.convert_param(p)
+ next_params.append(params)
+ self.next_pts.clear()
+ return next_params
+
+ def step(self, estim):
+ """Update Optimizer states using Estimator evaluation results."""
+ def to_metrics(res):
+ if res is None:
+ return None
+            if isinstance(res, dict):
+                v = list(res.values())[0]
+            elif isinstance(res, (tuple, list)):
+                v = res[0]
+            else:
+                v = res
+ return v
+
+ _, results = estim.get_last_results()
+ skresults = [to_metrics(r) for r in results]
+ trials = filter(lambda x: x['state'] == hyperopt.base.JOB_STATE_RUNNING, self.trials._dynamic_trials)
+ for trial, result in zip(trials, skresults):
+ now = hyperopt.utils.coarse_utcnow()
+ if result is None:
+ trial['state'] = hyperopt.base.JOB_STATE_ERROR
+ trial['refresh_time'] = now
+ else:
+ trial['state'] = hyperopt.base.JOB_STATE_DONE
+ trial['result'] = {
+ 'loss': -result,
+ 'status': hyperopt.base.STATUS_OK
+ }
+ trial['refresh_time'] = now
+ self.trials.refresh()
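
A hedged usage sketch, assuming the registered parameter lands in the default `ParamSpace` and that `build` resolves the optimizer by class name, as elsewhere in the registry code:

```python
from modnas.core.params import Categorical
from modnas.registry.optim import build

Categorical([16, 32, 64], name='width')   # registers into the default space
optim = build('HyperoptOptim')
batch = optim.next(batch_size=4)          # four TPE-suggested parameter sets
print(batch[0])                           # e.g. OrderedDict([('width', 32)])
```
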
diff --git a/vega/algorithms/nas/modnas/contrib/optim/skopt.py b/vega/algorithms/nas/modnas/contrib/optim/skopt.py
index e27f535b..a1c6ed1a 100644
--- a/vega/algorithms/nas/modnas/contrib/optim/skopt.py
+++ b/vega/algorithms/nas/modnas/contrib/optim/skopt.py
@@ -13,8 +13,12 @@
import numpy as np
from collections import OrderedDict
from modnas.registry.optim import register
+from modnas.estim.base import EstimBase
from modnas.optim.base import OptimBase
from modnas.core.params import Categorical as ParamCategorical, Numeric
+from modnas.core.param_space import ParamSpace
+from typing import List, Dict, Optional
+
try:
import skopt
from skopt import Optimizer
@@ -27,7 +31,7 @@
class SkoptOptim(OptimBase):
"""Scikit-optimize Optimizer class."""
- def __init__(self, skopt_args=None, space=None):
+ def __init__(self, skopt_args: Optional[Dict] = None, space: Optional[ParamSpace] = None) -> None:
super().__init__(space)
if skopt is None:
raise ValueError('scikit-optimize is not installed')
@@ -52,17 +56,19 @@ def __init__(self, skopt_args=None, space=None):
self.param_names = param_names
self.skoptim = Optimizer(**skopt_args)
- def has_next(self):
+ def has_next(self) -> bool:
"""Return True if Optimizer has the next set of parameters."""
return True
- def convert_param(self, p):
+ def convert_param(self, p: float) -> float:
"""Return value converted from scikit-optimize space."""
- if isinstance(p, np.float):
+        if isinstance(p, (float, np.float64)):
return float(p)
+        if isinstance(p, (int, np.int64)):
+ return int(p)
return p
- def _next(self):
+ def _next(self) -> OrderedDict:
"""Return the next set of parameters."""
next_pt = self.skoptim.ask()
next_params = OrderedDict()
@@ -70,7 +76,7 @@ def _next(self):
next_params[n] = self.convert_param(p)
return next_params
- def next(self, batch_size):
+ def next(self, batch_size: int) -> List[OrderedDict]:
"""Return the next batch of parameter sets."""
if batch_size == 1:
return [self._next()]
@@ -83,7 +89,7 @@ def next(self, batch_size):
next_params.append(params)
return next_params
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator evaluation results."""
def to_metrics(res):
if isinstance(res, dict):
diff --git a/vega/algorithms/nas/modnas/core/__init__.py b/vega/algorithms/nas/modnas/core/__init__.py
index 53cf4adc..3c29e3c6 100644
--- a/vega/algorithms/nas/modnas/core/__init__.py
+++ b/vega/algorithms/nas/modnas/core/__init__.py
@@ -9,9 +9,10 @@
# MIT License for more details.
from functools import wraps, partial
+from typing import Callable, Type, List, Any
-def make_decorator(func):
+def make_decorator(func: Callable) -> Callable:
"""Return wrapped function that acts as decorator if no extra positional args are given."""
@wraps(func)
@@ -23,9 +24,9 @@ def wrapped(*args, **kwargs):
return wrapped
-def singleton(cls):
+def singleton(cls: Type) -> Callable:
"""Return wrapped class that has only one instance."""
- inst = []
+ inst: List[Any] = []
@wraps(cls)
def get_instance(*args, **kwargs):
diff --git a/vega/algorithms/nas/modnas/core/event.py b/vega/algorithms/nas/modnas/core/event.py
index 34310997..12217d48 100644
--- a/vega/algorithms/nas/modnas/core/event.py
+++ b/vega/algorithms/nas/modnas/core/event.py
@@ -13,7 +13,8 @@
from functools import wraps
from . import singleton, make_decorator
from modnas.utils.logging import get_logger
-from modnas.utils import merge_config
+from modnas.utils.config import merge_config
+from typing import Any, Callable, Optional, Type, Union
logger = get_logger(__name__)
@@ -95,14 +96,17 @@ def dispatch_all(self, merge_ret=False, chain_ret=True, fret=None, is_ret=False)
@make_decorator
-def event_hooked(func, name=None, before=True, after=True, pass_ret=True, qual=True, module=False, **emit_args):
+def event_hooked(
+ func: Callable, name: Union[str, bool] = True, before: Union[str, bool] = True, after: Union[str, bool] = True,
+ pass_ret: bool = True, qual: Union[str, bool] = True, module: Union[str, bool] = False, **emit_args
+) -> Callable:
"""Return wrapped function with hooked event triggers."""
- qual = func.__qualname__.split('.')[0] if qual is True else (None if qual is False else qual)
- module = func.__module__ if module is True else (None if module is False else module)
- name = func.__name__ if name is None else (None if name is False else name)
- ev = (module + '.' if module else '') + (qual + '.' if qual else '') + name
- ev_before = None if before is False else (('before' if before is True else before) + ':' + ev)
- ev_after = None if after is False else (('after' if after is True else after) + ':' + ev)
+ _qual = qual if isinstance(qual, str) else (func.__qualname__.split('.')[0] if qual else '')
+ _module = module if isinstance(module, str) else (func.__module__ if module else '')
+ _name = name if isinstance(name, str) else (func.__name__ if name else '')
+ ev = '.'.join(filter(lambda x: x, [_module, _qual, _name]))
+ ev_before = None if before is False else '{}:{}'.format('before' if before is True else before, ev)
+ ev_after = None if after is False else '{}:{}'.format('after' if after is True else after, ev)
@wraps(func)
def wrapped(*args, **kwargs):
@@ -128,7 +132,7 @@ def wrapped(*args, **kwargs):
@make_decorator
-def event_unhooked(func, remove_all=False, before=False, after=False):
+def event_unhooked(func: Callable, remove_all: bool = False, before: bool = False, after: bool = False) -> Callable:
"""Return function with event hooks removed."""
func.ev_before = None if before is False else func.ev_before
func.ev_after = None if after is False else func.ev_after
@@ -138,7 +142,9 @@ def event_unhooked(func, remove_all=False, before=False, after=False):
@make_decorator
-def event_hooked_method(obj, attr=None, method=None, *args, base_qual=True, **kwargs):
+def event_hooked_method(obj: Any, attr: Optional[str] = None, method: Optional[Callable] = None, *args,
+ base_qual=True, **kwargs
+ ) -> Any:
"""Return object with event hooked for given method."""
if attr is None and inspect.ismethod(obj):
attr = obj.__name__
@@ -161,7 +167,7 @@ def event_hooked_method(obj, attr=None, method=None, *args, base_qual=True, **kw
@make_decorator
-def event_hooked_members(obj, *args, methods=None, is_method=False, is_function=False, **kwargs):
+def event_hooked_members(obj: Any, *args, methods=None, is_method=False, is_function=False, **kwargs) -> Any:
"""Return object with event hooked for member methods."""
for attr, mem in inspect.getmembers(obj):
if methods is not None and attr not in methods:
@@ -175,7 +181,7 @@ def event_hooked_members(obj, *args, methods=None, is_method=False, is_function=
@make_decorator
-def event_hooked_inst(cls, *args, **kwargs):
+def event_hooked_inst(cls: Type, *args, **kwargs) -> Callable:
"""Return object factory with event hooked for member methods in instances."""
@wraps(cls)
def wrapped(*cls_args, **cls_kwargs):
@@ -186,17 +192,18 @@ def wrapped(*cls_args, **cls_kwargs):
@make_decorator
-def event_hooked_class(cls, *args, **kwargs):
+def event_hooked_class(cls: Type, *args, **kwargs) -> Type:
"""Return class with event hooked for member methods."""
event_hooked_members(cls, *args, is_function=True, **kwargs)
return cls
@make_decorator
-def event_hooked_subclass(cls, *args, **kwargs):
+def event_hooked_subclass(cls: Type, *args, **kwargs) -> Type:
"""Return class with event hooked for member methods in all subclasses."""
ori_new = cls.__new__
+ @wraps(ori_new)
def mod_new(cls, *fn_args, **fn_kwargs):
if not mod_new.hooked.get(cls, False):
event_hooked_class(cls, *args, **kwargs)
@@ -208,7 +215,7 @@ def mod_new(cls, *fn_args, **fn_kwargs):
@make_decorator
-def event_hooked_subclass_inst(cls, *args, **kwargs):
+def event_hooked_subclass_inst(cls: Type, *args, **kwargs) -> Type:
"""Return class with event hooked for member methods in all subclass instances."""
ori_init = cls.__init__
diff --git a/vega/algorithms/nas/modnas/core/params/base.py b/vega/algorithms/nas/modnas/core/params/base.py
index 5f3e86ec..5dad4c4f 100644
--- a/vega/algorithms/nas/modnas/core/params/base.py
+++ b/vega/algorithms/nas/modnas/core/params/base.py
@@ -12,17 +12,19 @@
from collections import OrderedDict
from modnas.core.event import event_emit, event_on
from modnas.core.param_space import ParamSpace
+from typing import Any, Dict, Optional, Union, Callable
class Param():
"""Base parameter class."""
- def __init__(self, name=None, space=None, on_update=None):
+ def __init__(
+ self, name: Optional[str] = None, space: Optional[ParamSpace] = None, on_update: Optional[Callable] = None
+ ) -> None:
self.name = None
self._parent = None
self._children = OrderedDict()
- space = space or ParamSpace()
- space.register(self, name)
+ (space or ParamSpace()).register(self, name)
self.event_name = 'update:{}'.format(self.name)
if on_update is not None:
event_on(self.event_name, on_update)
@@ -33,7 +35,7 @@ def set_value_hooked(*args, **kwargs):
self.on_update()
self.set_value = set_value_hooked
- def __repr__(self):
+ def __repr__(self) -> str:
"""Return representation string."""
return '{}(name={}, {})'.format(self.__class__.__name__, self.name, self.extra_repr())
@@ -55,11 +57,11 @@ def set_value(self, value):
raise ValueError('Invalid parameter value')
self.val = value
- def on_update(self):
+ def on_update(self) -> None:
"""Trigger parameter update event."""
event_emit(self.event_name, self)
- def __deepcopy__(self, memo):
+ def __deepcopy__(self, memo: Dict[Union[int, str], Any]) -> Any:
"""Return deepcopy."""
# disable deepcopy
return self
diff --git a/vega/algorithms/nas/modnas/core/params/default.py b/vega/algorithms/nas/modnas/core/params/default.py
index b1dbbb72..18dfe625 100644
--- a/vega/algorithms/nas/modnas/core/params/default.py
+++ b/vega/algorithms/nas/modnas/core/params/default.py
@@ -13,9 +13,11 @@
import numpy as np
from .base import Param
from modnas.registry.params import register
+from modnas.core.param_space import ParamSpace
+from typing import Callable, List, Optional, Union, Any
-def _default_categorical_sampler(dim):
+def _default_categorical_sampler(dim: int) -> int:
return np.random.randint(dim)
@@ -33,14 +35,17 @@ class Categorical(Param):
TYPE = 'C'
- def __init__(self, choices, sampler=None, name=None, space=None, on_update=None):
+ def __init__(
+ self, choices: List[Any], sampler: Optional[Callable[[int], int]] = None, name: Optional[str] = None,
+ space: Optional[ParamSpace] = None, on_update: Optional[Callable[[int], None]] = None
+ ) -> None:
super().__init__(name, space, on_update)
self.sample = _default_categorical_sampler if sampler is None else sampler
self.choices = choices
- self._length = None
- self.val = None
+ self._length = -1
+ self.val = 0
- def extra_repr(self):
+ def extra_repr(self) -> str:
"""Return extra representation string."""
return 'choices={}'.format(self.choices)
@@ -48,26 +53,26 @@ def is_valid(self, value):
"""Return if the value is valid."""
return value in self.choices
- def get_value(self, index):
+ def get_value(self, index: int) -> Any:
"""Return value for given index."""
return self.choices[index]
- def set_value(self, value, index=None):
+ def set_value(self, value: Any, index: Optional[int] = None) -> None:
"""Set parameter value."""
index = self.get_index(value) if index is None else index
self.val = index
- def value(self):
+ def value(self) -> Any:
"""Return parameter value."""
return self.choices[self.index()]
- def index(self):
+ def index(self) -> int:
"""Return parameter index."""
if self.val is None:
self.val = self.sample(len(self.choices))
return self.val
- def get_index(self, value):
+ def get_index(self, value: Any) -> int:
"""Return parameter index for given value."""
return self.choices.index(value)
@@ -75,9 +80,9 @@ def set_index(self, index):
"""Set parameter index."""
self.set_value(index=index)
- def __len__(self):
+ def __len__(self) -> int:
"""Return choice size."""
- if self._length is None:
+ if self._length == -1:
self._length = len(self.choices)
return self._length
diff --git a/vega/algorithms/nas/modnas/core/params/torch.py b/vega/algorithms/nas/modnas/core/params/torch.py
index 0e71628b..9e51d3a5 100644
--- a/vega/algorithms/nas/modnas/core/params/torch.py
+++ b/vega/algorithms/nas/modnas/core/params/torch.py
@@ -12,9 +12,12 @@
import torch
from .base import Param
from modnas.registry.params import register
+from modnas.core.param_space import ParamSpace
+from torch.nn.parameter import Parameter
+from typing import Optional, Callable
-def _default_tensor_sampler(shape, init_ratio=1e-3):
+def _default_tensor_sampler(shape: int, init_ratio: float = 1e-3) -> Parameter:
return torch.nn.Parameter(init_ratio * torch.randn(shape))
@@ -24,14 +27,17 @@ class TorchTensor(Param):
TYPE = 'T'
- def __init__(self, shape, sampler=None, name=None, space=None, on_update=None):
+ def __init__(
+ self, shape: int, sampler: Optional[Callable] = None, name: Optional[str] = None,
+ space: Optional[ParamSpace] = None, on_update: Optional[Callable] = None
+ ) -> None:
super().__init__(name, space, on_update)
self.sample = _default_tensor_sampler if sampler is None else sampler
self.shape = shape
self.val = self.sample(self.shape)
self._length = None
- def extra_repr(self):
+ def extra_repr(self) -> str:
"""Return extra representation string."""
return 'shape={}'.format(self.shape)
@@ -39,12 +45,12 @@ def is_valid(self, value):
"""Return if the value is valid."""
return isinstance(value, torch.Tensor)
- def value(self):
+ def value(self) -> Parameter:
"""Return parameter value."""
if self.val is None:
self.val = self.sample(self.shape)
return self.val
- def set_value(self, value):
+ def set_value(self, value: Parameter) -> None:
"""Set parameter value."""
self.val = value
diff --git a/vega/algorithms/nas/modnas/data_provider/base.py b/vega/algorithms/nas/modnas/data_provider/base.py
index a42ff0b7..b1e78966 100644
--- a/vega/algorithms/nas/modnas/data_provider/base.py
+++ b/vega/algorithms/nas/modnas/data_provider/base.py
@@ -41,10 +41,10 @@ def reset_valid_iter(self):
"""Reset validate iterator."""
raise NotImplementedError
- def get_num_train_batch(self):
+ def get_num_train_batch(self, epoch: int):
"""Return number of train batches in current epoch."""
raise NotImplementedError
- def get_num_valid_batch(self):
+ def get_num_valid_batch(self, epoch: int):
"""Return number of validate batches in current epoch."""
raise NotImplementedError
diff --git a/vega/algorithms/nas/modnas/data_provider/dataloader/torch/default.py b/vega/algorithms/nas/modnas/data_provider/dataloader/torch/default.py
index bbecd17d..9c692644 100644
--- a/vega/algorithms/nas/modnas/data_provider/dataloader/torch/default.py
+++ b/vega/algorithms/nas/modnas/data_provider/dataloader/torch/default.py
@@ -10,28 +10,33 @@
"""Default DataLoader."""
import random
-from torch.utils.data import DataLoader
+from torch.utils.data.dataloader import DataLoader
+from torch.utils.data.dataset import Dataset
from torch.utils.data.sampler import SubsetRandomSampler
from modnas.registry.data_loader import register
from modnas.utils.logging import get_logger
+from typing import Any, Dict, Optional, Tuple, Union, Callable
logger = get_logger('data_loader')
@register
-def DefaultDataLoader(trn_data,
- val_data,
- parallel_multiplier=1,
- trn_batch_size=64,
- val_batch_size=64,
- workers=2,
- train_size=0,
- train_ratio=1.,
- train_seed=1,
- valid_size=0,
- valid_ratio=0.,
- valid_seed=1):
+def DefaultDataLoader(
+ trn_data: Dataset,
+ val_data: Optional[Dataset],
+ trn_batch_size: int = 64,
+ val_batch_size: int = 64,
+ workers: int = 2,
+ collate_fn: Optional[Callable] = None,
+ parallel_multiplier: int = 1,
+ train_size: int = 0,
+ train_ratio: float = 1.,
+ train_seed: int = 1,
+ valid_size: int = 0,
+ valid_ratio: Union[float, int] = 0.,
+ valid_seed: int = 1,
+) -> Tuple[Optional[DataLoader], Optional[DataLoader]]:
"""Return default DataLoader."""
# index
n_train_data = len(trn_data)
@@ -65,10 +70,12 @@ def DefaultDataLoader(trn_data,
trn_batch_size *= parallel_multiplier
val_batch_size *= parallel_multiplier
workers *= parallel_multiplier
- extra_kwargs = {
+ extra_kwargs: Dict[str, Any] = {
'num_workers': workers,
'pin_memory': True,
}
+ if collate_fn is not None:
+ extra_kwargs['collate_fn'] = collate_fn
if len(trn_idx) > 0:
trn_sampler = SubsetRandomSampler(trn_idx)
trn_loader = DataLoader(trn_data, batch_size=trn_batch_size, sampler=trn_sampler, **extra_kwargs)
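The new `collate_fn` argument is forwarded only when set, so `DataLoader` keeps its default collation otherwise. The same conditional-kwargs idiom, standalone:

```python
import torch
from typing import Any, Callable, Dict, Optional
from torch.utils.data import DataLoader, TensorDataset

def make_loader(dataset, batch_size: int = 64, workers: int = 2,
                collate_fn: Optional[Callable] = None) -> DataLoader:
    # Forward collate_fn only when given, mirroring the hunk above.
    extra_kwargs: Dict[str, Any] = {'num_workers': workers, 'pin_memory': True}
    if collate_fn is not None:
        extra_kwargs['collate_fn'] = collate_fn
    return DataLoader(dataset, batch_size=batch_size, **extra_kwargs)

loader = make_loader(TensorDataset(torch.arange(10.).unsqueeze(1)), batch_size=4)
```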
diff --git a/vega/algorithms/nas/modnas/data_provider/dataloader/torch/image_cls.py b/vega/algorithms/nas/modnas/data_provider/dataloader/torch/image_cls.py
index 132cf921..edb7ba08 100644
--- a/vega/algorithms/nas/modnas/data_provider/dataloader/torch/image_cls.py
+++ b/vega/algorithms/nas/modnas/data_provider/dataloader/torch/image_cls.py
@@ -11,16 +11,19 @@
"""Dataloader for Image classification."""
import random
import numpy as np
-from torch.utils.data import DataLoader
+from torch.utils.data.dataloader import DataLoader
+from torch.utils.data.dataset import Dataset
from torch.utils.data.sampler import SubsetRandomSampler
from modnas.registry.data_loader import register
from modnas.utils.logging import get_logger
+from typing import Any, Dict, List, Optional, Tuple, Union, Callable
+CLASSES_TYPE = Union[int, List[Union[str, int]]]
logger = get_logger('data_loader')
-def get_label_class(label):
+def get_label_class(label: int) -> int:
"""Return class index of given label."""
if isinstance(label, float):
label_cls = int(label)
@@ -33,7 +36,7 @@ def get_label_class(label):
return label_cls
-def get_dataset_label(data):
+def get_dataset_label(data: Dataset) -> List[int]:
"""Return label of given data."""
if hasattr(data, 'targets'):
return [c for c in data.targets]
@@ -53,12 +56,14 @@ def get_dataset_class(data):
return []
-def filter_index_class(data_idx, labels, classes):
+def filter_index_class(data_idx: List[int], labels: List[int], classes: List[int]) -> List[int]:
"""Return data indices from given classes."""
return [idx for idx in data_idx if get_label_class(labels[idx]) in classes]
-def train_valid_split(trn_idx, train_labels, class_size):
+def train_valid_split(
+ trn_idx: List[int], train_labels: List[int], class_size: Dict[int, int]
+) -> Tuple[List[int], List[int]]:
"""Return split train and valid data indices."""
random.shuffle(trn_idx)
train_idx, valid_idx = [], []
@@ -87,7 +92,7 @@ def map_data_label(data, mapping):
data.test_labels = [mapping.get(get_label_class(c), c) for c in labels]
-def select_class(trn_data, classes):
+def select_class(trn_data: Dataset, classes: Optional[CLASSES_TYPE] = None) -> List[int]:
"""Return train data class list selected from given classes."""
all_classes = list(set([get_label_class(c) for c in get_dataset_label(trn_data)]))
if isinstance(classes, int):
@@ -111,20 +116,22 @@ def select_class(trn_data, classes):
@register
-def ImageClsDataLoader(trn_data,
- val_data,
- classes=None,
- trn_batch_size=64,
- val_batch_size=64,
- workers=2,
- collate_fn=None,
- parallel_multiplier=1,
- train_size=0,
- train_ratio=1.,
- train_seed=1,
- valid_size=0,
- valid_ratio=0.,
- valid_seed=1):
+def ImageClsDataLoader(
+ trn_data: Dataset,
+ val_data: Optional[Dataset],
+ classes: Optional[CLASSES_TYPE] = None,
+ trn_batch_size: int = 64,
+ val_batch_size: int = 64,
+ workers: int = 2,
+ collate_fn: Optional[Callable] = None,
+ parallel_multiplier: int = 1,
+ train_size: int = 0,
+ train_ratio: float = 1.,
+ train_seed: int = 1,
+ valid_size: int = 0,
+ valid_ratio: Union[float, int] = 0.,
+ valid_seed: int = 1
+) -> Tuple[Optional[DataLoader], Optional[DataLoader]]:
"""Return image classification DataLoader."""
# classes
trn_labels = get_dataset_label(trn_data)
@@ -175,7 +182,7 @@ def ImageClsDataLoader(trn_data,
trn_batch_size *= parallel_multiplier
val_batch_size *= parallel_multiplier
workers *= parallel_multiplier
- extra_kwargs = {
+ extra_kwargs: Dict[str, Any] = {
'num_workers': workers,
'pin_memory': True,
}
diff --git a/vega/algorithms/nas/modnas/data_provider/dataset/torch/image_cls.py b/vega/algorithms/nas/modnas/data_provider/dataset/torch/image_cls.py
index 239fa224..b356e0cb 100644
--- a/vega/algorithms/nas/modnas/data_provider/dataset/torch/image_cls.py
+++ b/vega/algorithms/nas/modnas/data_provider/dataset/torch/image_cls.py
@@ -14,35 +14,51 @@
import torch
from torchvision import transforms, datasets
from modnas.registry.dataset import register
+from typing import Callable, Optional, Dict, List, Any, Union
+from torch.utils.data.dataset import Dataset
+
+
+_metadata = {
+ 'cifar10': {
+ 'mean': [0.49139968, 0.48215827, 0.44653124],
+ 'stddev': [0.24703233, 0.24348505, 0.26158768],
+ },
+ 'cifar100': {
+ 'mean': [0.5070751592371323, 0.48654887331495095, 0.4409178433670343],
+ 'stddev': [0.2673342858792401, 0.2564384629170883, 0.27615047132568404],
+ },
+ 'mnist': {
+ 'mean': [0.13066051707548254],
+ 'stddev': [0.30810780244715075],
+ },
+ 'fashionmnist': {
+ 'mean': [0.28604063146254594],
+ 'stddev': [0.35302426207299326],
+ },
+ 'imagenet': {
+ 'mean': [0.485, 0.456, 0.406],
+ 'stddev': [0.229, 0.224, 0.225],
+ },
+}
-def get_metadata(dataset):
- """Return dataset metadata."""
- if dataset == 'cifar10':
- mean = [0.49139968, 0.48215827, 0.44653124]
- stddev = [0.24703233, 0.24348505, 0.26158768]
- elif dataset == 'cifar100':
- mean = [0.5070751592371323, 0.48654887331495095, 0.4409178433670343]
- stddev = [0.2673342858792401, 0.2564384629170883, 0.27615047132568404]
- elif dataset == 'mnist':
- mean = [0.13066051707548254]
- stddev = [0.30810780244715075]
- elif dataset == 'fashionmnist':
- mean = [0.28604063146254594]
- stddev = [0.35302426207299326]
- elif dataset == 'imagenet':
- mean = [0.485, 0.456, 0.406]
- stddev = [0.229, 0.224, 0.225]
- else:
- mean = [0.5, 0.5, 0.5]
- stddev = [0, 0, 0]
- return {
- 'mean': mean,
- 'stddev': stddev,
- }
+_default_metadata = {
+ 'mean': [0.5, 0.5, 0.5],
+ 'stddev': [0, 0, 0],
+}
-_train_transforms = {
+_dsets = {
+ 'cifar10': datasets.CIFAR10,
+ 'cifar100': datasets.CIFAR100,
+ 'mnist': datasets.MNIST,
+ 'fashionmnist': datasets.FashionMNIST,
+ 'imagenet': datasets.ImageFolder,
+ 'image': datasets.ImageFolder,
+}
+
+
+_train_transforms: Dict[str, Callable] = {
'cifar10':
lambda: [transforms.RandomCrop(32, padding=4), transforms.RandomHorizontalFlip()],
'cifar100':
@@ -65,7 +81,8 @@ def get_metadata(dataset):
],
}
-_valid_transforms = {
+
+_valid_transforms: Dict[str, Callable] = {
'imagenet': lambda: [
transforms.Resize(256),
transforms.CenterCrop(224),
@@ -80,15 +97,16 @@ def get_metadata(dataset):
class Cutout(object):
"""Apply Cutout on dataset."""
- def __init__(self, length):
+ def __init__(self, length, seed=11235):
self.length = length
+ self.rng = np.random.RandomState(seed)
def __call__(self, img):
"""Return image with Cutout applied."""
h, w = img.size(1), img.size(2)
mask = np.ones((h, w), np.float32)
- y = np.random.randint(h)
- x = np.random.randint(w)
+ y = self.rng.randint(h)
+ x = self.rng.randint(w)
y1 = np.clip(y - self.length // 2, 0, h)
y2 = np.clip(y + self.length // 2, 0, h)
@@ -104,38 +122,25 @@ def __call__(self, img):
@register
-def ImageClsData(dataset,
- root,
- valid=False,
- mean=None,
- stddev=None,
- cutout=0,
- jitter=False,
- transform_args=None,
- to_tensor=True):
+def ImageClsData(dataset: str,
+ root: str,
+ valid: bool = False,
+ mean: Optional[List[float]] = None,
+ stddev: Optional[List[float]] = None,
+ cutout: int = 0,
+ jitter: Union[bool, str] = False,
+ transform_args: Optional[Dict[str, Any]] = None,
+ to_tensor: bool = True) -> Dataset:
"""Return dataset for image classification."""
dataset = dataset.lower()
- meta = get_metadata(dataset)
+ dset = _dsets.get(dataset)
+ if dset is None:
+ raise ValueError('unsupported dataset: {}'.format(dataset))
+ meta = _metadata.get(dataset, _default_metadata)
mean = meta['mean'] if mean is None else mean
stddev = meta['stddev'] if stddev is None else stddev
- os.makedirs(root, exist_ok=True)
- if dataset == 'cifar10':
- dset = datasets.CIFAR10
- elif dataset == 'cifar100':
- dset = datasets.CIFAR100
- elif dataset == 'mnist':
- dset = datasets.MNIST
- elif dataset == 'fashionmnist':
- dset = datasets.FashionMNIST
- elif dataset == 'imagenet':
- dset = datasets.ImageFolder
- elif dataset == 'image':
- dset = datasets.ImageFolder
- else:
- raise ValueError('unsupported dataset: {}'.format(dataset))
transf_all = _valid_transforms if valid else _train_transforms
transf = transf_all.get(dataset, lambda: [])(**(transform_args or {}))
-
if jitter is True or jitter == 'strong':
transf.append(transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1))
elif jitter == 'normal':
@@ -144,7 +149,7 @@ def ImageClsData(dataset,
transf.extend([transforms.ToTensor(), transforms.Normalize(mean, stddev)])
if cutout > 0:
transf.append(Cutout(cutout))
-
+ os.makedirs(root, exist_ok=True)
if dset == datasets.ImageFolder:
data = dset(root, transform=transforms.Compose(transf))
else:
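Replacing the `if`/`elif` chains with lookup tables keeps the selection logic in one place: unknown dataset names fail fast, while datasets without recorded statistics fall back to `_default_metadata`. A condensed sketch of the flow (tables abridged):

```python
_dsets = {'cifar10': 'CIFAR10', 'mnist': 'MNIST'}  # abridged stand-ins
_metadata = {'cifar10': {'mean': [0.491, 0.482, 0.447], 'stddev': [0.247, 0.243, 0.262]}}
_default_metadata = {'mean': [0.5, 0.5, 0.5], 'stddev': [0, 0, 0]}

def resolve(dataset: str):
    dataset = dataset.lower()
    dset = _dsets.get(dataset)
    if dset is None:
        # Unsupported names raise immediately, as in ImageClsData above.
        raise ValueError('unsupported dataset: {}'.format(dataset))
    return dset, _metadata.get(dataset, _default_metadata)

print(resolve('MNIST'))  # ('MNIST', {'mean': [0.5, 0.5, 0.5], 'stddev': [0, 0, 0]})
```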
diff --git a/vega/algorithms/nas/modnas/data_provider/predefined/default.py b/vega/algorithms/nas/modnas/data_provider/predefined/default.py
index 5e1c4f83..30430eaa 100644
--- a/vega/algorithms/nas/modnas/data_provider/predefined/default.py
+++ b/vega/algorithms/nas/modnas/data_provider/predefined/default.py
@@ -8,26 +8,25 @@
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# MIT License for more details.
-"""Default DataProvider with dataloader."""
+"""Default DataProvider with Iterable."""
from ..base import DataProviderBase
from modnas.registry.data_provider import register
+from typing import List, Optional, Any, Collection, Iterator
@register
class DefaultDataProvider(DataProviderBase):
"""Default DataProvider with dataloader."""
- def __init__(self, train_loader, valid_loader):
+ def __init__(self, train_loader: Optional[Collection], valid_loader: Optional[Collection]) -> None:
super().__init__()
self.train_loader = train_loader
self.valid_loader = valid_loader
- self.train_iter = None
- self.valid_iter = None
self.no_valid_warn = True
self.reset_train_iter()
self.reset_valid_iter()
- def get_next_train_batch(self):
+ def get_next_train_batch(self) -> List[Any]:
"""Return the next train batch."""
if self.train_loader is None:
self.logger.error('no train loader')
@@ -39,7 +38,7 @@ def get_next_train_batch(self):
trn_batch = next(self.get_train_iter())
return trn_batch
- def get_next_valid_batch(self):
+ def get_next_valid_batch(self) -> List[Any]:
"""Return the next validate batch."""
if self.valid_loader is None:
if self.no_valid_warn:
@@ -53,26 +52,26 @@ def get_next_valid_batch(self):
val_batch = next(self.get_valid_iter())
return val_batch
- def get_train_iter(self):
+ def get_train_iter(self) -> Iterator:
"""Return train iterator."""
- return self.train_iter
+ return self.train_iter or iter([])
- def get_valid_iter(self):
+ def get_valid_iter(self) -> Iterator:
"""Return validate iterator."""
- return self.valid_iter
+ return self.valid_iter or iter([])
- def reset_train_iter(self):
+ def reset_train_iter(self) -> None:
"""Reset train iterator."""
self.train_iter = None if self.train_loader is None else iter(self.train_loader)
- def reset_valid_iter(self):
+ def reset_valid_iter(self) -> None:
"""Reset validate iterator."""
self.valid_iter = None if self.valid_loader is None else iter(self.valid_loader)
- def get_num_train_batch(self, epoch):
+ def get_num_train_batch(self, epoch: int) -> int:
"""Return number of train batches in current epoch."""
return 0 if self.train_loader is None else len(self.train_loader)
- def get_num_valid_batch(self, epoch):
+ def get_num_valid_batch(self, epoch: int) -> int:
"""Return number of validate batches in current epoch."""
return 0 if self.valid_loader is None else len(self.valid_loader)
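One detail worth noting in `get_train_iter`/`get_valid_iter`: iterator objects are always truthy, so `x or iter([])` substitutes the empty iterator only when the attribute is `None`, never when an iterator is merely exhausted. A quick check:

```python
def safe_iter(it):
    # Same guard as DefaultDataProvider above: replaces None only.
    return it or iter([])

assert list(safe_iter(None)) == []
assert list(safe_iter(iter([1, 2]))) == [1, 2]

exhausted = iter([])
assert safe_iter(exhausted) is exhausted  # truthy even though empty
```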
diff --git a/vega/algorithms/nas/modnas/docs/images/modnas_formulation.png b/vega/algorithms/nas/modnas/docs/images/modnas_formulation.png
deleted file mode 100644
index 116d86cf..00000000
Binary files a/vega/algorithms/nas/modnas/docs/images/modnas_formulation.png and /dev/null differ
diff --git a/vega/algorithms/nas/modnas/estim/base.py b/vega/algorithms/nas/modnas/estim/base.py
index 5347997a..7ed9eb92 100644
--- a/vega/algorithms/nas/modnas/estim/base.py
+++ b/vega/algorithms/nas/modnas/estim/base.py
@@ -17,6 +17,7 @@
from modnas.registry.export import build as build_exporter
from modnas.core.event import event_hooked_subclass
from modnas.utils.logging import get_logger
+from modnas.registry import streamline_spec
def build_criterions_all(crit_configs, device_ids=None):
@@ -25,11 +26,7 @@ def build_criterions_all(crit_configs, device_ids=None):
crits_train = []
crits_eval = []
crits_valid = []
- if crit_configs is None:
- crit_configs = []
- if not isinstance(crit_configs, list):
- crit_configs = [crit_configs]
- for crit_conf in crit_configs:
+ for crit_conf in streamline_spec(crit_configs):
crit = backend.get_criterion(crit_conf, device_ids=device_ids)
crit_mode = crit_conf['mode'] if isinstance(crit_conf, dict) and 'mode' in crit_conf else 'all'
if not isinstance(crit_mode, list):
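`streamline_spec` (added to `modnas/registry/__init__.py` later in this patch) replaces the inline None/list normalisation. A standalone re-implementation showing the input shapes it accepts (the criterion ids are illustrative):

```python
def streamline_spec(spec):
    # Mirror of the helper added in modnas/registry/__init__.py below.
    if spec is None:
        return []
    if isinstance(spec, dict) and 'type' not in spec:
        return list(spec.values())
    if not isinstance(spec, list):
        return [spec]
    return spec

assert streamline_spec(None) == []
assert streamline_spec('CrossEntropyLoss') == ['CrossEntropyLoss']
assert streamline_spec({'type': 'Mixed', 'args': {}}) == [{'type': 'Mixed', 'args': {}}]
assert streamline_spec({'a': 'L1', 'b': 'L2'}) == ['L1', 'L2']  # dict of named specs
```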
diff --git a/vega/algorithms/nas/modnas/estim/predefined/pipeline.py b/vega/algorithms/nas/modnas/estim/predefined/pipeline.py
index 5f498701..a722876e 100644
--- a/vega/algorithms/nas/modnas/estim/predefined/pipeline.py
+++ b/vega/algorithms/nas/modnas/estim/predefined/pipeline.py
@@ -12,7 +12,6 @@
import time
import yaml
import traceback
-import queue
import threading
import multiprocessing as mp
from ..base import EstimBase
@@ -26,7 +25,7 @@ def _mp_step_runner(conn, step_conf):
def _mp_runner(step_conf):
- ctx = mp.get_context('spawn')
+ ctx = mp.get_context(step_conf.get('mp_context', 'spawn'))
p_con, c_con = ctx.Pipe()
proc = ctx.Process(target=_mp_step_runner, args=(c_con, yaml.dump(step_conf)))
time.sleep(step_conf.get('delay', 0))
@@ -51,9 +50,11 @@ def __init__(self, *args, use_multiprocessing=True, return_res=True, **kwargs):
self.return_res = return_res
self.runner = _mp_runner if use_multiprocessing else _default_runner
self.step_results = dict()
- self.pending = queue.Queue()
+ self.pending = []
self.finished = set()
+ self.failed = set()
self.cond_all_finished = threading.Lock()
+ self.lock_schedule = threading.Lock()
def exec_runner(self, pname):
"""Execute runner in a thread."""
@@ -61,6 +62,7 @@ def exec_runner(self, pname):
ret = self.runner(self.config.pipeline[pname])
except RuntimeError:
self.logger.info('pipeline step failed with error: {}'.format(traceback.format_exc()))
+ self.failed.add(pname)
ret = None
self.step_done(pname, ret, None)
@@ -73,7 +75,10 @@ def step(self, pname):
inp_val = self.step_results
for k in keys:
if not inp_val or k not in inp_val:
- raise RuntimeError('input key {} not found in return {}'.format(inp_idx, self.step_results))
+ self.logger.error('input key {} not found in return {}'.format(inp_idx, self.step_results))
+ self.failed.add(pname)
+ self.step_done(pname, None, None)
+ return
inp_val = inp_val[k]
pconf[inp_kw] = inp_val
self.logger.info('pipeline: running {}'.format(pname))
@@ -87,27 +92,40 @@ def step_done(self, params, value, arch_desc=None):
self.logger.info('pipeline: finished {}, results={}'.format(pname, value))
self.step_results[pname] = value
self.finished.add(pname)
- self._schedule()
+ if len(self.finished) == len(self.config.pipeline):
+ self.cond_all_finished.release()
+ else:
+ self._schedule()
return {'no_opt': True}
def _schedule(self):
"""Scheduler available jobs."""
- if len(self.finished) == len(self.config.pipeline):
- self.cond_all_finished.release()
- return
- while not self.pending.empty():
- pname = self.pending.get()
+ self.lock_schedule.acquire()
+ new_pending = []
+ for pname in self.pending:
pconf = self.config.pipeline.get(pname)
dep_sat = True
+ failed = False
deps = pconf.get('depends', []) + list(set([v.split('.')[0] for v in pconf.get('inputs', {}).values()]))
for dep in deps:
+ if dep in self.failed:
+ failed = True
+ self.failed.add(pname)
+ self.finished.add(pname)
+ break
if dep not in self.finished:
dep_sat = False
break
+ if failed:
+ continue
if not dep_sat:
- self.pending.put(pname)
+ new_pending.append(pname)
continue
self.stepped(pname)
+ self.pending = new_pending
+ self.lock_schedule.release()
+ if len(self.finished) == len(self.config.pipeline):
+ self.cond_all_finished.release()
def run(self, optim):
"""Run Estimator routine."""
@@ -116,11 +134,14 @@ def run(self, optim):
config = self.config
pipeconf = config.pipeline
for pn in pipeconf.keys():
- self.pending.put(pn)
+ self.pending.append(pn)
self.cond_all_finished.acquire()
self._schedule()
self.cond_all_finished.acquire()
self.cond_all_finished.release()
+ if self.failed:
+ if len(self.failed) == len(self.finished):
+ raise RuntimeError('pipeline: all failed')
logger.info('pipeline: all finished: {}'.format(self.step_results))
if self.return_res:
return {'step_results': self.step_results}
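Taken together, a pipeline step can now pick its multiprocessing start method, and a failed step propagates to its dependents instead of leaving the scheduler blocked. An illustrative step configuration as a Python dict (the key names `mp_context`, `delay`, `depends` and `inputs` come from the hunks above; step and result names are hypothetical):

```python
pipeline = {
    'search': {
        'mp_context': 'fork',  # forwarded to multiprocessing.get_context(); default 'spawn'
        'delay': 0,            # seconds to wait before launching the step process
    },
    'fully_train': {
        'depends': ['search'],                   # a failed dependency now fails this step too
        'inputs': {'arch_desc': 'search.best'},  # resolved against search's step results
    },
}
```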
diff --git a/vega/algorithms/nas/modnas/estim/predefined/unified.py b/vega/algorithms/nas/modnas/estim/predefined/unified.py
index 8c4048d7..2a77a932 100644
--- a/vega/algorithms/nas/modnas/estim/predefined/unified.py
+++ b/vega/algorithms/nas/modnas/estim/predefined/unified.py
@@ -13,13 +13,19 @@
from ..base import EstimBase
from modnas.core.param_space import ParamSpace
from modnas.registry.estim import register
+from modnas.optim.base import OptimBase
+from collections import OrderedDict
+from typing import Dict, Optional, Any
@register
class UnifiedEstim(EstimBase):
"""Unified Estimator class."""
- def __init__(self, train_epochs=1, train_steps=-1, reset_training=False, eval_steps=1, *args, **kwargs):
+ def __init__(
+ self, train_epochs: int = 1, train_steps: int = -1, reset_training: bool = False, eval_steps: int = 1,
+ *args, **kwargs
+ ) -> None:
super().__init__(*args, **kwargs)
if train_steps != 0:
train_epochs = 1
@@ -29,7 +35,7 @@ def __init__(self, train_epochs=1, train_steps=-1, reset_training=False, eval_st
self.eval_steps = eval_steps
self.cur_step = -1
- def step(self, params):
+ def step(self, params: OrderedDict) -> Dict[str, Any]:
"""Return evaluation results of a parameter set."""
ParamSpace().update_params(params)
n_train_batch = self.get_num_train_batch()
@@ -60,7 +66,7 @@ def step(self, params):
self.logger.info('Evaluate: {} -> {}'.format(arch_desc, ret))
return ret
- def run_epoch(self, optim, epoch, tot_epochs):
+ def run_epoch(self, optim: OptimBase, epoch: int, tot_epochs: int) -> Optional[Dict[str, Any]]:
"""Run Estimator routine for one epoch."""
logger = self.logger
config = self.config
@@ -85,10 +91,11 @@ def run_epoch(self, optim, epoch, tot_epochs):
self.stepped(params)
self.wait_done()
if (epoch + 1) % n_epoch_steps != 0:
- return
+ return None
self.cur_epoch += 1
+ return None
- def run(self, optim):
+ def run(self, optim: OptimBase) -> None:
"""Run Estimator routine."""
self.reset_trainer()
config = self.config
diff --git a/vega/algorithms/nas/modnas/ext_requirements.txt b/vega/algorithms/nas/modnas/ext_requirements.txt
new file mode 100644
index 00000000..eca67a41
--- /dev/null
+++ b/vega/algorithms/nas/modnas/ext_requirements.txt
@@ -0,0 +1,19 @@
+# Plotting
+matplotlib
+
+# Tensorboard
+tensorboardX
+
+
+# Bayesian optimization
+scikit-optimize
+
+# Score model
+xgboost
+scikit-learn
+
+# Distributed
+rpyc
+
+# NASBench
+tensorflow
diff --git a/vega/algorithms/nas/modnas/extra_requirements.txt b/vega/algorithms/nas/modnas/extra_requirements.txt
index 7293c6d5..4301c788 100644
--- a/vega/algorithms/nas/modnas/extra_requirements.txt
+++ b/vega/algorithms/nas/modnas/extra_requirements.txt
@@ -9,6 +9,7 @@ git+git://github.com/creeperlin/rasp
# Bayesian optimization
scikit-optimize
+hyperopt
# Score model
xgboost
diff --git a/vega/algorithms/nas/modnas/metrics/__init__.py b/vega/algorithms/nas/modnas/metrics/__init__.py
index 92f4b54b..d930a14d 100644
--- a/vega/algorithms/nas/modnas/metrics/__init__.py
+++ b/vega/algorithms/nas/modnas/metrics/__init__.py
@@ -9,15 +9,17 @@
# MIT License for more details.
from modnas.registry.metrics import build
+from modnas.registry import SPEC_TYPE
from .base import MetricsBase
+from typing import Dict, Optional, Any
-def build_metrics_all(mt_configs, estim=None):
+def build_metrics_all(mt_configs: Optional[SPEC_TYPE], estim: Optional[Any] = None) -> Dict[str, MetricsBase]:
"""Build Metrics from configs."""
metrics = {}
+ MetricsBase.set_estim(estim)
if mt_configs is None:
mt_configs = {}
- MetricsBase.set_estim(estim)
if not isinstance(mt_configs, dict):
mt_configs = {'default': mt_configs}
for mt_name, mt_conf in mt_configs.items():
diff --git a/vega/algorithms/nas/modnas/metrics/base.py b/vega/algorithms/nas/modnas/metrics/base.py
index bb780cc5..173bff69 100644
--- a/vega/algorithms/nas/modnas/metrics/base.py
+++ b/vega/algorithms/nas/modnas/metrics/base.py
@@ -10,6 +10,7 @@
"""Implementation of Metrics interface."""
from modnas.utils.logging import get_logger
+from typing import Any
class MetricsBase():
@@ -18,19 +19,19 @@ class MetricsBase():
logger = get_logger('metrics')
cur_estim = None
- def __init__(self):
+ def __init__(self) -> None:
self.estim = MetricsBase.get_estim()
- def __call__(self, *args, **kwargs):
+ def __call__(self, *args, **kwargs) -> Any:
"""Return metrics output."""
raise NotImplementedError
@staticmethod
- def get_estim():
+ def get_estim() -> Any:
"""Get current Estimator."""
return MetricsBase.cur_estim
@staticmethod
- def set_estim(estim):
+ def set_estim(estim: Any) -> None:
"""Set current Estimator."""
MetricsBase.cur_estim = estim
diff --git a/vega/algorithms/nas/modnas/metrics/predefined/agg.py b/vega/algorithms/nas/modnas/metrics/predefined/agg.py
index 9e1461e9..1440e63b 100644
--- a/vega/algorithms/nas/modnas/metrics/predefined/agg.py
+++ b/vega/algorithms/nas/modnas/metrics/predefined/agg.py
@@ -12,19 +12,20 @@
from functools import reduce
from ..base import MetricsBase
from modnas.registry.metrics import register, build
+from typing import Dict, Any
@register
class SumAggMetrics(MetricsBase):
"""Aggregate metrics by sum."""
- def __init__(self, metrics_conf):
+ def __init__(self, metrics_conf: Dict) -> None:
super().__init__()
self.metrics = {k: build(conf) for k, conf in metrics_conf.items()}
self.base = {k: conf.get('base', 1) for k, conf in metrics_conf.items()}
self.weight = {k: conf.get('weight', 1) for k, conf in metrics_conf.items()}
- def __call__(self, item):
+ def __call__(self, item: Any) -> Any:
"""Return metrics output."""
mt_res = {k: (mt(item) or 0) for k, mt in self.metrics.items()}
self.logger.info('SumAgg: {{{}}}'.format(', '.join(['{}: {}'.format(k, r) for k, r in mt_res.items()])))
diff --git a/vega/algorithms/nas/modnas/metrics/predefined/stats.py b/vega/algorithms/nas/modnas/metrics/predefined/stats.py
index c8f0bb49..d9778bc8 100644
--- a/vega/algorithms/nas/modnas/metrics/predefined/stats.py
+++ b/vega/algorithms/nas/modnas/metrics/predefined/stats.py
@@ -14,13 +14,15 @@
import numpy as np
from ..base import MetricsBase
from modnas.registry.metrics import register, build
+from modnas.registry import SPEC_TYPE
+from typing import List, Any, Optional
@register
class StatsLUTMetrics(MetricsBase):
"""Statistical metrics using look-up table (LUT)."""
- def __init__(self, lut_path, head=None):
+ def __init__(self, lut_path: str, head: List[str]) -> None:
super().__init__()
with open(lut_path, 'r') as f:
self.lut = yaml.load(f, Loader=yaml.Loader)
@@ -29,7 +31,7 @@ def __init__(self, lut_path, head=None):
self.head = head
self.warned = set()
- def __call__(self, stats):
+ def __call__(self, stats: Any) -> float:
"""Return metrics output."""
key = '#'.join([str(stats[k]) for k in self.head if stats.get(k, None) is not None])
val = self.lut.get(key, None)
@@ -48,7 +50,7 @@ def __call__(self, stats):
class StatsRecordMetrics(MetricsBase):
"""Statistical metrics using recorded results."""
- def __init__(self, metrics, head=None, save_path=None):
+ def __init__(self, metrics: SPEC_TYPE, head: List[str], save_path: Optional[str] = None) -> None:
super().__init__()
self.head = head
self.metrics = build(metrics)
@@ -58,7 +60,7 @@ def __init__(self, metrics, head=None, save_path=None):
if save_path is not None:
self.save_file = open(save_path, 'w')
- def __call__(self, stats):
+ def __call__(self, stats: Any) -> float:
"""Return metrics output."""
key = '#'.join([str(stats[k]) for k in self.head if stats[k] is not None])
if key in self.record:
diff --git a/vega/algorithms/nas/modnas/metrics/torch/rasp.py b/vega/algorithms/nas/modnas/metrics/torch/rasp.py
index 135b332f..28f042a5 100644
--- a/vega/algorithms/nas/modnas/metrics/torch/rasp.py
+++ b/vega/algorithms/nas/modnas/metrics/torch/rasp.py
@@ -12,22 +12,28 @@
from ..base import MetricsBase
from modnas.registry.metrics import register, build
from modnas.arch_space.mixed_ops import MixedOp
+from modnas.registry import SPEC_TYPE
+from torch.nn.modules.module import Module
+from typing import List, Any, Union
+
try:
import rasp
import rasp.frontend as F
+ from rasp.profiler.tree import StatTreeNode
except ImportError:
rasp = None
+ StatTreeNode = None
@register
class RASPStatsMetrics(MetricsBase):
"""RASP node statistics metrics class."""
- def __init__(self, item):
+ def __init__(self, item: str) -> None:
super().__init__()
self.item = item
- def __call__(self, node):
+ def __call__(self, node: StatTreeNode) -> int:
"""Return metrics output."""
return node[self.item]
@@ -37,14 +43,14 @@ class RASPTraversalMetrics(MetricsBase):
"""RASP model traversal metrics class."""
def __init__(self,
- input_shape,
- metrics,
- compute=True,
- timing=False,
- device='cuda',
- mixed_only=False,
- keep_stats=True,
- traversal_type='tape_nodes'):
+ input_shape: List[int],
+ metrics: SPEC_TYPE,
+ compute: bool = True,
+ timing: bool = False,
+ device: str = 'cuda',
+ mixed_only: bool = False,
+ keep_stats: bool = True,
+ traversal_type: str = 'tape_nodes') -> None:
super().__init__()
if rasp is None:
raise ValueError('package RASP is not found')
@@ -63,7 +69,7 @@ def __init__(self,
raise ValueError('invalid traversal type')
self.excluded = set()
- def _traverse_tape_nodes(self, node):
+ def _traverse_tape_nodes(self, node: StatTreeNode) -> Union[float, int]:
ret = 0
if node in self.excluded:
return ret
@@ -82,7 +88,7 @@ def _traverse_tape_nodes(self, node):
ret += n_ret
return ret
- def _traverse_tape_leaves(self, node):
+ def _traverse_tape_leaves(self, node: StatTreeNode) -> Union[float, int]:
ret = 0
for cur_node in node.tape.items_all:
if cur_node in self.excluded:
@@ -93,7 +99,7 @@ def _traverse_tape_leaves(self, node):
ret += n_ret
return ret
- def _stat(self, module, input_shape):
+ def _stat(self, module: Module, input_shape: List[int]) -> None:
"""Run profiling on given module."""
if self.eval_compute:
F.hook_compute(module)
@@ -103,14 +109,14 @@ def _stat(self, module, input_shape):
F.unhook_compute(module)
F.unhook_timing(module)
- def __call__(self, net):
+ def __call__(self, net: Module) -> Any:
"""Return metrics output."""
self.excluded.clear()
root = F.get_stats_node(net)
if root is None:
root = F.reg_stats_node(net)
self._stat(net, self.input_shape)
- mt = 0
+ mt = 0.
for m in net.modules():
if not isinstance(m, MixedOp):
continue
diff --git a/vega/algorithms/nas/modnas/optim/base.py b/vega/algorithms/nas/modnas/optim/base.py
index fa9934fd..7d9c2e99 100644
--- a/vega/algorithms/nas/modnas/optim/base.py
+++ b/vega/algorithms/nas/modnas/optim/base.py
@@ -14,6 +14,9 @@
from modnas.core.param_space import ParamSpace
from modnas.core.event import event_hooked_subclass
from modnas.utils.logging import get_logger
+from collections import OrderedDict
+from modnas.estim.base import EstimBase
+from typing import Any, Dict, List, Optional
@event_hooked_subclass
@@ -22,7 +25,7 @@ class OptimBase():
logger = get_logger('optim')
- def __init__(self, space=None):
+ def __init__(self, space: Optional[ParamSpace] = None) -> None:
self.space = space or ParamSpace()
def state_dict(self):
@@ -41,7 +44,7 @@ def _next(self):
"""Return the next set of parameters."""
raise NotImplementedError
- def next(self, batch_size=1):
+ def next(self, batch_size: int = 1) -> List[OrderedDict]:
"""Return the next batch of parameter sets."""
batch = []
for _ in range(batch_size):
@@ -50,7 +53,7 @@ def next(self, batch_size=1):
batch.append(self._next())
return batch
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator evaluation results."""
pass
@@ -67,7 +70,7 @@ class GradientBasedOptim(OptimBase):
}
}
- def __init__(self, space=None, a_optim=None):
+ def __init__(self, space: Optional[ParamSpace] = None, a_optim: Optional[Dict[str, Any]] = None) -> None:
super().__init__(space)
a_optim = a_optim or GradientBasedOptim._default_optimizer_conf
self.a_optim = backend.get_optimizer(self.space.tensor_values(), a_optim)
@@ -80,59 +83,59 @@ def load_state_dict(self, sd):
"""Resume states."""
self.a_optim.load_state_dict(sd['a_optim'])
- def optim_step(self):
+ def optim_step(self) -> None:
"""Do tensor parameter optimizer step."""
self.a_optim.step()
self.space.on_update_tensor_params()
- def optim_reset(self):
+ def optim_reset(self) -> None:
"""Prepare tensor parameter optimizer step."""
self.a_optim.zero_grad()
- def has_next(self):
+ def has_next(self) -> bool:
"""Return True if Optimizer has the next set of parameters."""
return True
- def _next(self):
+ def _next(self) -> Dict[Any, Any]:
return {}
class CategoricalSpaceOptim(OptimBase):
"""Categorical space Optimizer class."""
- def __init__(self, space=None):
+ def __init__(self, space: Optional[ParamSpace] = None) -> None:
super().__init__(space)
self.space_size = self.space.categorical_size
self.visited = set()
- def has_next(self):
+ def has_next(self) -> bool:
"""Return True if Optimizer has the next set of parameters."""
return len(self.visited) < self.space_size()
- def get_random_index(self):
+ def get_random_index(self) -> int:
"""Return a random index from categorical space."""
index = random.randint(0, self.space_size() - 1)
while index in self.visited:
index = random.randint(0, self.space_size() - 1)
return index
- def is_visited(self, idx):
+ def is_visited(self, idx: int) -> bool:
"""Return True if a space index is already visited."""
return idx in self.visited
- def set_visited(self, idx):
+ def set_visited(self, idx: int) -> None:
"""Set a space index as visited."""
self.visited.add(idx)
- def get_random_params(self):
+ def get_random_params(self) -> OrderedDict:
"""Return a random parameter set from categorical space."""
return self.space.get_categorical_params(self.get_random_index())
- def is_visited_params(self, params):
+ def is_visited_params(self, params: OrderedDict) -> bool:
"""Return True if a parameter set is already visited."""
return self.is_visited(self.space.get_categorical_index(params))
- def set_visited_params(self, params):
+ def set_visited_params(self, params: OrderedDict) -> None:
"""Set a parameter set as visited."""
self.visited.add(self.space.get_categorical_index(params))
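`get_random_index` draws by rejection sampling, which stays cheap while the visited set is small relative to the space. Standalone:

```python
import random

def get_random_index(space_size: int, visited: set) -> int:
    # Rejection-sample an unvisited categorical index, as in
    # CategoricalSpaceOptim above.
    index = random.randint(0, space_size - 1)
    while index in visited:
        index = random.randint(0, space_size - 1)
    return index

visited = set()
for _ in range(4):
    visited.add(get_random_index(10, visited))
print(sorted(visited))  # four distinct indices from 0..9
```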
diff --git a/vega/algorithms/nas/modnas/optim/model_optim/base.py b/vega/algorithms/nas/modnas/optim/model_optim/base.py
index d94578f7..a9bc2fb8 100644
--- a/vega/algorithms/nas/modnas/optim/model_optim/base.py
+++ b/vega/algorithms/nas/modnas/optim/model_optim/base.py
@@ -10,6 +10,8 @@
"""Score model optimum finder."""
import random
+from collections import OrderedDict
+from typing import Set
class ModelOptim():
@@ -18,14 +20,14 @@ class ModelOptim():
def __init__(self, space):
self.space = space
- def get_random_index(self, excludes):
+ def get_random_index(self, excludes: Set[int]) -> int:
"""Return random categorical index from search space."""
index = random.randint(0, self.space.categorical_size() - 1)
while index in excludes:
index = random.randint(0, self.space.categorical_size() - 1)
return index
- def get_random_params(self, excludes):
+ def get_random_params(self, excludes: Set[int]) -> OrderedDict:
"""Return random categorical parameters from search space."""
return self.space.get_categorical_params(self.get_random_index(excludes))
diff --git a/vega/algorithms/nas/modnas/optim/model_optim/sa.py b/vega/algorithms/nas/modnas/optim/model_optim/sa.py
index 61a7eb57..d3e29725 100644
--- a/vega/algorithms/nas/modnas/optim/model_optim/sa.py
+++ b/vega/algorithms/nas/modnas/optim/model_optim/sa.py
@@ -15,6 +15,8 @@
from .base import ModelOptim
from modnas.registry.model_optim import register
from modnas.utils.logging import get_logger
+from collections import OrderedDict
+from typing import Any, List, Set, Union
logger = get_logger('model_optim')
@@ -43,7 +45,7 @@ def __init__(self,
self.keep_history = keep_history
self.history = None
- def disturb(self, params):
+ def disturb(self, params: OrderedDict) -> OrderedDict:
"""Return randomly disturbed parameter."""
pname = list(params)[random.randint(0, len(params) - 1)]
p = self.space.get_param(pname)
@@ -54,14 +56,14 @@ def disturb(self, params):
new_params[pname] = p.get_value(nidx)
return new_params
- def get_optimums(self, model, size, excludes):
+ def get_optimums(self, model: Any, size: int, excludes: Set[int]) -> List[int]:
"""Return optimums in score model."""
topq = []
for _ in range(self.n_iter):
self.run_sa(model, size, excludes, topq)
return [item[-1] for item in topq[::1]]
- def run_sa(self, model, size, excludes, topq):
+ def run_sa(self, model: Any, size: int, excludes: Set[int], topq: List[Any]) -> None:
"""Run SA algorithm."""
if self.history is None:
params = [self.get_random_params(excludes) for _ in range(self.batch_size)]
diff --git a/vega/algorithms/nas/modnas/optim/predefined/genetic.py b/vega/algorithms/nas/modnas/optim/predefined/genetic.py
index ee38e742..c8499418 100644
--- a/vega/algorithms/nas/modnas/optim/predefined/genetic.py
+++ b/vega/algorithms/nas/modnas/optim/predefined/genetic.py
@@ -13,12 +13,16 @@
import random
from ..base import CategoricalSpaceOptim
from modnas.registry.optim import register
+from modnas.core.param_space import ParamSpace
+from modnas.estim.base import EstimBase
+from collections import OrderedDict
+from typing import Callable, Dict, List, Union, Optional
class GeneticOptim(CategoricalSpaceOptim):
"""Optimizer with genetic operators on a population."""
- def __init__(self, pop_size, max_it=1000, space=None):
+ def __init__(self, pop_size: int, max_it: int = 1000, space: Optional[ParamSpace] = None) -> None:
super().__init__(space)
self.max_it = max_it
self.pop_size = pop_size
@@ -29,22 +33,22 @@ def __init__(self, pop_size, max_it=1000, space=None):
def _initialize(self):
raise NotImplementedError
- def _mating(self, pop):
+ def _mating(self, pop: List[OrderedDict]) -> List[OrderedDict]:
cur_pop = pop
for op in self.operators:
cur_pop = op(cur_pop)
return cur_pop
- def _next(self):
+ def _next(self) -> OrderedDict:
params = self.population[len(self.metrics)]
self.set_visited_params(params)
return params
- def add_operator(self, operator):
+ def add_operator(self, operator: Callable) -> None:
"""Add a genetic operator."""
self.operators.append(operator)
- def to_metrics(self, res):
+ def to_metrics(self, res: Union[float, Dict[str, float]]) -> float:
"""Return scalar metrics from evaluation results."""
if isinstance(res, dict):
return list(res.values())[0]
@@ -52,7 +56,7 @@ def to_metrics(self, res):
return res[0]
return res
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator evaluation results."""
_, results = estim.get_last_results()
results = [self.to_metrics(res) for res in results]
@@ -67,14 +71,14 @@ class EvolutionOptim(GeneticOptim):
"""Optimizer with Evolution algorithm."""
def __init__(self,
- pop_size=100,
- n_parents=2,
- n_offsprings=1,
- n_select=10,
- n_eliminate=1,
- n_crossover=None,
- mutation_prob=0.01,
- space=None):
+ pop_size: int = 100,
+ n_parents: int = 2,
+ n_offsprings: int = 1,
+ n_select: int = 10,
+ n_eliminate: int = 1,
+ n_crossover: Optional[int] = None,
+ mutation_prob: float = 0.01,
+ space: Optional[ParamSpace] = None) -> None:
super().__init__(space=space, pop_size=pop_size)
self.add_operator(self._survival)
self.add_operator(self._selection)
@@ -87,7 +91,7 @@ def __init__(self,
self.n_crossover = pop_size if n_crossover is None else n_crossover
self.mutation_prob = mutation_prob
- def _initialize(self):
+ def _initialize(self) -> List[OrderedDict]:
return [self.get_random_params() for _ in range(self.pop_size)]
def _survival(self, pop):
@@ -99,7 +103,7 @@ def _survival(self, pop):
self.metrics = [metrics[i] for i in idx]
return [pop[i] for i in idx]
- def _selection(self, pop):
+ def _selection(self, pop: List[OrderedDict]) -> List[OrderedDict]:
n_select = self.n_select
if n_select >= len(pop):
return pop
@@ -108,7 +112,7 @@ def _selection(self, pop):
self.metrics = [metrics[i] for i in idx]
return [pop[i] for i in idx]
- def _crossover(self, pop):
+ def _crossover(self, pop: List[OrderedDict]) -> List[OrderedDict]:
next_pop = []
it = 0
while len(next_pop) < self.n_crossover and it < self.max_it:
@@ -126,7 +130,7 @@ def _crossover(self, pop):
next_pop.append(self.get_random_params())
return next_pop
- def _mutation(self, pop):
+ def _mutation(self, pop: List[OrderedDict]) -> List[OrderedDict]:
next_pop = []
for gene in pop:
it = 0
@@ -151,7 +155,7 @@ def _mutation(self, pop):
class RegularizedEvolutionOptim(EvolutionOptim):
"""Optimizer with Regularized Evolution algorithm."""
- def _survival(self, pop):
+ def _survival(self, pop: List[OrderedDict]) -> List[OrderedDict]:
s_idx = self.n_eliminate
if s_idx <= 0:
return pop
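`_mating` threads the population through every operator in the order they were registered with `add_operator`. The fold itself, with toy stand-ins for the genetic operators:

```python
from functools import reduce
from typing import Callable, List

def mating(pop: List[int], operators: List[Callable]) -> List[int]:
    # GeneticOptim._mating above: fold the population through each operator.
    return reduce(lambda cur, op: op(cur), operators, pop)

double = lambda pop: pop + pop  # toy stand-in for crossover
take3 = lambda pop: pop[:3]     # toy stand-in for selection
print(mating([1, 2], [double, take3]))  # [1, 2, 1]
```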
diff --git a/vega/algorithms/nas/modnas/optim/predefined/gridsearch.py b/vega/algorithms/nas/modnas/optim/predefined/gridsearch.py
index a35adf36..82ac3c7e 100644
--- a/vega/algorithms/nas/modnas/optim/predefined/gridsearch.py
+++ b/vega/algorithms/nas/modnas/optim/predefined/gridsearch.py
@@ -13,6 +13,9 @@
import random
from ..base import CategoricalSpaceOptim
from modnas.registry.optim import register
+from modnas.core.param_space import ParamSpace
+from collections import OrderedDict
+from typing import Optional
@register
@@ -37,12 +40,12 @@ def has_next(self):
class RandomSearchOptim(CategoricalSpaceOptim):
"""Optimizer using random search."""
- def __init__(self, seed=None, space=None):
+ def __init__(self, seed: Optional[int] = None, space: Optional[ParamSpace] = None) -> None:
super().__init__(space)
seed = int(time.time()) if seed is None else seed
random.seed(seed)
- def _next(self):
+ def _next(self) -> OrderedDict:
index = self.get_random_index()
self.visited.add(index)
return self.space.get_categorical_params(index)
diff --git a/vega/algorithms/nas/modnas/optim/predefined/model_based.py b/vega/algorithms/nas/modnas/optim/predefined/model_based.py
index eaab4924..0f987d8d 100644
--- a/vega/algorithms/nas/modnas/optim/predefined/model_based.py
+++ b/vega/algorithms/nas/modnas/optim/predefined/model_based.py
@@ -13,13 +13,21 @@
from modnas.registry.score_model import build as build_score_model
from modnas.registry.model_optim import build as build_model_optim
from modnas.registry.optim import register
+from modnas.registry import SPEC_TYPE
+from collections import OrderedDict
+from modnas.core.param_space import ParamSpace
+from modnas.estim.base import EstimBase
+from typing import Optional
@register
class ModelBasedOptim(CategoricalSpaceOptim):
"""Model-based Optimizer class."""
- def __init__(self, model_config, model_optim_config, greedy_e=0.05, n_next_pts=32, space=None):
+ def __init__(
+ self, model_config: SPEC_TYPE, model_optim_config: SPEC_TYPE, greedy_e: float = 0.05, n_next_pts: int = 32,
+ space: Optional[ParamSpace] = None
+ ) -> None:
super().__init__(space)
self.model = build_score_model(model_config, space=self.space)
self.model_optim = build_model_optim(model_optim_config, space=self.space)
@@ -31,7 +39,7 @@ def __init__(self, model_config, model_optim_config, greedy_e=0.05, n_next_pts=3
self.next_pt = 0
self.train_ct = 0
- def _next(self):
+ def _next(self) -> OrderedDict:
while self.next_pt < len(self.next_xs):
index = self.next_xs[self.next_pt]
if not self.is_visited(index):
@@ -42,7 +50,7 @@ def _next(self):
self.set_visited(index)
return self.space.get_categorical_params(index)
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator evaluation results."""
inputs, results = estim.get_last_results()
for inp, res in zip(inputs, results):
diff --git a/vega/algorithms/nas/modnas/optim/score_model/base.py b/vega/algorithms/nas/modnas/optim/score_model/base.py
index b72fda93..0625be25 100644
--- a/vega/algorithms/nas/modnas/optim/score_model/base.py
+++ b/vega/algorithms/nas/modnas/optim/score_model/base.py
@@ -10,6 +10,9 @@
"""Evaluation score prediction model."""
import numpy as np
+from collections import OrderedDict
+from numpy import ndarray
+from typing import List, Union
class ScoreModel():
@@ -26,7 +29,7 @@ def predict(self, inputs):
"""Return predicted evaluation score from model."""
raise NotImplementedError
- def _process_input(self, inp):
+ def _process_input(self, inp: OrderedDict) -> List[int]:
ret = []
for n, v in inp.items():
p = self.space.get_param(n)
@@ -35,14 +38,13 @@ def _process_input(self, inp):
ret.extend(one_hot)
return ret
- def to_feature(self, inputs):
+ def to_feature(self, inputs: Union[OrderedDict, List[OrderedDict]]) -> ndarray:
"""Return feature variables from inputs."""
if not isinstance(inputs, list):
inputs = [inputs]
- inputs = [self._process_input(inp) for inp in inputs]
- return np.array(inputs)
+ return np.array([self._process_input(inp) for inp in inputs])
- def to_target(self, results):
+ def to_target(self, results: List[float]) -> ndarray:
"""Return target variables from results."""
def to_metrics(res):
if isinstance(res, dict):
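`_process_input` one-hot encodes each categorical choice and concatenates the segments into a single feature row, which `to_feature` then stacks into the design matrix. A sketch under an assumed toy space (the real code resolves choices via `self.space.get_param`):

```python
import numpy as np
from collections import OrderedDict

# Hypothetical categorical space: parameter name -> list of choices.
space = {'act': ['relu', 'gelu'], 'width': [32, 64, 128]}

def process_input(inp: OrderedDict) -> list:
    ret = []
    for name, value in inp.items():
        choices = space[name]
        one_hot = [0] * len(choices)
        one_hot[choices.index(value)] = 1
        ret.extend(one_hot)  # concatenate per-parameter one-hot segments
    return ret

X = np.array([process_input(OrderedDict(act='gelu', width=64))])
print(X)  # [[0 1 0 1 0]]
```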
diff --git a/vega/algorithms/nas/modnas/optim/score_model/sklearn.py b/vega/algorithms/nas/modnas/optim/score_model/sklearn.py
index bc78e056..179d5d0e 100644
--- a/vega/algorithms/nas/modnas/optim/score_model/sklearn.py
+++ b/vega/algorithms/nas/modnas/optim/score_model/sklearn.py
@@ -17,6 +17,9 @@
sklearn = None
from .base import ScoreModel
from modnas.registry.score_model import register
+from collections import OrderedDict
+from numpy import ndarray
+from typing import List
@register
@@ -31,7 +34,7 @@ def __init__(self, space, model_cls, module, model_kwargs={}):
model_cls = getattr(module, model_cls)
self.model = model_cls(**model_kwargs)
- def fit(self, inputs, results):
+ def fit(self, inputs: List[OrderedDict], results: List[float]) -> None:
"""Fit model with evaluation results."""
x_train = self.to_feature(inputs)
y_train = self.to_target(results)
@@ -39,7 +42,7 @@ def fit(self, inputs, results):
trn_x, trn_y = x_train[index], y_train[index]
self.model.fit(trn_x, trn_y)
- def predict(self, inputs):
+ def predict(self, inputs: List[OrderedDict]) -> ndarray:
"""Return predicted evaluation score from model."""
feats = self.to_feature(inputs)
return self.model.predict(feats)
diff --git a/vega/algorithms/nas/modnas/optim/score_model/xgboost.py b/vega/algorithms/nas/modnas/optim/score_model/xgboost.py
index 105b72be..d50ca48c 100644
--- a/vega/algorithms/nas/modnas/optim/score_model/xgboost.py
+++ b/vega/algorithms/nas/modnas/optim/score_model/xgboost.py
@@ -16,6 +16,9 @@
xgb = None
from .base import ScoreModel
from modnas.registry.score_model import register
+from collections import OrderedDict
+from numpy import ndarray
+from typing import List
xgb_params_reg = {
@@ -54,7 +57,7 @@ def __init__(self, space, loss_type='reg', xgb_kwargs={}):
self.xgb_params = xgb_params
self.model = None
- def fit(self, inputs, results):
+ def fit(self, inputs: List[OrderedDict], results: List[float]) -> None:
"""Fit model with evaluation results."""
x_train = self.to_feature(inputs)
y_train = self.to_target(results)
@@ -66,7 +69,7 @@ def fit(self, inputs, results):
num_boost_round=400,
)
- def predict(self, inputs):
+ def predict(self, inputs: List[OrderedDict]) -> ndarray:
"""Return predicted evaluation score from model."""
feats = self.to_feature(inputs)
dtest = xgb.DMatrix(feats)
diff --git a/vega/algorithms/nas/modnas/optim/torch/gradient_based.py b/vega/algorithms/nas/modnas/optim/torch/gradient_based.py
index 3c456456..cf82900b 100644
--- a/vega/algorithms/nas/modnas/optim/torch/gradient_based.py
+++ b/vega/algorithms/nas/modnas/optim/torch/gradient_based.py
@@ -16,6 +16,14 @@
from modnas.core.param_space import ParamSpace
from modnas.arch_space.mixed_ops import MixedOp
from modnas.registry.optim import register
+from modnas.estim.base import EstimBase
+from torch import Tensor
+from torch.nn.modules.module import Module
+from torch.optim.optimizer import Optimizer
+from typing import Any, List, Optional, Tuple, Dict
+
+
+OPTIM_CONF_TYPE = Optional[Dict[str, Any]]
@register
@@ -25,13 +33,16 @@ class DARTSOptim(GradientBasedOptim):
modified from https://github.com/khanrc/pt.darts
"""
- def __init__(self, a_optim=None, w_momentum=0.9, w_weight_decay=0.0003, space=None):
+ def __init__(
+ self, a_optim: OPTIM_CONF_TYPE = None, w_momentum: float = 0.9, w_weight_decay: float = 0.0003,
+ space: Optional[ParamSpace] = None
+ ) -> None:
super().__init__(space, a_optim)
self.v_net = None
self.w_momentum = w_momentum
self.w_weight_decay = w_weight_decay
- def _virtual_step(self, trn_batch, lr, optimizer, estim):
+ def _virtual_step(self, trn_batch: Any, lr: float, optimizer: Optimizer, estim: EstimBase) -> None:
# forward & calc loss
model = estim.model
loss = estim.loss(trn_batch, mode='train') # L_trn(w)
@@ -43,7 +54,7 @@ def _virtual_step(self, trn_batch, lr, optimizer, estim):
m = optimizer.state[w].get('momentum_buffer', 0.) * self.w_momentum
vw.copy_(w - lr * (m + g + self.w_weight_decay * w))
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator."""
self.optim_reset()
trn_batch = estim.get_cur_train_batch()
@@ -58,20 +69,19 @@ def step(self, estim):
# calc unrolled loss
loss = estim.loss(val_batch, model=self.v_net, mode='valid') # L_val(w`)
# compute gradient
- alphas = ParamSpace().tensor_values()
- v_alphas = tuple(alphas)
+ v_alphas = tuple(ParamSpace().tensor_values())
v_weights = tuple(self.v_net.parameters())
v_grads = torch.autograd.grad(loss, v_alphas + v_weights)
dalpha = v_grads[:len(v_alphas)]
- dw = v_grads[len(v_alphas):]
+ dw = list(v_grads[len(v_alphas):])
hessian = self._compute_hessian(dw, trn_batch, estim)
# update final gradient = dalpha - lr*hessian
with torch.no_grad():
- for a, da, h in zip(alphas, dalpha, hessian):
+ for a, da, h in zip(v_alphas, dalpha, hessian):
a.grad = da - lr * h
self.optim_step()
- def _compute_hessian(self, dw, trn_batch, estim):
+ def _compute_hessian(self, dw: List[Tensor], trn_batch: Tuple[Tensor, Tensor], estim: EstimBase) -> List[Any]:
"""Compute Hessian matrix.
dw = dw` { L_val(w`, alpha) }
@@ -81,9 +91,9 @@ def _compute_hessian(self, dw, trn_batch, estim):
eps = 0.01 / ||dw||
"""
model = estim.model
- alphas = ParamSpace().tensor_values()
+ alphas = tuple(ParamSpace().tensor_values())
norm = torch.cat([w.view(-1) for w in dw]).norm()
- eps = 0.01 / norm
+ eps = (0.01 / norm).item()
# w+ = w + eps*dw`
with torch.no_grad():
for p, d in zip(model.parameters(), dw):
@@ -100,7 +110,7 @@ def _compute_hessian(self, dw, trn_batch, estim):
with torch.no_grad():
for p, d in zip(model.parameters(), dw):
p += eps * d
- hessian = [(p - n) / 2. * eps.item() for p, n in zip(dalpha_pos, dalpha_neg)]
+ hessian = [(p - n) / (2. * eps) for p, n in zip(dalpha_pos, dalpha_neg)]
return hessian
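For reference, with `eps = 0.01 / ||dw||` as stated in the docstring, this is the central-difference estimate of the second-order term from the DARTS paper:

$$\nabla^{2}_{\alpha,w} L_{trn}(w,\alpha)\,\nabla_{w'} L_{val}(w',\alpha) \approx \frac{\nabla_{\alpha} L_{trn}(w^{+},\alpha)-\nabla_{\alpha} L_{trn}(w^{-},\alpha)}{2\epsilon}, \qquad w^{\pm} = w \pm \epsilon\,\nabla_{w'} L_{val}(w',\alpha)$$

Each `p - n` difference is divided by `2 * eps`; note the parentheses in the comprehension, since an unparenthesised `/ 2. * eps` would multiply by `eps` instead.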
@@ -117,13 +127,16 @@ class BinaryGateOptim(GradientBasedOptim):
}
}
- def __init__(self, a_optim=None, n_samples=2, renorm=True, space=None):
+ def __init__(
+ self, a_optim: OPTIM_CONF_TYPE = None, n_samples: int = 2, renorm: bool = True,
+ space: Optional[ParamSpace] = None
+ ) -> None:
super().__init__(space, a_optim or BinaryGateOptim._default_optimizer_conf)
self.n_samples = n_samples
self.sample = (self.n_samples != 0)
self.renorm = renorm and self.sample
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator."""
self.optim_reset()
model = estim.model
@@ -175,10 +188,10 @@ def step(self, estim):
class DirectGradOptim(GradientBasedOptim):
"""Optimizer by backwarding training loss."""
- def __init__(self, a_optim=None, space=None):
+ def __init__(self, a_optim: OPTIM_CONF_TYPE = None, space: Optional[ParamSpace] = None) -> None:
super().__init__(space, a_optim)
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator."""
self.optim_step()
self.optim_reset()
@@ -188,10 +201,10 @@ def step(self, estim):
class DirectGradBiLevelOptim(GradientBasedOptim):
"""Optimizer by backwarding validating loss."""
- def __init__(self, a_optim=None, space=None):
+ def __init__(self, a_optim: OPTIM_CONF_TYPE = None, space: Optional[ParamSpace] = None) -> None:
super().__init__(space, a_optim)
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator."""
self.optim_reset()
loss = estim.loss(estim.get_next_valid_batch(), mode='valid')
@@ -206,13 +219,16 @@ class REINFORCEOptim(GradientBasedOptim):
modified from https://github.com/mit-han-lab/proxylessnas
"""
- def __init__(self, a_optim=None, batch_size=10, space=None):
+ def __init__(
+ self, a_optim: OPTIM_CONF_TYPE = None, batch_size: int = 10,
+ space: Optional[ParamSpace] = None
+ ) -> None:
super().__init__(space, a_optim)
self.batch_size = batch_size
self.baseline = None
self.baseline_decay_weight = 0.99
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator."""
model = estim.model
self.optim_reset()
@@ -264,23 +280,21 @@ class GumbelAnnealingOptim(GradientBasedOptim):
"""Optimizer with Gumbel Annealing (SNAS) algorithm."""
def __init__(self,
- a_optim=None,
- init_temp=1e4,
- exp_anneal_rate=0.0015,
- anneal_interval=1,
- restart_period=None,
- space=None):
+ a_optim: OPTIM_CONF_TYPE = None,
+ init_temp: float = 1e4,
+ exp_anneal_rate: float = 0.0015,
+ anneal_interval: int = 1,
+ restart_period: Optional[int] = None,
+ space: Optional[ParamSpace] = None) -> None:
super().__init__(space, a_optim)
self.init_temp = init_temp
self.exp_anneal_rate = exp_anneal_rate
self.temp = self.init_temp
- if restart_period is None:
- restart_period = 0
- self.restart_period = int(restart_period)
+ self.restart_period = restart_period or 0
self.anneal_interval = anneal_interval
self.cur_step = 0
- def step(self, estim):
+ def step(self, estim: EstimBase) -> None:
"""Update Optimizer states using Estimator."""
self.optim_reset()
model = estim.model
@@ -295,6 +309,6 @@ def step(self, estim):
if self.cur_step % intv == 0:
self.temp = self.init_temp * math.exp(-self.exp_anneal_rate * self.cur_step / intv)
- def _apply_temp(self, model):
+ def _apply_temp(self, model: Module) -> None:
for m in MixedOp.gen(model):
m.set_temperature(self.temp)
diff --git a/vega/algorithms/nas/modnas/registry/__init__.py b/vega/algorithms/nas/modnas/registry/__init__.py
index cff1cdba..6bad4600 100644
--- a/vega/algorithms/nas/modnas/registry/__init__.py
+++ b/vega/algorithms/nas/modnas/registry/__init__.py
@@ -11,11 +11,19 @@
"""Registry for framework components."""
import sys
import importlib.util
+from importlib.abc import Loader, MetaPathFinder
+from importlib.machinery import ModuleSpec
from functools import partial
from .registry import registry
+from typing import Any, Callable, Dict, List, Optional, Tuple, Sequence, Union
+from types import ModuleType
+import types
-def register(_reg_path, builder, _reg_id=None):
+SPEC_TYPE = Union[str, Tuple[str, ...], List[Any], Dict[str, Any]]
+
+
+def register(_reg_path: str, builder: Any, _reg_id: Optional[str] = None) -> Any:
"""Register class as name."""
if _reg_id is None:
_reg_id = builder.__qualname__
@@ -23,12 +31,12 @@ def register(_reg_path, builder, _reg_id=None):
return builder
-def get_builder(_reg_path, _reg_id):
+def get_builder(_reg_path: str, _reg_id: str) -> Any:
"""Return class builder by name."""
return registry.get(_reg_path, _reg_id)
-def parse_spec(spec):
+def parse_spec(spec: SPEC_TYPE) -> Any:
"""Return parsed id and arguments from build spec."""
if isinstance(spec, dict):
return spec['type'], spec.get('args', {})
@@ -39,7 +47,7 @@ def parse_spec(spec):
raise ValueError('Invalid build spec: {}'.format(spec))
-def to_spec(reg_id, kwargs):
+def to_spec(reg_id: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
"""Return build spec from id and arguments."""
return {
'type': reg_id,
@@ -47,7 +55,18 @@ def to_spec(reg_id, kwargs):
}
-def build(_reg_path, _spec, *args, **kwargs):
+def streamline_spec(spec: Optional[Union[Dict[str, SPEC_TYPE], List[SPEC_TYPE], SPEC_TYPE]]) -> List[SPEC_TYPE]:
+ """Return a list of one or multiple specs."""
+ if spec is None:
+ return []
+ if isinstance(spec, dict) and 'type' not in spec:
+ return list(spec.values())
+ if not isinstance(spec, list):
+ return [spec]
+ return spec
+
+
+def build(_reg_path: str, _spec: SPEC_TYPE, *args, **kwargs) -> Any:
"""Instantiate class by name."""
reg_id, sp_kwargs = parse_spec(_spec)
kwargs.update(sp_kwargs)
@@ -63,7 +82,7 @@ def reg_builder(func):
return reg_builder
-def get_registry_utils(_reg_path):
+def get_registry_utils(_reg_path: str) -> Tuple[str, Callable, Callable, Callable, Callable]:
"""Return registration utilities."""
_register = partial(register, _reg_path)
_get_builder = partial(get_builder, _reg_path)
@@ -72,14 +91,14 @@ def get_registry_utils(_reg_path):
return _reg_path, _register, _get_builder, _build, _register_as
-def _get_registry_name(path):
+def _get_registry_name(path: List[str]) -> str:
return '.'.join(path[path.index('modnas') + 2:])
-class RegistryModule():
+class RegistryModule(ModuleType):
"""Registry as a module."""
- def __init__(self, fullname):
+ def __init__(self, fullname: str) -> None:
path = fullname.split('.')
registry_name = _get_registry_name(path)
self.__package__ = fullname
@@ -89,22 +108,25 @@ def __init__(self, fullname):
self.__spec__ = None
self.reg_path, self.register, self.get_builder, self.build, self.register_as = get_registry_utils(registry_name)
- def __getattr__(self, attr):
+ def __getattr__(self, attr: str) -> Any:
"""Return builder by attribute name."""
if attr in self.__dict__:
return self.__dict__.get(attr)
return self.get_builder(attr)
-class RegistryImporter():
+class RegistryImporter(Loader, MetaPathFinder):
"""Create new Registry using import hooks (PEP 302)."""
- def find_spec(self, fullname, path, target=None):
+ def find_spec(
+ self, fullname: str, path: Optional[Sequence[Union[bytes, str]]], target: Optional[types.ModuleType] = None
+ ) -> Optional[ModuleSpec]:
"""Handle registry imports."""
if 'modnas.registry' in fullname:
return importlib.util.spec_from_loader(fullname, self)
+ return None
- def load_module(self, fullname):
+ def load_module(self, fullname: str) -> RegistryModule:
"""Create and find registry by import path."""
path = fullname.split('.')
reg_path, reg_id = path[:-1], path[-1]
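`SPEC_TYPE` spells out the build-spec forms the registry accepts. Only the dict branch of `parse_spec` appears in the hunk; a standalone sketch of the full parser, under the assumption that strings and sequences behave as the alias suggests:

```python
from typing import Any, Dict, List, Tuple, Union

SPEC_TYPE = Union[str, Tuple[str, ...], List[Any], Dict[str, Any]]

def parse_spec(spec: SPEC_TYPE):
    # Dict branch as in the hunk above; str/sequence branches are assumptions.
    if isinstance(spec, dict):
        return spec['type'], spec.get('args', {})
    if isinstance(spec, (tuple, list)):
        return spec[0], spec[1] if len(spec) > 1 else {}
    if isinstance(spec, str):
        return spec, {}
    raise ValueError('Invalid build spec: {}'.format(spec))

assert parse_spec('Adam') == ('Adam', {})
assert parse_spec({'type': 'Adam', 'args': {'lr': 3e-4}}) == ('Adam', {'lr': 3e-4})
```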
diff --git a/vega/algorithms/nas/modnas/registry/registry.py b/vega/algorithms/nas/modnas/registry/registry.py
index ebe25bce..4ddfbd65 100644
--- a/vega/algorithms/nas/modnas/registry/registry.py
+++ b/vega/algorithms/nas/modnas/registry/registry.py
@@ -9,38 +9,39 @@
# MIT License for more details.
"""Default registry."""
-from modnas.utils.logging import get_logger
+import logging
+from typing import Any
+
+logger = logging.getLogger('modnas.registry')
class Registry():
"""Registry class."""
- logger = get_logger('registry')
-
- def __init__(self, allow_replace=False):
+ def __init__(self, allow_replace: bool = False) -> None:
self.allow_replace = allow_replace
self._reg_class = {}
- def get_full_path(self, reg_path, reg_id):
+ def get_full_path(self, reg_path: str, reg_id: str) -> str:
"""Return full registration path."""
return '{}.{}'.format(reg_path, reg_id)
- def get_reg_name(self, reg_path, reg_id):
+ def get_reg_name(self, reg_path: str, reg_id: str) -> str:
"""Return proper registration name."""
name = self.get_full_path(reg_path, reg_id)
return name.lower().replace('-', '').replace('_', '').replace(' ', '')
- def register(self, regclass, reg_path, reg_id):
+ def register(self, regclass: Any, reg_path: str, reg_id: str) -> None:
"""Register a component class."""
reg_id = self.get_reg_name(reg_path, reg_id)
if reg_id in self._reg_class:
- self.logger.warning('re-register id: {}'.format(reg_id))
+ logger.warning('re-register id: {}'.format(reg_id))
if not self.allow_replace:
raise ValueError('Cannot re-register id: {}'.format(reg_id))
self._reg_class[reg_id] = regclass
- self.logger.debug('registered: {}'.format(reg_id))
+ logger.debug('registered: {}'.format(reg_id))
- def get(self, reg_path, reg_id):
+ def get(self, reg_path: str, reg_id: str) -> Any:
"""Return registered class by name."""
reg_id = self.get_reg_name(reg_path, reg_id)
if reg_id not in self._reg_class:
diff --git a/vega/algorithms/nas/modnas/trainer/torch/default.py b/vega/algorithms/nas/modnas/trainer/torch/default.py
index 1f446411..c0376c2f 100644
--- a/vega/algorithms/nas/modnas/trainer/torch/default.py
+++ b/vega/algorithms/nas/modnas/trainer/torch/default.py
@@ -14,6 +14,11 @@
from modnas import backend
from ..base import TrainerBase
from modnas.registry.trainer import register
+from modnas.estim.base import EstimBase
+from torch import Tensor
+from torch.nn.modules.module import Module
+from typing import Dict, Optional, Any
+from modnas.registry import SPEC_TYPE
@register
@@ -21,32 +26,28 @@ class DefaultTrainer(TrainerBase):
"""Default Trainer class."""
def __init__(self,
- writer=None,
- expman=None,
- device='cuda',
- data_provider=None,
- optimizer=None,
- lr_scheduler=None,
- criterion=None,
- w_grad_clip=0):
+ writer: Optional[Any] = None,
+ expman: Optional[Any] = None,
+ data_provider: Optional[SPEC_TYPE] = None,
+ optimizer: Optional[SPEC_TYPE] = None,
+ lr_scheduler: Optional[SPEC_TYPE] = None,
+ criterion: Optional[SPEC_TYPE] = None,
+ w_grad_clip: int = 0) -> None:
super().__init__(writer)
- self.config = None
self.w_grad_clip = w_grad_clip
self.expman = expman
- self.device = device
self.optimizer = None
self.lr_scheduler = None
self.data_provider = None
self.criterion = None
- config = {
+ self.config = {
'optimizer': optimizer,
'lr_scheduler': lr_scheduler,
'data_provider': data_provider,
'criterion': criterion,
}
- self.config = config
- def init(self, model, config=None):
+ def init(self, model: Module, config: Optional[Dict[str, Any]] = None) -> None:
"""Initialize trainer states."""
self.config.update(config or {})
if self.config['optimizer']:
@@ -57,25 +58,25 @@ def init(self, model, config=None):
self.data_provider = backend.get_data_provider(self.config['data_provider'])
if self.config['criterion']:
self.criterion = backend.get_criterion(self.config['criterion'], getattr(model, 'device_ids', None))
- self.device = self.config.get('device', self.device)
+ self.device = self.config.get('device', backend.get_device())
- def get_num_train_batch(self, epoch):
+ def get_num_train_batch(self, epoch: int) -> int:
"""Return number of train batches in current epoch."""
return 0 if self.data_provider is None else self.data_provider.get_num_train_batch(epoch=epoch)
- def get_num_valid_batch(self, epoch):
+ def get_num_valid_batch(self, epoch: int) -> int:
"""Return number of validate batches in current epoch."""
return 0 if self.data_provider is None else self.data_provider.get_num_valid_batch(epoch=epoch)
- def get_next_train_batch(self):
+ def get_next_train_batch(self) -> Any:
"""Return the next train batch."""
return self.proc_batch(self.data_provider.get_next_train_batch())
- def get_next_valid_batch(self):
+ def get_next_valid_batch(self) -> Any:
"""Return the next validate batch."""
return self.proc_batch(self.data_provider.get_next_valid_batch())
- def proc_batch(self, batch):
+ def proc_batch(self, batch: Any) -> Any:
"""Process batch."""
return tuple(v.to(device=self.device, non_blocking=True) for v in batch)
@@ -93,7 +94,7 @@ def load_state_dict(self, sd):
if self.lr_scheduler is not None:
self.lr_scheduler.load_state_dict(sd['lr_scheduler'])
- def get_lr(self):
+ def get_lr(self) -> float:
"""Return current learning rate."""
if self.lr_scheduler:
if hasattr(self.lr_scheduler, 'get_last_lr'):
@@ -105,17 +106,21 @@ def get_optimizer(self):
"""Return optimizer."""
return self.optimizer
- def loss(self, output=None, data=None, model=None):
+ def loss(
+ self, output: Optional[Any] = None, data: Optional[Any] = None, model: Optional[Module] = None
+ ) -> Optional[Tensor]:
"""Return loss."""
return None if self.criterion is None else self.criterion(None, None, output, *data)
- def train_epoch(self, estim, model, tot_steps, epoch, tot_epochs):
+ def train_epoch(self, estim: EstimBase, model: Module, tot_steps: int, epoch: int, tot_epochs: int) -> None:
"""Train for one epoch."""
self.data_provider.reset_train_iter()
for step in range(tot_steps):
self.train_step(estim, model, epoch, tot_epochs, step, tot_steps)
- def train_step(self, estim, model, epoch, tot_epochs, step, tot_steps):
+ def train_step(
+ self, estim: EstimBase, model: Module, epoch: int, tot_epochs: int, step: int, tot_steps: int
+ ) -> Dict[str, Any]:
"""Train for one step."""
optimizer = self.optimizer
lr_scheduler = self.lr_scheduler
@@ -137,7 +142,7 @@ def train_step(self, estim, model, epoch, tot_epochs, step, tot_steps):
'N': len(batch[-1]),
}
- def valid_epoch(self, estim, model, tot_steps, epoch=0, tot_epochs=1):
+ def valid_epoch(self, estim: EstimBase, model: Module, tot_steps: int, epoch: int = 0, tot_epochs: int = 1) -> None:
"""Validate for one epoch."""
self.data_provider.reset_valid_iter()
if not tot_steps:
@@ -145,7 +150,9 @@ def valid_epoch(self, estim, model, tot_steps, epoch=0, tot_epochs=1):
for step in range(tot_steps):
self.valid_step(estim, model, epoch, tot_epochs, step, tot_steps)
- def valid_step(self, estim, model, epoch, tot_epochs, step, tot_steps):
+ def valid_step(
+ self, estim: EstimBase, model: Module, epoch: int, tot_epochs: int, step: int, tot_steps: int
+ ) -> Dict[str, Any]:
"""Validate for one step."""
model.eval()
with torch.no_grad():
diff --git a/vega/algorithms/nas/modnas/trainer/torch/image_cls.py b/vega/algorithms/nas/modnas/trainer/torch/image_cls.py
index 96d68387..5ee3fb37 100644
--- a/vega/algorithms/nas/modnas/trainer/torch/image_cls.py
+++ b/vega/algorithms/nas/modnas/trainer/torch/image_cls.py
@@ -44,14 +44,12 @@ class ImageClsTrainer(TrainerBase):
def __init__(self,
writer=None,
expman=None,
- device='cuda',
data_provider=None,
optimizer=None,
lr_scheduler=None,
criterion='CrossEntropyLoss',
w_grad_clip=0):
super().__init__(writer)
- self.device = device
self.w_grad_clip = w_grad_clip
self.expman = expman
self.optimizer = None
@@ -77,7 +75,7 @@ def init(self, model, config=None):
self.data_provider = backend.get_data_provider(self.config['data_provider'])
if self.config['criterion']:
self.criterion = backend.get_criterion(self.config['criterion'], getattr(model, 'device_ids', None))
- self.device = self.config.get('device', self.device)
+ self.device = self.config.get('device', backend.get_device())
def get_num_train_batch(self, epoch):
"""Return number of train batches."""
diff --git a/vega/algorithms/nas/modnas/utils/__init__.py b/vega/algorithms/nas/modnas/utils/__init__.py
index 26d05529..ec7e35f9 100644
--- a/vega/algorithms/nas/modnas/utils/__init__.py
+++ b/vega/algorithms/nas/modnas/utils/__init__.py
@@ -17,6 +17,9 @@
from functools import partial
from modnas.version import __version__
from .logging import get_logger
+from modnas import backend as be
+from typing import Callable, Dict, List, Optional, Union, Any
+
try:
from tensorboardX import SummaryWriter
except ImportError:
@@ -50,7 +53,7 @@ def exec_file(path):
return globs
-def import_modules(modules):
+def import_modules(modules: List[str]) -> None:
"""Import modules by name."""
if modules is None:
return
@@ -77,36 +80,19 @@ def get_exp_name(config):
return '{}.{}'.format(time.strftime('%Y%m%d', time.localtime()), hashlib.sha1(str(config).encode()).hexdigest()[:4])
-def merge_config(src, dest, extend_list=True, overwrite=True):
- """Return merged config."""
- if isinstance(src, dict) and isinstance(dest, dict):
- for k, v in dest.items():
- if k not in src:
- src[k] = v
- logger.debug('merge_config: add key %s' % k)
- else:
- src[k] = merge_config(src[k], v, extend_list, overwrite)
- elif isinstance(src, list) and isinstance(dest, list) and extend_list:
- logger.debug('merge_config: extend list: %s + %s' % (src, dest))
- src.extend(dest)
- elif overwrite:
- logger.debug('merge_config: overwrite: %s -> %s' % (src, dest))
- src = dest
- return src
-
-
-def env_info():
+def env_info() -> str:
"""Return environment info."""
info = {
'platform': sys.platform,
'python': sys.version.split()[0],
'numpy': np.__version__,
'modnas': __version__,
+ 'backend': None if be.backend() is None else '{{{}}}'.format(getattr(be, 'version', lambda: None)()),
}
return 'env info: {}'.format(', '.join(['{k}={{{k}}}'.format(k=k) for k in info])).format(**info)
-def check_config(config, defaults=None):
+def check_config(config: Dict, defaults: Optional[Any] = None) -> None:
"""Check config and set default values."""
def check_field(config, field, default):
cur_key = ''
@@ -156,7 +142,7 @@ def check_field(config, field, default):
class DummyWriter():
"""A no-op writer."""
- def __getattr__(self, item):
+ def __getattr__(self, item: str) -> Callable:
"""Return no-op."""
def noop(*args, **kwargs):
pass
@@ -164,7 +150,7 @@ def noop(*args, **kwargs):
return noop
-def get_writer(log_dir, enabled=False):
+def get_writer(log_dir: str, enabled: bool = False) -> DummyWriter:
"""Return a new writer."""
if enabled:
if SummaryWriter is None:
@@ -175,9 +161,14 @@ def get_writer(log_dir, enabled=False):
return writer
-def copy_members(dest, src, excepts=None, skip_private=True, method=True):
+def copy_members(
+ dest: Any, src: Any, includes: Optional[List[str]] = None, excepts: Optional[List[str]] = None,
+ skip_private: bool = True, method: bool = True
+) -> None:
"""Copy member methods from src to dest."""
for attr, mem in inspect.getmembers(src):
+ if includes is not None and attr not in includes:
+ continue
if excepts is not None and attr in excepts:
continue
if skip_private and attr.startswith('_'):
@@ -187,7 +178,7 @@ def copy_members(dest, src, excepts=None, skip_private=True, method=True):
setattr(dest, attr, mem)
-def get_same_padding(kernel_size):
+def get_same_padding(kernel_size: int) -> int:
"""Return SAME padding size for convolutions."""
if isinstance(kernel_size, tuple):
assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size
@@ -202,17 +193,17 @@ def get_same_padding(kernel_size):
class AverageMeter():
"""Compute and store the average and current value."""
- def __init__(self):
+ def __init__(self) -> None:
self.reset()
- def reset(self):
+ def reset(self) -> None:
"""Reset all statistics."""
- self.val = 0
- self.avg = 0
- self.sum = 0
+ self.val = 0.
+ self.avg = 0.
+ self.sum = 0.
self.count = 0
- def update(self, val, n=1):
+ def update(self, val: float, n: int = 1) -> None:
"""Update statistics."""
self.val = val
self.sum += val * n
@@ -220,63 +211,68 @@ def update(self, val, n=1):
self.avg = self.sum / self.count
-def format_time(sec):
+def format_time(sec: float) -> str:
"""Return formatted time in seconds."""
m, s = divmod(sec, 60)
h, m = divmod(m, 60)
return "%d h %d m %d s" % (h, m, s)
-def format_key(key, title=True):
+def format_key(key: str, title: bool = True) -> str:
"""Return formatted key."""
key = ' '.join(key.split('_'))
return key.title() if title and key.islower() else key
-def format_value(value, binary=False, div=None, factor=None, prec=2, unit=True, to_str=False):
+def format_value(
+ value: Union[str, float, int], binary: bool = False, div: Optional[int] = None,
+ factor: Optional[int] = None, prec: int = 2, unit: bool = True, to_str: bool = False
+) -> Union[str, float]:
"""Return formatted value."""
if value is None:
return None
if not hasattr(value, '__truediv__'):
return value
+ f_value = float(value)
units = ['', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']
- div = (1024. if binary else 1000.) if div is None else div
- if factor is None:
- factor = 0
- tot_div = 1
- while value > tot_div:
- factor += 1
- tot_div *= div
+ _div = (1024 if binary else 1000) if div is None else div
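+    # With an explicit factor the divisor is fixed; otherwise the factor grows
+    # until the value drops below the next unit boundary.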
+ _factor = factor or 0
+ if _factor:
+ tot_div = _div ** _factor
else:
- tot_div = div ** factor
- value = round(value / tot_div, prec)
+ tot_div = 1
+ while f_value > tot_div * _div:
+ _factor += 1
+ tot_div *= _div
+ f_value = round(f_value / tot_div, prec)
if not to_str and not unit:
- return value
- return '{{:.{}f}}'.format(prec).format(value) + (units[factor] if unit else '')
+ return f_value
+ return '{{:.{}f}}'.format(prec).format(f_value) + (units[_factor] if unit else '')
-def format_dict(dct, sep=None, kv_sep=None, fmt_key=None, fmt_val=None):
+def format_dict(
+ dct: Dict[str, Union[float, str]], sep: str = ' | ', kv_sep: str = ': ',
+ fmt_key: Optional[Callable] = None, fmt_val: Optional[Callable] = None
+) -> str:
"""Return formatted dict."""
- sep = sep or ' | '
- kv_sep = kv_sep or ':'
fmt_vals = None if fmt_val is False else (fmt_val if isinstance(fmt_val, dict) else {})
- fmt_val = fmt_val if callable(fmt_val) else partial(format_value, unit=False, factor=0, prec=4, to_str=True)
- fmt_key = fmt_key if callable(fmt_key) else None if fmt_key is False else format_key
- val_dct = {k: v if fmt_vals is None else fmt_vals.get(k, fmt_val)(v) for k, v in dct.items()}
- return sep.join(['{}{} {{{}}}'.format(fmt_key(k) if fmt_key else k, kv_sep, k) for k in dct]).format(**val_dct)
+ _fmt_val = fmt_val if callable(fmt_val) else partial(format_value, unit=False, factor=0, prec=4, to_str=True)
+ _fmt_key = fmt_key if callable(fmt_key) else None if fmt_key is False else format_key
+ val_dct = {k: v if fmt_vals is None else fmt_vals.get(k, _fmt_val)(v) for k, v in dct.items()}
+ return sep.join(['{}{}{{{}}}'.format(_fmt_key(k) if _fmt_key else k, kv_sep, k) for k in dct]).format(**val_dct)
class ETAMeter():
"""ETA Meter."""
- def __init__(self, total_steps, cur_steps=-1, time_fn=None):
+ def __init__(self, total_steps: int, cur_steps: int = -1, time_fn: Optional[Callable] = None) -> None:
self.time_fn = time_fn or time.perf_counter
self.total_steps = total_steps
self.last_step = cur_steps
self.last_time = self.time_fn()
- self.speed = None
+ self.speed = 0.
- def start(self):
+ def start(self) -> None:
"""Start timing."""
self.last_time = self.time_fn()
@@ -286,18 +282,18 @@ def set_step(self, step):
self.last_step = step
self.last_time = self.time_fn()
- def step(self, n=1):
+ def step(self, n: int = 1) -> None:
"""Increment current step."""
self.speed = n / (self.time_fn() - self.last_time + 1e-7)
self.last_step += n
self.last_time = self.time_fn()
- def eta(self):
+ def eta(self) -> float:
"""Return ETA in seconds."""
- if self.speed is None:
+ if self.speed < 1e-7:
return 0
return (self.total_steps - self.last_step) / (self.speed + 1e-7)
- def eta_fmt(self):
+ def eta_fmt(self) -> str:
"""Return formatted ETA."""
return format_time(self.eta())
diff --git a/vega/algorithms/nas/modnas/utils/config.py b/vega/algorithms/nas/modnas/utils/config.py
index 99b76a7b..9a080a18 100644
--- a/vega/algorithms/nas/modnas/utils/config.py
+++ b/vega/algorithms/nas/modnas/utils/config.py
@@ -12,10 +12,32 @@
# modified from https://github.com/HarryVolek/PyTorch_Speaker_Verification
import yaml
import copy
-from . import merge_config
+import logging
+from typing import Dict, Optional, Any
-def load_config_file(filename):
+logger = logging.getLogger('modnas.config')
+
+
+def merge_config(src: Any, dest: Any, extend_list: bool = True, overwrite: bool = True) -> Any:
+ """Return merged config."""
+ if isinstance(src, dict) and isinstance(dest, dict):
+ for k, v in dest.items():
+ if k not in src:
+ src[k] = v
+ logger.debug('merge_config: add key %s' % k)
+ else:
+ src[k] = merge_config(src[k], v, extend_list, overwrite)
+ elif isinstance(src, list) and isinstance(dest, list) and extend_list:
+ logger.debug('merge_config: extend list: %s + %s' % (src, dest))
+ src.extend(dest)
+ elif overwrite:
+ logger.debug('merge_config: overwrite: %s -> %s' % (src, dest))
+ src = dest
+ return src
+
+
+def load_config_file(filename: str) -> Dict[str, Any]:
"""Load configuration from YAML file."""
docs = yaml.load_all(open(filename, 'r'), Loader=yaml.SafeLoader)
config_dict = dict()
@@ -32,7 +54,7 @@ class Config(dict):
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
- def __init__(self, dct=None, file=None):
+ def __init__(self, dct: Optional[Dict] = None, file: Optional[str] = None) -> None:
super().__init__()
dct = {} if dct is None else dct
if file is not None:
@@ -48,7 +70,7 @@ def __init__(self, dct=None, file=None):
yaml.add_representer(Config,
lambda dumper, data: dumper.represent_mapping('tag:yaml.org,2002:map', data.items()))
- def to_dict(self):
+ def to_dict(self) -> Dict[str, Any]:
"""Return dict converted from Config."""
dct = {}
for k, v in self.items():
@@ -57,11 +79,11 @@ def to_dict(self):
dct[k] = v
return dct
- def __deepcopy__(self, memo):
+ def __deepcopy__(self, memo: Dict[int, Any]) -> Any:
"""Return deepcopy."""
return Config(copy.deepcopy(dict(self)))
- def __str__(self):
+ def __str__(self) -> str:
"""Return config string."""
return yaml.dump(dict(self), default_flow_style=False)
@@ -77,7 +99,7 @@ def get_value(config, key):
return Config.get_value(val, '.'.join(keywords[1:]))
@staticmethod
- def set_value(config, key, value):
+ def set_value(config: Any, key: str, value: Any) -> None:
"""Set config value by path."""
keywords = key.split('.')
val = config.get(keywords[0], None)
@@ -90,7 +112,7 @@ def set_value(config, key, value):
Config.set_value(val, '.'.join(keywords[1:]), value)
@staticmethod
- def apply(config, spec):
+ def apply(config: Any, spec: Any) -> None:
"""Apply items to a configuration."""
if isinstance(spec, dict):
spec = Config(dct=spec)
@@ -106,7 +128,7 @@ def apply(config, spec):
raise ValueError('unsupported apply type: {}'.format(type(spec)))
@staticmethod
- def load(conf):
+ def load(conf: Any) -> Any:
"""Load configuration."""
if isinstance(conf, Config):
config = conf
diff --git a/vega/algorithms/nas/modnas/utils/exp_manager.py b/vega/algorithms/nas/modnas/utils/exp_manager.py
index cde1808b..41890fc9 100644
--- a/vega/algorithms/nas/modnas/utils/exp_manager.py
+++ b/vega/algorithms/nas/modnas/utils/exp_manager.py
@@ -12,6 +12,7 @@
import os
import time
from .logging import get_logger
+from typing import Optional
class ExpManager():
@@ -19,7 +20,7 @@ class ExpManager():
logger = get_logger('exp_manager')
- def __init__(self, name, root_dir='exp', subdir_timefmt=None):
+ def __init__(self, name: str, root_dir: str = 'exp', subdir_timefmt: Optional[str] = None) -> None:
if subdir_timefmt is None:
root_dir = os.path.join(root_dir, name)
else:
@@ -28,12 +29,12 @@ def __init__(self, name, root_dir='exp', subdir_timefmt=None):
os.makedirs(self.root_dir, exist_ok=True)
self.logger.info('exp dir: {}'.format(self.root_dir))
- def subdir(self, *args):
+ def subdir(self, *args) -> str:
"""Return subdir in current root dir."""
subdir = os.path.join(self.root_dir, *args)
os.makedirs(subdir, exist_ok=True)
return subdir
- def join(self, *args):
+ def join(self, *args) -> str:
"""Join root dir and subdir path."""
return os.path.join(self.subdir(*args[:-1]), args[-1])
diff --git a/vega/algorithms/nas/modnas/utils/logging.py b/vega/algorithms/nas/modnas/utils/logging.py
index 3a2009fc..7ad753d9 100644
--- a/vega/algorithms/nas/modnas/utils/logging.py
+++ b/vega/algorithms/nas/modnas/utils/logging.py
@@ -14,6 +14,9 @@
import copy
import logging
import logging.config
+from modnas.utils.config import merge_config
+from logging import Logger
+from typing import Optional, Dict, Any
DEFAULT_LOGGING_CONF = {
@@ -45,18 +48,17 @@
}
-def get_logger(name=None):
+def get_logger(name: Optional[str] = None) -> Logger:
"""Return logger of given name."""
root = 'modnas'
return logging.getLogger(root if name is None else (name if name.startswith(root) else root + '.' + name))
-def configure_logging(config=None, log_dir=None):
+def configure_logging(config: Optional[Dict[str, Any]] = None, log_dir: Optional[str] = None) -> None:
"""Config loggers."""
- from . import merge_config
config_fn = logging.config.dictConfig
- conf = copy.deepcopy(DEFAULT_LOGGING_CONF)
- conf['handlers']['file']['filename'] = os.path.join(log_dir, '%d.log' % (int(time.time())))
+ conf: Dict[str, Any] = copy.deepcopy(DEFAULT_LOGGING_CONF)
+ conf['handlers']['file']['filename'] = os.path.join(log_dir or '', '%d.log' % (int(time.time())))
merge_config(conf, config or {})
config_fn(conf)
diff --git a/vega/algorithms/nas/modnas/utils/wrapper.py b/vega/algorithms/nas/modnas/utils/wrapper.py
index 79807d64..37bf4031 100644
--- a/vega/algorithms/nas/modnas/utils/wrapper.py
+++ b/vega/algorithms/nas/modnas/utils/wrapper.py
@@ -23,6 +23,7 @@
from modnas.registry.estim import build as build_estim
from modnas.registry.trainer import build as build_trainer
from modnas.registry import parse_spec, to_spec
+from modnas.utils.config import merge_config
from modnas import utils
from .logging import configure_logging, get_logger
from modnas import backend as be
@@ -111,7 +112,7 @@ def load_config(conf):
config = None
for cfg in conf:
loaded_cfg = Config.load(cfg)
- config = loaded_cfg if config is None else utils.merge_config(config, loaded_cfg)
+ config = loaded_cfg if config is None else merge_config(config, loaded_cfg)
return config
@@ -129,8 +130,7 @@ def get_init_constructor(config, device):
default_conf = {'type': 'DefaultInitConstructor'}
else:
raise NotImplementedError
- default_conf.update(config)
- return default_conf
+ return merge_config(default_conf, config)
def get_model_constructor(config):
@@ -256,15 +256,16 @@ def get_default_constructors(config):
else:
device_ids = device_conf.get('device', device_ids)
con_config['init'] = get_init_constructor(config.get('init', {}), device_ids)
+ con_user_config = config.get('construct', {})
if 'ops' in config:
con_config['init']['args']['ops_conf'] = config['ops']
if 'model' in config:
con_config['model'] = get_model_constructor(config['model'])
if 'mixed_op' in config:
con_config['mixed_op'] = get_mixed_op_constructor(config['mixed_op'])
- if arch_desc is not None:
+ if arch_desc is not None and 'arch_desc' not in con_user_config:
con_config['arch_desc'] = get_arch_desc_constructor(arch_desc)
- con_config = utils.merge_config(con_config, config.get('construct', {}))
+ con_config = merge_config(con_config, con_user_config)
if be.is_backend('torch'):
con_config['device'] = {'type': 'TorchToDevice', 'args': device_conf}
if config.get('chkpt'):
@@ -328,7 +329,6 @@ def init_all(**kwargs):
configure_logging(config=config.get('logging', None), log_dir=expman.subdir('logs'))
writer = utils.get_writer(expman.subdir('writer'), **config.get('writer', {}))
logger.info('Name: {} Routine: {} Config:\n{}'.format(name, routine, config))
- logger.info(utils.env_info())
# imports
imports = config.get('import', [])
if not isinstance(imports, list):
@@ -337,6 +337,7 @@ def init_all(**kwargs):
imports.insert(0, 'modnas.utils.predefined')
utils.import_modules(imports)
be.use(config.get('backend'))
+ logger.info(utils.env_info())
# data
data_provider_conf = get_data_provider_config(config)
# construct
@@ -412,7 +413,7 @@ def run(*args, parse=False, **kwargs):
"""Run routine."""
if parse or (not args and not kwargs):
parsed_kwargs = parse_routine_args()
- parsed_kwargs = utils.merge_config(parsed_kwargs, kwargs)
+ parsed_kwargs = merge_config(parsed_kwargs, kwargs)
else:
parsed_kwargs = kwargs
return run_default(*args, **parsed_kwargs)
diff --git a/vega/algorithms/nas/modnas/version.py b/vega/algorithms/nas/modnas/version.py
index 800c4a67..f8773fde 100644
--- a/vega/algorithms/nas/modnas/version.py
+++ b/vega/algorithms/nas/modnas/version.py
@@ -1,2 +1,2 @@
"""ModularNAS version."""
-__version__ = '0.0.5'
+__version__ = '0.0.6'
diff --git a/vega/algorithms/nas/segmentation_ea/segmentation_ea_trainercallback.py b/vega/algorithms/nas/segmentation_ea/segmentation_ea_trainercallback.py
index d7053eb4..47caa120 100644
--- a/vega/algorithms/nas/segmentation_ea/segmentation_ea_trainercallback.py
+++ b/vega/algorithms/nas/segmentation_ea/segmentation_ea_trainercallback.py
@@ -9,6 +9,7 @@
# MIT License for more details.
"""The trainer program for SegmentationEA."""
+import vega
import logging
import torch
from vega.common import ClassFactory, ClassType
@@ -25,7 +26,10 @@ class SegmentationEATrainerCallback(Callback):
def before_train(self, logs=None):
"""Be called before the training process."""
self.config = self.trainer.config
- count_input = torch.FloatTensor(1, 3, 1024, 1024).cuda()
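+        # Allocate the dummy input used for FLOPs counting on the same device
+        # type (NPU or GPU) as the model under training.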
+ if vega.is_npu_device():
+ count_input = torch.FloatTensor(1, 3, 1024, 1024).npu()
+ else:
+ count_input = torch.FloatTensor(1, 3, 1024, 1024).cuda()
flops_count, params_count = calc_model_flops_params(
self.trainer.model, count_input)
self.flops_count, self.params_count = flops_count * 1e-9, params_count * 1e-3
diff --git a/vega/algorithms/nas/sp_nas/__init__.py b/vega/algorithms/nas/sp_nas/__init__.py
index e4e4c1fd..a8cc4e9c 100644
--- a/vega/algorithms/nas/sp_nas/__init__.py
+++ b/vega/algorithms/nas/sp_nas/__init__.py
@@ -1,2 +1,5 @@
from .spnas_s import *
from .spnas_p import *
+import vega
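+# The MindSpore trainer callback imports MindSpore-only modules, so load it
+# only when the MindSpore backend is active.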
+if vega.is_ms_backend():
+ from .spnas_trainer_callback import SpNasTrainerCallback
diff --git a/vega/algorithms/nas/sp_nas/spnas_trainer_callback.py b/vega/algorithms/nas/sp_nas/spnas_trainer_callback.py
new file mode 100644
index 00000000..f60b055e
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/spnas_trainer_callback.py
@@ -0,0 +1,189 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""The trainer program for Auto Lane."""
+
+import logging
+import os
+import time
+import numpy as np
+from pycocotools.coco import COCO
+from vega.common import ClassFactory, ClassType
+from vega.trainer.trainer_ms import TrainerMs
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
+import mindspore.common.dtype as mstype
+from mindspore.train import Model as MsModel
+from mindspore import Tensor
+from mindspore.nn import SGD
+from .src.model_utils.config import config
+from .src.dataset import data_to_mindrecord_byte_image, create_fasterrcnn_dataset
+from .src.lr_schedule import dynamic_lr
+from .src.network_define import WithLossCell, TrainOneStepCell, LossNet
+from .src.util import coco_eval, bbox2result_1image, results2json
+from vega.datasets.conf.dataset import DatasetConfig
+
+logger = logging.getLogger(__name__)
+
+
+def valid():
+ """Construct the trainer of SpNas."""
+ config = DatasetConfig().to_dict()
+ config = config['_class_data'].val
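+    # DatasetConfig supplies the resolved dataset section; '.val' selects the
+    # evaluation split used to build the MindRecord file below.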
+ prefix = "FasterRcnn_eval.mindrecord"
+ mindrecord_dir = config.mindrecord_dir
+ mindrecord_file = os.path.join(mindrecord_dir, prefix)
+
+ if not os.path.exists(mindrecord_file):
+ if not os.path.isdir(mindrecord_dir):
+ os.makedirs(mindrecord_dir)
+ if config.dataset == "coco":
+ if os.path.isdir(config.coco_root):
+ data_to_mindrecord_byte_image(config, "coco", False, prefix, file_num=1)
+ else:
+                logging.info("coco_root does not exist.")
+ else:
+ if os.path.isdir(config.IMAGE_DIR) and os.path.exists(config.ANNO_PATH):
+ data_to_mindrecord_byte_image(config, "other", False, prefix, file_num=1)
+ else:
+                logging.info("IMAGE_DIR or ANNO_PATH does not exist.")
+ dataset = create_fasterrcnn_dataset(config, mindrecord_file, batch_size=config.test_batch_size, is_training=False)
+ return dataset
+
+
+def train():
+ """Train fasterrcnn dataset."""
+ config = DatasetConfig().to_dict()
+ config = config['_class_data'].train
+ prefix = "FasterRcnn.mindrecord"
+ mindrecord_dir = config.mindrecord_dir
+ mindrecord_file = os.path.join(mindrecord_dir, prefix + "0")
+ print("CHECKING MINDRECORD FILES ...")
+ rank = int(os.getenv('RANK_ID', '0'))
+ device_num = int(os.getenv('RANK_SIZE', '1'))
+
+ if rank == 0 and not os.path.exists(mindrecord_file):
+ if not os.path.isdir(mindrecord_dir):
+ os.makedirs(mindrecord_dir)
+ if config.dataset == "coco":
+            if os.path.isdir(config.coco_root):
+                data_to_mindrecord_byte_image(config, "coco", True, prefix)
+            else:
+                logging.info("coco_root does not exist.")
+ else:
+            if os.path.isdir(config.image_dir) and os.path.exists(config.anno_path):
+                data_to_mindrecord_byte_image(config, "other", True, prefix)
+            else:
+                logging.info("image_dir or anno_path does not exist.")
+
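+    # Only rank 0 writes the MindRecord files; every rank blocks here until
+    # the index (.db) file appears, so other workers wait for rank 0 to finish.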
+ while not os.path.exists(mindrecord_file + ".db"):
+ time.sleep(5)
+ dataset = create_fasterrcnn_dataset(config, mindrecord_file, batch_size=config.batch_size,
+ device_num=device_num, rank_id=rank,
+ num_parallel_workers=config.num_parallel_workers,
+ python_multiprocessing=config.python_multiprocessing)
+ return dataset
+
+
+@ClassFactory.register(ClassType.TRAINER)
+class SpNasTrainerCallback(TrainerMs):
+ """Construct the trainer of SpNas."""
+
+ disable_callbacks = ['ProgressLogger']
+
+ def build(self):
+        """Build the trainer and prepare the train/valid datasets."""
+ logging.debug("Trainer Config: {}".format(self.config))
+ self._init_hps()
+ self.use_syncbn = self.config.syncbn
+ if not self.train_loader:
+ self.train_loader = train()
+ if not self.valid_loader:
+ self.valid_loader = valid()
+ self.batch_num_train = self.train_loader.get_dataset_size()
+ self.batch_num_valid = self.valid_loader.get_dataset_size()
+
+ def _train_epoch(self):
+        """Train the model for one epoch."""
+ dataset = self.train_loader
+ dataset_size = dataset.get_dataset_size()
+ self.model = self.model.set_train()
+ self.model.to_float(mstype.float16)
+ self.loss = LossNet()
+ lr = Tensor(dynamic_lr(config, dataset_size), mstype.float32)
+ self.optimizer = SGD(params=self.model.trainable_params(), learning_rate=lr, momentum=config.momentum,
+ weight_decay=config.weight_decay, loss_scale=config.loss_scale)
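+        # Wrap the backbone with the loss cell, then with TrainOneStepCell so
+        # forward, loss scaling, and the backward update run as a single step.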
+ net_with_loss = WithLossCell(self.model, self.loss)
+ self.model = TrainOneStepCell(net_with_loss, self.optimizer, sens=config.loss_scale)
+
+ config_ck = CheckpointConfig(save_checkpoint_steps=self.config.save_steps, keep_checkpoint_max=1)
+ save_path = self.get_local_worker_path(self.step_name, self.worker_id)
+ ckpoint_cb = ModelCheckpoint(config=config_ck, directory=save_path)
+ loss_cb = LossMonitor(per_print_times=1)
+ callback_list = [ckpoint_cb, loss_cb]
+ self.ms_model = MsModel(self.model)
+ try:
+ self.ms_model.train(epoch=self.trainer.epochs,
+ train_dataset=dataset,
+ callbacks=callback_list,
+ dataset_sink_mode=False)
+ except RuntimeError as e:
+ logging.warning(f"failed to train the model, skip it, message: {str(e)}")
+
+ def _valid_epoch(self):
+        """Validate the model and compute COCO metrics."""
+ dataset = self.valid_loader
+ self.model.set_train(False)
+ outputs = []
+ dataset_coco = COCO(config.ann_file)
+
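+        # Keep at most 128 detections per image, ranked by the score stored in
+        # the last bbox column.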
+ max_num = 128
+ for data in dataset.create_dict_iterator(num_epochs=1):
+
+ img_data = data['image']
+ img_metas = data['image_shape']
+ gt_bboxes = data['box']
+ gt_labels = data['label']
+ gt_num = data['valid_num']
+ output = self.model(img_data, img_metas, gt_bboxes, gt_labels, gt_num)
+ all_bbox = output[0]
+ all_label = output[1]
+ all_mask = output[2]
+
+ for j in range(config.test_batch_size):
+ all_bbox_squee = np.squeeze(all_bbox.asnumpy()[j, :, :])
+ all_label_squee = np.squeeze(all_label.asnumpy()[j, :, :])
+ all_mask_squee = np.squeeze(all_mask.asnumpy()[j, :, :])
+
+ all_bboxes_tmp_mask = all_bbox_squee[all_mask_squee, :]
+ all_labels_tmp_mask = all_label_squee[all_mask_squee]
+
+ if all_bboxes_tmp_mask.shape[0] > max_num:
+ inds = np.argsort(-all_bboxes_tmp_mask[:, -1])
+ inds = inds[:max_num]
+ all_bboxes_tmp_mask = all_bboxes_tmp_mask[inds]
+ all_labels_tmp_mask = all_labels_tmp_mask[inds]
+
+ outputs_tmp = bbox2result_1image(all_bboxes_tmp_mask, all_labels_tmp_mask, config.num_classes)
+
+ outputs.append(outputs_tmp)
+
+ eval_types = ["bbox"]
+ result_files = results2json(dataset_coco, outputs, "./results.pkl")
+ metrics = coco_eval(result_files, eval_types, dataset_coco, single_result=True)
+ self.valid_metrics.update(metrics)
+ valid_logs = dict()
+ valid_logs['cur_valid_perfs'] = self.valid_metrics.results
+ self.callbacks.after_valid(valid_logs)
diff --git a/vega/algorithms/nas/sp_nas/src/__init__.py b/vega/algorithms/nas/sp_nas/src/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/vega/algorithms/nas/sp_nas/src/dataset.py b/vega/algorithms/nas/sp_nas/src/dataset.py
new file mode 100644
index 00000000..441d9a52
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/src/dataset.py
@@ -0,0 +1,483 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""FasterRcnn dataset."""
+from __future__ import division
+
+import os
+import numpy as np
+from numpy import random
+
+import cv2
+import mmcv
+import mindspore.dataset as de
+import mindspore.dataset.vision.c_transforms as C
+from mindspore.mindrecord import FileWriter
+
+
+def bbox_overlaps(bboxes1, bboxes2, mode='iou'):
+ """Calculate the ious between each bbox of bboxes1 and bboxes2.
+
+ Args:
+ bboxes1(ndarray): shape (n, 4)
+ bboxes2(ndarray): shape (k, 4)
+ mode(str): iou (intersection over union) or iof (intersection
+ over foreground)
+
+ Returns:
+ ious(ndarray): shape (n, k)
+ """
+ assert mode in ['iou', 'iof']
+
+ bboxes1 = bboxes1.astype(np.float32)
+ bboxes2 = bboxes2.astype(np.float32)
+ rows = bboxes1.shape[0]
+ cols = bboxes2.shape[0]
+ ious = np.zeros((rows, cols), dtype=np.float32)
+ if rows * cols == 0:
+ return ious
+ exchange = False
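+    # Iterate over the smaller box set for fewer loop iterations; the result
+    # is transposed back at the end if the inputs were swapped.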
+ if bboxes1.shape[0] > bboxes2.shape[0]:
+ bboxes1, bboxes2 = bboxes2, bboxes1
+ ious = np.zeros((cols, rows), dtype=np.float32)
+ exchange = True
+ area1 = (bboxes1[:, 2] - bboxes1[:, 0] + 1) * (bboxes1[:, 3] - bboxes1[:, 1] + 1)
+ area2 = (bboxes2[:, 2] - bboxes2[:, 0] + 1) * (bboxes2[:, 3] - bboxes2[:, 1] + 1)
+ for i in range(bboxes1.shape[0]):
+ x_start = np.maximum(bboxes1[i, 0], bboxes2[:, 0])
+ y_start = np.maximum(bboxes1[i, 1], bboxes2[:, 1])
+ x_end = np.minimum(bboxes1[i, 2], bboxes2[:, 2])
+ y_end = np.minimum(bboxes1[i, 3], bboxes2[:, 3])
+ overlap = np.maximum(x_end - x_start + 1, 0) * np.maximum(
+ y_end - y_start + 1, 0)
+ if mode == 'iou':
+ union = area1[i] + area2 - overlap
+ else:
+ union = area1[i] if not exchange else area2
+ ious[i, :] = overlap / union
+ if exchange:
+ ious = ious.T
+ return ious
+
+
+class PhotoMetricDistortion:
+ """Photo Metric Distortion."""
+
+ def __init__(self,
+ brightness_delta=32,
+ contrast_range=(0.5, 1.5),
+ saturation_range=(0.5, 1.5),
+ hue_delta=18):
+ self.brightness_delta = brightness_delta
+ self.contrast_lower, self.contrast_upper = contrast_range
+ self.saturation_lower, self.saturation_upper = saturation_range
+ self.hue_delta = hue_delta
+
+ def __call__(self, img, boxes, labels):
+        """Apply random photometric distortions to the image."""
+ # random brightness
+ img = img.astype('float32')
+
+ if random.randint(2):
+ delta = random.uniform(-self.brightness_delta,
+ self.brightness_delta)
+ img += delta
+
+ # mode == 0 --> do random contrast first
+ # mode == 1 --> do random contrast last
+ mode = random.randint(2)
+ if mode == 1:
+ if random.randint(2):
+ alpha = random.uniform(self.contrast_lower,
+ self.contrast_upper)
+ img *= alpha
+
+ # convert color from BGR to HSV
+ img = mmcv.bgr2hsv(img)
+
+ # random saturation
+ if random.randint(2):
+ img[..., 1] *= random.uniform(self.saturation_lower,
+ self.saturation_upper)
+
+ # random hue
+ if random.randint(2):
+ img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta)
+ img[..., 0][img[..., 0] > 360] -= 360
+ img[..., 0][img[..., 0] < 0] += 360
+
+ # convert color from HSV to BGR
+ img = mmcv.hsv2bgr(img)
+
+ # random contrast
+ if mode == 0:
+ if random.randint(2):
+ alpha = random.uniform(self.contrast_lower,
+ self.contrast_upper)
+ img *= alpha
+
+ # randomly swap channels
+ if random.randint(2):
+ img = img[..., random.permutation(3)]
+
+ return img, boxes, labels
+
+
+class Expand:
+ """Expand image."""
+
+ def __init__(self, mean=(0, 0, 0), to_rgb=True, ratio_range=(1, 4)):
+ if to_rgb:
+ self.mean = mean[::-1]
+ else:
+ self.mean = mean
+ self.min_ratio, self.max_ratio = ratio_range
+
+ def __call__(self, img, boxes, labels):
+        """Randomly expand the image and shift the boxes accordingly."""
+ if random.randint(2):
+ return img, boxes, labels
+
+ h, w, c = img.shape
+ ratio = random.uniform(self.min_ratio, self.max_ratio)
+ expand_img = np.full((int(h * ratio), int(w * ratio), c),
+ self.mean).astype(img.dtype)
+ left = int(random.uniform(0, w * ratio - w))
+ top = int(random.uniform(0, h * ratio - h))
+ expand_img[top:top + h, left:left + w] = img
+ img = expand_img
+ boxes += np.tile((left, top), 2)
+ return img, boxes, labels
+
+
+def rescale_column(img, img_shape, gt_bboxes, gt_label, gt_num, config):
+ """Rescale operation for image."""
+ img_data, scale_factor = mmcv.imrescale(img, (config.img_width, config.img_height), return_scale=True)
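+    # If the rescaled height still exceeds the target, rescale once more and
+    # fold both scale factors together.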
+ if img_data.shape[0] > config.img_height:
+ img_data, scale_factor2 = mmcv.imrescale(img_data, (config.img_height, config.img_height), return_scale=True)
+ scale_factor = scale_factor * scale_factor2
+
+ gt_bboxes = gt_bboxes * scale_factor
+ gt_bboxes[:, 0::2] = np.clip(gt_bboxes[:, 0::2], 0, img_data.shape[1] - 1)
+ gt_bboxes[:, 1::2] = np.clip(gt_bboxes[:, 1::2], 0, img_data.shape[0] - 1)
+
+ pad_h = config.img_height - img_data.shape[0]
+ pad_w = config.img_width - img_data.shape[1]
+ assert ((pad_h >= 0) and (pad_w >= 0))
+
+ pad_img_data = np.zeros((config.img_height, config.img_width, 3)).astype(img_data.dtype)
+ pad_img_data[0:img_data.shape[0], 0:img_data.shape[1], :] = img_data
+
+ img_shape = (config.img_height, config.img_width, 1.0)
+ img_shape = np.asarray(img_shape, dtype=np.float32)
+
+ return (pad_img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def rescale_column_test(img, img_shape, gt_bboxes, gt_label, gt_num, config):
+ """Rescale operation for image of eval."""
+ img_data, scale_factor = mmcv.imrescale(img, (config.img_width, config.img_height), return_scale=True)
+ if img_data.shape[0] > config.img_height:
+ img_data, scale_factor2 = mmcv.imrescale(img_data, (config.img_height, config.img_height), return_scale=True)
+ scale_factor = scale_factor * scale_factor2
+
+ pad_h = config.img_height - img_data.shape[0]
+ pad_w = config.img_width - img_data.shape[1]
+ assert ((pad_h >= 0) and (pad_w >= 0))
+
+ pad_img_data = np.zeros((config.img_height, config.img_width, 3)).astype(img_data.dtype)
+ pad_img_data[0:img_data.shape[0], 0:img_data.shape[1], :] = img_data
+
+ img_shape = np.append(img_shape, (scale_factor, scale_factor))
+ img_shape = np.asarray(img_shape, dtype=np.float32)
+
+ return (pad_img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def resize_column(img, img_shape, gt_bboxes, gt_label, gt_num, config):
+ """Resize operation for image."""
+ img_data = img
+ img_data, w_scale, h_scale = mmcv.imresize(
+ img_data, (config.img_width, config.img_height), return_scale=True)
+ scale_factor = np.array(
+ [w_scale, h_scale, w_scale, h_scale], dtype=np.float32)
+ img_shape = (config.img_height, config.img_width, 1.0)
+ img_shape = np.asarray(img_shape, dtype=np.float32)
+
+ gt_bboxes = gt_bboxes * scale_factor
+
+ gt_bboxes[:, 0::2] = np.clip(gt_bboxes[:, 0::2], 0, img_shape[1] - 1)
+ gt_bboxes[:, 1::2] = np.clip(gt_bboxes[:, 1::2], 0, img_shape[0] - 1)
+
+ return (img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def resize_column_test(img, img_shape, gt_bboxes, gt_label, gt_num, config):
+ """Resize operation for image of eval."""
+ img_data = img
+ img_data, w_scale, h_scale = mmcv.imresize(
+ img_data, (config.img_width, config.img_height), return_scale=True)
+ scale_factor = np.array(
+ [w_scale, h_scale, w_scale, h_scale], dtype=np.float32)
+ img_shape = np.append(img_shape, (h_scale, w_scale))
+ img_shape = np.asarray(img_shape, dtype=np.float32)
+
+ gt_bboxes = gt_bboxes * scale_factor
+
+ gt_bboxes[:, 0::2] = np.clip(gt_bboxes[:, 0::2], 0, img_shape[1] - 1)
+ gt_bboxes[:, 1::2] = np.clip(gt_bboxes[:, 1::2], 0, img_shape[0] - 1)
+
+ return (img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def impad_to_multiple_column(img, img_shape, gt_bboxes, gt_label, gt_num, config):
+ """Impad operation for image."""
+ img_data = mmcv.impad(img, (config.img_height, config.img_width))
+ img_data = img_data.astype(np.float32)
+ return (img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def imnormalize_column(img, img_shape, gt_bboxes, gt_label, gt_num):
+ """Imnormalize operation for image."""
+ img_data = mmcv.imnormalize(img, np.array([123.675, 116.28, 103.53]), np.array([58.395, 57.12, 57.375]), True)
+ img_data = img_data.astype(np.float32)
+ return (img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def flip_column(img, img_shape, gt_bboxes, gt_label, gt_num):
+ """Flip operation for image."""
+ img_data = img
+ img_data = mmcv.imflip(img_data)
+ flipped = gt_bboxes.copy()
+ _, w, _ = img_data.shape
+
+ flipped[..., 0::4] = w - gt_bboxes[..., 2::4] - 1
+ flipped[..., 2::4] = w - gt_bboxes[..., 0::4] - 1
+
+ return (img_data, img_shape, flipped, gt_label, gt_num)
+
+
+def transpose_column(img, img_shape, gt_bboxes, gt_label, gt_num):
+ """Transpose operation for image."""
+ img_data = img.transpose(2, 0, 1).copy()
+ img_data = img_data.astype(np.float32)
+ img_shape = img_shape.astype(np.float32)
+ gt_bboxes = gt_bboxes.astype(np.float32)
+ gt_label = gt_label.astype(np.int32)
+    gt_num = gt_num.astype(np.bool_)
+
+ return (img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def photo_crop_column(img, img_shape, gt_bboxes, gt_label, gt_num):
+ """Photo crop operation for image."""
+ random_photo = PhotoMetricDistortion()
+ img_data, gt_bboxes, gt_label = random_photo(img, gt_bboxes, gt_label)
+
+ return (img_data, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def expand_column(img, img_shape, gt_bboxes, gt_label, gt_num):
+ """Expand operation for image."""
+ expand = Expand()
+ img, gt_bboxes, gt_label = expand(img, gt_bboxes, gt_label)
+
+ return (img, img_shape, gt_bboxes, gt_label, gt_num)
+
+
+def preprocess_fn(image, box, is_training, config):
+ """Preprocess function for dataset."""
+
+ def _infer_data(image_bgr, image_shape, gt_box_new, gt_label_new, gt_iscrowd_new_revert):
+ image_shape = image_shape[:2]
+ input_data = image_bgr, image_shape, gt_box_new, gt_label_new, gt_iscrowd_new_revert
+
+ if config.keep_ratio:
+ input_data = rescale_column_test(*input_data, config=config)
+ else:
+ input_data = resize_column_test(*input_data, config=config)
+ input_data = imnormalize_column(*input_data)
+
+ output_data = transpose_column(*input_data)
+ return output_data
+
+ def _data_aug(image, box, is_training):
+        """Apply random augmentations to one image and its annotations."""
+ image_bgr = image.copy()
+ image_bgr[:, :, 0] = image[:, :, 2]
+ image_bgr[:, :, 1] = image[:, :, 1]
+ image_bgr[:, :, 2] = image[:, :, 0]
+ image_shape = image_bgr.shape[:2]
+ gt_box = box[:, :4]
+ gt_label = box[:, 4]
+ gt_iscrowd = box[:, 5]
+
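+        # Pad the annotations to a fixed count of 128 so every sample yields
+        # tensors with the same static shape.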
+ pad_max_number = 128
+ gt_box_new = np.pad(gt_box, ((0, pad_max_number - box.shape[0]), (0, 0)), mode="constant", constant_values=0)
+ gt_label_new = np.pad(gt_label, ((0, pad_max_number - box.shape[0])), mode="constant", constant_values=-1)
+ gt_iscrowd_new = np.pad(gt_iscrowd, ((0, pad_max_number - box.shape[0])), mode="constant", constant_values=1)
+        gt_iscrowd_new_revert = (~(gt_iscrowd_new.astype(np.bool_))).astype(np.int32)
+
+ if not is_training:
+ return _infer_data(image_bgr, image_shape, gt_box_new, gt_label_new, gt_iscrowd_new_revert)
+
+ flip = (np.random.rand() < config.flip_ratio)
+ expand = (np.random.rand() < config.expand_ratio)
+ input_data = image_bgr, image_shape, gt_box_new, gt_label_new, gt_iscrowd_new_revert
+
+ if expand:
+ input_data = expand_column(*input_data)
+ if config.keep_ratio:
+ input_data = rescale_column(*input_data, config=config)
+ else:
+ input_data = resize_column(*input_data, config=config)
+ input_data = imnormalize_column(*input_data)
+ if flip:
+ input_data = flip_column(*input_data)
+
+ output_data = transpose_column(*input_data)
+ return output_data
+
+ return _data_aug(image, box, is_training)
+
+
+def create_coco_label(is_training, config):
+ """Get image path and annotation from COCO."""
+ from pycocotools.coco import COCO
+
+ coco_root = config.coco_root
+ data_type = config.val_data_type
+ if is_training:
+ data_type = config.train_data_type
+
+ # Classes need to train or test.
+ train_cls = config.coco_classes
+ train_cls_dict = {}
+ for i, cls in enumerate(train_cls):
+ train_cls_dict[cls] = i
+
+ anno_json = os.path.join(coco_root, config.instance_set.format(data_type))
+
+ coco = COCO(anno_json)
+    class_dict = {}
+    cat_ids = coco.loadCats(coco.getCatIds())
+    for cat in cat_ids:
+        class_dict[cat["id"]] = cat["name"]
+
+ image_ids = coco.getImgIds()
+ image_files = []
+ image_anno_dict = {}
+
+ for img_id in image_ids:
+ image_info = coco.loadImgs(img_id)
+ file_name = image_info[0]["file_name"]
+ anno_ids = coco.getAnnIds(imgIds=img_id, iscrowd=None)
+ anno = coco.loadAnns(anno_ids)
+ image_path = os.path.join(coco_root, data_type, file_name)
+ annos = []
+ for label in anno:
+ bbox = label["bbox"]
+            class_name = class_dict[label["category_id"]]
+ if class_name in train_cls:
+ x1, x2 = bbox[0], bbox[0] + bbox[2]
+ y1, y2 = bbox[1], bbox[1] + bbox[3]
+ annos.append([x1, y1, x2, y2] + [train_cls_dict[class_name]] + [int(label["iscrowd"])])
+
+ image_files.append(image_path)
+ if annos:
+ image_anno_dict[image_path] = np.array(annos)
+ else:
+ image_anno_dict[image_path] = np.array([0, 0, 0, 0, 0, 1])
+
+ return image_files, image_anno_dict
+
+
+def anno_parser(annos_str):
+ """Parse annotation from string to list."""
+ annos = []
+ for anno_str in annos_str:
+ anno = list(map(int, anno_str.strip().split(',')))
+ annos.append(anno)
+ return annos
+
+
+def filter_valid_data(image_dir, anno_path):
+ """Filter valid image file, which both in image_dir and anno_path."""
+ image_files = []
+ image_anno_dict = {}
+ if not os.path.isdir(image_dir):
+ raise RuntimeError("Path given is not valid.")
+ if not os.path.isfile(anno_path):
+ raise RuntimeError("Annotation file is not valid.")
+
+ with open(anno_path, "rb") as f:
+ lines = f.readlines()
+ for line in lines:
+ line_str = line.decode("utf-8").strip()
+ line_split = str(line_str).split(' ')
+ file_name = line_split[0]
+ image_path = os.path.join(image_dir, file_name)
+ if os.path.isfile(image_path):
+ image_anno_dict[image_path] = anno_parser(line_split[1:])
+ image_files.append(image_path)
+ return image_files, image_anno_dict
+
+
+def data_to_mindrecord_byte_image(config, dataset="coco", is_training=True, prefix="fasterrcnn.mindrecord", file_num=8):
+ """Create MindRecord file."""
+ mindrecord_dir = config.mindrecord_dir
+ mindrecord_path = os.path.join(mindrecord_dir, prefix)
+ writer = FileWriter(mindrecord_path, file_num)
+ if dataset == "coco":
+ image_files, image_anno_dict = create_coco_label(is_training, config=config)
+ else:
+ image_files, image_anno_dict = filter_valid_data(config.IMAGE_DIR, config.ANNO_PATH)
+
+ fasterrcnn_json = {
+ "image": {"type": "bytes"},
+ "annotation": {"type": "int32", "shape": [-1, 6]},
+ }
+ writer.add_schema(fasterrcnn_json, "fasterrcnn_json")
+
+ for image_name in image_files:
+ with open(image_name, 'rb') as f:
+ img = f.read()
+ annos = np.array(image_anno_dict[image_name], dtype=np.int32)
+ row = {"image": img, "annotation": annos}
+ writer.write_raw_data([row])
+ writer.commit()
+
+
+def create_fasterrcnn_dataset(config, mindrecord_file, batch_size=2, device_num=1, rank_id=0, is_training=True,
+ num_parallel_workers=8, python_multiprocessing=False):
+ """Create FasterRcnn dataset with MindDataset."""
+ cv2.setNumThreads(0)
+ de.config.set_prefetch_size(8)
+ ds = de.MindDataset(mindrecord_file, columns_list=["image", "annotation"], num_shards=device_num, shard_id=rank_id,
+ num_parallel_workers=4, shuffle=is_training)
+ decode = C.Decode()
+ ds = ds.map(input_columns=["image"], operations=decode)
+ compose_map_func = (lambda image, annotation: preprocess_fn(image, annotation, is_training, config=config))
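+    # Map each raw (image, annotation) pair to the five columns consumed by
+    # the network: image, image_shape, box, label, valid_num.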
+
+ if is_training:
+ ds = ds.map(input_columns=["image", "annotation"],
+ output_columns=["image", "image_shape", "box", "label", "valid_num"],
+ column_order=["image", "image_shape", "box", "label", "valid_num"],
+ operations=compose_map_func, python_multiprocessing=python_multiprocessing,
+ num_parallel_workers=num_parallel_workers)
+ ds = ds.batch(batch_size, drop_remainder=True)
+ else:
+ ds = ds.map(input_columns=["image", "annotation"],
+ output_columns=["image", "image_shape", "box", "label", "valid_num"],
+ column_order=["image", "image_shape", "box", "label", "valid_num"],
+ operations=compose_map_func,
+ num_parallel_workers=num_parallel_workers)
+ ds = ds.batch(batch_size, drop_remainder=True)
+ return ds
diff --git a/vega/algorithms/nas/sp_nas/src/default_config.yaml b/vega/algorithms/nas/sp_nas/src/default_config.yaml
new file mode 100644
index 00000000..80449986
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/src/default_config.yaml
@@ -0,0 +1,123 @@
+# Builtin Configurations (DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+data_path: "/cache/data"
+output_path: "/cache/train"
+load_path: "/cache/checkpoint_path"
+device_target: Ascend
+enable_profiling: False
+
+# ==============================================================================
+# config
+img_width: 1280
+img_height: 768
+keep_ratio: True
+flip_ratio: 0.5
+expand_ratio: 1.0
+
+# anchor
+feature_shapes:
+ - [192, 320]
+ - [96, 160]
+ - [48, 80]
+ - [24, 40]
+ - [12, 20]
+anchor_scales: [8]
+anchor_ratios: [0.5, 1.0, 2.0]
+anchor_strides: [4, 8, 16, 32, 64]
+num_anchors: 3
+
+# fpn
+fpn_out_channels: 256
+fpn_num_outs: 5
+
+# rpn
+rpn_in_channels: 256
+rpn_feat_channels: 256
+rpn_loss_cls_weight: 1.0
+rpn_loss_reg_weight: 1.0
+rpn_cls_out_channels: 1
+rpn_target_means: [0., 0., 0., 0.]
+rpn_target_stds: [1.0, 1.0, 1.0, 1.0]
+
+# bbox_assign_sampler
+neg_iou_thr: 0.3
+pos_iou_thr: 0.7
+min_pos_iou: 0.3
+num_bboxes: 245520
+num_gts: 128
+num_expected_neg: 256
+num_expected_pos: 128
+
+# proposal
+activate_num_classes: 2
+use_sigmoid_cls: True
+
+# roi_align
+roi_layer: {type: 'RoIAlign', out_size: 7, sample_num: 2}
+roi_align_out_channels: 256
+roi_align_featmap_strides: [4, 8, 16, 32]
+roi_align_finest_scale: 56
+roi_sample_num: 640
+
+# bbox_assign_sampler_stage2
+neg_iou_thr_stage2: 0.5
+pos_iou_thr_stage2: 0.5
+min_pos_iou_stage2: 0.5
+num_bboxes_stage2: 2000
+num_expected_pos_stage2: 128
+num_expected_neg_stage2: 512
+num_expected_total_stage2: 512
+
+# rcnn
+rcnn_num_layers: 2
+rcnn_in_channels: 256
+rcnn_fc_out_channels: 1024
+rcnn_loss_cls_weight: 1
+rcnn_loss_reg_weight: 1
+rcnn_target_means: [0., 0., 0., 0.]
+rcnn_target_stds: [0.1, 0.1, 0.2, 0.2]
+
+# train proposal
+rpn_proposal_nms_across_levels: False
+rpn_proposal_nms_pre: 2000
+rpn_proposal_nms_post: 2000
+rpn_proposal_max_num: 2000
+rpn_proposal_nms_thr: 0.7
+rpn_proposal_min_bbox_size: 0
+
+# test proposal
+rpn_nms_across_levels: False
+rpn_nms_pre: 1000
+rpn_nms_post: 1000
+rpn_max_num: 1000
+rpn_nms_thr: 0.7
+rpn_min_bbox_min_size: 0
+test_score_thr: 0.05
+test_iou_thr: 0.5
+test_max_per_img: 100
+test_batch_size: 2
+
+rpn_head_use_sigmoid: True
+rpn_head_weight: 1.0
+
+# LR
+base_lr: 0.04
+warmup_step: 500
+warmup_ratio: 0.0625
+sgd_step: [8, 11]
+sgd_momentum: 0.9
+
+# train
+batch_size: 2
+loss_scale: 256
+momentum: 0.91
+weight_decay: 0.00001
+epoch_size: 20
+save_checkpoint: True
+save_checkpoint_epochs: 1
+keep_checkpoint_max: 20
+save_checkpoint_path: "./"
+num_classes: 81
diff --git a/vega/algorithms/nas/sp_nas/src/lr_schedule.py b/vega/algorithms/nas/sp_nas/src/lr_schedule.py
new file mode 100644
index 00000000..2832211c
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/src/lr_schedule.py
@@ -0,0 +1,40 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Lr generator for fasterrcnn."""
+import math
+
+
+def linear_warmup_learning_rate(current_step, warmup_steps, base_lr, init_lr):
+ """Construct the trainer of SpNas."""
+ lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
+ learning_rate = float(init_lr) + lr_inc * current_step
+ return learning_rate
+
+
+def a_cosine_learning_rate(current_step, base_lr, warmup_steps, decay_steps):
+ """Construct the trainer of SpNas."""
+ base = float(current_step - warmup_steps) / float(decay_steps)
+ learning_rate = (1 + math.cos(base * math.pi)) / 2 * base_lr
+ return learning_rate
+
+
+def dynamic_lr(config, steps_per_epoch):
+ """Dynamic learning rate generator."""
+ base_lr = config.base_lr
+ total_steps = steps_per_epoch * (config.epoch_size + 1)
+ warmup_steps = int(config.warmup_step)
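+    # Linear warmup for the first warmup_step steps, then cosine annealing
+    # over the remaining steps.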
+ lr = []
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr.append(linear_warmup_learning_rate(i, warmup_steps, base_lr, base_lr * config.warmup_ratio))
+ else:
+ lr.append(a_cosine_learning_rate(i, base_lr, warmup_steps, total_steps))
+
+ return lr
diff --git a/vega/algorithms/nas/sp_nas/src/model_utils/__init__.py b/vega/algorithms/nas/sp_nas/src/model_utils/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/vega/algorithms/nas/sp_nas/src/model_utils/config.py b/vega/algorithms/nas/sp_nas/src/model_utils/config.py
new file mode 100644
index 00000000..84f88de2
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/src/model_utils/config.py
@@ -0,0 +1,122 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Parse arguments."""
+
+import os
+import ast
+import argparse
+from pprint import pformat
+import yaml
+
+
+class Config:
+ """Configuration namespace. Convert dictionary to members."""
+
+ def __init__(self, cfg_dict):
+ for k, v in cfg_dict.items():
+ if isinstance(v, (list, tuple)):
+ setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
+ else:
+ setattr(self, k, Config(v) if isinstance(v, dict) else v)
+
+ def __str__(self):
+        """Return the formatted configuration string."""
+ return pformat(self.__dict__)
+
+ def __repr__(self):
+        """Return the string representation of the configuration."""
+ return self.__str__()
+
+
+def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="default_config.yaml"):
+ """
+ Parse command line arguments to the configuration according to the default yaml.
+
+ Args:
+ parser: Parent parser.
+ cfg: Base configuration.
+ helper: Helper description.
+ cfg_path: Path to the default yaml config.
+ """
+ parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
+ parents=[parser])
+ helper = {} if helper is None else helper
+ choices = {} if choices is None else choices
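+    # Booleans are parsed with ast.literal_eval because argparse's bool()
+    # would treat any non-empty string (e.g. "False") as True.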
+ for item in cfg:
+ if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
+ help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
+ choice = choices[item] if item in choices else None
+ if isinstance(cfg[item], bool):
+ parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
+ help=help_description)
+ else:
+ parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
+ help=help_description)
+ args = parser.parse_args()
+ return args
+
+
+def parse_yaml(yaml_path):
+ """
+ Parse the yaml config file.
+
+ Args:
+ yaml_path: Path to the yaml config.
+ """
+ with open(yaml_path, 'r') as fin:
+ try:
+ cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
+ cfgs = [x for x in cfgs]
+ if len(cfgs) == 1:
+ cfg_helper = {}
+ cfg = cfgs[0]
+ cfg_choices = {}
+ elif len(cfgs) == 2:
+ cfg, cfg_helper = cfgs
+ cfg_choices = {}
+ elif len(cfgs) == 3:
+ cfg, cfg_helper, cfg_choices = cfgs
+ else:
+ raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
+ print(cfg_helper)
+ except Exception:
+ raise ValueError("Failed to parse yaml")
+ return cfg, cfg_helper, cfg_choices
+
+
+def merge(args, cfg):
+ """
+ Merge the base config from yaml file and command line arguments.
+
+ Args:
+ args: Command line arguments.
+ cfg: Base configuration.
+ """
+ args_var = vars(args)
+ for item in args_var:
+ cfg[item] = args_var[item]
+ return cfg
+
+
+def get_config():
+ """Get Config according to the yaml file and cli arguments."""
+ parser = argparse.ArgumentParser(description="default name", add_help=False)
+ current_dir = os.path.dirname(os.path.abspath(__file__))
+ parser.add_argument("--config_path", type=str, default=os.path.join(current_dir, "../default_config.yaml"),
+ help="Config file path")
+ path_args, _ = parser.parse_known_args()
+ default, helper, choices = parse_yaml(path_args.config_path)
+ args = parse_cli_to_yaml(parser=parser, cfg=default, helper=helper, choices=choices, cfg_path=path_args.config_path)
+ final_config = merge(args, default)
+ return Config(final_config)
+
+
+config = get_config()
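+
+# A hedged sketch of the yaml layout get_config() expects (key names are
+# illustrative, not taken from the repo): up to three yaml documents separated
+# by "---" are parsed as (config, helper, choices).
+#
+#   base_lr: 0.02                      # doc 1: default values
+#   ---
+#   base_lr: "base learning rate"      # doc 2: --help descriptions
+#   ---
+#   base_lr: [0.01, 0.02]              # doc 3: allowed choices for --base_lr
+#
+# Every scalar key becomes both a CLI flag (e.g. --base_lr) and an attribute
+# on the resulting Config object (config.base_lr).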
diff --git a/vega/algorithms/nas/sp_nas/src/network_define.py b/vega/algorithms/nas/sp_nas/src/network_define.py
new file mode 100644
index 00000000..63bb5a41
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/src/network_define.py
@@ -0,0 +1,152 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn training network wrapper."""
+
+import time
+import numpy as np
+import mindspore.nn as nn
+from mindspore.common.tensor import Tensor
+from mindspore.ops import functional as F
+from mindspore.ops import composite as C
+from mindspore import ParameterTuple
+from mindspore.train.callback import Callback
+from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
+
+time_stamp_init = False
+time_stamp_first = 0
+
+
+class LossCallBack(Callback):
+ """
+ Monitor the loss in training.
+
+    If the loss is NAN or INF, terminate training.
+
+ Note:
+        If per_print_times is 0, do not print the loss.
+
+ Args:
+        per_print_times (int): Print the loss every `per_print_times` steps. Default: 1.
+        rank_id (int): Rank id of the current device, used to name the loss log file. Default: 0.
+    """
+
+ def __init__(self, per_print_times=1, rank_id=0):
+ super(LossCallBack, self).__init__()
+ if not isinstance(per_print_times, int) or per_print_times < 0:
+ raise ValueError("print_step must be int and >= 0.")
+ self._per_print_times = per_print_times
+ self.count = 0
+ self.loss_sum = 0
+ self.rank_id = rank_id
+
+ global time_stamp_init, time_stamp_first
+ if not time_stamp_init:
+ time_stamp_first = time.time()
+ time_stamp_init = True
+
+ def step_end(self, run_context):
+ """Construct the trainer of SpNas."""
+ cb_params = run_context.original_args()
+ loss = cb_params.net_outputs.asnumpy()
+ cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num + 1
+
+ self.count += 1
+ self.loss_sum += float(loss)
+
+ if self.count >= 1:
+ global time_stamp_first
+ time_stamp_current = time.time()
+ total_loss = self.loss_sum / self.count
+
+ loss_file = open("./loss_{}.log".format(self.rank_id), "a+")
+ loss_file.write("%lu epoch: %s step: %s total_loss: %.5f" %
+ (time_stamp_current - time_stamp_first, cb_params.cur_epoch_num, cur_step_in_epoch,
+ total_loss))
+ loss_file.write("\n")
+ loss_file.close()
+
+ self.count = 0
+ self.loss_sum = 0
+
+
+class LossNet(nn.Cell):
+ """FasterRcnn loss method."""
+
+ def construct(self, x1, x2, x3, x4, x5, x6):
+ """Construct the trainer of SpNas."""
+ return x1 + x2
+
+
+class WithLossCell(nn.Cell):
+ """
+ Wrap the network with loss function to compute loss.
+
+ Args:
+ backbone (Cell): The target network to wrap.
+ loss_fn (Cell): The loss function used to compute loss.
+ """
+
+ def __init__(self, backbone, loss_fn):
+ super(WithLossCell, self).__init__(auto_prefix=False)
+ self._backbone = backbone
+ self._loss_fn = loss_fn
+
+ def construct(self, x, img_shape, gt_bboxe, gt_label, gt_num):
+ """Construct the trainer of SpNas."""
+ loss1, loss2, loss3, loss4, loss5, loss6 = self._backbone(x, img_shape, gt_bboxe, gt_label, gt_num)
+ return self._loss_fn(loss1, loss2, loss3, loss4, loss5, loss6)
+
+ @property
+ def backbone_network(self):
+ """
+ Get the backbone network.
+
+ Returns:
+ Cell, return backbone network.
+ """
+ return self._backbone
+
+
+class TrainOneStepCell(nn.Cell):
+ """
+ Network training package class.
+
+ Append an optimizer to the training network after that the construct function
+ can be called to create the backward graph.
+
+ Args:
+ network (Cell): The training network.
+ optimizer (Cell): Optimizer for updating the weights.
+ sens (Number): The adjust parameter. Default value is 1.0.
+ reduce_flag (bool): The reduce flag. Default value is False.
+        mean (bool): Whether to average gradients in the allreduce. Default value is True.
+ degree (int): Device number. Default value is None.
+ """
+
+ def __init__(self, network, optimizer, sens=1.0, reduce_flag=False, mean=True, degree=None):
+ super(TrainOneStepCell, self).__init__(auto_prefix=False)
+ self.network = network
+ self.network.set_grad()
+ self.weights = ParameterTuple(network.trainable_params())
+ self.optimizer = optimizer
+ self.grad = C.GradOperation(get_by_list=True,
+ sens_param=True)
+ self.sens = Tensor((np.ones((1,)) * sens).astype(np.float32))
+ self.reduce_flag = reduce_flag
+ if reduce_flag:
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)
+
+ def construct(self, x, img_shape, gt_bboxe, gt_label, gt_num):
+ """Construct the trainer of SpNas."""
+ weights = self.weights
+ loss = self.network(x, img_shape, gt_bboxe, gt_label, gt_num)
+ grads = self.grad(self.network, weights)(x, img_shape, gt_bboxe, gt_label, gt_num, self.sens)
+ if self.reduce_flag:
+ grads = self.grad_reducer(grads)
+ return F.depend(loss, self.optimizer(grads))
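+
+
+# A minimal wiring sketch (assumed objects, not repo code) showing how the
+# cells above are typically composed:
+#
+#   loss = LossNet()
+#   net_with_loss = WithLossCell(faster_rcnn_backbone, loss)
+#   train_net = TrainOneStepCell(net_with_loss, optimizer, sens=1024.0)
+#   # each call runs forward + backward (+ optional allreduce) + update:
+#   loss_value = train_net(x, img_shape, gt_bboxe, gt_label, gt_num)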
diff --git a/vega/algorithms/nas/sp_nas/src/util.py b/vega/algorithms/nas/sp_nas/src/util.py
new file mode 100644
index 00000000..d6872b28
--- /dev/null
+++ b/vega/algorithms/nas/sp_nas/src/util.py
@@ -0,0 +1,227 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Coco eval for fasterrcnn."""
+import json
+import numpy as np
+import mmcv
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval
+
+_init_value = np.array(0.0)
+summary_init = {
+ 'Precision/mAP': _init_value,
+ 'Precision/mAP@.50IOU': _init_value,
+ 'Precision/mAP@.75IOU': _init_value,
+ 'Precision/mAP (small)': _init_value,
+ 'Precision/mAP (medium)': _init_value,
+ 'Precision/mAP (large)': _init_value,
+ 'Recall/AR@1': _init_value,
+ 'Recall/AR@10': _init_value,
+ 'Recall/AR@100': _init_value,
+ 'Recall/AR@100 (small)': _init_value,
+ 'Recall/AR@100 (medium)': _init_value,
+ 'Recall/AR@100 (large)': _init_value,
+}
+
+
+def coco_eval(result_files, result_types, coco, max_dets=(100, 300, 1000), single_result=False):
+ """Construct the trainer of SpNas."""
+    with open(result_files['bbox']) as fin:
+        anns = json.load(fin)
+    if not anns:
+        return summary_init
+
+ if mmcv.is_str(coco):
+ coco = COCO(coco)
+ assert isinstance(coco, COCO)
+
+ for res_type in result_types:
+ result_file = result_files[res_type]
+ assert result_file.endswith('.json')
+
+ coco_dets = coco.loadRes(result_file)
+ gt_img_ids = coco.getImgIds()
+ det_img_ids = coco_dets.getImgIds()
+ iou_type = 'bbox' if res_type == 'proposal' else res_type
+ cocoEval = COCOeval(coco, coco_dets, iou_type)
+ if res_type == 'proposal':
+ cocoEval.params.useCats = 0
+ cocoEval.params.maxDets = list(max_dets)
+
+ tgt_ids = gt_img_ids if not single_result else det_img_ids
+
+ if single_result:
+ res_dict = dict()
+ for id_i in tgt_ids:
+ cocoEval = COCOeval(coco, coco_dets, iou_type)
+ if res_type == 'proposal':
+ cocoEval.params.useCats = 0
+ cocoEval.params.maxDets = list(max_dets)
+
+ cocoEval.params.imgIds = [id_i]
+ cocoEval.evaluate()
+ cocoEval.accumulate()
+ cocoEval.summarize()
+ res_dict.update({coco.imgs[id_i]['file_name']: cocoEval.stats[1]})
+
+ cocoEval = COCOeval(coco, coco_dets, iou_type)
+ if res_type == 'proposal':
+ cocoEval.params.useCats = 0
+ cocoEval.params.maxDets = list(max_dets)
+
+ cocoEval.params.imgIds = tgt_ids
+ cocoEval.evaluate()
+ cocoEval.accumulate()
+ cocoEval.summarize()
+
+ summary_metrics = {
+ 'Precision/mAP': cocoEval.stats[0],
+ 'Precision/mAP@.50IOU': cocoEval.stats[1],
+ 'Precision/mAP@.75IOU': cocoEval.stats[2],
+ 'Precision/mAP (small)': cocoEval.stats[3],
+ 'Precision/mAP (medium)': cocoEval.stats[4],
+ 'Precision/mAP (large)': cocoEval.stats[5],
+ 'Recall/AR@1': cocoEval.stats[6],
+ 'Recall/AR@10': cocoEval.stats[7],
+ 'Recall/AR@100': cocoEval.stats[8],
+ 'Recall/AR@100 (small)': cocoEval.stats[9],
+ 'Recall/AR@100 (medium)': cocoEval.stats[10],
+ 'Recall/AR@100 (large)': cocoEval.stats[11],
+ }
+
+ return summary_metrics
+
+
+def xyxy2xywh(bbox):
+ """Construct the trainer of SpNas."""
+ _bbox = bbox.tolist()
+ return [
+ _bbox[0],
+ _bbox[1],
+ _bbox[2] - _bbox[0] + 1,
+ _bbox[3] - _bbox[1] + 1,
+ ]
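+
+
+# Worked example for the conversion above (inclusive pixel coordinates, hence
+# the "+ 1"): [10, 20, 30, 60] -> [10, 20, 21, 41].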
+
+
+def bbox2result_1image(bboxes, labels, num_classes):
+ """Convert detection results to a list of numpy arrays.
+
+ Args:
+ bboxes (Tensor): shape (n, 5)
+ labels (Tensor): shape (n, )
+ num_classes (int): class number, including background class
+
+ Returns:
+ list(ndarray): bbox results of each class
+ """
+ if bboxes.shape[0] == 0:
+ result = [np.zeros((0, 5), dtype=np.float32) for i in range(num_classes - 1)]
+ else:
+ result = [bboxes[labels == i, :] for i in range(num_classes - 1)]
+ return result
+
+
+def proposal2json(dataset, results):
+ """Convert proposal to json mode."""
+ img_ids = dataset.getImgIds()
+ json_results = []
+ dataset_len = dataset.get_dataset_size() * 2
+ for idx in range(dataset_len):
+ img_id = img_ids[idx]
+ bboxes = results[idx]
+ for i in range(bboxes.shape[0]):
+ data = dict()
+ data['image_id'] = img_id
+ data['bbox'] = xyxy2xywh(bboxes[i])
+ data['score'] = float(bboxes[i][4])
+ data['category_id'] = 1
+ json_results.append(data)
+ return json_results
+
+
+def det2json(dataset, results):
+ """Convert det to json mode."""
+ cat_ids = dataset.getCatIds()
+ img_ids = dataset.getImgIds()
+ json_results = []
+ dataset_len = len(img_ids)
+ for idx in range(dataset_len):
+ img_id = img_ids[idx]
+ if idx == len(results):
+ break
+ result = results[idx]
+ for label, result_label in enumerate(result):
+ bboxes = result_label
+ for i in range(bboxes.shape[0]):
+ data = dict()
+ data['image_id'] = img_id
+ data['bbox'] = xyxy2xywh(bboxes[i])
+ data['score'] = float(bboxes[i][4])
+ data['category_id'] = cat_ids[label]
+ json_results.append(data)
+ return json_results
+
+
+def segm2json(dataset, results):
+ """Convert segm to json mode."""
+ bbox_json_results = []
+ segm_json_results = []
+ for idx in range(len(dataset)):
+ img_id = dataset.img_ids[idx]
+ det, seg = results[idx]
+ for label, det_label in enumerate(det):
+ # bbox results
+ bboxes = det_label
+ for i in range(bboxes.shape[0]):
+ data = dict()
+ data['image_id'] = img_id
+ data['bbox'] = xyxy2xywh(bboxes[i])
+ data['score'] = float(bboxes[i][4])
+ data['category_id'] = dataset.cat_ids[label]
+ bbox_json_results.append(data)
+
+ if len(seg) == 2:
+ segms = seg[0][label]
+ mask_score = seg[1][label]
+ else:
+ segms = seg[label]
+ mask_score = [bbox[4] for bbox in bboxes]
+ for i in range(bboxes.shape[0]):
+ data = dict()
+ data['image_id'] = img_id
+ data['score'] = float(mask_score[i])
+ data['category_id'] = dataset.cat_ids[label]
+ segms[i]['counts'] = segms[i]['counts'].decode()
+ data['segmentation'] = segms[i]
+ segm_json_results.append(data)
+ return bbox_json_results, segm_json_results
+
+
+def results2json(dataset, results, out_file):
+ """Convert result convert to json mode."""
+ result_files = dict()
+ if isinstance(results[0], list):
+ json_results = det2json(dataset, results)
+ result_files['bbox'] = '{}.{}.json'.format(out_file, 'bbox')
+ result_files['proposal'] = '{}.{}.json'.format(out_file, 'bbox')
+ mmcv.dump(json_results, result_files['bbox'])
+ elif isinstance(results[0], tuple):
+ json_results = segm2json(dataset, results)
+ result_files['bbox'] = '{}.{}.json'.format(out_file, 'bbox')
+ result_files['proposal'] = '{}.{}.json'.format(out_file, 'bbox')
+ result_files['segm'] = '{}.{}.json'.format(out_file, 'segm')
+ mmcv.dump(json_results[0], result_files['bbox'])
+ mmcv.dump(json_results[1], result_files['segm'])
+ elif isinstance(results[0], np.ndarray):
+ json_results = proposal2json(dataset, results)
+ result_files['proposal'] = '{}.{}.json'.format(out_file, 'proposal')
+ mmcv.dump(json_results, result_files['proposal'])
+ else:
+ raise TypeError('invalid type of results')
+ return result_files
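+
+
+# Illustrative output (hypothetical out_file="results"): for det-style list
+# results the returned mapping is
+#   {'bbox': 'results.bbox.json', 'proposal': 'results.bbox.json'},
+# i.e. 'proposal' deliberately reuses the bbox file in that branch, while
+# ndarray (proposal-only) results produce 'results.proposal.json'.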
diff --git a/vega/algorithms/nlp/__init__.py b/vega/algorithms/nlp/__init__.py
new file mode 100644
index 00000000..6786090b
--- /dev/null
+++ b/vega/algorithms/nlp/__init__.py
@@ -0,0 +1,7 @@
+from vega.common.class_factory import ClassFactory
+
+
+ClassFactory.lazy_register("vega.algorithms.nlp", {
+ "bert_trainer_callback": ["BertTrainerCallback"],
+ "src.bert_for_pre_training": ["BertNetworkWithLoss"],
+})
diff --git a/vega/algorithms/nlp/bert_trainer_callback.py b/vega/algorithms/nlp/bert_trainer_callback.py
new file mode 100644
index 00000000..d8c64d58
--- /dev/null
+++ b/vega/algorithms/nlp/bert_trainer_callback.py
@@ -0,0 +1,287 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""The trainer program for Auto Lane."""
+
+import logging
+import os
+from vega.common import ClassFactory, ClassType
+from vega.trainer.trainer_ms import TrainerMs
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
+from mindspore.train import Model as MsModel
+from mindspore.train.train_thor import ConvertModelUtils
+from mindspore import context
+from mindspore.nn.optim import Lamb, Momentum, AdamWeightDecay, thor
+from mindspore.nn.wrap.loss_scale import DynamicLossScaleUpdateCell
+import mindspore.dataset as de
+import mindspore.dataset.transforms.c_transforms as C
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore.ops import operations as P
+from mindspore.common.parameter import Parameter
+from mindspore.common.tensor import Tensor
+from mindspore.train.serialization import load_checkpoint, load_param_into_net
+from mindspore.nn.metrics import Metric
+from .src import BertNetworkWithLoss, BertTrainOneStepCell, BertTrainOneStepWithLossScaleCell, \
+ BertTrainAccumulationAllReduceEachWithLossScaleCell, \
+ BertTrainAccumulationAllReducePostWithLossScaleCell, \
+ BertTrainOneStepWithLossScaleCellForAdam, \
+ AdamWeightDecayForBert, AdamWeightDecayOp
+from .src.dataset import create_bert_dataset
+from .src.utils import LossCallBack, BertLearningRate
+from .src import BertModel, GetMaskedLMOutput
+
+logger = logging.getLogger(__name__)
+
+
+class myMetric(Metric):
+ """Self-defined Metric as a callback."""
+
+ def __init__(self):
+ super(myMetric, self).__init__()
+ self.clear()
+
+ def clear(self):
+ """Construct the trainer of Bert."""
+ self.total_num = 0
+ self.acc_num = 0
+
+ def update(self, *inputs):
+ """Construct the trainer of Bert."""
+ total_num = self._convert_data(inputs[0])
+ acc_num = self._convert_data(inputs[1])
+ self.total_num = total_num
+ self.acc_num = acc_num
+
+ def eval(self):
+ """Construct the trainer of Bert."""
+ return self.acc_num / self.total_num
+
+
+class GetLogProbs(nn.Cell):
+ """Get MaskedLM prediction scores."""
+
+ def __init__(self, config):
+ super(GetLogProbs, self).__init__()
+ self.bert = BertModel(config, False)
+ self.cls1 = GetMaskedLMOutput(config)
+
+ def construct(self, input_ids, input_mask, token_type_id, masked_pos):
+ """Construct the trainer of Bert."""
+ sequence_output, _, embedding_table = self.bert(input_ids, token_type_id, input_mask)
+ prediction_scores = self.cls1(sequence_output, embedding_table, masked_pos)
+ return prediction_scores
+
+
+class BertPretrainEva(nn.Cell):
+ """Evaluate MaskedLM prediction scores."""
+
+ def __init__(self, config):
+ super(BertPretrainEva, self).__init__()
+ self.bert = GetLogProbs(config)
+ self.argmax = P.Argmax(axis=-1, output_type=mstype.int32)
+ self.equal = P.Equal()
+ self.mean = P.ReduceMean()
+ self.sum = P.ReduceSum()
+ self.total = Parameter(Tensor([0], mstype.float32))
+ self.acc = Parameter(Tensor([0], mstype.float32))
+ self.reshape = P.Reshape()
+ self.shape = P.Shape()
+ self.cast = P.Cast()
+
+ def construct(self, input_ids, input_mask, token_type_id, masked_pos, masked_ids, masked_weights, nsp_label):
+ """Calculate prediction scores."""
+ bs, _ = self.shape(input_ids)
+ probs = self.bert(input_ids, input_mask, token_type_id, masked_pos)
+ index = self.argmax(probs)
+ index = self.reshape(index, (bs, -1))
+ eval_acc = self.equal(index, masked_ids)
+ eval_acc1 = self.cast(eval_acc, mstype.float32)
+ real_acc = eval_acc1 * masked_weights
+ acc = self.sum(real_acc)
+ total = self.sum(masked_weights)
+ self.total += total
+ self.acc += acc
+ return acc, self.total, self.acc
+
+
+def get_enwiki_512_dataset(batch_size=1, repeat_count=1, distribute_file=''):
+ """Get enwiki dataset when seq_length is 512."""
+ from .src.model_utils.config import config as cfg, bert_net_cfg
+ ds = de.TFRecordDataset([cfg.data_file], cfg.schema_file, columns_list=["input_ids", "input_mask", "segment_ids",
+ "masked_lm_positions", "masked_lm_ids",
+ "masked_lm_weights",
+ "next_sentence_labels"])
+ type_cast_op = C.TypeCast(mstype.int32)
+ ds = ds.map(operations=type_cast_op, input_columns="segment_ids")
+ ds = ds.map(operations=type_cast_op, input_columns="input_mask")
+ ds = ds.map(operations=type_cast_op, input_columns="input_ids")
+ ds = ds.map(operations=type_cast_op, input_columns="masked_lm_ids")
+ ds = ds.map(operations=type_cast_op, input_columns="masked_lm_positions")
+ ds = ds.map(operations=type_cast_op, input_columns="next_sentence_labels")
+ ds = ds.repeat(repeat_count)
+
+ # apply batch operations
+ ds = ds.batch(batch_size, drop_remainder=True)
+ return ds
+
+
+def bert_predict():
+ """Predict function."""
+ from .src.model_utils.config import config as cfg, bert_net_cfg
+ devid = int(os.getenv('DEVICE_ID'))
+ context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=devid)
+ dataset = get_enwiki_512_dataset(cfg.batch_size, 1)
+ net_for_pretraining = BertPretrainEva(bert_net_cfg)
+ net_for_pretraining.set_train(False)
+ param_dict = load_checkpoint(cfg.finetune_ckpt)
+ load_param_into_net(net_for_pretraining, param_dict)
+ model = MsModel(net_for_pretraining)
+ return model, dataset, net_for_pretraining
+
+
+def _get_optimizer(args_opt, network):
+ """Get bert optimizer, support Lamb, Momentum, AdamWeightDecay."""
+ from .src.model_utils.config import config as cfg, bert_net_cfg
+ if cfg.optimizer == 'Lamb':
+ lr_schedule = BertLearningRate(learning_rate=cfg.Lamb.learning_rate,
+ end_learning_rate=cfg.Lamb.end_learning_rate,
+ warmup_steps=cfg.Lamb.warmup_steps,
+ decay_steps=args_opt.train_steps,
+ power=cfg.Lamb.power)
+ params = network.trainable_params()
+ decay_params = list(filter(cfg.Lamb.decay_filter, params))
+ other_params = list(filter(lambda x: not cfg.Lamb.decay_filter(x), params))
+ group_params = [{'params': decay_params, 'weight_decay': cfg.Lamb.weight_decay},
+ {'params': other_params},
+ {'order_params': params}]
+ optimizer = Lamb(group_params, learning_rate=lr_schedule, eps=cfg.Lamb.eps)
+ elif cfg.optimizer == 'Momentum':
+ optimizer = Momentum(network.trainable_params(), learning_rate=cfg.Momentum.learning_rate,
+ momentum=cfg.Momentum.momentum)
+ elif cfg.optimizer == 'AdamWeightDecay':
+ lr_schedule = BertLearningRate(learning_rate=cfg.AdamWeightDecay.learning_rate,
+ end_learning_rate=cfg.AdamWeightDecay.end_learning_rate,
+ warmup_steps=cfg.AdamWeightDecay.warmup_steps,
+ decay_steps=args_opt.train_steps,
+ power=cfg.AdamWeightDecay.power)
+ params = network.trainable_params()
+ decay_params = list(filter(cfg.AdamWeightDecay.decay_filter, params))
+ other_params = list(filter(lambda x: not cfg.AdamWeightDecay.decay_filter(x), params))
+ group_params = [{'params': decay_params, 'weight_decay': cfg.AdamWeightDecay.weight_decay},
+ {'params': other_params, 'weight_decay': 0.0},
+ {'order_params': params}]
+ if args_opt.enable_lossscale == "true" and args_opt.device_target == 'GPU':
+ optimizer = AdamWeightDecayForBert(group_params, learning_rate=lr_schedule, eps=cfg.AdamWeightDecay.eps)
+ elif context.get_context("mode") == context.PYNATIVE_MODE and args_opt.device_target == 'GPU':
+ optimizer = AdamWeightDecayOp(group_params, learning_rate=lr_schedule, eps=cfg.AdamWeightDecay.eps)
+ else:
+ optimizer = AdamWeightDecay(group_params, learning_rate=lr_schedule, eps=cfg.AdamWeightDecay.eps)
+ elif cfg.optimizer == "Thor":
+ from .src.utils import get_bert_thor_lr, get_bert_thor_damping
+ lr = get_bert_thor_lr(cfg.Thor.lr_max, cfg.Thor.lr_min, cfg.Thor.lr_power, cfg.Thor.lr_total_steps)
+ damping = get_bert_thor_damping(cfg.Thor.damping_max, cfg.Thor.damping_min, cfg.Thor.damping_power,
+ cfg.Thor.damping_total_steps)
+ split_indices = None
+ if bert_net_cfg.num_hidden_layers == 12 and not bert_net_cfg.use_relative_positions:
+ split_indices = [28, 55, 77]
+ elif bert_net_cfg.num_hidden_layers == 24 and not bert_net_cfg.use_relative_positions:
+ split_indices = [38, 93, 149]
+ optimizer = thor(network, lr, damping, cfg.Thor.momentum,
+ cfg.Thor.weight_decay, cfg.Thor.loss_scale, cfg.batch_size,
+ decay_filter=lambda x: 'layernorm' not in x.name.lower() and 'bias' not in x.name.lower(),
+ split_indices=split_indices, enable_clip_grad=True, frequency=cfg.Thor.frequency)
+ else:
+ raise ValueError("Don't support optimizer {}, only support [Lamb, Momentum, AdamWeightDecay, Thor]".
+ format(cfg.optimizer))
+ return optimizer
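+
+
+# A hedged sketch of the cfg fields consumed above (yaml-style; the values are
+# illustrative, only the key names follow the branches of _get_optimizer):
+#
+#   optimizer: 'Lamb'            # or 'Momentum' / 'AdamWeightDecay' / 'Thor'
+#   Lamb:
+#     learning_rate: 3.0e-4
+#     end_learning_rate: 1.0e-7
+#     warmup_steps: 10000
+#     power: 1.0
+#     weight_decay: 0.01
+#     eps: 1.0e-8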
+
+
+@ClassFactory.register(ClassType.TRAINER)
+class BertTrainerCallback(TrainerMs):
+ """Construct the trainer of Bert."""
+
+ disable_callbacks = ['ProgressLogger']
+
+ def build(self):
+ """Construct the trainer of Bert."""
+ logging.debug("Trainer Config: {}".format(self.config))
+ self._init_hps()
+ self.do_validation = False
+ self.use_syncbn = self.config.syncbn
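+        # NOTE: the dataset path below is a hardcoded fallback; in practice it
+        # is expected to be overridden through the trainer configuration.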
+ if not self.train_loader:
+ self.train_loader = create_bert_dataset(int(os.environ.get("RANK_SIZE", "1")),
+ int(os.environ.get("RANK_ID", "0")), True,
+ '/root/lzc/zhwiki/wikidata/new/', '', 32)
+ if not self.valid_loader:
+ self.valid_loader = create_bert_dataset(int(os.environ.get("RANK_SIZE", "1")),
+ int(os.environ.get("RANK_ID", "0")), True,
+ '/root/lzc/zhwiki/wikidata/new/', '', 32)
+ self.batch_num_train = self.train_loader.get_dataset_size()
+ self.batch_num_valid = self.valid_loader.get_dataset_size()
+
+ def _train_epoch(self):
+ """Construct the trainer of Bert."""
+ from .src.model_utils.config import config as cfg, bert_net_cfg
+ cfg.train_steps = cfg.epoch_size * self.train_loader.get_dataset_size() // cfg.accumulation_steps
+ optimizer = _get_optimizer(cfg, self.model)
+
+ if cfg.enable_lossscale == "true":
+ update_cell = DynamicLossScaleUpdateCell(loss_scale_value=cfg.loss_scale_value,
+ scale_factor=cfg.scale_factor,
+ scale_window=cfg.scale_window)
+ accumulation_steps = cfg.accumulation_steps
+ enable_global_norm = cfg.enable_global_norm
+ if accumulation_steps <= 1:
+ if cfg.optimizer == 'AdamWeightDecay' and cfg.device_target == 'GPU':
+ net_with_grads = BertTrainOneStepWithLossScaleCellForAdam(self.model, optimizer=optimizer,
+ scale_update_cell=update_cell)
+ else:
+ net_with_grads = BertTrainOneStepWithLossScaleCell(self.model, optimizer=optimizer,
+ scale_update_cell=update_cell)
+ else:
+ allreduce_post = cfg.distribute == "false" or cfg.allreduce_post_accumulation == "true"
+ net_with_accumulation = (BertTrainAccumulationAllReducePostWithLossScaleCell if allreduce_post else
+ BertTrainAccumulationAllReduceEachWithLossScaleCell)
+ net_with_grads = net_with_accumulation(self.model, optimizer=optimizer,
+ scale_update_cell=update_cell,
+ accumulation_steps=accumulation_steps,
+ enable_global_norm=enable_global_norm)
+ else:
+ net_with_grads = BertTrainOneStepCell(self.model, optimizer=optimizer, enable_clip_grad=True)
+ if cfg.optimizer == "Thor":
+ net_with_grads = BertTrainOneStepCell(self.model, optimizer=optimizer, sens=cfg.Thor.loss_scale,
+ enable_clip_grad=False)
+
+ config_ck = CheckpointConfig(save_checkpoint_steps=self.config.save_steps, keep_checkpoint_max=1)
+ save_path = self.get_local_worker_path(self.step_name, self.worker_id)
+ ckpoint_cb = ModelCheckpoint(config=config_ck, directory=save_path)
+ loss_cb = LossMonitor()
+ callback_list = [ckpoint_cb, loss_cb]
+ model = MsModel(net_with_grads)
+ self.ms_model = ConvertModelUtils().convert_to_thor_model(model, network=net_with_grads, optimizer=optimizer)
+ try:
+ self.ms_model.train(epoch=self.epochs,
+ train_dataset=self.train_loader,
+ callbacks=callback_list,
+ dataset_sink_mode=False)
+ except RuntimeError as e:
+ logging.warning(f"failed to train the model, skip it, message: {str(e)}")
+
+ def _valid_epoch(self):
+ """Construct the trainer of Bert."""
+ _, dataset, net_for_pretraining = bert_predict()
+ net = MsModel(net_for_pretraining, eval_network=net_for_pretraining, eval_indexes=[0, 1, 2],
+ metrics={'name': myMetric()})
+ res = net.eval(dataset, dataset_sink_mode=False)
+ logging.info('Accuracy is: {}'.format(res))
+ valid_logs = dict()
+ valid_logs['cur_valid_perfs'] = res
+ self.callbacks.after_valid(valid_logs)
diff --git a/vega/algorithms/nlp/src/CRF.py b/vega/algorithms/nlp/src/CRF.py
new file mode 100644
index 00000000..0f9e5900
--- /dev/null
+++ b/vega/algorithms/nlp/src/CRF.py
@@ -0,0 +1,170 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""CRF script."""
+
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+from mindspore.common.parameter import Parameter
+import mindspore.common.dtype as mstype
+
+
+class CRF(nn.Cell):
+ """
+    Conditional Random Field.
+
+ Args:
+        tag_to_index: The dict for tag-to-index mapping, with extra "<START>" and "<STOP>" tags.
+ batch_size: Batch size, i.e., the length of the first dimension.
+ seq_length: Sequence length, i.e., the length of the second dimension.
+ is_training: Specifies whether to use training mode.
+ Returns:
+ Training mode: Tensor, total loss.
+ Evaluation mode: Tuple, the index for each step with the highest score; Tuple, the index for the last
+ step with the highest score.
+ """
+
+ def __init__(self, tag_to_index, batch_size=1, seq_length=128, is_training=True):
+
+ super(CRF, self).__init__()
+ self.target_size = len(tag_to_index)
+ self.is_training = is_training
+ self.tag_to_index = tag_to_index
+ self.batch_size = batch_size
+ self.seq_length = seq_length
+ self.START_TAG = ""
+ self.STOP_TAG = ""
+ self.START_VALUE = Tensor(self.target_size - 2, dtype=mstype.int32)
+ self.STOP_VALUE = Tensor(self.target_size - 1, dtype=mstype.int32)
+ transitions = np.random.normal(size=(self.target_size, self.target_size)).astype(np.float32)
+ transitions[tag_to_index[self.START_TAG], :] = -10000
+ transitions[:, tag_to_index[self.STOP_TAG]] = -10000
+ self.transitions = Parameter(Tensor(transitions))
+ self.cat = P.Concat(axis=-1)
+ self.argmax = P.ArgMaxWithValue(axis=-1)
+ self.log = P.Log()
+ self.exp = P.Exp()
+ self.sum = P.ReduceSum()
+ self.tile = P.Tile()
+ self.reduce_sum = P.ReduceSum(keep_dims=True)
+ self.reshape = P.Reshape()
+ self.expand = P.ExpandDims()
+ self.mean = P.ReduceMean()
+ init_alphas = np.ones(shape=(self.batch_size, self.target_size)) * -10000.0
+ init_alphas[:, self.tag_to_index[self.START_TAG]] = 0.
+ self.init_alphas = Tensor(init_alphas, dtype=mstype.float32)
+ self.cast = P.Cast()
+ self.reduce_max = P.ReduceMax(keep_dims=True)
+ self.on_value = Tensor(1.0, dtype=mstype.float32)
+ self.off_value = Tensor(0.0, dtype=mstype.float32)
+ self.onehot = P.OneHot()
+
+ def log_sum_exp(self, logits):
+ """Compute the log_sum_exp score for Normalization factor."""
+        max_score = self.reduce_max(logits, -1)  # max over the tag axis, dims kept for broadcasting
+ score = self.log(self.reduce_sum(self.exp(logits - max_score), -1))
+ score = max_score + score
+ return score
+
+ def _realpath_score(self, features, label):
+ """Compute the emission and transition score for the real path."""
+ label = label * 1
+ concat_A = self.tile(self.reshape(self.START_VALUE, (1,)), (self.batch_size,))
+ concat_A = self.reshape(concat_A, (self.batch_size, 1))
+ labels = self.cat((concat_A, label))
+ onehot_label = self.onehot(label, self.target_size, self.on_value, self.off_value)
+ emits = features * onehot_label
+ labels = self.onehot(labels, self.target_size, self.on_value, self.off_value)
+ label1 = labels[:, 1:, :]
+ label2 = labels[:, :self.seq_length, :]
+ label1 = self.expand(label1, 3)
+ label2 = self.expand(label2, 2)
+ label_trans = label1 * label2
+ transitions = self.expand(self.expand(self.transitions, 0), 0)
+ trans = transitions * label_trans
+ score = self.sum(emits, (1, 2)) + self.sum(trans, (1, 2, 3))
+ stop_value_index = labels[:, (self.seq_length - 1):self.seq_length, :]
+ stop_value = self.transitions[(self.target_size - 1):self.target_size, :]
+ stop_score = stop_value * self.reshape(stop_value_index, (self.batch_size, self.target_size))
+ score = score + self.sum(stop_score, 1)
+ score = self.reshape(score, (self.batch_size, -1))
+ return score
+
+ def _normalization_factor(self, features):
+ """Compute the total score for all the paths."""
+ forward_var = self.init_alphas
+ forward_var = self.expand(forward_var, 1)
+ for idx in range(self.seq_length):
+ feat = features[:, idx:(idx + 1), :]
+ emit_score = self.reshape(feat, (self.batch_size, self.target_size, 1))
+ next_tag_var = emit_score + self.transitions + forward_var
+ forward_var = self.log_sum_exp(next_tag_var)
+ forward_var = self.reshape(forward_var, (self.batch_size, 1, self.target_size))
+ terminal_var = forward_var + self.reshape(self.transitions[(self.target_size - 1):self.target_size, :], (1, -1))
+ alpha = self.log_sum_exp(terminal_var)
+ alpha = self.reshape(alpha, (self.batch_size, -1))
+ return alpha
+
+ def _decoder(self, features):
+ """Viterbi decode for evaluation."""
+ backpointers = ()
+ forward_var = self.init_alphas
+ for idx in range(self.seq_length):
+ feat = features[:, idx:(idx + 1), :]
+ feat = self.reshape(feat, (self.batch_size, self.target_size))
+ bptrs_t = ()
+
+ next_tag_var = self.expand(forward_var, 1) + self.transitions
+ best_tag_id, best_tag_value = self.argmax(next_tag_var)
+ bptrs_t += (best_tag_id,)
+ forward_var = best_tag_value + feat
+
+ backpointers += (bptrs_t,)
+ terminal_var = forward_var + self.reshape(self.transitions[(self.target_size - 1):self.target_size, :], (1, -1))
+ best_tag_id, _ = self.argmax(terminal_var)
+ return backpointers, best_tag_id
+
+ def construct(self, features, label):
+ """Construct the trainer of Bert."""
+ if self.is_training:
+ forward_score = self._normalization_factor(features)
+ gold_score = self._realpath_score(features, label)
+ return_value = self.mean(forward_score - gold_score)
+ else:
+ path_list, tag = self._decoder(features)
+ return_value = path_list, tag
+ return return_value
+
+
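+# A hedged usage sketch (illustrative tag map; "<START>" and "<STOP>" must be
+# the two largest indices so they match START_VALUE and STOP_VALUE above):
+#
+#   tag_to_index = {"O": 0, "B": 1, "I": 2, "<START>": 3, "<STOP>": 4}
+#   crf = CRF(tag_to_index, batch_size=16, seq_length=128, is_training=True)
+#   loss = crf(features, labels)  # features: (batch, seq_length, num_tags)
+
+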
+def postprocess(backpointers, best_tag_id):
+ """Do postprocess."""
+ best_tag_id = best_tag_id.asnumpy()
+ batch_size = len(best_tag_id)
+ best_path = []
+ for i in range(batch_size):
+ best_path.append([])
+ best_local_id = best_tag_id[i]
+ best_path[-1].append(best_local_id)
+ for bptrs_t in reversed(backpointers):
+ bptrs_t = bptrs_t[0].asnumpy()
+ local_idx = bptrs_t[i]
+ best_local_id = local_idx[best_local_id]
+ best_path[-1].append(best_local_id)
+        # Pop off the start tag (we don't want to return it to the caller)
+ best_path[-1].pop()
+ best_path[-1].reverse()
+ return best_path
diff --git a/vega/algorithms/nlp/src/__init__.py b/vega/algorithms/nlp/src/__init__.py
new file mode 100644
index 00000000..0e3f1ab8
--- /dev/null
+++ b/vega/algorithms/nlp/src/__init__.py
@@ -0,0 +1,37 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""Bert Init."""
+from .bert_for_pre_training import BertNetworkWithLoss, BertPreTraining, \
+ BertPretrainingLoss, GetMaskedLMOutput, GetNextSentenceOutput, \
+ BertTrainOneStepCell, BertTrainOneStepWithLossScaleCell, \
+ BertTrainAccumulationAllReduceEachWithLossScaleCell, \
+ BertTrainAccumulationAllReducePostWithLossScaleCell, \
+ BertTrainOneStepWithLossScaleCellForAdam
+from .bert_model import BertAttention, BertConfig, BertEncoderCell, BertModel, \
+ BertOutput, BertSelfAttention, BertTransformer, EmbeddingLookup, \
+ EmbeddingPostprocessor, RelaPosEmbeddingsGenerator, RelaPosMatrixGenerator, \
+ SaturateCast, CreateAttentionMaskFromInputMask
+from .adam import AdamWeightDecayForBert, AdamWeightDecayOp
+__all__ = [
+ "BertNetworkWithLoss", "BertPreTraining", "BertPretrainingLoss",
+ "GetMaskedLMOutput", "GetNextSentenceOutput", "BertTrainOneStepCell",
+ "BertTrainOneStepWithLossScaleCell", "BertTrainAccumulationAllReduceEachWithLossScaleCell",
+ "BertTrainAccumulationAllReducePostWithLossScaleCell",
+ "BertAttention", "BertConfig", "BertEncoderCell", "BertModel", "BertOutput",
+ "BertSelfAttention", "BertTransformer", "EmbeddingLookup",
+ "EmbeddingPostprocessor", "RelaPosEmbeddingsGenerator", "AdamWeightDecayForBert",
+ "RelaPosMatrixGenerator", "SaturateCast", "CreateAttentionMaskFromInputMask",
+ "BertTrainOneStepWithLossScaleCellForAdam", "AdamWeightDecayOp"
+]
diff --git a/vega/algorithms/nlp/src/adam.py b/vega/algorithms/nlp/src/adam.py
new file mode 100644
index 00000000..6b2c0f2d
--- /dev/null
+++ b/vega/algorithms/nlp/src/adam.py
@@ -0,0 +1,407 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""AdamWeightDecayForBert, a customized Adam for bert. Input: gradient, overflow flag."""
+
+import numpy as np
+
+from mindspore.common import dtype as mstype
+from mindspore.ops import operations as P
+from mindspore.ops import composite as C
+from mindspore.ops import functional as F
+from mindspore.common.tensor import Tensor
+from mindspore._checkparam import Validator as validator
+from mindspore._checkparam import Rel
+from mindspore.nn.optim.optimizer import Optimizer
+
+_adam_opt = C.MultitypeFuncGraph("adam_opt")
+_scaler_one = Tensor(1, mstype.int32)
+_scaler_ten = Tensor(10, mstype.float32)
+
+
+@_adam_opt.register("Tensor", "Tensor", "Tensor", "Tensor", "Number", "Tensor", "Tensor", "Tensor",
+ "Tensor", "Bool", "Bool")
+def _update_run_kernel(beta1, beta2, eps, lr, weight_decay, param, m, v, gradient, decay_flags, optim_filter):
+ """Update parameters by AdamWeightDecay op."""
+ if optim_filter:
+ adam = P.AdamWeightDecay()
+ if decay_flags:
+ next_param = adam(param, m, v, lr, beta1, beta2, eps, Tensor(weight_decay, mstype.float32), gradient)
+ else:
+ next_param = adam(param, m, v, lr, beta1, beta2, eps, Tensor(0.0, mstype.float32), gradient)
+ return next_param
+ return gradient
+
+
+@_adam_opt.register("Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Number", "Tensor", "Tensor", "Tensor",
+ "Tensor", "Bool", "Bool")
+def _update_run_op(beta1, beta2, eps, lr, overflow, weight_decay, param, m, v, gradient, decay_flag, optim_filter):
+ """
+ Update parameters.
+
+ Args:
+ beta1 (Tensor): The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0).
+ beta2 (Tensor): The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0).
+ eps (Tensor): Term added to the denominator to improve numerical stability. Should be greater than 0.
+ lr (Tensor): Learning rate.
+ overflow (Tensor): Whether overflow occurs.
+ weight_decay (Number): Weight decay. Should be equal to or greater than 0.
+ param (Tensor): Parameters.
+ m (Tensor): m value of parameters.
+ v (Tensor): v value of parameters.
+ gradient (Tensor): Gradient of parameters.
+ decay_flag (bool): Applies weight decay or not.
+ optim_filter (bool): Applies parameter update or not.
+
+ Returns:
+ Tensor, the new value of v after updating.
+ """
+ if optim_filter:
+ op_mul = P.Mul()
+ op_square = P.Square()
+ op_sqrt = P.Sqrt()
+ op_cast = P.Cast()
+ op_reshape = P.Reshape()
+ op_shape = P.Shape()
+ op_select = P.Select()
+
+ param_fp32 = op_cast(param, mstype.float32)
+ m_fp32 = op_cast(m, mstype.float32)
+ v_fp32 = op_cast(v, mstype.float32)
+ gradient_fp32 = op_cast(gradient, mstype.float32)
+
+ cond = op_cast(F.fill(mstype.int32, op_shape(m_fp32), 1) * op_reshape(overflow, (())), mstype.bool_)
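+        # When `overflow` is set, the selects below keep m and v unchanged and
+        # the parameter update is zeroed, effectively skipping this step.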
+ next_m = op_mul(beta1, m_fp32) + op_select(cond, m_fp32,
+ op_mul(op_cast(F.tuple_to_array((1.0,)), mstype.float32) - beta1,
+ gradient_fp32))
+
+ next_v = op_mul(beta2, v_fp32) + op_select(cond, v_fp32,
+ op_mul(op_cast(F.tuple_to_array((1.0,)), mstype.float32) - beta2,
+ op_square(gradient_fp32)))
+
+ update = next_m / (eps + op_sqrt(next_v))
+ if decay_flag:
+ update = op_mul(weight_decay, param_fp32) + update
+
+ update_with_lr = op_mul(lr, update)
+ zeros = F.fill(mstype.float32, op_shape(param_fp32), 0)
+ next_param = param_fp32 - op_select(cond, zeros, op_reshape(update_with_lr, op_shape(param_fp32)))
+
+ next_param = F.depend(next_param, F.assign(param, op_cast(next_param, F.dtype(param))))
+ next_param = F.depend(next_param, F.assign(m, op_cast(next_m, F.dtype(m))))
+ next_param = F.depend(next_param, F.assign(v, op_cast(next_v, F.dtype(v))))
+
+ return op_cast(next_param, F.dtype(param))
+ return gradient
+
+
+@_adam_opt.register("Function", "Function", "Function", "Function", "Bool", "Bool", "Bool", "Tensor", "Tensor",
+ "Tensor", "Tensor", "Tensor", "Tensor", "RowTensor", "Tensor", "Tensor", "Tensor", "Bool", "Bool")
+def _run_opt_with_sparse(opt, sparse_opt, push, pull, use_locking, use_nesterov, target, beta1_power,
+ beta2_power, beta1, beta2, eps, lr, gradient, param, m, v, ps_parameter, cache_enable):
+ """Apply sparse adam optimizer to the weight parameter when the gradient is sparse."""
+ success = True
+ indices = gradient.indices
+ values = gradient.values
+ if ps_parameter and not cache_enable:
+ op_shape = P.Shape()
+ shapes = (op_shape(param), op_shape(m), op_shape(v),
+ op_shape(beta1_power), op_shape(beta2_power), op_shape(lr), op_shape(beta1),
+ op_shape(beta2), op_shape(eps), op_shape(values), op_shape(indices))
+ success = F.depend(success, pull(push((beta1_power, beta2_power, lr, beta1, beta2,
+ eps, values, indices), shapes), param))
+ return success
+
+ if not target:
+ success = F.depend(success, sparse_opt(param, m, v, beta1_power, beta2_power, lr, beta1, beta2,
+ eps, values, indices))
+ else:
+ op_mul = P.Mul()
+ op_square = P.Square()
+ op_sqrt = P.Sqrt()
+ scatter_add = P.ScatterAdd(use_locking)
+
+ success = F.depend(success, F.assign(m, op_mul(beta1, m)))
+ success = F.depend(success, F.assign(v, op_mul(beta2, v)))
+
+ grad_indices = gradient.indices
+ grad_value = gradient.values
+
+ next_m = scatter_add(m,
+ grad_indices,
+ op_mul(F.tuple_to_array((1.0,)) - beta1, grad_value))
+
+ next_v = scatter_add(v,
+ grad_indices,
+ op_mul(F.tuple_to_array((1.0,)) - beta2, op_square(grad_value)))
+
+ if use_nesterov:
+ m_temp = next_m * _scaler_ten
+ F.assign(m, op_mul(beta1, next_m))
+ div_value = scatter_add(m,
+ op_mul(grad_indices, _scaler_one),
+ op_mul(F.tuple_to_array((1.0,)) - beta1, grad_value))
+ param_update = div_value / (op_sqrt(next_v) + eps)
+ F.assign(m, m_temp / _scaler_ten)
+ else:
+ param_update = next_m / (op_sqrt(next_v) + eps)
+
+ lr_t = lr * op_sqrt(1 - beta2_power) / (1 - beta1_power)
+ next_param = param - lr_t * param_update
+
+ success = F.depend(success, F.assign(param, next_param))
+ success = F.depend(success, F.assign(m, next_m))
+ success = F.depend(success, F.assign(v, next_v))
+
+ return success
+
+
+@_adam_opt.register("Function", "Function", "Function", "Function", "Bool", "Bool", "Bool", "Tensor", "Tensor",
+ "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Bool", "Bool")
+def _run_opt_with_one_number(opt, sparse_opt, push, pull, use_locking, use_nesterov, target,
+ beta1_power, beta2_power, beta1, beta2, eps, lr, gradient, param,
+ moment1, moment2, ps_parameter, cache_enable):
+ """Apply adam optimizer to the weight parameter using Tensor."""
+ success = True
+ if ps_parameter and not cache_enable:
+ op_shape = P.Shape()
+ success = F.depend(success, pull(push((beta1_power, beta2_power, lr, beta1, beta2, eps, gradient),
+ (op_shape(param), op_shape(moment1), op_shape(moment2))), param))
+ else:
+ success = F.depend(success, opt(param, moment1, moment2, beta1_power, beta2_power, lr, beta1, beta2,
+ eps, gradient))
+ return success
+
+
+@_adam_opt.register("Function", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor", "Tensor",
+ "Tensor", "Tensor")
+def _run_off_load_opt(opt, beta1_power, beta2_power, beta1, beta2, eps, lr, gradient, param, moment1, moment2):
+ """Apply AdamOffload optimizer to the weight parameter using Tensor."""
+ success = True
+    delta_param = opt(moment1, moment2, beta1_power, beta2_power, lr, beta1, beta2, eps, gradient)
+    success = F.depend(success, F.assign_add(param, delta_param))
+ return success
+
+
+def _check_param_value(beta1, beta2, eps, prim_name):
+ """Check the type of inputs."""
+ validator.check_value_type("beta1", beta1, [float], prim_name)
+ validator.check_value_type("beta2", beta2, [float], prim_name)
+ validator.check_value_type("eps", eps, [float], prim_name)
+ validator.check_float_range(beta1, 0.0, 1.0, Rel.INC_NEITHER, "beta1", prim_name)
+ validator.check_float_range(beta2, 0.0, 1.0, Rel.INC_NEITHER, "beta2", prim_name)
+ validator.check_positive_float(eps, "eps", prim_name)
+
+
+class AdamWeightDecayForBert(Optimizer):
+ """
+    Implement the Adam algorithm with fixed weight decay; it additionally takes an overflow flag as input.
+
+ Args:
+ params (Union[list[Parameter], list[dict]]): When the `params` is a list of `Parameter` which will be updated,
+ the element in `params` must be class `Parameter`. When the `params` is a list of `dict`, the "params",
+ "lr", "weight_decay" and "order_params" are the keys can be parsed.
+
+ - params: Required. The value must be a list of `Parameter`.
+
+ - lr: Optional. If "lr" is in the keys, the value of the corresponding learning rate will be used.
+ If not, the `learning_rate` in the API will be used.
+
+ - weight_decay: Optional. If "weight_decay" is in the keys, the value of the corresponding weight decay
+ will be used. If not, the `weight_decay` in the API will be used.
+
+ - order_params: Optional. If "order_params" is in the keys, the value must be the order of parameters and
+ the order will be followed in the optimizer. There are no other keys in the `dict` and the parameters
+ which in the 'order_params' must be in one of group parameters.
+
+ learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+ When the learning_rate is an Iterable or a Tensor in a 1D dimension, use the dynamic learning rate, then
+ the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
+ use dynamic learning rate, the i-th learning rate will be calculated during the process of training
+ according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor in a zero
+ dimension, use fixed learning rate. Other cases are not supported. The float learning rate must be
+ equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
+ Default: 1e-3.
+ beta1 (float): The exponential decay rate for the 1st moment estimations. Default: 0.9.
+ Should be in range (0.0, 1.0).
+ beta2 (float): The exponential decay rate for the 2nd moment estimations. Default: 0.999.
+ Should be in range (0.0, 1.0).
+ eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
+ Should be greater than 0.
+ weight_decay (float): Weight decay (L2 penalty). It must be equal to or greater than 0. Default: 0.0.
+
+ Inputs:
+ - **gradients** (tuple[Tensor]) - The gradients of `params`, the shape is the same as `params`.
+        - **overflow** (Tensor) - The overflow flag from dynamic loss scaling.
+
+ Outputs:
+ tuple[bool], all elements are True.
+
+ Supported Platforms:
+ ``Ascend`` ``GPU``
+
+ Examples:
+ >>> net = Net()
+ >>> #1) All parameters use the same learning rate and weight decay
+ >>> optim = AdamWeightDecay(params=net.trainable_params())
+ >>>
+ >>> #2) Use parameter groups and set different values
+ >>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
+ >>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
+ >>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
+ ... {'params': no_conv_params, 'lr': 0.01},
+ ... {'order_params': net.trainable_params()}]
+ >>> optim = AdamWeightDecay(group_params, learning_rate=0.1, weight_decay=0.0)
+ >>> # The conv_params's parameters will use default learning rate of 0.1 and weight decay of 0.01.
+ >>> # The no_conv_params's parameters will use learning rate of 0.01 and default weight decay of 0.0.
+ >>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
+ >>>
+ >>> loss = nn.SoftmaxCrossEntropyWithLogits()
+ >>> model = Model(net, loss_fn=loss, optimizer=optim)
+ """
+
+ def __init__(self, params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.0):
+ super(AdamWeightDecayForBert, self).__init__(learning_rate, params, weight_decay)
+ _check_param_value(beta1, beta2, eps, self.cls_name)
+ self.beta1 = Tensor(np.array([beta1]).astype(np.float32))
+ self.beta2 = Tensor(np.array([beta2]).astype(np.float32))
+ self.eps = Tensor(np.array([eps]).astype(np.float32))
+ self.moments1 = self.parameters.clone(prefix="adam_m", init='zeros')
+ self.moments2 = self.parameters.clone(prefix="adam_v", init='zeros')
+ self.hyper_map = C.HyperMap()
+ self.op_select = P.Select()
+ self.op_cast = P.Cast()
+ self.op_reshape = P.Reshape()
+ self.op_shape = P.Shape()
+
+ def construct(self, gradients, overflow):
+ """Construct the trainer of Bert."""
+ lr = self.get_lr()
+ cond = self.op_cast(F.fill(mstype.int32, self.op_shape(self.beta1), 1)
+ * self.op_reshape(overflow, (())), mstype.bool_)
+ beta1 = self.op_select(cond, self.op_cast(F.tuple_to_array((1.0,)), mstype.float32), self.beta1)
+ beta2 = self.op_select(cond, self.op_cast(F.tuple_to_array((1.0,)), mstype.float32), self.beta2)
+ if self.is_group:
+ if self.is_group_lr:
+ optim_result = self.hyper_map(F.partial(_adam_opt, self.beta1, self.beta2, self.eps),
+ lr, self.weight_decay, self.parameters, self.moments1, self.moments2,
+ gradients, self.decay_flags, self.optim_filter)
+ else:
+ optim_result = self.hyper_map(F.partial(_adam_opt, beta1, beta2, self.eps, lr, overflow),
+ self.weight_decay, self.parameters, self.moments1, self.moments2,
+ gradients, self.decay_flags, self.optim_filter)
+ else:
+ optim_result = self.hyper_map(F.partial(_adam_opt, self.beta1, self.beta2, self.eps, lr, self.weight_decay),
+ self.parameters, self.moments1, self.moments2,
+ gradients, self.decay_flags, self.optim_filter)
+ if self.use_parallel:
+ self.broadcast_params(optim_result)
+ return optim_result
+
+
+class AdamWeightDecayOp(Optimizer):
+ """
+    Implement the Adam algorithm with fixed weight decay. It relies on a fused operator rather than a combination of small ops.
+
+ Args:
+ params (Union[list[Parameter], list[dict]]): When the `params` is a list of `Parameter` which will be updated,
+ the element in `params` must be class `Parameter`. When the `params` is a list of `dict`, the "params",
+ "lr", "weight_decay" and "order_params" are the keys can be parsed.
+
+ - params: Required. The value must be a list of `Parameter`.
+
+ - lr: Optional. If "lr" is in the keys, the value of the corresponding learning rate will be used.
+ If not, the `learning_rate` in the API will be used.
+
+ - weight_decay: Optional. If "weight_decay" is in the keys, the value of the corresponding weight decay
+ will be used. If not, the `weight_decay` in the API will be used.
+
+ - order_params: Optional. If "order_params" is in the keys, the value must be the order of parameters and
+ the order will be followed in the optimizer. There are no other keys in the `dict` and the parameters
+ which in the 'order_params' must be in one of group parameters.
+
+ learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]): A value or a graph for the learning rate.
+ When the learning_rate is an Iterable or a Tensor in a 1D dimension, use the dynamic learning rate, then
+ the i-th step will take the i-th value as the learning rate. When the learning_rate is LearningRateSchedule,
+ use dynamic learning rate, the i-th learning rate will be calculated during the process of training
+ according to the formula of LearningRateSchedule. When the learning_rate is a float or a Tensor in a zero
+ dimension, use fixed learning rate. Other cases are not supported. The float learning rate must be
+ equal to or greater than 0. If the type of `learning_rate` is int, it will be converted to float.
+ Default: 1e-3.
+ beta1 (float): The exponential decay rate for the 1st moment estimations. Default: 0.9.
+ Should be in range (0.0, 1.0).
+ beta2 (float): The exponential decay rate for the 2nd moment estimations. Default: 0.999.
+ Should be in range (0.0, 1.0).
+ eps (float): Term added to the denominator to improve numerical stability. Default: 1e-6.
+ Should be greater than 0.
+ weight_decay (float): Weight decay (L2 penalty). It must be equal to or greater than 0. Default: 0.0.
+
+ Inputs:
+ - **gradients** (tuple[Tensor]) - The gradients of `params`, the shape is the same as `params`.
+
+ Outputs:
+ tuple[bool], all elements are True.
+
+ Supported Platforms:
+ ``GPU``
+
+ Examples:
+ >>> net = Net()
+ >>> #1) All parameters use the same learning rate and weight decay
+ >>> optim = AdamWeightDecayOp(params=net.trainable_params())
+ >>>
+ >>> #2) Use parameter groups and set different values
+ >>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
+ >>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
+ >>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
+ ... {'params': no_conv_params, 'lr': 0.01},
+ ... {'order_params': net.trainable_params()}]
+ >>> optim = AdamWeightDecayOp(group_params, learning_rate=0.1, weight_decay=0.0)
+ >>> # The conv_params's parameters will use default learning rate of 0.1 and weight decay of 0.01.
+ >>> # The no_conv_params's parameters will use learning rate of 0.01 and default weight decay of 0.0.
+ >>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
+ >>>
+ >>> loss = nn.SoftmaxCrossEntropyWithLogits()
+ >>> model = Model(net, loss_fn=loss, optimizer=optim)
+ """
+
+ def __init__(self, params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.0):
+ super(AdamWeightDecayOp, self).__init__(learning_rate, params, weight_decay)
+ _check_param_value(beta1, beta2, eps, self.cls_name)
+ self.beta1 = Tensor(np.array([beta1]).astype(np.float32))
+ self.beta2 = Tensor(np.array([beta2]).astype(np.float32))
+ self.eps = Tensor(np.array([eps]).astype(np.float32))
+ self.moments1 = self.parameters.clone(prefix="adam_m", init='zeros')
+ self.moments2 = self.parameters.clone(prefix="adam_v", init='zeros')
+ self.hyper_map = C.HyperMap()
+
+ def construct(self, gradients):
+ """Construct the trainer of Bert."""
+ lr = self.get_lr()
+ if self.is_group:
+ if self.is_group_lr:
+ optim_result = self.hyper_map(F.partial(_adam_opt, self.beta1, self.beta2, self.eps),
+ lr, self.weight_decay, self.parameters, self.moments1, self.moments2,
+ gradients, self.decay_flags, self.optim_filter)
+ else:
+ optim_result = self.hyper_map(F.partial(_adam_opt, self.beta1, self.beta2, self.eps, lr),
+ self.weight_decay, self.parameters, self.moments1, self.moments2,
+ gradients, self.decay_flags, self.optim_filter)
+ else:
+ optim_result = self.hyper_map(F.partial(_adam_opt, self.beta1, self.beta2, self.eps, lr, self.weight_decay),
+ self.parameters, self.moments1, self.moments2,
+ gradients, self.decay_flags, self.optim_filter)
+ if self.use_parallel:
+ self.broadcast_params(optim_result)
+ return optim_result
diff --git a/vega/algorithms/nlp/src/assessment_method.py b/vega/algorithms/nlp/src/assessment_method.py
new file mode 100644
index 00000000..6557715b
--- /dev/null
+++ b/vega/algorithms/nlp/src/assessment_method.py
@@ -0,0 +1,152 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Bert evaluation assessment method script."""
+
+import math
+import numpy as np
+from mindspore.nn.metrics import ConfusionMatrixMetric
+from .CRF import postprocess
+
+
+class Accuracy():
+ """Calculate accuracy."""
+
+ def __init__(self):
+ self.acc_num = 0
+ self.total_num = 0
+
+ def update(self, logits, labels):
+ """Construct the trainer of Bert."""
+ labels = labels.asnumpy()
+ labels = np.reshape(labels, -1)
+ logits = logits.asnumpy()
+ logit_id = np.argmax(logits, axis=-1)
+ self.acc_num += np.sum(labels == logit_id)
+ self.total_num += len(labels)
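+
+    # Editorial sketch, not part of the original commit: the streaming update
+    # above amounts to the following plain-numpy accounting; the final accuracy
+    # would be acc_num / total_num once all batches have been seen.
+    #
+    #   import numpy as np
+    #   logits = np.array([[0.9, 0.1], [0.2, 0.8]])             # hypothetical batch
+    #   labels = np.array([0, 1])
+    #   acc_num = np.sum(labels == np.argmax(logits, axis=-1))  # -> 2
+    #   total_num = len(labels)                                 # -> 2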
+
+
+class F1():
+ """Calculate F1 score."""
+
+ def __init__(self, use_crf=False, num_labels=2, mode="Binary"):
+ self.TP = 0
+ self.FP = 0
+ self.FN = 0
+ self.use_crf = use_crf
+ self.num_labels = num_labels
+ self.mode = mode
+ if self.mode.lower() not in ("binary", "multilabel"):
+ raise ValueError("Assessment mode not supported, support: [Binary, MultiLabel]")
+ if self.mode.lower() != "binary":
+ self.metric = ConfusionMatrixMetric(skip_channel=False, metric_name=("f1 score"),
+ calculation_method=False, decrease="mean")
+
+ def update(self, logits, labels):
+ """Update F1 score."""
+ labels = labels.asnumpy()
+ labels = np.reshape(labels, -1)
+ if self.use_crf:
+ backpointers, best_tag_id = logits
+ best_path = postprocess(backpointers, best_tag_id)
+ logit_id = []
+ for ele in best_path:
+ logit_id.extend(ele)
+ else:
+ logits = logits.asnumpy()
+ logit_id = np.argmax(logits, axis=-1)
+ logit_id = np.reshape(logit_id, -1)
+
+ if self.mode.lower() == "binary":
+ pos_eva = np.isin(logit_id, [i for i in range(1, self.num_labels)])
+ pos_label = np.isin(labels, [i for i in range(1, self.num_labels)])
+ self.TP += np.sum(pos_eva & pos_label)
+ self.FP += np.sum(pos_eva & (~pos_label))
+ self.FN += np.sum((~pos_eva) & pos_label)
+ else:
+            target = np.zeros((len(labels), self.num_labels), dtype=np.int32)
+            pred = np.zeros((len(logit_id), self.num_labels), dtype=np.int32)
+ for i, label in enumerate(labels):
+ target[i][label] = 1
+ for i, label in enumerate(logit_id):
+ pred[i][label] = 1
+ self.metric.update(pred, target)
+
+ def eval(self):
+        """Return the F1 score from the confusion-matrix metric (MultiLabel mode only)."""
+ return self.metric.eval()
+
+
+class MCC():
+ """Calculate Matthews Correlation Coefficient."""
+
+ def __init__(self):
+ self.TP = 0
+ self.FP = 0
+ self.FN = 0
+ self.TN = 0
+
+ def update(self, logits, labels):
+        """Update the confusion-matrix counts with one batch of logits and labels."""
+ labels = labels.asnumpy()
+ labels = np.reshape(labels, -1)
+        labels = labels.astype(bool)
+ logits = logits.asnumpy()
+ logit_id = np.argmax(logits, axis=-1)
+ logit_id = np.reshape(logit_id, -1)
+        logit_id = logit_id.astype(bool)
+ ornot = logit_id ^ labels
+
+ self.TP += (~ornot & labels).sum()
+ self.FP += (ornot & ~labels).sum()
+ self.FN += (ornot & labels).sum()
+ self.TN += (~ornot & ~labels).sum()
+
+ def cal(self):
+        """Calculate the Matthews Correlation Coefficient from the accumulated counts."""
+ mcc = (self.TP * self.TN - self.FP * self.FN) / math.sqrt((self.TP + self.FP) * (self.TP + self.FN)
+ * (self.TN + self.FP) * (self.TN + self.FN))
+ return mcc
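+
+    # Editorial check, not part of the original commit: with hypothetical counts
+    # TP=6, FP=1, FN=2, TN=11, the formula above gives
+    #   mcc = (6*11 - 1*2) / sqrt((6+1) * (6+2) * (11+1) * (11+2)) ~= 0.685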
+
+
+class Spearman_Correlation():
+ """Calculate Spearman Correlation Coefficient."""
+
+ def __init__(self):
+ self.label = []
+ self.logit = []
+
+ def update(self, logits, labels):
+        """Accumulate one batch of logits and labels."""
+ labels = labels.asnumpy()
+ labels = np.reshape(labels, -1)
+ logits = logits.asnumpy()
+ logits = np.reshape(logits, -1)
+ self.label.append(labels)
+ self.logit.append(logits)
+
+ def cal(self):
+ """Calculate Spearman Correlation."""
+ label = np.concatenate(self.label)
+ logit = np.concatenate(self.logit)
+ sort_label = label.argsort()[::-1]
+ sort_logit = logit.argsort()[::-1]
+ n = len(label)
+ d_acc = 0
+ for i in range(n):
+ d = np.where(sort_label == i)[0] - np.where(sort_logit == i)[0]
+ d_acc += d ** 2
+ ps = 1 - 6 * d_acc / n / (n ** 2 - 1)
+ return ps
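+
+    # Editorial check, not part of the original commit: for hypothetical labels
+    # [3, 1, 2] and predictions [0.1, 0.3, 0.2], the two rank orderings are exact
+    # reverses of each other, the rank differences are d = (-2, 2, 0), and
+    #   ps = 1 - 6 * 8 / (3 * (3 ** 2 - 1)) = -1.0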
diff --git a/vega/algorithms/nlp/src/bert_for_finetune.py b/vega/algorithms/nlp/src/bert_for_finetune.py
new file mode 100644
index 00000000..66eec169
--- /dev/null
+++ b/vega/algorithms/nlp/src/bert_for_finetune.py
@@ -0,0 +1,342 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Bert for finetune script."""
+
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.ops import composite as C
+from mindspore.common.tensor import Tensor
+from mindspore.common.parameter import Parameter
+from mindspore.common import dtype as mstype
+from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
+from mindspore.context import ParallelMode
+from mindspore.communication.management import get_group_size
+from mindspore import context
+from .bert_for_pre_training import clip_grad
+from .finetune_eval_model import BertCLSModel, BertNERModel, BertSquadModel
+from .utils import CrossEntropyCalculation
+
+GRADIENT_CLIP_TYPE = 1
+GRADIENT_CLIP_VALUE = 1.0
+grad_scale = C.MultitypeFuncGraph("grad_scale")
+reciprocal = P.Reciprocal()
+
+
+@grad_scale.register("Tensor", "Tensor")
+def tensor_grad_scale(scale, grad):
+    """Scale the gradient by the reciprocal of the loss scale."""
+ return grad * reciprocal(scale)
+
+
+_grad_overflow = C.MultitypeFuncGraph("_grad_overflow")
+grad_overflow = P.FloatStatus()
+
+
+@_grad_overflow.register("Tensor")
+def _tensor_grad_overflow(grad):
+    """Check a gradient for floating-point overflow status."""
+ return grad_overflow(grad)
+
+
+class BertFinetuneCell(nn.Cell):
+ """
+    Specially defined for fine-tuning, where only four input tensors are needed.
+
+ Args:
+ network (Cell): The training network. Note that loss function should have been added.
+ optimizer (Optimizer): Optimizer for updating the weights.
+ scale_update_cell (Cell): Cell to do the loss scale. Default: None.
+ """
+
+ def __init__(self, network, optimizer, scale_update_cell=None):
+
+ super(BertFinetuneCell, self).__init__(auto_prefix=False)
+ self.network = network
+ self.network.set_grad()
+ self.weights = optimizer.parameters
+ self.optimizer = optimizer
+ self.grad = C.GradOperation(get_by_list=True,
+ sens_param=True)
+ self.reducer_flag = False
+ self.allreduce = P.AllReduce()
+ self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
+ if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
+ self.reducer_flag = True
+ self.grad_reducer = None
+ if self.reducer_flag:
+ mean = context.get_auto_parallel_context("gradients_mean")
+ degree = get_group_size()
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)
+ self.is_distributed = (self.parallel_mode != ParallelMode.STAND_ALONE)
+ self.cast = P.Cast()
+ self.gpu_target = False
+ if context.get_context("device_target") == "GPU":
+ self.gpu_target = True
+ self.float_status = P.FloatStatus()
+ self.addn = P.AddN()
+ self.reshape = P.Reshape()
+ else:
+ self.alloc_status = P.NPUAllocFloatStatus()
+ self.get_status = P.NPUGetFloatStatus()
+ self.clear_status = P.NPUClearFloatStatus()
+ self.reduce_sum = P.ReduceSum(keep_dims=False)
+ self.base = Tensor(1, mstype.float32)
+ self.less_equal = P.LessEqual()
+ self.hyper_map = C.HyperMap()
+ self.loss_scale = None
+ self.loss_scaling_manager = scale_update_cell
+ if scale_update_cell:
+ self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ label_ids,
+ sens=None):
+        """Run one fine-tuning step with loss scaling."""
+ weights = self.weights
+ init = False
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ label_ids)
+ if sens is None:
+ scaling_sens = self.loss_scale
+ else:
+ scaling_sens = sens
+
+ if not self.gpu_target:
+ init = self.alloc_status()
+ init = F.depend(init, loss)
+ clear_status = self.clear_status(init)
+ scaling_sens = F.depend(scaling_sens, clear_status)
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ label_ids,
+ self.cast(scaling_sens,
+ mstype.float32))
+ grads = self.hyper_map(F.partial(grad_scale, scaling_sens), grads)
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+ if self.reducer_flag:
+ grads = self.grad_reducer(grads)
+ if not self.gpu_target:
+ init = F.depend(init, grads)
+ get_status = self.get_status(init)
+ init = F.depend(init, get_status)
+ flag_sum = self.reduce_sum(init, (0,))
+ else:
+ flag_sum = self.hyper_map(F.partial(_grad_overflow), grads)
+ flag_sum = self.addn(flag_sum)
+ flag_sum = self.reshape(flag_sum, (()))
+ if self.is_distributed:
+ flag_reduce = self.allreduce(flag_sum)
+ cond = self.less_equal(self.base, flag_reduce)
+ else:
+ cond = self.less_equal(self.base, flag_sum)
+ overflow = cond
+ if sens is None:
+ overflow = self.loss_scaling_manager(self.loss_scale, cond)
+ if overflow:
+ succ = False
+ else:
+ succ = self.optimizer(grads)
+ ret = (loss, cond)
+ return F.depend(ret, succ)
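+
+    # Editorial sketch, not part of the original commit: the control flow above
+    # follows the usual dynamic loss-scaling contract, roughly:
+    #
+    #   if overflow:                      # any inf/nan in the scaled gradients
+    #       scale = max(scale / 2, 1)     # shrink the scale and skip this update
+    #   else:
+    #       apply(optimizer, grads)       # safe step
+    #       good_steps += 1
+    #       if good_steps % window == 0:
+    #           scale *= 2                # slowly grow the scale back
+    #
+    # The exact halving/doubling policy lives in scale_update_cell and may differ.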
+
+
+class BertSquadCell(nn.Cell):
+    """Cell specially defined for SQuAD fine-tuning with loss scaling."""
+
+ def __init__(self, network, optimizer, scale_update_cell=None):
+ super(BertSquadCell, self).__init__(auto_prefix=False)
+ self.network = network
+ self.network.set_grad()
+ self.weights = optimizer.parameters
+ self.optimizer = optimizer
+ self.grad = C.GradOperation(get_by_list=True, sens_param=True)
+ self.reducer_flag = False
+ self.allreduce = P.AllReduce()
+ self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
+ if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
+ self.reducer_flag = True
+ self.grad_reducer = None
+ if self.reducer_flag:
+ mean = context.get_auto_parallel_context("gradients_mean")
+ degree = get_group_size()
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, mean, degree)
+ self.is_distributed = (self.parallel_mode != ParallelMode.STAND_ALONE)
+ self.cast = P.Cast()
+ self.alloc_status = P.NPUAllocFloatStatus()
+ self.get_status = P.NPUGetFloatStatus()
+ self.clear_status = P.NPUClearFloatStatus()
+ self.reduce_sum = P.ReduceSum(keep_dims=False)
+ self.base = Tensor(1, mstype.float32)
+ self.less_equal = P.LessEqual()
+ self.hyper_map = C.HyperMap()
+ self.loss_scale = None
+ self.loss_scaling_manager = scale_update_cell
+ if scale_update_cell:
+ self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ start_position,
+ end_position,
+ unique_id,
+ is_impossible,
+ sens=None):
+        """Run one SQuAD fine-tuning step with loss scaling."""
+ weights = self.weights
+ init = self.alloc_status()
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ start_position,
+ end_position,
+ unique_id,
+ is_impossible)
+ if sens is None:
+ scaling_sens = self.loss_scale
+ else:
+ scaling_sens = sens
+ init = F.depend(init, loss)
+ clear_status = self.clear_status(init)
+ scaling_sens = F.depend(scaling_sens, clear_status)
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ start_position,
+ end_position,
+ unique_id,
+ is_impossible,
+ self.cast(scaling_sens,
+ mstype.float32))
+ grads = self.hyper_map(F.partial(grad_scale, scaling_sens), grads)
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+ if self.reducer_flag:
+ grads = self.grad_reducer(grads)
+ init = F.depend(init, grads)
+ get_status = self.get_status(init)
+ init = F.depend(init, get_status)
+ flag_sum = self.reduce_sum(init, (0,))
+ if self.is_distributed:
+ flag_reduce = self.allreduce(flag_sum)
+ cond = self.less_equal(self.base, flag_reduce)
+ else:
+ cond = self.less_equal(self.base, flag_sum)
+ overflow = cond
+ if sens is None:
+ overflow = self.loss_scaling_manager(self.loss_scale, cond)
+ if overflow:
+ succ = False
+ else:
+ succ = self.optimizer(grads)
+ ret = (loss, cond)
+ return F.depend(ret, succ)
+
+
+class BertCLS(nn.Cell):
+ """Train interface for classification finetuning task."""
+
+ def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False,
+ assessment_method=""):
+ super(BertCLS, self).__init__()
+ self.bert = BertCLSModel(config, is_training, num_labels, dropout_prob, use_one_hot_embeddings,
+ assessment_method)
+ self.loss = CrossEntropyCalculation(is_training)
+ self.num_labels = num_labels
+ self.assessment_method = assessment_method
+ self.is_training = is_training
+
+ def construct(self, input_ids, input_mask, token_type_id, label_ids):
+        """Compute the classification loss, or raw logits when evaluating spearman_correlation."""
+ logits = self.bert(input_ids, input_mask, token_type_id)
+ if self.assessment_method == "spearman_correlation":
+ if self.is_training:
+ loss = self.loss(logits, label_ids)
+ else:
+ loss = logits
+ else:
+ loss = self.loss(logits, label_ids, self.num_labels)
+ return loss
+
+
+class BertNER(nn.Cell):
+ """Train interface for sequence labeling finetuning task."""
+
+ def __init__(self, config, batch_size, is_training, num_labels=11, use_crf=False,
+ tag_to_index=None, dropout_prob=0.0, use_one_hot_embeddings=False):
+ super(BertNER, self).__init__()
+ self.bert = BertNERModel(config, is_training, num_labels, use_crf, dropout_prob, use_one_hot_embeddings)
+ if use_crf:
+ if not tag_to_index:
+                raise ValueError("The dict for tag-index mapping should be provided for CRF.")
+ from src.CRF import CRF
+ self.loss = CRF(tag_to_index, batch_size, config.seq_length, is_training)
+ else:
+ self.loss = CrossEntropyCalculation(is_training)
+ self.num_labels = num_labels
+ self.use_crf = use_crf
+
+ def construct(self, input_ids, input_mask, token_type_id, label_ids):
+        """Compute the sequence-labeling loss."""
+ logits = self.bert(input_ids, input_mask, token_type_id)
+ if self.use_crf:
+ loss = self.loss(logits, label_ids)
+ else:
+ loss = self.loss(logits, label_ids, self.num_labels)
+ return loss
+
+
+class BertSquad(nn.Cell):
+ """Train interface for SQuAD finetuning task."""
+
+ def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False):
+ super(BertSquad, self).__init__()
+ self.bert = BertSquadModel(config, is_training, num_labels, dropout_prob, use_one_hot_embeddings)
+ self.loss = CrossEntropyCalculation(is_training)
+ self.num_labels = num_labels
+ self.seq_length = config.seq_length
+ self.is_training = is_training
+ self.total_num = Parameter(Tensor([0], mstype.float32))
+ self.start_num = Parameter(Tensor([0], mstype.float32))
+ self.end_num = Parameter(Tensor([0], mstype.float32))
+ self.sum = P.ReduceSum()
+ self.equal = P.Equal()
+ self.argmax = P.ArgMaxWithValue(axis=1)
+ self.squeeze = P.Squeeze(axis=-1)
+
+ def construct(self, input_ids, input_mask, token_type_id, start_position, end_position, unique_id, is_impossible):
+ """Interface for SQuAD finetuning task."""
+ logits = self.bert(input_ids, input_mask, token_type_id)
+ if self.is_training:
+ unstacked_logits_0 = self.squeeze(logits[:, :, 0:1])
+ unstacked_logits_1 = self.squeeze(logits[:, :, 1:2])
+ start_loss = self.loss(unstacked_logits_0, start_position, self.seq_length)
+ end_loss = self.loss(unstacked_logits_1, end_position, self.seq_length)
+ total_loss = (start_loss + end_loss) / 2.0
+ else:
+ start_logits = self.squeeze(logits[:, :, 0:1])
+ start_logits = start_logits + 100 * input_mask
+ end_logits = self.squeeze(logits[:, :, 1:2])
+ end_logits = end_logits + 100 * input_mask
+ total_loss = (unique_id, start_logits, end_logits)
+ return total_loss
diff --git a/vega/algorithms/nlp/src/bert_for_pre_training.py b/vega/algorithms/nlp/src/bert_for_pre_training.py
new file mode 100644
index 00000000..6ec9b24c
--- /dev/null
+++ b/vega/algorithms/nlp/src/bert_for_pre_training.py
@@ -0,0 +1,860 @@
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""Bert for pretraining."""
+import numpy as np
+
+import mindspore.nn as nn
+from mindspore.common.initializer import initializer, TruncatedNormal
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.ops import composite as C
+from mindspore.common.tensor import Tensor
+from mindspore.common.parameter import Parameter
+from mindspore.common import dtype as mstype
+from mindspore.nn.wrap.grad_reducer import DistributedGradReducer
+from mindspore.context import ParallelMode
+from mindspore.communication.management import get_group_size
+from mindspore import context
+from vega.common import ClassFactory, ClassType
+from vega.modules.module import Module
+from .bert_model import BertModel
+
+GRADIENT_CLIP_TYPE = 1
+GRADIENT_CLIP_VALUE = 1.0
+
+clip_grad = C.MultitypeFuncGraph("clip_grad")
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class Bert(Module):
+ """
+ Provide bert pre-training loss through network.
+
+ Args:
+ config (BertConfig): The config of BertModel.
+ is_training (bool): Specifies whether to use the training mode.
+ use_one_hot_embeddings (bool): Specifies whether to use one-hot for embeddings. Default: False.
+
+ Returns:
+ Tensor, the loss of the network.
+ """
+
+ def __init__(self):
+ from .model_utils.config import config as cfg, bert_net_cfg
+ super(Bert, self).__init__()
+ self.net = BertNetworkWithLoss(bert_net_cfg, True)
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights):
+ """Get pre-training loss."""
+ return self.net(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights)
+
+
+@clip_grad.register("Number", "Number", "Tensor")
+def _clip_grad(clip_type, clip_value, grad):
+ """
+ Clip gradients.
+
+ Inputs:
+ clip_type (int): The way to clip, 0 for 'value', 1 for 'norm'.
+ clip_value (float): Specifies how much to clip.
+ grad (tuple[Tensor]): Gradients.
+
+ Outputs:
+ tuple[Tensor], clipped gradients.
+ """
+ if clip_type not in (0, 1):
+ return grad
+ dt = F.dtype(grad)
+ if clip_type == 0:
+ new_grad = C.clip_by_value(grad, F.cast(F.tuple_to_array((-clip_value,)), dt),
+ F.cast(F.tuple_to_array((clip_value,)), dt))
+ else:
+ new_grad = nn.ClipByNorm()(grad, F.cast(F.tuple_to_array((clip_value,)), dt))
+ return new_grad
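+
+# Editorial sketch, not part of the original commit: the two clipping modes
+# above, written with plain numpy (clip_value = 1.0).
+#
+#   import numpy as np
+#   g = np.array([0.5, -3.0, 2.0])
+#   by_value = np.clip(g, -1.0, 1.0)                  # clip_type == 0
+#   by_norm = g * min(1.0, 1.0 / np.linalg.norm(g))   # clip_type == 1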
+
+
+class GetMaskedLMOutput(nn.Cell):
+ """
+ Get masked lm output.
+
+ Args:
+ config (BertConfig): The config of BertModel.
+
+ Returns:
+ Tensor, masked lm output.
+ """
+
+ def __init__(self, config):
+ super(GetMaskedLMOutput, self).__init__()
+ self.width = config.hidden_size
+ self.reshape = P.Reshape()
+ self.gather = P.Gather()
+
+ weight_init = TruncatedNormal(config.initializer_range)
+ self.dense = nn.Dense(self.width,
+ config.hidden_size,
+ weight_init=weight_init,
+ activation=config.hidden_act).to_float(config.compute_type)
+ self.layernorm = nn.LayerNorm((config.hidden_size,)).to_float(config.compute_type)
+ self.output_bias = Parameter(
+ initializer(
+ 'zero',
+ config.vocab_size))
+ self.matmul = P.MatMul(transpose_b=True)
+ self.log_softmax = nn.LogSoftmax(axis=-1)
+ self.shape_flat_offsets = (-1, 1)
+ self.last_idx = (-1,)
+ self.shape_flat_sequence_tensor = (-1, self.width)
+ self.seq_length_tensor = Tensor(np.array((config.seq_length,)).astype(np.int32))
+ self.cast = P.Cast()
+ self.compute_type = config.compute_type
+ self.dtype = config.dtype
+
+ def construct(self,
+ input_tensor,
+ output_weights,
+ positions):
+ """Get output log_probs."""
+ rng = F.tuple_to_array(F.make_range(P.Shape()(input_tensor)[0]))
+ flat_offsets = self.reshape(rng * self.seq_length_tensor, self.shape_flat_offsets)
+ flat_position = self.reshape(positions + flat_offsets, self.last_idx)
+ flat_sequence_tensor = self.reshape(input_tensor, self.shape_flat_sequence_tensor)
+ input_tensor = self.gather(flat_sequence_tensor, flat_position, 0)
+ input_tensor = self.cast(input_tensor, self.compute_type)
+ output_weights = self.cast(output_weights, self.compute_type)
+ input_tensor = self.dense(input_tensor)
+ input_tensor = self.layernorm(input_tensor)
+ logits = self.matmul(input_tensor, output_weights)
+ logits = self.cast(logits, self.dtype)
+ logits = logits + self.output_bias
+ log_probs = self.log_softmax(logits)
+ return log_probs
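+
+    # Editorial sketch, not part of the original commit: the offset arithmetic
+    # above picks the hidden vectors at the masked positions out of a flattened
+    # [batch * seq, width] tensor; in numpy terms, with hypothetical shapes:
+    #
+    #   import numpy as np
+    #   batch, seq, width = 2, 4, 3
+    #   hidden = np.arange(batch * seq * width).reshape(batch, seq, width)
+    #   positions = np.array([[1, 3], [0, 2]])   # masked indices per sample
+    #   flat_offsets = (np.arange(batch) * seq).reshape(-1, 1)
+    #   flat_pos = (positions + flat_offsets).reshape(-1)   # -> [1, 3, 4, 6]
+    #   picked = hidden.reshape(-1, width)[flat_pos]        # shape [4, width]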
+
+
+class GetNextSentenceOutput(nn.Cell):
+ """
+ Get next sentence output.
+
+ Args:
+ config (BertConfig): The config of Bert.
+
+ Returns:
+ Tensor, next sentence output.
+ """
+
+ def __init__(self, config):
+ super(GetNextSentenceOutput, self).__init__()
+ self.log_softmax = P.LogSoftmax()
+ weight_init = TruncatedNormal(config.initializer_range)
+ self.dense = nn.Dense(config.hidden_size, 2,
+ weight_init=weight_init, has_bias=True).to_float(config.compute_type)
+ self.dtype = config.dtype
+ self.cast = P.Cast()
+
+ def construct(self, input_tensor):
+        """Compute the log-probabilities for next-sentence prediction."""
+ logits = self.dense(input_tensor)
+ logits = self.cast(logits, self.dtype)
+ log_prob = self.log_softmax(logits)
+ return log_prob
+
+
+class BertPreTraining(nn.Cell):
+ """
+ Bert pretraining network.
+
+ Args:
+ config (BertConfig): The config of BertModel.
+ is_training (bool): Specifies whether to use the training mode.
+ use_one_hot_embeddings (bool): Specifies whether to use one-hot for embeddings.
+
+ Returns:
+ Tensor, prediction_scores, seq_relationship_score.
+ """
+
+ def __init__(self, config, is_training, use_one_hot_embeddings):
+ super(BertPreTraining, self).__init__()
+ self.bert = BertModel(config, is_training, use_one_hot_embeddings)
+ self.cls1 = GetMaskedLMOutput(config)
+ self.cls2 = GetNextSentenceOutput(config)
+
+ def construct(self, input_ids, input_mask, token_type_id,
+ masked_lm_positions):
+        """Compute masked LM prediction scores and the next-sentence relationship score."""
+ sequence_output, pooled_output, embedding_table = \
+ self.bert(input_ids, token_type_id, input_mask)
+ prediction_scores = self.cls1(sequence_output,
+ embedding_table,
+ masked_lm_positions)
+ seq_relationship_score = self.cls2(pooled_output)
+ return prediction_scores, seq_relationship_score
+
+
+class BertPretrainingLoss(nn.Cell):
+ """
+ Provide bert pre-training loss.
+
+ Args:
+ config (BertConfig): The config of BertModel.
+
+ Returns:
+ Tensor, total loss.
+ """
+
+ def __init__(self, config):
+ super(BertPretrainingLoss, self).__init__()
+ self.vocab_size = config.vocab_size
+ self.onehot = P.OneHot()
+ self.on_value = Tensor(1.0, mstype.float32)
+ self.off_value = Tensor(0.0, mstype.float32)
+ self.reduce_sum = P.ReduceSum()
+ self.reduce_mean = P.ReduceMean()
+ self.reshape = P.Reshape()
+ self.last_idx = (-1,)
+ self.neg = P.Neg()
+ self.cast = P.Cast()
+
+ def construct(self, prediction_scores, seq_relationship_score, masked_lm_ids,
+ masked_lm_weights, next_sentence_labels):
+ """Define the computation performed."""
+ label_ids = self.reshape(masked_lm_ids, self.last_idx)
+ label_weights = self.cast(self.reshape(masked_lm_weights, self.last_idx), mstype.float32)
+ one_hot_labels = self.onehot(label_ids, self.vocab_size, self.on_value, self.off_value)
+
+ per_example_loss = self.neg(self.reduce_sum(prediction_scores * one_hot_labels, self.last_idx))
+ numerator = self.reduce_sum(label_weights * per_example_loss, ())
+ denominator = self.reduce_sum(label_weights, ()) + self.cast(F.tuple_to_array((1e-5,)), mstype.float32)
+ masked_lm_loss = numerator / denominator
+
+ # next_sentence_loss
+ labels = self.reshape(next_sentence_labels, self.last_idx)
+ one_hot_labels = self.onehot(labels, 2, self.on_value, self.off_value)
+ per_example_loss = self.neg(self.reduce_sum(
+ one_hot_labels * seq_relationship_score, self.last_idx))
+ next_sentence_loss = self.reduce_mean(per_example_loss, self.last_idx)
+
+ # total_loss
+ total_loss = masked_lm_loss + next_sentence_loss
+ return total_loss
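+
+    # Editorial sketch, not part of the original commit: the total loss above is
+    # a weighted negative log-likelihood over the masked positions plus a 2-way
+    # NLL for next-sentence prediction, roughly:
+    #
+    #   mlm_loss = sum(w_i * -log p(token_i)) / (sum(w_i) + 1e-5)
+    #   nsp_loss = mean(-log p(label_b))
+    #   total    = mlm_loss + nsp_loss
+    #
+    # where w_i are the masked_lm_weights (1 for real masked slots, 0 for padding).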
+
+
+class BertNetworkWithLoss(nn.Cell):
+ """
+ Provide bert pre-training loss through network.
+
+ Args:
+ config (BertConfig): The config of BertModel.
+ is_training (bool): Specifies whether to use the training mode.
+ use_one_hot_embeddings (bool): Specifies whether to use one-hot for embeddings. Default: False.
+
+ Returns:
+ Tensor, the loss of the network.
+ """
+
+ def __init__(self, config, is_training, use_one_hot_embeddings=False):
+ super(BertNetworkWithLoss, self).__init__()
+ self.bert = BertPreTraining(config, is_training, use_one_hot_embeddings)
+ self.loss = BertPretrainingLoss(config)
+ self.cast = P.Cast()
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights):
+ """Get pre-training loss."""
+ prediction_scores, seq_relationship_score = \
+ self.bert(input_ids, input_mask, token_type_id, masked_lm_positions)
+ total_loss = self.loss(prediction_scores, seq_relationship_score,
+ masked_lm_ids, masked_lm_weights, next_sentence_labels)
+ return self.cast(total_loss, mstype.float32)
+
+
+class BertTrainOneStepCell(nn.TrainOneStepCell):
+ """
+ Encapsulation class of bert network training.
+
+    Append an optimizer to the training network. After that, the construct
+    function can be called to create the backward graph.
+
+ Args:
+ network (Cell): The training network. Note that loss function should have been added.
+ optimizer (Optimizer): Optimizer for updating the weights.
+ sens (Number): The adjust parameter. Default: 1.0.
+ enable_clip_grad (boolean): If True, clip gradients in BertTrainOneStepCell. Default: True.
+ """
+
+ def __init__(self, network, optimizer, sens=1.0, enable_clip_grad=True):
+ super(BertTrainOneStepCell, self).__init__(network, optimizer, sens)
+ self.cast = P.Cast()
+ self.hyper_map = C.HyperMap()
+ self.enable_clip_grad = enable_clip_grad
+
+ def set_sens(self, value):
+        """Set the sensitivity (loss scale) value for backpropagation."""
+ self.sens = value
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights):
+ """Define the computation performed."""
+ weights = self.weights
+
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights)
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ self.cast(F.tuple_to_array((self.sens,)),
+ mstype.float32))
+ if self.enable_clip_grad:
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+ grads = self.grad_reducer(grads)
+ succ = self.optimizer(grads)
+ return F.depend(loss, succ)
+
+
+grad_scale = C.MultitypeFuncGraph("grad_scale")
+reciprocal = P.Reciprocal()
+
+
+@grad_scale.register("Tensor", "Tensor")
+def tensor_grad_scale(scale, grad):
+    """Scale the gradient by the reciprocal of the loss scale."""
+ return grad * reciprocal(scale)
+
+
+_grad_overflow = C.MultitypeFuncGraph("_grad_overflow")
+grad_overflow = P.FloatStatus()
+
+
+@_grad_overflow.register("Tensor")
+def _tensor_grad_overflow(grad):
+    """Check a gradient for floating-point overflow status."""
+ return grad_overflow(grad)
+
+
+class BertTrainOneStepWithLossScaleCell(nn.TrainOneStepWithLossScaleCell):
+ """
+ Encapsulation class of bert network training.
+
+    Append an optimizer to the training network. After that, the construct
+    function can be called to create the backward graph.
+
+ Args:
+ network (Cell): The training network. Note that loss function should have been added.
+ optimizer (Optimizer): Optimizer for updating the weights.
+ scale_update_cell (Cell): Cell to do the loss scale. Default: None.
+ """
+
+ def __init__(self, network, optimizer, scale_update_cell=None):
+ super(BertTrainOneStepWithLossScaleCell, self).__init__(network, optimizer, scale_update_cell)
+ self.cast = P.Cast()
+ self.degree = 1
+ if self.reducer_flag:
+ self.degree = get_group_size()
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, False, self.degree)
+
+ self.loss_scale = None
+ self.loss_scaling_manager = scale_update_cell
+ if scale_update_cell:
+ self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ sens=None):
+ """Define the computation performed."""
+ weights = self.weights
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights)
+ if sens is None:
+ scaling_sens = self.loss_scale
+ else:
+ scaling_sens = sens
+ status, scaling_sens = self.start_overflow_check(loss, scaling_sens)
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ self.cast(scaling_sens,
+ mstype.float32))
+ # apply grad reducer on grads
+ grads = self.grad_reducer(grads)
+ grads = self.hyper_map(F.partial(grad_scale, scaling_sens * self.degree), grads)
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+
+ cond = self.get_overflow_status(status, grads)
+ overflow = cond
+ if sens is None:
+ overflow = self.loss_scaling_manager(self.loss_scale, cond)
+ if overflow:
+ succ = False
+ else:
+ succ = self.optimizer(grads)
+ ret = (loss, cond, scaling_sens)
+ return F.depend(ret, succ)
+
+
+class BertTrainOneStepWithLossScaleCellForAdam(nn.TrainOneStepWithLossScaleCell):
+ """
+ Encapsulation class of bert network training.
+
+    Append an optimizer to the training network. After that, the construct
+    function can be called to create the backward graph.
+ Different from BertTrainOneStepWithLossScaleCell, the optimizer takes the overflow
+ condition as input.
+
+ Args:
+ network (Cell): The training network. Note that loss function should have been added.
+ optimizer (Optimizer): Optimizer for updating the weights.
+ scale_update_cell (Cell): Cell to do the loss scale. Default: None.
+ """
+
+ def __init__(self, network, optimizer, scale_update_cell=None):
+ super(BertTrainOneStepWithLossScaleCellForAdam, self).__init__(network, optimizer, scale_update_cell)
+ self.cast = P.Cast()
+ self.degree = 1
+ if self.reducer_flag:
+ self.degree = get_group_size()
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, False, self.degree)
+ self.loss_scale = None
+ self.loss_scaling_manager = scale_update_cell
+ if scale_update_cell:
+ self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ sens=None):
+ """Define the computation performed."""
+ weights = self.weights
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights)
+ if sens is None:
+ scaling_sens = self.loss_scale
+ else:
+ scaling_sens = sens
+
+ status, scaling_sens = self.start_overflow_check(loss, scaling_sens)
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ self.cast(scaling_sens,
+ mstype.float32))
+ # apply grad reducer on grads
+ grads = self.grad_reducer(grads)
+ grads = self.hyper_map(F.partial(grad_scale, scaling_sens * self.degree), grads)
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+ cond = self.get_overflow_status(status, grads)
+ overflow = cond
+ if self.loss_scaling_manager is not None:
+ overflow = self.loss_scaling_manager(scaling_sens, cond)
+ succ = self.optimizer(grads, overflow)
+ ret = (loss, cond, scaling_sens)
+ return F.depend(ret, succ)
+
+
+cast = P.Cast()
+add_grads = C.MultitypeFuncGraph("add_grads")
+
+
+@add_grads.register("Tensor", "Tensor")
+def _add_grads(accu_grad, grad):
+ return accu_grad + cast(grad, mstype.float32)
+
+
+update_accu_grads = C.MultitypeFuncGraph("update_accu_grads")
+
+
+@update_accu_grads.register("Tensor", "Tensor")
+def _update_accu_grads(accu_grad, grad):
+ succ = True
+ return F.depend(succ, F.assign(accu_grad, cast(grad, mstype.float32)))
+
+
+accumulate_accu_grads = C.MultitypeFuncGraph("accumulate_accu_grads")
+
+
+@accumulate_accu_grads.register("Tensor", "Tensor")
+def _accumulate_accu_grads(accu_grad, grad):
+ succ = True
+ return F.depend(succ, F.assign_add(accu_grad, cast(grad, mstype.float32)))
+
+
+zeroslike = P.ZerosLike()
+reset_accu_grads = C.MultitypeFuncGraph("reset_accu_grads")
+
+
+@reset_accu_grads.register("Tensor")
+def _reset_accu_grads(accu_grad):
+ succ = True
+ return F.depend(succ, F.assign(accu_grad, zeroslike(accu_grad)))
+
+
+class BertTrainAccumulationAllReducePostWithLossScaleCell(nn.Cell):
+ """
+ Encapsulation class of bert network training.
+
+    Append an optimizer to the training network. After that, the construct
+    function can be called to create the backward graph.
+
+    To mimic a higher batch size, gradients are accumulated N times before each weight update.
+
+    In distributed mode, allreduce is applied only in the weight-update step,
+    i.e. the sub-step after gradients have been accumulated N times.
+
+ Args:
+ network (Cell): The training network. Note that loss function should have been added.
+ optimizer (Optimizer): Optimizer for updating the weights.
+ scale_update_cell (Cell): Cell to do the loss scale. Default: None.
+ accumulation_steps (int): Number of accumulation steps before gradient update. The global batch size =
+ batch_size * accumulation_steps. Default: 1.
+ """
+
+ def __init__(self, network, optimizer, scale_update_cell=None, accumulation_steps=1, enable_global_norm=False):
+ super(BertTrainAccumulationAllReducePostWithLossScaleCell, self).__init__(auto_prefix=False)
+ self.network = network
+ self.network.set_grad()
+ self.weights = optimizer.parameters
+ self.optimizer = optimizer
+ self.accumulation_steps = accumulation_steps
+ self.enable_global_norm = enable_global_norm
+ self.one = Tensor(np.array([1]).astype(np.int32))
+ self.zero = Tensor(np.array([0]).astype(np.int32))
+ self.local_step = Parameter(initializer(0, [1], mstype.int32))
+ self.accu_grads = self.weights.clone(prefix="accu_grads", init='zeros')
+ self.accu_overflow = Parameter(initializer(0, [1], mstype.int32))
+ self.accu_loss = Parameter(initializer(0, [1], mstype.float32))
+
+ self.grad = C.GradOperation(get_by_list=True, sens_param=True)
+ self.reducer_flag = False
+ self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
+ if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
+ self.reducer_flag = True
+ self.grad_reducer = F.identity
+ self.degree = 1
+ if self.reducer_flag:
+ self.degree = get_group_size()
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, False, self.degree)
+ self.is_distributed = (self.parallel_mode != ParallelMode.STAND_ALONE)
+ self.overflow_reducer = F.identity
+ if self.is_distributed:
+ self.overflow_reducer = P.AllReduce()
+ self.cast = P.Cast()
+ self.alloc_status = P.NPUAllocFloatStatus()
+ self.get_status = P.NPUGetFloatStatus()
+ self.clear_status = P.NPUClearFloatStatus()
+ self.reduce_sum = P.ReduceSum(keep_dims=False)
+ self.base = Tensor(1, mstype.float32)
+ self.less_equal = P.LessEqual()
+ self.logical_or = P.LogicalOr()
+ self.not_equal = P.NotEqual()
+ self.select = P.Select()
+ self.reshape = P.Reshape()
+ self.hyper_map = C.HyperMap()
+ self.loss_scale = None
+ self.loss_scaling_manager = scale_update_cell
+ if scale_update_cell:
+ self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
+
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ sens=None):
+ """Define the computation performed."""
+ weights = self.weights
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights)
+ if sens is None:
+ scaling_sens = self.loss_scale
+ else:
+ scaling_sens = sens
+ # alloc status and clear should be right before gradoperation
+ init = self.alloc_status()
+ init = F.depend(init, loss)
+ clear_status = self.clear_status(init)
+ scaling_sens = F.depend(scaling_sens, clear_status)
+ # update accumulation parameters
+ is_accu_step = self.not_equal(self.local_step, self.accumulation_steps)
+ self.local_step = self.select(is_accu_step, self.local_step + self.one, self.one)
+ self.accu_loss = self.select(is_accu_step, self.accu_loss + loss, loss)
+ mean_loss = self.accu_loss / self.local_step
+ is_accu_step = self.not_equal(self.local_step, self.accumulation_steps)
+
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ self.cast(scaling_sens,
+ mstype.float32))
+
+ accu_succ = self.hyper_map(accumulate_accu_grads, self.accu_grads, grads)
+ mean_loss = F.depend(mean_loss, accu_succ)
+
+ init = F.depend(init, mean_loss)
+ get_status = self.get_status(init)
+ init = F.depend(init, get_status)
+ flag_sum = self.reduce_sum(init, (0,))
+ overflow = self.less_equal(self.base, flag_sum)
+ overflow = self.logical_or(self.not_equal(self.accu_overflow, self.zero), overflow)
+ accu_overflow = self.select(overflow, self.one, self.zero)
+ self.accu_overflow = self.select(is_accu_step, accu_overflow, self.zero)
+
+ if is_accu_step:
+ succ = False
+ else:
+ # apply grad reducer on grads
+ grads = self.grad_reducer(self.accu_grads)
+ scaling = scaling_sens * self.degree * self.accumulation_steps
+ grads = self.hyper_map(F.partial(grad_scale, scaling), grads)
+ if self.enable_global_norm:
+ grads = C.clip_by_global_norm(grads, 1.0, None)
+ else:
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+ accu_overflow = F.depend(accu_overflow, grads)
+ accu_overflow = self.overflow_reducer(accu_overflow)
+ overflow = self.less_equal(self.base, accu_overflow)
+ accu_succ = self.hyper_map(reset_accu_grads, self.accu_grads)
+ overflow = F.depend(overflow, accu_succ)
+ overflow = self.reshape(overflow, (()))
+ if sens is None:
+ overflow = self.loss_scaling_manager(self.loss_scale, overflow)
+ if overflow:
+ succ = False
+ else:
+ succ = self.optimizer(grads)
+
+ ret = (mean_loss, overflow, scaling_sens)
+ return F.depend(ret, succ)
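+
+    # Editorial sketch, not part of the original commit: stripped of loss scaling
+    # and parallelism, the accumulation loop above is roughly:
+    #
+    #   for step, batch in enumerate(batches, start=1):
+    #       accu_grads += grad(net, batch)
+    #       if step % accumulation_steps == 0:
+    #           apply(optimizer, accu_grads / accumulation_steps)
+    #           accu_grads[:] = 0
+    #
+    # so the effective batch size is batch_size * accumulation_steps, and
+    # allreduce runs only on the update sub-step.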
+
+
+class BertTrainAccumulationAllReduceEachWithLossScaleCell(nn.Cell):
+ """
+ Encapsulation class of bert network training.
+
+    Append an optimizer to the training network. After that, the construct
+    function can be called to create the backward graph.
+
+    To mimic a higher batch size, gradients are accumulated N times before each weight update.
+
+    In distributed mode, allreduce is applied after each sub-step, and the trailing
+    time is overlapped by a backend optimization pass.
+
+ Args:
+ network (Cell): The training network. Note that loss function should have been added.
+ optimizer (Optimizer): Optimizer for updating the weights.
+ scale_update_cell (Cell): Cell to do the loss scale. Default: None.
+ accumulation_steps (int): Number of accumulation steps before gradient update. The global batch size =
+ batch_size * accumulation_steps. Default: 1.
+ """
+
+ def __init__(self, network, optimizer, scale_update_cell=None, accumulation_steps=1, enable_global_norm=False):
+ super(BertTrainAccumulationAllReduceEachWithLossScaleCell, self).__init__(auto_prefix=False)
+ self.network = network
+ self.network.set_grad()
+ self.weights = optimizer.parameters
+ self.optimizer = optimizer
+ self.accumulation_steps = accumulation_steps
+ self.enable_global_norm = enable_global_norm
+ self.one = Tensor(np.array([1]).astype(np.int32))
+ self.zero = Tensor(np.array([0]).astype(np.int32))
+ self.local_step = Parameter(initializer(0, [1], mstype.int32))
+ self.accu_grads = self.weights.clone(prefix="accu_grads", init='zeros')
+ self.accu_overflow = Parameter(initializer(0, [1], mstype.int32))
+ self.accu_loss = Parameter(initializer(0, [1], mstype.float32))
+
+ self.grad = C.GradOperation(get_by_list=True, sens_param=True)
+ self.reducer_flag = False
+ self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
+ if self.parallel_mode in [ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL]:
+ self.reducer_flag = True
+ self.grad_reducer = F.identity
+ self.degree = 1
+ if self.reducer_flag:
+ self.degree = get_group_size()
+ self.grad_reducer = DistributedGradReducer(optimizer.parameters, False, self.degree)
+ self.is_distributed = (self.parallel_mode != ParallelMode.STAND_ALONE)
+ self.overflow_reducer = F.identity
+ if self.is_distributed:
+ self.overflow_reducer = P.AllReduce()
+ self.cast = P.Cast()
+ self.alloc_status = P.NPUAllocFloatStatus()
+ self.get_status = P.NPUGetFloatStatus()
+ self.clear_before_grad = P.NPUClearFloatStatus()
+ self.reduce_sum = P.ReduceSum(keep_dims=False)
+ self.base = Tensor(1, mstype.float32)
+ self.less_equal = P.LessEqual()
+ self.logical_or = P.LogicalOr()
+ self.not_equal = P.NotEqual()
+ self.select = P.Select()
+ self.reshape = P.Reshape()
+ self.hyper_map = C.HyperMap()
+ self.loss_scale = None
+ self.loss_scaling_manager = scale_update_cell
+ if scale_update_cell:
+ self.loss_scale = Parameter(Tensor(scale_update_cell.get_loss_scale(), dtype=mstype.float32))
+
+ @C.add_flags(has_effect=True)
+ def construct(self,
+ input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ sens=None):
+ """Define the computation performed."""
+ weights = self.weights
+ loss = self.network(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights)
+ if sens is None:
+ scaling_sens = self.loss_scale
+ else:
+ scaling_sens = sens
+
+ # update accumulation parameters
+ is_accu_step = self.not_equal(self.local_step, self.accumulation_steps)
+ self.local_step = self.select(is_accu_step, self.local_step + self.one, self.one)
+ self.accu_loss = self.select(is_accu_step, self.accu_loss + loss, loss)
+ mean_loss = self.accu_loss / self.local_step
+ is_accu_step = self.not_equal(self.local_step, self.accumulation_steps)
+
+ # alloc status and clear should be right before gradoperation
+ init = self.alloc_status()
+ self.clear_before_grad(init)
+ grads = self.grad(self.network, weights)(input_ids,
+ input_mask,
+ token_type_id,
+ next_sentence_labels,
+ masked_lm_positions,
+ masked_lm_ids,
+ masked_lm_weights,
+ self.cast(scaling_sens,
+ mstype.float32))
+
+ accu_grads = self.hyper_map(add_grads, self.accu_grads, grads)
+ scaling = scaling_sens * self.degree * self.accumulation_steps
+ grads = self.hyper_map(F.partial(grad_scale, scaling), accu_grads)
+ grads = self.grad_reducer(grads)
+
+ self.get_status(init)
+ flag_sum = self.reduce_sum(init, (0,))
+ flag_reduce = self.overflow_reducer(flag_sum)
+ overflow = self.less_equal(self.base, flag_reduce)
+ overflow = self.logical_or(self.not_equal(self.accu_overflow, self.zero), overflow)
+ accu_overflow = self.select(overflow, self.one, self.zero)
+ self.accu_overflow = self.select(is_accu_step, accu_overflow, self.zero)
+ overflow = self.reshape(overflow, (()))
+
+ if is_accu_step:
+ succ = False
+ accu_succ = self.hyper_map(update_accu_grads, self.accu_grads, accu_grads)
+ succ = F.depend(succ, accu_succ)
+ else:
+ if sens is None:
+ overflow = self.loss_scaling_manager(self.loss_scale, overflow)
+ if overflow:
+ succ = False
+ else:
+ if self.enable_global_norm:
+ grads = C.clip_by_global_norm(grads, 1.0, None)
+ else:
+ grads = self.hyper_map(F.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)
+
+ succ = self.optimizer(grads)
+
+ accu_succ = self.hyper_map(reset_accu_grads, self.accu_grads)
+ succ = F.depend(succ, accu_succ)
+
+ ret = (mean_loss, overflow, scaling_sens)
+ return F.depend(ret, succ)
diff --git a/vega/algorithms/nlp/src/bert_model.py b/vega/algorithms/nlp/src/bert_model.py
new file mode 100644
index 00000000..2ead0069
--- /dev/null
+++ b/vega/algorithms/nlp/src/bert_model.py
@@ -0,0 +1,891 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""Bert model."""
+
+import math
+import copy
+import numpy as np
+import mindspore.common.dtype as mstype
+import mindspore.nn as nn
+import mindspore.ops.functional as F
+from mindspore.common.initializer import TruncatedNormal, initializer
+from mindspore.ops import operations as P
+from mindspore.ops import composite as C
+from mindspore.common.tensor import Tensor
+from mindspore.common.parameter import Parameter
+
+
+class BertConfig:
+ """
+    Configuration for `BertModel`.
+
+ Args:
+ seq_length (int): Length of input sequence. Default: 128.
+ vocab_size (int): The shape of each embedding vector. Default: 32000.
+ hidden_size (int): Size of the bert encoder layers. Default: 768.
+ num_hidden_layers (int): Number of hidden layers in the BertTransformer encoder
+ cell. Default: 12.
+ num_attention_heads (int): Number of attention heads in the BertTransformer
+ encoder cell. Default: 12.
+ intermediate_size (int): Size of intermediate layer in the BertTransformer
+ encoder cell. Default: 3072.
+ hidden_act (str): Activation function used in the BertTransformer encoder
+ cell. Default: "gelu".
+ hidden_dropout_prob (float): The dropout probability for BertOutput. Default: 0.1.
+ attention_probs_dropout_prob (float): The dropout probability for
+ BertAttention. Default: 0.1.
+ max_position_embeddings (int): Maximum length of sequences used in this
+ model. Default: 512.
+ type_vocab_size (int): Size of token type vocab. Default: 16.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
+ dtype (:class:`mindspore.dtype`): Data type of the input. Default: mstype.float32.
+ compute_type (:class:`mindspore.dtype`): Compute type in BertTransformer. Default: mstype.float32.
+ """
+
+ def __init__(self,
+ seq_length=128,
+ vocab_size=32000,
+ hidden_size=768,
+ num_hidden_layers=12,
+ num_attention_heads=12,
+ intermediate_size=3072,
+ hidden_act="gelu",
+ hidden_dropout_prob=0.1,
+ attention_probs_dropout_prob=0.1,
+ max_position_embeddings=512,
+ type_vocab_size=16,
+ initializer_range=0.02,
+ use_relative_positions=False,
+ dtype=mstype.float32,
+ compute_type=mstype.float32):
+ self.seq_length = seq_length
+ self.vocab_size = vocab_size
+ self.hidden_size = hidden_size
+ self.num_hidden_layers = num_hidden_layers
+ self.num_attention_heads = num_attention_heads
+ self.hidden_act = hidden_act
+ self.intermediate_size = intermediate_size
+ self.hidden_dropout_prob = hidden_dropout_prob
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
+ self.max_position_embeddings = max_position_embeddings
+ self.type_vocab_size = type_vocab_size
+ self.initializer_range = initializer_range
+ self.use_relative_positions = use_relative_positions
+ self.dtype = dtype
+ self.compute_type = compute_type
+
+
+class EmbeddingLookup(nn.Cell):
+ """
+ Embedding lookup table with a fixed dictionary and size.
+
+ Args:
+ vocab_size (int): Size of the dictionary of embeddings.
+ embedding_size (int): The size of each embedding vector.
+ embedding_shape (list): [batch_size, seq_length, embedding_size], the shape of
+ each embedding vector.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ """
+
+ def __init__(self,
+ vocab_size,
+ embedding_size,
+ embedding_shape,
+ use_one_hot_embeddings=False,
+ initializer_range=0.02):
+ super(EmbeddingLookup, self).__init__()
+ self.vocab_size = vocab_size
+ self.use_one_hot_embeddings = use_one_hot_embeddings
+ self.embedding_table = Parameter(initializer
+ (TruncatedNormal(initializer_range),
+ [vocab_size, embedding_size]))
+ self.expand = P.ExpandDims()
+ self.shape_flat = (-1,)
+ self.gather = P.Gather()
+ self.one_hot = P.OneHot()
+ self.on_value = Tensor(1.0, mstype.float32)
+ self.off_value = Tensor(0.0, mstype.float32)
+ self.array_mul = P.MatMul()
+ self.reshape = P.Reshape()
+ self.shape = tuple(embedding_shape)
+
+ def construct(self, input_ids):
+ """Get output and embeddings lookup table."""
+ extended_ids = self.expand(input_ids, -1)
+ flat_ids = self.reshape(extended_ids, self.shape_flat)
+ if self.use_one_hot_embeddings:
+ one_hot_ids = self.one_hot(flat_ids, self.vocab_size, self.on_value, self.off_value)
+ output_for_reshape = self.array_mul(
+ one_hot_ids, self.embedding_table)
+ else:
+ output_for_reshape = self.gather(self.embedding_table, flat_ids, 0)
+ output = self.reshape(output_for_reshape, self.shape)
+ return output, self.embedding_table
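+
+    # Editorial sketch, not part of the original commit: the one-hot path above
+    # is mathematically the same lookup as the gather path; in numpy:
+    #
+    #   import numpy as np
+    #   table = np.random.randn(10, 4)        # hypothetical [vocab, embedding]
+    #   ids = np.array([2, 7, 2])
+    #   one_hot = np.eye(10)[ids]
+    #   assert np.allclose(one_hot @ table, table[ids])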
+
+
+class EmbeddingPostprocessor(nn.Cell):
+ """
+    Postprocessor that applies positional and token type embeddings to word embeddings.
+
+ Args:
+ embedding_size (int): The size of each embedding vector.
+ embedding_shape (list): [batch_size, seq_length, embedding_size], the shape of
+ each embedding vector.
+ use_token_type (bool): Specifies whether to use token type embeddings. Default: False.
+ token_type_vocab_size (int): Size of token type vocab. Default: 16.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ max_position_embeddings (int): Maximum length of sequences used in this
+ model. Default: 512.
+ dropout_prob (float): The dropout probability. Default: 0.1.
+ """
+
+ def __init__(self,
+ embedding_size,
+ embedding_shape,
+ use_relative_positions=False,
+ use_token_type=False,
+ token_type_vocab_size=16,
+ use_one_hot_embeddings=False,
+ initializer_range=0.02,
+ max_position_embeddings=512,
+ dropout_prob=0.1):
+ super(EmbeddingPostprocessor, self).__init__()
+ self.use_token_type = use_token_type
+ self.token_type_vocab_size = token_type_vocab_size
+ self.use_one_hot_embeddings = use_one_hot_embeddings
+ self.max_position_embeddings = max_position_embeddings
+ self.token_type_embedding = nn.Embedding(
+ vocab_size=token_type_vocab_size,
+ embedding_size=embedding_size,
+ use_one_hot=use_one_hot_embeddings)
+ self.shape_flat = (-1,)
+ self.one_hot = P.OneHot()
+ self.on_value = Tensor(1.0, mstype.float32)
+ self.off_value = Tensor(0.1, mstype.float32)
+ self.array_mul = P.MatMul()
+ self.reshape = P.Reshape()
+ self.shape = tuple(embedding_shape)
+ self.dropout = nn.Dropout(1 - dropout_prob)
+ self.gather = P.Gather()
+ self.use_relative_positions = use_relative_positions
+ self.slice = P.StridedSlice()
+ _, seq, _ = self.shape
+ self.full_position_embedding = nn.Embedding(
+ vocab_size=max_position_embeddings,
+ embedding_size=embedding_size,
+ use_one_hot=False)
+ self.layernorm = nn.LayerNorm((embedding_size,))
+ self.position_ids = Tensor(np.arange(seq).reshape(-1, seq).astype(np.int32))
+ self.add = P.Add()
+
+ def construct(self, token_type_ids, word_embeddings):
+        """Apply token type and position embeddings, then layernorm and dropout."""
+ output = word_embeddings
+ if self.use_token_type:
+ token_type_embeddings = self.token_type_embedding(token_type_ids)
+ output = self.add(output, token_type_embeddings)
+ if not self.use_relative_positions:
+ position_embeddings = self.full_position_embedding(self.position_ids)
+ output = self.add(output, position_embeddings)
+ output = self.layernorm(output)
+ output = self.dropout(output)
+ return output
+
+
+class BertOutput(nn.Cell):
+ """
+    Apply a linear transformation to the hidden states and a residual connection to the input.
+
+ Args:
+ in_channels (int): Input channels.
+ out_channels (int): Output channels.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ dropout_prob (float): The dropout probability. Default: 0.1.
+ compute_type (:class:`mindspore.dtype`): Compute type in BertTransformer. Default: mstype.float32.
+ """
+
+ def __init__(self,
+ in_channels,
+ out_channels,
+ initializer_range=0.02,
+ dropout_prob=0.1,
+ compute_type=mstype.float32):
+ super(BertOutput, self).__init__()
+ self.dense = nn.Dense(in_channels, out_channels,
+ weight_init=TruncatedNormal(initializer_range)).to_float(compute_type)
+ self.dropout = nn.Dropout(1 - dropout_prob)
+ self.dropout_prob = dropout_prob
+ self.add = P.Add()
+ self.layernorm = nn.LayerNorm((out_channels,)).to_float(compute_type)
+ self.cast = P.Cast()
+
+ def construct(self, hidden_status, input_tensor):
+        """Apply dense, dropout, residual addition, and layernorm."""
+ output = self.dense(hidden_status)
+ output = self.dropout(output)
+ output = self.add(input_tensor, output)
+ output = self.layernorm(output)
+ return output
+
+
+class RelaPosMatrixGenerator(nn.Cell):
+ """
+ Generate matrix of relative positions between inputs.
+
+ Args:
+ length (int): Length of one dim for the matrix to be generated.
+ max_relative_position (int): Max value of relative position.
+ """
+
+ def __init__(self, length, max_relative_position):
+ super(RelaPosMatrixGenerator, self).__init__()
+ self._length = length
+ self._max_relative_position = max_relative_position
+ self._min_relative_position = -max_relative_position
+ self.range_length = -length + 1
+
+ self.tile = P.Tile()
+ self.range_mat = P.Reshape()
+ self.sub = P.Sub()
+ self.expanddims = P.ExpandDims()
+ self.cast = P.Cast()
+
+ def construct(self):
+ """Generate matrix of relative positions between inputs."""
+ range_vec_row_out = self.cast(F.tuple_to_array(F.make_range(self._length)), mstype.int32)
+ range_vec_col_out = self.range_mat(range_vec_row_out, (self._length, -1))
+ tile_row_out = self.tile(range_vec_row_out, (self._length,))
+ tile_col_out = self.tile(range_vec_col_out, (1, self._length))
+ range_mat_out = self.range_mat(tile_row_out, (self._length, self._length))
+ transpose_out = self.range_mat(tile_col_out, (self._length, self._length))
+ distance_mat = self.sub(range_mat_out, transpose_out)
+
+ distance_mat_clipped = C.clip_by_value(distance_mat,
+ self._min_relative_position,
+ self._max_relative_position)
+
+ # Shift values to be >=0. Each integer still uniquely identifies a
+ # relative position difference.
+ final_mat = distance_mat_clipped + self._max_relative_position
+ return final_mat
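+
+    # Editorial sketch, not part of the original commit: for a hypothetical
+    # length = 4 and max_relative_position = 2, the construct above is, in numpy:
+    #
+    #   import numpy as np
+    #   length, k = 4, 2
+    #   r = np.arange(length)
+    #   final = np.clip(r[None, :] - r[:, None], -k, k) + k
+    #   # final[0] == [2, 3, 4, 4]; every entry is a bucket id in [0, 2 * k]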
+
+
+class RelaPosEmbeddingsGenerator(nn.Cell):
+ """
+ Generate tensor of size [length, length, depth].
+
+ Args:
+ length (int): Length of one dim for the matrix to be generated.
+ depth (int): Size of each attention head.
+        max_relative_position (int): Maximum value of relative position.
+ initializer_range (float): Initialization value of TruncatedNormal.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ """
+
+ def __init__(self,
+ length,
+ depth,
+ max_relative_position,
+ initializer_range,
+ use_one_hot_embeddings=False):
+ super(RelaPosEmbeddingsGenerator, self).__init__()
+ self.depth = depth
+ self.vocab_size = max_relative_position * 2 + 1
+ self.use_one_hot_embeddings = use_one_hot_embeddings
+
+ self.embeddings_table = Parameter(
+ initializer(TruncatedNormal(initializer_range),
+ [self.vocab_size, self.depth]))
+
+ self.relative_positions_matrix = RelaPosMatrixGenerator(length=length,
+ max_relative_position=max_relative_position)
+ self.reshape = P.Reshape()
+ self.one_hot = nn.OneHot(depth=self.vocab_size)
+ self.shape = P.Shape()
+ self.gather = P.Gather() # index_select
+ self.matmul = P.BatchMatMul()
+
+ def construct(self):
+ """Generate embedding for each relative position of dimension depth."""
+ relative_positions_matrix_out = self.relative_positions_matrix()
+
+ if self.use_one_hot_embeddings:
+ flat_relative_positions_matrix = self.reshape(relative_positions_matrix_out, (-1,))
+ one_hot_relative_positions_matrix = self.one_hot(
+ flat_relative_positions_matrix)
+ embeddings = self.matmul(one_hot_relative_positions_matrix, self.embeddings_table)
+ my_shape = self.shape(relative_positions_matrix_out) + (self.depth,)
+ embeddings = self.reshape(embeddings, my_shape)
+ else:
+ embeddings = self.gather(self.embeddings_table,
+ relative_positions_matrix_out, 0)
+ return embeddings
+
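+# Note (editor's comment): the gather branch selects the same rows of
+# embeddings_table that the one-hot branch computes via matmul; both return a
+# tensor of shape [length, length, depth], the gather path just avoids
+# materializing the one-hot matrix.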
+
+class SaturateCast(nn.Cell):
+ """
+ Perform a safe saturating cast.
+
+ Args:
+ src_type (:class:`mindspore.dtype`): The type of the elements of the input tensor. Default: mstype.float32.
+ dst_type (:class:`mindspore.dtype`): The type of the elements of the output tensor. Default: mstype.float32.
+ """
+
+ def __init__(self, src_type=mstype.float32, dst_type=mstype.float32):
+ super(SaturateCast, self).__init__()
+ np_type = mstype.dtype_to_nptype(dst_type)
+
+ self.tensor_min_type = float(np.finfo(np_type).min)
+ self.tensor_max_type = float(np.finfo(np_type).max)
+
+ self.min_op = P.Minimum()
+ self.max_op = P.Maximum()
+ self.cast = P.Cast()
+ self.dst_type = dst_type
+
+ def construct(self, x):
+        """Clamp the input to the destination type's finite range, then cast."""
+ out = self.max_op(x, self.tensor_min_type)
+ out = self.min_op(out, self.tensor_max_type)
+ return self.cast(out, self.dst_type)
+
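+# Example (editor's illustration): casting 1e5 from float32 straight to float16
+# overflows to inf; SaturateCast(dst_type=mstype.float16) first clamps to the
+# float16 range, so the result is the largest finite float16 value (65504.0).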
+
+class BertAttention(nn.Cell):
+ """
+ Apply multi-headed attention from "from_tensor" to "to_tensor".
+
+ Args:
+ from_tensor_width (int): Size of last dim of from_tensor.
+ to_tensor_width (int): Size of last dim of to_tensor.
+ from_seq_length (int): Length of from_tensor sequence.
+ to_seq_length (int): Length of to_tensor sequence.
+ num_attention_heads (int): Number of attention heads. Default: 1.
+ size_per_head (int): Size of each attention head. Default: 512.
+ query_act (str): Activation function for the query transform. Default: None.
+ key_act (str): Activation function for the key transform. Default: None.
+ value_act (str): Activation function for the value transform. Default: None.
+ has_attention_mask (bool): Specifies whether to use attention mask. Default: False.
+ attention_probs_dropout_prob (float): The dropout probability for
+ BertAttention. Default: 0.0.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ do_return_2d_tensor (bool): True for return 2d tensor. False for return 3d
+ tensor. Default: False.
+ use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
+ compute_type (:class:`mindspore.dtype`): Compute type in BertAttention. Default: mstype.float32.
+ """
+
+ def __init__(self,
+ from_tensor_width,
+ to_tensor_width,
+ from_seq_length,
+ to_seq_length,
+ num_attention_heads=1,
+ size_per_head=512,
+ query_act=None,
+ key_act=None,
+ value_act=None,
+ has_attention_mask=False,
+ attention_probs_dropout_prob=0.0,
+ use_one_hot_embeddings=False,
+ initializer_range=0.02,
+ do_return_2d_tensor=False,
+ use_relative_positions=False,
+ compute_type=mstype.float32):
+
+ super(BertAttention, self).__init__()
+ self.from_seq_length = from_seq_length
+ self.to_seq_length = to_seq_length
+ self.num_attention_heads = num_attention_heads
+ self.size_per_head = size_per_head
+ self.has_attention_mask = has_attention_mask
+ self.use_relative_positions = use_relative_positions
+
+ self.scores_mul = 1.0 / math.sqrt(float(self.size_per_head))
+ self.reshape = P.Reshape()
+ self.shape_from_2d = (-1, from_tensor_width)
+ self.shape_to_2d = (-1, to_tensor_width)
+ weight = TruncatedNormal(initializer_range)
+ units = num_attention_heads * size_per_head
+ self.query_layer = nn.Dense(from_tensor_width,
+ units,
+ activation=query_act,
+ weight_init=weight).to_float(compute_type)
+ self.key_layer = nn.Dense(to_tensor_width,
+ units,
+ activation=key_act,
+ weight_init=weight).to_float(compute_type)
+ self.value_layer = nn.Dense(to_tensor_width,
+ units,
+ activation=value_act,
+ weight_init=weight).to_float(compute_type)
+
+ self.shape_from = (-1, from_seq_length, num_attention_heads, size_per_head)
+ self.shape_to = (-1, to_seq_length, num_attention_heads, size_per_head)
+
+ self.matmul_trans_b = P.BatchMatMul(transpose_b=True)
+ self.multiply = P.Mul()
+ self.transpose = P.Transpose()
+ self.trans_shape = (0, 2, 1, 3)
+ self.trans_shape_relative = (2, 0, 1, 3)
+ self.trans_shape_position = (1, 2, 0, 3)
+ self.multiply_data = -10000.0
+ self.matmul = P.BatchMatMul()
+
+ self.softmax = nn.Softmax()
+ self.dropout = nn.Dropout(1 - attention_probs_dropout_prob)
+
+ if self.has_attention_mask:
+ self.expand_dims = P.ExpandDims()
+ self.sub = P.Sub()
+ self.add = P.Add()
+ self.cast = P.Cast()
+ self.get_dtype = P.DType()
+ if do_return_2d_tensor:
+ self.shape_return = (-1, num_attention_heads * size_per_head)
+ else:
+ self.shape_return = (-1, from_seq_length, num_attention_heads * size_per_head)
+
+ self.cast_compute_type = SaturateCast(dst_type=compute_type)
+ if self.use_relative_positions:
+ self._generate_relative_positions_embeddings = \
+ RelaPosEmbeddingsGenerator(length=to_seq_length,
+ depth=size_per_head,
+ max_relative_position=16,
+ initializer_range=initializer_range,
+ use_one_hot_embeddings=use_one_hot_embeddings)
+
+ def construct(self, from_tensor, to_tensor, attention_mask):
+        """Apply multi-headed attention from from_tensor to to_tensor."""
+        # reshape 2d/3d input tensors to 2d
+ from_tensor_2d = self.reshape(from_tensor, self.shape_from_2d)
+ to_tensor_2d = self.reshape(to_tensor, self.shape_to_2d)
+ query_out = self.query_layer(from_tensor_2d)
+ key_out = self.key_layer(to_tensor_2d)
+ value_out = self.value_layer(to_tensor_2d)
+
+ query_layer = self.reshape(query_out, self.shape_from)
+ query_layer = self.transpose(query_layer, self.trans_shape)
+ key_layer = self.reshape(key_out, self.shape_to)
+ key_layer = self.transpose(key_layer, self.trans_shape)
+
+ attention_scores = self.matmul_trans_b(query_layer, key_layer)
+
+        # supplementary logic when relative positions are used
+ if self.use_relative_positions:
+ # relations_keys is [F|T, F|T, H]
+ relations_keys = self._generate_relative_positions_embeddings()
+ relations_keys = self.cast_compute_type(relations_keys)
+ # query_layer_t is [F, B, N, H]
+ query_layer_t = self.transpose(query_layer, self.trans_shape_relative)
+ # query_layer_r is [F, B * N, H]
+ query_layer_r = self.reshape(query_layer_t,
+ (self.from_seq_length,
+ -1,
+ self.size_per_head))
+ # key_position_scores is [F, B * N, F|T]
+ key_position_scores = self.matmul_trans_b(query_layer_r,
+ relations_keys)
+ # key_position_scores_r is [F, B, N, F|T]
+ key_position_scores_r = self.reshape(key_position_scores,
+ (self.from_seq_length,
+ -1,
+ self.num_attention_heads,
+ self.from_seq_length))
+ # key_position_scores_r_t is [B, N, F, F|T]
+ key_position_scores_r_t = self.transpose(key_position_scores_r,
+ self.trans_shape_position)
+ attention_scores = attention_scores + key_position_scores_r_t
+
+ attention_scores = self.multiply(self.scores_mul, attention_scores)
+
+ if self.has_attention_mask:
+ attention_mask = self.expand_dims(attention_mask, 1)
+ multiply_out = self.sub(self.cast(F.tuple_to_array((1.0,)), self.get_dtype(attention_scores)),
+ self.cast(attention_mask, self.get_dtype(attention_scores)))
+
+ adder = self.multiply(multiply_out, self.multiply_data)
+ attention_scores = self.add(adder, attention_scores)
+
+ attention_probs = self.softmax(attention_scores)
+ attention_probs = self.dropout(attention_probs)
+
+ value_layer = self.reshape(value_out, self.shape_to)
+ value_layer = self.transpose(value_layer, self.trans_shape)
+ context_layer = self.matmul(attention_probs, value_layer)
+
+        # supplementary logic when relative positions are used
+ if self.use_relative_positions:
+ # relations_values is [F|T, F|T, H]
+ relations_values = self._generate_relative_positions_embeddings()
+ relations_values = self.cast_compute_type(relations_values)
+ # attention_probs_t is [F, B, N, T]
+ attention_probs_t = self.transpose(attention_probs, self.trans_shape_relative)
+ # attention_probs_r is [F, B * N, T]
+ attention_probs_r = self.reshape(
+ attention_probs_t,
+ (self.from_seq_length,
+ -1,
+ self.to_seq_length))
+ # value_position_scores is [F, B * N, H]
+ value_position_scores = self.matmul(attention_probs_r,
+ relations_values)
+ # value_position_scores_r is [F, B, N, H]
+ value_position_scores_r = self.reshape(value_position_scores,
+ (self.from_seq_length,
+ -1,
+ self.num_attention_heads,
+ self.size_per_head))
+ # value_position_scores_r_t is [B, N, F, H]
+ value_position_scores_r_t = self.transpose(value_position_scores_r,
+ self.trans_shape_position)
+ context_layer = context_layer + value_position_scores_r_t
+
+ context_layer = self.transpose(context_layer, self.trans_shape)
+ context_layer = self.reshape(context_layer, self.shape_return)
+
+ return context_layer
+
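+# Note (editor's comment): the masking above is the standard additive trick --
+# adder = (1 - attention_mask) * -10000.0 is added to the scores, so masked
+# positions receive a large negative score and contribute (almost) zero
+# probability after the softmax.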
+
+class BertSelfAttention(nn.Cell):
+ """
+ Apply self-attention.
+
+ Args:
+ seq_length (int): Length of input sequence.
+ hidden_size (int): Size of the bert encoder layers.
+ num_attention_heads (int): Number of attention heads. Default: 12.
+ attention_probs_dropout_prob (float): The dropout probability for
+ BertAttention. Default: 0.1.
+ use_one_hot_embeddings (bool): Specifies whether to use one_hot encoding form. Default: False.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ hidden_dropout_prob (float): The dropout probability for BertOutput. Default: 0.1.
+ use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
+ compute_type (:class:`mindspore.dtype`): Compute type in BertSelfAttention. Default: mstype.float32.
+ """
+
+ def __init__(self,
+ seq_length,
+ hidden_size,
+ num_attention_heads=12,
+ attention_probs_dropout_prob=0.1,
+ use_one_hot_embeddings=False,
+ initializer_range=0.02,
+ hidden_dropout_prob=0.1,
+ use_relative_positions=False,
+ compute_type=mstype.float32):
+ super(BertSelfAttention, self).__init__()
+ if hidden_size % num_attention_heads != 0:
+ raise ValueError("The hidden size (%d) is not a multiple of the number "
+ "of attention heads (%d)" % (hidden_size, num_attention_heads))
+
+ self.size_per_head = int(hidden_size / num_attention_heads)
+
+ self.attention = BertAttention(
+ from_tensor_width=hidden_size,
+ to_tensor_width=hidden_size,
+ from_seq_length=seq_length,
+ to_seq_length=seq_length,
+ num_attention_heads=num_attention_heads,
+ size_per_head=self.size_per_head,
+ attention_probs_dropout_prob=attention_probs_dropout_prob,
+ use_one_hot_embeddings=use_one_hot_embeddings,
+ initializer_range=initializer_range,
+ use_relative_positions=use_relative_positions,
+ has_attention_mask=True,
+ do_return_2d_tensor=True,
+ compute_type=compute_type)
+
+ self.output = BertOutput(in_channels=hidden_size,
+ out_channels=hidden_size,
+ initializer_range=initializer_range,
+ dropout_prob=hidden_dropout_prob,
+ compute_type=compute_type)
+ self.reshape = P.Reshape()
+ self.shape = (-1, hidden_size)
+
+ def construct(self, input_tensor, attention_mask):
+        """Apply self-attention to the input tensor."""
+ input_tensor = self.reshape(input_tensor, self.shape)
+ attention_output = self.attention(input_tensor, input_tensor, attention_mask)
+ output = self.output(attention_output, input_tensor)
+ return output
+
+
+class BertEncoderCell(nn.Cell):
+ """
+ Encode cells used in BertTransformer.
+
+ Args:
+ hidden_size (int): Size of the bert encoder layers. Default: 768.
+ seq_length (int): Length of input sequence. Default: 512.
+ num_attention_heads (int): Number of attention heads. Default: 12.
+ intermediate_size (int): Size of intermediate layer. Default: 3072.
+ attention_probs_dropout_prob (float): The dropout probability for
+ BertAttention. Default: 0.02.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ hidden_dropout_prob (float): The dropout probability for BertOutput. Default: 0.1.
+ use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
+ hidden_act (str): Activation function. Default: "gelu".
+ compute_type (:class:`mindspore.dtype`): Compute type in attention. Default: mstype.float32.
+ """
+
+ def __init__(self,
+ hidden_size=768,
+ seq_length=512,
+ num_attention_heads=12,
+ intermediate_size=3072,
+ attention_probs_dropout_prob=0.02,
+ use_one_hot_embeddings=False,
+ initializer_range=0.02,
+ hidden_dropout_prob=0.1,
+ use_relative_positions=False,
+ hidden_act="gelu",
+ compute_type=mstype.float32):
+ super(BertEncoderCell, self).__init__()
+ self.attention = BertSelfAttention(
+ hidden_size=hidden_size,
+ seq_length=seq_length,
+ num_attention_heads=num_attention_heads,
+ attention_probs_dropout_prob=attention_probs_dropout_prob,
+ use_one_hot_embeddings=use_one_hot_embeddings,
+ initializer_range=initializer_range,
+ hidden_dropout_prob=hidden_dropout_prob,
+ use_relative_positions=use_relative_positions,
+ compute_type=compute_type)
+ self.intermediate = nn.Dense(in_channels=hidden_size,
+ out_channels=intermediate_size,
+ activation=hidden_act,
+ weight_init=TruncatedNormal(initializer_range)).to_float(compute_type)
+ self.output = BertOutput(in_channels=intermediate_size,
+ out_channels=hidden_size,
+ initializer_range=initializer_range,
+ dropout_prob=hidden_dropout_prob,
+ compute_type=compute_type)
+
+ def construct(self, hidden_states, attention_mask):
+        """Apply self-attention followed by the feed-forward sublayer."""
+ # self-attention
+ attention_output = self.attention(hidden_states, attention_mask)
+        # feed forward
+ intermediate_output = self.intermediate(attention_output)
+ # add and normalize
+ output = self.output(intermediate_output, attention_output)
+ return output
+
+
+class BertTransformer(nn.Cell):
+ """
+ Multi-layer bert transformer.
+
+ Args:
+ hidden_size (int): Size of the encoder layers.
+ seq_length (int): Length of input sequence.
+ num_hidden_layers (int): Number of hidden layers in encoder cells.
+ num_attention_heads (int): Number of attention heads in encoder cells. Default: 12.
+ intermediate_size (int): Size of intermediate layer in encoder cells. Default: 3072.
+ attention_probs_dropout_prob (float): The dropout probability for
+ BertAttention. Default: 0.1.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ initializer_range (float): Initialization value of TruncatedNormal. Default: 0.02.
+ hidden_dropout_prob (float): The dropout probability for BertOutput. Default: 0.1.
+ use_relative_positions (bool): Specifies whether to use relative positions. Default: False.
+ hidden_act (str): Activation function used in the encoder cells. Default: "gelu".
+ compute_type (:class:`mindspore.dtype`): Compute type in BertTransformer. Default: mstype.float32.
+ return_all_encoders (bool): Specifies whether to return all encoders. Default: False.
+ """
+
+ def __init__(self,
+ hidden_size,
+ seq_length,
+ num_hidden_layers,
+ num_attention_heads=12,
+ intermediate_size=3072,
+ attention_probs_dropout_prob=0.1,
+ use_one_hot_embeddings=False,
+ initializer_range=0.02,
+ hidden_dropout_prob=0.1,
+ use_relative_positions=False,
+ hidden_act="gelu",
+ compute_type=mstype.float32,
+ return_all_encoders=False):
+ super(BertTransformer, self).__init__()
+ self.return_all_encoders = return_all_encoders
+
+ layers = []
+ for _ in range(num_hidden_layers):
+ layer = BertEncoderCell(hidden_size=hidden_size,
+ seq_length=seq_length,
+ num_attention_heads=num_attention_heads,
+ intermediate_size=intermediate_size,
+ attention_probs_dropout_prob=attention_probs_dropout_prob,
+ use_one_hot_embeddings=use_one_hot_embeddings,
+ initializer_range=initializer_range,
+ hidden_dropout_prob=hidden_dropout_prob,
+ use_relative_positions=use_relative_positions,
+ hidden_act=hidden_act,
+ compute_type=compute_type)
+ layers.append(layer)
+
+ self.layers = nn.CellList(layers)
+
+ self.reshape = P.Reshape()
+ self.shape = (-1, hidden_size)
+ self.out_shape = (-1, seq_length, hidden_size)
+
+ def construct(self, input_tensor, attention_mask):
+        """Run the stacked encoder cells over the input tensor."""
+ prev_output = self.reshape(input_tensor, self.shape)
+
+ all_encoder_layers = ()
+ for layer_module in self.layers:
+ layer_output = layer_module(prev_output, attention_mask)
+ prev_output = layer_output
+
+ if self.return_all_encoders:
+ layer_output = self.reshape(layer_output, self.out_shape)
+ all_encoder_layers = all_encoder_layers + (layer_output,)
+
+ if not self.return_all_encoders:
+ prev_output = self.reshape(prev_output, self.out_shape)
+ all_encoder_layers = all_encoder_layers + (prev_output,)
+ return all_encoder_layers
+
+
+class CreateAttentionMaskFromInputMask(nn.Cell):
+ """
+ Create attention mask according to input mask.
+
+ Args:
+ config (Class): Configuration for BertModel.
+ """
+
+ def __init__(self, config):
+ super(CreateAttentionMaskFromInputMask, self).__init__()
+ self.input_mask = None
+
+ self.cast = P.Cast()
+ self.reshape = P.Reshape()
+ self.shape = (-1, 1, config.seq_length)
+
+ def construct(self, input_mask):
+        """Reshape the input mask and cast it to an attention mask."""
+ attention_mask = self.cast(self.reshape(input_mask, self.shape), mstype.float32)
+ return attention_mask
+
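+# Note (editor's comment): the reshape turns input_mask of shape
+# [batch_size, seq_length] into [batch_size, 1, seq_length]; the middle axis is
+# broadcast against the query positions inside BertAttention.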
+
+class BertModel(nn.Cell):
+ """
+    Bidirectional Encoder Representations from Transformers (BERT).
+
+ Args:
+ config (Class): Configuration for BertModel.
+ is_training (bool): True for training mode. False for eval mode.
+ use_one_hot_embeddings (bool): Specifies whether to use one hot encoding form. Default: False.
+ """
+
+ def __init__(self,
+ config,
+ is_training,
+ use_one_hot_embeddings=False):
+ super(BertModel, self).__init__()
+ config = copy.deepcopy(config)
+ if not is_training:
+ config.hidden_dropout_prob = 0.0
+ config.attention_probs_dropout_prob = 0.0
+
+ self.seq_length = config.seq_length
+ self.hidden_size = config.hidden_size
+ self.num_hidden_layers = config.num_hidden_layers
+ self.embedding_size = config.hidden_size
+ self.token_type_ids = None
+
+ self.last_idx = self.num_hidden_layers - 1
+ output_embedding_shape = [-1, self.seq_length, self.embedding_size]
+
+ self.bert_embedding_lookup = nn.Embedding(
+ vocab_size=config.vocab_size,
+ embedding_size=self.embedding_size,
+ use_one_hot=use_one_hot_embeddings,
+ embedding_table=TruncatedNormal(config.initializer_range))
+
+ self.bert_embedding_postprocessor = EmbeddingPostprocessor(
+ embedding_size=self.embedding_size,
+ embedding_shape=output_embedding_shape,
+ use_relative_positions=config.use_relative_positions,
+ use_token_type=True,
+ token_type_vocab_size=config.type_vocab_size,
+ use_one_hot_embeddings=use_one_hot_embeddings,
+ initializer_range=0.02,
+ max_position_embeddings=config.max_position_embeddings,
+ dropout_prob=config.hidden_dropout_prob)
+
+ self.bert_encoder = BertTransformer(
+ hidden_size=self.hidden_size,
+ seq_length=self.seq_length,
+ num_attention_heads=config.num_attention_heads,
+ num_hidden_layers=self.num_hidden_layers,
+ intermediate_size=config.intermediate_size,
+ attention_probs_dropout_prob=config.attention_probs_dropout_prob,
+ use_one_hot_embeddings=use_one_hot_embeddings,
+ initializer_range=config.initializer_range,
+ hidden_dropout_prob=config.hidden_dropout_prob,
+ use_relative_positions=config.use_relative_positions,
+ hidden_act=config.hidden_act,
+ compute_type=config.compute_type,
+ return_all_encoders=True)
+
+ self.cast = P.Cast()
+ self.dtype = config.dtype
+ self.cast_compute_type = SaturateCast(dst_type=config.compute_type)
+ self.slice = P.StridedSlice()
+
+ self.squeeze_1 = P.Squeeze(axis=1)
+ self.dense = nn.Dense(self.hidden_size, self.hidden_size,
+ activation="tanh",
+ weight_init=TruncatedNormal(config.initializer_range)).to_float(config.compute_type)
+ self._create_attention_mask_from_input_mask = CreateAttentionMaskFromInputMask(config)
+
+ def construct(self, input_ids, token_type_ids, input_mask):
+        """Compute the sequence output, pooled output and embedding tables."""
+ # embedding
+ embedding_tables = self.bert_embedding_lookup.embedding_table
+ word_embeddings = self.bert_embedding_lookup(input_ids)
+ embedding_output = self.bert_embedding_postprocessor(token_type_ids,
+ word_embeddings)
+
+ # attention mask [batch_size, seq_length, seq_length]
+ attention_mask = self._create_attention_mask_from_input_mask(input_mask)
+
+ # bert encoder
+ encoder_output = self.bert_encoder(self.cast_compute_type(embedding_output),
+ attention_mask)
+
+ sequence_output = self.cast(encoder_output[self.last_idx], self.dtype)
+
+ # pooler
+ batch_size = P.Shape()(input_ids)[0]
+ sequence_slice = self.slice(sequence_output,
+ (0, 0, 0),
+ (batch_size, 1, self.hidden_size),
+ (1, 1, 1))
+ first_token = self.squeeze_1(sequence_slice)
+ pooled_output = self.dense(first_token)
+ pooled_output = self.cast(pooled_output, self.dtype)
+
+ return sequence_output, pooled_output, embedding_tables
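+
+
+# Minimal usage sketch (editor's illustration; assumes a populated BertConfig
+# named `cfg` and int32 Tensors of shape [batch_size, seq_length]):
+#   model = BertModel(cfg, is_training=True)
+#   sequence_output, pooled_output, tables = model(input_ids, token_type_ids, input_mask)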
diff --git a/vega/algorithms/nlp/src/cluener_evaluation.py b/vega/algorithms/nlp/src/cluener_evaluation.py
new file mode 100644
index 00000000..8ca48125
--- /dev/null
+++ b/vega/algorithms/nlp/src/cluener_evaluation.py
@@ -0,0 +1,74 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Bert clue evaluation."""
+
+import json
+import numpy as np
+import mindspore.common.dtype as mstype
+from mindspore.common.tensor import Tensor
+from . import tokenization
+from .sample_process import label_generation, process_one_example_p
+from .CRF import postprocess
+from .model_utils.config import bert_net_cfg
+from .score import get_result
+
+
+def process(model=None, text="", tokenizer_=None, use_crf="", tag_to_index=None, vocab=""):
+ """Process text."""
+ data = [text]
+ features = []
+ res = []
+ ids = []
+ for i in data:
+ feature = process_one_example_p(tokenizer_, vocab, i, max_seq_len=bert_net_cfg.seq_length)
+ features.append(feature)
+ input_ids, input_mask, token_type_id = feature
+ input_ids = Tensor(np.array(input_ids), mstype.int32)
+ input_mask = Tensor(np.array(input_mask), mstype.int32)
+ token_type_id = Tensor(np.array(token_type_id), mstype.int32)
+ if use_crf.lower() == "true":
+ backpointers, best_tag_id = model.predict(input_ids, input_mask, token_type_id, Tensor(1))
+ best_path = postprocess(backpointers, best_tag_id)
+ logits = []
+ for ele in best_path:
+ logits.extend(ele)
+ ids = logits
+ else:
+ logits = model.predict(input_ids, input_mask, token_type_id, Tensor(1))
+ ids = logits.asnumpy()
+ ids = np.argmax(ids, axis=-1)
+ ids = list(ids)
+ res = label_generation(text=text, probs=ids, tag_to_index=tag_to_index)
+ return res
+
+
+def submit(model=None, path="", vocab_file="", use_crf="", label_file="", tag_to_index=None):
+ """Submit task."""
+ tokenizer_ = tokenization.FullTokenizer(vocab_file=vocab_file)
+ data = []
+    with open(path) as fin:
+        for line in fin:
+            if not line.strip():
+                continue
+            oneline = json.loads(line.strip())
+            res = process(model=model, text=oneline["text"], tokenizer_=tokenizer_,
+                          use_crf=use_crf, tag_to_index=tag_to_index, vocab=vocab_file)
+            data.append(json.dumps({"label": res}, ensure_ascii=False))
+    with open("ner_predict.json", "w") as fout:
+        fout.write("\n".join(data))
+ labels = []
+ with open(label_file) as f:
+ for label in f:
+ labels.append(label.strip())
+ get_result(labels, "ner_predict.json", path)
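+
+
+# Usage sketch (editor's illustration; `ner_model`, the file paths and the
+# tag_to_index mapping are assumed to be prepared by the caller):
+#   submit(model=ner_model, path="cluener_dev.json", vocab_file="vocab.txt",
+#          use_crf="true", label_file="labels.txt", tag_to_index=tag_to_index)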
diff --git a/vega/algorithms/nlp/src/dataset.py b/vega/algorithms/nlp/src/dataset.py
new file mode 100644
index 00000000..57c06b74
--- /dev/null
+++ b/vega/algorithms/nlp/src/dataset.py
@@ -0,0 +1,127 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""Data operations, will be used in run_pretrain.py."""
+
+import os
+import mindspore.common.dtype as mstype
+import mindspore.dataset as ds
+import mindspore.dataset.transforms.c_transforms as C
+from mindspore import log as logger
+
+
+def create_bert_dataset(device_num=1, rank=0, do_shuffle="true", data_dir=None, schema_dir=None, batch_size=32):
+ """Create train dataset."""
+    # collect the tfrecord files under data_dir
+ files = os.listdir(data_dir)
+ data_files = []
+ for file_name in files:
+ if "tfrecord" in file_name:
+ data_files.append(os.path.join(data_dir, file_name))
+ data_set = ds.TFRecordDataset(data_files, schema_dir if schema_dir != "" else None,
+ columns_list=["input_ids", "input_mask", "segment_ids", "next_sentence_labels",
+ "masked_lm_positions", "masked_lm_ids", "masked_lm_weights"],
+ shuffle=ds.Shuffle.FILES if do_shuffle == "true" else False,
+ num_shards=device_num, shard_id=rank, shard_equal_rows=True)
+ ori_dataset_size = data_set.get_dataset_size()
+    logger.info("original dataset size: {}".format(ori_dataset_size))
+ type_cast_op = C.TypeCast(mstype.int32)
+ data_set = data_set.map(operations=type_cast_op, input_columns="masked_lm_ids")
+ data_set = data_set.map(operations=type_cast_op, input_columns="masked_lm_positions")
+ data_set = data_set.map(operations=type_cast_op, input_columns="next_sentence_labels")
+ data_set = data_set.map(operations=type_cast_op, input_columns="segment_ids")
+ data_set = data_set.map(operations=type_cast_op, input_columns="input_mask")
+ data_set = data_set.map(operations=type_cast_op, input_columns="input_ids")
+ # apply batch operations
+ data_set = data_set.batch(batch_size, drop_remainder=True)
+ logger.info("data size: {}".format(data_set.get_dataset_size()))
+ logger.info("repeat count: {}".format(data_set.get_repeat_count()))
+ return data_set
+
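+# Usage sketch (editor's illustration; the directory is assumed to hold
+# pre-generated *.tfrecord files):
+#   train_set = create_bert_dataset(device_num=1, rank=0, do_shuffle="true",
+#                                   data_dir="/path/to/tfrecords", batch_size=32)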
+
+def create_ner_dataset(batch_size=1, repeat_count=1, assessment_method="accuracy", data_file_path=None,
+ dataset_format="mindrecord", schema_file_path=None, do_shuffle=True, drop_remainder=True):
+ """Create finetune or evaluation dataset."""
+ type_cast_op = C.TypeCast(mstype.int32)
+ if dataset_format == "mindrecord":
+ dataset = ds.MindDataset([data_file_path],
+ columns_list=["input_ids", "input_mask", "segment_ids", "label_ids"],
+ shuffle=do_shuffle)
+ else:
+ dataset = ds.TFRecordDataset([data_file_path], schema_file_path if schema_file_path != "" else None,
+ columns_list=["input_ids", "input_mask", "segment_ids", "label_ids"],
+ shuffle=do_shuffle)
+ if assessment_method == "Spearman_correlation":
+ type_cast_op_float = C.TypeCast(mstype.float32)
+ dataset = dataset.map(operations=type_cast_op_float, input_columns="label_ids")
+ else:
+ dataset = dataset.map(operations=type_cast_op, input_columns="label_ids")
+ dataset = dataset.map(operations=type_cast_op, input_columns="segment_ids")
+ dataset = dataset.map(operations=type_cast_op, input_columns="input_mask")
+ dataset = dataset.map(operations=type_cast_op, input_columns="input_ids")
+ dataset = dataset.repeat(repeat_count)
+ # apply batch operations
+ dataset = dataset.batch(batch_size, drop_remainder=drop_remainder)
+ return dataset
+
+
+def create_classification_dataset(batch_size=1, repeat_count=1, assessment_method="accuracy",
+ data_file_path=None, schema_file_path=None, do_shuffle=True):
+ """Create finetune or evaluation dataset."""
+ type_cast_op = C.TypeCast(mstype.int32)
+ data_set = ds.TFRecordDataset([data_file_path], schema_file_path if schema_file_path != "" else None,
+ columns_list=["input_ids", "input_mask", "segment_ids", "label_ids"],
+ shuffle=do_shuffle)
+ if assessment_method == "Spearman_correlation":
+ type_cast_op_float = C.TypeCast(mstype.float32)
+ data_set = data_set.map(operations=type_cast_op_float, input_columns="label_ids")
+ else:
+ data_set = data_set.map(operations=type_cast_op, input_columns="label_ids")
+ data_set = data_set.map(operations=type_cast_op, input_columns="segment_ids")
+ data_set = data_set.map(operations=type_cast_op, input_columns="input_mask")
+ data_set = data_set.map(operations=type_cast_op, input_columns="input_ids")
+ data_set = data_set.repeat(repeat_count)
+ # apply batch operations
+ data_set = data_set.batch(batch_size, drop_remainder=True)
+ return data_set
+
+
+def generator_squad(data_features):
+    """Yield (input_ids, input_mask, segment_ids, unique_id) tuples for SQuAD."""
+ for feature in data_features:
+ yield (feature.input_ids, feature.input_mask, feature.segment_ids, feature.unique_id)
+
+
+def create_squad_dataset(batch_size=1, repeat_count=1, data_file_path=None, schema_file_path=None,
+ is_training=True, do_shuffle=True):
+ """Create finetune or evaluation dataset."""
+ type_cast_op = C.TypeCast(mstype.int32)
+ if is_training:
+ data_set = ds.TFRecordDataset([data_file_path], schema_file_path if schema_file_path != "" else None,
+ columns_list=["input_ids", "input_mask", "segment_ids", "start_positions",
+ "end_positions", "unique_ids", "is_impossible"],
+ shuffle=do_shuffle)
+ data_set = data_set.map(operations=type_cast_op, input_columns="start_positions")
+ data_set = data_set.map(operations=type_cast_op, input_columns="end_positions")
+ else:
+ data_set = ds.GeneratorDataset(generator_squad(data_file_path), shuffle=do_shuffle,
+ column_names=["input_ids", "input_mask", "segment_ids", "unique_ids"])
+ data_set = data_set.map(operations=type_cast_op, input_columns="segment_ids")
+ data_set = data_set.map(operations=type_cast_op, input_columns="input_mask")
+ data_set = data_set.map(operations=type_cast_op, input_columns="input_ids")
+ data_set = data_set.map(operations=type_cast_op, input_columns="unique_ids")
+ data_set = data_set.repeat(repeat_count)
+ # apply batch operations
+ data_set = data_set.batch(batch_size, drop_remainder=True)
+ return data_set
diff --git a/vega/algorithms/nlp/src/finetune_eval_model.py b/vega/algorithms/nlp/src/finetune_eval_model.py
new file mode 100644
index 00000000..d1840513
--- /dev/null
+++ b/vega/algorithms/nlp/src/finetune_eval_model.py
@@ -0,0 +1,121 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Bert finetune and evaluation model script."""
+
+import mindspore.nn as nn
+from mindspore.common.initializer import TruncatedNormal
+from mindspore.ops import operations as P
+from .bert_model import BertModel
+
+
+class BertCLSModel(nn.Cell):
+    """Bert model with a dense classification head."""
+
+ def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False,
+ assessment_method=""):
+ super(BertCLSModel, self).__init__()
+ if not is_training:
+ config.hidden_dropout_prob = 0.0
+            config.attention_probs_dropout_prob = 0.0
+ self.bert = BertModel(config, is_training, use_one_hot_embeddings)
+ self.cast = P.Cast()
+ self.weight_init = TruncatedNormal(config.initializer_range)
+ self.log_softmax = P.LogSoftmax(axis=-1)
+ self.dtype = config.dtype
+ self.num_labels = num_labels
+ self.dense_1 = nn.Dense(config.hidden_size, self.num_labels, weight_init=self.weight_init,
+ has_bias=True).to_float(config.compute_type)
+ self.dropout = nn.Dropout(1 - dropout_prob)
+ self.assessment_method = assessment_method
+
+ def construct(self, input_ids, input_mask, token_type_id):
+        """Compute classification logits from the pooled Bert output."""
+ _, pooled_output, _ = \
+ self.bert(input_ids, token_type_id, input_mask)
+ cls = self.cast(pooled_output, self.dtype)
+ cls = self.dropout(cls)
+ logits = self.dense_1(cls)
+ logits = self.cast(logits, self.dtype)
+ if self.assessment_method != "spearman_correlation":
+ logits = self.log_softmax(logits)
+ return logits
+
+
+class BertSquadModel(nn.Cell):
+    """Bert model with a span-prediction head for SQuAD."""
+
+ def __init__(self, config, is_training, num_labels=2, dropout_prob=0.0, use_one_hot_embeddings=False):
+ super(BertSquadModel, self).__init__()
+ if not is_training:
+ config.hidden_dropout_prob = 0.0
+            config.attention_probs_dropout_prob = 0.0
+ self.bert = BertModel(config, is_training, use_one_hot_embeddings)
+ self.weight_init = TruncatedNormal(config.initializer_range)
+ self.dense1 = nn.Dense(config.hidden_size, num_labels, weight_init=self.weight_init,
+ has_bias=True).to_float(config.compute_type)
+ self.num_labels = num_labels
+ self.dtype = config.dtype
+ self.log_softmax = P.LogSoftmax(axis=1)
+ self.is_training = is_training
+
+ def construct(self, input_ids, input_mask, token_type_id):
+        """Compute start/end position logits for each token."""
+ sequence_output, _, _ = self.bert(input_ids, token_type_id, input_mask)
+ batch_size, seq_length, hidden_size = P.Shape()(sequence_output)
+ sequence = P.Reshape()(sequence_output, (-1, hidden_size))
+ logits = self.dense1(sequence)
+ logits = P.Cast()(logits, self.dtype)
+ logits = P.Reshape()(logits, (batch_size, seq_length, self.num_labels))
+ logits = self.log_softmax(logits)
+ return logits
+
+
+class BertNERModel(nn.Cell):
+    """Bert model with a token-classification head for NER."""
+
+ def __init__(self, config, is_training, num_labels=11, use_crf=False, dropout_prob=0.0,
+ use_one_hot_embeddings=False):
+ super(BertNERModel, self).__init__()
+ if not is_training:
+ config.hidden_dropout_prob = 0.0
+            config.attention_probs_dropout_prob = 0.0
+ self.bert = BertModel(config, is_training, use_one_hot_embeddings)
+ self.cast = P.Cast()
+ self.weight_init = TruncatedNormal(config.initializer_range)
+ self.log_softmax = P.LogSoftmax(axis=-1)
+ self.dtype = config.dtype
+ self.num_labels = num_labels
+ self.dense_1 = nn.Dense(config.hidden_size, self.num_labels, weight_init=self.weight_init,
+ has_bias=True).to_float(config.compute_type)
+ self.dropout = nn.Dropout(1 - dropout_prob)
+ self.reshape = P.Reshape()
+ self.shape = (-1, config.hidden_size)
+ self.use_crf = use_crf
+ self.origin_shape = (-1, config.seq_length, self.num_labels)
+
+ def construct(self, input_ids, input_mask, token_type_id):
+        """Compute per-token logits, or CRF inputs when use_crf is True."""
+ sequence_output, _, _ = \
+ self.bert(input_ids, token_type_id, input_mask)
+ seq = self.dropout(sequence_output)
+ seq = self.reshape(seq, self.shape)
+ logits = self.dense_1(seq)
+ logits = self.cast(logits, self.dtype)
+ if self.use_crf:
+ return_value = self.reshape(logits, self.origin_shape)
+ else:
+ return_value = self.log_softmax(logits)
+ return return_value
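+
+
+# The three heads share the same calling convention (editor's illustration,
+# assuming a prepared BertConfig named `cfg`):
+#   cls_net = BertCLSModel(cfg, is_training=False, num_labels=2)
+#   logits = cls_net(input_ids, input_mask, token_type_id)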
diff --git a/vega/algorithms/nlp/src/model_utils/config.py b/vega/algorithms/nlp/src/model_utils/config.py
new file mode 100644
index 00000000..ab111aab
--- /dev/null
+++ b/vega/algorithms/nlp/src/model_utils/config.py
@@ -0,0 +1,214 @@
+# Copyright 2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Parse arguments."""
+
+import os
+import ast
+import argparse
+from pprint import pformat
+import yaml
+import mindspore.common.dtype as mstype
+from ..bert_model import BertConfig
+
+
+class Config:
+    """Configuration namespace: converts a dict recursively into attributes."""
+
+ def __init__(self, cfg_dict):
+ for k, v in cfg_dict.items():
+ if isinstance(v, (list, tuple)):
+ setattr(self, k, [Config(x) if isinstance(x, dict) else x for x in v])
+ else:
+ setattr(self, k, Config(v) if isinstance(v, dict) else v)
+
+ def __str__(self):
+        """Return a pretty-printed string of the configuration."""
+ return pformat(self.__dict__)
+
+ def __repr__(self):
+        """Return the same string as __str__."""
+ return self.__str__()
+
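+# Example (editor's illustration): nested dicts become nested attributes, so
+# Config({"optimizer": {"learning_rate": 3e-5}}).optimizer.learning_rate
+# evaluates to 3e-5.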
+
+def parse_cli_to_yaml(parser, cfg, helper=None, choices=None, cfg_path="pretrain_base_config.yaml"):
+ """
+ Parse command line arguments to the configuration according to the default yaml.
+
+ Args:
+ parser: Parent parser.
+ cfg: Base configuration.
+        helper: Helper description for each argument.
+        choices: Valid choices for each argument.
+        cfg_path: Path to the default yaml config.
+ """
+ parser = argparse.ArgumentParser(description="[REPLACE THIS at config.py]",
+ parents=[parser])
+ helper = {} if helper is None else helper
+ choices = {} if choices is None else choices
+ for item in cfg:
+ if not isinstance(cfg[item], list) and not isinstance(cfg[item], dict):
+ help_description = helper[item] if item in helper else "Please reference to {}".format(cfg_path)
+ choice = choices[item] if item in choices else None
+ if isinstance(cfg[item], bool):
+ parser.add_argument("--" + item, type=ast.literal_eval, default=cfg[item], choices=choice,
+ help=help_description)
+ else:
+ parser.add_argument("--" + item, type=type(cfg[item]), default=cfg[item], choices=choice,
+ help=help_description)
+ args = parser.parse_args()
+ return args
+
+
+def parse_yaml(yaml_path):
+ """
+ Parse the yaml config file.
+
+ Args:
+ yaml_path: Path to the yaml config.
+ """
+ with open(yaml_path, 'r') as fin:
+ try:
+ cfgs = yaml.load_all(fin.read(), Loader=yaml.FullLoader)
+            cfgs = list(cfgs)
+ if len(cfgs) == 1:
+ cfg_helper = {}
+ cfg = cfgs[0]
+ cfg_choices = {}
+ elif len(cfgs) == 2:
+ cfg, cfg_helper = cfgs
+ cfg_choices = {}
+ elif len(cfgs) == 3:
+ cfg, cfg_helper, cfg_choices = cfgs
+ else:
+ raise ValueError("At most 3 docs (config, description for help, choices) are supported in config yaml")
+        except Exception as err:
+            raise ValueError("Failed to parse yaml") from err
+ return cfg, cfg_helper, cfg_choices
+
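+# Note (editor's comment): the three yaml documents correspond to the layout of
+# pretrain_config.yaml -- the first document holds the values, the second the
+# per-key help strings, and the third the per-key choices, separated by "---".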
+
+def merge(args, cfg):
+ """
+ Merge the base config from yaml file and command line arguments.
+
+ Args:
+ args: Command line arguments.
+ cfg: Base configuration.
+ """
+ args_var = vars(args)
+ for item in args_var:
+ cfg[item] = args_var[item]
+ return cfg
+
+
+def parse_dtype(dtype):
+    """Convert a dtype string to the corresponding mindspore dtype."""
+ if dtype not in ["mstype.float32", "mstype.float16"]:
+ raise ValueError("Not supported dtype")
+
+ if dtype == "mstype.float32":
+ return mstype.float32
+ if dtype == "mstype.float16":
+ return mstype.float16
+ return None
+
+
+def extra_operations(cfg):
+ """
+ Do extra work on config.
+
+ Args:
+        cfg: Object after instantiation of class 'Config'.
+ """
+    def create_filter_fun(keywords):
+        return lambda x: not any(key in x.name.lower() for key in keywords)
+
+ if cfg.description == 'run_pretrain':
+ cfg.AdamWeightDecay.decay_filter = create_filter_fun(cfg.AdamWeightDecay.decay_filter)
+ cfg.Lamb.decay_filter = create_filter_fun(cfg.Lamb.decay_filter)
+ cfg.base_net_cfg.dtype = parse_dtype(cfg.base_net_cfg.dtype)
+ cfg.base_net_cfg.compute_type = parse_dtype(cfg.base_net_cfg.compute_type)
+ cfg.nezha_net_cfg.dtype = parse_dtype(cfg.nezha_net_cfg.dtype)
+ cfg.nezha_net_cfg.compute_type = parse_dtype(cfg.nezha_net_cfg.compute_type)
+ cfg.large_net_cfg.dtype = parse_dtype(cfg.large_net_cfg.dtype)
+ cfg.large_net_cfg.compute_type = parse_dtype(cfg.large_net_cfg.compute_type)
+ cfg.large_acc_net_cfg.dtype = parse_dtype(cfg.large_acc_net_cfg.dtype)
+ cfg.large_acc_net_cfg.compute_type = parse_dtype(cfg.large_acc_net_cfg.compute_type)
+ if cfg.bert_network == 'base':
+ cfg.batch_size = cfg.base_batch_size
+ _bert_net_cfg = cfg.base_net_cfg
+ elif cfg.bert_network == 'nezha':
+ cfg.batch_size = cfg.nezha_batch_size
+ _bert_net_cfg = cfg.nezha_net_cfg
+ elif cfg.bert_network == 'large':
+ cfg.batch_size = cfg.large_batch_size
+ _bert_net_cfg = cfg.large_net_cfg
+ elif cfg.bert_network == 'large_acc':
+ cfg.batch_size = cfg.large_acc_batch_size
+ _bert_net_cfg = cfg.large_acc_net_cfg
+        else:
+            raise ValueError("Unsupported bert_network: {}".format(cfg.bert_network))
+ cfg.bert_net_cfg = BertConfig(**_bert_net_cfg.__dict__)
+ elif cfg.description == 'run_ner':
+ cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
+ create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
+ cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
+ cfg.bert_net_cfg.dtype = mstype.float32
+ cfg.bert_net_cfg.compute_type = mstype.float16
+ cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
+
+ elif cfg.description == 'run_squad':
+ cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
+ create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
+ cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
+ cfg.bert_net_cfg.dtype = mstype.float32
+ cfg.bert_net_cfg.compute_type = mstype.float16
+ cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
+
+ elif cfg.description == 'run_classifier':
+ cfg.optimizer_cfg.AdamWeightDecay.decay_filter = \
+ create_filter_fun(cfg.optimizer_cfg.AdamWeightDecay.decay_filter)
+ cfg.optimizer_cfg.Lamb.decay_filter = create_filter_fun(cfg.optimizer_cfg.Lamb.decay_filter)
+ cfg.bert_net_cfg.dtype = mstype.float32
+ cfg.bert_net_cfg.compute_type = mstype.float16
+ cfg.bert_net_cfg = BertConfig(**cfg.bert_net_cfg.__dict__)
+ else:
+ pass
+
+
+def get_config():
+ """Get Config according to the yaml file and cli arguments."""
+ def get_abs_path(path_relative):
+ current_dir = os.path.dirname(os.path.abspath(__file__))
+ return os.path.join(current_dir, path_relative)
+ parser = argparse.ArgumentParser(description="default name", add_help=False)
+ parser.add_argument("--config_path", type=get_abs_path, default="./pretrain_config.yaml",
+ help="Config file path")
+ path_args, _ = parser.parse_known_args()
+ default, helper, choices = parse_yaml(path_args.config_path)
+ config_obj = Config(default)
+ extra_operations(config_obj)
+ return config_obj
+
+
+config = get_config()
+bert_net_cfg = config.bert_net_cfg
+if config.description in ('run_classifier', 'run_ner', 'run_squad'):
+ optimizer_cfg = config.optimizer_cfg
+
+
+if __name__ == '__main__':
+ print(config)
diff --git a/vega/algorithms/nlp/src/model_utils/pretrain_config.yaml b/vega/algorithms/nlp/src/model_utils/pretrain_config.yaml
new file mode 100644
index 00000000..e42285cb
--- /dev/null
+++ b/vega/algorithms/nlp/src/model_utils/pretrain_config.yaml
@@ -0,0 +1,194 @@
+# Built-in configurations (DO NOT CHANGE THESE CONFIGURATIONS unless you know exactly what you are doing)
+enable_modelarts: False
+# Url for modelarts
+data_url: ""
+train_url: ""
+checkpoint_url: ""
+# Path for local
+data_path: "/root/lzc/wiki/"
+output_path: "/root/lzc/bert/"
+load_path: "/root/lzc/checkpoint_path/"
+device_target: "Ascend"
+enable_profiling: False
+
+# ==============================================================================
+description: 'run_pretrain'
+distribute: 'false'
+epoch_size: 40
+device_id: 0
+device_num: 1
+enable_save_ckpt: 'true'
+enable_lossscale: 'true'
+do_shuffle: 'true'
+enable_data_sink: 'true'
+data_sink_steps: 1
+accumulation_steps: 1
+allreduce_post_accumulation: 'true'
+save_checkpoint_path: ''
+load_checkpoint_path: ''
+save_checkpoint_steps: 1000
+train_steps: -1
+save_checkpoint_num: 1
+data_dir: '/root/lzc/wiki/'
+schema_dir: ''
+
+# ==============================================================================
+# pretrain related
+batch_size: 32
+# Available: [base, nezha, large, large_acc]
+bert_network: 'base'
+loss_scale_value: 65536
+scale_factor: 2
+scale_window: 1000
+optimizer: 'Lamb'
+enable_global_norm: False
+# pretrain_eval related
+data_file: ""
+schema_file: ""
+finetune_ckpt: ""
+# optimizer related
+AdamWeightDecay:
+ learning_rate: 0.00003 # 3e-5
+ end_learning_rate: 0.0
+ power: 5.0
+ weight_decay: 0.00001 # 1e-5
+ decay_filter: ['layernorm', 'bias']
+ eps: 0.000001 # 1e-6
+ warmup_steps: 10000
+
+Lamb:
+ learning_rate: 0.0003 # 3e-4
+ end_learning_rate: 0.0
+ power: 2.0
+ warmup_steps: 10000
+ weight_decay: 0.01
+ decay_filter: ['layernorm', 'bias']
+  eps: 0.00000001 # 1e-8
+
+Momentum:
+ learning_rate: 0.00002 # 2e-5
+ momentum: 0.9
+
+Thor:
+ lr_max: 0.006464
+ lr_min: 0.000001 # 1e-6
+ lr_power: 2.0
+ lr_total_steps: 30000
+ damping_max: 0.007035
+ damping_min: 0.000001 # 1e-6
+ damping_power: 4.0
+ damping_total_steps: 30000
+ momentum: 0.9
+ weight_decay: 0.00001 # 1e-5
+ loss_scale: 1024.0
+ frequency: 100
+# ==============================================================================
+# base
+base_batch_size: 256
+base_net_cfg:
+ seq_length: 128
+ vocab_size: 21128
+ hidden_size: 768
+ num_hidden_layers: 12
+ num_attention_heads: 12
+ intermediate_size: 3072
+ hidden_act: "gelu"
+ hidden_dropout_prob: 0.1
+ attention_probs_dropout_prob: 0.1
+ max_position_embeddings: 512
+ type_vocab_size: 2
+ initializer_range: 0.02
+ use_relative_positions: False
+ dtype: mstype.float32
+ compute_type: mstype.float16
+# nezha
+nezha_batch_size: 96
+nezha_net_cfg:
+ seq_length: 128
+ vocab_size: 21128
+ hidden_size: 1024
+ num_hidden_layers: 24
+ num_attention_heads: 16
+ intermediate_size: 4096
+ hidden_act: "gelu"
+ hidden_dropout_prob: 0.1
+ attention_probs_dropout_prob: 0.1
+ max_position_embeddings: 512
+ type_vocab_size: 2
+ initializer_range: 0.02
+ use_relative_positions: True
+ dtype: mstype.float32
+ compute_type: mstype.float16
+# large
+large_batch_size: 24
+large_net_cfg:
+ seq_length: 512
+ vocab_size: 30522
+ hidden_size: 1024
+ num_hidden_layers: 24
+ num_attention_heads: 16
+ intermediate_size: 4096
+ hidden_act: "gelu"
+ hidden_dropout_prob: 0.1
+ attention_probs_dropout_prob: 0.1
+ max_position_embeddings: 512
+ type_vocab_size: 2
+ initializer_range: 0.02
+ use_relative_positions: False
+ dtype: mstype.float32
+ compute_type: mstype.float16
+# Accelerated large network, currently supported only on Ascend.
+large_acc_batch_size: 24
+large_acc_net_cfg:
+ seq_length: 512
+ vocab_size: 30522
+ hidden_size: 1024
+ num_hidden_layers: 24
+ num_attention_heads: 16
+ intermediate_size: 4096
+ hidden_act: "fast_gelu"
+ hidden_dropout_prob: 0.1
+ attention_probs_dropout_prob: 0.1
+ max_position_embeddings: 512
+ type_vocab_size: 2
+ initializer_range: 0.02
+ use_relative_positions: False
+ dtype: mstype.float32
+ compute_type: mstype.float16
+
+
+---
+# Help description for each configuration
+enable_modelarts: "Whether training on modelarts, default: False"
+data_url: "Url for modelarts"
+train_url: "Url for modelarts"
+data_path: "The location of the input data."
+output_path: "The location of the output file."
+device_target: "Running platform, choose from Ascend or GPU, and default is Ascend."
+enable_profiling: 'Whether enable profiling while training, default: False'
+
+distribute: "Run distribute, default is 'false'."
+epoch_size: "Epoch size, default is 1."
+enable_save_ckpt: "Enable save checkpoint, default is true."
+enable_lossscale: "Use lossscale or not, default is not."
+do_shuffle: "Enable shuffle for dataset, default is true."
+enable_data_sink: "Enable data sink, default is true."
+data_sink_steps: "Sink steps for each epoch, default is 1."
+accumulation_steps: "Accumulating gradients N times before weight update, default is 1."
+allreduce_post_accumulation: "Whether to allreduce after accumulation of N steps or after each step, default is true."
+save_checkpoint_path: "Save checkpoint path"
+load_checkpoint_path: "Load checkpoint file path"
+save_checkpoint_steps: "Save checkpoint steps, default is 1000"
+train_steps: "Training Steps, default is -1, meaning run all steps according to epoch number."
+save_checkpoint_num: "Save checkpoint numbers, default is 1."
+data_dir: "Data path, it is better to use absolute path"
+schema_dir: "Schema path, it is better to use absolute path"
+---
+# choices
+device_target: ['Ascend', 'GPU']
+distribute: ["true", "false"]
+enable_save_ckpt: ["true", "false"]
+enable_lossscale: ["true", "false"]
+do_shuffle: ["true", "false"]
+enable_data_sink: ["true", "false"]
+allreduce_post_accumulation: ["true", "false"]
diff --git a/vega/algorithms/nlp/src/sample_process.py b/vega/algorithms/nlp/src/sample_process.py
new file mode 100644
index 00000000..99b99e2f
--- /dev/null
+++ b/vega/algorithms/nlp/src/sample_process.py
@@ -0,0 +1,102 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Process txt."""
+
+import re
+from .tokenization import convert_tokens_to_ids
+
+
+def process_one_example_p(tokenizer, vocab, text, max_seq_len=128):
+    """Process one text line into (input_ids, input_mask, segment_ids)."""
+ textlist = list(text)
+ tokens = []
+ for _, word in enumerate(textlist):
+ token = tokenizer.tokenize(word)
+ tokens.extend(token)
+ if len(tokens) >= max_seq_len - 1:
+ tokens = tokens[0:(max_seq_len - 2)]
+ ntokens = []
+ segment_ids = []
+ label_ids = []
+ ntokens.append("[CLS]")
+ segment_ids.append(0)
+ for _, token in enumerate(tokens):
+ ntokens.append(token)
+ segment_ids.append(0)
+ ntokens.append("[SEP]")
+ segment_ids.append(0)
+ input_ids = convert_tokens_to_ids(vocab, ntokens)
+ input_mask = [1] * len(input_ids)
+ while len(input_ids) < max_seq_len:
+ input_ids.append(0)
+ input_mask.append(0)
+ segment_ids.append(0)
+ label_ids.append(0)
+ ntokens.append("**NULL**")
+ assert len(input_ids) == max_seq_len
+ assert len(input_mask) == max_seq_len
+ assert len(segment_ids) == max_seq_len
+
+ feature = (input_ids, input_mask, segment_ids)
+ return feature
+
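+# Example (editor's illustration, assuming each character is in the vocab): for
+# text "北京欢迎你" and max_seq_len=8 the ids cover "[CLS] 北 京 欢 迎 你 [SEP]"
+# plus one zero of padding, input_mask = [1, 1, 1, 1, 1, 1, 1, 0], and
+# segment_ids are all zero.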
+
+    """Generate entity labels from per-token predictions."""
+ """Generate label."""
+ data = [text]
+ probs = [probs]
+ result = []
+ label2id = tag_to_index
+ id2label = [k for k, v in label2id.items()]
+
+ for index, prob in enumerate(probs):
+ for v in prob[1:len(data[index]) + 1]:
+ result.append(id2label[int(v)])
+
+ labels = {}
+ start = None
+ index = 0
+ for _, t in zip("".join(data), result):
+ if re.search("^[BS]", t):
+ if start is not None:
+ label = result[index - 1][2:]
+ if labels.get(label):
+ te_ = text[start:index]
+ labels[label][te_] = [[start, index - 1]]
+ else:
+ te_ = text[start:index]
+ labels[label] = {te_: [[start, index - 1]]}
+ start = index
+ if re.search("^O", t):
+ if start is not None:
+ label = result[index - 1][2:]
+ if labels.get(label):
+ te_ = text[start:index]
+ labels[label][te_] = [[start, index - 1]]
+ else:
+ te_ = text[start:index]
+ labels[label] = {te_: [[start, index - 1]]}
+ start = None
+ index += 1
+ if start is not None:
+ label = result[start][2:]
+ if labels.get(label):
+ te_ = text[start:index]
+ labels[label][te_] = [[start, index - 1]]
+ else:
+ te_ = text[start:index]
+ labels[label] = {te_: [[start, index - 1]]}
+ return labels
diff --git a/vega/algorithms/nlp/src/score.py b/vega/algorithms/nlp/src/score.py
new file mode 100644
index 00000000..9c40412a
--- /dev/null
+++ b/vega/algorithms/nlp/src/score.py
@@ -0,0 +1,81 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Calculate average F1 score among labels."""
+
+import json
+
+
+def get_f1_score_for_each_label(pre_lines, gold_lines, label):
+ """
+ Get F1 score for each label.
+
+ Args:
+ pre_lines: listed label info from pre_file.
+ gold_lines: listed label info from gold_file.
+        label: the label to compute the F1 score for.
+
+ Returns:
+ F1 score for this label.
+ """
+ TP = 0
+ FP = 0
+ FN = 0
+ index = 0
+ while index < len(pre_lines):
+ pre_line = pre_lines[index].get(label, {})
+ gold_line = gold_lines[index].get(label, {})
+ for sample in pre_line:
+ if sample in gold_line:
+ TP += 1
+ else:
+ FP += 1
+ for sample in gold_line:
+ if sample not in pre_line:
+ FN += 1
+ index += 1
+    if 2 * TP + FP + FN == 0:
+        return 0.0
+    f1 = 2 * TP / (2 * TP + FP + FN)
+ return f1
+
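+# Worked example (editor's illustration): with TP=8, FP=2 and FN=2 the score is
+# f1 = 2 * 8 / (2 * 8 + 2 + 2) = 16 / 20 = 0.8.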
+
+def get_f1_score(labels, pre_file, gold_file):
+ """
+ Get F1 scores for each label.
+
+ Args:
+ labels: list of labels.
+ pre_file: prediction file.
+ gold_file: ground truth file.
+
+ Returns:
+ average F1 score on all labels.
+ """
+ pre_lines = [json.loads(line.strip())['label'] for line in open(pre_file) if line.strip()]
+ gold_lines = [json.loads(line.strip())['label'] for line in open(gold_file) if line.strip()]
+ if len(pre_lines) != len(gold_lines):
+ raise ValueError("pre file and gold file have different line count.")
+ f1_sum = 0
+ for label in labels:
+ f1 = get_f1_score_for_each_label(pre_lines, gold_lines, label)
+ print('label: %s, F1: %.6f' % (label, f1))
+ f1_sum += f1
+
+ return f1_sum / len(labels)
+
+
+def get_result(labels, pre_file, gold_file):
+    """Print the average F1 score over all labels."""
+ avg = get_f1_score(labels, pre_file=pre_file, gold_file=gold_file)
+ print("avg F1: %.6f" % avg)
diff --git a/vega/algorithms/nlp/src/squad_get_predictions.py b/vega/algorithms/nlp/src/squad_get_predictions.py
new file mode 100644
index 00000000..1310f491
--- /dev/null
+++ b/vega/algorithms/nlp/src/squad_get_predictions.py
@@ -0,0 +1,257 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Get predictions for squad."""
+
+import math
+import collections
+import six
+from . import tokenization
+
+
+def get_prelim_predictions(features, unique_id_to_result, n_best_size, max_answer_length):
+ """Get prelim predictions."""
+ _PrelimPrediction = collections.namedtuple(
+ "PrelimPrediction",
+ ["feature_index", "start_index", "end_index", "start_logit", "end_logit"])
+ prelim_predictions = []
+    # iterate over the features of the example and collect candidate spans
+ for (feature_index, feature) in enumerate(features):
+ if feature.unique_id not in unique_id_to_result:
+ continue
+ result = unique_id_to_result[feature.unique_id]
+ start_indexes = _get_best_indexes(result.start_logits, n_best_size)
+ end_indexes = _get_best_indexes(result.end_logits, n_best_size)
+        # score every (start_index, end_index) candidate from the top-n logits
+ for start_index in start_indexes:
+ for end_index in end_indexes:
+ # We could hypothetically create invalid predictions, e.g., predict
+ # that the start of the span is in the question. We throw out all
+ # invalid predictions.
+ if start_index >= len(feature.tokens):
+ continue
+ if end_index >= len(feature.tokens):
+ continue
+ if start_index not in feature.token_to_orig_map:
+ continue
+ if end_index not in feature.token_to_orig_map:
+ continue
+ if not feature.token_is_max_context.get(start_index, False):
+ continue
+ if end_index < start_index:
+ continue
+ length = end_index - start_index + 1
+ if length > max_answer_length:
+ continue
+ prelim_predictions.append(
+ _PrelimPrediction(
+ feature_index=feature_index,
+ start_index=start_index,
+ end_index=end_index,
+ start_logit=result.start_logits[start_index],
+ end_logit=result.end_logits[end_index]))
+
+ prelim_predictions = sorted(
+ prelim_predictions,
+ key=lambda x: (x.start_logit + x.end_logit),
+ reverse=True)
+ return prelim_predictions
+
+
+def get_nbest(prelim_predictions, features, example, n_best_size, do_lower_case):
+ """Get nbest predictions."""
+ _NbestPrediction = collections.namedtuple(
+ "NbestPrediction", ["text", "start_logit", "end_logit"])
+
+ seen_predictions = {}
+ nbest = []
+ for pred in prelim_predictions:
+ if len(nbest) >= n_best_size:
+ break
+ feature = features[pred.feature_index]
+ if pred.start_index > 0: # this is a non-null prediction
+ tok_tokens = feature.tokens[pred.start_index:(pred.end_index + 1)]
+ orig_doc_start = feature.token_to_orig_map[pred.start_index]
+ orig_doc_end = feature.token_to_orig_map[pred.end_index]
+ orig_tokens = example.doc_tokens[orig_doc_start:(orig_doc_end + 1)]
+ tok_text = " ".join(tok_tokens)
+
+ # De-tokenize WordPieces that have been split off.
+ tok_text = tok_text.replace(" ##", "")
+ tok_text = tok_text.replace("##", "")
+
+ # Clean whitespace
+ tok_text = tok_text.strip()
+ tok_text = " ".join(tok_text.split())
+ orig_text = " ".join(orig_tokens)
+ final_text = get_final_text(tok_text, orig_text, do_lower_case)
+ if final_text in seen_predictions:
+ continue
+
+ seen_predictions[final_text] = True
+ else:
+ final_text = ""
+ seen_predictions[final_text] = True
+
+ nbest.append(
+ _NbestPrediction(
+ text=final_text,
+ start_logit=pred.start_logit,
+ end_logit=pred.end_logit))
+
+ # In very rare edge cases we could have no valid predictions. So we
+ # just create a nonce prediction in this case to avoid failure.
+ if not nbest:
+ nbest.append(_NbestPrediction(text="empty", start_logit=0.0, end_logit=0.0))
+
+ assert len(nbest) >= 1
+ return nbest
+
+
+def get_predictions(all_examples, all_features, all_results, n_best_size, max_answer_length, do_lower_case):
+ """Get final predictions."""
+ example_index_to_features = collections.defaultdict(list)
+ for feature in all_features:
+ example_index_to_features[feature.example_index].append(feature)
+
+ unique_id_to_result = {}
+ for result in all_results:
+ unique_id_to_result[result.unique_id] = result
+ all_predictions = collections.OrderedDict()
+
+ for (example_index, example) in enumerate(all_examples):
+ features = example_index_to_features[example_index]
+ prelim_predictions = get_prelim_predictions(features, unique_id_to_result, n_best_size, max_answer_length)
+ nbest = get_nbest(prelim_predictions, features, example, n_best_size, do_lower_case)
+
+ total_scores = []
+ best_non_null_entry = None
+ for entry in nbest:
+ total_scores.append(entry.start_logit + entry.end_logit)
+ if not best_non_null_entry:
+ if entry.text:
+ best_non_null_entry = entry
+
+ probs = _compute_softmax(total_scores)
+
+ nbest_json = []
+ for (i, entry) in enumerate(nbest):
+ output = collections.OrderedDict()
+ output["text"] = entry.text
+ output["probability"] = probs[i]
+ output["start_logit"] = entry.start_logit
+ output["end_logit"] = entry.end_logit
+ nbest_json.append(output)
+
+ assert len(nbest_json) >= 1
+
+ all_predictions[example.qas_id] = nbest_json[0]["text"]
+ return all_predictions
+
+
+def write_predictions(all_examples, all_features, all_results, n_best_size,
+ max_answer_length, do_lower_case):
+ """Write final predictions to the json file and log-odds of null if needed."""
+ all_predictions = get_predictions(all_examples, all_features, all_results,
+ n_best_size, max_answer_length, do_lower_case)
+ return all_predictions
+
+
+def get_final_text(pred_text, orig_text, do_lower_case):
+ """Project the tokenized prediction back to the original text."""
+ def _strip_spaces(text):
+ ns_chars = []
+ ns_to_s_map = collections.OrderedDict()
+ for (i, c) in enumerate(text):
+ if c == " ":
+ continue
+ ns_to_s_map[len(ns_chars)] = i
+ ns_chars.append(c)
+ ns_text = "".join(ns_chars)
+ return (ns_text, ns_to_s_map)
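+
+    # e.g. _strip_spaces("a b") -> ("ab", {0: 0, 1: 2}); the map sends each index
+    # in the space-stripped string back to its position in the original text.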
+
+ tokenizer = tokenization.BasicTokenizer(do_lower_case=do_lower_case)
+ tok_text = " ".join(tokenizer.tokenize(orig_text))
+
+ start_position = tok_text.find(pred_text)
+ if start_position == -1:
+ return orig_text
+ end_position = start_position + len(pred_text) - 1
+
+ (orig_ns_text, orig_ns_to_s_map) = _strip_spaces(orig_text)
+ (tok_ns_text, tok_ns_to_s_map) = _strip_spaces(tok_text)
+
+ if len(orig_ns_text) != len(tok_ns_text):
+ return orig_text
+
+ tok_s_to_ns_map = {}
+ for (i, tok_index) in six.iteritems(tok_ns_to_s_map):
+ tok_s_to_ns_map[tok_index] = i
+
+ orig_start_position = None
+ if start_position in tok_s_to_ns_map:
+ ns_start_position = tok_s_to_ns_map[start_position]
+ if ns_start_position in orig_ns_to_s_map:
+ orig_start_position = orig_ns_to_s_map[ns_start_position]
+
+ if orig_start_position is None:
+ return orig_text
+
+ orig_end_position = None
+ if end_position in tok_s_to_ns_map:
+ ns_end_position = tok_s_to_ns_map[end_position]
+ if ns_end_position in orig_ns_to_s_map:
+ orig_end_position = orig_ns_to_s_map[ns_end_position]
+
+ if orig_end_position is None:
+ return orig_text
+
+ output_text = orig_text[orig_start_position:(orig_end_position + 1)]
+ return output_text
+
+
+def _get_best_indexes(logits, n_best_size):
+ """Get the n-best logits from a list."""
+ index_and_score = sorted(enumerate(logits), key=lambda x: x[1], reverse=True)
+
+ best_indexes = []
+ for (i, score) in enumerate(index_and_score):
+ if i >= n_best_size:
+ break
+ best_indexes.append(score[0])
+ return best_indexes
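+
+# e.g. _get_best_indexes([0.1, 0.9, 0.5], n_best_size=2) -> [1, 2]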
+
+
+def _compute_softmax(scores):
+ """Compute softmax probability over raw logits."""
+ if not scores:
+ return []
+
+ max_score = None
+ for score in scores:
+ if max_score is None or score > max_score:
+ max_score = score
+
+ exp_scores = []
+ total_sum = 0.0
+ for score in scores:
+ x = math.exp(score - max_score)
+ exp_scores.append(x)
+ total_sum += x
+
+ probs = []
+ for score in exp_scores:
+ probs.append(score / total_sum)
+ return probs
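+
+# Sanity check (hypothetical logits): _compute_softmax([1.0, 2.0, 3.0]) is
+# approximately [0.0900, 0.2447, 0.6652], and the probabilities sum to 1.0.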
diff --git a/vega/algorithms/nlp/src/squad_postprocess.py b/vega/algorithms/nlp/src/squad_postprocess.py
new file mode 100644
index 00000000..e102b27d
--- /dev/null
+++ b/vega/algorithms/nlp/src/squad_postprocess.py
@@ -0,0 +1,110 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Evaluation script for SQuAD v1.1."""
+
+from collections import Counter
+import string
+import re
+import json
+import sys
+
+
+def normalize_answer(s):
+ """Low text and remove punctuation, articles and extra whitespace."""
+
+ def remove_articles(text):
+ """Construct the trainer of Bert."""
+ return re.sub(r'\b(a|an|the)\b', ' ', text)
+
+ def white_space_fix(text):
+ """Construct the trainer of Bert."""
+ return ' '.join(text.split())
+
+ def remove_punc(text):
+ """Construct the trainer of Bert."""
+ exclude = set(string.punctuation)
+ return ''.join(ch for ch in text if ch not in exclude)
+
+ def lower(text):
+ """Construct the trainer of Bert."""
+ return text.lower()
+
+ return white_space_fix(remove_articles(remove_punc(lower(s))))
+
+
+def f1_score(prediction, ground_truth):
+ """Calculate f1 score."""
+ prediction_tokens = normalize_answer(prediction).split()
+ ground_truth_tokens = normalize_answer(ground_truth).split()
+ common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
+ num_same = sum(common.values())
+ if num_same == 0:
+ return 0
+ precision = 1.0 * num_same / len(prediction_tokens)
+ recall = 1.0 * num_same / len(ground_truth_tokens)
+ f1 = (2 * precision * recall) / (precision + recall)
+ return f1
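+
+# Worked example (hypothetical strings): f1_score("the cat sat", "a cat sat down")
+# normalizes both sides to "cat sat" vs "cat sat down", giving overlap 2,
+# precision 2/2, recall 2/3, and F1 = 2 * (1 * 2/3) / (1 + 2/3) = 0.8.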
+
+
+def exact_match_score(prediction, ground_truth):
+ """Construct the trainer of Bert."""
+ return normalize_answer(prediction) == normalize_answer(ground_truth)
+
+
+def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
+ """Construct the trainer of Bert."""
+ scores_for_ground_truths = []
+ for ground_truth in ground_truths:
+ score = metric_fn(prediction, ground_truth)
+ scores_for_ground_truths.append(score)
+ return max(scores_for_ground_truths)
+
+
+def evaluate(dataset, predictions):
+ """Do evaluation."""
+ f1 = exact_match = total = 0
+ for article in dataset:
+ for paragraph in article['paragraphs']:
+ for qa in paragraph['qas']:
+ total += 1
+ if qa['id'] not in predictions:
+ message = 'Unanswered question ' + qa['id'] + \
+ ' will receive score 0.'
+ print(message, file=sys.stderr)
+ continue
+ ground_truths = list(map(lambda x: x['text'], qa['answers']))
+ if not ground_truths:
+ continue
+ prediction = predictions[qa['id']]
+ exact_match += metric_max_over_ground_truths(
+ exact_match_score, prediction, ground_truths)
+ f1 += metric_max_over_ground_truths(
+ f1_score, prediction, ground_truths)
+
+ exact_match = 100.0 * exact_match / total
+ f1 = 100.0 * f1 / total
+ return {'exact_match': exact_match, 'f1': f1}
+
+
+def SQuad_postprocess(dataset_file, all_predictions, output_metrics="output.json"):
+ """Construct the trainer of Bert."""
+ with open(dataset_file) as ds:
+ dataset_json = json.load(ds)
+ dataset = dataset_json['data']
+ re_json = evaluate(dataset, all_predictions)
+ print(json.dumps(re_json))
+ with open(output_metrics, 'w') as wr:
+ wr.write(json.dumps(re_json))
diff --git a/vega/algorithms/nlp/src/tokenization.py b/vega/algorithms/nlp/src/tokenization.py
new file mode 100644
index 00000000..ffc4a523
--- /dev/null
+++ b/vega/algorithms/nlp/src/tokenization.py
@@ -0,0 +1,331 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Tokenization."""
+
+import unicodedata
+import collections
+
+
+def convert_to_unicode(text):
+ """
+ Convert text into unicode type.
+
+ Args:
+ text: input str.
+
+ Returns:
+ input str in unicode.
+ """
+ ret = text
+ if isinstance(text, str):
+ ret = text
+ elif isinstance(text, bytes):
+ ret = text.decode("utf-8", "ignore")
+ else:
+ raise ValueError("Unsupported string type: %s" % (type(text)))
+ return ret
+
+
+def vocab_to_dict_key_token(vocab_file):
+ """Load a vocab file into a dict, key is token."""
+ vocab = collections.OrderedDict()
+ index = 0
+ with open(vocab_file, "r") as reader:
+ while True:
+ token = convert_to_unicode(reader.readline())
+ if not token:
+ break
+ token = token.strip()
+ vocab[token] = index
+ index += 1
+ return vocab
+
+
+def vocab_to_dict_key_id(vocab_file):
+ """Load a vocab file into a dict, key is id."""
+ vocab = collections.OrderedDict()
+ index = 0
+ with open(vocab_file, "r") as reader:
+ while True:
+ token = convert_to_unicode(reader.readline())
+ if not token:
+ break
+ token = token.strip()
+ vocab[index] = token
+ index += 1
+ return vocab
+
+
+def whitespace_tokenize(text):
+ """Run basic whitespace cleaning and splitting on a piece of text."""
+ text = text.strip()
+ if not text:
+ return []
+ tokens = text.split()
+ return tokens
+
+
+def convert_tokens_to_ids(vocab_file, tokens):
+ """
+ Convert tokens to ids.
+
+ Args:
+ vocab_file: path to vocab.txt.
+ tokens: list of tokens.
+
+ Returns:
+ list of ids.
+ """
+ vocab_dict = vocab_to_dict_key_token(vocab_file)
+ output = []
+ for token in tokens:
+ output.append(vocab_dict[token])
+ return output
+
+
+def convert_ids_to_tokens(vocab_file, ids):
+ """
+ Convert ids to tokens.
+
+ Args:
+ vocab_file: path to vocab.txt.
+ ids: list of ids.
+
+ Returns:
+ list of tokens.
+ """
+ vocab_dict = vocab_to_dict_key_id(vocab_file)
+ output = []
+ for _id in ids:
+ output.append(vocab_dict[_id])
+ return output
+
+
+class FullTokenizer():
+ """Construct the trainer of Bert."""
+
+ def __init__(self, vocab_file, do_lower_case=True):
+ self.vocab_dict = vocab_to_dict_key_token(vocab_file)
+ self.do_lower_case = do_lower_case
+ self.basic_tokenize = BasicTokenizer(do_lower_case)
+ self.wordpiece_tokenize = WordpieceTokenizer(self.vocab_dict)
+
+ def tokenize(self, text):
+ """
+ Do full tokenization.
+
+ Args:
+ text: str of text.
+
+ Returns:
+ list of tokens.
+ """
+ tokens_ret = []
+ text = convert_to_unicode(text)
+ for tokens in self.basic_tokenize.tokenize(text):
+ wordpiece_tokens = self.wordpiece_tokenize.tokenize(tokens)
+ tokens_ret.extend(wordpiece_tokens)
+ return tokens_ret
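+
+    # Usage sketch (hypothetical vocab file):
+    #   tokenizer = FullTokenizer("vocab.txt", do_lower_case=True)
+    #   tokenizer.tokenize("Unaffable!")  # e.g. ["un", "##aff", "##able", "!"]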
+
+
+class BasicTokenizer():
+ """Construct the trainer of Bert."""
+
+ def __init__(self, do_lower_case=True):
+ self.do_lower_case = do_lower_case
+
+ def tokenize(self, text):
+ """
+ Do basic tokenization.
+
+ Args:
+ text: text in unicode.
+
+ Returns:
+ a list of tokens split from text
+ """
+ text = self._clean_text(text)
+ text = self._tokenize_chinese_chars(text)
+
+ orig_tokens = whitespace_tokenize(text)
+ split_tokens = []
+ for token in orig_tokens:
+ if self.do_lower_case:
+ token = token.lower()
+ token = self._run_strip_accents(token)
+            punc_split_tokens = self._run_split_on_punc(token)
+            split_tokens.extend(punc_split_tokens)
+
+ output_tokens = whitespace_tokenize(" ".join(split_tokens))
+ return output_tokens
+
+ def _run_strip_accents(self, text):
+ """Construct the trainer of Bert."""
+ text = unicodedata.normalize("NFD", text)
+ output = []
+ for char in text:
+ cat = unicodedata.category(char)
+ if cat == "Mn":
+ continue
+ output.append(char)
+ return "".join(output)
+
+ def _run_split_on_punc(self, text):
+ """Split punctuation on a piece of text."""
+ start_new_word = True
+ output = []
+ for char in text:
+ if _is_punctuation(char):
+ output.append([char])
+ start_new_word = True
+ else:
+ if start_new_word:
+ output.append([])
+ start_new_word = False
+ output[-1].append(char)
+ return ["".join(x) for x in output]
+
+ def _clean_text(self, text):
+ """Perform invalid character removal and whitespace cleanup on text."""
+ output = []
+ for char in text:
+ cp = ord(char)
+ if cp == 0 or cp == 0xfffd or _is_control(char):
+ continue
+ if _is_whitespace(char):
+ output.append(" ")
+ else:
+ output.append(char)
+ return "".join(output)
+
+ def _tokenize_chinese_chars(self, text):
+ """Add whitespace around any CJK character."""
+ output = []
+ for char in text:
+ cp = ord(char)
+ if self._is_chinese_char(cp):
+ output.append(" ")
+ output.append(char)
+ output.append(" ")
+ else:
+ output.append(char)
+ return "".join(output)
+
+ def _is_chinese_char(self, cp):
+ """Check whether CP is the codepoint of a CJK character."""
+ # This defines a "chinese character" as anything in the CJK Unicode block:
+ # https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
+ #
+ # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
+ # despite its name. The modern Korean Hangul alphabet is a different block,
+ # as is Japanese Hiragana and Katakana. Those alphabets are used to write
+        # space-separated words, so they are not treated specially and are handled
+        # like all of the other languages.
+ if ((0x4E00 <= cp <= 0x9FFF)
+ or (0x3400 <= cp <= 0x4DBF)
+ or (0x20000 <= cp <= 0x2A6DF)
+ or (0x2A700 <= cp <= 0x2B73F)
+ or (0x2B740 <= cp <= 0x2B81F)
+ or (0x2B820 <= cp <= 0x2CEAF)
+ or (0xF900 <= cp <= 0xFAFF)
+ or (0x2F800 <= cp <= 0x2FA1F)):
+ return True
+
+ return False
+
+
+class WordpieceTokenizer():
+ """Construct the trainer of Bert."""
+
+ def __init__(self, vocab):
+ self.vocab_dict = vocab
+
+ def tokenize(self, tokens):
+ """
+ Do word-piece tokenization.
+
+ Args:
+ tokens: a word.
+
+ Returns:
+ a list of tokens that can be found in vocab dict.
+ """
+ output_tokens = []
+ tokens = convert_to_unicode(tokens)
+ for token in whitespace_tokenize(tokens):
+ chars = list(token)
+ len_chars = len(chars)
+ start = 0
+ end = len_chars
+ while start < len_chars:
+ while start < end:
+ substr = "".join(token[start:end])
+ if start != 0:
+ substr = "##" + substr
+ if substr in self.vocab_dict:
+ output_tokens.append(substr)
+ start = end
+ end = len_chars
+ else:
+ end = end - 1
+ if start == end and start != len_chars:
+ output_tokens.append("[UNK]")
+ break
+ return output_tokens
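+
+    # Greedy longest-match-first sketch: with a hypothetical
+    # vocab = {"un": 0, "##aff": 1, "##able": 2},
+    # WordpieceTokenizer(vocab).tokenize("unaffable") -> ["un", "##aff", "##able"];
+    # if no piece matches at some position, "[UNK]" is emitted instead.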
+
+
+def _is_whitespace(char):
+ """Check whether `chars` is a whitespace character."""
+ # \t, \n, and \r are technically control characters but we treat them
+ # as whitespace since they are generally considered as such.
+ whitespace_char = [" ", "\t", "\n", "\r"]
+ if char in whitespace_char:
+ return True
+ cat = unicodedata.category(char)
+ if cat == "Zs":
+ return True
+ return False
+
+
+def _is_control(char):
+ """Check whether `chars` is a control character."""
+ # These are technically control characters but we count them as whitespace
+ # characters.
+ control_char = ["\t", "\n", "\r"]
+ if char in control_char:
+ return False
+ cat = unicodedata.category(char)
+ if cat in ("Cc", "Cf"):
+ return True
+ return False
+
+
+def _is_punctuation(char):
+ """Check whether `chars` is a punctuation character."""
+ cp = ord(char)
+ # We treat all non-letter/number ASCII as punctuation.
+ # Characters such as "^", "$", and "`" are not in the Unicode
+ # Punctuation class but we treat them as punctuation anyways, for
+ # consistency.
+ if ((33 <= cp <= 47) or (58 <= cp <= 64) or
+ (91 <= cp <= 96) or (123 <= cp <= 126)):
+ return True
+ cat = unicodedata.category(char)
+ if cat.startswith("P"):
+ return True
+ return False
diff --git a/vega/algorithms/nlp/src/utils.py b/vega/algorithms/nlp/src/utils.py
new file mode 100644
index 00000000..0b31c91b
--- /dev/null
+++ b/vega/algorithms/nlp/src/utils.py
@@ -0,0 +1,224 @@
+# Copyright 2020 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+"""Functional Cells used in Bert finetune and evaluation."""
+
+import os
+import math
+import collections
+import numpy as np
+import mindspore.nn as nn
+from mindspore import log as logger
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+from mindspore.common import dtype as mstype
+from mindspore.train.callback import Callback
+from mindspore.nn.learning_rate_schedule import LearningRateSchedule, PolynomialDecayLR, WarmUpLR
+
+
+class CrossEntropyCalculation(nn.Cell):
+ """Cross Entropy loss."""
+
+ def __init__(self, is_training=True):
+ super(CrossEntropyCalculation, self).__init__()
+ self.onehot = P.OneHot()
+ self.on_value = Tensor(1.0, mstype.float32)
+ self.off_value = Tensor(0.0, mstype.float32)
+ self.reduce_sum = P.ReduceSum()
+ self.reduce_mean = P.ReduceMean()
+ self.reshape = P.Reshape()
+ self.last_idx = (-1,)
+ self.neg = P.Neg()
+ self.cast = P.Cast()
+ self.is_training = is_training
+
+ def construct(self, logits, label_ids, num_labels):
+ """Construct the trainer of Bert."""
+ if self.is_training:
+ label_ids = self.reshape(label_ids, self.last_idx)
+ one_hot_labels = self.onehot(label_ids, num_labels, self.on_value, self.off_value)
+ per_example_loss = self.neg(self.reduce_sum(one_hot_labels * logits, self.last_idx))
+ loss = self.reduce_mean(per_example_loss, self.last_idx)
+ return_value = self.cast(loss, mstype.float32)
+ else:
+ return_value = logits * 1.0
+ return return_value
+
+
+def make_directory(path: str):
+ """Make directory."""
+ if path is None or not isinstance(path, str) or path.strip() == "":
+        logger.error("The path (%r) is of an invalid type.", path)
+        raise TypeError("Input path is of an invalid type")
+
+ # convert the relative paths
+ path = os.path.realpath(path)
+ logger.debug("The abs path is %r", path)
+
+    # check whether the path exists and is writable
+ if os.path.exists(path):
+ real_path = path
+ else:
+        # catch all exceptions here: directory creation may fail due to permission limits
+ logger.debug("The directory(%s) doesn't exist, will create it", path)
+ try:
+ os.makedirs(path, exist_ok=True)
+ real_path = path
+ except PermissionError as e:
+ logger.error("No write permission on the directory(%r), error = %r", path, e)
+ raise TypeError("No write permission on the directory.")
+ return real_path
+
+
+class LossCallBack(Callback):
+ """Monitor the loss in training."""
+
+ def __init__(self, dataset_size=-1):
+ super(LossCallBack, self).__init__()
+ self._dataset_size = dataset_size
+
+ def step_end(self, run_context):
+ """Print loss after each step."""
+ cb_params = run_context.original_args()
+ if self._dataset_size > 0:
+ percent, epoch_num = math.modf(cb_params.cur_step_num / self._dataset_size)
+ if percent == 0:
+ percent = 1
+ epoch_num -= 1
+ print("epoch: {}, current epoch percent: {}, step: {}, outputs are {}"
+ .format(int(epoch_num), "%.3f" % percent, cb_params.cur_step_num, str(cb_params.net_outputs)),
+ flush=True)
+ else:
+ print("epoch: {}, step: {}, outputs are {}".format(cb_params.cur_epoch_num, cb_params.cur_step_num,
+ str(cb_params.net_outputs)), flush=True)
+
+
+def LoadNewestCkpt(load_finetune_checkpoint_dir, steps_per_epoch, epoch_num, prefix):
+ """Find the ckpt finetune generated and load it into eval network."""
+ files = os.listdir(load_finetune_checkpoint_dir)
+ pre_len = len(prefix)
+    max_num = 0
+    load_finetune_checkpoint_path = None
+ for filename in files:
+ name_ext = os.path.splitext(filename)
+ if name_ext[-1] != ".ckpt":
+ continue
+ if filename.find(prefix) == 0 and not filename[pre_len].isalpha():
+ index = filename[pre_len:].find("-")
+ if index == 0 and max_num == 0:
+ load_finetune_checkpoint_path = os.path.join(load_finetune_checkpoint_dir, filename)
+ elif index not in (0, -1):
+ name_split = name_ext[-2].split('_')
+ if (steps_per_epoch != int(name_split[len(name_split) - 1])) \
+ or (epoch_num != int(filename[pre_len + index + 1:pre_len + index + 2])):
+ continue
+ num = filename[pre_len + 1:pre_len + index]
+ if int(num) > max_num:
+ max_num = int(num)
+ load_finetune_checkpoint_path = os.path.join(load_finetune_checkpoint_dir, filename)
+    if load_finetune_checkpoint_path is None:
+        raise FileNotFoundError("No checkpoint with prefix {} found in {}".format(
+            prefix, load_finetune_checkpoint_dir))
+    return load_finetune_checkpoint_path
+
+
+class BertLearningRate(LearningRateSchedule):
+ """Warmup-decay learning rate for Bert network."""
+
+ def __init__(self, learning_rate, end_learning_rate, warmup_steps, decay_steps, power):
+ super(BertLearningRate, self).__init__()
+ self.warmup_flag = False
+ if warmup_steps > 0:
+ self.warmup_flag = True
+ self.warmup_lr = WarmUpLR(learning_rate, warmup_steps)
+ self.decay_lr = PolynomialDecayLR(learning_rate, end_learning_rate, decay_steps, power)
+ self.warmup_steps = Tensor(np.array([warmup_steps]).astype(np.float32))
+
+ self.greater = P.Greater()
+ self.one = Tensor(np.array([1.0]).astype(np.float32))
+ self.cast = P.Cast()
+
+ def construct(self, global_step):
+ """Construct the trainer of Bert."""
+ decay_lr = self.decay_lr(global_step)
+ if self.warmup_flag:
+ is_warmup = self.cast(self.greater(self.warmup_steps, global_step), mstype.float32)
+ warmup_lr = self.warmup_lr(global_step)
+ lr = (self.one - is_warmup) * decay_lr + is_warmup * warmup_lr
+ else:
+ lr = decay_lr
+ return lr
+
+
+def convert_labels_to_index(label_list):
+ """Convert label_list to indices for NER task."""
+ label2id = collections.OrderedDict()
+ label2id["O"] = 0
+ prefix = ["S_", "B_", "M_", "E_"]
+ index = 0
+ for label in label_list:
+ for pre in prefix:
+ index += 1
+ sub_label = pre + label
+ label2id[sub_label] = index
+ return label2id
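+
+# Example (hypothetical labels): convert_labels_to_index(["PER"]) returns
+# OrderedDict([("O", 0), ("S_PER", 1), ("B_PER", 2), ("M_PER", 3), ("E_PER", 4)])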
+
+
+def _get_poly_lr(global_step, lr_init, lr_end, lr_max, warmup_steps, total_steps, poly_power):
+ """
+ Generate learning rate array.
+
+ Args:
+ global_step(int): current step
+ lr_init(float): init learning rate
+ lr_end(float): end learning rate
+ lr_max(float): max learning rate
+        warmup_steps(int): number of warmup steps
+        total_steps(int): total number of training steps
+ poly_power(int): poly learning rate power
+
+ Returns:
+ np.array, learning rate array
+ """
+ lr_each_step = []
+ if warmup_steps != 0:
+ inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
+ else:
+ inc_each_step = 0
+ for i in range(total_steps):
+ if i < warmup_steps:
+ lr = float(lr_init) + inc_each_step * float(i)
+ else:
+ base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
+ lr = float(lr_max - lr_end) * (base ** poly_power)
+ lr = lr + lr_end
+ if lr < 0.0:
+ lr = 0.0
+ lr_each_step.append(lr)
+
+ learning_rate = np.array(lr_each_step).astype(np.float32)
+ current_step = global_step
+ learning_rate = learning_rate[current_step:]
+ return learning_rate
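+
+# Worked example (toy values): _get_poly_lr(0, lr_init=0.0, lr_end=0.0,
+# lr_max=0.1, warmup_steps=2, total_steps=4, poly_power=1) returns
+# [0.0, 0.05, 0.1, 0.05]: linear warmup for two steps, then polynomial decay.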
+
+
+def get_bert_thor_lr(lr_max=0.0034, lr_min=3.244e-05, lr_power=1.0, lr_total_steps=30000):
+ """Construct the trainer of Bert."""
+ learning_rate = _get_poly_lr(global_step=0, lr_init=0.0, lr_end=lr_min, lr_max=lr_max, warmup_steps=0,
+ total_steps=lr_total_steps, poly_power=lr_power)
+ return Tensor(learning_rate)
+
+
+def get_bert_thor_damping(damping_max=5e-2, damping_min=1e-6, damping_power=1.0, damping_total_steps=30000):
+ """Construct the trainer of Bert."""
+ damping = _get_poly_lr(global_step=0, lr_init=0.0, lr_end=damping_min, lr_max=damping_max, warmup_steps=0,
+ total_steps=damping_total_steps, poly_power=damping_power)
+ return Tensor(damping)
diff --git a/vega/common/__init__.py b/vega/common/__init__.py
index 2e0f2995..bfca7ab5 100644
--- a/vega/common/__init__.py
+++ b/vega/common/__init__.py
@@ -1,4 +1,4 @@
-from .utils import init_log, module_existed, update_dict, copy_search_file
+from .utils import init_log, close_log, module_existed, update_dict, copy_search_file
from .utils import update_dict_with_flatten_keys, switch_directory
from .config import Config
from .file_ops import FileOps
diff --git a/vega/common/backend_register.py b/vega/common/backend_register.py
index fe927696..d9a9a027 100644
--- a/vega/common/backend_register.py
+++ b/vega/common/backend_register.py
@@ -27,18 +27,26 @@ def set_backend(backend='pytorch', device_category='GPU'):
:param backend: backend type, default pytorch
:type backend: str
"""
- # if "BACKEND_TYPE" in os.environ:
- # return
- if 'NPU_VISIBLE_DEVICES' in os.environ:
- os.environ['NPU-VISIBLE-DEVICES'] = os.environ['NPU_VISIBLE_DEVICES']
+ devices = os.environ.get("NPU_VISIBLE_DEVICES", None) or os.environ.get("NPU-VISIBLE-DEVICES", None)
+ if devices:
+ os.environ['NPU_VISIBLE_DEVICES'] = devices
# CUDA visible
if 'CUDA_VISIBLE_DEVICES' in os.environ:
os.environ['DEVICE_CATEGORY'] = 'GPU'
- elif 'NPU-VISIBLE-DEVICES' in os.environ:
+ elif 'NPU_VISIBLE_DEVICES' in os.environ:
os.environ['DEVICE_CATEGORY'] = 'NPU'
- if 'RANK_TABLE_FILE' in os.environ:
- os.environ['ORIGIN_RANK_TABLE_FILE'] = os.environ['RANK_TABLE_FILE']
- os.environ['ORIGIN_RANK_SIZE'] = os.environ['RANK_SIZE']
+
+ # CUDA_VISIBLE_DEVICES
+ if device_category.upper() == "GPU" and "CUDA_VISIBLE_DEVICES" not in os.environ:
+ if backend.lower() in ['pytorch', "p"]:
+ import torch
+ os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(
+ [str(x) for x in list(range(torch.cuda.device_count()))])
+ elif backend.lower() in ['tensorflow', "t"]:
+ from tensorflow.python.client import device_lib
+ devices = device_lib.list_local_devices()
+ os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(
+ [x.name.split(":")[2] for x in devices if x.device_type == "GPU"])
# device
if device_category is not None:
@@ -72,12 +80,12 @@ def set_backend(backend='pytorch', device_category='GPU'):
register_networks(backend)
register_modelzoo(backend)
- # register ext modules
- vega_extension_path = os.environ.get("VEGA_EXTENSION_PATH")
- if vega_extension_path:
- sys.path.append(vega_extension_path)
+ # register ascend automl modules
+ ascend_automl_path = os.environ.get("ASCEND_AUTOML_PATH")
+ if ascend_automl_path:
+ sys.path.append(ascend_automl_path)
try:
- import vega_extension
+ import ascend_automl
except ImportError:
pass
# backup config
@@ -121,4 +129,6 @@ def get_devices():
device_category = os.environ.get('DEVICE_CATEGORY', 'CPU')
if device_category == 'GPU':
device_category = 'cuda'
+ if "CUDA_VISIBLE_DEVICES" in os.environ:
+ device_id = int(os.environ["CUDA_VISIBLE_DEVICES"].split(",")[0])
return "{}:{}".format(device_category.lower(), device_id)
diff --git a/vega/common/class_factory.py b/vega/common/class_factory.py
index 66e6064b..abff153a 100644
--- a/vega/common/class_factory.py
+++ b/vega/common/class_factory.py
@@ -200,37 +200,30 @@ def get_instance(cls, type_name, params=None, **kwargs):
if type_name != ClassType.NETWORK:
return t_cls(**_params) if _params else t_cls()
# remove extra params
- params_sig = sig(t_cls.__init__).parameters
- instance = cls._create_instance_params(params_sig, _params, t_cls)
- if not instance:
- extra_param = {k: v for k, v in _params.items() if k not in params_sig}
- _params = {k: v for k, v in _params.items() if k not in extra_param}
- try:
- instance = t_cls(**_params) if _params else t_cls()
- except Exception as ex:
- logging.error("Failed to create instance:{}".format(t_cls))
- raise ex
- for k, v in extra_param.items():
- setattr(instance, k, v)
- return instance
+ params_sig = sig(t_cls).parameters if isfunction(t_cls) else sig(t_cls.__init__).parameters
+ extra_param = {k: v for k, v in _params.items() if k not in params_sig}
+ return cls._create_instance_params(params_sig, _params, extra_param, t_cls)
@classmethod
- def _create_instance_params(cls, params_sig, _params, t_cls):
+ def _create_instance_params(cls, params_sig, _params, extra_param, t_cls):
try:
has_args = any('*' in str(v) and not str(v).startswith('**') for v in params_sig.values())
has_kwargs = any('**' in str(v) for v in params_sig.values())
+ filter_params = {k: v for k, v in _params.items() if k not in extra_param}
if has_args and not has_kwargs:
+ # fun(*args)
return t_cls(*list(_params.values())) if list(_params.values()) else t_cls()
- if not has_args and has_kwargs:
- return t_cls(**_params) if _params else t_cls()
if has_args and has_kwargs:
- if _params and list(_params.values()):
- return t_cls(*list(_params.values()), **_params)
- if _params and not list(_params.values()):
- return t_cls(**_params)
- if not _params and list(_params.values()):
- return t_cls(*list(_params.values()))
- return t_cls()
+ # for connection module: fun(*args, **kwargs)
+ return t_cls(*list(extra_param.values()), **filter_params)
+ if not has_args and has_kwargs:
+ # fun(**kwargs)
+ return t_cls(**_params)
+ # fun(a, b, c=None)
+ instance = t_cls(**filter_params) if filter_params else t_cls()
+ for k, v in extra_param.items():
+ setattr(instance, k, v)
+ return instance
except Exception as ex:
logging.error("Failed to create instance:{}".format(t_cls))
raise ex
diff --git a/vega/common/file_ops.py b/vega/common/file_ops.py
index 88e29d71..62d8aef1 100644
--- a/vega/common/file_ops.py
+++ b/vega/common/file_ops.py
@@ -237,3 +237,16 @@ def exists(cls, path):
:rtype: bool
"""
return os.path.isdir(path) or os.path.isfile(path)
+
+ @classmethod
+ def remove(cls, path):
+ """Remove file."""
+ if not os.path.exists(path):
+ return
+ try:
+ if os.path.isdir(path):
+ shutil.rmtree(path)
+ else:
+ os.remove(path)
+ except Exception:
+            logger.warning(f"Failed to remove file/dir: {path}")
diff --git a/vega/common/general.py b/vega/common/general.py
index 7b60d212..09e15ce7 100644
--- a/vega/common/general.py
+++ b/vega/common/general.py
@@ -34,16 +34,23 @@ class ClusterConfig(ConfigSerializable):
"""Cluster Config."""
master_ip = None
- listen_port = get_available_port()
+ listen_port = get_available_port(min_port=28000, max_port=28999)
slaves = []
standalone_boot = False
num_workers = 0
+ num_nodes = 1
+ num_workers_per_node = 1
+ horovod = False # read-only
+ hccl = False # read-only
+ hccl_port = get_available_port(min_port=29000, max_port=29999)
+ hccl_server_ip = None # read-only
+ enable_broadcast_buffers = False
+ show_all_ranks = False
class Worker(ConfigSerializable):
"""Worker Config."""
- # distributed = False
timeout = 5 * 24 * 3600 # 5 days
eval_count = 10
evaluate_timeout = 0.1
@@ -115,3 +122,7 @@ class General(ConfigSerializable):
requires = []
message_port = None
python_command = sys.executable or "python3"
+ device_evaluate_before_train = True
+ ms_execute_mode = 0 # 0-GRAPH_MODE 1-PYNATIVE_MODE
+ dataset_sink_mode = True
+ security_setting = None
diff --git a/vega/common/message_client.py b/vega/common/message_client.py
index f653d333..b528c00c 100644
--- a/vega/common/message_client.py
+++ b/vega/common/message_client.py
@@ -13,6 +13,7 @@
import logging
import zmq
from vega.common.json_coder import JsonEncoder
+from vega.common.zmq_op import connect
__all__ = ["MessageClient"]
@@ -31,9 +32,7 @@ def __init__(self, ip="127.0.0.1", port=None, timeout=30):
def _init_socket(self):
try:
- context = zmq.Context()
- self.socket = context.socket(zmq.REQ)
- self.socket.connect(f"tcp://{self.ip}:{self.port}")
+ self.socket = connect(ip=self.ip, port=self.port)
self.poller = zmq.Poller()
self.poller.register(self.socket, zmq.POLLIN)
except Exception as e:
diff --git a/vega/common/message_server.py b/vega/common/message_server.py
index c5562a4b..4a6a2f9a 100644
--- a/vega/common/message_server.py
+++ b/vega/common/message_server.py
@@ -11,12 +11,12 @@
"""Message Server."""
import logging
-import zmq
import ast
import os
from threading import Thread
from vega.common.utils import singleton
from vega.common import JsonEncoder
+from vega.common.zmq_op import listen
__all__ = ["MessageServer"]
@@ -30,8 +30,8 @@ class MessageServer(object):
def __init__(self):
"""Initialize message server."""
self.handlers = {}
- self.min_port = 5000
- self.max_port = 7000
+ self.min_port = 27000
+ self.max_port = 27999
self.port = None
self.register_handler("query_task_info", query_task_info)
@@ -41,10 +41,8 @@ def run(self, ip="*"):
return
try:
- context = zmq.Context()
- socket = context.socket(zmq.REP)
- self.port = socket.bind_to_random_port(
- f"tcp://{ip}", min_port=self.min_port, max_port=self.max_port, max_tries=100)
+ (socket, self.port) = listen(
+ ip=ip, min_port=self.min_port, max_port=self.max_port, max_tries=100)
logging.debug("Start message monitor thread.")
monitor_thread = Thread(target=_monitor_socket, args=(socket, self.handlers))
monitor_thread.daemon = True
diff --git a/vega/common/parameter_sharing.py b/vega/common/parameter_sharing.py
new file mode 100644
index 00000000..28c491c0
--- /dev/null
+++ b/vega/common/parameter_sharing.py
@@ -0,0 +1,98 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""This is Search on Network."""
+import json
+import logging
+import vega
+import hashlib
+from vega.common import TaskOps, FileOps
+from vega.common.utils import singleton
+from threading import Lock
+
+_lock = Lock()
+
+
+def calculated_uuid(value):
+ """Create uuid by static names."""
+ value = str(json.dumps(value)) if isinstance(value, dict) else str(value)
+    return hashlib.md5(value.encode()).hexdigest()
+
+
+def add_share_file_path(uuid, file_name):
+ """Share file path."""
+ global _lock
+ with _lock:
+ ParameterSharing().__shared_params__[uuid] = file_name
+ return uuid
+
+
+def pop_share_file_path(uuid):
+ """Pop Shared file path."""
+ global _lock
+ with _lock:
+ cls = ParameterSharing()
+ if not cls.__shared_params__:
+ return None
+ file_name = cls.__shared_params__.pop(uuid)
+ result = FileOps.join_path(cls.sharing_dir, file_name)
+ cls.__popped_files__.append(result)
+ return result
+
+
+@singleton
+class ParameterSharing(object):
+ """Parameter sharing class."""
+
+ __shared_params__ = {}
+ __popped_files__ = []
+
+ def __init__(self):
+ self.sharing_dir = FileOps.join_path(TaskOps().local_base_path, 'parameter_sharing')
+ FileOps.make_dir(self.sharing_dir)
+
+ def push(self, model, name):
+ """Push state dict and save into files."""
+ uuid = calculated_uuid(model.to_desc() if hasattr(model, "to_desc") else str(model))
+ file_name = "{}_{}.{}".format(name, uuid, 'pth' if vega.is_torch_backend() else 'ckpt')
+ saved_file_path = FileOps.join_path(self.sharing_dir, file_name)
+ self._save(model, saved_file_path)
+ add_share_file_path(uuid, saved_file_path)
+ logging.info("push shared weight file uuid:{}".format(uuid))
+ return saved_file_path
+
+ def pop(self, desc):
+ """Pop one file path."""
+ if not self.__shared_params__:
+ return
+ uuid = calculated_uuid(desc)
+ logging.info("pop shared weight file uuid:{}".format(uuid))
+ return pop_share_file_path(uuid)
+
+ def _save(self, model, file_name):
+ if vega.is_torch_backend():
+ import torch
+ torch.save(model.state_dict(), file_name)
+ elif vega.is_ms_backend():
+ from mindspore.train.serialization import save_checkpoint
+ save_checkpoint(model, file_name)
+
+ def _remove(self, file_path):
+ FileOps.remove(file_path)
+
+ def remove(self):
+ """Remove file has been popped."""
+ while self.__popped_files__:
+ self._remove(self.__popped_files__.pop())
+
+ def clear(self):
+ """Clear all shared params and remove files."""
+ self.__shared_params__ = {}
+ self.__popped_files__ = []
+ self._remove(self.sharing_dir)
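+
+
+# Usage sketch (hypothetical model object with a to_desc() method):
+#   sharing = ParameterSharing()
+#   path = sharing.push(model, "backbone")  # weights saved, keyed by desc hash
+#   same = sharing.pop(model.to_desc())     # retrieves the file for an identical desc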
diff --git a/vega/common/pareto_front.py b/vega/common/pareto_front.py
index 4010a9b5..d5aa037f 100644
--- a/vega/common/pareto_front.py
+++ b/vega/common/pareto_front.py
@@ -13,14 +13,18 @@
import numpy as np
-def get_pareto(scores, index=False):
+def get_pareto(scores, index=False, max_nums=-1, choice_column=0, choice='normal', seed=None):
"""Get pareto front."""
# TODO Get a specified number of samples
data = scores
if index:
data = scores[:, 1:]
pareto_indexes = get_pareto_index(data)
- return scores[pareto_indexes]
+ res = scores[pareto_indexes]
+ if max_nums == -1 or len(res) <= max_nums:
+ return res
+    if choice == 'normal':
+        return normal_selection(res, max_nums, choice_column, seed)
+    # fall back to simple truncation when no sampling strategy matches
+    return res[:max_nums]
def get_pareto_index(scores):
@@ -33,3 +37,19 @@ def get_pareto_index(scores):
pareto_indexes[i] = False
break
return pareto_indexes
+
+
+def normal_selection(outs, max_nums, choice_column=0, seed=None):
+    """Randomly select max_nums records, biased towards higher scores."""
+    if seed:
+        np.random.seed(seed)
+    data = outs[:, choice_column].tolist()
+    # log-rank weights: larger ranks receive larger weights
+    prob = [round(np.log(i + 1e-2), 2) for i in range(1, len(data) + 1)]
+    prob_temp = list(prob)
+    # add the weight of each rank to the record holding that rank
+    sorted_ind = np.argsort(data)
+    for rank, ind in enumerate(sorted_ind):
+        prob[ind] += prob_temp[rank]
+    normalization = [float(i) / float(sum(prob)) for i in prob]
+    idx = np.random.choice(len(data), max_nums, replace=False, p=normalization)
+    return outs[idx]
diff --git a/vega/common/user_config.py b/vega/common/user_config.py
index 34443fcf..c218dcdb 100644
--- a/vega/common/user_config.py
+++ b/vega/common/user_config.py
@@ -71,7 +71,7 @@ def merge_reference(child):
ref_dict = deepcopy(UserConfig().data)
for key in ref.split('.'):
ref_dict = ref_dict.get(key)
- not_merge_keys = ['callbacks', 'lazy_built']
+    not_merge_keys = ['callbacks', 'lazy_built', 'max_train_steps', 'with_train', 'with_valid']
for key in not_merge_keys:
if key in ref_dict:
ref_dict.pop(key)
diff --git a/vega/common/utils.py b/vega/common/utils.py
index a8d2c5e5..247c6b0e 100644
--- a/vega/common/utils.py
+++ b/vega/common/utils.py
@@ -110,6 +110,13 @@ def init_log(level, log_path="./logs/", log_file="log.txt"):
logging.getLogger().addHandler(fh)
pil_logger = logging.getLogger('PIL')
pil_logger.setLevel(logging.INFO)
+ return fh
+
+
+def close_log(fh: logging.Handler):
+ """Close log."""
+ fh.close()
+ logging.getLogger().removeHandler(fh)
def lazy(func):
diff --git a/vega/common/wrappers.py b/vega/common/wrappers.py
index 2f4edc36..db7fe44c 100644
--- a/vega/common/wrappers.py
+++ b/vega/common/wrappers.py
@@ -9,9 +9,12 @@
# MIT License for more details.
"""Provide wrapper functions."""
+
+import os
from inspect import signature as sig
from functools import wraps
-from vega.common import ClassFactory
+import vega
+from vega.common import ClassFactory, init_log, close_log, General
def metric(name=None):
@@ -36,7 +39,53 @@ def wrapper(*args, **kwargs):
params_sig = sig(func).parameters
params = {param: value for param, value in kwargs.items() if param in params_sig}
return func(*args, **params)
-
return wrapper
-
return decorator
+
+
+def train_process_wrapper(func):
+ """Train process wrapper."""
+ @wraps(func)
+ def wrapper(self, *args, **kwargs):
+ """Wrap method."""
+ log_type = "worker"
+ worker_type = getattr(self, "worker_type", None)
+ if worker_type is not None:
+ worker_type_value = worker_type.value
+ else:
+ worker_type_value = None
+ if worker_type_value == 3:
+ log_type = "host_evaluator"
+ elif worker_type_value == 5:
+ log_type = "device_evaluator"
+ fh = init_log(level=General.logger.level,
+ log_file=f"{self.step_name}_{log_type}_{self.worker_id}.log",
+ log_path=self.local_log_path)
+ if not getattr(self, "hccl", False):
+ pop_rank_envs()
+ r = func(self, *args, **kwargs)
+ if not getattr(self, "hccl", False):
+ restore_rank_envs()
+ close_log(fh)
+ return r
+ return wrapper
+
+
+_envs = {}
+
+
+def pop_rank_envs():
+ """Pop rank envs."""
+ envs = ["RANK_TABLE_FILE", "RANK_SIZE", "RANK_ID"]
+ global _envs
+ for env in envs:
+ if env in os.environ:
+ _envs[env] = os.environ[env]
+ os.environ.pop(env)
+
+
+def restore_rank_envs():
+ """Restore rank envs."""
+ global _envs
+ for env in _envs:
+ os.environ[env] = _envs[env]
diff --git a/vega/common/zmq_op.py b/vega/common/zmq_op.py
new file mode 100644
index 00000000..67e38c5e
--- /dev/null
+++ b/vega/common/zmq_op.py
@@ -0,0 +1,30 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""ZMQ operation."""
+
+import zmq
+
+
+def listen(ip, min_port, max_port, max_tries):
+ """Listen on the server."""
+ context = zmq.Context()
+ socket = context.socket(zmq.REP)
+ port = socket.bind_to_random_port(
+ f"tcp://{ip}", min_port=min_port, max_port=max_port, max_tries=100)
+ return socket, port
+
+
+def connect(ip, port):
+ """Connect to server."""
+ context = zmq.Context()
+ socket = context.socket(zmq.REQ)
+ socket.connect(f"tcp://{ip}:{port}")
+ return socket
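+
+
+# Minimal REQ/REP round-trip sketch (assumes a reachable local address):
+#   server, port = listen("127.0.0.1", min_port=27000, max_port=27999, max_tries=100)
+#   client = connect("127.0.0.1", port)
+#   client.send_json({"action": "query_task_info"})
+#   request = server.recv_json()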
diff --git a/vega/core/pipeline/__init__.py b/vega/core/pipeline/__init__.py
index 105b214b..4b2727fc 100644
--- a/vega/core/pipeline/__init__.py
+++ b/vega/core/pipeline/__init__.py
@@ -9,4 +9,6 @@
"train_pipe_step": ["TrainPipeStep"],
"benchmark_pipe_step": ["BenchmarkPipeStep"],
"multi_task_pipe_step": ["MultiTaskPipeStep"],
+ "horovod_train_step": ["HorovodTrainStep"],
+ "hccl_train_step": ["HcclTrainStep"],
})
diff --git a/vega/core/pipeline/generator.py b/vega/core/pipeline/generator.py
index 7459149d..37f51baf 100644
--- a/vega/core/pipeline/generator.py
+++ b/vega/core/pipeline/generator.py
@@ -21,8 +21,9 @@
from vega.common.task_ops import TaskOps
from vega.report import ReportServer, ReportClient
from vega.common.config import Config
-from vega.common import update_dict, SearchableRegister
+from vega.common import update_dict
from vega.common.utils import remove_np_value
+from vega.common.parameter_sharing import ParameterSharing
class Generator(object):
@@ -43,6 +44,7 @@ def is_completed(self):
def sample(self):
"""Sample a work id and model from search algorithm."""
out = []
+ kwargs_list = []
num_samples = 1
for _ in range(10):
res = self.search_alg.search()
@@ -54,45 +56,51 @@ def sample(self):
if num_samples == 0:
return None
for sample in res:
- if isinstance(sample, dict):
- id = sample["worker_id"]
- desc = sample["encoded_desc"]
- sample.pop("worker_id")
- sample.pop("encoded_desc")
- kwargs = sample
- sample = _split_sample((id, desc))
- else:
- kwargs = {}
- sample = _split_sample(sample)
- if hasattr(self, "objective_keys") and self.objective_keys:
- kwargs["objective_keys"] = self.objective_keys
- (id, desc, hps) = sample
- if SearchableRegister().has_searchable():
- hps = SearchableRegister().update(desc)
- desc = PipeStepConfig.model.model_desc
- else:
- desc = self._decode_hps(desc)
- hps = self._decode_hps(hps)
- if "modules" in desc:
- PipeStepConfig.model.model_desc = deepcopy(desc)
- elif "network" in desc:
- origin_desc = PipeStepConfig.model.model_desc
- model_desc = update_dict(desc["network"], origin_desc)
- PipeStepConfig.model.model_desc = model_desc
- desc.pop('network')
- desc.update(model_desc)
-
- (hps, desc) = self._split_hps_desc(hps, desc)
-
+ (id, desc, hps, kwargs) = self._get_hps_desc_from_sample(sample)
if not vega.quota().verify_sample(desc) or not vega.quota().verify_affinity(desc):
continue
-
- ReportClient().update(General.step_name, id, desc=desc, hps=hps, **kwargs)
out.append((id, desc, hps))
+ kwargs_list.append(kwargs)
if len(out) >= num_samples:
break
+        for i, (id, desc, hps) in enumerate(out[:num_samples]):
+            ReportClient().update(General.step_name, id, desc=desc, hps=hps, **kwargs_list[i])
return out[:num_samples]
+ def _get_hps_desc_from_sample(self, sample):
+ if isinstance(sample, dict):
+ id = sample["worker_id"]
+ desc = sample["encoded_desc"]
+ sample.pop("worker_id")
+ sample.pop("encoded_desc")
+ kwargs = sample
+ sample = _split_sample((id, desc))
+ else:
+ kwargs = {}
+ sample = _split_sample(sample)
+ if hasattr(self, "objective_keys") and self.objective_keys:
+ kwargs["objective_keys"] = self.objective_keys
+ (id, desc, hps) = sample
+ if hasattr(self.search_alg.search_space, "to_desc"):
+ desc = self.search_alg.search_space.to_desc(desc)
+ else:
+ desc = self._decode_hps(desc)
+ hps = self._decode_hps(hps)
+ network_desc = None
+ if "modules" in desc:
+ PipeStepConfig.model.model_desc = deepcopy(desc)
+ elif "network" in desc:
+ origin_desc = PipeStepConfig.model.model_desc
+ network_desc = update_dict(desc["network"], origin_desc)
+ PipeStepConfig.model.model_desc = network_desc
+ desc.pop('network')
+
+ (hps, desc) = self._split_hps_desc(hps, desc)
+ if network_desc is not None:
+ desc.update(network_desc)
+
+ return id, desc, hps, kwargs
+
def _split_hps_desc(self, hps, desc):
if "type" not in desc or desc.get("type") != "Sequential":
del_items = []
@@ -119,10 +127,11 @@ def update(self, step_name, worker_id):
record = ReportClient().get_record(step_name, worker_id)
logging.debug("Get Record=%s", str(record))
self.search_alg.update(record.serialize())
- try:
- self.dump()
- except TypeError:
- logging.warning("The Generator contains object which can't be pickled.")
+ ParameterSharing().remove()
+ # try:
+ # self.dump()
+ # except Exception:
+ # logging.warning("The Generator contains object which can't be pickled.")
logging.info(f"Update Success. step_name={step_name}, worker_id={worker_id}")
logging.info("Best values: %s", ReportServer().print_best(step_name=General.step_name))
diff --git a/vega/core/pipeline/hccl_train_step.py b/vega/core/pipeline/hccl_train_step.py
new file mode 100644
index 00000000..985bb96c
--- /dev/null
+++ b/vega/core/pipeline/hccl_train_step.py
@@ -0,0 +1,120 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""HCCL fully train."""
+
+import os
+import logging
+import json
+import vega
+from .train_pipe_step import TrainPipeStep
+from vega.common.general import General
+from vega.common.class_factory import ClassFactory, ClassType
+from vega.common import Status, TaskOps
+from vega.report import ReportServer
+from vega.core.scheduler import create_master
+from vega.trainer.conf import TrainerConfig
+
+logger = logging.getLogger(__name__)
+
+
+@ClassFactory.register(ClassType.PIPE_STEP)
+class HcclTrainStep(TrainPipeStep):
+ """TrainPipeStep is the implementation class of PipeStep.
+
+ Fully train is the last pipe step in pipeline, we provide horovrd or local trainer
+ for user to choose.
+ """
+
+ def do(self):
+ """Start to run fully train with horovod or local trainer."""
+ logger.info("HcclTrainStep started.")
+ General.cluster.hccl = True
+ records = self._get_current_step_records()
+ logger.debug("load pipestep records: {}".format(records))
+ self.num_models = len(records)
+ self.num_epochs = self.num_models * TrainerConfig.epochs
+ self.update_status(Status.running)
+ self._set_nccl_ip_port()
+ self._new_rank_table_file()
+ self._set_ms_env()
+ self._train_multi_models(records)
+ ReportServer().output_step_all_records(step_name=self.task.step_name)
+ ReportServer().backup_output_path()
+ self.update_status(Status.finished)
+
+ def train_model(self, trainer):
+ """Train HCCL model."""
+ origin_worker_id = trainer.worker_id
+
+ General.parallel_fully_train = True
+ General.devices_per_trainer = 1
+ General._parallel = True
+
+ self.master = create_master()
+ for i in range(General.cluster.num_workers):
+ worker_id = f"{origin_worker_id}-{i}" if i != 0 else origin_worker_id
+ trainer.worker_id = worker_id
+ trainer.hccl = True
+ self.master.run(trainer)
+ self.master.join()
+
+ evaluator = self._get_evaluator(origin_worker_id)
+ if evaluator:
+ self.master.run(evaluator)
+ self.master.join()
+
+ self.master.close()
+
+ def _set_nccl_ip_port(self):
+ if not vega.is_torch_backend():
+ return
+ rank_file = os.environ["RANK_TABLE_FILE"]
+ with open(rank_file, 'r') as f:
+ data = json.loads(f.read())
+ General.cluster.hccl_server_ip = data['server_list'][0]['server_id']
+ if "server_port" in data['server_list'][0]:
+ General.cluster.hccl_port = int(data['server_list'][0]["server_port"])
+ os.environ["vega_pytorch_hccl_port"] = {General.cluster.hccl_port}
+ logger.info(f"HCCL server: tcp://{General.cluster.hccl_server_ip}:{General.cluster.hccl_port}")
+
+ def _new_rank_table_file(self):
+ if not vega.is_torch_backend():
+ return
+ rank_file = os.environ["RANK_TABLE_FILE"]
+ with open(rank_file, 'r') as f:
+ data = json.loads(f.read())
+ device_ids = os.environ["NPU_VISIBLE_DEVICES"].split(",")
+ changed = False
+ num_server = len(data['server_list'])
+ rank_size = 0
+ rank_index = 0
+ for server_id in range(num_server):
+ origin_devices = data['server_list'][server_id]['device']
+ if len(device_ids) != len(origin_devices):
+ changed = True
+ new_devices = []
+ for device in origin_devices:
+ if device["device_id"] in device_ids:
+ device["rank_id"] = str(rank_index)
+ rank_index += 1
+ new_devices.append(device)
+ data['server_list'][server_id]['device'] = new_devices
+ rank_size += len(new_devices)
+ if changed:
+ rank_file = os.path.join(TaskOps().temp_path, "rank_table_file.json")
+ with open(rank_file, "w") as f:
+ json.dump(data, f)
+ os.environ["RANK_TABLE_FILE"] = rank_file
+ os.environ["RANK_SIZE"] = str(rank_size)
+
+ def _set_ms_env(self):
+ if vega.is_ms_backend():
+ os.environ["MINDSPORE_HCCL_CONFIG_PATH"] = os.environ["RANK_TABLE_FILE"]
diff --git a/vega/core/pipeline/horovod/horovod_train.py b/vega/core/pipeline/horovod/horovod_train.py
index 2a1c100c..4d768d12 100644
--- a/vega/core/pipeline/horovod/horovod_train.py
+++ b/vega/core/pipeline/horovod/horovod_train.py
@@ -41,6 +41,9 @@
General.from_dict(cf_content.get('general_config'))
PipeStepConfig.from_dict(cf_content.get('pipe_step_config'))
cls_trainer = ClassFactory.get_cls('trainer', "Trainer")
-# for record in records:
-trainer = cls_trainer(model_desc=model_desc, id=worker_id)
+
+device_id = os.environ["CUDA_VISIBLE_DEVICES"].split(",")[hvd.local_rank()]
+os.environ["CUDA_VISIBLE_DEVICES"] = device_id
+
+trainer = cls_trainer(model_desc=model_desc, id=worker_id, horovod=True)
trainer.train_process()
diff --git a/vega/core/pipeline/horovod_train_step.py b/vega/core/pipeline/horovod_train_step.py
new file mode 100644
index 00000000..912e9c33
--- /dev/null
+++ b/vega/core/pipeline/horovod_train_step.py
@@ -0,0 +1,81 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Horovod fully train."""
+
+import os
+import logging
+import subprocess
+import pickle
+import vega
+from .train_pipe_step import TrainPipeStep
+from vega.common.general import General
+from vega.common.class_factory import ClassFactory, ClassType
+from vega.common import Status
+from vega.report import ReportServer
+from vega.core.pipeline.conf import PipeStepConfig
+from vega.trainer.conf import TrainerConfig
+
+logger = logging.getLogger(__name__)
+
+
+@ClassFactory.register(ClassType.PIPE_STEP)
+class HorovodTrainStep(TrainPipeStep):
+ """TrainPipeStep is the implementation class of PipeStep.
+
+ Fully train is the last pipe step in pipeline, we provide horovrd or local trainer
+ for user to choose.
+ """
+
+ def do(self):
+ """Start to run fully train with horovod or local trainer."""
+ logger.info("HorovodTrainStep started.")
+ General.cluster.horovod = True
+ records = self._get_current_step_records()
+ logger.debug("load pipestep records: {}".format(records))
+ self._set_cluster_info()
+ self.num_models = len(records)
+ self.num_epochs = self.num_models * TrainerConfig.epochs
+ self.update_status(Status.running)
+ self._train_multi_models(records)
+ ReportServer().output_step_all_records(step_name=self.task.step_name)
+ ReportServer().backup_output_path()
+ self.update_status(Status.finished)
+
+ def _set_cluster_info(self):
+ General.cluster.num_workers_per_node = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))
+ General.cluster.num_workers = General.cluster.num_workers_per_node * General.cluster.num_nodes
+
+ def train_model(self, trainer):
+ """Train horovod model."""
+ pwd_dir = os.path.dirname(os.path.abspath(__file__))
+ cf_file = os.path.join(self.task.temp_path, 'cf.pickle')
+ cf_content = {'registry': ClassFactory.__registry__,
+ 'general_config': General().to_dict(),
+ 'pipe_step_config': PipeStepConfig().to_dict(),
+ 'model_desc': trainer.model_desc,
+ 'worker_id': trainer.worker_id}
+ with open(cf_file, 'wb') as f:
+ pickle.dump(cf_content, f)
+ if os.environ.get('DLS_TASK_NUMBER') is None:
+ # local cluster
+ worker_ips = '127.0.0.1'
+ if General.cluster.master_ip is not None and General.cluster.master_ip != '127.0.0.1':
+ worker_ips = General.cluster.master_ip
+ for ip in General.cluster.slaves:
+ worker_ips = worker_ips + ',' + ip
+ cmd = ['bash', f'{pwd_dir}/horovod/run_horovod_train.sh',
+ str(General.cluster.num_workers), cf_file, worker_ips, General.python_command]
+ else:
+ # Roma
+ cmd = ['bash', '/home/work/run_horovod_train.sh',
+ str(General.cluster.num_workers), cf_file]
+ proc = subprocess.Popen(cmd, env=os.environ)
+ proc.wait()
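The `train_model()` method above hands the trainer configuration to the Horovod workers through a pickle file rather than through process memory. A minimal sketch of that handoff, with illustrative contents:

```python
# Sketch of the cf.pickle handoff in train_model(); the keys mirror the
# cf_content dict above, the values are illustrative placeholders.
import pickle

cf_content = {"general_config": {"backend": "pytorch"},
              "pipe_step_config": {"type": "HorovodTrainStep"},
              "model_desc": {"type": "ResNet"},
              "worker_id": 1}

with open("cf.pickle", "wb") as f:   # written by the pipe step
    pickle.dump(cf_content, f)

with open("cf.pickle", "rb") as f:   # read back by each Horovod worker
    restored = pickle.load(f)
assert restored["worker_id"] == 1
```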
diff --git a/vega/core/pipeline/multi_task_pipe_step.py b/vega/core/pipeline/multi_task_pipe_step.py
index 9a5aa116..3228d156 100644
--- a/vega/core/pipeline/multi_task_pipe_step.py
+++ b/vega/core/pipeline/multi_task_pipe_step.py
@@ -8,7 +8,8 @@
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# MIT License for more details.
-"""Fully Train PipeStep that used in Pipeline."""
+"""Multi-task pipe step."""
+
import logging
from vega.common.general import General
from vega.common.class_factory import ClassFactory, ClassType
@@ -33,7 +34,6 @@ class MultiTaskPipeStep(TrainPipeStep):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
- self._distributed_training = not General._parallel and TrainerConfig.distributed
logger.info("init MultiTaskPipeStep...")
def do(self):
@@ -56,10 +56,7 @@ def _train_single_model(self, model_desc, model_id, hps, multi_task):
logging.debug("update record=%s", str(record))
trainer = cls_trainer(model_desc=model_desc, id=model_id, hps=hps, multi_task=multi_task)
ReportClient().update(**record.to_dict())
- if self._distributed_training:
- self._do_distributed_fully_train(trainer)
- else:
- self._do_single_fully_train(trainer)
+ self.train_model(trainer)
def _train_multi_task(self):
from copy import deepcopy
diff --git a/vega/core/pipeline/pipe_step.py b/vega/core/pipeline/pipe_step.py
index 9fac7f7f..da8885d9 100644
--- a/vega/core/pipeline/pipe_step.py
+++ b/vega/core/pipeline/pipe_step.py
@@ -35,8 +35,6 @@ def __init__(self, name=None, **kwargs):
self.end_time = None
self.num_epochs = None
self.num_models = None
- # TODO
- # ReportServer().restore()
def __new__(cls, *args, **kwargs):
"""Create pipe step instance by ClassFactory."""
@@ -45,7 +43,6 @@ def __new__(cls, *args, **kwargs):
def do(self, *args, **kwargs):
"""Do the main task in this pipe step."""
- # set self.num_models, self.epochs and self.status=running/finished
pass
def save_info(self):
diff --git a/vega/core/pipeline/pipeline.py b/vega/core/pipeline/pipeline.py
index 5399606b..ec5b2f6a 100644
--- a/vega/core/pipeline/pipeline.py
+++ b/vega/core/pipeline/pipeline.py
@@ -22,7 +22,8 @@
from vega.common.general import General
from .conf import PipeStepConfig, PipelineConfig
from vega.report import ReportServer
-from vega.common import MessageServer
+from vega.common.message_server import MessageServer
+from vega.common.parameter_sharing import ParameterSharing
logger = logging.getLogger(__name__)
diff --git a/vega/core/pipeline/train_pipe_step.py b/vega/core/pipeline/train_pipe_step.py
index a2ff0bce..7e0da122 100644
--- a/vega/core/pipeline/train_pipe_step.py
+++ b/vega/core/pipeline/train_pipe_step.py
@@ -9,10 +9,9 @@
# MIT License for more details.
"""Fully Train PipeStep that used in Pipeline."""
+
import os
import logging
-import subprocess
-import pickle
import vega
from .pipe_step import PipeStep
from vega.common.general import General
@@ -36,7 +35,6 @@ class TrainPipeStep(PipeStep):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
- self._distributed_training = not General._parallel and TrainerConfig.distributed
logger.info("init TrainPipeStep...")
def do(self):
@@ -75,7 +73,14 @@ def _get_current_step_records(self):
record.step_name = step_name
return records
- def _train_single_model(self, model_desc=None, hps=None, model_id=None, weights_file=None):
+ def _train_multi_models(self, records):
+ for record in records:
+ weights_file = record.weights_file if PipeStepConfig.pipe_step.get("load_weights", True) else None
+ trainer = self._build_trainer(
+ model_desc=record.desc, hps=record.hps, model_id=record.worker_id, weights_file=weights_file)
+ self.train_model(trainer)
+
+ def _build_trainer(self, model_desc=None, hps=None, model_id=None, weights_file=None):
cls_trainer = ClassFactory.get_cls(ClassType.TRAINER, PipeStepConfig.trainer.type)
step_name = self.task.step_name
if model_desc is not None:
@@ -91,111 +96,16 @@ def _train_single_model(self, model_desc=None, hps=None, model_id=None, weights_
if vega.is_torch_backend() and General._resume:
trainer.load_checkpoint = True
trainer._resume_training = True
- if self._distributed_training:
- self._do_distributed_fully_train(trainer)
- else:
- self._do_single_fully_train(trainer)
+ return trainer
- def _train_single_gpu_model(self, trainer):
+ def train_model(self, trainer):
+ """Train model."""
evaluator = self._get_evaluator(trainer.worker_id)
self.master.run(trainer, evaluator)
- def _train_single_npu_model(self, trainer):
- temp_rank_file = os.environ.get('RANK_TABLE_FILE', None)
- temp_rank_size = os.environ['RANK_SIZE']
- os.environ.pop('RANK_TABLE_FILE', None)
- os.environ['RANK_SIZE'] = '1'
- evaluator = self._get_evaluator(trainer.worker_id)
- self.master.run(trainer, evaluator)
- if temp_rank_file is not None:
- os.environ['RANK_TABLE_FILE'] = temp_rank_file
- os.environ['RANK_SIZE'] = temp_rank_size
-
- def _do_single_fully_train(self, trainer):
- if os.environ['DEVICE_CATEGORY'] == 'GPU':
- self._train_single_gpu_model(trainer)
- elif os.environ['DEVICE_CATEGORY'] == 'NPU':
- self._train_single_npu_model(trainer)
-
- def _train_multi_models(self, records):
- for record in records:
- weights_file = record.weights_file if PipeStepConfig.pipe_step.get("load_weights", True) else None
- self._train_single_model(
- model_desc=record.desc, hps=record.hps, model_id=record.worker_id, weights_file=weights_file)
-
def _get_evaluator(self, worker_id):
if not PipeStepConfig.evaluator_enable:
return None
cls_evaluator = ClassFactory.get_cls('evaluator', "Evaluator")
evaluator = cls_evaluator({"step_name": self.task.step_name, "worker_id": worker_id})
return evaluator
-
- def _do_horovod_fully_train(self, trainer):
- # records = self._get_current_step_records()
- pwd_dir = os.path.dirname(os.path.abspath(__file__))
- cf_file = os.path.join(self.task.temp_path, 'cf.pickle')
- cf_content = {'registry': ClassFactory.__registry__,
- 'general_config': General().to_dict(),
- 'pipe_step_config': PipeStepConfig().to_dict(),
- 'model_desc': trainer.model_desc,
- 'worker_id': trainer.worker_id}
- with open(cf_file, 'wb') as f:
- pickle.dump(cf_content, f)
- if os.environ.get('DLS_TASK_NUMBER') is None:
- # local cluster
- worker_ips = '127.0.0.1'
- if General.cluster.master_ip is not None and General.cluster.master_ip != '127.0.0.1':
- worker_ips = General.cluster.master_ip
- for ip in General.cluster.slaves:
- worker_ips = worker_ips + ',' + ip
- cmd = ['bash', f'{pwd_dir}/horovod/run_horovod_train.sh',
- str(self.world_device_size), cf_file, worker_ips, General.python_command]
- else:
- # Roma
- cmd = ['bash', '/home/work/run_horovod_train.sh',
- str(self.world_device_size), cf_file]
- proc = subprocess.Popen(cmd, env=os.environ)
- proc.wait()
-
- def _do_hccl_fully_train(self, trainer):
- origin_worker_id = trainer.worker_id
- model_desc = trainer.model_desc
- del trainer
-
- os.environ['RANK_SIZE'] = os.environ['ORIGIN_RANK_SIZE']
- os.environ['RANK_TABLE_FILE'] = os.environ['ORIGIN_RANK_TABLE_FILE']
- origin_parallel_fully_train = General.parallel_fully_train
- origin_parallel = General._parallel
- General.parallel_fully_train = True
- General.dft = True
- General._parallel = True
-
- cls_trainer = ClassFactory.get_cls(ClassType.TRAINER, PipeStepConfig.trainer.type)
- self.master = create_master()
- workers_num = int(os.environ['RANK_SIZE'])
- for i in range(workers_num):
- worker_id = "{}-{}".format(origin_worker_id, i)
- trainer = cls_trainer(model_desc, id=worker_id)
- evaluator = self._get_evaluator(worker_id) if os.environ['DEVICE_ID'] == "0" else None
- self.master.run(trainer, evaluator)
-
- self.master.join()
- self.master.close()
- General.parallel_fully_train = origin_parallel_fully_train
- General.dft = False
- General._parallel = origin_parallel
-
- def _do_distributed_fully_train(self, trainer):
- if os.environ['DEVICE_CATEGORY'] == 'GPU':
- self._do_horovod_fully_train(trainer)
- elif os.environ['DEVICE_CATEGORY'] == 'NPU':
- self._do_hccl_fully_train(trainer)
-
- @property
- def world_device_size(self):
- """World device size is world size * device count in each world."""
- import torch
- world_size = General.env.world_size
- device_nums = torch.cuda.device_count()
- num_devices = world_size * device_nums
- return num_devices
diff --git a/vega/core/scheduler/dask_env.py b/vega/core/scheduler/dask_env.py
index ce16923e..68071d8f 100644
--- a/vega/core/scheduler/dask_env.py
+++ b/vega/core/scheduler/dask_env.py
@@ -13,16 +13,17 @@
The DaskEnv class, which is used by the Master to initialize and set up the
basic dask-distributed environment.
"""
+
+import json
import os
-import subprocess
import logging
import time
from datetime import datetime
-from distributed import Client
from vega.trainer import utils
from vega.common.file_ops import FileOps
from vega.common.general import General
-import shutil
+from vega.core.scheduler.run_dask import get_client, run_scheduler,\
+ run_local_worker, run_remote_worker, get_address
class DaskEnv(object):
@@ -77,6 +78,9 @@ def _set_slave_num(self, device_num):
self.world_size = self.slave_num * self.slave_proc_num
if General.cluster.standalone_boot:
self.world_size = General.cluster.num_workers
+ General.cluster.num_workers = self.world_size
+ General.cluster.num_nodes = self.slave_num
+ General.cluster.num_workers_per_node = self.slave_proc_num
return
def _get_slave_device_num(self):
@@ -92,7 +96,7 @@ def _get_slave_device_num(self):
pass
elif device_category == 'NPU':
try:
- system_device_num = len(os.environ['NPU-VISIBLE-DEVICES'].split(','))
+ system_device_num = len(os.environ['NPU_VISIBLE_DEVICES'].split(','))
except Exception:
pass
else:
@@ -142,18 +146,18 @@ def _start_dask(self):
address = "--node-ip-address={}".format(the_ip)
port = "--port={}".format(the_port)
try:
- Client("{}:{}".format(the_ip, the_port))
+ get_client(get_address(the_ip, the_port))
logging.info("Reusing previous cluster:{}:{}".format(the_ip, the_port))
return
except Exception:
logging.info("Dask-scheduler not start. Start dask-scheduler in master {}".format(the_ip))
- scheduler_p = subprocess.Popen(["dask-scheduler", port], env=os.environ)
+ scheduler_p = run_scheduler(port=port)
self._cluster_pid.append(scheduler_p.pid)
time.sleep(10)
master_host, master_port = utils.get_master_address(self.args)
address = "tcp://{0}:{1}".format(master_host, master_port)
- self.master_address = "{}:{}".format(master_host, master_port)
+ self.master_address = get_address(master_host, master_port)
logging.info("master host({}), address({}).".format(master_host, address))
self._check_dask_scheduler()
@@ -169,20 +173,18 @@ def _start_dask(self):
return
# run dask-worker in master
for _ in range(self.slave_proc_num):
- worker_p = subprocess.Popen(["dask-worker", address, '--nthreads=1', '--nprocs=1',
- '--memory-limit=0', local_dir], env=os.environ)
+ worker_p = run_local_worker(address=address, local_dir=local_dir)
self._cluster_pid.append(worker_p.pid)
# run dask-worker in each slaves.
for slave_ip in self.slaves:
for _ in range(self.slave_proc_num):
- worker_p = subprocess.Popen(["ssh", slave_ip, shutil.which("dask-worker"), address, '--nthreads=1',
- '--nprocs=1', '--memory-limit=0', local_dir], env=os.environ)
+ worker_p = run_remote_worker(slave_ip=slave_ip, address=address, local_dir=local_dir)
self._cluster_pid.append(worker_p.pid)
def _check_dask_scheduler(self):
"""Check masker is start."""
try:
- Client(self.master_address)
+ get_client(self.master_address)
except TimeoutError as ex:
raise ex
@@ -194,7 +196,7 @@ def _wait_workers(self):
:rtype: int
"""
- self.client = Client(self.master_address)
+ self.client = get_client(self.master_address)
logging.debug("client scheduler info: {}".format(self.client.scheduler_info()))
if int(self.world_size) <= 1:
self.worker_portion = 1
@@ -207,8 +209,15 @@ def _wait_workers(self):
if n_workers >= worker_count_min:
workers = self.client.scheduler_info()["workers"]
workers_list = []
+ workers_port = {}
for k, _ in workers.items():
workers_list.append(k)
+ (ip, port) = k.replace("//", "").split(":")[1:]
+ if ip in workers_port:
+ workers_port[ip].append(port)
+ else:
+ workers_port[ip] = [port]
+ os.environ["vega_workers_list"] = json.dumps(workers_port)
logging.info("worker list: {}".format(workers_list))
slave_ips = list(set([item[6:].split(":")[0] for item in workers_list]))
slave_ips.remove(General.cluster.master_ip)
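The `_wait_workers()` change above groups the dask worker addresses into an `{ip: [ports]}` map and publishes it through the `vega_workers_list` environment variable, which `WorkerEnv` later uses to compute each worker's index. A sketch of that grouping, with illustrative addresses:

```python
# Sketch of the vega_workers_list grouping above; worker addresses are
# illustrative.
import json

workers = ["tcp://10.0.0.1:41001", "tcp://10.0.0.1:41002", "tcp://10.0.0.2:41001"]
workers_port = {}
for k in workers:
    ip, port = k.replace("//", "").split(":")[1:]
    workers_port.setdefault(ip, []).append(port)

print(json.dumps(workers_port))
# {"10.0.0.1": ["41001", "41002"], "10.0.0.2": ["41001"]}
```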
diff --git a/vega/core/scheduler/distribution.py b/vega/core/scheduler/distribution.py
index 0542d41f..f5e26754 100644
--- a/vega/core/scheduler/distribution.py
+++ b/vega/core/scheduler/distribution.py
@@ -15,6 +15,7 @@
Distributor Base Class, Dask Distributor Class and local Evaluator Distributor
Class. Distributor Classes are used in Master to init and maintain the cluster.
"""
+
import time
import multiprocessing
from threading import Lock
@@ -80,9 +81,9 @@ def get_client(self):
:rtype: distributed.Client
"""
- from dask.distributed import Client
+ from .run_dask import get_client
from dask.distributed import Queue
- client = Client(address=self.address)
+ client = get_client(address=self.address)
self.n_workers = len(client.scheduler_info()["workers"])
self.process_queue = Queue(client=client, maxsize=self.n_workers)
self.result_queue = Queue(client=client)
diff --git a/vega/core/scheduler/local_master.py b/vega/core/scheduler/local_master.py
index 11566022..da21a4cc 100644
--- a/vega/core/scheduler/local_master.py
+++ b/vega/core/scheduler/local_master.py
@@ -9,7 +9,9 @@
# MIT License for more details.
"""The LocalMaster's method is same as Master, and the class is used on single node."""
-import os
+
+import traceback
+import logging
from vega.trainer.utils import WorkerTypes
from vega.common.general import General
from vega.report import ReportClient
@@ -23,9 +25,6 @@ def __init__(self, update_func=None):
"""Init master."""
self.cfg = General
self.update_func = update_func
- if os.environ['DEVICE_CATEGORY'] == 'NPU':
- os.environ['RANK_SIZE'] = '1'
- os.environ.pop('RANK_TABLE_FILE', None)
def run(self, worker, evaluator=None):
"""Run a worker, call the worker's train_prcess() method.
@@ -40,16 +39,26 @@ def run(self, worker, evaluator=None):
step_name = worker.step_name
worker_id = worker.worker_id
- workers = [worker]
+ if worker.worker_type == WorkerTypes.EVALUATOR and evaluator is None:
+ workers = []
+ evaluator = worker
+ else:
+ workers = [worker]
+
if evaluator and evaluator.worker_type == WorkerTypes.EVALUATOR:
for sub_worker in evaluator.sub_worker_list:
- if sub_worker.worker_type == WorkerTypes.DeviceEvaluator:
+ is_device_evaluator = sub_worker.worker_type == WorkerTypes.DeviceEvaluator
+ if is_device_evaluator and General.device_evaluate_before_train:
workers.insert(0, sub_worker)
else:
workers.append(sub_worker)
for worker in workers:
- worker.train_process()
+ try:
+ worker.train_process()
+ except Exception:
+ logging.error(traceback.format_exc())
+ logging.error(f"Failed to run worker, id={worker.worker_id}")
self._update(step_name, worker_id)
diff --git a/vega/core/scheduler/master.py b/vega/core/scheduler/master.py
index fa17f50b..2cf5d7c2 100644
--- a/vega/core/scheduler/master.py
+++ b/vega/core/scheduler/master.py
@@ -88,24 +88,11 @@ def _start_cluster(self):
"""Set and start dask distributed cluster."""
self.md = ClusterDaskDistributor(self.dask_env.master_address)
self.client = self.md.get_client()
- local_host = None
- if "BATCH_CURRENT_HOST" in os.environ:
- local_host = os.environ["BATCH_CURRENT_HOST"]
- elif "BATCH_CUSTOM0_HOSTS" in os.environ:
- local_host = os.environ["BATCH_CUSTOM0_HOSTS"]
- if "CUDA_VISIBLE_DEVICES" in os.environ:
- os.environ["ORIGIN_CUDA_VISIBLE_DEVICES"] = os.environ["CUDA_VISIBLE_DEVICES"]
+ os.environ["vega_python_command"] = General.python_command
+ os.environ["vega_timeout"] = str(General.worker.timeout)
self._remove_worker_number_file()
- plugin = WorkerEnv(self.dask_env.slave_proc_num,
- self.dask_env.slave_device_num_per_proc,
- local_host,
- os.getpid(),
- TaskOps().temp_path)
+ plugin = WorkerEnv(self.dask_env.slave_device_num_per_proc)
self.client.register_worker_plugin(plugin)
- if "ORIGIN_CUDA_VISIBLE_DEVICES" in os.environ:
- os.environ["CUDA_VISIBLE_DEVICES"] = os.environ["ORIGIN_CUDA_VISIBLE_DEVICES"]
- if "CUDA_VISIBLE_DEVICES" in os.environ and "ORIGIN_CUDA_VISIBLE_DEVICES" not in os.environ:
- del os.environ["CUDA_VISIBLE_DEVICES"]
return
def _remove_worker_number_file(self):
@@ -138,10 +125,16 @@ def run(self, worker, evaluator=None):
if worker is None:
return
- workers = [worker]
+ if worker.worker_type == utils.WorkerTypes.EVALUATOR and evaluator is None:
+ workers = []
+ evaluator = worker
+ else:
+ workers = [worker]
+
if evaluator and evaluator.worker_type == utils.WorkerTypes.EVALUATOR:
for sub_worker in evaluator.sub_worker_list:
- if sub_worker.worker_type == utils.WorkerTypes.DeviceEvaluator:
+ is_device_evaluator = sub_worker.worker_type == utils.WorkerTypes.DeviceEvaluator
+ if is_device_evaluator and General.device_evaluate_before_train:
workers.insert(0, sub_worker)
else:
workers.append(sub_worker)
@@ -182,7 +175,9 @@ def _monitor_thread(master):
def _update(self, step_name, worker_id):
# Waiting report thread update all record
- ReportClient().set_finished(step_name, worker_id)
+ # TODO
+ if not General.cluster.show_all_ranks and "-" not in worker_id:
+ ReportClient().set_finished(step_name, worker_id)
if not self.update_func:
return
if self.update_func.__code__.co_varnames.index("step_name") == 1:
diff --git a/vega/core/scheduler/run_dask.py b/vega/core/scheduler/run_dask.py
new file mode 100644
index 00000000..57f629c4
--- /dev/null
+++ b/vega/core/scheduler/run_dask.py
@@ -0,0 +1,68 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Run dask scheduler and worker."""
+
+import os
+import subprocess
+import shutil
+from distributed import Client
+
+
+def get_client(address):
+ """Get dask client."""
+ return Client(address)
+
+
+def get_address(master_host, master_port):
+ """Get master address."""
+ return "tcp://{}:{}".format(master_host, master_port)
+
+
+def run_scheduler(port):
+ """Run dask-scheduler."""
+ proc = subprocess.Popen(
+ ["dask-scheduler", "--no-dashboard", "--no-show", port],
+ env=os.environ
+ )
+ return proc
+
+
+def run_local_worker(address, local_dir):
+ """Run dask-worker on local."""
+ id = subprocess.Popen(
+ [
+ "dask-worker",
+ address,
+ '--nthreads=1',
+ '--nprocs=1',
+ '--memory-limit=0',
+ local_dir],
+ env=os.environ
+ )
+ return proc
+
+
+def run_remote_worker(slave_ip, address, local_dir):
+ """Run dask-worker on remove node."""
+ id = subprocess.Popen(
+ [
+ "ssh",
+ slave_ip,
+ shutil.which("dask-worker"),
+ address,
+ '--nthreads=1',
+ '--nprocs=1',
+ '--memory-limit=0',
+ local_dir
+ ],
+ env=os.environ
+ )
+ return proc
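The new `run_dask.py` centralizes every dask process launch behind a small API. A hedged usage sketch of these helpers (requires `distributed` installed and a free port; host and port are illustrative, and `run_scheduler` takes the full CLI flag as its argument, matching the call site in `dask_env.py`):

```python
# Usage sketch for the helpers above; spawning real processes needs dask
# installed, so the result of these calls is shown but not asserted on.
from vega.core.scheduler.run_dask import get_address, run_scheduler, get_client

address = get_address("127.0.0.1", 8786)       # -> "tcp://127.0.0.1:8786"
scheduler = run_scheduler(port="--port=8786")  # spawns a dask-scheduler process
client = get_client(address)                   # connect once the scheduler is up
```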
diff --git a/vega/core/scheduler/worker_env.py b/vega/core/scheduler/worker_env.py
index 03d95e38..7819ac82 100644
--- a/vega/core/scheduler/worker_env.py
+++ b/vega/core/scheduler/worker_env.py
@@ -13,138 +13,116 @@
The WorkerEnv class, which is used by the Master to set up the basic
dask-distributed environment of each worker.
"""
+
import os
import json
-import psutil
-import subprocess
-import traceback
-import fcntl
-from copy import deepcopy
+import logging
from distributed.diagnostics.plugin import WorkerPlugin
class WorkerEnv(WorkerPlugin):
"""WorkerEnv for add plugin in each worker in dask cluster.
- :param int workers_each_node: worker count on each slave node.
:param int device_quota: device num for each worker to use.
- :param str master_host_name: the dask cluster master host name.
- :param str master_pid: the process id of the master process.
"""
- def __init__(self, workers_each_node, device_quota, master_host_name, master_pid, temp_path):
+ def __init__(self, device_quota):
"""Init the WorkerEnv."""
- self.workers_each_node = workers_each_node
self.device_quota = device_quota
- self.master_host_name = master_host_name
- self.master_pid = master_pid
- self.device_list = []
- self._backend_type = os.environ["BACKEND_TYPE"]
- self.device_category = os.environ['DEVICE_CATEGORY']
- self._npu_visible_devices = os.environ.get('NPU_VISIBLE_DEVICES', None)
- self._npu_visible_devices = self._npu_visible_devices or os.environ.get('NPU-VISIBLE-DEVICES', None)
- self._batch_task_index = os.environ.get('BATCH_TASK_INDEX', None)
- self.temp_path = temp_path
- self.__worker_null_file__ = os.path.join(temp_path, '.vega_null')
- self.__worker_device_folder__ = os.path.join(temp_path, '.vega_device')
- self._cuda_devices = deepcopy(os.environ.get("ORIGIN_CUDA_VISIBLE_DEVICES", None))
- self._ori_rank_table_file = deepcopy(os.environ.get("ORIGIN_RANK_TABLE_FILE", None))
- if self._cuda_devices:
- _value = self._cuda_devices.replace("'", "").replace("\"", "").replace(" ", "").split(",")
- self._cuda_devices = [str(x) for x in _value]
+ self._save_master_env()
return
- def _init_worker_number_file(self, ip):
- """Use a local file to save a label to mark gpu id used for different workers on a same slave node."""
- _worker_number_file = os.path.join(self.temp_path, '.{}.worker_number'.format(ip))
- if not os.path.isfile(_worker_number_file):
- os.makedirs(os.path.dirname(_worker_number_file), exist_ok=True)
- fp = open(_worker_number_file, 'w')
- fcntl.flock(fp, fcntl.LOCK_EX)
- fp.write('{}'.format(0))
- fcntl.flock(fp, fcntl.LOCK_UN)
- fp.close()
- return _worker_number_file
-
- def _get_device_list(self, worker_number_file):
- """Get the cuda devices id list that are visible to current workers.
-
- :return: the current worker visible gpu id list.
- :rtype: list
-
- """
- current_count = 0
- with open(worker_number_file, 'r+') as fp:
- fcntl.flock(fp, fcntl.LOCK_EX)
- f_str = fp.readline()
- try:
- # current_count = int(f_str.strip()) % self.workers_each_node
- current_count = int(f_str.strip())
- except Exception:
- pass
- with open(worker_number_file, 'w') as fn:
- fn.write('{}'.format(current_count + 1))
- fcntl.flock(fp, fcntl.LOCK_UN)
- device_list = []
- for i in range(current_count * self.device_quota, (current_count + 1) * self.device_quota):
- device_list.append('{}'.format(i))
- return device_list
-
- def _set_visible_devices(self):
+ def _get_devices(self, index, quota, env):
+ devices = os.environ[env].replace("'", "").replace("\"", "").replace(" ", "").split(",")
+ selected = ",".join([devices[index * quota + i] for i in range(quota)])
+ return selected
+
+ def _save_master_env(self):
+ self.master_env = {
+ "PATH": os.environ.get("PATH", None),
+ "PYTHONPATH": os.environ.get("PYTHONPATH", None),
+ "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", None),
+ "PWD": os.environ.get("PWD", None),
+ "RANK_TABLE_FILE": os.environ.get("RANK_TABLE_FILE", None),
+ "MINDSPORE_HCCL_CONFIG_PATH": os.environ.get("RANK_TABLE_FILE", None),
+ "DLS_TASK_NUMBER": os.environ.get("DLS_TASK_NUMBER", None),
+ "NPU_VISIBLE_DEVICES": os.environ.get("NPU_VISIBLE_DEVICES", None),
+ "ASCEND_OPP_PATH": os.environ.get("ASCEND_OPP_PATH", None),
+ "DEVICE_CATEGORY": os.environ.get("DEVICE_CATEGORY", None),
+ "BACKEND_TYPE": os.environ.get("BACKEND_TYPE", None),
+ "LD_PRELOAD": os.environ.get("LD_PRELOAD", None),
+ "DLS_JOB_ID": os.environ.get("DLS_JOB_ID", None),
+ "vega_init_env": os.environ.get("vega_init_env", None),
+ "vega_python_command": os.environ.get("vega_python_command", None),
+ "vega_timeout": os.environ.get("vega_timeout", None),
+ "vega_world_size": os.environ.get("WORLD_SIZE", None),
+ "vega_workers_list": os.environ.get("vega_workers_list", None),
+ "vega_pytorch_hccl_port": os.environ.get("vega_pytorch_hccl_port", None),
+ }
+
+ def _restore_worker_env(self):
+ for key, value in self.master_env.items():
+ if value is not None:
+ os.environ[key] = value
+
+ def _set_visible_devices(self, worker):
"""Set visible devices to each worker env."""
- os.environ["BACKEND_TYPE"] = self._backend_type
- os.environ['DEVICE_CATEGORY'] = self.device_category
- if self.device_category == 'GPU':
- _device_list = self.device_list
- if self._cuda_devices:
- _device_list = [self._cuda_devices[int(i)] for i in _device_list]
- cuda_device_list_str = ",".join(_device_list)
- os.environ['CUDA_VISIBLE_DEVICES'] = cuda_device_list_str
- # print("CUDA_VISIBLE_DEVICES: {}".format(cuda_device_list_str))
- elif self.device_category == 'NPU':
- self._fit_npu_device_list()
- origin_rank_file = self._ori_rank_table_file
- with open(origin_rank_file, 'r') as f:
- rank_table_json = json.loads(f.read())
- rank_table_json['server_count'] = 1
- group_info = rank_table_json['server_list']
- devices_info = []
- keep_idx = int(self._batch_task_index)
- instance_info = group_info[keep_idx]
- for device_id in self.device_list:
- device_id = int(device_id)
- devices_info.append(instance_info['device'][device_id])
- if len(devices_info) == 0:
- raise Exception('No matching devices info.')
- rank_table_json['server_list'] = [instance_info]
- rank_table_json['server_list'][0]['device'] = devices_info
- server_id = rank_table_json['server_list'][0]['server_id']
- new_rank_table_file = os.path.join(self.__worker_device_folder__,
- 'rank_table_{0}_{1}.json'.format(server_id, self.device_list[0]))
- if not os.path.exists(self.__worker_device_folder__):
- os.makedirs(self.__worker_device_folder__, exist_ok=True)
- with open(new_rank_table_file, 'w') as f:
- f.write(json.dumps(rank_table_json))
- print('worker {} rank table json: {}'.format(self.device_list[0], rank_table_json))
- os.environ['RANK_TABLE_FILE'] = new_rank_table_file
- os.environ['RANK_SIZE'] = str(len(self.device_list))
- os.environ['DEVICE_ID'] = self.device_list[0]
- os.environ['ASCEND_DEVICE_ID'] = self.device_list[0]
- os.environ['MASTER_ADDR'] = rank_table_json['server_list'][0]['device'][0]['device_ip']
- os.environ['MASTER_PORT'] = rank_table_json['server_list'][0].get('server_port', '29688')
- os.environ['RANK_ID'] = rank_table_json['server_list'][0]['device'][0]['rank_id']
- # print("RANK_TABLE_FILE: {}".format(new_rank_table_file))
+ if os.environ['DEVICE_CATEGORY'] == 'GPU':
+ _index = self._get_device_index(worker)
+ devices = self._get_devices(_index, self.device_quota, "CUDA_VISIBLE_DEVICES")
+ os.environ['CUDA_VISIBLE_DEVICES'] = devices
+ elif os.environ['DEVICE_CATEGORY'] == 'NPU':
+ ip = worker.ip
+ _index = self._get_device_index(worker)
+ device_id = self._get_devices(_index, self.device_quota, "NPU_VISIBLE_DEVICES").split(",")[0]
+ os.environ['DEVICE_ID'] = device_id
+ os.environ['ASCEND_DEVICE_ID'] = device_id
+ rank_table_file = os.environ.get("RANK_TABLE_FILE", None)
+ if rank_table_file:
+ self._set_rank_info(device_id, rank_table_file, ip)
else:
raise Exception('device category must be GPU or NPU.')
- def _fit_npu_device_list(self):
- """Fit npu device list to actual visible devices."""
- visible_list = self._npu_visible_devices.split(',')
- new_device_list = list()
- for device_id in self.device_list:
- new_device_list.append(visible_list[int(device_id)])
- self.device_list = new_device_list
+ def _set_rank_info(self, device_id, rank_table_file, ip):
+ try:
+ with open(rank_table_file, 'r') as f:
+ rank_table_json = json.loads(f.read())
+
+ server_list = rank_table_json["server_list"]
+ ips = [x["server_id"] for x in server_list]
+ if len(ips) == 1:
+ # single-node
+ devices = rank_table_json['server_list'][0]['device']
+ rank_id = list(filter(lambda x: x["device_id"] == device_id, devices))[0]["rank_id"]
+ rank_size = str(len(devices))
+ if "vega_pytorch_hccl_port" in os.environ:
+ port = os.environ['vega_pytorch_hccl_port']
+ os.environ['MASTER_ADDR'] = rank_table_json['server_list'][0]['device'][0]['device_ip']
+ os.environ['MASTER_PORT'] = rank_table_json['server_list'][0].get('server_port', port)
+ else:
+ # multi-nodes
+ if "DLS_TASK_INDEX" in os.environ:
+ index = int(os.environ["DLS_TASK_INDEX"])
+ devices = server_list[index]["device"]
+ rank_id = list(filter(lambda x: x["device_id"] == device_id, devices))[0]["rank_id"]
+ rank_size = str(sum([len(x["device"]) for x in server_list]))
+ else:
+ if ip not in ips:
+ raise Exception(f"Worker IP {ip} not in rank table file ({ips}, {rank_table_file}). ")
+ devices = list(filter(lambda x: x["server_id"] == ip, server_list))[0]["device"]
+ rank_id = list(filter(lambda x: x["device_id"] == device_id, devices))[0]["rank_id"]
+ rank_size = str(sum([len(x["device"]) for x in server_list]))
+ os.environ['RANK_ID'] = rank_id
+ os.environ['RANK_SIZE'] = rank_size
+ except Exception:
+ logging.warn(f"wrong rank table file: {rank_table_file}")
+
+ def _get_device_index(self, worker):
+ ports_list = json.loads(os.environ["vega_workers_list"])
+ (ip, port) = worker.worker_address.replace("//", "").split(":")[1:]
+ _index = ports_list[ip].index(port)
+ return _index
def setup(self, worker):
"""Call back function for worker setup.
@@ -153,65 +131,10 @@ def setup(self, worker):
CUDA_VISIBLE_DEVICES.
"""
- number_file = self._init_worker_number_file(worker.ip)
- self.device_list = self._get_device_list(number_file)
- self._set_visible_devices()
+ self._restore_worker_env()
+ self._set_visible_devices(worker)
return
def teardown(self, worker):
"""Call back function for worker teardown."""
return
-
- def transition_discard(self, key, start, finish, *args, **kwargs):
- """Call back function for worker status transition.
-
- here to clean the gpu memory whe worker status turn to `ready`,
- use `fuser -v` list all pid that use cuda, and filter the master's
- processes, and kill all other processes.
-
- :param str key: Description of parameter `key`.
- :param str start: Start state of the transition.
- One of waiting, ready, executing, long-running, memory, error.
- :param str finish: Final state of the transition.
- :param type * args: Description of parameter `*args`.
- :param type ** kwargs: Description of parameter `**kwargs`.
-
- """
- print(" Plugin transition ")
- #
- if finish == 'ready' and len(self.device_list) > 0:
- try:
- current_pid = os.getpid()
- protect_pid_set = set()
- protect_pid_set.add(int(current_pid))
- # if self.master_host_name is not None and self.master_host_name == self.local_host_name:
- protect_pid_set.add(int(self.master_pid))
- try:
- parent = psutil.Process(self.master_pid)
- for p in parent.children(recursive=False):
- protect_pid_set.add(int(p.pid))
- except Exception:
- print("In slave node, master pid is not existed, process does not need to protect.")
- if self.device_category == 'GPU':
- cuda_pid_set = set()
- for id in self.device_list:
- device = "/dev/nvidia{}".format(id)
- fh = open(self.__worker_null_file__, "w")
- p = subprocess.Popen(["fuser", "-v", device], stdout=subprocess.PIPE, stderr=fh)
- p.wait()
- fh.close()
- sub_pids = p.stdout.read().split()
- for spid in sub_pids[1:]:
- cuda_pid_set.add(int(spid))
- for spid in protect_pid_set:
- if spid in cuda_pid_set:
- cuda_pid_set.remove(spid)
- # for spid in cuda_pid_set:
- # subprocess.call(["kill", "-9", "{}".format(spid)])
- if cuda_pid_set:
- print("Non-Vega process is using GPU, pids={}".format(cuda_pid_set))
- except Exception:
- print("Worker Plugin Error.")
- print(traceback.format_exc())
- print("cleaned the cuda memory...")
- return
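The rewritten `WorkerEnv` derives each worker's devices purely from its index and quota, replacing the old lock-protected counter file. A sketch of the slicing rule in `_get_devices()`:

```python
# Sketch of the quota-based slicing in _get_devices(): worker index i takes
# devices [i*quota, (i+1)*quota) from the visible list. The device list is
# an illustrative assumption.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1, 2, 3"

def get_devices(index, quota, env="CUDA_VISIBLE_DEVICES"):
    devices = os.environ[env].replace("'", "").replace('"', "").replace(" ", "").split(",")
    return ",".join(devices[index * quota + i] for i in range(quota))

assert get_devices(1, 2) == "2,3"  # the second worker, with a quota of 2
```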
diff --git a/vega/core/search_space/ext_hyper_parameter.py b/vega/core/search_space/ext_hyper_parameter.py
index 795e1d71..7522deb8 100644
--- a/vega/core/search_space/ext_hyper_parameter.py
+++ b/vega/core/search_space/ext_hyper_parameter.py
@@ -407,7 +407,8 @@ def decode(self, x, forbidden=''):
if random.uniform(0, 1) < ratio:
return [1] * size
if len(self.range) == 1:
- need_convert_code_size = size // 2
+ need_convert_code_size = round(size * 0.5 / 16) * 16
+ need_convert_code_size = need_convert_code_size if need_convert_code_size > 16 else size
change_ids = random.sample(range(size), need_convert_code_size)
individual = [1 if i in change_ids else 0 for i in range(size)]
return individual
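The `decode` change above rounds the number of flipped code bits to a multiple of 16 instead of using a plain `size // 2`. A worked sketch of the new rule:

```python
# Worked example of the flip-count rule above: roughly half the code length,
# rounded to a multiple of 16, falling back to the full size when the
# rounded value does not exceed 16. Sizes are illustrative.
def flip_count(size):
    n = round(size * 0.5 / 16) * 16
    return n if n > 16 else size

assert flip_count(100) == 48  # round(50 / 16) = 3 -> 48
assert flip_count(20) == 20   # round(10 / 16) = 1 -> 16, not > 16, use size
```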
diff --git a/vega/core/search_space/search_space.py b/vega/core/search_space/search_space.py
index 231176b0..8d1801d4 100644
--- a/vega/core/search_space/search_space.py
+++ b/vega/core/search_space/search_space.py
@@ -21,7 +21,6 @@
from vega.common.class_factory import ClassFactory, ClassType
from vega.core.pipeline.conf import SearchSpaceConfig
-
logger = logging.getLogger(__name__)
@@ -38,8 +37,11 @@ def __init__(self, desc=None):
super(SearchSpace, self).__init__()
if desc is None:
desc = SearchSpaceConfig().to_dict()
- if desc.type is not None:
- desc = ClassFactory.get_cls(ClassType.SEARCHSPACE, desc.type).get_space(desc)
+ if desc.type is not None and desc.type != 'SearchSpace':
+ cls = ClassFactory.get_cls(ClassType.SEARCHSPACE, desc.type)
+ desc = cls.get_space(desc)
+ if hasattr(cls, "to_desc"):
+ self.to_desc = cls.to_desc
for name, item in desc.items():
self.__setattr__(name, item)
self.__setitem__(name, item)
@@ -97,9 +99,9 @@ def verify_constraints(self, sample):
"""Verify condition."""
for condition in self.get("condition", []):
_type = condition["type"]
- child = condition["child"] # eg. trainer.optimizer.params.momentum
- parent = condition["parent"] # eg. trainer.optimizer.type
- _range = condition["range"] # eg. range': ['SGD']
+ child = condition["child"] # eg. trainer.optimizer.params.momentum
+ parent = condition["parent"] # eg. trainer.optimizer.type
+ _range = condition["range"] # eg. range': ['SGD']
if _type == "EQUAL" or _type == "IN":
if parent in sample and sample[parent] in _range:
if child not in sample:
@@ -224,6 +226,7 @@ def get_hp_names(self):
:return: List[str]
:rtype: list
+
"""
return list(self._params.keys())
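`verify_constraints()` walks condition entries of the shape shown in the inline comments above. An illustrative condition and sample that satisfy the `EQUAL`/`IN` branch:

```python
# Illustrative condition entry matching the fields read by
# verify_constraints(): momentum only applies when the optimizer is SGD.
condition = {
    "type": "EQUAL",
    "child": "trainer.optimizer.params.momentum",
    "parent": "trainer.optimizer.type",
    "range": ["SGD"],
}

sample = {"trainer.optimizer.type": "SGD",
          "trainer.optimizer.params.momentum": 0.9}
# The parent's sampled value falls in range, so the child must be present.
assert sample[condition["parent"]] in condition["range"]
assert condition["child"] in sample
```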
diff --git a/vega/datasets/mindspore/adapter.py b/vega/datasets/mindspore/adapter.py
index fa577b2b..3514b3a3 100644
--- a/vega/datasets/mindspore/adapter.py
+++ b/vega/datasets/mindspore/adapter.py
@@ -9,31 +9,13 @@
# MIT License for more details.
"""This is a base class of the dataset."""
-import os
+
from mindspore.dataset import GeneratorDataset, DistributedSampler, SubsetRandomSampler
import mindspore.dataset.transforms.c_transforms as C2
import mindspore.dataset.vision.c_transforms as vision
import mindspore.common.dtype as mstype
import numpy as np
from mindspore.communication.management import get_rank, get_group_size
-import logging
-
-
-def _get_rank_info():
- """Get rank size and rank id."""
- rank_size = int(os.environ.get("RANK_SIZE", 1))
-
- if rank_size > 1:
- # rank_size = get_group_size()
- # rank_id = get_rank()
- rank_id = os.environ.get('ASCEND_DEVICE_ID', 0)
- rank_id = int(rank_id) % rank_size
- logging.info("rank_id is {}, rank_size is {}".format(rank_id, rank_size))
- else:
- rank_size = 1
- rank_id = 0
-
- return rank_size, rank_id
class MsAdapter(object):
@@ -86,10 +68,10 @@ def _init_sampler(self):
:rtype: an object or None
"""
if self.dataset.world_size > 1:
- self.args.shuffle = False
sampler = DistributedSampler(num_shards=self.dataset.world_size,
shard_id=self.dataset.rank,
shuffle=self.args.shuffle)
+ self.args.shuffle = False
elif not hasattr(self.args, "train_portion"):
sampler = None
elif self.dataset.mode == 'test' or self.args.train_portion == 1:
@@ -114,8 +96,11 @@ def loader(self):
:return: a batch data
:rtype: dict, list, optional
"""
- rank_size, rank_id = _get_rank_info()
- if rank_size > 1:
+ rank_size = 1
+ rank_id = 0
+ if self.dataset.world_size > 1:
+ rank_size = get_group_size()
+ rank_id = get_rank()
self.sampler = None
ms_dataset = GeneratorDataset(self.dataset, ["image", "label"], sampler=self.sampler, num_shards=rank_size,
shard_id=rank_id)
diff --git a/vega/datasets/pytorch/adapter.py b/vega/datasets/pytorch/adapter.py
index ab348a2a..50bdb9ee 100644
--- a/vega/datasets/pytorch/adapter.py
+++ b/vega/datasets/pytorch/adapter.py
@@ -44,11 +44,11 @@ def _init_sampler(self):
:rtype: an object or None
"""
if self.dataset.world_size > 1:
- self.args.shuffle = False
sampler = DistributedSampler(self.dataset,
num_replicas=self.dataset.world_size,
rank=self.dataset.rank,
shuffle=self.args.shuffle)
+ self.args.shuffle = False
elif not hasattr(self.args, "train_portion"):
sampler = None
elif self.dataset.mode == 'test' or self.args.train_portion == 1:
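Both adapter hunks move `self.args.shuffle = False` to after the sampler is built, so the user's shuffle flag reaches the distributed sampler before loader-level shuffling is disabled. A minimal PyTorch sketch of that pattern:

```python
# Sketch of the corrected ordering above: the user's shuffle flag goes to
# DistributedSampler, then loader-level shuffle is turned off, since PyTorch
# forbids passing sampler= together with shuffle=True. Dataset is illustrative.
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(8).float())
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, sampler=sampler, shuffle=False, batch_size=2)
```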
diff --git a/vega/evaluator/conf.py b/vega/evaluator/conf.py
index fc63d114..46ea456d 100644
--- a/vega/evaluator/conf.py
+++ b/vega/evaluator/conf.py
@@ -22,6 +22,7 @@ class HostEvaluatorConfig(ConfigSerializable):
cuda = True
metric = {'type': 'accuracy'}
report_freq = 10
+ is_fusion = False
@classmethod
def rules(cls):
@@ -38,12 +39,17 @@ class DeviceEvaluatorConfig(ConfigSerializable):
hardware = "Davinci"
remote_host = ""
intermediate_format = "onnx" # for torch model convert
+ opset_version = 9 # for torch model convert
+ precision = 'FP32'
cuda = False
evaluate_latency = True
metric = {'type': 'accuracy'}
calculate_metric = False
report_freq = 10
quantize = False
+ is_fusion = False
+ reshape_batch_size = 1
+ save_intermediate_file = False
class EvaluatorConfig(ConfigSerializable):
diff --git a/vega/evaluator/device_evaluator.py b/vega/evaluator/device_evaluator.py
index ef8f454c..22e37e64 100644
--- a/vega/evaluator/device_evaluator.py
+++ b/vega/evaluator/device_evaluator.py
@@ -10,19 +10,19 @@
"""HostEvaluator used to do evaluate process on gpu."""
+import os
+import datetime
import logging
import numpy as np
+import traceback
import vega
from vega.common import ClassFactory, ClassType
-from vega.common.general import General
-from vega.common.utils import init_log
+from vega.common.wrappers import train_process_wrapper
+from vega.report import ReportClient
+from vega.trainer.utils import WorkerTypes
from .tools.evaluate_davinci_bolt import evaluate
from .conf import DeviceEvaluatorConfig
-from vega.report import ReportClient
from .evaluator import Evaluator
-from vega.trainer.utils import WorkerTypes
-import os
-import datetime
@ClassFactory.register(ClassType.DEVICE_EVALUATOR)
@@ -48,6 +48,8 @@ def __init__(self, worker_info=None, model=None, saved_folder=None, saved_step_n
self.hardware = self.config.hardware
self.remote_host = self.config.remote_host
self.intermediate_format = self.config.intermediate_format
+ self.opset_version = self.config.opset_version
+ self.precision = self.config.precision.upper()
self.calculate_metric = self.config.calculate_metric
self.quantize = self.config.quantize
self.model = model
@@ -87,8 +89,10 @@ def valid(self): # noqa: C901
"but get {}.".format(type(batch)))
if not self.calculate_metric:
repeat_times = 10
- data = data[0:1]
- target = target[0:1]
+ reshape_batch_size = self.config.reshape_batch_size
+ if reshape_batch_size and isinstance(reshape_batch_size, int):
+ data = data[0:reshape_batch_size]
+ target = target[0:reshape_batch_size]
if not self.calculate_metric and global_step >= 1:
break
@@ -99,15 +103,18 @@ def valid(self): # noqa: C901
results = evaluate(backend="pytorch", hardware=self.hardware, remote_host=self.remote_host,
model=self.model, weight=None, test_data=test_data, input_shape=data.shape,
reuse_model=reuse_model, job_id=job_id, repeat_times=repeat_times,
- intermediate_format=self.intermediate_format)
- if results.get("status") != "sucess" and error_count <= error_threshold:
+ precision=self.precision, intermediate_format=self.intermediate_format,
+ opset_version=self.opset_version,
+ save_intermediate_file=self.config.save_intermediate_file)
+ if self.calculate_metric and results.get("status") != "sucess" and error_count <= error_threshold:
error_count += 1
break
latency = np.float(results.get("latency"))
- data_num += data.shape[0]
+ data_num += 1
latency_sum += latency
- if global_step == 0:
+ if global_step == 0 and self.calculate_metric:
+ self.model.eval()
real_output = self.model(torch.Tensor(data))
real_output = real_output.detach().numpy()
@@ -147,15 +154,16 @@ def valid(self): # noqa: C901
target = batch[1]
if not self.calculate_metric:
repeat_times = 10
- data = data[0:1]
- target = target[0:1]
- input_shape = data.shape
+ reshape_batch_size = self.config.reshape_batch_size
+ if reshape_batch_size and isinstance(reshape_batch_size, int):
+ data = data[0:reshape_batch_size]
+ target = target[0:reshape_batch_size]
if not self.calculate_metric and global_step >= 1:
break
data.tofile(test_data)
- if global_step == 0:
+ if global_step == 0 and self.calculate_metric:
input_tf = tf.placeholder(tf.float32, shape=data.shape, name='input_tf')
self.model.training = False
output = self.model(input_tf)
@@ -168,12 +176,13 @@ def valid(self): # noqa: C901
results = evaluate(backend="tensorflow", hardware=self.hardware, remote_host=self.remote_host,
model=self.model, weight=weight_file, test_data=test_data, input_shape=data.shape,
reuse_model=reuse_model, job_id=job_id, quantize=self.quantize,
- repeat_times=repeat_times)
- if results.get("status") != "sucess" and error_count <= error_threshold:
+ repeat_times=repeat_times, precision=self.precision,
+ save_intermediate_file=self.config.save_intermediate_file)
+ if self.calculate_metric and results.get("status") != "sucess" and error_count <= error_threshold:
error_count += 1
break
latency = np.float(results.get("latency"))
- data_num += input_shape[0]
+ data_num += 1
latency_sum += latency
if self.calculate_metric:
@@ -213,8 +222,10 @@ def valid(self): # noqa: C901
target = batch["label"]
if not self.calculate_metric:
repeat_times = 10
- data = data[0:1]
- target = target[0:1]
+ reshape_batch_size = self.config.reshape_batch_size
+ if reshape_batch_size and isinstance(reshape_batch_size, int):
+ data = data[0:reshape_batch_size]
+ target = target[0:reshape_batch_size]
if not self.calculate_metric and global_step >= 1:
break
@@ -223,12 +234,13 @@ def valid(self): # noqa: C901
reuse_model = False if global_step == 0 else True
results = evaluate(backend="mindspore", hardware=self.hardware, remote_host=self.remote_host,
model=self.model, weight=None, test_data=test_data, input_shape=data.shape,
- reuse_model=reuse_model, job_id=job_id, repeat_times=repeat_times)
+ reuse_model=reuse_model, job_id=job_id, repeat_times=repeat_times,
+ precision=self.precision, save_intermediate_file=self.config.save_intermediate_file)
latency = np.float(results.get("latency"))
latency_sum += latency
- data_num += data.shape[0]
+ data_num += 1
- if global_step == 0:
+ if global_step == 0 and self.calculate_metric:
real_output = self.model(mindspore.Tensor(data))
real_output = real_output.asnumpy()
if isinstance(real_output, tuple):
@@ -257,14 +269,15 @@ def valid(self): # noqa: C901
logging.info("valid performance: {}".format(pfms))
return pfms
+ @train_process_wrapper
def train_process(self):
"""Validate process for the model validate worker."""
- init_log(level=General.logger.level,
- log_file=f"{self.step_name}_device_evaluator_{self.worker_id}.log",
- log_path=self.local_log_path)
- logging.info("start Davinci or mobile evaluate process")
- self.load_model()
- self.valid_loader = self._init_dataloader(mode='test')
- performance = self.valid()
- ReportClient().update(self.step_name, self.worker_id, performance=performance)
- logging.info(f"finished device evaluation, id: {self.worker_id}, performance: {performance}")
+ try:
+ self.load_model()
+ self.valid_loader = self._init_dataloader(mode='test')
+ performance = self.valid()
+ ReportClient().update(self.step_name, self.worker_id, performance=performance)
+ logging.info(f"finished device evaluation, id: {self.worker_id}, performance: {performance}")
+ except Exception:
+ logging.error(traceback.format_exc())
+ logging.error("Failed to evalute on device.")
diff --git a/vega/evaluator/evaluator.py b/vega/evaluator/evaluator.py
index dcd8d04c..125eed16 100644
--- a/vega/evaluator/evaluator.py
+++ b/vega/evaluator/evaluator.py
@@ -115,10 +115,10 @@ def load_model(self):
elif vega.is_tf_backend():
self.weights_file = FileOps.join_path(self.saved_folder, 'model_{}'.format(self.worker_id))
if self.weights_file is not None and os.path.exists(self.weights_file):
- self.model = ModelZoo.get_model(self.model_desc, self.weights_file)
+ self.model = ModelZoo.get_model(self.model_desc, self.weights_file, is_fusion=self.config.is_fusion)
else:
logger.info("evalaute model without loading weights file")
- self.model = ModelZoo.get_model(self.model_desc)
+ self.model = ModelZoo.get_model(self.model_desc, is_fusion=self.config.is_fusion)
def _use_evaluator(self):
"""Check if use evaluator and get the evaluators.
@@ -153,6 +153,14 @@ def _init_evaluator(self):
for cls in cls_evaluator_set:
evaluator = cls(worker_info=self.worker_info)
self.add_evaluator(evaluator)
+ self._disable_host_latency()
+
+ def _disable_host_latency(self):
+ if len(self.sub_worker_list) < 2:
+ return
+ for sub_evaluator in self.sub_worker_list:
+ if sub_evaluator.worker_type == WorkerTypes.HOST_EVALUATOR:
+ sub_evaluator.config.evaluate_latency = False
def _get_model_desc(self):
model_desc = self.model_desc
diff --git a/vega/evaluator/host_evaluator.py b/vega/evaluator/host_evaluator.py
index 5d66dea4..a2b24e25 100644
--- a/vega/evaluator/host_evaluator.py
+++ b/vega/evaluator/host_evaluator.py
@@ -14,14 +14,15 @@
import copy
import logging
from statistics import mean
+import traceback
import vega
from vega.common import ClassFactory, ClassType
-from vega.common import init_log
from vega.common.general import General
+from vega.common.wrappers import train_process_wrapper
from vega.report import ReportClient
+from vega.trainer.utils import WorkerTypes
from .conf import HostEvaluatorConfig
from .evaluator import Evaluator
-from vega.trainer.utils import WorkerTypes
@ClassFactory.register(ClassType.HOST_EVALUATOR)
@@ -113,10 +114,10 @@ def valid(self, valid_loader):
from vega.metrics.mindspore.metrics import Metrics
from mindspore.train import Model as MsModel
from .utils import FakeLoss
+ self._init_ms_context()
metrics = Metrics(self.config.metric)
metric_name = self.config.metric().type
ms_metric = metrics() if isinstance(metrics(), dict) else {metric_name: metrics()}
- dataset_sink_mode = True if vega.is_npu_device() else False
# when evaluating, the loss_fn is not actually needed, but it can't be None at initialization
ms_model = MsModel(network=self.model,
loss_fn=FakeLoss(),
@@ -124,7 +125,7 @@ def valid(self, valid_loader):
time_start = time.time()
eval_metrics = ms_model.eval(valid_dataset=valid_loader,
callbacks=None,
- dataset_sink_mode=dataset_sink_mode)
+ dataset_sink_mode=self.dataset_sink_mode)
for batch in valid_loader.create_dict_iterator():
batch_size = batch["image"].shape[0]
break
@@ -170,17 +171,19 @@ def _init_tf_estimator(self):
session_config=session_config)
return tf.estimator.Estimator(model_fn=self._model_fn, config=config)
+ @train_process_wrapper
def train_process(self):
"""Validate process for the model validate worker."""
- init_log(level=General.logger.level,
- log_file=f"{self.step_name}_host_evaluator_{self.worker_id}.log",
- log_path=self.local_log_path)
logging.info("start evaluate process")
- self.load_model()
- self.valid_loader = self._init_dataloader(mode='test')
- performance = self.valid(self.valid_loader)
- ReportClient().update(self.step_name, self.worker_id, performance=performance)
- logging.info(f"finished host evaluation, id: {self.worker_id}, performance: {performance}")
+ try:
+ self.load_model()
+ self.valid_loader = self._init_dataloader(mode='test')
+ performance = self.valid(self.valid_loader)
+ ReportClient().update(self.step_name, self.worker_id, performance=performance)
+ logging.info(f"finished host evaluation, id: {self.worker_id}, performance: {performance}")
+ except Exception:
+ logging.error(traceback.format_exc())
+ logging.error("Failed to evalute on host.")
def _init_session_config(self):
import tensorflow as tf
@@ -196,3 +199,14 @@ def _init_session_config(self):
custom_op.name = "NpuOptimizer"
custom_op.parameter_map["use_off_line"].b = True
return sess_config
+
+ def _init_ms_context(self):
+ from mindspore import context
+ mode = General.ms_execute_mode
+ logging.info(f"Run evaluator in mode: {mode}.")
+ if vega.is_npu_device():
+ context.set_context(mode=mode, device_target="Ascend")
+ else:
+ context.set_context(mode=mode, device_target="CPU")
+ self.dataset_sink_mode = General.dataset_sink_mode
+ logging.info(f"Dataset_sink_mode:{self.dataset_sink_mode}.")
diff --git a/vega/evaluator/tools/evaluate_davinci_bolt.py b/vega/evaluator/tools/evaluate_davinci_bolt.py
index 0530c022..dac6e4a2 100644
--- a/vega/evaluator/tools/evaluate_davinci_bolt.py
+++ b/vega/evaluator/tools/evaluate_davinci_bolt.py
@@ -10,15 +10,16 @@
"""The EvaluateService of client."""
import os
-import requests
import logging
import subprocess
import pickle
import numpy as np
+from .rest import post
-def evaluate(backend, hardware, remote_host, model, weight, test_data, input_shape=None, reuse_model=False,
- job_id=None, quantize=False, repeat_times=1, **kwargs):
+# flake8: noqa: C901
+def evaluate(backend, hardware, remote_host, model, weight, test_data, input_shape=None, reuse_model=False, job_id=None,
+ quantize=False, repeat_times=1, precision='FP32', **kwargs):
"""Evaluate interface of the EvaluateService.
:param backend: the backend can be one of "tensorflow", "caffe" and "pytorch"
@@ -60,8 +61,8 @@ def evaluate(backend, hardware, remote_host, model, weight, test_data, input_sha
data_file = open(test_data, "rb")
upload_data = {"data_file": data_file}
- evaluate_config = {"backend": backend, "hardware": hardware, "remote_host": remote_host,
- "reuse_model": reuse_model, "job_id": job_id, "repeat_times": repeat_times}
+ evaluate_config = {"backend": backend, "hardware": hardware, "remote_host": remote_host, "reuse_model": reuse_model,
+ "job_id": job_id, "repeat_times": repeat_times, "precision": precision}
if backend == 'tensorflow':
shape_list = [str(s) for s in input_shape]
shape_cfg = {"input_shape": "Placeholder:" + ",".join(shape_list)}
@@ -71,32 +72,44 @@ def evaluate(backend, hardware, remote_host, model, weight, test_data, input_sha
out_node_cfg = {"out_nodes": out_node_name}
evaluate_config.update(out_node_cfg)
- evaluate_result = requests.post(remote_host, files=upload_data, data=evaluate_config,
- proxies={"http": None}).json()
+ evaluate_result = post(host=remote_host, files=upload_data, data=evaluate_config)
# evaluate_result = requests.get(remote_host, proxies={"http": None}).json()
if evaluate_result.get("status") != "sucess":
- logging.warning("Evaluate failed and will try again, the status is {}, the timestamp is {}".format(
- evaluate_result.get("status"), evaluate_result.get("timestamp")))
+ logging.warning(
+ "Evaluate failed and will try again, the status is {}, the timestamp is {}, \
+ the error message is {}.".format(
+ evaluate_result.get("status"), evaluate_result.get("timestamp"), evaluate_result.get("error_message")))
evaluate_config["reuse_model"] = True
upload_data = {"data_file": open(test_data, "rb")}
retry_times = 4
for i in range(retry_times):
- evaluate_result = requests.post(remote_host, files=upload_data, data=evaluate_config,
- proxies={"http": None}).json()
+ evaluate_result = post(host=remote_host, files=upload_data, data=evaluate_config)
if evaluate_result.get("status") == "sucess":
logging.info("Evaluate sucess! The latency is {}.".format(evaluate_result["latency"]))
break
else:
if i == 3:
logging.error(
- "Evaluate failed, the status is {},the timestamp is {}, the retry times is {}.".format(
- evaluate_result.get("status"), evaluate_result.get("timestamp"), i + 1))
+ "Evaluate failed, the status is {},the timestamp is {}, the retry times is {}, the error \
+ message is {}.".format(evaluate_result.get("status"), evaluate_result.get("timestamp"),
+ i + 1, evaluate_result.get("error_message")))
else:
logging.warning(
- "Evaluate failed, the status is {},the timestamp is {}, the retry times is {}.".format(
- evaluate_result.get("status"), evaluate_result.get("timestamp"), i + 1))
+ "Evaluate failed, the status is {},the timestamp is {}, the retry times is {}, the error \
+ message is {}.".format(evaluate_result.get("status"), evaluate_result.get("timestamp"), i + 1,
+ evaluate_result.get("error_message")))
else:
logging.info("Evaluate sucess! The latency is {}.".format(evaluate_result["latency"]))
+
+ if not kwargs.get("save_intermediate_file", False):
+ # clean intermediate file
+ if os.path.exists(model):
+ os.remove(model)
+ if weight and os.path.isfile(weight) and os.path.exists(weight):
+ os.remove(weight)
+ if os.path.exists(test_data):
+ os.remove(test_data)
+
return evaluate_result
@@ -118,8 +131,9 @@ def preprocessing_model(backend, hardware, model, weight, input_shape, base_save
"""
if backend == "pytorch":
if hardware == "Bolt":
+ opset_version = kwargs["opset_version"]
from .pytorch2onnx import pytorch2onnx
- model = pytorch2onnx(model, input_shape, base_save_dir)
+ model = pytorch2onnx(model, input_shape, base_save_dir, opset_version)
elif kwargs["intermediate_format"] == "caffe":
model_file = os.path.join(base_save_dir, "torch_model.pkl")
shape_file = os.path.join(base_save_dir, "input_shape.pkl")
@@ -142,7 +156,8 @@ def preprocessing_model(backend, hardware, model, weight, input_shape, base_save
backend = "caffe"
else:
from .pytorch2onnx import pytorch2onnx
- model = pytorch2onnx(model, input_shape, base_save_dir)
+ opset_version = kwargs["opset_version"]
+ model = pytorch2onnx(model, input_shape, base_save_dir, opset_version)
backend = "onnx"
elif backend == "tensorflow":
pb_model_file = os.path.join(base_save_dir, "tf_model.pb")
diff --git a/vega/evaluator/tools/pytorch2onnx.py b/vega/evaluator/tools/pytorch2onnx.py
index 51ddd8ff..184d9564 100644
--- a/vega/evaluator/tools/pytorch2onnx.py
+++ b/vega/evaluator/tools/pytorch2onnx.py
@@ -16,7 +16,7 @@
from vega.common.general import General
-def pytorch2onnx(model, input_shape, base_save_dir):
+def pytorch2onnx(model, input_shape, base_save_dir, opset_version=9):
"""Convert the pytorch model to onnx model.
:param model: pytorch model class
@@ -29,7 +29,7 @@ def pytorch2onnx(model, input_shape, base_save_dir):
# model.load_state_dict(torch.load(weight))
# Export the trained model to ONNX
dump_input = Variable(torch.randn(input_shape))
- torch.onnx.export(model, dump_input, "{}/torch_model.onnx".format(base_save_dir))
+ torch.onnx.export(model, dump_input, "{}/torch_model.onnx".format(base_save_dir), opset_version=opset_version)
# try:
# subprocess.call(
# f"{General.python_command} -m onnxsim {base_save_dir}/torch_model.onnx "
diff --git a/vega/evaluator/tools/rest.py b/vega/evaluator/tools/rest.py
new file mode 100644
index 00000000..f67c12bc
--- /dev/null
+++ b/vega/evaluator/tools/rest.py
@@ -0,0 +1,20 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Rest post operation."""
+
+import requests
+
+
+def post(host, files, data):
+ """Post a rest request."""
+ result = requests.post(host, files=files, data=data, proxies={"http": None})
+ data = result.json()
+ return data
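The new `post()` helper wraps the form-encoded upload that `evaluate()` used to issue inline with `requests`. A hedged sketch of a call, with an assumed service URL and a payload modelled on the `evaluate()` call sites above:

```python
# Illustrative call to post(); the URL and form fields are assumptions
# modelled on the evaluate() call sites, and the network call is commented
# out because it needs a live evaluate service.
import io

payload = {"backend": "pytorch", "hardware": "Davinci", "reuse_model": False,
           "job_id": "job-0", "repeat_times": 10, "precision": "FP32"}
files = {"data_file": io.BytesIO(b"\x00" * 16)}  # stands in for input.bin
# result = post(host="http://evaluate-service:8888", files=files, data=payload)
# result.get("status"), result.get("latency")
```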
diff --git a/vega/metrics/forward_latency.py b/vega/metrics/forward_latency.py
index 6480a645..29263c7c 100644
--- a/vega/metrics/forward_latency.py
+++ b/vega/metrics/forward_latency.py
@@ -100,6 +100,7 @@ def _calc_forward_latency_davinci(model, input, sess_config=None, num=10, evalua
# backend = evaluate_config.get("backend")
hardware = evaluate_config.get("hardware")
remote_host = evaluate_config.get("remote_host")
+ opset_version = evaluate_config.get("opset_version")
intermediate_format = evaluate_config.get("intermediate_format")
worker_path = TaskOps().local_base_path
save_data_file = os.path.join(worker_path, "input.bin")
@@ -114,18 +115,16 @@ def _calc_forward_latency_davinci(model, input, sess_config=None, num=10, evalua
if torch.is_tensor(input):
input = input.cpu().numpy()
input.tofile(save_data_file)
- for index in range(num):
- reuse_model = False if index == 0 else True
- results = evaluate("pytorch", hardware, remote_host, model, None, save_data_file, input_shape,
- reuse_model, job_id, intermediate_format=intermediate_format)
- latency += np.float(results.get("latency"))
+ results = evaluate("pytorch", hardware, remote_host, model, None, save_data_file, input_shape,
+ False, job_id, repeat_times=num, intermediate_format=intermediate_format,
+ opset_version=opset_version)
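+ # The evaluate service is assumed to aggregate latency over `repeat_times`
+ # runs, so the result needs no local averaging.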
+ latency = np.float(results.get("latency"))
elif vega.is_tf_backend():
input_shape = input.shape.as_list()
test_data = np.random.random(input_shape).astype(np.float32)
test_data.tofile(save_data_file)
- for index in range(num):
- reuse_model = False if index == 0 else True
- results = evaluate("tensorflow", hardware, remote_host, model, None, save_data_file, input_shape,
- reuse_model, job_id)
- latency += np.float(results.get("latency"))
- return latency / num
+ results = evaluate("tensorflow", hardware, remote_host, model, None, save_data_file, input_shape,
+ False, job_id, repeat_times=num)
+ latency = np.float(results.get("latency"))
+ return latency
diff --git a/vega/metrics/mindspore/segmentation_metric.py b/vega/metrics/mindspore/segmentation_metric.py
index 4891de9f..d0a5783c 100644
--- a/vega/metrics/mindspore/segmentation_metric.py
+++ b/vega/metrics/mindspore/segmentation_metric.py
@@ -11,6 +11,46 @@
"""Metric of segmentation task."""
from mindspore.nn.metrics import Metric
from vega.common import ClassFactory, ClassType
+import numpy as np
+
+
+def calc_confusion_matrix(output, mask, num_class):
+ """Calculate confusion matrix between output and mask.
+
+ :param output: predicted images
+ :type output: numpy.ndarray
+ :param mask: images of ground truth
+ :type mask: numpy.ndarray
+ :param num_class: number of classes
+ :type num_class: int
+ :return: confusion matrix
+ :rtype: numpy.ndarray
+ """
+ confusion_matrix = np.zeros((num_class, num_class))
+ preds = output.argmax(axis=3).reshape(-1)
+ mask = mask.reshape(-1)
+ for predicted, label in zip(preds, mask):
+ if label < num_class:
+ confusion_matrix[predicted][label] += 1
+ return confusion_matrix
+
+
+def compute_iou(confusion_matrix):
+ """Compute IU from confusion matrix.
+
+ :param confusion_matrix: square confusion matrix.
+ :type confusion_matrix: numpy matrix
+ :return: IU vector.
+ :rtype: numpy vector
+ """
+ n_classes = confusion_matrix.shape[0]
+ IoU = np.zeros(n_classes)
+ for i in range(n_classes):
+ sum_columns = np.sum(confusion_matrix[:, i])
+ sum_rows = np.sum(confusion_matrix[i, :])
+ num_correct = confusion_matrix[i, i]
+ denom = sum_columns + sum_rows - num_correct
+ if denom > 0:
+ IoU[i] = num_correct / denom
+ return IoU
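+
+
+# Sanity check with hypothetical values: for the 2-class confusion matrix
+# [[3, 1], [0, 2]], class 0 has IoU 3 / (3 + 4 - 3) = 0.75 and class 1 has
+# IoU 2 / (3 + 2 - 2) = 2 / 3, so the mean IoU is roughly 0.708.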
@ClassFactory.register(ClassType.METRIC, alias='IoUMetric')
@@ -19,6 +59,7 @@ class IoUMetric(Metric):
def __init__(self, num_class):
self.num_class = num_class
+ self.confusion_sum = np.zeros((num_class, num_class))
def update(self, *inputs):
"""Update the metric."""
@@ -38,8 +79,10 @@ def clear(self):
def compute_metric(self, output, target):
"""Compute sr metric."""
- # TODO
- return 0
+ confusion = calc_confusion_matrix(output, target, self.num_class)
+ self.confusion_sum += confusion
+ iou = compute_iou(self.confusion_sum).mean()
+ return iou
@property
def objective(self):
diff --git a/vega/model_zoo/fusion.py b/vega/model_zoo/fusion.py
new file mode 100644
index 00000000..1daa1748
--- /dev/null
+++ b/vega/model_zoo/fusion.py
@@ -0,0 +1,48 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Fuse Operator."""
+import logging
+import copy
+import vega
+from vega.modules.operators import Identity
+
+if vega.is_torch_backend():
+ import torch
+ from torch.nn.utils.fusion import fuse_conv_bn_weights
+
+
+def fuse(model, weights_file=None):
+ """Fuse Conv and BN."""
+ if not vega.is_torch_backend() or model.__class__.__name__ != 'DagNetwork':
+ return model
+ logging.info("Start operator fusion.")
+ for name, node in model.module_map.items():
+ module = node.module
+ if isinstance(node.module, torch.nn.Conv2d):
+ next_nodes = node.child_nodes
+ if next_nodes and isinstance(next_nodes[0].module, torch.nn.BatchNorm2d):
+ node.module = _fuse_conv_bn(module, next_nodes[0].module)
+ next_nodes[0].module = Identity()
+ if weights_file:
+ _save_model(model, weights_file)
+ return model
+
+
+def _fuse_conv_bn(conv, bn):
+ fused_conv = copy.deepcopy(conv)
+ fused_conv.weight, fused_conv.bias = fuse_conv_bn_weights(
+ fused_conv.weight, fused_conv.bias, bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias)
+ return fused_conv
+
+
+def _save_model(model, weights_file):
+ if vega.is_torch_backend():
+ torch.save(model.state_dict(), weights_file)
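+
+
+# Sketch of intended usage (assuming a DagNetwork-built model):
+#   model = fuse(model, weights_file="fused.pth")  # hypothetical path
+# Each Conv2d directly followed by BatchNorm2d is folded into one Conv2d and
+# the BN node is replaced with Identity, preserving inference outputs.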
diff --git a/vega/model_zoo/model_zoo.py b/vega/model_zoo/model_zoo.py
index b3ced2be..3cbb862b 100644
--- a/vega/model_zoo/model_zoo.py
+++ b/vega/model_zoo/model_zoo.py
@@ -9,18 +9,15 @@
# MIT License for more details.
"""Model zoo."""
-import vega
+
+import os
import logging
import glob
import numpy
from collections import OrderedDict
-import os
+import vega
from vega.networks.network_desc import NetworkDesc
from vega.common.general import General
-from vega.modules.graph_utils import graph2desc
-from vega.modules.module import Module
-from vega.modules.arch import transform_architecture
-from vega.common.searchable import SearchableRegister
class ModelZoo(object):
@@ -37,7 +34,7 @@ def set_location(cls, location):
General.model_zoo.model_zoo_path = location
@classmethod
- def get_model(cls, model_desc=None, pretrained_model_file=None, exclude_weight_prefix=None):
+ def get_model(cls, model_desc=None, pretrained_model_file=None, head=None, is_fusion=False, **kwargs):
"""Get model from model zoo.
:param network_name: the name of network, eg. ResNetVariant.
@@ -50,23 +47,28 @@ def get_model(cls, model_desc=None, pretrained_model_file=None, exclude_weight_p
:rtype: model.
"""
- model = None
- if model_desc is not None:
- try:
- network = NetworkDesc(model_desc)
- model = network.to_model()
- except Exception as e:
- logging.error("Failed to get model, model_desc={}, msg={}".format(model_desc, str(e)))
- raise e
+ from vega.modules.module import Module
+ from vega.modules.arch import transform_architecture
+ from vega.model_zoo.fusion import fuse
+ if not model_desc:
+ raise ValueError("model desc can't be None when create model.")
+ try:
+ model = NetworkDesc(model_desc).to_model()
+ except Exception as e:
+ logging.error("Failed to get model, model_desc={}, msg={}".format(model_desc, str(e)))
+ raise e
logging.info("Model was created.")
+ for k, v in kwargs.items():
+ setattr(model, k, v)
if not isinstance(model, Module):
model = cls.to_module(model)
if pretrained_model_file is not None:
- if exclude_weight_prefix:
- model.exclude_weight_prefix = exclude_weight_prefix
- model = cls._load_pretrained_model(model, pretrained_model_file, exclude_weight_prefix)
+ model.exclude_weight_prefix = head
+ model = cls._load_pretrained_model(model, pretrained_model_file, head)
model = transform_architecture(model, pretrained_model_file)
- model = SearchableRegister().active_search_event(model)
+ if is_fusion:
+ model = fuse(model)
if model is None:
raise ValueError("Failed to get mode, model is None.")
return model
@@ -75,10 +77,20 @@ def get_model(cls, model_desc=None, pretrained_model_file=None, exclude_weight_p
def to_module(cls, model):
"""Build model desc before get model."""
if vega.is_ms_backend():
- from vega.networks.mindspore.backbones.ms2vega import transform_model
- return transform_model(model)
+ if hasattr(model, "module_type"):
+ import mindspore
+ if isinstance(model, mindspore.nn.Cell):
+ return model
+ return model()
+ else:
+ from vega.networks.mindspore.backbones.ms2vega import transform_model
+ return transform_model(model)
if vega.is_torch_backend():
- return model
+ import torch
+ if isinstance(model, torch.nn.Module):
+ return model
+ else:
+ return model()
if vega.is_tf_backend():
try:
model_desc = cls.parse_desc_from_pretrained_model(model)
@@ -92,6 +104,7 @@ def parse_desc_from_pretrained_model(cls, src_model, pb_file=None):
"""Parse desc from Petrained Model."""
import tensorflow.compat.v1 as tf
from tensorflow.python.framework import tensor_util
+ from vega.modules.graph_utils import graph2desc
tf.reset_default_graph()
data_shape = (1, 224, 224, 3)
x = tf.ones(data_shape)
@@ -131,15 +144,18 @@ def _load_pretrained_model(cls, model, pretrained_model_file, exclude_weight_pre
if not os.path.isfile(pretrained_model_file):
raise Exception(f"Pretrained model is not existed, model={pretrained_model_file}")
if vega.is_npu_device():
+ from vega.common.task_ops import TaskOps
+ import time
device = int(os.environ.get('DEVICE_ID', 0))
- target_model_file = "/tmp/checkpoint_{}.pth".format(device)
- cmd = "/bin/cp -f {} {} && sed -i 's/npu:[0-9]/npu:{}/g' {}".format(pretrained_model_file,
- target_model_file,
- device,
- target_model_file)
+ target_model_file = "{}/checkpoint_{}_{}.pth".format(
+ TaskOps().temp_path, device, round(time.time() * 1000))
+ cmd = "/bin/cp -f {} {} && sed -i 's/npu:[0-9]/npu:{}/g' {}".format(
+ pretrained_model_file, target_model_file, device, target_model_file)
ret = os.system(cmd)
logging.info("modify weight file result: " + str(ret))
checkpoint = torch.load(target_model_file)
+ if os.path.exists(target_model_file):
+ os.remove(target_model_file)
else:
checkpoint = torch.load(pretrained_model_file)
if exclude_weight_prefix:
diff --git a/vega/modules/arch/combiner.py b/vega/modules/arch/combiner.py
index 339b0a5f..ef038d31 100644
--- a/vega/modules/arch/combiner.py
+++ b/vega/modules/arch/combiner.py
@@ -15,6 +15,14 @@
from vega.modules.connections import Module
+def is_depth_wise_conv(module):
+ """Determine whether a module is a depth-wise Conv2d."""
+ if hasattr(module, "groups"):
+ return module.groups != 1 and module.in_channels == module.out_channels
+ elif hasattr(module, "group"):
+ return module.group != 1 and module.in_channels == module.out_channels
+ return False
+
+
class ConnectionsArchParamsCombiner(object):
"""Get ConnectionsArchParamsCombiner."""
@@ -48,11 +56,15 @@ def _traversal(self, module):
elif isinstance(module, ops.Conv2d):
if self.pre_conv:
self.add_condition(module.name + '.in_channels', self.pre_conv.name + '.out_channels')
+ if is_depth_wise_conv(module):
+ self.add_condition(module.name + '.out_channels', module.name + '.in_channels')
self.pre_conv = module
elif isinstance(module, ops.BatchNorm2d):
self.add_condition(module.name + '.num_features', self.pre_conv.name + '.out_channels')
elif isinstance(module, ops.Linear):
self.add_condition(module.name + '.in_features', self.pre_conv.name + '.out_channels')
+ elif module.__class__.__name__ == "Reshape":
+ self.add_condition(module.name + '.shape', self.pre_conv.name + '.out_channels')
elif isinstance(module, Module):
for child in module.children():
self._traversal(child)
diff --git a/vega/modules/arch/prune_arch.py b/vega/modules/arch/prune_arch.py
index 4696ff39..eac22ff2 100644
--- a/vega/modules/arch/prune_arch.py
+++ b/vega/modules/arch/prune_arch.py
@@ -116,6 +116,21 @@ def fit_weights(module, x):
return None
+@ClassFactory.register('Prune', 'Reshape')
+class ReshapePruneArchitecture(Architecture):
+ """Prune Reshape ops."""
+
+ @staticmethod
+ def decode(value, org_value):
+ """Decode arch params."""
+ return [org_value[0], sum(value)]
+
+ @staticmethod
+ def fit_weights(module, x):
+ """Do nothing."""
+ return None
+
+
def freeze(module):
"""Freeze parameter."""
if not is_torch_backend():
diff --git a/vega/modules/blocks/micro_decoder.py b/vega/modules/blocks/micro_decoder.py
index b4e6b8f7..30220984 100644
--- a/vega/modules/blocks/micro_decoder.py
+++ b/vega/modules/blocks/micro_decoder.py
@@ -13,13 +13,13 @@
from vega.common import ClassFactory, ClassType
from vega.modules.module import Module
from vega.modules.connections import ProcessList, Sequential
-from vega.modules.operators import conv_bn_relu, conv3x3
+from vega.modules.operators import conv_bn_relu, conv3x3, conv_bn_relu6
from vega.modules.operators import AggregateCell, ContextualCell_v1
from vega.modules.operators import ops
@ClassFactory.register(ClassType.NETWORK)
-class InvertedConv(Module):
+class InvertedConv(Sequential):
"""Create InvertedConv SearchSpace."""
def __init__(self, inp, oup, stride, kernel=3, expand_ratio=1):
@@ -31,26 +31,20 @@ def __init__(self, inp, oup, stride, kernel=3, expand_ratio=1):
:param kernel: kernel
:param expand_ratio: channel increase multiplier
"""
- super(InvertedConv, self).__init__()
hidden_dim = round(inp * expand_ratio)
conv = []
if expand_ratio > 1:
conv = [
- ops.Conv2d(in_channels=inp, out_channels=hidden_dim,
- kernel_size=1, stride=1, padding=0, bias=False),
- ops.BatchNorm2d(num_features=hidden_dim),
- ops.Relu6(inplace=True)
+ conv_bn_relu6(C_in=inp, C_out=hidden_dim, kernel_size=1, stride=1, padding=0, inplace=True),
]
conv = conv + [
- ops.Conv2d(in_channels=hidden_dim, out_channels=hidden_dim, kernel_size=kernel,
- stride=stride, padding=kernel // 2, groups=hidden_dim, bias=False, depthwise=True),
- ops.BatchNorm2d(num_features=hidden_dim),
- ops.Relu6(inplace=True),
+ conv_bn_relu6(C_in=hidden_dim, C_out=hidden_dim, kernel_size=kernel, stride=stride, padding=kernel // 2,
+ groups=hidden_dim, depthwise=True, inplace=True),
ops.Conv2d(in_channels=hidden_dim, out_channels=oup,
kernel_size=1, stride=1, padding=0, bias=False),
ops.BatchNorm2d(num_features=oup)
]
- self.models = Sequential(*conv)
+ super(InvertedConv, self).__init__(*conv)
@ClassFactory.register(ClassType.NETWORK)
diff --git a/vega/modules/cells/__init__.py b/vega/modules/cells/__init__.py
index da0f2b60..f3ee5046 100644
--- a/vega/modules/cells/__init__.py
+++ b/vega/modules/cells/__init__.py
@@ -1 +1,2 @@
from .basic import *
+from .dag_cell import *
diff --git a/vega/modules/cells/dag_cell.py b/vega/modules/cells/dag_cell.py
new file mode 100644
index 00000000..1ceaa32a
--- /dev/null
+++ b/vega/modules/cells/dag_cell.py
@@ -0,0 +1,158 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""This is DAG Cell for network."""
+from vega.modules.module import Module
+from dag import DAG
+import numpy as np
+from vega.modules.operators import ops
+from vega.modules.connections import Sequential
+from vega.common.class_factory import ClassFactory, ClassType
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class DagGraphCell(Module):
+ """Merge and process inputs to the middle-level graph."""
+
+ def __init__(self, adj_matrix, nodes, in_channels=64, out_channels=64):
+ super(DagGraphCell, self).__init__()
+ self.adj_matrix = adj_matrix
+ self.nodes = nodes
+ self.c_in = in_channels
+ self.c_out = out_channels
+ self._add_nodes()
+
+ def _add_nodes(self):
+ for node_id, node_name in enumerate(self.nodes):
+ module = ClassFactory.get_instance(ClassType.NETWORK, node_name, in_channels=self.c_in,
+ out_channels=self.c_out)
+ self.add_module(str(node_id), module)
+
+ def _create_dag(self):
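+ """Build a DAG from the adjacency matrix, assuming node 0 is the entry."""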
+ dag = DAG()
+ for name, _ in self.named_children():
+ dag.add_node_if_not_exists(int(name))
+ frontier = [0]
+ num_vertices = np.shape(self.adj_matrix)[0]
+ while frontier:
+ node_id = frontier.pop()
+ for v in range(num_vertices):
+ if self.adj_matrix[node_id][v]:
+ dag.add_edge(node_id, v)
+ frontier.append(v)
+ self.out_tensors = {}
+ return dag
+
+ def forward(self, x, *args, **kwargs):
+ """Forward x."""
+ dag = self._create_dag()
+ node = dag.ind_nodes()[0]
+ out = self._forward_module(x, node, dag)
+ return out
+
+ def _forward_module(self, x, parent, dag):
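+ """Run node `parent`, buffering inputs until all predecessors have fired.
+
+ Nodes with several predecessors collect incoming tensors in
+ self.out_tensors and execute only when the last one arrives.
+ """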
+ parent_nodes = dag.predecessors(parent)
+ if len(parent_nodes) <= 1:
+ next_input = self._modules.get(str(parent))(x)
+ elif self.out_tensors.get(parent) and len(self.out_tensors.get(parent)) == len(parent_nodes) - 1:
+ out = self.out_tensors.pop(parent)
+ out.append(x)
+ next_input = self._modules.get(str(parent))(out)
+ else:
+ if parent not in self.out_tensors:
+ self.out_tensors[parent] = []
+ self.out_tensors[parent].append(x)
+ return None
+ children = dag.downstream(parent)
+ for child in children:
+ out = self._forward_module(next_input, child, dag)
+ if out is not None:
+ next_input = out
+ return next_input
+
+
+class ConvBnRelu(Module):
+ """Conv bn Relu class."""
+
+ def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
+ super(ConvBnRelu, self).__init__()
+ self.conv_bn_relu = Sequential(
+ ops.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
+ ops.BatchNorm2d(out_channels),
+ ops.Relu(inplace=True)
+ )
+
+ def call(self, x):
+ """Call forward function."""
+ return self.conv_bn_relu(x)
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class Conv3x3BnRelu(Module):
+ """The Class of 3x3 convolution with batch norm and ReLU activation."""
+
+ def __init__(self, in_channels, out_channels):
+ super(Conv3x3BnRelu, self).__init__()
+ self.conv3x3 = ConvBnRelu(in_channels, out_channels, 3, 1, 1)
+
+ def call(self, x):
+ """Call forward function."""
+ return self.conv3x3(x)
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class Conv1x1BnRelu(Module):
+ """The Class of 1x1 convolution with batch norm and ReLU activation."""
+
+ def __init__(self, in_channels, out_channels):
+ super(Conv1x1BnRelu, self).__init__()
+ self.conv1x1 = ConvBnRelu(in_channels, out_channels, 1, 1, 0)
+
+ def call(self, x):
+ """Call forward function."""
+ return self.conv1x1(x)
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class MaxPool3x3(Module):
+ """The class of 3x3 max pool with no subsampling."""
+
+ def __init__(self, kernel_size=3, stride=1, padding=1):
+ super(MaxPool3x3, self).__init__()
+ self.maxpool = ops.MaxPool2d(kernel_size, stride, padding)
+
+ def call(self, x):
+ """Call forward function."""
+ return self.maxpool(x)
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class Input(Module):
+ """Input Class."""
+
+ def __init__(self, size=None):
+ super(Input, self).__init__()
+ self.size = size
+
+ def call(self, x):
+ """Call forward function."""
+ return x
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class Output(Module):
+ """Output Class."""
+
+ def __init__(self, size=None):
+ super(Output, self).__init__()
+ self.size = size
+
+ def call(self, x, **kwargs):
+ """Call forward function."""
+ return ops.concat(x, 1)
diff --git a/vega/modules/connections/connections.py b/vega/modules/connections/connections.py
index 709a088c..0d16f643 100644
--- a/vega/modules/connections/connections.py
+++ b/vega/modules/connections/connections.py
@@ -110,6 +110,21 @@ def from_module(cls, module):
model.append(module, name)
return model
+ def to_desc(self, recursion=True):
+ """Convert module to desc."""
+ if not recursion:
+ return self.desc
+ desc = {"type": self.__class__.__name__}
+ modules = []
+ for name, module in self.named_children():
+ if hasattr(module, 'to_desc'):
+ sub_desc = module.to_desc()
+ desc[name] = sub_desc
+ modules.append(name)
+ if modules:
+ desc["modules"] = modules
+ return desc
+
class ModuleList(Module):
"""Class of LeakyReLU."""
@@ -268,32 +283,6 @@ def call(self, inputs):
return output
-@ClassFactory.register(SearchSpaceType.CONNECTIONS)
-class Reshape(ConnectionsDecorator):
- """Create Lambda for forward x."""
-
- def __init__(self, *models):
- super(Reshape, self).__init__(*models)
-
- def call(self, x):
- """Forward x."""
- inputs = None
- new_shape = None
- for model in self.children():
- if model is not None:
- if inputs is None:
- inputs = model(x)
- else:
- new_shape = model(x)
- import torch
- return torch.reshape(inputs, tuple(new_shape.to("cpu").numpy()))
-
- @property
- def out_channels(self):
- """Get out channels."""
- return [k.out_channels for k in self.children() if hasattr(k, 'out_channels')]
-
-
@ClassFactory.register(ClassType.NETWORK)
class Repeat(Module):
"""Repeat SearchSpace."""
diff --git a/vega/modules/loss/__init__.py b/vega/modules/loss/__init__.py
index 095b6ca6..bf6c0450 100644
--- a/vega/modules/loss/__init__.py
+++ b/vega/modules/loss/__init__.py
@@ -9,4 +9,5 @@
"mean_loss": ["trainer.loss:MeanLoss"],
"ProbOhemCrossEntropy2d": ["trainer.loss:ProbOhemCrossEntropy2d"],
"gan_loss": ["trainer.loss:GANLoss"],
+ "ms_custom_loss": ["trainer.loss:CustomSoftmaxCrossEntropyWithLogits"],
})
diff --git a/vega/modules/loss/ms_custom_loss.py b/vega/modules/loss/ms_custom_loss.py
new file mode 100644
index 00000000..65fdb8fc
--- /dev/null
+++ b/vega/modules/loss/ms_custom_loss.py
@@ -0,0 +1,39 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""CustomSoftmaxCrossEntropyWithLogits."""
+
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits
+from vega.common import ClassFactory, ClassType
+
+
+@ClassFactory.register(ClassType.LOSS)
+class CustomSoftmaxCrossEntropyWithLogits(SoftmaxCrossEntropyWithLogits):
+ """CustomSoftmaxCrossEntropyWithLogits loss class."""
+
+ def __init__(self, sparse=True, reduction='none'):
+ super(CustomSoftmaxCrossEntropyWithLogits, self).__init__(sparse=sparse, reduction=reduction)
+ self.reshape = P.Reshape()
+ self.squeeze = P.Squeeze(1)
+
+ def construct(self, logits, labels):
+ """Forward of CustomSoftmaxCrossEntropyWithLogits."""
+ logits = self.reshape(logits, (-1, F.shape(logits)[-1]))
+ labels = self.reshape(labels, (-1, 1))
+ labels = self.squeeze(labels)
+ if self.sparse:
+ if self.reduction == 'mean':
+ x = self.sparse_softmax_cross_entropy(logits, labels)
+ return x
+ labels = self.one_hot(labels, F.shape(logits)[-1], self.on_value, self.off_value)
+ x = self.softmax_cross_entropy(logits, labels)[0]
+ return self.get_loss(x)
diff --git a/vega/modules/nodes.py b/vega/modules/nodes.py
index 54b91fd7..bd605f38 100644
--- a/vega/modules/nodes.py
+++ b/vega/modules/nodes.py
@@ -124,7 +124,7 @@ def __init__(self, *args, **kwargs):
self.in_channels = None
self.stride = None
self.padding = None
- self.dilations = None
+ self.dilation = None
self.bias = False
self.bn = False
super(Conv2DNode, self).__init__(*args, **kwargs)
@@ -158,7 +158,7 @@ def from_ops(self):
attr = op.node_def.attr
self.stride = list(attr.get('strides').list.i)[2]
self.padding = str(attr.get('padding').s, encoding='utf8')
- self.dilations = list(attr.get('dilations').list.i)
+ self.dilation = list(attr.get('dilations').list.i)[1]
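+ # Keeps a single spatial value from the 4-element TF `dilations`
+ # attribute, assuming symmetric dilation.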
class BatchNorm2dNode(Node):
diff --git a/vega/modules/operators/conv.py b/vega/modules/operators/conv.py
index 51eea19d..113d9c8a 100644
--- a/vega/modules/operators/conv.py
+++ b/vega/modules/operators/conv.py
@@ -42,45 +42,58 @@ def conv7x7(inchannel, outchannel, stride=1, bias=False, dilation=1):
@ClassFactory.register(ClassType.NETWORK)
-def conv_bn_relu6(C_in, C_out, kernel_size=3, stride=1, padding=0, affine=True):
+def conv_bn_relu6(C_in, C_out, kernel_size=3, stride=1, padding=0, affine=True,
+ groups=1, depthwise=False, inplace=False):
"""Create group of Convolution + BN + Relu6 function."""
- return ConvBnRelu(C_in, C_out, kernel_size, stride, padding, affine=affine, use_relu6=True)
+ return ConvBnRelu(C_in, C_out, kernel_size, stride, padding, affine=affine, use_relu6=True,
+ groups=groups, depthwise=depthwise, inplace=inplace)
@ClassFactory.register(ClassType.NETWORK)
-def conv_bn_relu(C_in, C_out, kernel_size, stride, padding, affine=True):
+def conv_bn_relu(C_in, C_out, kernel_size, stride, padding, affine=True, groups=1, depthwise=False, inplace=False):
"""Create group of Convolution + BN + Relu function."""
- return ConvBnRelu(C_in, C_out, kernel_size, stride, padding, affine=affine)
+ return ConvBnRelu(C_in, C_out, kernel_size, stride, padding, affine=affine,
+ groups=groups, depthwise=depthwise, inplace=inplace)
@ClassFactory.register(ClassType.NETWORK)
class ConvBnRelu(ops.Module):
"""Create group of Convolution + BN + Relu."""
- def __init__(self, C_in, C_out, kernel_size, stride, padding, Conv2d='Conv2d', affine=True, use_relu6=False,
- norm_layer='BN',
+ def __init__(self, C_in, C_out, kernel_size, stride, padding, groups=1, depthwise=False,
+ Conv2d='Conv2d', affine=True, use_relu6=False, inplace=False, norm_layer='BN',
has_bn=True, has_relu=True, **kwargs):
"""Construct ConvBnRelu class."""
super(ConvBnRelu, self).__init__()
+ features = []
+ conv2d = None
if Conv2d == 'Conv2d':
- self.conv2d = ops.Conv2d(
- C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False)
+ conv2d = ops.Conv2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False,
+ groups=groups, depthwise=depthwise)
elif Conv2d == 'ConvWS2d':
- self.conv2d = ops.ConvWS2d(
- C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False)
+ conv2d = ops.ConvWS2d(C_in, C_out, kernel_size, stride=stride, padding=padding, bias=False,
+ groups=groups, depthwise=depthwise)
+ if conv2d:
+ features.append(conv2d)
if has_bn:
+ batch_norm2d = None
if norm_layer == 'BN':
- self.batch_norm2d = ops.BatchNorm2d(C_out, affine=affine)
+ batch_norm2d = ops.BatchNorm2d(C_out, affine=affine)
elif norm_layer == 'GN':
num_groups = kwargs.pop('num_groups')
- self.batch_norm2d = ops.GroupNorm(num_groups, C_out, affine=affine)
+ batch_norm2d = ops.GroupNorm(num_groups, C_out, affine=affine)
elif norm_layer == 'Sync':
- self.batch_norm2d = ops.SyncBatchNorm(C_out, affine=affine)
+ batch_norm2d = ops.SyncBatchNorm(C_out, affine=affine)
+ if batch_norm2d:
+ features.append(batch_norm2d)
if has_relu:
if use_relu6:
- self.relu = ops.Relu6(inplace=False)
+ relu = ops.Relu6(inplace=inplace)
else:
- self.relu = ops.Relu(inplace=False)
+ relu = ops.Relu(inplace=inplace)
+ features.append(relu)
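+ # Register the layers as ordered, index-named submodules so the block acts
+ # like a Sequential container and keeps Conv/BN adjacent, which later
+ # Conv+BN fusion may exploit.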
+ for idx, model in enumerate(features):
+ self.add_module(str(idx), model)
@ClassFactory.register(ClassType.NETWORK)
diff --git a/vega/modules/operators/functions/adaptive_weight_ms.py b/vega/modules/operators/functions/adaptive_weight_ms.py
new file mode 100644
index 00000000..df645bcd
--- /dev/null
+++ b/vega/modules/operators/functions/adaptive_weight_ms.py
@@ -0,0 +1,76 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Adaptive weight."""
+import os
+import logging
+from mindspore.train.serialization import save_checkpoint, load_checkpoint
+from mindspore import Tensor
+import numpy as np
+import uuid
+
+
+def adaptive_weight(ckpt_file, ms_model):
+ """Adapte the weight shape."""
+ parameter_dict = load_checkpoint(ckpt_file)
+ net_parameter = ms_model.parameters_and_names()
+ new_ms_params_list = []
+ for index, paras in enumerate(net_parameter):
+ net_para_name = paras[0]
+ net_para_shape = paras[1].data.shape
+
+ if net_para_name in parameter_dict:
+ init_weight = parameter_dict[net_para_name].data
+ init_para_shape = init_weight.shape
+
+ if net_para_shape != init_para_shape:
+ if "conv" in net_para_name:
+ new_weight = _adaptive_conv(init_weight, net_para_shape)
+ elif "batch_norm" in net_para_name:
+ new_weight = _adaptive_bn(init_weight, net_para_shape)
+ else:
+ continue
+ logging.debug("parameter shape not match,para name: {}, init_shape:{}, net_para_shape:{}".
+ format(net_para_name, init_para_shape, net_para_shape))
+ param_dict = {}
+ param_dict['name'] = net_para_name
+ param_dict['data'] = init_weight if net_para_shape == init_para_shape else new_weight
+ new_ms_params_list.append(param_dict)
+ # parameter_dict[net_para_name].data = new_weight
+ save_path = os.path.dirname(ckpt_file)
+ save_file_name = os.path.join(save_path, "adaptive_" + uuid.uuid1().hex[:8] + ".ckpt")
+ save_checkpoint(new_ms_params_list, save_file_name)
+ if os.path.basename(ckpt_file).startswith("torch2ms_"):  # the prefix is on the file name, not the full path
+ os.remove(ckpt_file)
+ return save_file_name
+
+
+def _adaptive_conv(init_weight, new_shape):
+ new_weight = init_weight.asnumpy()
+ init_shape = init_weight.shape
+ if init_shape[0] >= new_shape[0]:
+ new_weight = new_weight[0:new_shape[0]]
+ else:
+ new_weight = np.tile(new_weight, (int(new_shape[0] / init_shape[0]), 1, 1, 1))
+
+ if init_shape[1] >= new_shape[1]:
+ new_weight = new_weight[:, 0:new_shape[1]]
+ else:
+ new_weight = np.tile(new_weight, (1, int(new_shape[1] / init_shape[1]), 1, 1))
+ return Tensor(new_weight)
+
+
+def _adaptive_bn(init_weight, new_shape):
+ new_weight = init_weight.asnumpy()
+ init_shape = init_weight.shape
+ if init_shape[0] >= new_shape[0]:
+ new_weight = new_weight[0:new_shape[0]]
+ else:
+ new_weight = np.tile(new_weight, int(new_shape[0] / init_shape[0]))
+ return Tensor(new_weight)
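+
+
+# Shape-adaptation example: a conv weight of shape (32, 16, 3, 3) adapted to
+# (64, 16, 3, 3) is tiled twice along dim 0; adapted to (16, 16, 3, 3) it is
+# truncated to the first 16 output channels.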
diff --git a/vega/modules/operators/functions/mindspore_fn.py b/vega/modules/operators/functions/mindspore_fn.py
index 4b72d5b8..4a5c67bf 100644
--- a/vega/modules/operators/functions/mindspore_fn.py
+++ b/vega/modules/operators/functions/mindspore_fn.py
@@ -137,7 +137,7 @@ def pretrained(self, pretrained_model_file=None):
ms_pretrained_weight = os.path.join(pretrained_model_file, file)
break
if self.need_adjust:
- from .pytorch_to_ms import adaptive_weight
+ from .adaptive_weight_ms import adaptive_weight
ms_pretrained_weight = adaptive_weight(ms_pretrained_weight, self)
return ms_pretrained_weight
diff --git a/vega/modules/operators/functions/pytorch_fn.py b/vega/modules/operators/functions/pytorch_fn.py
index 0dca48c1..bd974a11 100644
--- a/vega/modules/operators/functions/pytorch_fn.py
+++ b/vega/modules/operators/functions/pytorch_fn.py
@@ -46,11 +46,9 @@ def remap_state_dict(self, own_state_dict, state_dict, head_prefix=None):
"""Remap state dict from npu state files."""
if "state_dict" in state_dict.keys():
state_dict = state_dict["state_dict"]
- own_keys = list(own_state_dict.keys())
+ own_keys = [k for k in own_state_dict.keys() if not k.startswith(head_prefix)] if head_prefix else list(
+ own_state_dict.keys())
input_keys = list(state_dict.keys())
- if len(own_keys) != len(input_keys):
- raise Exception("own_state_dict and state_dict have unmatched key length")
-
new_state_dict = {}
own_key_prefix_occurrence_map = {}
input_key_prefix_occurrence_map = {}
@@ -179,6 +177,24 @@ def forward(self, inputs=None, *args, **kwargs):
return self.call(inputs, *args, **kwargs)
+@ClassFactory.register(ClassType.NETWORK)
+class Pad(nn.Module, OperatorSerializable):
+ """Pad block."""
+
+ def __init__(self, mode="constant", padding=None):
+ self.mode = mode
+ self.padding = padding
+ super().__init__()
+
+ def forward(self, input, pads=None, value=0):
+ """Call forward."""
+ if self.padding is not None:
+ pads = self.padding
+ elif pads is None:
+ raise TypeError("forward() missing 1 required positional argument: 'pads'")
+ return F.pad(input, list(pads), mode=self.mode, value=value)
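+
+ # Example: Pad(mode="constant")(x, pads=(1, 1, 1, 1)) zero-pads the last
+ # two dimensions of x by one on each side (torch.nn.functional.pad rules).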
+
+
@ClassFactory.register(ClassType.NETWORK)
class QuantizeConv2d(QuantConv2d, Module, OperatorSerializable):
"""QuantizeConv2d Module inherit nn.Module."""
@@ -686,160 +702,6 @@ def forward(self, x):
return super(Embedding, self).forward(x)
-@ClassFactory.register(ClassType.NETWORK)
-class Clip(nn.Module, OperatorSerializable):
- """Clip of torch."""
-
- def __init__(self, min=float("-inf"), max=float("inf")):
- """Construct Clip class."""
- super(Clip, self).__init__()
- self.min = float(min)
- self.max = float(max)
-
- def forward(self, x):
- """Do an inference on Clip.
-
- :param x: input tensor
- :return: output tensor
- """
- return torch.clamp(x, min=0, max=self.max)
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class Shape(nn.Module, OperatorSerializable):
- """Shape of torch."""
-
- def __init__(self, start=0, end=None):
- """Construct Shape class."""
- super(Shape, self).__init__()
- self.start = start
- self.end = end
-
- def forward(self, x):
- """Do an inference on Shape.
-
- :param x: input tensor
- :return: output tensor
- """
- if self.end:
- output = torch.tensor(x.shape)[self.start:self.end]
- else:
- output = torch.tensor(x.shape)[self.start:]
- return output.to(vega.get_devices())
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class Gather(nn.Module, OperatorSerializable):
- """Gather block."""
-
- def __init__(self, axis=0):
- """Construct Gather class."""
- super(Gather, self).__init__()
- self.axis = axis # compatible with dim in pytorch
-
- def forward(self, x):
- """Do an inference on Gather.
-
- :param x: input tensor
- :return: output tensor
- """
- return torch.gather(x, self.axis, torch.tensor(0))
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class Unsqueeze(nn.Module, OperatorSerializable):
- """Unsqueeze block."""
-
- def __init__(self, axes):
- """Construct Identity class."""
- super(Unsqueeze, self).__init__()
- self.axes = axes
-
- def forward(self, x):
- """Do an inference on Unsqueeze.
-
- :param x: input tensor
- :return: output tensor
- """
- if not isinstance(self.axes, list):
- logging.error("Unsqueeze axes: {} must be list".format(self.axes))
- return None
- output = x
- for axis in self.axes:
- output = torch.unsqueeze(output, axis)
- return output
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class ConcatTensor(nn.Module, OperatorSerializable):
- """ConcatTensor block."""
-
- def __init__(self, axis=0):
- """Construct ConcatTensor class."""
- super(ConcatTensor, self).__init__()
- self.axis = axis
-
- def forward(self, x):
- """Do an inference on ConcatTensor.
-
- :param x: input tensor
- :return: output tensor
- """
- return torch.cat((x, (torch.tensor([-1]).to(vega.get_devices()))), dim=self.axis)
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class Mean(nn.Module, OperatorSerializable):
- """Mean block."""
-
- def __init__(self, axes=None, keepdims=False):
- """Construct Mean class."""
- super(Mean, self).__init__()
- self.axes = axes
- self.keepdims = keepdims
-
- def forward(self, x):
- """Do an inference on Mean.
-
- :param x: input tensor
- :return: output tensor
- """
- if self.axes:
- return torch.mean(x, dim=self.axes, keepdim=self.keepdims)
- return torch.mean(x, keepdim=self.keepdims)
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class Pad(nn.Module, OperatorSerializable):
- """Pad block."""
-
- def __init__(self, mode="constant", padding=None):
- self.mode = mode
- self.padding = padding
- super().__init__()
-
- def forward(self, input, pads=None, value=0):
- """Call forward."""
- if self.padding is not None:
- pads = self.padding
- elif pads is None:
- raise TypeError("forward() missing 1 required positional argument: 'pads'")
- return F.pad(input, list(pads), mode=self.mode, value=value)
-
-
-@ClassFactory.register(ClassType.NETWORK)
-class Reshape(nn.Module, OperatorSerializable):
- """Reshape class."""
-
- def __init__(self, shape=[-1, 1024]):
- super().__init__()
- self.shape = shape
-
- def forward(self, input: torch.Tensor):
- """Forward function."""
- return torch.reshape(input, tuple(self.shape))
-
-
def concat(inputs, dim=1):
"""Call concat according to backends."""
return torch.cat(inputs, dim=dim)
diff --git a/vega/modules/operators/functions/pytorch_to_ms.py b/vega/modules/operators/functions/pytorch_to_ms.py
index 3c30ce27..c28cfffe 100644
--- a/vega/modules/operators/functions/pytorch_to_ms.py
+++ b/vega/modules/operators/functions/pytorch_to_ms.py
@@ -11,9 +11,8 @@
import os
import torch
import logging
-from mindspore.train.serialization import save_checkpoint, load_checkpoint
+from mindspore.train.serialization import save_checkpoint
from mindspore import Tensor
-import numpy as np
import uuid
@@ -128,63 +127,3 @@ def pytorch2mindspore_extend(pth_file, model):
save_file_name = os.path.join(save_path, "torch2ms_" + uuid.uuid1().hex[:8] + ".ckpt")
save_checkpoint(ms_params_list, save_file_name)
return save_file_name
-
-
-def adaptive_weight(ckpt_file, ms_model):
- """Adapte the weight shape."""
- parameter_dict = load_checkpoint(ckpt_file)
- net_parameter = ms_model.parameters_and_names()
- new_ms_params_list = []
- for index, paras in enumerate(net_parameter):
- net_para_name = paras[0]
- net_para_shape = paras[1].data.shape
-
- if net_para_name in parameter_dict:
- init_weight = parameter_dict[net_para_name].data
- init_para_shape = init_weight.shape
-
- if net_para_shape != init_para_shape:
- if "conv" in net_para_name:
- new_weight = _adaptive_conv(init_weight, net_para_shape)
- elif "batch_norm" in net_para_name:
- new_weight = _adaptive_bn(init_weight, net_para_shape)
- else:
- continue
- logging.debug("parameter shape not match,para name: {}, init_shape:{}, net_para_shape:{}".
- format(net_para_name, init_para_shape, net_para_shape))
- param_dict = {}
- param_dict['name'] = net_para_name
- param_dict['data'] = init_weight if net_para_shape == init_para_shape else new_weight
- new_ms_params_list.append(param_dict)
- # parameter_dict[net_para_name].data = new_weight
- save_path = os.path.dirname(ckpt_file)
- save_file_name = os.path.join(save_path, "adaptive_" + uuid.uuid1().hex[:8] + ".ckpt")
- save_checkpoint(new_ms_params_list, save_file_name)
- if ckpt_file.startswith("torch2ms_"):
- os.remove(ckpt_file)
- return save_file_name
-
-
-def _adaptive_conv(init_weight, new_shape):
- new_weight = init_weight.asnumpy()
- init_shape = init_weight.shape
- if init_shape[0] >= new_shape[0]:
- new_weight = new_weight[0:new_shape[0]]
- else:
- new_weight = np.tile(new_weight, (int(new_shape[0] / init_shape[0]), 1, 1, 1))
-
- if init_shape[1] >= new_shape[1]:
- new_weight = new_weight[:, 0:new_shape[1]]
- else:
- new_weight = np.tile(new_weight, (1, int(new_shape[1] / init_shape[1]), 1, 1))
- return Tensor(new_weight)
-
-
-def _adaptive_bn(init_weight, new_shape):
- new_weight = init_weight.asnumpy()
- init_shape = init_weight.shape
- if init_shape[0] >= new_shape[0]:
- new_weight = new_weight[0:new_shape[0]]
- else:
- new_weight = np.tile(new_weight, int(new_shape[0] / init_shape[0]))
- return Tensor(new_weight)
diff --git a/vega/modules/operators/ops.py b/vega/modules/operators/ops.py
index 3e0ebeb1..7be2ac46 100644
--- a/vega/modules/operators/ops.py
+++ b/vega/modules/operators/ops.py
@@ -24,14 +24,6 @@
GroupNorm = fn.GroupNorm
SyncBatchNorm = fn.SyncBatchNorm
ConvTranspose2d = fn.ConvTranspose2d
- Clip = fn.Clip
- Shape = fn.Shape
- Gather = fn.Gather
- Unsqueeze = fn.Unsqueeze
- ConcatTensor = fn.ConcatTensor
- Mean = fn.Mean
- Pad = fn.Pad
- Reshape = fn.Reshape
Module = fn.Module
Conv2d = fn.Conv2d
diff --git a/vega/networks/__init__.py b/vega/networks/__init__.py
index 414b69f2..9059f749 100644
--- a/vega/networks/__init__.py
+++ b/vega/networks/__init__.py
@@ -13,11 +13,10 @@
from vega.common.class_factory import ClassFactory
from .network_desc import NetworkDesc
-
ClassFactory.lazy_register("vega.networks", {
"adelaide": ["AdelaideFastNAS"],
"bert": ["BertClassification", "TinyBertForPreTraining", "BertClassificationHeader"],
- "dnet": ["DNet", "DNetBackbone"],
+ "dnet": ["DNet", "DNetBackbone", "EncodedBlock"],
"erdb_esr": ["ESRN"],
"faster_backbone": ["FasterBackbone"],
"faster_rcnn": ["FasterRCNN"],
diff --git a/vega/networks/dnet.py b/vega/networks/dnet.py
index bd36f892..5935e8f3 100644
--- a/vega/networks/dnet.py
+++ b/vega/networks/dnet.py
@@ -169,6 +169,7 @@ def call(self, x, **kwargs):
return x
+@ClassFactory.register(ClassType.NETWORK)
class EncodedBlock(Module):
"""Encode block."""
diff --git a/vega/networks/mindspore/__init__.py b/vega/networks/mindspore/__init__.py
index 80cbf63e..c81175d2 100644
--- a/vega/networks/mindspore/__init__.py
+++ b/vega/networks/mindspore/__init__.py
@@ -19,4 +19,5 @@
"backbones.load_official_model": ["OffcialModelLoader"],
"backbones.resnet_ms": ["ResNetMs"],
"losses.mix_auxiliary_loss": ["MixAuxiliaryLoss"],
+ "faster_rcnn.faster_rcnn_resnet": ["Faster_Rcnn_MD"]
})
diff --git a/vega/networks/mindspore/faster_rcnn/anchor_generator.py b/vega/networks/mindspore/faster_rcnn/anchor_generator.py
new file mode 100644
index 00000000..1a4bbc20
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/anchor_generator.py
@@ -0,0 +1,86 @@
+# Copyright 2020-2021 Huawei Technologies Co., Ltd
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+"""FasterRcnn anchor generator."""
+
+import numpy as np
+
+
+class AnchorGenerator():
+ """Anchor generator for FasterRcnn."""
+
+ def __init__(self, base_size, scales, ratios, scale_major=True, ctr=None):
+ """Anchor generator init method."""
+ self.base_size = base_size
+ self.scales = np.array(scales)
+ self.ratios = np.array(ratios)
+ self.scale_major = scale_major
+ self.ctr = ctr
+ self.base_anchors = self.gen_base_anchors()
+
+ def gen_base_anchors(self):
+ """Generate a single anchor."""
+ w = self.base_size
+ h = self.base_size
+ if self.ctr is None:
+ x_ctr = 0.5 * (w - 1)
+ y_ctr = 0.5 * (h - 1)
+ else:
+ x_ctr, y_ctr = self.ctr
+
+ h_ratios = np.sqrt(self.ratios)
+ w_ratios = 1 / h_ratios
+ if self.scale_major:
+ ws = (w * w_ratios[:, None] * self.scales[None, :]).reshape(-1)
+ hs = (h * h_ratios[:, None] * self.scales[None, :]).reshape(-1)
+ else:
+ ws = (w * self.scales[:, None] * w_ratios[None, :]).reshape(-1)
+ hs = (h * self.scales[:, None] * h_ratios[None, :]).reshape(-1)
+
+ base_anchors = np.stack(
+ [
+ x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
+ x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)
+ ],
+ axis=-1).round()
+
+ return base_anchors
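+
+ # For example, base_size=8 with scales=[1] and ratios=[1.0] yields the
+ # single base anchor [0, 0, 7, 7], centred at (3.5, 3.5).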
+
+ def _meshgrid(self, x, y, row_major=True):
+ """Generate grid."""
+ xx = np.repeat(x.reshape(1, len(x)), len(y), axis=0).reshape(-1)
+ yy = np.repeat(y, len(x))
+ if row_major:
+ return xx, yy
+
+ return yy, xx
+
+ def grid_anchors(self, featmap_size, stride=16):
+ """Generate anchor list."""
+ base_anchors = self.base_anchors
+
+ feat_h, feat_w = featmap_size
+ shift_x = np.arange(0, feat_w) * stride
+ shift_y = np.arange(0, feat_h) * stride
+ shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
+ shifts = np.stack([shift_xx, shift_yy, shift_xx, shift_yy], axis=-1)
+ shifts = shifts.astype(base_anchors.dtype)
+ # first feat_w elements correspond to the first row of shifts
+ # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
+ # shifted anchors (K, A, 4), reshape to (K*A, 4)
+
+ all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
+ all_anchors = all_anchors.reshape(-1, 4)
+
+ return all_anchors
diff --git a/vega/networks/mindspore/faster_rcnn/bbox_assign_sample_stage2.py b/vega/networks/mindspore/faster_rcnn/bbox_assign_sample_stage2.py
new file mode 100644
index 00000000..15536122
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/bbox_assign_sample_stage2.py
@@ -0,0 +1,192 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn tpositive and negative sample screening for Rcnn."""
+
+import numpy as np
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+
+
+class BboxAssignSampleForRcnn(nn.Cell):
+ """
+ Bbox assigner and sampler definition.
+
+ Args:
+ config (dict): Config.
+ batch_size (int): Batchsize.
+ num_bboxes (int): The anchor nums.
+ add_gt_as_proposals (bool): add gt bboxes as proposals flag.
+
+ Returns:
+ Tensor, output tensor.
+ bbox_targets: bbox location, (batch_size, num_bboxes, 4)
+ bbox_weights: bbox weights, (batch_size, num_bboxes, 1)
+ labels: label for every bboxes, (batch_size, num_bboxes, 1)
+ label_weights: label weight for every bboxes, (batch_size, num_bboxes, 1)
+
+ Examples:
+ BboxAssignSampleForRcnn(config, 2, 1024, True)
+ """
+
+ def __init__(self, config, batch_size, num_bboxes, add_gt_as_proposals):
+ super(BboxAssignSampleForRcnn, self).__init__()
+ cfg = config
+ self.dtype = np.float32
+ self.ms_type = mstype.float32
+ self.batch_size = batch_size
+ self.neg_iou_thr = cfg.neg_iou_thr_stage2
+ self.pos_iou_thr = cfg.pos_iou_thr_stage2
+ self.min_pos_iou = cfg.min_pos_iou_stage2
+ self.num_gts = cfg.num_gts
+ self.num_bboxes = num_bboxes
+ self.num_expected_pos = cfg.num_expected_pos_stage2
+ self.num_expected_neg = cfg.num_expected_neg_stage2
+ self.num_expected_total = cfg.num_expected_total_stage2
+
+ self.add_gt_as_proposals = add_gt_as_proposals
+ self.label_inds = Tensor(np.arange(1, self.num_gts + 1).astype(np.int32))
+ self.add_gt_as_proposals_valid = Tensor(np.full(self.num_gts, self.add_gt_as_proposals, dtype=np.int32))
+
+ self.concat = P.Concat(axis=0)
+ self.max_gt = P.ArgMaxWithValue(axis=0)
+ self.max_anchor = P.ArgMaxWithValue(axis=1)
+ self.sum_inds = P.ReduceSum()
+ self.iou = P.IOU()
+ self.greaterequal = P.GreaterEqual()
+ self.greater = P.Greater()
+ self.select = P.Select()
+ self.gatherND = P.GatherNd()
+ self.squeeze = P.Squeeze()
+ self.cast = P.Cast()
+ self.logicaland = P.LogicalAnd()
+ self.less = P.Less()
+ self.random_choice_with_mask_pos = P.RandomChoiceWithMask(self.num_expected_pos)
+ self.random_choice_with_mask_neg = P.RandomChoiceWithMask(self.num_expected_neg)
+ self.reshape = P.Reshape()
+ self.equal = P.Equal()
+ self.bounding_box_encode = P.BoundingBoxEncode(means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2))
+ self.concat_axis1 = P.Concat(axis=1)
+ self.logicalnot = P.LogicalNot()
+ self.tile = P.Tile()
+
+ # Check
+ self.check_gt_one = Tensor(np.full((self.num_gts, 4), -1, dtype=self.dtype))
+ self.check_anchor_two = Tensor(np.full((self.num_bboxes, 4), -2, dtype=self.dtype))
+
+ # Init tensor
+ self.assigned_gt_inds = Tensor(np.full(num_bboxes, -1, dtype=np.int32))
+ self.assigned_gt_zeros = Tensor(np.array(np.zeros(num_bboxes), dtype=np.int32))
+ self.assigned_gt_ones = Tensor(np.array(np.ones(num_bboxes), dtype=np.int32))
+ self.assigned_gt_ignores = Tensor(np.full(num_bboxes, -1, dtype=np.int32))
+ self.assigned_pos_ones = Tensor(np.array(np.ones(self.num_expected_pos), dtype=np.int32))
+
+ self.gt_ignores = Tensor(np.full(self.num_gts, -1, dtype=np.int32))
+ self.range_pos_size = Tensor(np.arange(self.num_expected_pos).astype(self.dtype))
+ self.check_neg_mask = Tensor(np.array(np.ones(self.num_expected_neg - self.num_expected_pos), dtype=np.bool))
+ self.bboxs_neg_mask = Tensor(np.zeros((self.num_expected_neg, 4), dtype=self.dtype))
+ self.labels_neg_mask = Tensor(np.array(np.zeros(self.num_expected_neg), dtype=np.uint8))
+
+ self.reshape_shape_pos = (self.num_expected_pos, 1)
+ self.reshape_shape_neg = (self.num_expected_neg, 1)
+
+ self.scalar_zero = Tensor(0.0, dtype=self.ms_type)
+ self.scalar_neg_iou_thr = Tensor(self.neg_iou_thr, dtype=self.ms_type)
+ self.scalar_pos_iou_thr = Tensor(self.pos_iou_thr, dtype=self.ms_type)
+ self.scalar_min_pos_iou = Tensor(self.min_pos_iou, dtype=self.ms_type)
+
+ def construct(self, gt_bboxes_i, gt_labels_i, valid_mask, bboxes, gt_valids):
+ """Construct the trainer of SpNas."""
+ gt_bboxes_i = self.select(
+ self.cast(self.tile(self.reshape(self.cast(gt_valids, mstype.int32), (self.num_gts, 1)), (1, 4)),
+ mstype.bool_), gt_bboxes_i, self.check_gt_one)
+ bboxes = self.select(
+ self.cast(self.tile(self.reshape(self.cast(valid_mask, mstype.int32), (self.num_bboxes, 1)), (1, 4)),
+ mstype.bool_), bboxes, self.check_anchor_two)
+
+ overlaps = self.iou(bboxes, gt_bboxes_i)
+
+ max_overlaps_w_gt_index, max_overlaps_w_gt = self.max_gt(overlaps)
+ _, max_overlaps_w_ac = self.max_anchor(overlaps)
+
+ neg_sample_iou_mask = self.logicaland(self.greaterequal(max_overlaps_w_gt,
+ self.scalar_zero),
+ self.less(max_overlaps_w_gt,
+ self.scalar_neg_iou_thr))
+
+ assigned_gt_inds2 = self.select(neg_sample_iou_mask, self.assigned_gt_zeros, self.assigned_gt_inds)
+
+ pos_sample_iou_mask = self.greaterequal(max_overlaps_w_gt, self.scalar_pos_iou_thr)
+ assigned_gt_inds3 = self.select(pos_sample_iou_mask, max_overlaps_w_gt_index + self.assigned_gt_ones,
+ assigned_gt_inds2)
+
+ for j in range(self.num_gts):
+ max_overlaps_w_ac_j = max_overlaps_w_ac[j:j + 1:1]
+ overlaps_w_ac_j = overlaps[j:j + 1:1, ::]
+ temp1 = self.greaterequal(max_overlaps_w_ac_j, self.scalar_min_pos_iou)
+ temp2 = self.squeeze(self.equal(overlaps_w_ac_j, max_overlaps_w_ac_j))
+ pos_mask_j = self.logicaland(temp1, temp2)
+ assigned_gt_inds3 = self.select(pos_mask_j, (j + 1) * self.assigned_gt_ones, assigned_gt_inds3)
+
+ assigned_gt_inds5 = self.select(valid_mask, assigned_gt_inds3, self.assigned_gt_ignores)
+
+ bboxes = self.concat((gt_bboxes_i, bboxes))
+ label_inds_valid = self.select(gt_valids, self.label_inds, self.gt_ignores)
+ label_inds_valid = label_inds_valid * self.add_gt_as_proposals_valid
+ assigned_gt_inds5 = self.concat((label_inds_valid, assigned_gt_inds5))
+
+ # Get pos index
+ pos_index, valid_pos_index = self.random_choice_with_mask_pos(self.greater(assigned_gt_inds5, 0))
+
+ pos_check_valid = self.cast(self.greater(assigned_gt_inds5, 0), self.ms_type)
+ pos_check_valid = self.sum_inds(pos_check_valid, -1)
+ valid_pos_index = self.less(self.range_pos_size, pos_check_valid)
+ pos_index = pos_index * self.reshape(self.cast(valid_pos_index, mstype.int32), (self.num_expected_pos, 1))
+
+ num_pos = self.sum_inds(self.cast(self.logicalnot(valid_pos_index), self.ms_type), -1)
+ valid_pos_index = self.cast(valid_pos_index, mstype.int32)
+ pos_index = self.reshape(pos_index, self.reshape_shape_pos)
+ valid_pos_index = self.reshape(valid_pos_index, self.reshape_shape_pos)
+ pos_index = pos_index * valid_pos_index
+
+ pos_assigned_gt_index = self.gatherND(assigned_gt_inds5, pos_index) - self.assigned_pos_ones
+ pos_assigned_gt_index = self.reshape(pos_assigned_gt_index, self.reshape_shape_pos)
+ pos_assigned_gt_index = pos_assigned_gt_index * valid_pos_index
+
+ pos_gt_labels = self.gatherND(gt_labels_i, pos_assigned_gt_index)
+
+ # Get neg index
+ neg_index, valid_neg_index = self.random_choice_with_mask_neg(self.equal(assigned_gt_inds5, 0))
+
+ invalid_pos_index = self.less(self.range_pos_size, num_pos)
+ valid_neg_index = self.logicaland(self.concat((self.check_neg_mask, invalid_pos_index)), valid_neg_index)
+ neg_index = self.reshape(neg_index, self.reshape_shape_neg)
+
+ valid_neg_index = self.cast(valid_neg_index, mstype.int32)
+ valid_neg_index = self.reshape(valid_neg_index, self.reshape_shape_neg)
+ neg_index = neg_index * valid_neg_index
+
+ pos_bboxes_ = self.gatherND(bboxes, pos_index)
+
+ neg_bboxes_ = self.gatherND(bboxes, neg_index)
+ pos_assigned_gt_index = self.reshape(pos_assigned_gt_index, self.reshape_shape_pos)
+ pos_gt_bboxes_ = self.gatherND(gt_bboxes_i, pos_assigned_gt_index)
+ pos_bbox_targets_ = self.bounding_box_encode(pos_bboxes_, pos_gt_bboxes_)
+
+ total_bboxes = self.concat((pos_bboxes_, neg_bboxes_))
+ total_deltas = self.concat((pos_bbox_targets_, self.bboxs_neg_mask))
+ total_labels = self.concat((pos_gt_labels, self.labels_neg_mask))
+
+ valid_pos_index = self.reshape(valid_pos_index, self.reshape_shape_pos)
+ valid_neg_index = self.reshape(valid_neg_index, self.reshape_shape_neg)
+ total_mask = self.concat((valid_pos_index, valid_neg_index))
+
+ return total_bboxes, total_deltas, total_labels, total_mask
diff --git a/vega/networks/mindspore/faster_rcnn/faster_rcnn_resnet.py b/vega/networks/mindspore/faster_rcnn/faster_rcnn_resnet.py
new file mode 100644
index 00000000..d41e17fd
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/faster_rcnn_resnet.py
@@ -0,0 +1,475 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn based on ResNet."""
+
+import numpy as np
+import mindspore.nn as nn
+from mindspore import context
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+import mindspore.common.dtype as mstype
+from mindspore.ops import functional as F
+from .resnet import ResNetFea, ResidualBlockUsing
+from .bbox_assign_sample_stage2 import BboxAssignSampleForRcnn
+from .fpn_neck import FeatPyramidNeck
+from .proposal_generator import Proposal
+from .rcnn import Rcnn
+from .rpn import RPN
+from .roi_align import SingleRoIExtractor
+from .anchor_generator import AnchorGenerator
+from vega.common import ClassFactory, ClassType
+from vega.algorithms.nas.sp_nas.src.model_utils.config import config
+from vega.modules.module import Module
+
+
+@ClassFactory.register(ClassType.NETWORK)
+class Faster_Rcnn_MD(Module):
+ """
+ FasterRcnn Network.
+
+ Note:
+ backbone = resnet
+
+ Returns:
+ Tuple, tuple of output tensor.
+ rpn_loss: Scalar, Total loss of RPN subnet.
+ rcnn_loss: Scalar, Total loss of RCNN subnet.
+ rpn_cls_loss: Scalar, Classification loss of RPN subnet.
+ rpn_reg_loss: Scalar, Regression loss of RPN subnet.
+ rcnn_cls_loss: Scalar, Classification loss of RCNN subnet.
+ rcnn_reg_loss: Scalar, Regression loss of RCNN subnet.
+
+ Examples:
+ net = Faster_Rcnn_MD()
+ """
+
+ def __init__(self, code='111-2111-211111-211', **kwargs):
+ super(Faster_Rcnn_MD, self).__init__()
+ self.dtype = np.float32
+ self.ms_type = mstype.float32
+ self.code = code
+ self.train_batch_size = config.batch_size
+ self.num_classes = config.num_classes
+ self.anchor_scales = config.anchor_scales
+ self.anchor_ratios = config.anchor_ratios
+ self.anchor_strides = config.anchor_strides
+ self.target_means = tuple(config.rcnn_target_means)
+ self.target_stds = tuple(config.rcnn_target_stds)
+
+ # Anchor generator
+ anchor_base_sizes = None
+ self.anchor_base_sizes = list(
+ self.anchor_strides) if anchor_base_sizes is None else anchor_base_sizes
+
+ self.anchor_generators = []
+ for anchor_base in self.anchor_base_sizes:
+ self.anchor_generators.append(
+ AnchorGenerator(anchor_base, self.anchor_scales, self.anchor_ratios))
+
+ self.num_anchors = len(self.anchor_ratios) * len(self.anchor_scales)
+
+ featmap_sizes = config.feature_shapes
+ assert len(featmap_sizes) == len(self.anchor_generators)
+
+ self.anchor_list = self.get_anchors(featmap_sizes)
+
+ # Backbone resnet
+ self.backbone = ResNetFea(code=self.code)
+
+ # Fpn
+ self.fpn_ncek = FeatPyramidNeck(self.backbone.channels,
+ config.fpn_out_channels,
+ config.fpn_num_outs)
+
+ # Rpn and rpn loss
+ self.gt_labels_stage1 = Tensor(np.ones((self.train_batch_size, config.num_gts)).astype(np.uint8))
+ self.rpn_with_loss = RPN(config,
+ self.train_batch_size,
+ config.rpn_in_channels,
+ config.rpn_feat_channels,
+ config.num_anchors,
+ config.rpn_cls_out_channels)
+
+ # Proposal
+ self.proposal_generator = Proposal(config,
+ self.train_batch_size,
+ config.activate_num_classes,
+ config.use_sigmoid_cls)
+ self.proposal_generator.set_train_local(config, True)
+ self.proposal_generator_test = Proposal(config,
+ config.test_batch_size,
+ config.activate_num_classes,
+ config.use_sigmoid_cls)
+ self.proposal_generator_test.set_train_local(config, False)
+
+ # Assign and sampler stage two
+ self.bbox_assigner_sampler_for_rcnn = BboxAssignSampleForRcnn(config, self.train_batch_size,
+ config.num_bboxes_stage2, True)
+ self.decode = P.BoundingBoxDecode(max_shape=(config.img_height, config.img_width), means=self.target_means,
+ stds=self.target_stds)
+ # Roi
+ self.roi_init(config)
+
+ # Rcnn
+ self.rcnn = Rcnn(config, config.rcnn_in_channels * config.roi_layer.out_size * config.roi_layer.out_size,
+ self.train_batch_size, self.num_classes)
+
+ # Op declare
+ self.squeeze = P.Squeeze()
+ self.cast = P.Cast()
+
+ self.concat = P.Concat(axis=0)
+ self.concat_1 = P.Concat(axis=1)
+ self.concat_2 = P.Concat(axis=2)
+ self.reshape = P.Reshape()
+ self.select = P.Select()
+ self.greater = P.Greater()
+ self.transpose = P.Transpose()
+
+ # Improve speed
+ self.concat_start = min(self.num_classes - 2, 55)
+ self.concat_end = (self.num_classes - 1)
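+        # multiclass_nms concatenates the per-class results in two chunks split at
+        # concat_start, which keeps each Concat op small (hence "improve speed").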
+
+ # Test mode
+ self.test_mode_init(config)
+
+ # Init tensor
+ self.init_tensor(config)
+ self.device_type = "Ascend" if context.get_context("device_target") == "Ascend" else "Others"
+
+ def roi_init(self, config):
+ """
+ Initialize roi from the config file.
+
+ Args:
+            config (Config): config object.
+            roi_layer (dict): RoI layer type and arguments.
+ roi_align_out_channels (int): Out channel in each layer.
+ config.roi_align_featmap_strides (list): featmap_strides in each layer.
+ roi_align_finest_scale (int): finest_scale in roi.
+
+ Examples:
+ self.roi_init(config)
+ """
+ self.roi_align = SingleRoIExtractor(config,
+ config.roi_layer,
+ config.roi_align_out_channels,
+ config.roi_align_featmap_strides,
+ self.train_batch_size,
+ config.roi_align_finest_scale)
+ self.roi_align.set_train_local(config, True)
+ self.roi_align_test = SingleRoIExtractor(config,
+ config.roi_layer,
+ config.roi_align_out_channels,
+ config.roi_align_featmap_strides,
+ 1,
+ config.roi_align_finest_scale)
+ self.roi_align_test.set_train_local(config, False)
+
+ def test_mode_init(self, config):
+ """
+ Initialize test_mode from the config file.
+
+ Args:
+            config (Config): config object.
+ test_batch_size (int): Size of test batch.
+ rpn_max_num (int): max num of rpn.
+ test_score_thresh (float): threshold of test score.
+ test_iou_thr (float): threshold of test iou.
+
+ Examples:
+ self.test_mode_init(config)
+ """
+ self.test_batch_size = config.test_batch_size
+ self.split = P.Split(axis=0, output_num=self.test_batch_size)
+ self.split_shape = P.Split(axis=0, output_num=4)
+ self.split_scores = P.Split(axis=1, output_num=self.num_classes)
+ self.split_cls = P.Split(axis=0, output_num=self.num_classes - 1)
+ self.tile = P.Tile()
+ self.gather = P.GatherNd()
+
+ self.rpn_max_num = config.rpn_max_num
+
+ self.zeros_for_nms = Tensor(np.zeros((self.rpn_max_num, 3)).astype(self.dtype))
+        self.ones_mask = np.ones((self.rpn_max_num, 1)).astype(np.bool_)
+        self.zeros_mask = np.zeros((self.rpn_max_num, 1)).astype(np.bool_)
+ self.bbox_mask = Tensor(np.concatenate((self.ones_mask, self.zeros_mask,
+ self.ones_mask, self.zeros_mask), axis=1))
+ self.nms_pad_mask = Tensor(np.concatenate((self.ones_mask, self.ones_mask,
+ self.ones_mask, self.ones_mask, self.zeros_mask), axis=1))
+
+ self.test_score_thresh = Tensor(np.ones((self.rpn_max_num, 1)).astype(self.dtype) * config.test_score_thr)
+ self.test_score_zeros = Tensor(np.ones((self.rpn_max_num, 1)).astype(self.dtype) * 0)
+ self.test_box_zeros = Tensor(np.ones((self.rpn_max_num, 4)).astype(self.dtype) * -1)
+ self.test_iou_thr = Tensor(np.ones((self.rpn_max_num, 1)).astype(self.dtype) * config.test_iou_thr)
+ self.test_max_per_img = config.test_max_per_img
+ self.nms_test = P.NMSWithMask(config.test_iou_thr)
+ self.softmax = P.Softmax(axis=1)
+ self.logicand = P.LogicalAnd()
+ self.oneslike = P.OnesLike()
+ self.test_topk = P.TopK(sorted=True)
+ self.test_num_proposal = self.test_batch_size * self.rpn_max_num
+
+ def init_tensor(self, config):
+ """Construct the trainer of SpNas."""
+ roi_align_index = [np.array(np.ones((config.num_expected_pos_stage2 + config.num_expected_neg_stage2, 1)) * i,
+ dtype=self.dtype) for i in range(self.train_batch_size)]
+
+ roi_align_index_test = [np.array(np.ones((config.rpn_max_num, 1)) * i, dtype=self.dtype)
+ for i in range(self.test_batch_size)]
+
+ self.roi_align_index_tensor = Tensor(np.concatenate(roi_align_index))
+ self.roi_align_index_test_tensor = Tensor(np.concatenate(roi_align_index_test))
+
+ def construct(self, img_data, img_metas, gt_bboxes, gt_labels, gt_valids):
+ """
+ Construct the FasterRcnn Network.
+
+ Args:
+ img_data: input image data.
+ img_metas: meta label of img.
+ gt_bboxes (Tensor): get the value of bboxes.
+ gt_labels (Tensor): get the value of labels.
+ gt_valids (Tensor): get the valid part of bboxes.
+
+ Returns:
+            Tuple, tuple of output tensors.
+ """
+ x = self.backbone(img_data)
+        x = self.fpn_neck(x)
+
+ rpn_loss, cls_score, bbox_pred, rpn_cls_loss, rpn_reg_loss, _ = self.rpn_with_loss(x,
+ img_metas,
+ self.anchor_list,
+ gt_bboxes,
+ self.gt_labels_stage1,
+ gt_valids)
+
+ if self.training:
+ proposal, proposal_mask = self.proposal_generator(cls_score, bbox_pred, self.anchor_list)
+ else:
+ proposal, proposal_mask = self.proposal_generator_test(cls_score, bbox_pred, self.anchor_list)
+
+ gt_labels = self.cast(gt_labels, mstype.int32)
+ gt_valids = self.cast(gt_valids, mstype.int32)
+ bboxes_tuple = ()
+ deltas_tuple = ()
+ labels_tuple = ()
+ mask_tuple = ()
+ if self.training:
+ for i in range(self.train_batch_size):
+ gt_bboxes_i = self.squeeze(gt_bboxes[i:i + 1:1, ::])
+
+ gt_labels_i = self.squeeze(gt_labels[i:i + 1:1, ::])
+ gt_labels_i = self.cast(gt_labels_i, mstype.uint8)
+
+ gt_valids_i = self.squeeze(gt_valids[i:i + 1:1, ::])
+ gt_valids_i = self.cast(gt_valids_i, mstype.bool_)
+
+ bboxes, deltas, labels, mask = self.bbox_assigner_sampler_for_rcnn(gt_bboxes_i,
+ gt_labels_i,
+ proposal_mask[i],
+ proposal[i][::, 0:4:1],
+ gt_valids_i)
+ bboxes_tuple += (bboxes,)
+ deltas_tuple += (deltas,)
+ labels_tuple += (labels,)
+ mask_tuple += (mask,)
+
+ bbox_targets = self.concat(deltas_tuple)
+ rcnn_labels = self.concat(labels_tuple)
+ bbox_targets = F.stop_gradient(bbox_targets)
+ rcnn_labels = F.stop_gradient(rcnn_labels)
+ rcnn_labels = self.cast(rcnn_labels, mstype.int32)
+ else:
+ mask_tuple += proposal_mask
+ bbox_targets = proposal_mask
+ rcnn_labels = proposal_mask
+ for p_i in proposal:
+ bboxes_tuple += (p_i[::, 0:4:1],)
+
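+        # Prepend the batch index to each box so rois have the
+        # (batch_idx, x1, y1, x2, y2) layout expected by ROIAlign.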
+ if self.training:
+ if self.train_batch_size > 1:
+ bboxes_all = self.concat(bboxes_tuple)
+ else:
+ bboxes_all = bboxes_tuple[0]
+ rois = self.concat_1((self.roi_align_index_tensor, bboxes_all))
+ else:
+ if self.test_batch_size > 1:
+ bboxes_all = self.concat(bboxes_tuple)
+ else:
+ bboxes_all = bboxes_tuple[0]
+ if self.device_type == "Ascend":
+ bboxes_all = self.cast(bboxes_all, mstype.float16)
+ rois = self.concat_1((self.roi_align_index_test_tensor, bboxes_all))
+
+ rois = self.cast(rois, mstype.float32)
+ rois = F.stop_gradient(rois)
+
+ if self.training:
+ roi_feats = self.roi_align(rois,
+ self.cast(x[0], mstype.float32),
+ self.cast(x[1], mstype.float32),
+ self.cast(x[2], mstype.float32),
+ self.cast(x[3], mstype.float32))
+ else:
+ roi_feats = self.roi_align_test(rois,
+ self.cast(x[0], mstype.float32),
+ self.cast(x[1], mstype.float32),
+ self.cast(x[2], mstype.float32),
+ self.cast(x[3], mstype.float32))
+
+ roi_feats = self.cast(roi_feats, self.ms_type)
+ rcnn_masks = self.concat(mask_tuple)
+ rcnn_masks = F.stop_gradient(rcnn_masks)
+ rcnn_mask_squeeze = self.squeeze(self.cast(rcnn_masks, mstype.bool_))
+ rcnn_loss, rcnn_cls_loss, rcnn_reg_loss, _ = self.rcnn(roi_feats,
+ bbox_targets,
+ rcnn_labels,
+ rcnn_mask_squeeze)
+
+ output = ()
+ if self.training:
+ output += (rpn_loss, rcnn_loss, rpn_cls_loss, rpn_reg_loss, rcnn_cls_loss, rcnn_reg_loss)
+ else:
+ output = self.get_det_bboxes(rcnn_cls_loss, rcnn_reg_loss, rcnn_masks, bboxes_all, img_metas)
+
+ return output
+
+ def get_det_bboxes(self, cls_logits, reg_logits, mask_logits, rois, img_metas):
+ """Get the actual detection box."""
+ scores = self.softmax(cls_logits)
+
+ boxes_all = ()
+ for i in range(self.num_classes):
+ k = i * 4
+ reg_logits_i = self.squeeze(reg_logits[::, k:k + 4:1])
+ out_boxes_i = self.decode(rois, reg_logits_i)
+ boxes_all += (out_boxes_i,)
+
+ img_metas_all = self.split(img_metas)
+ scores_all = self.split(scores)
+ mask_all = self.split(self.cast(mask_logits, mstype.int32))
+
+ boxes_all_with_batchsize = ()
+ for i in range(self.test_batch_size):
+ scale = self.split_shape(self.squeeze(img_metas_all[i]))
+ scale_h = scale[2]
+ scale_w = scale[3]
+ boxes_tuple = ()
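+            # Rescale to the original image: x coordinates are divided by scale_w,
+            # y by scale_h; the alternating bbox_mask merges the two.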
+ for j in range(self.num_classes):
+ boxes_tmp = self.split(boxes_all[j])
+ out_boxes_h = boxes_tmp[i] / scale_h
+ out_boxes_w = boxes_tmp[i] / scale_w
+ boxes_tuple += (self.select(self.bbox_mask, out_boxes_w, out_boxes_h),)
+ boxes_all_with_batchsize += (boxes_tuple,)
+
+ output = self.multiclass_nms(boxes_all_with_batchsize, scores_all, mask_all)
+
+ return output
+
+ def multiclass_nms(self, boxes_all, scores_all, mask_all):
+ """Multiscale postprocessing."""
+ all_bboxes = ()
+ all_labels = ()
+ all_masks = ()
+
+ for i in range(self.test_batch_size):
+ bboxes = boxes_all[i]
+ scores = scores_all[i]
+ masks = self.cast(mask_all[i], mstype.bool_)
+
+ res_boxes_tuple = ()
+ res_labels_tuple = ()
+ res_masks_tuple = ()
+
+ for j in range(self.num_classes - 1):
+ k = j + 1
+ _cls_scores = scores[::, k:k + 1:1]
+ _bboxes = self.squeeze(bboxes[k])
+ _mask_o = self.reshape(masks, (self.rpn_max_num, 1))
+
+ cls_mask = self.greater(_cls_scores, self.test_score_thresh)
+ _mask = self.logicand(_mask_o, cls_mask)
+
+ _reg_mask = self.cast(self.tile(self.cast(_mask, mstype.int32), (1, 4)), mstype.bool_)
+
+ _bboxes = self.select(_reg_mask, _bboxes, self.test_box_zeros)
+ _cls_scores = self.select(_mask, _cls_scores, self.test_score_zeros)
+ __cls_scores = self.squeeze(_cls_scores)
+ scores_sorted, topk_inds = self.test_topk(__cls_scores, self.rpn_max_num)
+ topk_inds = self.reshape(topk_inds, (self.rpn_max_num, 1))
+ scores_sorted = self.reshape(scores_sorted, (self.rpn_max_num, 1))
+ _bboxes_sorted = self.gather(_bboxes, topk_inds)
+ _mask_sorted = self.gather(_mask, topk_inds)
+
+ scores_sorted = self.tile(scores_sorted, (1, 4))
+ cls_dets = self.concat_1((_bboxes_sorted, scores_sorted))
+ cls_dets = P.Slice()(cls_dets, (0, 0), (self.rpn_max_num, 5))
+
+ cls_dets, _index, _mask_nms = self.nms_test(cls_dets)
+ _index = self.reshape(_index, (self.rpn_max_num, 1))
+ _mask_nms = self.reshape(_mask_nms, (self.rpn_max_num, 1))
+
+ _mask_n = self.gather(_mask_sorted, _index)
+
+ _mask_n = self.logicand(_mask_n, _mask_nms)
+ cls_labels = self.oneslike(_index) * j
+ res_boxes_tuple += (cls_dets,)
+ res_labels_tuple += (cls_labels,)
+ res_masks_tuple += (_mask_n,)
+
+ res_boxes_start = self.concat(res_boxes_tuple[:self.concat_start])
+ res_labels_start = self.concat(res_labels_tuple[:self.concat_start])
+ res_masks_start = self.concat(res_masks_tuple[:self.concat_start])
+
+ res_boxes_end = self.concat(res_boxes_tuple[self.concat_start:self.concat_end])
+ res_labels_end = self.concat(res_labels_tuple[self.concat_start:self.concat_end])
+ res_masks_end = self.concat(res_masks_tuple[self.concat_start:self.concat_end])
+
+ res_boxes = self.concat((res_boxes_start, res_boxes_end))
+ res_labels = self.concat((res_labels_start, res_labels_end))
+ res_masks = self.concat((res_masks_start, res_masks_end))
+
+ reshape_size = (self.num_classes - 1) * self.rpn_max_num
+ res_boxes = self.reshape(res_boxes, (1, reshape_size, 5))
+ res_labels = self.reshape(res_labels, (1, reshape_size, 1))
+ res_masks = self.reshape(res_masks, (1, reshape_size, 1))
+
+ all_bboxes += (res_boxes,)
+ all_labels += (res_labels,)
+ all_masks += (res_masks,)
+
+ all_bboxes = self.concat(all_bboxes)
+ all_labels = self.concat(all_labels)
+ all_masks = self.concat(all_masks)
+ return all_bboxes, all_labels, all_masks
+
+ def get_anchors(self, featmap_sizes):
+ """Get anchors according to feature map sizes.
+
+ Args:
+ featmap_sizes (list[tuple]): Multi-level feature map sizes.
+
+ Returns:
+            tuple: anchors of each feature level.
+ """
+ num_levels = len(featmap_sizes)
+
+        # Since the feature map sizes of all images are the same, we only
+        # compute anchors once.
+ multi_level_anchors = ()
+ for i in range(num_levels):
+ anchors = self.anchor_generators[i].grid_anchors(
+ featmap_sizes[i], self.anchor_strides[i])
+ multi_level_anchors += (Tensor(anchors.astype(self.dtype)),)
+
+ return multi_level_anchors
diff --git a/vega/networks/mindspore/faster_rcnn/fpn_neck.py b/vega/networks/mindspore/faster_rcnn/fpn_neck.py
new file mode 100644
index 00000000..c9a1dbbc
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/fpn_neck.py
@@ -0,0 +1,110 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn feature pyramid network."""
+
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+from mindspore.common import dtype as mstype
+from mindspore.common.initializer import initializer
+
+
+def bias_init_zeros(shape):
+ """Bias init method."""
+ return Tensor(np.array(np.zeros(shape).astype(np.float32)))
+
+
+def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
+ """Conv2D wrapper."""
+ shape = (out_channels, in_channels, kernel_size, kernel_size)
+ weights = initializer("XavierUniform", shape=shape, dtype=mstype.float32).to_tensor()
+ shape_bias = (out_channels,)
+ biass = bias_init_zeros(shape_bias)
+ return nn.Conv2d(in_channels, out_channels,
+ kernel_size=kernel_size, stride=stride, padding=padding,
+ pad_mode=pad_mode, weight_init=weights, has_bias=True, bias_init=biass)
+
+
+class FeatPyramidNeck(nn.Cell):
+ """
+    Feature pyramid network cell, usually used as a network neck.
+
+    Applies a convolution to multiple input feature maps and outputs
+    feature maps with the same channel size. If the required number of
+    outputs is larger than the number of inputs, extra max pooling is
+    added for further downsampling.
+
+ Args:
+ in_channels (tuple) - Channel size of input feature maps.
+ out_channels (int) - Channel size output.
+ num_outs (int) - Num of output features.
+
+ Returns:
+ Tuple, with tensors of same channel size.
+
+ Examples:
+ neck = FeatPyramidNeck([100,200,300], 50, 4)
+ input_data = (normal(0,0.1,(1,c,1280//(4*2**i), 768//(4*2**i)),
+ dtype=np.float32) \
+ for i, c in enumerate(config.fpn_in_channels))
+ x = neck(input_data)
+ """
+
+ def __init__(self,
+ in_channels,
+ out_channels,
+ num_outs,
+ code=None):
+ super(FeatPyramidNeck, self).__init__()
+ self.num_outs = num_outs
+ self.in_channels = in_channels
+ self.fpn_layer = len(self.in_channels)
+ self.code = code
+
+        assert self.num_outs >= len(in_channels)
+
+ self.lateral_convs_list_ = []
+ self.fpn_convs_ = []
+
+ for _, channel in enumerate(in_channels):
+ l_conv = _conv(channel, out_channels, kernel_size=1, stride=1, padding=0, pad_mode='valid')
+ fpn_conv = _conv(out_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='same')
+ self.lateral_convs_list_.append(l_conv)
+ self.fpn_convs_.append(fpn_conv)
+ self.lateral_convs_list = nn.layer.CellList(self.lateral_convs_list_)
+ self.fpn_convs_list = nn.layer.CellList(self.fpn_convs_)
+ self.interpolate1 = P.ResizeNearestNeighbor((48, 80))
+ self.interpolate2 = P.ResizeNearestNeighbor((96, 160))
+ self.interpolate3 = P.ResizeNearestNeighbor((192, 320))
+ self.maxpool = P.MaxPool(kernel_size=1, strides=2, pad_mode="same")
+
+ def construct(self, inputs):
+ """Construct the trainer of SpNas."""
+ x = ()
+ for i in range(self.fpn_layer):
+ x += (self.lateral_convs_list[i](inputs[i]),)
+
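+        # Top-down pathway: upsample the coarser level and add the lateral feature.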
+ y = (x[3],)
+ y = y + (x[2] + self.interpolate1(y[self.fpn_layer - 4]),)
+ y = y + (x[1] + self.interpolate2(y[self.fpn_layer - 3]),)
+ y = y + (x[0] + self.interpolate3(y[self.fpn_layer - 2]),)
+
+ z = ()
+ for i in range(self.fpn_layer - 1, -1, -1):
+ z = z + (y[i],)
+
+ outs = ()
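+        # `code`, when given, selects which pyramid levels to emit; default is all.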
+ for i in self.code or range(self.fpn_layer):
+ outs = outs + (self.fpn_convs_list[i](z[i]),)
+
+ for i in range(self.num_outs - self.fpn_layer):
+ outs = outs + (self.maxpool(outs[3]),)
+ return outs
diff --git a/vega/networks/mindspore/faster_rcnn/proposal_generator.py b/vega/networks/mindspore/faster_rcnn/proposal_generator.py
new file mode 100644
index 00000000..dc85c5a6
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/proposal_generator.py
@@ -0,0 +1,193 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn proposal generator."""
+
+import numpy as np
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore.ops import operations as P
+from mindspore import Tensor
+
+
+class Proposal(nn.Cell):
+ """
+ Proposal subnet.
+
+ Args:
+ config (dict): Config.
+ batch_size (int): Batchsize.
+        num_classes (int): Class number.
+        use_sigmoid_cls (bool): Select sigmoid or softmax function.
+        target_means (tuple): Means for encode function. Default: (.0, .0, .0, .0).
+        target_stds (tuple): Stds for encode function. Default: (1.0, 1.0, 1.0, 1.0).
+
+ Returns:
+        Tuple, tuple of output tensors: (proposal, mask).
+
+ Examples:
+ Proposal(config = config, batch_size = 1, num_classes = 81, use_sigmoid_cls = True, \
+ target_means=(.0, .0, .0, .0), target_stds=(1.0, 1.0, 1.0, 1.0))
+ """
+
+ def __init__(self,
+ config,
+ batch_size,
+ num_classes,
+ use_sigmoid_cls,
+ target_means=(.0, .0, .0, .0),
+ target_stds=(1.0, 1.0, 1.0, 1.0)
+ ):
+ super(Proposal, self).__init__()
+ cfg = config
+ self.batch_size = batch_size
+ self.num_classes = num_classes
+ self.target_means = target_means
+ self.target_stds = target_stds
+ self.use_sigmoid_cls = use_sigmoid_cls
+
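+        # Sigmoid scoring uses one foreground channel per anchor; softmax keeps a
+        # background/foreground pair, hence the different output widths below.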
+ if self.use_sigmoid_cls:
+ self.cls_out_channels = num_classes - 1
+ self.activation = P.Sigmoid()
+ self.reshape_shape = (-1, 1)
+ else:
+ self.cls_out_channels = num_classes
+ self.activation = P.Softmax(axis=1)
+ self.reshape_shape = (-1, 2)
+
+ if self.cls_out_channels <= 0:
+ raise ValueError('num_classes={} is too small'.format(num_classes))
+
+ self.num_pre = cfg.rpn_proposal_nms_pre
+ self.min_box_size = cfg.rpn_proposal_min_bbox_size
+ self.nms_thr = cfg.rpn_proposal_nms_thr
+ self.nms_post = cfg.rpn_proposal_nms_post
+ self.nms_across_levels = cfg.rpn_proposal_nms_across_levels
+ self.max_num = cfg.rpn_proposal_max_num
+ self.num_levels = cfg.fpn_num_outs
+
+ # Op Define
+ self.squeeze = P.Squeeze()
+ self.reshape = P.Reshape()
+ self.cast = P.Cast()
+
+ self.feature_shapes = cfg.feature_shapes
+
+ self.transpose_shape = (1, 2, 0)
+
+ self.decode = P.BoundingBoxDecode(max_shape=(cfg.img_height, cfg.img_width), means=self.target_means,
+ stds=self.target_stds)
+ self.nms = P.NMSWithMask(self.nms_thr)
+ self.concat_axis0 = P.Concat(axis=0)
+ self.concat_axis1 = P.Concat(axis=1)
+ self.split = P.Split(axis=1, output_num=5)
+ self.min = P.Minimum()
+ self.gatherND = P.GatherNd()
+ self.slice = P.Slice()
+ self.select = P.Select()
+ self.greater = P.Greater()
+ self.transpose = P.Transpose()
+ self.tile = P.Tile()
+ self.set_train_local(config, training=True)
+
+ self.dtype = np.float32
+ self.ms_type = mstype.float32
+
+ self.multi_10 = Tensor(10.0, self.ms_type)
+
+ def set_train_local(self, config, training=True):
+ """Set training flag."""
+ self.training_local = training
+
+ cfg = config
+ self.topK_stage1 = ()
+ self.topK_shape = ()
+ total_max_topk_input = 0
+ if not self.training_local:
+ self.num_pre = cfg.rpn_nms_pre
+ self.min_box_size = cfg.rpn_min_bbox_min_size
+ self.nms_thr = cfg.rpn_nms_thr
+ self.nms_post = cfg.rpn_nms_post
+ self.nms_across_levels = cfg.rpn_nms_across_levels
+ self.max_num = cfg.rpn_max_num
+
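+        # Cap each level's top-k at the number of anchors on that level (3 per cell).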
+ for shp in self.feature_shapes:
+ k_num = min(self.num_pre, (shp[0] * shp[1] * 3))
+ total_max_topk_input += k_num
+ self.topK_stage1 += (k_num,)
+ self.topK_shape += ((k_num, 1),)
+
+ self.topKv2 = P.TopK(sorted=True)
+ self.topK_shape_stage2 = (self.max_num, 1)
+ self.min_float_num = -65500.0
+ self.topK_mask = Tensor(self.min_float_num * np.ones(total_max_topk_input, np.float32))
+
+ def construct(self, rpn_cls_score_total, rpn_bbox_pred_total, anchor_list):
+ """Construct the trainer of SpNas."""
+ proposals_tuple = ()
+ masks_tuple = ()
+ for img_id in range(self.batch_size):
+ cls_score_list = ()
+ bbox_pred_list = ()
+ for i in range(self.num_levels):
+ rpn_cls_score_i = self.squeeze(rpn_cls_score_total[i][img_id:img_id + 1:1, ::, ::, ::])
+ rpn_bbox_pred_i = self.squeeze(rpn_bbox_pred_total[i][img_id:img_id + 1:1, ::, ::, ::])
+
+ cls_score_list = cls_score_list + (rpn_cls_score_i,)
+ bbox_pred_list = bbox_pred_list + (rpn_bbox_pred_i,)
+
+ proposals, masks = self.get_bboxes_single(cls_score_list, bbox_pred_list, anchor_list)
+ proposals_tuple += (proposals,)
+ masks_tuple += (masks,)
+ return proposals_tuple, masks_tuple
+
+ def get_bboxes_single(self, cls_scores, bbox_preds, mlvl_anchors):
+ """Get proposal boundingbox."""
+ mlvl_proposals = ()
+ mlvl_mask = ()
+ for idx in range(self.num_levels):
+ rpn_cls_score = self.transpose(cls_scores[idx], self.transpose_shape)
+ rpn_bbox_pred = self.transpose(bbox_preds[idx], self.transpose_shape)
+ anchors = mlvl_anchors[idx]
+
+ rpn_cls_score = self.reshape(rpn_cls_score, self.reshape_shape)
+ rpn_cls_score = self.activation(rpn_cls_score)
+ rpn_cls_score_process = self.cast(self.squeeze(rpn_cls_score[::, 0::]), self.ms_type)
+
+ rpn_bbox_pred_process = self.cast(self.reshape(rpn_bbox_pred, (-1, 4)), self.ms_type)
+
+ scores_sorted, topk_inds = self.topKv2(rpn_cls_score_process, self.topK_stage1[idx])
+
+ topk_inds = self.reshape(topk_inds, self.topK_shape[idx])
+
+ bboxes_sorted = self.gatherND(rpn_bbox_pred_process, topk_inds)
+ anchors_sorted = self.cast(self.gatherND(anchors, topk_inds), self.ms_type)
+
+ proposals_decode = self.decode(anchors_sorted, bboxes_sorted)
+
+ proposals_decode = self.concat_axis1((proposals_decode, self.reshape(scores_sorted, self.topK_shape[idx])))
+ proposals, _, mask_valid = self.nms(proposals_decode)
+
+ mlvl_proposals = mlvl_proposals + (proposals,)
+ mlvl_mask = mlvl_mask + (mask_valid,)
+
+ proposals = self.concat_axis0(mlvl_proposals)
+ masks = self.concat_axis0(mlvl_mask)
+
+ _, _, _, _, scores = self.split(proposals)
+ scores = self.squeeze(scores)
+ topk_mask = self.cast(self.topK_mask, self.ms_type)
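+        # Fill invalid proposals with a large negative score (-65500) so the
+        # following top-k never selects them.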
+ scores_using = self.select(masks, scores, topk_mask)
+
+ _, topk_inds = self.topKv2(scores_using, self.max_num)
+
+ topk_inds = self.reshape(topk_inds, self.topK_shape_stage2)
+ proposals = self.gatherND(proposals, topk_inds)
+ masks = self.gatherND(masks, topk_inds)
+ return proposals, masks
diff --git a/vega/networks/mindspore/faster_rcnn/rcnn.py b/vega/networks/mindspore/faster_rcnn/rcnn.py
new file mode 100644
index 00000000..40ca3c0c
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/rcnn.py
@@ -0,0 +1,178 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn Rcnn network."""
+
+import numpy as np
+import mindspore.common.dtype as mstype
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+from mindspore.common.initializer import initializer
+from mindspore.common.parameter import Parameter
+from mindspore import context
+
+
+class DenseNoTranspose(nn.Cell):
+    """Dense layer without weight transpose."""
+
+ def __init__(self, input_channels, output_channels, weight_init):
+        super(DenseNoTranspose, self).__init__()
+ self.weight = Parameter(initializer(weight_init, [input_channels, output_channels], mstype.float32))
+ self.bias = Parameter(initializer("zeros", [output_channels], mstype.float32))
+
+ self.matmul = P.MatMul(transpose_b=False)
+ self.bias_add = P.BiasAdd()
+ self.cast = P.Cast()
+ self.device_type = "Ascend" if context.get_context("device_target") == "Ascend" else "Others"
+
+ def construct(self, x):
+ """Construct the trainer of SpNas."""
+ if self.device_type == "Ascend":
+ x = self.cast(x, mstype.float16)
+ weight = self.cast(self.weight, mstype.float16)
+ output = self.bias_add(self.matmul(x, weight), self.bias)
+ else:
+ output = self.bias_add(self.matmul(x, self.weight), self.bias)
+ return output
+
+
+class Rcnn(nn.Cell):
+ """
+ Rcnn subnet.
+
+ Args:
+ config (dict) - Config.
+ representation_size (int) - Channels of shared dense.
+ batch_size (int) - Batchsize.
+ num_classes (int) - Class number.
+        target_means (list) - Means for encode function. Default: (.0, .0, .0, .0).
+ target_stds (list) - Stds for encode function. Default: (0.1, 0.1, 0.2, 0.2).
+
+ Returns:
+ Tuple, tuple of output tensor.
+
+ Examples:
+ Rcnn(config=config, representation_size = 1024, batch_size=2, num_classes = 81, \
+ target_means=(0., 0., 0., 0.), target_stds=(0.1, 0.1, 0.2, 0.2))
+ """
+
+ def __init__(self,
+ config,
+ representation_size,
+ batch_size,
+ num_classes,
+ target_means=(0., 0., 0., 0.),
+ target_stds=(0.1, 0.1, 0.2, 0.2)
+ ):
+ super(Rcnn, self).__init__()
+ cfg = config
+ self.dtype = np.float32
+ self.ms_type = mstype.float32
+ self.rcnn_loss_cls_weight = Tensor(np.array(cfg.rcnn_loss_cls_weight).astype(self.dtype))
+ self.rcnn_loss_reg_weight = Tensor(np.array(cfg.rcnn_loss_reg_weight).astype(self.dtype))
+ self.rcnn_fc_out_channels = cfg.rcnn_fc_out_channels
+ self.target_means = target_means
+ self.target_stds = target_stds
+ self.num_classes = num_classes
+ self.in_channels = cfg.rcnn_in_channels
+ self.train_batch_size = batch_size
+ self.test_batch_size = cfg.test_batch_size
+
+ shape_0 = (self.rcnn_fc_out_channels, representation_size)
+ weights_0 = initializer("XavierUniform", shape=shape_0[::-1], dtype=self.ms_type).to_tensor()
+ shape_1 = (self.rcnn_fc_out_channels, self.rcnn_fc_out_channels)
+ weights_1 = initializer("XavierUniform", shape=shape_1[::-1], dtype=self.ms_type).to_tensor()
+        self.shared_fc_0 = DenseNoTranspose(representation_size, self.rcnn_fc_out_channels, weights_0)
+        self.shared_fc_1 = DenseNoTranspose(self.rcnn_fc_out_channels, self.rcnn_fc_out_channels, weights_1)
+
+ cls_weight = initializer('Normal', shape=[num_classes, self.rcnn_fc_out_channels][::-1],
+ dtype=self.ms_type).to_tensor()
+ reg_weight = initializer('Normal', shape=[num_classes * 4, self.rcnn_fc_out_channels][::-1],
+ dtype=self.ms_type).to_tensor()
+        self.cls_scores = DenseNoTranspose(self.rcnn_fc_out_channels, num_classes, cls_weight)
+        self.reg_scores = DenseNoTranspose(self.rcnn_fc_out_channels, num_classes * 4, reg_weight)
+
+ self.flatten = P.Flatten()
+ self.relu = P.ReLU()
+ self.logicaland = P.LogicalAnd()
+ self.loss_cls = P.SoftmaxCrossEntropyWithLogits()
+ self.loss_bbox = P.SmoothL1Loss(beta=1.0)
+ self.reshape = P.Reshape()
+ self.onehot = P.OneHot()
+ self.greater = P.Greater()
+ self.cast = P.Cast()
+ self.sum_loss = P.ReduceSum()
+ self.tile = P.Tile()
+ self.expandims = P.ExpandDims()
+
+ self.gather = P.GatherNd()
+ self.argmax = P.ArgMaxWithValue(axis=1)
+
+ self.on_value = Tensor(1.0, mstype.float32)
+ self.off_value = Tensor(0.0, mstype.float32)
+ self.value = Tensor(1.0, self.ms_type)
+
+ self.num_bboxes = (cfg.num_expected_pos_stage2 + cfg.num_expected_neg_stage2) * batch_size
+
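+        # Mask that zeroes the background class (column 0) of the per-class
+        # bbox weights.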
+ rmv_first = np.ones((self.num_bboxes, self.num_classes))
+ rmv_first[:, 0] = np.zeros((self.num_bboxes,))
+ self.rmv_first_tensor = Tensor(rmv_first.astype(self.dtype))
+
+ self.num_bboxes_test = cfg.rpn_max_num * cfg.test_batch_size
+
+ range_max = np.arange(self.num_bboxes_test).astype(np.int32)
+ self.range_max = Tensor(range_max)
+
+ def construct(self, featuremap, bbox_targets, labels, mask):
+ """Construct the trainer of SpNas."""
+ x = self.flatten(featuremap)
+
+ x = self.relu(self.shared_fc_0(x))
+ x = self.relu(self.shared_fc_1(x))
+
+ x_cls = self.cls_scores(x)
+ x_reg = self.reg_scores(x)
+
+ if self.training:
+ bbox_weights = self.cast(self.logicaland(self.greater(labels, 0), mask), mstype.int32) * labels
+ labels = self.onehot(labels, self.num_classes, self.on_value, self.off_value)
+ bbox_targets = self.tile(self.expandims(bbox_targets, 1), (1, self.num_classes, 1))
+
+ loss, loss_cls, loss_reg, loss_print = self.loss(x_cls, x_reg, bbox_targets, bbox_weights, labels, mask)
+ out = (loss, loss_cls, loss_reg, loss_print)
+ else:
+ out = (x_cls, (x_cls / self.value), x_reg, x_cls)
+
+ return out
+
+ def loss(self, cls_score, bbox_pred, bbox_targets, bbox_weights, labels, weights):
+ """Loss method."""
+ loss_print = ()
+ loss_cls, _ = self.loss_cls(cls_score, labels)
+
+ weights = self.cast(weights, self.ms_type)
+ loss_cls = loss_cls * weights
+ loss_cls = self.sum_loss(loss_cls, (0,)) / self.sum_loss(weights, (0,))
+
+ bbox_weights = self.cast(self.onehot(bbox_weights, self.num_classes, self.on_value, self.off_value),
+ self.ms_type)
+ bbox_weights = bbox_weights * self.rmv_first_tensor
+
+ pos_bbox_pred = self.reshape(bbox_pred, (self.num_bboxes, -1, 4))
+ loss_reg = self.loss_bbox(pos_bbox_pred, bbox_targets)
+ loss_reg = self.sum_loss(loss_reg, (2,))
+ loss_reg = loss_reg * bbox_weights
+ loss_reg = loss_reg / self.sum_loss(weights, (0,))
+ loss_reg = self.sum_loss(loss_reg, (0, 1))
+
+ loss = self.rcnn_loss_cls_weight * loss_cls + self.rcnn_loss_reg_weight * loss_reg
+ loss_print += (loss_cls, loss_reg)
+
+ return loss, loss_cls, loss_reg, loss_print
diff --git a/vega/networks/mindspore/faster_rcnn/resnet.py b/vega/networks/mindspore/faster_rcnn/resnet.py
new file mode 100644
index 00000000..f5e7eeab
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/resnet.py
@@ -0,0 +1,255 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Resnet backbone."""
+
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+from mindspore.ops import functional as F
+
+
+def weight_init_ones(shape):
+ """Weight init."""
+ return Tensor(np.full(shape, 0.01).astype(np.float32))
+
+
+def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
+ """Conv2D wrapper."""
+ shape = (out_channels, in_channels, kernel_size, kernel_size)
+ weights = weight_init_ones(shape)
+ return nn.Conv2d(in_channels, out_channels,
+ kernel_size=kernel_size, stride=stride, padding=padding,
+ pad_mode=pad_mode, weight_init=weights, has_bias=False)
+
+
+def _BatchNorm2dInit(out_chls, momentum=0.1, affine=True, use_batch_statistics=True):
+ """Batchnorm2D wrapper."""
+ dtype = np.float32
+ gamma_init = Tensor(np.array(np.ones(out_chls)).astype(dtype))
+ beta_init = Tensor(np.array(np.ones(out_chls) * 0).astype(dtype))
+ moving_mean_init = Tensor(np.array(np.ones(out_chls) * 0).astype(dtype))
+ moving_var_init = Tensor(np.array(np.ones(out_chls)).astype(dtype))
+ return nn.BatchNorm2d(out_chls, momentum=momentum, affine=affine, gamma_init=gamma_init,
+ beta_init=beta_init, moving_mean_init=moving_mean_init,
+ moving_var_init=moving_var_init, use_batch_statistics=use_batch_statistics)
+
+
+class ResidualBlockUsing(nn.Cell):
+ """
+ ResNet V1 residual block definition.
+
+ Args:
+ in_channels (int) - Input channel.
+ out_channels (int) - Output channel.
+ stride (int) - Stride size for the initial convolutional layer. Default: 1.
+ down_sample (bool) - If to do the downsample in block. Default: False.
+ momentum (float) - Momentum for batchnorm layer. Default: 0.1.
+ training (bool) - Training flag. Default: False.
+        weights_update (bool) - Weights update flag. Default: False.
+
+ Returns:
+ Tensor, output tensor.
+
+ Examples:
+        ResidualBlockUsing(3, 256, stride=2, down_sample=True)
+ """
+
+ expansion = 4
+
+ def __init__(self,
+ in_channels,
+ out_channels,
+ stride=1,
+ down_sample=False,
+ momentum=0.1,
+ training=False,
+ weights_update=False):
+ super(ResidualBlockUsing, self).__init__()
+
+ self.affine = weights_update
+
+ out_chls = out_channels // self.expansion
+ self.conv1 = _conv(in_channels, out_chls, kernel_size=1, stride=1, padding=0)
+ self.bn1 = _BatchNorm2dInit(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
+
+ self.conv2 = _conv(out_chls, out_chls, kernel_size=3, stride=stride, padding=1)
+ self.bn2 = _BatchNorm2dInit(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
+
+ self.conv3 = _conv(out_chls, out_channels, kernel_size=1, stride=1, padding=0)
+ self.bn3 = _BatchNorm2dInit(out_channels, momentum=momentum, affine=self.affine, use_batch_statistics=training)
+
+ if training:
+ self.bn1 = self.bn1.set_train()
+ self.bn2 = self.bn2.set_train()
+ self.bn3 = self.bn3.set_train()
+
+ if not weights_update:
+ self.conv1.weight.requires_grad = False
+ self.conv2.weight.requires_grad = False
+ self.conv3.weight.requires_grad = False
+
+ self.relu = P.ReLU()
+ self.downsample = down_sample
+ if self.downsample:
+ self.conv_down_sample = _conv(in_channels, out_channels, kernel_size=1, stride=stride, padding=0)
+ self.bn_down_sample = _BatchNorm2dInit(out_channels, momentum=momentum, affine=self.affine,
+ use_batch_statistics=training)
+ if training:
+ self.bn_down_sample = self.bn_down_sample.set_train()
+ if not weights_update:
+ self.conv_down_sample.weight.requires_grad = False
+ self.add = P.Add()
+
+ def construct(self, x):
+ """
+ Construct the ResNet V1 residual block.
+
+ Args:
+ x: input feature data.
+
+ Returns:
+ Tensor, output tensor.
+ """
+ identity = x
+
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.bn2(out)
+ out = self.relu(out)
+
+ out = self.conv3(out)
+ out = self.bn3(out)
+
+ if self.downsample:
+ identity = self.conv_down_sample(identity)
+ identity = self.bn_down_sample(identity)
+
+ out = self.add(out, identity)
+ out = self.relu(out)
+
+ return out
+
+
+class ResNetFea(nn.Cell):
+ """
+ ResNet architecture.
+
+ Args:
+ block (Cell): Block for network.
+        in_channels (int): Input channel of the first stage.
+        code (str): Architecture code; each stage is a string of per-block
+            strides, with stages separated by '-'.
+ weights_update (bool): Weight update flag.
+ Returns:
+ Tensor, output tensor.
+ """
+
+ def __init__(self,
+ block=ResidualBlockUsing,
+ in_channels=64,
+ code='111-2111-211111-211',
+ weights_update=False):
+ super(ResNetFea, self).__init__()
+
+ bn_training = False
+ self.inplanes = in_channels
+ self.planes = self.inplanes
+ self.conv1 = _conv(3, 64, kernel_size=7, stride=2, padding=3, pad_mode='pad')
+ self.bn1 = _BatchNorm2dInit(64, affine=bn_training, use_batch_statistics=bn_training)
+ self.relu = P.ReLU()
+ self.maxpool = P.MaxPool(kernel_size=3, strides=2, pad_mode="SAME")
+ self.weights_update = weights_update
+ code = code.split('-')
+
+ if not self.weights_update:
+ self.conv1.weight.requires_grad = False
+ self.channels = []
+ self.channels.append(self.planes)
+
+ self.layer1, in_channels = self._make_layer(block,
+ code[0],
+ in_channel=self.inplanes,
+ out_channel=self.planes,
+ training=bn_training,
+ weights_update=self.weights_update)
+ out_channels = in_channels * 2
+ self.channels.append(out_channels)
+ self.layer2, in_channels = self._make_layer(block,
+ code[1],
+ in_channel=in_channels,
+ out_channel=out_channels,
+ training=bn_training,
+ weights_update=True)
+ out_channels = in_channels * 2
+ self.channels.append(out_channels)
+ self.layer3, in_channels = self._make_layer(block,
+ code[2],
+ in_channel=in_channels,
+ out_channel=out_channels,
+ training=bn_training,
+ weights_update=True)
+ out_channels = in_channels * 2
+ self.channels.append(out_channels)
+ self.layer4, in_channels = self._make_layer(block,
+ code[3],
+ in_channel=in_channels,
+ out_channel=out_channels,
+ training=bn_training,
+ weights_update=True)
+
+ def _make_layer(self, block, code, in_channel, out_channel, training=False, weights_update=False):
+ """Make block layer."""
+ strides = list(map(int, code))
+ layers = []
+ down_sample = False
+ if strides[0] != 1 or in_channel != out_channel:
+ down_sample = True
+ resblk = block(in_channel,
+ out_channel,
+ stride=strides[0],
+ down_sample=down_sample,
+ training=training,
+ weights_update=weights_update)
+ layers.append(resblk)
+
+ for stride in strides[1:]:
+ resblk = block(out_channel, out_channel, stride=stride, training=training, weights_update=weights_update)
+ layers.append(resblk)
+
+ return nn.SequentialCell(layers), out_channel
+
+ def construct(self, x):
+ """
+ Construct the ResNet Network.
+
+ Args:
+ x: input feature data.
+
+ Returns:
+ Tensor, output tensor.
+ """
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ c1 = self.maxpool(x)
+
+ c2 = self.layer1(c1)
+ identity = c2
+ if not self.weights_update:
+ identity = F.stop_gradient(c2)
+ c3 = self.layer2(identity)
+ c4 = self.layer3(c3)
+ c5 = self.layer4(c4)
+
+ return identity, c3, c4, c5
diff --git a/vega/networks/mindspore/faster_rcnn/resnet50v1.py b/vega/networks/mindspore/faster_rcnn/resnet50v1.py
new file mode 100644
index 00000000..a8759d2c
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/resnet50v1.py
@@ -0,0 +1,261 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""Resnet50v1.0 backbone."""
+
+import numpy as np
+import mindspore.nn as nn
+from mindspore.ops import operations as P
+from mindspore.common.tensor import Tensor
+from mindspore.ops import functional as F
+
+
+def weight_init_ones(shape):
+ """Weight init."""
+ return Tensor(np.full(shape, 0.01).astype(np.float32))
+
+
+def _conv(in_channels, out_channels, kernel_size=3, stride=1, padding=0, pad_mode='pad'):
+ """Conv2D wrapper."""
+ shape = (out_channels, in_channels, kernel_size, kernel_size)
+ weights = weight_init_ones(shape)
+ return nn.Conv2d(in_channels, out_channels,
+ kernel_size=kernel_size, stride=stride, padding=padding,
+ pad_mode=pad_mode, weight_init=weights, has_bias=False)
+
+
+def _BatchNorm2dInit(out_chls, momentum=0.1, affine=True, use_batch_statistics=True):
+ """Batchnorm2D wrapper."""
+ dtype = np.float32
+ gamma_init = Tensor(np.array(np.ones(out_chls)).astype(dtype))
+ beta_init = Tensor(np.array(np.ones(out_chls) * 0).astype(dtype))
+ moving_mean_init = Tensor(np.array(np.ones(out_chls) * 0).astype(dtype))
+ moving_var_init = Tensor(np.array(np.ones(out_chls)).astype(dtype))
+ return nn.BatchNorm2d(out_chls, momentum=momentum, affine=affine, gamma_init=gamma_init,
+ beta_init=beta_init, moving_mean_init=moving_mean_init,
+ moving_var_init=moving_var_init, use_batch_statistics=use_batch_statistics)
+
+
+class ResNetFea(nn.Cell):
+ """
+ ResNet architecture.
+
+ Args:
+ block (Cell): Block for network.
+ layer_nums (list): Numbers of block in different layers.
+ in_channels (list): Input channel in each layer.
+ out_channels (list): Output channel in each layer.
+ weights_update (bool): Weight update flag.
+ Returns:
+ Tensor, output tensor.
+
+ Examples:
+ >>> ResNet(ResidualBlock,
+ >>> [3, 4, 6, 3],
+ >>> [64, 256, 512, 1024],
+ >>> [256, 512, 1024, 2048],
+ >>> False)
+ """
+
+ def __init__(self,
+ block,
+ layer_nums,
+ in_channels,
+ out_channels,
+ weights_update=False):
+ super(ResNetFea, self).__init__()
+
+ if not len(layer_nums) == len(in_channels) == len(out_channels) == 4:
+ raise ValueError("the length of "
+ "layer_num, inchannel, outchannel list must be 4!")
+
+ bn_training = False
+ self.conv1 = _conv(3, 64, kernel_size=7, stride=2, padding=3, pad_mode='pad')
+ self.bn1 = _BatchNorm2dInit(64, affine=bn_training, use_batch_statistics=bn_training)
+ self.relu = P.ReLU()
+ self.maxpool = P.MaxPool(kernel_size=3, strides=2, pad_mode="SAME")
+ self.weights_update = weights_update
+
+ if not self.weights_update:
+ self.conv1.weight.requires_grad = False
+
+ self.layer1 = self._make_layer(block,
+ layer_nums[0],
+ in_channel=in_channels[0],
+ out_channel=out_channels[0],
+ stride=1,
+ training=bn_training,
+ weights_update=self.weights_update)
+ self.layer2 = self._make_layer(block,
+ layer_nums[1],
+ in_channel=in_channels[1],
+ out_channel=out_channels[1],
+ stride=2,
+ training=bn_training,
+ weights_update=True)
+ self.layer3 = self._make_layer(block,
+ layer_nums[2],
+ in_channel=in_channels[2],
+ out_channel=out_channels[2],
+ stride=2,
+ training=bn_training,
+ weights_update=True)
+ self.layer4 = self._make_layer(block,
+ layer_nums[3],
+ in_channel=in_channels[3],
+ out_channel=out_channels[3],
+ stride=2,
+ training=bn_training,
+ weights_update=True)
+
+ def _make_layer(self, block, layer_num, in_channel, out_channel, stride, training=False, weights_update=False):
+ """Make block layer."""
+ layers = []
+ down_sample = False
+ if stride != 1 or in_channel != out_channel:
+ down_sample = True
+ resblk = block(in_channel,
+ out_channel,
+ stride=stride,
+ down_sample=down_sample,
+ training=training,
+ weights_update=weights_update)
+ layers.append(resblk)
+
+ for _ in range(1, layer_num):
+ resblk = block(out_channel, out_channel, stride=1, training=training, weights_update=weights_update)
+ layers.append(resblk)
+
+ return nn.SequentialCell(layers)
+
+ def construct(self, x):
+ """
+ Construct the ResNet Network.
+
+ Args:
+ x: input feature data.
+
+ Returns:
+ Tensor, output tensor.
+ """
+ x = self.conv1(x)
+ x = self.bn1(x)
+ x = self.relu(x)
+ c1 = self.maxpool(x)
+
+ c2 = self.layer1(c1)
+ identity = c2
+ if not self.weights_update:
+ identity = F.stop_gradient(c2)
+ c3 = self.layer2(identity)
+ c4 = self.layer3(c3)
+ c5 = self.layer4(c4)
+
+ return identity, c3, c4, c5
+
+
+class ResidualBlockUsing_V1(nn.Cell):
+ """
+ ResNet V1 residual block definition.
+
+ Args:
+ in_channels (int) - Input channel.
+ out_channels (int) - Output channel.
+ stride (int) - Stride size for the initial convolutional layer. Default: 1.
+ down_sample (bool) - If to do the downsample in block. Default: False.
+ momentum (float) - Momentum for batchnorm layer. Default: 0.1.
+ training (bool) - Training flag. Default: False.
+        weights_update (bool) - Weights update flag. Default: False.
+
+ Returns:
+ Tensor, output tensor.
+
+ Examples:
+        ResidualBlockUsing_V1(3, 256, stride=2, down_sample=True)
+ """
+
+ expansion = 4
+
+ def __init__(self,
+ in_channels,
+ out_channels,
+ stride=1,
+ down_sample=False,
+ momentum=0.1,
+ training=False,
+ weights_update=False):
+ super(ResidualBlockUsing_V1, self).__init__()
+
+ self.affine = weights_update
+
+ out_chls = out_channels // self.expansion
+ # self.conv1 = _conv(in_channels, out_chls, kernel_size=1, stride=1, padding=0)
+ self.conv1 = _conv(in_channels, out_chls, kernel_size=1, stride=stride, padding=0)
+ self.bn1 = _BatchNorm2dInit(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
+
+ # self.conv2 = _conv(out_chls, out_chls, kernel_size=3, stride=stride, padding=1)
+ self.conv2 = _conv(out_chls, out_chls, kernel_size=3, stride=1, padding=1)
+ self.bn2 = _BatchNorm2dInit(out_chls, momentum=momentum, affine=self.affine, use_batch_statistics=training)
+
+ self.conv3 = _conv(out_chls, out_channels, kernel_size=1, stride=1, padding=0)
+ self.bn3 = _BatchNorm2dInit(out_channels, momentum=momentum, affine=self.affine, use_batch_statistics=training)
+
+ if training:
+ self.bn1 = self.bn1.set_train()
+ self.bn2 = self.bn2.set_train()
+ self.bn3 = self.bn3.set_train()
+
+ if not weights_update:
+ self.conv1.weight.requires_grad = False
+ self.conv2.weight.requires_grad = False
+ self.conv3.weight.requires_grad = False
+
+ self.relu = P.ReLU()
+ self.downsample = down_sample
+ if self.downsample:
+ self.conv_down_sample = _conv(in_channels, out_channels, kernel_size=1, stride=stride, padding=0)
+ self.bn_down_sample = _BatchNorm2dInit(out_channels, momentum=momentum, affine=self.affine,
+ use_batch_statistics=training)
+ if training:
+ self.bn_down_sample = self.bn_down_sample.set_train()
+ if not weights_update:
+ self.conv_down_sample.weight.requires_grad = False
+ self.add = P.Add()
+
+ def construct(self, x):
+ """
+ Construct the ResNet V1 residual block.
+
+ Args:
+ x: input feature data.
+
+ Returns:
+ Tensor, output tensor.
+ """
+ identity = x
+
+ out = self.conv1(x)
+ out = self.bn1(out)
+ out = self.relu(out)
+
+ out = self.conv2(out)
+ out = self.bn2(out)
+ out = self.relu(out)
+
+ out = self.conv3(out)
+ out = self.bn3(out)
+
+ if self.downsample:
+ identity = self.conv_down_sample(identity)
+ identity = self.bn_down_sample(identity)
+
+ out = self.add(out, identity)
+ out = self.relu(out)
+
+ return out
diff --git a/vega/networks/mindspore/faster_rcnn/roi_align.py b/vega/networks/mindspore/faster_rcnn/roi_align.py
new file mode 100644
index 00000000..4e032f6c
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/roi_align.py
@@ -0,0 +1,184 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""FasterRcnn ROIAlign module."""
+
+import numpy as np
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore.ops import operations as P
+from mindspore.ops import composite as C
+from mindspore.nn import layer as L
+from mindspore.common.tensor import Tensor
+
+
+class ROIAlign(nn.Cell):
+ """
+ Extract RoI features from multiple feature map.
+
+ Args:
+ out_size_h (int) - RoI height.
+ out_size_w (int) - RoI width.
+ spatial_scale (int) - RoI spatial scale.
+ sample_num (int) - RoI sample number.
+ """
+
+ def __init__(self,
+ out_size_h,
+ out_size_w,
+ spatial_scale,
+ sample_num=0):
+ super(ROIAlign, self).__init__()
+
+ self.out_size = (out_size_h, out_size_w)
+ self.spatial_scale = float(spatial_scale)
+ self.sample_num = int(sample_num)
+ self.align_op = P.ROIAlign(self.out_size[0], self.out_size[1],
+ self.spatial_scale, self.sample_num)
+
+ def construct(self, features, rois):
+ """Construct the trainer of SpNas."""
+ return self.align_op(features, rois)
+
+ def __repr__(self):
+ """Construct the trainer of SpNas."""
+ format_str = self.__class__.__name__
+ format_str += '(out_size={}, spatial_scale={}, sample_num={}'.format(
+ self.out_size, self.spatial_scale, self.sample_num)
+ return format_str
+
+
+class SingleRoIExtractor(nn.Cell):
+ """
+ Extract RoI features from a single level feature map.
+
+ If there are multiple input feature levels, each RoI is mapped to a level
+ according to its scale.
+
+ Args:
+ config (dict): Config
+ roi_layer (dict): Specify RoI layer type and arguments.
+ out_channels (int): Output channels of RoI layers.
+        featmap_strides (list): Strides of input feature maps.
+ batch_size (int): Batchsize.
+ finest_scale (int): Scale threshold of mapping to level 0.
+ """
+
+ def __init__(self,
+ config,
+ roi_layer,
+ out_channels,
+ featmap_strides,
+ batch_size=1,
+ finest_scale=56):
+ super(SingleRoIExtractor, self).__init__()
+ cfg = config
+ self.train_batch_size = batch_size
+ self.out_channels = out_channels
+ self.featmap_strides = featmap_strides
+ self.num_levels = len(self.featmap_strides)
+ self.out_size = config.roi_layer.out_size
+ self.sample_num = config.roi_layer.sample_num
+ self.roi_layers = self.build_roi_layers(self.featmap_strides)
+ self.roi_layers = L.CellList(self.roi_layers)
+
+ self.sqrt = P.Sqrt()
+ self.log = P.Log()
+ self.finest_scale_ = finest_scale
+ self.clamp = C.clip_by_value
+
+ self.cast = P.Cast()
+ self.equal = P.Equal()
+ self.select = P.Select()
+
+ _mode_16 = False
+ self.dtype = np.float16 if _mode_16 else np.float32
+ self.ms_dtype = mstype.float16 if _mode_16 else mstype.float32
+ self.set_train_local(cfg, training=True)
+
+ def set_train_local(self, config, training=True):
+ """Set training flag."""
+ self.training_local = training
+
+ cfg = config
+ # Init tensor
+ self.batch_size = cfg.roi_sample_num if self.training_local else cfg.rpn_max_num
+ self.batch_size = self.train_batch_size * self.batch_size \
+ if self.training_local else cfg.test_batch_size * self.batch_size
+ self.ones = Tensor(np.array(np.ones((self.batch_size, 1)), dtype=self.dtype))
+ finest_scale = np.array(np.ones((self.batch_size, 1)), dtype=self.dtype) * self.finest_scale_
+ self.finest_scale = Tensor(finest_scale)
+        self.epsilon = Tensor(np.array(np.ones((self.batch_size, 1)), dtype=self.dtype) * self.dtype(1e-6))
+ self.zeros = Tensor(np.array(np.zeros((self.batch_size, 1)), dtype=np.int32))
+ self.max_levels = Tensor(np.array(np.ones((self.batch_size, 1)), dtype=np.int32) * (self.num_levels - 1))
+ self.twos = Tensor(np.array(np.ones((self.batch_size, 1)), dtype=self.dtype) * 2)
+ self.res_ = Tensor(np.array(np.zeros((self.batch_size, self.out_channels,
+ self.out_size, self.out_size)), dtype=self.dtype))
+
+ def num_inputs(self):
+ """Construct the trainer of SpNas."""
+ return len(self.featmap_strides)
+
+ def init_weights(self):
+ """Construct the trainer of SpNas."""
+ pass
+
+ def log2(self, value):
+ """Construct the trainer of SpNas."""
+ return self.log(value) / self.log(self.twos)
+
+ def build_roi_layers(self, featmap_strides):
+ """Construct the trainer of SpNas."""
+ roi_layers = []
+ for s in featmap_strides:
+ layer_cls = ROIAlign(self.out_size, self.out_size,
+ spatial_scale=1 / s,
+ sample_num=self.sample_num)
+ roi_layers.append(layer_cls)
+ return roi_layers
+
+ def _c_map_roi_levels(self, rois):
+ """Map rois to corresponding feature levels by scales.
+
+ - scale < finest_scale * 2: level 0
+ - finest_scale * 2 <= scale < finest_scale * 4: level 1
+ - finest_scale * 4 <= scale < finest_scale * 8: level 2
+ - scale >= finest_scale * 8: level 3
+
+ Args:
+ rois (Tensor): Input RoIs, shape (k, 5).
+ num_levels (int): Total level number.
+
+ Returns:
+ Tensor: Level index (0-based) of each RoI, shape (k, )
+ """
+ scale = self.sqrt(rois[::, 3:4:1] - rois[::, 1:2:1] + self.ones) * self.sqrt(
+ rois[::, 4:5:1] - rois[::, 2:3:1] + self.ones)
+
+        target_lvls = self.log2(scale / self.finest_scale + self.epsilon)
+ target_lvls = P.Floor()(target_lvls)
+ target_lvls = self.cast(target_lvls, mstype.int32)
+ target_lvls = self.clamp(target_lvls, self.zeros, self.max_levels)
+
+ return target_lvls
+
+ def construct(self, rois, feat1, feat2, feat3, feat4):
+ """Construct the trainer of SpNas."""
+ feats = (feat1, feat2, feat3, feat4)
+ res = self.res_
+ target_lvls = self._c_map_roi_levels(rois)
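+        # Run RoI align on every level and keep each roi's output from its target
+        # level via a broadcast mask select (no dynamic indexing needed).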
+ for i in range(self.num_levels):
+ mask = self.equal(target_lvls, P.ScalarToArray()(i))
+ mask = P.Reshape()(mask, (-1, 1, 1, 1))
+ roi_feats_t = self.roi_layers[i](feats[i], rois)
+ mask = self.cast(P.Tile()(self.cast(mask, mstype.int32), (1, 256, self.out_size, self.out_size)),
+ mstype.bool_)
+ res = self.select(mask, roi_feats_t, res)
+
+ return res
diff --git a/vega/networks/mindspore/faster_rcnn/rpn.py b/vega/networks/mindspore/faster_rcnn/rpn.py
new file mode 100644
index 00000000..852a2f0b
--- /dev/null
+++ b/vega/networks/mindspore/faster_rcnn/rpn.py
@@ -0,0 +1,316 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+"""RPN for fasterRCNN."""
+import numpy as np
+import mindspore.nn as nn
+import mindspore.common.dtype as mstype
+from mindspore import context, Tensor
+from mindspore.ops import operations as P
+from mindspore.ops import functional as F
+from mindspore.common.initializer import initializer
+from .bbox_assign_sample import BboxAssignSample
+
+
+class RpnRegClsBlock(nn.Cell):
+ """
+ Rpn reg cls block for rpn layer.
+
+ Args:
+ in_channels (int) - Input channels of shared convolution.
+ feat_channels (int) - Output channels of shared convolution.
+ num_anchors (int) - The anchor number.
+ cls_out_channels (int) - Output channels of classification convolution.
+ weight_conv (Tensor) - weight init for rpn conv.
+ bias_conv (Tensor) - bias init for rpn conv.
+ weight_cls (Tensor) - weight init for rpn cls conv.
+ bias_cls (Tensor) - bias init for rpn cls conv.
+ weight_reg (Tensor) - weight init for rpn reg conv.
+ bias_reg (Tensor) - bias init for rpn reg conv.
+
+ Returns:
+ Tensor, output tensor.
+ """
+
+ def __init__(self,
+ in_channels,
+ feat_channels,
+ num_anchors,
+ cls_out_channels,
+ weight_conv,
+ bias_conv,
+ weight_cls,
+ bias_cls,
+ weight_reg,
+ bias_reg):
+ super(RpnRegClsBlock, self).__init__()
+ self.rpn_conv = nn.Conv2d(in_channels, feat_channels, kernel_size=3, stride=1, pad_mode='same',
+ has_bias=True, weight_init=weight_conv, bias_init=bias_conv)
+ self.relu = nn.ReLU()
+
+ self.rpn_cls = nn.Conv2d(feat_channels, num_anchors * cls_out_channels, kernel_size=1, pad_mode='valid',
+ has_bias=True, weight_init=weight_cls, bias_init=bias_cls)
+ self.rpn_reg = nn.Conv2d(feat_channels, num_anchors * 4, kernel_size=1, pad_mode='valid',
+ has_bias=True, weight_init=weight_reg, bias_init=bias_reg)
+
+ def construct(self, x):
+ """Construct the trainer of SpNas."""
+ x = self.relu(self.rpn_conv(x))
+
+ x1 = self.rpn_cls(x)
+ x2 = self.rpn_reg(x)
+
+ return x1, x2
+
+
+class RPN(nn.Cell):
+ """
+    Region proposal network (RPN).
+
+ Args:
+ config (dict) - Config.
+ batch_size (int) - Batchsize.
+ in_channels (int) - Input channels of shared convolution.
+ feat_channels (int) - Output channels of shared convolution.
+ num_anchors (int) - The anchor number.
+ cls_out_channels (int) - Output channels of classification convolution.
+
+ Returns:
+ Tuple, tuple of output tensor.
+
+ Examples:
+ RPN(config=config, batch_size=2, in_channels=256, feat_channels=1024,
+ num_anchors=3, cls_out_channels=512)
+ """
+
+ def __init__(self,
+ config,
+ batch_size,
+ in_channels,
+ feat_channels,
+ num_anchors,
+ cls_out_channels):
+ super(RPN, self).__init__()
+ cfg_rpn = config
+ self.dtype = np.float32
+ self.ms_type = mstype.float32
+ self.device_type = "Ascend" if context.get_context("device_target") == "Ascend" else "Others"
+ self.num_bboxes = cfg_rpn.num_bboxes
+ self.slice_index = ()
+ self.feature_anchor_shape = ()
+ self.slice_index += (0,)
+ index = 0
+ for shape in cfg_rpn.feature_shapes:
+ self.slice_index += (self.slice_index[index] + shape[0] * shape[1] * num_anchors,)
+ self.feature_anchor_shape += (shape[0] * shape[1] * num_anchors * batch_size,)
+ index += 1
+
+ self.num_anchors = num_anchors
+ self.batch_size = batch_size
+ self.test_batch_size = cfg_rpn.test_batch_size
+ self.num_layers = 5
+ self.real_ratio = Tensor(np.ones((1, 1)).astype(self.dtype))
+
+ self.rpn_convs_list = nn.layer.CellList(self._make_rpn_layer(self.num_layers, in_channels, feat_channels,
+ num_anchors, cls_out_channels))
+
+ self.transpose = P.Transpose()
+ self.reshape = P.Reshape()
+ self.concat = P.Concat(axis=0)
+ self.fill = P.Fill()
+ self.placeh1 = Tensor(np.ones((1,)).astype(self.dtype))
+
+ self.trans_shape = (0, 2, 3, 1)
+
+ self.reshape_shape_reg = (-1, 4)
+ self.reshape_shape_cls = (-1,)
+ self.rpn_loss_reg_weight = Tensor(np.array(cfg_rpn.rpn_loss_reg_weight).astype(self.dtype))
+ self.rpn_loss_cls_weight = Tensor(np.array(cfg_rpn.rpn_loss_cls_weight).astype(self.dtype))
+ self.num_expected_total = Tensor(np.array(cfg_rpn.num_expected_neg * self.batch_size).astype(self.dtype))
+ self.num_bboxes = cfg_rpn.num_bboxes
+ self.get_targets = BboxAssignSample(cfg_rpn, self.batch_size, self.num_bboxes, False)
+ self.CheckValid = P.CheckValid()
+ self.sum_loss = P.ReduceSum()
+ self.loss_cls = P.SigmoidCrossEntropyWithLogits()
+ self.loss_bbox = P.SmoothL1Loss(beta=1.0 / 9.0)
+ self.squeeze = P.Squeeze()
+ self.cast = P.Cast()
+ self.tile = P.Tile()
+ self.zeros_like = P.ZerosLike()
+ self.loss = Tensor(np.zeros((1,)).astype(self.dtype))
+ self.clsloss = Tensor(np.zeros((1,)).astype(self.dtype))
+ self.regloss = Tensor(np.zeros((1,)).astype(self.dtype))
+
+ def _make_rpn_layer(self, num_layers, in_channels, feat_channels, num_anchors, cls_out_channels):
+ """
+ Make RPN layers for the region proposal network.
+
+ Args:
+ num_layers (int) - Number of layers.
+ in_channels (int) - Input channels of shared convolution.
+ feat_channels (int) - Output channels of shared convolution.
+ num_anchors (int) - Number of anchors.
+ cls_out_channels (int) - Output channels of classification convolution.
+
+ Returns:
+ List, list of RpnRegClsBlock cells.
+ """
+ rpn_layer = []
+
+ shp_weight_conv = (feat_channels, in_channels, 3, 3)
+ shp_bias_conv = (feat_channels,)
+ weight_conv = initializer('Normal', shape=shp_weight_conv, dtype=self.ms_type).to_tensor()
+ bias_conv = initializer(0, shape=shp_bias_conv, dtype=self.ms_type).to_tensor()
+
+ shp_weight_cls = (num_anchors * cls_out_channels, feat_channels, 1, 1)
+ shp_bias_cls = (num_anchors * cls_out_channels,)
+ weight_cls = initializer('Normal', shape=shp_weight_cls, dtype=self.ms_type).to_tensor()
+ bias_cls = initializer(0, shape=shp_bias_cls, dtype=self.ms_type).to_tensor()
+
+ shp_weight_reg = (num_anchors * 4, feat_channels, 1, 1)
+ shp_bias_reg = (num_anchors * 4,)
+ weight_reg = initializer('Normal', shape=shp_weight_reg, dtype=self.ms_type).to_tensor()
+ bias_reg = initializer(0, shape=shp_bias_reg, dtype=self.ms_type).to_tensor()
+
+ for i in range(num_layers):
+ rpn_reg_cls_block = RpnRegClsBlock(in_channels, feat_channels, num_anchors, cls_out_channels, weight_conv,
+ bias_conv, weight_cls, bias_cls, weight_reg, bias_reg)
+ if self.device_type == "Ascend":
+ rpn_reg_cls_block.to_float(mstype.float16)
+ rpn_layer.append(rpn_reg_cls_block)
+
+ for i in range(1, num_layers):
+ rpn_layer[i].rpn_conv.weight = rpn_layer[0].rpn_conv.weight
+ rpn_layer[i].rpn_cls.weight = rpn_layer[0].rpn_cls.weight
+ rpn_layer[i].rpn_reg.weight = rpn_layer[0].rpn_reg.weight
+
+ rpn_layer[i].rpn_conv.bias = rpn_layer[0].rpn_conv.bias
+ rpn_layer[i].rpn_cls.bias = rpn_layer[0].rpn_cls.bias
+ rpn_layer[i].rpn_reg.bias = rpn_layer[0].rpn_reg.bias
+
+ return rpn_layer
+
+ def construct(self, inputs, img_metas, anchor_list, gt_bboxes, gt_labels, gt_valids):
+ """Construct the trainer of SpNas."""
+ loss_print = ()
+ rpn_cls_score = ()
+ rpn_bbox_pred = ()
+ rpn_cls_score_total = ()
+ rpn_bbox_pred_total = ()
+
+ for i in range(self.num_layers):
+ x1, x2 = self.rpn_convs_list[i](inputs[i])
+
+ rpn_cls_score_total = rpn_cls_score_total + (x1,)
+ rpn_bbox_pred_total = rpn_bbox_pred_total + (x2,)
+
+ x1 = self.transpose(x1, self.trans_shape)
+ x1 = self.reshape(x1, self.reshape_shape_cls)
+
+ x2 = self.transpose(x2, self.trans_shape)
+ x2 = self.reshape(x2, self.reshape_shape_reg)
+
+ rpn_cls_score = rpn_cls_score + (x1,)
+ rpn_bbox_pred = rpn_bbox_pred + (x2,)
+
+ loss = self.loss
+ clsloss = self.clsloss
+ regloss = self.regloss
+ bbox_targets = ()
+ bbox_weights = ()
+ labels = ()
+ label_weights = ()
+
+ output = ()
+ if self.training:
+ for i in range(self.batch_size):
+ multi_level_flags = ()
+ anchor_list_tuple = ()
+
+ for j in range(self.num_layers):
+ res = self.cast(self.CheckValid(anchor_list[j], self.squeeze(img_metas[i:i + 1:1, ::])),
+ mstype.int32)
+ multi_level_flags = multi_level_flags + (res,)
+ anchor_list_tuple = anchor_list_tuple + (anchor_list[j],)
+
+ valid_flag_list = self.concat(multi_level_flags)
+ anchor_using_list = self.concat(anchor_list_tuple)
+
+ gt_bboxes_i = self.squeeze(gt_bboxes[i:i + 1:1, ::])
+ gt_labels_i = self.squeeze(gt_labels[i:i + 1:1, ::])
+ gt_valids_i = self.squeeze(gt_valids[i:i + 1:1, ::])
+
+ bbox_target, bbox_weight, label, label_weight = self.get_targets(gt_bboxes_i,
+ gt_labels_i,
+ self.cast(valid_flag_list,
+ mstype.bool_),
+ anchor_using_list, gt_valids_i)
+
+ bbox_target = self.cast(bbox_target, self.ms_type)
+ bbox_weight = self.cast(bbox_weight, self.ms_type)
+ label = self.cast(label, self.ms_type)
+ label_weight = self.cast(label_weight, self.ms_type)
+
+ for j in range(self.num_layers):
+ begin = self.slice_index[j]
+ end = self.slice_index[j + 1]
+ stride = 1
+ bbox_targets += (bbox_target[begin:end:stride, ::],)
+ bbox_weights += (bbox_weight[begin:end:stride],)
+ labels += (label[begin:end:stride],)
+ label_weights += (label_weight[begin:end:stride],)
+
+ for i in range(self.num_layers):
+ bbox_target_using = ()
+ bbox_weight_using = ()
+ label_using = ()
+ label_weight_using = ()
+
+ for j in range(self.batch_size):
+ bbox_target_using += (bbox_targets[i + (self.num_layers * j)],)
+ bbox_weight_using += (bbox_weights[i + (self.num_layers * j)],)
+ label_using += (labels[i + (self.num_layers * j)],)
+ label_weight_using += (label_weights[i + (self.num_layers * j)],)
+
+ bbox_target_with_batchsize = self.concat(bbox_target_using)
+ bbox_weight_with_batchsize = self.concat(bbox_weight_using)
+ label_with_batchsize = self.concat(label_using)
+ label_weight_with_batchsize = self.concat(label_weight_using)
+
+ # stop gradients from flowing into the assembled targets
+ bbox_target_ = F.stop_gradient(bbox_target_with_batchsize)
+ bbox_weight_ = F.stop_gradient(bbox_weight_with_batchsize)
+ label_ = F.stop_gradient(label_with_batchsize)
+ label_weight_ = F.stop_gradient(label_weight_with_batchsize)
+
+ cls_score_i = self.cast(rpn_cls_score[i], self.ms_type)
+ reg_score_i = self.cast(rpn_bbox_pred[i], self.ms_type)
+
+ loss_cls = self.loss_cls(cls_score_i, label_)
+ loss_cls_item = loss_cls * label_weight_
+ loss_cls_item = self.sum_loss(loss_cls_item, (0,)) / self.num_expected_total
+
+ loss_reg = self.loss_bbox(reg_score_i, bbox_target_)
+ bbox_weight_ = self.tile(self.reshape(bbox_weight_, (self.feature_anchor_shape[i], 1)), (1, 4))
+ loss_reg = loss_reg * bbox_weight_
+ loss_reg_item = self.sum_loss(loss_reg, (1,))
+ loss_reg_item = self.sum_loss(loss_reg_item, (0,)) / self.num_expected_total
+
+ loss_total = self.rpn_loss_cls_weight * loss_cls_item + self.rpn_loss_reg_weight * loss_reg_item
+
+ loss += loss_total
+ loss_print += (loss_total, loss_cls_item, loss_reg_item)
+ clsloss += loss_cls_item
+ regloss += loss_reg_item
+
+ output = (loss, rpn_cls_score_total, rpn_bbox_pred_total, clsloss, regloss, loss_print)
+ else:
+ output = (self.placeh1, rpn_cls_score_total, rpn_bbox_pred_total, self.placeh1, self.placeh1, self.placeh1)
+
+ return output
diff --git a/vega/networks/mobilenet.py b/vega/networks/mobilenet.py
index 16ea745a..b262ab45 100644
--- a/vega/networks/mobilenet.py
+++ b/vega/networks/mobilenet.py
@@ -102,7 +102,7 @@ def __init__(self, load_path=None, width_mult=1.0, round_nearest=8):
features.append(InvertedResidual(
inp=input_channel, oup=output_channel, stride=stride, expand_ratio=t))
input_channel = output_channel
- self.block = OutlistSequential(*features[:18], out_list=[3, 6, 13, 17])
+ self.features = OutlistSequential(*features[:18], out_list=[3, 6, 13, 17])
if load_path is not None and is_torch_backend():
import torch
self.load_state_dict(torch.load(load_path), strict=False)
diff --git a/vega/networks/model_config.py b/vega/networks/model_config.py
index fc58081d..6de852f4 100644
--- a/vega/networks/model_config.py
+++ b/vega/networks/model_config.py
@@ -25,8 +25,6 @@ class ModelConfig(ConfigSerializable):
pretrained_model_file = None
head = None
models_folder = None
- num_classes = None
- getter = None
@classmethod
def from_dict(cls, data, skip_check=True):
diff --git a/vega/networks/network_desc.py b/vega/networks/network_desc.py
index ea1ad176..a0719752 100644
--- a/vega/networks/network_desc.py
+++ b/vega/networks/network_desc.py
@@ -25,7 +25,11 @@ def __init__(self, desc):
def to_model(self):
"""Transform a NetworkDesc to a special model."""
logging.debug("Start to Create a Network.")
- module = ClassFactory.get_cls(ClassType.NETWORK, "Module")
+ module_type = self._desc.get('type', None)
+ if module_type == "DagNetwork":
+ module = ClassFactory.get_cls(ClassType.NETWORK, module_type)
+ else:
+ module = ClassFactory.get_cls(ClassType.NETWORK, "Module")
model = module.from_desc(self._desc)
if not model:
raise Exception("Failed to create model, model desc={}".format(self._desc))
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/construct/predefined/hparams.py b/vega/networks/pytorch/customs/modnas/arch_space/construct/predefined/hparams.py
index 0abf56be..4b0cf726 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/construct/predefined/hparams.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/construct/predefined/hparams.py
@@ -9,6 +9,7 @@
# MIT License for more details.
"""Hyperparameter constructor."""
+from typing import Dict, List, Union
from modnas.registry.construct import register
from modnas.core.params import Numeric, Categorical
@@ -17,14 +18,13 @@
class DefaultHParamSpaceConstructor():
"""Constructor that generates parameters from config."""
- def __init__(self, params):
+ def __init__(self, params: Union[Dict, List]) -> None:
if isinstance(params, dict):
- params = params.items()
+ self.params = params.items()
elif isinstance(params, list):
- params = [(None, p) for p in params]
- self.params = params
+ self.params = [(None, p) for p in params]
- def __call__(self, model):
+ def __call__(self, model: None) -> None:
"""Run constructor."""
del model
for k, v in self.params:
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/arch_desc.py b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/arch_desc.py
index bdbc55fc..65c6478f 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/arch_desc.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/arch_desc.py
@@ -18,6 +18,8 @@
from modnas.registry.construct import register
from modnas.arch_space.slot import Slot
from modnas.utils.logging import get_logger
+from torch.nn.modules.module import Module
+from typing import Dict, Optional, Any, Sequence
logger = get_logger('construct')
@@ -30,7 +32,7 @@
}
-def parse_arch_desc(desc, parser=None):
+def parse_arch_desc(desc: Any, parser: Optional[str] = None) -> Any:
"""Return archdesc parsed from file."""
if isinstance(desc, str):
default_parser = 'yaml'
@@ -51,7 +53,7 @@ def parse_arch_desc(desc, parser=None):
class DefaultArchDescConstructor():
"""Constructor that builds network from archdesc."""
- def __init__(self, arch_desc, parse_args=None):
+ def __init__(self, arch_desc: Any, parse_args: Optional[Dict[str, Any]] = None) -> None:
arch_desc = parse_arch_desc(arch_desc, **(parse_args or {}))
logger.info('construct from arch_desc: {}'.format(arch_desc))
self.arch_desc = arch_desc
@@ -65,16 +67,20 @@ def __call__(self, *args, **kwargs):
class DefaultRecursiveArchDescConstructor(DefaultArchDescConstructor):
"""Constructor that recursively builds network submodules from archdesc."""
- def __init__(self, arch_desc, parse_args=None, construct_fn='build_from_arch_desc', fn_args=None, substitute=False):
+ def __init__(
+ self, arch_desc: Any, parse_args: Optional[Dict] = None, construct_fn: str = 'build_from_arch_desc',
+ fn_args: Optional[Dict] = None, substitute: bool = False, skip_exist: bool = True
+ ) -> None:
super().__init__(arch_desc, parse_args)
self.construct_fn = construct_fn
self.fn_args = fn_args or {}
self.substitute = substitute
+ self.skip_exist = skip_exist
- def visit(self, module):
+ def visit(self, module: Module) -> Module:
"""Construct and return module."""
construct_fn = getattr(module, self.construct_fn, None)
- if construct_fn is not None:
+ if construct_fn is not None and not (isinstance(module, Slot) and module.get_entity() is not None):
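+ # skip Slots whose entity is already built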
ret = construct_fn(self.arch_desc, **copy.deepcopy(self.fn_args))
return module if ret is None else ret
for n, m in module.named_children():
@@ -83,13 +89,16 @@ def visit(self, module):
module.add_module(n, m)
return module
- def __call__(self, model):
+ def __call__(self, model: Module) -> Module:
"""Run constructor."""
Slot.set_convert_fn(self.convert)
return self.visit(model)
- def convert(self, slot, desc, *args, **kwargs):
+ def convert(self, slot: Slot, desc: Sequence[str], *args, **kwargs) -> Module:
"""Convert Slot to module from archdesc."""
+ if slot.get_entity() is not None and self.skip_exist:
+ logger.warning('slot {} already built'.format(slot.sid))
+ return None
desc = desc[0] if isinstance(desc, list) else desc
return build_module(desc, slot, *args, **kwargs)
@@ -98,13 +107,17 @@ def convert(self, slot, desc, *args, **kwargs):
class DefaultSlotArchDescConstructor(DefaultSlotTraversalConstructor, DefaultArchDescConstructor):
"""Constructor that converts Slots to modules from archdesc."""
- def __init__(self, arch_desc, parse_args=None, fn_args=None):
- DefaultSlotTraversalConstructor.__init__(self)
- DefaultArchDescConstructor.__init__(self, arch_desc, parse_args)
+ def __init__(
+ self, arch_desc: Any, parse_args: Optional[Dict] = None, construct_fn: str = 'build_from_arch_desc',
+ fn_args: Optional[Dict] = None, traversal_args: Optional[Dict] = None, desc_args: Optional[Dict] = None
+ ) -> None:
+ DefaultSlotTraversalConstructor.__init__(self, **(traversal_args or {}))
+ DefaultArchDescConstructor.__init__(self, arch_desc, parse_args, **(desc_args or {}))
+ self.construct_fn = construct_fn
self.fn_args = fn_args or {}
self.idx = -1
- def get_next_desc(self):
+ def get_next_desc(self) -> Any:
"""Return next archdesc item."""
self.idx += 1
desc = self.arch_desc[self.idx]
@@ -112,7 +125,17 @@ def get_next_desc(self):
desc = desc[0]
return desc
- def convert(self, slot):
+ def convert(self, slot: Slot, desc=None, *args, **kwargs) -> Module:
"""Convert Slot to module from archdesc."""
- m_type = self.get_next_desc()
- return build_module(m_type, slot, **copy.deepcopy(self.fn_args))
+ if slot in self.visited:
+ return None
+ self.visited.add(slot)
+ desc = desc or self.get_next_desc()
+ ent = slot.get_entity()
+ fn_args = copy.deepcopy(self.fn_args)
+ if ent is not None:
+ construct_fn = getattr(ent, self.construct_fn, None)
+ if construct_fn is not None:
+ ret = construct_fn(desc, **fn_args)
+ return ent if ret is None else ret
+ return build_module(desc, slot, *args, **fn_args, **kwargs)
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/default.py b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/default.py
index 3a49dbed..0ecfebfa 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/default.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/default.py
@@ -104,6 +104,7 @@ def __init__(self, gen=None, convert_fn=None, args=None, skip_exist=True):
self.skip_exist = skip_exist
if convert_fn:
self.convert = get_convert_fn(convert_fn, **(args or {}))
+ self.visited = set()
def convert(self, slot):
"""Return converted module from slot."""
@@ -111,6 +112,7 @@ def convert(self, slot):
def __call__(self, model):
"""Run constructor."""
+ Slot.set_convert_fn(self.convert)
gen = self.gen or Slot.gen_slots_model(model)
all_slots = list(gen())
for m in all_slots:
@@ -119,6 +121,7 @@ def __call__(self, model):
ent = self.convert(m)
if ent is not None:
m.set_entity(ent)
+ self.visited.clear()
return model
@@ -134,7 +137,11 @@ def __init__(self, candidates, mixed_op, candidate_args=None):
def convert(self, slot):
"""Return converted MixedOp from slot."""
- cands = OrderedDict([(cand, build_module(cand, slot, **self.candidate_args)) for cand in self.candidates])
+ cand_args = self.candidate_args.copy()
+ candidates = self.candidates
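+ # normalize a list/tuple of candidate names into a name -> spec mapping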
+ if isinstance(candidates, (list, tuple)):
+ candidates = {k: k for k in candidates}
+ cands = OrderedDict([(k, build_module(v, slot=slot, **cand_args)) for k, v in candidates.items()])
return build_module(self.mixed_op_conf, candidates=cands)
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/droppath.py b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/droppath.py
index 2b18fb18..18a9a7c3 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/droppath.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/droppath.py
@@ -10,40 +10,93 @@
"""DropPath constructor."""
import torch
-from modnas.arch_space.ops import DropPath, Identity
+from modnas.arch_space.ops import Identity
from modnas.core.event import event_on
from .default import DefaultSlotTraversalConstructor
from modnas.registry.construct import register
-from modnas.utils import copy_members
+from modnas.arch_space.slot import Slot
+from torch.nn.modules.container import Sequential
+from torch.nn.modules.module import Module
+from typing import Optional
+
+
+class DropPath(torch.nn.Module):
+ """DropPath module."""
+
+ def __init__(self, prob=0.):
+ super().__init__()
+ self.drop_prob = prob
+
+ def extra_repr(self):
+ """Return extra representation string."""
+ return 'prob={}, inplace'.format(self.drop_prob)
+
+ def forward(self, x):
+ """Return operator output."""
+ if self.training and self.drop_prob > 0.:
+ keep_prob = 1. - self.drop_prob
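+ # per-sample Bernoulli mask on the input's device; scaling by 1/keep_prob keeps the expectation unchanged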
+ mask = torch.FloatTensor(x.size(0), 1, 1, 1).to(device=x.device).bernoulli_(keep_prob)
+ x.div_(keep_prob).mul_(mask)
+ return x
+
+
+def _apply_drop_prob(module, prob):
+ for m in module.modules():
+ if isinstance(m, DropPath):
+ m.drop_prob = prob
+
+
+def _parse_drop_prob(drop_prob):
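+ # accept either a (min, max) pair or a single max value (min defaults to 0)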
+ if isinstance(drop_prob, (tuple, list)):
+ return drop_prob[0], drop_prob[1]
+ else:
+ return 0, drop_prob
+
+
+@register
+class DropPathConverter():
+ """Constructor that applies DropPath on a single module."""
+
+ def __init__(self, drop_prob=0.1):
+ self.min_drop_prob, self.max_drop_prob = _parse_drop_prob(drop_prob)
+
+ def __call__(self, module):
+ """Run constructor."""
+ def drop_prob_update(*args, epoch=None, tot_epochs=None, **kwargs):
+ _apply_drop_prob(module, self.max_drop_prob * epoch / tot_epochs)
+
+ event_on('before:TrainerBase.train_epoch', drop_prob_update)
+ if module is None or isinstance(module, Identity):
+ return module
+ return torch.nn.Sequential(module, DropPath(self.min_drop_prob))
@register
class DropPathConstructor(DefaultSlotTraversalConstructor):
"""Constructor that applies DropPath on Slot modules."""
- def __init__(self, *args, drop_prob=0.1, skip_exist=False, **kwargs):
+ def __init__(self, *args, drop_prob=0.1, skip_exist=False, **kwargs) -> None:
super().__init__(*args, skip_exist=skip_exist, **kwargs)
- self.drop_prob = drop_prob
+ self.min_drop_prob, self.max_drop_prob = _parse_drop_prob(drop_prob)
self.transf = DropPathTransformer()
- def __call__(self, model):
+ def __call__(self, model: Module) -> Module:
"""Run constructor."""
super().__call__(model)
def drop_prob_update(*args, epoch=None, tot_epochs=None, **kwargs):
- self.transf.set_prob(self.drop_prob * epoch / tot_epochs)
+ self.transf.set_prob(self.max_drop_prob * epoch / tot_epochs)
self.transf(model)
event_on('before:TrainerBase.train_epoch', drop_prob_update)
return model
- def convert(self, slot):
+ def convert(self, slot: Slot) -> Optional[Sequential]:
"""Return module with DropPath."""
ent = slot.get_entity()
if ent is None or isinstance(ent, Identity):
- return
- new_ent = torch.nn.Sequential(ent, DropPath())
- copy_members(new_ent, ent, excepts=['forward', 'modules', 'named_modules'])
+ return None
+ new_ent = torch.nn.Sequential(ent, DropPath(self.min_drop_prob))
return new_ent
@@ -51,19 +104,17 @@ def convert(self, slot):
class DropPathTransformer(DefaultSlotTraversalConstructor):
"""Transformer that update DropPath probability."""
- def __init__(self, *args, skip_exist=False, **kwargs):
+ def __init__(self, *args, skip_exist=False, **kwargs) -> None:
super().__init__(*args, skip_exist=skip_exist, **kwargs)
self.prob = None
- def set_prob(self, prob):
+ def set_prob(self, prob: float) -> None:
"""Set DropPath probability."""
self.prob = prob
- def convert(self, slot):
+ def convert(self, slot: Slot) -> None:
"""Apply DropPath probability on Slot module."""
ent = slot.get_entity()
- if ent is None or isinstance(ent, Identity):
+ if ent is None:
return
- for m in ent.modules():
- if isinstance(m, DropPath):
- m.drop_prob = self.prob
+ _apply_drop_prob(ent, self.prob)
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/model_init.py b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/model_init.py
index 0240a931..7c84953a 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/model_init.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/model_init.py
@@ -9,68 +9,150 @@
# MIT License for more details.
"""Model weight initializer."""
+import copy
import math
-import torch.nn as nn
+import torch.nn.init as init
from modnas.registry.construct import register
-def _init_he_normal_fout(t, gain, fan_in, fan_out):
+def _t_init_he_normal_fout(t, gain, fan_in, fan_out):
stdv = gain / math.sqrt(fan_out)
- nn.init.normal_(t, 0, stdv)
+ init.normal_(t, 0, stdv)
-def _init_he_normal_fin(t, gain, fan_in, fan_out):
+def _t_init_he_normal_fin(t, gain, fan_in, fan_out):
stdv = gain / math.sqrt(fan_in)
- nn.init.normal_(t, 0, stdv)
+ init.normal_(t, 0, stdv)
-def _init_he_uniform_fout(t, gain, fan_in, fan_out):
+def _t_init_he_uniform_fout(t, gain, fan_in, fan_out):
b = math.sqrt(3.) * gain / math.sqrt(fan_out)
- nn.init.uniform_(t, -b, b)
+ init.uniform_(t, -b, b)
-def _init_he_uniform_fin(t, gain, fan_in, fan_out):
+def _t_init_he_uniform_fin(t, gain, fan_in, fan_out):
b = math.sqrt(3.) * gain / math.sqrt(fan_in)
- nn.init.uniform_(t, -b, b)
+ init.uniform_(t, -b, b)
-def _init_xavier_uniform(t, gain, fan_in, fan_out):
+def _t_init_xavier_uniform(t, gain, fan_in, fan_out):
b = math.sqrt(6.) * gain / math.sqrt(fan_in + fan_out)
- nn.init.uniform_(t, -b, b)
+ init.uniform_(t, -b, b)
-def _init_xavier_normal(t, gain, fan_in, fan_out):
+def _t_init_xavier_normal(t, gain, fan_in, fan_out):
stdv = math.sqrt(2.) * gain / math.sqrt(fan_in + fan_out)
- nn.init.normal_(t, 0, stdv)
+ init.normal_(t, 0, stdv)
-def _init_uniform_fin(t, gain, fan_in, fan_out):
+def _t_init_uniform_fin(t, gain, fan_in, fan_out):
b = 1.0 / math.sqrt(fan_in)
- nn.init.uniform_(t, -b, b)
+ init.uniform_(t, -b, b)
-def _init_uniform_fout(t, gain, fan_in, fan_out):
+def _t_init_uniform_fout(t, gain, fan_in, fan_out):
b = 1.0 / math.sqrt(fan_out)
- nn.init.uniform_(t, -b, b)
-
-
-def _init_uniform(t, gain, fan_in, fan_out):
- nn.init.uniform_(t)
-
-
-def _init_normal(t, gain, fan_in, fan_out):
- nn.init.normal_(t)
-
-
-def _init_zeros(t, gain, fan_in, fan_out):
- nn.init.zeros_(t)
-
-
-def _init_ones(t, gain, fan_in, fan_out):
- nn.init.ones_(t)
-
-
-_initializers = {k[5:]: v for (k, v) in globals().items() if k.startswith('_init_')}
+ init.uniform_(t, -b, b)
+
+
+def _t_init_uniform(t, gain, fan_in, fan_out):
+ init.uniform_(t)
+
+
+def _t_init_normal(t, gain, fan_in, fan_out):
+ init.normal_(t)
+
+
+def _t_init_zeros(t, gain, fan_in, fan_out):
+ init.zeros_(t)
+
+
+def _t_init_ones(t, gain, fan_in, fan_out):
+ init.ones_(t)
+
+
+def _init_tensor(init_type, t, gain, fan_in, fan_out):
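+ # dispatch by initializer name; unknown types and missing tensors are silently skipped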
+ init_fn = _tensor_init_fn.get(init_type)
+ if init_fn is None or t is None:
+ return
+ init_fn(t, gain, fan_in, fan_out)
+
+
+def _m_init_conv(m, config):
+ init_type = config['conv']['type']
+ bias_init_type = config['bias']['type']
+ gain = config['gain']
+ if init_type is None:
+ return
+ rec_size = m.kernel_size[0] * m.kernel_size[1]
+ fan_in = rec_size * m.in_channels
+ fan_out = rec_size * m.out_channels
+ if config['conv'].get('div_groups', True):
+ fan_in /= m.groups
+ fan_out /= m.groups
+ _init_tensor(init_type, m.weight, gain, fan_in, fan_out)
+ if m.bias is not None:
+ _init_tensor(bias_init_type, m.bias, gain, fan_in, fan_out)
+
+
+def _m_init_norm(m, config):
+ init_type = config['norm']['type']
+ bias_init_type = config['bias']['type']
+ momentum = config['norm'].get('momentum')
+ eps = config['norm'].get('eps')
+ gain = config['gain']
+ m.reset_running_stats()
+ if momentum is not None:
+ m.momentum = momentum
+ if eps is not None:
+ m.eps = eps
+ if not m.affine:
+ return
+ fan_in = fan_out = m.num_features
+ _init_tensor(init_type, m.weight, gain, fan_in, fan_out)
+ _init_tensor(bias_init_type, m.bias, gain, fan_in, fan_out)
+
+
+def _m_init_fc(m, config):
+ init_type = config['fc']['type']
+ bias_init_type = config['bias']['type']
+ gain = config['gain']
+ if init_type is None:
+ return
+ fan_in, fan_out = m.in_features, m.out_features
+ _init_tensor(init_type, m.weight, gain, fan_in, fan_out)
+ if m.bias is None:
+ return
+ _init_tensor(bias_init_type, m.bias, gain, fan_in, fan_out)
+
+
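+# lookup tables keyed by initializer name: the '_t_init_'/'_m_init_' prefixes (8 chars) are stripped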
+_tensor_init_fn = {k[8:]: v for (k, v) in globals().items() if k.startswith('_t_init_')}
+_module_init_fn = {k[8:]: v for (k, v) in globals().items() if k.startswith('_m_init_')}
+
+
+_default_init_config = {
+ 'conv': {
+ 'type': None,
+ 'div_groups': True,
+ },
+ 'norm': {
+ 'type': None,
+ },
+ 'fc': {
+ 'type': None,
+ },
+ 'bias': {
+ 'type': None,
+ },
+}
+
+
+_default_module_map = {
+ 'Conv2d': 'conv',
+ 'BatchNorm2d': 'norm',
+ 'GroupNorm': 'norm',
+ 'Linear': 'fc',
+}
@register
@@ -78,71 +160,28 @@ class DefaultModelInitializer():
"""Model weight initializer class."""
def __init__(self,
+ init_config=None,
+ module_init_map=None,
default_init_type=None,
- conv_init_type=None,
- conv_div_groups=True,
- bn_init_type=None,
- bn_momentum=None,
- bn_eps=None,
- fc_init_type=None,
- bias_init_type=None,
neg_slope=math.sqrt(5),
nonlinear='leaky_relu'):
+ self.init_config = copy.deepcopy(_default_init_config)
+ self.init_config['gain'] = init.calculate_gain(nonlinear, neg_slope)
+ self.init_config.update(init_config or {})
+ self.module_init_map = _default_module_map.copy()
+ self.module_init_map.update(module_init_map or {})
self.default_init_type = default_init_type
- self.conv_init_type = conv_init_type
- self.conv_div_groups = conv_div_groups
- self.bn_init_type = bn_init_type
- self.bn_momentum = bn_momentum
- self.bn_eps = bn_eps
- self.fc_init_type = fc_init_type
- self.bias_init_type = bias_init_type
- self.neg_slope = neg_slope
- self.nonlinear = nonlinear
- self.gain = nn.init.calculate_gain(nonlinear, neg_slope)
-
- def _init_tensor(self, init_type, t, gain, fan_in, fan_out):
- if init_type not in _initializers or t is None:
- return
- init_fn = _initializers[init_type]
- init_fn(t, gain, fan_in, fan_out)
def __call__(self, model):
"""Return initialized model."""
- gain = self.gain
for m in model.modules():
- if isinstance(m, nn.Conv2d):
- if self.conv_init_type is None:
- continue
- rec_size = m.kernel_size[0] * m.kernel_size[1]
- fan_in = rec_size * m.in_channels
- fan_out = rec_size * m.out_channels
- if self.conv_div_groups:
- fan_in /= m.groups
- fan_out /= m.groups
- self.init_tensor(self.conv_init_type, m.weight, gain, fan_in, fan_out)
- if m.bias is not None:
- self.init_tensor(self.bias_init_type, m.bias, gain, fan_in, fan_out)
- elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
- m.reset_running_stats()
- if self.bn_momentum is not None:
- m.momentum = self.bn_momentum
- if self.bn_eps is not None:
- m.eps = self.bn_eps
- if not m.affine:
- continue
- fan_in = fan_out = m.num_features
- self.init_tensor(self.bn_init_type, m.weight, gain, fan_in, fan_out)
- self.init_tensor(self.bias_init_type, m.bias, gain, fan_in, fan_out)
- elif isinstance(m, nn.Linear):
- if self.fc_init_type is None:
- continue
- self.init_tensor(self.fc_init_type, m.weight, gain, fan_in, fan_out)
- if m.bias is None:
- continue
- self.init_tensor(self.bias_init_type, m.bias, gain, fan_in, fan_out)
+ m_init_type = self.module_init_map.get(type(m).__name__)
+ if m_init_type is not None:
+ _module_init_fn[m_init_type](m, self.init_config)
elif len(list(m.children())) == 0:
for p in m.parameters():
sz = p.shape
fan_out = sz[0] if len(sz) else 1
fan_in = sz[min(1, len(sz) - 1)] if len(sz) else 1
- self.init_tensor(self.default_init_type, p, gain, fan_in, fan_out)
+ _init_tensor(self.default_init_type, p, self.init_config['gain'], fan_in, fan_out)
+ return model
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/torch.py b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/torch.py
index fac7a8b4..36a4ae48 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/torch.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/construct/torch/torch.py
@@ -23,6 +23,8 @@
def parse_device(device):
"""Return device ids from config."""
+ if isinstance(device, int):
+ device = str(device)
if not isinstance(device, str):
return []
device = device.lower()
@@ -81,6 +83,7 @@ def __call__(self, model):
if model is None:
return
device_ids = self.device_ids
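+ # register the primary device with the backend before moving the model onto it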
+ backend.set_device(device_ids[0])
if device_ids[0] is not None:
torch.cuda.set_device(device_ids[0])
model.to(device=device_ids[0])
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/export/predefined/default.py b/vega/networks/pytorch/customs/modnas/arch_space/export/predefined/default.py
index 3820dc02..1e2a4d7e 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/export/predefined/default.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/export/predefined/default.py
@@ -14,20 +14,21 @@
import yaml
from modnas.core.param_space import ParamSpace
from modnas.registry.export import register, build
+from typing import Any, Dict, List, Optional, Union
@register
class DefaultToFileExporter():
"""Exporter that saves archdesc to file."""
- def __init__(self, path, ext='yaml'):
+ def __init__(self, path: str, ext: str = 'yaml') -> None:
path, pathext = os.path.splitext(path)
ext = pathext or ext
path = path + '.' + ext
self.path = path
self.ext = ext
- def __call__(self, desc):
+ def __call__(self, desc: Any) -> None:
"""Run Exporter."""
ext = self.ext
if isinstance(desc, str):
@@ -58,18 +59,15 @@ def __call__(self, model):
class DefaultParamsExporter():
"""Exporter that outputs parameter values."""
- def __init__(self, export_fmt=None, with_keys=True):
+ def __init__(self, export_fmt: Optional[str] = None, with_keys: bool = True) -> None:
self.export_fmt = export_fmt
self.with_keys = with_keys
- def __call__(self, model):
+ def __call__(self, model: None) -> Union[Dict[str, Any], List[Any], str]:
"""Run Exporter."""
if self.with_keys:
- params = dict(ParamSpace().named_param_values())
+ params_dct = dict(ParamSpace().named_param_values())
+ return self.export_fmt.format(**params_dct) if self.export_fmt else params_dct
else:
- params = [p.value() for p in ParamSpace().params()]
- if self.export_fmt:
- if self.with_keys:
- return self.export_fmt.format(**params)
- return self.export_fmt.format(*params)
- return params
+ params_list = [p.value() for p in ParamSpace().params()]
+ return self.export_fmt.format(*params_list) if self.export_fmt else params_list
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/layer_defs.py b/vega/networks/pytorch/customs/modnas/arch_space/layer_defs.py
index d176069d..f5a57f00 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/layer_defs.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/layer_defs.py
@@ -9,8 +9,10 @@
# MIT License for more details.
"""Dataflow defining components in Layers."""
-import itertools
import torch
+from itertools import combinations
+from torch import Tensor
+from typing import Iterator, List, Tuple, Union
from modnas.registry.layer_def import register
@@ -34,19 +36,19 @@ def merge_range(self, num_states):
class ConcatMerger(MergerBase):
"""Merger that outputs concatenation of inputs."""
- def __init__(self, start=0):
+ def __init__(self, start: int = 0) -> None:
super().__init__()
self.start = start
- def chn_out(self, chn_states):
+ def chn_out(self, chn_states: List[int]) -> int:
"""Return number of channels in merged output."""
return sum(chn_states[self.start:])
- def merge(self, states):
+ def merge(self, states: List[Tensor]) -> Tensor:
"""Return merged output from inputs."""
return torch.cat(states[self.start:], dim=1)
- def merge_range(self, num_states):
+ def merge_range(self, num_states: int) -> range:
"""Return indices of merged inputs."""
return range(self.start, num_states)
@@ -76,15 +78,15 @@ def merge_range(self, num_states):
class SumMerger(MergerBase):
"""Merger that outputs sum of inputs."""
- def __init__(self, start=0):
+ def __init__(self, start: int = 0) -> None:
super().__init__()
self.start = start
- def chn_out(self, chn_states):
+ def chn_out(self, chn_states: List[int]) -> int:
"""Return number of channels in merged output."""
return chn_states[-1]
- def merge(self, states):
+ def merge(self, states: List[Tensor]) -> Union[Tensor, int]:
"""Return merged output from inputs."""
return sum(states[self.start:])
@@ -130,13 +132,13 @@ def len_enum(self, n_states, n_inputs):
class CombinationEnumerator(EnumeratorBase):
"""Enumerator that enums all combinations of inputs."""
- def enum(self, n_states, n_inputs):
+ def enum(self, n_states: int, n_inputs: int) -> Iterator[Tuple[int, ...]]:
"""Return enumerated indices from all inputs."""
- return itertools.combinations(range(n_states), n_inputs)
+ return combinations(range(n_states), n_inputs)
def len_enum(self, n_states, n_inputs):
"""Return number of enumerated inputs."""
- return len(list(itertools.combinations(range(n_states), n_inputs)))
+ return len(list(combinations(range(n_states), n_inputs)))
@register
@@ -198,7 +200,7 @@ def len_enum(self, n_states, n_inputs):
class AllocatorBase():
"""Base layer dataflow input allocator class."""
- def __init__(self, n_inputs, n_states):
+ def __init__(self, n_inputs: int, n_states: int) -> None:
self.n_inputs = n_inputs
self.n_states = n_states
@@ -248,10 +250,10 @@ def chn_in(self, chn_states, sidx, cur_state):
class ReplicateAllocator(AllocatorBase):
"""Allocator that replicate states for each input."""
- def alloc(self, states, sidx, cur_state):
+ def alloc(self, states: List[Tensor], sidx: Tuple[int], cur_state: int) -> List[Tensor]:
"""Return allocated input from previous states."""
return states
- def chn_in(self, chn_states, sidx, cur_state):
+ def chn_in(self, chn_states: List[int], sidx: Tuple[int], cur_state: int) -> List[int]:
"""Return number of channels of allocated input."""
return chn_states
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/layers.py b/vega/networks/pytorch/customs/modnas/arch_space/layers.py
index 6bc1ee12..35e4c50a 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/layers.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/layers.py
@@ -16,6 +16,9 @@
from . import layer_defs
from modnas.registry.layer_def import build as build_layer_def
from modnas.utils.logging import get_logger
+from torch import Tensor
+from torch.nn.modules.module import Module
+from typing import Dict, List, Optional, Tuple, Type, Union, Any
logger = get_logger('arch_space')
@@ -26,18 +29,18 @@ class DAGLayer(nn.Module):
"""Directed Acyclic Graph Layer."""
def __init__(self,
- chn_in,
- chn_out,
- stride,
- n_nodes,
- allocator,
- merger_state,
- merger_out,
- enumerator,
- preproc=None,
- edge_cls=Slot,
- edge_kwargs=None,
- name=None):
+ chn_in: Tuple[int, int],
+ chn_out: None,
+ stride: int,
+ n_nodes: int,
+ allocator: str,
+ merger_state: str,
+ merger_out: str,
+ enumerator: str,
+ preproc: Optional[List[Type[Module]]] = None,
+ edge_cls: Type[Slot] = Slot,
+ edge_kwargs: Optional[Dict[str, Any]] = None,
+ name: Optional[str] = None) -> None:
super().__init__()
self.n_nodes = n_nodes
self.stride = stride
@@ -92,7 +95,7 @@ def __init__(self,
self.chn_out = self.merger_out.chn_out(chn_states)
self.chn_states = chn_states
- def forward(self, x):
+ def forward(self, x: Union[Tensor, List[Tensor]]) -> Tensor:
"""Compute Layer output."""
states = x if isinstance(x, list) else [x]
if self.preprocs is not None:
@@ -113,7 +116,7 @@ def forward(self, x):
out = self.merger_out.merge(states)
return out
- def to_arch_desc(self, k=2):
+ def to_arch_desc(self, k: Union[int, List[int]] = 2) -> Any:
"""Return archdesc from Layer."""
desc = []
edge_k = 1
@@ -134,8 +137,6 @@ def to_arch_desc(self, k=2):
try:
w_edge = torch.max(edges[eidx].ent.prob().detach()[:-1])
except AttributeError:
- w_edge = -1
- if w_edge < 0:
continue
g_edge = [g_edge_child, list(sidx), n_states]
if len(topk_edges) < k_states[nidx]:
@@ -149,7 +150,7 @@ def to_arch_desc(self, k=2):
desc.append([g for w, g in topk_edges])
return desc
- def build_from_arch_desc(self, desc, *args, **kwargs):
+ def build_from_arch_desc(self, desc: Any, *args, **kwargs) -> None:
"""Build layer ops from desc."""
chn_states = self.chn_states[:self.n_input]
num_edges = 0
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/mixed_ops.py b/vega/networks/pytorch/customs/modnas/arch_space/mixed_ops.py
index f91e9d11..72233b28 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/mixed_ops.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/mixed_ops.py
@@ -12,6 +12,11 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
+from collections import OrderedDict
+from torch import Tensor
+from typing import Any, Collection, Iterator, List, Tuple, Optional, Union
+from torch.nn.modules.module import Module
+from modnas.core.params.base import Param
from modnas.core.params import Categorical
from modnas.registry.params import build
from modnas.registry.arch_space import register
@@ -24,7 +29,9 @@
class MixedOp(nn.Module):
"""Base Mixed operator class."""
- def __init__(self, candidates, arch_param):
+ def __init__(
+ self, candidates: Union[OrderedDict, Collection[Tuple[str, Module]]], arch_param: Optional[Param]
+ ) -> None:
super().__init__()
if isinstance(candidates, (tuple, list)):
candidates = {n: p for n, p in candidates}
@@ -37,11 +44,11 @@ def __init__(self, candidates, arch_param):
self.arch_param = arch_param
logger.debug('mixed op: {} p: {}'.format(type(self), arch_param))
- def candidates(self):
+ def candidates(self) -> Any:
"""Return list of candidate operators."""
return list(self._ops.values())
- def candidate_names(self):
+ def candidate_names(self) -> List[str]:
"""Return list of candidate operator names."""
return list(self._ops.keys())
@@ -50,15 +57,15 @@ def named_candidates(self):
for n, cand in self._ops.items():
yield n, cand
- def alpha(self):
+ def alpha(self) -> Any:
"""Return architecture parameter value."""
return self.arch_param_value()
- def prob(self):
+ def prob(self) -> Tensor:
"""Return candidate probabilities."""
return F.softmax(self.alpha(), dim=-1)
- def arch_param_value(self):
+ def arch_param_value(self) -> Any:
"""Return architecture parameter value."""
return self.arch_param.value()
@@ -67,7 +74,7 @@ def to_arch_desc(self, *args, **kwargs):
raise NotImplementedError
@staticmethod
- def gen(model):
+ def gen(model: Module) -> Iterator[Module]:
"""Return an iterator over all MixedOp in a model."""
for m in model.modules():
if isinstance(m, MixedOp):
@@ -78,16 +85,16 @@ def gen(model):
class SoftmaxSumMixedOp(MixedOp):
"""Mixed operator using softmax weighted sum."""
- def __init__(self, candidates, arch_param=None):
+ def __init__(self, candidates: OrderedDict, arch_param: Optional[Param] = None) -> None:
super().__init__(candidates, arch_param)
- def forward(self, *args, **kwargs):
+ def forward(self, *args, **kwargs) -> Union[Tensor, int]:
"""Compute MixedOp output."""
outputs = [op(*args, **kwargs) for op in self.candidates()]
w_path = F.softmax(self.alpha().to(device=outputs[0].device), dim=-1)
- return sum(w * o for w, o in zip(w_path, outputs))
+ return sum((w * o for w, o in zip(w_path, outputs)))
- def to_arch_desc(self, k=1):
+ def to_arch_desc(self, k: int = 1) -> Any:
"""Return archdesc from mixed operator."""
cname = self.candidate_names()
w = F.softmax(self.alpha().detach(), dim=-1)
@@ -102,20 +109,20 @@ def to_arch_desc(self, k=1):
class BinaryGateMixedOp(MixedOp):
"""Mixed operator controlled by BinaryGate."""
- def __init__(self, candidates, arch_param=None, n_samples=1):
+ def __init__(self, candidates: OrderedDict, arch_param: Optional[Param] = None, n_samples: int = 1) -> None:
super().__init__(candidates, arch_param)
self.n_samples = n_samples
- self.s_path_f = None
+ self.s_path_f = []
self.last_samples = []
self.s_op = []
self.a_grad_enabled = False
self.reset_ops()
- def arch_param_grad(self, enabled):
+ def arch_param_grad(self, enabled: bool) -> None:
"""Set if enable architecture parameter grad."""
self.a_grad_enabled = enabled
- def sample_path(self):
+ def sample_path(self) -> None:
"""Sample candidates in forward pass."""
p = self.alpha()
s_op = self.s_op
@@ -123,18 +130,18 @@ def sample_path(self):
samples = self.w_path_f.multinomial(1 if self.a_grad_enabled else self.n_samples)
self.s_path_f = [s_op[i] for i in samples]
- def sample_ops(self, n_samples):
+ def sample_ops(self, n_samples: int) -> None:
"""Sample activated candidates."""
samples = self.prob().multinomial(n_samples).detach()
self.s_op = list(samples.flatten().cpu().numpy())
- def reset_ops(self):
+ def reset_ops(self) -> None:
"""Reset activated candidates."""
s_op = list(range(len(self.candidates())))
self.last_samples = s_op
self.s_op = s_op
- def forward(self, *args, **kwargs):
+ def forward(self, *args, **kwargs) -> Tensor:
"""Compute MixedOp output."""
self.sample_path()
s_path_f = self.s_path_f
@@ -156,7 +163,7 @@ def forward(self, *args, **kwargs):
self.last_samples = s_path_f
return m_out
- def swap_ops(self, samples):
+ def swap_ops(self, samples: List[int]) -> None:
"""Remove unused candidates from computation graph."""
cands = self.candidates()
for i in self.last_samples:
@@ -173,7 +180,7 @@ def swap_ops(self, samples):
continue
p.requires_grad_(True)
- def to_arch_desc(self, k=1):
+ def to_arch_desc(self, k: int = 1) -> Any:
"""Return archdesc from mixed operator."""
cname = self.candidate_names()
w = F.softmax(self.alpha().detach(), dim=-1)
@@ -234,7 +241,7 @@ def backward(ctx, m_grad):
class BinaryGateUniformMixedOp(BinaryGateMixedOp):
"""Mixed operator controlled by BinaryGate, which candidates sampled uniformly."""
- def sample_path(self):
+ def sample_path(self) -> None:
"""Sample candidates in forward pass."""
p = self.alpha()
s_op = self.s_op
@@ -244,7 +251,7 @@ def sample_path(self):
s_path_f = [s_op[i] for i in samples]
self.s_path_f = s_path_f
- def sample_ops(self, n_samples):
+ def sample_ops(self, n_samples: int) -> None:
"""Sample activated candidates."""
p = self.alpha()
# sample uniformly
@@ -258,15 +265,15 @@ def sample_ops(self, n_samples):
class GumbelSumMixedOp(MixedOp):
"""Mixed operator using gumbel softmax sum."""
- def __init__(self, candidates, arch_param=None):
+ def __init__(self, candidates: OrderedDict, arch_param: Optional[Param] = None) -> None:
super().__init__(candidates, arch_param)
self.temp = 1e5
- def set_temperature(self, temp):
+ def set_temperature(self, temp: float) -> None:
"""Set annealing temperature."""
self.temp = temp
- def prob(self):
+ def prob(self) -> Tensor:
"""Return candidate probabilities."""
p = self.alpha()
eps = 1e-7
@@ -275,13 +282,13 @@ def prob(self):
scores = (p + gumbels) / self.temp
return F.softmax(scores, dim=-1)
- def forward(self, *args, **kwargs):
+ def forward(self, *args, **kwargs) -> Union[Tensor, int]:
"""Compute MixedOp output."""
outputs = [op(*args, **kwargs) for op in self.candidates()]
w_path = self.prob().to(outputs[0].device)
return sum(w * o for w, o in zip(w_path, outputs))
- def to_arch_desc(self, k=1):
+ def to_arch_desc(self, k: int = 1) -> Any:
"""Return archdesc from mixed operator."""
cname = self.candidate_names()
w = F.softmax(self.alpha().detach(), dim=-1) # use alpha softmax
@@ -296,19 +303,19 @@ def to_arch_desc(self, k=1):
class IndexMixedOp(MixedOp):
"""Mixed operator controlled by index."""
- def __init__(self, candidates, arch_param=None):
+ def __init__(self, candidates: OrderedDict, arch_param: Optional[Categorical] = None) -> None:
if arch_param is None:
arch_param = Categorical(list(candidates.keys()))
super().__init__(candidates, arch_param)
self.last_samples = list(range(len(self.candidates())))
- def alpha(self):
+ def alpha(self) -> Tensor:
"""Return architecture parameter value."""
alpha = torch.zeros(len(self.candidates()))
alpha[self.arch_param.index()] = 1.0
return alpha
- def forward(self, *args, **kwargs):
+ def forward(self, *args, **kwargs) -> Tensor:
"""Compute MixedOp output."""
cands = self.candidates()
smp = self.arch_param.index()
@@ -317,7 +324,7 @@ def forward(self, *args, **kwargs):
self.last_samples = [smp]
return cands[smp](*args, **kwargs)
- def swap_ops(self, samples):
+ def swap_ops(self, samples: List[int]) -> None:
"""Remove unused candidates from computation graph."""
cands = self.candidates()
for i in self.last_samples:
@@ -334,6 +341,6 @@ def swap_ops(self, samples):
continue
p.requires_grad_(True)
- def to_arch_desc(self, *args, **kwargs):
+ def to_arch_desc(self, *args, **kwargs) -> str:
"""Return archdesc from mixed operator."""
return self.arch_param_value()
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/ops.py b/vega/networks/pytorch/customs/modnas/arch_space/ops.py
index f45d33cd..114122c6 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/ops.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/ops.py
@@ -9,11 +9,13 @@
# MIT License for more details.
"""Network operators / candidates."""
+from typing import Any, List
import torch
import torch.nn as nn
from modnas.utils import get_same_padding
from modnas.utils.config import Config
from .slot import register_slot_ccs
+from torch import Tensor
register_slot_ccs(lambda C_in, C_out, stride: PoolBN('avg', C_in, C_out, 3, stride, 1), 'AVG')
register_slot_ccs(lambda C_in, C_out, stride: PoolBN('max', C_in, C_out, 3, stride, 1), 'MAX')
@@ -43,31 +45,10 @@
})
-class DropPath(nn.Module):
- """DropPath module."""
-
- def __init__(self, prob=0.):
- super().__init__()
- self.drop_prob = prob
-
- def extra_repr(self):
- """Return extra representation string."""
- return 'prob={}, inplace'.format(self.drop_prob)
-
- def forward(self, x):
- """Return operator output."""
- if self.training and self.drop_prob > 0.:
- keep_prob = 1. - self.drop_prob
- # per data point mask; assuming x in cuda.
- mask = torch.cuda.FloatTensor(x.size(0), 1, 1, 1).bernoulli_(keep_prob)
- x.div_(keep_prob).mul_(mask)
- return x
-
-
class PoolBN(nn.Module):
"""AvgPool or MaxPool - BN."""
- def __init__(self, pool_type, C_in, C_out, kernel_size, stride, padding):
+ def __init__(self, pool_type: str, C_in: int, C_out: int, kernel_size: int, stride: int, padding: int) -> None:
super().__init__()
if C_in != C_out:
raise ValueError('invalid channel in pooling layer')
@@ -78,10 +59,10 @@ def __init__(self, pool_type, C_in, C_out, kernel_size, stride, padding):
else:
raise ValueError('invalid pooling layer type')
- nets = []
- for i in config.ops_order:
+ nets: List[Any] = []
+ for i in config['ops_order']:
if i == 'bn':
- nets.append(nn.BatchNorm2d(C_in, **config.bn))
+ nets.append(nn.BatchNorm2d(C_in, **config['bn']))
elif i == 'weight':
nets.append(pool)
elif i == 'act':
@@ -89,7 +70,7 @@ def __init__(self, pool_type, C_in, C_out, kernel_size, stride, padding):
self.net = nn.Sequential(*nets)
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
return self.net(x)
@@ -97,21 +78,21 @@ def forward(self, x):
class StdConv(nn.Module):
"""Standard conv, ReLU - Conv - BN."""
- def __init__(self, C_in, C_out, kernel_size, stride, padding, groups=1):
+ def __init__(self, C_in: int, C_out: int, kernel_size: int, stride: int, padding: int, groups: int = 1) -> None:
super().__init__()
C = C_in
- nets = []
- for i in config.ops_order:
+ nets: List[Any] = []
+ for i in config['ops_order']:
if i == 'bn':
- nets.append(nn.BatchNorm2d(C, **config.bn))
+ nets.append(nn.BatchNorm2d(C, **config['bn']))
elif i == 'weight':
- nets.append(nn.Conv2d(C_in, C_out, kernel_size, stride, padding, **config.conv, groups=groups))
+ nets.append(nn.Conv2d(C_in, C_out, kernel_size, stride, padding, **config['conv'], groups=groups))
C = C_out
elif i == 'act':
- nets.append(nn.ReLU(**config.act))
+ nets.append(nn.ReLU(**config['act']))
self.net = nn.Sequential(*nets)
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
return self.net(x)
@@ -119,23 +100,23 @@ def forward(self, x):
class FacConv(nn.Module):
"""Factorized conv, ReLU - Conv(Kx1) - Conv(1xK) - BN."""
- def __init__(self, C_in, C_out, kernel_length, stride, padding):
+ def __init__(self, C_in: int, C_out: int, kernel_length: int, stride: int, padding: int) -> None:
super().__init__()
C = C_in
- nets = []
- for i in config.ops_order:
+ nets: List[Any] = []
+ for i in config['ops_order']:
if i == 'bn':
- nets.append(nn.BatchNorm2d(C, **config.bn))
+ nets.append(nn.BatchNorm2d(C, **config['bn']))
elif i == 'weight':
- nets.append(nn.Conv2d(C_in, C_in, (kernel_length, 1), stride, (padding, 0), **config.conv))
- nets.append(nn.Conv2d(C_in, C_out, (1, kernel_length), 1, (0, padding), **config.conv))
+ nets.append(nn.Conv2d(C_in, C_in, (kernel_length, 1), stride, (padding, 0), **config['conv']))
+ nets.append(nn.Conv2d(C_in, C_out, (1, kernel_length), 1, (0, padding), **config['conv']))
C = C_out
elif i == 'act':
- nets.append(nn.ReLU(**config.act))
+ nets.append(nn.ReLU(**config['act']))
self.net = nn.Sequential(*nets)
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
return self.net(x)
@@ -149,22 +130,23 @@ class DilConv(nn.Module):
5x5 conv => 9x9 receptive field
"""
- def __init__(self, C_in, C_out, kernel_size, stride, padding, dilation):
+ def __init__(self, C_in: int, C_out: int, kernel_size: int, stride: int, padding: int, dilation: int) -> None:
super().__init__()
C = C_in
- nets = []
- for i in config.ops_order:
+ nets: List[Any] = []
+ for i in config['ops_order']:
if i == 'bn':
- nets.append(nn.BatchNorm2d(C, **config.bn))
+ nets.append(nn.BatchNorm2d(C, **config['bn']))
elif i == 'weight':
- nets.append(nn.Conv2d(C_in, C_in, kernel_size, stride, padding, dilation, groups=C_in, **config.conv))
- nets.append(nn.Conv2d(C_in, C_out, 1, stride=1, padding=0, **config.conv))
+ nets.append(nn.Conv2d(C_in, C_in, kernel_size, stride,
+ padding, dilation, groups=C_in, **config['conv']))
+ nets.append(nn.Conv2d(C_in, C_out, 1, stride=1, padding=0, **config['conv']))
C = C_out
elif i == 'act':
- nets.append(nn.ReLU(**config.act))
+ nets.append(nn.ReLU(**config['act']))
self.net = nn.Sequential(*nets)
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
return self.net(x)
@@ -172,12 +154,12 @@ def forward(self, x):
class SepConv(nn.Module):
"""Depthwise separable conv, DilConv(dilation=1) * 2."""
- def __init__(self, C_in, C_out, kernel_size, stride, padding):
+ def __init__(self, C_in: int, C_out: int, kernel_size: int, stride: int, padding: int) -> None:
super().__init__()
self.net = nn.Sequential(DilConv(C_in, C_in, kernel_size, stride, padding, dilation=1),
DilConv(C_in, C_out, kernel_size, 1, padding, dilation=1))
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
return self.net(x)
@@ -188,16 +170,16 @@ class SepSingle(nn.Module):
def __init__(self, C_in, C_out, kernel_size, stride, padding):
super().__init__()
C = C_in
- nets = []
- for i in config.ops_order:
+ nets: List[Any] = []
+ for i in config['ops_order']:
if i == 'bn':
- nets.append(nn.BatchNorm2d(C, **config.bn))
+ nets.append(nn.BatchNorm2d(C, **config['bn']))
elif i == 'weight':
- nets.append(nn.Conv2d(C_in, C_in, kernel_size, stride, padding, groups=C_in, **config.conv))
- nets.append(nn.Conv2d(C_in, C_out, 1, stride=1, padding=0, **config.conv))
+ nets.append(nn.Conv2d(C_in, C_in, kernel_size, stride, padding, groups=C_in, **config['conv']))
+ nets.append(nn.Conv2d(C_in, C_out, 1, stride=1, padding=0, **config['conv']))
C = C_out
elif i == 'act':
- nets.append(nn.ReLU(**config.act))
+ nets.append(nn.ReLU(**config['act']))
self.net = nn.Sequential(*nets)
def forward(self, x):
@@ -208,10 +190,10 @@ def forward(self, x):
class Identity(nn.Module):
"""Identity operation."""
- def __init__(self, *args, **kwargs):
+ def __init__(self, *args, **kwargs) -> None:
super().__init__()
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
return x
@@ -219,14 +201,14 @@ def forward(self, x):
class Zero(nn.Module):
"""Null operation that returns input-sized zero tensor."""
- def __init__(self, C_in, C_out, stride, *args, **kwargs):
+ def __init__(self, C_in: int, C_out: int, stride: int, *args, **kwargs) -> None:
super().__init__()
if C_in != C_out:
raise ValueError('invalid channel in zero layer')
self.stride = stride
self.C_out = C_out
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
if self.stride == 1:
return x * 0.
@@ -237,14 +219,14 @@ def forward(self, x):
class FactorizedReduce(nn.Module):
"""Reduce feature map size by factorized pointwise(stride=2)."""
- def __init__(self, C_in, C_out):
+ def __init__(self, C_in: int, C_out: int) -> None:
super().__init__()
self.relu = nn.ReLU()
self.conv1 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
self.conv2 = nn.Conv2d(C_in, C_out // 2, 1, stride=2, padding=0, bias=False)
- self.bn = nn.BatchNorm2d(C_out, **config.bn)
+ self.bn = nn.BatchNorm2d(C_out, **config['bn'])
- def forward(self, x):
+ def forward(self, x: Tensor) -> Tensor:
"""Return operator output."""
x = self.relu(x)
out = torch.cat([self.conv1(x), self.conv2(x[:, :, 1:, 1:])], dim=1)
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/predefined/__init__.py b/vega/networks/pytorch/customs/modnas/arch_space/predefined/__init__.py
index e69de29b..629b43c2 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/predefined/__init__.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/predefined/__init__.py
@@ -0,0 +1 @@
+from . import constructed
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/predefined/constructed.py b/vega/networks/pytorch/customs/modnas/arch_space/predefined/constructed.py
new file mode 100644
index 00000000..ad14f21f
--- /dev/null
+++ b/vega/networks/pytorch/customs/modnas/arch_space/predefined/constructed.py
@@ -0,0 +1,24 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Constructed modules."""
+from modnas.registry.construct import build as build_constructor
+from modnas.registry.arch_space import build as build_module
+from modnas.registry.arch_space import register
+from modnas.registry import streamline_spec
+
+
+@register
+def Constructed(slot=None, construct=None, module=None):
+ """Return a module from constructors."""
+ m = None if module is None else build_module(module, slot=slot)
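+ # thread the module through each constructor in sequence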
+ for con in streamline_spec(construct):
+ m = build_constructor(con)(m)
+ return m
diff --git a/vega/networks/pytorch/customs/modnas/arch_space/slot.py b/vega/networks/pytorch/customs/modnas/arch_space/slot.py
index cf115ea9..a6b41a33 100644
--- a/vega/networks/pytorch/customs/modnas/arch_space/slot.py
+++ b/vega/networks/pytorch/customs/modnas/arch_space/slot.py
@@ -187,12 +187,13 @@ def to_arch_desc(self, *args, **kwargs):
def build_from_arch_desc(self, *args, **kwargs):
"""Convert Slot to module from archdesc."""
- if self.ent is None:
- ent = Slot._convert_fn(self, *args, **kwargs)
- if ent is not None:
- self.set_entity(ent)
- else:
- logger.warning('slot {} already built'.format(self.sid))
+ convert_fn = Slot._convert_fn
+ if convert_fn is None:
+ logger.warning('slot {} has no constructor'.format(self.sid))
+ return
+ ent = convert_fn(self, *args, **kwargs)
+ if ent is not None:
+ self.set_entity(ent)
def extra_repr(self):
"""Return extra string representation."""
@@ -240,8 +241,8 @@ def get_slot_args(slot, args_fmt, kwargs_fmt):
kwargs = slot.kwargs if kwargs_fmt == '*' else {k: slot.kwargs[k] for k in (kwargs_fmt or [])}
return args, kwargs
- def bld(s, *args, **kwargs):
- s_args, s_kwargs = get_slot_args(s, args_fmt, kwargs_fmt)
+ def bld(slot, *args, **kwargs):
+ s_args, s_kwargs = get_slot_args(slot, args_fmt, kwargs_fmt)
return builder(*s_args, *args, **s_kwargs, **kwargs)
return bld
diff --git a/vega/networks/pytorch/customs/modnas/compat.py b/vega/networks/pytorch/customs/modnas/compat.py
index db9149c0..03b04f95 100644
--- a/vega/networks/pytorch/customs/modnas/compat.py
+++ b/vega/networks/pytorch/customs/modnas/compat.py
@@ -26,25 +26,17 @@ class ModNasArchSpace(Module):
"""ModularNAS Architecture Space."""
def __init__(self,
- construct=None,
- desc_construct=None,
net=None,
**kwargs):
super().__init__()
use_backend('torch')
config = Config(kwargs)
- fully_train = config.get('arch_desc', config.get('fully_train')) or False
- config['construct'] = (desc_construct if fully_train is not False else construct) or {}
self.config = config
- self.constructor = None
self.net = None
- if fully_train is not False:
- self.build_net()
-
- def build_net(self):
- """Build network with constructors."""
- self.constructor = get_default_constructors(self.config)
- self.net = self.constructor(self.net)
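+        # An arch_desc in the config marks augment (fully-train) mode: merge the
+        # 'augment' overrides and build the network with the default constructors.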
+        is_augment = config.get('arch_desc') is not None
+ if not config.get('vega_no_construct', False) and is_augment:
+ Config.apply(config, config.pop('augment', {}))
+ self.net = get_default_constructors(self.config)(self.net)
def forward(self, *args, **kwargs):
"""Compute forward output."""
diff --git a/vega/networks/pytorch/customs/modnas/contrib/arch_space/activations/torch.py b/vega/networks/pytorch/customs/modnas/contrib/arch_space/activations/torch.py
index be197190..ee5a9afc 100644
--- a/vega/networks/pytorch/customs/modnas/contrib/arch_space/activations/torch.py
+++ b/vega/networks/pytorch/customs/modnas/contrib/arch_space/activations/torch.py
@@ -1,13 +1,3 @@
-# -*- coding:utf-8 -*-
-
-# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the MIT License.
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# MIT License for more details.
-
"""Torch activation functions."""
import torch.nn
from modnas.registry.arch_space import register
diff --git a/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/modifier.py b/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/modifier.py
index f533572f..3c6d8ce1 100644
--- a/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/modifier.py
+++ b/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/modifier.py
@@ -9,6 +9,9 @@
# MIT License for more details.
"""Module states modifier."""
+from torch.nn.modules.module import Module
+from torch import Tensor
+from typing import Callable, Union
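+# These helpers temporarily swap a module's parameters, buffers, or attributes,
+# keeping the originals in per-module backup dicts for later restoration.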
def get_ori_param(module, name):
@@ -16,7 +19,7 @@ def get_ori_param(module, name):
return module._params_ori[name]
-def get_ori_buffer(module, name):
+def get_ori_buffer(module: Module, name: str) -> Tensor:
"""Return original module buffer."""
return module._buffers_ori[name]
@@ -26,7 +29,7 @@ def get_ori_attr(module, name):
return module._attrs_ori[name]
-def backup_param(module, name):
+def backup_param(module: Module, name: str) -> None:
"""Backup module parameter."""
if not hasattr(module, '_params_ori'):
module._params_ori = dict()
@@ -36,7 +39,7 @@ def backup_param(module, name):
module._params_ori[name] = val
-def backup_buffer(module, name):
+def backup_buffer(module: Module, name: str) -> None:
"""Backup module buffer."""
if not hasattr(module, '_buffers_ori'):
module._buffers_ori = dict()
@@ -46,7 +49,7 @@ def backup_buffer(module, name):
module._buffers_ori[name] = val
-def backup_attr(module, name):
+def backup_attr(module: Module, name: str) -> None:
"""Backup module attribute."""
if not hasattr(module, '_attrs_ori'):
module._attrs_ori = dict()
@@ -113,46 +116,46 @@ def restore_attr(module, name):
object.__setattr__(module, name, val)
-def modify_param(module, name, value):
+def modify_param(module: Module, name: str, value: Tensor) -> None:
"""Modify module parameter."""
backup_param(module, name)
module._parameters[name] = value
-def modify_buffer(module, name, value):
+def modify_buffer(module: Module, name: str, value: Tensor) -> None:
"""Modify module buffer."""
backup_buffer(module, name)
module._buffers[name] = value
-def modify_attr(module, name, value):
+def modify_attr(module: Module, name: str, value: Union[Callable, int]) -> None:
"""Modify module attribute."""
backup_attr(module, name)
object.__setattr__(module, name, value)
-def restore_module_parameters(module):
+def restore_module_parameters(module: Module) -> None:
"""Restore module parameters."""
if hasattr(module, '_params_ori'):
module._parameters.update(module._params_ori)
module._params_ori.clear()
-def restore_module_buffers(module):
+def restore_module_buffers(module: Module) -> None:
"""Restore module buffers."""
if hasattr(module, '_buffers_ori'):
module._buffers.update(module._buffers_ori)
module._buffers_ori.clear()
-def restore_module_attrs(module):
+def restore_module_attrs(module: Module) -> None:
"""Restore module attributes."""
if hasattr(module, '_attrs_ori'):
module.__dict__.update(module._attrs_ori)
module._attrs_ori.clear()
-def restore_module_states(module):
+def restore_module_states(module: Module) -> None:
"""Restore all module states."""
restore_module_parameters(module)
restore_module_buffers(module)
diff --git a/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/sequential.py b/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/sequential.py
index fb53aa8f..3f151c32 100644
--- a/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/sequential.py
+++ b/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/sequential.py
@@ -11,86 +11,24 @@
"""Elastic sequential (depth) transformations."""
import torch.nn as nn
from .modifier import modify_attr, restore_module_attrs
+from torch import Tensor
+from torch.nn.modules.module import Module
+from typing import Iterator, List, Optional, Tuple
-def _hook_module_in(module, inputs):
+def _hook_module_in(module: Module, inputs: Tuple[Tensor]) -> None:
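+    # Modules whose sequential state is set are skipped: forward becomes identity.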
if ElasticSequential.get_sequential_state(module):
modify_attr(module, 'forward', lambda x: x)
-def _hook_module_out(module, inputs, result):
+def _hook_module_out(module: Module, inputs: Tuple[Tensor], result: Tensor) -> None:
restore_module_attrs(module)
-class ElasticSequential():
- """Elastic sequential group manager."""
-
- _module_hooks = dict()
- _groups = list()
-
- @staticmethod
- def add_group(group):
- """Add a group."""
- ElasticSequential._groups.append(group)
-
- @staticmethod
- def remove_group(group):
- """Remove a group."""
- idx = ElasticSequential._groups.index(group)
- if not idx == -1:
- group.destroy()
- del ElasticSequential._groups[idx]
-
- @staticmethod
- def groups():
- """Return an iterator over groups."""
- for g in ElasticSequential._groups:
- yield g
-
- @staticmethod
- def num_groups():
- """Return the number of groups."""
- return len(ElasticSequential._groups)
-
- @staticmethod
- def enable_sequential_transform(module):
- """Enable sequential transformation on a module."""
- if module not in ElasticSequential._module_hooks:
- h_in = module.register_forward_pre_hook(_hook_module_in)
- h_out = module.register_forward_hook(_hook_module_out)
- ElasticSequential._module_hooks[module] = (h_in, h_out)
-
- @staticmethod
- def disable_sequential_transform(module):
- """Disable sequential transformation on a module."""
- if module in ElasticSequential._module_hooks:
- m_hooks = ElasticSequential._module_hooks.pop(module)
- for h in m_hooks:
- h.remove()
- del module._sequential_state
-
- @staticmethod
- def set_sequential_state(module, state):
- """Set sequential state of a module."""
- module._sequential_state = state
-
- @staticmethod
- def reset_sequential_state(module):
- """Reset sequential state of a module."""
- module._sequential_state = None
-
- @staticmethod
- def get_sequential_state(module):
- """Get sequential state of a module."""
- if not hasattr(module, '_sequential_state'):
- module._sequential_state = None
- return module._sequential_state
-
-
class ElasticSequentialGroup():
"""Module group with elastic sequential dimensions."""
- def __init__(self, *args):
+ def __init__(self, *args) -> None:
module_groups = []
for m in args:
if isinstance(m, nn.Module):
@@ -110,7 +48,7 @@ def destroy(self):
self.reset_sequential_idx()
self.disable_sequential_transform()
- def enable_sequential_transform(self):
+ def enable_sequential_transform(self) -> None:
"""Enable sequential transformation of group modules."""
for m in self.modules():
ElasticSequential.enable_sequential_transform(m)
@@ -120,7 +58,7 @@ def disable_sequential_transform(self):
for m in self.modules():
ElasticSequential.disable_sequential_transform(m)
- def set_depth_ratio(self, ratio):
+ def set_depth_ratio(self, ratio: float) -> None:
"""Set group depth by ratio of the max depth."""
if ratio is None:
self.reset_sequential_idx()
@@ -128,7 +66,7 @@ def set_depth_ratio(self, ratio):
depth = int(self.max_depth * ratio)
self.set_depth(depth)
- def set_depth(self, depth):
+ def set_depth(self, depth: int) -> None:
"""Set group depth."""
if depth is None:
self.reset_sequential_idx()
@@ -137,7 +75,7 @@ def set_depth(self, depth):
raise ValueError('depth out of range')
self.set_sequential_idx(list(range(depth)), reverse=True)
- def set_sequential_idx(self, idx, reverse=False):
+ def set_sequential_idx(self, idx: List[int], reverse: bool = False) -> None:
"""Set group sequential index."""
if isinstance(idx, int):
idx = [idx]
@@ -152,9 +90,74 @@ def reset_sequential_idx(self):
for m in self.modules():
ElasticSequential.reset_sequential_state(m)
- def modules(self, active=False):
+ def modules(self, active: bool = False) -> Iterator[Module]:
"""Return an iterator over all group modules."""
for m_group in self.module_groups:
for m in m_group:
if not active or not ElasticSequential.get_sequential_state(m):
yield m
+
+
+class ElasticSequential():
+ """Elastic sequential group manager."""
+
+ _module_hooks = dict()
+ _groups = list()
+
+ @staticmethod
+ def add_group(group: ElasticSequentialGroup) -> None:
+ """Add a group."""
+ ElasticSequential._groups.append(group)
+
+ @staticmethod
+ def remove_group(group):
+ """Remove a group."""
+ idx = ElasticSequential._groups.index(group)
+ if not idx == -1:
+ group.destroy()
+ del ElasticSequential._groups[idx]
+
+ @staticmethod
+ def groups() -> Iterator[ElasticSequentialGroup]:
+ """Return an iterator over groups."""
+ for g in ElasticSequential._groups:
+ yield g
+
+ @staticmethod
+ def num_groups() -> int:
+ """Return the number of groups."""
+ return len(ElasticSequential._groups)
+
+ @staticmethod
+ def enable_sequential_transform(module: Module) -> None:
+ """Enable sequential transformation on a module."""
+ if module not in ElasticSequential._module_hooks:
+ h_in = module.register_forward_pre_hook(_hook_module_in)
+ h_out = module.register_forward_hook(_hook_module_out)
+ ElasticSequential._module_hooks[module] = (h_in, h_out)
+
+ @staticmethod
+ def disable_sequential_transform(module):
+ """Disable sequential transformation on a module."""
+ if module in ElasticSequential._module_hooks:
+ m_hooks = ElasticSequential._module_hooks.pop(module)
+ for h in m_hooks:
+ h.remove()
+ del module._sequential_state
+
+ @staticmethod
+ def set_sequential_state(module: Module, state: int) -> None:
+ """Set sequential state of a module."""
+ module._sequential_state = state
+
+ @staticmethod
+ def reset_sequential_state(module):
+ """Reset sequential state of a module."""
+ module._sequential_state = None
+
+ @staticmethod
+ def get_sequential_state(module: Module) -> Optional[int]:
+ """Get sequential state of a module."""
+ if not hasattr(module, '_sequential_state'):
+ module._sequential_state = None
+ return module._sequential_state
diff --git a/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/spatial.py b/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/spatial.py
index f5b1e938..5dd77a81 100644
--- a/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/spatial.py
+++ b/vega/networks/pytorch/customs/modnas/contrib/arch_space/elastic/spatial.py
@@ -13,9 +13,12 @@
import torch.nn as nn
from .modifier import modify_param, modify_buffer, modify_attr,\
restore_module_states, get_ori_buffer
+from torch import Tensor
+from torch.nn.modules.module import Module
+from typing import Callable, Iterator, List, Optional, Tuple, Type, Union
-def _conv2d_fan_out_trnsf(m, idx):
+def _conv2d_fan_out_trnsf(m: nn.Conv2d, idx: Tensor) -> None:
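+    # Fan-out: slice the output-channel dimension (dim 0) of the weight and bias.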
modify_param(m, 'weight', m.weight[idx, :, :, :])
if m.bias is not None:
modify_param(m, 'bias', m.bias[idx])
@@ -24,7 +27,7 @@ def _conv2d_fan_out_trnsf(m, idx):
modify_attr(m, 'groups', width)
-def _conv2d_fan_in_trnsf(m, idx):
+def _conv2d_fan_in_trnsf(m: nn.Conv2d, idx: Tensor) -> None:
bias_idx = None
if m.groups == 1:
modify_param(m, 'weight', m.weight[:, idx, :, :])
@@ -39,7 +42,7 @@ def _conv2d_fan_in_trnsf(m, idx):
modify_param(m, 'bias', m.bias[bias_idx])
-def _batchnorm2d_fan_in_out_trnsf(m, idx):
+def _batchnorm2d_fan_in_out_trnsf(m: nn.BatchNorm2d, idx: Tensor) -> None:
if m.weight is not None:
modify_param(m, 'weight', m.weight[idx])
if m.bias is not None:
@@ -48,7 +51,7 @@ def _batchnorm2d_fan_in_out_trnsf(m, idx):
modify_buffer(m, 'running_var', m.running_var[idx])
-def _batchnorm2d_fan_in_out_post_trnsf(m, idx):
+def _batchnorm2d_fan_in_out_post_trnsf(m: nn.BatchNorm2d, idx: Tensor) -> None:
if isinstance(idx, slice):
return
get_ori_buffer(m, 'running_mean')[idx] = m.running_mean
@@ -74,22 +77,22 @@ def _batchnorm2d_fan_in_out_post_trnsf(m, idx):
}
-def get_fan_out_transform(mtype):
+def get_fan_out_transform(mtype: Type) -> Callable:
"""Return the fan out transform of a module type."""
return _fan_out_transform.get(mtype, None)
-def get_fan_in_transform(mtype):
+def get_fan_in_transform(mtype: Type) -> Callable:
"""Return the fan in transform of a module type."""
return _fan_in_transform.get(mtype, None)
-def get_fan_out_post_transform(mtype):
+def get_fan_out_post_transform(mtype: Type) -> Optional[Callable]:
"""Return the fan out post transform of a module type."""
return _fan_out_post_transform.get(mtype, None)
-def get_fan_in_post_transform(mtype):
+def get_fan_in_post_transform(mtype: Type) -> Optional[Callable]:
"""Return the fan in post transform of a module type."""
return _fan_in_post_transform.get(mtype, None)
@@ -114,7 +117,7 @@ def set_fan_in_post_transform(mtype, transf):
_fan_in_post_transform[mtype] = transf
-def _hook_module_in(module, inputs):
+def _hook_module_in(module: Module, inputs: Tuple[Tensor]) -> None:
fan_in_idx, fan_out_idx = ElasticSpatial.get_spatial_idx(module)
mtype = type(module)
trnsf = get_fan_in_transform(mtype)
@@ -125,7 +128,7 @@ def _hook_module_in(module, inputs):
trnsf(module, fan_out_idx)
-def _hook_module_out(module, inputs, outputs):
+def _hook_module_out(module: Module, inputs: Tuple[Tensor], outputs: Tensor) -> None:
fan_in_idx, fan_out_idx = ElasticSpatial.get_spatial_idx(module)
mtype = type(module)
trnsf = get_fan_in_post_transform(mtype)
@@ -137,95 +140,13 @@ def _hook_module_out(module, inputs, outputs):
restore_module_states(module)
-class ElasticSpatial():
- """Elastic spatial group manager."""
-
- _module_hooks = dict()
- _groups = list()
-
- @staticmethod
- def add_group(group):
- """Add a group."""
- ElasticSpatial._groups.append(group)
-
- @staticmethod
- def remove_group(group):
- """Remove a group."""
- idx = ElasticSpatial._groups.index(group)
- if not idx == -1:
- group.destroy()
- del ElasticSpatial._groups[idx]
-
- @staticmethod
- def groups():
- """Return an iterator over groups."""
- for g in ElasticSpatial._groups:
- yield g
-
- @staticmethod
- def num_groups():
- """Return the number of groups."""
- return len(ElasticSpatial._groups)
-
- @staticmethod
- def enable_spatial_transform(module):
- """Enable spatial transformation on a module."""
- if module not in ElasticSpatial._module_hooks:
- h_in = module.register_forward_pre_hook(_hook_module_in)
- h_out = module.register_forward_hook(_hook_module_out)
- ElasticSpatial._module_hooks[module] = (h_in, h_out)
-
- @staticmethod
- def disable_spatial_transform(module):
- """Disable spatial transformation on a module."""
- if module in ElasticSpatial._module_hooks:
- m_hooks = ElasticSpatial._module_hooks.pop(module)
- for h in m_hooks:
- h.remove()
- del module._spatial_idx
-
- @staticmethod
- def set_spatial_fan_in_idx(module, idx):
- """Set spatial fan in index of a module."""
- ElasticSpatial.get_spatial_idx(module)[0] = idx
-
- @staticmethod
- def set_spatial_fan_out_idx(module, idx):
- """Set spatial fan out index of a module."""
- ElasticSpatial.get_spatial_idx(module)[1] = idx
-
- @staticmethod
- def reset_spatial_fan_in_idx(module):
- """Reset spatial fan in index of a module."""
- ElasticSpatial.get_spatial_idx(module)[0] = None
-
- @staticmethod
- def reset_spatial_fan_out_idx(module):
- """Reset spatial fan out index of a module."""
- ElasticSpatial.get_spatial_idx(module)[1] = None
-
- @staticmethod
- def reset_spatial_idx(module):
- """Reset spatial index of a module."""
- module._spatial_idx = [None, None]
-
- @staticmethod
- def get_spatial_idx(module):
- """Get spatial index of a module."""
- if not hasattr(module, '_spatial_idx'):
- module._spatial_idx = [None, None]
- return module._spatial_idx
-
- @staticmethod
- def set_spatial_idx(module, fan_in, fan_out):
- """Set spatial index of a module."""
- module._spatial_idx = [fan_in, fan_out]
-
-
class ElasticSpatialGroup():
"""Module group with elastic spatial dimensions."""
- def __init__(self, fan_out_modules, fan_in_modules, max_width=None, rank_fn=None):
+ def __init__(
+ self, fan_out_modules: List[Module], fan_in_modules: List[Module],
+ max_width: Optional[int] = None, rank_fn: Optional[Callable] = None
+ ) -> None:
super().__init__()
if fan_in_modules is None:
fan_in_modules = []
@@ -257,14 +178,14 @@ def add_idx_mapping(self, dest, map_fn):
"""Add spatial index mapping to group."""
self.idx_mapping[dest] = map_fn
- def map_index(self, idx, dest):
+ def map_index(self, idx: Tensor, dest: Module) -> Tensor:
"""Return mapped spatial index to target module."""
map_fn = self.idx_mapping.get(dest, None)
if map_fn is None:
return idx
return [map_fn(i) for i in idx]
- def enable_spatial_transform(self):
+ def enable_spatial_transform(self) -> None:
"""Enable spatial transformation of group modules."""
for m in self.fan_in_modules + self.fan_out_modules:
ElasticSpatial.enable_spatial_transform(m)
@@ -274,7 +195,7 @@ def disable_spatial_transform(self):
for m in self.fan_in_modules + self.fan_out_modules:
ElasticSpatial.disable_spatial_transform(m)
- def set_width_ratio(self, ratio, rank=None):
+ def set_width_ratio(self, ratio: float, rank: Optional[List[int]] = None) -> None:
"""Set group width by ratio of the max width."""
if ratio is None:
self.reset_spatial_idx()
@@ -284,14 +205,14 @@ def set_width_ratio(self, ratio, rank=None):
width = int(self.max_width * ratio)
self.set_width(width, rank)
- def set_width(self, width, rank=None):
+ def set_width(self, width: int, rank: Optional[List[int]] = None) -> None:
"""Set group width."""
if width is None:
self.reset_spatial_idx()
return
if self.cur_rank is None:
self.set_spatial_rank()
- rank = self.cur_rank
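+        # Prefer an explicitly passed rank; fall back to the cached ranking.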
+ rank = rank or self.cur_rank
if rank is None:
idx = slice(0, width)
else:
@@ -302,13 +223,13 @@ def reset_spatial_rank(self):
"""Reset ranking of group spatial dimension."""
self.cur_rank = None
- def set_spatial_rank(self, rank=None):
+ def set_spatial_rank(self, rank: Optional[List[int]] = None) -> None:
"""Rank group spatial dimension."""
if rank is None and self.rank_fn is not None:
rank = self.rank_fn()
self.cur_rank = rank
- def set_spatial_idx(self, idx):
+ def set_spatial_idx(self, idx: Tensor) -> None:
"""Set group spatial index."""
if idx is None:
self.reset_spatial_idx()
@@ -330,7 +251,7 @@ def reset_spatial_idx(self):
ElasticSpatial.reset_spatial_fan_out_idx(m)
-def conv2d_rank_weight_l1norm_fan_in(module):
+def conv2d_rank_weight_l1norm_fan_in(module: nn.Conv2d) -> Tensor:
"""Return the rank of Conv2d weight by L1 norm along fan in dimension."""
if module.groups == 1:
sum_dim = 0
@@ -342,13 +263,98 @@ def conv2d_rank_weight_l1norm_fan_in(module):
return idx
-def conv2d_rank_weight_l1norm_fan_out(module):
+def conv2d_rank_weight_l1norm_fan_out(module: nn.Conv2d) -> Tensor:
"""Return the rank of Conv2d weight by L1 norm along fan out dimension."""
_, idx = torch.sort(torch.sum(torch.abs(module.weight.data), dim=(1, 2, 3)), dim=0, descending=True)
return idx
-def batchnorm2d_rank_weight_l1norm(module):
+def batchnorm2d_rank_weight_l1norm(module: nn.BatchNorm2d) -> Tensor:
"""Return the rank of BatchNorm2d weight by L1 norm."""
_, idx = torch.sort(torch.abs(module.weight.data), dim=0, descending=True)
return idx
+
+
+class ElasticSpatial():
+ """Elastic spatial group manager."""
+
+ _module_hooks = dict()
+ _groups = list()
+
+ @staticmethod
+ def add_group(group: ElasticSpatialGroup) -> None:
+ """Add a group."""
+ ElasticSpatial._groups.append(group)
+
+ @staticmethod
+ def remove_group(group):
+ """Remove a group."""
+ idx = ElasticSpatial._groups.index(group)
+ if not idx == -1:
+ group.destroy()
+ del ElasticSpatial._groups[idx]
+
+ @staticmethod
+ def groups() -> Iterator[ElasticSpatialGroup]:
+ """Return an iterator over groups."""
+ for g in ElasticSpatial._groups:
+ yield g
+
+ @staticmethod
+ def num_groups() -> int:
+ """Return the number of groups."""
+ return len(ElasticSpatial._groups)
+
+ @staticmethod
+ def enable_spatial_transform(module: Module) -> None:
+ """Enable spatial transformation on a module."""
+ if module not in ElasticSpatial._module_hooks:
+ h_in = module.register_forward_pre_hook(_hook_module_in)
+ h_out = module.register_forward_hook(_hook_module_out)
+ ElasticSpatial._module_hooks[module] = (h_in, h_out)
+
+ @staticmethod
+ def disable_spatial_transform(module):
+ """Disable spatial transformation on a module."""
+ if module in ElasticSpatial._module_hooks:
+ m_hooks = ElasticSpatial._module_hooks.pop(module)
+ for h in m_hooks:
+ h.remove()
+ del module._spatial_idx
+
+ @staticmethod
+ def set_spatial_fan_in_idx(module: Module, idx: Tensor) -> None:
+ """Set spatial fan in index of a module."""
+ ElasticSpatial.get_spatial_idx(module)[0] = idx
+
+ @staticmethod
+ def set_spatial_fan_out_idx(module: Module, idx: Tensor) -> None:
+ """Set spatial fan out index of a module."""
+ ElasticSpatial.get_spatial_idx(module)[1] = idx
+
+ @staticmethod
+ def reset_spatial_fan_in_idx(module):
+ """Reset spatial fan in index of a module."""
+ ElasticSpatial.get_spatial_idx(module)[0] = None
+
+ @staticmethod
+ def reset_spatial_fan_out_idx(module):
+ """Reset spatial fan out index of a module."""
+ ElasticSpatial.get_spatial_idx(module)[1] = None
+
+ @staticmethod
+ def reset_spatial_idx(module):
+ """Reset spatial index of a module."""
+ module._spatial_idx = [None, None]
+
+ @staticmethod
+ def get_spatial_idx(module: Module) -> Union[List[Optional[Tensor]], List[None]]:
+ """Get spatial index of a module."""
+ if not hasattr(module, '_spatial_idx'):
+ module._spatial_idx = [None, None]
+ return module._spatial_idx
+
+ @staticmethod
+ def set_spatial_idx(module, fan_in, fan_out):
+ """Set spatial index of a module."""
+ module._spatial_idx = [fan_in, fan_out]
diff --git a/vega/networks/pytorch/customs/utils/logical_graph.py b/vega/networks/pytorch/customs/utils/logical_graph.py
index ff438ae6..da7b3b85 100644
--- a/vega/networks/pytorch/customs/utils/logical_graph.py
+++ b/vega/networks/pytorch/customs/utils/logical_graph.py
@@ -183,9 +183,10 @@ def _init_graph(self, graphparams):
continue
try:
self.nodes, self.input_nodes, self.output_nodes = get_graph_info(self.graph)
- break
+ return graph
except Exception:
continue
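+        # No candidate graph parsed successfully; fall back to empty node lists.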
+ self.nodes, self.input_nodes, self.output_nodes = ([], [], [])
return graph
diff --git a/vega/networks/pytorch/cyclesrbodys/cyclesr_net.py b/vega/networks/pytorch/cyclesrbodys/cyclesr_net.py
index 732dc620..7f1814ac 100644
--- a/vega/networks/pytorch/cyclesrbodys/cyclesr_net.py
+++ b/vega/networks/pytorch/cyclesrbodys/cyclesr_net.py
@@ -56,7 +56,7 @@ class CycleSRModel(TransModel):
def __init__(self, **cfg):
"""Initialize method."""
cfg = Config(cfg)
- self.use_cuda = cfg.use_cuda
+ self.use_cuda = True
self.use_distributed = cfg.use_distributed
self.SR_lr = cfg.SR_lr
self.cyc_lr = cfg.cyc_lr
diff --git a/vega/report/record.py b/vega/report/record.py
index 377ba69b..b8fc2224 100644
--- a/vega/report/record.py
+++ b/vega/report/record.py
@@ -16,7 +16,6 @@
from vega.common.utils import remove_np_value
from vega.common import Status, JsonEncoder, DatatimeFormatString
-
logger = logging.getLogger(__name__)
@@ -178,12 +177,6 @@ def performance(self, value):
value = json.loads(value)
if isinstance(value, dict):
self._performance.update(value)
- for key in value:
- if key not in self.objectives:
- if key in ["flops", "params", "latency"]:
- self.objectives[key] = "MIN"
- else:
- self.objectives[key] = 'MAX'
elif value is not None:
logger.warn(f"Invalid record performance value: {value}")
@@ -224,6 +217,9 @@ def _cal_rewards(self):
self._objective_keys = list(self.performance.keys())
res = []
res_ori = []
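+        # Resource metrics ('flops', 'params', 'latency') reported without an
+        # explicit objective default to minimization.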
+ for k in self.performance.keys():
+ if k not in self.objectives and k in ['flops', 'params', 'latency']:
+ self._objectives.update({k: 'MIN'})
for obj in self.objective_keys:
if isinstance(obj, int):
obj = list(self.performance.keys())[obj]
@@ -259,7 +255,7 @@ def load_dict(self, src_dic):
update_flag = update_flag and key not in ["desc"]
if update_flag:
for value_key, value_value in value.items():
- getattr(self, key)[value_key] = value_value
+ getattr(self, key)[value_key] = remove_np_value(value_value)
else:
setattr(self, key, remove_np_value(value))
self._cal_rewards()
diff --git a/vega/report/report_server.py b/vega/report/report_server.py
index 05d66220..ac212dc7 100644
--- a/vega/report/report_server.py
+++ b/vega/report/report_server.py
@@ -132,21 +132,6 @@ def get_pareto_front_records(self, step_name=None, nums=None, selected_key=None,
else:
return pareto
- # def _select_one_record(self, outs, choice='normal'):
- # """Select one record."""
- # if outs.size == 1:
- # return outs.astype(int).tolist()
- # if choice == 'normal':
- # data = outs[:, 1:].reshape(-1, 1).tolist()
- # prob = [round(np.log(i + 1e-2), 2) for i in range(1, len(data[0]) + 1)]
- # prob_temp = prob
- # for idx, out in enumerate(data):
- # sorted_ind = np.argsort(out)
- # for idx, ind in enumerate(sorted_ind):
- # prob[ind] += prob_temp[idx]
- # normalization = [float(i) / float(sum(prob)) for i in prob]
- # return [np.random.choice(len(data[0]), p=normalization)]
-
@classmethod
def restore(cls):
"""Transfer cvs_file to records."""
@@ -302,34 +287,35 @@ def update_record(step_name=None, worker_id=None, **kwargs):
"""Update record."""
if step_name is None or worker_id is None:
return {"result": "failed", "message": "request message missing step_name or worker id."}
- if kwargs:
- kwargs["step_name"] = step_name
- kwargs["worker_id"] = worker_id
- uid = "{}_{}".format(step_name, worker_id)
- global _records_lock, _modified
- with _records_lock:
- _modified = True
- records = ReportServer()._hist_records
- if uid in records:
- records[uid].load_dict(kwargs)
- logging.debug("update record: {}".format(records[uid].to_dict()))
- else:
- records[uid] = ReportRecord().load_dict(kwargs)
- logging.debug("new record: {}".format(records[uid].to_dict()))
- return {"result": "success", "data": records[uid].to_dict()}
+ kwargs["step_name"] = step_name
+ kwargs["worker_id"] = worker_id
+ uid = "{}_{}".format(step_name, worker_id)
+ global _records_lock, _modified
+ with _records_lock:
+ _modified = True
+ records = ReportServer()._hist_records
+ if uid in records:
+ records[uid].load_dict(kwargs)
+ logging.debug("update record: {}".format(records[uid].to_dict()))
+ else:
+ records[uid] = ReportRecord().load_dict(kwargs)
+ logging.debug("new record: {}".format(records[uid].to_dict()))
+ return {"result": "success", "data": records[uid].to_dict()}
def get_record(step_name=None, worker_id=None, **kwargs):
"""Get record."""
if step_name is None or worker_id is None:
return {"result": "failed", "message": "require message missing step_name or worker id."}
- uid = "{}_{}".format(step_name, worker_id)
- records = ReportServer()._hist_records
- if uid in records:
- data = records[uid].to_dict()
- else:
- data = ReportRecord().to_dict()
- return {"result": "success", "data": data}
+ global _records_lock, _modified
+ with _records_lock:
+ uid = "{}_{}".format(step_name, worker_id)
+ records = ReportServer()._hist_records
+ if uid in records:
+ data = records[uid].to_dict()
+ else:
+ data = ReportRecord().to_dict()
+ return {"result": "success", "data": data}
def _dump_report(report_server, persistence):
@@ -342,13 +328,13 @@ def _dump_report(report_server, persistence):
all_records = report_server.all_records
_modified = False
- try:
- persistence.save_report(all_records)
- # TODO
- # persistence.pickle_report(report_server._hist_records, report_server.__instances__)
- report_server.backup_output_path()
- except Exception as e:
- logging.warning(f"Failed to dump reports, message={str(e)}")
+ try:
+ persistence.save_report(all_records)
+ # TODO
+ # persistence.pickle_report(report_server._hist_records, report_server.__instances__)
+ report_server.backup_output_path()
+ except Exception as e:
+ logging.warning(f"Failed to dump reports, message={str(e)}")
def query_report():
diff --git a/vega/security/__init__.py b/vega/security/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/vega/security/config_op.py b/vega/security/config_op.py
new file mode 100644
index 00000000..4147f989
--- /dev/null
+++ b/vega/security/config_op.py
@@ -0,0 +1,129 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Run pipeline."""
+
+import configparser
+import os
+from argparse import ArgumentParser
+
+
+def read_config_file():
+ """Read config file and return ConfigParser."""
+ vega_config_file = os.path.join(os.environ['HOME'], ".vega", "vega.ini")
+ if not os.path.exists(vega_config_file):
+ print(f"Not found vega security configure file: {vega_config_file}")
+ return None
+ config = configparser.ConfigParser()
+ config.read(vega_config_file)
+ return config
+
+
+def _parse_args():
+ parser = ArgumentParser("Vega Configuration")
+ group_resume = parser.add_mutually_exclusive_group(required=True)
+ group_resume.add_argument("-i", "--init", action='store_true',
+ help="init vega security config file")
+ group_resume.add_argument("-q", "--query", type=str, choices=["sec", "https"],
+ help="query current vega security setting")
+ group_resume.add_argument("-s", "--set", type=int, choices=[0, 1],
+ help="set vega security mode to be on or off")
+ group_resume.add_argument("-m", "--module", type=str, choices=["https"],
+ help="set cert/key file of each module")
+
+ group_config = parser.add_argument_group(title='cert key files')
+ group_config.add_argument("-ca", "--ca-cert", default=None, type=str,
+ help="ca cert file")
+ group_config.add_argument("-c", "--cert", default=None, type=str,
+ help='server cert file')
+ group_config.add_argument("-p", "--public-key", default=None, type=str,
+ help="server public key file")
+ group_config.add_argument("-k", "--secret-key", default=None, type=str,
+ help="server secret key file")
+ group_config.add_argument("-ck", "--cli-secret-key", default=None, type=str,
+ help="client secret key file")
+ args = parser.parse_args()
+ return args
+
+
+def _init_config_file():
+ vega_dir = os.path.join(os.getenv("HOME"), ".vega")
+ os.makedirs(vega_dir, exist_ok=True)
+ vega_config_file = os.path.join(vega_dir, "vega.ini")
+ if os.path.exists(vega_config_file):
+ print("vega config file ({}) already exists.".format(vega_config_file))
+ return
+ with open(vega_config_file, "w") as f:
+ f.write("[security]\n")
+ f.write("enable=True\n")
+ f.write("\n")
+ f.write("[https]\n")
+ f.write("cert_pem_file=\n")
+ f.write("secret_key_file=\n")
+ f.write("\n")
+ f.write("[limit]\n")
+ f.write("request_frequency_limit=100/minute\n")
+ f.write("max_content_length=1000000000\n")
+ f.write("#white_list=0.0.0.0,127.0.0.1\n")
+ print("initializing vega config file ({}).".format(vega_config_file))
+
+
+def _process_cmd(args):
+ if args.init:
+ _init_config_file()
+ return
+ config = read_config_file()
+ if not config:
+ return
+ if args.query:
+ config = _process_cmd_query(args, config)
+ return
+ if args.set is not None:
+ if args.set == 1:
+ config.set("security", "enable", "True")
+ print("set vega security mode to True")
+ else:
+ config.set("security", "enable", "False")
+ print("set vega security mode to False")
+ elif args.module is not None:
+ config = _process_cmd_module(args, config)
+ with open(os.path.join(os.environ['HOME'], ".vega", "vega.ini"), "w") as f:
+ config.write(f)
+
+
+def _process_cmd_query(args, config):
+ if args.query == "sec":
+ print(str(config["security"]["enable"]))
+ elif args.query == "https":
+ print("cert_pem_file: {}".format(
+ config["https"]["cert_pem_file"] if "cert_pem_file" in config["https"] else None))
+ print("secret_key_file: {}".format(
+ config["https"]["secret_key_file"] if "secret_key_file" in config["https"] else None))
+ return config
+
+
+def _process_cmd_module(args, config):
+ print("set cert/key file of module {}".format(args.module))
+ if args.module == "https":
+ if args.cert:
+ config.set("https", "cert_pem_file", args.cert)
+ if args.secret_key:
+ config.set("https", "secret_key_file", args.secret_key)
+ return config
+
+
+def vega_config_operate():
+ """Run pipeline."""
+ args = _parse_args()
+ _process_cmd(args)
+
+
+if __name__ == '__main__':
+ vega_config_operate()
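+
+
+# Example invocations (run as a module; the packaged console-script name, if any,
+# may differ):
+#   python -m vega.security.config_op -i                  # create ~/.vega/vega.ini
+#   python -m vega.security.config_op -s 1                # enable security mode
+#   python -m vega.security.config_op -m https -c server.pem -k server.key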
diff --git a/vega/security/kill.py b/vega/security/kill.py
new file mode 100644
index 00000000..8091f74e
--- /dev/null
+++ b/vega/security/kill.py
@@ -0,0 +1,218 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Kill vega progress."""
+
+import os
+import signal
+import psutil
+import time
+from vega.common import argment_parser
+from vega.tools.query_process import query_process, get_pid, query_processes, get_vega_pids, print_process
+from .run_pipeline import check_env
+
+
+def _parse_args(desc):
+ parser = argment_parser(desc)
+ group = parser.add_mutually_exclusive_group(required=True)
+ group.add_argument("-p", "--pid", type=int,
+ help="kill Vega main process based on the specified process ID")
+ group.add_argument("-t", "--task_id", type=str,
+ help="kill Vega main process based on the specified Vega application task ID")
+ group.add_argument("-a", "--all", action='store_true',
+ help="kill all Vega main process")
+ group.add_argument("-f", "--force", action='store_true',
+ help="Forcibly kill all Vega-related processes even if the main process does not exist")
+ args = parser.parse_args()
+ return args
+
+
+def _kill_vega_process(pid):
+ if not psutil.pid_exists(pid):
+ print("The Vega process {} does not exist.".format(pid))
+ return
+ if pid not in get_vega_pids():
+ print("Process {} is not the main Vega process.".format(pid))
+ return
+ print_process(query_process(pid))
+ print("")
+ _input = input("Do you want kill vega processes? [Y/n]: ")
+ if _input.upper() in ["N", "NO"]:
+ print("Operation was cancelled.")
+ os._exit(0)
+
+ spids = _get_sub_processes(pid)
+ print("Start to kill Vega process {}.".format(pid))
+ try:
+ os.kill(pid, signal.SIGINT)
+ except Exception:
+ pass
+ _wait(3)
+ spids.append(pid)
+ not_stoped = _check_exited(spids)
+ for pid in not_stoped:
+ try:
+ os.kill(pid, signal.SIGKILL)
+ except Exception:
+ pass
+ _wait(5)
+ print("")
+ not_stoped = _check_exited(not_stoped)
+    if not_stoped:
+        print("Warning: The following processes did not exit completely.")
+ print(not_stoped)
+ else:
+ print("All Vega processes have been killed.")
+
+
+def _kill_vega_process_by_task_id(task_id):
+ pid = get_pid(task_id)
+ if not pid:
+ print("Task ID {} is not the task ID of a Vega process.".format(task_id))
+ return
+ _kill_vega_process(pid)
+
+
+def _kill_all_vega_process():
+ pids = get_vega_pids()
+ if not pids:
+ print("The Vega main program is not found.")
+ return
+
+ print("Vega processes:")
+ for key, value in query_processes().items():
+ print("{}:".format(key))
+ print_process(value)
+ print("")
+ _input = input("Do you want kill all vega processes? [Y/n]: ")
+ if _input.upper() in ["N", "NO"]:
+ print("Operation was cancelled.")
+ os._exit(0)
+
+ all_spids = []
+ all_spids.extend(pids)
+ for pid in pids:
+ spids = _get_sub_processes(pid)
+ all_spids.extend(spids)
+ print("Start to kill the Vega process {}".format(pid))
+ try:
+ os.kill(pid, signal.SIGINT)
+ except Exception:
+ pass
+ _wait(3)
+ not_stoped = _check_exited(all_spids)
+ for pid in not_stoped:
+ try:
+ os.kill(pid, signal.SIGKILL)
+ except Exception:
+ pass
+ _wait(5)
+ print("")
+ not_stoped = _check_exited(not_stoped)
+    if not_stoped:
+        print("Warning: The following processes did not exit completely.")
+ print(not_stoped)
+ else:
+ print("All Vega processes have been killed.")
+
+
+def _get_sub_processes(pid, cpids=None):
+    # Avoid a shared mutable default: a list default would accumulate pids across calls.
+    cpids = [] if cpids is None else cpids
+    p = psutil.Process(pid)
+ for cp in p.children():
+ cpid = cp.pid
+ cpids.append(cpid)
+ try:
+ _get_sub_processes(cpid, cpids)
+ except Exception:
+ pass
+ return cpids
+
+
+def _force_kill():
+ vega_pids = _get_all_related_processes()
+ if not vega_pids:
+ print("No Vega-releted progress found.")
+ return
+
+ _input = input("Do you want kill all Vega-related processes? [Y/n]: ")
+ if _input.upper() in ["N", "NO"]:
+ print("Operation was cancelled.")
+ os._exit(0)
+
+ vega_pids = _get_all_related_processes()
+ print("Start to kill all Vega-related processes.")
+ for pid in vega_pids:
+ try:
+ os.kill(pid, signal.SIGKILL)
+ except Exception:
+ pass
+ _wait(5)
+ print("")
+ not_stoped = _check_exited(vega_pids)
+ if not_stoped:
+ print("Warning: The following processes do not exit completely.")
+ print(not_stoped)
+ else:
+ print("All Vega-related processes have been killed.")
+
+
+def _get_all_related_processes():
+ pids = psutil.pids()
+ vega_pids = []
+ for pid in pids:
+ try:
+ p = psutil.Process(pid)
+ except Exception:
+ continue
+ if p.name() in ["vega", "dask-scheduler", "dask-worker", "vega-main"]:
+ vega_pids.append(pid)
+ vega_pids.extend(_get_sub_processes(pid))
+ continue
+ cmd = " ".join(p.cmdline())
+ if "/bin/vega-kill" in cmd or "/bin/vega-process" in cmd or "/bin/vega-progress" in cmd:
+ continue
+ if "vega.tools.run_pipeline" in cmd or "vega.trainer.deserialize" in cmd or "/bin/vega" in cmd:
+ vega_pids.append(pid)
+ vega_pids.extend(_get_sub_processes(pid))
+ continue
+ vega_pids = set(vega_pids)
+ return vega_pids
+
+
+def _check_exited(pids):
+ not_killed = []
+ for pid in pids:
+ if psutil.pid_exists(pid):
+ not_killed.append(pid)
+ return not_killed
+
+
+def _wait(seconds):
+ for _ in range(seconds * 2):
+ print("*", end="", flush=True)
+ time.sleep(0.5)
+
+
+def _kill():
+ if not check_env():
+ return
+ args = _parse_args("Kill Vega processes.")
+ if args.pid:
+ _kill_vega_process(args.pid)
+ elif args.task_id:
+ _kill_vega_process_by_task_id(args.task_id)
+ elif args.all:
+ _kill_all_vega_process()
+ elif args.force:
+ _force_kill()
+
+
+if __name__ == "__main__":
+ _kill()
diff --git a/vega/security/main.py b/vega/security/main.py
new file mode 100644
index 00000000..9dd1f39c
--- /dev/null
+++ b/vega/security/main.py
@@ -0,0 +1,206 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""The Evaluate Service of the service."""
+import os
+import logging
+
+try:
+ import flask
+ import flask_restful
+ import flask_limiter
+ import werkzeug
+except Exception:
+ logging.warning(
+ "The dependencies [Flask==1.1.2,Flask-RESTful==0.3.8, Werkzeug==1.0.1 ] have not been install, \
+ and will install it automatically, if failed, please install it manually.")
+ os.system("pip3 install Flask==1.1.2")
+ os.system("pip3 install Flask-RESTful==0.3.8")
+ os.system("pip3 install Flask-Limiter==1.4")
+ os.system("pip3 install Werkzeug==1.0.1")
+
+from flask import abort, Flask, request
+from flask_restful import Resource, Api
+from flask_limiter import Limiter
+from flask_limiter.util import get_remote_address
+
+try:
+ from werkzeug import secure_filename
+except Exception:
+ from werkzeug.utils import secure_filename
+
+import glob
+import multiprocessing
+import time
+import shutil
+from evaluate_service.class_factory import ClassFactory
+from .hardwares import * # noqa F401
+import datetime
+import traceback
+import argparse
+from .run_flask import run_flask, get_white_list, get_request_frequency_limit
+
+app = Flask(__name__)
+api = Api(app)
+
+limiter = Limiter(
+ app,
+ key_func=get_remote_address,
+ default_limits=["100/minute"]
+)
+
+
+@app.before_request
+def limit_remote_addr():
+ """Set limit remote address."""
+ client_ip = str(request.remote_addr)
+ white_list = get_white_list()
+ if white_list and client_ip not in white_list:
+ abort(403)
+
+
+class Evaluate(Resource):
+ """Evaluate Service for service."""
+
+ decorators = [limiter.limit(get_request_frequency_limit)]
+
+ def __init__(self):
+ self.result = {"latency": "9999", "out_data": [], "status": "sucess", "timestamp": "", "error_message": ""}
+
+ @classmethod
+ def _add_params(cls, work_path, optional_params):
+ cls.current_path = work_path
+ cls.optional_params = optional_params
+
+ def post(self):
+ """Interface to response to the post request of the client."""
+ self.parse_paras()
+ self.upload_files()
+
+ self.hardware_instance = ClassFactory.get_cls(self.hardware)(self.optional_params)
+
+ if self.reuse_model == "True":
+ logging.warning("Reuse the model, no need to convert the model.")
+ else:
+ try:
+ self.hardware_instance.convert_model(backend=self.backend, model=self.model, weight=self.weight,
+ save_dir=self.share_dir, input_shape=self.input_shape,
+ out_nodes=self.out_nodes, precision=self.precision)
+ except Exception:
+ self.result["status"] = "Model convert failed."
+ self.result["error_message"] = traceback.format_exc()
+ logging.error("[ERROR] Model convert failed!")
+ traceback.print_exc()
+ try:
+ latency_sum = 0
+ for repeat in range(self.repeat_times):
+ latency, output = self.hardware_instance.inference(converted_model=self.share_dir,
+ input_data=self.input_data)
+ latency_sum += float(latency)
+ self.result["latency"] = latency_sum / self.repeat_times
+ self.result["out_data"] = output
+ except Exception:
+ self.result["status"] = "Inference failed."
+ self.result["error_message"] = traceback.format_exc()
+ logging.error("[ERROR] Inference failed! ")
+ traceback.print_exc()
+
+ return self.result
+
+ def parse_paras(self):
+ """Parse the parameters in the request from the client."""
+ self.backend = request.form["backend"]
+ self.hardware = request.form["hardware"]
+ self.reuse_model = request.form["reuse_model"]
+ self.job_id = request.form["job_id"]
+ self.input_shape = request.form.get("input_shape", type=str, default="")
+ self.out_nodes = request.form.get("out_nodes", type=str, default="")
+ self.repeat_times = int(request.form.get("repeat_times"))
+ self.precision = request.form.get("precision", type=str, default="FP32")
+
+ def upload_files(self):
+ """Upload the files from the client to the service."""
+ self.now_time = datetime.datetime.now().strftime('%Y%m%d%H%M%S%f')
+ self.result["timestamp"] = self.now_time
+ logging.warning("The timestamp is {}.".format(self.now_time))
+ self.upload_file_path = os.path.join(self.current_path, "out", self.now_time)
+ self.share_dir = os.path.join(self.current_path, "out", self.job_id)
+ os.makedirs(self.upload_file_path)
+
+ model_file = request.files.get("model_file")
+ if model_file is not None:
+ self.model = self.upload_file_path + "/" + secure_filename(model_file.filename)
+ model_file.save(self.model)
+
+ data_file = request.files.get("data_file")
+ if data_file is not None:
+ self.input_data = self.upload_file_path + "/" + secure_filename(data_file.filename)
+ data_file.save(self.input_data)
+
+ weight_file = request.files.get("weight_file")
+ if weight_file is not None:
+ self.weight = self.upload_file_path + "/" + secure_filename(weight_file.filename)
+ weight_file.save(self.weight)
+ else:
+ self.weight = ""
+ logging.warning("upload file sucess!")
+
+
+def _clean_data_path(clean_interval, work_path):
+ while True:
+ _clean_time = time.time() - clean_interval
+ # _current_path = os.path.dirname(os.path.abspath(__file__))
+ folder_pattern = "{}/out/*".format(work_path)
+ folders = glob.glob(folder_pattern)
+ for folder in folders:
+ if os.path.isdir(folder) and os.path.getctime(folder) < _clean_time:
+ logging.warning("remove old folder: {}".format(folder))
+ try:
+ shutil.rmtree(folder)
+ except Exception:
+ logging.warning("failed to remove {}".format(folder))
+ time.sleep(3600)
+
+
+def _parse_args():
+ parser = argparse.ArgumentParser(description="Evaluate service")
+ parser.add_argument("-i", "--host_ip", type=str, required=True, help="the ip of the evaluate service machine")
+ parser.add_argument("-p", "--port", type=int, required=False, default=8888, help="the listening port")
+ parser.add_argument("-w", "--work_path", type=str, required=True, help="the work dir to save the file")
+ parser.add_argument("-t", "--davinci_environment_type", type=str, required=False, default="ATLAS300",
+ help="the type the davinci hardwares")
+ parser.add_argument("-c", "--clean_interval", type=int, required=False, default=1 * 6 * 3600,
+ help="the time interval to clean the temp folder")
+ parser.add_argument("-u", "--ddk_user_name", type=str, required=False, default="user",
+ help="the user to acess ATLAS200200 DK")
+ parser.add_argument("-atlas_host_ip", "--atlas_host_ip", type=str, required=False, default=None,
+ help="the ip of ATLAS200200 DK")
+
+ args = parser.parse_args()
+ return args
+
+
+def run():
+ """Run the evaluate service."""
+ args = _parse_args()
+ ip_address = args.host_ip
+ listen_port = args.port
+ clean_interval = args.clean_interval
+ work_path = args.work_path
+ optional_params = {"davinci_environment_type": args.davinci_environment_type,
+ "ddk_user_name": args.ddk_user_name,
+ "atlas_host_ip": args.atlas_host_ip
+ }
+
+ p = multiprocessing.Process(target=_clean_data_path, args=(clean_interval, work_path), daemon=True)
+ p.start()
+ Evaluate._add_params(work_path, optional_params)
+ api.add_resource(Evaluate, '/')
+ run_flask(app, host=ip_address, port=listen_port)
diff --git a/vega/security/query_process.py b/vega/security/query_process.py
new file mode 100644
index 00000000..6a7e9bf6
--- /dev/null
+++ b/vega/security/query_process.py
@@ -0,0 +1,193 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Query vega process."""
+
+import psutil
+import json
+import time
+from psutil import _pprint_secs
+from vega.common import MessageServer, MessageClient, argment_parser
+from .run_pipeline import check_env
+
+
+__all__ = [
+ "query_task_info", "get_pid", "is_vega_process", "get_vega_pids",
+ "query_process", "query_processes", "print_process", "print_processes",
+]
+
+
+def _parse_args(desc):
+ parser = argment_parser(desc)
+ parser.add_argument("-j", "--json", action='store_true',
+ help="return json format string")
+ args = parser.parse_args()
+ return args
+
+
+def get_vega_pids():
+ """Get vega pids."""
+ pids = psutil.pids()
+ vega_pids = []
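+    # Keep only top-level vega processes: skip children whose parent is already
+    # recorded, and drop recorded children once their parent appears.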
+ for pid in pids:
+ if is_vega_process(pid):
+ try:
+ p = psutil.Process(pid)
+ except Exception:
+ continue
+ ppid = p.ppid()
+ if ppid in [_pid for (_pid, _ppid) in vega_pids]:
+ continue
+ if pid in [_ppid for (_pid, _ppid) in vega_pids]:
+ vega_pids = [(_pid, _ppid) for (_pid, _ppid) in vega_pids if _ppid != pid]
+ vega_pids.append((pid, ppid))
+ continue
+ vega_pids.append((pid, ppid))
+ return [_pid for (_pid, _ppid) in vega_pids]
+
+
+def get_task_id_path_port(pid):
+ """Get task id."""
+ try:
+ p = psutil.Process(pid)
+ for connection in p.connections():
+ port = connection.laddr.port
+ ip = connection.laddr.ip
+ if port in range(MessageServer().min_port, MessageServer().max_port):
+ client = MessageClient(ip=ip, port=port, timeout=1)
+ result = client.send(action="query_task_info")
+ if isinstance(result, dict) and "task_id" in result:
+ return result.get("task_id"), result.get("base_path"), ip, port
+ return None, None, None, None
+ except Exception:
+ return None, None, None, None
+
+
+def get_pid(task_id):
+ """Get process id."""
+ processes = query_processes()
+ for process in processes.values():
+ if "task_id" in process and task_id == process["task_id"]:
+ return process["PID"]
+ return None
+
+
+def is_vega_process(pid):
+ """Is it vega process."""
+ try:
+ p = psutil.Process(pid)
+ if p.name().startswith("vega-main"):
+ return True
+ except Exception:
+ return False
+ return False
+
+
+def _print_processes_info(processes):
+ if processes:
+ print("Vega processes:")
+ for id in processes:
+ print("{}:".format(id))
+ process = processes[id]
+ print_process(process)
+ if "task_id" in process and process["task_id"] != "Unknown":
+ _pid = process["PID"]
+ _task_id = process["task_id"]
+ _cwd = process["cwd"]
+ _base_path = process["base_path"]
+ if "_pid" in locals():
+ print("")
+ if _task_id != "Unknown":
+ print("Query progress:")
+ print(f" vega-progress -t {_task_id} -r {_base_path}")
+ print("")
+ print("Kill process:")
+ print(f" vega-kill -p {_pid}")
+ if _task_id != "Unknown":
+ print(f" vega-kill -t {_task_id}")
+ print("")
+ else:
+ print("The Vega main program is not found.")
+
+
+def print_process(process):
+ """Print process info."""
+ if "task_id" in process:
+ print(" PID: {}".format(process["PID"]))
+ print(" task id: {}".format(process["task_id"]))
+ print(" cwd: {}".format(process["cwd"]))
+ print(" user: {}".format(process["user"]))
+ print(" start at: {}".format(process["create_time"]))
+ print(" cmdline: {}".format(process["cmdline"]))
+ else:
+ print(" PID: {}".format(process["PID"]))
+ print(" message: {}".format(process["message"]))
+
+
+def query_process(pid):
+ """Query process info."""
+ try:
+ p = psutil.Process(pid)
+ (task_id, base_path, ip, port) = get_task_id_path_port(pid)
+ return {
+ "PID": pid,
+ "cmdline": p.cmdline()[2:],
+ "create_time": _pprint_secs(p.create_time()),
+ "cwd": p.cwd(),
+ "task_id": task_id if task_id is not None else "Unknown",
+ "base_path": base_path if base_path is not None else "Unknown",
+ "user": p.username(),
+ "ip": ip,
+ "port": port,
+ "running_seconds": int(time.time() - p.create_time()),
+ }
+ except Exception as e:
+ return {
+ "PID": pid,
+ "message": str(e),
+ }
+
+
+def query_task_info(task_id):
+ """Query task info."""
+ pids = get_vega_pids()
+ if pids:
+ for id, pid in enumerate(pids):
+ info = query_process(pid)
+ if isinstance(info, dict) and info.get("task_id", None) == task_id:
+ return info
+ return None
+
+
+def query_processes():
+ """Query all process."""
+ pids = get_vega_pids()
+ infos = {}
+ if pids:
+ for id, pid in enumerate(pids):
+ infos[str(id)] = query_process(pid)
+ return infos
+
+
+def print_processes():
+ """Print all processes."""
+ if not check_env():
+ return
+ """Print all processes."""
+ args = _parse_args("Quey Vega processes.")
+ processes = query_processes()
+ if args.json:
+ print(json.dumps(processes, indent=4))
+ else:
+ _print_processes_info(processes)
+
+
+if __name__ == "__main__":
+ print_processes()
diff --git a/vega/security/query_progress.py b/vega/security/query_progress.py
new file mode 100644
index 00000000..3b25c2f1
--- /dev/null
+++ b/vega/security/query_progress.py
@@ -0,0 +1,175 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Inference of vega model."""
+
+import os
+import json
+import time
+from datetime import datetime
+from vega.common import Status, JsonEncoder, DatatimeFormatString, argment_parser
+from vega.tools.query_process import query_task_info
+from vega.common import MessageClient
+from .run_pipeline import check_env
+
+
+__all__ = ["query_progress"]
+
+
+def _parse_args(desc):
+ parser = argment_parser(desc)
+ parser.add_argument("-t", "--task_id", type=str, required=True,
+ help="vega application task id")
+ parser.add_argument("-r", "--root_path", type=str, required=True,
+ help="root path where vega application is running")
+ args = parser.parse_args()
+ return args
+
+
+def _get_report_path(root_path, task_id):
+ task_path = os.path.join(root_path, task_id)
+ report_path = os.path.join(task_path, "output/reports.json")
+ return report_path
+
+
+def _load_report(report_path):
+ try:
+ with open(report_path, "r") as f:
+ return json.load(f)
+ except Exception:
+ return None
+
+
+def _parse_report(report):
+ if "_steps_" not in report:
+ return {
+ "status": Status.error,
+ "message": "Invalid report file."
+ }
+
+ progress = {
+ "steps": report["_steps_"]
+ }
+
+ model_keys = [
+ "worker_id", "status", "message", "current_epoch", "num_epochs",
+ "start_time", "end_time", "model_path", "performance"
+ ]
+
+ for step in progress["steps"]:
+ step_name = step["step_name"]
+ if step_name not in report:
+ continue
+ step["models"] = report[step_name]
+ for model in step["models"]:
+ keys = list(model.keys())
+ for key in keys:
+ if key not in model_keys:
+ model.pop(key)
+ return progress
+
+
+def _statistic_progress(progress):
+ # count epochs and models
+ for step in progress["steps"]:
+ finished_models = 0
+ finished_epochs = 0
+ if "models" not in step:
+ continue
+ for model in step["models"]:
+ if model["status"] in [Status.finished.value, Status.finished]:
+ finished_models += 1
+ finished_epochs += model["current_epoch"]
+ else:
+ current_epoch = max((model["current_epoch"] - 1), 0) if "current_epoch" in model else 0
+ finished_epochs += current_epoch
+ step["finished_models"] = finished_models
+ step["finished_epochs"] = finished_epochs
+ # calc time
+ for step in progress["steps"]:
+ step["estimated_end_time"] = None
+ if step["status"] == Status.running.value:
+ if "finished_epochs" in step and step["finished_epochs"] != 0:
+ start_time = datetime.strptime(step["start_time"], DatatimeFormatString)
+ delta = datetime.now() - start_time
+ delta = delta * (step["num_epochs"] - step["finished_epochs"]) / step["finished_epochs"]
+ step["estimated_end_time"] = datetime.now() + delta
+ # count status
+ all_finished = True
+ progress["status"] = Status.running
+ for step in progress["steps"]:
+ if step["status"] in [Status.error.value, Status.error]:
+ progress["status"] = Status.error
+ progress["message"] = step["message"]
+ all_finished = False
+ break
+ if step["status"] not in [Status.finished.value, Status.finished]:
+ all_finished = False
+ break
+ if all_finished:
+ progress["status"] = Status.finished
+
+ return progress
+
+
+def _query_report(task_info):
+ """Get task id."""
+ try:
+ port = task_info["port"]
+ ip = task_info["ip"]
+ client = MessageClient(ip=ip, port=port, timeout=1)
+ return client.send(action="query_report")
+ except Exception:
+ return None
+
+
+def query_progress(times=0):
+ """Query vega progress."""
+ args = _parse_args("Query Vega progress.")
+ task_info = query_task_info(args.task_id)
+
+ if not task_info:
+ report_path = _get_report_path(args.root_path, args.task_id)
+ if not os.path.exists(report_path):
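+            # The report file may not be flushed yet; poll up to 3 times
+            # (0.5 s apart) before giving up.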
+ times += 1
+ if times <= 3:
+ time.sleep(0.5)
+            return query_progress(times)
+ else:
+ return json.dumps({
+ "status": Status.error,
+ "message": "The task does not exist, please check root path and task id."
+ }, cls=JsonEncoder, indent=4)
+ report = _load_report(report_path)
+ else:
+ report = _query_report(task_info)
+ if not report:
+ return json.dumps({
+ "status": Status.error,
+ "message": "Failed to query progress."
+ }, cls=JsonEncoder, indent=4)
+
+ progress = _parse_report(report)
+ progress = _statistic_progress(progress)
+ if progress["status"] == Status.running and not task_info:
+ progress["status"] = Status.stopped
+
+ return json.dumps(progress, cls=JsonEncoder, indent=4)
+
+
+def print_progress():
+    """Print progress."""
+    if not check_env():
+        return
+    print(query_progress())
+
+
+if __name__ == "__main__":
+ print_progress()
diff --git a/vega/security/rest.py b/vega/security/rest.py
new file mode 100644
index 00000000..6d74cee6
--- /dev/null
+++ b/vega/security/rest.py
@@ -0,0 +1,27 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Rest operation."""
+
+import requests
+from vega.common import General
+
+
+def post(host, files, data):
+ """Post a REST requstion."""
+ if General.security_setting.get("security").get("enable"):
+ pem_file = General.security_setting.get("https").get("cert_pem_file")
+        if not pem_file:
+            print("CERT file ({}) is not configured.".format(pem_file))
+            return None
+        result = requests.post(host, files=files, data=data, proxies={"https": None}, verify=pem_file)
+ else:
+ result = requests.post(host, files=files, data=data, proxies={"http": None})
+ data = result.json()
+ return data
diff --git a/vega/security/run_dask.py b/vega/security/run_dask.py
new file mode 100644
index 00000000..faa6578c
--- /dev/null
+++ b/vega/security/run_dask.py
@@ -0,0 +1,83 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Run dask scheduler and worker."""
+import os
+import subprocess
+import shutil
+from distributed import Client
+from vega.common.utils import get_available_port
+
+
+def get_client(address):
+ """Get dask client."""
+ return Client(address)
+
+
+def get_address(master_host, master_port):
+ """Get master address."""
+ return "tcp://{}:{}".format(master_host, master_port)
+
+
+def run_scheduler(port):
+ """Run scheduler."""
+ dashboard_port = get_available_port(min_port=30000, max_port=30999)
+ """Run dask-scheduler."""
+ return subprocess.Popen(
+ [
+ "dask-scheduler",
+ ""
+ "--no-dashboard",
+ "--no-show",
+ "--host=127.0.0.1",
+ port,
+ f"--dashboard-address={dashboard_port}"
+ ],
+ env=os.environ
+ )
+
+
+def run_local_worker(address, local_dir):
+ """Run dask-worker on local node."""
+ work_port = get_available_port(min_port=31000, max_port=31999)
+ nanny_port = get_available_port(min_port=32000, max_port=32999)
+ dashboard_address = get_available_port(min_port=33000, max_port=33999)
+ return subprocess.Popen(
+ [
+ "dask-worker",
+ address,
+ '--nthreads=1',
+ '--nprocs=1',
+ '--memory-limit=0',
+ local_dir,
+ "--no-dashboard",
+ f'--listen-address=tcp://127.0.0.1:{work_port}',
+ f'--nanny-port={nanny_port}',
+ f'--dashboard-address={dashboard_address}'
+ ],
+ env=os.environ
+ )
+
+
+def run_remote_worker(slave_ip, address, local_dir):
+ """Run dask-worker on remote node."""
+ return subprocess.Popen(
+ [
+ "ssh",
+ slave_ip,
+ shutil.which("dask-worker"),
+ address,
+ '--nthreads=1',
+ '--nprocs=1',
+ '--memory-limit=0',
+ local_dir
+ ],
+ env=os.environ
+ )
diff --git a/vega/security/run_flask.py b/vega/security/run_flask.py
new file mode 100644
index 00000000..749be844
--- /dev/null
+++ b/vega/security/run_flask.py
@@ -0,0 +1,168 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Run Flask."""
+
+import configparser
+import getpass
+import re
+import os
+import ssl
+import gevent
+from gevent import pywsgi
+
+
+security_mode = True
+cert_pem_file = ""
+secret_key_file = ""
+white_list = None
+request_frequency_limit = "100/minute"
+max_content_length = 1000 * 1000 * 1000
+
+
+def load_security_setting():
+ """Load security settings."""
+ home = os.environ['HOME']
+ config_file = os.path.join(home, ".vega/vega.ini")
+ if not os.path.exists(config_file):
+ print(f"Not found configure file: {config_file}")
+ return False
+ config = configparser.ConfigParser()
+ config.read(config_file)
+ if "limit" in config:
+ global white_list
+ global request_frequency_limit
+ global max_content_length
+ if "white_list" in config["limit"]:
+ white_list = config["limit"]["white_list"].split(',')
+ if "request_frequency_limit" in config["limit"]:
+ request_frequency_limit = config["limit"]["request_frequency_limit"]
+ if "max_content_length" in config["limit"]:
+ max_content_length = int(config["limit"]["max_content_length"])
+ if "security" not in config or "enable" not in config["security"]:
+ print(f"Invalid config file: {config_file},security field must be included")
+ return False
+ global security_mode
+    security_mode = config["security"]["enable"].upper() == "TRUE"
+ if security_mode:
+ if "https" not in config or \
+ "cert_pem_file" not in config["https"] or \
+ "secret_key_file" not in config["https"]:
+ print(f"Invalid config file: {config_file},https field must be included")
+ return False
+ https_config = config["https"]
+ global cert_pem_file
+ global secret_key_file
+ if not os.path.exists(https_config['cert_pem_file']):
+ print(f"CERT file ({https_config['cert_pem_file']}) is not existed.")
+ return False
+ if not os.path.exists(https_config['secret_key_file']):
+ print(f"KEY file ({https_config['secret_key_file']}) is not existed.")
+ return False
+ cert_pem_file = https_config['cert_pem_file']
+ secret_key_file = https_config['secret_key_file']
+ return True
+
+
+def get_white_list():
+ """Get white list."""
+ global white_list
+ return white_list
+
+
+def get_request_frequency_limit():
+ """Get request frequncy limit."""
+ global request_frequency_limit
+ return request_frequency_limit
+
+
+def get_max_content_length():
+ """Get max contect length."""
+ global max_content_length
+ return max_content_length
+
+
+def check_password_rule(password):
+ """Check password rule."""
+ digit_regex = re.compile(r'\d')
+ upper_regex = re.compile(r'[A-Z]')
+ lower_regex = re.compile(r'[a-z]')
+
+ if len(password) < 8:
+ print("The length of your password must >= 8")
+ return False
+
+ if len(digit_regex.findall(password)) == 0:
+ print("Your password must contains digit letters")
+ return False
+
+ if len(upper_regex.findall(password)) == 0:
+ print("Your password must contains capital letters")
+ return False
+
+ if len(lower_regex.findall(password)) == 0:
+ print("Your password must contains lowercase letters")
+ return False
+
+ return True
+
+
+def get_secret_key_passwd():
+ """Get secret key password."""
+ password = getpass.getpass("Please input password of your server key: ")
+
+ if not check_password_rule(password):
+ print("You should re-generate your server cert/key by a password with following rules:")
+ print("1. equals to or longer than 8 letters")
+ print("2. contains at least one digit letter")
+ print("3. contains at least one capital letter")
+ print("4. contains at least one lowercase letter")
+ return None
+
+ return password
+
+
+def run_flask(app, host, port):
+ """Run flask."""
+ if not load_security_setting():
+ return
+
+ app.config['MAX_CONTENT_LENGTH'] = get_max_content_length()
+
+ global security_mode
+ if security_mode:
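+        # Restrict the server to TLS 1.2 and an allow-list of AEAD cipher
+        # suites (ECDHE/DHE key exchange with AES-GCM or AES-CCM).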
+ ciphers = "ECDHE-ECDSA-AES128-CCM:ECDHE-ECDSA-AES256-CCM:ECDHE-ECDSA-AES128-GCM-SHA256"\
+ ":ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384"\
+ ":DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256"\
+ ":DHE-DSS-AES256-GCM-SHA384:DHE-RSA-AES128-CCM:DHE-RSA-AES256-CCM"
+ password = get_secret_key_passwd()
+ if password is None:
+ return
+ global cert_pem_file
+ global secret_key_file
+ context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
+ context.set_ciphers(ciphers)
+ context.load_cert_chain(certfile=cert_pem_file, keyfile=secret_key_file, password=password)
+ server = pywsgi.WSGIServer((host, port), app, ssl_context=context)
+ else:
+ server = pywsgi.WSGIServer((host, port), app)
+
+ server.init_socket()
+ server._stop_event.clear()
+
+ def server_forever():
+ server.start_accepting()
+ print("server started.")
+ server._stop_event.wait()
+ gevent.wait()
+
+ from multiprocessing import Process
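+    # Serve in a child process so the caller returns once the server is
+    # accepting connections.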
+ p = Process(target=server_forever)
+ p.start()
diff --git a/vega/security/run_pipeline.py b/vega/security/run_pipeline.py
new file mode 100644
index 00000000..67a75c07
--- /dev/null
+++ b/vega/security/run_pipeline.py
@@ -0,0 +1,289 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Run pipeline."""
+import configparser
+import logging
+import os
+import sys
+import vega
+from copy import deepcopy
+
+from vega.common.general import General
+from vega.common.config import Config
+from vega.common.utils import verify_requires
+from vega.common import argment_parser
+from vega.core.pipeline.conf import PipeStepConfig
+
+
+def _append_env():
+ dir_path = os.getcwd()
+ sys.path.insert(0, dir_path)
+ if "PYTHONPATH" not in os.environ:
+ os.environ["PYTHONPATH"] = dir_path
+ else:
+ os.environ["PYTHONPATH"] += ":{}".format(dir_path)
+
+
+def _parse_args():
+ parser = argment_parser("Run Vega")
+ parser.add_argument("config_file", default=None, type=str,
+ help="Pipeline config file name")
+ group_backend = parser.add_argument_group(
+ title="set backend and device, priority: specified in the command line > "
+              "specified in the configuration file > default settings (pytorch and GPU)")
+ group_backend.add_argument("-b", "--backend", default=None, type=str,
+ choices=["pytorch", "p", "tensorflow", "t", "mindspore", "m"],
+ help="set training platform")
+ group_backend.add_argument("-d", "--device", default=None, type=str,
+ choices=["GPU", "NPU"],
+ help="set training device")
+ group_resume = parser.add_argument_group(title="Resume not finished task")
+ group_resume.add_argument("-r", "--resume", action='store_true',
+ help="resume not finished task")
+ group_resume.add_argument("-t", "--task_id", default=None, type=str,
+ help="specify the ID of the task to be resumed")
+ group_config = parser.add_argument_group(title='Modify config for yml')
+ group_config.add_argument("-m", "--modify", action='store_true',
+ help="modify some config")
+ group_config.add_argument("-dt", "--dataset", default=None, type=str,
+ help='modify dataset for all pipe_step')
+ group_config.add_argument("-dp", "--data_path", default=None, type=str,
+ help="modify data_path for all pipe_step")
+ group_config.add_argument("-bs", "--batch_size", default=None, type=str,
+ help='modify batch_size of dataset for all pipe_step')
+ group_config.add_argument("-es", "--epochs", default=None, type=str,
+ help='modify fully_train epochs')
+ group_config.add_argument("-f", "--force", default=None, action="store_true",
+ help='skip check validation of pretrained model')
+ args = parser.parse_args()
+ return args
+
+
+def _modify_config(args, cfg):
+ if isinstance(cfg, dict):
+ for key in cfg.keys():
+ if key in args.keys():
+ if isinstance(cfg[key], dict):
+ cfg[key] = _modify_config(args[key], cfg[key])
+ else:
+ cfg[key] = args[key]
+ cfg[key] = _modify_config(args, cfg[key])
+ return deepcopy(cfg)
+
+
+def _check_parse(args):
+ keys = [key for key in args.keys()]
+ for key in keys:
+ if args[key] is None:
+ args.pop(key)
+ if 'dataset' in args.keys():
+ dataset_type = args['dataset']
+ args['dataset'] = {'type': dataset_type}
+ return args
+
+
+def _set_backend(args):
+ backend = args.backend
+ device = args.device
+ if backend:
+ if args.backend in ["pytorch", "p"]:
+ backend = "pytorch"
+ elif args.backend in ["tensorflow", "t"]:
+ backend = "tensorflow"
+ elif args.backend in ["mindspore", "m"]:
+ backend = "mindspore"
+ else:
+ config = Config(args.config_file)
+ if "general" in config and "backend" in config["general"]:
+ backend = config["general"]["backend"]
+ if not device:
+ config = Config(args.config_file)
+ if "general" in config and "device_category" in config["general"]:
+ device = config["general"]["device_category"]
+ if backend:
+ General.backend = backend
+ if device:
+ General.device_category = device
+ vega.set_backend(General.backend, General.device_category)
+
+
+def _resume(args):
+ if args.resume:
+ if not args.task_id:
+ raise Exception("Please set task id (-t task_id) if you want resume not finished task.")
+ from vega.common.general import TaskConfig
+ General.task.task_id = args.task_id
+ General._resume = True
+ TaskConfig.backup_original_value(force=True)
+ General.backup_original_value(force=True)
+
+
+def _backup_config(args):
+ _file = args.config_file
+ from vega.common.task_ops import TaskOps
+ from vega.common.file_ops import FileOps
+ dest_file = FileOps.join_path(TaskOps().local_output_path, os.path.basename(_file))
+ FileOps.make_base_dir(dest_file)
+ FileOps.copy_file(_file, dest_file)
+
+
+def _change_process_name():
+ from ctypes import cdll, byref, create_string_buffer
+ libc = cdll.LoadLibrary('libc.so.6')
+ buff = create_string_buffer(bytes("vega-main", "utf-8"))
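+    # 15 is PR_SET_NAME on Linux: it sets the process name shown by ps/top.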
+ libc.prctl(15, byref(buff), 0, 0, 0)
+
+
+class LoadConfigException(Exception):
+ """Load config exception."""
+
+ pass
+
+
+def _read_config_file():
+ """Read config file and return ConfigParser."""
+ vega_config_file = os.path.join(os.environ['HOME'], ".vega", "vega.ini")
+ if not os.path.exists(vega_config_file):
+ raise LoadConfigException(f"Not found configure file: {vega_config_file}")
+ config = configparser.ConfigParser()
+ config.read(vega_config_file)
+ return config
+
+
+def _parse_config(config):
+ General.security_setting = config._sections
+ General.security_setting.get("security")["enable"] = True \
+ if str(General.security_setting.get("security").get("enable")).upper() == "TRUE" else False
+
+
+def _get_config_field(config, field):
+ if field not in config:
+ raise LoadConfigException("field <{}> is not existed in config file".format(field))
+ return config[field]
+
+
+def _get_config_key(config, key, field):
+ if key not in config:
+ raise LoadConfigException("key <{}> is not in field <{}> of config file".format(key, field))
+ return config[key]
+
+
+def _check_if_file_config_correct(config, key, field):
+ file = _get_config_key(config, key, field)
+ if not os.path.exists(file):
+ raise LoadConfigException("file <{}> is not existed.".format(file))
+
+
+def _check_security_switch_valid(config):
+ if "security" not in config or "enable" not in config["security"]:
+ raise LoadConfigException("Invalid config file: security field must be included")
+
+
+def _get_security_switch_on_off(config):
+    return config["security"]["enable"].upper() == "TRUE"
+
+
+def load_security_setting():
+ """Load security settings."""
+ try:
+ config = _read_config_file()
+ _check_security_switch_valid(config)
+ security_mode = _get_security_switch_on_off(config)
+ if not security_mode:
+ General.security_setting = {
+ "security": {
+ "enable": False
+ }
+ }
+ return True
+ _check_config_validation(config)
+ _parse_config(config)
+ except LoadConfigException:
+ logging.warning("load_security_setting failed")
+ return False
+ return True
+
+
+def _check_config_validation(config):
+ https_config = _get_config_field(config, "https")
+ _check_if_file_config_correct(https_config, "cert_pem_file", "https")
+
+
+def check_env():
+ """Check environment."""
+ if not load_security_setting():
+ return False
+ return True
+
+
+def contained_pth_file(config):
+ """Get contained pth file."""
+ file_list = []
+ if isinstance(config, Config):
+ for value in config.values():
+ if isinstance(value, str) and (str(value).endswith(".pth") or str(value).endswith(".pth.tar")):
+ file_list.append(value)
+ file_list.extend(contained_pth_file(value))
+ return file_list
+
+
+def check_pth_file(args, config):
+ """Check pth file."""
+ if args.force:
+ return True
+ file_list = contained_pth_file(config)
+ if len(file_list) > 0:
+ print("There are pth files: ")
+ print(file_list)
+ user_confirm = input("It is possible to construct malicious pickle data "
+ "which will execute arbitrary code during unpickling pth file. "
+ "\nPlease ensure the safety and consistency of the model file. "
+ "\nDo you want to continue? (yes/no) ")
+ while user_confirm != "yes" and user_confirm != "no":
+ user_confirm = input("Please enter yes or no! ")
+ if user_confirm == "yes":
+ return True
+ elif user_confirm == "no":
+ return False
+ return True
+
+
+def run_pipeline(load_special_lib_func=None):
+ """Run pipeline."""
+ os.umask(0o027)
+ args = _parse_args()
+ _resume(args)
+ _set_backend(args)
+ _append_env()
+ if load_special_lib_func:
+ load_special_lib_func(args.config_file)
+ config = Config(args.config_file)
+ # load general
+ if config.get("general"):
+ General.from_dict(config.get("general"), skip_check=False)
+ os.environ['TF_CPP_MIN_LOG_LEVEL'] = str(General.TF_CPP_MIN_LOG_LEVEL)
+ # check env
+ if not check_env():
+ return
+ if not check_pth_file(args, config):
+ return
+ if General.requires and not verify_requires(General.requires):
+ return
+ dict_args = vars(args)
+ dict_args = _check_parse(dict_args)
+ config = _modify_config(dict_args, config)
+ # _backup_config(args)
+ _change_process_name()
+ vega.run(config)
+
+
+if __name__ == '__main__':
+ run_pipeline()
diff --git a/vega/security/setup.py b/vega/security/setup.py
new file mode 100644
index 00000000..6a7db5d0
--- /dev/null
+++ b/vega/security/setup.py
@@ -0,0 +1,91 @@
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Setuptools of vega."""
+
+import os
+import setuptools
+import sys
+from setuptools.command.install import install as _install
+
+if sys.version_info < (3, 6):
+ sys.exit("Sorry, Python < 3.6 is not supported.")
+
+with open("RELEASE.md", "r") as fh:
+ long_desc = fh.read()
+
+
+def _post_install():
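+    # Write a default ~/.vega/vega.ini on first install; an existing file
+    # is left untouched.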
+ vega_dir = os.path.join(os.getenv("HOME"), ".vega")
+ os.makedirs(vega_dir, exist_ok=True)
+ vega_config_file = os.path.join(vega_dir, "vega.ini")
+ if os.path.exists(vega_config_file):
+ return
+
+ with open(vega_config_file, "w") as wf:
+ wf.write("[security]\n")
+ wf.write("enable=True\n")
+ wf.write("\n")
+ wf.write("[https]\n")
+ wf.write("cert_pem_file=\n")
+ wf.write("secret_key_file=\n")
+ wf.write("\n")
+ wf.write("[limit]\n")
+ wf.write("request_frequency_limit=100/minute\n")
+ wf.write("max_content_length=1000000000\n")
+ wf.write("#white_list=0.0.0.0,127.0.0.1\n")
+
+
+class install(_install):
+ """Post installation."""
+
+ def run(self):
+ """Run."""
+ _install.run(self)
+        self.execute(_post_install, (), msg="Running post install task")
+
+
+cmd_class = dict(install=install)
+
+setuptools.setup(
+ name="noah-vega",
+ cmdclass=cmd_class,
+ version="1.7.0.mindstudio",
+ packages=["vega", "evaluate_service"],
+ include_package_data=True,
+ python_requires=">=3.6",
+ author="Huawei Noah's Ark Lab",
+ author_email="",
+ description="AutoML Toolkit",
+ long_description=long_desc,
+ long_description_content_type="text/markdown",
+ license="MIT",
+ url="https://github.com/huawei-noah/vega",
+ classifiers=[
+ "Programming Language :: Python :: 3",
+ "License :: OSI Approved :: MIT License",
+ "Operating System :: POSIX :: Linux",
+ ],
+ install_requires=[
+ "pyzmq",
+ ],
+ entry_points="""
+ [console_scripts]
+ vega=vega.tools.run_pipeline:run_pipeline
+ vega-security-config=vega.tools.config_op:vega_config_operate
+ vega-kill=vega.tools.kill:_kill
+ vega-verify-cluster=vega.tools.verify_cluster:_verify_cluster
+ vega-fine-tune=vega.tools.fine_tune:_fine_tune
+ vega-progress=vega.tools.query_progress:print_progress
+ vega-process=vega.tools.query_process:print_processes
+ vega-evaluate-service=evaluate_service.main:run
+ """,
+)
diff --git a/vega/tools/run_pipeline.py b/vega/tools/run_pipeline.py
index 48b341a1..0a2f8e9d 100644
--- a/vega/tools/run_pipeline.py
+++ b/vega/tools/run_pipeline.py
@@ -8,7 +8,7 @@
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# MIT License for more details.
-"""Run example."""
+"""Run pipeline."""
import os
import sys
@@ -35,7 +35,7 @@ def _parse_args():
help="Pipeline config file name")
group_backend = parser.add_argument_group(
title="set backend and device, priority: specified in the command line > "
- "specified in the configuration file > default settings(pytorch and GPU)")
+              "specified in the configuration file > default settings (pytorch and GPU)")
group_backend.add_argument("-b", "--backend", default=None, type=str,
choices=["pytorch", "p", "tensorflow", "t", "mindspore", "m"],
help="set training platform")
diff --git a/vega/tools/run_slave.py b/vega/tools/run_slave.py
index 57572813..9da44281 100644
--- a/vega/tools/run_slave.py
+++ b/vega/tools/run_slave.py
@@ -34,7 +34,3 @@ def run_dask_worker(master_ip, port, num_workers):
raise Exception("Failed to start dask-worker. Gave up.")
else:
print("dask-worker running.")
-
-
-if __name__ == "__main__":
- run_dask_worker()
diff --git a/vega/tools/verify_cluster.py b/vega/tools/verify_cluster.py
index b749e78c..0f0ddc15 100644
--- a/vega/tools/verify_cluster.py
+++ b/vega/tools/verify_cluster.py
@@ -182,13 +182,13 @@ def _init_dask_scheduler(args):
global _port
_port = str(get_available_port())
try:
- result = _popen(["dask-scheduler", "--port", _port])
+ result = _popen(["dask-scheduler", "--no-dashboard", "--no-show", "--port", _port])
except Exception:
raise Exception("Failed to start dask scheduler.")
if not isinstance(result, subprocess.Popen):
_print("Failed to start dask scheduler.")
_print("Please run the command in CLI, and resovlue the problems.")
- _print(f"dask-scheduler --port {_port}")
+ _print(f"dask-scheduler --no-dashboard --no-show --port {_port}")
raise Exception("Failed to start dask scheduler.")
time.sleep(5)
_print("Pass.")
diff --git a/vega/trainer/callbacks/__init__.py b/vega/trainer/callbacks/__init__.py
index fcb90eee..041e0649 100644
--- a/vega/trainer/callbacks/__init__.py
+++ b/vega/trainer/callbacks/__init__.py
@@ -2,10 +2,8 @@
from .callback_list import CallbackList
from vega.common.class_factory import ClassFactory
-
__all__ = ["Callback", "CallbackList"]
-
ClassFactory.lazy_register("vega.trainer.callbacks", {
"metrics_evaluator": ["trainer.callback:MetricsEvaluator"],
"progress_logger": ["trainer.callback:ProgressLogger"],
@@ -21,5 +19,8 @@
"visual_callback": ["trainer.callback:VisualCallBack"],
"model_tuner": ["trainer.callback:ModelTuner"],
"timm_trainer_callback": ["trainer.callback:TimmTrainerCallback"],
- "data_parallel": ["trainer.callback:DataParallel"],
+ "ddp_torch": ["trainer.callback:DdpTorch"],
+ "fusion": ["trainer.callback:OperatorFusionCallback"],
+ "horovod": ["trainer.callback:Horovod"],
+ "hccl": ["trainer.callback:Hccl"],
})
diff --git a/vega/trainer/callbacks/callback_list.py b/vega/trainer/callbacks/callback_list.py
index efd063dd..1235add1 100644
--- a/vega/trainer/callbacks/callback_list.py
+++ b/vega/trainer/callbacks/callback_list.py
@@ -27,7 +27,6 @@ def __init__(self, customs, disables):
self.model_fn = None
self.train_input_fn = None
self.valid_input_fn = None
- self.params = {}
self.callbacks = self._get_callbacks(customs, disables)
for callback in self.callbacks:
# Get make_batch if callback has defined one
@@ -72,13 +71,13 @@ def _get_callbacks(self, customs, disables):
if vega.is_torch_backend():
defaults = ["ModelStatistics", "MetricsEvaluator", "ModelCheckpoint", "ModelBuilder", "PerformanceSaver",
"RuntimeCallback", "LearningRateScheduler", "ProgressLogger", "ReportCallback",
- "DataParallel"]
+ "DdpTorch", "Horovod", "Hccl"]
elif vega.is_tf_backend():
defaults = ["ModelStatistics", "MetricsEvaluator", "ModelCheckpoint", "ModelBuilder", "PerformanceSaver",
- "RuntimeCallback", "ProgressLogger", "ReportCallback", ]
+ "RuntimeCallback", "ProgressLogger", "ReportCallback", "Horovod", "Hccl"]
elif vega.is_ms_backend():
defaults = ["ModelStatistics", "MetricsEvaluator", "ModelCheckpoint", "ModelBuilder", "PerformanceSaver",
- "ProgressLogger", "ReportCallback"]
+ "ProgressLogger", "ReportCallback", "Hccl"]
custom_disables = []
disables = disables if disables else []
@@ -109,16 +108,7 @@ def _get_callbacks(self, customs, disables):
return callbacks
def _set_params(self, trainer):
- params = {
- 'epochs': trainer.epochs,
- 'is_chief': trainer.is_chief,
- 'use_cuda': vega.is_gpu_device(),
- 'do_validation': trainer.do_validation,
- 'is_detection_trainer': trainer.config.is_detection_trainer
- }
- self.params = params
- for callback in self.callbacks:
- callback.set_params(params)
+ pass
def set_trainer(self, trainer):
"""Set the trainer object for callback container."""
@@ -153,7 +143,8 @@ def _set_callback_func(self):
# Replace the default model_fn of Trainer
if self.model_fn is not None:
self.trainer.model_fn = self.model_fn
- self.trainer._init_tf_estimator()
+ if hasattr(self.trainer, "_init_tf_estimator"):
+ self.trainer._init_tf_estimator()
# Replace the default train_input_fn of Trainer
if self.train_input_fn is not None:
self.trainer.train_input_fn = self.train_input_fn
diff --git a/vega/trainer/callbacks/callbacks.md b/vega/trainer/callbacks/callbacks.md
index 17fd25b4..ea11896b 100644
--- a/vega/trainer/callbacks/callbacks.md
+++ b/vega/trainer/callbacks/callbacks.md
@@ -10,7 +10,7 @@
| MetricsEvaluator | 230 | | √ | √ | √ | | | | | | | √ | √ | √ | | √ | √ | √ |
| ModelCheckpoint | 240 | | √ | | | | | | | | | | √ | | | | | |
| PerformanceSaver | 250 | | √ | | | | | | | | | | √ | √ | | | | |
-| DataParallel | 260 | | √ | | | | | | | | | | | | | | | |
+| DdpTorch | 260 | | √ | | | | | | | | | | | | | | | |
| ProgressLogger | 270 | | √ | √ | √ | | | | | | | √ | | √ | | | √ | √ |
| ReportCallback | 280 | | √ | | | | | | | | | | √ | √ | | | | √ |
| VisualCallBack | 290 | | √ | | | | | | | | | √ | √ | √ | √ | | | |
diff --git a/vega/trainer/callbacks/data_parallel.py b/vega/trainer/callbacks/ddp_torch.py
similarity index 51%
rename from vega/trainer/callbacks/data_parallel.py
rename to vega/trainer/callbacks/ddp_torch.py
index f8d2cc6e..6be7afe8 100644
--- a/vega/trainer/callbacks/data_parallel.py
+++ b/vega/trainer/callbacks/ddp_torch.py
@@ -10,8 +10,8 @@
"""Data parallel callback."""
-import os
import logging
+import torch
import vega
from .callback import Callback
from vega.common import ClassFactory, ClassType
@@ -21,30 +21,18 @@
@ClassFactory.register(ClassType.CALLBACK)
-class DataParallel(Callback):
+class DdpTorch(Callback):
"""Callback that saves the evaluated Performance."""
def __init__(self):
"""Initialize ModelCheckpoint callback."""
- super(DataParallel, self).__init__()
+ super(DdpTorch, self).__init__()
self.priority = 260
def before_train(self, logs=None):
"""Be called before the training process."""
- if not vega.is_torch_backend() or not General._parallel or General.devices_per_trainer == 1:
+ if not vega.is_torch_backend() or not vega.is_gpu_device():
return
- model = self.trainer.model
- import torch
- if vega.is_gpu_device():
- if General._parallel and General.devices_per_trainer > 1:
- model = torch.nn.DataParallel(model)
- elif vega.is_npu_device():
- if General._parallel and General.devices_per_trainer > 1:
- import torch.distributed as dist
- dist.init_process_group(
- backend='hccl', world_size=int(os.environ['WORLD_SIZE']),
- rank=int(os.environ['RANK_ID']))
- model = torch.nn.parallel.DistributedDataParallel(
- model,
- device_ids=[int(os.environ['DEVICE_ID'])])
- self.trainer.model = model
+ if not General._parallel or General.devices_per_trainer <= 1:
+ return
+ self.trainer.model = torch.nn.DataParallel(self.trainer.model)
diff --git a/vega/trainer/callbacks/detection_metrics_evaluator.py b/vega/trainer/callbacks/detection_metrics_evaluator.py
index 3dd80f0d..58e5b8f7 100644
--- a/vega/trainer/callbacks/detection_metrics_evaluator.py
+++ b/vega/trainer/callbacks/detection_metrics_evaluator.py
@@ -36,7 +36,7 @@ def before_epoch(self, epoch, logs=None):
def after_train_step(self, batch_index, logs=None):
"""Be called after each train batch."""
- input, target = self.train_batch
+ input, _ = self.train_batch
batch_size = input.size(0)
self.cur_loss = logs['loss']
self.loss_avg = self._average_loss_during_train_period(batch_size, self.cur_loss)
@@ -44,8 +44,8 @@ def after_train_step(self, batch_index, logs=None):
def after_valid_step(self, batch_index, logs=None):
"""Be called after each batch of validation."""
- if self.do_validation and self.valid_metrics is not None:
- input, target = self.valid_batch
+ if self.trainer.do_validation and self.valid_metrics is not None:
+ _, target = self.valid_batch
output = logs['valid_batch_output']
self.valid_metrics(output, target)
diff --git a/vega/trainer/callbacks/detection_progress_logger.py b/vega/trainer/callbacks/detection_progress_logger.py
index f13673f6..2a8c0dbc 100644
--- a/vega/trainer/callbacks/detection_progress_logger.py
+++ b/vega/trainer/callbacks/detection_progress_logger.py
@@ -24,7 +24,7 @@ class DetectionProgressLogger(ProgressLogger):
def after_train_step(self, batch_index, logs=None):
"""Be called before each batch training."""
- if self.train_verbose >= 2 and self.is_chief \
+ if self.train_verbose >= 2 and self.trainer.is_chief \
and batch_index % self.train_report_steps == 0:
try:
out_buffer = OrderedDict(
@@ -47,8 +47,8 @@ def after_train_step(self, batch_index, logs=None):
def after_valid_step(self, batch_index, logs=None):
"""Be called after each batch of the validation."""
- if self.valid_verbose >= 2 and self.is_chief \
- and self.do_validation and batch_index % self.valid_report_steps == 0:
+ if self.valid_verbose >= 2 and self.trainer.is_chief \
+ and self.trainer.do_validation and batch_index % self.valid_report_steps == 0:
metrics_results = logs.get('valid_step_metrics', None)
if metrics_results is not None:
out_buffer = OrderedDict(
@@ -63,7 +63,7 @@ def after_valid_step(self, batch_index, logs=None):
def after_valid(self, logs=None):
"""Be called after validation."""
- if (self.valid_verbose >= 1 and self.is_chief and self.do_validation):
+ if (self.valid_verbose >= 1 and self.trainer.is_chief and self.trainer.do_validation):
cur_valid_perfs = logs.get('cur_valid_perfs', None)
if cur_valid_perfs is not None:
log_info = "epoch [{}/{}], current valid perfs {}".format(
diff --git a/vega/trainer/callbacks/fusion.py b/vega/trainer/callbacks/fusion.py
new file mode 100644
index 00000000..8c4f077d
--- /dev/null
+++ b/vega/trainer/callbacks/fusion.py
@@ -0,0 +1,54 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Callbacks called at certain points of trainer."""
+import logging
+import copy
+import vega
+from vega.common.class_factory import ClassFactory, ClassType
+from vega.trainer.callbacks.callback import Callback
+from vega.modules.operators import Identity
+
+if vega.is_torch_backend():
+ import torch
+ from torch.nn.utils.fusion import fuse_conv_bn_weights
+
+
+@ClassFactory.register(ClassType.CALLBACK)
+class OperatorFusionCallback(Callback):
+ """Callback that fuse operators when valid model."""
+
+ def __init__(self):
+ """Construct a OperatorFusionCallback callback."""
+ super(OperatorFusionCallback, self).__init__()
+
+ def after_train(self, logs=None):
+ """Be called before the validation."""
+ if not vega.is_torch_backend() or self.trainer.model.__class__.__name__ != 'DagNetwork':
+ return
+ logging.info("Start operator fusion.")
+ for name, node in self.trainer.model.module_map.items():
+ module = node.module
+ if isinstance(node.module, torch.nn.Conv2d):
+ next_nodes = node.child_nodes
+ if next_nodes and isinstance(next_nodes[0].module, torch.nn.BatchNorm2d):
+ node.module = self._fuse_conv_bn(module, next_nodes[0].module)
+ next_nodes[0].module = Identity()
+ self._save_model()
+
+ def _fuse_conv_bn(self, conv, bn):
+ fused_conv = copy.deepcopy(conv)
+ fused_conv.weight, fused_conv.bias = fuse_conv_bn_weights(
+ fused_conv.weight, fused_conv.bias, bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias)
+ return fused_conv
+
+ def _save_model(self):
+ if vega.is_torch_backend():
+ torch.save(self.trainer.model.state_dict(), self.trainer.weights_file)
diff --git a/vega/trainer/callbacks/hccl.py b/vega/trainer/callbacks/hccl.py
new file mode 100644
index 00000000..1151ce73
--- /dev/null
+++ b/vega/trainer/callbacks/hccl.py
@@ -0,0 +1,77 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Data parallel callback."""
+
+import logging
+import vega
+from .callback import Callback
+from vega.common import ClassFactory, ClassType
+from vega.common.general import General
+
+logger = logging.getLogger(__name__)
+
+
+@ClassFactory.register(ClassType.CALLBACK)
+class Hccl(Callback):
+ """Callback that saves the evaluated Performance."""
+
+ def __init__(self):
+ """Initialize ModelCheckpoint callback."""
+ super(Hccl, self).__init__()
+ self.priority = 260
+
+ def init_trainer(self, logs=None):
+ """Set trainer object for current callback."""
+ if not self.trainer.hccl:
+ return
+
+ if vega.is_torch_backend():
+ self._init_pytorch_trainer()
+ if vega.is_ms_backend():
+ self._init_ms_trainer()
+
+ def _init_pytorch_trainer(self):
+ import torch
+ import torch.distributed as dist
+ logger.info("init HCCL")
+ model = self.trainer.model
+ dist.init_process_group(
+ backend='hccl',
+ init_method=f"tcp://{General.cluster.hccl_server_ip}:{General.cluster.hccl_port}",
+ world_size=self.trainer.num_workers,
+ rank=self.trainer.rank_id)
+ model = torch.nn.parallel.DistributedDataParallel(
+ model,
+ device_ids=[self.trainer.device_id],
+ broadcast_buffers=General.cluster.enable_broadcast_buffers)
+ self.trainer.model = model
+
+ def _init_ms_trainer(self):
+ from mindspore import context
+ from mindspore.context import ParallelMode
+ from mindspore.communication.management import init
+
+ logger.info("init HCCL")
+ context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True)
+ init()
+
+ def before_epoch(self, epoch, logs=None):
+ """Be called before each epoach."""
+ if not vega.is_torch_backend() or not self.trainer.hccl:
+ return
+ if self.trainer.sampler is not None:
+ self.trainer.sampler.set_epoch(epoch)
+
+ def after_train(self, logs=None):
+ """Stop session."""
+ if self.trainer.hccl and vega.is_tf_backend():
+ self.trainer.sess.run(self.trainer.npu_shutdown)
+ self.trainer.sess.close()
diff --git a/vega/trainer/callbacks/horovod.py b/vega/trainer/callbacks/horovod.py
new file mode 100644
index 00000000..073c1fe5
--- /dev/null
+++ b/vega/trainer/callbacks/horovod.py
@@ -0,0 +1,60 @@
+# -*- coding:utf-8 -*-
+
+# Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the MIT License.
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# MIT License for more details.
+
+"""Data parallel callback."""
+
+import logging
+import vega
+from .callback import Callback
+from vega.common import ClassFactory, ClassType
+
+logger = logging.getLogger(__name__)
+
+
+@ClassFactory.register(ClassType.CALLBACK)
+class Horovod(Callback):
+ """Callback that saves the evaluated Performance."""
+
+ def __init__(self):
+ """Initialize ModelCheckpoint callback."""
+ super(Horovod, self).__init__()
+ self.priority = 260
+
+ def before_train(self, logs=None):
+ """Be called before the training process."""
+ if not self.trainer.horovod:
+ return
+ if vega.is_torch_backend():
+ self._init_torch()
+ # elif vega.is_tf_backend():
+ # self._init_tf()
+
+ def _init_torch(self):
+ import torch
+ import horovod.torch as hvd
+ hvd.broadcast_parameters(self.trainer.model.state_dict(), root_rank=0)
+ hvd.broadcast_optimizer_state(self.trainer.optimizer, root_rank=0)
+ # torch.cuda.set_device(hvd.local_rank())
+ self.trainer._average_metrics = self._average_metrics
+
+ # def _init_tf(self):
+ # import horovod.tensorflow as hvd
+ # # hvd.init()
+ # # TODO horovod tf
+ # self.trainer.sess_config.gpu_options.visible_device_list = str(hvd.local_rank())
+
+ def _average_metrics(self, metrics_results):
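+        """Average metric values across all Horovod workers."""
+        # hvd.allreduce defaults to averaging, so every worker receives the
+        # mean value of each metric.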
+ import torch
+ import horovod.torch as hvd
+ for key, value in metrics_results.items():
+ tensor = torch.tensor(value)
+ avg_tensor = hvd.allreduce(tensor, name=key)
+ metrics_results[key] = avg_tensor.item()
+ return metrics_results
diff --git a/vega/trainer/callbacks/lr_scheduler.py b/vega/trainer/callbacks/lr_scheduler.py
index f1625a19..3b76caa2 100644
--- a/vega/trainer/callbacks/lr_scheduler.py
+++ b/vega/trainer/callbacks/lr_scheduler.py
@@ -38,5 +38,6 @@ def after_epoch(self, epoch, logs=None):
def after_train_step(self, batch_index, logs=None):
"""Call after_train_step of the managed callbacks."""
if self.lr_scheduler and not self.lr_scheduler.by_epoch:
- step = self.trainer.batch_num_train * self.epoch + self.epoch + batch_index
+ step = self.trainer.batch_num_train * self.epoch + batch_index
self.lr_scheduler.step(epoch=step)
diff --git a/vega/trainer/callbacks/metrics_evaluator.py b/vega/trainer/callbacks/metrics_evaluator.py
index 70e24749..4b948c4f 100644
--- a/vega/trainer/callbacks/metrics_evaluator.py
+++ b/vega/trainer/callbacks/metrics_evaluator.py
@@ -9,7 +9,7 @@
# MIT License for more details.
"""ProgressLogger call defination."""
-import vega
+
from copy import deepcopy
from .callback import Callback
from vega.common import ClassFactory, ClassType
@@ -26,7 +26,7 @@ def __init__(self):
def before_train(self, logs=None):
"""Be called before the training process."""
- self.do_validation = self.params.get('do_validation', False)
+ self.do_validation = self.trainer.do_validation
self.cur_loss = None
self.loss_avg = None
self.cur_train_perfs = None
@@ -98,9 +98,8 @@ def after_valid(self, logs=None):
if self.do_validation and self.valid_metrics is not None:
# Get the summary of valid metrics
metrics_results = self.valid_metrics.results
- if vega.is_torch_backend() and self.trainer.distributed:
- for key, value in metrics_results.items():
- metrics_results[key] = self.trainer._metric_average(value, key)
+ if hasattr(self.trainer, "_average_metrics"):
+ metrics_results = self.trainer._average_metrics(metrics_results)
if 'loss' in metrics_results:
metrics_results.pop('loss')
if 'global_step' in metrics_results:
@@ -116,7 +115,7 @@ def after_valid(self, logs=None):
self.best_valid_perfs)
logs.update({'cur_valid_perfs': self.cur_valid_perfs,
'best_valid_perfs': self.best_valid_perfs,
- 'best_valid_perfs_changed': self.best_valid_changed})
+ 'best_changed': self.best_valid_changed})
def after_epoch(self, epoch, logs=None):
"""Be called after each epoch."""
@@ -137,7 +136,7 @@ def after_epoch(self, epoch, logs=None):
if self.do_validation and self.valid_metrics is not None:
self.summary_perfs.update({'cur_valid_perfs': self.cur_valid_perfs,
'best_valid_perfs': self.best_valid_perfs,
- 'best_valid_perfs_changed': self.best_valid_changed})
+ 'best_changed': self.best_valid_changed})
logs.update({'summary_perfs': self.summary_perfs})
diff --git a/vega/trainer/callbacks/model_builder.py b/vega/trainer/callbacks/model_builder.py
index 9aa4f79b..f2a9c0f8 100644
--- a/vega/trainer/callbacks/model_builder.py
+++ b/vega/trainer/callbacks/model_builder.py
@@ -9,12 +9,11 @@
# MIT License for more details.
"""ModelCheckpoint callback defination."""
-import os
-import glob
+
import logging
import vega
from .callback import Callback
-from vega.common import FileOps, Config
+from vega.common import Config
from vega.common import ClassFactory, ClassType
from vega.networks.model_config import ModelConfig
from vega.model_zoo import ModelZoo
@@ -33,71 +32,29 @@ def __init__(self):
def init_trainer(self, logs=None):
"""Set trainer object for current callback."""
- self.trainer.model = self._init_model()
+ model = self.trainer.model
+ if not model:
+ model = self._init_model()
+ if hasattr(model, "desc"):
+ self.trainer.model_desc = model.desc
+ self.trainer.model = self._set_device(model)
def _init_model(self):
"""Load model desc from save path and parse to model."""
- model = self.trainer.model
- if self.trainer.config.is_detection_trainer:
- model_desc = self.trainer.model_desc or self._get_model_desc()
- else:
- model_desc = self._get_model_desc()
- pretrained_model_file = self._get_pretrained_model_file()
- if not model:
- if not model_desc:
- raise Exception("Failed to Init model, can not get model description.")
- model = ModelZoo.get_model(model_desc, pretrained_model_file, ModelConfig.head)
- if model:
- if hasattr(model, "desc"):
- self.trainer.model_desc = model.desc
- if vega.is_torch_backend():
- if vega.is_gpu_device():
- model = model.cuda()
- elif vega.is_npu_device():
- model = model.to(vega.get_devices())
+ config = Config(ModelConfig().to_dict())
+ if self.trainer.model_desc:
+ config.model_desc = self.trainer.model_desc
+ if not config.model_desc:
+ raise Exception("Failed to Init model, can not get model description.")
+ if self.trainer.load_weights_file:
+ config.pretrained_model_file = self.trainer.config.kwargs.get(
+ "pretrained_model_file") or config.pretrained_model_file
+ return ModelZoo.get_model(**config)
+
+ def _set_device(self, model):
+ if vega.is_torch_backend():
+ if vega.is_gpu_device():
+ model = model.cuda()
+ elif vega.is_npu_device():
+ model = model.to(vega.get_devices())
return model
-
- def _get_model_desc(self):
- model_desc = self.trainer.model_desc
- if not model_desc:
- if ModelConfig.model_desc_file is not None:
- desc_file = ModelConfig.model_desc_file
- desc_file = desc_file.replace("{local_base_path}", self.trainer.local_base_path)
- if ":" not in desc_file:
- desc_file = os.path.abspath(desc_file)
- if ":" in desc_file:
- local_desc_file = FileOps.join_path(
- self.trainer.local_output_path, os.path.basename(desc_file))
- FileOps.copy_file(desc_file, local_desc_file)
- desc_file = local_desc_file
- model_desc = Config(desc_file)
- logger.info("net_desc:{}".format(model_desc))
- elif ModelConfig.model_desc is not None:
- model_desc = ModelConfig.model_desc
- elif ModelConfig.models_folder is not None:
- folder = ModelConfig.models_folder.replace("{local_base_path}", self.trainer.local_base_path)
- pattern = FileOps.join_path(folder, "desc_*.json")
- desc_file = glob.glob(pattern)[0]
- model_desc = Config(desc_file)
- return model_desc
-
- def _get_pretrained_model_file(self):
- if not self.trainer.load_weights_file:
- return None
- model_file = self.trainer.config.kwargs.get("pretrained_model_file")
- if model_file:
- return model_file
- if ModelConfig.pretrained_model_file:
- model_file = ModelConfig.pretrained_model_file
- model_file = model_file.replace("{local_base_path}", self.trainer.local_base_path)
- model_file = model_file.replace("{worker_id}", str(self.trainer.worker_id))
- if ":" not in model_file:
- model_file = os.path.abspath(model_file)
- if ":" in model_file:
- local_model_file = FileOps.join_path(
- self.trainer.local_output_path, os.path.basename(model_file))
- FileOps.copy_file(model_file, local_model_file)
- model_file = local_model_file
- return model_file
- else:
- return None
diff --git a/vega/trainer/callbacks/model_checkpoint.py b/vega/trainer/callbacks/model_checkpoint.py
index 016eeabb..6b9c9c8d 100644
--- a/vega/trainer/callbacks/model_checkpoint.py
+++ b/vega/trainer/callbacks/model_checkpoint.py
@@ -9,6 +9,7 @@
# MIT License for more details.
"""ModelCheckpoint callback defination."""
+
import os
import glob
import logging
@@ -35,7 +36,6 @@ def __init__(self):
def before_train(self, logs=None):
"""Be called before the training process."""
- self.is_chief = self.params['is_chief']
if self.trainer.load_checkpoint:
self._load_checkpoint()
@@ -49,7 +49,7 @@ def after_epoch(self, epoch, logs=None):
self._save_checkpoint(epoch)
if self.trainer.multi_task:
self._saved_multi_checkpoint(epoch)
- if self.is_chief and logs.get('summary_perfs').get('best_valid_perfs_changed', False):
+ if self.trainer.is_chief and logs.get('summary_perfs').get('best_changed', False):
self._save_best_model()
def _save_best_model(self):
@@ -178,7 +178,7 @@ def _save_om_model(self, weight_file, model_id):
from mindspore.train.serialization import export
from mindspore import Tensor
import subprocess
- for step, batch in enumerate(self.trainer.valid_loader.create_dict_iterator()):
+ for _, batch in enumerate(self.trainer.valid_loader.create_dict_iterator()):
data = batch["image"]
input_shape = data.shape
fake_input = np.random.random(input_shape).astype(np.float32)
diff --git a/vega/trainer/callbacks/ms_callbacks.py b/vega/trainer/callbacks/ms_callbacks.py
index 77508688..054a305a 100644
--- a/vega/trainer/callbacks/ms_callbacks.py
+++ b/vega/trainer/callbacks/ms_callbacks.py
@@ -9,7 +9,7 @@
# MIT License for more details.
"""Custom callbacks used in mindspore."""
-import os
+
import logging
from mindspore.train.callback import Callback
from vega.report import ReportClient
@@ -40,7 +40,7 @@ def epoch_end(self, run_context):
cb_params.cur_epoch_num, cb_params.epoch_num, metric))
self.trainer.performance.update(metric)
- if self.trainer.distributed and os.environ["DEVICE_ID"] != "0":
+ if not self.trainer.is_chief:
return
else:
ReportClient().update(
diff --git a/vega/trainer/callbacks/performance_saver.py b/vega/trainer/callbacks/performance_saver.py
index a6c05b64..07169423 100644
--- a/vega/trainer/callbacks/performance_saver.py
+++ b/vega/trainer/callbacks/performance_saver.py
@@ -28,8 +28,6 @@ def __init__(self, best=True, after_epoch=True, after_train=True):
def before_train(self, logs=None):
"""Be called before the training process."""
- self.is_chief = self.params['is_chief']
- self.do_validation = self.params['do_validation']
self.summary_perfs = logs.get('summary_perfs', {})
self.step_name = self.trainer.step_name
self.worker_id = self.trainer.worker_id
@@ -43,7 +41,7 @@ def before_train(self, logs=None):
def after_epoch(self, epoch, logs=None):
"""Be called after the training epoch."""
logging.debug("train record: saver performance after epoch run successes.")
- if not (self.is_chief and self.save_after_epoch):
+ if not (self.trainer.is_chief and self.save_after_epoch):
return
self._update_pfm(logs)
@@ -54,7 +52,7 @@ def after_train(self, logs=None):
def _update_pfm(self, logs):
self.summary_perfs = logs.get('summary_perfs', {})
- best_changed = self.summary_perfs.get('best_valid_perfs_changed', False)
+ best_changed = self.summary_perfs.get('best_changed', False)
if self.save_best and best_changed:
pfm = self._get_best_perf(self.summary_perfs)
self.trainer.best_performance = pfm
diff --git a/vega/trainer/callbacks/progress_logger.py b/vega/trainer/callbacks/progress_logger.py
index 9fe8c899..50ee17b7 100644
--- a/vega/trainer/callbacks/progress_logger.py
+++ b/vega/trainer/callbacks/progress_logger.py
@@ -52,8 +52,7 @@ def before_train(self, logs=None):
self.total_time_pre_reports = []
self.time_total_reports = []
logging.debug("Start the unified trainer ... ")
- self.is_chief = self.params['is_chief']
- self.do_validation = self.params['do_validation']
+ self.do_validation = self.trainer.do_validation
def before_train_step(self, batch_index, logs=None):
"""Be called before a batch training."""
@@ -69,7 +68,7 @@ def before_epoch(self, epoch, logs=None):
def after_train_step(self, batch_index, logs=None):
"""Be called before each batch training."""
self.total_time_pre_reports.append(time.perf_counter() - self.step_start_time)
- if self.train_verbose >= 2 and self.is_chief \
+ if self.train_verbose >= 2 and self.trainer.is_chief \
and batch_index % self.train_report_steps == 0:
metrics_results = logs.get('train_step_metrics', None)
lr = logs['lr']
@@ -82,8 +81,11 @@ def after_train_step(self, batch_index, logs=None):
logging.warning("Cant't get the loss, maybe the loss doesn't update in the metric evaluator.")
time_pre_batch = statistics.mean(self.total_time_pre_reports)
- self.time_total_reports.append(sum(self.total_time_pre_reports))
- time_pre_report = statistics.mean(self.time_total_reports) / self.train_report_steps
+ if batch_index == 0:
+ time_pre_report = time_pre_batch
+ else:
+ self.time_total_reports.append(sum(self.total_time_pre_reports))
+ time_pre_report = statistics.mean(self.time_total_reports) / self.train_report_steps
self.total_time_pre_reports.clear()
if metrics_results is not None:
@@ -110,7 +112,7 @@ def after_train_step(self, batch_index, logs=None):
def after_valid_step(self, batch_index, logs=None):
"""Be called after each batch of the validation."""
- if self.valid_verbose >= 2 and self.is_chief \
+ if self.valid_verbose >= 2 and self.trainer.is_chief \
and self.do_validation and batch_index % self.valid_report_steps == 0:
metrics_results = logs.get('valid_step_metrics', None)
if metrics_results is not None:
@@ -124,7 +126,7 @@ def after_valid_step(self, batch_index, logs=None):
def after_valid(self, logs=None):
"""Be called after validation."""
- if (self.valid_verbose >= 1 and self.is_chief and self.do_validation):
+ if (self.valid_verbose >= 1 and self.trainer.is_chief and self.do_validation):
cur_valid_perfs = logs.get('cur_valid_perfs', None)
best_valid_perfs = logs.get('best_valid_perfs', None)
if cur_valid_perfs is not None:
diff --git a/vega/trainer/callbacks/report_callback.py b/vega/trainer/callbacks/report_callback.py
index 4e15902d..07c9eada 100644
--- a/vega/trainer/callbacks/report_callback.py
+++ b/vega/trainer/callbacks/report_callback.py
@@ -9,7 +9,7 @@
# MIT License for more details.
"""Report callback defination."""
-import os
+
import logging
from .callback import Callback
from vega.report import ReportClient
@@ -52,9 +52,8 @@ def after_train(self, logs=None):
def _update_report(self, epoch=0):
if self.trainer.standalone:
return
- if self.trainer.distributed:
- if "DEVICE_ID" in os.environ and os.environ.get("DEVICE_ID") != "0":
- return
+ if not self.trainer.is_chief:
+ return
try:
record = ReportClient().get_record(self.trainer.step_name, self.trainer.worker_id)
except Exception as e:
@@ -62,7 +61,10 @@ def _update_report(self, epoch=0):
return
if hasattr(self.trainer.model, '_arch_params_type') and self.trainer.model._arch_params_type:
if vega.is_ms_backend():
- record.desc = self.trainer.model_desc
+ if hasattr(self.trainer.model, "to_desc"):
+ record.desc = self.trainer.model.to_desc()
+ else:
+ record.desc = self.trainer.model_desc
else:
record.desc = self.trainer.model.to_desc()
if not record.desc:
@@ -95,6 +97,8 @@ def _update_report(self, epoch=0):
def _next_rung(self, record):
if self.trainer.standalone:
return
+ if not self.trainer.is_chief:
+ return
result = ReportClient().request(action="next_rung", **record.to_dict())
logging.debug(f"next rung result: {result}")
diff --git a/vega/trainer/callbacks/timm_trainer_callback.py b/vega/trainer/callbacks/timm_trainer_callback.py
index d124116a..defa3517 100644
--- a/vega/trainer/callbacks/timm_trainer_callback.py
+++ b/vega/trainer/callbacks/timm_trainer_callback.py
@@ -176,11 +176,9 @@ def _init_all_settings(self): # noqa: C901
self.config = self.trainer.config
if self.trainer.hps and self.trainer.hps.get('trainer'):
self.config.from_dict(self.trainer.hps.get('trainer'))
- self.trainer._init_distributed_setting()
if not vega.is_cpu_device():
self.trainer._init_setting()
- self.epochs = self.trainer.epochs
- self.distributed = self.trainer.distributed
+ self.distributed = self.trainer.horovod
self.trainer.model = self._init_model()
self.model = self.trainer.model
self.use_syncbn = self.config.syncbn
@@ -193,9 +191,7 @@ def _init_all_settings(self): # noqa: C901
self.model_ema = self._init_model_ema()
self.trainer.lr_scheduler = self._init_lr_scheduler()
self.trainer.loss = self._init_loss()
- if self.distributed:
- self.trainer._init_horovod_setting()
- self.use_amp = self.config.amp
+ self.use_amp = self.config.use_amp
if self.use_amp:
self.trainer.model, self.trainer.optimizer = amp.initialize(self.trainer.model,
self.trainer.optimizer,
diff --git a/vega/trainer/conf.py b/vega/trainer/conf.py
index 94f22748..b096ad2c 100644
--- a/vega/trainer/conf.py
+++ b/vega/trainer/conf.py
@@ -7,7 +7,10 @@
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# MIT License for more details.
+
"""Default configs."""
+
+import os
from .modules.conf.loss import LossConfig
from .modules.conf.lr_scheduler import LrSchedulerConfig
from .modules.conf.optim import OptimConfig
@@ -58,8 +61,9 @@ class TrainerConfig(ConfigSerializable):
epochs = 1
valid_interval = 1
syncbn = False
- amp = False
- opt_level = 'O1'
+ use_amp = False
+ keep_batchnorm_fp32 = False
+ opt_level = 'O2'
lazy_built = False
callbacks = None
grad_clip = None
@@ -103,6 +107,9 @@ class TrainerConfig(ConfigSerializable):
mixup = False
multi_task = False
adaptive_muti_loss = False
+ eval_per_epoch = True
+ # script runner
+ script = None
@classmethod
def set_task(cls, task):
@@ -118,6 +125,8 @@ def from_dict(cls, data, skip_check=True):
"""Restore config from a dictionary or a file."""
if "task" in data.keys() and data["task"] != cls.task and data["task"] is not None:
cls.set_task(data["task"])
+ if "script" in data.keys() and data["script"] is not None:
+ data["script"] = os.path.abspath(data["script"])
return super(TrainerConfig, cls).from_dict(data, skip_check)
@classmethod
diff --git a/vega/trainer/deserialize.py b/vega/trainer/deserialize.py
index 4753cd83..4b29c4e8 100644
--- a/vega/trainer/deserialize.py
+++ b/vega/trainer/deserialize.py
@@ -24,24 +24,6 @@ def _get_worker_config(worker):
from vega.evaluator.conf import EvaluatorConfig
from vega.core.pipeline.conf import PipeStepConfig
- env = {
- "LOCAL_RANK": os.environ.get("LOCAL_RANK", None),
- "PYTHONPATH": os.environ.get("PYTHONPATH", None),
- "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", None),
- "PWD": os.environ.get("PWD", None),
- "DLS_JOB_ID": os.environ.get("DLS_JOB_ID", None),
- "RANK_TABLE_FILE": os.environ.get("RANK_TABLE_FILE", None),
- "RANK_SIZE": os.environ.get("RANK_SIZE", None),
- "DEVICE_ID": os.environ.get("DEVICE_ID", None),
- "RANK_ID": os.environ.get("RANK_ID", None),
- "DLS_TASK_NUMBER": os.environ.get("DLS_TASK_NUMBER", None),
- "NPU-VISIBLE-DEVICES": os.environ.get("NPU-VISIBLE-DEVICES", None),
- "NPU_VISIBLE_DEVICES": os.environ.get("NPU_VISIBLE_DEVICES", None),
- "PATH": os.environ.get("PATH", None),
- "ASCEND_OPP_PATH": os.environ.get("ASCEND_OPP_PATH", None),
- "DEVICE_CATEGORY": os.environ.get("DEVICE_CATEGORY", None),
- "BACKEND_TYPE": os.environ.get("BACKEND_TYPE", None),
- }
worker_config = {
"class_factory": deepcopy(ClassFactory.__registry__),
"general": General().to_dict(),
@@ -49,12 +31,6 @@ def _get_worker_config(worker):
"model": ModelConfig().to_dict(),
"trainer": worker.config.to_dict(),
"evaluator": EvaluatorConfig().to_dict(),
-
- "worker_nccl_port": worker.worker_nccl_port,
- "world_size": worker.world_size,
- "timeout": worker.timeout,
-
- "env": env,
"pipe_step": PipeStepConfig().to_dict()
}
return worker_config
@@ -63,11 +39,10 @@ def _get_worker_config(worker):
def pickle_worker(workers, id):
"""Pickle worker to file."""
for index, worker in enumerate(workers):
- # pickle config
+ worker_config = _get_worker_config(worker)
config_file = os.path.join(
worker.get_local_worker_path(),
f".{str(id)}.{str(index)}.config.pkl")
- worker_config = _get_worker_config(worker)
with open(config_file, "wb") as f:
pickle.dump(worker_config, f)
# pickle worker
@@ -80,17 +55,10 @@ def pickle_worker(workers, id):
def load_config(config_file):
"""Load config from file."""
- import os
import pickle
- import vega
with open(config_file, 'rb') as f:
config = pickle.load(f)
- for (key, value) in config["env"].items():
- if value is not None:
- os.environ[key] = value
-
- vega.set_backend(os.environ['BACKEND_TYPE'].lower(), os.environ["DEVICE_CATEGORY"])
from vega.common.class_factory import ClassFactory
from vega.common.general import General
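
With the environment snapshot gone, the pickled worker config is a plain dict. A minimal round-trip sketch with placeholder paths and a stripped-down payload (the real config also carries the class registry and pipe-step configs):

```python
# Minimal sketch of the slimmed-down pickle round trip; paths are placeholders.
import pickle

worker_config = {
    "general": {"backend": "pytorch"},
    "trainer": {"epochs": 1},
    # note: no "env", "world_size", or "timeout" entries anymore
}
with open("/tmp/.1.0.config.pkl", "wb") as f:
    pickle.dump(worker_config, f)

with open("/tmp/.1.0.config.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored["trainer"])  # {'epochs': 1}
```
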
diff --git a/vega/trainer/modules/conf/optim.py b/vega/trainer/modules/conf/optim.py
index 7cc0f949..7d614623 100644
--- a/vega/trainer/modules/conf/optim.py
+++ b/vega/trainer/modules/conf/optim.py
@@ -40,7 +40,8 @@ class OptimMappingDict(object):
"""Optimizer Mapping Dictionary."""
type_mapping_dict = dict(
- SGD=dict(torch='SGD', tf='MomentumOptimizer', ms='Momentum'),
+ SGD=dict(torch='SGD', tf='MomentumOptimizer', ms='SGD'),
+ Momentum=dict(torch='SGD', tf='MomentumOptimizer', ms='Momentum'),
Adam=dict(torch='Adam', tf='AdamOptimizer', ms='Adam'),
RMSProp=dict(torch='RMSProp', tf='RMSPropOptimizer', ms='RMSProp')
)
@@ -50,13 +51,26 @@ class OptimMappingDict(object):
lr=dict(torch='lr', tf='learning_rate', ms='learning_rate'),
momentum=dict(torch='momentum', tf='momentum', ms='momentum'),
weight_decay=dict(torch='weight_decay', tf='weight_decay', ms='weight_decay'),
+ no_decay_params=dict(torch=None, tf=None, ms='no_decay_params'),
+ loss_scale=dict(torch=None, tf=None, ms='loss_scale'),
+ ),
+ Momentum=dict(
+ lr=dict(torch='lr', tf='learning_rate', ms='learning_rate'),
+ momentum=dict(torch='momentum', tf='momentum', ms='momentum'),
+ weight_decay=dict(torch='weight_decay', tf='weight_decay', ms='weight_decay'),
+ no_decay_params=dict(torch=None, tf=None, ms='no_decay_params'),
+ loss_scale=dict(torch=None, tf=None, ms='loss_scale'),
),
Adam=dict(
lr=dict(torch='lr', tf='learning_rate', ms='learning_rate'),
weight_decay=dict(torch='weight_decay', tf='weight_decay', ms='weight_decay'),
+ no_decay_params=dict(torch=None, tf=None, ms='no_decay_params'),
+ loss_scale=dict(torch=None, tf=None, ms='loss_scale'),
),
RMSProp=dict(
lr=dict(torch='lr', tf='learning_rate', ms='learning_rate'),
weight_decay=dict(torch='weight_decay', tf='weight_decay', ms='weight_decay'),
+ no_decay_params=dict(torch=None, tf=None, ms='no_decay_params'),
+ loss_scale=dict(torch=None, tf=None, ms='loss_scale'),
)
)
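
For intuition, a standalone sketch of how these mapping tables translate a generic optimizer config into backend-specific names. The tables are abridged copies from above; the `map_optim` helper is an assumption for illustration, not Vega API:

```python
# Standalone sketch: generic optimizer config -> backend-specific names.
type_mapping = dict(
    SGD=dict(torch='SGD', tf='MomentumOptimizer', ms='SGD'),
    Momentum=dict(torch='SGD', tf='MomentumOptimizer', ms='Momentum'),
)
params_mapping = dict(
    Momentum=dict(
        lr=dict(torch='lr', tf='learning_rate', ms='learning_rate'),
        momentum=dict(torch='momentum', tf='momentum', ms='momentum'),
        loss_scale=dict(torch=None, tf=None, ms='loss_scale'),
    ),
)

def map_optim(optim_type, params, backend):
    """Return the backend class name and renamed params, dropping unsupported keys."""
    names = params_mapping[optim_type]
    mapped = {names[k][backend]: v for k, v in params.items()
              if k in names and names[k][backend] is not None}
    return type_mapping[optim_type][backend], mapped

print(map_optim("Momentum", {"lr": 0.01, "momentum": 0.9, "loss_scale": 1024}, "ms"))
# -> ('Momentum', {'learning_rate': 0.01, 'momentum': 0.9, 'loss_scale': 1024})
print(map_optim("Momentum", {"lr": 0.01, "momentum": 0.9, "loss_scale": 1024}, "torch"))
# -> ('SGD', {'lr': 0.01, 'momentum': 0.9}), loss_scale is torch=None so dropped
```
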
diff --git a/vega/trainer/modules/lr_schedulers/__init__.py b/vega/trainer/modules/lr_schedulers/__init__.py
index f1244588..ec8b966d 100644
--- a/vega/trainer/modules/lr_schedulers/__init__.py
+++ b/vega/trainer/modules/lr_schedulers/__init__.py
@@ -11,4 +11,4 @@
from .step_lr import StepLR
from .ca_restart_tf import CosineAnnealingRestartLR
elif vega.is_ms_backend():
- from .ms_lr_scheduler import MultiStepLR, StepLR, CosineAnnealingLR
+ from .ms_lr_scheduler import MultiStepLR, StepLR, CosineAnnealingLR, PolyLR, WarmupScheduler
diff --git a/vega/trainer/modules/lr_schedulers/ms_lr_scheduler.py b/vega/trainer/modules/lr_schedulers/ms_lr_scheduler.py
index 11089de9..678c42c4 100644
--- a/vega/trainer/modules/lr_schedulers/ms_lr_scheduler.py
+++ b/vega/trainer/modules/lr_schedulers/ms_lr_scheduler.py
@@ -111,6 +111,24 @@ def __call__(self, base_lr, global_step, total_epoch):
return lr_each_step
+@ClassFactory.register(ClassType.LR_SCHEDULER)
+class PolyLR():
+ """Applies polynomial decay to generate learning rate array."""
+
+ def __init__(self, optimizer=None, lr_max=0.1):
+ super(PolyLR, self).__init__()
+ self.lr_max = lr_max
+
+ def __call__(self, base_lr, global_step, total_epoch):
+ """Call lr scheduler class."""
+ lr_each_step = []
+ for cur_step in range(global_step):
+ base = 1 - cur_step / global_step
+ lr = self.lr_max * base * base
+ lr_each_step.append(lr)
+ return lr_each_step
+
+
@ClassFactory.register(ClassType.LR_SCHEDULER)
class WarmupScheduler():
"""WarmupScheduler learning rate."""
diff --git a/vega/trainer/modules/optimizer/optim.py b/vega/trainer/modules/optimizer/optim.py
index c0119b16..7e7c0fae 100644
--- a/vega/trainer/modules/optimizer/optim.py
+++ b/vega/trainer/modules/optimizer/optim.py
@@ -9,26 +9,15 @@
# MIT License for more details.
"""Manage LrScheduler class."""
+
+from types import MethodType
import logging
import vega
from vega.common import ClassFactory, ClassType
from ..config_bakcend_map import ConfigBackendMapping
from ..conf.optim import OptimConfig, OptimMappingDict
from vega.common.config import Config
-
-if vega.is_gpu_device():
- try:
- if vega.is_torch_backend():
- import horovod.torch as hvd
- elif vega.is_tf_backend():
- import horovod.tensorflow as hvd
- except Exception:
- pass
-elif vega.is_npu_device() and vega.is_tf_backend():
- from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
-
-if vega.is_tf_backend():
- from vega.trainer.modules.optimizer.optimizer import dynamic_optimizer, dynamic_distributed_optimizer
+from vega.common.general import General
class Optimizer(object):
@@ -72,11 +61,31 @@ def __call__(self, model=None, distributed=False, **kwargs):
if distributed:
optimizer = self.set_distributed(optimizer, model)
elif vega.is_tf_backend():
+ from vega.trainer.modules.optimizer.optimizer import dynamic_optimizer
optimizer = dynamic_optimizer(self.optim_cls, **params)
elif vega.is_ms_backend():
if "dynamic_lr" in kwargs:
params.update({"learning_rate": kwargs["dynamic_lr"]})
learnable_params = [param for param in model.trainable_params() if param.requires_grad]
+ if 'no_decay_params' in kwargs and len(kwargs['no_decay_params']) > 0:
+ logging.info(f"no_decay_params is {kwargs['no_decay_params']}.")
+ decayed_params = []
+ no_decayed_params = []
+ for param in learnable_params:
+ decay_flag = True
+ for no_decay in kwargs['no_decay_params']:
+ if no_decay in param.name:
+ no_decayed_params.append(param)
+ decay_flag = False
+ break
+ if decay_flag:
+ decayed_params.append(param)
+
+ learnable_params = [{'params': decayed_params, 'weight_decay': params['weight_decay']},
+ {'params': no_decayed_params},
+ {'order_params': model.trainable_params()}]
+ if 'no_decay_params' in params:
+ params.pop('no_decay_params')
optimizer = self.optim_cls(learnable_params, **params)
return optimizer
except Exception as ex:
@@ -86,13 +95,26 @@ def __call__(self, model=None, distributed=False, **kwargs):
@classmethod
def set_distributed(cls, optimizer, model=None):
"""Set distributed optimizer."""
- if vega.is_torch_backend():
+ if General.cluster.horovod and vega.is_torch_backend():
+ import horovod.torch as hvd
optimizer = hvd.DistributedOptimizer(optimizer,
named_parameters=model.named_parameters(),
compression=hvd.Compression.none)
- elif vega.is_tf_backend():
- optim_class = hvd.DistributedOptimizer if vega.is_gpu_device() else NPUDistributedOptimizer
- optimizer = dynamic_distributed_optimizer(optim_class, optimizer)
+ elif General.cluster.horovod and vega.is_tf_backend():
+ import horovod.tensorflow as hvd
+ from vega.trainer.modules.optimizer.optimizer import OptimizerStep
+ base_lr = optimizer.base_lr
+ weight_decay = optimizer.weight_decay
+ optimizer = hvd.DistributedOptimizer(optimizer)
+ setattr(optimizer, "base_lr", base_lr)
+ setattr(optimizer, "weight_decay", weight_decay)
+ optimizer.step = MethodType(OptimizerStep.step, optimizer)
+ optimizer.set_lr = MethodType(OptimizerStep.set_lr, optimizer)
+ optimizer.regularize_loss = MethodType(OptimizerStep.regularize_loss, optimizer)
+ elif General.cluster.hccl and vega.is_tf_backend():
+ from npu_bridge.estimator.npu.npu_optimizer import NPUDistributedOptimizer
+ from vega.trainer.modules.optimizer.optimizer import dynamic_distributed_optimizer
+ optimizer = dynamic_distributed_optimizer(NPUDistributedOptimizer, optimizer)
return optimizer
@@ -103,6 +125,7 @@ def set_distributed(cls, optimizer, model=None):
if vega.is_npu_device():
try:
from apex.optimizers import NpuFusedSGD
+
ClassFactory.register_cls(NpuFusedSGD, ClassType.OPTIMIZER)
except Exception:
pass
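
The `no_decay_params` branch above splits trainable parameters by substring match on the parameter name, so that e.g. batch-norm scales and biases are exempt from weight decay. A standalone sketch with dummy parameter objects:

```python
# Standalone sketch of the no_decay_params grouping above, with dummy params.
class P:
    def __init__(self, name):
        self.name = name

params = [P("conv.weight"), P("bn.gamma"), P("bn.beta"), P("fc.bias")]
no_decay_params = ["gamma", "beta", "bias"]   # name substrings to exempt

decayed = [p for p in params
           if not any(nd in p.name for nd in no_decay_params)]
no_decayed = [p for p in params
              if any(nd in p.name for nd in no_decay_params)]
print([p.name for p in decayed])     # ['conv.weight']
print([p.name for p in no_decayed])  # ['bn.gamma', 'bn.beta', 'fc.bias']
```
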
diff --git a/vega/trainer/modules/optimizer/optimizer.py b/vega/trainer/modules/optimizer/optimizer.py
index df7932a2..6cdc9756 100644
--- a/vega/trainer/modules/optimizer/optimizer.py
+++ b/vega/trainer/modules/optimizer/optimizer.py
@@ -9,10 +9,6 @@
# MIT License for more details.
"""TF Adam."""
-import vega
-
-if vega.is_tf_backend():
- import tensorflow as tf
class OptimizerStep(object):
@@ -45,6 +41,7 @@ def step(self, loss, loss_scale, global_step, var_list=None):
def regularize_loss(self, loss):
"""Compute and return l2 loss."""
+ import tensorflow as tf
l2_loss_list = [tf.nn.l2_loss(v) for v in tf.compat.v1.trainable_variables()
if 'batch_normalization' not in v.name]
loss = loss + self.weight_decay * tf.add_n(l2_loss_list)
diff --git a/vega/trainer/run_remote_worker.py b/vega/trainer/run_remote_worker.py
index 01a1682e..ee7e7adc 100644
--- a/vega/trainer/run_remote_worker.py
+++ b/vega/trainer/run_remote_worker.py
@@ -11,53 +11,43 @@
"""Run worker remotely."""
import os
-import pickle
+import sys
import psutil
import logging
import subprocess
-import json
import traceback
import signal
import vega
+from vega.trainer.deserialize import load_config, load_worker
def run_remote_worker(worker_id, worker_path, id, num_workers):
"""Run worker on remote node."""
- from vega.common.utils import init_log
- init_log(level="info",
- log_file=".temp_{}.log".format(worker_id),
- log_path=worker_path)
+ from vega.common.utils import init_log, close_log
+ fh = init_log(level="info",
+ log_file=".temp_{}.log".format(worker_id),
+ log_path=worker_path)
for index in range(num_workers):
- config = _load_config(worker_id, worker_path, id, index)
- if "LD_LIBRARY_PATH" in config["env"] and config["env"]["LD_LIBRARY_PATH"] is not None:
- os.environ["LD_LIBRARY_PATH"] = config["env"]["LD_LIBRARY_PATH"]
- os.environ["PWD"] = config["env"]["PWD"]
os.chdir(os.environ["PWD"])
- vega.set_backend(os.environ['BACKEND_TYPE'].lower(), os.environ["DEVICE_CATEGORY"])
+ if 'PYTHONPATH' in os.environ:
+ os.environ['PYTHONPATH'] = "{}:{}:{}".format(
+ os.environ['PYTHONPATH'], worker_path, os.path.abspath(os.curdir))
+ elif worker_id is not None and worker_path is not None:
+ os.environ['PYTHONPATH'] = "{}:{}".format(
+ worker_path, os.path.abspath(os.curdir))
if vega.is_gpu_device():
- sub_pid_list = call_in_gpu(config, id, worker_id, worker_path, index)
+ sub_pid_list = call_in_gpu(id, worker_id, worker_path, index)
elif vega.is_npu_device():
- os.environ["PYTHONPATH"] = config["env"]["PYTHONPATH"]
- os.environ["PATH"] = config["env"]["PATH"]
- os.environ["ASCEND_OPP_PATH"] = config["env"]["ASCEND_OPP_PATH"]
- sub_pid_list = call_in_npu(config, id, worker_id, worker_path, index)
+ sub_pid_list = call_in_npu(id, worker_id, worker_path, index)
logging.info("DistributedWorker finished!")
for sub_pid in sub_pid_list:
kill_proc_tree(pid=sub_pid)
logging.info("DistributedWorker subprocess cleaned!")
+ close_log(fh)
return 0
-def _load_config(worker_id, worker_path, id, index):
- _config_file = os.path.join(
- worker_path,
- f".{str(id)}.{str(index)}.config.pkl")
- with open(_config_file, 'rb') as f:
- config = pickle.load(f)
- return config
-
-
def kill_proc_tree(pid, sig=signal.SIGKILL, include_parent=True,
timeout=None, on_terminate=None):
"""Kill a process tree (including grandchildren) with signal.
@@ -84,64 +74,27 @@ def kill_proc_tree(pid, sig=signal.SIGKILL, include_parent=True,
return (gone, alive)
-def call_in_gpu(config, id, worker_id, worker_path, index):
+def call_in_gpu(id, worker_id, worker_path, index):
"""Call function based on GPU devices."""
- env = os.environ.copy()
sub_pid_list = []
- worker_nccl_port = config["worker_nccl_port"]
- world_size = config["world_size"]
- if 'CUDA_VISIBLE_DEVICES' in env:
- try:
- first_gpu_id = env['CUDA_VISIBLE_DEVICES'].split(",")[0]
- env['VEGA_WORKER_PORT'] = '{}'.format(worker_nccl_port + int(first_gpu_id))
- except Exception:
- env['VEGA_WORKER_PORT'] = '{}'.format(worker_nccl_port)
- if 'PYTHONPATH' in env:
- env['PYTHONPATH'] = "{}:{}:{}".format(
- env['PYTHONPATH'], worker_path, os.path.abspath(os.curdir))
- elif worker_id is not None and worker_path is not None:
- env['PYTHONPATH'] = "{}:{}".format(
- worker_path, os.path.abspath(os.curdir))
sub_pid = _subprocess(
- config, id, worker_id, worker_path, rank=0, world_size=world_size,
- env=env, is_backend=False, index=index)
+ id, worker_id, worker_path, rank=0, is_backend=False, index=index)
sub_pid_list.append(sub_pid)
return sub_pid_list
-def call_in_npu(config, id, worker_id, worker_path, index):
+def call_in_npu(id, worker_id, worker_path, index):
"""Call function based on NPU devices."""
- env = os.environ.copy()
sub_pid_list = []
- if 'PYTHONPATH' in env:
- env['PYTHONPATH'] = "{}:{}:{}".format(
- env['PYTHONPATH'], worker_path, os.path.abspath(os.curdir))
- elif worker_id is not None and worker_path is not None:
- env['PYTHONPATH'] = "{}:{}".format(
- worker_path, os.path.abspath(os.curdir))
- rank_file = env.get('RANK_TABLE_FILE')
- with open(rank_file, 'r') as f:
- rank_table_json = json.loads(f.read())
- if config["general"].get('dft', False):
- env['RANK_SIZE'] = env['ORIGIN_RANK_SIZE']
- env['RANK_TABLE_FILE'] = env['ORIGIN_RANK_TABLE_FILE']
- else:
- env['RANK_SIZE'] = '1'
- env['DEVICE_ID'] = rank_table_json['server_list'][0]['device'][0]['device_id']
- env['MASTER_ADDR'] = rank_table_json['server_list'][0]['device'][0]['device_ip']
- env['MASTER_PORT'] = rank_table_json['server_list'][0].get('server_port', '29688')
- env['RANK_ID'] = env['DEVICE_ID']
- env.pop('RANK_TABLE_FILE', None)
from vega.common import switch_directory
with switch_directory(worker_path):
sub_pid = _subprocess(
- config, id, worker_id, worker_path, rank=0, world_size=1,
- env=env, is_backend=False, index=index)
+ id, worker_id, worker_path, rank=0, is_backend=False, index=index)
sub_pid_list.append(sub_pid)
return sub_pid_list
-def _subprocess(config, id, worker_id, worker_path, rank, world_size, env, is_backend, index):
+def _subprocess(id, worker_id, worker_path, rank, is_backend, index):
"""Subprocess on each rank.
Load pickle file into worker class, and use subprocess to run the
@@ -156,11 +109,6 @@ def _subprocess(config, id, worker_id, worker_path, rank, world_size, env, is_ba
:param is_backend: backend or not
:type is_backend: bool
"""
- env['RANK'] = "{}".format(rank)
- env['WORLD_SIZE'] = "{}".format(world_size)
-
- _refresh_config_file(config, id, worker_id, worker_path, env, index)
-
config_file = os.path.join(
worker_path,
f".{str(id)}.{str(index)}.config.pkl")
@@ -168,45 +116,39 @@ def _subprocess(config, id, worker_id, worker_path, rank, world_size, env, is_ba
worker_path,
f".{str(id)}.{str(index)}.worker.pkl")
- cmd = "from vega.trainer.deserialize import load_config;"
- cmd += "load_config('{}');".format(config_file)
-
- if 'VEGA_INIT_ENV' in os.environ:
- cmd += os.environ.copy()['VEGA_INIT_ENV']
-
- cmd += "from vega.trainer.deserialize import load_worker;"
- cmd += "worker=load_worker('{}');".format(worker_file)
- cmd += "worker.train_process();"
-
- python_command = config["general"].get('python_command')
-
+ python_command = os.environ["vega_python_command"]
if is_backend:
- proc = subprocess.Popen([python_command, '-c', cmd], close_fds=True, env=env)
+ proc = subprocess.Popen(
+ [python_command, '-m', "vega.trainer.run_remote_worker", config_file, worker_file],
+ close_fds=True,
+ env=os.environ.copy())
pid = proc.pid
else:
try:
- proc = subprocess.Popen([python_command, '-c', cmd], env=env)
+ proc = subprocess.Popen(
+ [python_command, '-m', "vega.trainer.run_remote_worker", config_file, worker_file],
+ env=os.environ.copy())
pid = proc.pid
- proc.wait(timeout=config["timeout"])
+ proc.wait(timeout=int(os.environ["vega_timeout"]))
except Exception:
logging.warn("Timeout worker has been killed.")
logging.warn(traceback.print_exc())
return pid
-def _refresh_config_file(config, id, worker_id, worker_path, env, index):
- config["env"]["RANK"] = env.get("RANK", None)
- config["env"]["WORLD_SIZE"] = env.get("WORLD_SIZE", None)
- config["env"]["PYTHONPATH"] = env.get("PYTHONPATH", None)
- config["env"]["RANK_TABLE_FILE"] = env.get("RANK_TABLE_FILE", None)
- config["env"]["RANK_SIZE"] = env.get("RANK_SIZE", None)
- config["env"]["DEVICE_ID"] = env.get("DEVICE_ID", None)
- config["env"]["RANK_ID"] = env.get("RANK_ID", None)
- config["env"]["MASTER_ADDR"] = env.get("MASTER_ADDR", None)
- config["env"]["MASTER_PORT"] = env.get("MASTER_PORT", None)
+def run_worker():
+ """Run worker."""
+ try:
+ vega.set_backend(os.environ["BACKEND_TYPE"].lower(), os.environ["DEVICE_CATEGORY"])
+ (config_file, worker_file) = sys.argv[1:]
+ load_config(config_file)
+ # cmd += os.environ["vega_init_env"] if "vega_init_env" in os.environ else ""
+ worker = load_worker(worker_file)
+ worker.train_process()
+ except Exception:
+ traceback.print_exc(file=open("./error.log", "w+"))
+ logging.error(traceback.format_exc())
- config_file = os.path.join(
- worker_path,
- f".{str(id)}.{str(index)}.config.pkl")
- with open(config_file, "wb") as f:
- pickle.dump(config, f)
+
+if __name__ == "__main__":
+ run_worker()
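
The inline `python -c` command string is replaced by running this file as a module, with the pickled config and worker passed as arguments. A hedged sketch of the new launch path; paths and env defaults are placeholders:

```python
# Hedged sketch of the new module-based launch; file paths are placeholders.
import os
import subprocess

python_command = os.environ.get("vega_python_command", "python3")
config_file = "/tmp/worker/.1.0.config.pkl"   # hypothetical pickled config
worker_file = "/tmp/worker/.1.0.worker.pkl"   # hypothetical pickled worker

proc = subprocess.Popen(
    [python_command, "-m", "vega.trainer.run_remote_worker", config_file, worker_file],
    env=os.environ.copy())
proc.wait(timeout=int(os.environ.get("vega_timeout", "3600")))
```
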
diff --git a/vega/trainer/script_runner.py b/vega/trainer/script_runner.py
index de35cab0..301ff3ec 100644
--- a/vega/trainer/script_runner.py
+++ b/vega/trainer/script_runner.py
@@ -16,8 +16,9 @@
import os
import pickle
import glob
-from vega.common import init_log, Config
+from vega.common import Config
from vega.common.general import General
+from vega.common.wrappers import train_process_wrapper
from vega.trainer.distributed_worker import DistributedWorker
from vega.trainer.conf import TrainerConfig
from vega.common.class_factory import ClassFactory, ClassType
@@ -38,16 +39,18 @@ def __init__(self, model=None, id=None, hps=None, model_desc=None, **kwargs):
self.hps = self._get_hps(hps)
self.worker_type = WorkerTypes.TRAINER
+ @train_process_wrapper
def train_process(self):
"""Whole train process of the TrainWorker specified in config.
After training, the model and validation results are saved to local_worker_path and s3_path.
"""
- init_log(level=General.logger.level,
- log_file=f"{self.step_name}_worker_{self.worker_id}.log",
- log_path=self.local_log_path)
- self._dump_trial_config()
- self._run_script()
+ try:
+ self._dump_trial_config()
+ self._run_script()
+ except Exception:
+ logger.error(traceback.format_exc())
+ logger.error("Failed to run script.")
def _run_script(self):
"""Run script."""
diff --git a/vega/trainer/trainer_base.py b/vega/trainer/trainer_base.py
index 19187b18..c9101803 100644
--- a/vega/trainer/trainer_base.py
+++ b/vega/trainer/trainer_base.py
@@ -14,7 +14,7 @@
import glob
import logging
import vega
-from vega.common import FileOps, init_log
+from vega.common import FileOps
from vega.common.class_factory import ClassFactory, ClassType
from vega.common.config import Config
from vega.trainer.callbacks import CallbackList
@@ -24,13 +24,15 @@
from vega.datasets import Adapter
from vega.common.general import General
from vega.common.utils import update_dict
+from vega.common.wrappers import train_process_wrapper
class TrainerBase(DistributedWorker):
"""Trainer base class."""
def __init__(self, model=None, id=None, hps=None, load_ckpt_flag=False,
- model_desc=None, multi_task=None, **kwargs):
+ model_desc=None, multi_task=None, horovod=False, hccl=False,
+ **kwargs):
super().__init__()
self.config = TrainerConfig()
@@ -55,7 +57,7 @@ def __init__(self, model=None, id=None, hps=None, load_ckpt_flag=False,
self.lr_scheduler = None
self.loss = None
self.use_syncbn = self.config.syncbn
- self.use_amp = self.config.amp
+ self.use_amp = self.config.use_amp
self.train_metrics = None
self.valid_metrics = None
self.call_metrics_on_train = self.config.call_metrics_on_train
@@ -83,17 +85,16 @@ def __init__(self, model=None, id=None, hps=None, load_ckpt_flag=False,
self._start_epoch = 0
self.visual_data = {}
self.load_ckpt_flag = load_ckpt_flag
- self.distributed = self.config.distributed
- if vega.is_gpu_device():
- self.distributed = not General._parallel and self.distributed
+ self.ddp = General._parallel and General.devices_per_trainer > 1 and vega.is_gpu_device()
+ self.horovod = horovod
+ self.hccl = hccl
+ self.num_workers = General.cluster.num_workers
+ self.sampler = None
# Used by TimmTrainerCallbacks since it builds its trainer in
# the before_train callback
self.lazy_built = self.config.lazy_built
# Indicate whether the necessary components of a trainer
# has been built for running
- self._world_size = 1
- self._rank_id = 0
- self._local_rank_id = 0
self._next_rung = False
self.config.kwargs = kwargs
self.checkpoint_file_name = 'checkpoint.pth'
@@ -112,23 +113,22 @@ def __init__(self, model=None, id=None, hps=None, load_ckpt_flag=False,
TrainerConfig.model_desc = model_desc
self.standalone = General.cluster.master_ip is None or General.message_port is None
+ @train_process_wrapper
def train_process(self):
"""Whole train process of the TrainWorker specified in config.
After training, the model and validation results are saved to local_worker_path and s3_path.
"""
- init_log(level=General.logger.level,
- log_file=f"{self.step_name}_worker_{self.worker_id}.log",
- log_path=self.local_log_path)
if self.standalone:
logging.info("Standalone mode. The result data will not be sent to server through report.")
- self._set_default_funcs()
- self._set_condition()
+ self.init_env()
self._init_callbacks()
self.callbacks.init_trainer()
+ self.set_training_settings()
if not self.lazy_built:
self.build()
self._train_loop()
+ self.closeout()
return self.model
def build(self):
@@ -147,48 +147,27 @@ def build(self):
self.batch_num_train = len(self.train_loader)
self.batch_num_valid = len(self.valid_loader)
- def train(self, inputs, labels):
- """Train model."""
+ def set_training_settings(self):
+ """Set training settings."""
pass
- def predict(self, input):
- """Inference model."""
- pass
-
- def save(self, file_name):
- """Save model."""
- pass
-
- def load(self, model_name, by_name):
- """Load model."""
- pass
-
- def set_weights(self, weights):
- """Set weight with memory tensor."""
- pass
-
- def get_weights(self):
- """Get the weights."""
- pass
-
- def _train_epoch(self):
- pass
-
- def _valid_epoch(self):
- pass
-
- def _set_default_funcs(self):
- pass
-
- def _set_condition(self):
- pass
-
- def _init_tf_estimator(self):
- pass
-
- def _init_horovod_setting(self):
- """Init horovod setting."""
- self.is_chief = True
+ def init_env(self):
+ """Init trainer environment."""
+ self.num_workers = General.cluster.num_workers
+ if self.hccl:
+ self.rank_id = int(os.environ.get('RANK_ID', "0"))
+ self.device_id = int(os.environ.get('DEVICE_ID', "0"))
+ if not General.cluster.show_all_ranks:
+ self.is_chief = "-" not in str(self.worker_id)
+ elif self.horovod:
+ if vega.is_torch_backend():
+ import horovod.torch as hvd
+ elif vega.is_tf_backend():
+ import horovod.tensorflow as hvd
+ hvd.init()
+ self.rank_id = hvd.rank()
+ if not General.cluster.show_all_ranks:
+ self.is_chief = self.rank_id == 0
def _init_hps(self, hps=None):
"""Load hps from file."""
@@ -217,20 +196,6 @@ def _init_hps(self, hps=None):
self.load_checkpoint = self.config.load_checkpoint
self.epochs = self.config.epochs
- def _init_minimize_op(self, loss, global_step, var_list=None):
- """Init loss minimize operation, include loss scale method."""
- loss_scale = self.config.loss_scale if self.use_amp else 1.
- if loss_scale != 1:
- scaled_grad_vars = self.optimizer.compute_gradients(loss * loss_scale, var_list=var_list)
- unscaled_grad_vars = []
- for grad, var in scaled_grad_vars:
- unscaled_grad_vars.append((grad, var) if grad is None else (grad / loss_scale, var))
- minimize_op = self.optimizer.apply_gradients(unscaled_grad_vars, global_step)
- else:
- grad_vars = self.optimizer.compute_gradients(loss, var_list=var_list)
- minimize_op = self.optimizer.apply_gradients(grad_vars, global_step)
- return minimize_op
-
def _init_metrics(self, metrics=None):
"""Init metrics."""
if metrics is not None:
@@ -266,10 +231,13 @@ def _init_dataloader(self, mode, loader=None, transforms=None):
dataset = dataset_cls(mode=mode)
if transforms is not None:
dataset.transforms = transforms
- if self.distributed and mode == "train":
- dataset.set_distributed(self._world_size, self._rank_id)
+ if (self.hccl or self.horovod) and mode == "train":
+ dataset.set_distributed(self.num_workers, self.rank_id)
# adapt the dataset to specific backend
- dataloader = Adapter(dataset).loader
+ adapter = Adapter(dataset)
+ if (self.hccl or self.horovod) and mode == "train" and hasattr(adapter, "sampler"):
+ self.sampler = adapter.sampler
+ dataloader = adapter.loader
return dataloader
def _train_loop(self):
@@ -294,18 +262,15 @@ def _train_loop(self):
if self.do_validation:
epoch_logs.update({'valid_num_batches': self.batch_num_valid})
self.callbacks.before_epoch(epoch, epoch_logs)
- if self.config.with_train:
+ if self.config.with_train and hasattr(self, "_train_epoch"):
self._train_epoch()
- if self.do_validation and self._should_run_validation(epoch):
+ if self.do_validation and hasattr(self, "_valid_epoch") and self._should_run_validation(epoch):
self._valid_epoch()
self.callbacks.after_epoch(epoch)
self.callbacks.after_train()
if not self._next_rung:
break
- if self.distributed:
- self._shutdown_distributed()
-
def _should_run_validation(self, epoch):
# Zero valid_interval means doesn't run _valid_loop of the trainer
# and user may provide _valid_loop in other callbacks
@@ -324,19 +289,6 @@ def _init_callbacks(self):
self.callbacks = CallbackList(customs, disables)
self.callbacks.set_trainer(self)
- def _metric_average(self, val, name):
- """Do metric average.
-
- :param val: input value
- :param name: metric name
- :return:
- """
- import torch
- import horovod.torch as hvd
- tensor = torch.tensor(val)
- avg_tensor = hvd.allreduce(tensor, name=name)
- return avg_tensor.item()
-
def _backup(self):
"""Backup result worker folder."""
if self.need_backup is True and self.backup_base_path is not None:
@@ -345,7 +297,6 @@ def _backup(self):
FileOps.copy_folder(
self.get_local_worker_path(self.step_name, self.worker_id), backup_worker_path)
- def _shutdown_distributed(self):
- if vega.is_npu_device() and self.distributed and vega.is_tf_backend():
- self.sess.run(self.npu_shutdown)
- self.sess.close()
+ def closeout(self):
+ """Closeout."""
+ pass
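
A hedged sketch of the Horovod branch of the new `init_env()` above (PyTorch backend assumed; requires a working Horovod installation):

```python
# Hedged sketch of Horovod rank/chief detection as in init_env() above.
import horovod.torch as hvd

hvd.init()
rank_id = hvd.rank()        # this process's global rank
num_workers = hvd.size()    # total number of ranks
is_chief = rank_id == 0     # by default only the chief reports results
```
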
diff --git a/vega/trainer/trainer_ms.py b/vega/trainer/trainer_ms.py
index 705396ed..4a33ec49 100644
--- a/vega/trainer/trainer_ms.py
+++ b/vega/trainer/trainer_ms.py
@@ -13,7 +13,9 @@
import os
from mindspore import context
from mindspore.train import Model as MsModel
-from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
+from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
+from mindspore.train.loss_scale_manager import FixedLossScaleManager
+from mindspore import save_checkpoint
from vega.trainer.callbacks.ms_callbacks import EvalCallBack
import vega
from vega.trainer.trainer_base import TrainerBase
@@ -22,8 +24,7 @@
from vega.modules.loss import Loss
from vega.common import ClassFactory, ClassType
import logging
-from mindspore.communication.management import init as hccl_init
-from mindspore.context import ParallelMode
+from vega.common.general import General
@ClassFactory.register(ClassType.TRAINER)
@@ -33,14 +34,17 @@ class TrainerMs(TrainerBase):
def build(self):
"""Build the trainer by assembling the necessary components."""
super().build()
+ no_decay_params = self.config.optimizer.params.get("no_decay_params", [])
if self.config.lr_scheduler.params:
self.lr_scheduler = LrScheduler()
dynamic_lr = self.lr_scheduler()(base_lr=self.config.optimizer.params["lr"],
global_step=self.config.epochs * len(self.train_loader),
total_epoch=self.config.epochs)
- self.optimizer = Optimizer()(model=self.model, dynamic_lr=dynamic_lr)
+
+ self.optimizer = Optimizer()(model=self.model, dynamic_lr=dynamic_lr, no_decay_params=no_decay_params)
else:
- self.optimizer = Optimizer()(model=self.model)
+ self.optimizer = Optimizer()(model=self.model, no_decay_params=no_decay_params)
+ logging.info(f"The optimizer is {self.optimizer}.")
if hasattr(self.model, 'add_loss'):
loss_cls = Loss()()
self.model.add_loss(loss_cls)
@@ -55,14 +59,33 @@ def build(self):
self.ms_metrics = self.valid_metrics() if isinstance(self.valid_metrics(), dict) else {
self.metric_name: self.valid_metrics()}
- self.ms_model = MsModel(network=self.model,
- loss_fn=self.loss,
- optimizer=self.optimizer,
- metrics=self.ms_metrics)
+ if self.use_amp:
+ loss_scale = FixedLossScaleManager(self.config.loss_scale, drop_overflow_update=False)
+ logging.info(f"Use auto mix precision, and loss scale is {self.config.loss_scale},"
+ f"loss_scale_manager is {loss_scale}.")
+ self.ms_model = MsModel(network=self.model,
+ loss_fn=self.loss,
+ optimizer=self.optimizer,
+ metrics=self.ms_metrics,
+ loss_scale_manager=loss_scale,
+ amp_level=self.config.opt_level,
+ keep_batchnorm_fp32=self.config.keep_batchnorm_fp32)
+ else:
+ self.ms_model = MsModel(network=self.model,
+ loss_fn=self.loss,
+ optimizer=self.optimizer,
+ metrics=self.ms_metrics)
+
+ if not self.config.with_train:
+ save_path = self.get_local_worker_path(self.step_name, self.worker_id)
+ ckpt_file_name = os.path.join(save_path, "model_" + str(self.worker_id) + ".ckpt")
+ save_checkpoint(self.model, ckpt_file_name)
+ logging.info("Save checkpoint file without training.")
- def _set_condition(self):
+ def init_env(self):
+ """Init mindspore trainer environment."""
+ super().init_env()
self._init_ms_context()
- self._init_distributed_setting()
def _train_epoch(self):
config_ck = CheckpointConfig(save_checkpoint_steps=self.config.save_steps, keep_checkpoint_max=1)
@@ -70,8 +93,12 @@ def _train_epoch(self):
save_path = self.get_local_worker_path(self.step_name, self.worker_id)
ckpoint_cb = ModelCheckpoint(config=config_ck, directory=save_path)
loss_cb = LossMonitor(per_print_times=1)
- eval_cb = EvalCallBack(self.ms_model, self.valid_loader, self.dataset_sink_mode, self)
- callback_list = [ckpoint_cb, loss_cb] if self.config.mixup else [ckpoint_cb, loss_cb, eval_cb]
+ time_cb = TimeMonitor(data_size=self.train_loader.get_dataset_size())
+ callback_list = [ckpoint_cb, loss_cb, time_cb]
+ if self.config.eval_per_epoch and not self.config.mixup:
+ eval_cb = EvalCallBack(self.ms_model, self.valid_loader, self.dataset_sink_mode, self)
+ callback_list.append(eval_cb)
+
try:
self.ms_model.train(epoch=self.epochs,
train_dataset=self.train_loader,
@@ -102,22 +129,17 @@ def _valid_epoch(self):
logging.warning("RuntimeError occurred when eval the model. Skip eval this model.")
logging.warning("The RuntimeError message is : {}.".format(exc))
- def _init_distributed_setting(self):
- if not self.distributed:
- return
- else:
- logging.info("init hccl ...")
- context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL, gradients_mean=True)
- hccl_init()
-
def _init_ms_context(self):
- if hasattr(self.config, "execute_mode"):
- mode = context.PYNATIVE_MODE if self.config.execute_mode == "PYNATIVE_MODE" else context.GRAPH_MODE
- else:
- mode = context.GRAPH_MODE
+ mode = General.ms_execute_mode
+ logging.info(f"Run train/val in mode: {mode}.")
if vega.is_npu_device():
- context.set_context(mode=mode, device_target="Ascend", device_id=int(os.environ["DEVICE_ID"]))
+ logging.info(f"minspore context, mode: {context.get_context('mode')}, "
+ f"target: {context.get_context('device_target')}, "
+ f"device_id: {context.get_context('device_id')}")
+ logging.info(f"DEVICE_ID: {os.environ['DEVICE_ID']}")
+ context.set_context(mode=mode, device_target="Ascend")
else:
context.set_context(mode=mode, device_target="CPU")
- self.dataset_sink_mode = True if vega.is_npu_device() else False
+ self.dataset_sink_mode = General.dataset_sink_mode
+ logging.info(f"Dataset_sink_mode:{self.dataset_sink_mode}.")
diff --git a/vega/trainer/trainer_tf.py b/vega/trainer/trainer_tf.py
index b28ccc8a..1f32da4f 100644
--- a/vega/trainer/trainer_tf.py
+++ b/vega/trainer/trainer_tf.py
@@ -12,7 +12,6 @@
import logging
import tensorflow as tf
-from vega.common.general import General
import vega
from vega.trainer.trainer_base import TrainerBase
from vega.modules.loss import Loss
@@ -28,18 +27,15 @@ class TrainerTf(TrainerBase):
def build(self):
"""Build the trainer by assembling the necessary components."""
super().build()
-
# Some trainer has different train batch size from valid batch
self.train_metrics = None
self.valid_metrics = self._init_metrics()
- self._init_horovod_setting()
- def _set_default_funcs(self):
+ def set_training_settings(self):
+ """Set trainer training settings."""
self.model_fn = self._default_model_fn
self.train_input_fn = self._default_train_input_fn
self.valid_input_fn = self._default_valid_input_fn
-
- def _set_condition(self):
self._init_tf_session()
self._init_distributed_setting()
self._init_tf_estimator()
@@ -68,10 +64,7 @@ def _valid_epoch(self):
self.callbacks.after_valid(valid_logs)
def _init_distributed_setting(self):
- if not self.distributed:
- return
-
- if vega.is_npu_device():
+ if self.hccl:
sess_config = self._init_session_config()
self.sess = tf.compat.v1.Session(config=sess_config)
from npu_bridge.estimator import npu_ops
@@ -79,19 +72,6 @@ def _init_distributed_setting(self):
self.npu_shutdown = npu_ops.shutdown_system()
self.sess.run(self.npu_init)
- if vega.is_gpu_device():
- import horovod.tensorflow as hvd
- self._world_size = hvd.size()
- self._rank_id = hvd.rank()
- self._local_rank_id = hvd.local_rank()
- elif vega.is_npu_device():
- from hccl.manage.api import get_local_rank_id
- from hccl.manage.api import get_rank_size
- from hccl.manage.api import get_rank_id
- self._world_size = get_rank_size()
- self._rank_id = get_rank_id()
- self._local_rank_id = get_local_rank_id()
-
def _default_train_input_fn(self):
return self.train_loader.input_fn()
@@ -135,10 +115,11 @@ def _default_model_fn(self, features, labels, mode):
if mode == tf.estimator.ModeKeys.TRAIN:
global_step = tf.compat.v1.train.get_or_create_global_step()
epoch = tf.cast(global_step, tf.float32) / tf.cast(len(self.train_loader), tf.float32)
- self.optimizer = Optimizer()(distributed=self.distributed)
+ distributed = self.horovod or self.hccl
+ self.optimizer = Optimizer()(distributed=distributed)
self.lr_scheduler = LrScheduler()(optimizer=self.optimizer)
self.lr_scheduler.step(epoch)
- if self.distributed:
+ if distributed:
self.optimizer = Optimizer.set_distributed(self.optimizer)
update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
@@ -195,7 +176,7 @@ def _init_session_config(self):
def _init_logging_hook(self):
logging_hook = []
- if vega.is_gpu_device() and self.distributed:
+ if self.horovod:
import horovod.tensorflow as hvd
logging_hook += [hvd.BroadcastGlobalVariablesHook(0)]
return logging_hook
@@ -203,7 +184,7 @@ def _init_logging_hook(self):
def _init_gpu_estimator(self, sess_config):
"""Init tensorflow estimator."""
distribution = None
- if not self.distributed and General._parallel and General.devices_per_trainer > 1:
+ if self.horovod:
distribution = tf.contrib.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(model_dir=self.get_local_worker_path(),
save_checkpoints_steps=self.config.save_steps,
@@ -228,9 +209,9 @@ def _init_npu_estimator(self, sess_config):
def _init_gpu_session_config(self):
sess_config = tf.compat.v1.ConfigProto()
sess_config.gpu_options.allow_growth = True
- if self.distributed:
- import horovod.tensorflow as hvd
- sess_config.gpu_options.visible_device_list = str(hvd.local_rank())
+ # if self.horovod:
+ # import horovod.tensorflow as hvd
+ # sess_config.gpu_options.visible_device_list = str(hvd.local_rank())
return sess_config
def _init_npu_session_config(self):
@@ -242,5 +223,4 @@ def _init_npu_session_config(self):
if self.use_amp:
custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("allow_mix_precision")
custom_op.parameter_map["use_off_line"].b = True
-
return sess_config
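
A hedged Horovod/TensorFlow sketch mirroring the distributed-optimizer and broadcast-hook branches above (TF1-style API; requires Horovod built with TensorFlow support):

```python
# Hedged Horovod/TF sketch of the distributed branches above.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()
optimizer = tf.compat.v1.train.MomentumOptimizer(
    learning_rate=0.01 * hvd.size(),  # common practice: scale LR by world size
    momentum=0.9)
optimizer = hvd.DistributedOptimizer(optimizer)
hooks = [hvd.BroadcastGlobalVariablesHook(0)]  # sync initial weights from rank 0
```
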
diff --git a/vega/trainer/trainer_torch.py b/vega/trainer/trainer_torch.py
index 94cee6da..fdd79ec3 100644
--- a/vega/trainer/trainer_torch.py
+++ b/vega/trainer/trainer_torch.py
@@ -27,7 +27,7 @@ def build(self):
"""Build the trainer by assembling the necessary components."""
super().build()
if self.optimizer is None:
- self.optimizer = Optimizer()(model=self.model, distributed=self.distributed)
+ self.optimizer = Optimizer()(model=self.model, distributed=self.horovod)
if hasattr(self.model, 'add_loss'):
loss_cls = Loss()()
self.model.add_loss(loss_cls)
@@ -38,7 +38,9 @@ def build(self):
self.loss.adaptive_muti_loss(save_path=self.get_local_worker_path(self.step_name, self.worker_id),
weight=self.config.loss_weight)
if self.lr_scheduler is None:
- self.lr_scheduler = LrScheduler()(self.optimizer)
+ self.lr_scheduler = LrScheduler()(self.optimizer,
+ steps=len(self.train_loader) if self.train_loader is not None else None,
+ epochs=self.config.epochs)
if self.actions_list is not None:
self.total_optimizer = self.optimizer
self.total_loss = self.loss
@@ -46,13 +48,13 @@ def build(self):
# Some trainer has different train batch size from valid batch
self.train_metrics = self._init_metrics()
self.valid_metrics = self._init_metrics()
- self._init_horovod_setting()
if self.use_amp:
from apex import amp
self.model, self.optimizer = amp.initialize(
self.model, self.optimizer, opt_level=self.config.opt_level, loss_scale=64, combine_grad=True)
- def _set_default_funcs(self):
+ def set_training_settings(self):
+ """Set trainer training setting."""
self.make_batch = self._default_make_batch
if isinstance(self.config.optimizer, list):
self.train_step = self._multi_train_step
@@ -60,8 +62,9 @@ def _set_default_funcs(self):
self.train_step = self._default_train_step
self.valid_step = self._default_valid_step
- def _set_condition(self):
- self._init_distributed_setting()
+ def init_env(self):
+ """Init trainer environment."""
+ super().init_env()
torch.manual_seed(self.config.seed)
self._init_setting()
@@ -70,8 +73,6 @@ def _init_setting(self):
if vega.is_gpu_device():
import torch.cuda
self.config.device = vega.is_gpu_device() if vega.is_gpu_device() is not True else 0
- if self.distributed:
- torch.cuda.set_device(self._local_rank_id)
torch.cuda.manual_seed(self.config.seed)
elif vega.is_npu_device():
import torch.npu
@@ -83,25 +84,6 @@ def _init_setting(self):
else:
raise ValueError('Set a correct device: cuda or npu.')
- def _init_distributed_setting(self):
- if self.distributed:
- import horovod.torch as hvd
- self._world_size = hvd.size()
- self._rank_id = hvd.rank()
- self._local_rank_id = hvd.local_rank()
-
- def _init_horovod_setting(self):
- """Init horovod setting."""
- self.is_chief = True
- if self.distributed:
- import horovod.torch as hvd
- hvd.broadcast_parameters(self.model.state_dict(), root_rank=0)
- hvd.broadcast_optimizer_state(self.optimizer, root_rank=0)
- if hvd.rank() != 0:
- self.is_chief = False
- else:
- self.is_chief = True
-
def _train_epoch(self):
self.model.train()
for batch_index, batch in enumerate(self.train_loader):
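
The torch trainer now forwards `steps` and `epochs` to the LR scheduler wrapper, since per-step schedulers need the total update count, not just the epoch count. A plain-PyTorch illustration (Vega's `LrScheduler` wrapper itself is not shown):

```python
# Plain-PyTorch sketch: a per-step scheduler needs steps_per_epoch * epochs updates.
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
steps_per_epoch, epochs = 391, 10        # e.g. CIFAR-10 with batch size 128
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=steps_per_epoch * epochs)
for _ in range(steps_per_epoch * epochs):
    optimizer.step()                     # training step elided
    scheduler.step()                     # one LR update per batch, not per epoch
```
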
diff --git a/vega/trainer/trial_agent.py b/vega/trainer/trial_agent.py
index 85769d6e..90d3070b 100644
--- a/vega/trainer/trial_agent.py
+++ b/vega/trainer/trial_agent.py
@@ -14,13 +14,9 @@
import logging
import pickle
import vega
-from vega.common import init_log
-from vega.common.task_ops import TaskOps
from vega.common.general import General
from vega.report.report_client import ReportClient
-logger = logging.getLogger(__name__)
-
class TrialAgent(object):
"""Trial."""
@@ -28,9 +24,6 @@ class TrialAgent(object):
def __init__(self):
self._load_config()
vega.set_backend(General.backend, General.device_category)
- init_log(level=General.logger.level,
- log_file=f"{General.step_name}_worker_{self.worker_id}.log",
- log_path=TaskOps().local_log_path)
self.report_client = ReportClient()
def _load_config(self):
diff --git a/vega/trainer/utils.py b/vega/trainer/utils.py
index e7c5e7b4..967df803 100644
--- a/vega/trainer/utils.py
+++ b/vega/trainer/utils.py
@@ -9,14 +9,12 @@
# MIT License for more details.
"""Utils functions that been used in pipeline."""
+
import os
import socket
-import subprocess
-import sys
import logging
import signal
import psutil
-from collections import OrderedDict
from enum import Enum
from vega.common import FileOps
from vega.common.task_ops import TaskOps
@@ -31,88 +29,6 @@ class WorkerTypes(Enum):
DeviceEvaluator = 5
-class PairDictQueue():
- """A special Dict Queue only for Master to use to collect all finished Evaluator results.
-
- the insert and pop item could only be string or int.
- as a example for how to used in Evalutor, the stored odict could be :
- {
- "step_name::worker1": {"EVALUATE_GPU":0, "EVALUATE_DLOOP":0},
- "step_name::worker2": {"EVALUATE_GPU":0, "EVALUATE_DLOOP":1},
- "step_name::worker3": {"EVALUATE_GPU":1, "EVALUATE_DLOOP":0},
- "step_name::worker4": {"EVALUATE_GPU":1, "EVALUATE_DLOOP":1},
- }
- the list could mean each sub-evalutor-worker's status, 0 is not finished,
- 1 is finished, here as example, this list could mean [gpu, dloop].
- and the key of odict is the id of this task(which combined with step name
- and worker-id).
- Only sub-evalutor-worker's all status turn to 1(finshed), could it be able
- to be popped from this PairDictQueue.
-
- :param int pair_size: Description of parameter `pair_size`.
- """
-
- def __init__(self):
- self.dq_id = 0
- self.odict = OrderedDict()
- return
-
- def add_new(self, item, type):
- """Short summary.
-
- :param type item: Description of parameter `item`.
- :param type key: Description of parameter `key`.
- """
- if item not in self.odict:
- self.odict[item] = dict()
- self.odict[item][type] = 0
-
- def put(self, item, type):
- """Short summary.
-
- :param type item: Description of parameter `item`.
- :param type type: Description of parameter `type`.
- :return: Description of returned object.
- :rtype: type
-
- """
- if item not in self.odict:
- logging.debug("item({}) not in PairDictQueue!".format(item))
- return
- self.odict[item][type] = 1
- logging.debug("PairDictQueue add item({}) key({})".format(item, type))
- return True
-
- def get(self):
- """Short summary.
-
- :return: Description of returned object.
- :rtype: type
-
- """
- item = None
- for key, subdict in self.odict.items():
- item_ok = True
- for k, i in subdict.items():
- if i != 1:
- item_ok = False
- break
- if item_ok:
- self.odict.pop(key)
- item = key
- break
- return item
-
- def qsize(self):
- """Short summary.
-
- :return: Description of returned object.
- :rtype: type
-
- """
- return len(self.odict)
-
-
# Here start the stand alone functions for master to use!
def clean_cuda_proc(master_pid, device_id):
"""Short summary.
@@ -128,25 +44,6 @@ def clean_cuda_proc(master_pid, device_id):
return
-def kill_children_proc(sig=signal.SIGTERM, recursive=True,
- timeout=1, on_terminate=None):
- """Kill a process tree of curret process (including grandchildren).
-
- with signal "sig" and return a (gone, still_alive) tuple.
- "on_terminate", if specified, is a callabck function which is
- called as soon as a child terminates.
- """
- pid = os.getpid()
- parent = psutil.Process(pid)
- children = parent.children(recursive)
- for p in children:
- logging.info("children: {}".format(p.as_dict(attrs=['pid', 'name', 'username'])))
- p.send_signal(sig)
- gone, alive = psutil.wait_procs(children, timeout=timeout,
- callback=on_terminate)
- return (gone, alive)
-
-
def kill_proc_tree(pid, sig=signal.SIGKILL, include_parent=True,
timeout=None, on_terminate=None):
"""Kill a process tree (including grandchildren) with signal.
@@ -174,42 +71,6 @@ def kill_proc_tree(pid, sig=signal.SIGKILL, include_parent=True,
return (gone, alive)
-def install_and_import_local(package, package_path=None, update=False):
- """Install and import local python packages.
-
- :param str package: `package` name that need to install and import.
- :param package_path: if the package is a local whl, then the `package_path`.
- :type package_path: str or None
- :param bool update: Description of parameter `update`.
-
- """
- import importlib
- try:
- if not update:
- try:
- importlib.import_module(package)
- except ImportError:
- import pip
- if hasattr(pip, 'main'):
- pip.main(['install', package_path])
- elif hasattr(pip, '_internal'):
- pip._internal.main(['install', package_path])
- else:
- subprocess.call([sys.executable, "-m", "pip", "install",
- package_path])
- else:
- import pip
- if hasattr(pip, 'main'):
- pip.main(['install', '-U', package_path])
- elif hasattr(pip, '_internal'):
- pip._internal.main(['install', '-U', package_path])
- else:
- subprocess.call([sys.executable, "-m", "pip", "install", "-U",
- package_path])
- finally:
- globals()[package] = importlib.import_module(package)
-
-
def get_master_address(args):
"""Get master address(ip, port) from `args.init_method`.