[FTT-NAS doc] amend ftt-nas doc, configs. Small fix of the rram patch.
walkerning committed Sep 1, 2020
1 parent d9d0ba2 commit 8126860
Showing 8 changed files with 192 additions and 64 deletions.
43 changes: 38 additions & 5 deletions examples/research/ftt-nas/README.md
@@ -10,12 +10,45 @@ If you find this work/repo helpful, please cite:
}
```

All experiments are conducted with 8-bit quantization. We use a patch-based quantization library, [nics_fix_pytorch](https://github.com/walkerning/nics_fix_pytorch). You can install the compatible version of `nics_fix_pytorch` with:
```
pip install git+git://github.com/walkerning/nics_fix_pytorch.git@9b97b9402521577cf40910ba4f18c790abe5319f
```

Note that the quantization is simulated, which makes the search and final-training processes much slower.
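
For intuition, below is a minimal sketch of what "simulated" 8-bit quantization means here: the tensors stay in floating point and are merely rounded onto an 8-bit fixed-point grid, which adds extra work to every forward pass. The helper name and the power-of-two scale convention are our assumptions, not the actual `nics_fix_pytorch` API.
```
import torch

def fake_quantize_8bit(x):
    # Pick a power-of-two scale that covers the dynamic range (one common
    # fixed-point convention; the real library may choose the scale differently).
    max_abs = x.abs().max().clamp(min=1e-8)
    scale = 2.0 ** torch.ceil(torch.log2(max_abs))
    step = scale / 128.0  # 8 bits: sign bit + 7 magnitude bits
    # Round onto the grid, then clamp to the representable range.
    return torch.clamp(torch.round(x / step), -128, 127) * step

w = torch.randn(16, 3, 3, 3)
print((fake_quantize_8bit(w) - w).abs().max())  # error is roughly bounded by step / 2
```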

### feature fault model (MiBB)
The quantization patch for the MiBB model is `examples/research/ftt-nas/fixed_point_plugins/fixed_point_patch_new.py`, and the following scripts copy this patch into the plugin directory of `aw_nas`. The patch quantizes all weights before each call of `forward` or `forward_one_step_callback`, and every feature map is quantized in `aw_nas.objective.fault_injection:FaultInjectionObjective.inject`, which is called by `forward_one_step_callback`.

MiBB fault injection is conducted in `aw_nas.objective.fault_injection.FaultInjectionObjective.inject`.
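
To make the mechanism above concrete, the sketch below mimics the two steps under stated assumptions: a `forward_pre_hook` that fake-quantizes the weights before every forward pass, and a helper that injects random bit-bias faults into the feature maps. The function names, the fixed step size, and the fault distribution are illustrative assumptions; the actual plugin and `FaultInjectionObjective.inject` in `aw_nas` differ in their details.
```
import torch
from torch import nn

STEP = 2.0 ** -7  # assumed 8-bit fixed-point step; illustrative only

def quantize_weights_pre_forward(module, inputs):
    # Fake-quantize all weights of the module right before its forward pass,
    # mimicking what the fixed_point_patch_new.py plugin does per forward call.
    with torch.no_grad():
        for p in module.parameters():
            p.copy_(torch.clamp(torch.round(p / STEP), -128, 127) * STEP)

def inject_feature_faults(feat, p_bit=1e-4):
    # Rough emulation of MiBB-style feature faults: with a small probability,
    # add the value of one of the 8 bit positions onto a feature-map element.
    mask = (torch.rand_like(feat) < p_bit).float()
    bit_pos = torch.randint(0, 8, feat.shape, device=feat.device).float()
    return feat + mask * STEP * (2.0 ** bit_pos)

conv = nn.Conv2d(3, 8, 3, padding=1)
conv.register_forward_pre_hook(quantize_weights_pre_forward)
out = conv(torch.randn(1, 3, 32, 32))
faulty_out = inject_feature_faults(out)  # in aw_nas this happens inside the objective
```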

*Feel free to ignore this note.* There is another patch, `fixed_point_patch.py`, that patches the `nn.Conv2d` and `nn.Linear` modules directly. During our experiments, we found the patching method in `fixed_point_patch_new.py` to be faster (see the comments in the patch), so we use the `fixed_point_patch_new.py` patch.
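
For reference, "patching `nn.Conv2d` directly" means something like the following schematic (not the actual `fixed_point_patch.py`): the layer quantizes its own weight inside `forward`, so the quantization cost is paid per layer on every call.
```
import torch
from torch import nn
import torch.nn.functional as F

class FixedConv2d(nn.Conv2d):
    # Schematic of the direct-patching route: the layer quantizes its own weight
    # inside forward(); step size and clamping are illustrative assumptions.
    def forward(self, x):
        step = 2.0 ** -7
        w_q = torch.clamp(torch.round(self.weight / step), -128, 127) * step
        return F.conv2d(x, w_q, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)

# "Patching" then amounts to making models use this class instead of nn.Conv2d,
# e.g. by rebinding the name before the model is constructed.
y = FixedConv2d(3, 8, 3, padding=1)(torch.randn(1, 3, 32, 32))
```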

#### Search
```
FIXED=1 bash ./examples/research/ftt-nas/run_mibb_search.sh ./examples/research/ftt-nas/mibb.yaml
```
Use `GPU=<id> ...` to run on a different GPU.

#### Final training
`bash ./examples/research/ftt-nas/run_mibb.sh [exp name] [final config] --load_state_dict {state_dict}`
Optional environment variables: `GPU=<id>`, `seed=<seed>`, `FIXED=<0/1>`.

### weight fault model (adSAF)

Unlike under the MiBB model, the quantization and fault injection under the adSAF fault model are both conducted in the `fixed_point_plugins/fixed_point_rram_patch*.py` patches. The two patches differ slightly:
* The `fixed_point_plugins/fixed_point_rram_patch_all.py` patch adds differently-shifted biases onto the weights; this is only an approximation of stuck-at bits in RRAM cells.
* The `fixed_point_plugins/fixed_point_rram_patch_bit.py` patch employs bitwise operations, which corresponds more closely to the hardware faults (see the sketch below).

The experiments in the paper are conducted with the `_all.py` patch.
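
To make the difference between the two patches concrete, here is a rough sketch of the two injection styles on an integer weight tensor. The helper names, the per-bit stuck-at-1 probability, and the int16 dtype are illustrative assumptions; the real patches differ (e.g., they also need to handle stuck-at-0 faults and the quantization scales).
```
import torch

def inject_saf_additive(w_int, p_sa1=0.08, n_bits=8):
    # `_all.py`-style approximation: for every bit position assumed stuck-at-1,
    # simply add the corresponding shifted bias onto the integer weight,
    # ignoring whether that bit was already 1.
    faulty = w_int.clone()
    for bit in range(n_bits):
        stuck = (torch.rand(w_int.shape) < p_sa1 / n_bits).to(w_int.dtype)
        faulty = faulty + stuck * (1 << bit)
    return faulty

def inject_saf_bitwise(w_int, p_sa1=0.08, n_bits=8):
    # `_bit.py`-style version: OR the stuck-at-1 mask into the weight bits, which
    # matches the hardware behaviour (a bit that is already 1 stays unchanged).
    faulty = w_int.clone()
    for bit in range(n_bits):
        stuck = (torch.rand(w_int.shape) < p_sa1 / n_bits).to(w_int.dtype)
        faulty = faulty | (stuck << bit)
    return faulty

# int16 is used only to avoid overflow in the additive sketch; only stuck-at-1
# faults are shown here.
w = torch.randint(-128, 128, (16, 16), dtype=torch.int16)
print(inject_saf_additive(w).ne(w).float().mean().item(),
      inject_saf_bitwise(w).ne(w).float().mean().item())
```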

#### Search
```
bash ./examples/research/ftt-nas/run_adsaf_search.sh ./examples/research/ftt-nas/adsaf.yaml
```

#### Final training
`bash ./examples/research/ftt-nas/run_adsaf.sh adsaf_final ./examples/research/ftt-nas/adsaf_final.yaml`
You can set optional environment variables such as `GPU=<id>` and `seed=<seed>`, and pass additional arguments such as `--load_state_dict {state_dict}`.

Because the FTT-NAS experiments were conducted with commit `27d1aeb4121c320ed11361b705`, I have adapted the `adsaf_final.yaml` configuration to the current master `d9d0ba26870b009778f2209f22fde876c0e55aa2`. I'm not sure whether there are other subtle changes that would make the results differ; if you find that you cannot reproduce the results with the latest code, please contact us by email or open an issue.
92 changes: 92 additions & 0 deletions examples/research/ftt-nas/adsaf_final.yaml
@@ -0,0 +1,92 @@
## ---- Component search_space ----
# ---- Type cnn ----
search_space_type: cnn
search_space_cfg:
  # Schedulable attributes:
  # cell_layout: [0, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3]
  cell_layout: null
  num_cell_groups: 2
  num_init_nodes: 2
  num_layers: 8
  num_node_inputs: 2
  num_steps: 4
  reduce_cell_groups:
  - 1
  shared_primitives: # this is not used
  - none
  - max_pool_3x3
  - avg_pool_3x3
  - skip_connect
  - sep_conv_3x3
  - sep_conv_5x5
  - dil_conv_3x3
  - dil_conv_5x5
# ---- End Type cnn ----
## ---- End Component search_space ----

## ---- Component dataset ----
# ---- Type cifar10 ----
dataset_type: cifar10
dataset_cfg:
  # Schedulable attributes:
  cutout: null
# ---- End Type cifar10 ----
## ---- End Component dataset ----

## ---- Component final_model ----
# ---- Type cnn_genotype ----
final_model_type: cnn_patch_final_model
final_model_cfg:
  # Schedulable attributes: dropout_path_rate
  genotypes: "normal_0=[('relu_conv_bn_3x3', 0, 2), ('relu_conv_bn_5x5', 0, 2), ('conv_1x1', 1, 3), ('dil_conv_3x3', 1, 3), ('sep_conv_3x3', 1, 4), ('sep_conv_3x3', 0, 4), ('max_pool_3x3', 2, 5), ('relu_conv_bn_5x5', 1, 5)], reduce_1=[('skip_connect', 0, 2), ('relu_conv_bn_5x5', 1, 2), ('dil_conv_3x3', 2, 3), ('sep_conv_3x3', 0, 3), ('max_pool_3x3', 1, 4), ('conv_1x1', 3, 4), ('dil_conv_3x3', 4, 5), ('skip_connect', 2, 5)]"
  auxiliary_cfg: null
  auxiliary_head: false
  dropout_path_rate: 0.0
  dropout_rate: 0.0
  init_channels: 20
  num_classes: 10
  cell_use_preprocess: true
  cell_preprocess_stride: relu_conv_bn_3x3
  cell_preprocess_normal: relu_conv_bn_3x3
  # modified due to interface change of cnn_final_model
  # preprocess_op_type: relu_conv_bn_3x3
  schedule_cfg: null
  stem_multiplier: 3
# ---- End Type cnn_genotype ----
## ---- End Component final_model ----

objective_type: saf_injection
objective_cfg:
  as_evaluator_regularization: true
  as_controller_regularization: true
  inject_prob: 0.08
  fault_loss_coeff: 0.7
  fault_reward_coeff: 0.2
  latency_reward_coeff: 0
  activation_fixed_bitwidth: 8

## ---- Component final_trainer ----
# ---- Type cnn_trainer ----
final_trainer_type: cnn_trainer
final_trainer_cfg:
  # Schedulable attributes:
  auxiliary_head: false
  auxiliary_weight: 0.0
  add_regularization: true
  batch_size: 128
  epochs: 100
  grad_clip: 5.0
  learning_rate: 0.1
  momentum: 0.9
  no_bias_decay: false
  optimizer_type: SGD
  optimizer_scheduler:
    type: MultiStepLR
    milestones: [40, 80]
    gamma: 0.1
  schedule_cfg: null
  warmup_epochs: 0
  weight_decay: 0.00004
  save_as_state_dict: true
# ---- End Type cnn_trainer ----
## ---- End Component final_trainer ----
@@ -1,7 +1,6 @@
"""
Script for patching fixed point modules.
"""
import six
import numpy as np
import torch
from torch import nn
@@ -140,39 +139,12 @@ def __init__(self, *args, **kwargs):
class CNNGenotypeModelPatch(CNNGenotypeModel):
    NAME = "cnn_patch_final_model"

    SCHEDULABLE_ATTRS = ["dropout_path_rate"]

    def __init__(self, search_space, device, genotypes,
                 num_classes=10, init_channels=36, layer_channels=tuple(), stem_multiplier=3,
                 dropout_rate=0.1, dropout_path_rate=0.2,
                 auxiliary_head=False, auxiliary_cfg=None,
                 use_stem="conv_bn_3x3", stem_stride=1, stem_affine=True,
                 cell_use_preprocess=True, preprocess_op_type=None,
                 cell_pool_batchnorm=False, cell_group_kwargs=None,
                 cell_independent_conn=False,
                 schedule_cfg=None):
        super(CNNGenotypeModelPatch, self).__init__(search_space, device, genotypes,
            num_classes, init_channels, layer_channels, stem_multiplier, dropout_rate,
            dropout_path_rate, auxiliary_head, auxiliary_cfg,
            use_stem, stem_stride, stem_affine,
            preprocess_op_type, cell_use_preprocess, cell_pool_batchnorm, cell_group_kwargs,
            cell_independent_conn, schedule_cfg)

    def set_saf_ratio(self, ratio):
        for idx, _module in self.named_modules():
            if isinstance(_module, nn.Conv2d):
                _module.set_saf_ratio(ratio)

class SubCandidateNetPatch(SubCandidateNet):
    """
    The candidate net for SuperNet weights manager.
    """

    def __init__(self, super_net, rollout, member_mask, gpus=tuple(), cache_named_members=False,
                 virtual_parameter_only=True, eval_no_grad=True):
        super(SubCandidateNetPatch, self).__init__(super_net, rollout, member_mask, gpus, cache_named_members,
            virtual_parameter_only, eval_no_grad)

    def set_saf_ratio(self, ratio):
        for idx, _module in self.named_modules():
            if isinstance(_module, nn.Conv2d):
@@ -185,39 +185,12 @@ def __init__(self, *args, **kwargs):
class CNNGenotypeModelPatch(CNNGenotypeModel):
    NAME = "cnn_patch_final_model"

    SCHEDULABLE_ATTRS = ["dropout_path_rate"]

    def __init__(self, search_space, device, genotypes,
                 num_classes=10, init_channels=36, layer_channels=tuple(), stem_multiplier=3,
                 dropout_rate=0.1, dropout_path_rate=0.2,
                 auxiliary_head=False, auxiliary_cfg=None,
                 use_stem="conv_bn_3x3", stem_stride=1, stem_affine=True,
                 cell_use_preprocess=True, preprocess_op_type=None,
                 cell_pool_batchnorm=False, cell_group_kwargs=None,
                 cell_independent_conn=False,
                 schedule_cfg=None):
        super(CNNGenotypeModelPatch, self).__init__(search_space, device, genotypes,
            num_classes, init_channels, layer_channels, stem_multiplier, dropout_rate,
            dropout_path_rate, auxiliary_head, auxiliary_cfg,
            use_stem, stem_stride, stem_affine,
            cell_use_preprocess, preprocess_op_type, cell_pool_batchnorm, cell_group_kwargs,
            cell_independent_conn, schedule_cfg)

    def set_saf_ratio(self, ratio):
        for idx, _module in self.named_modules():
            if isinstance(_module, nn.Conv2d):
                _module.set_saf_ratio(ratio)

class SubCandidateNetPatch(SubCandidateNet):
    """
    The candidate net for SuperNet weights manager.
    """

    def __init__(self, super_net, rollout, member_mask, gpus=tuple(), cache_named_members=False,
                 virtual_parameter_only=True, eval_no_grad=True):
        super(SubCandidateNetPatch, self).__init__(super_net, rollout, member_mask, gpus, cache_named_members,
            virtual_parameter_only, eval_no_grad)

    def set_saf_ratio(self, ratio):
        for idx, _module in self.named_modules():
            if isinstance(_module, nn.Conv2d):
6 changes: 4 additions & 2 deletions examples/research/ftt-nas/run_adsaf.sh
@@ -1,8 +1,9 @@
#!/bin/bash
here=$(dirname "$0")
gpu=${GPU:-0}
weight_fault=1
exp_name=${1}
result_dir=./results/ftt_nas_adsaf/$exp_name
result_dir=$here/results/ftt_nas_adsaf/$exp_name
seed=${seed:-123}

if [[ $weight_fault -gt 0 ]]; then
@@ -13,7 +14,8 @@ if [[ $weight_fault -gt 0 ]]; then
if [[ ! -e $result_dir/awnas/data ]]; then
ln -s ~/awnas/data $result_dir/awnas/data
fi
cp ./examples/research/ftt-nas/fixed_point_plugins/fixed_point_rram_patch_bit.py $result_dir/awnas/plugins/
# cp $here/fixed_point_plugins/fixed_point_rram_patch_bit.py $result_dir/awnas/plugins/
cp $here/fixed_point_plugins/fixed_point_rram_patch_all.py $result_dir/awnas/plugins/
fi
config=${2}
shift 2
25 changes: 25 additions & 0 deletions examples/research/ftt-nas/run_adsaf_search.sh
@@ -0,0 +1,25 @@
#!/bin/bash
set -e

here=$(dirname "$0")
weight_fault=1
gpu=${GPU:-0}
cfg_file=${1}
default_exp_name=$(basename ${cfg_file})
default_exp_name=${default_exp_name%.yaml}
exp_name=${2:-${default_exp_name}}
result_dir=$here/results/ftt_nas_adsaf_search/$exp_name

if [[ $weight_fault -gt 0 ]]; then
echo "$result_dir/awnas/plugins"
if [[ -d "$result_dir/awnas/plugins" ]]; then
rm -r $result_dir/awnas/plugins
fi
mkdir -p $result_dir/awnas/plugins
if [[ ! -e $result_dir/awnas/data ]]; then
ln -s ~/awnas/data $result_dir/awnas/data
fi
cp $here/fixed_point_plugins/fixed_point_rram_patch_bit.py $result_dir/awnas/plugins/
fi

AWNAS_HOME=$result_dir/awnas/ awnas search --gpu $gpu --train-dir $result_dir/train --vis-dir results/tensorboard_new/weights/$exp_name/ --save-every 10 $cfg_file --develop
5 changes: 3 additions & 2 deletions examples/research/ftt-nas/run_mibb.sh
@@ -1,8 +1,9 @@
#!/bin/bash
here=$(dirname "$0")
fixed=${FIXED:-1}
gpu=${GPU:-0}
exp_name=${1}
result_dir=./results/ftt_nas_mibb/$exp_name
result_dir=$here/results/ftt_nas_mibb/$exp_name
seed=${seed:-123}

echo "use plugin dir: $result_dir/awnas/plugins"
@@ -16,7 +17,7 @@ fi

if [[ $fixed -gt 0 ]]; then
echo "copy fixed patch to plugin dir $result_dir/awnas/plugins/"
cp ./examples/research/ftt-nas/fixed_point_plugins/fixed_point_patch_new.py $result_dir/awnas/plugins/
cp $here/fixed_point_plugins/fixed_point_patch_new.py $result_dir/awnas/plugins/
fi
config=${2}
shift 2
30 changes: 30 additions & 0 deletions examples/research/ftt-nas/run_mibb_search.sh
@@ -0,0 +1,30 @@
#!/bin/bash
set -e

here=$(dirname "$0")
fixed=${FIXED:-1}
gpu=${GPU:-0}
cfg_file=${1}
default_exp_name=$(basename ${cfg_file})
default_exp_name=${default_exp_name%.yaml}
exp_name=${2:-${default_exp_name}}
result_dir=$here/results/ftt_nas_mibb_search/$exp_name
addi_args=${ADDI_ARGS:-""}

echo "use plugin dir: $result_dir/awnas/plugins"
if [[ -d "$result_dir/awnas/plugins" ]]; then
rm -r $result_dir/awnas/plugins
fi
mkdir -p $result_dir/awnas/plugins
if [[ ! -e $result_dir/awnas/data ]]; then
ln -s $HOME/awnas/data $result_dir/awnas/data
fi

if [[ $fixed -gt 0 ]]; then
echo "copy fixed patch to plugin dir $result_dir/awnas/plugins/"
cp $here/fixed_point_plugins/fixed_point_patch_new.py $result_dir/awnas/plugins/
fi
# For profiling only
# AWNAS_HOME=$result_dir/awnas/ python -m cProfile awnas search --gpu $gpu --train-dir $result_dir/train --vis-dir results/tensorboard/ftt_search_tcad/$exp_name/ --save-every 10 ${2} --develop ${addi_args}

AWNAS_HOME=$result_dir/awnas/ awnas search --gpu $gpu --train-dir $result_dir/train --vis-dir $result_dir/tensorboard --save-every 10 ${cfg_file} --develop ${addi_args}
