Quantization User Guide for CHaiDNN
The design goal of a deep neural network is to achieve the best accuracy with maximum performance. CHaiDNN works in the fixed-point domain for better performance. All feature maps and trained parameters are converted from single precision to fixed point before the computation starts. The precision parameters can vary considerably across networks, datasets, or even across layers within the same network. The accuracy of a network depends on the precision parameters used to represent the feature maps and trained parameters. Well-crafted precision parameters can give accuracy close to that of the single-precision model.
CHaiDNN supports two quantization modes:

- Xilinx Quantization

  This quantization technique produces an optimal target quantization from a given network (deploy prototxt and caffemodel) and a calibration set (unlabeled input images), without requiring hours of retraining or a labeled dataset for most use-cases.

- Dynamic Fixed Point Quantization

  This refers to a fixed-point representation in which the number of bits for the integer and fractional parts is dynamic in nature. NOTE: For CHai, the total bitwidth (sign + integer + fractional) must be either 8 bits or 6 bits for the weights and the input or output activations. A minimal sketch of this arithmetic follows this list.
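As a minimal sketch of what this representation means (illustrative only, not CHaiDNN's internal implementation), the snippet below quantizes values to a signed fixed-point grid with a total bitwidth `bw` and `fl` fractional bits:

```python
import numpy as np

def quantize_dynamic_fixed(x, bw, fl):
    """Quantize to a signed fixed-point grid with `bw` total bits
    (sign + integer + fractional) and `fl` fractional bits, then
    return the dequantized real values."""
    step = 2.0 ** (-fl)                              # value of one LSB
    qmin, qmax = -(2 ** (bw - 1)), 2 ** (bw - 1) - 1
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) / step), qmin, qmax)
    return q * step

# Example: 6-bit total width with 2 fractional bits
print(quantize_dynamic_fixed([0.37, -1.8, 9.99], bw=6, fl=2))
# -> [ 0.25 -1.75  7.75]  (9.99 saturates at the largest representable value)
```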
CHaiDNN expects the precision parameters to be specified in the deploy.prototxt. The original input caffemodel can be used as-is for inference. There are two approaches to generate the CHai deploy.prototxt:
Approach-1: Generating prototxt using XportDNN
XportDNN is a unified tool provided to CHai users to quickly produce a CHai prototxt with the appropriate precision parameters specified for the supported layers. CHaiDNN supports the Caffe deep learning framework.
Feature | Xilinx Quantizer Mode | Dynamic Fixed Mode |
---|---|---|
Precision/ threshold deduction | Yes | N.A. |
Auto CHai prototxt Generation | Yes | Yes |
User configurable Bitwidth | Yes | Yes |
User configurable Q.F format | N.A. | Yes |
To run XportDNN, use the command python XportDNN.pyc with the arguments described below.
[-h]
[--quant_type] - Specify the quantization mode {'Xilinx' or 'DynamicFixed'}, default 'Xilinx'

Arguments required for both quantization modes:

[--deploy_model DEPLOY_MODEL] - Input deploy prototxt
[--weights WEIGHTS] - FP32 pretrained caffe model
[--quantized_deploy_model QUANTIZED_TRAIN_VAL_MODEL] - Output file name for CHaiDNN deploy prototxt
[--bitwidths BITWIDTHS] - Bit widths for input,params,output, default 6,6,6

Arguments for Xilinx-Quantization mode ONLY:

[--calibration_directory CALIBRATION_DIRECTORY] - Directory of the dataset of original images
[--calibration_size CALIBRATION_SIZE] - Number of images to use for calibration, default 8
[--dims DIMS] - Dimensions for the first layer, default 3,224,224
[--transpose TRANSPOSE] - Passed to caffe.io.Transformer function set_transpose, default 2,0,1
[--channel_swap CHANNEL_SWAP] - Passed to caffe.io.Transformer function set_channel_swap, default 2,1,0
[--raw_scale RAW_SCALE] - Passed to caffe.io.Transformer function set_raw_scale, default 255.0
[--mean_value MEAN_VALUE] - Passed to caffe.io.Transformer function set_mean, default 104,117,123

Arguments for DynamicFixed mode ONLY:

[--fl_bitwidths FL_BITWIDTHS] - Bit widths for the fractional part of input,params,output, default 2,5,2. Required for "DynamicFixed" quant_type ONLY.
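The Xilinx-mode preprocessing arguments above are handed to caffe.io.Transformer, as their descriptions state. Below is a minimal sketch of the equivalent preprocessing in pycaffe; the image path is a placeholder, and the blob shape assumes the default --dims of 3,224,224 with a batch size of 1:

```python
import numpy as np
import caffe

# Blob shape: batch of 1, matching the default --dims 3,224,224
transformer = caffe.io.Transformer({'data': (1, 3, 224, 224)})
transformer.set_transpose('data', (2, 0, 1))       # --transpose: HWC -> CHW
transformer.set_channel_swap('data', (2, 1, 0))    # --channel_swap: RGB -> BGR
transformer.set_raw_scale('data', 255.0)           # --raw_scale: [0,1] -> [0,255]
transformer.set_mean('data', np.array([104.0, 117.0, 123.0]))  # --mean_value

img = caffe.io.load_image('ILSVRC2012_img_val/sample.JPEG')  # placeholder path
blob = transformer.preprocess('data', img)                   # CHW float32 input
```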
Points to Remember

- XportDNN expects the input/data to be defined as a layer.

  📃 Example:

  layer {
    name: "data"
    type: "Input"
    top: "data"
    input_param {
      shape {        # As required
        dim: 1
        dim: 3
        dim: 224
        dim: 224
      }
    }
  }

- Arguments that are not passed will be set to their default values.

- Threshold values are generated only for supported layers; a warning is issued for non-supported layers.

- A finetuning/re-training feature, which could help users improve the accuracy of their DNNs, will be provided soon.
📃 GoogLeNet(Without LRN) v1 ("Xilinx" quant mode) Example:
Pre-requisites:
- Pre-trained Caffemodel file
- Associated deploy prototxt
- Calibration dataset (images). Refer to Imagenet for details on downloading the ILSVRC files.
Quantize a network by inference thresholding against a calibration set:
$ python XportDNN.pyc --quant_type "Xilinx" \
--deploy_model ./models/bvlc_googlenet_without_lrn/bvlc_googlenet_without_lrn_deploy.prototxt \
--weights ./models/bvlc_googlenet_without_lrn/bvlc_googlenet_without_lrn.caffemodel \
--quantized_deploy_model ./models/bvlc_googlenet_without_lrn/RistrettoDemo/bvlc_googlenet_without_lrn_quantized_deploy.prototxt \
--calibration_directory ./data/ilsvrc12/ILSVRC2012_img_val --calibration_size 32 \
--bitwidths 6,6,6 --dims 3,224,224 --transpose 2,0,1 \
--channel_swap 2,1,0 --raw_scale 255.0 \
--mean_value 104,117,123 --input_scale 1.0
The output of the above step is a CHaiDNN-compatible prototxt.
📃 GoogLeNet(Without LRN) v1 ("DynamicFixed" quant mode) Example:

Pre-requisites:
- Pre-trained Caffemodel file
- Associated deploy prototxt
XportDNN produces the CHai prototxt with the specified precision parameters for the required layers. The user is expected to set appropriate precision details (Q.F format) via the XportDNN.pyc arguments --bitwidths and --fl_bitwidths for the best accuracy; a rough way to estimate the fractional lengths from the data range is sketched after the example below.
$ python XportDNN.pyc --quant_type "DynamicFixed" \
--deploy_model ./models/bvlc_googlenet_without_lrn/bvlc_googlenet_without_lrn_deploy.prototxt \
--weights ./models/bvlc_googlenet_without_lrn/bvlc_googlenet_without_lrn.caffemodel \
--quantized_deploy_model ./models/bvlc_googlenet_without_lrn/RistrettoDemo/bvlc_googlenet_without_lrn_quantized_deploy.prototxt\
--bitwidths 6,6,6 \
--fl_bitwidths 2,5,2
The output of the above step is a CHaiDNN compatible prototxt.
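There is no single rule for picking the Q.F split. One rough heuristic, shown purely as an illustration rather than something XportDNN is documented to do, is to give the integer part just enough bits to cover the observed dynamic range and leave the rest to the fraction:

```python
import numpy as np

def suggest_fl(values, bw):
    """Heuristic fractional length: reserve 1 sign bit, give the integer
    part enough bits for the largest magnitude, and use the rest for
    the fraction. Purely illustrative."""
    max_abs = float(np.max(np.abs(values)))
    if max_abs == 0.0:
        return bw - 1
    int_bits = max(0, int(np.floor(np.log2(max_abs))) + 1)
    return bw - 1 - int_bits

# Weights spread roughly in (-1, 1) with a 6-bit budget -> fl of 5,
# matching the default fl_bitwidths of 2,5,2 for the params entry.
print(suggest_fl(np.random.uniform(-0.9, 0.9, 1000), bw=6))
```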
Approach-2: Generating prototxt manually
CHaiDNN provides a new field, precision_param, for every layer to supply the various precision parameters. XportDNN adds these precision_param{} blocks to the required layers in the prototxt automatically. Alternatively, the user can add the appropriate fields to the prototxt manually, based on the details provided below.
Precision parameters should be specified for the input/output feature maps as well as for the trained parameters. The supported total bit-widths for each of these are given in the table below:
Item | I/O Feature Maps | Trained Parameters |
---|---|---|
Option#1 | 8 | 8 |
Option#2 | 6 | 6 |
Whether a precision_param{} block needs to be specified for a given layer type is listed below:
Layer | Xilinx Quant Mode | DynamicFixed Quant Mode |
---|---|---|
Convolution | Yes | Yes |
BatchNorm | Yes | Yes |
Power | No | No |
Scale | Yes | Yes |
InnerProduct | No | No |
Pooling(Max, Avg) | Yes | Yes |
ReLU | Yes | No |
Deconvolution | Yes | No |
Concat | Yes | No |
Crop | No | No |
Softmax | No | No |
Dropout | No | No |
Permute | Yes | Yes |
Normalize | Yes | No |
Argmax | No | No |
Flatten | No | No |
PriorBox | No | No |
Reshape | No | No |
NMS | No | No |
Eltwise | Yes | Yes |
Depthwise Separable Convolution | Yes | Yes |
Input/ Data | Yes | No |
The terminology for the various precision_params is as follows:
Item | Xilinx Quantization | Dynamic fixed point Quantization | Comments |
---|---|---|---|
Total Bitwidth: weights | bw_params | bw_params | Single value per layer |
fl-bits: weights | -- | fl_params | Single value per layer |
Threshold: weights | th_params | -- | Single value per channel |
Total Bitwidth: Input Activations | bw_layer_in | bw_layer_in | Single value per layer |
fl-bits: Input Activations | -- | fl_layer_in | Single value per layer |
Threshold: Input Activations | th_layer_in | -- | Single value per layer |
Total Bitwidth: Output Activations | bw_layer_out | bw_layer_out | Single value per layer |
fl-bits: Output Activations | -- | fl_layer_out | Single value per layer |
Threshold: Output Activations | th_layer_out | -- | Single value per layer |
CHaiDNN expects the precision parameters for Input and Output feature maps for Convolution, InnerProduct (FC or Fully-Connected), Average Pooling, Eltwise, BatchNorm and Scale layers.
bw_layer_in : Total bitwidth required for Input Feature Maps
fl_layer_in : Fractional bits required for Input Feature Maps
bw_layer_out : Total bitwidth required for Output Feature Maps
fl_layer_out : Fractional bits required for Output Feature Maps
th_layer_in : Threshold for the Input Feature Maps
th_layer_out : Threshold for the Output Feature maps
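The thresholds bound the dynamic range observed during calibration. A common way such thresholds are used, shown below as an assumption about the mapping rather than a statement of CHaiDNN internals, is to map the real interval [-th, th] onto the signed integer range of the chosen bitwidth:

```python
import numpy as np

def quantize_with_threshold(x, bw, th):
    """Map values in [-th, th] onto a signed `bw`-bit integer grid and
    return the dequantized approximation; values beyond the threshold
    saturate. Illustrative only."""
    qmax = 2 ** (bw - 1) - 1
    scale = qmax / th
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * scale), -qmax, qmax)
    return q / scale

# 8-bit activations with the th_layer_out value from the Eltwise example below
print(quantize_with_threshold([100.0, -2000.0], bw=8, th=1106.06941786))
```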
📃 Eltwise Layer ("DynamicFixed" quant mode) Example:
layer {
bottom: "group3/block1/conv3"
bottom: "group3/block0/eltwise"
top: "group3/block1/eltwise"
name: "group3/block1/eltwise"
type: "Eltwise"
precision_param {
quant_type: "DynamicFixed"
bw_layer_in: 8
bw_layer_out: 8
fl_layer_in: 3
fl_layer_out: 3
}
}
📃 Eltwise Layer ("Xilinx" quant mode) Example:
layer {
bottom: "group3/block1/conv3"
bottom: "group3/block0/eltwise"
top: "group3/block1/eltwise"
name: "group3/block1/eltwise"
type: "Eltwise"
precision_param {
quant_type: "Xilinx"
bw_layer_in: 8
bw_layer_out: 8
th_layer_in: 752.14132257
th_layer_out: 1106.06941786
}
}
📌 NOTE: The Eltwise layer has no trained parameters, so no precision parameters for trained parameters need to be specified for it. CHaiDNN does, however, expect the precision parameters for the trained parameters of Convolution, InnerProduct (FC), BatchNorm and Scale layers.
bw_params : Total bitwidth required for weights of Convolution and InnerProduct
fl_params : Fractional bits required for weights of Convolution and InnerProduct
📌 NOTE: Bias precision parameters are taken care of internally by CHaiDNN.
📃 Convolution Layer ("DynamicFixed" quant mode) Example:
layer {
name: "conv1/7x7_s2"
type: "Convolution"
bottom: "data"
top: "conv1/7x7_s2"
convolution_param {
num_output: 4
pad: 3
kernel_size: 7
stride: 2
}
precision_param {
quant_type: "DynamicFixed"
bw_layer_in: 6
bw_layer_out: 6
bw_params: 6
fl_layer_in: 2
fl_layer_out: 2
fl_params: 5
}
}
📃 Convolution Layer ("Xilinx" quant mode) Example:
layer {
name: "conv1/7x7_s2"
type: "Convolution"
bottom: "data"
top: "conv1/7x7_s2"
convolution_param {
num_output: 4
pad: 3
kernel_size: 7
stride: 2
}
precision_param {
quant_type: "Xilinx"
bw_layer_in: 8
bw_layer_out: 8
bw_params: 8
th_layer_in: 150.866221005
th_layer_out: 752.14132257
th_params: 0.63660389185
th_params: 0.37376704812
th_params: 0.594033837318
th_params: 0.400175541639
}
}
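Because th_params is given per output channel (a single value per channel, as in the terminology table), each channel of the weight tensor gets its own scale. The sketch below applies the same threshold interpretation as above to the four channels of this example; it is an illustration, not CHaiDNN's actual rounding:

```python
import numpy as np

def quantize_weights_per_channel(w, bw, th_params):
    """Quantize a weight tensor of shape (out_channels, ...) channel by
    channel, each channel with its own threshold. Illustrative only."""
    qmax = 2 ** (bw - 1) - 1
    wq = np.empty_like(w, dtype=np.float64)
    for c, th in enumerate(th_params):
        scale = qmax / th
        wq[c] = np.clip(np.round(w[c] * scale), -qmax, qmax) / scale
    return wq

# Four output channels, matching num_output: 4 and the four th_params above
th_params = [0.63660389185, 0.37376704812, 0.594033837318, 0.400175541639]
w = np.random.randn(4, 3, 7, 7) * 0.1          # stand-in 7x7 kernels, 3 inputs
wq = quantize_weights_per_channel(w, bw=8, th_params=th_params)
```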
batchnorm_mean_bw : Total bitwidth required for mean of BatchNorm Layer
batchnorm_mean_fl : Fractional bits required for mean of BatchNorm Layer
batchnorm_variance_bw : Total bitwidth required for variance of BatchNorm Layer
batchnorm_variance_fl : Fractional bits required for variance of BatchNorm Layer
📃 Mean and Variance of BatchNorm Layer Example:
layer {
bottom: "conv0"
top: "conv0/bn/mv"
name: "conv0/bn/mv"
type: "BatchNorm"
precision_param {
bw_layer_in: 8
bw_layer_out: 8
fl_layer_in: 3
fl_layer_out: 3
batchnorm_mean_fl: 4
batchnorm_variance_fl: 3
batchnorm_mean_bw: 8
batchnorm_variance_bw: 8
}
}
scale_gamma_bw : Total bitwidth required for gamma of Scale Layer
scale_gamma_fl : Fractional bits required for gamma of Scale Layer
scale_beta_bw : Total bitwidth required for beta of Scale Layer
scale_beta_fl : Fractional bits required for beta of Scale Layer
CHaiDNN fuses the BatchNorm layer and the Scale layer into a single operation, so it requires the precision parameters for the combined term "gamma/sqrt(variance+eps)".
scale_gamma_by_std_bw : Total bitwidth required for 'gamma/sqrt(variance+eps)'
scale_gamma_by_std_fl : Fractional bits required for 'gamma/sqrt(variance+eps)'
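For reference, the combined term those two fields describe comes from folding BatchNorm into the following Scale layer. A short sketch with made-up per-channel statistics (all values here are illustrative):

```python
import numpy as np

# Illustrative per-channel BatchNorm statistics and Scale parameters
gamma = np.array([1.2, 0.8, 1.0])     # Scale gamma
beta  = np.array([0.1, -0.2, 0.0])    # Scale beta
mean  = np.array([0.5, 0.3, -0.1])    # BatchNorm running mean
var   = np.array([0.9, 1.1, 0.7])     # BatchNorm running variance
eps   = 1e-5

# The fused multiplier that scale_gamma_by_std_bw/fl quantize:
gamma_by_std = gamma / np.sqrt(var + eps)

# BatchNorm followed by Scale collapses to one per-channel affine transform:
x = np.random.randn(3)
y = gamma_by_std * (x - mean) + beta
```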
📌 NOTE: scale_gamma_by_std_bw and scale_gamma_by_std_fl are mandatory even if the Scale layer appears alone (without a preceding BatchNorm layer). In that case, you can put arbitrary values.

📃 Gamma and Beta of Scale Layer Example:
layer {
bottom: "conv0/bn/mv"
top: "conv0/bn/bg"
name: "conv0/bn/bg"
type: "Scale"
precision_param {
bw_layer_in: 8
bw_layer_out: 8
fl_layer_in: 3
fl_layer_out: 3
scale_gamma_fl: 5
scale_beta_fl: 5
scale_gamma_bw: 8
scale_beta_bw: 8
scale_gamma_by_std_bw: 8
scale_gamma_by_std_fl: 2
}
}
Copyright © 2018 Xilinx