The summary of "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"

paper

contribution

  • The paper refers to this phenomenon as internal covariate shift and addresses the problem by normalizing layer inputs.
  • The normalization statistics are computed over each mini-batch, so it fits naturally into stochastic gradient training.
  • It introduces learnable scale and shift parameters so that simple normalization does not restrict what each layer can represent.

Towards Reducing Internal Covariate Shift

What is internal covariate shift? I think the simplest answer is: a difference in distributions. In the classic machine learning setting, covariate shift means the training distribution differs from the test distribution. Normalizing the inputs, i.e. transforming them to have zero mean and unit variance, mitigates this problem.

We define Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training.

Normalization via Mini-Batch Statistics

Note that simply normalizing each input of a layer may change what the layer can represent. For instance, normalizing the inputs of a sigmoid would constrain them to the linear regime of the nonlinearity. To address this, we make sure that the transformation inserted in the network can represent the identity transform. To accomplish this, we introduce, for each activation x^(k), a pair of parameters γ^(k), β^(k), which scale and shift the normalized value:
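Concretely, for a mini-batch B = {x_1, ..., x_m} of values of one activation, the BN transform defined in the paper is:

$$
\mu_\mathcal{B} = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad
\sigma_\mathcal{B}^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_\mathcal{B})^2,\qquad
\hat{x}_i = \frac{x_i-\mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}},\qquad
y_i = \gamma\,\hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)
$$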

The derivatives are obtained by backpropagating through the normalization itself, applying the chain rule through the mini-batch mean and variance (both of which depend on every example in the batch). The backward pass is reproduced below.
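The gradients of the loss ℓ with respect to the BN inputs and parameters, as given in the paper:

$$
\begin{aligned}
\frac{\partial \ell}{\partial \hat{x}_i} &= \frac{\partial \ell}{\partial y_i}\cdot\gamma\\
\frac{\partial \ell}{\partial \sigma_\mathcal{B}^2} &= \sum_{i=1}^{m}\frac{\partial \ell}{\partial \hat{x}_i}\,(x_i-\mu_\mathcal{B})\cdot\Bigl(-\tfrac{1}{2}\Bigr)\bigl(\sigma_\mathcal{B}^2+\epsilon\bigr)^{-3/2}\\
\frac{\partial \ell}{\partial \mu_\mathcal{B}} &= \Bigl(\sum_{i=1}^{m}\frac{\partial \ell}{\partial \hat{x}_i}\cdot\frac{-1}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}}\Bigr) + \frac{\partial \ell}{\partial \sigma_\mathcal{B}^2}\cdot\frac{\sum_{i=1}^{m}-2(x_i-\mu_\mathcal{B})}{m}\\
\frac{\partial \ell}{\partial x_i} &= \frac{\partial \ell}{\partial \hat{x}_i}\cdot\frac{1}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}} + \frac{\partial \ell}{\partial \sigma_\mathcal{B}^2}\cdot\frac{2(x_i-\mu_\mathcal{B})}{m} + \frac{\partial \ell}{\partial \mu_\mathcal{B}}\cdot\frac{1}{m}\\
\frac{\partial \ell}{\partial \gamma} &= \sum_{i=1}^{m}\frac{\partial \ell}{\partial y_i}\cdot\hat{x}_i\\
\frac{\partial \ell}{\partial \beta} &= \sum_{i=1}^{m}\frac{\partial \ell}{\partial y_i}
\end{aligned}
$$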

Training and Inference with Batch-Normalized Networks

We use the unbiased variance estimate Var[x] = m/(m−1) · E_B[σ_B²], where the expectation is over training mini-batches of size m and σ_B² are their sample variances.
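A minimal NumPy sketch of the train/inference split, assuming 2-D activations of shape (m, d) and an exponential moving average for the population statistics (the paper averages over many training batches; the momentum form below is the common practical variant, and all names here are illustrative):

```python
import numpy as np

def batchnorm_train(x, gamma, beta, run_mean, run_var, momentum=0.9, eps=1e-5):
    """One training-time BN step over a mini-batch x of shape (m, d)."""
    m = x.shape[0]
    mu = x.mean(axis=0)                    # mini-batch mean mu_B
    var = x.var(axis=0)                    # biased sample variance sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    y = gamma * x_hat + beta               # scale and shift
    # Track statistics for inference; m/(m-1) gives the unbiased variance estimate.
    run_mean = momentum * run_mean + (1 - momentum) * mu
    run_var = momentum * run_var + (1 - momentum) * var * m / (m - 1)
    return y, run_mean, run_var

def batchnorm_infer(x, gamma, beta, run_mean, run_var, eps=1e-5):
    """At inference time the normalization uses the fixed population estimates."""
    x_hat = (x - run_mean) / np.sqrt(run_var + eps)
    return gamma * x_hat + beta
```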

other advantages

  • Batch Normalization enables higher learning rates.
    • Batch Normalization also makes training more resilient to the parameter scale, as the identities below show.
  • Batch Normalization regularizes the model, which the paper demonstrates experimentally (it reduces the need for Dropout).
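From the paper: with Batch Normalization, scaling the weights W by a scalar a does not change the layer output or its Jacobian with respect to the input, and larger weights actually receive smaller gradients, which keeps parameter growth in check:

$$
\mathrm{BN}(Wu) = \mathrm{BN}((aW)u),\qquad
\frac{\partial\,\mathrm{BN}((aW)u)}{\partial u} = \frac{\partial\,\mathrm{BN}(Wu)}{\partial u},\qquad
\frac{\partial\,\mathrm{BN}((aW)u)}{\partial (aW)} = \frac{1}{a}\cdot\frac{\partial\,\mathrm{BN}(Wu)}{\partial W}
$$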