The summary of "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"

paper

contribution

  • The paper refers to this phenomenon as internal covariate shift and addresses the problem by normalizing layer inputs.
  • The normalization statistics are computed over each mini-batch, so it fits naturally into stochastic gradient training.
  • It introduces learnable scale and shift parameters so that simple normalization does not restrict what each layer can represent.

Towards Reducing Internal Covariate Shift

What is internal covariate shift? I think the simplest answer is: a difference in distributions. In the classic machine learning setting, covariate shift means the training distribution differs from the test distribution. Normalizing the inputs, i.e. transforming them to have zero mean and unit variance, mitigates this problem.

We define Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training.

Normalization via Mini-Batch Statistics

Note that simply normalizing each input of a layer may change what the layer can represent. For instance, normalizing the inputs of a sigmoid would constrain them to the linear regime of the nonlinearity. To address this, we make sure that the transformation inserted in the network can represent the identity transform. To accomplish this, we introduce, for each activation x^(k), a pair of parameters γ^(k), β^(k), which scale and shift the normalized value:
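Concretely, for a mini-batch B = {x_1, ..., x_m} of values of one activation, the BN transform defined in the paper is:

$$
\mu_\mathcal{B} = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad
\sigma_\mathcal{B}^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_\mathcal{B})^2,\qquad
\hat{x}_i = \frac{x_i-\mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}},\qquad
y_i = \gamma\,\hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)
$$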

The derivatives are obtained by backpropagating through the normalization itself, applying the chain rule through the mini-batch mean and variance (both of which depend on every example in the batch). The backward pass is reproduced below.
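The gradients of the loss ℓ with respect to the BN inputs and parameters, as given in the paper:

$$
\begin{aligned}
\frac{\partial \ell}{\partial \hat{x}_i} &= \frac{\partial \ell}{\partial y_i}\cdot\gamma\\
\frac{\partial \ell}{\partial \sigma_\mathcal{B}^2} &= \sum_{i=1}^{m}\frac{\partial \ell}{\partial \hat{x}_i}\,(x_i-\mu_\mathcal{B})\cdot\Bigl(-\tfrac{1}{2}\Bigr)\bigl(\sigma_\mathcal{B}^2+\epsilon\bigr)^{-3/2}\\
\frac{\partial \ell}{\partial \mu_\mathcal{B}} &= \Bigl(\sum_{i=1}^{m}\frac{\partial \ell}{\partial \hat{x}_i}\cdot\frac{-1}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}}\Bigr) + \frac{\partial \ell}{\partial \sigma_\mathcal{B}^2}\cdot\frac{\sum_{i=1}^{m}-2(x_i-\mu_\mathcal{B})}{m}\\
\frac{\partial \ell}{\partial x_i} &= \frac{\partial \ell}{\partial \hat{x}_i}\cdot\frac{1}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}} + \frac{\partial \ell}{\partial \sigma_\mathcal{B}^2}\cdot\frac{2(x_i-\mu_\mathcal{B})}{m} + \frac{\partial \ell}{\partial \mu_\mathcal{B}}\cdot\frac{1}{m}\\
\frac{\partial \ell}{\partial \gamma} &= \sum_{i=1}^{m}\frac{\partial \ell}{\partial y_i}\cdot\hat{x}_i\\
\frac{\partial \ell}{\partial \beta} &= \sum_{i=1}^{m}\frac{\partial \ell}{\partial y_i}
\end{aligned}
$$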

Training and Inference with Batch-Normalized Networks

We use the unbiased variance estimate Var[x] = m/(m−1) · E_B[σ_B²], where the expectation is over training mini-batches of size m and σ_B² are their sample variances.
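A minimal NumPy sketch of the train/inference split, assuming 2-D activations of shape (m, d) and an exponential moving average for the population statistics (the paper averages over many training batches; the momentum form below is the common practical variant, and all names here are illustrative):

```python
import numpy as np

def batchnorm_train(x, gamma, beta, run_mean, run_var, momentum=0.9, eps=1e-5):
    """One training-time BN step over a mini-batch x of shape (m, d)."""
    m = x.shape[0]
    mu = x.mean(axis=0)                    # mini-batch mean mu_B
    var = x.var(axis=0)                    # biased sample variance sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    y = gamma * x_hat + beta               # scale and shift
    # Track statistics for inference; m/(m-1) gives the unbiased variance estimate.
    run_mean = momentum * run_mean + (1 - momentum) * mu
    run_var = momentum * run_var + (1 - momentum) * var * m / (m - 1)
    return y, run_mean, run_var

def batchnorm_infer(x, gamma, beta, run_mean, run_var, eps=1e-5):
    """At inference time the normalization uses the fixed population estimates."""
    x_hat = (x - run_mean) / np.sqrt(run_var + eps)
    return gamma * x_hat + beta
```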

other advantages

  • Batch Normalization enables higher learning rates.
    • Batch Normalization also makes training more resilient to the parameter scale, as the identities below show.
  • Batch Normalization regularizes the model, which the paper demonstrates experimentally (it reduces the need for Dropout).
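From the paper: with Batch Normalization, scaling the weights W by a scalar a does not change the layer output or its Jacobian with respect to the input, and larger weights actually receive smaller gradients, which keeps parameter growth in check:

$$
\mathrm{BN}(Wu) = \mathrm{BN}((aW)u),\qquad
\frac{\partial\,\mathrm{BN}((aW)u)}{\partial u} = \frac{\partial\,\mathrm{BN}(Wu)}{\partial u},\qquad
\frac{\partial\,\mathrm{BN}((aW)u)}{\partial (aW)} = \frac{1}{a}\cdot\frac{\partial\,\mathrm{BN}(Wu)}{\partial W}
$$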