Background knowledge for Bayesian Statistics and Maximum Likelihood Estimation
- Wikipedia: Prior Probability
- Wikipedia: Posterior Probability
- Wikipedia: Maximum Likelihood Estimation
- Youtube: 오토인코더의 모든 것 (Everything about Autoencoders) 1/3, 2/3, 3/3
- My blog post (math formulas displayed correctly)
The prior probability distribution of an uncertain quantity is the probability distribution of that quantity before any evidence is taken into account. This is often expressed as

$$p(\theta)$$
The posterior probability of a random event is the conditional probability that is assigned after the relevant evidence is taken into account. This is often expressed as

$$p(\theta \mid x)$$
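For reference, the two are related through Bayes' rule (a standard identity, not stated explicitly in the original notes): the posterior is proportional to the likelihood times the prior,

$$p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{p(x)} \propto p(x \mid \theta)\,p(\theta)$$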
MLE is a method of estimating the parameters of a statistical model, given observations. This is done by finding the parameters that maximize the likelihood function, i.e. by selecting the parameters that make the observed data most probable. We can formulate this problem as follows:

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid x)$$

where $L(\theta \mid x) = p(x \mid \theta)$ is the likelihood of the parameters $\theta$ given the observed data $x$.
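As a small worked example (my own, for illustration): suppose we flip a coin $n$ times, observe $k$ heads, and model each flip as Bernoulli with parameter $\mu$. Then

$$L(\mu \mid x) = \mu^{k}(1-\mu)^{n-k}, \qquad \frac{d}{d\mu}\log L(\mu \mid x) = \frac{k}{\mu} - \frac{n-k}{1-\mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{k}{n}$$

so the MLE is simply the observed frequency of heads.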
We often use the average log-likelihood function

$$\hat{\ell}(\theta \mid x) = \frac{1}{n} \log L(\theta \mid x)$$

where $n$ is the number of observations, since it has preferable qualities: the logarithm turns products into sums and is numerically more stable, while dividing by $n$ does not change the maximizer. One of these qualities is illustrated later in this document.
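Here is a minimal NumPy sketch (my own illustration; the data and parameters are arbitrary) of one such quality: multiplying thousands of probabilities underflows to zero in floating point, while the average of their logs stays well-behaved.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)  # observed data (arbitrary example)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 2.0, 1.0

# Raw likelihood: a product of thousands of values < 1 underflows to 0.0
raw_likelihood = np.prod(gaussian_pdf(x, mu, sigma))

# Average log-likelihood: a sum of logs divided by n stays in a sane range
avg_log_likelihood = np.mean(np.log(gaussian_pdf(x, mu, sigma)))

print(raw_likelihood)      # 0.0 (numerical underflow)
print(avg_log_likelihood)  # roughly -1.42 for well-matched mu, sigma
```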
In a traditional machine learning model for classification, we receive an input image $x$ and the model deterministically produces a single output $y = f_\theta(x)$, such as a predicted class.
Now, when we create an ML model, we choose a statistical model that our output may follow. Then, our ML model function calculates the parameters of that statistical model. For example, let us assume that our output $y$ follows a Gaussian distribution; our model then outputs the mean and standard deviation of that distribution:
$$f_\theta(x) = \begin{bmatrix}\mu \\ \sigma\end{bmatrix}$$
Thus, for each input $x_i$, our model defines a distribution over the corresponding output, $p(y_i \mid f_\theta(x_i)) = \mathcal{N}(y_i \mid \mu_i, \sigma_i^2)$.
If we assume that our data points are independent and identically distributed (i.i.d.), we can obtain the following:

$$p(y \mid f_\theta(x)) = \prod_{i} p(y_i \mid f_\theta(x_i)), \qquad \log p(y \mid f_\theta(x)) = \sum_{i} \log p(y_i \mid f_\theta(x_i))$$
Rewriting our optimization problem:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i} \log p(y_i \mid f_\theta(x_i)) = \arg\min_{\theta}\left(-\sum_{i} \log p(y_i \mid f_\theta(x_i))\right)$$

so training the network amounts to minimizing the negative log-likelihood of the targets under the predicted distribution.
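A minimal PyTorch sketch of this setup (my own illustration; the original notes do not specify a framework, and the architecture, dimensions, and data below are placeholders): the network predicts $\mu$ and $\log\sigma$ for each input, and training minimizes the resulting Gaussian negative log-likelihood.

```python
import torch
import torch.nn as nn

class GaussianRegressor(nn.Module):
    """Maps an input x to the parameters (mu, log_sigma) of a Gaussian over y."""
    def __init__(self, in_dim=16, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu_head = nn.Linear(hidden, 1)
        self.log_sigma_head = nn.Linear(hidden, 1)  # predict log(sigma) so sigma > 0

    def forward(self, x):
        h = self.body(x)
        return self.mu_head(h), self.log_sigma_head(h)

def gaussian_nll(y, mu, log_sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), averaged over the batch."""
    sigma = log_sigma.exp()
    return (0.5 * ((y - mu) / sigma) ** 2 + log_sigma).mean()  # additive constant dropped

model = GaussianRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 16)  # dummy i.i.d. inputs
y = torch.randn(32, 1)   # dummy targets

mu, log_sigma = model(x)
loss = gaussian_nll(y, mu, log_sigma)  # maximizing likelihood == minimizing this
loss.backward()
optimizer.step()
```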
When we perform inference with our model, we no longer get deterministic outputs as we did in traditional machine learning models. We now get a distribution over the output, $p(y \mid f_\theta(x))$, from which we should sample a single value $y$ (or take a summary statistic such as the mean) to obtain a concrete prediction.
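Continuing the sketch above (again an illustration, using the hypothetical `GaussianRegressor`), inference would then look roughly like this:

```python
import torch

model = GaussianRegressor()  # hypothetical model from the sketch above
x_new = torch.randn(1, 16)   # a new input

with torch.no_grad():
    mu, log_sigma = model(x_new)
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    y_sample = dist.sample()  # a single sampled prediction
    y_mean = dist.mean        # or use the mean as a point estimate
```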
Two famous loss functions, mean squared error and cross-entropy error, can be derived from this MLE perspective.
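Sketches of both derivations (standard results, spelled out here for convenience): assuming a Gaussian output with fixed variance $\sigma^2$ gives mean squared error, and assuming a Bernoulli output gives binary cross-entropy,

$$-\log \mathcal{N}(y \mid \mu, \sigma^2) = \frac{(y - \mu)^2}{2\sigma^2} + \text{const}, \qquad -\log \operatorname{Bern}(y \mid p) = -\,y \log p - (1 - y)\log(1 - p)$$

so minimizing the negative log-likelihood in the first case is equivalent to minimizing $(y - \mu)^2$, and in the second case is exactly the cross-entropy loss.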
2019 Deepest Season 5 (2018.12.29 ~ 2019.06.15)