Commit

Fixed DPE
mrava87 committed Feb 24, 2024
1 parent d5282b3 commit ad47e45
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/lectures/09_mdn.md
@@ -32,7 +32,7 @@ A number of more or less simple strategies can however be employed when training

$$
\boldsymbol \theta = \underset{\boldsymbol \theta} {\mathrm{argmin}} \; \sum_{i=1}^{N_s} \frac{log \hat{\sigma}^{(i)2}}{2} +
-\frac{||\hat{y}^{(i)} - y^{(i)}||_2^2}{2\hat{\sigma}^{(i)2}} \\
+\frac{(\hat{y}^{(i)} - y^{(i)})^2}{2\hat{\sigma}^{(i)2}} \\
$$

with the main difference that not only the mean (here denoted as $\hat{y}^{(i)}$) but also the standard deviation ($\hat{\sigma}^{(i)}$) is produced by the network, and both are therefore functions of the free parameters that we wish to optimize. Intuitively, the numerator of the second term encourages the mean prediction to be close to the observed data, while the denominator down-weights this penalty for samples where the network predicts a large variance. The first term prevents the network from letting the variance grow to infinity (which would otherwise drive the second term to zero regardless of the mean prediction).
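
To make the loss concrete, here is a minimal sketch in PyTorch of the objective above, assuming a network that outputs the predicted mean together with the log-variance (a common parametrization for numerical stability); the function name and signature are illustrative, not part of the lecture code:

```python
import torch

def gaussian_nll(y_hat, log_var, y):
    # y_hat   : predicted means \hat{y}^{(i)}, shape (N_s,)
    # log_var : predicted log-variances log(\hat{\sigma}^{(i)2}), shape (N_s,)
    # y       : observed data y^{(i)}, shape (N_s,)
    # First term penalizes large predicted variances; second term is the
    # squared error scaled by the predicted variance.
    return torch.sum(0.5 * log_var + 0.5 * (y_hat - y) ** 2 / torch.exp(log_var))
```

For reference, PyTorch's built-in `torch.nn.GaussianNLLLoss` implements the same criterion, parametrized by the variance rather than its logarithm.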
