From fd6a64df122e34f6361f3ab565f4421a3eead019 Mon Sep 17 00:00:00 2001
From: mrava87
Date: Fri, 16 Feb 2024 21:16:33 +0300
Subject: [PATCH] Small fix to 08_gradopt1

---
 docs/lectures/08_gradopt1.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/lectures/08_gradopt1.md b/docs/lectures/08_gradopt1.md
index 75d15a5..0b6b7dc 100644
--- a/docs/lectures/08_gradopt1.md
+++ b/docs/lectures/08_gradopt1.md
@@ -32,14 +32,14 @@ $$
 and evaluate it at the next gradient step $\boldsymbol \theta = \boldsymbol \theta_0 - \alpha \mathbf{g}$:
 
 $$
-J(\boldsymbol \theta_0 - \alpha \mathbf{g}) \approx J(\boldsymbol \theta_0) - \mathbf{g}^T \mathbf{g} +
+J(\boldsymbol \theta_0 - \alpha \mathbf{g}) \approx J(\boldsymbol \theta_0) - \alpha \mathbf{g}^T \mathbf{g} +
 \frac{1}{2} \alpha^2 \mathbf{g}^T \mathbf{H} \mathbf{g}
 $$
 
 We can interpret this expression as follows: a gradient step of $- \alpha \mathbf{g}$ adds the following contribution
-to the cost function, $-\mathbf{g}^T \mathbf{g} +
+to the cost function, $-\alpha \mathbf{g}^T \mathbf{g} +
 \frac{1}{2} \alpha^2 \mathbf{g}^T \mathbf{H} \mathbf{g}$. When this contribution is positive (i.e.,
-$\frac{1}{2} \alpha^2 \mathbf{g}^T \mathbf{H} \mathbf{g} > \mathbf{g}^T \mathbf{g}$), the cost function grows instead of
+$\frac{1}{2} \alpha^2 \mathbf{g}^T \mathbf{H} \mathbf{g} > \alpha\mathbf{g}^T \mathbf{g}$), the cost function grows instead of
 being reduced. Under the assumption that $\mathbf{H}$ is known, we could easily choose a step-size $\alpha$ that prevents this from happening. However, when the Hessian cannot be estimated, a conservative selection of the step-size is the only remedy to prevent the cost function from growing. A downside of such an approach is that the smaller the learning rate the slower the training process.
 
 ### Local minima
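
As a quick sanity check of the corrected expression, here is a minimal sketch (not part of the patch) on a toy quadratic cost; the matrix `A`, the point `theta0`, and the step size `alpha` are hypothetical choices for illustration. For a quadratic cost the second-order expansion is exact, so the contribution $-\alpha \mathbf{g}^T \mathbf{g} + \frac{1}{2} \alpha^2 \mathbf{g}^T \mathbf{H} \mathbf{g}$ should equal the actual change in $J$.

```python
# Sketch: verify the corrected second-order expression on a toy quadratic cost
# J(theta) = 0.5 * theta^T A theta, for which gradient = A @ theta and Hessian = A.
import numpy as np

A = np.array([[3.0, 0.5],
              [0.5, 1.0]])            # symmetric positive-definite, so H = A
J = lambda theta: 0.5 * theta @ A @ theta

theta0 = np.array([1.0, -2.0])        # hypothetical starting point
g = A @ theta0                        # gradient of J at theta0
H = A                                 # Hessian of J (constant for a quadratic)

alpha = 0.1                           # hypothetical step size
# contribution of the step -alpha*g predicted by the corrected formula
contribution = -alpha * (g @ g) + 0.5 * alpha**2 * (g @ H @ g)

# for a quadratic cost the expansion is exact, so predicted and actual change match
actual_change = J(theta0 - alpha * g) - J(theta0)
print(np.isclose(contribution, actual_change))   # True

# the contribution turns positive (cost grows) once
# 0.5 * alpha^2 * g^T H g > alpha * g^T g
alpha_limit = 2 * (g @ g) / (g @ H @ g)
print(alpha < alpha_limit)                       # True for this alpha: cost decreases
```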