From 393ea4a09abe4a252ae589e1eea6213b2835137f Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 09:55:05 +0300
Subject: [PATCH 1/9] Fixed typos

---
 algebraic-equations.Rmd | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
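
The Newton-Raphson solver named in this patch can be illustrated with a minimal sketch in Python with NumPy. Python is an assumption, since the chapter itself contains no code. The toy system $f(x, \psi) = (x_1^2 - \psi_1,\ x_1 x_2 - \psi_1 \psi_2)$ and the names `newton`, `f`, `df_dx` are hypothetical. They are chosen because the root $x = (\sqrt{\psi_1}, \sqrt{\psi_1}\,\psi_2)$ is known in closed form; components are 0-indexed in the code. The sketch accepts a guess once the residual norm falls below a tolerance, which is one possible choice of the "acceptable (by some metric)" criterion.

```python
import numpy as np

# Hypothetical toy system f(x, psi) = 0 with a known root
# x = (sqrt(psi[0]), sqrt(psi[0]) * psi[1]) for psi[0] > 0.
def f(x, psi):
    return np.array([x[0] ** 2 - psi[0],
                     x[0] * x[1] - psi[0] * psi[1]])

def df_dx(x, psi):
    # Jacobian of f with respect to x
    return np.array([[2.0 * x[0], 0.0],
                     [x[1], x[0]]])

def newton(fun, jac, x0, psi, tol=1e-10, max_iter=50):
    """Newton-Raphson: refine the guess until the residual norm is below tol."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = fun(x, psi)
        if np.linalg.norm(r) < tol:  # acceptance metric
            return x
        # Solve J dx = -r instead of forming the inverse of J
        x = x + np.linalg.solve(jac(x, psi), -r)
    raise RuntimeError("Newton-Raphson did not converge")

psi = np.array([4.0, 3.0])
u = newton(f, df_dx, x0=[1.0, 1.0], psi=psi)
print(u)  # close to the known root (2, 6)
```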

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index f04e835..7437419 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -24,7 +24,7 @@ Then
 This chapter is mostly relevant to the case where we cannot evaluate $u$ analytically
 or by doing a matrix solve, which we would do if $f$ were a linear function of $u$.
 Rather, we compute $u$ using a numerical solver.
-The classic example for such a solver is the Newton-Rhapson method.
+The classic example for such a solver is the Newton-Raphson method.
 Most numerical solvers are based on iterative algorithms, 
 which start with an initial guess $u_0$, and update the guess until they find an
 acceptable (by some metric) solution.
@@ -65,7 +65,7 @@ We can derive the above result as follows:
 where we assume the requisite differentiation and inversion are possible.
 
 It remains to evaluate $\partial f / \partial u$ and $\partial f / \partial \psi$
-using automtic differentiation.
+using automatic differentiation.
 This approach, compared to the direct method, can be orders of magnitude faster.
 
 
@@ -80,14 +80,14 @@ that depends on $u$, and potentially also on $\psi$,
 Here, we chose $j$ to be a scalar, as would be the case when differentiating
 a probability density or an objective function.
 
-One of the key factors that makes automatic differentiation so successfull
-is that we do not explicitly construct the Jacobian matrices, incured by intermediate operations
+One of the key factors that makes automatic differentiation so successful
+is that we do not explicitly construct the Jacobian matrices, incurred by intermediate operations
 required to compute $j$.
-Rahter, we only sequentially compute cotangent-Jacobian products in reverse mode,
+Rather, we only sequentially compute cotangent-Jacobian products in reverse mode,
 or Jacobian-tangent products in forward mode.
 Applying this logic, we should aim to _not_ compute $\mathrm d u / \mathrm d \psi$ explicitly.
 
-This is where the _adjoint method_ comes into play. It was not originally developped
-for algebraic equations, but it's other applications are a bit more involved,
+This is where the _adjoint method_ comes into play. It was not originally developed
+for algebraic equations, but its other applications are a bit more involved,
 so introducing it here has pedagogical value.
 The goal is to remove the explicit dependence on $\mathrm d u / \mathrm d \psi$.

From 3560b765412404e3ef9667baec81283d06477821 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 11:10:10 +0300
Subject: [PATCH 2/9] Added a few fixes to implicit function theorem section

---
 algebraic-equations.Rmd | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
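
The implicit function theorem result in this patch can be checked numerically. The following is a minimal NumPy sketch on a hypothetical toy system, not one taken from the chapter, whose root $u(\psi) = (\sqrt{\psi_1}, \sqrt{\psi_1}\,\psi_2)$ is known. This lets the Jacobian obtained from $\frac{\partial f}{\partial u} \frac{\mathrm d u}{\mathrm d \psi} = -\frac{\partial f}{\partial \psi}$ be compared with differentiating $u(\psi)$ by hand. The function names are illustrative only.

```python
import numpy as np

# Hypothetical toy system f(u, psi) = (u0^2 - psi0, u0*u1 - psi0*psi1),
# whose root is u(psi) = (sqrt(psi0), sqrt(psi0) * psi1).
def f(u, psi):
    return np.array([u[0] ** 2 - psi[0], u[0] * u[1] - psi[0] * psi[1]])

def df_du(u, psi):
    return np.array([[2.0 * u[0], 0.0],
                     [u[1], u[0]]])

def df_dpsi(u, psi):
    return np.array([[-1.0, 0.0],
                     [-psi[1], -psi[0]]])

def du_dpsi_ift(u, psi):
    """du/dpsi = -[df/du]^{-1} df/dpsi, computed by a linear solve with one
    right-hand side per column of df/dpsi instead of an explicit inverse."""
    return np.linalg.solve(df_du(u, psi), -df_dpsi(u, psi))

psi = np.array([4.0, 3.0])
u = np.array([np.sqrt(psi[0]), np.sqrt(psi[0]) * psi[1]])  # root at this psi

# Hand-derived Jacobian of u(psi) for comparison
exact = np.array([[0.5 / np.sqrt(psi[0]), 0.0],
                  [0.5 * psi[1] / np.sqrt(psi[0]), np.sqrt(psi[0])]])
J = du_dpsi_ift(u, psi)  # should agree with `exact` up to rounding
```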

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index 7437419..b61b29c 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -10,7 +10,7 @@ and in differential equations.
 Formally, consider the function
 \begin{eqnarray*}
   f: & \mathbb R^n \times \mathbb R^p \rightarrow \mathbb R^n \\
-     & (x, \psi) \rightarrow f(x, \psi)
+     & (x, \psi) \mapsto f(x, \psi)
 \end{eqnarray*}
-Our goal is to find to find the value of $x$, such that $f = 0$,
+Our goal is to find the value of $x$ such that $f = 0$,
 given a value of $\psi$ and to propagate sensitivities with respect to $\psi$
@@ -38,11 +38,15 @@ Instead, we can exploit the structure of the problem to construct efficient
 differentiation algorithms.
 
 
-## The Implicit function theorem
+## The Implicit function theorem \ Tangent linear method?
 
 The _implicit function theorem_ states that under certain regularity conditions,
-we can express $u$ as a function of $\psi$, that is $u = u(\psi)$ and
-furthermore
+we can express $u$ as a function of $\psi$, that is
+\begin{eqnarray*}
+  u: & \mathbb R^p \rightarrow \mathbb R^n \\
+     & \psi \mapsto u(\psi)
+\end{eqnarray*}
+and furthermore
 \begin{equation*}
   \frac{\mathrm d u}{\mathrm d \psi} 
     = - \left [ \frac{\partial f}{\partial u} \right]^{-1} \frac{\partial f}{\partial \psi} 
@@ -54,8 +58,8 @@ in the neighborhood of $x = u$, and if $\partial f / \partial x$ is invertible.
 We can derive the above result as follows:
 \begin{align*}
   & f(u, \psi) = 0  \\
-  \implies & \frac{\mathrm d f}{\mathrm d \psi}(u, \psi) = 0 \\
-  \iff & \frac{\partial f}{\partial \psi} 
+  \implies & \frac{\mathrm d f}{\mathrm d \psi}(u, \psi) = \frac{\mathrm d}{\mathrm d \psi} 0 \\
+  \iff & \frac{\partial f}{\partial \psi} \frac{\mathrm d \psi}{\mathrm d \psi}
     + \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} = 0 \\
   \iff & \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} 
     = - \frac{\partial f}{\partial \psi}  \\

From 24a787921608043d0d9145636f8a6673078d8b40 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 11:36:21 +0300
Subject: [PATCH 3/9] Added (imo better) derivation of the adjoint approach

---
 algebraic-equations.Rmd | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)
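
The adjoint computation derived in this patch can be sketched in NumPy on a hypothetical toy system. Nothing in the sketch comes from the chapter, and the functional $j(u, \psi) = u_1 + u_2^2 + \psi_1 \psi_2$ is an arbitrary choice. The sketch defines $\lambda = [\partial f / \partial u]^{-*} [\partial j / \partial u]^{*}$, which is the sign convention adopted in PATCH 4/9. Under that convention, $\frac{\mathrm d j}{\mathrm d \psi}^{*} = -[\partial f / \partial \psi]^{*} \lambda + [\partial j / \partial \psi]^{*}$. The result is compared with the tangent linear route, and for real matrices the adjoint is simply the transpose.

```python
import numpy as np

# Hypothetical toy problem: f(u, psi) = (u0^2 - psi0, u0*u1 - psi0*psi1) = 0
# and scalar functional j(u, psi) = u0 + u1^2 + psi0*psi1.
def df_du(u, psi):
    return np.array([[2.0 * u[0], 0.0], [u[1], u[0]]])

def df_dpsi(u, psi):
    return np.array([[-1.0, 0.0], [-psi[1], -psi[0]]])

def dj_du(u, psi):
    return np.array([1.0, 2.0 * u[1]])

def dj_dpsi(u, psi):
    return np.array([psi[1], psi[0]])

def gradient_adjoint(u, psi):
    # One adjoint solve with a single right-hand side, whatever the size of psi
    lam = np.linalg.solve(df_du(u, psi).T, dj_du(u, psi))
    return -df_dpsi(u, psi).T @ lam + dj_dpsi(u, psi)

def gradient_tangent(u, psi):
    # Tangent linear route: forms du/dpsi explicitly (one right-hand side per psi component)
    du_dpsi = np.linalg.solve(df_du(u, psi), -df_dpsi(u, psi))
    return dj_du(u, psi) @ du_dpsi + dj_dpsi(u, psi)

psi = np.array([4.0, 3.0])
u = np.array([np.sqrt(psi[0]), np.sqrt(psi[0]) * psi[1]])  # root of f
g_adj = gradient_adjoint(u, psi)
g_tan = gradient_tangent(u, psi)  # should agree with g_adj
```

The adjoint route solves a single linear system regardless of the dimension of $\psi$. This is the reason it pays off when $j$ is a scalar and $\psi$ is high-dimensional.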

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index b61b29c..092dc87 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -49,7 +49,7 @@ we can express $u$ as a function of $\psi$, that is
 and furthermore
 \begin{equation*}
   \frac{\mathrm d u}{\mathrm d \psi} 
-    = - \left [ \frac{\partial f}{\partial u} \right]^{-1} \frac{\partial f}{\partial \psi} 
+    = -  \frac{\partial f}{\partial u}^{-1} \frac{\partial f}{\partial \psi} 
 \end{equation*}
 The derivatives here are short-handed for Jacobian matrices.
 The derivative exists if $f$ is differentiable with respect to $x$ and $\psi$
@@ -63,10 +63,11 @@ We can derive the above result as follows:
     + \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} = 0 \\
   \iff & \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} 
     = - \frac{\partial f}{\partial \psi}  \\
-  \iff & \frac{\mathrm d u}{\mathrm d \psi} = - \left [\frac{\partial f}{\partial u} \right]^{-1}
+  \iff & \frac{\mathrm d u}{\mathrm d \psi} = - \frac{\partial f}{\partial u}^{-1}
     \frac{\partial f}{\partial \psi}
 \end{align*}
 where we assume the requisite differentiation and inversion are possible.
+The linear system that is solved for $\frac{\mathrm d u}{\mathrm d \psi}$ is called the _tangent linear system_.
 
 It remains to evaluate $\partial f / \partial u$ and $\partial f / \partial \psi$
 using automatic differentiation.
@@ -78,11 +79,9 @@ This approach, compared to the direct method, can be orders of magnitude faster.
 For many applications, our goal is not to differentiate $u$, but a functional $j$
 that depends on $u$, and potentially also on $\psi$,
 \begin{eqnarray*}
-  j : & \mathbb R^n \times \mathbb R^p \to \mathbb R \\
-      & (u, \psi) \to j(u, \psi)
+  j : & \mathbb R^n \times \mathbb R^p \to \mathbb R^m \\
+      & (u, \psi) \mapsto j(u, \psi)
 \end{eqnarray*}
-Here, we chose $j$ to be a scalar, as would be the case when differentiating
-a probability density or an objective function.
 
 One of the key factors that makes automatic differentiation so successful
 is that we do not explicitly construct the Jacobian matrices, incurred by intermediate operations
@@ -98,8 +97,31 @@ The goal is to remove the explicit dependence on $\mathrm d u / \mathrm d \psi$.
 
 We start with
 \begin{equation*}
-  \frac{\mathrm d j}{\mathrm d \psi} = \frac{\partial j}{\partial \psi} 
-    + \frac{\partial j}{\partial u} \frac{\mathrm d u}{\mathrm d \psi}
+  \frac{\mathrm d j}{\mathrm d \psi} = \frac{\partial j}{\partial u} \frac{\mathrm d u}{\mathrm d \psi}
+  + \frac{\partial j}{\partial \psi}.
+\end{equation*}
+Substituting the expression for $\frac{\mathrm d u}{\mathrm d \psi}$ from the previous section gives
+\begin{equation*}
+  \frac{\mathrm d j}{\mathrm d \psi} = - \frac{\partial j}{\partial u} \frac{\partial f}{\partial u}^{-1}
+    \frac{\partial f}{\partial \psi}
+  + \frac{\partial j}{\partial \psi}.
+\end{equation*}
+Taking the adjoint (transpose) of both sides yields
+\begin{equation*}
+  \frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \frac{\partial f}{\partial u}^{-*}
+    \frac{\partial j}{\partial u}^{*}
+  + \frac{\partial j}{\partial \psi}^{*}.
+\end{equation*}
+Define a new variable $\lambda$ as
+\begin{equation*}
+  \lambda = - \frac{\partial f}{\partial u}^{-*} \frac{\partial j}{\partial u}^{*}.
+\end{equation*}
+The linear system that is solved for $\lambda$ is called the _adjoint system_.
+Having the solution $\lambda$ to the adjoint system, it remains to evaluate $\frac{\mathrm d j}{\mathrm d \psi}^{*}$
+with one additional matrix-vector product
+\begin{equation*}
+  \frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \lambda
+  + \frac{\partial j}{\partial \psi}^{*}.
 \end{equation*}

 Consider now the _Lagrangian_,

From 41b22a2a5d733075ba5772beb5f3762fab3c4314 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 12:20:24 +0300
Subject: [PATCH 4/9] Added info on forward and reverse AD

---
 algebraic-equations.Rmd | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index 092dc87..3cccce4 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -114,7 +114,7 @@ Take the adjoint (transpose) of $\frac{\mathrm d j}{\mathrm d \psi}$
 \end{equation*}
 Define a new variable $\lambda$ as
 \begin{equation*}
-  \lambda = - \frac{\partial f}{\partial u}^{-*} \frac{\partial j}{\partial u}^{*}.
+  \lambda = \frac{\partial f}{\partial u}^{-*} \frac{\partial j}{\partial u}^{*}.
 \end{equation*}
 The linear system that is solved for $\lambda$ is called the _adjoint system_.
 Having the solution $\lambda$ to the adjoint system, it remains to evaluate $\frac{\mathrm d j}{\mathrm d \psi}^{*}$
@@ -167,4 +167,35 @@ and in the unifying approach it provides when studying implicit functions.
 
 ## Practical considerations
 
+One of the key factors that makes automatic differentiation so successful
+is that we do not explicitly construct the Jacobian matrices.
+Rather, we only sequentially compute cotangent-Jacobian products in reverse mode,
+or Jacobian-tangent products in forward mode.
+
+### Tangent linear equation and forward mode AD
+
+In forward mode, given a tangent vector $\dot{\psi}$, we evaluate the Jacobian-tangent product
+\begin{equation*}
+  (\psi, \dot{\psi}) \mapsto \frac{\mathrm d u(\psi)}{\mathrm d \psi} \dot{\psi},
+\end{equation*}
+that is the solution to the following linear system
+\begin{equation*}
+  \frac{\partial f}{\partial u} \left(\frac{\mathrm d u(\psi)}{\mathrm d \psi} \dot{\psi} \right) =
+  - \frac{\partial f}{\partial \psi} \cdot \dot{psi}.
+\end{equation*}
+
+### Adjoint equation and reverse mode AD
+
+In reverse mode, given a cotangent vector $\bar{\psi}$ with the same dimension as $u$, we evaluate the Jacobian-transpose-vector product
+\begin{equation*}
+  (\psi, \bar{\psi}) \mapsto \frac{\mathrm d u(\psi)}{\mathrm d \psi}^{*} \bar{\psi},
+\end{equation*}
+that is evaluated as
+\begin{equation*}
+  \frac{\mathrm d u(\psi)}{\mathrm d \psi}^{*} \bar{\psi} = - \frac{\partial f}{\partial \psi}^{*} \cdot \lambda,
+\end{equation*}
+where $\lambda$ is the solution to the following linear system
+\begin{equation*}
+  \frac{\partial f}{\partial u}^{*} \lambda = \bar{psi}.
+\end{equation*}
 

From 2c459c3c56c97d9b1716cc0c6255fb9aa16ec050 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 12:29:29 +0300
Subject: [PATCH 5/9] Renamed algebraic eqns -> nonlinear system

---
 algebraic-equations.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index 3cccce4..6ce57ba 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -1,6 +1,6 @@
-# Algebraic equations
+# Nonlinear system of equations
 
-Algebraic equations take the form $f(x) = 0$.
+Nonlinear system of equations take the form $f(x) = 0$.
 The solution to such equations likely constitutes the simplest non-trivial example
 of an _implicit function_. With it as a motivating problem, we
 develop the principles and mechanisms needed to differentiate more sophisticated
@@ -38,7 +38,7 @@ Instead, we can exploit the structure of the problem to construct efficient
 differentiation algorithms.
 
 
-## The Implicit function theorem \ Tangent linear method?
+## The tangent linear method
 
 The _implicit function theorem_ states that under certain regularity conditions,
 we can express $u$ as a function of $\psi$, that is
@@ -74,7 +74,7 @@ using automatic differentiation.
 This approach, compared to the direct method, can be orders of magnitude faster.
 
 
-## The Adjoint method
+## The adjoint method
 
 For many applications, our goal is not to differentiate $u$, but a functional $j$
 that depends on $u$, and potentially also on $\psi$,

From c5b9c16ff343bb7980a95b4b9502694a416765c8 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 12:36:30 +0300
Subject: [PATCH 6/9] Fixed typo

---
 algebraic-equations.Rmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
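
The forward- and reverse-mode formulas corrected in this patch can be sketched in NumPy as a Jacobian-vector product (JVP) and a vector-Jacobian product (VJP). The sketch uses a hypothetical toy system, and the names `jvp`, `vjp` and `cotangent` are illustrative rather than a library API. The cotangent that the text writes as $\bar{\psi}$ lives in $\mathbb R^n$, like $u$, so it is called `cotangent` here. The final line applies the dot-product test $\langle \bar v, J \dot\psi \rangle = \langle J^{*} \bar v, \dot\psi \rangle$, which holds only if the two routines are adjoint to each other.

```python
import numpy as np

# Hypothetical toy system f(u, psi) = (u0^2 - psi0, u0*u1 - psi0*psi1) = 0
def df_du(u, psi):
    return np.array([[2.0 * u[0], 0.0], [u[1], u[0]]])

def df_dpsi(u, psi):
    return np.array([[-1.0, 0.0], [-psi[1], -psi[0]]])

def jvp(u, psi, psi_dot):
    """Forward mode: solve the tangent linear system
    (df/du) u_dot = -(df/dpsi) psi_dot and return u_dot = (du/dpsi) psi_dot."""
    return np.linalg.solve(df_du(u, psi), -df_dpsi(u, psi) @ psi_dot)

def vjp(u, psi, cotangent):
    """Reverse mode: solve the adjoint system (df/du)^T lam = cotangent
    and return (du/dpsi)^T cotangent = -(df/dpsi)^T lam."""
    lam = np.linalg.solve(df_du(u, psi).T, cotangent)
    return -df_dpsi(u, psi).T @ lam

psi = np.array([4.0, 3.0])
u = np.array([np.sqrt(psi[0]), np.sqrt(psi[0]) * psi[1]])  # root of f

psi_dot = np.array([1.0, 0.0])    # tangent, same size as psi
cotangent = np.array([0.0, 1.0])  # cotangent, same size as u
u_dot = jvp(u, psi, psi_dot)
psi_bar = vjp(u, psi, cotangent)

# Dot-product test: both numbers should agree up to rounding
print(np.dot(cotangent, u_dot), np.dot(psi_bar, psi_dot))
```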

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index 6ce57ba..c67e392 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -181,7 +181,7 @@ In forward mode given the tangent vector $\dot{\psi}$ we evaluate the Jacobian-t
 that is the solution to the following linear system
 \begin{equation*}
   \frac{\partial f}{\partial u} \left(\frac{\mathrm d u(\psi)}{\mathrm d \psi} \dot{\psi} \right) =
-  - \frac{\partial f}{\partial \psi} \cdot \dot{psi}.
+  - \frac{\partial f}{\partial \psi} \cdot \dot{\psi}.
 \end{equation*}
 
 ### Adjoint equation and reverse mode AD
@@ -196,6 +196,6 @@ that is evaluated as
 \end{equation*}
 where $\lambda$ is the solution to the following linear system
 \begin{equation*}
-  \frac{\partial f}{\partial u}^{*} \lambda = \bar{psi}.
+  \frac{\partial f}{\partial u}^{*} \lambda = \bar{\psi}.
 \end{equation*}
 

From 027c3060b86731109a6bbe46b16455a579453433 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Wed, 15 Apr 2020 21:23:49 +0300
Subject: [PATCH 7/9] Revert "Renamed algebraic eqns -> nonlinear system"

This reverts commit 2c459c3c56c97d9b1716cc0c6255fb9aa16ec050.
---
 algebraic-equations.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index c67e392..31c83b5 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -1,6 +1,6 @@
-# Nonlinear system of equations
+# Algebraic equations
 
-Nonlinear system of equations take the form $f(x) = 0$.
+Algebraic equations take the form $f(x) = 0$.
 The solution to such equations likely constitutes the simplest non-trivial example
 of an _implicit function_. With it as a motivating problem, we
 develop the principles and mechanisms needed to differentiate more sophisticated
@@ -38,7 +38,7 @@ Instead, we can exploit the structure of the problem to construct efficient
 differentiation algorithms.
 
 
-## The tangent linear method
+## The Implicit function theorem \ Tangent linear method?
 
 The _implicit function theorem_ states that under certain regularity conditions,
 we can express $u$ as a function of $\psi$, that is
@@ -74,7 +74,7 @@ using automatic differentiation.
 This approach, compared to the direct method, can be orders of magnitude faster.
 
 
-## The adjoint method
+## The Adjoint method
 
 For many applications, our goal is not to differentiate $u$, but a functional $j$
 that depends on $u$, and potentially also on $\psi$,

From 028b65898d138f873bf9ff139aad3ed1ad194178 Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Thu, 16 Apr 2020 17:29:12 +0300
Subject: [PATCH 8/9] Addressed the requested changes

---
 algebraic-equations.Rmd | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index 31c83b5..0665574 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -38,7 +38,7 @@ Instead, we can exploit the structure of the problem to construct efficient
 differentiation algorithms.
 
 
-## The Implicit function theorem \ Tangent linear method?
+## The Implicit function theorem
 
 The _implicit function theorem_ states that under certain regularity conditions,
 we can express $u$ as a function of $\psi$, that is
@@ -49,7 +49,7 @@ we can express $u$ as a function of $\psi$, that is
 and furthermore
 \begin{equation*}
   \frac{\mathrm d u}{\mathrm d \psi} 
-    = -  \frac{\partial f}{\partial u}^{-1} \frac{\partial f}{\partial \psi} 
+    = - \left [ \frac{\partial f}{\partial u} \right]^{-1} \frac{\partial f}{\partial \psi} 
 \end{equation*}
 The derivatives here are short-handed for Jacobian matrices.
 The derivative exists if $f$ is differentiable with respect to $x$ and $\psi$
@@ -63,7 +63,7 @@ We can derive the above result as follows:
     + \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} = 0 \\
   \iff & \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} 
     = - \frac{\partial f}{\partial \psi}  \\
-  \iff & \frac{\mathrm d u}{\mathrm d \psi} = - \frac{\partial f}{\partial u}^{-1}
+  \iff & \frac{\mathrm d u}{\mathrm d \psi} = - \left [\frac{\partial f}{\partial u} \right]^{-1}
     \frac{\partial f}{\partial \psi}
 \end{align*}
 where we assume the requisite differentiation and inversion are possible.
@@ -79,9 +79,11 @@ This approach, compared to the direct method, can be orders of magnitude faster.
 For many applications, our goal is not to differentiate $u$, but a functional $j$
 that depends on $u$, and potentially also on $\psi$,
 \begin{eqnarray*}
-  j : & \mathbb R^n \times \mathbb R^p \to \mathbb R^m \\
+  j : & \mathbb R^n \times \mathbb R^p \to \mathbb R \\
       & (u, \psi) \mapsto j(u, \psi)
 \end{eqnarray*}
+Here, we chose $j$ to be a scalar, as would be the case when differentiating
+a probability density or an objective function.
 
 One of the key factors that makes automatic differentiation so successful
 is that we do not explicitly construct the Jacobian matrices, incurred by intermediate operations

From 96f0b3f67e2989ea053fd87f7d59ba2852e5c4fd Mon Sep 17 00:00:00 2001
From: Ivan Yashchuk <ivan.yashchuk@aalto.fi>
Date: Thu, 16 Apr 2020 17:38:16 +0300
Subject: [PATCH 9/9] Move new content to summary of results section

---
 algebraic-equations.Rmd | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/algebraic-equations.Rmd b/algebraic-equations.Rmd
index 0665574..88d6b7a 100644
--- a/algebraic-equations.Rmd
+++ b/algebraic-equations.Rmd
@@ -167,7 +167,7 @@ Note that we could have obtained this result by using the implicit function theo
 The adjoint method however has merit in its ability to solve more complicated problems
 and in the unifying approach it provides when studying implicit functions.
 
-## Practical considerations
+## Summary of results
 
 One of the key factors that makes automatic differentiation so successful
 is that we do not explicitly construct the Jacobian matrices.
@@ -201,3 +201,4 @@ where $\lambda$ is the solution to the following linear system
   \frac{\partial f}{\partial u}^{*} \lambda = \bar{\psi}.
 \end{equation*}
 
+## Practical considerations