-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 7/implicit functions #8
base: issue-7/implicit_functions
Are you sure you want to change the base?
Changes from all commits
393ea4a
3560b76
24a7879
41b22a2
2c459c3
c5b9c16
027c306
028b658
96f0b3f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,7 @@ and in differential equations. | |
Formally, consider the function | ||
\begin{eqnarray*} | ||
f: & \mathbb R^n \times \mathbb R^p \rightarrow \mathbb R^n \\ | ||
& (x, \psi) \rightarrow f(x, \psi) | ||
& (x, \psi) \mapsto f(x, \psi) | ||
\end{eqnarray*} | ||
Our goal is to find to find the value of $x$, such that $f = 0$, | ||
given a value of $\psi$ and to propagate sensitivities with respect to $\psi$ | ||
|
@@ -24,7 +24,7 @@ Then | |
This chapter is mostly relevant to the case where we cannot evaluate $u$ analytically | ||
or by doing a matrix solve, which we would do if $f$ were a linear function of $u$. | ||
Rather, we compute $u$ using a numerical solver. | ||
The classic example for such a solver is the Newton-Rhapson method. | ||
The classic example for such a solver is the Newton-Raphson method. | ||
Most numerical solvers are based on iterative algorithms, | ||
which start with an initial guess $u_0$, and update the guess until they find an | ||
acceptable (by some metric) solution. | ||
|
@@ -41,8 +41,12 @@ differentiation algorithms. | |
## The Implicit function theorem | ||
|
||
The _implicit function theorem_ states that under certain regularity conditions, | ||
we can express $u$ as a function of $\psi$, that is $u = u(\psi)$ and | ||
furthermore | ||
we can express $u$ as a function of $\psi$, that is | ||
\begin{eqnarray*} | ||
u: & \mathbb R^p \rightarrow \mathbb R^n \\ | ||
& \psi \mapsto u(\psi) | ||
\end{eqnarray*} | ||
and furthermore | ||
\begin{equation*} | ||
\frac{\mathrm d u}{\mathrm d \psi} | ||
= - \left [ \frac{\partial f}{\partial u} \right]^{-1} \frac{\partial f}{\partial \psi} | ||
|
@@ -54,18 +58,19 @@ in the neighborhood of $x = u$, and if $\partial f / \partial x$ is invertible. | |
We can derive the above result as follows: | ||
\begin{align*} | ||
& f(u, \psi) = 0 \\ | ||
\implies & \frac{\mathrm d f}{\mathrm d \psi}(u, \psi) = 0 \\ | ||
\iff & \frac{\partial f}{\partial \psi} | ||
\implies & \frac{\mathrm d f}{\mathrm d \psi}(u, \psi) = \frac{\mathrm d f}{\mathrm d \psi} 0 \\ | ||
\iff & \frac{\partial f}{\partial \psi} \frac{\mathrm d \psi}{\mathrm d \psi} | ||
+ \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} = 0 \\ | ||
\iff & \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} | ||
= - \frac{\partial f}{\partial \psi} \\ | ||
\iff & \frac{\mathrm d u}{\mathrm d \psi} = - \left [\frac{\partial f}{\partial u} \right]^{-1} | ||
\frac{\partial f}{\partial \psi} | ||
\end{align*} | ||
where we assume the requisite differentiation and inversion are possible. | ||
The linear system that is solved for $\frac{\mathrm d u}{\mathrm d \psi}$ is called the _tangent linear system_. | ||
|
||
It remains to evaluate $\partial f / \partial u$ and $\partial f / \partial \psi$ | ||
using automtic differentiation. | ||
using automatic differentiation. | ||
This approach, compared to the direct method, can be orders of magnitude faster. | ||
|
||
|
||
|
@@ -75,27 +80,50 @@ For many applications, our goal is not to differentiate $u$, but a functional $j | |
that depends on $u$, and potentially also on $\psi$, | ||
\begin{eqnarray*} | ||
j : & \mathbb R^n \times \mathbb R^p \to \mathbb R \\ | ||
& (u, \psi) \to j(u, \psi) | ||
& (u, \psi) \mapsto j(u, \psi) | ||
\end{eqnarray*} | ||
Here, we chose $j$ to be a scalar, as would be the case when differentiating | ||
IvanYashchuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||
a probability density or an objective function. | ||
|
||
One of the key factors that makes automatic differentiation so successfull | ||
is that we do not explicitly construct the Jacobian matrices, incured by intermediate operations | ||
One of the key factors that makes automatic differentiation so successful | ||
is that we do not explicitly construct the Jacobian matrices, incurred by intermediate operations | ||
required to compute $j$. | ||
Rahter, we only sequentially compute cotangent-Jacobian products in reverse mode, | ||
Rather, we only sequentially compute cotangent-Jacobian products in reverse mode, | ||
or Jacobian-tangent products in forward mode. | ||
Applying this logic, we should aim to _not_ compute $\mathrm d u / \mathrm d \psi$ explicitly. | ||
|
||
This is where the _adjoint method_ comes into play. It was not originally developped | ||
This is where the _adjoint method_ comes into play. It was not originally developed | ||
IvanYashchuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for algebraic equations, but it's other applications are a bit more involved, | ||
so introducing it here has pedagogical value. | ||
The goal is to remove the explicit dependence on $\mathrm d u / \mathrm d \psi$. | ||
|
||
We start with | ||
\begin{equation*} | ||
\frac{\mathrm d j}{\mathrm d \psi} = \frac{\partial j}{\partial \psi} | ||
+ \frac{\partial j}{\partial u} \frac{\mathrm d u}{\mathrm d \psi} | ||
\frac{\mathrm d j}{\mathrm d \psi} = \frac{\partial j}{\partial u} \frac{\mathrm d u}{\mathrm d \psi} | ||
+ \frac{\partial j}{\partial \psi}. | ||
\end{equation*} | ||
Then substitute the expression for $\frac{\mathrm d u}{\mathrm d \psi}$ from the previous section | ||
\begin{equation*} | ||
\frac{\mathrm d j}{\mathrm d \psi} = - \frac{\partial j}{\partial u} \frac{\partial f}{\partial u}^{-1} | ||
\frac{\partial f}{\partial \psi} | ||
+ \frac{\partial j}{\partial \psi}. | ||
\end{equation*} | ||
Take the adjoint (transpose) of $\frac{\mathrm d j}{\mathrm d \psi}$ | ||
\begin{equation*} | ||
\frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \frac{\partial f}{\partial u}^{-*} | ||
\frac{\partial j}{\partial u}^{*} | ||
+ \frac{\partial j}{\partial \psi}^{*}. | ||
\end{equation*} | ||
Define a new variable $\lambda$ as | ||
\begin{equation*} | ||
\lambda = \frac{\partial f}{\partial u}^{-*} \frac{\partial j}{\partial u}^{*}. | ||
\end{equation*} | ||
The linear system that is solved for $\lambda$ is called the _adjoint system_. | ||
Having the solution to the adjoint system \lambda, it remains to evaluate \frac{\mathrm d j}{\mathrm d \psi}^{*} | ||
with one additional matrix multiplication operation | ||
\begin{equation*} | ||
\frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \lambda | ||
+ \frac{\partial j}{\partial \psi}^{*}. | ||
\end{equation*} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a pain to read without the pdf output. Can you summarize the change you made here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here are derivations with rendered latex, though the symbols I used there are different There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seeing I'm doing a code review, linking to similar code that does the calculations with different symbols is not good enough. The code you have doesn't compile, and I had to make a few corrections to render it. You use * for transpose, but this is inconsistent with previous notation where T is used. You also do not use []^{-1} as I have requested for other sections of your code. Last but not least, it is unclear to me why you have replaced the original calculations. |
||
|
||
Consider now the _Lagrangian_, | ||
|
@@ -139,6 +167,38 @@ Note that we could have obtained this result by using the implicit function theo | |
The adjoint method however has merit in its ability to solve more complicated problems | ||
and in the unifying approach it provides when studying implicit functions. | ||
|
||
## Practical considerations | ||
## Summary of results | ||
|
||
One of the key factors that makes automatic differentiation so successful | ||
is that we do not explicitly construct the Jacobian matrices. | ||
Rather, we only sequentially compute cotangent-Jacobian products in reverse mode, | ||
or Jacobian-tangent products in forward mode. | ||
|
||
### Tangent linear equation and forward mode AD | ||
|
||
In forward mode given the tangent vector $\dot{\psi}$ we evaluate the Jacobian-tangent product | ||
\begin{equation*} | ||
(\psi, \dot{\psi}) \mapsto \frac{\mathrm d u(\psi)}{\mathrm d \psi} \dot{\psi}, | ||
\end{equation*} | ||
that is the solution to the following linear system | ||
\begin{equation*} | ||
\frac{\partial f}{\partial u} \left(\frac{\mathrm d u(\psi)}{\mathrm d \psi} \dot{\psi} \right) = | ||
- \frac{\partial f}{\partial \psi} \cdot \dot{\psi}. | ||
\end{equation*} | ||
|
||
### Adjoint equation and reverse mode AD | ||
|
||
In reverse mode given the cotangent vector $\bar{\psi}$ we evaluate the Jacobian-transpose-vector product | ||
\begin{equation*} | ||
(\psi, \bar{\psi}) \mapsto \frac{\mathrm d u(\psi)}{\mathrm d \psi}^{*} \bar{\psi}, | ||
\end{equation*} | ||
that is evaluated as | ||
\begin{equation*} | ||
\frac{\mathrm d u(\psi)}{\mathrm d \psi}^{*} \bar{\psi} = - \frac{\partial f}{\partial \psi}^{*} \cdot \lambda, | ||
\end{equation*} | ||
where $\lambda$ is the solution to the following linear system | ||
\begin{equation*} | ||
\frac{\partial f}{\partial u}^{*} \lambda = \bar{\psi}. | ||
\end{equation*} | ||
IvanYashchuk marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Practical considerations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be true but I don't see the point of introducing this notion here.