Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 7/implicit functions #8

Open
wants to merge 9 commits into
base: issue-7/implicit_functions
Choose a base branch
from

Conversation

IvanYashchuk
Copy link

Hello @charlesm93, here are my friendly suggestions on the section you wrote.

I propose renaming "algebraic equations" -> "nonlinear system" as the same results hold for this more general case (including also PDEs).
Implicit function theorem just states that du/dpsi exists under certain conditions, therefore, I renamed the section to "tangent linear method".
I think that there is no need to introduce the Lagrangian approach for the adjoint method, as it is naturally derived from the tangent method. I propose to remove that section.
In "practical considerations" I have added the expressions needed for the implementation of forward and reverse AD of nonlinear solvers.

Ref. #7

@bob-carpenter
Copy link
Owner

Thanks for the PR!

"Algebraic equations" is how these are typically described in the literature. I'd like to continue to at least connect to the mainstream literature.

Part of the motivation for writing this volume is to provide an accessible presentation. So I'd like to keep the presentations to the simplest possible level rather than the most general mathematical level. I'm just bringing this up as a criterion to consider this PR.

I'll let @charlesm93 respond directly on the content.

@charlesm93
Copy link
Collaborator

charlesm93 commented Apr 15, 2020

Hi @IvanYashchuk,
thank you for chiming in. I'll go through your PR, but first a few comments:

  • Another reason for keeping "algebraic equation" is that the handbook is meant to be usable as a lookup textbook.
  • Statements of the implicit function theorem I've encountered also give you the Jacobian matrix. In any case, to derive the Jacobian, you need u to be a function of psi. Is "tangent linear method" a termed used in the certainf fields?
  • the case of PDEs is different because the target is an integral of f, rather than a function of it. The adjoint method is then more sophisticated.
  • You're right that we don't need to discuss the Lagrangian approach for algebraic equation, but I'm introducing it here for pedagogical reasons. It is easy to understand with algebraic equations and much more difficult for ODEs, DAEs, etc where we will need them.

What I do in this section might make more sense once I've written the other sections on implicit functions.

@IvanYashchuk
Copy link
Author

Is "tangent linear method" a termed used in the certain fields?

It is a common term, even used in the current draft

Forward-mode automatic differentiation employs the tangent method of

the case of PDEs is different because the target is an integral of f, rather than a function of it.

There are many different PDEs and there are many different objective (target?) functions. AD rules are independent of the form of the objective functions. Stationary PDEs are nonlinear equations of form F(x) = 0, linear stationary PDEs can be written in the same form, time-dependent PDEs are often reduced to a sequence of stationary PDEs.

Alright, I agree that the Lagrangian approach is needed for ODEs, etc. Anyways the actual rules for AD are more important than the derivation of them.

@charlesm93
Copy link
Collaborator

charlesm93 commented Apr 16, 2020

It is a common term,

Point taken.

Adding a summary at the end, as you have done, with only the results plainly stated works well. These sections can be named "tangent linear method" and "adjoint method" (again, as you have done). The first sections, in which the differentiation algorithms are derived, can be named "Implicit function theorem" and "Lagrangian approach". The implicit function theorem allows you to derive the tangent linear and adjoint method.

AD rules are independent of the form of the objective functions

Some properties of the objective function matter. A scalar objective motivates an adjoint method, while a long vector may warrant a tangent method.

Stationary PDEs are nonlinear equations of form F(x) = 0

Isn't an equation of the form F(x) = 0 simply an algebraic equation? To have a PDE, F must also depend on partial derivatives of x. Unless of course you can point me to an example where this is not the case.

Anyways the actual rules for AD are more important than the derivation of them.

It depends on the reader. For Stan developers, being able to extend results to other cases can sometimes be important, as the AD handbook may not be exhaustive.

@charlesm93 charlesm93 self-assigned this Apr 16, 2020
Copy link
Collaborator

@charlesm93 charlesm93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is very good. Please address the requested changes.

I'm still drafting this section (and the other chapters on implicit functions). In the future, please wait for a chapter to be finished and merged to the master branch before proposing a PR. You can always bring up issues. All that said, thank you for your help!

algebraic-equations.Rmd Outdated Show resolved Hide resolved
algebraic-equations.Rmd Outdated Show resolved Hide resolved
algebraic-equations.Rmd Outdated Show resolved Hide resolved
algebraic-equations.Rmd Show resolved Hide resolved
algebraic-equations.Rmd Show resolved Hide resolved
with one additional matrix multiplication operation
\begin{equation*}
\frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \lambda
+ \frac{\partial j}{\partial \psi}^{*}.
\end{equation*}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pain to read without the pdf output. Can you summarize the change you made here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are derivations with rendered latex, though the symbols I used there are different
https://colab.research.google.com/drive/1zA75xSfsy2d7-7ojWoUbmCqS6CZy30m3

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing I'm doing a code review, linking to similar code that does the calculations with different symbols is not good enough. The code you have doesn't compile, and I had to make a few corrections to render it. You use * for transpose, but this is inconsistent with previous notation where T is used. You also do not use []^{-1} as I have requested for other sections of your code.

Last but not least, it is unclear to me why you have replaced the original calculations.

algebraic-equations.Rmd Show resolved Hide resolved
@IvanYashchuk
Copy link
Author

I think I've addressed all the needed changes.
Rendered latex derivation: #8 (comment)

Copy link
Collaborator

@charlesm93 charlesm93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a bit of work to be done and you need to test that your code compiles.

(I realize that I probably misinterpreted your friendly suggestions as a formal PR, and I could've taken some of your suggestions without doing code review. But since we're almost done, we may as well make a merge.)

with one additional matrix multiplication operation
\begin{equation*}
\frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \lambda
+ \frac{\partial j}{\partial \psi}^{*}.
\end{equation*}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing I'm doing a code review, linking to similar code that does the calculations with different symbols is not good enough. The code you have doesn't compile, and I had to make a few corrections to render it. You use * for transpose, but this is inconsistent with previous notation where T is used. You also do not use []^{-1} as I have requested for other sections of your code.

Last but not least, it is unclear to me why you have replaced the original calculations.

+ \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} = 0 \\
\iff & \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi}
= - \frac{\partial f}{\partial \psi} \\
\iff & \frac{\mathrm d u}{\mathrm d \psi} = - \left [\frac{\partial f}{\partial u} \right]^{-1}
\frac{\partial f}{\partial \psi}
\end{align*}
where we assume the requisite differentiation and inversion are possible.
The linear system that is solved for $\frac{\mathrm d u}{\mathrm d \psi}$ is called the _tangent linear system_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be true but I don't see the point of introducing this notion here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants