Reverse-over-forward AD #162

Open. Wants to merge 28 commits into base: master.
Conversation

@jrmaddison (Contributor) commented Jul 10, 2024

Reverse-over-forward AD.

The usage

u = [initialize forward variable]
u.block_variable.tlm_value = [initialize tangent-linear variable]

continue_annotation()
continue_reverse_over_forward()
...
J = [functional]
pause_annotation()
pause_reverse_over_forward()

leads to tangent-linear operations being recorded on the tape, allowing a higher-order adjoint calculation via, e.g.,

hessian_action = compute_gradient(J.block_variable.tlm_value, Control(u))

The primary advantage is that this allows checkpointing at higher order. The primary disadvantage is that computing multiple Hessian actions requires rerunning the forward and first-order adjoint calculations.
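For intuition, here is a minimal self-contained sketch of the reverse-over-forward idea using a toy scalar tape (this is not pyadjoint; `Var` and `backward` are invented for illustration). The tangent-linear computation is itself built from taped operations, so running reverse-mode AD over the tangent output yields a Hessian action:

```python
# Toy sketch of reverse-over-forward AD (hand-rolled tape, not pyadjoint).

class Var:
    """A taped scalar: records the operations that produced it."""
    def __init__(self, value, parents=(), grads=()):
        self.value = value
        self.parents = parents  # input Vars
        self.grads = grads      # local partial derivatives w.r.t. parents
        self.adj = 0.0          # adjoint (reverse-mode) accumulator

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   parents=(self, other),
                   grads=(other.value, self.value))
    __rmul__ = __mul__

def backward(out):
    """Reverse pass: propagate adjoints from `out` back to the leaves."""
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p in v.parents:
                visit(p)
            order.append(v)
    visit(out)
    out.adj = 1.0
    for v in reversed(order):
        for p, g in zip(v.parents, v.grads):
            p.adj += v.adj * g

u = Var(2.0)       # forward variable
u_dot = Var(1.0)   # tangent-linear direction (tlm_value, in pyadjoint terms)

# Forward: J = u**3. Tangent-linear: J_dot = 3*u**2*u_dot, written here by
# hand but built from *taped* multiplications, which is what
# reverse-over-forward records automatically.
J_dot = 3.0 * u * u * u_dot

backward(J_dot)    # reverse over the tangent-linear computation
# u.adj now holds the Hessian action d/du (dJ/du . u_dot) = 6*u*u_dot = 12.0
```

Here a single reverse pass over the recorded tangent-linear operations gives one Hessian action; a second action in a new direction would require rebuilding the tangent tape, matching the disadvantage noted above.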

API changes:

  • Add reverse-over-forward controls reverse_over_forward_enabled, no_reverse_over_forward (decorator), stop_reverse_over_forward (context manager), pause_reverse_over_forward, and continue_reverse_over_forward.
  • Add optional n_outputs argument to the Block constructor. This is currently required only for reverse-over-forward AD, and is used to trigger tangent-linear operations.
  • Add Block.solve_tlm method, for performing differentiable tangent-linear operations.
  • Add OverloadedType._ad_assign for in-place assignment. This is used to temporarily reset forward variables to the required (input) values before performing tangent-linear operations.
  • Add PosBlock, used by AdjFloat.__pos__.
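To illustrate the `solve_tlm` and `n_outputs` additions, here is a hypothetical, heavily simplified block (the names loosely follow this PR's description; the real pyadjoint signatures may differ). The key property is that the tangent output is produced using the same style of differentiable operations as the forward model, so it can itself be recorded on the tape:

```python
# Hypothetical sketch only: real pyadjoint Block signatures differ.

class SquareBlock:
    # Per the PR, blocks declare an output count so the tape knows when
    # to trigger tangent-linear operations.
    n_outputs = 1

    def __init__(self, x):
        self.x = x

    def solve(self):
        # Forward operation: y = x**2
        return self.x ** 2

    def solve_tlm(self, x_dot):
        # Tangent-linear operation: y_dot = 2*x*x_dot. If x and x_dot were
        # overloaded types, this line would itself be recorded on the tape,
        # which is what makes reverse-over-forward possible.
        return 2 * self.x * x_dot

block = SquareBlock(3.0)
y = block.solve()             # 9.0
y_dot = block.solve_tlm(0.5)  # directional derivative: 2*3*0.5 = 3.0
```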

Limitations:

  • Not added for NumpyArraySliceBlock, as that block lacks tangent-linear methods.
  • In principle, higher-order calculations with all tangent-linear directions equal are possible via this approach, but this would need to be used quite carefully. Avoiding inefficiency in the (likely much more common) second-order case would also add significant complexity. Higher-order taping is therefore disabled (around the solve_tlm call in Block.add_output). Higher-order calculations with different directions would require multiple tangent-linear values (multiple values of tlm_value for a single BlockVariable) and would likely require much more extensive changes.
  • Functionality has been added for AdjFloat operations, but this can lead to a large number of operations appearing on the tape (the usual symbolic-differentiation scaling problem, here appearing on the pyadjoint tape). The complexity could perhaps be moved from the tape into SymPy expressions (e.g. via a port of the tlm_adjoint FloatEquation tape object).
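A rough illustration of the float-operation scaling concern (a toy tape, not pyadjoint): taping each elementary operation individually records one tangent entry per primal entry, whereas consolidating a whole expression into a single tape object (as tlm_adjoint's FloatEquation does symbolically) records just one entry.

```python
# Toy illustration of tape growth under reverse-over-forward for float ops.

per_op_tape = []

def taped_mul(a, b):
    per_op_tape.append("mul")      # primal operation recorded on the tape
    per_op_tape.append("mul_tlm")  # its tangent-linear counterpart
    return a * b

x = 2.0
y = x
for _ in range(10):
    y = taped_mul(y, x)            # y = x**11, built from 10 multiplications

# Op-by-op taping: 10 primal + 10 tangent entries.
# A consolidated symbolic approach would record a single tape entry instead:
consolidated_tape = ["FloatEquation(y = x**11)"]  # hypothetical single entry
```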

@jrmaddison jrmaddison marked this pull request as ready for review July 11, 2024 17:11