
Differentiating Implicit function inside implicit function with Enzyme #153

Open
benjaminfaber opened this issue Aug 14, 2024 · 9 comments
Labels: bug (Something isn't working)

@benjaminfaber

I've run into an issue where I want to compute the gradient of an implicit function that itself depends on another implicit function. I can do the operation successfully with ForwardDiff; however, I would like to use a faster backend, like Enzyme. When I run the MWE below, I hit a segmentation fault, which I think is related to the fact that one needs to use autodiff_deferred, but it looks like ADTypes doesn't yet have a backend option for autodiff_deferred. Should I open an issue there, or is this something that can be implemented in ImplicitDifferentiation/DifferentiationInterface?

Tagging @just-walk

using ImplicitDifferentiation, Optim, Enzyme, ForwardDiff

# Inner problem: minimize f(y) = Σ(y² - x)², i.e. solve y² = x elementwise.
function f1(x, method)
    f(y) = sum(abs2, y .^ 2 .- x)
    y0 = ones(eltype(x), size(x))
    result = optimize(f, y0, method)
    return Optim.minimizer(result)
end;

# Optimality condition for the inner problem: ∇₂f(x, y) = 0 at the minimizer.
function c1(x, y, method)
    ∇₂f = @. 4 * (y^2 - x) * y
    return ∇₂f
end;

implicit_f1 = ImplicitFunction(f1, c1)

# Outer problem: depends on z, the solution of the inner implicit function.
function f2(x, method)
    z = implicit_f1(x, method)
    f(y) = sum(abs2, y .^ 2 .- z .* x)
    y0 = ones(eltype(x), size(x))
    result = optimize(f, y0, method)
    return Optim.minimizer(result)
end

# Optimality condition for the outer problem; note it calls implicit_f1 again.
function c2(x, y, method)
    z = implicit_f1(x, method)
    ∇₂f = @. 4 * (y^2 - z * x) * y
    return ∇₂f
end

implicit_f2 = ImplicitFunction(f2, c2)

x = [4., 9.]

dx = ([1., 0.], [0., 1.])  # basis tangents to seed BatchDuplicated

# Works
df1 = Enzyme.autodiff(Enzyme.Forward, implicit_f1, BatchDuplicatedNoNeed, BatchDuplicated(x, dx), Const(LBFGS()))[1]

┌ Warning: Using fallback BLAS replacements for (["dasum_64_"]), performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:59
((var"1" = (var"1" = [0.25000000000806183, 0.0], var"2" = [0.0, 0.16666666667288246]),)

df2 = ForwardDiff.jacobian(_x -> implicit_f2(_x, LBFGS()), x)
2×2 Matrix{Float64}:
 0.53033  0.0
 0.0      0.433013

# Does not work

df2 = autodiff(Forward, implicit_f2, BatchDuplicatedNoNeed, BatchDuplicated(x, dx), Const(LBFGS()))[1]
[32468] signal (11.1): Segmentation fault
in expression starting at /home/bfaber/TempPkg/src/diff_test.jl:44
gc_mark_obj16 at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:1894 [inlined]
gc_mark_outrefs at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2654 [inlined]
gc_mark_loop_serial_ at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2697
gc_mark_loop_serial at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2720
gc_mark_loop at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2901 [inlined]
_jl_gc_collect at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:3234
ijl_gc_collect at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:3531
maybe_collect at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:1300
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:1357 [inlined]
jl_gc_alloc_ at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia_internal.h:476 [inlined]
jl_gc_alloc at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:3583
_new_array_ at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/array.c:134 [inlined]
_new_array at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_1d at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/array.c:436
Array at ./boot.jl:477 [inlined]
...
@gdalle
Member

gdalle commented Aug 15, 2024

Indeed, ADTypes has no option for choosing between autodiff and autodiff_deferred with Enzyme, partly because this choice should become unnecessary in the medium term (according to @wsmoses).
In the meantime, DifferentiationInterface allows the switch internally with the function DifferentiationInterface.nested(backend).
You can try setting conditions_x_backend and conditions_y_backend to DifferentiationInterface.nested(AutoEnzyme()) in your inner ImplicitFunction to automatically use autodiff_deferred.
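
For illustration, a minimal sketch of that wiring; the keyword-argument form of the ImplicitFunction constructor is assumed here and may differ between ImplicitDifferentiation versions:

using ImplicitDifferentiation
import ADTypes, DifferentiationInterface

# Sketch: differentiate the conditions of the inner ImplicitFunction with the
# "nested" Enzyme backend, which switches to autodiff_deferred internally.
nested_enzyme = DifferentiationInterface.nested(ADTypes.AutoEnzyme())
implicit_f1 = ImplicitFunction(f1, c1;
    conditions_x_backend=nested_enzyme,
    conditions_y_backend=nested_enzyme)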

I should also note that at the moment, Enzyme is only supported in forward mode by ImplicitDifferentiation (see #35). If your input is very high-dimensional (how big are we talking?), you may be better served by a reverse mode backend.
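
For reference, a hedged sketch of the reverse-mode route, assuming a ChainRules-compatible backend such as Zygote and a hypothetical scalar objective built on implicit_f2:

using Zygote

# Hypothetical scalar target wrapping the outer implicit solve; one reverse
# pass yields the full gradient regardless of the input dimension.
target(x) = sum(abs2, implicit_f2(x, LBFGS()))
grad = Zygote.gradient(target, x)[1]
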
Can you maybe give a mathematical description of the real function(s) you're trying to differentiate?

@wsmoses

wsmoses commented Aug 15, 2024

That bug is a GC error, not the result of deferred.

I fixed two such bugs in Julia proper for the 1.10.5 release; see if that helps, and if not, open an MWE issue on Enzyme proper.

@wsmoses

wsmoses commented Aug 15, 2024

Deferred isn't needed here, since the custom rule used is calling autodiff. Deferred is only required when doing higher-order AD, e.g. autodiff of autodiff.
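
To illustrate the distinction, a minimal sketch of the higher-order case (a forward-over-reverse second derivative); the exact autodiff_deferred signature may vary with the Enzyme version:

using Enzyme

h(x) = x^3

# The inner call is itself being differentiated, so it must use the deferred
# variant; the outer call stays a plain autodiff.
dh(x) = Enzyme.autodiff_deferred(Enzyme.Reverse, h, Active, Active(x))[1][1]
d2h = Enzyme.autodiff(Enzyme.Forward, dh, Duplicated(2.0, 1.0))[1]  # 6x → 12.0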

@gdalle
Member

gdalle commented Aug 15, 2024

I think higher-order AD is exactly what's being done here: the conditions c2 involve autodiff of implicit_f1.

@wsmoses

wsmoses commented Aug 15, 2024 via email

@benjaminfaber
Author

Thank you both for the clarification, and for being willing to put up with questions from a relative newcomer to AD. To answer your question @gdalle, the intended application is a plasma physics/fusion energy optimization problem where we can have upwards of 300 parameters in our state vector. The gradient we ultimately need is that of a scalar optimization target, so reverse mode is the way we want to go. The problem I'm currently trying to tackle has three nested optimization problems (A, B, C), where B depends on A and C depends on both B and A. I can use NonlinearSolve and the NonlinearLeastSquaresProblem; however, I was not having any luck calculating the sensitivities there, whereas I have been able to (albeit slowly) with ForwardDiff and ImplicitDifferentiation, so I've been pursuing that route. I may be doing things incorrectly with NonlinearSolve, but we also have the ultimate goal of using a legacy Fortran application to do the forward solve in the implicit problem, so we'll need ImplicitDifferentiation for that.

I'll see what happens when I use a reverse-mode differentiator and whether that has any issues.

@gdalle
Member

gdalle commented Aug 15, 2024

If you have any mathematical formulation of the nested optimization problems that you're free to share, I'd love to take a look!

@benjaminfaber
Author

Sure, I'd be happy to give a little mathematical background on the problem. We are trying to optimize the shape of the magnetic field used in magnetic confinement fusion concepts, which can be represented efficiently in cylindrical coordinates with a non-standard Fourier transformation:

$$R(r,\theta,\zeta) = \sum\limits_{m,n}^{M,N} R_{m,n}^c(r)\cos\left(m\theta + n\zeta\right) + R_{m,n}^s(r) \sin\left(m\theta + n\zeta\right),$$

$$Z(r,\theta,\zeta) = \sum\limits_{m,n}^{M,N} Z_{m,n}^c(r)\cos\left(m\theta + n\zeta\right) + Z_{m,n}^s(r) \sin\left(m\theta + n\zeta\right),$$

$$\phi = -\zeta.$$
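
As a concrete illustration, a hypothetical sketch of evaluating one such series at a point; the function name, mode arrays, and indexing convention are assumed here, not taken from the actual code:

# Hypothetical sketch: evaluate R(θ, ζ) at fixed r from mode arrays Rc, Rs,
# assumed indexed as Rc[m+1, n+N+1] for poloidal m = 0:M, toroidal n = -N:N.
function eval_R(Rc, Rs, θ, ζ)
    M = size(Rc, 1) - 1
    N = (size(Rc, 2) - 1) ÷ 2
    R = zero(promote_type(eltype(Rc), typeof(θ)))
    for m in 0:M, n in -N:N
        angle = m * θ + n * ζ
        R += Rc[m+1, n+N+1] * cos(angle) + Rs[m+1, n+N+1] * sin(angle)
    end
    return R
end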

Derived quantities, like the Jacobian of the coordinate transformation, also have the same Fourier representation. I'm using Optimization.jl to find the Fourier modes of the derived quantities that satisfy their constraint conditions. For example, the Jacobian $\sqrt{g}$ satisfies the following condition:

$$\frac{\partial}{\partial\theta}\left(\frac{1 + \left(\frac{\partial R}{\partial \zeta}\right)^2 + \left(\frac{\partial Z}{\partial \zeta}\right)^2 + \iota\left(\frac{\partial R}{\partial \theta}\frac{\partial R}{\partial \zeta} + \frac{\partial Z}{\partial \theta}\frac{\partial Z}{\partial \zeta}\right)}{\sqrt{g}}\right) + \frac{\partial}{\partial\zeta}\left(\frac{\frac{\partial R}{\partial \theta}\frac{\partial R}{\partial \zeta} + \frac{\partial Z}{\partial \theta}\frac{\partial Z}{\partial \zeta} + \iota\left(\left(\frac{\partial R}{\partial \theta}\right)^2 + \left(\frac{\partial Z}{\partial \theta}\right)^2\right)}{\sqrt{g}}\right) = 0.$$

Other quantities then depend on this Jacobian, so I've found it straightforward to solve for the Fourier modes of those quantities as another optimization problem, where I can write down the optimality condition easily and use ImplicitDifferentiation to compute sensitivities; that is where I ran into this problem initially.

All that being said, I've found a workaround: solving for all of the dependent quantities simultaneously, so that I only need to make one call to compute the derivative of an ImplicitFunction and don't currently need to deal with this problem.

@gdalle
Member

gdalle commented Aug 23, 2024

Thanks, glad you found a workaround! We'll revisit this when Julia 1.10.5 comes out.

@gdalle gdalle added the bug Something isn't working label Sep 26, 2024