Implement more matrix functions #43982
-
It would be amazing to include all of these algorithms. Thanks for putting this issue together, @jiahao.
-
It's not clear to me what interface should be implemented for the very generic things like "function of a symmetric or Hermitian matrix". We could define methods for elementary functions until the cows come home, but that is both tedious and namespace-polluting (we'd need a separate method for every such function). Maybe we want something like a single generic matrix-function interface instead.
-
This is one of the reasons why having all these functions be vectorized is kind of shitty. I suspect that @JeffBezanson will vehemently agree. There could be a …
-
Sure, but that won't help with the fact that a complicated matrix function like … still needs its own dedicated matrix algorithm.
-
Yes, that's a very good point.
-
Trefethen's contour-integration method is also neat for mostly analytic functions (with known singularities), though I don't know how it compares to the state of the art in performance and generality.
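For a sense of how contour quadrature works, here is a minimal sketch (not Trefethen's actual algorithm, which uses carefully chosen conformal maps; the helper name `contour_fun` and the default circular contour are assumptions for illustration). It approximates the Cauchy integral `f(A) = (1/2πi) ∮ f(z) inv(z*I - A) dz` by the trapezoidal rule on a circle enclosing the spectrum:

```julia
using LinearAlgebra

# Trapezoidal-rule approximation of f(A) = (1/2πi) ∮ f(z)(zI - A)⁻¹ dz on a
# circle of radius r about c. Requires f analytic on and inside the contour
# and the spectrum of A strictly inside it (r = 2‖A‖ suffices for c = 0).
function contour_fun(f, A; c = 0.0, r = 2 * opnorm(A), N = 64)
    F = zeros(ComplexF64, size(A))
    for k in 0:N-1
        z = c + r * cispi(2k / N)          # quadrature node on the circle
        F += f(z) * (z - c) * inv(z * I - A)
    end
    return F / N                           # ≈ f(A); take real.(F) for real A, f
end
```

For analytic `f` the trapezoidal rule on a circle converges geometrically in `N`; e.g. `contour_fun(exp, [1.0 1.0; 0.0 2.0])` matches the matrix exponential to quadrature accuracy.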
-
@stevengj it looks like that paper is Ref. 32 of the Higham and Deadman report, and isn't currently available in any major package (although it has MATLAB code).
-
This seems like a good candidate for a GSoC project. As far as an interface goes, I would suggest a macro. A naive approach could simply replace the functions (e.g. swap each scalar call for a matrix-aware equivalent; see the sketch below).
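A minimal sketch of what that naive replacement could look like, assuming a hypothetical `matfun(f, A)` for diagonalizable `A` (real implementations would use a Schur–Parlett approach instead):

```julia
using LinearAlgebra

# Hypothetical matfun: evaluate f(A) through an eigendecomposition.
# Fine for normal matrices; numerically fragile when the eigenvector
# matrix is ill-conditioned (which is why Schur–Parlett exists).
function matfun(f, A::AbstractMatrix)
    E = eigen(A)
    return E.vectors * Diagonal(f.(E.values)) / E.vectors
end
matfun(f, x::Number) = f(x)   # scalars fall through to the ordinary call
```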
-
The item "[sqrtm] real version for real matrices (Higham, 1987)" is checked. I can't see that algorithm in base. I made an attempt on it about a year ago but concluded that the complexity just wasn't worth it. I've put up the code as a gist if someone wants to have a look.
-
Thanks for catching that. I must have ticked it by accident when writing up this issue.
-
Just tried to call `logm`, but it wasn't implemented. Had to go to Mathematica.
-
Whoever tackles this issue might find this paper helpful for testing: …
-
Great reference – thanks, @fabianlischka.
-
Also cc: @higham @alanedelman
-
Unless I'm missing something, not really: differentials of matrix functions are tricky due to non-commutativity. @sethaxen, it would be strange to have this live in LinearAlgebra; can you maybe refactor LinearAlgebra to expose the functions that you need and use them in ChainRules?
-
I think @antoine-levitt is right. If we differentiate the contour representation `f(A(t)) = (1/2πi) ∮ f(ζ) inv(ζ*I - A(t)) dζ`, we get `d/dt f(A(t)) = (1/2πi) ∮ f(ζ) inv(ζ*I - A(t)) * dA/dt * inv(ζ*I - A(t)) dζ`, which I don't believe can be reduced to matrix factorisations. Really interested to know if anyone has a reference for differentiating matrix functions. I don't have a copy of Higham handy, but nothing pops out of the table of contents.
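To make the non-commutativity point concrete, a small numerical check (an illustration, not anyone's proposed implementation): the central finite difference of `exp` along a direction `ΔA` matches neither one-sided product with `exp(A)`.

```julia
using LinearAlgebra

A  = randn(3, 3)
ΔA = randn(3, 3)
h  = 1e-6

# Central difference ≈ the Fréchet derivative L_exp(A, ΔA)
L_fd = (exp(A + h * ΔA) - exp(A - h * ΔA)) / (2h)

# Because A and ΔA don't commute, neither one-sided product reproduces it:
@show norm(L_fd - exp(A) * ΔA)   # O(1), not O(h²)
@show norm(L_fd - ΔA * exp(A))
```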
-
Sort of. For any (analytic) matrix function `f`, the value and the Fréchet derivative in a direction `ΔA` can be read off from `f` of a 2×2 block matrix, since `f([A ΔA; zero(A) A]) = [f(A) L_f(A, ΔA); zero(A) f(A)]`:

```julia
using LinearAlgebra: checksquare

function frechet(f, A, ΔA)
    n = checksquare(A)
    Ablock = [A ΔA; zero(A) A]
    Yblock = f(Ablock)
    Y = Yblock[1:n, 1:n]
    ΔY = Yblock[1:n, n+1:2n]
    return Y, ΔY
end
```

But this is ~8x the cost of computing `f(A)` alone (the block matrix has twice the dimension).
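A quick sanity check (illustrative usage):

```julia
julia> using LinearAlgebra

julia> A, ΔA = randn(4, 4), randn(4, 4);

julia> Y, ΔY = frechet(exp, A, ΔA);

julia> Y ≈ exp(A)          # the (1,1) block recovers f(A)
true
```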
I agree, I only ask because, given this issue, it seemed there was (old) interest in them being in base, and the Fréchet derivative of …

Perhaps. While the most efficient implementation would compute the function and its derivative fully in parallel, we can get close by reusing all the matrix products. So a nice refactor might be to move the work of `exp` into a helper whose intermediate matrix products can be reused.
-
Basically continue what you were doing with the integral formula: expand in an eigenbasis and integrate the poles explicitly, and you get divided differences `(f(λ_n) - f(λ_m)) / (λ_n - λ_m)`, with the convention `(f(x) - f(y))/(x - y) = f'(x)` when `x == y`. It's probably discussed in Higham somewhere; I think it's called Daleckii–Krein there (I know it from quantum mechanics, where it goes under the generic name of "perturbation theory"). It's a bit tricky to implement in the case of close eigenvalues for generic functions; see the sketch below. It was suggested above to split matrix functions (more than exp/log/sqrt) into a separate MatrixFunctions.jl package; I think that would be a very natural place to put derivatives.
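A minimal sketch of that divided-difference kernel (`divided_difference` is a hypothetical helper; the crude tolerance switch is exactly the tricky part being alluded to):

```julia
using ForwardDiff

# Daleckii–Krein kernel entry: (f(x) - f(y))/(x - y), falling back to f'
# when x and y (nearly) coincide. The hard-coded switch loses accuracy in
# the crossover regime; robust implementations use higher-order expansions.
function divided_difference(f, x, y; tol = 1e-8)
    if abs(x - y) < tol * max(abs(x), abs(y), one(x))
        return ForwardDiff.derivative(f, (x + y) / 2)
    else
        return (f(x) - f(y)) / (x - y)
    end
end
```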
-
@sethaxen's method is probably a bad idea for normal matrices, as the block matrix `[A ΔA; zero(A) A]` is not normal even when `A` is, so the cheap eigendecomposition-based evaluation no longer applies.
-
Section 3.2 of Higham's Functions of Matrices gives several approaches for computing the Fréchet derivative: …

For automatic differentiation we indeed only want to define a rule if it works for all matrices, and the latter approach should be fine. In fact, Higham's Fréchet derivative of the matrix exponential is precisely this, just checked for accuracy. I don't think we can autodiff …

The issue with that, raised above, is type piracy. I guess that could be worked around by re-appending the …
-
I see. In the applications I'm interested in, the function f is often the Heaviside function, so that's not an option. Regarding a separate package: clearly exp/sin/log/sqrt are special because they're in base, but there are other functions one might want to use, so a general `matrix_function(f, A)` would still be useful.
-
This is best done in external packages. Moving to a discussion so that we have the list.
-
Hey, I've just finished … I'm new to Julia (…). Would there be a good home for these packages?
-
Certainly! These packages can live in the JuliaMath org.
-
Yes, it'd be awesome to have something like a package that implements `matrix_function(f, A)` and has derivatives through compatibility with ChainRules/ForwardDiff.
-
I don't think getting back the full Fréchet derivative is super helpful, as that's probably too big for most applications; much more useful are the jvp (`L * delta_X`) and the vjp (`delta_X * L`), which are what ChainRules needs for its `frule` and `rrule`. I'm not a big fan of complex step (it's cute, but it's just simpler to use the real one). I implemented it for my own purposes for `Symmetric` matrices, see below. I imagine something similar holds for non-symmetric matrices, but I've never worked with those directly.

```julia
using LinearAlgebra, ForwardDiff

function matrix_function(f, H)
    E = eigen(Symmetric(H))
    D = E.vectors * Diagonal(f.(E.values)) * E.vectors'
end

@views function dmatrix_function(f, H, δH)
    E = eigen(Symmetric(H))
    occ = f.(E.values)
    N = size(H, 1)
    δD = zeros(eltype(δH), N, N)
    # δD = ∑nm (fn - fm)/(en - em) |n><m| δHnm
    for n = 1:N, m = 1:N
        (occ[n] == occ[m]) && continue
        δD .+= ((occ[n] - occ[m]) / (E.values[n] - E.values[m]) *
                dot(E.vectors[:, m], δH * E.vectors[:, n])) .*
               (E.vectors[:, m] .* E.vectors[:, n]')
    end
    δD
end

# makes matrix_function compatible with ForwardDiff
function matrix_function(f, H::Matrix{T}) where {T <: ForwardDiff.Dual}
    H_value = ForwardDiff.value.(H)
    D = matrix_function(f, H_value)
    DT = ForwardDiff.Dual{ForwardDiff.tagtype(T)}
    res = [dmatrix_function(f, H_value, ForwardDiff.partials.(H, α)) for α = 1:ForwardDiff.npartials(T)]
    map((val, δval...) -> DT(val, δval...), D, res...)
end
```

EDIT: this function is wrong, see below.
-
Hello @antoine-levitt, I'm not sure I fully understand your comment. Do you mean the Jacobian? Perhaps we should discuss this in another thread. In fact, I also couldn't understand what is being computed in `dmatrix_function` above, which (if I read it correctly) is equivalent to:

```julia
using LinearAlgebra, ForwardDiff

function dmatrix_function(f, H, dH)
    N = size(H, 1)
    E = eigen(Symmetric(H))
    U = E.vectors
    occ = f.(E.values)
    F = [m == n ? 0 : (occ[m] - occ[n]) / (E.values[m] - E.values[n]) for m = 1:N, n = 1:N]
    δD = U * (F .* (U' * dH * U)) * U'
end
```

On the other hand, the following `Lf` computes the Fréchet derivative (note the extra diagonal `f′` term):

```julia
function Lf(f, H, dH)
    N = size(H, 1)
    E = eigen(Symmetric(H))
    U = E.vectors
    fL = Diagonal(f.(E.values))
    dfL = Diagonal(ForwardDiff.derivative.(f, E.values))
    F = [m == n ? 0 : 1 / (E.values[n] - E.values[m]) for m = 1:N, n = 1:N]
    L = U * dfL * Diagonal(U' * dH * U) * U' +
        U * (F .* (U' * dH * U)) * fL * U' +
        U * fL * (F' .* (U' * dH * U)) * U'
end
```

and it works for symmetric matrices, agreeing with the block-matrix formula:

```julia
julia> A = [1 2; 2 5] / 10;

julia> dA = [4 5; 5 7] * 1e-6;

julia> A1 = [A dA; zeros(2, 2) A];

julia> exp(A1)[1:2, 3:4]
2×2 Matrix{Float64}:
 5.82694e-6  8.46229e-6
 8.46229e-6  1.3176e-5

julia> Lf(exp, A, dA)
2×2 Matrix{Float64}:
 5.82694e-6  8.46229e-6
 8.46229e-6  1.3176e-5
```

It also works for … I'm not sure how to use the functions in your comment above to reach the same output. This is because …
-
FYI, ChainRules implements forward- and reverse-mode rules for most of the matrix functions defined in base Julia.

The latter uses the following signatures: …

The Fréchet derivative is the pushforward/jvp used in forward-mode AD, i.e. the action of the Jacobian on the direction matrix. As a pushforward, it is a linear map whose adjoint is the pullback/vjp used in reverse-mode AD. A really convenient property of matrix functions is that if the pushforward (Fréchet derivative) at `A` can be computed, then so can the pullback: it is just the Fréchet derivative evaluated at the adjoint `A'`. So the Fréchet derivative of a matrix function is a really useful primitive that can be used for both forward- and reverse-mode AD. It might be worth starting from the implementations in ChainRules. My advice would be to keep the package very lightweight.
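As a concrete illustration of that property (a sketch, not ChainRules' actual code: it assumes the hypothetical `matrix_function` and the `frechet` block-matrix helper from earlier in the thread, and `f` analytic with real Taylor coefficients), the same primitive serves both rules:

```julia
using ChainRulesCore, LinearAlgebra

# frule: the pushforward is the Fréchet derivative in the tangent direction ΔA.
function ChainRulesCore.frule((_, _, ΔA), ::typeof(matrix_function), f, A)
    Y, ΔY = frechet(f, A, unthunk(ΔA))
    return Y, ΔY
end

# rrule: the pullback applies the Fréchet derivative at the adjoint A'.
function ChainRulesCore.rrule(::typeof(matrix_function), f, A)
    Y = matrix_function(f, A)
    function matrix_function_pullback(ΔY)
        _, ∂A = frechet(f, copy(A'), unthunk(ΔY))
        return NoTangent(), NoTangent(), ∂A
    end
    return Y, matrix_function_pullback
end
```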
-
OK, sorry, I misunderstood what you meant by Fréchet derivative; I thought you meant the full 4th-order tensor. LinearAlgebra does not depend on ChainRules. Though I guess there could be a …
-
Higham and Deadman have published a list of software implementing algorithms for matrix functions, showing that Julia's library for these functions could be improved greatly relative to the state of the art.

The purpose of this issue is to discuss and track the implementation of matrix functions.

From Higham and Deadman's list:

General function evaluations

- `cond(f, A)`
- `f(A) * b`

Specific functions

- `expm`: scaling and squaring algorithm (Al-Mohy and Higham, 2009)
- `logm`: inverse scaling and squaring algorithm (Al-Mohy, Higham, and Relton, 2012, 2013)
- `sqrtm`: real version for real matrices (Higham, 1987)
- `A^t` for real matrix powers: Schur–Padé algorithm (Higham and Lin, 2013) (Fixed the algorithm for powers of a matrix. #21184)

Fréchet derivatives

- `A^t` for real matrix powers: Schur–Padé algorithm (Higham and Lin, 2013)

Miscellaneous functions

- `inv` and `det` for `Tridiagonal` and `SymTridiagonal` matrices (Usmani, 1994) (Provide inv and det of Tridiagonal and SymTridiagonal matrices #5358)