Inverse isometric log ratio simplex method #39
Now that I look at it again, it looks like the softmax formulation, but applied to the linearly transformed coordinates:

data {
  int<lower=0> N;
  matrix[N - 1, N - 1] Vinv;
}
parameters {
  vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }
}
model {
  target += -N * log1p_exp(log_sum_exp(s)) + sum(s);
}
generated quantities {
  real z_sum = sum(z);
}
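Spelled out, the transformed parameters block above is just the pinned softmax of the linearly mapped coordinates: with s = Vinv * y,

$$ z = \operatorname{softmax}\big((s_1, \ldots, s_{N-1}, 0)\big) = \left( \frac{e^{s_1}}{1 + \sum_j e^{s_j}}, \ldots, \frac{e^{s_{N-1}}}{1 + \sum_j e^{s_j}}, \frac{1}{1 + \sum_j e^{s_j}} \right). $$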
Is there an explanation of the reasoning behind this transformation somewhere?
Happy to! 🙂
I was following this: https://www.researchgate.net/publication/266864217_AN_ALGEBRAIC_METHOD_TO_COMPUTE_ISOMETRIC_LOGRATIO_TRANSFORMATION_AND_BACK_TRANSFORMATION_OF_COMPOSITIONAL_DATA. It isn't a great paper, but the calculations were confirmed in the compositions R package. Here's my noob Julia code to get the Jacobian. It is not equal to what I have in the Stan program above.

using Test, LinearAlgebra, Distributions, ForwardDiff, StatsFuns, StatsModels

D = 5
helmert_mat = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)
V_raw = eachrow(helmert_mat) ./ norm.(eachrow(helmert_mat))
V_raw = transpose(reduce(hcat, V_raw))
V_fullrank = vcat(V_raw, vcat(fill(0, D - 1), 1)')
V_inv_fullrank = inv(V_fullrank)
y = rand(Uniform(-10, 10), D - 1)  # Uniform requires Distributions

function euclid_to_simplex!(y, Vinv)
    s = Vinv * y
    alpha = log(sum(exp.(vcat(s, 0))))
    return vcat(exp.(s .- alpha), 1 / exp(alpha))
end

J = ForwardDiff.jacobian(y -> euclid_to_simplex!(y, V_inv_fullrank[1:D-1, 1:D-1]), y)
logabsdet(J'J)[1] / 2

# does not equal this
s_test = V_inv_fullrank[1:D-1, 1:D-1] * y
-D * log1pexp(logsumexp(s_test)) + sum(s_test)
I did get the Jacobian by plugging the full Jacobian matrix into the Stan program and taking half the log determinant of its cross-product. There's gotta be a simpler formula though.

data {
  int<lower=0> N;
  matrix[N - 1, N - 1] Vinv;
  matrix[N, N] Vinv_full;
}
parameters {
  vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  real logabsdet = 0;
  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
    vector[N] t0 = exp(Vinv_full * append_row(y, 0));
    real t1 = sum(t0);
    vector[N] t2 = t0 / t1;
    // matrix[N, N] J = diag_pre_multiply(t2, Vinv_full) - (1/t1) * t2 * (t0' * Vinv_full);
    // a bit faster
    matrix[N, N - 1] J = add_diag(-(1/t1) * t2 * t0', t2) * Vinv_full[:, 1:N - 1];
    logabsdet += log_determinant_spd(crossprod(J[, 1:N-1])) * 0.5;
  }
}
model {
  target += logabsdet;
}
generated quantities {
  real z_sum = sum(z);
}

Test of the Jacobian in Julia:

using Test, LinearAlgebra, Distributions, ForwardDiff, StatsFuns, StatsModels
D = 5
helmert_mat = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)
V_raw = eachrow(helmert_mat) ./ norm.(eachrow(helmert_mat))
V_raw = transpose(reduce(hcat, V_raw))
V_fullrank = vcat(V_raw, vcat(fill(0, D - 1), 1)')
V_inv_fullrank = inv(V_fullrank)
y = rand(Uniform(-10, 10), D - 1)
s = V_inv_fullrank[1:D-1, 1:D-1] * y
alpha = log(sum(exp.(vcat(s, 0))))  # needed before forming z below
z = vcat(exp.(s .- alpha), 1 / exp(alpha))

function euclide_to_simplex!(y, Vinv)
    s = Vinv * y
    alpha = log(sum(exp.(vcat(s, 0))))
    return vcat(exp.(s .- alpha), 1 / exp(alpha))
end

J = ForwardDiff.jacobian(y -> euclide_to_simplex!(y, V_inv_fullrank[1:D-1, 1:D-1]), y)
t0 = V_inv_fullrank * vcat(y, 0)
t1 = exp.(t0)
t2 = sum(t1)
t3 = exp.(t0 .- log(t2))
J_test = diagm(t3) * V_inv_fullrank - (1/t2) * t3 * (transpose(t1) * V_inv_fullrank)
@test logabsdet(J_test[:, 1:D-1]'J_test[:, 1:D-1])[1] * 0.5 ≈ logabsdet(J'J)[1] / 2
Test Passed
Expression: (logabsdet((J_test[:, 1:D - 1])' * J_test[:, 1:D - 1]))[1] * 0.5 ≈ (logabsdet(J' * J))[1] / 2
Evaluated: -24.100049227510453 ≈ -24.100049227559065
Two things should help. The ILR basis $V$ has orthonormal columns that are orthogonal to the ones vector, so with $M = \operatorname{diag}(z) - z z^\top$ we have $J = M V$ and $J^\top J = V^\top M^2 V = (V^\top M V)^2$. That means $\log |J| = \log \det(V^\top M V)$, so we can avoid the cross-product. And then $V^\top M V$ comes from a diagonal matrix plus a rank-one update. Now we can apply the matrix determinant lemma with $A = \operatorname{diag}(z)$ and the rank-one term $-z z^\top$. I just learned this matrix determinant lemma last week and it's cropping up everywhere! Of course, I could've botched the linear algebra, so please double check!
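As a quick numerical sanity check of that route (a sketch, not from the thread: V, M, and z below are local names; V is the D x (D - 1) normalized-Helmert ILR basis with orthonormal columns orthogonal to the ones vector):

using Test, LinearAlgebra, StatsModels

D = 5
H = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)  # (D-1) x D Helmert contrasts
V = reduce(hcat, eachrow(H) ./ norm.(eachrow(H)))                        # D x (D-1); V'V = I and V' * ones(D) = 0

z = rand(D); z ./= sum(z)                                                # an arbitrary interior point of the simplex
M = Diagonal(z) - z * z'                                                 # softmax Jacobian factor; note M * ones(D) == 0

# the matrix determinant lemma route should give det(V' * M * V) = D * prod(z),
# i.e. log|J| = sum(log.(z)) + log(D)
@test det(V' * M * V) ≈ D * prod(z)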
I got it!! By chance I happened to find an ILR cheatsheet (http://www.sediment.uni-goettingen.de/staff/tolosana/extra/CoDaNutshell.pdf). Wow! So easy to take the derivative. I confirmed with the Julia code.

$$ \log |J| = \sum \log \operatorname{softmax}(V^\top y) + \log(D) $$

data {
  int<lower=0> N;
  matrix[N - 1, N - 1] Vinv;
  matrix[N, N] Vinv_full;
}
parameters {
  vector[N - 1] y;
}
transformed parameters {
  simplex[N] z;
  vector[N - 1] s = Vinv * y;
  real logabsdet = sum(log_softmax(Vinv_full[:, 1:N-1] * y)) + log(N);
  {
    real alpha = log_sum_exp(append_row(s, 0));
    z = append_row(exp(s - alpha), 1 / exp(alpha));
  }
}
model {
  target += logabsdet;
}
generated quantities {
  real z_sum = sum(z);
}
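A quick ForwardDiff check of that closed form in Julia, mirroring the earlier snippets (a sketch; ilr_inv is just a local helper name and V is the D x (D - 1) normalized-Helmert basis):

using Test, LinearAlgebra, ForwardDiff, StatsModels

D = 5
H = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:D).matrix)
V = reduce(hcat, eachrow(H) ./ norm.(eachrow(H)))   # D x (D-1) orthonormal ILR basis

# inverse ILR: unconstrained y -> simplex z, written with plain exp/log as above
function ilr_inv(y)
    s = V * y
    alpha = log(sum(exp.(s)))
    return exp.(s .- alpha)
end

y = randn(D - 1)
J = ForwardDiff.jacobian(ilr_inv, y)                # D x (D-1)

lhs = logabsdet(J'J)[1] / 2                         # log |J| via the Gram determinant
rhs = sum(log.(ilr_inv(y))) + log(D)                # sum(log_softmax(V * y)) + log(D)
@test lhs ≈ rhs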
The code can be simplified quite a bit, where I've updated Vinv in the data block to be the full N x (N - 1) basis matrix:

data {
  int<lower=0> N;
  matrix[N, N - 1] Vinv;
}
transformed data {
  real logN = log(N);
}
parameters {
  vector[N - 1] y;
}
transformed parameters {
  vector[N] s = Vinv * y;
  real alpha = log_sum_exp(s);
  simplex[N] z = exp(s - alpha);
}
model {
  target += sum(s - alpha) + logN;
}
generated quantities {
  real z_sum = sum(z);
}
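For reference, the Vinv passed as data here is the N x (N - 1) ILR basis with orthonormal columns orthogonal to the ones vector (e.g. the normalized Helmert contrasts from the Julia snippets above). A sketch of how one might write it out as Stan JSON data; the file name and the use of the JSON package are illustrative, not part of the thread:

using LinearAlgebra, StatsModels, JSON

N = 5
H = transpose(StatsModels.ContrastsMatrix(HelmertCoding(), 1:N).matrix)
Vinv = reduce(hcat, eachrow(H) ./ norm.(eachrow(H)))   # N x (N-1), orthonormal columns, each orthogonal to ones(N)

# CmdStan's JSON reader expects a matrix as an array of row arrays
open("ilr_data.json", "w") do io
    JSON.print(io, Dict("N" => N, "Vinv" => [Vinv[i, :] for i in 1:N]))
end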
Yes, that cheatsheet was super useful. I came across it only last week!
@sethaxen I'm seeing discrepancies between ForwardDiff and this log-abs-det that I'm not sure are due to numerical errors or to something I'm missing about this transform. If I make the dimension 10000, the difference is about 2, which seems much too high. Is this a typical observation, or does it indicate I'm still missing something? The differences are very small for low dimensions.
OK, it's floating-point error. If I restrict the domain of y to (0, 1) or (-1, 1) or something, the differences disappear. I'm guessing it's the final exp.
I haven't had this much fun doing math in a long time. What I have is correct, and it follows from the matrix determinant lemma that @bob-carpenter gave (thanks a ton for that hint)! It's quite elegant that the log determinant of the matrix calculus formula boils down to this simple update. I'll update the paper with the math and make a pull request for the Stan code over the next few days.
Algebra is a lot of fun when a big pile of math like a log Jacobian determinant works out to something simple. I'm hoping we can convey that to the reader while making all this understandable.
After going through all the math, this is just the softmax transform with a linear scaling applied.
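Concretely, with V the N x (N - 1) ILR basis used as Vinv above,

$$ z = \operatorname{softmax}(V y), \qquad \log |J| = \sum_{i=1}^{N} \log z_i + \log N . $$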
I'm opening this back up because, although I don't understand why, the ESS is 10x higher for the last parameter in this transform vs. the softmax.
Is that softmax with the last element pinned to zero?
Yes, softmax with the last value pinned to zero.
Perhaps that's not surprising when you look at the Jacobian, which is all about how much probability is left over for the last element and then raised to the power of the dimension.
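For comparison (a side note, not from the thread): writing the pinned-softmax map in terms of its first N - 1 coordinates and applying the same matrix determinant lemma gives

$$ \det\!\big(\operatorname{diag}(z_{1:N-1}) - z_{1:N-1} z_{1:N-1}^\top\big) = \Big(\prod_{i=1}^{N-1} z_i\Big)\Big(1 - \sum_{i=1}^{N-1} z_i\Big) = \prod_{i=1}^{N} z_i , $$

so the probability left over for the last element enters the log Jacobian as one extra log z_N term.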
I can't find where @bob-carpenter mentioned that he was still trying to understand this transform, but it piqued my curiosity because I hadn't heard of it.
After grokking the idea I was able to put together an R script that generates a simplex using it. The idea connects the unit-vector and simplex machinery together. The following is the reverse of the ILR, so it's a bit weird: it needs to invert the Helmert matrix, which enforces the sum-to-zero constraint in the forward mapping.
I've written a Stan program without the log-det-Jacobian adjustment because I'm tired and I think @sethaxen may be able to whip it up much quicker than I can. I put a normal prior on it just to get sampling going.
To run this you need
and example output of