correlation matrix transforms #24
Add analysis of correlation matrix transforms
Here's also a parameter augmentation approach. The constraints of the correlation matrix are that it is symmetric with a unit diagonal and off-diagonal elements between -1 and 1. These constraints imply that the (upper) Cholesky factor is upper triangular with the first column equal to (1, 0, ..., 0) and every column a unit vector with a positive diagonal entry. Let column k be x_k / ||x_k|| for an unconstrained vector x_k of length k. Then if we use the prior x_k ~ normal(0, 1), each normalized column is uniform on the sphere. It would be nice to include this to see how it compares to the bijective maps.
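As a concrete illustration (a minimal sketch of mine, written in the lower-triangular convention the later comments use; the function name is made up), the augmentation amounts to normalizing independent Gaussian rows:

using LinearAlgebra

# Sketch of the augmentation idea: each row of the Cholesky factor is an
# independent Gaussian normalized to a unit vector, with the diagonal
# entry forced positive to pick the hemisphere.
function rand_corrchol_augmented(K)
    L = zeros(K, K)
    L[1, 1] = 1.0
    for k in 2:K
        x = randn(k)
        x[k] = abs(x[k])           # positive diagonal
        L[k, 1:k] = x / norm(x)    # unit-vector row gives a unit diagonal of L * L'
    end
    return L
end

L = rand_corrchol_augmented(4)
@assert all(diag(L * L') .≈ 1)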
Nice observation. I'd also be curious to see how it compares. Doing the same tricks with simplex and unit vectors is illustrative of how to think about these problems.
Yes, please! This also illustrates that a lot of these transforms have constrain-to-positive components (e.g., softmax uses exp()). But we also want to evaluate softplus and maybe inverse (c)log-log as alternatives to exp.
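For concreteness (a sketch of mine; the helper names are made up), each constrain-to-positive map sends an unconstrained y to (0, ∞) and contributes its own log-Jacobian term:

# Two positive-constraining maps, returning the constrained value and the
# log absolute Jacobian of the map.
exp_constrain(y) = (exp(y), y)                              # d/dy exp(y) = exp(y)
softplus_constrain(y) = (log1p(exp(y)), y - log1p(exp(y)))  # d/dy softplus(y) = logistic(y)

x, logJ = softplus_constrain(0.3)
@assert x > 0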
matrix onion_lkj_ben (row_vector l, vector R2) {
int K = num_elements(R2) + 1;
matrix[K, K] L;
int start = 1;
int end = 2;
L[1,1] = 1.0;
L[2,1] = 2.0 * R2[1] - 1.0;
L[1,2] = 0;
L[2,2] = sqrt(1.0 - square(L[2,1]));
for(k in 2:K - 1) {
int kp1 = k + 1;
// unit vector "transform" next line
L[kp1, 1:k] = segment(l, start, k) * sqrt(R2[k] / dot_self(segment(l, start, k)));
L[1:k, kp1] = zeros_vector(k);
L[kp1, kp1] = sqrt(1.0 - R2[k]);
start = end + 1; // first index of l for the next row
end = start + k; // next row uses k + 1 elements, ending at start + k
}
return L;
}
parameters {
vector[choose(K, 2) - 1] l;
vector<lower = 0, upper = 1>[K-1] R2; // first element is not really a R^2 but is on (0,1)
}

I found out that I was getting treedepth issues due to the unit-vector normalization. Changing that line to the hyperspherical transform fixed those issues.
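To sanity-check the index bookkeeping of the onion constructor above, here's a Julia mirror (a sketch of mine; the name onion_chol is made up) that asserts rows 3 through K consume exactly choose(K, 2) - 1 elements of l with no overlap:

using LinearAlgebra

function onion_chol(l::AbstractVector, R2::AbstractVector)
    K = length(R2) + 1
    @assert length(l) == binomial(K, 2) - 1
    L = zeros(K, K)
    L[1, 1] = 1.0
    L[2, 1] = 2 * R2[1] - 1
    L[2, 2] = sqrt(1 - L[2, 1]^2)
    start = 1
    for k in 2:(K - 1)
        seg = l[start:(start + k - 1)]    # k raw parameters for row k + 1
        L[k + 1, 1:k] = seg * sqrt(R2[k] / dot(seg, seg))
        L[k + 1, k + 1] = sqrt(1 - R2[k])
        start += k                        # next row starts right after this one
    end
    @assert start == length(l) + 1        # every element of l used exactly once
    return L
end

L = onion_chol(randn(binomial(5, 2) - 1), rand(4))
@assert all(diag(L * L') .≈ 1)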
functions {
vector lower_elements(matrix M, int tri_size){
int n = rows(M);
int counter = 1;
vector[tri_size] out;
for (i in 2:n){
for (j in 1:(i - 1)) {
out[counter] = M[i, j];
counter += 1;
}
}
return out;
}
matrix angle2chol(matrix angle_mat) {
int N = rows(angle_mat);
matrix[N, N] inv_mat = identity_matrix(N);
inv_mat[ : , 1] = cos(angle_mat[ : , 1]);
for (i in 2 : N) {
inv_mat[i, i] = prod(sin(angle_mat[i, 1 : (i - 1)]));
if (i > 2) {
for (j in 2 : (i - 1)) {
inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1 : (j - 1)]));
}
}
}
return inv_mat;
}
matrix angle_vec2angle_mat(vector angle, int K) {
matrix[K, K] mat = add_diag(identity_matrix(K), rep_vector(-1, K));
int count = K - 1;
int pos = 1;
for (k in 1 : K - 1) {
mat[k + 1:K, k] = segment(angle, pos, count);
pos += count;
count -= 1;
}
return mat;
}
}
data {
int N;
matrix[N, N] R;
}
transformed data {
int K = (N * (N - 1)) %/% 2;
}
parameters {
vector<lower = 0, upper = pi()>[K] theta;
cholesky_factor_corr[N] L_stan;
}
transformed parameters {
matrix[N, N] L = angle2chol(angle_vec2angle_mat(theta, N));
}
model {
// log_absd_jacs
// sin(theta) is always > 0 since theta in (0, pi)
// because of this the diagonal of L is always > 0 also
target += sum(log(sin(theta))); // angle2chol
}
generated quantities {
matrix[N, N] R_spherical = multiply_lower_tri_self_transpose(L);
matrix[N, N] R_stan = multiply_lower_tri_self_transpose(L_stan);
}
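For a quick check of the constructor outside Stan, here's a Julia mirror of angle2chol (a sketch of mine; it verifies only the constraint, not the Jacobian, which is discussed below):

using LinearAlgebra

# Column 1 is the cosine of each row's first angle, the diagonal is the
# running product of the row's sines, and interior entries are cos(angle)
# times the product of the earlier sines in the row.
function angle2chol_check(A::AbstractMatrix)
    N = size(A, 1)
    L = Matrix{Float64}(I, N, N)
    L[:, 1] = cos.(A[:, 1])   # A is zero on and above the diagonal, so L[1, 1] = 1
    for i in 2:N
        L[i, i] = prod(sin.(A[i, 1:(i - 1)]))
        for j in 2:(i - 1)
            L[i, j] = cos(A[i, j]) * prod(sin.(A[i, 1:(j - 1)]))
        end
    end
    return L
end

A = zeros(4, 4)
for i in 2:4, j in 1:(i - 1)
    A[i, j] = rand() * π
end
L = angle2chol_check(A)
@assert all(diag(L * L') .≈ 1)   # every row of L is a unit vector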
+1 for onion. I recall finding that the incremental onion was ~3x faster than vine (for random sampling). Here's a Julia implementation:
The best possible outcome for this work, in my opinion, will be a better set of transforms! I couldn't understand the LKJ paper when Ben first showed it to me 10 years ago, but I should give it a go again now that I understand multivariate calc better.
I can add the angle/hyperspherical transform to the paper if we want. We'd probably not use it for "efficiency" because of all the ops involved. We first need a bunch of random values between 0 and π, which map to the Cholesky factor through products of sines and cosines. The log-abs-determinant of the Jacobian is just the sum of the logs of the diagonals. Since the diagonals are positive by construction, no need to abs.
Great idea! As suggested in #24 (comment), we could use any of our approaches for unit vectors for Cholesky factors of correlation matrices. In those cases probably best to just refer in the paper to the Jacobian determinants already computed for unit vectors.
Are you sure? IIUC, this transformation uses a separate set of hyperspherical coordinates for each row vector (which lives on a hemisphere). The combined Jacobian determinant would then be the product of the Jacobian determinants for the transform in each row, not the product of the diagonal entries of L, so that product of per-row determinants would, I think, be the full Jacobian determinant. Multiplying the diagonal terms of L gives something different.
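Here's a quick numeric check of that claim for a single row (a sketch of mine with N = 3, using ForwardDiff as in the post further down): the row's Jacobian factor is sin(ϕ₁), while its diagonal entry is sin(ϕ₁)sin(ϕ₂):

using ForwardDiff, LinearAlgebra

# One row of the Cholesky factor as a point on the 2-sphere (hemisphere).
row(ϕ) = [cos(ϕ[1]), sin(ϕ[1]) * cos(ϕ[2]), sin(ϕ[1]) * sin(ϕ[2])]

ϕ = rand(2) * π
J = ForwardDiff.jacobian(row, ϕ)        # 3×2 Jacobian of the embedding
@assert sqrt(det(J' * J)) ≈ sin(ϕ[1])   # per-row Jacobian factor
# The diagonal entry row(ϕ)[3] = sin(ϕ₁)sin(ϕ₂) differs in general.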
Yes, you're right, I was doing the change from L -> R. OK, let's see if we can get this. My head is hurting a bit from this, but I think your latter equation is missing an index? For example, take N = K = 5 and write out the sine terms: some are missing from that expression. I think it should come out so that every angle in column i - 1 picks up the factor (N - i + 1) on its log sine, which is what the code below accumulates. In the Stan program I can return the target directly using an _lp function:

matrix angle2chol_lp(matrix angle_mat) {
int N = rows(angle_mat);
matrix[N, N] inv_mat = identity_matrix(N);
inv_mat[ : , 1] = cos(angle_mat[ : , 1]);
for (i in 2 : N) {
inv_mat[i, i] = prod(sin(angle_mat[i, 1 : (i - 1)]));
target += (N - i + 1) * log(sin(angle_mat[i:N, i - 1]));
if (i > 2) {
for (j in 2 : (i - 1)) {
inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1 : (j - 1)]));
}
}
}
return inv_mat;
}
Yes, you're absolutely right, and thanks for catching that! Let's restart. For an n-dimensional unit vector parameterized by angles ϕ₁, ..., ϕₙ₋₁, the Jacobian determinant is the product over i = 1, ..., n - 2 of sin(ϕᵢ)^(n - 1 - i). The Jacobian determinant for row n of the Cholesky factor is this expression with the row's length min(n, K) in place of n. So the log Jacobian determinant of the whole transform is the sum of the per-row log Jacobian determinants. This time I tested it with AD in Julia:

function spherical_coords_to_unit_vector!(x, ϕ)
T = eltype(x)
N = length(x)
sin_prod = one(T)
log_det_jac = zero(T)
rcounter = N - 2
for i in 1:(N-2)
sᵢ, cᵢ = sincos(ϕ[i])
x[i] = sin_prod * cᵢ
sin_prod *= sᵢ
log_det_jac += rcounter * log(sᵢ)
rcounter -= 1
end
sᵢ, cᵢ = sincos(ϕ[N - 1])
x[N-1] = sin_prod * cᵢ
x[N] = sin_prod * sᵢ
return x, log_det_jac
end
function spherical_coords_to_unit_vector(ϕ)
N = length(ϕ) + 1
T = float(typeof(first(ϕ)))
x = similar(ϕ, T, N)
return spherical_coords_to_unit_vector!(x, ϕ)
end
function spherical_coords_to_corrchol(ϕ, N, K)
@assert length(ϕ) == num_angles_corrchol(N, K)
T = float(typeof(first(ϕ)))
L = similar(ϕ, T, N, K)
fill!(L, 0)
L[1, 1] = 1
log_det_jac = zero(T)
i = 1
for n in 2:N
row_length = min(n, K)
ϕₙ = @views ϕ[i:(i + row_length - 2)]
Lₙ = @views L[n, 1:row_length]
_, logJ = spherical_coords_to_unit_vector!(Lₙ, ϕₙ)
log_det_jac += logJ
i += row_length - 1
end
@assert i == length(ϕ) + 1
return L, log_det_jac
end
num_angles_corrchol(N, K) = sum(n -> min(n, K) - 1, 2:N)
function corrchol_to_vec(X)
N, K = size(X)
return @views reduce(vcat, X[n, 1:min(n, K)] for n in 2:N)
end
julia> using Test, LinearAlgebra, ForwardDiff
julia> N, K = 16, 9
julia> ϕ = rand(num_angles_corrchol(N, K)) * π;
julia> L, logJ = spherical_coords_to_corrchol(ϕ, N, K);
julia> L
16×9 Matrix{Float64}:
1.0 0.0 0.0 0.0 … 0.0 0.0 0.0
0.947746 0.319027 0.0 0.0 0.0 0.0 0.0
-0.453748 0.455677 0.765814 0.0 0.0 0.0 0.0
0.162752 -0.815836 0.0411449 0.553381 0.0 0.0 0.0
-0.0675745 -0.678065 -0.390384 -0.357414 0.0 0.0 0.0
-0.794639 -0.452183 0.277417 -0.278334 … 0.0 0.0 0.0
0.985237 -0.141132 -0.0191612 0.0796641 0.00170641 0.0 0.0
0.21713 0.976095 -0.00572299 0.00442652 0.000308881 0.000644938 0.0
0.545889 -0.612698 -0.441773 -0.329475 0.0631373 -0.036901 0.0304645
0.333311 0.930827 -0.130218 -0.00151273 -0.023701 0.00087454 0.00131505
-0.337961 0.890776 0.271462 -0.130395 … 0.00101614 0.00016551 0.00121652
0.998049 -0.0417353 0.046215 -0.00379095 0.000462996 -0.00149741 0.000162026
-0.377976 -0.74856 0.462584 0.200646 -0.0564089 -0.0722899 0.0165523
0.743822 0.113604 -0.199132 0.289515 0.106297 0.182008 0.157098
-0.814088 0.580163 0.0188838 0.00198259 -0.00379611 -0.0121217 0.00579842
0.535706 0.649043 -0.537816 -0.00748968 … -0.00159064 0.000449167 0.000740431
julia> @testset "cholesky factor satisfies constraints" begin
@test istril(L)
@test all(>(0), diag(L))
R = L*L'
@test all(≈(1), diag(R))
@test L ≈ cholesky(R; check=false).L[:, 1:K]
end
Test Summary: | Pass Total
cholesky factor satisfies constraints | 4 4
Test.DefaultTestSet("cholesky factor satisfies constraints", Any[], 4, false, false)
julia> J = ForwardDiff.jacobian(ϕ -> corrchol_to_vec(spherical_coords_to_corrchol(ϕ, N, K)[1]), ϕ)
107×92 Matrix{Float64}:
-0.319027 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0
0.947746 0.0 0.0 0.0 0.0 0.0 0.0 0.0
-0.0 -0.89113 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0
0.0 -0.232023 -0.765814 0.0 0.0 0.0 0.0 0.0
0.0 -0.38994 0.455677 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 -0.986667 0.0 … 0.0 0.0 0.0
-0.0 -0.0 -0.0 -0.134574 -0.554908 -0.0 -0.0 -0.0
0.0 0.0 0.0 0.00678693 -0.0604921 0.0 0.0 0.0
0.0 0.0 0.0 0.0912811 -0.81359 0.0 0.0 0.0
⋮ ⋱ ⋮
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
-0.0 -0.0 -0.0 -0.0 -0.0 … -0.0 -0.0 -0.0
-0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0
-0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0 -0.0
0.0 0.0 0.0 0.0 0.0 -0.00181111 0.0 0.0
-0.0 -0.0 -0.0 -0.0 -0.0 -0.0329006 -0.000866019 -0.0
0.0 0.0 0.0 0.0 0.0 … 0.00929052 -0.000824996 -0.000740431
0.0 0.0 0.0 0.0 0.0 0.015315 -0.00135997 0.000449167
julia> @test logabsdet(J'J)[1]/2 ≈ logJ
Test Passed
Expression: (logabsdet(J' * J))[1] / 2 ≈ logJ
Evaluated: -205.01162441028785 ≈ -205.01162441028777