correlation matrix transforms #24

Open
5 tasks
bob-carpenter opened this issue Jul 12, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@bob-carpenter (Collaborator)

Add analysis of correlation matrix transforms

@bob-carpenter added the enhancement label on Jul 12, 2022
@sethaxen (Collaborator)

Here's also a parameter-augmentation approach. The constraints on a correlation matrix are that it is symmetric with a unit diagonal and off-diagonal elements between -1 and 1. These constraints imply that the (upper) Cholesky factor is upper triangular, with the first $i$ elements of column $i$ forming a point on the unit hemisphere and the $i$ th element positive.

Let $Y$ be an upper triangular matrix of unconstrained parameters that can be stacked into a vector $y$. Let $y_i$ be the $i$ th column of $Y$. Let $z_i = \operatorname{cat}(y_i[1:i-1], \exp(y_i[i]))$. Then $x_i = \operatorname{normalize}(z_i)$ is on the unit hemisphere. This is the same approach Stan uses for unit vectors, except we precompose the map with the exponential applied to the final element.

Then if we use the prior $\operatorname{norm}(z_i) \sim \chi_i$ for the unconstrained norm, we end up with the log density correction
$$\sum_{i=2}^n \left( e^{y_i[i]} - \frac{1}{2} \left( e^{2 y_i[i]} + \sum_{k=1}^{i-1} y_i[k]^2 \right) \right)$$

It would be nice to include this to see how it compares to the bijective maps.
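If helpful, here's a minimal Python sketch of this map (the helper name `augmented_to_hemisphere` is mine; the $\chi$ prior and Jacobian terms are omitted), just checking that the result lands on the unit hemisphere with a positive final element:

```python
import math

def augmented_to_hemisphere(y):
    """Sketch: map an unconstrained column y (length i) to the unit
    hemisphere by exponentiating the last element and normalizing."""
    z = y[:-1] + [math.exp(y[-1])]            # z_i = cat(y_i[1:i-1], exp(y_i[i]))
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]              # x_i = normalize(z_i)

x = augmented_to_hemisphere([0.3, -1.2, 0.5])
```

Since `exp(y[-1]) > 0`, the last coordinate stays positive after normalization, which is exactly the hemisphere constraint.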

@bob-carpenter (Collaborator, author)

Nice observation. I'd also be curious to see how it compares. Doing the same tricks with simplexes and unit vectors is illustrative of how to think about these problems.

It would be nice to include this to see how it compares to the bijective maps.

Yes, please!

This also illustrates that a lot of these transforms have constrain-to-positive components (e.g., softmax uses exp()). But we also want to evaluate softplus and maybe inverse (c)loglog as alternatives to exp.
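As a concrete illustration (my own sketch, not from the thread), exp and softplus as constrain-to-positive maps with their log-Jacobian terms in Python:

```python
import math

def exp_transform(x):
    # y = exp(x); log |dy/dx| = x
    return math.exp(x), x

def softplus_transform(x):
    # y = log(1 + exp(x)); log |dy/dx| = log sigmoid(x) = -softplus(-x)
    y = math.log1p(math.exp(x))
    log_jac = -math.log1p(math.exp(-x))
    return y, log_jac

y_exp, lj_exp = exp_transform(0.5)
y_sp, lj_sp = softplus_transform(0.5)
```

Softplus grows linearly rather than exponentially in the tail, which is part of why it's worth comparing.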

@spinkney (Collaborator)

spinkney commented Jul 19, 2022

  1. There's also the "onion" method from the LKJ paper. Stan's Cholesky factors often fail for dimensions > 50 and possibly even > 30. The onion method constructs the matrix directly and couples the unit-vector transform inside. Ben G. had this code:
 matrix onion_lkj_ben (row_vector l, vector R2) {
    int K = num_elements(R2) + 1;
    matrix[K, K] L;
    int start = 1;
    int end = 2;

    L[1,1] = 1.0;
    L[2,1] = 2.0 * R2[1] - 1.0;
    L[1,2] = 0;
    L[2,2] = sqrt(1.0 - square(L[2,1]));
    for(k in 2:K - 1) {
      int kp1 = k + 1;
      // unit vector "transform" next line
      L[kp1, 1:k] = segment(l, start, k) * sqrt(R2[k] / dot_self(segment(l, start, k)));
      L[1:k, kp1] = zeros_vector(k);
      L[kp1, kp1] = sqrt(1.0 - R2[k]);
      start = end + 1;
      end = start + k - 1;
    }
  return L;
  }
parameters {
  vector[choose(K, 2) - 1]  l;       
  vector<lower = 0, upper = 1>[K-1] R2; // first element is not really an R^2 but is on (0,1)
}

I found that I was getting treedepth issues due to the unit-vector normalization. Replacing that line with the hyperspherical transform fixed those issues.
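For anyone wanting to poke at the construction outside Stan, here's a rough stdlib-Python port of the onion code above (`onion_cholesky` is my own naming; a sketch, not a drop-in). Each row is built with unit norm, so L·Lᵀ has a unit diagonal:

```python
import math, random

def onion_cholesky(l, R2):
    """Rough port of onion_lkj_ben: l has length choose(K, 2) - 1,
    R2 has K - 1 entries in (0, 1). Returns lower-triangular L."""
    K = len(R2) + 1
    L = [[0.0] * K for _ in range(K)]
    L[0][0] = 1.0
    L[1][0] = 2.0 * R2[0] - 1.0
    L[1][1] = math.sqrt(1.0 - L[1][0] ** 2)
    start = 0
    for k in range(2, K):                      # 0-based row index
        seg = l[start:start + k]
        # unit-vector "transform": scale segment to norm sqrt(R2)
        scale = math.sqrt(R2[k - 1] / sum(v * v for v in seg))
        for j in range(k):
            L[k][j] = seg[j] * scale
        L[k][k] = math.sqrt(1.0 - R2[k - 1])
        start += k
    return L

random.seed(1)
K = 6
l = [random.gauss(0.0, 1.0) for _ in range(K * (K - 1) // 2 - 1)]
R2 = [random.uniform(0.05, 0.95) for _ in range(K - 1)]
L = onion_cholesky(l, R2)
```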

  2. I think this does the hyperspherical transform for correlation matrices:
functions {
vector lower_elements(matrix M, int tri_size){
    int n = rows(M);
    int counter = 1;
    vector[tri_size] out;

    for (i in 2:n){
      for (j in 1:(i - 1)) {
        out[counter] = M[i, j];
        counter += 1;
      }
    }
    return out;
  }

 matrix angle2chol(matrix angle_mat) {
  int N = rows(angle_mat);
  matrix[N, N] inv_mat = identity_matrix(N);
  
  inv_mat[ : , 1] = cos(angle_mat[ : , 1]);
  
  for (i in 2 : N) {
    inv_mat[i, i] = prod(sin(angle_mat[i, 1 : (i - 1)]));
    if (i > 2) {
      for (j in 2 : (i - 1)) {
        inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1 : (j - 1)]));
      }
    }
  }
  return inv_mat;
}
matrix angle_vec2angle_mat(vector angle, int K) {
  int N = num_elements(angle);
  matrix[K, K] mat = add_diag(identity_matrix(K), rep_vector(-1, K));
  int count = K - 1;
  int pos = 1;

  for (k in 1 : K - 1) {
    mat[k + 1:K, k] = segment(angle, pos, count);
    pos += count;
    count -= 1;
  }
  
  return mat;
}
}
data {
  int N;
  matrix[N, N] R;
}
transformed data {
  int K = (N * (N - 1)) %/% 2;
}
parameters {
  vector<lower = 0, upper = pi()>[K] theta;
  cholesky_factor_corr[N] L_stan;
}
transformed parameters {
  matrix[N, N] L = angle2chol(angle_vec2angle_mat(theta, N));
}
model {
  // log_absd_jacs 
  // sin(theta) is always > 0 since theta in (0, pi)
  // because of this the diagonal of L is always > 0 also
   target += sum(log(sin(theta)));            // angle2chol
}
generated quantities {
   matrix[N, N] R_spherical = multiply_lower_tri_self_transpose(L);
    matrix[N, N] R_stan = multiply_lower_tri_self_transpose(L_stan);
}

@sethaxen (Collaborator)

+1 for onion. I recall finding that the incremental onion was ~3x faster than vine (for random sampling). Here's a Julia implementation:

https://github.com/JuliaStats/Distributions.jl/blob/0c9367ca7a7549d46c12d05b0ee5ec8e5000bc13/src/cholesky/lkjcholesky.jl#L229-L267

@bob-carpenter (Collaborator, author)

The best possible outcome for this work in my opinion will be a better set of transforms! I couldn't understand the LKJ paper when Ben first showed it to me 10 years ago, but I should give it a go again now that I understand multivariate calc better.

@spinkney (Collaborator)

I can add the angle/hyperspherical transform to the paper if we want. We'd probably not use it for "efficiency" because of all the ops involved.

We first need a bunch of values between $0$ and $\pi$, which are the raw unconstrained inputs, arranged as

$$ \theta=\left[\begin{array}{ccccccc} 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \theta_{2,1} & 0 & 0 & \cdots & 0 & 0 & 0 \\ \theta_{3,1} & \theta_{3,2} & 0 & \cdots & 0 & 0 & 0 \\ \theta_{4,1} & \theta_{4,2} & \theta_{4,3} & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ \theta_{n-1,1} & \theta_{n-1,2} & \theta_{n-1,3} & \cdots & \theta_{n-1, n-2} & 0 & 0 \\ \theta_{n, 1} & \theta_{n, 2} & \theta_{n, 3} & \cdots & \theta_{n, n-2} & \theta_{n, n-1} & 0 \end{array}\right] $$

the Cholesky factor $L$ can be constructed as

$$ L=\left[\begin{array}{ccccc} 1 & 0 & 0 & \cdots & 0 \\ \cos \theta_{2,1} & \sin \theta_{2,1} & 0 & \cdots & 0 \\ \cos \theta_{3,1} & \cos \theta_{3,2} \sin \theta_{3,1} & \sin \theta_{3,2} \sin \theta_{3,1} & \cdots & 0 \\ \cos \theta_{4,1} & \cos \theta_{4,2} \sin \theta_{4,1} & \cos \theta_{4,3} \sin \theta_{4,2} \sin \theta_{4,1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ \cos \theta_{n, 1} & \cos \theta_{n, 2} \sin \theta_{n, 1} & \cos \theta_{n, 3} \sin \theta_{n, 2} \sin \theta_{n, 1} & \cdots & \prod_{k=1}^{n-1} \sin \theta_{n, k} \end{array}\right] . $$

The log-abs-determinant of the Jacobian is just the product of the diagonals. Since the diagonals are positive by construction, no need to abs.

@sethaxen (Collaborator)

I can add the angle/hyperspherical transform to the paper if we want. We'd probably not use it for "efficiency" because of all the ops involved.

Great idea! As suggested in #24 (comment), we could use any of our approaches for unit vectors for Cholesky factors of correlation matrices. In those cases it's probably best to just refer in the paper to the Jacobian determinants already computed for unit vectors.

The log-abs-determinant of the Jacobian is just the product of the diagonals

Are you sure? IIUC this transformation uses a separate set of hyperspherical coordinates for each row vector (which lives on the hemisphere). The combined Jacobian determinant would then be the product of the Jacobian determinants for the transform in each row, not the product of the diagonal entries of $L$. The paper contains a derivation of the Jacobian determinant for what seem to be the same hyperspherical coordinates used here. If $L$ is $N \times K$, then

$$ |J(\phi_{n,:})| = \prod_{k=1}^{K-2} \sin(\phi_{n,k})^{K-k-1}, $$

so
$$|J(\phi)| = \prod_{n=2}^N \prod_{k=1}^{K-2} \sin(\phi_{n,k})^{K-k-1}$$

would I think be the full Jacobian determinant. Multiplying the diagonal terms of $L$ would miss the exponents in these expressions.
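The per-row formula can be sanity-checked numerically without AD. Below is a stdlib-Python sketch (helper names are my own) that builds a point on the unit sphere in $\mathbb{R}^d$ from angles, forms the rectangular Jacobian by central differences, and compares $\sqrt{\det(J^\top J)}$ against $\prod_k \sin(\phi_k)^{d-k-1}$ for $d = 4$:

```python
import math

def sph_to_unit(phi):
    """Hyperspherical angles (length d-1) to a point on the unit
    sphere in R^d, matching the convention in the thread's code."""
    d = len(phi) + 1
    x, sp = [], 1.0
    for k in range(d - 2):
        x.append(sp * math.cos(phi[k]))
        sp *= math.sin(phi[k])
    x.append(sp * math.cos(phi[-1]))
    x.append(sp * math.sin(phi[-1]))
    return x

def det3(m):  # determinant of a 3x3 matrix
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

phi = [0.7, 1.1, 2.0]                     # d = 4, so three angles
h = 1e-6
# central-difference Jacobian, 4 x 3
J = [[0.0] * 3 for _ in range(4)]
for k in range(3):
    up, dn = phi[:], phi[:]
    up[k] += h
    dn[k] -= h
    xu, xd = sph_to_unit(up), sph_to_unit(dn)
    for i in range(4):
        J[i][k] = (xu[i] - xd[i]) / (2 * h)
# volume element sqrt(det(J^T J)) vs the closed form
G = [[sum(J[i][a] * J[i][b] for i in range(4)) for b in range(3)]
     for a in range(3)]
fd_vol = math.sqrt(det3(G))
formula = math.sin(phi[0]) ** 2 * math.sin(phi[1])   # prod_k sin^{d-k-1}
```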

@spinkney (Collaborator)

spinkney commented Jul 20, 2022

Yes, you're right; I was doing the change from $L \to R$.

Ok, let's see if we can get this. My head is hurting a bit from this but I think your latter equation is missing an index?

For example, take $N = K = 5$; then we get

n=2: sin(rho[2, 1])^(K - 2) * sin(rho[2, 2])^(K - 3) * sin(rho[2, 3])^(K - 4) 
n=3: sin(rho[3, 1])^(K - 2) * sin(rho[3, 2])^(K - 3) * sin(rho[3, 3])^(K - 4)
n=4: sin(rho[4, 1])^(K - 2) * sin(rho[4, 2])^(K - 3) * sin(rho[4, 3])^(K - 4)
n=5: sin(rho[5, 1])^(K - 2) * sin(rho[5, 2])^(K - 3) * sin(rho[5, 3])^(K - 4)

we're missing rho[5, 4] and I don't think we should have any $i = j$ indexes, right?

I think it should come out to be

n=2: sin(rho[2, 1])^(K - 1) * sin(rho[3, 2])^(K - 2) * sin(rho[4, 3])^(K - 3) * sin(rho[5, 4])^(K - 4)
n=3: sin(rho[3, 1])^(K - 1) * sin(rho[4, 2])^(K - 2) * sin(rho[5, 3])^(K - 3)
n=4: sin(rho[4, 1])^(K - 1) * sin(rho[5, 2])^(K - 2) 
n=5: sin(rho[5, 1])^(K - 1) 

which is

$$ |J(\phi)| = \prod_{k=1}^{K-1} \bigg(\prod_{n=k+ 1}^{N} \sin(\phi_{n,k})\bigg)^{N-k} $$

In the Stan program I can return the target directly using an _lp function:

 matrix angle2chol_lp(matrix angle_mat) {
  int N = rows(angle_mat);
  matrix[N, N] inv_mat = identity_matrix(N);
  
  inv_mat[ : , 1] = cos(angle_mat[ : , 1]);
  
  for (i in 2 : N) {
    inv_mat[i, i] = prod(sin(angle_mat[i, 1 : (i - 1)]));
    target += (N - i + 1) * log(sin(angle_mat[i:N, i - 1]));
    if (i > 2) {
      for (j in 2 : (i - 1)) {
        inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1 : (j - 1)]));
      }
    }
  }
  return inv_mat;
}

@sethaxen (Collaborator)

sethaxen commented Jul 20, 2022

Yes, you're absolutely right, and thanks for catching that! Let's restart.

For an $N \times K$ lower Cholesky factor $L$ where $N \ge K$, row $n$ has the constraint that $L_{n,:}$ is on the unit hemisphere in $\min(n, K)$-dimensional space.

The Jacobian determinant for row $n$ is then:

$$ |J(\phi_{n,:})| = \prod_{k=1}^{\min(n,K)-2} \sin(\phi_{n,k})^{\min(n,K)-k-1}, $$

So

$$ |J(\phi)| = \prod_{k=1}^{K-2} \prod_{n=k+2}^N \sin(\phi_{n,k})^{\min(n, K)-k-1} $$

This time I tested it with AD in Julia:

function spherical_coords_to_unit_vector!(x, ϕ)
    T = eltype(x)
    N = length(x)
    sin_prod = one(T)
    log_det_jac = zero(T)
    rcounter = N - 2
    for i in 1:(N-2)
        sᵢ, cᵢ = sincos(ϕ[i])
        x[i] = sin_prod * cᵢ
        sin_prod *= sᵢ
        log_det_jac += rcounter * log(sᵢ)
        rcounter -= 1
    end
    sᵢ, cᵢ = sincos(ϕ[N - 1])
    x[N-1] = sin_prod * cᵢ
    x[N] = sin_prod * sᵢ
    return x, log_det_jac
end

function spherical_coords_to_unit_vector(ϕ)
    N = length(ϕ) + 1
    T = float(typeof(first(ϕ)))
    x = similar(ϕ, T, N)
    return spherical_coords_to_unit_vector!(x, ϕ)
end

function spherical_coords_to_corrchol(ϕ, N, K)
    @assert length(ϕ) == num_angles_corrchol(N, K)
    T = float(typeof(first(ϕ)))
    L = similar(ϕ, T, N, K)
    fill!(L, 0)
    L[1, 1] = 1
    log_det_jac = zero(T)
    i = 1
    for n in 2:N
        row_length = min(n, K)
        ϕₙ = @views ϕ[i:(i + row_length - 2)]
        Lₙ = @views L[n, 1:row_length]
        _, logJ = spherical_coords_to_unit_vector!(Lₙ, ϕₙ)
        log_det_jac += logJ
        i += row_length - 1
    end
    @assert i == length(ϕ) + 1
    return L, log_det_jac
end

num_angles_corrchol(N, K) = sum(n -> min(n, K) - 1, 2:N)

function corrchol_to_vec(X)
    N, K = size(X)
    return @views reduce(vcat, X[n, 1:min(n, K)] for n in 2:N)
end

julia> using Test, LinearAlgebra, ForwardDiff

julia> N, K = 16, 9

julia> ϕ = rand(num_angles_corrchol(N, K)) * π;

julia> L, logJ = spherical_coords_to_corrchol(ϕ, N, K);

julia> L
16×9 Matrix{Float64}:
1.0         0.0         0.0          0.0            0.0           0.0          0.0
0.947746    0.319027    0.0          0.0             0.0           0.0          0.0
-0.453748    0.455677    0.765814     0.0             0.0           0.0          0.0
0.162752   -0.815836    0.0411449    0.553381        0.0           0.0          0.0
-0.0675745  -0.678065   -0.390384    -0.357414        0.0           0.0          0.0
-0.794639   -0.452183    0.277417    -0.278334       0.0           0.0          0.0
0.985237   -0.141132   -0.0191612    0.0796641       0.00170641    0.0          0.0
0.21713     0.976095   -0.00572299   0.00442652      0.000308881   0.000644938  0.0
0.545889   -0.612698   -0.441773    -0.329475        0.0631373    -0.036901     0.0304645
0.333311    0.930827   -0.130218    -0.00151273     -0.023701      0.00087454   0.00131505
-0.337961    0.890776    0.271462    -0.130395       0.00101614    0.00016551   0.00121652
0.998049   -0.0417353   0.046215    -0.00379095      0.000462996  -0.00149741   0.000162026
-0.377976   -0.74856     0.462584     0.200646       -0.0564089    -0.0722899    0.0165523
0.743822    0.113604   -0.199132     0.289515        0.106297      0.182008     0.157098
-0.814088    0.580163    0.0188838    0.00198259     -0.00379611   -0.0121217    0.00579842
0.535706    0.649043   -0.537816    -0.00748968    -0.00159064    0.000449167  0.000740431

julia> @testset "cholesky factor satisfies constraints" begin
    @test istril(L)
    @test all(>(0), diag(L))
    R = L*L'
    @test all(≈(1), diag(R))
    @test L ≈ cholesky(R; check=false).L[:, 1:K]
end
Test Summary:                         | Pass  Total
cholesky factor satisfies constraints |    4      4
Test.DefaultTestSet("cholesky factor satisfies constraints", Any[], 4, false, false)

julia> J = ForwardDiff.jacobian(ϕ -> corrchol_to_vec(spherical_coords_to_corrchol(ϕ, N, K)[1]), ϕ)
107×92 Matrix{Float64}:
 -0.319027   0.0        0.0        0.0          0.0           0.0          0.0           0.0
  0.947746   0.0        0.0        0.0          0.0            0.0          0.0           0.0
 -0.0       -0.89113   -0.0       -0.0         -0.0           -0.0         -0.0          -0.0
  0.0       -0.232023  -0.765814   0.0          0.0            0.0          0.0           0.0
  0.0       -0.38994    0.455677   0.0          0.0            0.0          0.0           0.0
  0.0        0.0        0.0       -0.986667     0.0           0.0          0.0           0.0
 -0.0       -0.0       -0.0       -0.134574    -0.554908      -0.0         -0.0          -0.0
  0.0        0.0        0.0        0.00678693  -0.0604921      0.0          0.0           0.0
  0.0        0.0        0.0        0.0912811   -0.81359        0.0          0.0           0.0
                                                                                      
  0.0        0.0        0.0        0.0          0.0            0.0          0.0           0.0
 -0.0       -0.0       -0.0       -0.0         -0.0          -0.0         -0.0          -0.0
 -0.0       -0.0       -0.0       -0.0         -0.0           -0.0         -0.0          -0.0
 -0.0       -0.0       -0.0       -0.0         -0.0           -0.0         -0.0          -0.0
  0.0        0.0        0.0        0.0          0.0           -0.00181111   0.0           0.0
 -0.0       -0.0       -0.0       -0.0         -0.0           -0.0329006   -0.000866019  -0.0
  0.0        0.0        0.0        0.0          0.0           0.00929052  -0.000824996  -0.000740431
  0.0        0.0        0.0        0.0          0.0            0.015315    -0.00135997    0.000449167

julia> @test logabsdet(J'J)[1]/2 ≈ logJ
Test Passed
  Expression: (logabsdet(J' * J))[1] / 2 ≈ logJ
   Evaluated: -205.01162441028785 ≈ -205.01162441028777
