correlation matrix transforms #24

Open
5 tasks
bob-carpenter opened this issue Jul 12, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@bob-carpenter (Collaborator)

Add analysis of correlation matrix transforms

@bob-carpenter added the enhancement label on Jul 12, 2022
@sethaxen (Collaborator)

Here's also a parameter-augmentation approach. The constraints on a correlation matrix are that it is symmetric with a unit diagonal and off-diagonal elements between -1 and 1. These constraints imply that the (upper) Cholesky factor is upper triangular, with the first $i$ elements of column $i$ forming a point on the unit hemisphere and the $i$ th element positive.

Let $Y$ be an upper triangular matrix of unconstrained parameters that can be stacked into a vector $y$. Let $y_i$ be the $i$ th column of $Y$. Let $z_i = \operatorname{cat}(y_i[1:i-1], \exp(y_i[i]))$. Then $x_i = \operatorname{normalize}(z_i)$ is on the unit hemisphere. This is the same approach Stan uses for unit vectors, except we precompose the map with the exponential applied to the final element.

Then if we use the prior $\operatorname{norm}(z_i) \sim \chi_i$ for the unconstrained norm, we end up with the log density correction
$$\sum_{i=2}^n \left( e^{y_i[i]} - \frac{1}{2} \left( e^{2 y_i[i]} + \sum_{k=1}^{i-1} y_i[k]^2 \right) \right)$$

It would be nice to include this to see how it compares to the bijective maps.
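If helpful, here's a minimal Python sketch of this map (the helper name `augmented_to_hemisphere` is mine; the $\chi$ prior and Jacobian terms are omitted), just checking that the result lands on the unit hemisphere with a positive final element:

```python
import math

def augmented_to_hemisphere(y):
    """Sketch: map an unconstrained column y (length i) to the unit
    hemisphere by exponentiating the last element and normalizing."""
    z = y[:-1] + [math.exp(y[-1])]            # z_i = cat(y_i[1:i-1], exp(y_i[i]))
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]              # x_i = normalize(z_i)

x = augmented_to_hemisphere([0.3, -1.2, 0.5])
```

Since `exp(y[-1]) > 0`, the last coordinate stays positive after normalization, which is exactly the hemisphere constraint.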

@bob-carpenter (Collaborator, author)

Nice observation. I'd also be curious to see how it compares. Doing the same tricks with simplexes and unit vectors is illustrative of how to think about these problems.

It would be nice to include this to see how it compares to the bijective maps.

Yes, please!

This also illustrates that a lot of these transforms have constrain-to-positive components (e.g., softmax uses exp()). But we also want to evaluate softplus and maybe inverse (c)loglog as alternatives to exp.
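As a concrete illustration (my own sketch, not from the thread), exp and softplus as constrain-to-positive maps with their log-Jacobian terms in Python:

```python
import math

def exp_transform(x):
    # y = exp(x); log |dy/dx| = x
    return math.exp(x), x

def softplus_transform(x):
    # y = log(1 + exp(x)); log |dy/dx| = log sigmoid(x) = -softplus(-x)
    y = math.log1p(math.exp(x))
    log_jac = -math.log1p(math.exp(-x))
    return y, log_jac

y_exp, lj_exp = exp_transform(0.5)
y_sp, lj_sp = softplus_transform(0.5)
```

Softplus grows linearly rather than exponentially in the tail, which is part of why it's worth comparing.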

@spinkney (Collaborator)

spinkney commented Jul 19, 2022

  1. There's also the "onion" method from the LKJ paper. Stan's Cholesky factors often fail for dimensions > 50 and possibly even > 30. The onion method constructs the matrix directly and couples the unit-vector transform inside. Ben G. had this code:
 matrix onion_lkj_ben (row_vector l, vector R2) {
    int K = num_elements(R2) + 1;
    matrix[K, K] L;
    int start = 1;
    int end = 2;

    L[1,1] = 1.0;
    L[2,1] = 2.0 * R2[1] - 1.0;
    L[1,2] = 0;
    L[2,2] = sqrt(1.0 - square(L[2,1]));
    for(k in 2:K - 1) {
      int kp1 = k + 1;
      // unit vector "transform" next line
      L[kp1, 1:k] = segment(l, start, k) * sqrt(R2[k] / dot_self(segment(l, start, k)));
      L[1:k, kp1] = zeros_vector(k);
      L[kp1, kp1] = sqrt(1.0 - R2[k]);
      start = end + 1;
      end = start + k - 1;
    }
  return L;
  }
parameters {
  vector[choose(K, 2) - 1]  l;       
  vector<lower = 0, upper = 1>[K-1] R2; // first element is not really an R^2 but is on (0,1)
}

I found that I was getting treedepth issues due to the unit-vector normalization. Replacing that line with the hyperspherical transform fixed those issues.
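For anyone wanting to poke at the construction outside Stan, here's a rough stdlib-Python port of the onion code above (`onion_cholesky` is my own naming; a sketch, not a drop-in). Each row is built with unit norm, so L·Lᵀ has a unit diagonal:

```python
import math, random

def onion_cholesky(l, R2):
    """Rough port of onion_lkj_ben: l has length choose(K, 2) - 1,
    R2 has K - 1 entries in (0, 1). Returns lower-triangular L."""
    K = len(R2) + 1
    L = [[0.0] * K for _ in range(K)]
    L[0][0] = 1.0
    L[1][0] = 2.0 * R2[0] - 1.0
    L[1][1] = math.sqrt(1.0 - L[1][0] ** 2)
    start = 0
    for k in range(2, K):                      # 0-based row index
        seg = l[start:start + k]
        # unit-vector "transform": scale segment to norm sqrt(R2)
        scale = math.sqrt(R2[k - 1] / sum(v * v for v in seg))
        for j in range(k):
            L[k][j] = seg[j] * scale
        L[k][k] = math.sqrt(1.0 - R2[k - 1])
        start += k
    return L

random.seed(1)
K = 6
l = [random.gauss(0.0, 1.0) for _ in range(K * (K - 1) // 2 - 1)]
R2 = [random.uniform(0.05, 0.95) for _ in range(K - 1)]
L = onion_cholesky(l, R2)
```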

  2. I think this does the hyperspherical transform for correlation matrices:
functions {
vector lower_elements(matrix M, int tri_size){
    int n = rows(M);
    int counter = 1;
    vector[tri_size] out;

    for (i in 2:n){
      for (j in 1:(i - 1)) {
        out[counter] = M[i, j];
        counter += 1;
      }
    }
    return out;
  }

 matrix angle2chol(matrix angle_mat) {
  int N = rows(angle_mat);
  matrix[N, N] inv_mat = identity_matrix(N);
  
  inv_mat[ : , 1] = cos(angle_mat[ : , 1]);
  
  for (i in 2 : N) {
    inv_mat[i, i] = prod(sin(angle_mat[i, 1 : (i - 1)]));
    if (i > 2) {
      for (j in 2 : (i - 1)) {
        inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1 : (j - 1)]));
      }
    }
  }
  return inv_mat;
}
matrix angle_vec2angle_mat(vector angle, int K) {
  int N = num_elements(angle);
  matrix[K, K] mat = add_diag(identity_matrix(K), rep_vector(-1, K));
  int count = K - 1;
  int pos = 1;

  for (k in 1 : K - 1) {
    mat[k + 1:K, k] = segment(angle, pos, count);
    pos += count;
    count -= 1;
  }
  
  return mat;
}
}
data {
  int N;
  matrix[N, N] R;
}
transformed data {
  int K = (N * (N - 1)) %/% 2;
}
parameters {
  vector<lower = 0, upper = pi()>[K] theta;
  cholesky_factor_corr[N] L_stan;
}
transformed parameters {
  matrix[N, N] L = angle2chol(angle_vec2angle_mat(theta, N));
}
model {
  // log_absd_jacs 
  // sin(theta) is always > 0 since theta in (0, pi)
  // because of this the diagonal of L is always > 0 also
   target += sum(log(sin(theta)));            // angle2chol
}
generated quantities {
   matrix[N, N] R_spherical = multiply_lower_tri_self_transpose(L);
    matrix[N, N] R_stan = multiply_lower_tri_self_transpose(L_stan);
}

@sethaxen (Collaborator)

+1 for onion. I recall finding that the incremental onion was ~3x faster than vine (for random sampling). Here's a Julia implementation:

https://github.com/JuliaStats/Distributions.jl/blob/0c9367ca7a7549d46c12d05b0ee5ec8e5000bc13/src/cholesky/lkjcholesky.jl#L229-L267

@bob-carpenter (Collaborator, author)

The best possible outcome for this work in my opinion will be a better set of transforms! I couldn't understand the LKJ paper when Ben first showed it to me 10 years ago, but I should give it a go again now that I understand multivariate calc better.

@spinkney (Collaborator)

I can add the angle/hyperspherical transform to the paper if we want. We'd probably not use it for "efficiency" because of all the ops involved.

We first need a bunch of values between $0$ and $\pi$, which are the raw unconstrained inputs, arranged as

$$ \theta=\left[\begin{array}{ccccccc} 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ \theta_{2,1} & 0 & 0 & \cdots & 0 & 0 & 0 \\ \theta_{3,1} & \theta_{3,2} & 0 & \cdots & 0 & 0 & 0 \\ \theta_{4,1} & \theta_{4,2} & \theta_{4,3} & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ \theta_{n-1,1} & \theta_{n-1,2} & \theta_{n-1,3} & \cdots & \theta_{n-1, n-2} & 0 & 0 \\ \theta_{n, 1} & \theta_{n, 2} & \theta_{n, 3} & \cdots & \theta_{n, n-2} & \theta_{n, n-1} & 0 \end{array}\right] $$

the Cholesky factor $L$ can be constructed as

$$ L=\left[\begin{array}{ccccc} 1 & 0 & 0 & \cdots & 0 \\ \cos \theta_{2,1} & \sin \theta_{2,1} & 0 & \cdots & 0 \\ \cos \theta_{3,1} & \cos \theta_{3,2} \sin \theta_{3,1} & \sin \theta_{3,2} \sin \theta_{3,1} & \cdots & 0 \\ \cos \theta_{4,1} & \cos \theta_{4,2} \sin \theta_{4,1} & \cos \theta_{4,3} \sin \theta_{4,2} \sin \theta_{4,1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ \cos \theta_{n, 1} & \cos \theta_{n, 2} \sin \theta_{n, 1} & \cos \theta_{n, 3} \sin \theta_{n, 2} \sin \theta_{n, 1} & \cdots & \prod_{k=1}^{n-1} \sin \theta_{n, k} \end{array}\right] . $$

The log-abs-determinant of the Jacobian is just the product of the diagonals. Since the diagonals are positive by construction, no need to abs.

@sethaxen (Collaborator)

I can add the angle/hyperspherical transform to the paper if we want. We'd probably not use it for "efficiency" because of all the ops involved.

Great idea! As suggested in #24 (comment), we could use any of our approaches for unit vectors for Cholesky factors of correlation matrices. In those cases it's probably best to just refer in the paper to the Jacobian determinants already computed for unit vectors.

The log-abs-determinant of the Jacobian is just the product of the diagonals

Are you sure? IIUC this transformation uses a separate set of hyperspherical coordinates for each row vector (which lives on the hemisphere). The combined Jacobian determinant would then be the product of the Jacobian determinants for the transform in each row, not the product of the diagonal entries of $L$. The paper contains a derivation of the Jacobian determinant for what seem to be the same hyperspherical coordinates used here. If $L$ is $N \times K$, then

$$ |J(\phi_{n,:})| = \prod_{k=1}^{K-2} \sin(\phi_{n,k})^{K-k-1}, $$

so
$$|J(\phi)| = \prod_{n=2}^N \prod_{k=1}^{K-2} \sin(\phi_{n,k})^{K-k-1}$$

would I think be the full Jacobian determinant. Multiplying the diagonal terms of $L$ would miss the exponents in these expressions.
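The per-row formula can be sanity-checked numerically without AD. Below is a stdlib-Python sketch (helper names are my own) that builds a point on the unit sphere in $\mathbb{R}^d$ from angles, forms the rectangular Jacobian by central differences, and compares $\sqrt{\det(J^\top J)}$ against $\prod_k \sin(\phi_k)^{d-k-1}$ for $d = 4$:

```python
import math

def sph_to_unit(phi):
    """Hyperspherical angles (length d-1) to a point on the unit
    sphere in R^d, matching the convention in the thread's code."""
    d = len(phi) + 1
    x, sp = [], 1.0
    for k in range(d - 2):
        x.append(sp * math.cos(phi[k]))
        sp *= math.sin(phi[k])
    x.append(sp * math.cos(phi[-1]))
    x.append(sp * math.sin(phi[-1]))
    return x

def det3(m):  # determinant of a 3x3 matrix
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

phi = [0.7, 1.1, 2.0]                     # d = 4, so three angles
h = 1e-6
# central-difference Jacobian, 4 x 3
J = [[0.0] * 3 for _ in range(4)]
for k in range(3):
    up, dn = phi[:], phi[:]
    up[k] += h
    dn[k] -= h
    xu, xd = sph_to_unit(up), sph_to_unit(dn)
    for i in range(4):
        J[i][k] = (xu[i] - xd[i]) / (2 * h)
# volume element sqrt(det(J^T J)) vs the closed form
G = [[sum(J[i][a] * J[i][b] for i in range(4)) for b in range(3)]
     for a in range(3)]
fd_vol = math.sqrt(det3(G))
formula = math.sin(phi[0]) ** 2 * math.sin(phi[1])   # prod_k sin^{d-k-1}
```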

@spinkney (Collaborator)

spinkney commented Jul 20, 2022

Yes, you're right; I was doing the change from $L \to R$.

Ok, let's see if we can get this. My head is hurting a bit from this but I think your latter equation is missing an index?

For example, take $N = K = 5$; then we get

n=2: sin(rho[2, 1])^(K - 2) * sin(rho[2, 2])^(K - 3) * sin(rho[2, 3])^(K - 4) 
n=3: sin(rho[3, 1])^(K - 2) * sin(rho[3, 2])^(K - 3) * sin(rho[3, 3])^(K - 4)
n=4: sin(rho[4, 1])^(K - 2) * sin(rho[4, 2])^(K - 3) * sin(rho[4, 3])^(K - 4)
n=5: sin(rho[5, 1])^(K - 2) * sin(rho[5, 2])^(K - 3) * sin(rho[5, 3])^(K - 4)

we're missing rho[5, 4] and I don't think we should have any $i = j$ indexes, right?

I think it should come out to be

n=2: sin(rho[2, 1])^(K - 1) * sin(rho[3, 2])^(K - 2) * sin(rho[4, 3])^(K - 3) * sin(rho[5, 4])^(K - 4)
n=3: sin(rho[3, 1])^(K - 1) * sin(rho[4, 2])^(K - 2) * sin(rho[5, 3])^(K - 3)
n=4: sin(rho[4, 1])^(K - 1) * sin(rho[5, 2])^(K - 2) 
n=5: sin(rho[5, 1])^(K - 1) 

which is

$$ |J(\phi)| = \prod_{k=1}^{K-1} \bigg(\prod_{n=k+ 1}^{N} \sin(\phi_{n,k})\bigg)^{N-k} $$

In the Stan program I can return the target directly using an _lp function:

 matrix angle2chol_lp(matrix angle_mat) {
  int N = rows(angle_mat);
  matrix[N, N] inv_mat = identity_matrix(N);
  
  inv_mat[ : , 1] = cos(angle_mat[ : , 1]);
  
  for (i in 2 : N) {
    inv_mat[i, i] = prod(sin(angle_mat[i, 1 : (i - 1)]));
    target += (N - i + 1) * log(sin(angle_mat[i:N, i - 1]));
    if (i > 2) {
      for (j in 2 : (i - 1)) {
        inv_mat[i, j] = cos(angle_mat[i, j]) * prod(sin(angle_mat[i, 1 : (j - 1)]));
      }
    }
  }
  return inv_mat;
}

@sethaxen (Collaborator)

sethaxen commented Jul 20, 2022

Yes, you're absolutely right, and thanks for catching that! Let's restart.

For an $N \times K$ lower Cholesky factor $L$ where $N \ge K$, row $n$ has the constraint that $L_{n,:}$ is on the unit hemisphere in $\min(n, K)$-dimensional space.

The Jacobian determinant for row $n$ is then:

$$ |J(\phi_{n,:})| = \prod_{k=1}^{\min(n,K)-2} \sin(\phi_{n,k})^{\min(n,K)-k-1}, $$

So

$$ |J(\phi)| = \prod_{k=1}^{K-2} \prod_{n=k+2}^N \sin(\phi_{n,k})^{\min(n, K)-k-1} $$

This time I tested it with AD in Julia:

function spherical_coords_to_unit_vector!(x, ϕ)
    T = eltype(x)
    N = length(x)
    sin_prod = one(T)
    log_det_jac = zero(T)
    rcounter = N - 2
    for i in 1:(N-2)
        sᵢ, cᵢ = sincos(ϕ[i])
        x[i] = sin_prod * cᵢ
        sin_prod *= sᵢ
        log_det_jac += rcounter * log(sᵢ)
        rcounter -= 1
    end
    sᵢ, cᵢ = sincos(ϕ[N - 1])
    x[N-1] = sin_prod * cᵢ
    x[N] = sin_prod * sᵢ
    return x, log_det_jac
end

function spherical_coords_to_unit_vector(ϕ)
    N = length(ϕ) + 1
    T = float(typeof(first(ϕ)))
    x = similar(ϕ, T, N)
    return spherical_coords_to_unit_vector!(x, ϕ)
end

function spherical_coords_to_corrchol(ϕ, N, K)
    @assert length(ϕ) == num_angles_corrchol(N, K)
    T = float(typeof(first(ϕ)))
    L = similar(ϕ, T, N, K)
    fill!(L, 0)
    L[1, 1] = 1
    log_det_jac = zero(T)
    i = 1
    for n in 2:N
        row_length = min(n, K)
        ϕₙ = @views ϕ[i:(i + row_length - 2)]
        Lₙ = @views L[n, 1:row_length]
        _, logJ = spherical_coords_to_unit_vector!(Lₙ, ϕₙ)
        log_det_jac += logJ
        i += row_length - 1
    end
    @assert i == length(ϕ) + 1
    return L, log_det_jac
end

num_angles_corrchol(N, K) = sum(n -> min(n, K) - 1, 2:N)

function corrchol_to_vec(X)
    N, K = size(X)
    return @views reduce(vcat, X[n, 1:min(n, K)] for n in 2:N)
end

julia> using Test, LinearAlgebra, ForwardDiff

julia> N, K = 16, 9

julia> ϕ = rand(num_angles_corrchol(N, K)) * π;

julia> L, logJ = spherical_coords_to_corrchol(ϕ, N, K);

julia> L
16×9 Matrix{Float64}:
1.0         0.0         0.0          0.0            0.0           0.0          0.0
0.947746    0.319027    0.0          0.0             0.0           0.0          0.0
-0.453748    0.455677    0.765814     0.0             0.0           0.0          0.0
0.162752   -0.815836    0.0411449    0.553381        0.0           0.0          0.0
-0.0675745  -0.678065   -0.390384    -0.357414        0.0           0.0          0.0
-0.794639   -0.452183    0.277417    -0.278334       0.0           0.0          0.0
0.985237   -0.141132   -0.0191612    0.0796641       0.00170641    0.0          0.0
0.21713     0.976095   -0.00572299   0.00442652      0.000308881   0.000644938  0.0
0.545889   -0.612698   -0.441773    -0.329475        0.0631373    -0.036901     0.0304645
0.333311    0.930827   -0.130218    -0.00151273     -0.023701      0.00087454   0.00131505
-0.337961    0.890776    0.271462    -0.130395       0.00101614    0.00016551   0.00121652
0.998049   -0.0417353   0.046215    -0.00379095      0.000462996  -0.00149741   0.000162026
-0.377976   -0.74856     0.462584     0.200646       -0.0564089    -0.0722899    0.0165523
0.743822    0.113604   -0.199132     0.289515        0.106297      0.182008     0.157098
-0.814088    0.580163    0.0188838    0.00198259     -0.00379611   -0.0121217    0.00579842
0.535706    0.649043   -0.537816    -0.00748968    -0.00159064    0.000449167  0.000740431

julia> @testset "cholesky factor satisfies constraints" begin
    @test istril(L)
    @test all(>(0), diag(L))
    R = L*L'
    @test all(≈(1), diag(R))
    @test L ≈ cholesky(R; check=false).L[:, 1:K]
end
Test Summary:                         | Pass  Total
cholesky factor satisfies constraints |    4      4
Test.DefaultTestSet("cholesky factor satisfies constraints", Any[], 4, false, false)

julia> J = ForwardDiff.jacobian(ϕ -> corrchol_to_vec(spherical_coords_to_corrchol(ϕ, N, K)[1]), ϕ)
107×92 Matrix{Float64}:
 -0.319027   0.0        0.0        0.0          0.0           0.0          0.0           0.0
  0.947746   0.0        0.0        0.0          0.0            0.0          0.0           0.0
 -0.0       -0.89113   -0.0       -0.0         -0.0           -0.0         -0.0          -0.0
  0.0       -0.232023  -0.765814   0.0          0.0            0.0          0.0           0.0
  0.0       -0.38994    0.455677   0.0          0.0            0.0          0.0           0.0
  0.0        0.0        0.0       -0.986667     0.0           0.0          0.0           0.0
 -0.0       -0.0       -0.0       -0.134574    -0.554908      -0.0         -0.0          -0.0
  0.0        0.0        0.0        0.00678693  -0.0604921      0.0          0.0           0.0
  0.0        0.0        0.0        0.0912811   -0.81359        0.0          0.0           0.0
                                                                                      
  0.0        0.0        0.0        0.0          0.0            0.0          0.0           0.0
 -0.0       -0.0       -0.0       -0.0         -0.0          -0.0         -0.0          -0.0
 -0.0       -0.0       -0.0       -0.0         -0.0           -0.0         -0.0          -0.0
 -0.0       -0.0       -0.0       -0.0         -0.0           -0.0         -0.0          -0.0
  0.0        0.0        0.0        0.0          0.0           -0.00181111   0.0           0.0
 -0.0       -0.0       -0.0       -0.0         -0.0           -0.0329006   -0.000866019  -0.0
  0.0        0.0        0.0        0.0          0.0           0.00929052  -0.000824996  -0.000740431
  0.0        0.0        0.0        0.0          0.0            0.015315    -0.00135997    0.000449167

julia> @test logabsdet(J'J)[1]/2 ≈ logJ
Test Passed
  Expression: (logabsdet(J' * J))[1] / 2 ≈ logJ
   Evaluated: -205.01162441028785 ≈ -205.01162441028777
