Sampling correlation matrices
Sampling correlation matrices is useful in many applications in statistics and bioinformatics, and it is a fundamental problem in many Bayesian models.
This coding project considers the case of sampling from a log-concave distribution restricted to the set of correlation matrices.
The aim is to develop efficient open-source software that extends the volesti package.
The student will implement these methods in C++ and she/he will perform an extended
empirical comparison with existing software and report on the results.
Sampling correlation matrices is relatively difficult because three constraints are imposed on a square matrix: it must be symmetric positive semidefinite (that is, have non-negative eigenvalues), its diagonal elements are fixed to 1, and its off-diagonal elements are bounded in [-1, 1]. The project relies on the geometric representation of correlation matrices in [1] and on the Markov Chain Monte Carlo methods implemented in volesti for sampling from a multivariate truncated distribution.
The simplest way to construct a correlation matrix is rejection sampling: draw each correlation coefficient uniformly from the closed interval [-1, 1], check whether the resulting matrix is positive definite, and, if it is not, discard it and draw a new one. This becomes impractical for high-dimensional problems, because the acceptance probability drops quickly as the dimension grows, so several other techniques for generating a correlation matrix exist. The methods in [2, 3, 4] generate correlation matrices with a predetermined spectrum (set of eigenvalues). The method in [5] generates correlation matrices with a given mean value, structure, or eigenvalues. The methods in [6, 7] generate correlation matrices with MCMC algorithms, sampling from the posterior distribution in certain Bayesian models.
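The rejection-sampling scheme described above can be sketched in a few lines of standard C++. This is only an illustration, not code from volesti: the names `Matrix`, `is_positive_definite`, and `sample_correlation_matrix` are hypothetical, and positive definiteness is checked with a hand-written Cholesky factorization so that the sketch needs no external library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical dense matrix type used only in this sketch.
using Matrix = std::vector<std::vector<double>>;

// Returns true if the symmetric matrix `a` is positive definite,
// decided by attempting a Cholesky factorization a = L * L^T.
bool is_positive_definite(const Matrix& a) {
    const std::size_t n = a.size();
    Matrix l(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
            if (i == j) {
                if (sum <= 0.0) return false;  // non-positive pivot: not PD
                l[i][i] = std::sqrt(sum);
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    return true;
}

// Draws one n x n correlation matrix by acceptance-rejection:
// fill the off-diagonal entries uniformly in [-1, 1), keep the unit
// diagonal, and retry until the candidate is positive definite.
template <typename Rng>
Matrix sample_correlation_matrix(std::size_t n, Rng& rng) {
    assert(n > 0);
    std::uniform_real_distribution<double> unif(-1.0, 1.0);
    Matrix c(n, std::vector<double>(n, 0.0));
    while (true) {
        for (std::size_t i = 0; i < n; ++i) {
            c[i][i] = 1.0;
            for (std::size_t j = 0; j < i; ++j) {
                c[i][j] = c[j][i] = unif(rng);
            }
        }
        if (is_positive_definite(c)) return c;
    }
}
```

Calling `sample_correlation_matrix(3, rng)` with a `std::mt19937 rng` returns quickly, but the expected number of retries grows rapidly with `n`, which is the inefficiency that motivates the MCMC approach of this project.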
The student will implement in C++ a class for the convex body that represents the set of correlation matrices. In that class, she/he has to implement the membership, boundary, and reflection oracles that the random walks in volesti require to operate (see [1] for more details). Moreover, the student will have to perform experiments to justify the efficiency of the implementation.
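To make the oracle terminology concrete, here is a minimal sketch of a membership oracle and a boundary oracle. It does not follow volesti's actual class interface or the method of [1]: the class name `CorrelationSet`, the method names, and the parameterization of a correlation matrix by its strict lower triangle (row by row) are all assumptions made for illustration. The boundary oracle uses plain bisection, which is valid because the set is convex, whereas a production implementation would more likely solve an eigenvalue problem. The reflection oracle is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical class: the interior of the set of n x n correlation
// matrices, viewed as a convex body in R^d with d = n(n-1)/2.
class CorrelationSet {
public:
    explicit CorrelationSet(std::size_t n) : n_(n) { assert(n >= 2); }

    std::size_t dimension() const { return n_ * (n_ - 1) / 2; }

    // Membership oracle: true if the matrix with unit diagonal and
    // off-diagonal entries x is positive definite (Cholesky test).
    bool is_in(const std::vector<double>& x) const {
        assert(x.size() == dimension());
        std::vector<double> l(n_ * n_, 0.0);
        for (std::size_t i = 0; i < n_; ++i) {
            for (std::size_t j = 0; j <= i; ++j) {
                double sum = (i == j) ? 1.0 : x[i * (i - 1) / 2 + j];
                for (std::size_t k = 0; k < j; ++k)
                    sum -= l[i * n_ + k] * l[j * n_ + k];
                if (i == j) {
                    if (sum <= 0.0) return false;
                    l[i * n_ + i] = std::sqrt(sum);
                } else {
                    l[i * n_ + j] = sum / l[j * n_ + j];
                }
            }
        }
        return true;
    }

    // Boundary oracle: approximates the largest t >= 0 such that
    // x + t * v is still in the set, by bisection on t.
    double boundary(const std::vector<double>& x,
                    const std::vector<double>& v,
                    double tol = 1e-10) const {
        assert(is_in(x) && v.size() == x.size());
        // Every coordinate must stay in [-1, 1]; this bounds t from above.
        double hi = std::numeric_limits<double>::infinity();
        for (std::size_t k = 0; k < x.size(); ++k) {
            if (v[k] > 0.0) hi = std::min(hi, (1.0 - x[k]) / v[k]);
            if (v[k] < 0.0) hi = std::min(hi, (-1.0 - x[k]) / v[k]);
        }
        if (!std::isfinite(hi)) return 0.0;  // zero direction vector
        double lo = 0.0;
        std::vector<double> y(x.size());
        while (hi - lo > tol) {
            const double mid = 0.5 * (lo + hi);
            for (std::size_t k = 0; k < x.size(); ++k) y[k] = x[k] + mid * v[k];
            if (is_in(y)) lo = mid; else hi = mid;
        }
        return lo;
    }

private:
    std::size_t n_;
};
```

For example, with `n = 3`, starting from the identity matrix (`x = {0, 0, 0}`) and moving along `v = {-1, -1, -1}`, the boundary is reached at `t = 0.5`, where the matrix with all off-diagonal entries equal to -0.5 becomes singular.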
Matlab prototypes for the implementations will be given to the student.
Difficulty: Medium
- Required: C++, Probability theory, Basic applied math background
- Preferred: Experience with statistical or other mathematical software is a plus
The project will be a very useful addition to the volesti package. It will contribute crucially to the implementation of efficient Bayesian models that learn the covariance matrix and fit a copula to given data.
[1] Efficient Bayesian inference of systemic risk interlinkages, V Arakelian, A Chalkis (2021).
[2] Generation of correlation matrices with a given eigenstructure, C. Chalmers (1975).
[3] Population correlation matrices for sampling experiments, Bendel, R. B. and M. R. Mickey (1978).
[4] Generating correlation matrices with specified eigenvalues using the method of alternating projections, N. G. Waller (2020).
[5] Generating correlation matrices, G. Marsaglia, and I. Olkin (1984).
[6] Efficient estimation of covariance selection models, F. Wong, C. K. Carter, and R. Kohn (2003).
[7] Efficient Bayesian inference for Gaussian copula regression models, M. Pitt, D. Chan, and R. Kohn (2006).
- Apostolos Chalkis <tolis.chal at gmail.com> is a PhD student in Computer Science. His research focuses on mathematical computing, optimization, and computational finance. He took part in GSoC 2018 and 2019 as a student under the R-project organization, implementing state-of-the-art algorithms for sampling from high-dimensional multivariate distributions. He was a GSoC mentor for three projects with GeomScale (2020). He is one of the authors of volesti.
- Veni Arakelian is an academic expert in macro-finance and financial econometrics. She holds a Ph.D. in econometrics from Athens University of Economics and Business. She is skilled in econometrics, macroeconomics, and machine learning techniques. She has more than fifteen years of experience in research, teaching, and working for the banking sector. She is the author or co-author of several publications in leading scientific journals in finance, statistics, and econometrics.
Students, please contact the first and the third mentor after completing at least one of the tests below.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: Implement the acceptance-rejection method in C++ to sample uniformly distributed correlation matrices, and create a PR to add the code to volesti.
- Medium: Compile and run volesti. Create a C++ method that generates the matrices in the Linear Matrix Inequality of the spectrahedron that represents the set of n x n correlation matrices.
- Hard: Use the generator from the Medium test and the existing C++ spectrahedron class in volesti to sample uniformly distributed correlation matrices with Billiard Walk. Report the dimension n at which the sampler becomes inefficient (e.g. takes more than 3 minutes to sample 1000 matrices).
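As a hint for the Medium test, a correlation matrix can be written as C(x) = A_0 + x_1 A_1 + ... + x_d A_d with d = n(n-1)/2, where A_0 is the identity and each A_k has ones in one symmetric pair of off-diagonal positions. The sketch below generates these matrices with plain standard-library containers. The function name `correlation_lmi_matrices` and the `sign` parameter are hypothetical; the sign convention of the spectrahedron class you target (positive or negative semidefinite) should be checked in the volesti sources, and `sign = -1.0` negates every matrix for the latter case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix type used only in this sketch.
using Matrix = std::vector<std::vector<double>>;

// Returns {A_0, A_1, ..., A_d} with d = n(n-1)/2 such that
// sign * C(x) = A_0 + x_1 A_1 + ... + x_d A_d, where C(x) has unit
// diagonal and x lists the strict lower triangle row by row.
std::vector<Matrix> correlation_lmi_matrices(std::size_t n, double sign = 1.0) {
    assert(sign == 1.0 || sign == -1.0);
    std::vector<Matrix> lmi;

    // A_0 = sign * identity fixes the unit diagonal.
    Matrix a0(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) a0[i][i] = sign;
    lmi.push_back(a0);

    // A_k = sign * (E_ij + E_ji) carries one off-diagonal coefficient.
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            Matrix ak(n, std::vector<double>(n, 0.0));
            ak[i][j] = sign;
            ak[j][i] = sign;
            lmi.push_back(ak);
        }
    }
    return lmi;
}
```

For `n = 3` this returns four matrices: the identity followed by the matrices for positions (1,0), (2,0), and (2,1), using zero-based indices.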
For tips and references contact the Mentors!
Students, please post a link to your test results here.
- EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
- STUDENT 1 AGILAN S, https://github.com/Agi7an, https://github.com/Agi7an/VolEsti/blob/main/setup.r
- STUDENT 2 Ioannis Iakovidis, https://github.com/iakoviid/, https://github.com/iakoviid/volesti/tree/gsoc2022
- STUDENT 3 HUSSAIN LOHAWALA, https://github.com/H9660/Volesti-/blob/main/setup.r
- STUDENT 4 Huu Phuoc Le, https://github.com/huuphuocle, https://github.com/huuphuocle/sampling_correlation_matrices