Skip to content

Simulated Annealing for high dimensional log concave sampling

Apostolos Chalkis edited this page Mar 23, 2022 · 4 revisions

Background

This project is about sampling from high dimensional log-concave distributions with No-U-Turn Hamiltonian Monte Carlo (HMC) Sampler. In particular, the contributor will adjust for sampling and implement the simulated annealing [1,2] in the case of a log-concave distribution restricted to a convex body. The contributor will have to implement a practical version of this scheme to address the problem of efficient sampling in practice.

Related work

Only volesti package provides an open-source implementation for sampling from multivariate distributions truncated by a polytope or a spectrahedron with HMC and NUTS HMC algorithms. It employs several integrators for the ODE that HMC step requires with the most efficient being the leapfrog integrator. However, when the log-concave distribution is extremely skinny the mixing rate of those random walks could be arbitrarily small. The current implementation can not efficiently handle those cases. This project will contribute to volesti with a new implementation of a practical simulated annealing scheme that would address these hard instances.

Details of your coding project

The contributor (a) will implement the simulated annealing [1,2] in C++ for the case of sampling, (b) will evaluate and tune the implementation, (c) will report experimental evidence on the efficiency of the implementation, and (d) will write brief documentation. The proposed programming language is C++ because the implemented code will have to be added to volesti package and GeomScale project in general. Of course, the implementation will be based on the current software of volesti, so the basic geometrical concepts (polytopes, boundary reflections, ODE solvers) are ready to be used. A Matlab prototype will be given to the student.

Difficulty: Hard

Skills

  • Required: C++
  • Preferred: Experience with statistical or other mathematical software is a plus

Expected impact

This is a high-priority project for GeomScale org. It is expected to be very helpful to GeomScale sampling software as it is about efficient sampling from log-concave or general distributions which appear in many applications. Moreover, it will be an important stepping stone towards high dimensional MCMC sampling with open source software, addressing some important applications in biology and bayesian inference.

[1] Simulated Annealing for Convex Optimization, A. Kalai, S. Vempala
[2] Fast Algorithms for Logconcave Functions: Sampling, Rounding, Integration and Optimization, L. Lovasz, S. Vempala

Mentors

Contributors, please contact both mentors below after completing at least one of the tests below.

  • Vissarion Fisikopoulos <vissarion.fisikopoulos at gmail.com> is an expert in mathematical software, computational geometry and optimization, and has previous GSOC mentoring experience with Boost C++ libraries (2016-2020) and the R-project (2017-2020).

  • Marios Papachristou < papachristoumarios at gmail.com > is a PhD student in the Computer Science Department at Cornell University. His primary research interests lie within the field of Data Science. He has previous experience in GSoC 2018 and 2020 as a student under Org. FOSS and GeomScale. He was GSoC's mentor in GSoC 2019 and 2020.

  • Elias Tsigaridas <elias.tsigaridas at inria.fr> is an expert in computational nonlinear algebra and geometry with experience in mathematical software. He has contributed to the implementation, in C and C++, of several solving algorithms for various open-source computer algebra libraries and has previous GSOC mentoring experience with the R-project (2019) and Geomscale (2020).

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Download, compile and run a simple sampling example with both C++ and R interfaces of volesti. For example, you can sample points from a 100-dimensional cube using the C++ HMC algorithm implemented in volesti for various distributions.

  • Medium: Add counters in C++ code of volesti to count the average number of reflections per HMC step. For different (a) integration time, (b) step length in ODE solver, (c) dimension, compute the PSRF/run-time and Effective Sample Size (ESS)/run-time for a large number of samples and report.

  • Hard: Implement in C++ an approximate and an exact oracle for the gradient of the Spherical Gaussian distribution with unit variance and repeat the experiments requested by the Medium test. Compare the performances of two cases and report.