GaussDCA (Cython)

Python implementation of GaussDCA using Cython. Adapted from here.

For the original paper please refer to "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" by Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt and Andrea Pagnani, (2014) PLoS ONE 9(3): e92721.

This version implements what is called the "slow fallback" in the original Julia implementation.

Installation

Runs in Python 3.6

  1. Make sure Cython and NumPy are installed and up to date: pip install --upgrade Cython numpy
  2. Compile the Cython source code: cd src; python setup.py build_ext -i; cd ..

Usage

python src/gaussdca.py [-h] [-o OUTPUT] [-s SEPARATION] [-t THREADS] alignment_file

So far, the alignment file needs to be in a3m format (with or without insertions). The output is printed to standard output, or saved to a file if -o OUTPUT is given. The sequence separation (-s) and the number of threads used for multiprocessing (-t) can also be specified.
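To illustrate what "with or without insertions" means for an a3m file, here is a minimal sketch of a reader that drops insertion states. It is not the parser used in src/gaussdca.py; the function name read_a3m and the keep_insertions parameter are hypothetical. It relies on the a3m convention that lowercase letters mark insertions relative to the query, so removing them leaves all sequences with the same number of columns.

```python
import io


def read_a3m(handle, keep_insertions=False):
    """Parse a3m-formatted text into (headers, sequences).

    Hypothetical helper for illustration only, not the repository's parser.
    """
    headers, chunks = [], []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; sequence lines that follow belong to it.
            headers.append(line[1:])
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)
    sequences = ["".join(parts) for parts in chunks]
    if not keep_insertions:
        # Lowercase residues are insertions relative to the query. Some
        # a2m-style files also pad insertions with '.', so drop those too.
        sequences = [
            "".join(c for c in seq if not c.islower() and c != ".")
            for seq in sequences
        ]
    return headers, sequences


example = ">query\nMKV-LA\n>hit1\nMKaVGLA\n>hit2\nM-VGL-\n"
headers, seqs = read_a3m(io.StringIO(example))
```

After stripping, every sequence in seqs has the same length as the query, which is what a column-wise covariance calculation needs.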

Performance

The following chart shows the elapsed runtime in minutes for a large test alignment (test/large.a3m) using 8 cores.

[Figure: performance chart]

The first three bars show the effect of using different methods to do the matrix inversion:

  • pinv: pseudoinverse from numpy.linalg (uses SVD)
  • inv: matrix inverse from numpy.linalg (uses LU decomposition)
  • inv(chol): computes the Cholesky decomposition first and then inverts the matrix
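The three inversion strategies listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in this repository: the helper inv_chol is hypothetical, the input is a synthetic symmetric positive-definite matrix standing in for a regularized covariance matrix, and the triangular system is solved with the general-purpose numpy.linalg.solve to keep the example NumPy-only.

```python
import numpy as np


def inv_chol(a):
    """Invert a symmetric positive-definite matrix via its Cholesky factor.

    Hypothetical helper for illustration only.
    """
    lower = np.linalg.cholesky(a)  # a = lower @ lower.T
    # Invert the triangular factor by solving lower @ X = I.
    # (A general solver is used here; it does not exploit triangularity.)
    lower_inv = np.linalg.solve(lower, np.eye(a.shape[0]))
    return lower_inv.T @ lower_inv  # a^-1 = lower^-T @ lower^-1


# Synthetic SPD matrix: sample covariance plus a small ridge term.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))
cov = x.T @ x / x.shape[0] + 0.1 * np.eye(8)

a_pinv = np.linalg.pinv(cov)  # SVD-based pseudoinverse
a_inv = np.linalg.inv(cov)    # general inverse
a_chol = inv_chol(cov)        # Cholesky-based inverse
```

For a well-conditioned positive-definite input all three results agree to numerical precision; they differ in cost. The Cholesky route only works when the matrix is positive definite, whereas pinv also handles singular input. With SciPy available, scipy.linalg.cho_factor and scipy.linalg.cho_solve offer a similar Cholesky-based path.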

The next bar, "inv(chol) opt", uses the same inversion as above but with some additional technical optimizations.

The last bar, "julia", shows the runtime of the original Julia implementation on 8 cores, with alignment compression.

Alignment compression has not been implemented yet.