WASCO: A Wasserstein-based statistical tool to compare conformational ensembles of intrinsically disordered proteins

Welcome to the beta version of this IDP ensemble comparison tool. The method implemented in this Jupyter notebook computes residue-specific distances between a pair of IDP conformational ensembles, together with an overall distance for the entire ensemble. The comparison is made simultaneously at two scales:

Global scale: distances between the distributions of the relative positions of all residue pairs in both ensembles. For each pair of residues, we compute the 2-Wasserstein distance between two probability distributions supported on three-dimensional Euclidean space (point clouds), one per ensemble.
Local scale: distances between the (phi, psi) angle distributions of each ensemble, for each residue along the sequence. For each residue, we compute the 2-Wasserstein distance between two probability distributions supported on the two-dimensional flat torus, one per ensemble.
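As an illustration of the two kinds of distance described above, here is a minimal sketch and not the tool's own implementation. It assumes both ensembles contribute the same number of conformations with uniform weights, in which case the 2-Wasserstein distance reduces to an optimal one-to-one matching, solved here with SciPy's `linear_sum_assignment` instead of the `ot` library listed in the dependencies. The function names `w2_from_sq_cost`, `sq_euclidean_cost` and `sq_torus_cost` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_from_sq_cost(sq_cost):
    """2-Wasserstein distance between two uniform empirical measures with the
    same number of points, given the matrix of squared ground distances."""
    rows, cols = linear_sum_assignment(sq_cost)  # optimal one-to-one matching
    return float(np.sqrt(sq_cost[rows, cols].mean()))


def sq_euclidean_cost(x, y):
    """Squared Euclidean distances between rows of x and y, both of shape (n, 3)."""
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(axis=-1)


def sq_torus_cost(a, b):
    """Squared flat-torus distances between (phi, psi) pairs in radians, shape (n, 2).
    Each angular difference is wrapped to its shortest arc."""
    diff = np.abs(a[:, None, :] - b[None, :, :]) % (2 * np.pi)
    diff = np.minimum(diff, 2 * np.pi - diff)
    return (diff ** 2).sum(axis=-1)


# Global scale (toy data): relative positions of one residue pair in two ensembles.
rng = np.random.default_rng(0)
cloud_a = rng.normal(size=(50, 3))
cloud_b = cloud_a + np.array([1.0, 0.0, 0.0])  # same cloud, shifted by a unit vector
w_global = w2_from_sq_cost(sq_euclidean_cost(cloud_a, cloud_b))

# Local scale (toy data): angles on either side of the +/- pi boundary are close.
angles_a = np.array([[np.pi - 0.1, 0.0]])
angles_b = np.array([[-np.pi + 0.1, 0.0]])
w_local = w2_from_sq_cost(sq_torus_cost(angles_a, angles_b))

print(w_global, w_local)
```

The toy data is chosen so the results can be checked by hand: translating a point cloud by a unit vector gives a distance of 1, and the two angle pairs are 0.2 rad apart once the periodicity of the torus is taken into account.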

Results are returned as a distance matrix that shows both scales: global distances in the lower triangle and local distances along the diagonal. Computations include a correction to mitigate the effect of uncertainty (if independent replicas are provided or sampled). The matrix color scales correspond to:

If no independent replicas are provided/sampled (and thus, uncertainty is ignored): the inter-ensemble distances between each pair of distributions.
If independent replicas are provided/sampled (and thus, uncertainty is considered): the proportion of the intra-ensemble distances that must be added to them to reach the observed inter-ensemble distances. In the legend, $\Delta W$ is the difference between the inter-ensemble and the intra-ensemble distances, and $W_{\mathrm{ind}}$ denotes the intra-ensemble distances. In other words, this scale expresses how much the inter-ensemble distances exceed the intra-ensemble ones (e.g. a value of 150% means that the "net" distance added on top of the uncertainty amounts to 150% of that uncertainty). This scale was chosen as the easiest to interpret, since it uses uncertainty as the reference against which inter-ensemble differences are compared.
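The uncertainty-relative scale can be sketched as follows. This is a hedged reading of the description above, taking the plotted value to be $\Delta W / W_{\mathrm{ind}}$ with $\Delta W$ equal to the inter-ensemble distance minus the intra-ensemble one. The function name `relative_excess` is hypothetical, and how the tool aggregates replicas into a single intra-ensemble value, or treats cases where the inter-ensemble distance is smaller, is not covered here.

```python
def relative_excess(w_inter, w_ind):
    """Return Delta W / W_ind, where Delta W = w_inter - w_ind.

    w_inter: inter-ensemble distance for one matrix entry.
    w_ind:   intra-ensemble distance (uncertainty) for the same entry.
    """
    if w_ind <= 0:
        raise ValueError("w_ind must be positive to serve as a reference")
    delta_w = w_inter - w_ind
    return delta_w / w_ind


# An inter-ensemble distance of 2.5 against an intra-ensemble distance of 1.0:
score = relative_excess(w_inter=2.5, w_ind=1.0)
print(f"{score:.0%}")  # prints "150%"
```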

The entry (i,j) of the matrix corresponds to the distance between the distributions of the relative positions i-j (one distribution per ensemble). It measures how much the position of residue i relative to residue j changes from one ensemble to the other. The entry (i,i) corresponds to the distance between the distributions of the i-th residue's (phi, psi) angles (one distribution per ensemble). It measures how much the (phi, psi) distribution of the i-th amino acid changes from one ensemble to the other.
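The layout just described can be illustrated with a small sketch. It is not the tool's code: the function name `assemble_matrix`, the dictionary input format and the 0-based residue indices are assumptions made for the example, and the upper triangle is simply left as NaN.

```python
import numpy as np


def assemble_matrix(global_dist, local_dist):
    """Build an n x n matrix with local distances on the diagonal and
    global distances in the lower triangle (upper triangle left as NaN).

    global_dist: dict mapping a residue pair (i, j), i != j, to its distance.
    local_dist:  sequence of n per-residue (phi, psi) distances.
    """
    n = len(local_dist)
    mat = np.full((n, n), np.nan)
    mat[np.diag_indices(n)] = local_dist
    for (i, j), dist in global_dist.items():
        if i == j:
            raise ValueError("global distances are defined for distinct residues")
        row, col = max(i, j), min(i, j)  # always store below the diagonal
        mat[row, col] = dist
    return mat


# Toy values for a three-residue chain.
local = [0.1, 0.2, 0.3]
pairwise = {(1, 0): 1.0, (2, 0): 2.0, (2, 1): 1.5}
matrix = assemble_matrix(pairwise, local)
print(matrix)
```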

To apply the comparison tool for a given pair of IDP ensembles, the user can directly execute the comparison_tool file, which contains its specific instructions and guidelines. This file calls all the other notebooks included in the same folder, which can also be used individually if desired.

Before running the tool, make sure that the Python version is set to 3.8 and that all of the following libraries are installed: numpy, os, ipynb, tqdm, joblib, functools, mdtraj, h5py, itertools, pandas, warnings, Bio, time, shutil, seaborn, matplotlib, mdanalysis, ot, scipy and faiss. We are currently working on a file that automatically performs the required installation. We apologize for any difficulties you may encounter in the meantime.
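Until an installation file is available, a quick check along the following lines may help to spot missing dependencies. This is only a sketch: the helper `missing_modules` is hypothetical, the list uses import names (so `MDAnalysis` rather than `mdanalysis`), and the standard-library modules from the list above (os, functools, itertools, warnings, time, shutil) are omitted because they ship with Python.

```python
import importlib.util
import sys

# Third-party import names taken from the list above.
REQUIRED = [
    "numpy", "ipynb", "tqdm", "joblib", "mdtraj", "h5py", "pandas", "Bio",
    "seaborn", "matplotlib", "MDAnalysis", "ot", "scipy", "faiss",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info[:2] != (3, 8):
    print("Warning: Python 3.8 is expected, found "
          f"{sys.version_info.major}.{sys.version_info.minor}")

missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```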

