Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solvation Shell #196

Merged
merged 14 commits into from
Sep 21, 2021
3 changes: 2 additions & 1 deletion doc/sphinx/source/analysis.txt
Original file line number Diff line number Diff line change
Expand Up @@ -17,4 +17,5 @@ are for users who wish to construct their own analyses.

analysis/ensemble
analysis/ensemble_analysis
analysis/dihedral
analysis/solvation
analysis/dihedral
12 changes: 12 additions & 0 deletions doc/sphinx/source/analysis/solvation.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
========================
Solvation Shell Analysis
========================

Analyzes the number of solvent molecules within given distances of the solute.

.. versionadded:: 0.8.0

.. autoclass:: mdpow.analysis.solvation.SolvationAnalysis
:members:

.. automethod:: run
85 changes: 85 additions & 0 deletions mdpow/analysis/solvation.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
# MDPOW: solvation.py
# 2021 Alia Lescoulie

from typing import List

import numpy as np
import pandas as pd

import MDAnalysis as mda
from MDAnalysis.lib.distances import capped_distance

from .ensemble import EnsembleAnalysis, Ensemble, EnsembleAtomGroup

import logging

logger = logging.getLogger('mdpow.dihedral')


class SolvationAnalysis(EnsembleAnalysis):
"""Measures the number of solvent molecules withing the given distances
in an :class:`~mdpow.analysis.ensemble.Ensemble` .

:keyword:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters

(keywords are optional)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


*solute*
An :class:`~mdpow.analysis.ensemble.EnsembleAtom` containing the solute
used to measure distance.

*solvent*
An :class:`~mdpow.analysis.ensemble.EnsembleAtom` containing the solvents
counted in by the distance measurement. Each solvent atom is counted by the
distance calculation.


*distances*
The cutoff distances around the solute measured in Angstroms.

The data is returned in a :class:`pandas.DataFrame` with observations sorted by
distance, solvent, interaction, lambda, time.

.. ruberic:: Example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


Typical Workflow::

ens = Ensemble(dirname='Mol')
solvent = ens.select_atoms('resname SOL and name OW')
solute = ens.select_atoms('not resname SOL')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it more specific, using the resname of the solute. Otherwise people might copy and paste and get wrong answers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


solv_dist = SolvationAnalysis(solute, solvent, [1.2, 2.4]).run(start=0, stop=10, step=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove start and step from the example as they are just using the defaults

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


"""
def __init__(self, solute: EnsembleAtomGroup, solvent: EnsembleAtomGroup, distances: List[float]):
self.check_groups_from_common_ensemble([solute, solvent])
super(SolvationAnalysis, self).__init__(solute.ensemble)
self._solute = solute
self._solvent = solvent
self._dists = distances

def _prepare_ensemble(self):
self._sel = 'name '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the whole selection self._sel thing as it is not needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

keys = [k for k in self._solute.keys()]
for n in self._solute[keys[0]].names:
self._sel += f' {n}'
self._col = ['distance', 'solvent', 'interaction',
'lambda', 'time', 'N_solvent']
self.results = pd.DataFrame(columns=self._col)
self._res_dict = {key: [] for key in self._col}

def _single_frame(self):
solute = self._solute[self._key]
solvent = self._solvent[self._key]
pairs, distaces = capped_distance(solute.positions, solvent.positions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distaces -> distances

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spelling of "distaces"

max(self._dists), box=self._ts.dimensions)
solute_i, solvent_j = np.transpose(pairs)
for d in range(len(self._dists)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more elegant:

for d in self._dists:
     close_solv_atoms = solvent[solvent_j[distances < d]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

close_solv_atoms = solvent[solvent_j[distaces < self._dists[d]]]
n = len(close_solv_atoms)
result = [self._dists[d], self._key[0], self._key[1],
self._key[2], self._ts.time, n]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of n you can also put close_solv_atoms.n_atoms there. You are not selecting residues so there's not an immediate step left that particularly benefits from the extra line of code. At least that's my view, but if you like the current code better, leave it. It's more a personal style issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for i in range(len(self._col)):
self._res_dict[self._col[i]].append(result[i])

def _conclude_ensemble(self):
for k in self._col:
self.results[k] = self._res_dict[k]
55 changes: 55 additions & 0 deletions mdpow/tests/test_solv_shell.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
from __future__ import absolute_import
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove for py 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


import numpy as np

from . import tempdir as td

import py.path

import pybol
import pytest

from numpy.testing import assert_almost_equal
from scipy.stats import variation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is that this is too elaborate, keep it simple and just use numpy. I had to look up what scipy.stats.variation does, which was a sign that too much is going on than necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove


from ..analysis.ensemble import Ensemble, EnsembleAnalysis, EnsembleAtomGroup

from ..analysis.solvation import SolvationAnalysis

from pkg_resources import resource_filename

RESOURCES = py.path.local(resource_filename(__name__, 'testing_resources'))
MANIFEST = RESOURCES.join("manifest.yml")


class TestSolvShell(object):
mean = 2654.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove and replace with a parametrized test (see below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

std = 2654.465059103246

def setup(self):
self.tmpdir = td.TempDir()
self.m = pybol.Manifest(str(RESOURCES / 'manifest.yml'))
self.m.assemble('example_FEP', self.tmpdir.name)
self.ens = Ensemble(dirname=self.tmpdir.name, solvents=['water'])
self.solute = self.ens.select_atoms('not resname SOL')
self.solvent = self.ens.select_atoms('resname SOL and name OW')

def teardown(self):
self.tmpdir.dissolve()

def test_dataframe(self):
solv = SolvationAnalysis(self.solute, self.solvent, [1.2]).run(start=0, stop=4, step=1)

for d in solv.results['distance']:
assert d == 1.2
for s in solv.results['solvent']:
assert s == 'water'
for i in solv.results['interaction'][:12]:
assert i == 'Coulomb'

def test_selection(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a parametrized test that checks for each distance separately. We also create a fixture for running the analysis. We make it class-scoped so that it only runs once.

@pytest.fixture(scope="class")
def solvation_analysis_list_results(self):
    return SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1).results

@pytest.mark.parametrize("d,ref_mean,ref_std", [(2, ..., ...), (10, ..., ...)])
def test_selection(self, solvation_analysis_list_results, d, ref_mean, ref_std):
     results = solvation_analysis_list_result  # the fixture provides the results, aliased to a shorter variable for convenience
     mean = ... # calculate the mean for distance d from solv.re
     std = ... # calculate std dev for distance d
     assert mean = pytest.approx(ref_mean)   # or use assert_almost_equal
     assert  std = ...

You can either use pytest.approx or the numpy assertion; I normally use the latter ones for arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks this is a way more robust method.

solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1)
mean = np.mean(solv.results['N_solvent'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the asserts for the two distances separately? The one for 2 should be very small, the one for 10 large. Having actual numbers can help with validation because then we can also apply our knowledge about the system.

std = np.std(solv.results['N_solvent'])
assert_almost_equal(mean, self.mean, 6)
assert_almost_equal(std, self.std, 6)