Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor align #1197

Merged
merged 20 commits into from
Feb 14, 2017
Merged

Refactor align #1197

merged 20 commits into from
Feb 14, 2017

Conversation

kain88-de
Copy link
Member

@kain88-de kain88-de commented Feb 4, 2017

See #1136 old PR.

I'd appreciate some help in testing the new weights options I've introduced here.

Old text

It stills needs some checking and tests. I would like go get this into 0.16.0 though. The RMSD class has been refactored this cycle and I'm currently not sure if the bug has been present as well in 0.15.0

Changes made in this Pull Request:

use arbitrary weights in align.py and rms.py
rewrite RMSF to use AnalysisBase
fix centering bug in RMSD (should be done with chosen weights)

PR Checklist

  • Tests?
  • tests for new deprecated option
  • Docs?
  • CHANGELOG updated?
  • [ ] Issue raised/referenced?
  • add deprecation warnings for mass_weighted
  • updated functions depending on new kwarg
  • fix all mass_weighted in encore. Or at least raise an issue and fix in other PR
  • RMSF deprecation warnings for run

@jbarnoud
Copy link
Contributor

The change in analysis.rms.RMSF is going to break old code where arguments were passed to the run method. The class needs a deprecation warning when it happens. One way of doing this would be to have a proxy run method that issues the deprecation warning if it receives more than self as an argument.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about making copies (old behaviour) vs changing user data in place (new) + docs.

# superposition only works if structures are centered
if center or superposition:
# make copies (do not change the user data!)
# weights=None is equivalent to all weights 1
a = a - np.average(a, axis=0, weights=weights)
b = b - np.average(b, axis=0, weights=weights)
a -= np.average(a, axis=0, weights=weights)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now changing user data – is that intentional (I don't think so)? If so, the comments above must be changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You suggested to include this change ;). But I'll change it back I actually prefer for it to not change user data.

Whether to perform mass-weighted covariance matrix estimation
(default is True).

weights : str/array_like, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to get this into 0.16.0 – otherwise we have to do deprecations for ENCORE stuff right away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I will have some time tomorrow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly still want to add tests for custom weights. My current idea is to use the masses as custom weights and compare the outputs when I chose weights='mass' that should be sufficient.

write RSMD into file file :meth:`RMSD.save`
mass_weighted : bool (optional)
mass_weighted : bool (deprecated)
do a mass-weighted RMSD fit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do a mass-weighted RMSD fit; deprecated keyword argument, use weights='mass' instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should also be some markup at the bottom such as

.. versionchanged:: 0.16.0
   Flexible weighting scheme with new ``weights`` keyword.

.. deprecated:: 0.16.0
   Instead of ``mass_weighted=True`` use new ``weights='mass'`` 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should now be added to all affected classes/functions

verbose=None, quiet=None):
"""Calculate RMSF of given atoms across a trajectory.
class RMSF(AnalysisBase):
"""Calculate RMSF of given atoms across a trajectory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave the original versionadded markup and add a new versionchanged ("Follows analysis API").

@@ -104,9 +108,9 @@ def tearDown(self):
def test_rmsd(self):
self.universe.trajectory[0] # ensure first frame
bb = self.universe.select_atoms('backbone')
first_frame = bb.positions
first_frame = bb.positions.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed because you're not making copies of user arrays anymore? (See comment above)

@orbeckst
Copy link
Member

orbeckst commented Feb 11, 2017 via email

@orbeckst
Copy link
Member

orbeckst commented Feb 11, 2017 via email

simplify weights handling
This new general weights kwarg can take an mass argument or arbitrary weights to
allow new fitting algorithms.
This also fixes a centering bug. Previous the center of mass was used
independent of the used weights. Now we center with the chosen weights. This
means results might change!
The old results have been wrong due to a centering bug in the RMSD code
@kain88-de
Copy link
Member Author

Another test would be to mask the full weights so that only CA weights remain one and all others are 0

Good one. I'll add this now and hope that our selection code doesn't have any bugs.

if mass_weighted:
weights = 'mass'

if weights == 'mass':
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw this method leaves a nasty warning right now when weights is a numpy array since the equal operator tries a elementwise operation, this is the only comparison that will happen in the future. Anyone a idea how to nicely express the logic that tests for an array_like?

selection : str, optional
use this selection
superimposition_selection : str, optional
TODO
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtiberti @wouterboomsma do you have a good idea for the doc description of this parameter. There isn't one in develop right now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added one in aad74e4 - sorry I missed it before

if len(weights) != len(ensembles):
raise ValueError("need weights for every ensemble")
else:
weights = [None for _ in range(len(ensembles))]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtiberti, @wouterboomsma is this correct to demand for hes that custom weights have to be specified for each ensemble?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so - since ensembles are independent MDAnalysis.Universe they are not technically required to have the same topology, so it would make sense to specify a weight/mass array for each of them. Thanks for these changes!

weights = self.atomgroup.masses
self.weights = weights

def run(self, start=None, stop=None, step=None, progout=None,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbarnoud This allows to use the class also in the old style and it prints a warnings.

This changes the methods in encore to the new arbitrary weights system.

- fix docs and default parameters
This makes a comparison of tests when I chose mass as weights or directly
the mass arrays from the atomgroup
@kain88-de
Copy link
Member Author

Funny the coveralls report comes in before travis is done with the full run.

warnings.warn("run arguments are deprecated. Please pass them at "
"class construction. These options will be removed in 0.17.0",
category=DeprecationWarning)
if quiet is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better here to use mda.lib.log._set_verbose. That would do the same test as this, plus it would test the case where both quiet and verbose are provided. But most importantly, it will help latter on to find all the places to change when we will remove quiet.


def run(self, start=None, stop=None, step=None, progout=None,
verbose=None, quiet=None):
if np.any([el is not None for el in (start, stop, step, progout, quiet)]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why np.any rather than just any?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just didn't know about any

#test mass_weighted
# test masses as weights
last_atoms_weight = self.universe.atoms.masses
A = self.universe.trajectory[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives Timesteps for A and B, with the side effect of moving the frame cursor.

Do you actually want to do

A = self.universe.trajectory[0].positions.copy()

???

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the timestep is always a new one I think. But the timestep has a similar API as a atomgroup so it can work if the rmsd only looks at the positions.

A = self.universe.trajectory[0]
B = self.reference.trajectory[-1]
rmsd = align.alignto(self.universe, self.reference, weights='mass')
rmsd_sup_weight = rms.rmsd(A, B, weights=last_atoms_weight,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rmsd() can take Timestep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. THe np.asarray call works on a timestep. I didn't know that!

@orbeckst
Copy link
Member

Looking good... just curious about the use of Timestep instances in the tests (see comments). Or maybe I misunderstood what's happening. But the tests pass...

@kain88-de
Copy link
Member Author

Anything else left to do?

@orbeckst
Copy link
Member

orbeckst commented Feb 13, 2017 via email

@orbeckst
Copy link
Member

orbeckst commented Feb 13, 2017 via email

@orbeckst orbeckst self-assigned this Feb 14, 2017
@orbeckst orbeckst added this to the 0.16.0 milestone Feb 14, 2017
@orbeckst orbeckst merged commit bb5d106 into develop Feb 14, 2017
@orbeckst orbeckst deleted the refactor-align branch February 14, 2017 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants