Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topology update calls #85

Merged
merged 32 commits into from
Mar 22, 2019
Merged

Conversation

ahy3nz
Copy link
Collaborator

@ahy3nz ahy3nz commented Mar 14, 2019

#84
Tests included

top.add_site(site1)
site2 = Site(name='site2', atom_type=atomtype)
top.add_site(site2)
top.update_top()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have some asserts before this call so we can be sure we're updating what we want to update

@mattwthompson
Copy link
Member

Just recording a few general comments:

  • It will probably be relevant for performance reasons to intelligently call only a minimum number of these functions, i.e. when updating an AtomType only call the functions that could be affected such as top.update_atom_types(), not everything

  • I wonder if there's actually a use case for top. update_top(). Ideally things be updated automatically as they're changed, right?

This approach of storing high-level summarizes of many low-level details will be useful for many things in the future, but now #52 #80 (similar ideas) are good examples

Copy link
Collaborator

@justinGilmer justinGilmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One or two small changes, but LGTM

@@ -29,6 +29,18 @@ def __init__(self,
self._connections = list()

def add_connection(self, other_site):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. this could get confusing since site and top have the same method name.

What would be the difference if someone used top.add_connection but the connectionType was just None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ties into #86 , but should hopefully be resolved when #98 gets merged


@property
def connection_type_functionals(self):
return [ctype.potential_function for ctype in self.connection_types]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctype is very close in name to the python standard lib ctypes

might be better to call this contype?

self.update_connection_list()
self.update_connection_types()

def update_atom_types(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for right now, but I can see this being an issue with a large system and you only change one atom type.

@ahy3nz ahy3nz added WIP work in progress - do not merge and removed ready for review labels Mar 20, 2019
@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 20, 2019

Waiting for #98 #96 , then I'll probably have to tweak this a bit

@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 21, 2019

Summary of what top.Topology knows:

  • sites (site_list) - continuously updated when we call top.add_site [EDIT: also when we call add conection]
  • connections (connection_list) - continuously updated when we call top.add_connection, but a "safetynet" is top.update_connection_list
  • bonds (bond_list) - not updated until top.update_bond_list [EDIT: now this is continuously updated when adding a connection]
  • angles (angle_list) - not updated until top.update_angle_list [EDIT: now this is continuously updated when adding a connection]

Potential instances:

  • atom types (atom_types) - not updated until top.update_atom_types is called [EDIT: now this is continuously updated when adding a site and adding a connection]
  • connection types (connection_types) - not updated until top.update_connection_types is called [EDIT: now this is continuously updated when adding a connection, but not when adding a site]
  • bond types (bond_types) - not updated until top.update_bond_types is called [EDIT: now this is contiuously updated when adding a connection, but not when adding a site]
  • angle types (angle_types) - not updated until top.update_angle_types is called [EDIT: now this is continuously updated when adding a connection, but not when adding a site]

Potential expressions:

  • atomtype expression (atom_type_expressions) - this list is not actually stored, but created whenever top.atom_type_expressions is called
  • connectiontype expression (connection_type_expressions) - this list is not actually stored, but created whenever top.connection_type_expressions is called
  • bondtype expression (bond_type_expressions) - this list is not actually stored, but created whenever top.bond_type_expressions is called
  • angletype expression (angle_type_expressions) - this list is not actually stored, but created whenever top.angle_type_expressions is called

Design concerns:
As we can see, there are Topology attributes that are either

  1. continuously updated and stored
  2. stored but updated only upon call [EDIT: now we are storing + updating this fairly frequently]
  3. not stored and only returned upon call

This seems like a parallelism issue. So I'd like to get some opinions on how we should handle all these attributes (uniformly go with method 1, 2, or 3? Or handle all these attributes differently).

Furthermore, we have some attributes in topology like top.bond_list while other things are top.bond_types and top.bond_type_expressions (they don't have "list" at the end of the variable name). So I'd like to get opinions on naming conventions too

@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 21, 2019

For example, one option is that we never store information about bonds, angles, bondtypes, angletypes. We merely store the entire connection_list, connection_types. Upon a call to top.bonds, only then will we iterate through the connection_list to find bonds, or upon a call to top.bond_types will we iterate through the connection_types to find bond_types

@colinebunner
Copy link
Collaborator

@ahy3nz Thinking about it more, I'm definitely confusing the purpose of topology with that of mBuild. I agree that validation functions in topology would be great.

ahy3nz added 4 commits March 21, 2019 17:20
…s if adding redundant objects. Re-work update functions to not 'reset and rebuild' lists, just 'append' to the lists if something new presents itself
…include update site list function (might not be necessary?)
@ahy3nz ahy3nz added ready for review and removed WIP work in progress - do not merge labels Mar 21, 2019
@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 21, 2019

I think this is ready for review. We have copious update calls. Adding sites or connections to the topology will also trigger a bunch of update calls. Tests are passing.

Still some design/syntactical discussion that might be worth having: #85 (comment)

@mattwthompson
Copy link
Member

mattwthompson commented Mar 22, 2019

Just a few scattered ideas (good luck following my thoughts):

Could we just define connection_list as the sum of bond_list, angle_list etc.? I guess the approach is to add any connection with the same function and then, after it is added, distinguish between the different kinds of connections. This seems good, I think.

It's hard to say until we get to experimenting with large systems, but I have a strong suspicion that the following are both true:

  1. A lot of things get called whenever things get changed (added/updated/idk/etc)
  2. Probably just some small changes to these functions can make it not speed-prohibitive (once we actually tinker with it)

A potential problem with automatically finding connections is that strictly speaking, many systems include tons of dihedrals even though only a small amount of them matter.

On naming conventions, I prefer top.things vs top.thing_list.

On the neighbor-searching idea: I'm leaning toward keeping it how it is right now, but I would like to be sure of it.

Conversely, "adjacent" Bond objects make no logical conclusion that there should also be an Angle.

Just to be clear, are you pointing out that this is the way it is or it's the way it should be? Because I think it's the way it should be ... maybe. It makes sense to have parts of an angle exist as bonds, but I guess there may be cases in which full force field parametrization includes a few un-parametrized parts?

@mattwthompson
Copy link
Member

That was an unfortunate misclick - not only is this a good PR but I wasn't even done writing my comment

@colinebunner
Copy link
Collaborator

@mattwthompson I completely agree. topology is just a container. The other stuff (i.e. automatically finding angles) should be handled by mBuild or some similar program. I still think the searching could be good for sanity checks, like @ahy3nz mentioned, but adding connections and bonds should be limited to just those specific functions since searching through the connection list will become prohibitively expensive as the size of the system grows.

@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 22, 2019

Summary:

  • Top knows and stores : sites, connections, bonds, angles, atomtypes, connectiontypes, bondtypes, angletypes. From a user-perspective, these attributes are updated whenever add_site, add_connection are called (these add functions internally update these top attributes)
  • Under the hood, the topology API has update functions that could be called whenever
  • Top has functionality to obtain: unique atomtypes, connectiontypes, bondtypes, angletypes
  • Top attributes are named top.sites, top.connections, top.bonds, etc. NOT top.site_list, top.connection_list
  • Tests reflect the API changes and are passing locally
  • Validation and trying to infer angles from bonds is a PR for another day

I think this is ready for review

@mattwthompson
Copy link
Member

Is there something up with angles?

>>> from topology.external.convert_parmed import from_parmed
>>> from topology.utils.io import get_fn
>>> import parmed as pmd
>>> struct = pmd.load_file(get_fn('ethane.top'), xyz=get_fn('ethane.gro'))
>>> top = from_parmed(struct)
>>> struct.angle_types
TrackedList([
	<AngleType; k=37.500, theteq=110.700>
	<AngleType; k=33.000, theteq=107.800>
])
>>> top.angle_types
[]
>>> top.angle_type_expressions
[]
>>> top.bond_type_expressions
[0.5*k*(r - r_eq)**2]
>>> top.atom_type_expressions
[4*epsilon*(-sigma**6/r**6 + sigma**12/r**12)]

Other than this, I agree with the above comment, tests are passing for me, and I think this is good to merge

@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 22, 2019

Oh that's because I didn't update the parmed converter, one moment..

@ahy3nz
Copy link
Collaborator Author

ahy3nz commented Mar 22, 2019

Should be good now. I did my due diligence for verifying units (notice pmd.angletype k uses rad, but pmd.angletype theteq uses deg). I encourage others to call me out on any of these unit/off-by-two discrepancies. #31

>>> import parmed as pmd
>>> from topology.external.convert_parmed import from_parmed
>>> struc = pmd.load_file('ethane.top', xyz='ethane.gro')
>>> top = from_parmed(struc)
>>> top.bonds
[<2-partner Connection, id 4611317600, type <BondType BondType, id 4611317152>>, <2-partner Connection, id 4611317264, type <BondType BondType, id 4611317432>>, <2-partner Connection, id 4611317208, type <BondType BondType, id 4611317656>>, <2-partner Connection, id 4611317376, type <BondType BondType, id 4611317712>>, <2-partner Connection, id 4611346560, type <BondType BondType, id 4611316928>>, <2-partner Connection, id 4611347176, type <BondType BondType, id 4611347064>>, <2-partner Connection, id 4611346616, type <BondType BondType, id 4611346504>>]
>>> top.angles
[<3-partner Connection, id 4611347232, type <AngleType AngleType, id 4611347624>>, <3-partner Connection, id 4611346840, type <AngleType AngleType, id 4611347008>>, <3-partner Connection, id 4611347400, type <AngleType AngleType, id 4611347512>>, <3-partner Connection, id 4611347120, type <AngleType AngleType, id 4611347568>>, <3-partner Connection, id 4611347792, type <AngleType AngleType, id 4611347344>>, <3-partner Connection, id 4611347680, type <AngleType AngleType, id 4611346784>>, <3-partner Connection, id 4611347848, type <AngleType AngleType, id 4611346896>>, <3-partner Connection, id 4611348016, type <AngleType AngleType, id 4611347736>>, <3-partner Connection, id 4611348184, type <AngleType AngleType, id 4611346728>>, <3-partner Connection, id 4611348352, type <AngleType AngleType, id 4611347904>>, <3-partner Connection, id 4611348520, type <AngleType AngleType, id 4611348072>>, <3-partner Connection, id 4611348688, type <AngleType AngleType, id 4611348240>>]
>>> top.sites
[<Site C, id 4608389360>, <Site H, id 4611090248>, <Site H, id 4611184848>, <Site H, id 4611186080>, <Site C, id 4611266152>, <Site H, id 4611268504>, <Site H, id 4611296504>, <Site H, id 4611308624>]
>>> top.n_sites
8
>>> top.n_bonds
7
>>> top.n_angles
12
>>> top.atom_types
[<AtomType opls_135, id 4610904976>, <AtomType opls_140, id 4611090304>]
>>> top.bond_types
[<BondType BondType, id 4611317152>, <BondType BondType, id 4611317432>]
>>> top.angle_types
[<AngleType AngleType, id 4611347624>, <AngleType AngleType, id 4611347568>]
>>> top.angle_type_expressions
[0.5*k*(theta - theta_eq)**2]
>>> top.bond_type_expressions
[0.5*k*(r - r_eq)**2]
>>> top.atom_type_expressions
[4*epsilon*(-sigma**6/r**6 + sigma**12/r**12)]

@mattwthompson
Copy link
Member

💥

@mattwthompson mattwthompson merged commit 8d1b26d into mosdef-hub:master Mar 22, 2019
@ahy3nz ahy3nz deleted the top_update_calls branch March 23, 2019 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants