Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP, ENH: add TPR position and velocity read support #4873

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

tylerjereddy
Copy link
Member

@tylerjereddy tylerjereddy commented Dec 29, 2024

  • Add support for reading positions from GMX .tpr files at tpx version 134. We've had an issue open to add this capability for almost a decade now.

Related to gh-464. This was a fair bit of work so I'd prefer to keep the scope for this specific PR under control and defer as much as possible on the TODO list below:

TODO:

  • Support reading position data from more tpx version .tpr files (I don't anticipate too much trouble with this, but it may also be helpful if we can scope to a modern version range initially) and add test cases for those older .tpr files.
  • Support reading velocity data (again, I don't anticipate much trouble here now that the hard work of finding the right binary seek position has been drafted, but if we can defer that and keep this narrowly scoped.. then great!)
  • Performance/duplication considerations -- as Richard noted in the cognate issue, the way our topology and coordinate handling code is organized, it may make sense to keep the separation here, but there should at least be opportunity to reduce code duplication if not completely avoid re-reading the topology to seek to the coordinate positions); that said, our .tpr reading performance is atrocious anyway and my attempts to fix at WIP, ENH: faster TPR topology building for large systems #4098 have stalled for more than a year, so I suggest we defer performance (and probably even duplication) considerations until after the capability and tests are cemented in

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4873.org.readthedocs.build/en/4873/

@pep8speaks
Copy link

pep8speaks commented Dec 29, 2024

Hello @tylerjereddy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 59:80: E501 line too long (89 > 79 characters)
Line 61:80: E501 line too long (93 > 79 characters)
Line 75:80: E501 line too long (81 > 79 characters)

Line 15:80: E501 line too long (119 > 79 characters)
Line 16:23: E261 at least two spaces before inline comment
Line 22:5: E124 closing bracket does not match visual indentation
Line 24:23: E261 at least two spaces before inline comment
Line 30:5: E124 closing bracket does not match visual indentation
Line 32:16: E261 at least two spaces before inline comment
Line 33:19: E241 multiple spaces after ','
Line 33:33: E241 multiple spaces after ','
Line 34:20: E241 multiple spaces after ','
Line 34:34: E241 multiple spaces after ','
Line 38:5: E124 closing bracket does not match visual indentation
Line 40:21: E261 at least two spaces before inline comment
Line 41:19: E241 multiple spaces after ','
Line 41:33: E241 multiple spaces after ','
Line 42:19: E241 multiple spaces after ','
Line 42:33: E241 multiple spaces after ','
Line 44:20: E241 multiple spaces after ','
Line 46:5: E124 closing bracket does not match visual indentation
Line 47:14: E261 at least two spaces before inline comment
Line 48:19: E241 multiple spaces after ','
Line 48:33: E241 multiple spaces after ','
Line 49:20: E241 multiple spaces after ','
Line 49:34: E241 multiple spaces after ','
Line 53:5: E124 closing bracket does not match visual indentation
Line 54:14: E261 at least two spaces before inline comment
Line 55:19: E241 multiple spaces after ','
Line 55:33: E241 multiple spaces after ','
Line 56:20: E241 multiple spaces after ','
Line 56:34: E241 multiple spaces after ','
Line 60:5: E124 closing bracket does not match visual indentation

Comment last updated at 2024-12-31 17:39:01 UTC

@@ -0,0 +1,73 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header needs expanding to full version

package/MDAnalysis/coordinates/TPR.py Outdated Show resolved Hide resolved
package/MDAnalysis/coordinates/TPR.py Show resolved Hide resolved
im_excl_grp_size = data.unpack_int()
ndo_int(data, im_excl_grp_size)
# TODO: why is this needed?
data.unpack_int()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above can probably be cleaned up a bit.. probably has some unused vars, etc.

It was produced by careful printf-ing the GMX source as you might expect. Expanding to support other tpx versions may not be too bad, although this was fairly time consuming to draft.

# api/legacy/include/gromacs/topology/topology_enums.h
# worst case scenario we hard code it based on
# tpx/GMX version?
SimulationAtomGroupType_size = 10
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On latest GMX main branch, this is the data structure in question:

enum class SimulationAtomGroupType : int
{
    TemperatureCoupling,
    EnergyOutput,
    Acceleration,
    Freeze,
    User1,
    User2,
    MassCenterVelocityRemoval,
    CompressedPositionOutput,
    OrientationRestraintsFit,
    QuantumMechanics,
    Count
};

u = mda.Universe(tpr_file)
assert_allclose(u.atoms.positions[0, ...], exp_first_atom)
assert_allclose(u.atoms.positions[-1, ...], exp_last_atom)
assert_equal(u.atoms.positions.shape, exp_shape)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have specific conventions for coordinate reader testing, but for now this is where I've started. Should be easy to expand to include velocities as well.

Cases with only positions and no velocities (etc.) may also be sensible to add, on top of older tpx files...

Copy link

codecov bot commented Dec 29, 2024

Codecov Report

Attention: Patch coverage is 77.94118% with 15 lines in your changes missing coverage. Please review.

Project coverage is 93.59%. Comparing base (5656fe4) to head (d67bccb).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
package/MDAnalysis/coordinates/TPR.py 65.11% 10 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4873      +/-   ##
===========================================
- Coverage    93.66%   93.59%   -0.08%     
===========================================
  Files          177      190      +13     
  Lines        21787    22921    +1134     
  Branches      3067     3082      +15     
===========================================
+ Hits         20406    21452    +1046     
- Misses         929     1012      +83     
- Partials       452      457       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@RMeli
Copy link
Member

RMeli commented Dec 29, 2024

@tylerjereddy since this PR is composed of two new files, and the modifications are all in one place, would you mind if I go ahead with #4849? I don't think there would be significant conflicts.

@tylerjereddy
Copy link
Member Author

ok

@RMeli
Copy link
Member

RMeli commented Dec 29, 2024

Thanks @tylerjereddy. It is now merged.

* Add support for reading positions from
GMX `.tpr` files at `tpx` version `134`.
* Expand `TPRReader()` support to include velocity handling,
and add tests/functionality for an additional tpx version (`133`).

[ci skip] [skip azp]
@tylerjereddy
Copy link
Member Author

tylerjereddy commented Dec 30, 2024

I drafted in support/testing for handling velocities as well, plus support/testing for reading positions/velocities from one more tpx version (133). I'm trying to skip the CI when I push in for the next little bit (looks like Azure still ran though..).

Need to poke around the .tpr test files to see if there are some more interesting ones as well re: non-zero velocities, etc.

@tylerjereddy tylerjereddy changed the title WIP, ENH: add TPR position read support WIP, ENH: add TPR position and velocity read support Dec 31, 2024
* Add `.tpr` position/velocity reading support back
to GMX 2023 (tpx version 129), and associated testing.

* Add a `.tpr` position/velocity reading test
case that has non-zero velocities and many more
atoms. This test case currently passes for position
retrieval but not for velocity retrieval.

[ci skip] [skip azp] [azp skip] [skip ci]
@tylerjereddy
Copy link
Member Author

tylerjereddy commented Dec 31, 2024

I was able to extend support and testing for .tpr position/velocity reads back as far as GMX 2023 (tpx version 129); GMX 2022rc1 (tpx 127) will fail testing because it needs more binary file striding version checks that I'll need to study in the C++ code (not too hard, but time consuming).

I also modernized an old TPR file we had with non-zero velocities (and a large number of atoms) for cobrotoxin via gmx convert-tpr and then added a test case for this. It passes for positions, but not for velocities yet, so that was a good intuition to check non-zero velocity/larger system case. Since that was converted to tpx 134 it will be a little harder to sort out the problem there, likely requiring side-by-side printf-vs-print for GMX vs. us on the binary striding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants