Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCD-Cython #1372

Merged
merged 101 commits into from
Jul 8, 2017
Merged

DCD-Cython #1372

merged 101 commits into from
Jul 8, 2017

Conversation

kain88-de
Copy link
Member

@kain88-de kain88-de commented Jun 1, 2017

succeeds #1155
Fixes #659

I made a clean rebase on develop here

Changes made in this Pull Request:

  • add new cython dcd reader.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@kain88-de
Copy link
Member Author

If anyone is interested how to do a rebase with a merge commit.

  1. git checkout -b new-rebase-copy old-branch
  2. git reset --hard MERGE-SHA~ add the tilde to go one commit before merge
  3. git log --online old-branch > shas
  4. use editor to get only shas of commits after merge into shas file
  5. for SHA in $(tac shas); do git cherry-pick $SHA; done
  6. the last command might produce errors.
    • fix errors
    • git commit
    • repeat from step 3 using only sha values from un-picked commits

The author stuff can be fixed with a interactive rebase before this is done.

@richardjgowers richardjgowers self-assigned this Jun 1, 2017
@kain88-de kain88-de force-pushed the dcd-cython-rebase branch from 81aff85 to daedcdf Compare June 1, 2017 15:58
@tylerjereddy
Copy link
Member

@kain88-de The test linting failures showed up after you refactored the new libdcd tests. I think those issues are probably irrelevant--should the travis / linting settings be adjusted accordingly?

@tylerjereddy
Copy link
Member

The print statements in the old dcd test code might be valid complaints though--I'm just referring to the init issues with the new tests.

@kain88-de
Copy link
Member Author

You can leave the linting tests as is. I'll fix the corresponding code.

@kain88-de
Copy link
Member Author

Argh timeseries won't work

@orbeckst
Copy link
Member

orbeckst commented Jun 1, 2017

So that's a good reason to retire the DCD-based correl facility, as in #1373 ?

@tylerjereddy
Copy link
Member

Yeah, the removal of timeseries was suggested and agreed upon by all present at the dev meeting yesterday, if I'm not mistaken. Are there DCD-specific tests that fail because timeseries can't be hooked in to the new Cython stuff?

@tylerjereddy
Copy link
Member

Checking the logs--looks like mostly core stuff that expects DCD to have timeseries, but actually a very small number of tests involved.

@kain88-de
Copy link
Member Author

The time series I'm referring to is the ability to read n frames at once. The correl stuff can go. I also think we should keep reading multiple frames at once.

self.current_frame += 1
return DCDFrame(xyz, unitcell)

def readframes(self, start=None, stop=None, step=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about the API here.

  • Should we allow to read only a subset of the trajectory with a atom_indices argument?
  • Should we have a format='fac' kwarg specifying the layout of the return array?

Keep in mind that this is a low-level function which should also be suitable for a large range of other people while being efficient in runtime and memory usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we support subset reading before the libdcd rewrite? If not, I'd probably prefer to avoid making any more special features and focus on only supporting what old DCD did, but in Python 3 as well, then when the readers are excised to their own packages they can be perfected as needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we supported subset reading.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you said the old & new tests were all passing after the hookup was done? So subset reading was tested elsewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeseries didn't pass yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did comment out some of them. Also the memory reader tests use this feature and those tests failed because of missing this feature.

@tylerjereddy
Copy link
Member

@kain88-de Fair enough, though the message might be a little clearer for the class issue, and I'm not sure there's a huge practical advantage to new style classes here in any case, but might as well use them.

Actually, I don't think this would be an issue at all if we were inheriting from the standard library unittest module (as I like to do), but I think you cut that inheritance out when refactoring. Personally, I like to have access to the standard library test functionalities / structuring on top of numpy stuff, but MDA seems to be moving away from that model for whatever reason.

@tylerjereddy
Copy link
Member

@kain88-de Is reading multiple frames at once explicitly supported by the C / Cython for DCD? I guess I didn't notice that?!

@kain88-de
Copy link
Member Author

@kain88-de Is reading multiple frames at once explicitly supported by the C / Cython for DCD? I guess I didn't notice that?!

No it is not. We had it on our old C code though.

I'm not sure there's a huge practical advantage to new style classes here in any case, but might as well use them.

AFAIK old style isn't available in python 3.

Actually, I don't think this would be an issue at all if we were inheriting from the standard library unittest module (as I like to do), but I think you cut that inheritance out when refactoring. Personally, I like to have access to the standard library test functionalities / structuring on top of numpy stuff, but MDA seems to be moving away from that model for whatever reason.

There are multiple reasons for the switch away from TestCase inheritance. The biggest is that mixes two testing frameworks and therefore test discovery isn't straight forward anymore or how one should best assert. When I ran your tests locally I also experienced a complete crash of nose after I changed a test. That went away after the refactor. A last reason is our switch to pytests. There you will write more functions and almost no more classes. I know that sounds weird but it actually will make tests easier to read. For example currently you are using inheritance have the same tests run on multiple files. In pytest you would express it like this.

@pytest.mark.parametrize('dcdfile, ref', [[DCD, DCD_REF], [DCD_NAMD, DCD_NAMD_REF]])
def test_something(dcdfile, ref):
    with DCDFile(dcdfile) as dcd:
        # test something

Here we are saying with mark.parametrize to run the test function twice with the arguments (DCD, DCD_REF) and (DCD_NAMD, DCD_NAMD_REF). I myself like this because the test is now an isolated unit and it's clear which files are tested with it. Right now I have to look up whether the testclass is inherited somewhere else.

@tylerjereddy
Copy link
Member

I've started using pytest in some projects, but I still like the old Java-inspired framework in the standard lib, and hey Guido also doesn't like pytest. I tend to agree with his view there, but yeah we have to move forward for MDAnalysis. pytest isn't going to replace the numpy testing framework, so in a sense there are still two systems involved, and if you aren't testing an array object, sometimes the plethora of assert statements available in the std lib unittest can be quite powerful. Also, setUp / tearDown is nice and explicit and there's a tendency to avoid heavy decoration of functions with fancy-looking hacks.

Re: old-style classes I can't easily tell at glance which of our Travis test suites is running Py 2 vs. 3-- presumably there's a good reason for this. Anyway, it was compatible before the refactor as long you use vanilla nosetests with no plugins, etc.

So, the ability to read multiple frames at once has to be coded in to the new C / Cython then, which could be a little bit of effort.

@kain88-de kain88-de force-pushed the dcd-cython-rebase branch from e439ffd to b9dabd2 Compare June 4, 2017 15:39
@kain88-de kain88-de mentioned this pull request Jun 4, 2017
4 tasks
@kain88-de kain88-de force-pushed the dcd-cython-rebase branch from b9dabd2 to b5b749b Compare June 4, 2017 16:11
@kain88-de
Copy link
Member Author

coverage decrease is due TimeseriesCollection not being tested with the new cython code.

@kain88-de
Copy link
Member Author

@richardjgowers if you don't like to remove the TimeSeriesCollection #(1375) for the next release how about leaving it in with a DCD legacy reader? For this I would still leave the old DCD code in MDAnalysis together with the TimeSeriesCollection but it would get a new name. DCD_legacy. In case someone wants to use the old correl function we can then issue a deprecation warning and construct a legacy reader that can do the correl. Of course this will work python 2.7 only. What do you think of the general idea to move this along for your visit to arizona?

@richardjgowers
Copy link
Member

@kain88-de if we push out 16.2 with some deprecations we can add DCD + Timeseries removal into 17.0, so should be fine?

@kain88-de
Copy link
Member Author

At what time do you want to release 16.2? Keep in mind that this PR is essential for python 3. For deprecation announcements I would prefer to announce deprecations only in major releases. They happen less often and give people more time to change their scripts. Not everyone updates their environment constantly.

@tylerjereddy
Copy link
Member

@kain88-de Is there anything here I should be working on at the moment? I've added this PR to the Python 2/3 project page. If this PR is getting close to completion, it would probably be helpful to update the Python 2/3 project page to make a sort of 'to do list.' I think there's something from @jbarnoud that we need to address? Maybe it is already up to date until more issues are discovered though.

@orbeckst orbeckst force-pushed the dcd-cython-rebase branch from 8111d71 to 42c7105 Compare July 8, 2017 09:23
@orbeckst
Copy link
Member

orbeckst commented Jul 8, 2017

I spent a little bit of time and condensed the history. I still have a copy of the old history in case you want me to restore it. I reduced the originally 179 commits to 100 commits. I diffed old and new history and they are the same (modulo some white space).

Locally, all tests related to DCD pass, now waiting for Travis.

@kain88-de kain88-de merged commit 742d7c4 into develop Jul 8, 2017
@jbarnoud
Copy link
Contributor

jbarnoud commented Jul 8, 2017

youhou! Congrats guys!

@kain88-de
Copy link
Member Author

Awesome. It only took a year to finish this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants