Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we provide all trajectories having starting timestep to be zero? #195

Open
comcon1 opened this issue Jun 20, 2024 · 13 comments
Open
Assignees

Comments

@comcon1
Copy link
Contributor

comcon1 commented Jun 20, 2024

Now when we go through the trajectory using MDAnalysis, we use TIMELEFTOUT starting from 0 time. And trajectory can have starting time 120 ns for example. And TIMELEFTOUT 50 means that it will actually skip nothing.
Is that correct behavior?
Should we check then that TIMELEFTOUT is larger than zero timestep?

I offer just checking starting time to be zero inside AddFile script. That will fix at least future trajectories.

@ohsOllila
Copy link
Member

In which code exactly is this problem?

@comcon1
Copy link
Contributor Author

comcon1 commented Jun 27, 2024

I saw that in scattering code, but I suppose it could be in many parts. That is not a problem actually, it's more like an uncertainty in the behavior.

@batukav
Copy link
Collaborator

batukav commented Jun 29, 2024

I think we can fix this issue by making use of the PREEQTIME. It looks like we're actually not using this variable.

For any trajectory, the first timestep in the analysis should be PREEQTIME + TIMELEFTOUT

In calc_FormFactors.py (actually in all calculation scripts)

EQtime=float(system['TIMELEFTOUT'])*1000

setting

EQtime=float(system['TIMELEFTOUT'] + system['PREEQTIME])*1000 should do the trick. We'll need to rerun the entire databank, though.

@comcon1
Copy link
Contributor Author

comcon1 commented Jun 29, 2024

PREEQTIME can not be the starting point of the trajectory. I can do the equilibration during 300 ns first and then make new MD run starting from zero. Then PREEQTIME I will anyway set to 300 to inform a user that I did equilibrate it for a long. There are three possibilities:

  • we ask users to always start trajectory from 0
  • we ask users to always start trajectory from PREEQTIME
  • we ignore starting time of the trajectory and calculate TIMELEFTOUT from starting trajectory time

That is the choice, I suppose.

@batukav
Copy link
Collaborator

batukav commented Jun 29, 2024 via email

@comcon1
Copy link
Contributor Author

comcon1 commented Jun 29, 2024

Ok, let's do the third one. I like it better also. You can assign it to me.

@comcon1
Copy link
Contributor Author

comcon1 commented Aug 8, 2024

I found the problem that some of systems cannot be recomputed. For example many of united-atom systems contain definition, which buildh doesn't know. Probably they were added to buildH locally but were never synchronized with buildh github. Probably, there exist other problems so I'm now quite skeptical regarding a possibility of recomputing the entire databank. Probably, I can do this issue without recomputing?

@batukav
Copy link
Collaborator

batukav commented Aug 8, 2024

Could you please post the output for some of the united-atom systems that you couldn't recompute? As a test, can you recompute the same united-atom system with the code you haven't modified?

@comcon1
Copy link
Contributor Author

comcon1 commented Aug 8, 2024

Could you please post the output for some of the united-atom systems that you couldn't recompute? As a test, can you recompute the same united-atom system with the code you haven't modified?

I answered but then I found my problem and deleted last post. I found that I accidentaly broke path to lipid_json_buildh folder which contains dictionaries locally in NMRlipids and that's why I experienced these problems. Now seems that everything is OK. So that was misreporing from my side..

@batukav
Copy link
Collaborator

batukav commented Aug 9, 2024

okay, that's relieving. so you're done with implementing the change and now just recomputing the systems with the new code, right?

@comcon1
Copy link
Contributor Author

comcon1 commented Aug 13, 2024

It's not a problem to fix this issue. It's a problem to recompute the databank. I would like to do it after I'm ready with my current code optimizing and testing. Once I'm ready to start recomputing, I will do this issue. Also, I would probably want to ensure everyone to separate the Data into a GitHub submodule to have an independent commit history for the code and the data. I had a talk about it with @ohsOllila, @fsuarezleston and @markussmiettinen when Samuli was visiting us in Bergen this summer. Note, that databank recomputing will touch huge amount of files. Now, for example, almost all functions writes JSON in more human-readable format (not in one-line), so even having the same numbers, JSONs will change. Probably it's a good time to split up the repository exactly at the point of recomputing.

Probably it's actually better for me to FIX this issue after I make tests, to cover this fix with tests and then just wait until everything will be ready for the recomputing...

@batukav
Copy link
Collaborator

batukav commented Aug 16, 2024

It makes a lot of sense to separate data and scripts. I'll create a separate issue to discuss this.

@comcon1
Copy link
Contributor Author

comcon1 commented Dec 3, 2024

Some story about magic 22000 number for gmx3. It is also related to this problem of zero time step.

Here is the list of the trajectories with GROMACS_VERSION: gromacs3

0a2/272/0a22727a0659852772b4a1193ced99c5981fb739/973cf49703e3217f44b5c18759fd8926e8f5d1f1
219/709/219709a1090be0c8ecc1fc3c6362cdf996c1c997/fdf916cac7bed9521a8d6acd71f12207aa12f6fd
52e/142/52e1425c86de88b82dc50fc6539269b4e73b3a61/cac0f8f72a05076af7c53c7a17bf05f20666c5d7
532/695/5326957d67eb0355d0eccf6aac9d9a44fd984777/48e776c91986eeb05d8d407567423c42bbc6126e
671/199/671199c998a93dcf594469ea33e3b25e73d08b9e/25bf45d5d477c8ae86de3b9c044ead9489d678f6
7ab/2b4/7ab2b45dcff6f8ff34f5f26d8d15fb94f7cf3393/1d42d856ac5622426ba19d7a73030b1a749f817a
7cb/7a3/7cb7a3281b5e79d8e0328d82cdebbcc11f1bab2a/f89b798e078bcc3f9793b20176dbe25ae42778ad
88c/02d/88c02d2e58d6c59918f230e060130570e398071e/32ba461564dfbf8212488be80ddf109e3de23c1f
a69/b1f/a69b1f89e3c9dd265a7ac3009eceb75695de582b/51b9a8cb6e4f8c4f17816bfcc025978b93276e5d
f78/e18/f78e188379aa6f6b1be9f2a8afc8bc2aac191ed0/443765a3de567fd6fe4e8666d29c63e8580e6306
f97/dff/f97dfff6e14f18c3d24bda5b8fdf9d4b4a7ec763/d9fa5dfeae8940ba2ee2faa7784bec5b2d44613a

Do you want me to open an issue about the discussion of supporting these records?

Originally posted by @comcon1 in #201 (comment)


I want to avoid using 22000 magic number and also the gromacs 3 dependency. So yes, maybe an issue can be worth.

I believe that 22000 is just a magic number that once was established for all this trajectories. We actually can't use MDAnalysis or other tools to extract the value of the first frame. What can be done, is application of gmx check on the trajectory and then parsing the output to understand what is the time of the first frame. Then using this time as a parameter for dump. It is not really complicated, but maybe it is too much work for supporting 10 trajectories.

Originally posted by @pbuslaev in #201 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants