
Nan produced by Molecular Transfer near edge of box #467

Open · GregorySchwing opened this issue Oct 6, 2022 · 7 comments

Describe the bug
When calculating the weights of the CBMC trials, the molecular transfer move sums the weights of all trials. The problem is that one of the trials has a NaN inter energy and an infinite real-space energy, producing a NaN weight that causes GOMC to crash.

I was able to prevent this from happening by adding only the finite weights to the sum:

double stepWeight = 0;
uint numFiniteLJTrials = calculateStepWeight(nLJTrials,
                                             stepWeight,
                                             ljWeights,
                                             data->ff.beta,
                                             inter,
                                             real);
uint winner = prng.PickWeighted(ljWeights, nLJTrials, stepWeight);
newMol.UpdateOverlap(overlap[winner]);
newMol.MultWeight(stepWeight / numFiniteLJTrials);

i.e., normalizedStepWeight = sumOfFiniteWeights / numberOfFiniteWeights.
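A minimal sketch of what I mean by calculateStepWeight (standalone illustration only; the parameter names mirror the call above, and the body is an assumption about what the surrounding DCSingle code provides):

// Sketch of summing only the finite CBMC trial weights; non-finite trials get
// weight 0 so they can never be picked. Parameter names mirror the call above;
// everything else is an assumption for illustration.
#include <cmath>

unsigned int calculateStepWeight(unsigned int nLJTrials, double &stepWeight,
                                 double *ljWeights, double beta,
                                 const double *inter, const double *real) {
  unsigned int numFiniteLJTrials = 0;
  stepWeight = 0.0;
  for (unsigned int t = 0; t < nLJTrials; ++t) {
    const double energy = inter[t] + real[t];
    if (std::isfinite(energy)) {
      ljWeights[t] = std::exp(-beta * energy);  // Boltzmann weight of the trial
      stepWeight += ljWeights[t];
      ++numFiniteLJTrials;
    } else {
      ljWeights[t] = 0.0;  // NaN/Inf trial contributes nothing to the sum
    }
  }
  return numFiniteLJTrials;
}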

I only did this in the DCSingle file; unfortunately, these types of summations are peppered throughout the DC files.

Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.

To Reproduce
Run the attached GCMC simulation of the water shell with TIP3 water.
bugFix_bad_CBMC_trial_Shouldnt_Kill_GOMC.zip

Expected behavior
If a trial produces an extremely poor energy, give it a weight of 0 instead of crashing GOMC.

Screenshots
Two screenshots attached (2022-10-06); the green sphere indicates the position of the growing molecule.

Input files
Build the dev version and run with gomc_production....conf

Please complete the following information:

  • OS: Ubuntu
  • Ensemble: GCMC
  • Code version: 2.75 (dev branch)


@GregorySchwing GregorySchwing self-assigned this Oct 6, 2022
@GOMC-WSU (Owner) commented Oct 6, 2022

Ok, so why are we getting infinite energy for one of the trials?

> Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.

Yes, you will have to do this. But first, we need to figure out why we are even getting an infinite energy. That should not happen. It would be better to set the weight of a very bad configuration to zero and keep it in the sum.

@GregorySchwing (Collaborator, Author) commented Oct 6, 2022

> Ok, so why are we getting infinite energy for one of the trials?
>
> > Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.
>
> Yes, you will have to do this. But first, we need to figure out why we are even getting an infinite energy. That should not happen. It would be better to set the weight of a very bad configuration to zero and keep it in the sum.

One of the randomly generated trial coordinates is identical to a system coordinate, resulting in a distSq of 0.

Then, when we calculate the LJ energy:

inline double FFParticle::CalcEn(const double distSq, const uint index) const {
  double rRat2 = sigmaSq[index] / distSq;

rRat2 becomes infinite, and the LJ expression built from it evaluates to +NaN, which CalcEn then returns.
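To make the arithmetic concrete, a standalone snippet (not GOMC code; it assumes the usual repulsion-minus-attraction form of the LJ expression):

// Standalone illustration of how distSq == 0 turns into NaN
// (assumes the usual repulsion-minus-attraction LJ form; not GOMC code).
#include <cmath>
#include <cstdio>

int main() {
  const double sigmaSq = 10.0, distSq = 0.0;
  const double rRat2 = sigmaSq / distSq;                 // +inf under IEEE 754
  const double attract = rRat2 * rRat2 * rRat2;          // (sigma/r)^6  -> +inf
  const double repulse = attract * attract;              // (sigma/r)^12 -> +inf
  const double energy = repulse - attract;               // inf - inf = NaN
  std::printf("rRat2 = %f, energy = %f\n", rRat2, energy);
  return 0;
}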

This could be prevented with a simpler change, from this:

if (forcefield.rCutSq < distSq)
  return 0.0;

to this:

if (forcefield.rCutSq < distSq)
  return 0.0;
else if (distSq == 0.0)
  return INFINITY;

The exact same coordinate, to double precision, is being generated at random, which makes me think the RNG isn't being advanced.

Since this is a restarted GOMC simulation (but not a restart-from-checkpoint simulation) and an intseed is provided, GOMC reuses the same random numbers, because both runs start from step 0.

Basically, on GOMC run 1, step 12 successfully transferred a molecule into position xyz (which is a function of the step number and the seed).

Then, on GOMC run 2, it attempted to transfer a molecule into position xyz (again a function of the step number and the seed), which is identical to the position of the molecule already there.

Hence distSq == 0, and we divide by distSq in the LJ particle inter-energy calculation.
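The seed behaviour is easy to reproduce in isolation (a sketch using std::mt19937_64 as a stand-in for GOMC's PRNG; the seed and box length are made up):

// Two engines seeded with the same intseed and advanced the same number of
// steps emit identical streams, so a restarted run that begins again at
// step 0 replays run 1's trial coordinates exactly.
#include <cassert>
#include <random>

int main() {
  std::mt19937_64 run1(42), run2(42);                      // same intseed
  std::uniform_real_distribution<double> coord(0.0, 40.0); // box edge (arbitrary)

  for (int step = 0; step < 12; ++step) {
    double x1 = coord(run1);
    double x2 = coord(run2);
    assert(x1 == x2);  // identical to full double precision
  }
  return 0;
}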

I recommend we do a find/replace of

if (forcefield.rCutSq < distSq)
  return 0.0;

with

if (forcefield.rCutSq < distSq)
  return 0.0;
else if (distSq == 0.0)
  return INFINITY;

Also

if (forcefield.rCutCoulombSq[b] < distSq)
  return 0.0;

with

if (forcefield.rCutCoulombSq[b] < distSq)
  return 0.0;
else if (distSq == 0.0)
  return INFINITY;

For IEEE floats, dividing a finite nonzero float by 0 is well defined and yields +infinity (if the value was greater than zero) or -infinity (if it was less than zero). The result of 0.0/0.0 is NaN. With integers, the behaviour is undefined.

GOMC uses the C++ exp() function, which handles INF values fine; it's only NaN values that cause GOMC to crash.
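A quick standalone check of that behavior (assuming the trial weight is exp(-beta * energy)): an infinite energy collapses cleanly to a zero weight, while a NaN energy poisons the weight and anything summed with it.

// exp() is well-behaved for infinities, but NaN propagates through sums.
#include <cmath>
#include <cstdio>

int main() {
  const double beta = 1.0;
  std::printf("%f\n", std::exp(-beta * INFINITY));   // 0.000000 -> trial weight 0
  std::printf("%f\n", std::exp(-beta * NAN));        // nan -> bad weight
  std::printf("%f\n", 1.5 + std::exp(-beta * NAN));  // still nan -> poisoned sum
  return 0;
}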

@GregorySchwing (Collaborator, Author)

Alternatively, we could print out an error informing the user that they are reusing random numbers and that their simulation will be correlated with the previous GOMC run they are restarting from.

The error message would include instructions to pass in

"InitStep lastStepPreviousRunGeneratedARestartFile"

or

"Checkpoint true checkpoint.file"

in the conf file.
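A rough sketch of where such a check could live (every name below is hypothetical; this is not the actual GOMC config reader):

// Hypothetical guard during config parsing: a restarted (non-checkpoint) run
// with a fixed intseed and InitStep == 0 would replay the previous run's
// random numbers, so refuse to start and tell the user how to fix it.
// All field and function names here are assumptions for illustration.
#include <cstdio>
#include <cstdlib>

struct SimSettings {            // hypothetical
  bool restart;                 // Restart true
  bool restartFromCheckpoint;   // Checkpoint true checkpoint.file
  bool fixedIntSeed;            // PRNG INTSEED with an explicit seed
  unsigned long long initStep;  // InitStep value (0 if not given)
};

void checkRestartSeed(const SimSettings &s) {
  if (s.restart && !s.restartFromCheckpoint && s.fixedIntSeed && s.initStep == 0) {
    std::fprintf(stderr,
        "Error: this restarted run would reuse the previous run's random numbers.\n"
        "Either set \"InitStep <last step of the previous run>\" or use\n"
        "\"Checkpoint true <checkpoint file>\" in the .conf file.\n");
    std::exit(EXIT_FAILURE);
  }
}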

@LSchwiebert (Collaborator)

I think it's a bad idea to allow someone to restart the simulation with the same random sequence/seed that was used at step 0. I can't think of a good reason for wanting to do this, so it would only be done by mistake. I think it would be better to generate an error in this case, or just handle it in the code without user intervention if possible.

In any case, it would be a good idea to handle the degenerate case of distSq == 0.0. It seems, though, that any value that results in rejecting the move would be fine, so Infinity works.

@GOMC-WSU (Owner) commented Oct 7, 2022

I agree with @LSchwiebert. Restarting with the same random number sequence as a prior simulation is bad practice, unless the goal is to debug something. But in that case, one would be using the checkpoint functionality, and expecting that behavior.

BTW, for production runs for publications, we should be starting each simulation with a unique random number seed (e.g. seeded by the system time, or by a well-defined, unique seed).

@LSchwiebert (Collaborator)

I think we should generate an error message and terminate if the user is restarting with the same random seed as described above. Alternatively, we could generate a warning if the user is running in expert mode. But we should be able to work around this for debugging purposes, such as by using a checkpoint file or specifying the step as above, so that we get consistent random numbers for testing.

We should include the information on how to fix the config file as mentioned above.

@LSchwiebert (Collaborator)

In other words, I'm fine with generating an error in all cases, even expert mode.
