
Nan produced by Molecular Transfer near edge of box #467

Open · GregorySchwing opened this issue Oct 6, 2022 · 7 comments

Describe the bug
When calculating the weights of the CBMC trials, the molecular transfer move sums the weights of all trials. The problem is that one of the trials has a NaN inter energy and an infinite real-space energy, producing a NaN weight that causes GOMC to crash.

I was able to prevent this from happening by adding only the finite weights to the sum:

double stepWeight = 0;
uint numFiniteLJTrials = calculateStepWeight(nLJTrials,
                                             stepWeight,
                                             ljWeights,
                                             data->ff.beta,
                                             inter,
                                             real);
uint winner = prng.PickWeighted(ljWeights, nLJTrials, stepWeight);
newMol.UpdateOverlap(overlap[winner]);
newMol.MultWeight(stepWeight / numFiniteLJTrials);

i.e., normalizedStepWeight = sumOfFiniteWeights / numberOfFiniteWeights.
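A minimal sketch of what I mean by calculateStepWeight (standalone illustration only; the parameter names mirror the call above, and the body is an assumption about what the surrounding DCSingle code provides):

// Sketch of summing only the finite CBMC trial weights; non-finite trials get
// weight 0 so they can never be picked. Parameter names mirror the call above;
// everything else is an assumption for illustration.
#include <cmath>

unsigned int calculateStepWeight(unsigned int nLJTrials, double &stepWeight,
                                 double *ljWeights, double beta,
                                 const double *inter, const double *real) {
  unsigned int numFiniteLJTrials = 0;
  stepWeight = 0.0;
  for (unsigned int t = 0; t < nLJTrials; ++t) {
    const double energy = inter[t] + real[t];
    if (std::isfinite(energy)) {
      ljWeights[t] = std::exp(-beta * energy);  // Boltzmann weight of the trial
      stepWeight += ljWeights[t];
      ++numFiniteLJTrials;
    } else {
      ljWeights[t] = 0.0;  // NaN/Inf trial contributes nothing to the sum
    }
  }
  return numFiniteLJTrials;
}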

I only did this in the DCSingle file; unfortunately, these types of summations are peppered throughout the DC files.

Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.

To Reproduce
Run the attached GCMC simulation of the water shell with TIP3 water.
bugFix_bad_CBMC_trial_Shouldnt_Kill_GOMC.zip

Expected behavior
If a trial produces an extremely poor energy, give it a weight of 0 instead of crashing GOMC.

Screenshots
Two screenshots attached (2022-10-06); the green sphere indicates the position of the growing molecule.

Input files
Build the dev version and run with gomc_production....conf

Please complete the following information:

  • OS: Ubuntu
  • Ensemble: GCMC
  • Code version: 2.75 (dev branch)


@GregorySchwing GregorySchwing self-assigned this Oct 6, 2022
@GOMC-WSU (Owner) commented Oct 6, 2022

Ok, so why are we getting infinite energy for one of the trials?

> Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.

Yes, you will have to do this. But first, we need to figure out why we are even getting an infinite energy. That should not happen. It would be better to set the weight of a very bad configuration to zero and keep it in the sum.

@GregorySchwing (Collaborator, Author) commented Oct 6, 2022

> Ok, so why are we getting infinite energy for one of the trials?
>
> > Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.
>
> Yes, you will have to do this. But first, we need to figure out why we are even getting an infinite energy. That should not happen. It would be better to set the weight of a very bad configuration to zero and keep it in the sum.

One of the randomly generated trial coordinates is identical to a system coordinate, resulting in a distSq of 0.

Then, when we calculate the LJ energy:

inline double FFParticle::CalcEn(const double distSq, const uint index) const {
  double rRat2 = sigmaSq[index] / distSq;

rRat2 becomes infinite, and the LJ expression built from it evaluates to +NaN, which CalcEn then returns.
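To make the arithmetic concrete, a standalone snippet (not GOMC code; it assumes the usual repulsion-minus-attraction form of the LJ expression):

// Standalone illustration of how distSq == 0 turns into NaN
// (assumes the usual repulsion-minus-attraction LJ form; not GOMC code).
#include <cmath>
#include <cstdio>

int main() {
  const double sigmaSq = 10.0, distSq = 0.0;
  const double rRat2 = sigmaSq / distSq;                 // +inf under IEEE 754
  const double attract = rRat2 * rRat2 * rRat2;          // (sigma/r)^6  -> +inf
  const double repulse = attract * attract;              // (sigma/r)^12 -> +inf
  const double energy = repulse - attract;               // inf - inf = NaN
  std::printf("rRat2 = %f, energy = %f\n", rRat2, energy);
  return 0;
}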

This could be prevented with a simpler change, from this:

if (forcefield.rCutSq < distSq)
  return 0.0;

to this:

if (forcefield.rCutSq < distSq)
  return 0.0;
else if (distSq == 0.0)
  return INFINITY;

The exact same coordinate, to double precision, is being generated at random, which makes me think the RNG isn't being advanced.

Since this is a restarted GOMC simulation (but not a restart-from-checkpoint simulation) and an intseed is provided, GOMC reuses the same random numbers, because both runs start from step 0.

Basically, on GOMC run 1, step 12 successfully transferred a molecule into position xyz (which is a function of the step number and the seed).

Then, on GOMC run 2, it attempted to transfer a molecule into position xyz (again a function of the step number and the seed), which is identical to the position of the molecule already there.

Hence distSq == 0, and we divide by distSq in the LJ particle inter-energy calculation.
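The seed behaviour is easy to reproduce in isolation (a sketch using std::mt19937_64 as a stand-in for GOMC's PRNG; the seed and box length are made up):

// Two engines seeded with the same intseed and advanced the same number of
// steps emit identical streams, so a restarted run that begins again at
// step 0 replays run 1's trial coordinates exactly.
#include <cassert>
#include <random>

int main() {
  std::mt19937_64 run1(42), run2(42);                      // same intseed
  std::uniform_real_distribution<double> coord(0.0, 40.0); // box edge (arbitrary)

  for (int step = 0; step < 12; ++step) {
    double x1 = coord(run1);
    double x2 = coord(run2);
    assert(x1 == x2);  // identical to full double precision
  }
  return 0;
}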

I recommend we do a find/replace of

if (forcefield.rCutSq < distSq)
  return 0.0;

with

if (forcefield.rCutSq < distSq)
  return 0.0;
else if (distSq == 0.0)
  return INFINITY;

Also

if (forcefield.rCutCoulombSq[b] < distSq)
  return 0.0;

with

if (forcefield.rCutCoulombSq[b] < distSq)
  return 0.0;
else if (distSq == 0.0)
  return INFINITY;

For IEEE floats, dividing a finite nonzero float by 0 is well defined and yields +infinity (if the value was greater than zero) or -infinity (if it was less than zero). The result of 0.0/0.0 is NaN. With integers, the behaviour is undefined.

GOMC uses the C++ exp() function, which handles INF values fine; it's only NaN values that cause GOMC to crash.
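A quick standalone check of that behavior (assuming the trial weight is exp(-beta * energy)): an infinite energy collapses cleanly to a zero weight, while a NaN energy poisons the weight and anything summed with it.

// exp() is well-behaved for infinities, but NaN propagates through sums.
#include <cmath>
#include <cstdio>

int main() {
  const double beta = 1.0;
  std::printf("%f\n", std::exp(-beta * INFINITY));   // 0.000000 -> trial weight 0
  std::printf("%f\n", std::exp(-beta * NAN));        // nan -> bad weight
  std::printf("%f\n", 1.5 + std::exp(-beta * NAN));  // still nan -> poisoned sum
  return 0;
}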

@GregorySchwing (Collaborator, Author)

Alternatively, we could print out an error informing the user that they are reusing random numbers and that their simulation will be correlated with the previous GOMC run they are restarting from.

The error message would include instructions to pass in

"InitStep lastStepPreviousRunGeneratedARestartFile"

or

"Checkpoint true checkpoint.file"

in the conf file.
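A rough sketch of where such a check could live (every name below is hypothetical; this is not the actual GOMC config reader):

// Hypothetical guard during config parsing: a restarted (non-checkpoint) run
// with a fixed intseed and InitStep == 0 would replay the previous run's
// random numbers, so refuse to start and tell the user how to fix it.
// All field and function names here are assumptions for illustration.
#include <cstdio>
#include <cstdlib>

struct SimSettings {            // hypothetical
  bool restart;                 // Restart true
  bool restartFromCheckpoint;   // Checkpoint true checkpoint.file
  bool fixedIntSeed;            // PRNG INTSEED with an explicit seed
  unsigned long long initStep;  // InitStep value (0 if not given)
};

void checkRestartSeed(const SimSettings &s) {
  if (s.restart && !s.restartFromCheckpoint && s.fixedIntSeed && s.initStep == 0) {
    std::fprintf(stderr,
        "Error: this restarted run would reuse the previous run's random numbers.\n"
        "Either set \"InitStep <last step of the previous run>\" or use\n"
        "\"Checkpoint true <checkpoint file>\" in the .conf file.\n");
    std::exit(EXIT_FAILURE);
  }
}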

@LSchwiebert (Collaborator)

I think it's a bad idea to allow someone to restart the simulation with the same random sequence/seed that was used at step 0. I can't think of a good reason for wanting to do this, so it would only be done by mistake. I think it would be better to generate an error in this case, or just handle it in the code without user intervention if possible.

In any case, it would be a good idea to handle the degenerate case of distSq == 0.0. It seems, though, that any value that results in rejecting the move would be fine, so Infinity works.

@GOMC-WSU (Owner) commented Oct 7, 2022

I agree with @LSchwiebert. Restarting with the same random number sequence as a prior simulation is bad practice, unless the goal is to debug something. But in that case, one would be using the checkpoint functionality, and expecting that behavior.

BTW, for production runs for publications, we should be starting each simulation with a unique random number seed (e.g. seeded by the system time, or by a well-defined, unique seed).

@LSchwiebert (Collaborator)

I think we should generate an error message and terminate if the user is restarting with the same random seed as described above. Alternatively, we could generate a warning if the user is running in expert mode. But we should be able to work around this for debugging purposes, such as by using a checkpoint file or specifying the step as above, so that we get consistent random numbers for testing.

We should include the information on how to fix the config file as mentioned above.

@LSchwiebert (Collaborator)

In other words, I'm fine with generating an error in all cases, even expert mode.
