failure to resume from chain file #53
Comments
@jeremy-baier, Do you get this error even when you set |
Hi Ken, |
Hi, Jeremy. If you're starting a new run, you should of course either say |
Thanks for the reply, Ken. |
OK. I don't think it's #54, because that had to do with not writing hot chains. So let's try to find the bug. Could you start a run, then cancel it as you said, then look at the number of rows in all your chain files (e.g., with "wc -l")? If the number of rows is not one plus a multiple of 100, let me know and we'll try to understand how that occurs. If every file does indeed have the form 100n+1, try resuming and see if it works.
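For anyone repeating this check, here is a minimal Python sketch of the row count test described above, as an alternative to "wc -l". The function names, the "chain*.txt" file pattern, and the default of 100 rows per save are assumptions made for illustration; they are not part of PTMCMCSampler.

```python
from pathlib import Path


def count_rows(path):
    """Count the lines in a text file, similar to `wc -l`."""
    with open(path) as f:
        return sum(1 for _ in f)


def has_resumable_length(n_rows, rows_per_save=100):
    """Return True if n_rows is one initial sample plus a multiple of rows_per_save."""
    return n_rows >= 1 and (n_rows - 1) % rows_per_save == 0


def check_chain_dir(directory, pattern="chain*.txt", rows_per_save=100):
    """Map each matching file name to (row count, whether it has the form 100n+1).

    The glob pattern is a guess at how chain files are named; adjust as needed.
    """
    report = {}
    for path in sorted(Path(directory).glob(pattern)):
        n = count_rows(path)
        report[path.name] = (n, has_resumable_length(n, rows_per_save))
    return report
```

Calling check_chain_dir on the output directory returns a dictionary, so any file whose row count is not of the form 100n+1 is easy to spot.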
Ok Ken, I think I have tracked down what is going on. PTMCMCSampler/PTMCMCSampler/PTMCMCSampler.py Line 475 in 9811073
I think this could be solved by checking the dimensionality of the chain when it gets loaded in. But let me know what you think makes the most sense for a fix. I am a bit surprised that other people have not run into this issue before. Does this mean that my models are really slow getting started? (I checked, and it took about 40 minutes to get to the first checkpoint in some cases.) Either way, I would be happy to help put in a PR to fix this. Let me know if my explanation is coherent and sounds right to you! Thanks, Jeremy
This also explains why I was not able to consistently replicate the error. It was only happening for the jobs that were slow getting started.
Thanks, Jeremy. Good catch! In my opinion Python is too willing to muddle the difference between different shapes of arrays with the same data. I think you can fix this by passing ndmin=2 to np.loadtxt. Please go ahead.
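A small sketch of the shape difference under discussion, using an in-memory one-row "file" rather than a real chain file (the sample values are made up). By default np.loadtxt returns a 1-D array for a single row, so the length of the first axis is the number of columns. That may be how a chain file holding only the initial sample gets reported as having as many rows as it has columns. Passing ndmin=2 keeps the result two-dimensional.

```python
import io

import numpy as np

# A stand-in for a chain file that holds only the initial sample (one row).
one_row = "0.1 0.2 0.3 0.4\n"

# Default behavior: a single row collapses to a 1-D array.
flat = np.loadtxt(io.StringIO(one_row))

# With ndmin=2 the result always has a row axis and a column axis.
table = np.loadtxt(io.StringIO(one_row), ndmin=2)

print(flat.shape)   # (4,)   -> shape[0] is the column count
print(table.shape)  # (1, 4) -> shape[0] is the row count
```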
That sounds like a good fix!
Fixed by #55
Specifically with parallel tempering, I am getting failures to start sampling (both when resuming and when starting a new job) with the following error message:
File "/home/baierj/miniconda3/envs/custom_noise/lib/python3.9/site-packages/PTMCMCSampler/PTMCMCSampler.py", line 303, in initialize
    raise Exception(
Exception: Old chain has 21 rows, which is not the initial sample plus a multiple of isave/thin = 100
I am using the most up-to-date master version of PTMCMCSampler installed from git.
Weirdly, I cannot replicate this error consistently. It just happens for some jobs but not for others.