Return incomplete results when simulation errors out #212
Just as a temporary local fix, I am adding a catch-all `except` in the DOE loop and breaking out to return the incomplete results:

```python
while k_iteration < limit:
    # Get the next recommendations and corresponding measurements
    try:
        measured = campaign.recommend(batch_size=batch_size)
    except Exception:
        # Bail out on any failure and keep whatever results were collected so far
        break
```
Hi @brandon-holt 👋🏼 One part that can be easily answered: your suggestion of providing access to partial simulation results sounds absolutely reasonable, and I think I can confidently say that we'll incorporate some appropriate mechanism into the refactored module (so far, we simply haven't had the need for it because the simulations always succeeded). The small challenge I see here is that clean handling requires more than just returning the incomplete dataframe (your workaround) or passing through the exception (current logic) because:
Also, the mechanism needs to be compatible with all simulation layers we offer (i.e., simulating a single campaign vs. simulating multiple campaigns, etc.). However, I think I already have some good ideas for how this can be accomplished. That said, I have nothing against providing a quick workaround to unblock you, as long as the changes do not cause backward-compatibility issues later. Let me draft a quick PR and see what my colleagues think about it 👍🏼 Will tag you there.
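For concreteness, here's a minimal sketch of what such a mechanism could look like at the single-campaign level. All names here are hypothetical rather than BayBE's actual simulation API; `lookup` stands in for whatever maps recommendations to measured target values:

```python
import warnings

import pandas as pd


def simulate_with_partial_results(campaign, lookup, batch_size, n_iterations):
    """Hypothetical wrapper: collect per-iteration results, stop gracefully on failure."""
    collected = []
    for k in range(n_iterations):
        try:
            recommendations = campaign.recommend(batch_size=batch_size)
        except Exception as ex:
            warnings.warn(
                f"Simulation stopped at iteration {k} due to: {ex!r}. "
                "Returning results collected so far."
            )
            break
        measured = lookup(recommendations)  # schematic lookup of target values
        campaign.add_measurements(measured)
        collected.append(measured.assign(iteration=k))
    return pd.concat(collected, ignore_index=True) if collected else pd.DataFrame()
```

The key design point is that the partial dataframe comes back through the normal code path, with the failure surfaced as a warning rather than an exception, so aggregation across multiple simulated campaigns keeps working.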
Now, the more worrisome part. So far, I haven't experienced any of the problems you describe. While I can offer to try debugging/investigating the botorch internals if we can come up with a minimal reproducing example, I would only do that as a last resort and would first see if we can get a better understanding of what's going on. So here are a few things we should consider first:
Heyo! These are good points, I will look into them and let you know what I find!!
@AdrianSosic Okay, so after some quick-and-dirty initial testing, it looks like In the meantime, I'm looking into the features in my comp_df for each parameter in my search space to see if any features are highly correlated. Attaching here if you're curious!
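In case it's useful to anyone doing the same check, here's a quick way to list highly correlated feature pairs, assuming `comp_df` is a numeric pandas DataFrame (the 0.95 threshold is an arbitrary choice):

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(comp_df: pd.DataFrame, threshold: float = 0.95) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = comp_df.corr().abs()
    # Keep only the strict upper triangle to drop self-correlations and duplicate pairs
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return upper.stack().loc[lambda s: s > threshold].sort_values(ascending=False)
```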
You mean it fails sooner if you set the attribute to
@AdrianSosic Yes, so it definitely appears that setting
This could make sense: when the model isn't allowed to pick 'repeated' measurements, it is more likely to reach the model-breaking outliers/datapoints faster. However, our hypothesis was that the 'repeated' measurements that have the same features but disparate target values were in fact the ones breaking the model.
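To test that hypothesis directly, one could isolate exactly those measurements. A sketch, assuming a DataFrame `df` with known feature columns and a single target column (all names are placeholders):

```python
import pandas as pd


def conflicting_duplicates(df: pd.DataFrame, feature_cols: list[str], target_col: str) -> pd.DataFrame:
    """Return rows whose features repeat but whose target values differ."""
    # Spread of the target within each group of identical feature values
    spread = df.groupby(feature_cols)[target_col].transform(lambda s: s.max() - s.min())
    repeated = df.duplicated(subset=feature_cols, keep=False)
    return df[repeated & (spread > 0)]
```

Refitting after dropping (or averaging) these rows would show whether they are actually what breaks the model.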
@AdrianSosic Hey, just adding a repro for you in case it helps. A heads up: running it as is will take ~200-300 GB of RAM. If that's problematic, you could bump The only concern is that by replacing my molecules' SMILES with random ones, it may change what solves the issue, but after running some initial tests on my end, the behavior looks similar (as shown in the table above).
Hi @brandon-holt, thanks for sharing. I would like to have a look but need to postpone this until next week. Currently, we are a bit swamped with open PRs and features to be merged, plus we need to release 0.9.0 ASAP, which will keep me busy for a while. Will let you know once I've had a chance to look 🙃
@AdrianSosic No worries, thanks for the heads up!
Fixes #212 by returning incomplete simulation results with a warning instead of terminating the simulation when encountering an exception.
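From the caller's perspective, the new behavior could then be checked along these lines (a sketch building on the hypothetical wrapper above; the exact warning category used by the fix is an assumption):

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = simulate_with_partial_results(campaign, lookup, batch_size=5, n_iterations=50)

if caught:
    print(f"Simulation ended early; collected {len(results)} result rows so far.")
```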
Hi! I was wondering if it would be possible to add a feature where simulations will still return the results compiled up to the point of an error?
The situation I'm running into when running on larger datasets is a botorch error to the tune of:

```
All attempts to fit the model have failed.
```
I am in the process of troubleshooting what about the dataset is causing the failure, but in the meantime it would be nice to see the results up to that point, which should include dozens of batches of experiments.
Also, if you have any experience with what might be causing an error like this, that would be helpful!
Referring to this comment in a botorch thread: pytorch/botorch#1226 (comment)
I initially wondered if this could be my issue, but baybe should prevent this from being an issue since it identifies duplicate parameter values and randomly picks one.