
Return incomplete results when simulation errors out #212

Closed
brandon-holt opened this issue Apr 18, 2024 · 10 comments · Fixed by #213
Assignees: AdrianSosic
Labels: enhancement (Expand / change existing functionality), question (Further information is requested)

Comments

@brandon-holt
Contributor

Hi! I was wondering if it would be possible to add a feature where simulations will still return the results compiled up to the point of an error?

The situation I'm running into when running on larger datasets is a botorch error to the tune of
All attempts to fit the model have failed.

I am in the process of troubleshooting what about the dataset is causing the failure, but in the meantime it would be nice to see the results up to that point, which should include dozens of batches of experiments.

Also, if you have any experience with what might be causing an error like this, that would be helpful!

Referring to this comment in a botorch thread: pytorch/botorch#1226 (comment)
I initially wondered if this could be my issue, but baybe should prevent this from being an issue since it identifies duplicate parameter values and randomly picks one.

@brandon-holt
Contributor Author

Just as a temporary local fix, I am adding a catch-all except clause in the DOE loop and breaking out to return the incomplete results.

while k_iteration < limit:
    # Get the next recommendations and corresponding measurements
    try:
        measured = campaign.recommend(batch_size=batch_size)
    except Exception:
        # Stop the loop but keep the results collected so far
        break

@AdrianSosic
Collaborator

Hi @brandon-holt 👋🏼 One part that can be easily answered: your suggestion of providing access to partial simulation results sounds absolutely reasonable, and I can confidently say that we'll incorporate some appropriate mechanism into the refactored module (so far, we simply haven't had the need for it because the simulations always succeeded). The small challenge I see here is that clean handling requires more than just returning the incomplete dataframe (your workaround) or passing through the exception (the current logic) because:

  • In the former, we lose the error message plus it's not trivial to even identify that an error occurred.
  • In the latter, we lose the data.

Also, the mechanism needs to be compatible with all simulation layers we offer (i.e., simulating a single campaign vs. simulating multiple campaigns, etc.). However, I think I already have some good ideas for how this can be accomplished.
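One way to keep both the data and the error message is to pair the partial dataframe with the exception in a small result container. This is only a minimal sketch under assumed names (`SimulationResult` and `run_simulation` are hypothetical, not BayBE's actual API):

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class SimulationResult:
    """Pairs the (possibly partial) results with the error that ended the run."""

    data: pd.DataFrame
    error: Optional[Exception] = None

    @property
    def is_complete(self) -> bool:
        # The run finished normally iff no exception was captured
        return self.error is None


def run_simulation(n_iterations: int) -> SimulationResult:
    """Collect per-iteration rows; on failure, return what was gathered so far."""
    rows = []
    try:
        for k in range(n_iterations):
            if k == 3:  # stand-in for a model-fitting failure
                raise RuntimeError("All attempts to fit the model have failed.")
            rows.append({"iteration": k, "value": 2 * k})
    except Exception as exc:
        # Keep the partial data AND the error, instead of losing one of them
        return SimulationResult(pd.DataFrame(rows), error=exc)
    return SimulationResult(pd.DataFrame(rows))
```

With such a container, the caller can both inspect the incomplete dataframe and see exactly which error terminated the run.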

That said, I've nothing against providing a quick workaround to unblock you, as long as the changes do not cause backward compatibility issues later. Let me draft a quick PR and see what my colleagues think about it 👍🏼 will tag you there.

@AdrianSosic
Collaborator

Now, the more worrisome part. So far, I haven't experienced any of the problems you describe. While I can offer to debug/investigate the botorch internals if we can come up with a minimal reproducing example, I would only do that as a last resort and first see if we can get a better understanding of what's going on.

So here are a few things we should consider first:

  • Have you already tried what has been suggested in the thread and checked if it mitigates the problem?
  • You are right that only one of the parameter duplicates is picked per simulation step, but be aware that the recommender might still suggest the same configuration in a later iteration if the predictive model favors it! (From a statistical perspective, this is indeed the correct thing to do if we assume a noisy environment.) You can explicitly forbid this by setting allow_recommending_already_measured to False, which all our "pure" recommenders support. That way, we know for certain that no duplicates can appear in the training data throughout the simulation.
  • Perhaps the issue of "being close" should be interpreted not only at the level of parameter configurations but also at the level of individual feature columns, i.e., I could imagine that numerical issues appear when columns are highly correlated or show only little variation in their values. We have an open work item that will allow activating/controlling automatic feature decorrelation, but we are not fully there yet. Still, it could help to have a look at the comp_df of your searchspace to see if there is anything suspicious ...
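The check suggested in the last point can be run directly on the searchspace's comp_df. A minimal pandas-based sketch (the function name and threshold values are illustrative, not part of BayBE):

```python
import numpy as np
import pandas as pd


def suspicious_columns(comp_df, corr_threshold=0.95, var_threshold=1e-8):
    """Flag near-constant columns and highly correlated column pairs."""
    # Columns with (almost) no variation are numerically problematic
    low_variance = [c for c in comp_df.columns if comp_df[c].var() < var_threshold]
    corr = comp_df.drop(columns=low_variance).corr().abs()
    # Keep only the upper triangle to skip self- and duplicate pairs
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    pairs = [
        (a, b, float(upper.loc[a, b]))
        for a in upper.index
        for b in upper.columns
        if pd.notna(upper.loc[a, b]) and upper.loc[a, b] > corr_threshold
    ]
    return low_variance, pairs
```

Any reported pair or near-constant column would be a candidate explanation for the fitting instability.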

@AdrianSosic AdrianSosic added enhancement Expand / change existing functionality question Further information is requested labels Apr 19, 2024
@brandon-holt
Contributor Author

Heyo! These are good points, I will look into them and let you know what I find!!

@brandon-holt
Contributor Author

@AdrianSosic Okay, so after some quick and dirty initial testing, it looks like allow_recommending_already_measured actually causes it to fail sooner, which is potentially interesting! But I am running more extensive tests to make meaningful comparisons, which should be done in a day or so, and I'll let you know what I find.

In the meantime, I'm looking into the features in my comp_df for each parameter in my search space to see if any features are highly correlated. Attaching here if you're curious!

comp_df.zip

@AdrianSosic
Collaborator

You mean it fails sooner if you set the attribute to False? That's indeed a bit surprising. Curious what's going on here... 🤔 I'll let you first finish your tests and then we can have a look 👍🏼

@brandon-holt
Contributor Author

@AdrianSosic Yes, so it definitely appears that setting allow_recommending_already_measured=False causes the model to reach a failure point sooner. In my tests with my dataset, which would take ~1000 iterations to test every datapoint, the model fails after:

Setting                                        Result
allow_recommending_already_measured=True       >200 iterations
allow_recommending_already_measured=False      >100 iterations

This could make sense because when the model isn't allowed to pick 'repeated' measurements, it is more likely to reach the model-breaking outliers/datapoints faster. However, our hypothesis was that the 'repeated' measurements that have the same features but disparate target values were in fact the ones that were breaking the model.
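The hypothesis above can be checked directly on the measurement data: group by the parameter columns and flag configurations that repeat with disagreeing target values. A minimal sketch (function and column names are hypothetical):

```python
import pandas as pd


def conflicting_duplicates(df, param_cols, target_col):
    """Return rows whose parameter configuration repeats with disagreeing targets."""
    grouped = df.groupby(param_cols)[target_col]
    counts = grouped.transform("size")  # how often each configuration occurs
    spread = grouped.transform(lambda s: s.max() - s.min())  # target disagreement
    return df[(counts > 1) & (spread > 0)]
```

If removing the returned rows makes the failures disappear, that would support the "same features, disparate targets" explanation; if not, the outlier explanation becomes more likely.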

@AdrianSosic AdrianSosic self-assigned this Apr 26, 2024
@brandon-holt
Contributor Author

brandon-holt commented Apr 26, 2024

@AdrianSosic Hey, just adding a repro for you in case it helps. Just a heads up: running it as-is will take ~200-300 GB of RAM. If that's problematic, you could bump percent_discretize up to 20-50 to reduce the amount of memory required. Just note that I think there may be a point above which the issue disappears, so keep that in mind if you go to higher discretization factors.

The only concern is that by replacing my molecules' SMILES with random ones, it may change what solves the issue, but after running some initial tests on my end, it looks like the behavior is similar to the results shown in the table above.

baybe212_repeat_measurements.zip

@AdrianSosic
Collaborator

Hi @brandon-holt, thanks for sharing. I would like to have a look but need to postpone this until next week. Currently, we are a bit swamped with open PRs and features to be merged + need to release 0.9.0 asap, which will keep me busy for a while. Will let you know once I've had a chance to look 🙃

@brandon-holt
Contributor Author

@AdrianSosic No worries, thanks for the heads up!

AdrianSosic added a commit that referenced this issue May 13, 2024
Fixes #212 by returning incomplete simulation results with a warning
instead of terminating the simulation when encountering an exception.
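The pattern described in the commit message can be sketched as follows. This is a simplified stand-in for illustration, not BayBE's actual implementation:

```python
import warnings

import pandas as pd


def simulate(n_iterations: int) -> pd.DataFrame:
    """Run a DOE loop; on error, emit a warning and return the partial results."""
    results = []
    for k in range(n_iterations):
        try:
            if k == 2:  # stand-in for a model-fitting failure
                raise RuntimeError("All attempts to fit the model have failed.")
            results.append({"iteration": k, "value": k**2})
        except Exception as exc:
            # Surface the error without discarding the data collected so far
            warnings.warn(f"Simulation interrupted at iteration {k}: {exc}")
            break
    return pd.DataFrame(results)
```

The caller receives the incomplete dataframe, while the warning preserves the error message that would previously have terminated the whole simulation.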