-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameter sweep run_model
is hard to debug
#1356
Comments
@Robbybp I don't know that there was too much rationale behind the current design; it mostly echoes designs from previous iterations but I am not sure that there has been much thought into different ways it can go wrong. Having a terminate-immediately Exception type sounds like a good idea however; usage would primarily fall upon the user but that is probably how it has to be (i.e. it is up to them how much they want to instrument their method). That being said, we probably should look at catching some of the obvious ones ourselves if possible. The To consider other alternatives, there is the opposite approach in that we fail outright on any Exception and insist that users put in try/excepts as necessary (which they would need to do in the above anyway). However, that could be just as frustrating if you have a long run and late in the cycle you run into the one Exception you did not expect. So, in short I think trying to find something in the middle might be he best approach. I.e. have an Exception users can use to indicate fail immediately (that they can use as they wish), add some more protections where we can to catch obvious issues like callback signatures with the wrong number of arguments, and otherwise just catch the Exception and log it in the results. Adding the terminate immediately Exception is easy, but we should think about what cases we have where we want to fail immeidately (or better yet, fail before executing anything). |
On re-reading the original message, we should also log any Exceptions we catch in the results for posterity. |
This would have been my suggestion, but I see that it could be frustrating to get an unhandled exception near the end of a run. Maybe we could inspect the exception in our |
We do this already, right? |
@Robbybp My question then is what exceptions would we expect a solver to raise, or perhaps more specifically can we neatly categorize things into exceptions to catch and log versus those to re-raise? My initial feeling is that it would be better to have a "white-list" of exceptions to catch and log rather than a black-list of exceptions to terminate on - it will fail more aggressively, but would generally indicate a user-error that should be addressed. We could then clearly document the exception types that will be logged for the user. Another thing however would be to think about how this might work in a parallel environment where we would ideally want a critical failure in one thread to kill all other threads as well (I think this is probably part of the reason for the liberal catch and log approach at the moment). |
I think in this case I would advocate for a
I think this is out-of-scope for our current implementation. The parallel implementation can override the default, although I think I see the advantages of the current implementation in the parallel setting. |
In
ParameterSweepBase
, we wraprun_model
in a try/except with no specified exception. This seems a bit dangerous. For errors in solvers and other "black-box" subroutines, we probably want to record them. For other errors, e.g. typos, we probably want the sweep to fail immediately. Otherwise we can run an entire parameter sweep just to see something likeValueError: too many values to unpack
for every item. The risk of odd errors increases as we write more complicatedrun_model
methods. Right now I have scaling, re-initialization, solve, and validation steps in arun_model
function that I'm using. My question is: Should we define an exception type that users must raise in order to be caught by the sweep runner? The downside is that users will have to know what types of exceptions to expect from their black-box subroutines, and will have to catch these and re-raise our exception type. I assume the current design was chosen intentionally, but am curious to hear the rationale for it.The text was updated successfully, but these errors were encountered: