Safeguards don't always stop the experiment #5
Comments
@Lawouach any ETA for that? If you don't have time, we can make the change and open a PR.
No ETA. Setting an environment variable that is seen by other processes is usually not possible, since the variable lives in the private space of the process that sets it. Unless my Unix-fu is rusty :p Perhaps a different approach would be to rely on a well-known lock file instead?
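To illustrate the point above, here is a minimal sketch. It first shows that a variable set inside a child process never reaches the parent, then shows the lock-file idea as an alternative. The file name `chaos_stop.lock` and the helper names are made up for this example; they are not part of Chaos Toolkit.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def child_env_var_visible_in_parent() -> bool:
    """Set CHAOS_STOP in a child process and report whether the parent sees it."""
    os.environ.pop("CHAOS_STOP", None)
    subprocess.run(
        [sys.executable, "-c", "import os; os.environ['CHAOS_STOP'] = '1'"],
        check=True,
    )
    # The child changed only its own copy of the environment.
    return "CHAOS_STOP" in os.environ


# Hypothetical well-known location that any process on the host can check.
STOP_FILE = Path(tempfile.gettempdir()) / "chaos_stop.lock"


def request_stop() -> None:
    """Create the marker file (e.g. when a safeguard probe fails)."""
    STOP_FILE.touch()


def stop_requested() -> bool:
    """Return True if some process has asked the experiment to stop."""
    return STOP_FILE.exists()


def clear_stop() -> None:
    """Remove the marker file so the next run starts clean."""
    try:
        STOP_FILE.unlink()
    except FileNotFoundError:
        pass
```

Unlike the environment variable, the marker file is visible to any process that can read the directory, which is what makes it usable for stopping the run from outside.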
Found two very interesting articles on that matter
Thanks. Not sure this answers the initial question. Leo's actual request (being able to stop before/after activities) is already supported, at least in the sense that you can already create a controller that does this.

But, as mentioned elsewhere, the challenge is not synchronizing threads. The challenge is interrupting blocking calls that live outside the Python VM. Say you call

Now, if you are only looking for a change in the safeguard control to say "don't exit randomly but wait for the next 'before_activity'", we can indeed do that without a lock file or env variable. But I had understood you needed the env variable to terminate the chaos process from outside, by monitoring for that var/file.

Either way, you still have to accept that if the activity is doing a long blocking call, I will not be able to do anything until it has terminated.
Please have a try with that branch and let me know if this helps. Add this flag to the safeguards arguments (next to the probes list):
Because signals are unreliable, they don't always stop the experiment. When a signal is delivered, the exception it raises can be caught and swallowed by whatever try/except block happens to be executing, so the experiment keeps running.
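A small sketch of that failure mode follows. It is a simulation only: no real signal is sent, and `StopExperiment` and the function names are invented for this example. The handler's exception is raised inside an activity that has a broad `except Exception`, and the run carries on as if nothing happened.

```python
class StopExperiment(Exception):
    """Hypothetical stand-in for the exception a signal handler would raise."""


def raise_from_handler() -> None:
    """Simulate the handler firing while an activity is executing."""
    raise StopExperiment("safeguard probe failed")


def careless_activity(interrupt: bool) -> str:
    """An activity whose broad except clause hides the interruption."""
    try:
        if interrupt:
            raise_from_handler()
        return "ok"
    except Exception:
        # The stop request is swallowed here and never reaches the runner.
        return "swallowed"


def run_experiment() -> list:
    """Run two activities; the first one is interrupted."""
    results = []
    results.append(careless_activity(interrupt=True))
    results.append(careless_activity(interrupt=False))  # still runs
    return results
```

Deriving the exception from `BaseException` would get past `except Exception`, but a bare `except:` would still swallow it, so this alone does not make the signal approach reliable.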
We would like to suggest a quick workaround.
In safeguards, we would set an environment variable "CHAOS_STOP" whenever a safeguard probe fails.
Then, in controls, we can check whether the variable is set before every activity and stop the experiment by preventing the activity from running, perhaps by calling exit gracefully again or by creating a proper way of stopping experiments from controls.
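A rough sketch of what we have in mind, assuming the safeguard and the control run in the same Python process (otherwise the variable would not be shared). `InterruptExecution` is defined locally here as a stand-in so the snippet runs on its own; in a real control it would be whatever exception Chaos Toolkit expects for interrupting a run. `mark_stop` is a hypothetical helper.

```python
import os


class InterruptExecution(Exception):
    """Local stand-in for the exception that interrupts the experiment."""


def mark_stop() -> None:
    """What the safeguard would do when one of its probes fails."""
    os.environ["CHAOS_STOP"] = "1"


def before_activity_control(context, **kwargs) -> None:
    """Refuse to start the next activity once CHAOS_STOP has been set.

    `context` is assumed to be the activity definition as a dict.
    """
    if os.environ.get("CHAOS_STOP"):
        name = context.get("name", "<unnamed>")
        raise InterruptExecution(
            "CHAOS_STOP is set, not running activity {!r}".format(name)
        )
```

The check only happens between activities, so an activity that is already running will finish before the experiment stops.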