I would start with a min interval, then double this interval on every subsequent call, up to a max interval. E.g. from 5 secs min to 60 secs max, it would go like: 5, 10, 20, 40, 60, 60, …
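A minimal sketch of what this could look like (`Backoff`, `poll_until_done`, and `check_jobs` are hypothetical names, not from the codebase; the default values just match the example above):

```python
import time


class Backoff:
    """Polling interval that starts small and doubles up to a cap."""

    def __init__(self, min_interval=5.0, max_interval=60.0, factor=2.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.factor = factor
        self._current = min_interval

    def next_interval(self):
        """Return the current interval and grow it for the next call."""
        interval = self._current
        self._current = min(self._current * self.factor, self.max_interval)
        return interval

    def reset(self):
        """Drop back to the minimum, e.g. when new jobs are submitted."""
        self._current = self.min_interval


def poll_until_done(check_jobs, backoff):
    """Call check_jobs() with growing sleep intervals until it returns True."""
    while not check_jobs():
        time.sleep(backoff.next_interval())
```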
Yes, it could make sense to share the throttling implementation with slurm_cluster_system and then just use a much higher threshold (like once a second). It's just reading a file, but the MPI cluster also does not like it if you hammer its filesystem too much.
The purpose of this is mostly to quickly detect if something is fundamentally wrong that makes all jobs fail, right? That is, is it enough to do this only once, at the beginning? I was wondering if it would make sense to reset when new jobs are submitted, but depending on the number and duration of jobs this might again lead to over-polling the system.
I'd probably start with a slightly higher value (let's say 10 s) but increase a bit more slowly, as in my experience so far it sometimes takes a bit until Slurm actually starts the job, so lots of checking in the very beginning might not be that useful.
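With a sketch like the one above, that would just be a matter of different parameters, e.g. something like `Backoff(min_interval=10.0, factor=1.5)` for a 10 s start and gentler growth (values purely illustrative).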
By Felix Widmaier on 2024-01-11T12:21:19 (imported from GitLab)
The following discussion from !71 should be addressed:
discussion 1: (+1 comment)
discussion 2: Also use it for Condor