[SARC-330] Implement alerts: proportion of GPU jobs with GPU-specific Prometheus stats on a given node lower than a threshold X #135
273ef0e to 5fdac84
5fdac84 to d47872f
    min_jobs_per_group: Optional[Union[int, Dict[str, int]]] = None,
    nb_stddev=2,
    with_gres_gpu=False,
This new parameter is not tested in test_check_prometheus_scraping_stats, so only the CPU-job case is tested.
@@ -81,24 +91,41 @@ def check_prometheus_stats_occurrences(
    clip_time = True
    df = load_job_series(start=start, end=end, clip_time=clip_time)

    # Parse minimum_runtime, and select only jobs where
    # elapsed time >= minimum runtime and allocated.gres_gpu == 0
... besides, it looks like we were ignoring GPU jobs before ^^
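The filter discussed in this thread can be sketched as follows. This is a minimal illustration, not the PR's code: `select_jobs` and the dict keys `elapsed_time` and `gres_gpu` are hypothetical stand-ins for the job DataFrame columns. Only the `with_gres_gpu` parameter name and the comparison against the minimum runtime come from the diff above. The sketch assumes that `with_gres_gpu=True` selects jobs with at least one allocated GPU and that the default keeps CPU-only jobs.

```python
from datetime import timedelta


def select_jobs(jobs, minimum_runtime, with_gres_gpu=False):
    """Keep jobs that ran long enough, split by GPU allocation.

    Hypothetical helper: `jobs` is a list of dicts with
    `elapsed_time` (seconds) and `gres_gpu` (allocated GPU count).
    """
    min_seconds = minimum_runtime.total_seconds()
    selected = []
    for job in jobs:
        # Drop jobs shorter than the minimum runtime.
        if job["elapsed_time"] < min_seconds:
            continue
        has_gpu = job["gres_gpu"] > 0
        # Assumption: with_gres_gpu=True keeps GPU jobs only,
        # with_gres_gpu=False keeps CPU-only jobs (gres_gpu == 0).
        if has_gpu == with_gres_gpu:
            selected.append(job)
    return selected


jobs = [
    {"job_id": 1, "elapsed_time": 600, "gres_gpu": 0},
    {"job_id": 2, "elapsed_time": 600, "gres_gpu": 2},
    {"job_id": 3, "elapsed_time": 30, "gres_gpu": 1},
]
cpu_jobs = select_jobs(jobs, timedelta(minutes=5))
gpu_jobs = select_jobs(jobs, timedelta(minutes=5), with_gres_gpu=True)
```

Calling the helper with both values of the flag, as in the last two lines, is also the shape a test covering the GPU case could take.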
@@ -175,3 +201,43 @@ def check_prometheus_stats_occurrences(
    logger.warning(
        f"[{cluster_name}] no Prometheus data available: no job found"
    )


def check_prometheus_stats_for_gpu_jobs(
... my mistake, I wasn't fully awake. test_check_prometheus_stats_for_gpu_jobs does test the GPU case through the call to check_prometheus_stats_for_gpu_jobs. Not really a unit test, but that's fine with me :-)
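For readers unfamiliar with the alert named in the PR title, here is a rough sketch of a per-node proportion check. It is not the implementation of check_prometheus_stats_for_gpu_jobs: the function name `nodes_below_threshold`, the keys `nodes` and `has_gpu_stats`, and the fixed `threshold` argument are all assumptions made for illustration (the real signature exposes `nb_stddev`, which suggests the threshold may be derived statistically).

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def nodes_below_threshold(jobs, threshold):
    """Return {node: proportion} for nodes whose share of GPU jobs
    with GPU-specific Prometheus stats is below `threshold`.

    Hypothetical helper: each job is a dict with `nodes` (list of
    node names) and `has_gpu_stats` (bool).
    """
    total = defaultdict(int)
    with_stats = defaultdict(int)
    for job in jobs:
        for node in job["nodes"]:
            total[node] += 1
            if job["has_gpu_stats"]:
                with_stats[node] += 1

    flagged = {}
    for node, count in total.items():
        proportion = with_stats[node] / count
        if proportion < threshold:
            logger.warning(
                "[%s] only %.0f%% of GPU jobs have GPU Prometheus stats",
                node,
                100 * proportion,
            )
            flagged[node] = proportion
    return flagged


gpu_jobs = [
    {"nodes": ["node-a"], "has_gpu_stats": True},
    {"nodes": ["node-a"], "has_gpu_stats": False},
    {"nodes": ["node-b"], "has_gpu_stats": True},
    {"nodes": ["node-b"], "has_gpu_stats": True},
]
flagged = nodes_below_threshold(gpu_jobs, threshold=0.75)
```

With this sample data, node-a has stats for one of its two jobs and is flagged, while node-b is not.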
@nurbal! Ready for review!