partial cancel: transition partial cancel final error to warning #1309
Comments
I guess it's obvious, but reducing the log level below LOG_ERR means the message won't be seen on the user's stderr. The bug that was just found and fixed would not have been seen at all, since it didn't occur in the node-exclusive configured system instance where logs are persistent. So good idea/bad idea? Hmm.
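To make the visibility concern concrete, here is a minimal sketch of a severity cutoff, assuming stderr forwarding works as a simple "at least as severe as LOG_ERR" filter. The function name `shown_on_stderr` and the threshold parameter are hypothetical and are not taken from flux-sched or flux-core; only the numeric severities are standard syslog values.

```python
# Standard syslog severity numbers: a smaller number is more severe.
LOG_ERR = 3
LOG_WARNING = 4
LOG_DEBUG = 7


def shown_on_stderr(level, stderr_threshold=LOG_ERR):
    """Return True if a message at `level` passes the stderr cutoff.

    Assumption: forwarding to the user's stderr is a plain severity
    threshold. The real forwarding logic may differ.
    """
    return level <= stderr_threshold


for name, level in [("LOG_ERR", LOG_ERR),
                    ("LOG_WARNING", LOG_WARNING),
                    ("LOG_DEBUG", LOG_DEBUG)]:
    print(name, "visible on stderr:", shown_on_stderr(level))
```

Under that assumption, demoting the message to LOG_WARNING or LOG_DEBUG leaves the persistent log as the only place it can be found.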
The PR I have out to fix issue #1284 and parts of rzadams-related issues demotes the error to a warning, since it will be a common occurrence on clusters with […]. I can't think of a way to distinguish a state inconsistency that indicates an error from one related to canceling brokerless vertices.
@jameshcorbett pointed out that even the LOG_WARNING will fill the logs on systems with rabbits or brokerless SSDs. I think this is a compelling argument for making it a LOG_DEBUG.
This might be a case for keeping the “is it an ssd” check as a stand-in for “is this because of rabbits”, wrapped just around the warning message.
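One possible reading of this suggestion, sketched under stated assumptions: pick the log level from the resource types involved in the partial cancel, so the message stays at LOG_ERR unless a brokerless type is present. The function `partial_cancel_log_level`, the `BROKERLESS_TYPES` set, and the type strings are hypothetical illustrations, not flux-sched code.

```python
# Standard syslog severity numbers: a smaller number is more severe.
LOG_ERR = 3
LOG_DEBUG = 7

# Hypothetical: resource types assumed to have no broker of their own.
BROKERLESS_TYPES = {"ssd"}


def partial_cancel_log_level(resource_types):
    """Choose a log level for the final partial-cancel inconsistency.

    Demote to LOG_DEBUG only when a brokerless type is involved,
    otherwise keep LOG_ERR so the user still sees it.
    """
    if any(rtype in BROKERLESS_TYPES for rtype in resource_types):
        return LOG_DEBUG
    return LOG_ERR


print(partial_cancel_log_level(["core", "gpu"]))  # stays at error level
print(partial_cancel_log_level(["core", "ssd"]))  # demoted to debug level
```

The trade-off is that a genuine inconsistency on a job that also touches a brokerless resource would be demoted along with the benign ones.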
I should add that, depending on the pruning filter settings, an error/warning/debug may not get logged in this case. In qmanager, that condition is met when the partial cancels didn't set […]. However, if the pruning filter is set to default ([…]. So I actually don't think the "is it an ssd" check helps disambiguate the cases.
What I could do is add a bool output parameter to […]