Question for faster execution: Seeing cpu_info add 10 secs to execution #141
Thanks for getting back to me! I'm currently trying to integrate NHC with a new workload manager. Is there a simple way to provide scripts that drain and undrain nodes?
On Sep 19, 2023, at 03:04, Michael Jennings ***@***.*** wrote:
Hey Jeb! Great to hear from you again! 😃
Not sure how I missed seeing this before... Good thing I checked the Pulse page. 😖
What version of NHC is it that you're running? For this specific check, I'd strongly recommend using the NHC 1.5 code currently in the dev branch; while 1.5 hasn't been released yet, the dev branch has a fix for this exact issue -- #121 (commit 7e2a8c6). (At least I think that's what you're seeing.)
Feedback on the fix is definitely welcome!
You might also be able to get away with just dropping in the scripts/lbnl_hw.nhc from the dev branch. I've never tried this myself, exactly, but they should be pretty self-contained. Of course, you'd also need test/test_lbnl_hw.nhc dropped in too if you wanted to run the unit tests for the new module. Feedback on this method is also welcome, if you decide to try it.
Of course, if it would make things easier on you, I'm happy to provide snapshot tarballs and/or RPMs; just let me know!
Thanks! I had found those as well. I'll ask whether the team wants to push it upstream, but I doubt they will, since the WLM was built in-house for their specific workload. I saw Frontier was released; how's the new cluster doing? And how's the team? I hope the on-call craziness has calmed down.
On Sep 20, 2023, at 00:46, Michael Jennings ***@***.*** wrote:
In the default configuration, the scripts that handle draining/offlining and undraining/onlining nodes are node-mark-offline and node-mark-online, respectively. By default, they get installed into /usr/libexec/nhc/ (or /usr/lib/nhc/ on Debian). Modifying those scripts is one option -- and if you're considering contributing your support for this other WLM to the upstream project, this would definitely be the way to go! -- since the handling of the different RM/WLM products is pretty straightforward. Another option would be to change the values of the OFFLINE_NODE and ONLINE_NODE config variables; those control what commands NHC will use to drain or resume a node.
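For the OFFLINE_NODE / ONLINE_NODE route, a replacement hook for an in-house WLM could be as small as the Python sketch below. It assumes (this is not confirmed in the thread) that NHC runs the configured command with the node's hostname as the first argument and a free-text note after it; `wlmctl` and its `--reason` flag are hypothetical stand-ins for whatever CLI the in-house WLM actually provides. The action is inferred from the name the script is invoked under, loosely mirroring the node-mark-offline / node-mark-online pair.

```python
"""Sketch of a drain/undrain hook for a custom WLM (hypothetical).

Assumptions, not taken from NHC documentation:
  * NHC runs the OFFLINE_NODE / ONLINE_NODE command with the hostname as
    the first argument and an optional note as the remaining arguments.
  * "wlmctl" and its "--reason" flag stand in for the in-house WLM's CLI.
"""
import os
import subprocess
import sys

WLM_CLI = "wlmctl"  # hypothetical in-house WLM command


def action_for(progname: str) -> str:
    """Pick drain or resume from the name the script was invoked as."""
    base = os.path.basename(progname)
    if base.endswith("offline"):
        return "drain"
    if base.endswith("online"):
        return "resume"
    raise ValueError(f"cannot infer action from {base!r}")


def build_command(action: str, hostname: str, note: str = "") -> list:
    """Return the argv list for the (hypothetical) WLM CLI."""
    cmd = [WLM_CLI, action, hostname]
    if note:
        cmd += ["--reason", note]  # hypothetical flag
    return cmd


def run(argv: list, runner=subprocess.run) -> int:
    """Entry point: argv[0] is the program name, argv[1] the hostname."""
    if len(argv) < 2:
        print(f"usage: {argv[0]} HOSTNAME [NOTE...]", file=sys.stderr)
        return 2
    action = action_for(argv[0])
    # Everything after the hostname is joined into a single note string.
    cmd = build_command(action, argv[1], " ".join(argv[2:]))
    return runner(cmd).returncode
```

To wire it up, a small entry-point file would call `sys.exit(run(sys.argv))`; installing it under two names ending in `-offline` and `-online` (for example via symlinks) and pointing OFFLINE_NODE and ONLINE_NODE at them keeps one code path for both directions. The sketch does not inspect the node's current state or existing note before acting, which a production hook probably should.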
Strangely, when I add this cpu_info check, the script takes 10 seconds longer to execute.
Am I adding this incorrectly? Also, how can I be sure NHC is running the checks in parallel for faster execution? I'm trying to minimize the time spent on health checking.
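This thread doesn't say how NHC schedules its checks, so the following is only a generic way to answer "are my checks actually overlapping?": time each check individually and compare the sum with the wall-clock time of the whole run. If the two are about equal, the checks ran one after another; if wall-clock time is well below the sum, they overlapped. The checks here are placeholder callables (sleeps), not NHC checks. Running checks in parallel also cannot make a single slow check faster: the run still takes at least as long as its slowest check, so one check that adds roughly 10 seconds on its own would still dominate.

```python
"""Tell whether a set of checks actually ran concurrently.

Generic illustration, not NHC's execution model: each "check" is a
plain callable (a sleep stands in for real work).
"""
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn):
    """Run fn and return how long it took, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_checks(checks, parallel=True):
    """Run checks; return (wall_seconds, sum_of_per_check_seconds)."""
    start = time.perf_counter()
    if parallel:
        # One worker per check so every check can start immediately.
        with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
            durations = list(pool.map(timed, checks))
    else:
        durations = [timed(fn) for fn in checks]
    wall = time.perf_counter() - start
    return wall, sum(durations)
```

With three 0.2 s checks, the parallel run's wall time lands near 0.2 s against a sum near 0.6 s, while the sequential run's wall time is at least the sum.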
Timing with the cpu_info check:
real 0m11.548s
user 0m0.246s
sys 0m10.159s
Timing without it:
real 0m0.119s
user 0m0.062s
sys 0m0.018s
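Working through the numbers above: the check adds about 11.4 seconds of wall-clock time, roughly 88% of the run with the check is sys time, and user time stays under a quarter of a second. So the growth is in time the kernel spends on the process's behalf rather than in the shell's own processing, which points toward system calls made during the check (possibly reads of /proc or /sys files; that part is a guess, not something established in this thread). The Python sketch below only makes the arithmetic explicit; the `seconds` helper is illustrative.

```python
"""Break down the bash `time` output quoted above."""
import re


def seconds(field: str) -> float:
    """Convert bash `time` output such as '0m11.548s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", field.strip())
    if not m:
        raise ValueError(f"unrecognized duration: {field!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Figures copied from the two timing runs in the issue.
with_check = {"real": "0m11.548s", "user": "0m0.246s", "sys": "0m10.159s"}
without_check = {"real": "0m0.119s", "user": "0m0.062s", "sys": "0m0.018s"}

# Extra wall-clock time attributable to the cpu_info check.
added_real = seconds(with_check["real"]) - seconds(without_check["real"])
# Fraction of the slower run spent in the kernel (sys time).
sys_share = seconds(with_check["sys"]) / seconds(with_check["real"])

print(f"cpu_info adds {added_real:.3f} s of wall-clock time")
print(f"{sys_share:.0%} of the run with the check is kernel (sys) time")
```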