powerman: support a --nowait option #68

Open
chu11 opened this issue Jan 20, 2024 · 16 comments

Comments

@chu11
Member

chu11 commented Jan 20, 2024

Per offline discussion, some power control commands can take a long time because of the underlying "wait-until-on" and "wait-until-off" implementations (such as in ipmipower and redfishpower). In one case an "on" operation with redfishpower took 46 seconds. This can make using powerman very annoying to borderline intolerable.

Implementing a --nowait option could make using powerman much less annoying. Users of this option have to accept that things like:

$ pm --off foo1
$ pm -q foo1
on: foo1
off:

will happen b/c powerman --nowait returned before it could be verified that the off completed.

This could be mitigated by changing the default "command completed successfully" output to "command issued, unconfirmed completed" or something.
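
A rough sketch of the trade-off described above, assuming a generic "issue, then poll until off" loop. The names `power_off`, `FakeNode`, and the timeout message are hypothetical and are not taken from powerman, ipmipower, or redfishpower; the fake node just reports "on" for a few queries after an off request.

```python
import time
from typing import Callable, Optional


class FakeNode:
    """Hypothetical node that keeps reporting 'on' for a few queries after off()."""

    def __init__(self, polls_until_off: int = 3) -> None:
        self.state = "on"
        self._polls = polls_until_off
        self._remaining: Optional[int] = None

    def off(self) -> None:
        # Start the (slow) transition; the state does not change yet.
        self._remaining = self._polls

    def query(self) -> str:
        if self._remaining is not None:
            if self._remaining == 0:
                self.state = "off"
                self._remaining = None
            else:
                self._remaining -= 1
        return self.state


def power_off(issue_off: Callable[[], None],
              query_state: Callable[[], str],
              nowait: bool = False,
              timeout: float = 60.0,
              poll_interval: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Issue an off request; unless nowait is set, poll until the node is off."""
    issue_off()
    if nowait:
        # Return before completion can be verified.
        return "command issued, unconfirmed completed"
    waited = 0.0
    while waited < timeout:
        if query_state() == "off":
            return "command completed successfully"
        sleep(poll_interval)
        waited += poll_interval
    return "command timed out"  # hypothetical message


node = FakeNode()
print(power_off(node.off, node.query, nowait=True))
print(node.query())  # still "on": the off has not completed yet
```

With `nowait=True` the call returns at once and a follow-up query still shows the node as on, which is the behavior users of the option would have to accept.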

@garlick
Member

garlick commented Jan 20, 2024

Let's see if the sys admins really want this before proceeding. Those times do seem pretty annoying but I'm not sure just getting the prompt back solves anything.

@chu11
Member Author

chu11 commented Jan 22, 2024

food for thought,

there are redfish power states called "poweringOff" and "poweringOn", which are exactly what you think they are.

We could consider making those equivalent to "off" and "on" for "wait-until" logic ...

BUT ...

if a node is in state (let's say) "poweringOff", it may not allow you to do a "power on" operation until it's finally off. So I'm not sure if it's a win ... especially since "pm --on" can only respond with "command completed successfully". It would always give the appearance that "pm --on" is broken.

random idea, could powerman have another set of plugstates? i.e.

> pm -q
on: ...
off: ...
powering on: ...
powering down: ...
unknown: ...

????
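
A minimal sketch of how such extra plug states might be grouped for display, assuming the helper reports the Redfish-style values "On", "Off", "PoweringOn", and "PoweringOff" per node. The functions `classify` and `format_query` are hypothetical, and nodes are joined with commas rather than powerman's real hostlist formatting.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical mapping from Redfish-style power states to display labels.
STATE_LABELS = {
    "on": "on",
    "off": "off",
    "poweringon": "powering on",
    "poweringoff": "powering down",
}

DISPLAY_ORDER = ["on", "off", "powering on", "powering down", "unknown"]


def classify(redfish_state: str) -> str:
    """Map a reported state to a label; anything unrecognized is 'unknown'."""
    return STATE_LABELS.get(redfish_state.lower(), "unknown")


def format_query(states: Dict[str, str]) -> str:
    """Group nodes by label and render one line per label, pm -q style."""
    groups = defaultdict(list)
    for node, state in states.items():
        groups[classify(state)].append(node)
    lines = []
    for label in DISPLAY_ORDER:
        nodes = ",".join(sorted(groups[label]))
        lines.append(f"{label}: {nodes}".rstrip())
    return "\n".join(lines)


print(format_query({"foo1": "On", "foo2": "PoweringOff", "foo3": "Paused"}))
```

Any state outside the four mapped values, such as "Paused" here, falls through to "unknown".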

@garlick
Member

garlick commented Jan 23, 2024

Hmm, interesting idea. Wait, what happens if you query powerman while the "device" is busy carrying out an on or off command? Does it fail immediately with "device busy" or similar?

If that's true then adding pm --nowait (as I was proposing anyway) would leave the user unable to manually poll for status, and moreover it ties up the power control system for the cluster if there's only one "device" (instance of redfishpower) configured.

@chu11
Member Author

chu11 commented Jan 23, 2024

> Hmm, interesting idea. Wait, what happens if you query powerman while the "device" is busy carrying out an on or off command? Does it fail immediately with "device busy" or similar?

Ahh good question. I assume you're referring to the hypothetical:

$ pm --off --nowait foonode (this normally takes very long)
$ pm -q

right now redfishpower would return foonode: unknown b/c it's not in the "off" or "on" state. And pm -q would output that accordingly.

> If that's true then adding pm --nowait (as I was proposing anyway) would leave the user unable to manually poll for status, and moreover it ties up the power control system for the cluster if there's only one "device" (instance of redfishpower) configured.

Yeah, so --nowait is more just b/c somebody has to do a lot of random ons and/or offs and doesn't want to wait for each one. The benefit is a little smaller than originally envisioned. (Edit): discounting perhaps the idea I proposed in #78

@chu11
Member Author

chu11 commented Jan 23, 2024

brainstorming with @adambertsch, --nowait would be useful, b/c there are times you just don't want to sit and wait.

@garlick
Member

garlick commented Jan 23, 2024

> I assume you're referring to the hypothetical:

Not really hypothetical since you can be powering off some nodes while I'm querying.

> right now redfishpower would return foonode: unknown b/c it's not in the "off" or "on" state. And pm -q would output that accordingly.

Did you try it? Because if a power off is still in progress on that device and there is only one communication channel between powermand and redfishpower, then powerman's expect engine is still blocked waiting for a response from redfishpower and won't send redfishpower the query command. What does it do then?

@chu11
Member Author

chu11 commented Jan 23, 2024

> Did you try it? Because if a power off is still in progress on that device and there is only one communication channel between powermand and redfishpower, then powerman's expect engine is still blocked waiting for a response from redfishpower and won't send redfishpower the query command. What does it do then?

Ohh I tried that too, but figured that wasn't the case you're referring to :-)

You are correct, pm -q will hang until the pm --on has finished waiting.
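
The hang can be pictured with a toy model of a single serialized channel, assuming each command occupies the channel until its response arrives. `SerialChannel` and its virtual clock are hypothetical and only illustrate the queuing effect, not powermand's actual expect engine.

```python
class SerialChannel:
    """Toy model: one outstanding command at a time on a virtual clock."""

    def __init__(self) -> None:
        self.busy_until = 0.0

    def submit(self, now: float, duration: float) -> float:
        """Return the virtual time at which a command submitted at `now` completes."""
        # A new command cannot start until the previous one has finished.
        start = max(now, self.busy_until)
        self.busy_until = start + duration
        return self.busy_until


channel = SerialChannel()
on_done = channel.submit(now=0.0, duration=46.0)    # slow "on" with wait-until-on
query_done = channel.submit(now=1.0, duration=0.5)  # query sent one second later
print(on_done, query_done)  # 46.0 46.5
```

The query itself is quick, but it completes only after the 46 second "on" releases the channel.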

@garlick
Member

garlick commented Jan 23, 2024

So the busy state isn't going to be too helpful while this is how things work.

@chu11
Member Author

chu11 commented Jan 23, 2024

> So the busy state isn't going to be too helpful while this is how things work.

Ahhh got it. I guess redfishpower would also need an on-nowait command or something. And that probably also means we need an on_ranged_nowait scriptlet in the device file. Or redfishpower's on would have to take an option.

@garlick
Member

garlick commented Jan 23, 2024

But powerman needs to know when the command is done...

@chu11
Member Author

chu11 commented Jan 23, 2024

> But powerman needs to know when the command is done...

Perhaps there's an internal thing I don't get.

redfishpower> on foo1
<wait 30 seconds for on to truly be done>
foo1: ok

vs

redfishpower> on-nowait foo1
<basically return immediately>
foo1: ok

why would powerman need to know that the "on" was truly done? As far as it is concerned, the command was issued?

@garlick
Member

garlick commented Jan 23, 2024

Because we need the powerman client to not return until the command is done. So powermand needs to know so it can end the client session.

@chu11
Member Author

chu11 commented Jan 23, 2024

Why can't powermand end the client session after the hypothetical "on-nowait" command is issued? I guess in my mind:

$ pm --off --nowait foo1 // or hypothetically a new power command called "pm --off-nowait foo1"
<returns immediately>

$ pm -q foo1
busy: foo1

$ pm --on foo1
foo1: some error message

<wait a bit>

$ pm --on foo1
command successful
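
The session above could be modeled roughly as follows, assuming a plug that reports "busy" while a transition is pending and rejects new requests until it settles. The `Plug` class, its 30 second transition time, and its messages are hypothetical; unlike the last step of the transcript, every request here returns immediately.

```python
from typing import Optional


class Plug:
    """Toy plug with a pending-transition ('busy') state on a virtual clock."""

    def __init__(self, transition_time: float = 30.0) -> None:
        self.state = "on"
        self.target: Optional[str] = None
        self.done_at = 0.0
        self.transition_time = transition_time

    def _settle(self, now: float) -> None:
        # Complete a pending transition once enough virtual time has passed.
        if self.target is not None and now >= self.done_at:
            self.state = self.target
            self.target = None

    def request(self, target: str, now: float) -> str:
        """Issue a nowait-style on/off request; refuse it while busy."""
        self._settle(now)
        if self.target is not None:
            return "error: transition in progress"
        self.target = target
        self.done_at = now + self.transition_time
        return "command issued"

    def query(self, now: float) -> str:
        self._settle(now)
        return "busy" if self.target is not None else self.state


foo1 = Plug()
print(foo1.request("off", now=0.0))  # returns immediately
print(foo1.query(now=1.0))           # busy
print(foo1.request("on", now=2.0))   # rejected while the off is pending
print(foo1.request("on", now=31.0))  # accepted once the off has settled
```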

@garlick
Member

garlick commented Jan 24, 2024

I was thinking nothing really changes with --nowait except the client doesn't wait around.

You're taking it a bit further. So basically a new set of "nowait" on/off/cycle actions would be added to the dev script, without removing the old ones?

@chu11
Member Author

chu11 commented Jan 24, 2024

> You're taking it a bit further. So basically a new set of "nowait" on/off/cycle actions would be added to the dev script, without removing the old ones?

Yeah, but this is mostly b/c of the inherent problem you spoke of before (powermand needs the client session to end). Not sure if there are alternate solutions.

Dunno if it'd be possible to pass a "nowait" option to "on" within the device scriptlet?

@garlick
Member

garlick commented Jan 24, 2024

It seems like we'll incur a lot of technical debt if we try to cram that into the current powerman device design. Perhaps that needs a rethink.
