Reboot efi systems make them unusable. #3549

Open
skycastlelily opened this issue Feb 25, 2025 · 13 comments

@skycastlelily
Collaborator

1. Reserve an EFI system.
2. Run tmt run -l reboot (optionally with --hard).
3. The system fails to boot into the installed OS afterwards.

We need to use rhts-reboot instead of reboot for the soft reboot:
https://beaker-project.org/docs/architecture-guide/provisioning-process.html#boot-order
In the hard-reboot scenario, bkr system-power --action reboot is called, which makes the EFI system
boot from the network instead of the Anaconda-created entry, just as in the soft-reboot scenario mentioned above.
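
A minimal sketch of the soft-reboot change requested in this report, not tmt's actual implementation: prefer rhts-reboot when the guest provides it and fall back to plain reboot otherwise. The function name, the `available` parameter and the preference tuple are hypothetical and exist only for illustration.

```python
from typing import Iterable

# Preference order is an assumption based on this report: rhts-reboot keeps
# the EFI boot entry usable on Beaker guests, plain reboot is the fallback.
PREFERRED_COMMANDS = ("rhts-reboot", "reboot")


def pick_soft_reboot_command(available: Iterable[str]) -> str:
    """Return the first preferred reboot command present on the guest."""
    present = set(available)
    for command in PREFERRED_COMMANDS:
        if command in present:
            return command
    raise LookupError("no known reboot command available on the guest")
```

How the list of available commands is discovered on the guest is left open here; that part would depend on the provisioning plugin.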

@pcahyna
Collaborator

pcahyna commented Feb 25, 2025

There is a tmt-reboot command that should DTRT (#1506); it should probably be used in preference to rhts-reboot.
For hard reboot, use rstrnt-prepare-reboot first and then execute the hard reboot (depending on what you need to do of course - sometimes one does not want tmt to alter the boot order as the test may have done it internally).

@happz
Collaborator

happz commented Feb 25, 2025

I believe this is rather a question of what remote command tmt needs to run to soft reboot a guest. Apparently, reboot or shutdown -r now are not the best option with Beaker where rhts-reboot is the way to go, according to Beaker docs.

WRT hard reboot, is there a better way than bkr system-power --action reboot? Assume the machine is not responding, it's not possible to connect and run any command, no chance rstrnt-prepare-reboot.

@skycastlelily
Collaborator Author

sometimes one does not want tmt to alter the boot order as the test may have done it internally

According to the rhts-reboot code, it will not change the boot order; it will change boot next to the OS boot entry. If that is what you mean by "alter boot order", then I would say users are pretty unlikely to set boot next to any other option, unless they don't want to log in to that system anymore.

WRT hard reboot, is there a better way than bkr system-power --action reboot?

I guess the answer may be to run bkr system-power --clear-netboot first, according to the --clear-netboot part of the system-power doc :)
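
As a rough sketch of that suggestion, the helper below only builds the argument list for the hard reboot and does not run anything. It assumes that --clear-netboot can be passed in the same bkr system-power invocation as --action reboot, which is how the system-power documentation reads; the function name and the example FQDN are hypothetical.

```python
from typing import List


def hard_reboot_argv(fqdn: str, clear_netboot: bool = True) -> List[str]:
    """Build the bkr command line for power-cycling one Beaker system."""
    argv = ["bkr", "system-power", "--action", "reboot"]
    if clear_netboot:
        # Assumption: clearing the netboot configuration lets the machine
        # fall back to the installed system instead of booting from network.
        argv.append("--clear-netboot")
    argv.append(fqdn)
    return argv


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(" ".join(hard_reboot_argv("host.example.com")))
```

The `clear_netboot` switch is there because a test may have arranged the boot configuration on purpose, in which case the caller would pass `False`.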

@pcahyna
Collaborator

pcahyna commented Feb 25, 2025

it will Not change the boot order,will change boot next to os-boot-entry, if that what you mean by "alter boot order"

Yes, that's what I have meant indeed, as it effectively changes the boot order temporarily (for one boot).

I would say:users are pretty unlikely set the boot next other options

Unless they test bootloader-related code - I am doing that regularly.

unless they don't want to login that system anymore

You can login to the system after changing boot order / boot next, you just need to be careful what you change them to.

@skycastlelily
Collaborator Author

skycastlelily commented Feb 25, 2025

Unless they test bootloader-related code

Yes, so --clear-netboot does not work for your case.

You can login to the system after changing boot order / boot next, you just need to be careful what you change them to.

Yeah, booting from the network won't remove the already installed system or its boot entry unless a new installation is performed. What I mean is that the test can continue after the reboot without manually changing the boot order. Sorry for the confusion.

@psss added the "priority | should" label (medium priority, should be included in the next release) on Feb 26, 2025
@martinhoyer
Collaborator

Hi @skycastlelily, looking at the code, I can confirm what @pcahyna wrote. tmt-reboot sets the next boot entry to the current one, in the same way as rhts-reboot (which is afaik calling rstrnt-prepare-reboot).

From a design perspective, the hard reboot is intended to be used when the machine is unreachable or frozen, at which point setting the boot order through efibootmgr is not possible (only through out-of-band management, where available).

That said, doing a hard reboot for testing purposes sounds like a valid use-case. Should it be handled within tmt though?
I'm thinking if a user is purposefully 'power-cycling' a system from a test, they might as well add rhts-reboot or the efibootmgr -n directly.
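
For illustration, "set the next boot entry to the current one" could look roughly like the sketch below: read BootCurrent from the efibootmgr listing and build the matching efibootmgr -n call. This is not the code of tmt-reboot or rstrnt-prepare-reboot; the function name is hypothetical and the sample listing is made up to show the format.

```python
import re
from typing import List

# Illustrative listing only; real output depends on the machine.
SAMPLE_OUTPUT = """\
BootCurrent: 0003
BootOrder: 0001,0003
Boot0001* Network
Boot0003* Red Hat Enterprise Linux
"""


def bootnext_argv(efibootmgr_output: str) -> List[str]:
    """Build an efibootmgr call that sets BootNext to the current entry."""
    match = re.search(
        r"^BootCurrent:\s*([0-9A-Fa-f]{4})\s*$",
        efibootmgr_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("BootCurrent not found in efibootmgr output")
    # -n sets BootNext, which applies to the next boot only.
    return ["efibootmgr", "-n", match.group(1)]
```

This only works while the guest still accepts commands, so a test would have to run it before triggering the power cycle.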

@happz WDYT? Am I missing something?

@skycastlelily
Collaborator Author

@happz WDYT? Am I missing something?

yeah, I guess you did:)

for soft reboot: "I believe this is rather a question of what remote command tmt needs to run to soft reboot a guest"

for hard reboot:

they might as well add rhts-reboot or the efibootmgr -n directly.

"Assume the machine is not responding, it's not possible to connect and run any command, no chance rstrnt-prepare-reboot" / rhts-reboot /efibootmgr -n

@martinhoyer
Collaborator

yeah, I guess you did:)

@skycastlelily Touché, I see what you mean. You are talking about the reboot step, while I'm talking about the tmt-reboot command. Sorry :)

for soft reboot: "I believe this is rather a question of what remote command tmt needs to run to soft reboot a guest"

Define "remote command", but if it's run as part of the test, it's tmt-reboot (as long as you don't use -e). Unless you

they might as well add rhts-reboot or the efibootmgr -n directly.

"Assume the machine is not responding, it's not possible to connect and run any command, no chance rstrnt-prepare-reboot" / rhts-reboot /efibootmgr -n

The context of that comment is that you are deliberately forcing the machine into a sudden reboot as part of a test, i.e. you know about it in advance. If it crashes unexpectedly, then it's no different from the options you have in beaker/rstraint.

Will look into how the reboot step works tomorrow though. I presume it is dependent on provisioning plugins.

@pcahyna
Collaborator

pcahyna commented Mar 4, 2025

To me it seems that the semantics of "hard reboot" is a bit unclear and this is the source of confusion. It seems to mean "reboot without going through the usual reboot procedure", but this could be used when the machine has locked up/is not responding, as well as being an expected part of the test (some tests will surely need to check whether something behaves properly during a system crash or power outage). The two cases are different. For the latter, tmt-reboot is not useful, but rstrnt-prepare-reboot is. For the former, no in-band command can help.
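
The distinction between the two cases can be written down as a small sketch. The enum and function names are hypothetical and only encode what is said above: a planned crash can be prepared in-band with rstrnt-prepare-reboot, while an unresponsive guest cannot be prepared at all.

```python
from enum import Enum
from typing import List


class HardRebootReason(Enum):
    GUEST_UNRESPONSIVE = "guest locked up or not responding"
    PLANNED_BY_TEST = "test deliberately simulates a crash or power outage"


def in_band_preparation(reason: HardRebootReason) -> List[str]:
    """Return commands worth running on the guest before the power cycle."""
    if reason is HardRebootReason.PLANNED_BY_TEST:
        # The test knows about the reboot in advance and the guest is alive.
        return ["rstrnt-prepare-reboot"]
    # Nothing can be run on a guest that no longer responds.
    return []
```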

@happz
Collaborator

happz commented Mar 4, 2025

To me it seems that the semantics of "hard reboot" is a bit unclear and this is the source of confusion. It seems to mean "reboot without going through the usual reboot procedure", but this could be used when the machine has locked up/is not responding, as well as being an expected part of the test (some tests will surely need to check whether something behaves properly during a system crash or power outage). The two cases are different. For the latter, tmt-reboot is not useful, but rstrnt-prepare-reboot is. For the former, no in-band command can help.

The current semantics of "hard reboot" in tmt is the former: it is an action taken by tmt when the guest becomes unresponsive, and "hard reboot" is an action taken by the watchdog check to possibly restore the guest. Think kernel crash followed by flipping a power switch. When tmt detects this condition, tmt code tries hard to not touch the guest via channels that require the guest's cooperation. tmt shall not run any commands on the guest, even though the guest might become responsive again before the hard reboot is triggered - once it's marked as "dead", hard reboot triggered by tmt follows.

For the latter, i.e. a test that wishes to validate some behavior after a crash, we would need to establish some expectations first, although the combination of the watchdog check and the crash might be good enough already: a test could enable the watchdog check, cause kernel panic, and after some failed attempts to reconnect, hard reboot followed by restarted test should give the test a chance to verify things post-crash.
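
The watchdog behaviour described above can be modelled as a tiny state machine. This is a sketch under assumptions, not tmt's watchdog code: the class name, the failure threshold and the action strings are all hypothetical. It captures one rule from the description: once the guest is marked dead, it stays dead and a hard reboot follows, even if the guest answers again in the meantime.

```python
from dataclasses import dataclass


@dataclass
class GuestWatchdog:
    max_failures: int = 3  # hypothetical threshold of failed reconnects
    failures: int = 0
    dead: bool = False

    def record_probe(self, reachable: bool) -> None:
        """Record the result of one reconnect attempt."""
        if self.dead:
            # A guest marked dead is not revived by a late answer.
            return
        if reachable:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.dead = True

    def next_action(self) -> str:
        """Hard reboot only goes through out-of-band channels once dead."""
        return "hard-reboot" if self.dead else "keep-waiting"
```

With `max_failures=2`, two failed probes mark the guest dead, and a later successful probe no longer changes the outcome.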

@martinhoyer
Collaborator

we would need to establish some expectations

Imho this is the crucial part.
Looks like we are mixing apples, oranges and plenty of other fruits here :)

Maybe let's start with - Is there something that can be done in beaker that cannot be done as easily with tmt?

@skycastlelily
Collaborator Author

skycastlelily commented Mar 5, 2025

You are talking about reboot step

Np, I found the issue when running tmt run -l reboot, but it may also affect the reboot step. And by remote command I mean (at least) the command run on the guest to trigger the reboot: we will have rhts-reboot on all provisioned Beaker servers, and we will have reboot / shutdown -r on all the plugins that support soft reboot.

I presume it is dependent on provisioning plugins.

Yeah, kind of. However, please note that this issue is about the beaker plugin.

In conclusion, you could just set "rhts-reboot" as the command, and the soft-reboot part of the issue would be fixed :)

For the hard-reboot part, I vote for what the others say :)

@happz
Collaborator

happz commented Mar 5, 2025

Splitting into smaller issues, focusing on just one problem of the bigger bag of trouble around reboot, should help. If there is a tangible change we can begin with, we should, probably, to make the rest of the bag smaller.
