-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use sysrq-trigger for software reboot #136
use sysrq-trigger for software reboot #136
Conversation
Signed-off-by: Michael Shitrit <[email protected]>
Skipping CI for Draft Pull Request. |
/test 4.13-openshift-e2e |
/retest |
/retest |
pkg/reboot/rebooter.go
Outdated
@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error { | |||
func (r *watchdogRebooter) softwareReboot() error { | |||
r.log.Info("about to try software reboot") | |||
// hostPID: true and privileged:true required to run this | |||
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force") | |||
rebootCmd := exec.Command("bash", "-c", "echo c > /proc/sysrq-trigger") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'c' command triggers a crashdump. If kdump is not enabled in the os, it will be stuck.
So I think the 'b' is fine to do emergency restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm far from being a linux expert but I found the following (see below) here
b | Will immediately reboot the system without syncing or unmounting your disks.
c | Will perform a system crash by a NULL pointer dereference. A crashdump will be taken if configured.
IIUC it's mandatory to configure the crashdump.
Also in that context I'm not sure what is a difference between a reboot
and a system crash
(i.e whether the successful execution of one of them is more reliable )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated my understanding to the latest RHEL9 code(5.14.0-284.23.1.el9_2 used by OCP4.13.6).
Please let me share the ways of software reboot(reboot, softdog reset, panic() in kernel and so on):
We have the following ways to software reboot. It's listed in order of reliability from top to bottom:
- softdog reset
- echo b > /proc/sysrq-trigger
- systemctl reboot -f -f(current)
- echo c > /proc/sysrq-trigger & enabling kdump("systemctl reboot -f" after collecting a dump)
Compared with 1 and 2, both of them will reboot with the same way(do the best effort to reboot in kernel space, it's designed to be able to execute in IRQ context), but the "2" needs to write 'c' to /proc/sysrq-trigger in user space. So the "1" is more reliable than the "2"
Compared with 2 and 3, the "3" is a safer way to reboot than the reboot way of "1" and "2" because it performs a clean reboot like hw resetting and so on in the kernel space. However the reboot way of "1" and "2" is more reliable because it focusses on rebooting without HW resetting, not protecting server.
Compared with 3 and 4, the "4" will reboot with "systemctl reboot -f" after collecting a memory dump. This way is a safer way than the reboot way of "3" as described in the man. But it seems like there is almost no difference between 3 and 4 compared to the "1" and "2".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have another option:
- echo c > /proc/sysrq-trigger & "kernel.panic = < secs before rebooting> " in sysctl.conf & disabling kdump
Without enabling kdump, we have to wait forever after "echo c > /proc/sysrq-trigger".
However the /proc/sys/kernel/panic will help us to change this situation. If the timeout value is set to the /proc/sys/kernel/panic, the node will be rebooted with the same reboot way of "1" and "2" after waiting for the timeout seconds.
We can reboot the node with the panic messages although the additional configuration to sysctl.conf is required.
To summarize:
1. softdog reset
2. echo b > /proc/sysrq-trigger
3. systemctl reboot -f -f(current)
4. echo c > /proc/sysrq-trigger & enabling kdump("systemctl reboot -f" after collecting a dump)
5. echo c > /proc/sysrq-trigger & "kernel.panic = <secs before rebooting>" in sysctl.conf & disabling kdump
IMO, we can replace the current method("3") with the "2".
And the "4" or "5" may be a good option if users can choose.
However, I think HW watchdog for baremetal envs and softdog for general envs(virtual envs) will cover the most of use cases. I am not sure if collecting debug information is required when a failover to software reboot occurs.
Reliability : we can reboot
1 >>> 2 > 5 > 3 > 4
Traceability : debugging and troubleshooting
4 >>> 5 > 3 > 2 > 1
Safety : protecting servers
4 > 3 > 5, 2, 1
Generality : less configuration and less maintenance
2, 3 >>> 5 > 1 > 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @k-keiichi-rh ,
your research and effort is much appreciated ! 🥇
Based on that I agree with your recommendation, and I've made the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would it make sense to use more commands before the reboot?
From https://www.kernel.org/doc/html/v4.18/admin-guide/sysrq.html#okay-so-what-can-i-use-them-for:
I generally sync(s), umount(u), then reboot(b) when my system locks. It’s saved me many a fsck
Or from https://www.thegeekstuff.com/2008/12/safe-reboot-of-linux-using-magic-sysrq-key/:
To perform a safe reboot of a Linux computer which hangs up, do the following. This will avoid the fsck during the next re-booting. i.e Press Alt+SysRq+letter highlighted below.
unRaw (take control of keyboard back from X11,
tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
kIll (send SIGILL to all processes, forcing them to terminate immediately),
Sync (flush data to disk),
Unmount (remount all filesystems read-only),
reBoot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this means "b" is the best option for us, not "c" like in the updated code? 🤔
Yes, I think "b" is the best option in this case.
why does this work at all on OCP? I understand you can configure what sysrq is allowed to do in /proc/sys/kernel/sysrq, and on my current cluster it has value 16, which doesn't include reboots according to https://www.kernel.org/doc/html/v4.18/admin-guide/sysrq.html#how-do-i-enable-the-magic-sysrq-key.
Because this checking(/proc/sys/kernel/sysrq) you mentioned is bypassed when using /proc/sysrq-trigger.
It's for using the magic sysrq key in our keyboard. And there is no way for /proc/sysrq-trigger to restrict triggering some actions like "b" and "c".
Please refer to https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysrq.rst for more information.
This is the doc for the latest kernel, but I confirmed we can say same thing for the RHEL8 and RHEL9 kernels.
Note that the value of /proc/sys/kernel/sysrq influences only the invocation via a keyboard. Invocation of any operation via /proc/sysrq-trigger is always allowed (by a user with admin privileges).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would it make sense to use more commands before the reboot?
If I understand correctly, I think one of the high priority things for SNR is to reset the failed node to schedule workloads on the failed node to another node.
So we don't need more commands like sync(s), umount(u) for safe reboot via /proc/sysrq-trigger.
These safe reboot commands don't help us to improve reliability for rebooting the node.
And either the softdog reset or the HW watchdog reset won't do anything to save the server.
I think it's difficult to imagine that soft reboot as a failover action needs the safe reboot commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have a reason to use the more safe reboot commands of /proc/sysrq-trigger in terms of the compatibility with "reboot -f -f" in the current implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this checking(/proc/sys/kernel/sysrq) you mentioned is bypassed when using /proc/sysrq-trigger.
Ok 👍🏼
These safe reboot commands don't help us to improve reliability for rebooting the node.
Makes sense.
Thanks for clarifying!
Signed-off-by: Michael Shitrit <[email protected]>
/test 4.13-openshift-e2e |
/lgtm |
pkg/reboot/rebooter.go
Outdated
@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error { | |||
func (r *watchdogRebooter) softwareReboot() error { | |||
r.log.Info("about to try software reboot") | |||
// hostPID: true and privileged:true required to run this | |||
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force") | |||
rebootCmd := exec.Command("bash", "-b", "echo c > /proc/sysrq-trigger") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the full path to bash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe it needs to be done inside an nsenter
context?
rebootCmd := exec.Command("bash", "-b", "echo c > /proc/sysrq-trigger") | |
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "bash", "-b", "echo c > /proc/sysrq-trigger") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe it needs to be done inside an nsenter context?
Not an expert on nsenter
, but from the little reading I've done my understanding is that the mnt
option would create an isolated namespace for the command to run in.
Since in any case we plan to reboot the whole machine is there a value in that separation which I'm missing ? (my initial intuition is that it's just an unneeded overhead)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding is that the mnt option would create an isolated namespace for the command to run in.
no, you don't create that namespace, you enter it. With -m/proc/1/ns/mnt
you enter the "mount" namespace of the process with PID 1
. That's the host' s very first process. This gives you access to binaries and files which are installed on the host but not in the container you are running in, like /bin/systemctl
in the old version, or /bin/bash
and /proc/sysrq-trigger
in your new version. That's what I understand at least. So while bash might be installed in the container as well, I'm not sure if/how /proc/sysrq-trigger
works in containers...
/lgtm cancel there are comments which need an answer at least |
…led on the host and are required for the reboot action. Signed-off-by: Michael Shitrit <[email protected]>
/test 4.13-openshift-e2e |
pkg/reboot/rebooter.go
Outdated
@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error { | |||
func (r *watchdogRebooter) softwareReboot() error { | |||
r.log.Info("about to try software reboot") | |||
// hostPID: true and privileged:true required to run this | |||
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force") | |||
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "bash", "-b", "echo c > /proc/sysrq-trigger") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be "/bin/bash", "-c", "echo b..."
Signed-off-by: Michael Shitrit <[email protected]>
/test 4.13-openshift-e2e |
1 similar comment
/test 4.13-openshift-e2e |
/test 4.12-openshift-e2e |
/test 4.13-openshift-e2e |
2 similar comments
/test 4.13-openshift-e2e |
/test 4.13-openshift-e2e |
/test 4.13-openshift-e2e |
Signed-off-by: Michael Shitrit <[email protected]>
/test 4.12-openshift-e2e |
/test 4.13-openshift-e2e |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mshitrit, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/retest |
2 similar comments
/retest |
/retest |
ECOPROJECT-1457