use sysrq-trigger for software reboot #136

mshitrit · 2023-07-16T12:06:47Z

Signed-off-by: Michael Shitrit <[email protected]>

openshift-ci · 2023-07-16T12:06:52Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

mshitrit · 2023-07-16T12:07:13Z

/test 4.13-openshift-e2e

mshitrit · 2023-07-16T17:23:06Z

/retest

mshitrit · 2023-07-17T07:27:23Z

/retest

k-keiichi-rh · 2023-07-19T23:49:54Z

pkg/reboot/rebooter.go

@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error {
 func (r *watchdogRebooter) softwareReboot() error {
 	r.log.Info("about to try software reboot")
 	// hostPID: true and privileged:true required to run this
-	rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force")
+	rebootCmd := exec.Command("bash", "-c", "echo c > /proc/sysrq-trigger")


The 'c' command triggers a crashdump. If kdump is not enabled in the OS, the node will be stuck.
So I think 'b' is the better choice for an emergency restart.

I'm far from being a Linux expert, but I found the following (quoted below) here:

b | Will immediately reboot the system without syncing or unmounting your disks.
c | Will perform a system crash by a NULL pointer dereference. A crashdump will be taken if configured.

IIUC it's mandatory to configure the crashdump.

Also, in that context I'm not sure what the difference between a reboot and a system crash is (i.e. whether successfully executing one of them is more reliable than the other).

I updated my understanding based on the latest RHEL9 kernel code (5.14.0-284.23.1.el9_2, used by OCP 4.13.6).

Let me share the ways to reboot via software (reboot, softdog reset, panic() in the kernel, and so on).
They are listed in order of reliability, from most to least reliable:

1. softdog reset
2. echo b > /proc/sysrq-trigger
3. systemctl reboot -f -f (current)
4. echo c > /proc/sysrq-trigger & enabling kdump ("systemctl reboot -f" after collecting a dump)

Comparing 1 and 2: both reboot the same way (a best-effort reboot in kernel space, designed to be executable in IRQ context), but "2" needs user space to write 'b' to /proc/sysrq-trigger. So "1" is more reliable than "2".

Comparing 2 and 3: "3" is a safer way to reboot than the method used by "1" and "2", because it performs a clean reboot in kernel space (shutting down and resetting the hardware, and so on). However, the method used by "1" and "2" is more reliable, because it focuses on rebooting without the hardware reset steps rather than on protecting the server.

Comparing 3 and 4: "4" reboots with "systemctl reboot -f" after collecting a memory dump. As described in the man page, this is safer than the method used by "3". But compared with "1" and "2", there is almost no difference between "3" and "4".

We have another option:

5. echo c > /proc/sysrq-trigger & "kernel.panic = <secs before rebooting>" in sysctl.conf & disabling kdump

Without kdump enabled, the node waits forever after "echo c > /proc/sysrq-trigger".
However, /proc/sys/kernel/panic can change this: if a timeout is set there, the node reboots after waiting that many seconds, using the same reboot method as "1" and "2".
This way we can reboot the node and keep the panic messages, although it requires additional configuration in sysctl.conf.

To summarize:

1. softdog reset
2. echo b > /proc/sysrq-trigger
3. systemctl reboot -f -f (current)
4. echo c > /proc/sysrq-trigger & enabling kdump ("systemctl reboot -f" after collecting a dump)
5. echo c > /proc/sysrq-trigger & "kernel.panic = <secs before rebooting>" in sysctl.conf & disabling kdump

IMO, we can replace the current method ("3") with "2".

"4" or "5" may be a good option if users can choose the method.
However, I think a HW watchdog for bare-metal environments and softdog for general (virtual) environments will cover most use cases. I am not sure whether collecting debug information is required when falling back to a software reboot.

Reliability (we can reboot):
1 >>> 2 > 5 > 3 > 4
Traceability (debugging and troubleshooting):
4 >>> 5 > 3 > 2 > 1
Safety (protecting servers):
4 > 3 > 5, 2, 1
Generality (less configuration and less maintenance):
2, 3 >>> 5 > 1 > 4

Thanks @k-keiichi-rh ,

Your research and effort are much appreciated! 🥇
Based on that I agree with your recommendation, and I've made the changes.

Also, would it make sense to use more commands before the reboot?

From https://www.kernel.org/doc/html/v4.18/admin-guide/sysrq.html#okay-so-what-can-i-use-them-for:

I generally sync(s), umount(u), then reboot(b) when my system locks. It’s saved me many a fsck

Or from https://www.thegeekstuff.com/2008/12/safe-reboot-of-linux-using-magic-sysrq-key/:

To perform a safe reboot of a Linux computer which hangs up, do the following. This will avoid the fsck during the next re-booting. i.e Press Alt+SysRq+letter highlighted below.

~~unRaw (take control of keyboard back from X11),~~
tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
kIll (send SIGKILL to all processes, forcing them to terminate immediately),
Sync (flush data to disk),
Unmount (remount all filesystems read-only),
reBoot.
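For illustration only, this is how the quoted "e, i, s, u, b" sequence might be sent through /proc/sysrq-trigger from Go. The names `sendSysrqSequence` and `recorder` are hypothetical. The pause between keys is an assumption, based on sync and remount being queued rather than completed synchronously. The writer is abstracted so the sequence can be checked without rebooting anything.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// sendSysrqSequence writes each sysrq key as its own write, since the trigger
// file acts on the first character of each write. pause is slept between keys:
// the sync ('s') and remount ('u') actions are queued asynchronously, so a
// real sequence usually leaves them time to finish before 'b'.
func sendSysrqSequence(w io.Writer, keys string, pause time.Duration) error {
	for i := 0; i < len(keys); i++ {
		if i > 0 {
			time.Sleep(pause)
		}
		if _, err := w.Write([]byte{keys[i]}); err != nil {
			return fmt.Errorf("sysrq %q: %w", keys[i], err)
		}
	}
	return nil
}

// recorder is a stand-in writer that remembers each individual write, so the
// sequence can be inspected without touching /proc/sysrq-trigger.
type recorder struct{ writes []string }

func (r *recorder) Write(p []byte) (int, error) {
	r.writes = append(r.writes, string(p))
	return len(p), nil
}

func main() {
	var r recorder
	// e: SIGTERM all, i: SIGKILL all, s: sync, u: remount read-only, b: reboot.
	if err := sendSysrqSequence(&r, "eisub", 0); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(r.writes)
}
```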

I understand this means "b" is the best option for us, not "c" like in the updated code? 🤔

Yes, I think "b" is the best option in this case.

Why does this work at all on OCP? I understand you can configure what sysrq is allowed to do via /proc/sys/kernel/sysrq, and on my current cluster it has the value 16, which doesn't include reboots according to https://www.kernel.org/doc/html/v4.18/admin-guide/sysrq.html#how-do-i-enable-the-magic-sysrq-key.

Because the check you mentioned (/proc/sys/kernel/sysrq) is bypassed when using /proc/sysrq-trigger.
That setting only applies to the magic SysRq key on the keyboard, and there is no way to restrict /proc/sysrq-trigger from triggering actions like "b" and "c".

Please refer to https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysrq.rst for more information.
This is the doc for the latest kernel, but I confirmed the same holds for the RHEL8 and RHEL9 kernels.

Note that the value of /proc/sys/kernel/sysrq influences only the invocation via a keyboard. Invocation of any operation via /proc/sysrq-trigger is always allowed (by a user with admin privileges).
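To make the bitmask concrete: per the kernel's sysrq documentation, 16 enables only the sync function, while reboot/poweroff is bit 128, so a value of 16 blocks a keyboard-triggered reboot and leaves /proc/sysrq-trigger unaffected. The sketch below decodes the setting; the constant and function names are illustrative, not taken from the kernel or this repository.

```go
package main

import "fmt"

// Bits of /proc/sys/kernel/sysrq as listed in the kernel's sysrq documentation.
// 0 disables keyboard sysrq entirely; 1 enables every function.
const (
	sysrqLoglevel = 2   // console log level control
	sysrqKeyboard = 4   // keyboard control (SAK, unraw)
	sysrqDump     = 8   // debugging dumps of processes etc.
	sysrqSync     = 16  // sync command
	sysrqRemount  = 32  // remount read-only
	sysrqSignal   = 64  // signalling of processes (term, kill, oom-kill)
	sysrqReboot   = 128 // reboot/poweroff
	sysrqNice     = 256 // nicing of all RT tasks
)

// keyboardAllows reports whether the sysrq setting permits a function class
// from the keyboard. Writes to /proc/sysrq-trigger are not filtered by it.
func keyboardAllows(setting, bit int) bool {
	switch setting {
	case 0:
		return false
	case 1:
		return true
	default:
		return setting&bit != 0
	}
}

func main() {
	setting := 16 // the value observed on the cluster in this thread
	fmt.Println("sync allowed:", keyboardAllows(setting, sysrqSync))
	fmt.Println("reboot allowed:", keyboardAllows(setting, sysrqReboot))
}
```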

Also, would it make sense to use more commands before the reboot?

If I understand correctly, one of the top priorities for SNR is to reset the failed node so that its workloads can be rescheduled on another node.
So we don't need extra commands like sync(s) and umount(u) for a safe reboot via /proc/sysrq-trigger.
These safe-reboot commands don't improve the reliability of rebooting the node.

Neither the softdog reset nor the HW watchdog reset does anything to protect the server either.
I find it hard to imagine that a software reboot used as a fallback action needs the safe-reboot commands.

We might have a reason to use the safer /proc/sysrq-trigger commands for compatibility with "reboot -f -f" in the current implementation...

Because this checking(/proc/sys/kernel/sysrq) you mentioned is bypassed when using /proc/sysrq-trigger.

Ok 👍🏼

These safe reboot commands don't help us to improve reliability for rebooting the node.

Makes sense.

Thanks for clarifying!

Signed-off-by: Michael Shitrit <[email protected]>

mshitrit · 2023-07-23T09:31:19Z

/test 4.13-openshift-e2e

razo7 · 2023-07-23T12:12:17Z

/lgtm
Holding so @k-keiichi-rh can have more time for another review (very interesting research and findings)
/hold

beekhof · 2023-07-24T03:25:26Z

pkg/reboot/rebooter.go

@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error {
 func (r *watchdogRebooter) softwareReboot() error {
 	r.log.Info("about to try software reboot")
 	// hostPID: true and privileged:true required to run this
-	rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force")
+	rebootCmd := exec.Command("bash", "-b", "echo c > /proc/sysrq-trigger")


do we need the full path to bash?

And maybe it needs to be done inside an nsenter context?

Suggested change

rebootCmd := exec.Command("bash", "-b", "echo c > /proc/sysrq-trigger")

rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "bash", "-b", "echo c > /proc/sysrq-trigger")

And maybe it needs to be done inside an nsenter context?

I'm not an expert on nsenter, but from the little reading I've done, my understanding is that the mnt option would create an isolated namespace for the command to run in.

Since we plan to reboot the whole machine anyway, is there value in that separation that I'm missing? (My initial intuition is that it's just unneeded overhead.)

my understanding is that the mnt option would create an isolated namespace for the command to run in.

No, you don't create that namespace, you enter it. With -m/proc/1/ns/mnt you enter the mount namespace of the process with PID 1, which is the host's very first process. This gives you access to binaries and files that are installed on the host but not in the container you are running in, like /bin/systemctl in the old version, or /bin/bash and /proc/sysrq-trigger in your new version. That's my understanding, at least. So while bash might be installed in the container as well, I'm not sure if/how /proc/sysrq-trigger works in containers...

slintes · 2023-07-24T17:16:30Z

/lgtm cancel

there are comments which need an answer at least

…led on the host and are required for the reboot action. Signed-off-by: Michael Shitrit <[email protected]>

mshitrit · 2023-07-25T11:52:32Z

/test 4.13-openshift-e2e

slintes · 2023-07-25T12:40:10Z

pkg/reboot/rebooter.go

@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error {
 func (r *watchdogRebooter) softwareReboot() error {
 	r.log.Info("about to try software reboot")
 	// hostPID: true and privileged:true required to run this
-	rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force")
+	rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "bash", "-b", "echo c > /proc/sysrq-trigger")


I think it should be "/bin/bash", "-c", "echo b..."
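As far as I can tell, `-b` is a bash option (notify of job termination, as with `set -b`), so bash would treat the command string as the name of a script file; `-c` makes bash execute the string itself. A sketch of the command with the suggested fix, built but never run; `buildRebootCmd` is a hypothetical name:

```go
package main

import (
	"fmt"
	"os/exec"
)

// buildRebootCmd assembles the software-reboot command as suggested in review:
// enter PID 1's mount namespace, then have the host's bash write 'b' to the
// sysrq trigger. It only builds the command; calling Run() on a host reboots it.
func buildRebootCmd() *exec.Cmd {
	return exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt",
		"/bin/bash", "-c", "echo b > /proc/sysrq-trigger")
}

func main() {
	cmd := buildRebootCmd()
	fmt.Println(cmd.Args)
}
```

Inspecting `cmd.Args` like this could also serve as a cheap unit test for the argument list, catching a stray `-b` without rebooting anything.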

Signed-off-by: Michael Shitrit <[email protected]>

mshitrit · 2023-07-26T12:02:01Z

/test 4.13-openshift-e2e

mshitrit · 2023-07-27T08:25:33Z

/test 4.13-openshift-e2e

mshitrit · 2023-07-30T11:39:25Z

/test 4.12-openshift-e2e
/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

mshitrit · 2023-07-30T17:32:04Z

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

mshitrit · 2023-07-31T04:27:20Z

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

mshitrit · 2023-07-31T07:14:21Z

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

mshitrit · 2023-07-31T10:13:06Z

/test 4.13-openshift-e2e

Signed-off-by: Michael Shitrit <[email protected]>

mshitrit · 2023-07-31T12:19:51Z

/test 4.12-openshift-e2e
/test 4.13-openshift-e2e

mshitrit · 2023-07-31T15:54:28Z

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

openshift-ci · 2023-08-01T07:20:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mshitrit, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mshitrit,slintes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

slintes · 2023-08-01T07:20:53Z

/hold cancel

slintes · 2023-08-01T09:24:04Z

/retest

razo7 · 2023-08-01T11:23:58Z

/retest

mshitrit · 2023-08-01T13:10:43Z

/retest

use sysrq-trigger for software reboot

322836d

Signed-off-by: Michael Shitrit <[email protected]>

openshift-ci bot added the do-not-merge/work-in-progress label Jul 16, 2023

openshift-ci bot added the approved label Jul 16, 2023

mshitrit changed the title ~~[WIP] use sysrq-trigger for software reboot~~ use sysrq-trigger for software reboot Jul 17, 2023

k-keiichi-rh reviewed Jul 19, 2023


changing from b to c (c includes unneeded kdump)

e05bac4

Signed-off-by: Michael Shitrit <[email protected]>

openshift-ci bot added the do-not-merge/hold label Jul 23, 2023

openshift-ci bot assigned razo7 Jul 23, 2023

openshift-ci bot added the lgtm label Jul 23, 2023

beekhof reviewed Jul 24, 2023


openshift-ci bot assigned slintes Jul 24, 2023

openshift-ci bot removed the lgtm label Jul 24, 2023

use nsenter to make sure we have access to relevant binaries instal…

5534bd4

…led on the host and are required for the reboot action. Signed-off-by: Michael Shitrit <[email protected]>

slintes reviewed Jul 25, 2023


use full bash path, correct how b option is used

97c85d5

Signed-off-by: Michael Shitrit <[email protected]>

reduce e2e test flakiness based on cluster resources

5118acb

Signed-off-by: Michael Shitrit <[email protected]>

slintes approved these changes Aug 1, 2023


openshift-ci bot added the lgtm label Aug 1, 2023

slintes marked this pull request as ready for review August 1, 2023 07:20

openshift-ci bot removed the do-not-merge/work-in-progress label Aug 1, 2023

openshift-ci bot requested review from clobrano and razo7 August 1, 2023 07:21

openshift-ci bot removed the do-not-merge/hold label Aug 1, 2023

openshift-merge-robot merged commit 1fc3f14 into medik8s:main Aug 1, 2023
1 check passed
