Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use sysrq-trigger for software reboot #136

Merged

Conversation

mshitrit
Copy link
Member

@mshitrit mshitrit commented Jul 16, 2023

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 16, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e

@mshitrit
Copy link
Member Author

/retest

@mshitrit mshitrit changed the title [WIP] use sysrq-trigger for software reboot use sysrq-trigger for software reboot Jul 17, 2023
@mshitrit
Copy link
Member Author

/retest

@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error {
func (r *watchdogRebooter) softwareReboot() error {
r.log.Info("about to try software reboot")
// hostPID: true and privileged:true required to run this
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force")
rebootCmd := exec.Command("bash", "-c", "echo c > /proc/sysrq-trigger")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'c' command triggers a crashdump. If kdump is not enabled in the os, it will be stuck.
So I think the 'b' is fine to do emergency restart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm far from being a linux expert but I found the following (see below) here

b | Will immediately reboot the system without syncing or unmounting your disks.
c | Will perform a system crash by a NULL pointer dereference. A crashdump will be taken if configured.

IIUC it's mandatory to configure the crashdump.

Also in that context I'm not sure what is a difference between a reboot and a system crash (i.e whether the successful execution of one of them is more reliable )

Copy link
Contributor

@k-keiichi-rh k-keiichi-rh Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated my understanding to the latest RHEL9 code(5.14.0-284.23.1.el9_2 used by OCP4.13.6).

Please let me share the ways of software reboot(reboot, softdog reset, panic() in kernel and so on):
We have the following ways to software reboot. It's listed in order of reliability from top to bottom:

  1. softdog reset
  2. echo b > /proc/sysrq-trigger
  3. systemctl reboot -f -f(current)
  4. echo c > /proc/sysrq-trigger & enabling kdump("systemctl reboot -f" after collecting a dump)

Compared with 1 and 2, both of them will reboot with the same way(do the best effort to reboot in kernel space, it's designed to be able to execute in IRQ context), but the "2" needs to write 'c' to /proc/sysrq-trigger in user space. So the "1" is more reliable than the "2"

Compared with 2 and 3, the "3" is a safer way to reboot than the reboot way of "1" and "2" because it performs a clean reboot like hw resetting and so on in the kernel space. However the reboot way of "1" and "2" is more reliable because it focusses on rebooting without HW resetting, not protecting server.

Compared with 3 and 4, the "4" will reboot with "systemctl reboot -f" after collecting a memory dump. This way is a safer way than the reboot way of "3" as described in the man. But it seems like there is almost no difference between 3 and 4 compared to the "1" and "2".

Copy link
Contributor

@k-keiichi-rh k-keiichi-rh Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have another option:

  1. echo c > /proc/sysrq-trigger & "kernel.panic = < secs before rebooting> " in sysctl.conf & disabling kdump

Without enabling kdump, we have to wait forever after "echo c > /proc/sysrq-trigger".
However the /proc/sys/kernel/panic will help us to change this situation. If the timeout value is set to the /proc/sys/kernel/panic, the node will be rebooted with the same reboot way of "1" and "2" after waiting for the timeout seconds.
We can reboot the node with the panic messages although the additional configuration to sysctl.conf is required.

To summarize:

1. softdog reset
2. echo b > /proc/sysrq-trigger
3. systemctl reboot -f -f(current)
4. echo c > /proc/sysrq-trigger & enabling kdump("systemctl reboot -f" after collecting a dump)
5. echo c > /proc/sysrq-trigger & "kernel.panic = <secs before rebooting>" in sysctl.conf & disabling kdump

IMO, we can replace the current method("3") with the "2".

And the "4" or "5" may be a good option if users can choose.
However, I think HW watchdog for baremetal envs and softdog for general envs(virtual envs) will cover the most of use cases. I am not sure if collecting debug information is required when a failover to software reboot occurs.

Reliability : we can reboot
1 >>> 2 > 5 > 3 > 4
Traceability : debugging and troubleshooting
4 >>> 5 > 3 > 2 > 1
Safety : protecting servers
4 > 3 > 5, 2, 1
Generality : less configuration and less maintenance
2, 3 >>> 5 > 1 > 4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @k-keiichi-rh ,

your research and effort is much appreciated ! 🥇
Based on that I agree with your recommendation, and I've made the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it make sense to use more commands before the reboot?

From https://www.kernel.org/doc/html/v4.18/admin-guide/sysrq.html#okay-so-what-can-i-use-them-for:

I generally sync(s), umount(u), then reboot(b) when my system locks. It’s saved me many a fsck

Or from https://www.thegeekstuff.com/2008/12/safe-reboot-of-linux-using-magic-sysrq-key/:

To perform a safe reboot of a Linux computer which hangs up, do the following. This will avoid the fsck during the next re-booting. i.e Press Alt+SysRq+letter highlighted below.

unRaw (take control of keyboard back from X11,
tErminate (send SIGTERM to all processes, allowing them to terminate gracefully),
kIll (send SIGILL to all processes, forcing them to terminate immediately),
Sync (flush data to disk),
Unmount (remount all filesystems read-only),
reBoot.

Copy link
Contributor

@k-keiichi-rh k-keiichi-rh Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this means "b" is the best option for us, not "c" like in the updated code? 🤔

Yes, I think "b" is the best option in this case.

why does this work at all on OCP? I understand you can configure what sysrq is allowed to do in /proc/sys/kernel/sysrq, and on my current cluster it has value 16, which doesn't include reboots according to https://www.kernel.org/doc/html/v4.18/admin-guide/sysrq.html#how-do-i-enable-the-magic-sysrq-key.

Because this checking(/proc/sys/kernel/sysrq) you mentioned is bypassed when using /proc/sysrq-trigger.
It's for using the magic sysrq key in our keyboard. And there is no way for /proc/sysrq-trigger to restrict triggering some actions like "b" and "c".

Please refer to https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysrq.rst for more information.
This is the doc for the latest kernel, but I confirmed we can say same thing for the RHEL8 and RHEL9 kernels.

Note that the value of /proc/sys/kernel/sysrq influences only the invocation via a keyboard. Invocation of any operation via /proc/sysrq-trigger is always allowed (by a user with admin privileges).

Copy link
Contributor

@k-keiichi-rh k-keiichi-rh Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it make sense to use more commands before the reboot?

If I understand correctly, I think one of the high priority things for SNR is to reset the failed node to schedule workloads on the failed node to another node.
So we don't need more commands like sync(s), umount(u) for safe reboot via /proc/sysrq-trigger.
These safe reboot commands don't help us to improve reliability for rebooting the node.

And either the softdog reset or the HW watchdog reset won't do anything to save the server.
I think it's difficult to imagine that soft reboot as a failover action needs the safe reboot commands.

Copy link
Contributor

@k-keiichi-rh k-keiichi-rh Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have a reason to use the more safe reboot commands of /proc/sysrq-trigger in terms of the compatibility with "reboot -f -f" in the current implementation...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this checking(/proc/sys/kernel/sysrq) you mentioned is bypassed when using /proc/sysrq-trigger.

Ok 👍🏼

These safe reboot commands don't help us to improve reliability for rebooting the node.

Makes sense.

Thanks for clarifying!

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e

@razo7
Copy link
Member

razo7 commented Jul 23, 2023

/lgtm
Holding so @k-keiichi-rh can have more time for another review (very interesting research and findings)
/hold

@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error {
func (r *watchdogRebooter) softwareReboot() error {
r.log.Info("about to try software reboot")
// hostPID: true and privileged:true required to run this
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force")
rebootCmd := exec.Command("bash", "-b", "echo c > /proc/sysrq-trigger")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the full path to bash?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe it needs to be done inside an nsenter context?

Suggested change
rebootCmd := exec.Command("bash", "-b", "echo c > /proc/sysrq-trigger")
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "bash", "-b", "echo c > /proc/sysrq-trigger")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe it needs to be done inside an nsenter context?

Not an expert on nsenter , but from the little reading I've done my understanding is that the mnt option would create an isolated namespace for the command to run in.

Since in any case we plan to reboot the whole machine is there a value in that separation which I'm missing ? (my initial intuition is that it's just an unneeded overhead)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that the mnt option would create an isolated namespace for the command to run in.

no, you don't create that namespace, you enter it. With -m/proc/1/ns/mnt you enter the "mount" namespace of the process with PID 1. That's the host' s very first process. This gives you access to binaries and files which are installed on the host but not in the container you are running in, like /bin/systemctl in the old version, or /bin/bash and /proc/sysrq-trigger in your new version. That's what I understand at least. So while bash might be installed in the container as well, I'm not sure if/how /proc/sysrq-trigger works in containers...

@slintes
Copy link
Member

slintes commented Jul 24, 2023

/lgtm cancel

there are comments which need an answer at least

…led on the host and are required for the reboot action.

Signed-off-by: Michael Shitrit <[email protected]>
@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e

@@ -82,7 +82,7 @@ func (r *watchdogRebooter) Reboot() error {
func (r *watchdogRebooter) softwareReboot() error {
r.log.Info("about to try software reboot")
// hostPID: true and privileged:true required to run this
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot", "--force", "--force")
rebootCmd := exec.Command("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "bash", "-b", "echo c > /proc/sysrq-trigger")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be "/bin/bash", "-c", "echo b..."

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e

1 similar comment
@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e

@mshitrit
Copy link
Member Author

/test 4.12-openshift-e2e
/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

2 similar comments
@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e

@mshitrit
Copy link
Member Author

/test 4.12-openshift-e2e
/test 4.13-openshift-e2e

@mshitrit
Copy link
Member Author

/test 4.13-openshift-e2e
/test 4.14-openshift-e2e

@openshift-ci openshift-ci bot added the lgtm label Aug 1, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mshitrit, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@slintes slintes marked this pull request as ready for review August 1, 2023 07:20
@slintes
Copy link
Member

slintes commented Aug 1, 2023

/hold cancel

@slintes
Copy link
Member

slintes commented Aug 1, 2023

/retest

2 similar comments
@razo7
Copy link
Member

razo7 commented Aug 1, 2023

/retest

@mshitrit
Copy link
Member Author

mshitrit commented Aug 1, 2023

/retest

@openshift-merge-robot openshift-merge-robot merged commit 1fc3f14 into medik8s:main Aug 1, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants