---
title: Node Health Check
linktitle: Node Health Check
description: OpenShift Virtualization - Fencing and VM High Availability Guide
tags: ["kubevirt","ocp-v","cnv"]
---
# Node Health Check

## Resources

* [OpenShift Virtualization - Fencing and VM High Availability Guide](https://access.redhat.com/articles/7057929)

## Installation & configuration

* Install the "Node Health Check Operator".

### Create a NodeHealthCheck for worker nodes

``` { .yaml .annotate }
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: worker-availability
spec:
  minHealthy: 51%
  remediationTemplate:
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationTemplate
    name: self-node-remediation-automatic-strategy-template
    namespace: openshift-workload-availability
  selector:
    matchExpressions:
    - key: node-role.kubernetes.io/worker
      operator: Exists
      values: []
  unhealthyConditions:
  - duration: 1s # (1)!
    status: 'False'
    type: Ready
  - duration: 1s # (2)!
    status: Unknown
    type: Ready
```

1. How long the node's `Ready` condition must stay `False` before the node counts as unhealthy. Tune this value for the fastest VM recovery; see the test results in the [OpenShift Virtualization - Fencing and VM High Availability Guide](https://access.redhat.com/articles/7057929#test-results-9).
2. Same as (1), but for a `Ready` condition with status `Unknown`. See the test results in the [OpenShift Virtualization - Fencing and VM High Availability Guide](https://access.redhat.com/articles/7057929#test-results-9).
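
Whether remediation may run at all depends on `minHealthy: 51%` and on how many worker nodes the selector matches. The shell sketch below works through that arithmetic and mirrors the two `unhealthyConditions`. It is a hypothetical model, not Node Health Check code, and it assumes that a percentage is rounded up to a whole node count and that remediation is allowed only while at least that many selected nodes are healthy.

```shell
# Hypothetical helpers modelling the NodeHealthCheck above (not NHC code).
# ASSUMPTION: a percentage minHealthy is rounded UP to a node count, and
# remediation is allowed only while healthy nodes >= that count.

# min_healthy_count TOTAL PERCENT: healthy nodes required, ceil(TOTAL*PERCENT/100)
min_healthy_count() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

# remediation_allowed TOTAL UNHEALTHY PERCENT: prints "yes" or "no"
remediation_allowed() {
  required=$(min_healthy_count "$1" "$3")
  healthy=$(( $1 - $2 ))
  if [ "$healthy" -ge "$required" ]; then
    echo yes
  else
    echo no
  fi
}

# condition_matches STATUS SECONDS_IN_STATUS DURATION_SECONDS: prints "yes" or "no"
# Mirrors the two unhealthyConditions: Ready is 'False' or 'Unknown'
# and has stayed that way for at least the configured duration.
condition_matches() {
  case "$1" in
    False|Unknown)
      if [ "$2" -ge "$3" ]; then echo yes; else echo no; fi ;;
    *)
      echo no ;;
  esac
}

for workers in 2 3 6; do
  echo "$workers workers: $(min_healthy_count "$workers" 51) must be healthy"
done
# 2 workers: 2 must be healthy
# 3 workers: 2 must be healthy
# 6 workers: 4 must be healthy
```

Under these assumptions, three workers tolerate one unhealthy node, six workers tolerate two, and two workers would block remediation entirely.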

### Update `self-node-remediation-automatic-strategy-template`

``` { .yaml .hl_lines="13" .annotate }
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
metadata:
  annotations:
    remediation.medik8s.io/multiple-templates-support: "true"
  labels:
    remediation.medik8s.io/default-template: "true"
  name: self-node-remediation-automatic-strategy-template
  namespace: openshift-workload-availability
spec:
  template:
    spec:
      remediationStrategy: OutOfServiceTaint # (1)!
```

1. The default is `Automatic`, but I want predictable behavior: `Automatic` picks the strategy at runtime. [Official documentation](https://docs.redhat.com/en/documentation/workload_availability_for_red_hat_openshift/23.2/html-single/remediation_fencing_and_maintenance/index#about-self-node-remediation-operator_self-node-remediation-operator-remediate-nodes)

    ```bash
    $ oc explain SelfNodeRemediationTemplate.spec.template.spec.remediationStrategy
    GROUP: self-node-remediation.medik8s.io
    KIND: SelfNodeRemediationTemplate
    VERSION: v1alpha1

    FIELD: remediationStrategy <string>

    DESCRIPTION:
        RemediationStrategy is the remediation method for unhealthy nodes.
        Currently, it could be either "Automatic", "OutOfServiceTaint" or
        "ResourceDeletion".
        ResourceDeletion will iterate over all pods and VolumeAttachment related to
        the unhealthy node and delete them.
        OutOfServiceTaint will add the out-of-service taint which is a new
        well-known taint "node.kubernetes.io/out-of-service"
        that enables automatic deletion of pv-attached pods on failed nodes,
        "out-of-service" taint is only supported on clusters with k8s version 1.26+
        or OCP/OKD version 4.13+.
        Automatic will choose the most appropriate strategy during runtime.
    ```
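
The `oc explain` output above shows why `Automatic` is less predictable: the strategy it ends up using depends on the cluster version. The shell sketch below encodes that rule. It is a hypothetical model rather than the operator's code, and it assumes `Automatic` resolves to `OutOfServiceTaint` on Kubernetes 1.26+ (OCP/OKD 4.13+) and to `ResourceDeletion` otherwise.

```shell
# Hypothetical helpers (not Self Node Remediation code).
# ASSUMPTION: "Automatic" resolves to OutOfServiceTaint when the cluster runs
# Kubernetes 1.26+ (OCP/OKD 4.13+) and to ResourceDeletion otherwise.

# supports_out_of_service_taint VERSION: prints "yes" or "no".
# Accepts versions such as "1.25", "1.27.6" or "v1.27.6+abc".
supports_out_of_service_taint() {
  v=${1#v}                 # drop a leading "v"
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}  # drop non-digit suffixes such as "+"
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 26 ]; }; then
    echo yes
  else
    echo no
  fi
}

# effective_strategy CONFIGURED VERSION: prints the strategy that would be used.
effective_strategy() {
  case "$1" in
    Automatic)
      if [ "$(supports_out_of_service_taint "$2")" = yes ]; then
        echo OutOfServiceTaint
      else
        echo ResourceDeletion
      fi ;;
    *)
      echo "$1" ;;   # an explicit strategy is used as configured
  esac
}

effective_strategy Automatic 1.25          # prints ResourceDeletion
effective_strategy Automatic v1.27.6       # prints OutOfServiceTaint
effective_strategy OutOfServiceTaint 1.27  # prints OutOfServiceTaint
```

The cluster's Kubernetes version, which is the second argument here, appears on the `Kubernetes Version` line of `oc version`.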