Commit 81aa90c

Add Node Health Check (#62)
1 parent ce24eaa commit 81aa90c

3 files changed: +90 -0 lines changed

.mdl_style.rb

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@
 exclude_rule 'MD033'
 exclude_rule 'MD007'
 rule 'MD003', :style => :atx
+rule 'MD029', :style => :ordered
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+---
+title: Node Health Check
+linktitle: Node Health Check
+description: OpenShift Virtualization - Fencing and VM High Availability Guide
+tags: ["kubevirt","ocp-v","cnv"]
+---
+# Node Health Check
+
+## Resources
+
+* [OpenShift Virtualization - Fencing and VM High Availability Guide](https://access.redhat.com/articles/7057929)
+
+## Installation & configuration
+
+* Install Operator "Node Health Check Operator"
+
+### Start operator for worker nodes
+
+``` { .yaml .annotate }
+apiVersion: remediation.medik8s.io/v1alpha1
+kind: NodeHealthCheck
+metadata:
+  name: worker-availability
+spec:
+  minHealthy: 51%
+  remediationTemplate:
+    apiVersion: self-node-remediation.medik8s.io/v1alpha1
+    kind: SelfNodeRemediationTemplate
+    name: self-node-remediation-automatic-strategy-template
+    namespace: openshift-workload-availability
+  selector:
+    matchExpressions:
+      - key: node-role.kubernetes.io/worker
+        operator: Exists
+        values: []
+  unhealthyConditions:
+    - duration: 1s # (1)!
+      status: 'False'
+      type: Ready
+    - duration: 1s # (2)!
+      status: Unknown
+      type: Ready
+```
+
+1. Adjust the duration to achieve the fastest VM recovery, according to the test results in [OpenShift Virtualization - Fencing and VM High Availability Guide](https://access.redhat.com/articles/7057929#test-results-9)
+2. Adjust the duration to achieve the fastest VM recovery, according to the test results in [OpenShift Virtualization - Fencing and VM High Availability Guide](https://access.redhat.com/articles/7057929#test-results-9)
+
+### Update `self-node-remediation-automatic-strategy-template`
+
+``` { .yaml .hl_lines="13" .annotate }
+apiVersion: self-node-remediation.medik8s.io/v1alpha1
+kind: SelfNodeRemediationTemplate
+metadata:
+  annotations:
+    remediation.medik8s.io/multiple-templates-support: "true"
+  labels:
+    remediation.medik8s.io/default-template: "true"
+  name: self-node-remediation-automatic-strategy-template
+  namespace: openshift-workload-availability
+spec:
+  template:
+    spec:
+      remediationStrategy: OutOfServiceTaint # (1)!
+```
+
+1. The default is "Automatic", but I want predictable behavior. [Official documentation](https://docs.redhat.com/en/documentation/workload_availability_for_red_hat_openshift/23.2/html-single/remediation_fencing_and_maintenance/index#about-self-node-remediation-operator_self-node-remediation-operator-remediate-nodes)
+
+```bash
+$ oc explain SelfNodeRemediationTemplate.spec.template.spec.remediationStrategy
+GROUP: self-node-remediation.medik8s.io
+KIND: SelfNodeRemediationTemplate
+VERSION: v1alpha1
+
+FIELD: remediationStrategy <string>
+
+DESCRIPTION:
+    RemediationStrategy is the remediation method for unhealthy nodes.
+    Currently, it could be either "Automatic", "OutOfServiceTaint" or
+    "ResourceDeletion".
+    ResourceDeletion will iterate over all pods and VolumeAttachment related to
+    the unhealthy node and delete them.
+    OutOfServiceTaint will add the out-of-service taint which is a new
+    well-known taint "node.kubernetes.io/out-of-service"
+    that enables automatic deletion of pv-attached pods on failed nodes,
+    "out-of-service" taint is only supported on clusters with k8s version 1.26+
+    or OCP/OKD version 4.13+.
+    Automatic will choose the most appropriate strategy during runtime.
+```
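
Reviewer sketch (not part of the commit): the `remediationStrategy` change documented in this file could probably also be made with a JSON merge patch instead of editing the full manifest. The namespace, template name, and the three accepted strategy values are taken from the diff above; the helper function names are hypothetical, and the `oc` calls assume a logged-in session with permission to modify the template.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; resource names come from the diff above.

NAMESPACE="openshift-workload-availability"
TEMPLATE="self-node-remediation-automatic-strategy-template"

# Build a JSON merge patch for the given strategy (no cluster access needed).
# Accepts only the three values listed in the `oc explain` output.
strategy_patch() {
  case "$1" in
    Automatic|OutOfServiceTaint|ResourceDeletion)
      printf '{"spec":{"template":{"spec":{"remediationStrategy":"%s"}}}}\n' "$1"
      ;;
    *)
      echo "unknown remediationStrategy: $1" >&2
      return 1
      ;;
  esac
}

# Apply the patch to the template; assumes a logged-in `oc` session.
set_remediation_strategy() {
  local patch
  patch="$(strategy_patch "$1")" || return 1
  oc patch selfnoderemediationtemplate "$TEMPLATE" \
    -n "$NAMESPACE" --type merge -p "$patch"
}

# Print the strategy currently stored in the template.
show_remediation_strategy() {
  oc get selfnoderemediationtemplate "$TEMPLATE" \
    -n "$NAMESPACE" -o jsonpath='{.spec.template.spec.remediationStrategy}{"\n"}'
}
```

After sourcing the snippet, `set_remediation_strategy OutOfServiceTaint` followed by `show_remediation_strategy` would apply the change and print the stored value. Building the patch in a separate function keeps the value check testable without a cluster.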

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -236,6 +236,7 @@ nav:
 
   - OpenShift Virtualization:
     - kubevirt/index.md
+    - Node Health Check: kubevirt/node-health-check.md
     - Templates: kubevirt/template.md
     - Ansible: kubevirt/ansible/README.md
     - Networking: kubevirt/networking.md
