Add Eviction Timeout Field for Maintenance Time #122

Closed
wants to merge 4 commits into from
3 changes: 3 additions & 0 deletions README.md
@@ -46,6 +46,8 @@ Follow the instructions [here](https://sdk.operatorframework.io/docs/building-op

To set maintenance on a node a `NodeMaintenance` custom resource should be created.
The `NodeMaintenance` CR spec contains:

- evictionTimeout: The timeout for evicting pods (by drain/delete) before giving up. Zero means no timeout (wait indefinitely); the default is 30s.
- nodeName: The name of the node which will be put into maintenance mode.
- reason: The reason why the node will be under maintenance.

@@ -58,6 +60,7 @@ kind: NodeMaintenance
metadata:
name: nodemaintenance-sample
spec:
evictionTimeout: "30s"
nodeName: node02
reason: "Test node maintenance"

12 changes: 10 additions & 2 deletions api/v1beta1/nodemaintenance_types.go
@@ -45,10 +45,18 @@ type NodeMaintenanceSpec struct {
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make" to regenerate code after modifying this file

// Node name to apply maintanance on/off
// EvictionTimeout is the timeout for pod eviction by drain/delete before giving up.
// Zero means infinite.
// Valid time units are "ms", "s", "m", "h".
// +kubebuilder:default:="30s"
// +kubebuilder:validation:Pattern="^(0|([0-9]+(\\.[0-9]+)?(ms|s|m|h)))$"
// +kubebuilder:validation:Type:=string
//+operator-sdk:csv:customresourcedefinitions:type=spec
EvictionTimeout metav1.Duration `json:"evictionTimeout,omitempty"`
// Node name to apply maintenance on/off
//+operator-sdk:csv:customresourcedefinitions:type=spec
NodeName string `json:"nodeName"`
// Reason for maintanance
// Reason for maintenance
//+operator-sdk:csv:customresourcedefinitions:type=spec
Reason string `json:"reason,omitempty"`
}
@@ -11,6 +11,7 @@ metadata:
"name": "nodemaintenance-sample"
},
"spec": {
"evictionTimeout": "30s",
"nodeName": "node02",
"reason": "Test node maintenance"
}
@@ -43,10 +44,15 @@ spec:
name: nodemaintenances
version: v1beta1
specDescriptors:
- description: Node name to apply maintanance on/off
- description: EvictionTimeout is the timeout for pod eviction by drain/delete
before giving up. Zero means infinite. Valid time units are "ms", "s", "m",
"h".
displayName: Eviction Timeout
path: evictionTimeout
- description: Node name to apply maintenance on/off
displayName: Node Name
path: nodeName
- description: Reason for maintanance
- description: Reason for maintenance
displayName: Reason
path: reason
statusDescriptors:
12 changes: 10 additions & 2 deletions bundle/manifests/nodemaintenance.medik8s.io_nodemaintenances.yaml
@@ -43,11 +43,19 @@ spec:
spec:
description: NodeMaintenanceSpec defines the desired state of NodeMaintenance
properties:
evictionTimeout:
default: 30s
description: |-
EvictionTimeout is the timeout for pod eviction by drain/delete before giving up.
Zero means infinite.
Valid time units are "ms", "s", "m", "h".
pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
type: string
nodeName:
description: Node name to apply maintanance on/off
description: Node name to apply maintenance on/off
type: string
reason:
description: Reason for maintanance
description: Reason for maintenance
type: string
required:
- nodeName
12 changes: 10 additions & 2 deletions config/crd/bases/nodemaintenance.medik8s.io_nodemaintenances.yaml
@@ -41,11 +41,19 @@ spec:
spec:
description: NodeMaintenanceSpec defines the desired state of NodeMaintenance
properties:
evictionTimeout:
default: 30s
description: |-
EvictionTimeout is the timeout for pod eviction by drain/delete before giving up.
Zero means infinite.
Valid time units are "ms", "s", "m", "h".
pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
type: string
nodeName:
description: Node name to apply maintanance on/off
description: Node name to apply maintenance on/off
type: string
reason:
description: Reason for maintanance
description: Reason for maintenance
type: string
required:
- nodeName
@@ -28,10 +28,15 @@ spec:
name: nodemaintenances
version: v1beta1
specDescriptors:
- description: Node name to apply maintanance on/off
- description: EvictionTimeout is the timeout for pod eviction by drain/delete
before giving up. Zero means infinite. Valid time units are "ms", "s", "m",
"h".
displayName: Eviction Timeout
path: evictionTimeout
- description: Node name to apply maintenance on/off
displayName: Node Name
path: nodeName
- description: Reason for maintanance
- description: Reason for maintenance
displayName: Reason
path: reason
statusDescriptors:
@@ -3,5 +3,6 @@ kind: NodeMaintenance
metadata:
name: nodemaintenance-sample
spec:
evictionTimeout: "30s"
nodeName: node02
reason: "Test node maintenance"
4 changes: 3 additions & 1 deletion controllers/controllers_suite_test.go
@@ -20,6 +20,7 @@ import (
"context"
"path/filepath"
"testing"
"time"

"github.com/medik8s/common/pkg/lease"
. "github.com/onsi/ginkgo/v2"
@@ -95,7 +96,8 @@ var _ = BeforeSuite(func() {
logger: ctrl.Log.WithName("unit test"),
}
ctx, cancel = context.WithCancel(ctrl.SetupSignalHandler())
drainer, err = createDrainer(ctx, cfg)
// 30 seconds, matching the previous DrainerTimeout; time.Duration(30) would be 30 nanoseconds
evictionTimeout := 30 * time.Second
drainer, err = createDrainer(ctx, evictionTimeout, cfg)
Expect(err).NotTo(HaveOccurred())
// in test pods are not evicted, so don't wait forever for them
drainer.SkipWaitForDeleteTimeoutSeconds = 0
8 changes: 3 additions & 5 deletions controllers/nodemaintenance_controller.go
@@ -52,7 +52,6 @@ const (
//lease consts
LeaseHolderIdentity = "node-maintenance"
LeaseDuration = 3600 * time.Second
DrainerTimeout = 30 * time.Second
)

// NodeMaintenanceReconciler reconciles a NodeMaintenance object
@@ -118,7 +117,7 @@ func (r *NodeMaintenanceReconciler) Reconcile(ctx context.Context, req ctrl.Requ
}

// Add finalizer when object is created
drainer, err := createDrainer(ctx, r.MgrConfig)
drainer, err := createDrainer(ctx, nm.Spec.EvictionTimeout.Duration, r.MgrConfig)
if err != nil {
return emptyResult, err
}
@@ -229,7 +228,7 @@ func (r *NodeMaintenanceReconciler) Reconcile(ctx context.Context, req ctrl.Requ
}

// createDrainer creates a drain.Helper struct for external cordon and drain API
func createDrainer(ctx context.Context, mgrConfig *rest.Config) (*drain.Helper, error) {
func createDrainer(ctx context.Context, evictionTimeout time.Duration, mgrConfig *rest.Config) (*drain.Helper, error) {
drainer := &drain.Helper{}

//Continue even if there are pods not managed by a ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet.
@@ -254,9 +253,8 @@ func createDrainer(ctx context.Context, mgrConfig *rest.Config) (*drain.Helper,
//Period of time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used.
drainer.GracePeriodSeconds = -1

// TODO - add logical value or attach from the maintenance CR
//The length of time to wait before giving up, zero means infinite
drainer.Timeout = DrainerTimeout
drainer.Timeout = evictionTimeout

cs, err := kubernetes.NewForConfig(mgrConfig)
if err != nil {
2 changes: 1 addition & 1 deletion controllers/taint.go
@@ -56,7 +56,7 @@ func AddOrRemoveTaint(clientset kubernetes.Interface, add bool, node *corev1.Nod
return err
}
taintStr = "remove"
log.Infof("Maintenance taints will be removed from node %s", node.Name)
log.Infof("Maintenance taints will be removed from node %s", node.Name)
patch = fmt.Sprintf(`{ "op": "replace", "path": "/spec/taints", "value": %s }`, string(removeTaints))
}

11 changes: 7 additions & 4 deletions test/e2e/e2e_suite_test.go
@@ -19,14 +19,15 @@ import (
"fmt"
"os"
"testing"
"time"

. "github.com/onsi/ginkgo/v2"
"github.com/onsi/ginkgo/v2/reporters"
. "github.com/onsi/gomega"

corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

"github.com/medik8s/node-maintenance-operator/api/v1beta1"
)
@@ -41,8 +42,10 @@ var (
// The ns for test deployments
testNsName string
testNamespace *corev1.Namespace
//namespace leases are created in
// namespace leases are created in
leaseNs = "medik8s-leases"
// eviction timeout passed to test CRs (50s here; the CRD default is 30s)
evicitonTimeout = time.Second * 50
)

var _ = BeforeSuite(func() {
@@ -52,7 +55,7 @@ var _ = BeforeSuite(func() {
testNsName = os.Getenv("TEST_NAMESPACE")
Expect(testNsName).ToNot(BeEmpty(), "TEST_NAMESPACE env var not set, can't start e2e test")
testNamespace = &corev1.Namespace{
ObjectMeta: v1.ObjectMeta{
ObjectMeta: metav1.ObjectMeta{
Name: testNsName,
},
}
@@ -66,7 +69,7 @@ var _ = BeforeSuite(func() {
Expect(err).ToNot(HaveOccurred())

// wait until webhooks are up and running by trying to create a CR and ignoring unexpected errors
testCR := getNodeMaintenance("webhook-test", "some-not-existing-node-name")
testCR := getNodeMaintenance("webhook-test", "some-not-existing-node-name", evicitonTimeout)
_ = createCRIgnoreUnrelatedErrors(testCR)
})
