
kops 1.31 - kops update cluster instance-group-role targeting causes side-effects #17294

@mkoepke-xion

/kind bug

1. What kops version are you running? The command kops version, will display
this information.

1.31.0

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.31.6

4. What commands did you run? What is the simplest way to reproduce this issue?

kops update cluster --instance-group-role=control-plane --yes
kops rolling-update cluster --instance-group-role=control-plane --yes
kops update cluster --instance-group-role=node --yes
kops rolling-update cluster --instance-group-role=node --yes
kops rolling-update cluster
NAME		STATUS		NEEDUPDATE	READY	MIN	TARGET	MAX	NODES
bastion-zone-01	Ready		0		1	1	1	2	0
master-zone-01	Ready		1		1	1	1	1	1
master-zone-02	Ready		1		1	1	1	1	1
master-zone-03	Ready		1		1	1	1	1	1
node-zone-01	NeedsUpdate	0		3	3	3	6	3
node-zone-02	NeedsUpdate	0		3	3	3	6	3
node-zone-03	NeedsUpdate	0		3	3	3	6	3

kops update cluster --instance-group-role=control-plane

I0228 11:25:46.077157    2426 executor.go:113] Tasks: 0 done / 123 total; 53 can run
I0228 11:25:47.337779    2426 executor.go:113] Tasks: 53 done / 123 total; 45 can run
I0228 11:25:47.642155    2426 executor.go:113] Tasks: 98 done / 123 total; 9 can run
I0228 11:25:47.960137    2426 executor.go:113] Tasks: 107 done / 123 total; 2 can run
I0228 11:25:48.263454    2426 executor.go:113] Tasks: 109 done / 123 total; 5 can run
I0228 11:25:49.489246    2426 executor.go:113] Tasks: 114 done / 123 total; 6 can run
I0228 11:25:49.936979    2426 executor.go:113] Tasks: 120 done / 123 total; 3 can run
I0228 11:25:51.164291    2426 executor.go:113] Tasks: 123 done / 123 total; 0 can run
Will modify resources:
  ManagedFile/xi-paas-staging.xiaas2.k8s.local-addons-bootstrap
  	Contents
  	                    	...
  	                    	    - id: k8s-1.16
  	                    	      manifest: networking.cilium.io/k8s-1.16-v1.15.yaml
  	                    	+     manifestHash: e36e05e73bb68c69546064c016187d99c57bbe154167a58555bc93d16844604a
  	                    	-     manifestHash: 9133199d404c1951330e82c5fb81441e512f6dbb64c31f9e537f9a8595a1565b
  	                    	      name: networking.cilium.io
  	                    	      needsRollingUpdate: all
  	                    	...


  ManagedFile/xi-paas-staging.xiaas2.k8s.local-addons-networking.cilium.io-k8s-1.16
  	Contents
  	                    	...
  	                    	    namespace: kube-system
  	                    	  spec:
  	                    	+   replicas: 2
  	                    	-   replicas: 1
  	                    	    selector:
  	                    	      matchLabels:
  	                    	...

5. What happened after the commands executed?

Control-plane nodes were marked as NeedsUpdate.

6. What did you expect to happen?

No node in the cluster should need an update, since we ran a complete update of the cluster, with update and rolling-update targeting all nodes.

9. Anything else do we need to know?

We are running Cilium with an HA control plane. Running kops update --instance-group-role=node seems to make the Cilium templating conclude that the control plane is not HA, and thus change the replica setting.

Probably in this code:

if tf.HasHighlyAvailableControlPlane() {
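
A minimal sketch of the suspected mechanism, not the actual kOps code. It assumes that role targeting filters the instance group list before the HA check runs, and that "HA" means more than one control-plane group is visible. The types and the functions filterByRole, hasHighlyAvailableControlPlane and addonReplicas are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Role is a simplified stand-in for an instance group role.
type Role string

const (
	RoleControlPlane Role = "control-plane"
	RoleNode         Role = "node"
)

// InstanceGroup is a hypothetical, minimal model of an instance group.
type InstanceGroup struct {
	Name string
	Role Role
}

// filterByRole mimics role targeting: only groups with the given role are kept.
func filterByRole(groups []InstanceGroup, role Role) []InstanceGroup {
	var out []InstanceGroup
	for _, g := range groups {
		if g.Role == role {
			out = append(out, g)
		}
	}
	return out
}

// hasHighlyAvailableControlPlane reports whether more than one control-plane
// group is present in the list it is given (assumed HA criterion).
func hasHighlyAvailableControlPlane(groups []InstanceGroup) bool {
	count := 0
	for _, g := range groups {
		if g.Role == RoleControlPlane {
			count++
		}
	}
	return count > 1
}

// addonReplicas picks the replica count seen in the diff: 2 when HA, 1 otherwise.
func addonReplicas(groups []InstanceGroup) int {
	if hasHighlyAvailableControlPlane(groups) {
		return 2
	}
	return 1
}

func main() {
	all := []InstanceGroup{
		{Name: "master-zone-01", Role: RoleControlPlane},
		{Name: "master-zone-02", Role: RoleControlPlane},
		{Name: "master-zone-03", Role: RoleControlPlane},
		{Name: "node-zone-01", Role: RoleNode},
	}
	fmt.Println(addonReplicas(all))                         // full list: 2
	fmt.Println(addonReplicas(filterByRole(all, RoleNode))) // node-only view: 1
}
```

If the real check runs on a filtered list like this, a node-only update would render the manifest with replicas: 1, and the next control-plane update would flip it back to 2, which matches the diff above.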

I understand that --instance-group-role targeting for kops update was introduced in 1.31 because of upstream changes in Kubernetes, and that it is a fairly recent addition to solve a specific issue (enabling reconcile to target the control plane first).

We have some automation around kops for our own updates. We use instance-group and instance-group-role targeting to roll out our changes in a controlled manner.

Comparing our (1.31 adjusted) workflow to kOps reconcile yields one simple difference:

  • reconcile does update control-plane and then update all
  • we do update control-plane and then update nodes

While it is easy to fix our side to work like reconcile, it makes me wonder:
What is the expectation on the kOps side? Should kops update --instance-group-role work without side effects when only the node role is targeted, or is targeting only nodes not a supported use?

Labels

kind/bug, lifecycle/rotten
