
OCPBUGS-22293: upgrade from 4.13 2023-09-30 to 2023-10-28 stuck on network CO because of ovnkube-master DS preStop #1775

Closed
kai-uwe-rommel opened this issue Oct 29, 2023 · 14 comments

Comments

@kai-uwe-rommel

Describe the bug
The upgrade is stuck at the network cluster operator. It looks like it cannot update ovnkube-master. The error message is:
Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: DaemonSet.apps "ovnkube-master" is invalid: [spec.template.spec.containers[0].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[1].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[3].lifecycle.preStop: Required value: must specify a handler type]
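For context, the API server rejects any lifecycle.preStop block that does not name a handler (exec, httpGet, or tcpSocket). A minimal illustration of the difference, not copied from the cluster:

# Illustrative only: preStop present but without a handler type; this is what
# the DaemonSet validation rejects with "must specify a handler type"
lifecycle:
  preStop: {}
---
# Illustrative only: a preStop hook the API server accepts names a handler, e.g. exec
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "echo draining"]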

Version
OKD 4.13 2023-09-30 to 2023-10-28

How reproducible
Happened just now during the upgrade of the first cluster.

Log bundle
Will create one.

@kai-uwe-rommel
Author

Apparently, the problem is already known:
https://issues.redhat.com/browse/OCPBUGS-22293

@llomgui

llomgui commented Oct 29, 2023

Hello,

Same error here during an upgrade from 4.13.0-0.okd-2023-09-30-084937 to 4.13.0-0.okd-2023-10-28-065448

@vrutkovs vrutkovs pinned this issue Oct 29, 2023
@madpearl

This comment was marked as duplicate.

@LorbusChris LorbusChris changed the title upgrade from 4.13 2023-09-30 to 2023-10-28 stuck on network CO because of ovnkube-master DS preStop OCPBUGS-22293: upgrade from 4.13 2023-09-30 to 2023-10-28 stuck on network CO because of ovnkube-master DS preStop Oct 30, 2023
@kai-uwe-rommel
Author

I have heard from someone else who did the same upgrade step without a problem.
But that cluster was much younger (installed in spring 2023),
while the one I reported here was initially installed in February 2021 with OKD 4.6 ...

@kai-uwe-rommel
Author

I compared the ovnkube-master DS configs of both clusters and found that the newer, unaffected cluster did not have any lifecycle.preStop stanzas. I removed the three of them from the ovnkube-master DS on my affected cluster, and the CNO then progressed and completed the upgrade.
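A sketch of how such a comparison can be made (the filenames are just examples):

oc -n openshift-ovn-kubernetes get ds ovnkube-master -o yaml > ovnkube-master-affected.yaml
# run the same command against the unaffected cluster, then:
diff ovnkube-master-affected.yaml ovnkube-master-unaffected.yaml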

@llomgui

llomgui commented Oct 30, 2023

@kai-uwe-rommel Can you provide the YAML of the unaffected cluster?

@kai-uwe-rommel
Author

My cluster has now upgraded successfully.
All you need to do is run:
oc edit ds ovnkube-master -n openshift-ovn-kubernetes
and remove the three lifecycle.preStop stanzas from the YAML.
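A non-interactive alternative, sketched under the assumption that the containers with the broken hooks are at indices 0, 1, and 3 as in the validation error above (verify the indices on your cluster first):

# remove the three preStop hooks in one JSON patch (indices taken from the error message)
oc -n openshift-ovn-kubernetes patch ds ovnkube-master --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/lifecycle/preStop"},
  {"op": "remove", "path": "/spec/template/spec/containers/1/lifecycle/preStop"},
  {"op": "remove", "path": "/spec/template/spec/containers/3/lifecycle/preStop"}
]'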

@kai-uwe-rommel
Author

kai-uwe-rommel commented Oct 30, 2023

Here is a diff of this object. Remove only the three "preStop:" stanzas and ignore (do not touch) the other changes.

ovnkube-master-diff.txt

@llomgui

llomgui commented Oct 30, 2023

Thank you!

I had to manually force delete "terminating" ovnkube-master pods after the DS modification to continue the update.

Still ongoing...
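For reference, force deletion of a stuck terminating pod looks roughly like this (the pod name is a placeholder):

oc -n openshift-ovn-kubernetes delete pod ovnkube-master-xxxxx --force --grace-period=0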

@alishchytovych

@llomgui you just need to wait a bit until the pods get synchronized. After that the update will continue...

@kai-uwe-rommel
Author

By the way, it does not help to remove the lifecycle.preStop entries in advance of the upgrade; the old CNO puts them back in. You have to run into the problem and then fix it.
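A quick way to check whether a cluster's DS currently carries the hooks (a sketch; the exact output formatting may vary):

oc -n openshift-ovn-kubernetes get ds ovnkube-master \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.lifecycle.preStop}{"\n"}{end}'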

@kai-uwe-rommel
Author

It is quite mysterious. I just upgraded a cluster that was on the exact same version as the cluster for which I opened this issue. I checked in advance and it had the preStop entries. Nevertheless, the upgrade went through without any problem.

@alishchytovych

It is quite mysterious. I just upgraded a cluster that was on the exact same version as the cluster for which I opened this issue. I checked in advance and it had the preStop entries. Nevertheless, the upgrade went through without any problem.

Same here. But the new cluster was installed later from different AMIs (AWS).

@kai-uwe-rommel
Author

Seems to be solved in later versions.
