Enhance K8s upgrade validation #109

ipetrov117 · 2024-11-22T15:30:13Z

Current status

We currently validate that the K8s upgrade has succeeded by checking that the node is schedulable, in a Ready state, and running the desired K8s version. If these validations pass, we proceed to the workload upgrades.
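
For illustration, the check described above could be sketched roughly as follows. This is a simplified sketch, not the upgrade-controller's actual code: `NodeState` and `nodeUpgraded` are hypothetical names, and the fields only mirror the node's Ready condition, `spec.unschedulable` and `status.nodeInfo.kubeletVersion`.

```go
package main

// NodeState is a hypothetical, simplified view of a Kubernetes node.
// It is an assumption made for this sketch, not a type from the controller.
type NodeState struct {
	Ready          bool   // status of the node's Ready condition
	Unschedulable  bool   // mirrors spec.unschedulable (true while cordoned)
	KubeletVersion string // mirrors status.nodeInfo.kubeletVersion
}

// nodeUpgraded reports whether the node passes the three checks listed
// above: Ready, schedulable, and running the desired K8s version.
func nodeUpgraded(n NodeState, desiredVersion string) bool {
	return n.Ready && !n.Unschedulable && n.KubeletVersion == desiredVersion
}
```

Note that nothing in this check looks at the core applications running on the node, which is the gap described in the next section.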

Problem

While the node is marked as Ready, Schedulable and has the correct version, its underlying core applications are still being recreated. At the same time the upgrade-controller proceeds to do workload upgrades.

This results in the workload upgrades possibly failing for a number of reasons, for example:

1. A workload upgrade needs network access, which cannot currently be given because the core application for networking is being recreated.
2. A workload upgrade needs ingress access, which cannot currently be given because the core ingress controller application is being recreated.

(1) and (2) are just examples; many other clashes are possible. These failures can persist through our retry mechanism, so the controller may mark the upgrade as failed even though it actually succeeded and simply took significantly longer to complete (because the core applications were being recreated).

Conclusion

We need to enhance the K8s upgrade validation to also validate the state of the K8s core components before marking the K8s upgrade as complete.

ipetrov117 · 2024-11-26T14:08:25Z

The most achievable approach here would be to wait until the upgrade Jobs of all helm-controller-managed HelmCharts have completed before proceeding to the chart upgrades initiated by the upgrade-controller. This would ensure that the two sets of upgrades do not clash.
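
A minimal sketch of that gating logic, under stated assumptions: `ChartJob`, `pendingCharts` and `coreComponentsReady` are hypothetical names, and the struct is a simplified stand-in for whatever the controller would actually read from the HelmChart resources and their Jobs. It is not the implementation merged in #116.

```go
package main

// ChartJob is a hypothetical, simplified record of the upgrade Job that
// helm-controller runs for one HelmChart. It is an assumption made for
// this sketch, not a type from helm-controller or the upgrade-controller.
type ChartJob struct {
	Chart    string // name of the HelmChart the Job belongs to
	Complete bool   // true once the Job has finished successfully
}

// pendingCharts returns the names of charts whose upgrade Job has not
// completed yet, in the order they were given.
func pendingCharts(jobs []ChartJob) []string {
	var pending []string
	for _, j := range jobs {
		if !j.Complete {
			pending = append(pending, j.Chart)
		}
	}
	return pending
}

// coreComponentsReady reports whether every helm-controller managed chart
// has a completed upgrade Job, i.e. whether workload upgrades may start.
func coreComponentsReady(jobs []ChartJob) bool {
	return len(pendingCharts(jobs)) == 0
}
```

In a reconcile loop, the controller would presumably requeue while `coreComponentsReady` returns false, and could log the result of `pendingCharts` to show which core components are still being recreated.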

ipetrov117 self-assigned this Nov 26, 2024

ipetrov117 mentioned this issue Dec 10, 2024

K8s core component validation #116

Merged

5 tasks

ipetrov117 closed this as completed in #116 Dec 11, 2024
