[prometheus-operator-admission-webhook] job-patchWebhook runs too late #5607


Open
TheRealNoob opened this issue May 5, 2025 · 11 comments · May be fixed by #5608
Labels
bug Something isn't working

Comments

@TheRealNoob
Contributor

Describe the bug

When using the default jobs.enabled=true to create the TLS key pairs, the job-patchWebhook runs too late in the release life-cycle. If you deploy any PrometheusRule objects as part of the release, they fail with the error `unknown Certificate Authority`, causing the helm release to error out. At present the patch job is applied with `"helm.sh/hook": post-install,post-upgrade`, which means it runs after all other resources are applied.

What are the scenarios where a prometheusRule could be deployed with the release?

  • This chart has native support for it via prometheusRule.enabled, but it is disabled by default
  • This chart is used as part of another chart (i.e. as a subchart), and one of the other charts deploys PrometheusRule objects

Proposed solution

I think it makes the most sense to run job-patchWebhook with `"helm.sh/hook": pre-install`, which is what job-createSecret uses. The only thing we need to ensure is that the patch job runs after the create job, and we can do that with `helm.sh/hook-weight: <int>`.
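A sketch of what the proposed annotations could look like; the hook phases and weight values here are illustrative, not taken from the chart:

```yaml
# job-createSecret: generates the TLS key pair, so it must run first.
metadata:
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"   # lower weight runs earlier
---
# job-patchWebhook: injects the caBundle, so it must run after createSecret.
metadata:
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "5"    # higher weight runs later
```

Helm sorts hooks of the same phase by ascending weight, so any pair of values with createSecret lower than patchWebhook would give the required ordering.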

What's your helm version?

version.BuildInfo{Version:"v3.12.0", GitCommit:"c9f554d75773799f72ceef38c51210f1842a1dea", GitTreeState:"clean", GoVersion:"go1.20.3"}

What's your kubectl version?

mike@EliteBook-840:~$ k version
Client Version: v1.30.11
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.7-eks-bcf3d70

Which chart?

prometheus-operator-admission-webhook

What's the chart version?

0.24.1

What happened?

helm release fails

What you expected to happen?

caBundle needs to be defined before prometheusRule objects are applied

How to reproduce it?

NA

Enter the changed values of values.yaml?

prometheusRule.enabled=true

Enter the command that you execute and failing/misfunctioning.

helm install prometheus-operator-admission-webhook prometheus-community/prometheus-operator-admission-webhook --set prometheusRule.enabled=true

Anything else we need to know?

Nope!

@TheRealNoob TheRealNoob added the bug Something isn't working label May 5, 2025
@TheRealNoob
Contributor Author

cc @zeritti

@zeritti
Contributor

zeritti commented May 5, 2025

Thanks, @TheRealNoob, for raising this issue. This is indeed a known problem. Once a mutating/validating webhook configuration is present, the API server will contact the admission webhook as soon as a PrometheusRule is submitted. At install time, this fails (assuming a custom CA is being used) because the patch job has not yet run and the configuration has not been patched with the CA bundle.
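For context, the field the patch job has to fill in is `caBundle` under `clientConfig`; a trimmed, illustrative sketch (the names here are assumptions, not taken from the chart):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: prometheus-operator-admission-webhook   # illustrative name
webhooks:
  - name: prometheusrulemutate.monitoring.coreos.com
    clientConfig:
      service:
        name: prometheus-operator-admission-webhook
        namespace: default
        path: /admission-prometheusrules/validate
      caBundle: ""   # empty until the patch job injects the CA certificate
```

Until `caBundle` is populated, the API server cannot verify the webhook's serving certificate, which is what surfaces as the `unknown Certificate Authority` error.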

I used to work around this problem in two steps when installing: 1. install a release with the PrometheusRule disabled and 2. upgrade the release with the rule enabled. I always preferred using cert-manager wherever possible, though.

Using a hook weight would be elegant. In this case, though, the patch job has to patch an existing resource - if the job runs as a pre-install hook at install time, the resource, i.e. the validating/mutating webhook configuration, is not yet present and the job will fail.

@TheRealNoob
Contributor Author

Interesting problem... I spent some time thinking about the different approaches. The cleanest solution I came up with was to make the validating/mutating WebhookConfiguration objects pre-install hooks as well. And it turns out someone has already added support for just that in kube-prometheus-stack.

It's possible that this could cause errors if someone is applying PrometheusRule/Alertmanager objects in another parallel release, but I'm chalking that up as an acceptable risk to get a clean release.

Thoughts?

@TheRealNoob
Contributor Author

So I updated the MR with the proposed changes above. I tested it out on minikube and found that it does work, but a new problem arose: when helm gets to applying the PrometheusRule objects, the API server makes the admission call to the operator, and the request times out because the operator is still starting up.

@TheRealNoob
Contributor Author

Not sure how to address that issue. Helm doesn't have any concept of a forced sleep unless you go crazy with a Job that runs a sleep command. Even so, I still consider the above change an improvement: at least now, when we re-run the release with the same values file, it works.

We use ArgoCD at my day job; not sure how familiar you are with it. If Argo does a release and it fails, it retries 4 more times before marking it as failed. I figure at least in this setup it will now succeed on one of those attempts.

@TheRealNoob
Contributor Author

Come to think of it, the chart probably has the same issue in the cert-manager approach.

@zeritti
Contributor

zeritti commented May 7, 2025

So I updated the MR with the proposed changes above. I tested it out on minikube and found that it does work, but a new problem arose: when helm gets to applying the PrometheusRule objects, the API server makes the admission call to the operator, and the request times out because the operator is still starting up.

Yes, this is what happens, indeed. It will be ok in the end if the install gets retried.

The thing with making the configurations into hooks is that, as hooks, they get deleted, i.e. there will be a moment on upgrades when the configurations are absent. I also think the configurations won't be deleted when the release is deleted, as they are not part of the release. These points should also be considered.

@TheRealNoob
Contributor Author

Sorry, I'm struggling to follow. Are you saying that the ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects get deleted and recreated during a chart upgrade? Why would that be? I can't even imagine there being changes to those objects.

@TheRealNoob
Contributor Author

Perhaps you're referring to the `"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded` annotation? I did not include that annotation on the *WebhookConfiguration objects.

@zeritti
Contributor

zeritti commented May 7, 2025

Perhaps you're referring to the `"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded` annotation? I did not include that annotation on the *WebhookConfiguration objects.

Yes, this is what I meant. If you do not set it, the default applies (Ref.):

If no hook deletion policy annotation is specified, the before-hook-creation behavior applies by default.
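To make the default concrete, here is what the annotation looks like when set explicitly on a hook resource (the hook phase shown is illustrative); per the Helm docs, omitting the annotation entirely behaves the same as the first line shown in the comment:

```yaml
metadata:
  annotations:
    "helm.sh/hook": pre-install
    # If "helm.sh/hook-delete-policy" is omitted, Helm behaves as if it were
    # set to "before-hook-creation": the previous hook resource is deleted
    # just before a new one is created on the next install/upgrade.
    "helm.sh/hook-delete-policy": before-hook-creation
```

So there is no window where the annotation's absence means "never delete" - the before-hook-creation behavior applies either way.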

@TheRealNoob
Contributor Author

Ah, I see, I didn't know that. I assumed it defaulted to blank, so we would need to define it as blank explicitly. I'll give this a try and make sure it works. Assuming it does, are you on board with this approach?
