Helm upgrade is often duplicated, causing issues with Jobs #2869

mikmatko · 2024-09-18T07:06:00Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

Helm upgrade appears to be called twice after a change or force update. This occurs most of the time, but not always.

Logs from a downstream cluster fleet-agent-0 pod:

Working scenario, Helm deployment is called only once:

{"level":"info","ts":"2024-09-18T06:36:20Z","logger":"bundledeployment.HelmDeployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"7c7639d9-f1ba-49c2-8785-b6e582c166a8","commit":"239b40d88e01e2db8d80eabeb891384b25e76311","dryRun":false}
{"level":"info","ts":"2024-09-18T06:36:37Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"7c7639d9-f1ba-49c2-8785-b6e582c166a8","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:9190b78396800308e5260944df675a40cb47e4b1c5e2180b7e70be580d38608f","release":"mikko-debug/mikko-debug:86","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452"}
{"level":"info","ts":"2024-09-18T06:36:37Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"b19b7a49-3c5f-47ad-be0f-375a7ebf5d22","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","release":"mikko-debug/mikko-debug:86","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452"}

Then a bit later after pushing Force Update through Rancher UI (same occurs on a single new commit too):

{"level":"info","ts":"2024-09-18T06:38:01Z","logger":"bundledeployment.HelmDeployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5a20d39e-ed74-4f98-ba97-40b6b79767e9","commit":"239b40d88e01e2db8d80eabeb891384b25e76311","dryRun":false}
{"level":"info","ts":"2024-09-18T06:38:17Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5a20d39e-ed74-4f98-ba97-40b6b79767e9","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:17Z","logger":"bundledeployment.HelmDeployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5ae5c18b-b800-4e8b-990e-5ce2e448bb4d","commit":"239b40d88e01e2db8d80eabeb891384b25e76311","dryRun":false}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5ae5c18b-b800-4e8b-990e-5ce2e448bb4d","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","release":"mikko-debug/mikko-debug:88","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"f327af12-2651-4550-95ba-aef2a49271c4","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"f327af12-2651-4550-95ba-aef2a49271c4","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"c80e519f-b4fd-416e-911d-d689731329da","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:88","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"47736280-52da-4da1-8114-412c7d84bf46","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"47736280-52da-4da1-8114-412c7d84bf46","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"49bfc5f3-0320-4e93-876b-c141ee5f1f7e","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:88","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"62f0793c-b15a-4bd3-9c70-46c67c7e9a7d","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:35Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"62f0793c-b15a-4bd3-9c70-46c67c7e9a7d","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:36Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"d03de5d9-5438-4cd8-940e-f61106942130","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:37Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"d03de5d9-5438-4cd8-940e-f61106942130","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:38Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"97c74c52-0b6f-4023-9e40-b1c51f078e72","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:39Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"97c74c52-0b6f-4023-9e40-b1c51f078e72","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}

As can be seen from the logs, Helm upgrade is called twice. As a result, Fleet thinks that a Job object is suddenly missing:

job.batch mikko-debug/mikko-debug-job-87 missing

Although Fleet performed two Helm upgrade operations, it still seems to think that it performed only one, so it looks for an object from the previous Helm release. This leaves the Bundle in a modified state.

Expected Behavior

Helm upgrade is called only once per change.

Steps To Reproduce

I believe this issue can be seen with any chart, but it is more apparent if you have any Job in the chart. Doesn't seem to matter what options are provided in fleet.yaml etc.

Environment

- Architecture: x86
- Fleet Version: v0.10.2
- Cluster:
  - Provider: GKE
  - Options:
  - Kubernetes Version: v1.30.4-gke.1213000

Logs

No response

Anything else?

No response

manno · 2024-09-23T16:16:15Z

We're looking into the duplicated deploy.
However, having a job in a bundle is problematic. Here is an older blog post that suggests choosing a random name for the job: https://www.suse.com/c/rancher_blog/rancher-fleet-tips-for-kubernetes-jobs-deployment-strategies-in-continuous-delivery-scenarios/

If the job is idempotent a random name would work. We're also researching if jobs can be ignored with bundle diffs.

mikmatko · 2024-09-24T06:37:35Z

Thanks for the response. In all cases where I've seen this issue, I'm using the following pattern for Job naming:

metadata:
  name: whatever-job-name-{{ .Release.Revision }}

Where {{ .Release.Revision }} means the Helm revision number, which is incremented on each Helm upgrade. I believe what you're suggesting in the blog post, using something like {{ randAlphaNum 8 | lower }} does not make any difference. The Job name is already unique, you wouldn't be able to re-deploy a Job with the same name anyway.

When Fleet initiates Helm upgrade twice, the previous Job instance is deleted in between the two upgrades. I would be happy to use a workaround that keeps the previous Job instances, but for some reason they are automatically cleaned up (I don't have any TTL set; that would cause issues too). In a way this can be achieved by using helm.sh/resource-policy: keep, but Fleet conflicts with that too: the bundle then complains about orphaned resources.
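To illustrate why the Job from revision 87 goes missing: Helm templates are Go text/template templates, so the naming pattern above renders to a different Job name on every revision. A second upgrade therefore renders a manifest that no longer contains the -87 Job, and Helm removes resources that are absent from the new manifest. The sketch below only reproduces the name rendering with the standard library; the release and values types are hypothetical stand-ins for Helm's built-in objects, not Helm code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// release and values are hypothetical stand-ins for Helm's built-in
// .Release object; only the Revision field is modelled.
type release struct{ Revision int }
type values struct{ Release release }

// jobName renders the Job name pattern from the thread for one revision.
func jobName(revision int) (string, error) {
	tmpl, err := template.New("name").Parse("whatever-job-name-{{ .Release.Revision }}")
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, values{Release: release{Revision: revision}}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Two consecutive upgrades yield two different Job names.
	for _, rev := range []int{87, 88} {
		name, err := jobName(rev)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```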

Currently I don't think bundle diffs support ignoring Jobs. Similar related issues: #748 and #2051

My actual use case for running the Jobs is using a Helm post-upgrade hook to notify our Jenkins instance to start running a test set after a successful deployment. I'm also using Jobs to run database migrations in several backend services. However, I reproduced this issue of duplicated Helm upgrades even with a simple single Job, so it doesn't seem related to using Helm hooks etc.

jhoblitt · 2024-09-27T00:00:57Z

@mikmatko After an incredible number of different attempts to get jobs to "not be a problem", I have been setting a helm hook to remove jobs to prevent collisions between deployments. E.g.

---
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation

manno · 2024-09-30T09:46:22Z

I added #2051 (comment) to the 2.10 milestone, to implement "ignore resources".

mikmatko · 2024-10-03T16:54:24Z

Spent a while debugging the duplicated upgrade issue on current HEAD (41d3f52).

Here is a horrible workaround that adds a small jitter to BundleDeploymentReconciler before it fetches the latest BundleDeployment from cluster:

diff --git a/internal/cmd/agent/controller/bundledeployment_controller.go b/internal/cmd/agent/controller/bundledeployment_controller.go
index 4d516227..e6ce4ee1 100644
--- a/internal/cmd/agent/controller/bundledeployment_controller.go
+++ b/internal/cmd/agent/controller/bundledeployment_controller.go
@@ -10,6 +10,7 @@ import (
 	"github.com/rancher/fleet/internal/cmd/agent/deployer/driftdetect"
 	"github.com/rancher/fleet/internal/cmd/agent/deployer/monitor"
 	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
+	"golang.org/x/exp/rand"
 
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	"k8s.io/apimachinery/pkg/runtime"
@@ -99,6 +100,10 @@ func (r *BundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
 	ctx = log.IntoContext(ctx, logger)
 	key := req.String()
 
+	// add small jitter to avoid duplicated deployments
+	rand.Seed(uint64(time.Now().UnixNano()))
+	time.Sleep(time.Duration(rand.Intn(5)+2) * time.Second)
+
 	// get latest BundleDeployment from cluster
 	bd := &fleetv1.BundleDeployment{}
 	err := r.Get(ctx, req.NamespacedName, bd)

With this patch, this condition

fleet/internal/cmd/agent/deployer/deployer.go

Line 102 in 41d3f52

if bd.Spec.DeploymentID == bd.Status.AppliedDeploymentID {

is true. Without this patch, the condition is not true, and we then hit

fleet/internal/cmd/agent/deployer/deployer.go

Line 155 in 41d3f52

release, err := d.helm.Deploy(ctx, bd.Name, manifest, bd.Spec.Options)

where Helm deployment occurs.

I don't really know why. Something causes the reconciler to run twice at almost exactly the same time. If both runs fetch the BundleDeployment from the cluster at roughly the same time, then in both runs bd.Spec.DeploymentID and bd.Status.AppliedDeploymentID will differ, causing the Helm deployment to be called twice.
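The hypothesis can be modelled with a small, deterministic sketch. This is not Fleet code: the bundleDeployment and cluster types and the simulate function are hypothetical, and the only thing taken from the thread is the comparison of the spec deployment ID with the applied deployment ID. When both reconciles take their snapshot before either one records its result, both see differing IDs and both upgrade. When the second reconcile reads after the first has finished, it sees matching IDs and skips.

```go
package main

import "fmt"

// bundleDeployment is a hypothetical, stripped-down stand-in for the real
// resource; only the two IDs compared in the thread are kept.
type bundleDeployment struct {
	SpecDeploymentID    string
	AppliedDeploymentID string
}

// cluster is a hypothetical in-memory stand-in for the stored object plus
// a counter of Helm upgrades performed.
type cluster struct {
	bd       bundleDeployment
	upgrades int
}

// get returns a snapshot of the stored BundleDeployment.
func (c *cluster) get() bundleDeployment { return c.bd }

// deployIfNeeded mimics the check from the thread: it upgrades only when
// the snapshot's IDs differ, then records the applied ID.
func (c *cluster) deployIfNeeded(snapshot bundleDeployment) {
	if snapshot.SpecDeploymentID == snapshot.AppliedDeploymentID {
		return
	}
	c.upgrades++
	c.bd.AppliedDeploymentID = snapshot.SpecDeploymentID
}

// simulate runs two reconciles and returns how many upgrades happened.
func simulate(overlapping bool) int {
	c := &cluster{bd: bundleDeployment{SpecDeploymentID: "new", AppliedDeploymentID: "old"}}
	if overlapping {
		// Both reconciles read before either one writes.
		first := c.get()
		second := c.get()
		c.deployIfNeeded(first)
		c.deployIfNeeded(second)
	} else {
		// The second reconcile reads after the first has finished.
		c.deployIfNeeded(c.get())
		c.deployIfNeeded(c.get())
	}
	return c.upgrades
}

func main() {
	fmt.Println("overlapping reconciles:", simulate(true))  // 2 upgrades
	fmt.Println("serialized reconciles:", simulate(false)) // 1 upgrade
}
```

A stale read, for example from a cache that has not yet seen the status update, would have the same effect as the overlapping case in this model.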

With this patch, the small jitter ensures that something has already happened to BundleDeployment before the other request pokes at it.

Since I'm not familiar with the Fleet codebase, I may have misunderstood something. @manno @weyfonk Does this make sense to you?

In any case, in my testing so far, the above patch has worked. I have not seen duplicated deployments ever since.

mikmatko added the kind/bug label Sep 18, 2024

weyfonk self-assigned this Sep 24, 2024

weyfonk mentioned this issue Sep 26, 2024

Migrate bundle diffs tests to integration #2905

Merged

weyfonk removed their assignment Sep 27, 2024

manno mentioned this issue Sep 30, 2024

[SURE-8309] fleet-agent in rke2 cluster repeatedly deploying managed-system-agent bundle #2856

Open

weyfonk mentioned this issue Sep 30, 2024

Limit Deployed bundle logs #2917

Merged
