Helm upgrade is often duplicated, causing issues with Jobs #2869

Open
1 task done
mikmatko opened this issue Sep 18, 2024 · 5 comments

@mikmatko

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Helm upgrade appears to be called twice on a change or force update. This happens most of the time, but not always.

Logs from a downstream cluster fleet-agent-0 pod:

Working scenario, Helm deployment is called only once:

{"level":"info","ts":"2024-09-18T06:36:20Z","logger":"bundledeployment.HelmDeployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"7c7639d9-f1ba-49c2-8785-b6e582c166a8","commit":"239b40d88e01e2db8d80eabeb891384b25e76311","dryRun":false}
{"level":"info","ts":"2024-09-18T06:36:37Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"7c7639d9-f1ba-49c2-8785-b6e582c166a8","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:9190b78396800308e5260944df675a40cb47e4b1c5e2180b7e70be580d38608f","release":"mikko-debug/mikko-debug:86","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452"}
{"level":"info","ts":"2024-09-18T06:36:37Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"b19b7a49-3c5f-47ad-be0f-375a7ebf5d22","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","release":"mikko-debug/mikko-debug:86","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452"}

A bit later, after triggering Force Update through the Rancher UI (the same occurs on a single new commit too):

{"level":"info","ts":"2024-09-18T06:38:01Z","logger":"bundledeployment.HelmDeployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5a20d39e-ed74-4f98-ba97-40b6b79767e9","commit":"239b40d88e01e2db8d80eabeb891384b25e76311","dryRun":false}
{"level":"info","ts":"2024-09-18T06:38:17Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5a20d39e-ed74-4f98-ba97-40b6b79767e9","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:17Z","logger":"bundledeployment.HelmDeployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5ae5c18b-b800-4e8b-990e-5ce2e448bb4d","commit":"239b40d88e01e2db8d80eabeb891384b25e76311","dryRun":false}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"5ae5c18b-b800-4e8b-990e-5ce2e448bb4d","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:4f69fbccb9885a091faa2d70ed56b710a70e33c7f73d381394c10b70dbdf3452","release":"mikko-debug/mikko-debug:88","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"f327af12-2651-4550-95ba-aef2a49271c4","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"f327af12-2651-4550-95ba-aef2a49271c4","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"c80e519f-b4fd-416e-911d-d689731329da","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:88","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"47736280-52da-4da1-8114-412c7d84bf46","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"47736280-52da-4da1-8114-412c7d84bf46","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"49bfc5f3-0320-4e93-876b-c141ee5f1f7e","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:88","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:34Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"62f0793c-b15a-4bd3-9c70-46c67c7e9a7d","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:35Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"62f0793c-b15a-4bd3-9c70-46c67c7e9a7d","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:36Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"d03de5d9-5438-4cd8-940e-f61106942130","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:37Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"d03de5d9-5438-4cd8-940e-f61106942130","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}
{"level":"info","ts":"2024-09-18T06:38:38Z","logger":"bundledeployment.DeployBundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"97c74c52-0b6f-4023-9e40-b1c51f078e72","deploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba","release":"mikko-debug/mikko-debug:87","appliedDeploymentID":"s-cdb5595a5910c9b95765d654ab3a84d653589f6cf4aa9f314dac997310b2b:fb4e0cef7f7c41e0faa1e9ce7662ddde11afa277c52a5b342150dd9bd23841ba"}
{"level":"info","ts":"2024-09-18T06:38:39Z","logger":"bundledeployment.UpdateStatus","msg":"Status not ready","controller":"bundledeployment","controllerGroup":"fleet.cattle.io","controllerKind":"BundleDeployment","BundleDeployment":{"name":"mikko-debug-debug-debug-chart","namespace":"cluster-fleet-default-clustername-1ba05bfd28c8"},"namespace":"cluster-fleet-default-clustername-1ba05bfd28c8","name":"mikko-debug-debug-debug-chart","reconcileID":"97c74c52-0b6f-4023-9e40-b1c51f078e72","error":"job.batch mikko-debug/mikko-debug-job-87 missing"}

As can be seen from the logs, Helm upgrade is called twice. As a result, Fleet thinks that a Job object is suddenly missing:

job.batch mikko-debug/mikko-debug-job-87 missing

Although Fleet performed two Helm upgrade operations, it still seems to think it performed only one, so it looks for an object from the previous Helm release. This leaves the Bundle in a modified state.

Expected Behavior

Helm upgrade is called only once per change.

Steps To Reproduce

I believe this issue can be seen with any chart, but it is more apparent if you have any Job in the chart. Doesn't seem to matter what options are provided in fleet.yaml etc.

Environment

- Architecture: x86
- Fleet Version: v0.10.2
- Cluster:
  - Provider: GKE
  - Options:
  - Kubernetes Version: v1.30.4-gke.1213000

Logs

No response

Anything else?

No response

@manno
Member

manno commented Sep 23, 2024

We're looking into the duplicated deploy.
However, having a job in a bundle is problematic. Here is an older blog post that suggests choosing a random name for the job: https://www.suse.com/c/rancher_blog/rancher-fleet-tips-for-kubernetes-jobs-deployment-strategies-in-continuous-delivery-scenarios/

If the job is idempotent, a random name would work. We're also researching whether jobs can be ignored with bundle diffs.

@mikmatko
Author

mikmatko commented Sep 24, 2024

Thanks for the response. In all cases where I've seen this issue, I'm using the following pattern for Job naming:

metadata:
  name: whatever-job-name-{{ .Release.Revision }}

Here {{ .Release.Revision }} is the Helm revision number, which is incremented on each Helm upgrade. I believe that what the blog post suggests, using something like {{ randAlphaNum 8 | lower }}, does not make any difference. The Job name is already unique, and you wouldn't be able to re-deploy a Job with the same name anyway.

When Fleet initiates Helm upgrade twice, in between, the previous Job instance is deleted. I would be happy to use a workaround which would keep the previous Job instances, but for some reason, they are automatically cleaned up (I don't have any TTL set, that would also cause issues). In a way this can be achieved by using helm.sh/resource-policy: keep but Fleet conflicts with that too, the bundle then complains about orphaned resources.
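The interaction between revision-based Job names and a duplicated upgrade can be illustrated with a toy model. This is only a sketch under stated assumptions: the release type and its methods are hypothetical, not Helm or Fleet code, and the model assumes that each upgrade removes the Job rendered by the previous revision because it is no longer part of the manifest.

```go
package main

import "fmt"

// release is a toy model of a Helm release whose chart renders one Job
// named "<base>-<revision>". It is not Helm's real data structure.
type release struct {
	base     string
	revision int
	jobs     map[string]bool // Jobs currently present in the cluster
}

// jobName returns the Job name rendered for the current revision.
func (r *release) jobName() string {
	return fmt.Sprintf("%s-%d", r.base, r.revision)
}

// upgrade removes the Job of the current revision, bumps the revision and
// creates the Job for the new revision. It returns the new Job's name.
func (r *release) upgrade() string {
	delete(r.jobs, r.jobName())
	r.revision++
	name := r.jobName()
	r.jobs[name] = true
	return name
}

func main() {
	r := &release{
		base:     "mikko-debug-job",
		revision: 86,
		jobs:     map[string]bool{"mikko-debug-job-86": true},
	}
	expected := r.upgrade() // first upgrade: the observer records this Job
	r.upgrade()             // duplicated upgrade replaces it with the next revision
	fmt.Printf("expected %s present: %v\n", expected, r.jobs[expected])
	fmt.Printf("jobs in cluster: %v\n", r.jobs)
}
```

In this model, whoever recorded the Job name after the first upgrade ends up looking for an object that the second upgrade already replaced, which matches the shape of the "missing" error above.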

Currently I don't think bundle diffs support ignoring Jobs. Similar related issues: #748 and #2051

My actual use case for running the Jobs is a Helm post-upgrade hook that notifies our Jenkins instance to start running a test set after a successful deployment. I'm also using Jobs to run database migrations in several backend services. However, I reproduced this issue of duplicated Helm upgrades even with a simple single Job, so it doesn't seem related to using Helm hooks etc.

@jhoblitt
Contributor

@mikmatko After an incredible number of different attempts to get jobs to "not be a problem", I have been setting a helm hook to remove jobs to prevent collisions between deployments. E.g.

---
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation

@manno
Member

manno commented Sep 30, 2024

I added #2051 (comment) to the 2.10 milestone, to implement "ignore resources".

@mikmatko
Author

mikmatko commented Oct 3, 2024

Spent a while debugging the duplicated upgrade issue on current HEAD (41d3f52).

Here is a horrible workaround that adds a small random delay (2 to 6 seconds) to BundleDeploymentReconciler before it fetches the latest BundleDeployment from the cluster:

diff --git a/internal/cmd/agent/controller/bundledeployment_controller.go b/internal/cmd/agent/controller/bundledeployment_controller.go
index 4d516227..e6ce4ee1 100644
--- a/internal/cmd/agent/controller/bundledeployment_controller.go
+++ b/internal/cmd/agent/controller/bundledeployment_controller.go
@@ -10,6 +10,7 @@ import (
 	"github.com/rancher/fleet/internal/cmd/agent/deployer/driftdetect"
 	"github.com/rancher/fleet/internal/cmd/agent/deployer/monitor"
 	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
+	"golang.org/x/exp/rand"
 
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	"k8s.io/apimachinery/pkg/runtime"
@@ -99,6 +100,10 @@ func (r *BundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
 	ctx = log.IntoContext(ctx, logger)
 	key := req.String()
 
+	// add small jitter to avoid duplicated deployments
+	rand.Seed(uint64(time.Now().UnixNano()))
+	time.Sleep(time.Duration(rand.Intn(5)+2) * time.Second)
+
 	// get latest BundleDeployment from cluster
 	bd := &fleetv1.BundleDeployment{}
 	err := r.Get(ctx, req.NamespacedName, bd)

With this patch, this condition

if bd.Spec.DeploymentID == bd.Status.AppliedDeploymentID {

is true. Without this patch, the condition is not true, and we then hit

release, err := d.helm.Deploy(ctx, bd.Name, manifest, bd.Spec.Options)

where Helm deployment occurs.

I don't really know why. Something causes the reconciler to run twice at almost exactly the same time. If both runs fetch the BundleDeployment from the cluster at roughly the same time, then in both runs bd.Spec.DeploymentID and bd.Status.AppliedDeploymentID will differ, so the Helm deployment gets called twice.

With this patch, the small jitter ensures that something has already happened to BundleDeployment before the other request pokes at it.
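The hypothesis above can be sketched as a small deterministic model. It is not Fleet code and does not claim to be the confirmed root cause: the bundleDeployment struct, reconcile and run are hypothetical names, and the only thing taken from the source is the comparison of the spec deployment ID with the applied deployment ID. The model shows what happens when the second reconcile works on a snapshot read before the first one recorded its result, compared with a read taken afterwards.

```go
package main

import "fmt"

// bundleDeployment is a simplified stand-in holding only the two compared IDs.
type bundleDeployment struct {
	specDeploymentID    string
	appliedDeploymentID string
}

// reconcile deploys if the snapshot it was given shows a pending change and
// then records the applied ID on the live object. It reports whether it deployed.
func reconcile(snapshot bundleDeployment, live *bundleDeployment, deploys *int) bool {
	if snapshot.specDeploymentID == snapshot.appliedDeploymentID {
		return false // nothing to do
	}
	*deploys++ // stands in for the Helm deployment
	live.appliedDeploymentID = snapshot.specDeploymentID
	return true
}

// run performs two reconciles and returns the number of deployments. With
// staleSecondRead set, the second reconcile uses a snapshot taken before the
// first one updated the status.
func run(staleSecondRead bool) int {
	live := &bundleDeployment{specDeploymentID: "new", appliedDeploymentID: "old"}
	deploys := 0

	first := *live
	second := *live // read before the first reconcile has written anything

	reconcile(first, live, &deploys)
	if !staleSecondRead {
		second = *live // delayed read: sees the updated status
	}
	reconcile(second, live, &deploys)
	return deploys
}

func main() {
	fmt.Println("deployments with stale second read:", run(true))
	fmt.Println("deployments with delayed second read:", run(false))
}
```

The delayed read plays the role of the jitter in the patch: it only helps because the first reconcile has usually finished writing by then, so it narrows the window rather than closing it.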

Since I'm not familiar with the Fleet codebase, I may have understood something wrong. @manno @weyfonk Does this make sense to you?

In any case, in my testing so far, the above patch has worked. I have not seen duplicated deployments ever since.

Status: 📋 Backlog