Helm upgrade is often duplicated, causing issues with Jobs #2869
Comments
We're looking into the duplicated deploy. If the job is idempotent, a random name would work. We're also researching whether Jobs can be ignored with bundle diffs.
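The random-name suggestion could be sketched with Sprig's `randAlphaNum` template function, which Helm charts can use. This is an illustrative fragment, not code from the issue; the Job name is hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  # A random suffix makes every rendered Job unique, so an idempotent
  # Job never collides with an instance left over from a previous upgrade.
  name: whatever-job-name-{{ randAlphaNum 5 | lower }}
```

Note this only works safely if the Job is idempotent, since each upgrade (including a duplicated one) creates and runs a fresh Job.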
Thanks for the response. In all cases where I've seen this issue, I'm using the following pattern for Job naming: `metadata.name: whatever-job-name-{{ .Release.Revision }}`.

When Fleet initiates the Helm upgrade twice, the previous Job instance is deleted in between. I would be happy to use a workaround that keeps the previous Job instances, but for some reason they are automatically cleaned up (I don't have any TTL set; that would also cause issues). In a way this can be achieved by using …

Currently I don't think bundle diffs support ignoring Jobs. Similar related issues: #748 and #2051.

My actual use case for running the Jobs is using a Helm …
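The revision-suffixed naming described above might look like the following in a chart template. This is a sketch with illustrative names and an assumed container image, not the reporter's actual chart:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  # One Job per Helm release revision, so the Job reruns on every upgrade.
  # A duplicated upgrade bumps .Release.Revision again, orphaning the
  # previous Job, which is the symptom described in this issue.
  name: whatever-job-name-{{ .Release.Revision }}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: example/migrations:1.0.0  # hypothetical image
```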
@mikmatko After an incredible number of attempts to get Jobs to "not be a problem", I have been setting a Helm hook to remove Jobs, to prevent collisions between deployments. E.g. …
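The concrete example from this comment was lost in formatting, but the hook approach presumably uses Helm's standard hook annotations, along these lines (the Job name is illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: whatever-job-name
  annotations:
    # Run the Job as part of install/upgrade rather than as a regular
    # manifest, and delete the previous hook resource before creating
    # the new one, avoiding name collisions between deployments.
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
```

A side effect is that hook resources are not tracked as part of the release manifest, which may also sidestep the "object missing" diff reported below.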
I added #2051 (comment) to the 2.10 milestone, to implement "ignore resources".
Spent a while debugging the duplicated upgrade issue on current HEAD (41d3f52). Here is a horrible workaround that adds a small jitter to the `BundleDeployment` reconciler:

```diff
diff --git a/internal/cmd/agent/controller/bundledeployment_controller.go b/internal/cmd/agent/controller/bundledeployment_controller.go
index 4d516227..e6ce4ee1 100644
--- a/internal/cmd/agent/controller/bundledeployment_controller.go
+++ b/internal/cmd/agent/controller/bundledeployment_controller.go
@@ -10,6 +10,7 @@ import (
 	"github.com/rancher/fleet/internal/cmd/agent/deployer/driftdetect"
 	"github.com/rancher/fleet/internal/cmd/agent/deployer/monitor"
 	fleetv1 "github.com/rancher/fleet/pkg/apis/fleet.cattle.io/v1alpha1"
+	"golang.org/x/exp/rand"
 	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	"k8s.io/apimachinery/pkg/runtime"
@@ -99,6 +100,10 @@ func (r *BundleDeploymentReconciler) Reconcile(ctx context.Context, req ctrl.Req
 	ctx = log.IntoContext(ctx, logger)
 	key := req.String()

+	// add small jitter to avoid duplicated deployments
+	rand.Seed(uint64(time.Now().UnixNano()))
+	time.Sleep(time.Duration(rand.Intn(5)+2) * time.Second)
+
 	// get latest BundleDeployment from cluster
 	bd := &fleetv1.BundleDeployment{}
 	err := r.Get(ctx, req.NamespacedName, bd)
```

With this patch, this condition (fleet/internal/cmd/agent/deployer/deployer.go, line 102 in 41d3f52; see also line 155 in 41d3f52) …

I don't really know why, but something causes the reconciler to run twice at around the exact same time. If both runs fetch the BundleDeployment from the cluster at roughly the same time, then in both cases … With this patch, the small jitter ensures that something has already happened to the BundleDeployment before the other request pokes at it.

Since I'm not familiar with the Fleet codebase, I may have understood something wrong. @manno @weyfonk Does this make sense to you? In any case, in my testing so far the above patch has worked: I have not seen duplicated deployments since.
Is there an existing issue for this?
Current Behavior
Helm upgrade is seemingly called twice on every change or force update. This seems to occur most of the time, but not always.
Logs from a downstream cluster's `fleet-agent-0` pod. Working scenario, where the Helm deployment is called only once:

…

Then a bit later, after pushing Force Update through the Rancher UI (the same occurs on a single new commit too):

…

As can be seen from the logs, Helm upgrade is called twice. As a result, Fleet thinks that a `Job` object is suddenly missing:

`job.batch mikko-debug/mikko-debug-job-87 missing`

While Fleet did two Helm upgrade operations, it seems to still think it had done only one, hence it looks for an object from the previous Helm release. This leaves the Bundle in a `modified` state.

Expected Behavior
Helm upgrade is called only once per change.
Steps To Reproduce
I believe this issue can be seen with any chart, but it is more apparent if you have a `Job` in the chart. It doesn't seem to matter what options are provided in `fleet.yaml` etc.

Environment
Logs
No response
Anything else?
No response