[backport v2.10] [SURE-9460] Fleet not picking up gitrepo updates, no job created to update #3252

Closed
rancherbot opened this issue Jan 24, 2025 · 2 comments

@rancherbot

This is a backport issue for #3138, automatically created via GitHub Actions workflow initiated by @0xavi0

Original issue body:

SURE-9460

Issue description

After upgrading Rancher to 2.9.3 / fleet to v0.10.4, some gitrepos no longer receive updates. The customer updates the repository, but the changes are not pushed to the clusters. No Job is created to pull in the changes that should be tracked by the gitRepo.

In fleet v0.10.4, there were changes made to how jobs are managed in fleet. Could these changes be the cause of the issue here? #2932 seems to change how jobs are managed.

Business impact:

Unable to receive updates to applications using fleet for continuous delivery.

Troubleshooting steps:

The GitJob pod does not show that jobs are completing for those gitRepos. We are also unable to find jobs for the

Repro steps:

Upgrade to Rancher 2.9.3 from 2.9.2

Workaround:

Is a workaround available and implemented? yes
What is the workaround:
Customer found that by editing a gitRepo in the Rancher UI, changing nothing, then saving, it will eventually cause the repo to pull the change and make the necessary updates.

When making those changes, a few lines change within the gitRepo:
  • spec.correctDrift: {} is added
  • status.commit is updated
  • status.lastPollingTriggered time is updated (time changed by more than a day)

Actual behavior:

Repositories are not updated.

Expected behavior:

Repositories are updated.

@rancherbot rancherbot added this to the v2.10.3 milestone Jan 24, 2025
@rancherbot rancherbot added this to Fleet Jan 24, 2025
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Jan 24, 2025
0xavi0 added a commit to 0xavi0/fleet that referenced this issue Jan 27, 2025

Port of: rancher#3239
Refers to rancher#3252

Signed-off-by: Xavi Garcia <[email protected]>
0xavi0 added a commit that referenced this issue Jan 27, 2025

Port of: #3239
Refers to #3252

Signed-off-by: Xavi Garcia <[email protected]>
0xavi0 added a commit that referenced this issue Jan 28, 2025
Port of: #3239
Refers to #3252

Signed-off-by: Xavi Garcia <[email protected]>
@mmartin24

mmartin24 commented Jan 28, 2025

Tested in Rancher 2.10 with hotfix v0.11.3-hotfix-ch.1.3afc03bd and compared against a non-fixed Rancher 2.10.1 with fleet:v0.11.3. Both setups used k3s for the upstream and downstream clusters.


Test steps

  • Deployed Rancher 2.10.0 with 1 downstream cluster and later updated the gitjob deployment with the hotfix image:
export TAG=v0.11.3-hotfix-ch.1.3afc03bd
kubectl set image -n cattle-fleet-system deployment/gitjob "*=rancher/fleet:$TAG-linux-amd64"
  • Fleet is configured with 50 workers for gitrepo, bundle, and bundledeployment
  • Deployed 30 gitrepos with simple-chart: https://github.com/mmartin24/test-fleet/tree/test30/30gitrepos (thanks @0xavi0 for the examples).
  • Noted down the commit hash
  • Updated a value (for instance, name) in the fleet.yaml file of all 30 gitrepos at the same time
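To compare hashes without relying on screenshots, the check can be scripted. The following is a minimal sketch, assuming the GitRepo list is exported as JSON (for example with `kubectl get gitrepos.fleet.cattle.io -n fleet-default -o json`) and that each item exposes the hash under `status.commit`, as mentioned in the original issue. The function name, gitrepo names, and commit values are hypothetical and only serve as illustration.

```python
import json


def stale_gitrepos(gitrepo_list_json, expected_commit):
    """Return the names of GitRepos whose status.commit differs from expected_commit."""
    items = json.loads(gitrepo_list_json).get("items", [])
    return sorted(
        item["metadata"]["name"]
        for item in items
        # A GitRepo without a status or commit is also reported as stale.
        if item.get("status", {}).get("commit") != expected_commit
    )


# Hypothetical sample shaped like a Kubernetes list of GitRepo objects.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "simple-chart-001"}, "status": {"commit": "bbb222"}},
        {"metadata": {"name": "simple-chart-002"}, "status": {"commit": "aaa111"}},
    ]
})

print(stale_gitrepos(sample, "bbb222"))
```

An empty result after pushing the new commit would mean every gitrepo picked up the change.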

Observations

  • The hash is updated correctly on all gitrepos in the fixed version, while some gitrepos fail to update in the non-fixed version (2.10.1), so the fix seems to work. See the screenshot of the fixed version (left) vs. the non-fixed version (right), where some hashes still belong to the previous commit

Image

  • Some jobs fail more often in the fixed version than in the non-fixed version. The logged error is a TLS handshake timeout. I spoke separately with @0xavi0, and this seems to happen because commits are picked up more quickly in the fixed version and GitHub applies some kind of rate limiting. I therefore consider this "side effect" outside the scope of validating this fix

Image

Error

simple-chart-010-745db-ss6lq gitcloner-initializer time="2025-01-28T15:03:08Z" level=fatal msg="Get \"https://github.com/mmartin24/test-fleet/info/refs?service=git-upload-pack\": net/http: TLS handshake timeout"   

I repeated the same test with 15 workers (1 per 2 gitrepos) and the results were similar, both for the validity of the fix and for the occasional job failures attributed to rate limiting.
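To count how many job failures are of this kind, the pod logs can be filtered for the error shown above. This is a rough sketch, assuming the gitcloner output uses the `key=value` layout seen in the error line (quoted values with backslash-escaped quotes). The helper names are hypothetical.

```python
import re

# Error line copied from the job output above.
LINE = r'simple-chart-010-745db-ss6lq gitcloner-initializer time="2025-01-28T15:03:08Z" level=fatal msg="Get \"https://github.com/mmartin24/test-fleet/info/refs?service=git-upload-pack\": net/http: TLS handshake timeout"'

# key=value pairs; the value is either a quoted string (escapes allowed) or a bare token.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Extract key=value fields from a log line into a dict."""
    fields = {}
    for key, raw in FIELD.findall(line):
        if raw.startswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def is_tls_handshake_timeout(line):
    """True if the line is a fatal entry whose message ends in a TLS handshake timeout."""
    fields = parse_fields(line)
    return (
        fields.get("level") == "fatal"
        and fields.get("msg", "").endswith("TLS handshake timeout")
    )


print(is_tls_handshake_timeout(LINE))
```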


Video with real-time check with and without fix here:

Screencast.from.2025-01-28.16-12-00.mov

@kkaempf kkaempf moved this from 🆕 New to 📋 Backlog in Fleet Jan 29, 2025
@manno manno moved this from 📋 Backlog to Needs QA review in Fleet Feb 5, 2025
0xavi0 added a commit to 0xavi0/fleet that referenced this issue Feb 6, 2025
0xavi0 added a commit that referenced this issue Feb 6, 2025

Refers to #3252

Signed-off-by: Xavi Garcia <[email protected]>
@mmartin24

Retested in Rancher v2.10-39be5acbd0dd488c75da9a79bd9ba5e806018175-head with fleet:v0.11.4-rc.1


  • Checked that jitter is present and working (60s plus a small range):
2025-02-13T23:26:45Z    DEBUG   gitops-status   Reconciling GitRepo status      {"controller": "GitRepoStatus", "controllerGroup": "fleet.cattle.io", "controllerKind": "GitRepo", "GitRepo": {"name":"testjitter","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "testjitter", "reconcileID": "887387df-389a-45a4-9730-6665f3c1c2b1", "generation": 2, "commit": "dcf4917293ef131f64724d0c03cadc4f5b257168", "conditions": [{"type":"Ready","status":"True","lastUpdateTime":"2025-02-13T23:26:03Z"},{"type":"GitPolling","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Reconciling","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Stalled","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Accepted","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"}]}
2025-02-13T23:27:47Z    DEBUG   gitops-status   Reconciling GitRepo status      {"controller": "GitRepoStatus", "controllerGroup": "fleet.cattle.io", "controllerKind": "GitRepo", "GitRepo": {"name":"testjitter","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "testjitter", "reconcileID": "34d5737d-3d71-437e-a987-bc51f6ee5877", "generation": 2, "commit": "dcf4917293ef131f64724d0c03cadc4f5b257168", "conditions": [{"type":"Ready","status":"True","lastUpdateTime":"2025-02-13T23:26:03Z"},{"type":"GitPolling","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Reconciling","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Stalled","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Accepted","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"}]}
2025-02-13T23:28:53Z    DEBUG   gitops-status   Reconciling GitRepo status      {"controller": "GitRepoStatus", "controllerGroup": "fleet.cattle.io", "controllerKind": "GitRepo", "GitRepo": {"name":"testjitter","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "testjitter", "reconcileID": "649f5f4d-762e-46c6-821c-bee0b3cb78d1", "generation": 2, "commit": "dcf4917293ef131f64724d0c03cadc4f5b257168", "conditions": [{"type":"Ready","status":"True","lastUpdateTime":"2025-02-13T23:26:03Z"},{"type":"GitPolling","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Reconciling","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Stalled","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Accepted","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"}]}
2025-02-13T23:29:55Z    DEBUG   gitops-status   Reconciling GitRepo status      {"controller": "GitRepoStatus", "controllerGroup": "fleet.cattle.io", "controllerKind": "GitRepo", "GitRepo": {"name":"testjitter","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "testjitter", "reconcileID": "255e68e7-20ae-465f-bab1-6c23aba93704", "generation": 2, "commit": "dcf4917293ef131f64724d0c03cadc4f5b257168", "conditions": [{"type":"Ready","status":"True","lastUpdateTime":"2025-02-13T23:26:03Z"},{"type":"GitPolling","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Reconciling","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Stalled","status":"False","lastUpdateTime":"2025-02-13T23:25:44Z"},{"type":"Accepted","status":"True","lastUpdateTime":"2025-02-13T23:25:44Z"}]}
  • Checked that GITREPO_SYNC_PERIOD defaults to 2 hours:
Containers:
  gitjob:
    Container ID:  containerd://98d91a2a5e37b45c326e0a24417d468bb259ff0a200612c3c2c14c43fe6cd826
    Image:         rancher/fleet:v0.11.4-rc.1
    Image ID:      docker.io/rancher/fleet@sha256:43b0b594269ba6664b27580956536b91a042d42f7c97f8116f1395b5c061f8e2
    Port:          8081/TCP
    Host Port:     0/TCP
    Args:
      fleetcontroller
      gitjob
      --gitjob-image
      rancher/fleet:v0.11.4-rc.1
      --debug
      --debug-level
      1
    State:          Running
      Started:      Thu, 13 Feb 2025 23:18:25 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      NAMESPACE:                       cattle-fleet-system (v1:metadata.namespace)
      CATTLE_ELECTION_LEASE_DURATION:  30s
      CATTLE_ELECTION_RETRY_PERIOD:    10s
      CATTLE_ELECTION_RENEW_DEADLINE:  25s
      GITREPO_SYNC_PERIOD:             2h
      GITREPO_RECONCILER_WORKERS:      50
      CATTLE_DEV_MODE:                 true
  • Checked that new commits are picked up and applied correctly within seconds by all 30 gitrepos

Image

Image
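For the jitter check above, the spacing between the four "Reconciling GitRepo status" log entries can be computed directly from their timestamps. The snippet below is a small sketch that uses only the timestamps copied from those log lines; the function name is hypothetical.

```python
from datetime import datetime

# Timestamps copied from the four "Reconciling GitRepo status" log lines.
timestamps = [
    "2025-02-13T23:26:45Z",
    "2025-02-13T23:27:47Z",
    "2025-02-13T23:28:53Z",
    "2025-02-13T23:29:55Z",
]


def intervals_seconds(stamps):
    """Return the gaps, in whole seconds, between consecutive timestamps."""
    parsed = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = intervals_seconds(timestamps)
print(gaps)  # [62, 66, 62]
```

The gaps of 62, 66, and 62 seconds are each slightly above 60s and not identical, which matches the "60s plus a small range" observation.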

Demo video:

Screencast.from.2025-02-14.00-10-08.webm

@github-project-automation github-project-automation bot moved this from Needs QA review to ✅ Done in Fleet Feb 13, 2025