Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(argo-cd): Add support for application controller dynamic cluster distribution. #2539

Merged
merged 2 commits into from
Feb 25, 2024

Conversation

oscrx
Copy link
Contributor

@oscrx oscrx commented Feb 21, 2024

Resolves: #2537

Let me know if anyone has questions or suggestions.

Relevant issues and PR's:

Thanks! @ishitasequeira

Removed env:

env:
  - name: ARGOCD_CONTROLLER_REPLICAS
    value: "1"

Added env's:

env:
  - name: ARGOCD_ENABLE_DYNAMIC_CLUSTER_DISTRIBUTION
    value: "true"
  - name: ARGOCD_CONTROLLER_HEARTBEAT_TIME
    value: "10"

Checklist:

  • I have bumped the chart version according to versioning
  • I have updated the documentation according to documentation
  • I have updated the chart changelog with all the changes that come with this pull request according to changelog.
  • Any new values are backwards compatible and/or have sensible default.
  • I have signed off all my commits as required by DCO.
  • My build is green (troubleshooting builds).

@pdrastil
Copy link
Member

@oscrx Great work. Can you also please run scripts/helm-docs.sh in the root of directory and commit README.md change?

@oscrx
Copy link
Contributor Author

oscrx commented Feb 21, 2024

I think I already did that, but after removing the trailing space in Chart.yaml will do it again :)

Also some helpful test/debug commands for anyone that's interested in reviewing:

helm dep update
helm template test . > before.yaml 
helm template test . --set controller.dynamicClusterDistribution=true > after.yaml
diff before.yaml after.yaml
# or in your favorite editor of course

@pdrastil
Copy link
Member

This is exactly why I asked for second drop-in implementation so it's easy to sync changes with upstream on 2.11 :)

@oscrx
Copy link
Contributor Author

oscrx commented Feb 21, 2024

After running the docs generator again the space seems the only difference. I love the generator btw, never used it before.

@pdrastil
Copy link
Member

One more ask. We should probably create smoke test for this in .ci folder just to see that Argo starts up if different implementation is enabled.

Copy link
Member

@pdrastil pdrastil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks again

charts/argo-cd/values.yaml Show resolved Hide resolved
@oscrx
Copy link
Contributor Author

oscrx commented Feb 21, 2024

I don't think the pipeline is detecting charts/argo-cd/ci/dynamic-sharding.yaml as a change..
Did I forget something?

@mkilchhofer
Copy link
Member

I don't think the pipeline is detecting charts/argo-cd/ci/dynamic-sharding.yaml as a change..

Did I forget something?

Filename needs to end in ...-values.yaml :)

@oscrx oscrx force-pushed the main branch 2 times, most recently from 779a1da to bad0bfb Compare February 22, 2024 00:36
@oscrx
Copy link
Contributor Author

oscrx commented Feb 22, 2024

Tried testing it again to see if it's flaky.

The error is a bit vague:

time="2024-02-21T23:44:19Z" level=fatal msg="(dymanic cluster distribution) failed to get app controller deployment: deployments.apps "argocd-application-controller" not found"

It's running fine in my cluster.

Startup logs ```text time="2024-02-22T00:44:29Z" level=info msg="ArgoCD Application Controller is starting" built="2024-02-14T17:37:43Z" commit=a79e0eaca415461dc36615470cecc25d6d38cefb namespace=argocd version=v2.10.1+a79e0ea time="2024-02-22T00:44:29Z" level=info msg="Processing all cluster shards" time="2024-02-22T00:44:29Z" level=info msg="Processing all cluster shards" time="2024-02-22T00:44:29Z" level=info msg="appResyncPeriod=3m0s, appHardResyncPeriod=0s, appResyncJitter=0s" time="2024-02-22T00:44:29Z" level=info msg="Starting configmap/secret informers" time="2024-02-22T00:44:29Z" level=info msg="Configmap/secret informer synced" time="2024-02-22T00:44:29Z" level=warning msg="The cluster https://kubernetes.default.svc has no assigned shard." time="2024-02-22T00:44:29Z" level=warning msg="The cluster https://kubernetes.default.svc has no assigned shard." time="2024-02-22T00:44:29Z" level=info msg="Cluster https://kubernetes.default.svc has been assigned to shard 0" time="2024-02-22T00:44:29Z" level=info msg="Starting secretInformer forcluster" time="2024-02-22T00:44:29Z" level=info msg="Start syncing cluster" server="https://kubernetes.default.svc" time="2024-02-22T00:44:29Z" level=info msg="0xc000ca59e0 subscribed to settings updates" time="2024-02-22T00:44:32Z" level=info msg="Cluster successfully synced" server="https://kubernetes.default.svc" ```

The new configmap also gets refreshed every 10 seconds

apiVersion: v1
data:
  shardControllerMapping: >-
    [{"ShardNumber":0,"ControllerName":"argocd-application-controller-5f656dbc96-qlq2s","HeartbeatTime":"2024-02-22T00:51:27Z"}]
kind: ConfigMap
metadata:
  creationTimestamp: '2024-02-22T00:14:03Z'
  name: argocd-app-controller-shard-cm
  namespace: argocd
  resourceVersion: '133966466'
  uid: 4c1e03da-2c5a-48e3-9c56-e5f2508dd8d6

@oscrx
Copy link
Contributor Author

oscrx commented Feb 22, 2024

@oscrx
Copy link
Contributor Author

oscrx commented Feb 22, 2024

Oh I figured it out, it's getting late :)

It's because the name of the helm chart is dynamic during the smoketest, the code/container gets it's own name from an environment variable: ARGOCD_APPLICATION_CONTROLLER_NAME
I added it.

It might be needed for other components in this helm chart as well..
https://argo-cd.readthedocs.io/en/stable/user-guide/environment-variables/
Let me know what you guys think, I can include it in this PR.

Also checked where it is getting the number of replica's. That code is dynamic though.
https://github.com/argoproj/argo-cd/blob/6aa79f283cde2b1701c755261c8408c01c7b7566/controller/sharding/sharding.go#L373

@oscrx
Copy link
Contributor Author

oscrx commented Feb 22, 2024

Any thoughts about the name variable?
We could also merge it as-is

@oscrx
Copy link
Contributor Author

oscrx commented Feb 25, 2024

bumped and rebased

@oscrx
Copy link
Contributor Author

oscrx commented Feb 25, 2024

I added the previously discussed variable names into a separate commit for easier review, let me know what you guys think and if we should configure this.

Copy link
Collaborator

@yu-croco yu-croco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution ! LGTM

@mbevc1 mbevc1 merged commit 7c8fab5 into argoproj:main Feb 25, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce support for dynamic cluster sharding
5 participants