Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArgoCD ApplicationSet SCM generator eats GitHub API Rate Limit #9413

Open
3 tasks done
IvanovOleg opened this issue May 15, 2022 · 25 comments
Open
3 tasks done

ArgoCD ApplicationSet SCM generator eats GitHub API Rate Limit #9413

IvanovOleg opened this issue May 15, 2022 · 25 comments
Labels
bug Something isn't working component:application-sets Bulk application management related type:scalability Issues related to scalability and performance related issues

Comments

@IvanovOleg
Copy link

IvanovOleg commented May 15, 2022

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug
I have an ApplicationSet with SCM generator configured to poll GitHub repositories. In my simple case I just poll one repo branch. This simple configuration reaches the GitHub API Request limit of 5000 requests in 10 minutes. This fact makes SCM generator completely useless.

To Reproduce

---
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: scm-development
spec:
  generators:
    - scmProvider:
        cloneProtocol: ssh
        github:
          organization: my-org
          allBranches: true
          tokenRef:
            secretName: github-token
            key: token
        filters:
          - repositoryMatch: ^my-repo
            pathsExist: [kustomize/overlays/development]   
            branchMatch: ^deploy   
  template:
    metadata:
      name: '{{ repository }}'
    spec:
      source:
        repoURL: '{{ url }}'
        targetRevision: '{{ branch }}'
        path: kustomize/overlays/development
      project: default
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{ repository }}'
      syncPolicy:
        syncOptions:
          - CreateNamespace=true

Expected behavior
Modest API request usage

Screenshots

Version

v2.3.3

Logs

error listing repos: error listing pull requests for my-org/my-repo: GET https://api.github.com/repos/my-org/my-repo/pulls?per_page=100: 403 API rate limit exceeded for user ID 12345678. [rate reset in 31m51s]
@IvanovOleg IvanovOleg added the bug Something isn't working label May 15, 2022
@crenshaw-dev
Copy link
Member

Are there a lot of repos in the organization. The SCM generator loops over repos to list branches.

@IvanovOleg
Copy link
Author

@crenshaw-dev A lot. It looks like SCM retrieves everything and then filters which makes it extremely ineffective

@crenshaw-dev
Copy link
Member

It should apply your repositoryMatch filter before listing branches, which should help some.

After listing and filtering repos by name, it'll loop over the remaining repos and list branches.

After listing branches, it will filter by the branchMatch filter.

Then for each remaining branch, it'll make an API call to filter by pathsExist.

The regex filters are your friends. They should help minimize the API calls by a lot.

@IvanovOleg
Copy link
Author

@crenshaw-dev I my example, I select only one repository and a single branch, but it drains all 5000 requests in less then 10 minutes.

@crenshaw-dev
Copy link
Member

Oh wait. I just noticed the error message is about the Pull Request Generator? Are you sure the SCM generator is the issue? It shouldn't be hitting the PR API.

@IvanovOleg
Copy link
Author

@crenshaw-dev I guess I copied from the wrong app. Anyway I tried to remove all other appsets and apps and leave just SCM. It still drains all request. And when all rate limit is exceeded everything stops working.

@crenshaw-dev
Copy link
Member

If I understand correctly, you have it filtered down to one repo.

How many branches match the branch pattern for that one repo?

@IvanovOleg
Copy link
Author

@crenshaw-dev that repo contains main branch and deploy branch. That's all. It's a POC, not a real app

@crenshaw-dev
Copy link
Member

Wow. I'd have to dig into the code to get a good sense what's going wrong here. The generator code needs more debug logging. There's really no way to debug this without seeing what calls are being made.

@rishabh625
Copy link
Contributor

rishabh625 commented May 16, 2022

I did a bit of debugging , but some how I don't see much of calls or rate limit being although my org had only 2 repo and 4 branches.

I think Key to debug it successfully would be to have a large organization, @IvanovOleg : Can u share size of repo and number of branches , we could replicate and add filter

As in code I see that it filters once all repo's are listed

@IvanovOleg
Copy link
Author

@rishabh625 ~1000 repositories with at least 10 branches in each

@IvanovOleg
Copy link
Author

Any updates?

@rishabh625
Copy link
Contributor

Any updates?

Not yet @IvanovOleg , will post you soon

@jetersen
Copy link
Contributor

jetersen commented Aug 9, 2022

Perhaps it would help if ApplicationSet SCM generator used the existing repo secrets so it could pick up the GitHub App secret?

Instead of having to reference a separate token secret.

The org app could be set to allow read for all repos and have higher rate limits: https://docs.github.com/en/developers/apps/building-github-apps/rate-limits-for-github-apps#default-server-to-server-rate-limits-for-githubcom

See where the code already has GitHubApp Auth: https://github.com/argoproj/argo-cd/search?q=GithubAppInstallationId

@crenshaw-dev
Copy link
Member

@jetersen I think that might be what this PR does: #10092

Hoping to review it today.

@jetersen
Copy link
Contributor

jetersen commented Aug 9, 2022

@crenshaw-dev thanks for cross linking, I was searching for issues and PRs but found this one first and dropped my comment here 😅

@dllegru
Copy link

dllegru commented Oct 4, 2022

@jetersen Perhaps it would help if ApplicationSet SCM generator used the existing repo secrets so it could pick up the GitHub App secret?

I think it would help in terms that using GitHub App secret may allow to make more api calls compared to PAT, although imo we're not fixing the root issue which is that we're overusing api calls when we have scmProvider regexed into a repo + branch.

@crenshaw-dev crenshaw-dev added component:application-sets Bulk application management related type:scalability Issues related to scalability and performance related issues labels Dec 15, 2022
@jdomag
Copy link

jdomag commented Jan 25, 2023

I'm having the same issue with Bitbucket Cloud SCM:

        - scmProvider:
            bitbucket:
              # The workspace id (slug).  
              owner: "my-bitbucket-org"
              # The user to use for basic authentication with an app password.
              user: "my-bot-user"
              # If true, scan every branch of every repository. If false, scan only the main branch. Defaults to false.
              allBranches: false
              # Reference to a Secret containing an app password.
              appPasswordRef:
                secretName: argocd-repo-creds-bitbucket
                key: password

We have around 100 repos with few branches in each and application-set controller gets 429 :(
In my case filtering by repo name is not an option. Shouldn't app set controller have some parameter that controls the number of calls per second?

@XLordalX
Copy link

Maybe it would be a good idea to allow changing query parameters. For example github allows filtering on topic in the api request. This would drastically reduce the amount of calls for us.

@alen-z
Copy link

alen-z commented May 17, 2023

Proving to be serious issue for us. We'd like to use branchMatch...

@jastBytes
Copy link

We seem to have the same problem. Are there any news on this?

@ftmiro
Copy link

ftmiro commented Aug 16, 2023

I'm also facing the problem. My org has around 800 repositories.

GET https://api.github.com/orgs/<my org>/repos?per_page=100: 403 API rate limit exceeded for installation ID <ID>

Allowing adding the filtered topics to the query would help in short term until the organization has many repositories that must be discovered using the topic (argocd-discoverable, for example). Would be possible to create some delay between the iterations? Seems that there's no waiting between the calls made to GitHub.

Another possibility would be drop the branchMatch. I think the idea of branch belongs to Git and not to the SCM, since they don't provide an API that allow us to filter repositories that contains a list of branches. Maybe this could be achieved by using a matrix of SCM + Git generators.

@cyrus-mc
Copy link

So I ran into this problem and decided to dig into the code. I believe the issue is that there is no precedence and logical operation (AND) to the rules given in a filter.

I too had a similar filter:

        filters:
          - repositoryMatch: ^my-repo
            pathsExist: [kustomize/overlays/development]   
            branchMatch: ^deploy 

Looking at the code I determined that it evaluates each of the conditions individually. It doesn't look for repos that match ^my-repo and then further filter by pathsExist and branchMatch.

If I removed the pathsExist and branchMatch I did not experience 429s.

@crenshaw-dev
Copy link
Member

@cyrus-mc I believe this may be the fix: #14465

@totti339
Copy link

totti339 commented Nov 6, 2023

I'm facing the same issue. Can't we just control the polling interval and set something more than 60s ? I tried multiple option and looked into the code but didn't find any parameter that allows doing that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component:application-sets Bulk application management related type:scalability Issues related to scalability and performance related issues
Projects
None yet
Development

No branches or pull requests