['feature' request] Add a delay between iterations of each git repository's analyze() method #50

Open
ChrisCarini opened this issue May 14, 2024 · 3 comments

@ChrisCarini
Contributor

Hi @mnagel ! 👋

I've been using this CLI for years now (since at least 2021, when I first contributed to clustergit: af79ab4) - loved it back then, and I still love it and use it daily!

Context

For a while now, I've been encountering an 'undesired' behavior. I have a bunch of repos (on the order of hundreds) that I run this tool against, most commonly to fetch/pull the latest changes from GitHub-backed repositories. When I run clustergit, GitHub rate-limits me partway through fetching the repos. So far, I've bumped --workers down from the default of 4 to 2, and now to 1, and I'm still hitting GitHub rate limits (booo! 😞 ).

"Feature" request

What I'm looking for here is a 'feature' (in quotes, because I personally feel like this slightly flies in the face of this CLI's ability to do operations in many workers) that adds a delay between iterations of the analyze() method.
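To make the request concrete, here is a minimal sketch of the idea. It is not the actual patch (that is only in the screenshot), and it does not use clustergit's real internals: `analyze_all`, its `analyze` callback, and the `delay` and `sleep` parameters are hypothetical names chosen for illustration, and the sketch assumes a single worker.

```python
import time
from typing import Callable, Iterable, List


def analyze_all(
    repos: Iterable[str],
    analyze: Callable[[str], object],
    delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Run analyze() on each repo, pausing `delay` seconds between repos.

    Hypothetical helper: a stand-in for the loop that calls analyze()
    on every repository, not clustergit's actual code.
    """
    results = []
    for index, repo in enumerate(repos):
        # No pause before the first repo; pause before every later one.
        if index > 0 and delay > 0:
            sleep(delay)
        results.append(analyze(repo))
    return results
```

With `delay=1` and roughly 200 repos, this would add a little over three minutes of pure waiting, which is the trade-off between rate limiting and total run time.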

Testing

I've actually made this change to my local copy of clustergit (see diff below), and plan to test this over the next few weeks.
[Image: screenshot of the diff, 2024-05-14 at 11:03:52]

  • If you're open to this 'feature', I'll happily open a PR.
  • If you're open to this 'feature', and are ok to "not wait a few weeks" for me to test locally (it's a simple change), I'll happily open a PR sooner.
  • If you're not open to this 'feature', I totally understand, and will likely just keep this delta for myself. :)

Let me know your thoughts!

@ChrisCarini changed the title from "['feature'] Add a delay between iterations of each git repository's analyze() method" to "['feature' request] Add a delay between iterations of each git repository's analyze() method" on May 14, 2024
ChrisCarini added a commit to ChrisCarini/dotfiles that referenced this issue Jun 5, 2024
…ork of `clustergit`; see mnagel/clustergit#50 for details) for improved reliability
@mnagel
Owner

mnagel commented Jun 21, 2024

Hi Chris,

Glad you like clustergit! Patches are welcome!

I suspect it will not work with GitHub's SSH/git server, but in general big batches of git commands can be sped up using an SSH ControlMaster; see e.g. https://docs.rc.fas.harvard.edu/kb/using-ssh-controlmaster-for-single-sign-on/
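For readers who have not used ControlMaster: it lets several SSH sessions to the same host share one connection. The usual place to enable it is `~/.ssh/config`, as the linked page describes. As a rough per-process alternative, the sketch below builds an environment with git's `GIT_SSH_COMMAND` variable set to pass the `ControlMaster`, `ControlPath`, and `ControlPersist` options to `ssh`. The function name, the socket directory, and the 10 minute persistence are assumptions for illustration, not something clustergit provides.

```python
import os
from typing import Dict, Mapping, Optional


def git_env_with_controlmaster(
    base_env: Optional[Mapping[str, str]] = None,
    control_dir: str = "~/.ssh",
    persist: str = "10m",
) -> Dict[str, str]:
    """Return a copy of the environment that makes git reuse SSH connections.

    Hypothetical helper. Assumes control_dir contains no spaces, since the
    value is embedded unquoted in a shell-style command string.
    """
    env = dict(os.environ if base_env is None else base_env)
    # %r, %h and %p are expanded by ssh to remote user, host and port,
    # so each destination gets its own control socket.
    control_path = os.path.join(os.path.expanduser(control_dir), "cm-%r@%h:%p")
    env["GIT_SSH_COMMAND"] = (
        "ssh -o ControlMaster=auto "
        f"-o ControlPath={control_path} "
        f"-o ControlPersist={persist}"
    )
    return env
```

The returned dictionary could be passed as `env=` to `subprocess.run(["git", "fetch"], ...)`, so that later fetches against the same host reuse the first connection instead of opening a new one each time.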

Best Regards
Michael

@ChrisCarini
Contributor Author

Hi Michael,

Thank you! Your message made me realize that ControlMaster was not set for all git hostnames on my machine (specifically, it was unset for GitHub; where ~99% of the repos I interact with w.r.t clustergit live).

My initial testing shows that, with --workers 16, clustergit is now getting through the majority of the repos (estimate ~80-90%, which was more than before - currently my total is somewhere ~200 repos) without errors. I also connect via different means (e.g. home connection, away from home, via a work VPN w/ thousands of users during 'peak'), so I'll want to give it a spin for a bit to make sure the various scenarios are covered.

Let me give this setup of ssh config a shot for a week or so and get back to you!

That being said, I've been using my local patch w/ --delay 1 very successfully over the past ~1mo - the only downside (as expected) is clustergit takes much longer to complete. 😓 Because of this, I've usually just run it and concurrently moved on to checking email or something else in the meantime 😄.

Appreciate the response! I will follow up here.

Best,
Chris

@ChrisCarini
Contributor Author

Hi @mnagel - happy new year! 🎉

So I've been using the ControlMaster solution you recommended for quite some time now, and it's been mostly successful (and has not required me to use my local change introducing the --delay arg). That being said, I do still have the local patch to add a --delay option - I'm happy to post a PR with the change for your review, on the off chance it would be useful to others in the future; I'm also happy to just discard my local change. Let me know if you have any preference.

Best,
Chris
