Running git fetch --shallow-since to fetch commit author log causes hang on Buildkite runners #212

Closed
camallen opened this issue Jul 21, 2023 · 12 comments
Labels
bug planned It’s planned to be done

Comments

@camallen

camallen commented Jul 21, 2023

Our buildkite runner agents hang forever (> 30mins till we kill them) due to the introduction of the git_commit_authors functionality.

Our Buildkite job uses docker / compose to mount the app and git repo into the container. When we run knapsack via bundle exec rake "knapsack_pro:rspec[--format progress]", the git fetch command never finishes, so the job hangs indefinitely and our CI system breaks :(

I've tried to work around the issue, but the only way I could get it working was to not run the git fetch at all, via the disabling ENV var in my fork of your repo, which I used in our CI system to debug / test solutions: https://github.com/camallen/knapsack_pro-ruby/commits/avoid-git-log-paging

To fix

  1. Downgrading to v5.1.2 fixes this issue, as it doesn't do the author commit lookups (compare v5.1.2...v5.3.2)
  2. Do not run the git fetch in
    `git fetch --shallow-since "one month ago" --quiet 2>/dev/null`

Note that when I don't run the git fetch the code still works, so perhaps the Buildkite CI system doesn't need this shallow fetch for the author commit log?
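One way to test the question above is to ask git whether the checkout is shallow before fetching anything. The following is a minimal sketch, not the gem's code: `shallow_repository?` is a hypothetical helper name, and it assumes git 2.15 or newer, where `git rev-parse --is-shallow-repository` is available.

```ruby
require "open3"

# Hypothetical helper, not part of knapsack_pro.
# Returns true only when git reports that the repository at `dir` is a
# shallow clone. Any failure (no git binary, missing directory, not a
# repository, older git without the flag) is treated as "not shallow".
def shallow_repository?(dir = Dir.pwd)
  out, status = Open3.capture2(
    "git", "rev-parse", "--is-shallow-repository",
    chdir: dir, err: File::NULL
  )
  status.success? && out.strip == "true"
rescue Errno::ENOENT
  # Raised when the git executable or the directory does not exist.
  false
end
```

If this returns false, the full history is already present locally and a `--shallow-since` fetch would presumably add nothing to the author commit log.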

Any help getting this fixed is appreciated. In the meantime I'm pursuing option 1 and downgrading the gem.

@shadre
Member

shadre commented Jul 21, 2023

Apologies for the inconvenience!

Thank you for the report, and the investigation on your part. We are looking into the problem.

@ArturT ArturT added bug planned It’s planned to be done labels Jul 21, 2023
@ArturT
Member

ArturT commented Jul 21, 2023

@camallen When you tested your change, did the following command print anything?

git fetch --shallow-since "one month ago"
  • I suspect that maybe the repository is very large or there were a lot of commits in the past month and fetching just commits for the last month is still super slow.
  • Or there is some slow network connection to the repository.
  • Or there is no access to the repository from inside of the CI so something hangs due to lack of permissions. Logs would give us a clue. Could you provide logs to the support email? Thanks.

@ArturT
Member

ArturT commented Jul 21, 2023

@camallen We released knapsack_pro gem version 5.3.3. It cancels the git fetch if it takes longer than 5 seconds. Could you try it?

If you could still provide logs, that would help us understand the issue better. Thank you.

@ArturT
Member

ArturT commented Jul 24, 2023

We also released the fix for the @knapsack-pro/jest 7.2.1 and @knapsack-pro/cypress 7.2.1 npm packages.

@irphilli

I ran into the same thing on Buildkite with a similar setup (haven't tried the latest version just yet), but here's what showed up in the logs right before the agent would hang:

The authenticity of host 'github.com (140.82.113.3)' can't be established.
ECDSA key fingerprint is SHA256:p2QAMXNIC1TJYWeIOttrVc98/R1BUFWu3/LiyKgUfQM.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
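The prompt in this log suggests that ssh inside the container is waiting for an answer nobody can type. A hedged sketch of how a caller could make git fail fast instead of prompting is shown below. The constant and method names are hypothetical and this is not what knapsack_pro does; it relies on `GIT_TERMINAL_PROMPT` and `GIT_SSH_COMMAND` (both honored by git 2.3 and newer) and on ssh's `BatchMode` option.

```ruby
# Hypothetical names, for illustration only.
# Environment that tells git and ssh not to ask interactive questions.
NON_INTERACTIVE_GIT_ENV = {
  # git: do not prompt on the terminal for usernames or passwords.
  "GIT_TERMINAL_PROMPT" => "0",
  # ssh: batch mode disables passphrase prompts and refuses unknown
  # host keys instead of asking for confirmation.
  "GIT_SSH_COMMAND" => "ssh -o BatchMode=yes"
}.freeze

# Runs `git <args>` with the environment above.
# Returns true on success, false on a non-zero exit, nil if git could
# not be executed at all.
def run_git_non_interactive(*args)
  system(
    NON_INTERACTIVE_GIT_ENV, "git", *args,
    in: File::NULL,  # nothing can block waiting on stdin
    out: File::NULL,
    err: File::NULL
  )
end
```

With this, a call such as `run_git_non_interactive("fetch", "--shallow-since", "one month ago", "--quiet")` should return false quickly when the host key is unknown or credentials are missing, rather than hanging.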

@camallen
Author

Thanks @irphilli and apologies for the tardy reply knapsack folks.

I think this issue is related to git authentication to a private repo as the git credentials from buildkite are not available by default inside the repo. I'm asking the buildkite folks in buildkite/elastic-ci-stack-s3-secrets-hooks#58 and https://forum.buildkite.community/t/use-git-credentials-in-a-docker-container/3137

I'm not sure the use of timeout in ruby is the best solution (though it does work), https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/ and the linked sidekiq author article http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api/

I'll report back what i find from buildkite as fixing this via a configuration setting will negate the need to run a timeout block on this ruby code
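For context on the Timeout concern above, a deadline can also be enforced on the child process itself rather than by interrupting Ruby code. The sketch below is one possible approach under stated assumptions, not the gem's implementation: `run_with_deadline` is a hypothetical helper, and it assumes a POSIX system.

```ruby
# Hypothetical helper, not part of knapsack_pro.
# Runs `cmd` as a child process and returns its Process::Status, or nil
# if it was still running after `seconds` and had to be killed.
def run_with_deadline(*cmd, seconds: 5)
  pid = Process.spawn(*cmd, in: File::NULL, out: File::NULL, err: File::NULL)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds

  loop do
    # With WNOHANG, wait2 returns nil immediately while the child runs.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return status if status
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.05
  end

  Process.kill("KILL", pid)
  Process.wait(pid) # reap the killed child so no zombie is left behind
  nil
end
```

Because the signal goes to the child process, no exception is raised at an arbitrary point in the calling Ruby thread. One limitation of this sketch: only the direct child is killed, so a helper process started by git (for example ssh) could outlive it.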

@ArturT
Member

ArturT commented Jul 26, 2023

@irphilli Please let us know if the latest knapsack_pro gem version works for you.

> I'm not sure the use of timeout in ruby is the best solution (though it does work), https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/ and the linked sidekiq author article http://www.mikeperham.com/2015/05/08/timeout-rubys-most-dangerous-api/

@camallen Thanks for the details about the potential Timeout issues. That's helpful.

> I'll report back what i find from buildkite as fixing this via a configuration setting will negate the need to run a timeout block on this ruby code

Ok. Thank you.

@irphilli

I haven't seen any build timeouts with the newer version.

@ArturT
Member

ArturT commented Jul 28, 2023

@irphilli Thanks for the info.

@ArturT
Member

ArturT commented Jul 31, 2023

@camallen I'm closing this issue for now. If you have more context, feel free to provide it or reopen the issue.

@ArturT ArturT closed this as completed Jul 31, 2023
@camallen
Author

camallen commented Aug 1, 2023

Thanks @ArturT agree to close this as it's working.

FWIW, this appears to be caused by the use of docker containers: the git credentials that Buildkite sets up on the agents are not available inside the containers. Propagating and configuring those credentials for the container's git tooling should fix this issue.

I'll follow up here with any solutions I discover or get from the Buildkite folks.

@ArturT
Member

ArturT commented Aug 1, 2023

@camallen ok. Thank you.

4 participants