#90 introduced additional logging, providing a `logAround()` method that timed the execution of `Future`s. However, it contained a bug! The `thunk` was passed as a 'by-name' parameter (see https://docs.scala-lang.org/tour/by-name-parameters.html) so that it wouldn't start executing until we were ready to start timing it, which is reasonable. 'by-name' parameters are evaluated *every* time they are used though, and the `logAround()` method evaluated it _twice_. So the thunk was executed twice, concurrently, when it was just supposed to be executed once.

You can see in the logs below that the 3 pieces of code timed with `logAround()` were executed twice:

```
Jan 31 15:50:48 prout-bot app/web.1 [info] controllers.Api - githubHook repo=guardian/frontend githubDeliveryGuid=Some(0789db88-a17f-11ed-9d54-6d035aa39ad1) xRequestId=Some(78c70f90-622e-4bb0-8d53-c70b0727b4b0)
Jan 31 15:50:51 prout-bot app/web.1 [info] lib.RepoUtil - Updating Git repo with fetch... https://github.com/guardian/frontend.git
Jan 31 15:50:51 prout-bot app/web.1 [info] lib.RepoUtil - Updating Git repo with fetch... https://github.com/guardian/frontend.git
Jan 31 15:50:51 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - Git Repo ref count: Success(393)
Jan 31 15:50:51 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - 'fetch repo hooks' 196 ms : success=true
Jan 31 15:50:51 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - Git Repo ref count: Success(393)
Jan 31 15:50:51 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - 'fetch git repo' 341 ms : success=true
Jan 31 15:50:57 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - PRs merged to master size=25
Jan 31 15:50:57 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - Merged Pull Requests fetched: Success(List(25871, 25869, 25868, 25865, 25862, 25861, 25860, 25859, 25857, 25856, 25851, 25850, 25849, 25848, 25846, 25845, 25844, 25842, 25841, 25838, 25837, 25836, 25834, 25792, 25749))
Jan 31 15:50:57 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - 'fetch PRs' 6160 ms : success=true
Jan 31 15:50:58 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - PRs merged to master size=25
Jan 31 15:50:58 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/frontend - Merged Pull Requests fetched: Success(List(25871, 25869, 25868, 25865, 25862, 25861, 25860, 25859, 25857, 25856, 25851, 25850, 25849, 25848, 25846, 25845, 25844, 25842, 25841, 25838, 25837, 25836, 25834, 25792, 25749))
Jan 31 15:50:58 prout-bot app/web.1 [info] l.RepoLevelDetails - Need to look at guardian/frontend, branch:main commit AnyObjectId[2bddf3a5f95129cf745eb7843b71ce9f8782eeca]
```

Fetching repo PRs and hooks through GitHub API calls can be duplicated without much issue (apart from perhaps doubling API quota consumed), but fetching the git repo itself (cloning/fetching) happens on a fixed folder on the filesystem, and having simultaneous threads trying to write to that folder would often lead to exceptions, trying to lock those files - here are two examples:

```
Jan 30 12:00:01 prout-bot app/web.1 [info] c.m.s.GitHub - guardian/members-data-api - Git Repo ref count: Failure(org.eclipse.jgit.api.errors.TransportException: lock error: /tmp/bot/working-dir/guardian/members-data-api/repo.git/shallow)
```

```
Jan 30 12:44:10 prout-bot app/web.1 Caused by: org.eclipse.jgit.errors.LockFailedException: Cannot lock /tmp/bot/working-dir/guardian/prout/repo.git/config. Ensure that no other process has an open file handle on the lock file /tmp/bot/working-dir/guardian/prout/repo.git/config.lock, then you may delete the lock file and retry.
Jan 30 12:44:10 prout-bot app/web.1 at org.eclipse.jgit.storage.file.FileBasedConfig.save(FileBasedConfig.java:185)
Jan 30 12:44:10 prout-bot app/web.1 at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:303)
Jan 30 12:44:10 prout-bot app/web.1 at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:191)
Jan 30 12:44:10 prout-bot app/web.1 at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:1)
Jan 30 12:44:10 prout-bot app/web.1 at lib.RepoUtil$.invoke$1(RepoUtil.scala:39)
Jan 30 12:44:10 prout-bot app/web.1 at lib.RepoUtil$.getUpToDateRepo$1(RepoUtil.scala:61)
Jan 30 12:44:10 prout-bot app/web.1 at lib.RepoUtil$.getGitRepo(RepoUtil.scala:68)
Jan 30 12:44:10 prout-bot app/web.1 at lib.RepoSnapshot$Factory.$anonfun$fetchLatestCopyOfGitRepo$1(RepoSnapshot.scala:121)
```

Sentry does a reasonable job of showing that these errors only started with PR #90 (looking at the 'First Seen' of 'Jan 26, 5:57 PM'): https://sentry.io/organizations/the-guardian/issues/3899449647/?project=49913&query=is%3Aunresolved&referrer=issue-stream
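
The double evaluation described above can be reproduced in a few lines. This is only a sketch, not the actual `logAround()` from #90 or the fix applied here: `ByNameDemo`, `logAroundBuggy`, `logAroundFixed` and the two counting helpers are hypothetical names, and the only thing assumed about the real method is the shape described in this message (a by-name `Future` parameter that ends up referenced twice). One way to avoid the problem, shown in `logAroundFixed`, is to evaluate the by-name parameter once into a `val` and use that value from then on.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ByNameDemo {

  // Hypothetical reconstruction of the buggy shape: `thunk` is by-name and is
  // referenced twice, so the expression behind it is evaluated twice.
  def logAroundBuggy[T](desc: String)(thunk: => Future[T]): Future[T] = {
    val start = System.currentTimeMillis()
    thunk.onComplete { attempt => // first evaluation of `thunk`
      val ms = System.currentTimeMillis() - start
      println(s"'$desc' $ms ms : success=${attempt.isSuccess}")
    }
    thunk // second evaluation: a separate Future repeating the same work
  }

  // One possible fix (an assumption, not necessarily the fix in this commit):
  // evaluate the by-name parameter exactly once and reuse the resulting Future.
  def logAroundFixed[T](desc: String)(thunk: => Future[T]): Future[T] = {
    val start = System.currentTimeMillis()
    val future = thunk // the only evaluation of `thunk`
    future.onComplete { attempt =>
      val ms = System.currentTimeMillis() - start
      println(s"'$desc' $ms ms : success=${attempt.isSuccess}")
    }
    future
  }

  // Counts how many times the by-name argument is evaluated by the buggy version.
  def buggyEvaluations(): Int = {
    val count = new AtomicInteger(0)
    Await.result(
      logAroundBuggy("demo") { count.incrementAndGet(); Future("done") },
      5.seconds
    )
    count.get
  }

  // Same measurement for the fixed version.
  def fixedEvaluations(): Int = {
    val count = new AtomicInteger(0)
    Await.result(
      logAroundFixed("demo") { count.incrementAndGet(); Future("done") },
      5.seconds
    )
    count.get
  }
}
```

The counter is incremented synchronously each time the by-name argument is evaluated, so `ByNameDemo.buggyEvaluations()` reports two evaluations and `ByNameDemo.fixedEvaluations()` reports one, regardless of how the `Future`s are scheduled.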