Fix GHA caches #719

makslevental · 2024-08-29T00:00:16Z

GHA (GitHub Actions) caches are immutable, so we currently timestamp their keys, e.g.:

windows-build-test-cpp-asserts-v1-f77db6670966b372878b4219599ba723feb41945-2024-08-28T19:33:14Z

As a result, every PR run uploads new caches, which can evict older but still useful caches once the repository's 10 GB cache limit is reached (e.g., caches from main that PRs could restore from).

This PR tries to mitigate that by keying the cache on either the PR number (when the cache is produced by a PR) or github.ref_name-N, where N always increments (when the cache is produced by a branch run, such as cache warming on main). Overwriting works by first deleting any existing cache with that key and then uploading the new cache under the same key.
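A minimal sketch of the delete-then-save overwrite described above, assuming the GitHub CLI's `gh cache delete` command (preinstalled on GitHub-hosted runners) and the split `actions/cache/save` action. The cache path is hypothetical, and this is not necessarily how the PR itself implemented the overwrite.

```yaml
# Sketch only: overwrite a cache by deleting any existing entry with the same key,
# then saving under that key again. The cache path below is hypothetical.
permissions:
  actions: write  # deleting caches through the GitHub API needs this permission

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # PR number for PR runs, "<branch>-<run_number>" for branch runs.
      CACHE_KEY: linux-build-test-cpp-asserts-manylinux-v2-${{ github.event.number || format('{0}-{1}', github.ref_name, github.run_number) }}
    steps:
      # ... build steps that populate the ccache directory ...
      - name: Delete existing cache with the same key
        env:
          GH_TOKEN: ${{ github.token }}
        # "|| true" keeps the job green when no cache with this key exists yet.
        run: gh cache delete "$CACHE_KEY" --repo "${{ github.repository }}" || true
      - name: Save cache under the (now free) key
        uses: actions/cache/save@v4
        with:
          path: ${{ github.workspace }}/ccache
          key: ${{ env.CACHE_KEY }}
```

Deleting first matters because a cache key can only be written once: a second save to an existing key is skipped rather than replacing the entry.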

~~So PR caches look like this~~

linux-build-test-cpp-asserts-manylinux-v2-719

~~and will be fixed/reused for the duration of the PR.~~

~~And caches produced by main look like~~

linux-build-test-cpp-asserts-manylinux-v2-main-123

FYI, for anyone wondering how this all works: the restore-keys are ~~longest prefix matched~~ (see #719 (comment)), so

key: linux-build-test-cpp-asserts-manylinux-v2-${{ github.event.number || format('{0}-{1}', github.ref_name, github.run_number) }}
restore-keys: linux-build-test-cpp-

will create a cache with the aforementioned name, but restore a cache whose key starts with linux-build-test-cpp- (including caches created by main), picking the ~~longest match~~ (see #719 (comment)).

see #719 (comment)
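For concreteness, here is how the key and restore-keys above might sit in a single `actions/cache` step. Per the GitHub Actions caching docs, the exact `key` is tried first; only on a miss is each `restore-keys` line tried in order as a prefix, and when several caches match a prefix the most recently created one is restored. Runs can only restore caches from their own branch or the default branch (for PRs, also the base branch). The path and the second restore-key line are hypothetical.

```yaml
- name: Restore (and later save) ccache
  uses: actions/cache@v4
  with:
    path: ${{ github.workspace }}/ccache  # hypothetical cache directory
    # Exact key: PR number for PR runs, "<branch>-<run_number>" otherwise.
    key: linux-build-test-cpp-asserts-manylinux-v2-${{ github.event.number || format('{0}-{1}', github.ref_name, github.run_number) }}
    # Consulted only if the exact key misses; each line is a prefix, tried top to
    # bottom, and the most recently created matching cache wins.
    restore-keys: |
      linux-build-test-cpp-asserts-manylinux-v2-
      linux-build-test-cpp-
```

With the combined `actions/cache` action, a new cache is saved under `key` in the post-job step only when the exact key missed.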

jtuyls

LGTM

.github/workflows/ci-linux.yml

newling · 2024-08-29T18:38:37Z

Thanks for the explainer! Some noob questions:

What is the 123 in linux-build-test-cpp-asserts-manylinux-v2-main-123? Is this the 123rd cache of main ever created? Why do we need multiple caches of main?

Do we even need caches of branches? Like, shouldn't they always be close enough to main that the cache from main is basically good enough?

What about branches from forks, same as branches from nod.ai?

makslevental · 2024-08-29T18:46:39Z

> What is the 123 in linux-build-test-cpp-asserts-manylinux-v2-main-123, is this the 123rd cache of main ever created? Why do we need multiple caches of main?

123 is github.run_number. We need multiple caches to prevent a race condition when a cache warming run overlaps with a main build.

> Do we even need caches of branches? Like, shouldn't they always be close enough to main to make the cache from main basically good enough?

Now this is on point: we don't. In fact, @ScottTodd suggested (obliquely) that we just do away with caches on PRs and only cache on pushes to main. I keep fiddling with this PR trying to figure out why ccache is still missing/hitting on and off, and I'm about to give up and do exactly this (remove caches for PRs).

> What about branches from forks, same as branches from nod.ai?

I'm always confused about this. You're the one who pointed out to me that the Linux script/action has always cached things successfully, even for forks, so by that logic everything here should work for forks. But the docs talk about how feature branches that share a common ancestor can share caches, and that doesn't make sense to me: main is a common ancestor, but in another repo? Maybe it really does just find the common ancestor hash and gate on that, but I would've assumed there'd be stricter checking.

newling · 2024-08-29T19:05:44Z

> We need multiple caches to prevent a race condition when a cache warming run overlaps with a main build.

Makes sense

> I keep fiddling with this PR trying to figure out why ccache is still missing/hitting on and off and I'm about to give up and do exactly this (remove caches for PRs).

Good to know. Caching only main might also mean LLVM (via IREE) gets rebuilt less often in CI. The situation I have in mind: I open a PR before IREE is bumped, and then I'm stuck with that PR's cache even after main has a cache with the bumped LLVM.

> I'm always confused about this - you're the one that pointed out to me that the linux script/action has always successfully cached things, even for forks. So according to that logic, everything here should work for forks. But reading the docs, which talk about how feature branches that share a common ancestor can share caches, it doesn't make sense to me - main is a common ancestor but in another repo? I mean maybe it really does just find the common ancestor hash and gate on that but I would've assumed there'd be stricter checking.

Maybe the strictness is introduced by other means: the first time you make a PR to IREE/torch-mlir, someone on the IREE team has to approve running CI, if that's what you mean by 'stricter checking'. For my part, I'm glad branches 'just work'.

ScottTodd · 2024-08-29T19:27:17Z

> Now this is on point - we don't, in fact @ScottTodd suggested (obliquely) that we just do away with caches on PRs and only cache on push to main. I keep fiddling with this PR trying to figure out why ccache is still missing/hitting on and off and I'm about to give up and do exactly this (remove caches for PRs).

We have a few different workflows doing different things:

- https://github.com/iree-org/iree/blob/main/.github/workflows/pkgci_build_packages.yml uses actions/cache without much customization. It goes through some layers of scripts and Docker to write into a local cache directory manually and has GitHub save/restore that whole directory. Pushes and PRs both read from and write to that cache, so https://github.com/iree-org/iree/actions/caches is typically at 20GB/10GB used. I've been considering changing that to only write to the cache on pushes, since the "post enable cache" step also takes 2m30s to upload, which is a bottleneck for starting dependent jobs. Using a cache hosted off of GitHub (GCP/Azure/AWS) would be better for storage limits and transfer speeds, but would cost some money and require occasional maintenance.
- https://github.com/iree-org/iree/blob/main/.github/workflows/ci.yml uses hendrikmuhs/ccache-action in a few places, with no Docker or scripts in the way. I have those set to only "save" to the cache on push events.

I have some ideas for other experiments to try tracked at iree-org/iree#18185.
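For reference, a "restore on every run, save only on push" setup with hendrikmuhs/ccache-action might look roughly like this. It is a sketch assuming the action's `save` input (check the README of the version you pin); the key is hypothetical, and this is not copied from IREE's ci.yml.

```yaml
- name: Set up ccache
  uses: hendrikmuhs/ccache-action@v1.2
  with:
    key: ${{ github.job }}  # hypothetical key
    # PR runs still restore the cache; only push events (e.g. to main) upload a new one.
    save: ${{ github.event_name == 'push' }}
```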

makslevental · 2024-08-29T22:42:03Z

I'm changing this PR to just not create caches for PRs. Let's see how that goes for a while.
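A minimal sketch of "PRs read caches, only pushes write them" using the split `actions/cache/restore` and `actions/cache/save` actions. The path and key layout are hypothetical, not necessarily what this PR merged.

```yaml
- name: Restore ccache
  id: restore-ccache
  uses: actions/cache/restore@v4
  with:
    path: ${{ github.workspace }}/ccache  # hypothetical cache directory
    key: linux-build-test-cpp-asserts-manylinux-v2-${{ github.ref_name }}-${{ github.run_number }}
    # Fall back to the newest cache produced by a main run.
    restore-keys: |
      linux-build-test-cpp-asserts-manylinux-v2-main-

# ... build steps ...

- name: Save ccache
  # Only pushes (e.g. to main) create caches; PR runs never upload.
  if: ${{ !cancelled() && github.event_name == 'push' }}
  uses: actions/cache/save@v4
  with:
    path: ${{ github.workspace }}/ccache
    key: ${{ steps.restore-ccache.outputs.cache-primary-key }}
```

Splitting restore and save also removes the upload time from PR jobs entirely, which addresses the post-step bottleneck mentioned above.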

makslevental · 2024-08-29T22:46:40Z

For future reference: CCACHE_COMPILERCHECK=content prevents macOS from getting cache hits, even though it's supposed to help (hendrikmuhs/ccache-action#146).

EDIT: but it does help on windows 🤷
EDIT2: and linux 🤷 🤷 🤷

In truth, you can also pass CCACHE_COMPILERCHECK a string, e.g., the output of clang -v. If I ever have nothing else to do, I plan to try that out.
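The per-OS observations above could be encoded in a build step roughly as follows. This is a sketch: it keeps ccache's default `mtime` check on macOS and uses `content` on Linux/Windows, and the build command is hypothetical. The commented-out line shows ccache's documented command form, where ccache runs the given command (with `%compiler%` replaced by the compiler path) and hashes its output; it is untested here.

```yaml
- name: Build
  env:
    # Hash the compiler binary's contents on Linux/Windows, but keep ccache's
    # default (mtime) on macOS, where `content` was observed to prevent hits.
    CCACHE_COMPILERCHECK: ${{ runner.os == 'macOS' && 'mtime' || 'content' }}
    # Untested alternative: hash the output of a command instead, e.g.
    # CCACHE_COMPILERCHECK: "%compiler% -v"
  run: cmake --build build  # hypothetical build command
```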

makslevental force-pushed the makslevental/overwrite-gha-caches branch from be38bc1 to ff188d7 August 29, 2024 00:01

makslevental requested a review from ScottTodd August 29, 2024 00:02

makslevental changed the title ~~[wip] overwrite GHA caches~~ Overwrite GHA caches Aug 29, 2024

makslevental requested review from newling and jtuyls August 29, 2024 00:59

makslevental force-pushed the makslevental/overwrite-gha-caches branch 7 times, most recently from 3bc12eb to d7f1409 August 29, 2024 05:26

jtuyls approved these changes Aug 29, 2024

.github/workflows/ci-linux.yml

makslevental force-pushed the makslevental/overwrite-gha-caches branch 8 times, most recently from 9a320f4 to 5919494 August 29, 2024 18:20

makslevental force-pushed the makslevental/overwrite-gha-caches branch from 5919494 to 5d50235 August 29, 2024 18:47

makslevental force-pushed the makslevental/overwrite-gha-caches branch from 5d50235 to e1d45d8 August 29, 2024 19:21

makslevental force-pushed the makslevental/overwrite-gha-caches branch 3 times, most recently from f79daaf to e3af51c August 29, 2024 22:34

makslevental force-pushed the makslevental/overwrite-gha-caches branch 5 times, most recently from be3bcb5 to c3ee770 August 29, 2024 22:40

makslevental changed the title ~~Overwrite GHA caches~~ Fix GHA caches Aug 29, 2024

makslevental force-pushed the makslevental/overwrite-gha-caches branch 3 times, most recently from 7f9f3c2 to 1425494 August 29, 2024 23:25

makslevental added 3 commits August 29, 2024 16:49

[wip] overwrite GHA caches

056ac66

increment branch cache key

01bf40e

remove cache storage for PRs

7d0888a

makslevental force-pushed the makslevental/overwrite-gha-caches branch from 1425494 to 7d0888a August 29, 2024 23:50

makslevental enabled auto-merge (squash) August 29, 2024 23:50

makslevental merged commit 3a8f993 into main Aug 30, 2024
4 of 5 checks passed

makslevental deleted the makslevental/overwrite-gha-caches branch August 30, 2024 00:02