Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GHA caches #719

Merged
merged 3 commits into from
Aug 30, 2024
Merged

Fix GHA caches #719

merged 3 commits into from
Aug 30, 2024

Conversation

makslevental
Copy link
Collaborator

@makslevental makslevental commented Aug 29, 2024

GHA (github action) caches aren't mutable so currently we're timestamping them:

windows-build-test-cpp-asserts-v1-f77db6670966b372878b4219599ba723feb41945-2024-08-28T19:33:14Z

Thus PRs that run will upload new caches and push out old but useful caches, e.g., caches from main that could be used by PRs.

This PR tries to mitigate that by using either PR number (in case the cache is produced by a PR) or github.ref_name-N where N always increments (in case the cache is produced by a branch run, like cache warming on main) to label the cache. The overwrite is done by first deleting any existing cache by that name and then uploading the new cache to the same label/key.

So PR caches look like this

linux-build-test-cpp-asserts-manylinux-v2-719

and will be fixed/reused for the duration of the PR.

And caches produced by main look like

linux-build-test-cpp-asserts-manylinux-v2-main-123

FYI for anyone wondering how this all works - the restore-keys are longest prefix matched (see #719 (comment)); so

key: linux-build-test-cpp-asserts-manylinux-v2-${{ github.event.number || format('{0}-{1}', github.ref_name, github.run_number) }}
restore-keys: linux-build-test-cpp-

will create a cache with the aforementioned name but then restore the cache with longest match (see #719 (comment)) starting from linux-build-test-cpp- (including caches created by main).~~

see #719 (comment)

@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch from be38bc1 to ff188d7 Compare August 29, 2024 00:01
@makslevental makslevental requested a review from ScottTodd August 29, 2024 00:02
@makslevental makslevental changed the title [wip] overwrite GHA caches Overwrite GHA caches Aug 29, 2024
@makslevental makslevental requested review from newling and jtuyls August 29, 2024 00:59
@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch 7 times, most recently from 3bc12eb to d7f1409 Compare August 29, 2024 05:26
Copy link
Collaborator

@jtuyls jtuyls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch 8 times, most recently from 9a320f4 to 5919494 Compare August 29, 2024 18:20
@newling
Copy link
Contributor

newling commented Aug 29, 2024

Thanks for the explainer! Some noob questions:

What is the 123 in linux-build-test-cpp-asserts-manylinux-v2-main-123, is this the 123'rd cache of main ever created? Why do we need multiple caches of main?

Do we even need caches of branches. Like, shouldn't they always be close enough to main to make the cache from main basically good enough?

What about branches from forks, same as branches from nod.ai?

@makslevental
Copy link
Collaborator Author

makslevental commented Aug 29, 2024

What is the 123 in linux-build-test-cpp-asserts-manylinux-v2-main-123, is this the 123'rd cache of main ever created? Why do we need multiple caches of main?

123 is github.run_number. We need multiple caches to prevent a race condition when a cache warming run overlaps with a main build.

Do we even need caches of branches. Like, shouldn't they always be close enough to main to make the cache from main basically good enough?

Now this is on point - we don't, in fact @ScottTodd suggested (obliquely) that we just do away with caches on PRs and only cache on push to main. I keep fiddling with this PR trying to figure out why ccache is still missing/hitting on and off and I'm about to give up and do exactly this (remove caches for PRs).

What about branches from forks, same as branches from nod.ai?

I'm always confused about this - you're the one that pointed out to me that the linux script/action has always successfully cached things, even for forks. So according to that logic, everything here should work for forks. But reading the docs, which talk about how feature branches that share a common ancestor can share caches, it doesn't make sense to me - main is a common ancestor but in another repo? I mean maybe it really does just find the common ancestor hash and gate on that but I would've assumed there'd be stricter checking.

@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch from 5919494 to 5d50235 Compare August 29, 2024 18:47
@newling
Copy link
Contributor

newling commented Aug 29, 2024

We need multiple caches to prevent a race condition when a cache warming run overlaps with a main build.

Makes sense

I keep fiddling with this PR trying to figure out why ccache is still missing/hitting on and off and I'm about to give up and do exactly this (remove caches for PRs).

Good to know. Caching just main might also mean llvm (via iree) is built less frequently in CI. Situation I have in mind: I make a PR before iree is bumped, and then I'm stuck with the cache for that PR, even after main has a cache with the bumped llvm.

I'm always confused about this - you're the one that pointed out to me that since the linux script/action has always successfully cached things, even for forks. So according to that logic, everything here should work for forks. But reading the docs, which talk about how feature branches that share a common ancestor can share caches, it doesn't make sense to me - main is a common ancestor but in another repo? I mean maybe it really does just find the common ancestor hash and gate on that but I would've assumed there'd be stricter checking.

Maybe the strictness needs to be introduced by other means. The first time you make a PR to IREE/torch-mlir, you need to request someone in the IREE team starts CI. If that's what you mean by 'stricter checking'. For my part I'm glad branches 'just work'.

@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch from 5d50235 to e1d45d8 Compare August 29, 2024 19:21
@ScottTodd
Copy link
Member

Now this is on point - we don't, in fact @ScottTodd suggested (obliquely) that we just do away with caches on PRs and only cache on push to main. I keep fiddling with this PR trying to figure out why ccache is still missing/hitting on and off and I'm about to give up and do exactly this (remove caches for PRs).

We have a few different workflows doing different things:

I have some ideas for other experiments to try tracked at iree-org/iree#18185.

@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch 3 times, most recently from f79daaf to e3af51c Compare August 29, 2024 22:34
@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch 5 times, most recently from be3bcb5 to c3ee770 Compare August 29, 2024 22:40
@makslevental makslevental changed the title Overwrite GHA caches Fix GHA caches Aug 29, 2024
@makslevental
Copy link
Collaborator Author

I'm changing this PR to just not create caches for PRs. Let's see how that goes for a while.

@makslevental
Copy link
Collaborator Author

makslevental commented Aug 29, 2024

For the future: CCACHE_COMPILERCHECK=content will prevent mac from getting cache hits even though it's supposed to help hendrikmuhs/ccache-action#146.

EDIT: but it does help on windows 🤷
EDIT2: and linux 🤷 🤷 🤷

In truth - you can actually pass CCACHE_COMPILERCHECK a string eg clang -v output. If I ever have nothing else to do, I plan to try that out.

@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch 3 times, most recently from 7f9f3c2 to 1425494 Compare August 29, 2024 23:25
@makslevental makslevental force-pushed the makslevental/overwrite-gha-caches branch from 1425494 to 7d0888a Compare August 29, 2024 23:50
@makslevental makslevental enabled auto-merge (squash) August 29, 2024 23:50
@makslevental makslevental merged commit 3a8f993 into main Aug 30, 2024
4 of 5 checks passed
@makslevental makslevental deleted the makslevental/overwrite-gha-caches branch August 30, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants