-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-work of mucoll-spack to enable build of releases fully on github #17
Conversation
@madbaron Do you have instructions on what you did to setup the |
.github/workflows/mucoll-rebuild.yml
Outdated
@@ -0,0 +1,77 @@ | |||
name: Re-build and publish MuColl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this works in your fork, but I don't understand how. Don't you first need to build the base image everytime as the FROM
base image tag depends on the SHA of this commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the branch name, not the SHA. But don't we need something to trigger the building of the first mucoll-spack
image in any new branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I propose that we disable the partial image build now and only enable the full image build for now? It will make testing changes that only affect the last image easier, since the partial builds fail by design.
We then revisit this once we have a series of central base images that branches can start from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty open to any problem, but can you elaborate a bit more?
I don't think I understand:
- what the problem is with the current workflow
- why only doing the full build would make it easier to debug things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the workflow can be illustrated by my test
brach. For illustration, I've "upgraded" the lcgeo package version. Only the sim:test
image build is triggered as the change only affects that stage. However that depends on the minimal:test
image, which does not exist. You can see the triggered and failed actions here.
This is the problem with the current workflow that I'm trying to highlight in this thread. It assumes the branch name in the new image tag for all images, even when the base image (e.g. minimal
) does not (and is not) have to be rebuild. Long term, this has plenty of solutions (e.g. we fix a dependency on a key4hep external image tag or try to do something smart like using a main
base image if the branch one is not found). But those are not available right now, which is why I'm proposing to stick to the long full builds for now to get something that works 100%. Do let me know if I missed something.
Regarding the "easier to debug things". Due to the problem described above, doing the full build is currently the only setup that can pass by design. Thus it is the only way to test that the committed contents work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I got your point much better now.
I think the only argument towards keeping the partial builds around was debug time once a branch exists (anyway none of these workflows can be listed as a requirement in PRs because the github tokens would not get write access) but that is likely largely solved by your buildcache updates.
I'll experiment with implementing the "fall back to main base image". If it turns out to be easy, I think it will be nice to have around anyway to save a few extra minutes.
The GitHub token setup should be ok (ok as in should work out if the box) as is. However, if the registry contains an image with the same name but uploaded from another repository, the token won't get write access. In that case you need to get rid of the previous image (I couldn't find a way to get the GitHub token to write over it). |
Yes, the problem turned out to be an existing |
@madbaron @tmadlener Have you considered using buildcaches to prevent rebuilding the common system packages? I have a prototype mostly working in my buildcache branch. It currently fails building py-onnxruntime due to running out of space. But it correctly pulls packages built in previous attempts and thus is much quicker. For example, build-minimal takes 17 min when all packages are found in the cache. |
Hi @kkrizka, But I see the workflow in your build-minimal example was successful until the end - so I guess we can pick it up? I think we should get things moving. Shall we merge this one first (after addressing the remaining discussion points) and follow-up just after with your buildcache additions? |
Hi @madbaron, yes it would be better to merge this and proceed on using buildcache in a separate PR. There is still some work left and there might be some discussion on the details (e.g. do we run spack standalone first and then again for docker?) |
This now seems like a good start for the updated workflow. Shall we merge this now and follow up on any additions via smaller PRs? |
Thanks. |
Actually might be worth waiting until key4hep/key4hep-spack#676 is merged so the ref hash becomes permanent. The current ref is broken as for 30 min ago as @tmadlener has been busy. It sounds like the merging of that is close though? |
In preparation for the next software release, this PR updates mucoll-spack to enable the complete build of releases within the github workflows.
This relies on an intermediate spack environment (key4hep-dev-base) by @tmadlener that contains the external dependencies of the key4hep stack. This environment is used to build an intermediate docker image (called mucoll-minimal) that is subsequently picked up to build the muon collider stack.
The CI is configured so that whenever the dockerfiles, or the spack configuration is changed, only the affected images (and those downstream) will be re-built.
The resulting docker images are automatically published and linked to the github page.
The complete workflow takes about 8 hours to complete on the github runners.
Please note that deploying the workflows to re-build the releases requires (without using external triggers) the dockerfiles to be bundled together with mucoll-spack, which is different from what we did with 2.9.
Personally I think this is ok (perhaps even a little better for discoverability) but please let me know if you disagree.
Adding a few people as reviewers!
Some notes for the future: