Re-work of mucoll-spack to enable build of releases fully on github #17

madbaron · 2024-11-25T15:51:24Z

In preparation for the next software release, this PR updates mucoll-spack to enable the complete build of releases within the github workflows.

This relies on an intermediate spack environment (key4hep-dev-base) by @tmadlener that contains the external dependencies of the key4hep stack. This environment is used to build an intermediate docker image (called mucoll-minimal) that is subsequently picked up to build the muon collider stack.

The CI is configured so that whenever the dockerfiles or the spack configuration change, only the affected images (and those downstream) are re-built.
The resulting docker images are automatically published and linked to the github page.
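To make the "only the affected images (and those downstream)" rule concrete, here is a minimal sketch of the selection logic. It is not the actual workflow configuration: the image chain and the path prefixes below are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List

# Hypothetical build order: each image is built FROM the one before it.
IMAGE_CHAIN: List[str] = ["minimal", "sim"]

# Hypothetical path prefixes whose modification affects each image.
TRIGGER_PATHS: Dict[str, List[str]] = {
    "minimal": ["docker/minimal/", "environments/externals/"],
    "sim": ["docker/sim/", "environments/release/"],
}


def images_to_rebuild(changed_files: List[str]) -> List[str]:
    """Return the first affected image and everything downstream of it."""
    for index, image in enumerate(IMAGE_CHAIN):
        prefixes = TRIGGER_PATHS[image]
        touched = any(
            path.startswith(prefix)
            for path in changed_files
            for prefix in prefixes
        )
        if touched:
            # Everything later in the chain depends on this image.
            return IMAGE_CHAIN[index:]
    return []
```

With these placeholder paths, a change under `docker/minimal/` would select both images, while a change under `docker/sim/` would select only the last one.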

The complete workflow takes about 8 hours to complete on the github runners.

Please note that deploying the workflows to re-build the releases requires (without using external triggers) the dockerfiles to be bundled together with mucoll-spack, which is different from what we did with 2.9.
Personally I think this is ok (perhaps even a little better for discoverability) but please let me know if you disagree.

Adding a few people as reviewers!

Some notes for the future:

- the preparation of the mucoll-minimal image (which is really a key4hep-minimal) would be best moved to key4hep-spack as it is completely generic
- we should set up a cron-triggered workflow to run as nightly build with the latest versions of all packages
- the previous concretize workflows have been removed, as they are part of the docker image building process. They could be split off to a separate step of the workflow if useful
- we can review the list of packages to be compiled (i.e. all event generators) to make the resulting image a little slimmer

kkrizka · 2024-12-24T10:53:16Z

@madbaron Do you have instructions on what you did to set up the GITHUB_TOKEN to push to your own GHCR? I've made sure that "Read and write permissions" are set under "Settings -> Actions -> General" and I saw you already set the permissions:packages field in the workflow YAML. However, I still keep getting a 403 Forbidden when trying to push (example). Googling so far hasn't been much help. I've managed to get it working using a classic token, but I don't think that's compatible with your GITHUB_TOKEN setup.

.github/workflows/full-rebuild.yml

kkrizka · 2024-12-24T12:07:02Z

.github/workflows/mucoll-rebuild.yml

@@ -0,0 +1,77 @@
+name: Re-build and publish MuColl


I know this works in your fork, but I don't understand how. Don't you first need to build the base image every time, as the FROM base image tag depends on the SHA of this commit?

It's the branch name, not the SHA. But don't we need something to trigger the building of the first mucoll-spack image in any new branch?

Can I propose that we disable the partial image builds and only enable the full image build for now? It will make testing changes that only affect the last image easier, since the partial builds fail by design.

We can then revisit this once we have a series of central base images that branches can start from.

I'm pretty open to any proposal, but can you elaborate a bit more?
I don't think I understand:

what the problem is with the current workflow

why only doing the full build would make it easier to debug things

The problem with the workflow can be illustrated by my test branch. For illustration, I've "upgraded" the lcgeo package version. Only the sim:test image build is triggered, as the change only affects that stage. However, that depends on the minimal:test image, which does not exist. You can see the triggered and failed actions here.

This is the problem with the current workflow that I'm trying to highlight in this thread. It assumes the branch name in the new image tag for all images, even when the base image (e.g. minimal) does not have to be (and is not) rebuilt. Long term, there are plenty of solutions (e.g. we fix a dependency on a key4hep external image tag or try to do something smart like using a main base image if the branch one is not found). But those are not available right now, which is why I'm proposing to stick to the long full builds for now to get something that works 100%. Do let me know if I missed something.

Regarding the "easier to debug things": due to the problem described above, doing the full build is currently the only setup that can pass by design. Thus it is the only way to test that the committed contents work.

Thanks, I got your point much better now.

I think the only argument towards keeping the partial builds around was debug time once a branch exists (anyway none of these workflows can be listed as a requirement in PRs because the github tokens would not get write access) but that is likely largely solved by your buildcache updates.

I'll experiment with implementing the "fall back to main base image". If it turns out to be easy, I think it will be nice to have around anyway to save a few extra minutes.
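For reference, a minimal sketch of what the "fall back to main base image" idea could look like. This is an illustration only, not what the workflow does: the function name, the `fallback_branch` default of `master` (taken from this PR's merge target) and the plain list of existing tags are all assumptions, and querying the registry for existing tags is left out.

```python
from typing import Iterable


def resolve_base_tag(
    image: str,
    branch: str,
    existing_tags: Iterable[str],
    fallback_branch: str = "master",  # assumption: the central branch name
) -> str:
    """Prefer the branch-specific base image, else fall back to the central one."""
    available = set(existing_tags)
    for candidate in (branch, fallback_branch):
        tag = f"{image}:{candidate}"
        if tag in available:
            return tag
    raise LookupError(
        f"no base image found for {image!r} on {branch!r} or {fallback_branch!r}"
    )
```

In the lcgeo example above, a request for the `minimal` image on branch `test` would resolve to `minimal:master` when `minimal:test` is missing, so the sim build could still start.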


madbaron · 2024-12-24T15:21:39Z

The GitHub token setup should be ok as is (ok as in: it should work out of the box).

However, if the registry contains an image with the same name but uploaded from another repository, the token won't get write access.
Could this be your case?

In that case you need to get rid of the previous image (I couldn't find a way to get the GitHub token to write over it).

kkrizka · 2024-12-24T15:33:09Z

Yes, the problem turned out to be an existing mucoll-spack package from my old MuonCollider-docker fork. The other way to solve it was to add the new repository to the package (instructions).

kkrizka · 2025-01-01T15:49:00Z

@madbaron @tmadlener Have you considered using buildcaches to prevent rebuilding the common system packages? I have a prototype mostly working in my buildcache branch. It currently fails building py-onnxruntime due to running out of space. But it correctly pulls packages built in previous attempts and thus is much quicker. For example, build-minimal takes 17 min when all packages are found in the cache.
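The speed-up comes from splitting the package list into what can be pulled from the cache and what still has to be compiled. A minimal sketch of that split, assuming each package is identified by a hash of its fully concretized spec. The function and the example hashes are hypothetical and this is not the Spack API:

```python
from typing import Dict, List, Set, Tuple


def split_by_cache(
    spec_hashes: Dict[str, str], cached_hashes: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (packages to pull from the cache, packages to build from source)."""
    reuse = sorted(
        name for name, digest in spec_hashes.items() if digest in cached_hashes
    )
    build = sorted(
        name for name, digest in spec_hashes.items() if digest not in cached_hashes
    )
    return reuse, build


# Hypothetical hashes, for illustration only.
specs = {"root": "aaa111", "lcgeo": "bbb222", "py-onnxruntime": "ccc333"}
reuse, build = split_by_cache(specs, cached_hashes={"aaa111", "bbb222"})
```

Here `py-onnxruntime` would be the only package left to build, which matches the behaviour described above: a failed attempt still leaves its finished packages in the cache for the next run.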

madbaron · 2025-01-04T14:09:11Z

Hi @kkrizka,
yes, @tmadlener brought this up a few times (this is how the EIC builds their images - but using some remote S3 storage).
When looking at the free buildcache hosted by github it was not obvious it would be enough, so we postponed looking into it while we were ironing out the multi-step container workflow.

But I see the workflow in your build-minimal example was successful until the end - so I guess we can pick it up?

I think we should get things moving. Shall we merge this one first (after addressing the remaining discussion points) and follow-up just after with your buildcache additions?
Or would you rather push your changes into this PR directly?

kkrizka · 2025-01-04T18:59:38Z

Hi @madbaron, yes, it would be better to merge this and proceed with using buildcache in a separate PR. There is still some work left and there might be some discussion on the details (e.g. do we run spack standalone first and then again for docker?)

kkrizka · 2025-01-07T10:01:55Z

This now seems like a good start for the updated workflow. Shall we merge this now and follow up on any additions via smaller PRs?

madbaron · 2025-01-07T10:16:33Z

Thanks.
I'll merge this as soon as the workflow finishes successfully in my branch (I got one of the checksums wrong when tagging the packages).
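A quick way to catch this kind of mistake before pushing is to recompute the digest of the downloaded archive and compare it with the value written in the package recipe. A minimal sketch, assuming the checksum in question is a SHA-256 digest. The helper names are made up for illustration:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected: str) -> bool:
    """Compare a file's digest with the expected value, ignoring case and whitespace."""
    return sha256_of(path) == expected.strip().lower()
```

Calling `checksum_matches` on the release tarball with the recipe's value returns `False` if either the archive or the recorded digest is off.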

kkrizka · 2025-01-07T14:46:50Z

Actually, it might be worth waiting until key4hep/key4hep-spack#676 is merged so the ref hash becomes permanent. The current ref has been broken as of 30 min ago, as @tmadlener has been busy. It sounds like the merging of that is close though?

Federico Meloni and others added 30 commits November 14, 2024 10:55

updates for 2.10

3ff0427

update spack commit

c3ccf9c

updating cherry picks

bb4d864

split build towards 2.10

c77399b

specifying root version

28bfada

got rid of hardcoded hashes

11a3282

integrating docker image building

07196b0

update dockerfile spack

4e0f004

change workflow name

be16f1f

test latest commit

1436eaf

trigger full rebuild

640eb53

removed image sourcew

6edfea8

update key4hep-spack commit

3098716

fix minimal workflow

4348ab0

update minimal wf

71da3e7

free even more disk space

4182997

updating workflows to self-trigger

6a29650

checkout correct branch

7369075

Merge branch 'MuonColliderSoft:master' into master

775cd13

whitespace

a1c1405

version

79d2f1f

removing specific podio from debug rel

58f2f53

synch debug with rel

987065e

capital C

4a0c69a

checking out master

7a1a62c

checking out correct branch

1e3bd6b

updating github actions

8bc827d

fix spec list

8ddca00

passing git hash

602855e

passing git hash

e4acb5d

kkrizka added 2 commits December 22, 2024 12:27

Fix variable propagation in build.sh

4e046b4

Update variables in the sim image workflow.

eea4f6c

kkrizka reviewed Dec 24, 2024


.github/workflows/full-rebuild.yml

kkrizka reviewed Dec 24, 2024


Don't add key4hep-external-stack as it is already part of an environment.

7884a87

kkrizka added 3 commits December 31, 2024 15:50

mucoll-release: update ACTSTracking version to 1.3.1

18d4bcf

Add patch to compile openloops on aarch64.

e49a063

Update Dockerfile-sim to point to latest reference.

862b4e2

Federico Meloni added 3 commits January 4, 2025 15:30

renaming minimal to externals

7301781

renaming minimal to externals

06ecee0

picking up karol's updates

57ca0e2

Federico Meloni added 6 commits January 6, 2025 13:26

removing partial rebuilds

0641211

fully specifying package versions for release

d735638

updating readme

4c58056

checking out good commits in tests

89b03cf

checking out good commits in tests

7bd92df

correcting checksum

4a72ccb

kkrizka mentioned this pull request Jan 7, 2025

Draft: Use buildcache as part of the GitHub Actions #20


Federico Meloni added 2 commits January 7, 2025 21:00

picking up latest key4hep-spack commit

d650065

picking up merged key4hep-spack externals

22b9739

madbaron merged commit c2bf588 into MuonColliderSoft:master Jan 10, 2025
3 checks passed
