-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs/build: don't start new build until previous build completes #823
base: main
Are you sure you want to change the base?
Conversation
We only actually run the jobs when `uploading` is set. So it seems wrong to update the build description. In fact, let's not even waste time looking for missing arches or reading build metadata. Patch best viewed with whitespace ignored.
a3f861f
to
b9d3a04
Compare
did some trivial testing on this. It seems to work |
Since 973bcf9 ("jobs/build: rerun `build-arch` if previous build is incomplete"), there is a race possible where the `build` job rerun logic could kick in before the `release` jobs initially triggered for that build has finished. We don't want to queue builds in that case. Let's just prevent starting up the build job at all if there are remaining pieces from another build for this stream running. Co-authored-by: Jonathan Lebon <[email protected]>
b9d3a04
to
a084590
Compare
Hmm, I thought we wanted to keep the property of being able to start a new build while the previous build is still finishing? If that's not the case, a simpler approach is to always wait for the multi-arch and release jobs. And that would come with benefits like centralizing Slack notifications and not having to support an extra parameter and diverging between FCOS and RHCOS. (Edit: actually, another huge benefit would be that it'd be completely race-free. We'd be able to simplify our lock logic by a lot.) |
ehh, I've never been too set on that as a requirement.
I think the thing I wanted to keep more was the fidelity between the job and failure reporting. i.e. if the release job fails I don't really want the x86_64 build job to report failure. I'd also like notifications to come through when the x86_64 job completes (letting us know it succeeded) and this typically happens way before the other jobs. Maybe what we need is another toplevel job that's the entry point and will fail if any of the parts fail. Not sure how easy this would be to do because there is a lot of conditional logic baked in, but could be something to consider. |
We should probably think on this a bit more and look at some numbers. It means we significantly lower the rate at which we can build a stream and also increase the latency between requesting a build and having it.
Note though that's exactly what 4c02574 does to help the ART case. :)
Hmm yeah, I think we can get that by having the build job send its message when it's done and just waiting for the multi-arch and release jobs? We don't need to propagate failures from those if we don't want (that'll have to be a parameter that replaces
Yeah, maybe. There's a natural tension it seems between RHCOS and FCOS wrt how we want the pipeline to behave. We might end up there (a separate job), though ideally we try to rework things so that they become more similar while still preserving what we value. We could actually probably implement waiting for the build-arch and release jobs (race-free) without lots of locks and without giving up on pipelining. The milestone step is relevant here. Anyway, I guess we need a fix soon for the race condition. I think #822 has a much smaller impact on pipeline behaviour; WDYT about merging that one for now while we discuss the above? |
I'm not so sure this is a significant difference than what we require today. I'd say it's not too common that we have multiple builds of the same stream in flight.
Indeed, which is why it's a parameter and not the default.
👍
yes, but that does imply that pipelining is something we heavily take advantage of today or want to in the future. I really don't think (in the current pipeline setup) it's something we really need. Maybe if we had a slim version of the build that happened "all the time" and a heavy one that happened once a day or something. But right now we're running the heavy one all the time and that doesn't lend itself well to pipelining IMO. quickly invalidating all those images we just created and cloud uploads we just did is a bit of an anti-pattern.
I commented over there. I think I still prefer this (let's talk more!) but won't block that PR either. |
Agree, let's keep talking! I've updated that PR with your feedback. I think you're probably right we don't heavily rely on pipelining today. I just didn't want to make this change quickly to fix the race without thinking it through since it has a wider impact. I think there's an opportunity here to converge on something simpler than the status quo. Maybe we can get together and discuss what we want the "ideal" to look like. |
Since 973bcf9 ("jobs/build: rerun
build-arch
if previous build isincomplete"), there is a race possible where the
build
job rerun logiccould kick in before the
release
jobs initially triggered for thatbuild has finished. We don't want to queue builds in that case.
Let's just prevent starting up the build job at all if there are
remaining pieces from another build for this stream running.