-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider creating floating OS tags for use by repos which aren't dotnet/runtime #1279
Comments
cc @richlander |
This is good topic. Yes, the content was written with a dotnet/runtime bias. We could consider this. I certainly believe you that Linux breaks are low. The major point of the bottom part of the product is to insulate most libraries and apps from pain. If we did this, there would be a risk of CI breakage, but it likely isn't that high. It would be very similar to people who use The key question is how many repos would use this model. Assuming we agree to go this direction, is it worth the effort to maintain the model as you propose. I think if we got a half-dozen repos to sign up for that, it would make sense. |
After a tiny bit more thought, I think we'd want to take a page from our build images: https://github.com/dotnet/runtime/blob/main/docs/workflow/using-docker.md#the-official-runtime-docker-images I propose a model like this:
I'm thinking of something like: Alternatively, is there a reason why Arcade-provided templates are not updating these tags for you? That's another way to solve this problem. |
If there's a way to get something like darc to update them, that works for me too. I don't know enough about the inner workings of darc, whether you can have a concept of a version of something which isn't a package, but if it could automatically update our versions file with a string which could be used as the basis of a container name, then that would be a workable solution. We do have a property to specify which .NET version to publish our tests to for sending to Helix, so we could use that value as part of the container tag. Eg we would do something like this: <HelixTargetQueue Include="(Debian-$(XUnitPublishTargetFramework))[email protected]/dotnet-buildtools/prereqs:debian-$(XUnitPublishTargetFramework)" /> I'm happy with either option, or some other solution, my main goal is to not have to update things because a distro version is out of support. We really only run on a broad range of distros because in the early days of .NET Core, the WCF tests caught subtle differences in networking/security related behaviors between distros that corefx didn't as our tests covered a wider variety of scenarios (back when different libcurl installations could be compiled with different options). We haven't caught a problem which the runtime missed in a long time now, so it's more of a sanity check in case a customer has a problem with a specific distro we can be confident it's something specific to their setup rather than a fundamental problem. |
@mthalman @mmitche -- Who are the best folks to consider what a good model would look like? This isn't pressing. I agree with @mconnew that we should consider aligning ease-of-use and reliability concerns. Not every repo is as "concerning" as dotnet/runtime. On the flip-side, old container images are not serving any purpose for us. |
A PR flow model would be broadly useful. Most product repos would be able to utilize this. This would reduce manual work while keeping more repos on the latest. |
I am guessing that repos split via MSBuild vs yaml, w/how the container images are specified. I am also guessing that the repos to start with would be the MSBuild ones. Is that right? |
@MichaelSimons When you talk about a PR model, do you mean something that would automatically move from debian-11 to debian-12? or from specific image builds of debian-11? I generally favor a tag model. OS version specific tags (debian-11), OS family specific tags (debian-latest), and specific image references where needed (e.g. a break occurs) over a PR model. Our existing PR model that uses darc is frequently an invitation to stay on old versions of dependencies, rather than upgrade them. This is typically because of a couple factors:
Consider the 'simplest' dependency we have sitting around that flows widely, arcade. There are repositories with ancient versions of arcade sitting around simply because the branch or repo doesn't get a lot of attention. Other non-arcade dependencies tend to be much worse. I think we'd be inviting less uniformity and up-to-date-ness (important for security!) than if we encouraged folks to just take a latest tag. The other thing is that a tag model is already implementable in the current system. Maestro changes to update images are non-trivial and is a huge feature ask. |
I am referring to opening a PR to upgrade between OS versions. This is when there are risks associated with the upgrade. We already have OS version specific tags. The tags get updated frequently today via automation to pick up the latest security fixes. Repos automatically get the updates because of the "floating" nature of the tags. This work well. |
I'm sure this could be done using Renovate. That wouldn't require any infrastructure work on our part beyond a configuration file in the repo. As long as we defined the tags in such a way that Renovate can interpret the version-ness, it might just work. |
We should probably have a meeting about this. I suspect that there is a lot of context that isn't shared across everyone. Isn't there a "CI Council" group. Can we crash your meeting? We'd like to float this idea with you. I can tell that there are some fixed views on this, but I imagine we can design a better system if we attag it together. |
Yeah we can talk about this. To me, the questions that are important are who are the consumers who would favor a PR model vs. |
dotnet/runtime won't be moving to that model. In fact, the runtime and libraries test legs don't use the same strategies and are also responsible for testing everything. If we enabled this model, it would only be for repos that need minimal coverage. We'd only do that for a small number of distros. I think we should only consider this proposal if we can get a dozen or more repos that want this model. |
There was a meeting between @MichaelSimons, @mmitche, and @mthalman to discuss this. We propose the following:
|
That sounds good. Naturally, we'll want a short spec written on the scheme if we go that direction. I think we'll want a new team created that we can at-mention on issues when we are in process of updating tags. I see a benefit for adopting this model for as few distros as possible. That way, when we at-mention the team, there is a high likelihood that we're updating a distro they care about. I propose we go with Ubuntu and Alpine, only. |
I talked to @jkotas about this. He was saying that the best model for runtime would be PR-driven digest updates to avoid surprises, for both build and test images. I think the following might be the best model, starting with the most conservative:
I like the idea of using this opportunity (if we move forward with it) to improve both efficiency, compliance, and test coverage. An open question is which model that ASP.NET Core would prefer. |
What problem is this solving? I am not in favor of this, it is a step backwards. It will slow the rollout of image rebuilds to pickup CVE fixes, therefore exploding the number of EOL images in use across .NET which is already a problem. Furthermore the readability of digest references is terrible as the digest gives no indication of the distro/version in use. |
I like the "Floating tag (per .NET version) w/PR or human driven updates" model - I'd be hesitant to use a If we have another meeting about this, I can represent aspnetcore |
This isn't a factor. You can specify both the tag and the digest in the image name reference. The tag becomes just for human readability at that point. |
It would be solving surprise build breaks from build images updates we get in dotnet/runtime every once in a while. Here is an example of a recent surprise build break caused by build image updates:
The question is: Do we want dotnet/runtime engineering system to be predictable? As in, if the build of given SHA succeeds on the official build CI machine today, does the construction of the engineering system guarantee that the build succeeds tomorrow too? |
FWIW if a build happens on directly on a build agent (e.g. windows), we don't have this guarantee. The VM's the agents are based on get patched regularly. That being said I acknowledge this is a potential value add we get from using containers. My concerns with using digests/shas is that they will be changing frequently across all branches. Keeping everything up-to-date is going to take a non-trivial amount of work and even if kept up, it is going to generate a lot of compliance alerts related to the use of images with vulnerabilities and/or EOL images. It feels like a balance/ROI decision. As noted in the one issue referenced, there may be things we can do to make our builds more resilient to changes. Perhaps there is a difference between an explicit Dockerfile change versus a triggered rebuild to pick-up CVE updates. By that I mean triggered rebuilds to pick-up CVEs do not require repo source updates while Dockerfile changes would (via auto PRs). |
Yes, and we have been complaining about this for years. It was worse when we were building with preview VS versions. It got better recently when we switched to building with stable VS versions. It is ok to say that we prefer aggressive rollout of image updates (no PRs) and declare the engineering system unpredictability as "by design". We should acknowledge that it is not a cost-free and stress-free option. |
We can uplevel this ... there are two major proposals on the table:
I think the floating tag idea is largely understood at this point. It is pretty cheap to implement and would have specific terms of use for repo owners who want to use it. There are two approaches we can use for the PR-driven strategy:
Assuming we fund the tool at all, then the second option is one of the modalities. Most of the concerns on EOL images would (A) be a concern for the repo owners, and (B) could be mitigated by the update cadence. The update cadence hasn't been discussed. We're not required to have 24 hour turnaround on our repos in the general case for vulnerabilities, AFAIK. |
In PR 1274 the following statement is made:
While this statement may be true for the dotnet/runtime repo, it's not for others. The WCF repo hasn't ever had a build break due to incompatibilities from a Linux distro update. With only one exception (and that's Windows only), we have zero native platform calls in our code base. If there ever is an implementation incompatibility with a new OS distro, it will get discovered and fixed in the dotnet implementation in the runtime repo before we ever use that distro.
What we have had happen multiple times is our build break due to a Linux distros being retired in the build system and we weren't informed that was going to happen.
Having someone who is working on retiring older container images for unsupported distros having to reach out to each repo that is still using it, and then having someone on that team make a change, submit a PR, verify everything is still working, then merge it, that's all a manual process with lots of room for mistakes to be made. It also adds extra non-productive work which is a lot more disruptive and costly for smaller dev teams.
Instead of a concept of latest, (eg
debian-latest
), what about having a tag for the current distro? Eg for debian it would be something likedebian-current
. The current tag would get moved to a new distro once it's been verified that the dotnet/runtime repo is passing all its tests (or at least any failures aren't due to the new distro) on a new distro version. Maybe even wait a month after that to allow any repos with an infrequent darc update period will pick up a newer build of dotnet if any product fixes were needed. Any repo that's sensitive to distro changes and want to manually and intentionally move forward can stick to the current distro version named tags, but repos that are happy with the latest that dotnet/runtime has resolved any issues for can use the -current suffix? It will cut down on build system maintenance for teams like WCF where developer time is already stretched very thin.The text was updated successfully, but these errors were encountered: