Throttled update rollouts #83
@steveej and I have just started the process of defining how an admin provides a custom policy to the Policy Engine. FCOS will likely need a custom policy to implement channels and rate-limiting. We'll want to take into account your use-case as we finalize the design. Can you provide more specifics around your proposed update policy? It would be helpful to see things like inputs to the policy (e.g. the machine-id, admin-controlled rate limits), external services which must be consulted (e.g. the metrics collection server), and the criteria for offering a particular update payload.
I think for FCOS we can proceed top-to-bottom here, and design/prototype pieces starting from the graph-builder till we reach the update-client (and the reboot manager later on). Regarding the graph-builder, I have the following points for which I'd like to see some inputs (from @bgilbert and @dustymabe especially):
Other assorted comments:
This indeed requires a metrics collector in place, which has to be queried by the policy engine.
This can be done either at metadata source (by maintaining a linear chain as the update path, or by explicitly whitelisting/blacklisting all paths) or as a policy rule (by selecting specific edges to cut). The former is static at graph-ingestion time but verbose/cumbersome, the latter is dynamic and likely more flexible but needs to be computed at request-time. Not sure which one we prefer.
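As a rough illustration of the trade-off, the static metadata-source approach could be as simple as emitting only consecutive edges from an ordered release list. This is a hypothetical sketch; the function name and input format are made up:

```python
def linear_chain_edges(releases):
    """Static alternative: a strictly linear update path.

    releases: list of version strings, oldest first (hypothetical input
    format).  Emitting only consecutive pairs makes the whole update path
    explicit at graph-ingestion time, at the cost of verbosity.
    """
    return list(zip(releases, releases[1:]))
```

The request-time policy-rule approach would instead take the full edge set and filter it per query, which is more flexible but has to run on every request.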
For anything outside of the "client request parameters" point above, I think this belongs to the "metrics" topic. Do you have a specific example of relevant state to be collected which may not be directly in the set of request key parameters?
Thanks @bgilbert for writing this up and for the context of how CoreUpdate worked in the past with CL. I don't have any objections to the proposal and like a lot of what I hear. Update barriers will be especially useful for dropping legacy hacks/workarounds.
@bgilbert are we ready to answer this question yet? Should we brainstorm a bit soon in order to be able to provide more data?
seems reasonable to me that they would share the same metadata format
I think we need to collaborate with fedora infra on this.
all possible options. a list of tags on a git repo could be the most transparent way
I don't have any input on this. I guess it would be nice if we don't have too many options.
maybe this is part of the discussion about metrics?
No strong opinion, but I think I prefer pull based.
maybe something in between? stale graph if metadata source has been down < 1 day (i.e. intermittent failure) and maybe error if more than that (systemic outage).
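That in-between behavior could be sketched as follows. This is a hypothetical illustration; the 1-day cutoff comes from the comment above, and all names and structure are assumed:

```python
import time

# Hypothetical cutoff separating an intermittent failure from a systemic outage.
STALE_LIMIT_SECS = 24 * 60 * 60

def graph_response(cached_graph, last_fetch_ts, now=None):
    """Decide what to serve when the metadata source is unreachable.

    Serve the cached (stale) graph during an intermittent outage, but fail
    loudly once the outage looks systemic (older than the cutoff).
    """
    if now is None:
        now = time.time()
    if cached_graph is not None and now - last_fetch_ts < STALE_LIMIT_SECS:
        return cached_graph  # intermittent failure: serve stale data
    raise RuntimeError("metadata source unavailable for too long")
```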
I am not opposed to any of this - there are some cool advantages - but the whole thing, particularly
makes me pause, because in layering a totally new thing on top we're partially negating the testing and workflow that exists around ostree/rpm-ostree today. Big picture: it feels like the biggest win would be putting Cincinnati on top of both ostree and rpm-md (i.e. yum/dnf). Then things would work more consistently. But that's clearly an even larger endeavor.
@cgwalters AIUI Cincinnati is designed to manage a single artifact (or meta-artifact) with a single version number, so I'm not sure how directly it'd map to rpm-md. Unless the version was the Pungi compose ID? Do you think it'd make sense to support Cincinnati directly in rpm-ostree? I have no idea how relevant our profile of Cincinnati would be to other rpm-ostree-using distros. The obvious use of
Yeah, it would be nice indeed if it were possible to keep
Re. update barriers, there's a related discussion in upstream libostree: ostreedev/ostree#1228. If we can get this working in a satisfying way for other libostree users, then we could do something like this:
That's a big if though. But then it wouldn't be hard to add some integration between the Cincinnati client and rpm-ostree, e.g. making
I'd be concerned about trying to duplicate the full generality of the Cincinnati graph inside rpm-ostree. Even if we use a well-defined subset of the graph functionality today, it would complicate expanding that subset in the future.
Hmm. Could it make sense to let the distro configure
#98 discusses stream metadata, which might be one of the sources of truth for the graph builder.
If we decided to exclusively use ostree static deltas for distributing updates, the set of available deltas could be encoded in the Cincinnati graph. cc @sinnykumari
Discussion here with @lucab @jlebon @ajeddeloh @cgwalters
General consensus seemed clear for a separate agent. Agent would be a service talking between Cincinnati and rpm-ostree; but does it have a CLI? We want rpm-ostree to be clear it's being driven by an external agent. We know we clearly want to separate rebooting. Some discussion now of whether the finalize-delay thing is worth the UX complexity.

Q: Is FCOS agent a container or part of the host? Consensus is for the host; this is core functionality.

lucab: Problem with locksmith was direct req on etcd; required learning etcd. But for Kube want a separate thing. The new agent should support calling out to a generic HTTP service. Agent is configured to say which endpoint is used.
jlebon: This solves non-Kubernetes locking
andrew: Come out with a few policies for reboots; "whenever", "at 1am" (walters: "wake up system")
walters: How about supporting scripting w/Ansible for non-cluster but multi-node setups.
lucab: push vs pull. Leans towards pull
andrew: For push we could encourage shipping static binaries (go/rust) to the host that implement site-specific dynamic policies.

On communication between rpm-ostree and agent: DBus. Can use API to download+stage. We add an API for "finalize and reboot" which is unlink("/etc/ostree-finalize.lock") - and possibly create it in the new deployment so the new thing is locked.
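The "finalize and reboot" primitive mentioned in these notes could be sketched roughly as below. The lock path is taken from the notes; the reboot command and everything else about the shape of the API is an assumption:

```python
import os
import subprocess

# Path from the meeting notes; the reboot mechanism is an assumption.
LOCK_PATH = "/etc/ostree-finalize.lock"

def finalize_and_reboot(lock_path=LOCK_PATH, reboot=None):
    """Hypothetical sketch of the "finalize and reboot" primitive.

    Removing the lock file allows the staged deployment to be finalized;
    the notes suggest recreating it in the new deployment so the next
    boot starts locked again.
    """
    try:
        os.unlink(lock_path)  # unlock finalization of the staged deployment
    except FileNotFoundError:
        pass  # already unlocked
    if reboot is None:
        reboot = lambda: subprocess.run(["systemctl", "reboot"], check=True)
    reboot()
```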
[ C = Cincinnati ]
lucab: How much code can we share with existing code (locksmith and machine-config-operator/pivot). Who does the HTTP pulling with pivot? Answer: podman
walters: The version number is important here - what does C pass us?
lucab: C gives us a DAG. Metadata is opaque.
walters: if C gives us a commit hash, then rpm-ostree already knows it's not supposed to do anything on
andrew: Can we expose C barriers to user?
lucab: no, not part of the design. Want controlled logic on the server side.
walters: Windows AIUI supports the "force give me the update and bypass C" model
[some discussion of how this relates to streams for both FCOS and RHCOS, how this relates to installers, canary clusters]

Need to create agent now. Partially depends on the C server and pushing payloads to it. Depends on infrastructure decisions (Fedora infra?). And testing where we're putting the refs!
andrew: C coming as a container will make it easy to spin up for local dev.
thanks for the updates from the devconf discussions colin!
Thanks @cgwalters! Let me summarize my understanding of the notes and see if I got it right:
Questions
It's nice to have things queued and ready to go. It helps avoid downtime, because you know the remaining time is basically just the reboot. It's quite a different experience - I can say that for sure w/Silverblue - and I'd like to support it nicely for servers too.
I think it makes sense for your client and policy to use a parameter to implement this feature. The policy could then choose to ignore rate-limiting when it sees the parameter. For what it's worth, OpenShift allows the admin to bypass Cincinnati entirely. At the end of the day, Cincinnati is just a hinting engine.
I guess I'm confused. CL also separates download/install and reboot, but it updates the "bootloader" (actually the installed kernels) at install time, rather than just before reboot. What's the advantage of deferring the bootloader update?
libostree does that too by default.
It means that rebooting the machine is a predictable operation; say you're rebooting because kernel memory is fragmented or something. Doing so doesn't, as a side effect, sometimes opt you in to an OS update. (But if there is an OS update and you were going to reboot anyway, it's highly likely it was already downloaded, so you save the download time.)
Also, note that in OSTree systems, doing the final bootloader update is coupled with rebooting because of ostreedev/ostree#1503 (which fixed coreos/rpm-ostree#40).
Ah, that makes sense.
I can see the advantage to that under specific circumstances, such as when debugging a problem. I'd be concerned about making it the default, though. One of our core premises is that Fedora CoreOS updates automatically, and users shouldn't have to think about what version they're running. (And also, any version older than the latest is unsupported.) Our defaults should be consistent with that.
@bgilbert your recap is in line with my understanding. Trying to answer your other questions:
Yep. I think there's a spectrum here though. One thing I've heard that makes a ton of sense is people want to run explicit "canary/dev" clusters that might update automatically, and gate their prod upgrades on the success of their dev clusters. There are a variety of ways to implement this, but what I want to argue here is that the prod cluster here can (and should) still be downloading so that the update is there and ready.
Related to this, we were discussing possibly having the status of the agent injected in

OK, I opened coreos/rpm-ostree#1747 for this.
Okay, I think I understand now. The premise is that the system downloads an update at 1 PM but is configured not to apply it until 11 PM. At 5 PM, the system crashes and reboots. We want the system to boot the old OS version, and then reboot again at 11 PM to apply the update. Is that right?

I was thinking of the case where the user has disabled automatic reboots entirely, which seems to be fairly common on CL. If I have such a system, how will I reboot if I want an update activated, and how will I reboot if I don't? If the user reboots for an unrelated reason, I figured it'd be better to opportunistically apply the update, rather than never applying updates until explicitly requested.

Relatedly, how will the system work if a second update is available before the first one is applied? Will it download that update as well? On CL, nothing further will happen until the machine reboots into the first update, which is not ideal.
If this were purely new development, I'd agree. But we'll also want a migration path for users running locksmith + etcd. In principle that could be a new distributed service that runs in the cluster, but I wonder if that wouldn't make CL migration too complex.
That would make it harder for #98 metadata to serve as the single source of truth. To pin upgrades at an older version, we'd need to update the stream metadata, rebuild the graph, and also update the ostree ref. (I'm not sure how much of a problem that is. In principle we might want to do it anyway.)
Sure. I was thinking more of the force-update case, if done via Cincinnati.
Nope, just curious.
@lucab and I discussed further OOB. I see the point about keeping etcd support out of the agent. For users migrating from CL, we could provide a containerized lock manager which synchronizes with etcd, and a migration guide which recommends running that container on every node. New clusters wouldn't be deployed with that model, but it'd provide a way for existing non-Kubernetes clusters to get migrated quickly.
I started experimenting with such a containerized lock manager, findings at #3 (comment). I also started experimenting with a minimal on-host agent, trying to understand the configuration axes, the state machine, and the introspectable parameters.
My current experiment is at https://github.com/lucab/exp-zincati. I think we have tickets in place (and in progress) for all the high-level dependencies I saw while sketching that. In order to progress toward stabilizing that, I'd like to get a repo somewhere under

/cc @ajeddeloh @arithx @LorbusChris as my usual go-to folks for name bikeshedding
road-trip, minicar |
Related to the rockets / engines / jet theme: |
@lucab It does. 😁
From fcos IRC meeting: |
Earth-related: Otherwise, from the other engine-related thread we still have
I like |
zincati it is! https://github.com/coreos/zincati - go forth! |
This has been fully implemented in Zincati and the FCOS Cincinnati backend at this point, so I'm closing the ticket. There are further specific use cases and docs tasks which are ongoing and tracked by dedicated tickets, e.g. coreos/zincati#116 and #240.
Build a system for gradually rolling out OS updates to Fedora CoreOS machines in a way that can be centrally managed by FCOS developers.
Background: Container Linux
Container Linux updates are rolled out gradually, typically over a 24 to 72 hour period. If a major bug is caught before the rollout is complete, we can suspend the rollout while we investigate.
On CL, rollouts are implemented in CoreUpdate by hashing the machine ID with the new OS version and comparing a few bits against a threshold which increases over time. If a machine automatically checks in but doesn't meet the current threshold, CoreUpdate responds that no update is available. If the user manually initiates an update request, the threshold is ignored and CoreUpdate provides the update.
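For illustration, a CoreUpdate-style eligibility check could look like the sketch below. The exact hash function and bit width CoreUpdate uses may differ; the function name is made up:

```python
import hashlib

def update_eligible(machine_id: str, target_version: str, rollout_fraction: float) -> bool:
    """Deterministically decide whether a machine may see an update yet.

    Hashing the machine ID together with the target version spreads machines
    uniformly over [0, 1); raising rollout_fraction over time admits more of
    them.  Illustrative only: CoreUpdate's actual hash and bit width may differ.
    """
    digest = hashlib.sha256(f"{machine_id}:{target_version}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # first 4 bytes -> [0, 1)
    return bucket < rollout_fraction
```

Because the bucket is stable for a given machine/version pair, a machine becomes eligible exactly when the rising threshold passes its bucket and stays eligible afterwards; a manual update request simply skips the check.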
Major bugs can be caught in two ways. CoreUpdate has a status reporting mechanism, so we can notice if many machines are failing to update. The status reports are not very granular, however, and thus not very debuggable. More commonly, a user reports a problem to coreos/bugs and we manually triage the issue and pause rollout if the problem appears serious.
For each update group (~= channel), CoreUpdate only knows about one target OS version at any given time. This is awkward for several reasons. If a machine running version A is not yet eligible for a new version C currently being rolled out, or if the rollout of C is paused, the machine should still be able to update to the previous version B, but CoreUpdate doesn't know how to do that. In addition, there's no way to require that machines on versions < P first update to P before updating to any version > P. (We call that functionality an "update barrier".) As a result, compatibility hacks for updating from particular old versions have to be carried in the OS forever, since there can be no guarantee that all older machines have updated.
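An update barrier can be thought of as an edge filter over the update graph: any edge that jumps from a pre-barrier version directly past the barrier is cut, so older machines are forced through the barrier release first. A minimal sketch, under the simplifying assumption that versions are directly comparable:

```python
def apply_barrier(edges, barrier):
    """Cut graph edges that jump over an update barrier.

    edges: iterable of (from_version, to_version) pairs with comparable
    versions (a simplifying assumption; real version metadata is richer).
    Machines below the barrier may move among pre-barrier versions or to
    the barrier itself, but never directly past it.
    """
    return [(src, dst) for src, dst in edges
            if not (src < barrier and dst > barrier)]
```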
CoreUpdate collects metrics about each machine that checks in: its update channel, its state in the client state machine, what OS version is running, what version was originally installed, the OEM ID (platform) of the machine, and its checkin history. This works okay but gives us an incomplete picture of the installed base: we do not receive any information about machines behind private CoreUpdate servers, behind a third-party update server such as CoreRoller, or which have updates disabled.
Fedora CoreOS
CoreUpdate, update_engine, and the Omaha protocol will not be used in Fedora CoreOS. A successor update protocol, Cincinnati, is being developed for OpenShift, and it appears that we'll be able to adapt it for Fedora CoreOS. I believe this involves:
Server requirements
...and nice-to-haves, and reifications of the second-system effect:
Client requirements
Metrics
Metrics should be handled by a separate system. Coupling metrics to the update protocol would provide the same sort of incomplete picture as in CoreUpdate, and in any event Cincinnati is not designed to collect client metrics. This probably means that certain of the features above, such as automatic rollout suspension or insight into the client state machine, will need to be handled outside the update protocol.
cc @crawford for corrections and additional details.