Gate after commit and before deployment #870
Replies: 12 comments, 27 replies
-
[RFC] Manual Gating

Motivation

Flux watches sources (e.g. GitRepositories, HelmRepositories, S3-compatible Buckets, ImageRepositories) and automatically reconciles the changes onto clusters as described with Flux Kustomizations and HelmReleases. There are situations when users want to have a gating mechanism after the cluster state changes are merged in Git:
Proposed solution

In order to support manual gating, the GitOps Toolkit could be extended with a dedicated API and controller that would allow users to define […]

Example

Define a gate that automatically closes 1h after it has been opened:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: sre-approval
  namespace: flux-system
spec:
  interval: 30s
  default: closed
  window: 1h
```

When the gate is created in-cluster, the […]:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: sre-approval
  namespace: flux-system
status:
  conditions:
  - lastTransitionTime: "2021-03-26T10:09:26Z"
    message: "Gate closed by default"
    reason: ReconciliationSucceeded
    status: "False"
    type: Opened
```

While the gate is closed, all the objects that reference it will wait for approval:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  gates:
  - name: sre-approval
  - name: qa-approval
status:
  conditions:
  - lastTransitionTime: "2021-03-26T10:09:26Z"
    message: "Reconciliation is waiting approval, gate 'flux-system/sre-approval' is closed."
    reason: GateClosed
    status: "False"
    type: Approved
```

The SRE team can open the gate either by annotating it or by calling the notification-controller webhook:

```shell
kubectl -n flux-system annotate --overwrite gate/sre-approval \
  open.gate.fluxcd.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
```

The […]:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: sre-approval
  namespace: flux-system
status:
  requestedAt: "2021-03-26T10:00:00Z"
  resetToDefaultAt: "2021-03-26T11:00:00Z"
  conditions:
  - lastTransitionTime: "2021-03-26T10:00:00Z"
    message: "Gate scheduled for closing at 2021-03-26T11:00:00Z"
    reason: ReconciliationSucceeded
    status: "True"
    type: Opened
```

While the gate is open, all the objects that reference it are approved to reconcile at their configured interval. The SRE team can close the gate ahead of schedule with:

```shell
kubectl -n flux-system annotate --overwrite gate/sre-approval \
  close.gate.fluxcd.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
```

The […]:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: sre-approval
  namespace: flux-system
status:
  requestedAt: "2021-03-26T10:10:00Z"
  resetToDefaultAt: "2021-03-26T10:10:00Z"
  conditions:
  - lastTransitionTime: "2021-03-26T10:10:00Z"
    message: "Gate close requested"
    reason: ReconciliationSucceeded
    status: "False"
    type: Opened
```

The objects that reference this gate will finish their ongoing reconciliation (if any) and then pause.

To enforce a maintenance window of 24 hours, you can define a […]:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: maintenance
  namespace: flux-system
spec:
  interval: 30s
  default: opened
  window: 24h
```

To start the maintenance window, annotate the gate with:

```shell
kubectl -n flux-system annotate --overwrite gate/maintenance \
  close.gate.fluxcd.io/requestedAt="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
```

The […]:

```yaml
apiVersion: gating.toolkit.fluxcd.io/v1alpha1
kind: Gate
metadata:
  name: maintenance
  namespace: flux-system
status:
  requestedAt: "2021-03-26T10:00:00Z"
  resetToDefaultAt: "2021-03-27T10:00:00Z"
  conditions:
  - lastTransitionTime: "2021-03-26T10:00:00Z"
    message: "Gate scheduled for opening at 2021-03-27T10:00:00Z"
    reason: ReconciliationSucceeded
    status: "False"
    type: Opened
```

You could also schedule "No Deploy Fridays" with a CronJob that closes the […]
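A minimal bash sketch of the gate semantics above, so the state transitions can be checked without a cluster. It assumes that a gate only tracks its latest open or close request, that `window` uses a single `s`, `m` or `h` unit, and that timestamps have already been converted to Unix epoch seconds. All function names are hypothetical and not part of any Flux controller; the condition message is copied from the Kustomization example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed Gate semantics; these functions are not
# part of any Flux controller or CLI.

# Convert a single-unit duration (e.g. 30s, 15m, 24h) to seconds.
duration_to_seconds() {
  local d=$1
  local n=${d%[smh]}
  [[ $n =~ ^[0-9]+$ ]] || return 1
  case $d in
    *h) echo $((10#$n * 3600)) ;;
    *m) echo $((10#$n * 60)) ;;
    *s) echo $((10#$n)) ;;
    *) return 1 ;;
  esac
}

# Map a request action to the state it asks for: open -> opened, close -> closed.
requested_state() {
  case $1 in
    open) echo opened ;;
    close) echo closed ;;
    *) return 1 ;;
  esac
}

# Print when (epoch seconds) the gate falls back to its default state.
# A request that matches the default takes effect immediately, as in the
# "Gate close requested" example where resetToDefaultAt equals requestedAt.
reset_to_default_at() {
  local default=$1 action=$2 requested_at=$3 window=$4 state
  state=$(requested_state "$action") || return 1
  if [[ $state == "$default" ]]; then
    echo "$requested_at"
  else
    echo $((requested_at + window))
  fi
}

# Print the effective state (opened or closed) of a gate at time NOW.
# Usage: gate_state DEFAULT ACTION REQUESTED_AT WINDOW_SECONDS NOW
gate_state() {
  local default=$1 action=$2 requested_at=$3 window=$4 now=$5 reset
  reset=$(reset_to_default_at "$default" "$action" "$requested_at" "$window") || return 1
  if (( now >= requested_at && now < reset )); then
    requested_state "$action"
  else
    echo "$default"
  fi
}

# Succeed only if every referenced gate is opened; otherwise print a message
# modelled on the Kustomization "Approved" condition and fail.
# Arguments are NAMESPACE/NAME=STATE pairs.
kustomization_approved() {
  local ref
  for ref in "$@"; do
    if [[ ${ref#*=} != opened ]]; then
      echo "Reconciliation is waiting approval, gate '${ref%%=*}' is closed."
      return 1
    fi
  done
  return 0
}

# Example: the 1h sre-approval gate opened at t=0 (epoch seconds).
window=$(duration_to_seconds 1h)
echo "at t=1800: $(gate_state closed open 0 "$window" 1800)"
echo "at t=3600: $(gate_state closed open 0 "$window" 3600)"
kustomization_approved flux-system/sre-approval=opened flux-system/qa-approval=closed || true
```

With the 24h `maintenance` gate, `reset_to_default_at opened close REQUESTED_AT 86400` lands exactly one day after the close request, which matches the `resetToDefaultAt` of 2021-03-27T10:00:00Z in that example.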
-
I would like to see release gating implemented in conjunction with #820. I would like to be able to see a diff of what is about to be applied before it happens, with a manual trigger to apply it.
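Until something like a gate exists, this review-then-apply flow can be approximated with existing Flux CLI commands: keep the Kustomization suspended, print `flux diff kustomization` against a local checkout, and run `flux resume kustomization` only after a human confirms. A minimal bash sketch under those assumptions; the `review_and_apply` name and the y/N prompt are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical review-then-apply helper built on existing Flux CLI commands.
# Assumes the Kustomization is kept suspended between approvals.
# Usage: review_and_apply NAME PATH [NAMESPACE]
review_and_apply() {
  local name=$1 path=$2 namespace=${3:-flux-system}
  local answer

  # Show what would change; a non-zero exit status is not treated as fatal here.
  flux diff kustomization "$name" --path "$path" -n "$namespace" || true

  # Read the decision from stdin (the prompt only shows on a terminal).
  read -r -p "Apply these changes to $namespace/$name? [y/N] " answer
  if [[ $answer == [yY] ]]; then
    # Resuming lets kustomize-controller reconcile the pending revision.
    flux resume kustomization "$name" -n "$namespace"
  else
    echo "Skipped $namespace/$name"
  fi
}
```

For example, `review_and_apply my-app ./deploy/my-app` (path hypothetical). The Kustomization stays resumed afterwards, so it would have to be suspended again with `flux suspend kustomization my-app` before the next change is merged.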
-
I want to add a couple of use cases where I think an approval / gate would be helpful. These are just some ideas of how it could work.

First, multiple environments with Kustomize. If there is a single set of manifests and Kustomize is used to set values per environment (dev/prod/etc.), then it would be reasonable to deploy the changes to dev and gate the deploy to prod. Since there is only a single set of manifests, the environments couldn't otherwise be gated or versioned separately. This could be configured at the cluster / Flux level with a flag that says whether that instance should auto-deploy on changes to the Git repo or wait for approval.

The other use case is rolling updates across multiple environments. If we have 15 environments, all sharing the same manifests, gating lets us roll out to 1, 2, 4, 8, etc. environments instead of all at once.

I have concerns about how to track this approval in Git. If the […]

So the overall process might look like […] where a PR is opened and new code is merged to the default Git branch. The development environment has no deployment […]
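The doubling rollout described above (1, 2, 4, 8, … environments per step) can be sketched as a function that groups environment names into waves; each wave would then sit behind its own gate or approval step. The `rollout_waves` function and the `env-NN` names are hypothetical. With 15 environments the waves come out as exactly 1 + 2 + 4 + 8.

```shell
#!/usr/bin/env bash
# Print environments in doubling waves: 1, 2, 4, 8, ...
# The last wave is simply whatever is left over.
rollout_waves() {
  local -a envs=("$@")
  local size=1 start=0 wave=1
  while (( start < ${#envs[@]} )); do
    echo "wave $wave: ${envs[*]:start:size}"
    (( start += size, size *= 2, wave += 1 ))
  done
}

# Example with 15 hypothetical environments: env-01 ... env-15.
envs=()
for i in {1..15}; do envs+=("$(printf 'env-%02d' "$i")"); done
rollout_waves "${envs[@]}"
```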
-
That is a really good solution! Great connotation for explaining it to stakeholders too: gate closed, no deploys; gate opened, code deployed 😛 When is this going in?
-
I am thinking about the gating scenario from the perspective of the […] My main point is that this model doesn't give app developers control over their […] This model only works where an admin persona is defining […]
-
This proposal seems great. Should the implementation be tracked in a ticket? Any appetite for this?
-
It would be nice if this gate could use cosign to validate the signing keys on the gates.
-
I have a use case coming in from a customer. This customer is an ISV in a regulated medical environment and has many client clusters. The applications on these client clusters are deployed and maintained by the ISV the Flux way. They are looking for a gating mechanism to control the roll-out of these applications after the PR is merged in the Git repo. The reasoning for this is twofold:
-
I also have a big German telco that wants this feature. They are building a platform-as-a-service offering for tenants. Those tenants need to consume platform updates when they are ready to do so, and gating would help make that happen.
-
Hey everyone 👋 We're currently focusing on the Flux v2 GA release; there are still some things on the roadmap that we need to finish. After the GA release, I plan to create an official RFC based on #870 (comment). Given that manual gating is one of the most requested features in Flux, we'll try to prioritise this work after GA. Thanks to everyone who commented here. I'll do my best to incorporate your feedback in the final proposal, and once it's posted I will ask you to review and comment on the RFC.
-
+1
-
I ran into the same issue on my last project. We needed a kind of maintenance/reconciliation window for the Flux reconciliations, so I came up with a workaround using the Flux suspend feature and K8s […]
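A minimal sketch of this kind of workaround, assuming the suspend and resume steps are triggered on a schedule (for example by Kubernetes CronJobs running the Flux CLI); the commenter's exact setup is not shown, so the `set_reconciliation_window` function and its `open|close` convention are hypothetical. `flux suspend kustomization` stops reconciliation of a Kustomization and `flux resume kustomization` turns it back on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "open" resumes and "close" suspends the given
# Kustomizations, e.g. called by two scheduled jobs at the window boundaries.
# Usage: set_reconciliation_window open|close NAMESPACE NAME...
set_reconciliation_window() {
  if (( $# < 3 )); then
    echo "usage: set_reconciliation_window open|close NAMESPACE NAME..." >&2
    return 1
  fi
  local action=$1 namespace=$2 name
  shift 2
  # Validate the action up front so nothing is changed on a typo.
  case $action in
    open|close) ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  for name in "$@"; do
    if [[ $action == open ]]; then
      # Re-enable reconciliation; the latest fetched revision gets applied.
      flux resume kustomization "$name" -n "$namespace"
    else
      # Stop reconciling until the next "open".
      flux suspend kustomization "$name" -n "$namespace"
    fi
  done
}
```

For a window between 02:00 and 05:00, one job would run `set_reconciliation_window open flux-system my-app` at 02:00 and another would run the `close` variant at 05:00.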
-
I wanted to understand what would be the best way to extend the GitOps Toolkit with a mechanism that allows configuring a manual approval step (event-based) or a maintenance time window (e.g. only at night between 2 and 5 am), so that a cluster owner is involved before an update is deployed.
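For the time-window variant (e.g. only between 2 and 5 am), the check a controller or scheduled job would need is small. The sketch below treats the window as [start, end) in whole hours and also handles windows that wrap past midnight; the `in_window` name and the use of UTC via `date -u +%H` are assumptions.

```shell
#!/usr/bin/env bash
# Succeed if HOUR (0-23) falls inside the window [START, END).
# A window with START > END wraps past midnight, e.g. 22 -> 2.
in_window() {
  # 10# forces base 10 so hours like "08" are not parsed as octal.
  local hour=$((10#$1)) start=$((10#$2)) end=$((10#$3))
  if (( start <= end )); then
    (( hour >= start && hour < end ))
  else
    (( hour >= start || hour < end ))
  fi
}

# Example: only allow deployments between 02:00 and 05:00 UTC.
if in_window "$(date -u +%H)" 2 5; then
  echo "inside maintenance window: deployments allowed"
else
  echo "outside maintenance window: deployments gated"
fi
```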
My use case: a SaaS provider (vendor company) has a marketplace where clients (other companies) can buy apps/services that are operated by the vendor in a K8s cluster on the client's premises. Once the client buys an offer from the marketplace, an automated agent in the cloud updates the deployment spec in a client-specific Git repo. The GitOps system within the client's K8s cluster detects the changed spec, retrieves the necessary artifacts and applies/deploys the changes. If the offer addresses a regulated domain (like a classified medical product), the update must not be automatic. The client needs to know upfront, and have control over, what is changed and when it is changed, in order to keep the system up and running during critical time periods and prevent unexpected disruptions.
The commit/merge is either done by a human or by an automated business process (e.g. after a client has selected a marketplace offer to be installed on his cluster).
Ideally it would also be configurable whether the approval is requested before or after pulling the image from the registry. The former is useful when the images are large and should not be downloaded until the approval has been given.
One idea from @stefanprodan was to extract Flagger's manual gating feature into a dedicated controller so that any toolkit component could be gated in this fashion. For example, when source-controller detects a new commit, instead of creating an artifact, it would call the gate hook and wait until the gate is opened. Once the gate is opened, it would generate the artifact, and then the kustomize/helm controllers would reconcile it. The same goes for image automation: the controller would not push upstream until a human opens the gate.
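The flow described above, where a controller calls a gate hook and waits before producing the artifact, boils down to a polling loop. Below is a minimal bash sketch in which the gate check is any command that succeeds while the gate is open; the `wait_for_gate` name, the interval and the attempt limit are hypothetical, and a real controller would more likely requeue the object than block in a sleep.

```shell
#!/usr/bin/env bash
# Run CHECK... every INTERVAL seconds until it succeeds (gate opened)
# or MAX_ATTEMPTS is reached.
# Usage: wait_for_gate INTERVAL MAX_ATTEMPTS CHECK...
# e.g.   wait_for_gate 30 120 curl -fsS "$GATE_CHECK_URL"   (URL hypothetical)
wait_for_gate() {
  local interval=$1 max_attempts=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      echo "gate opened after $attempt attempt(s)"
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if (( attempt < max_attempts )); then
      sleep "$interval"
    fi
  done
  echo "gate still closed after $max_attempts attempt(s)" >&2
  return 1
}
```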
I'm not deep into the inner workings of Flux and cannot really judge (yet) whether this idea would be the right approach for my use case. I hope that this discussion attracts broad interest in such a capability (which would probably also be useful in many other scenarios) and that a simple, easy-to-use solution will be found.