[RFC] Flux Bootstrap for OCI-compliant Container Registries #4749
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,319 @@ | ||
# [RFC] Flux Bootstrap for OCI-compliant Container Registries | ||
|
||
**Status:** provisional | ||
|
||
**Creation date:** 2024-04-27 | ||
|
||
**Last update:** 2024-04-27 | ||
|
||
## Summary | ||
|
||
Flux should allow a Git-less bootstrap procedure where the cluster desired state is stored in OCI artifacts. | ||
|
||
On the client-side, the Flux CLI should offer a command for packaging its own Kubernetes manifests into | ||
an OCI artifact and pushing the artifact to a container registry. | ||
|
||
On the server-side, the Flux controllers should be configured to self-update from the registry | ||
and reconcile the cluster state from OCI artifacts stored in the same or a different registry. | ||
|
||
## Motivation | ||
|
||
Given that OCI registries are evolving into a generic artifact storage solution, | ||
we should allow Flux users who don't want to run a Git server as part of their | ||
production infrastructure to bootstrap and manage their Kubernetes clusters using OCI artifacts. | ||
|
||
To decouple cluster reconciliation from Git repositories, Flux allows packaging and publishing | ||
the Kubernetes manifests stored in Git to an OCI registry by running the `flux push artifact` | ||
command in CI pipelines. | ||
|
||
### Goals | ||
|
||
- Add support to the Flux CLI for bootstrapping with a container registry as the source of truth. | ||
- Make it easy for users to switch from Git repositories to OCI repositories. | ||
|
||
### Non-Goals | ||
|
||
- Automate the migration of Flux manifests from a Git bootstrap repository to OCI. | ||
|
||
## Proposal | ||
|
||
Implement the `flux bootstrap oci` command with the following specifications: | ||
|
||
```shell | ||
flux bootstrap oci \ | ||
--url=<registry-url>/<flux-manifests>:<tag> \ | ||
--username=<registry-username> \ | ||
--password=<registry-password> \ | ||
--kustomization=<local/path/to/kustomization.yaml> \ | ||
--cluster-url=<registry-url>/<fleet-manifests>:<tag> \ | ||
--cluster-path=<path/inside/oci/artifact> | ||
``` | ||
|
||
The Terraform/OpenTofu counterpart is the `flux_bootstrap_oci` resource that exposes | ||
the same configuration options as the CLI. | ||
|
||
The bootstrap operations are split into two phases: | ||
|
||
- Install and self-update configuration for the Flux components. | ||
- Cluster state reconciliation configuration. | ||
|
||
### Install and self-update configuration | ||
|
||
The command performs the following steps based on the `url`, `username`, | ||
`password` and `kustomization` arguments: | ||
|
||
1. Logs in to the OCI registry using the provided credentials. | ||
2. Generates an OCI artifact from the Flux components manifests and the `kustomization.yaml` file. | ||
Might be worth mentioning here the workload identity modifications that may be needed, e.g. on EKS the role ARN needs to be set as an annotation; I forgot if that was necessary in GKE also.

We have all of those documented here https://fluxcd.io/flux/installation/configuration/workload-identity/, people will need to read the docs and adapt their kustomization.yaml.

But perhaps there could be a bootstrap argument to specify provider-specific attributes that would be handled accordingly based on the provider flag?

There is

This is probably not the right place to bring this up, but the workload identity for GKE is partially incorrect: you don't need to annotate service accounts with a GCP SA any more for GKE workload identity. You grant the Kubernetes service account access to whatever resources it needs via a member statement like the one below:

`principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/flux-system/sa/source-controller`

I'm going to have a think on how I can update the docs on this one, but thought I'd raise it before I forget.

We've recently adopted Flux for our multi-cloud architecture. In order to support that, we're actually overriding env vars, volumes, and volume bindings directly to set up OIDC-based auth to each cloud on our workloads. It would help if we could extend those parameters on the source controller, or pass in a pod template, so that we can easily inject OIDC auth for accessing the OCI backend (I suspect we could do this with the Helm install method, but support in the CLI would be nice, as the bootstrap command is great). |
||
3. Applies the Flux components manifests along with their customisations to the cluster. | ||
4. Pushes the OCI artifact to the container registry using the specified tag. | ||
I wonder if it would make sense to have two sets of registry credentials: read-only for the image pull secrets, and read-write for pushing the artifacts to the registry. Storing a read-write secret in the cluster for image pull secrets does not seem like a good idea.

It is even possible to consider more pull secrets with specific permissions (least privilege principle):
Then, it is possible to consider different OCI registries: one for images and another for Flux artifacts, because the latter could contain sensitive infrastructure information.

Yes, this is something the command could support. Currently our OCI implementation supports reading the Docker config file from the host OS, so we could use that for write operations and the flags for the in-cluster secret. @sestegra the pull secret for the container images is already supported, it's one of the common flags to all bootstrap commands. |
||
5. Generates an image pull secret, an OCIRepository that points to the OCI artifact and | ||
a Flux Kustomization object that reconciles the OCI artifact contents. | ||
6. Applies the image pull secret, OCIRepository and Flux Kustomization to the cluster. | ||
|
||
Artifacts pushed to the registry: | ||
- `<registry-url>/<flux-manifests>:<checksum>` (immutable artifact) | ||
- `<registry-url>/<flux-manifests>:<tag>` (tag pointing to the immutable artifact) | ||
|
||
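A minimal sketch of how the two references could be derived. The RFC does not specify the checksum algorithm, so a SHA-256 digest of the manifests truncated to 16 characters is an assumption here, and `manifests_checksum` and `artifact_refs` are hypothetical helper names, not part of the Flux CLI. The sketch assumes bash with GNU coreutils.

```shell
# Hypothetical helper: derive a short, content-based tag for a manifests directory.
# Assumption: SHA-256 truncated to 16 hex characters (the RFC leaves this open).
manifests_checksum() {
  local dir="$1"
  # Hash every file in a stable order, then hash the list of per-file digests.
  (cd "$dir" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum) \
    | sha256sum | cut -c1-16
}

# Print the two references for a repository, a mutable tag and a directory:
# first the immutable checksum reference, then the tag that points to it.
artifact_refs() {
  local repo="$1" tag="$2" dir="$3"
  printf '%s:%s\n' "$repo" "$(manifests_checksum "$dir")"
  printf '%s:%s\n' "$repo" "$tag"
}
```

For example, `artifact_refs ghcr.io/stefanprodan/flux-manifests production ./flux-manifests` prints the checksum reference followed by the `production` reference. Because the first tag depends only on the directory contents, unchanged manifests map to the same immutable reference.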
Objects created by the command in the `flux-system` namespace: | ||
- `flux-components` Secret | ||
- `flux-components` OCIRepository | ||
- `flux-components` Kustomization | ||
|
||
### Cluster state reconciliation configuration | ||
|
||
After the OCIRepository and Flux Kustomization called `flux-components` become ready, the command
Does this have to be strictly sequential and synchronous?

Yes it does, CRDs and controllers must be up and running before the cluster sync is deployed, same procedure as for the Git bootstrap. |
||
continues with the following steps: | ||
|
||
1. Logs in to the OCI registry where the cluster artifacts are stored using the provided credentials. | ||
2. If the cluster OCI artifact is not found, an empty artifact is created | ||
and pushed to the registry using the provided tag. | ||
3. Generates an image pull secret, an OCIRepository and a Flux Kustomization object | ||
that reconciles the cluster OCI artifact contents. | ||
4. Applies the image pull secret, OCIRepository and Flux Kustomization to the cluster. | ||
|
||
Objects created by the command in the `flux-system` namespace: | ||
- `flux-system` Secret | ||
- `flux-system` OCIRepository | ||
- `flux-system` Kustomization | ||
|
||
If the cluster registry is the same as the Flux components registry, the command could reuse the | ||
`flux-components` image pull secret. | ||
|
||
### Registry authentication | ||
|
||
The `flux bootstrap oci` command supports the following authentication methods: | ||
|
||
- Basic authentication with `--username` and `--password`. The credentials are stored in a Kubernetes Secret. | ||
- OIDC authentication with `--provider=<aws|azure|gcp>`. No credentials are stored in the cluster; source-controller | ||
uses Kubernetes Workload Identity to authenticate to the registry. | ||
|
||
To avoid passing the password as a CLI flag, it can be read from standard input, e.g. | ||
`echo <password> | flux bootstrap oci`, or from the `OCI_PASSWORD` environment variable. | ||
|
||
If the registry is self-hosted and uses a self-signed TLS certificate, | ||
the root CA certificate can be provided with the `--ca-file` flag. | ||
|
||
If the registry is exposed on HTTP and not HTTPS, the `--allow-insecure-http` | ||
flag can be used to force non-TLS connections. | ||
|
||
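The password lookup described above could behave roughly as follows. This is a sketch under stated assumptions: the RFC names standard input and `OCI_PASSWORD` as sources but does not define which one wins, so the precedence shown here (environment variable first, then standard input) is an assumption, and `resolve_oci_password` is a hypothetical function name.

```shell
# Hypothetical sketch: resolve the registry password without a CLI flag.
# Assumption: OCI_PASSWORD takes precedence over standard input.
resolve_oci_password() {
  local password
  if [ -n "${OCI_PASSWORD:-}" ]; then
    printf '%s' "$OCI_PASSWORD"
  elif [ ! -t 0 ]; then
    # Standard input is not a terminal: read a single line from the pipe.
    IFS= read -r password || true
    printf '%s' "$password"
  else
    echo "no password provided" >&2
    return 1
  fi
}
```

Reading from a pipe keeps the secret out of the process arguments, which are visible to other users on the host through tools such as `ps`.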
### Signing and verification | ||
|
||
The `flux bootstrap oci` command supports the following signing and verification methods: | ||
|
||
- Cosign | ||
- Notation | ||
|
||
TODO: Add more details about the signing and verification methods, flags and options. | ||
|
||
### User Stories | ||
|
||
#### Story 1 | ||
|
||
> As a platform operator I want to bootstrap a Kubernetes cluster with Flux | ||
> using OCI artifacts stored in a container registry. | ||
|
||
The following example demonstrates how to bootstrap a Flux instance using GitHub Container Registry | ||
as the OCI registry for Flux components and the cluster state. | ||
|
||
```shell | ||
flux bootstrap oci \ | ||
--url=ghcr.io/stefanprodan/flux-manifests:production \ | ||
--username=<ghcr-username> \ | ||
--password=<ghcr-token> \ | ||
--kustomization=flux-manifests/kustomization.yaml \ | ||
--cluster-url=ghcr.io/stefanprodan/fleet-manifests:production \ | ||
--cluster-username=<ghcr-username> \ | ||
--cluster-password=<ghcr-token> \ | ||
--cluster-path=clusters/production | ||
``` | ||
|
||
Generated OCI artifacts: | ||
|
||
- `ghcr.io/stefanprodan/flux-manifests:88b028f` | ||
- `ghcr.io/stefanprodan/flux-manifests:production` | ||
- `ghcr.io/stefanprodan/fleet-manifests:6f7a258` | ||
- `ghcr.io/stefanprodan/fleet-manifests:production` | ||
|
||
Objects created in the `flux-system` namespace: | ||
|
||
Flux components reconciliation: | ||
|
||
```yaml | ||
apiVersion: source.toolkit.fluxcd.io/v1beta2 | ||
kind: OCIRepository | ||
metadata: | ||
name: flux-components | ||
namespace: flux-system | ||
spec: | ||
interval: 1m | ||
url: oci://ghcr.io/stefanprodan/flux-manifests | ||
ref: | ||
tag: production | ||
secretRef: | ||
name: flux-components | ||
--- | ||
apiVersion: kustomize.toolkit.fluxcd.io/v1 | ||
kind: Kustomization | ||
metadata: | ||
name: flux-components | ||
namespace: flux-system | ||
spec: | ||
interval: 1h | ||
retryInterval: 5m | ||
sourceRef: | ||
kind: OCIRepository | ||
name: flux-components | ||
path: ./ | ||
prune: true | ||
``` | ||
|
||
Cluster state reconciliation: | ||
|
||
```yaml | ||
apiVersion: source.toolkit.fluxcd.io/v1beta2 | ||
kind: OCIRepository | ||
metadata: | ||
name: flux-system | ||
namespace: flux-system | ||
spec: | ||
interval: 1m | ||
url: oci://ghcr.io/stefanprodan/fleet-manifests | ||
ref: | ||
tag: production | ||
secretRef: | ||
name: flux-system | ||
--- | ||
apiVersion: kustomize.toolkit.fluxcd.io/v1 | ||
kind: Kustomization | ||
metadata: | ||
name: flux-system | ||
namespace: flux-system | ||
spec: | ||
interval: 1h | ||
retryInterval: 5m | ||
sourceRef: | ||
kind: OCIRepository | ||
name: flux-system | ||
path: clusters/production | ||
prune: true | ||
``` | ||
|
||
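The manifests above show how the `--url` and `--cluster-url` values map onto the OCIRepository fields: the repository becomes `.spec.url` with the `oci://` prefix and the tag becomes `.spec.ref.tag`. A minimal sketch of that split follows; `split_oci_ref` is a hypothetical helper, and falling back to `latest` when no tag is given is an assumption that the RFC does not state.

```shell
# Hypothetical helper: split "<repository>:<tag>" into OCIRepository fields.
split_oci_ref() {
  local ref="$1"
  local last="${ref##*/}"   # last path segment, e.g. flux-manifests:production
  local repo="$ref"
  local tag="latest"        # assumption: default used when no tag is given
  case "$last" in
    *:*)
      # Only a colon in the last segment is a tag separator,
      # so a registry port such as localhost:5000 is left intact.
      tag="${last##*:}"
      repo="${ref%:*}"
      ;;
  esac
  printf 'url: oci://%s\n' "$repo"
  printf 'tag: %s\n' "$tag"
}
```

Calling `split_oci_ref ghcr.io/stefanprodan/flux-manifests:production` yields the `url` and `tag` values used by the `flux-components` OCIRepository in this story.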
#### Story 2 | ||
|
||
> As a platform operator I want to sync the cluster state with the fleet Git repository. | ||
|
||
Push changes from the fleet Git repository to the container registry: | ||
|
||
```shell | ||
# clone the fleet Git repository | ||
git clone https://github.com/stefanprodan/fleet.git | ||
cd fleet | ||
git switch main | ||
|
||
# push the contents to the fleet OCI repository and tag it with the commit short SHA | ||
flux push artifact oci://ghcr.io/stefanprodan/fleet-manifests:$(git rev-parse --short HEAD) \ | ||
--path="./" \ | ||
--source="$(git config --get remote.origin.url)" \ | ||
--revision="$(git branch --show-current)@sha1:$(git rev-parse HEAD)" | ||
|
||
# tag the new version for production | ||
flux tag artifact oci://ghcr.io/stefanprodan/fleet-manifests:$(git rev-parse --short HEAD) \ | ||
--tag=production | ||
``` | ||
|
||
This operation can be automated using the Flux GitHub Action. | ||
|
||
The Git repository structure would be similar to the | ||
[flux2-kustomize-helm-example](https://github.com/fluxcd/flux2-kustomize-helm-example) with the following changes: | ||
|
||
- The `clusters/production/flux-system` directory is removed. | ||
- The Flux Kustomization objects defined in the `clusters/production` directory, such as | ||
`infrastructure.yaml` and `apps.yaml`, have the `.spec.sourceRef` set to | ||
`kind: OCIRepository` and `name: flux-system`. | ||
|
||
#### Story 3 | ||
|
||
> As a platform operator I want to update the Flux controllers on my production cluster | ||
> from CI without access to the Kubernetes API. | ||
|
||
Download the latest CLI version and update Flux directly in the registry, without rerunning bootstrap: | ||
|
||
```shell | ||
# pull the latest manifests from the registry | ||
flux pull artifact oci://ghcr.io/stefanprodan/flux-manifests:production \ | ||
--output=./flux-manifests | ||
|
||
# update the Flux components manifests | ||
flux install --export > ./flux-manifests/flux-system/gotk-components.yaml | ||
# Review comments on lines +324 to +330:
# These are two alternative methods right? It's not very clear from the text at the moment.
# If you don't have access to the cluster (what the user story is about), this is the only way. If you have API access, then like with Git bootstrap, you can just rerun it to update. OCI bootstrap behaves the same as Git bootstrap.
# I just thought these two commands will write the same kind of output, except that install lets you select a subset of controllers... maybe I am missing something else.
# Is it the case right now that one has to rerun bootstrap on major/minor releases while patch releases are taken care of by in-cluster image version bumps?
# Every time we release Flux, users get a PR opened to update their manifests in Git. For OCI you would need some kind of semver range or some other manual gate, e.g. a GitHub workflow dispatch, to approve minor bumps and let only patch versions be automatically pushed to the registry.
# All the
# I am talking about 'pull artifact' vs 'install --export' (per above)
# You need to pull to preserve the existing kustomization.yaml and any other extra resources you may have added at bootstrap. The
# Ah, of course, this is a "rebase" ;) |
||
|
||
# calculate the checksum of the manifests | ||
checksum=$(grep -ar -e . ./flux-manifests/ | shasum | cut -c-16) | ||
|
||
# extract the Flux version and commit | ||
flux_version=$(flux version --client | awk '{print $2}') | ||
flux_commit=$(go version -m "$(which flux)" | grep vcs.revision | awk -F= '{print $NF}') | ||
|
||
# push the updated manifests to the registry using the checksum as tag | ||
flux push artifact oci://ghcr.io/stefanprodan/flux-manifests:${checksum} \ | ||
--path="./flux-manifests" \ | ||
--source="https://github.com/fluxcd/flux2" \ | ||
--revision="${flux_version}@sha1:${flux_commit}" | ||
|
||
# tag the new version for production | ||
flux tag artifact oci://ghcr.io/stefanprodan/flux-manifests:${checksum} \ | ||
--tag=production | ||
``` | ||
|
||
This operation could be simplified by implementing a dedicated CLI command and/or GitHub Action. | ||
|
||
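One review comment on this story suggests a gate that lets only patch versions be pushed to the registry automatically, while minor bumps wait for manual approval. A minimal sketch of such a check is shown below, assuming plain `MAJOR.MINOR.PATCH` versions with an optional `v` prefix and no pre-release suffix; `is_patch_bump` is a hypothetical helper, not an existing Flux command.

```shell
# Hypothetical CI gate: succeed only when the new version is a patch bump
# of the old one (same major and minor, higher patch).
# Assumption: versions look like v2.3.1 or 2.3.1, without pre-release suffixes.
is_patch_bump() {
  local old="${1#v}" new="${2#v}"
  local old_major old_minor old_patch new_major new_minor new_patch
  IFS=. read -r old_major old_minor old_patch <<<"$old"
  IFS=. read -r new_major new_minor new_patch <<<"$new"
  [ "$old_major" = "$new_major" ] \
    && [ "$old_minor" = "$new_minor" ] \
    && [ "$new_patch" -gt "$old_patch" ]
}
```

A pipeline could run the push and tag steps of this story only when `is_patch_bump "$current" "$candidate"` succeeds, and otherwise stop and wait for an approval step.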
#### Story 4 | ||
|
||
> As a platform operator I want to update the registry credentials on my clusters. | ||
|
||
To rotate the registry credentials, generate a new GitHub token and overwrite the image pull secret: | ||
|
||
```shell | ||
flux create secret oci flux-system \ | ||
--url=ghcr.io \ | ||
--username=<ghcr-username> \ | ||
--password=<ghcr-token> | ||
``` | ||
|
||
Another option is to rerun the bootstrap command with the new credentials. | ||
|
||
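For reference, the sketch below builds the payload that an image pull secret of type `kubernetes.io/dockerconfigjson` carries under its `.dockerconfigjson` key. It assumes the generated secret follows that standard Kubernetes layout, which the RFC does not spell out. `docker_config_json` is a hypothetical helper and does no JSON escaping, so it is only suitable for values without quotes or backslashes.

```shell
# Hypothetical helper: print a Docker config JSON document for one registry.
# Assumption: the secret uses the standard kubernetes.io/dockerconfigjson layout.
docker_config_json() {
  local server="$1" username="$2" password="$3" auth
  # The auth field is base64("<username>:<password>") on a single line.
  auth=$(printf '%s:%s' "$username" "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$username" "$password" "$auth"
}
```

Rotating the credentials amounts to regenerating this document with the new token and overwriting the existing secret, which is what the command above does in a single step.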
## Design Details | ||
|
||
The bootstrap feature will be implemented as a Go package under `fluxcd/flux2/pkg/bootstrap/oci` | ||
using the [fluxcd/pkg/oci](https://github.com/fluxcd/pkg/tree/main/oci) | ||
library for OCI operations such as auth, push, pull, tag, etc. | ||
|
||
Both the Flux CLI and the Terraform/OpenTofu provider will use the `fluxcd/flux2/pkg/bootstrap/oci` package | ||
and expose the same configuration options. | ||
|
||
### Enabling the feature | ||
|
||
The feature is enabled by default. | ||
|
||
## Implementation History | ||
|
||
* NONE |
I would imagine that users may wish to choose between pulling from upstream OCI artifact that is published as part of the Flux release or having a full copy of it. If they choose to use a copy, another command may be needed to keep their copy up to date. Does it make sense?
You would always have a copy in your registry that includes customisations, same as with Git, bootstrap means vendoring the Flux manifests that in 99% of cases would need some fine tuning.
Ah, that's true that in most cases right now people will end up with a copy. How do they bring the copy up to date, e.g. when some component has new pod spec fields that have to be set or there is an RBAC change? Do they have to read the changelog and implement such changes? If in the OCI world this could be avoided by referencing an upstream artifact and storing local changes as a patch/kustomization, it might be very nice actually; I guess it wasn't feasible with Git as an upstream.
For both Git and OCI bootstrap, the Flux update is fully automated in CI. See the Story 3 in this RFC. And also the docs here: https://fluxcd.io/flux/flux-gh-action/#automate-flux-updates
Just want to clarify, can this fine tuning be done with an in-cluster Kustomization, or has that been proven somehow challenging?
I've added the common flags to the RFC.
Yes, I know. What I mean is that it would be less natural to specify controllers with kustomize; one would probably need to select bases, as there are no meaningful parameters or if/switch statements (at least last time I checked). The CLI offers a more meaningful option.
So because you need to select controllers, you start with the CLI, which gives you a single file that you kustomize a little, but it's all much more complicated than you might have wished and defeats the purpose of putting kustomize in at this stage.
CUE on the other hand could do better in all of this and you could potentially remove the need for CLI and make custom configs easier to introduce. Does what I said earlier make more sense now?
The way Flux can be customized at bootstrap is all via Kustomize patches and CLI options, this must be 100% compatible with the oci sub-command. I’m not considering using CUE or anything really that would diverge from the current Git bootstrap procedure. Users should be able to migrate from Git to OCI by simply reusing their current flux-system overlay including patches, image overrides, configGenerator, volumes, etc.
That is for sure, I'm just thinking maybe self-managed flux in the future could use CUE for this, possibly even without exposing CUE to the user. CLI could still work the same way on the surface also. I do recall we once spoke of an installer operator too, you could use CUE there and in the CLI. Just an idea :)