configure the current kratix promise setup to run on golem #3604

Closed
Tracked by #3429
piontec opened this issue Jul 30, 2024 · 5 comments
piontec commented Jul 30, 2024

To make our kratix promises work, we need a few objects to be created in the project namespace first. Stubs for them are available here: https://github.com/giantswarm/giantswarm-management-clusters/tree/add-kratix/management-clusters/golem/extras/kratix/platform

ToDo and why

What's left there to do is:

  1. namespace.yaml - probably should be replaced with an Organization, or left out of scope for "a customer" to create first. For the demo, I think we should create it.
  2. crb_promises_role_binding.yaml - figure out which (Cluster)Role we should use in the binding. Currently, the only needed permission is to read Secrets and ConfigMaps in the same namespace.
  3. destination_mc.yaml and gitstatestore_gitops.yaml - these drive where Kratix will put rendered files in the gitops repo.
    3.1. GitStateStore needs access to the gitops repo, i.e. the same repo as Flux. It should also be possible to reference the same access token that Flux is using.
    3.2. Destination - docs are here; we need to inject it in the correct place in the gitops repo directories.
  4. secret_ghcr_pull_token.yaml - we need to generate a GitHub token to use here (only classic tokens are available) and encrypt it with sops.
  5. secret_github_token_repo_create.yaml - we need to create a token that can read template repos and create new ones.
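To make items 3.1 and 3.2 concrete, here is a rough sketch of what the two objects could look like. Field names follow the Kratix v1alpha1 API as I understand it; the object names, repo URL, path, and the secret reference are assumptions, not the final values:

```yaml
# Hypothetical sketch; adjust names and paths to the real gitops repo layout.
apiVersion: platform.kratix.io/v1alpha1
kind: GitStateStore
metadata:
  name: gitops
spec:
  url: https://github.com/giantswarm/some-gitops-repo  # assumption: placeholder repo
  branch: main
  path: management-clusters/golem/
  authMethod: basicAuth
  secretRef:
    name: flux-git-credentials  # assumption: reuse the same token Flux uses
    namespace: flux-system
---
apiVersion: platform.kratix.io/v1alpha1
kind: Destination
metadata:
  name: mc
spec:
  stateStoreRef:
    kind: GitStateStore
    name: gitops
  path: kratix  # assumption: a subfolder Flux is configured to reconcile
```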

Bonus

We can probably abstract the role binding and the info config map into something like a Project promise, but to get started I skipped it.

@piontec piontec changed the title configure the current setup to run on golem configure the current kratix promise setup to run on golem Jul 30, 2024
@uvegla uvegla self-assigned this Aug 15, 2024
uvegla commented Aug 15, 2024

Summary

This is a summary of how it was wired together on golem, the issues encountered along the way with their solutions, and the long-term implications for making it production ready.

Setup

Dev platform: https://github.com/giantswarm/dev-platform-kratix-promises
GS side gitops: https://github.com/giantswarm/kratix-gitops/
Demo side gitops: https://github.com/DemoTechInc/demotech-gitops

The GS gitops was moved to a separate repo, and thus to a flux-giantswarm namespace based Source and Kustomization, to avoid breaking or blocking the core ones with the continuous experimentation. See: https://github.com/giantswarm/kratix-gitops/blob/aa9b0997109e8a56025fc119ebd68b9d32985fbf/management-clusters/golem/flux.yaml

The demo related Sources and Kustomizations are in the default namespace to simulate a real scenario: https://github.com/DemoTechInc/demotech-gitops/blob/a0ea4be15bb2fa24e0caee945043f9cdd89e22b9/management-clusters/golem/gitrepo.yaml + https://github.com/DemoTechInc/demotech-gitops/blob/a0ea4be15bb2fa24e0caee945043f9cdd89e22b9/management-clusters/golem/golem.yaml

Encountered issues

Kratix installation

https://github.com/giantswarm/kratix-app generated from the upstream chart is not useful in its current state because it does not actually install Kratix. I pulled in the raw kratix.yaml from the latest release (Syntasso was kind enough to quickly fix some issues I reported along the way), so it is stored in giantswarm/kratix-gitops as a raw YAML to avoid tracking the moving latest release.

Related parts:

RBAC

In order to make Kratix work on our MC, some RBAC rules must be created / managed on the GS side, because default/automation is not allowed to create cluster scoped RBAC resources.

Related parts:

  • https://github.com/giantswarm/kratix-gitops/blob/aa9b0997109e8a56025fc119ebd68b9d32985fbf/management-clusters/golem/rbac/kratix-default-automation.yaml
    • kratix-core so customer default/automation can create Kratix resources
    • kratix-giantswarm so the customer can create the demo promises we prepared under the group promise.platform.giantswarm.io. Please note that this is hard coded for that use case only! For new API groups, the rule must be updated! ⚠️
  • kratix-canary is needed at a slightly later stage, when the first Destination is created and default/automation reconciles it with Flux. Kratix commits a namespace as a dependency and a ConfigMap as a resource to make sure a new Kratix Destination works. This role is needed so the SA can manage both: the kratix-worker-system namespace and the CM in it.
    • ❓ There is no control over this at the moment. What if multiple Destinations are configured and reconciled into the same cluster? Multiple Kustomizations would fight over managing the same namespace / CM.
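A minimal sketch of what the kratix-core ClusterRole and its binding could grant. Only the API group and the default/automation SA come from the text above; the resource list and verbs are assumptions:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kratix-core  # name taken from the list above
rules:
  - apiGroups: ["platform.kratix.io"]
    resources: ["promises", "destinations", "gitstatestores"]  # assumption: exact list may differ
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Bind it to the customer automation SA (namespace/name from the text above).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kratix-core-default-automation  # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kratix-core
subjects:
  - kind: ServiceAccount
    name: automation
    namespace: default
```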

Some of these might make sense to eventually live in rbac-operator. Also, FTR, I noticed these finalizers on some of the clusterroles I created above, which may or may not be a bug in the operator:

finalizers:
  - operatorkit.giantswarm.io/rbac-operator-crossplane-controller

Password protected SSH keys

Using an SSH deploy key with write access enabled is required for the Kratix destination repo. It seems that, at the time of the experiment, Kratix did not support password protected SSH keys, so the key must be created without a passphrase / with an empty one.
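For reference, such a key can be generated without a passphrase and stored roughly like this. The secret name and the field names Kratix expects are assumptions; check the GitStateStore docs for the exact format:

```yaml
# Generate the key beforehand with: ssh-keygen -t ed25519 -N "" -f kratix-deploy-key
apiVersion: v1
kind: Secret
metadata:
  name: kratix-deploy-key  # assumption: placeholder name
  namespace: default
type: Opaque
stringData:
  sshPrivateKey: |  # assumption: the key field Kratix reads may be named differently
    -----BEGIN OPENSSH PRIVATE KEY-----
    ...
    -----END OPENSSH PRIVATE KEY-----
  knownHosts: |
    github.com ssh-ed25519 AAAA...
```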

Related parts:

Security context

Kratix currently cannot handle security contexts, which is a big problem because of our enforced Kyverno rules.

See: https://gigantic.slack.com/archives/C076AGFN92S/p1723111247860519

The summary is that for now it is solved with polexes (Kyverno PolicyExceptions), but most of these have to be dynamic (e.g. one for each promise and namespace you want to deploy resources into).

This is how Kratix wraps the pipeline containers:

- the main container is: `docker.io/syntasso/kratix-platform-pipeline-adapter` called `update-status`
- there are 3 init containers, in order:
  - `docker.io/syntasso/kratix-platform-pipeline-adapter` called `reader`
  - OUR CONTAINER doing its thing as the pipeline step
  - `docker.io/syntasso/kratix-platform-pipeline-adapter` called `work-writer`

So it is not enough to set the security context on our container alone.
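A rough shape of such a polex. The policy and rule names are assumptions (the real Kyverno policy names on the MC will differ), and the namespace would need to be templated per promise / resource namespace:

```yaml
apiVersion: kyverno.io/v2  # kyverno.io/v2beta1 on older Kyverno versions
kind: PolicyException
metadata:
  name: kratix-pipeline-polex  # hypothetical name
  namespace: kyverno
spec:
  exceptions:
    - policyName: disallow-run-as-root  # assumption: real policy name differs
      ruleNames:
        - run-as-non-root  # assumption
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - org-kratix-demo  # would need to be dynamic per target namespace
```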

Related parts:

Some promise changes

RBAC can be specified now on promises, for example: https://github.com/giantswarm/dev-platform-kratix-promises/blob/5b513addeb42ad52dbedcc083cef332e25547804/github-template-repo-promise/promise.yaml#L102-L109, docs: https://docs.kratix.io/main/reference/workflows#rbac

The pull policy was set to Always for all images because, to speed up development for now, the images were built locally as latest and pushed manually to gsoci, example: https://github.com/giantswarm/dev-platform-kratix-promises/blob/5b513addeb42ad52dbedcc083cef332e25547804/github-template-repo-promise/promise.yaml#L119.

The SA must be set on each HelmRelease by Kyverno rule enforcement: https://github.com/giantswarm/dev-platform-kratix-promises/blob/5b513addeb42ad52dbedcc083cef332e25547804/app-deployment-promise/containers/appdeployment-template-pipeline/files/HelmRelease.yaml#L18
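For context, the relevant HelmRelease fragment looks roughly like this. The SA name and chart references are placeholders, not the values from the linked file:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: example-app  # hypothetical
spec:
  interval: 1m
  serviceAccountName: automation  # enforced by the Kyverno rule; name is an assumption
  chart:
    spec:
      chart: example-app  # placeholder
      sourceRef:
        kind: HelmRepository
        name: example-repo  # placeholder
```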

The wait logic was reduced to a simple sleep for now: https://github.com/giantswarm/dev-platform-kratix-promises/blob/5b513addeb42ad52dbedcc083cef332e25547804/app-deployment-promise/containers/check-if-infra-ready-pipeline/execute-pipeline#L35; it will possibly be replaced with different logic, see: https://gigantic.slack.com/archives/C056D9YQ1B5/p1722683856867759?thread_ts=1722604473.446219&cid=C056D9YQ1B5

Some dev template repo changes

The security context had to be aligned to make it compatible with our Kyverno rules: giantswarm/devplatform-template-go-service@97408da

Cosign issue

Disabled cosign verification in the HR: https://github.com/giantswarm/dev-platform-kratix-promises/blob/5b513addeb42ad52dbedcc083cef332e25547804/app-deployment-promise/containers/appdeployment-template-pipeline/files/HelmRelease.yaml#L14-L17 because Flux 2.1.2 does not support keyless signing. Support was added in 2.2.0: https://github.com/fluxcd/flux2/releases/tag/v2.2.0 + https://fluxcd.io/flux/components/source/ocirepositories/#keyless-verification.

Idea for key based signing and verification: https://gigantic.slack.com/archives/C02GDJJ68Q1/p1723710023444289?thread_ts=1722593976.383529&cid=C02GDJJ68Q1

The GS CMC add-kratix branch resources

Equivalents of https://github.com/giantswarm/giantswarm-management-clusters/tree/add-kratix/management-clusters/golem/extras/kratix/platform are as follows:


uvegla commented Aug 15, 2024

FTR, these are the resources used for testing with the promises at the state of: https://github.com/giantswarm/dev-platform-kratix-promises/tree/5b513addeb42ad52dbedcc083cef332e25547804

apiVersion: promise.platform.giantswarm.io/v1beta1
kind: githubrepo
metadata:
  labels:
    kratix.io/promise-name: githubrepo
  name: laszlo-kratix-2
  namespace: org-kratix-demo
spec:
  githubTokenSecretRef:
    name: dev-platform-gh-access
  registryInfoConfigMapRef:
    name: github-oci-registry-info
  repository:
    description: My first kratix project
    name: laszlo-kratix-2
    owner: DemoTechInc
    templateSource: giantswarm/devplatform-template-go-service
    visibility: private

and:

apiVersion: promise.platform.giantswarm.io/v1beta1
kind: appdeployment
metadata:
  labels:
    kratix.io/promise-name: appdeployment
  name: laszlo-kratix-2
  namespace: org-kratix-demo
spec:
  interval: 1m
  statusConfigMapReference:
    name: laszlo-kratix-2-info
  suspend: false
  timeout: 3m
  values:
    autoscaling:
      enabled: false
    ingress:
      enabled: false
    monitoring:
      serviceMonitor:
        enabled: false
    pdb:
      enabled: false
  version: '>=0.1.0'


uvegla commented Aug 15, 2024

On Kratix workings

Here is my understanding on what Kratix does.

On promises and resources

A promise is built up from:

  • .spec.api is the CRD definition for the resource request
  • .spec.workflows is what happens when the promise has create/update/delete (CUD) actions and when a CR from the above .spec.api CRD has CUD action
    • promise workflow: triggered once when the promise has a CUD action
      • output generated here is stored under the dependencies folder of the Kratix Destination
    • resource workflow: triggered each time a resource has CUD
      • output generated here is stored under the resources folder of the Kratix Destination
  • both the promise and the resource workflow can have 2 pipelines: configure and delete
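The structure above maps onto a Promise manifest roughly like this. This is a skeleton, not a working promise: the names and image are placeholders, and the CRD schema is omitted:

```yaml
apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
  name: example
spec:
  api:              # the CRD served to users for resource requests
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: examples.promise.platform.giantswarm.io
    spec: {}        # group / versions / schema omitted in this sketch
  workflows:
    promise:
      configure:    # runs on promise CUD; output lands under dependencies/
        - apiVersion: platform.kratix.io/v1alpha1
          kind: Pipeline
          metadata:
            name: promise-configure
          spec:
            containers:
              - name: configure
                image: example/pipeline:latest  # placeholder
    resource:
      configure:    # runs on each resource CUD; output lands under resources/
        - apiVersion: platform.kratix.io/v1alpha1
          kind: Pipeline
          metadata:
            name: resource-configure
          spec:
            containers:
              - name: configure
                image: example/pipeline:latest  # placeholder
```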

On state stores and destinations

The outputs of workflows are stored as works.platform.kratix.io, and their actual persistence in state stores is tracked as workplacements.platform.kratix.io.

The works actually store the generated file contents and the path / filename they are supposed to be stored in. These are created once per resource to store the output.

The workplacements are destination dependent and one is created for each work for each matching destination. For destination filtering see: https://docs.kratix.io/main/reference/destinations/multidestination-management.

Kratix has a 10 hour reconcile interval, and it cannot be configured at the time of writing. But reconciliation can also be triggered by simply restarting the Kratix controller, among other events, see: https://docs.kratix.io/main/reference/resources/workflows#idempotency

For example, when the reconciliation naturally happens or the controller restarts, the contents of the workplacements are enforced into the destinations. Note that the workflows / pipelines are not rerun; the state stored in these places is enforced. (An interesting implication of this is that time dependent output, e.g. output of a workflow that depends on the current time, say a simple date > /kratix/output/date.txt, will not get regenerated.)

⚠️ Kratix assumes full ownership of the destination folder. The final folder is always: <gitStateStorePath>/<destinationPath>/<destinationName>/ so you technically cannot write to the root of the repo, as at least destinationName is always non empty. Also, because of that assumption, Kratix does not have, or need, complex diff logic or knowledge of previous states. It simply wipes the real end destination folder and puts the result there. Because of this, I think the .spec.filepath.mode called nestedByMetadata makes sense, though the resulting paths are quite "ugly".
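For example, with a gitStateStorePath of management-clusters/golem/, a destinationPath of kratix, and a destinationName of mc (all hypothetical values), the repo layout Kratix owns would be:

```
management-clusters/golem/kratix/mc/
├── dependencies/   # promise workflow output
└── resources/      # resource workflow output
```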

FTR, folders created under /kratix/output are of course created in the repo as well. With:

spec:
  path: ./
  filepath:
    mode: none

You can technically put files outside of dependencies and resources, but I think you cannot escape the Kratix root folder of the destination. See: https://gigantic.slack.com/archives/C02GDJJ68Q1/p1723212403633329?thread_ts=1722593976.383529&cid=C02GDJJ68Q1

See: https://gigantic.slack.com/archives/C076AGFN92S/p1723199112084119


uvegla commented Aug 15, 2024

Notes on some concerns

Full automation related concerns

  • There are a few chicken - egg problems to solve
    • Kratix Destination -> creates folder in repo vs. the flux git repo and kustomization reconciling that repo
    • Organization -> namespace vs. resources created in there

Kratix in general

  • The way destinations work is not compatible with the GS recommended / used gitops repo structure
  • Security context / Kyverno issues
  • Upstream chart / GS chart generated from it
  • The "constant" reconciliation may not be desirable in some scenarios

Others

  • Cosign / Flux / Github action issues

@weatherhog
Is also working on gazelle. Closing this issue.
