

refactor: use yaml parser
paulfouquet committed Oct 9, 2024
2 parents b7368da + 9584b69 commit bee1193
Showing 22 changed files with 178 additions and 123 deletions.
62 changes: 31 additions & 31 deletions .github/dependabot.yml
@@ -1,33 +1,33 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: daily
- package-ecosystem: "docker"
directory: "/.github/workflows"
schedule:
interval: daily
- package-ecosystem: npm
directory: "/"
schedule:
interval: daily
open-pull-requests-limit: 10
groups:
aws-sdk:
patterns:
- "@aws-sdk/*"
aws-cdk:
patterns:
- "@aws-cdk/*"
- "aws-cdk"
- "aws-cdk-lib"
- "cdk8s"
- "cdk8s-cli"
- "cdk8s-plus-*"
- "constructs"
ignore:
- dependency-name: "@aws-sdk/*"
update-types: ["version-update:semver-patch"]
- dependency-name: "@types/node"
update-types: ["version-update:semver-patch"]
- package-ecosystem: 'github-actions'
directory: '/'
schedule:
interval: daily
- package-ecosystem: 'docker'
directory: '/.github/workflows'
schedule:
interval: daily
- package-ecosystem: npm
directory: '/'
schedule:
interval: daily
open-pull-requests-limit: 10
groups:
aws-sdk:
patterns:
- '@aws-sdk/*'
aws-cdk:
patterns:
- '@aws-cdk/*'
- 'aws-cdk'
- 'aws-cdk-lib'
- 'cdk8s'
- 'cdk8s-cli'
- 'cdk8s-plus-*'
- 'constructs'
ignore:
- dependency-name: '@aws-sdk/*'
update-types: ['version-update:semver-patch']
- dependency-name: '@types/node'
update-types: ['version-update:semver-patch']
1 change: 0 additions & 1 deletion README.md
@@ -31,7 +31,6 @@ To connect to the EKS cluster you need to be [logged into AWS](https://toitutewh

Then, to set up the cluster, which is only needed the first time you use it, run this:


```bash
aws --region=ap-southeast-2 eks update-kubeconfig --name=Workflows
```
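
A quick sanity check (assuming `kubectl` is installed locally) is to confirm the new context can reach the cluster:

```bash
# Should list the EKS nodes if the kubeconfig update worked
kubectl get nodes
```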
2 changes: 1 addition & 1 deletion docs/infrastructure/components/karpenter.md
@@ -1 +1 @@
# Karpenter
# Karpenter
25 changes: 25 additions & 0 deletions docs/infrastructure/destroy.md
@@ -0,0 +1,25 @@
# How to destroy an installation

Destroying the cluster and stack is not easy, because we use some custom EKS resources to link the two together. Based on a teardown performed at the time of writing, the following sequence should work:

1. Delete the cluster:

```bash
aws eks delete-cluster --name=Workflows
aws eks wait cluster-deleted --name=Workflows
```

1. Attempt to delete the stack:

```bash
aws cloudformation delete-stack --stack-name=Workflows
aws cloudformation wait stack-delete-complete --stack-name=Workflows
```

1. Wait for the above to fail.
1. Go to the [stack in AWS console](https://ap-southeast-2.console.aws.amazon.com/cloudformation/home?region=ap-southeast-2#/stacks/?filteringText=Workflows&filteringStatus=active&viewNested=true).
1. Delete the stack, retaining all the resources which could not be deleted.

The reason we don't use the CLI for the last step is that the logical IDs of the resources which could not be deleted do not seem to be the same as the ones which need to be retained. The reason is uncertain, but for now deleting in the console is safer.
[How do I troubleshoot custom resource failures in AWS CloudFormation?](https://repost.aws/knowledge-center/cfn-troubleshoot-custom-resource-failures) might be relevant for future issues like this.
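
For reference, the CLI equivalent of that last step would look roughly like the sketch below; the logical IDs are placeholders, and in practice they have not matched the resources that actually need retaining, which is why the console is used instead:

```bash
# Sketch only: the logical IDs to retain are placeholders, not the real ones
aws cloudformation delete-stack --stack-name=Workflows --retain-resources CustomResourceA CustomResourceB
aws cloudformation wait stack-delete-complete --stack-name=Workflows
```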
2 changes: 1 addition & 1 deletion docs/infrastructure/helm.md
@@ -12,4 +12,4 @@ However, some of the component Helm charts do not have a `values.schema.json`. A

- [aws-for-fluent-bit](./components/fluentbit.md) (<https://github.com/aws/eks-charts/issues/1011>)
- [Karpenter](./components/karpenter.md)
- [Argo workflows](./components/argo.workflows.md)
- [Argo workflows](./components/argo.workflows.md)
3 changes: 2 additions & 1 deletion docs/infrastructure/initial.deployment.md
@@ -12,5 +12,6 @@ The first time a cluster is deployed Custom Resource Definitions (CRD) will not
This means that any resources that require a CRD will fail to deploy with an error similar to

> resource mapping not found for name: "karpenter-template" namespace: "" from "dist/0003-karpenter-provisioner.k8s.yaml": no matches for kind "AWSNodeTemplate" in version "karpenter.k8s.aws/v1alpha1"
> ensure CRDs are installed first
To work around this problem the first deployment can be repeated, as the CRDs are deployed early in the deployment process.
To work around this problem, re-run the `kubectl apply` command.
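
A minimal sketch of that workaround, assuming the manifests were synthesised into `dist/` as described in `infra/README.md`:

```bash
# The first apply installs the CRDs but may report failures for dependent resources;
# the second apply should then succeed.
kubectl apply --filename=dist/
kubectl apply --filename=dist/
```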
29 changes: 18 additions & 11 deletions docs/infrastructure/kubernetes.version.md
@@ -17,6 +17,7 @@ If there is a version matching to the Kubernetes version to upgrade to, upgrade
```bash
npm install --save-dev cdk8s-plus-27
```

2. Remove the previous version

```bash
@@ -34,12 +35,13 @@ Below is an example of upgrading from v1.27 to v1.28
```bash
npm install --save-dev @aws-cdk/lambda-layer-kubectl-v28
```

While also removing the old lambda-layer version

```bash
npm rm @aws-cdk/lambda-layer-kubectl-v27
```

2. Set the new Kubernetes version in `LinzEksCluster`

```typescript
@@ -50,20 +52,23 @@ Below is an example of upgrading from v1.27 to v1.28

```typescript
import { KubectlV28Layer } from '@aws-cdk/lambda-layer-kubectl-v28';
// ...
kubectlLayer: new KubectlV28Layer(this, 'KubeCtlLayer'),
```
4. Diff the stack to make sure that only versions are updated
```bash
npx cdk diff Workflows -c ci-role-arn=...
ci_role="$(aws iam list-roles | jq --raw-output '.Roles[] | select(.RoleName | contains("CiTopo")) | select(.RoleName | contains("-CiRole")).Arn')"
admin_role="arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/AccountAdminRole"
workflow_maintainer_role="$(aws cloudformation describe-stacks --stack-name=TopographicSharedResourcesProd | jq --raw-output .Stacks[0].Outputs[0].OutputValue)"
npx cdk diff --context=maintainer-arns="${ci_role},${admin_role},${workflow_maintainer_role}" Workflows
```
The only changes should be Kubernetes version related.
```
Resources
[~] AWS::Lambda::LayerVersion KubeCtlLayer KubeCtlLayer replace
@@ -92,8 +97,9 @@ Below is an example of upgrading from v1.27 to v1.28
## Cycle out EC2 Nodes to the new version
<https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#version-deprecation>
> **Are Amazon EKS managed node groups automatically updated along with the cluster control plane version?**
No. A managed node group creates Amazon EC2 instances in your account. These instances aren't automatically upgraded when you or Amazon EKS update your control plane. For more information, see Updating a managed node group. We recommend maintaining the same Kubernetes version on your control plane and nodes.
> No. A managed node group creates Amazon EC2 instances in your account. These instances aren't automatically upgraded when you or Amazon EKS update your control plane. For more information, see Updating a managed node group. We recommend maintaining the same Kubernetes version on your control plane and nodes.
This process is necessary to avoid being blocked during a future Kubernetes version upgrade. For example, if Kubernetes is upgraded from `1.27` to `1.28` but the nodes remain on `1.27`, the next upgrade, to `1.29`, will fail.
@@ -102,10 +108,11 @@ This process is necessary to avoid being blocked for a future Kubernetes version
```bash
node_group_name="$(aws eks list-nodegroups --cluster-name=Workflows | jq --raw-output '.nodegroups[]')"
```
2. Describe the nodegroup to validate the versions
By describing the node group, you can check the current version, or you can use `k get nodes` to see what version is currently running.
```bash
aws eks describe-nodegroup --cluster-name=Workflows --nodegroup-name="$node_group_name" | jq --raw-output .nodegroup.version
```
@@ -115,9 +122,9 @@ This process is necessary to avoid being blocked for a future Kubernetes version
```bash
aws eks update-nodegroup-version --cluster-name=Workflows --nodegroup-name="$node_group_name"
```
This step takes some time to run. You can wait for it to finish with this command:
```bash
aws eks wait nodegroup-active --cluster-name=Workflows --nodegroup-name="$node_group_name"
```
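
Once the wait command returns, you can confirm the nodes are running the new version (a quick check, assuming `kubectl` access to the cluster):

```bash
# The VERSION column should match the upgraded control plane version
kubectl get nodes --output=wide
```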
22 changes: 11 additions & 11 deletions docs/labels.md
@@ -8,11 +8,11 @@ The following list of labels should be used in conjunction with Kubernetes [well

## Workflows

| Label | Description | Examples |
| --------------------- | ---------------------------------------- |--------------------------------------|
| `linz.govt.nz/ticket` | JIRA Ticket number | `TDE-912`, `BM-37` |
| `linz.govt.nz/region` | Geographic region that object relates to | "wellington", "auckland" |
| `linz.govt.nz/category` | The LINZ group that owns the workflow | "basemaps", "raster", "test", "util" |
| Label | Description | Examples |
| ----------------------- | ---------------------------------------- | ------------------------------------ |
| `linz.govt.nz/ticket` | JIRA Ticket number | `TDE-912`, `BM-37` |
| `linz.govt.nz/region` | Geographic region that object relates to | "wellington", "auckland" |
| `linz.govt.nz/category` | The LINZ group that owns the workflow | "basemaps", "raster", "test", "util" |

For the type of data that is being processed

@@ -25,12 +25,12 @@ For the type of data that is being processed

Most other objects deployed via AWS-CDK and CDK8s should also include information about the CICD process that deployed them

| Label | Description | Examples |
| -------------------------- | ---------------------------------------- | ------------------------------------------ |
| `linz.govt.nz/git-hash` | git hash that deployed the object | "bb3dab2779922094d2b8ecd4c67f30c66b38613d" |
| `linz.govt.nz/git-version` | git version information | "v6.46.0", "v0.0.1-20-gbb3dab27" |
| `linz.govt.nz/git-repository` | git repository that the object came from | "linz\_\_topo-workflows" |
| `linz.govt.nz/build-id` | Unique ID of the build that deployed | "6806791032-1" |
| Label | Description | Examples |
| ----------------------------- | ---------------------------------------- | ------------------------------------------ |
| `linz.govt.nz/git-hash` | git hash that deployed the object | "bb3dab2779922094d2b8ecd4c67f30c66b38613d" |
| `linz.govt.nz/git-version` | git version information | "v6.46.0", "v0.0.1-20-gbb3dab27" |
| `linz.govt.nz/git-repository` | git repository that the object came from | "linz\_\_topo-workflows" |
| `linz.govt.nz/build-id` | Unique ID of the build that deployed | "6806791032-1" |

## Label Usage

23 changes: 10 additions & 13 deletions infra/README.md
@@ -35,23 +35,20 @@ Main entry point: [app](./cdk8s.ts)

### Deploy CDK

To deploy with AWS CDK a few configuration variables need to be set
To deploy with AWS CDK, a few context values need to be set:

Due to VPC lookups a AWS account ID needs to be provided
- `aws-account-id`: Account ID to deploy into. This can be set with `export CDK_DEFAULT_ACCOUNT="$(aws sts get-caller-identity --query Account --output text)"`.
- `maintainer-arns`: Comma-separated list of AWS Role ARNs for the stack maintainers.

This can be done with either a `export CDK_DEFAULT_ACCOUNT=1234567890` or passed in at run time with `-c aws-account-id=1234567890`

Then a deployment can be made with `cdk`
Then a deployment can be made with `cdk`:

```shell
npx cdk diff -c aws-account-id=1234567890 -c ci-role-arn=arn::...
ci_role="$(aws iam list-roles | jq --raw-output '.Roles[] | select(.RoleName | contains("CiTopo")) | select(.RoleName | contains("-CiRole")).Arn')"
admin_role="arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/AccountAdminRole"
workflow_maintainer_role="$(aws cloudformation describe-stacks --stack-name=TopographicSharedResourcesProd | jq --raw-output .Stacks[0].Outputs[0].OutputValue)"
npx cdk deploy --context=maintainer-arns="${ci_role},${admin_role},${workflow_maintainer_role}" Workflows
```
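
Before deploying, the same context values can be reused with `cdk diff` to review the pending changes first (a sketch, assuming the role lookups above succeeded):

```shell
npx cdk diff --context=maintainer-arns="${ci_role},${admin_role},${workflow_maintainer_role}" Workflows
```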

#### CDK Context

- `aws-account-id`: Account ID to deploy into
- `ci-role-arn`: AWS Role ARN for the CI user

### Deploy CDK8s

Generate the kubernetes configuration yaml into `dist/`
@@ -63,12 +60,12 @@ npx cdk8s synth
Apply the generated yaml files

```shell
kubectl apply -f dist/
kubectl apply --filename=dist/
```
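
To spot-check that the resources were created (a quick look, not an exhaustive validation):

```shell
# List workloads across all namespaces to confirm the charts were applied
kubectl get pods --all-namespaces
```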

### Testing

To debug use the following as `cdk8s syth` swallows the errors
To debug, use the following, as `cdk8s synth` swallows the errors:

```shell
npx tsx infra/cdk8s.ts
2 changes: 1 addition & 1 deletion infra/charts/argo.extras.ts
@@ -1,5 +1,5 @@
import { Chart, ChartProps } from 'cdk8s';
import * as kplus from 'cdk8s-plus-29';
import * as kplus from 'cdk8s-plus-30';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';
2 changes: 1 addition & 1 deletion infra/charts/argo.workflows.ts
@@ -1,5 +1,5 @@
import { Chart, ChartProps, Duration, Helm } from 'cdk8s';
import { Secret } from 'cdk8s-plus-29';
import { Secret } from 'cdk8s-plus-30';
import { Construct } from 'constructs';

import { ArgoDbName, ArgoDbUser, DefaultRegion } from '../constants.js';
2 changes: 1 addition & 1 deletion infra/charts/cloudflared.ts
@@ -1,5 +1,5 @@
import { Chart, ChartProps, Size } from 'cdk8s';
import * as kplus from 'cdk8s-plus-29';
import * as kplus from 'cdk8s-plus-30';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';
4 changes: 2 additions & 2 deletions infra/charts/event.exporter.ts
@@ -8,7 +8,7 @@ import {
Namespace,
ServiceAccount,
Volume,
} from 'cdk8s-plus-29';
} from 'cdk8s-plus-30';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';
@@ -28,7 +28,7 @@ export class EventExporter extends Chart {
metadata: { name: 'event-exporter', namespace: props.namespace },
});

// https://cdk8s.io/docs/latest/plus/cdk8s-plus-29/rbac/#role
// https://cdk8s.io/docs/latest/plus/cdk8s-plus-30/rbac/#role
const clusterRole = new ClusterRole(this, 'event-exporter-cr', {
metadata: { name: 'event-exporter' },
});
2 changes: 1 addition & 1 deletion infra/charts/kube-system.coredns.ts
@@ -1,5 +1,5 @@
import { Chart, ChartProps } from 'cdk8s';
import * as kplus from 'cdk8s-plus-29';
import * as kplus from 'cdk8s-plus-30';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';
2 changes: 1 addition & 1 deletion infra/charts/kube-system.node.local.dns.ts
@@ -1,5 +1,5 @@
import { ApiObject, Chart, ChartProps, JsonPatch, Size } from 'cdk8s';
import * as kplus from 'cdk8s-plus-29';
import * as kplus from 'cdk8s-plus-30';
import { Construct } from 'constructs';

import { applyDefaultLabels } from '../util/labels.js';
8 changes: 4 additions & 4 deletions infra/eks/cluster.ts
@@ -1,4 +1,4 @@
import { KubectlV29Layer } from '@aws-cdk/lambda-layer-kubectl-v29';
import { KubectlV30Layer } from '@aws-cdk/lambda-layer-kubectl-v30';
import { Aws, CfnOutput, Duration, RemovalPolicy, SecretValue, Size, Stack, StackProps } from 'aws-cdk-lib';
import * as chatbot from 'aws-cdk-lib/aws-chatbot';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
@@ -44,7 +44,7 @@ export class LinzEksCluster extends Stack {
/* Cluster ID */
id: string;
/** Version of EKS to use, this must be aligned to the `kubectlLayer` */
version = KubernetesVersion.of('1.29');
version = KubernetesVersion.of('1.30');
/** Argo needs a database for workflow archive */
argoDb: DatabaseInstance;
/** Argo needs a temporary bucket to store objects */
@@ -72,7 +72,7 @@ defaultCapacity: 0,
defaultCapacity: 0,
vpcSubnets: [{ subnetType: SubnetType.PRIVATE_WITH_EGRESS }],
/** This must align to Cluster version: {@link version} */
kubectlLayer: new KubectlV29Layer(this, 'KubeCtlLayer'),
kubectlLayer: new KubectlV30Layer(this, 'KubeCtlLayer'),
/** To prevent IP exhaustion when running huge workflows run using ipv6 */
ipFamily: IpFamily.IP_V6,
clusterLogging: [ClusterLoggingTypes.API, ClusterLoggingTypes.CONTROLLER_MANAGER, ClusterLoggingTypes.SCHEDULER],
@@ -81,7 +81,7 @@ // TODO: setup up a database CNAME for changing Argo DB without updating Argo config
// TODO: setup up a database CNAME for changing Argo DB without updating Argo config
// TODO: run a Disaster Recovery test to recover database data
this.argoDb = new DatabaseInstance(this, ArgoDbInstanceName, {
engine: DatabaseInstanceEngine.postgres({ version: PostgresEngineVersion.VER_15_3 }),
engine: DatabaseInstanceEngine.postgres({ version: PostgresEngineVersion.VER_15_7 }),
instanceType: InstanceType.of(InstanceClass.T3, InstanceSize.SMALL),
vpc: this.vpc,
databaseName: ArgoDbName,

