docs: Explain how to destroy an installation TDE-1276 (#795)
#### Motivation

Add and clarify documentation.

#### Checklist

- [ ] Tests updated
- [x] Docs updated
- [x] Issue linked in Title
l0b0 authored Oct 7, 2024
1 parent 1735f16 commit fc300ee
Showing 10 changed files with 93 additions and 63 deletions.
62 changes: 31 additions & 31 deletions .github/dependabot.yml
@@ -1,33 +1,33 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: daily
- package-ecosystem: "docker"
directory: "/.github/workflows"
schedule:
interval: daily
- package-ecosystem: npm
directory: "/"
schedule:
interval: daily
open-pull-requests-limit: 10
groups:
aws-sdk:
patterns:
- "@aws-sdk/*"
aws-cdk:
patterns:
- "@aws-cdk/*"
- "aws-cdk"
- "aws-cdk-lib"
- "cdk8s"
- "cdk8s-cli"
- "cdk8s-plus-*"
- "constructs"
ignore:
- dependency-name: "@aws-sdk/*"
update-types: ["version-update:semver-patch"]
- dependency-name: "@types/node"
update-types: ["version-update:semver-patch"]
- package-ecosystem: 'github-actions'
directory: '/'
schedule:
interval: daily
- package-ecosystem: 'docker'
directory: '/.github/workflows'
schedule:
interval: daily
- package-ecosystem: npm
directory: '/'
schedule:
interval: daily
open-pull-requests-limit: 10
groups:
aws-sdk:
patterns:
- '@aws-sdk/*'
aws-cdk:
patterns:
- '@aws-cdk/*'
- 'aws-cdk'
- 'aws-cdk-lib'
- 'cdk8s'
- 'cdk8s-cli'
- 'cdk8s-plus-*'
- 'constructs'
ignore:
- dependency-name: '@aws-sdk/*'
update-types: ['version-update:semver-patch']
- dependency-name: '@types/node'
update-types: ['version-update:semver-patch']
1 change: 0 additions & 1 deletion README.md
@@ -31,7 +31,6 @@ To connect to the EKS cluster you need to be [logged into AWS](https://toitutewh

Then, to set up the cluster (only needed the first time you use it), run this


```bash
aws --region=ap-southeast-2 eks update-kubeconfig --name=Workflows
```
2 changes: 1 addition & 1 deletion docs/infrastructure/components/karpenter.md
@@ -1 +1 @@
# Karpenter
# Karpenter
25 changes: 25 additions & 0 deletions docs/infrastructure/destroy.md
@@ -0,0 +1,25 @@
# How to destroy an installation

Destroying the cluster and stack is not easy, because we use some custom EKS resources to link the two together. Based on a previous teardown, the following sequence should work at the time of writing:

1. Delete the cluster:

```bash
aws eks delete-cluster --name=Workflows
aws eks wait cluster-deleted --name=Workflows
```

1. Attempt to delete the stack:

```bash
aws cloudformation delete-stack --stack-name=Workflows
aws cloudformation wait stack-delete-complete --stack-name=Workflows
```

1. Wait for the above to fail.
1. Go to the [stack in AWS console](https://ap-southeast-2.console.aws.amazon.com/cloudformation/home?region=ap-southeast-2#/stacks/?filteringText=Workflows&filteringStatus=active&viewNested=true)
1. Delete the stack, retaining all the resources which could not be deleted

The reason we don't use the CLI for the last step is that the logical IDs of the resources which could not be deleted do not seem to match the ones which need to be retained. The reason is uncertain, but for now deleting in the console is safer.
[How do I troubleshoot custom resource failures in AWS CloudFormation?](https://repost.aws/knowledge-center/cfn-troubleshoot-custom-resource-failures) might be relevant for future issues like this.
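
For reference, this is roughly what the CLI equivalent of that last step would look like. It is only a sketch: `--retain-resources` takes logical resource IDs, and the IDs below are hypothetical placeholders — in practice the IDs reported as undeletable did not line up with the ones to retain, which is why we use the console instead.

```bash
# Sketch only: delete a stack in DELETE_FAILED state while retaining the
# resources that could not be deleted. The logical IDs are placeholders.
aws cloudformation delete-stack \
  --stack-name=Workflows \
  --retain-resources ExampleCustomResource1 ExampleCustomResource2
```
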
2 changes: 1 addition & 1 deletion docs/infrastructure/helm.md
@@ -12,4 +12,4 @@ However, some of the component Helm charts do not have a `values.schema.json`. A

- [aws-for-fluent-bit](./components/fluentbit.md) (<https://github.com/aws/eks-charts/issues/1011>)
- [Karpenter](./components/karpenter.md)
- [Argo workflows](./components/argo.workflows.md)
- [Argo workflows](./components/argo.workflows.md)
3 changes: 2 additions & 1 deletion docs/infrastructure/initial.deployment.md
@@ -12,5 +12,6 @@ The first time a cluster is deployed Custom Resource Definitions (CRD) will not
This means that any resources that require a CRD will fail to deploy with an error similar to

> resource mapping not found for name: "karpenter-template" namespace: "" from "dist/0003-karpenter-provisioner.k8s.yaml": no matches for kind "AWSNodeTemplate" in version "karpenter.k8s.aws/v1alpha1"
> ensure CRDs are installed first
To work around this problem the first deployment can be repeated, as the CRDs are deployed early in the deployment process.
To work around this problem, re-run the `kubectl apply` command.
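
As a rough sketch of that second pass, assuming the rendered CDK8s manifests live under `dist/` (as the error message above suggests):

```bash
# Re-applying the rendered manifests a second time lets the CRD-dependent
# resources deploy, because the CRDs were installed by the first pass.
kubectl apply -f dist/
```
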
24 changes: 14 additions & 10 deletions docs/infrastructure/kubernetes.version.md
@@ -17,6 +17,7 @@ If there is a version matching to the Kubernetes version to upgrade to, upgrade
```bash
npm install --save-dev cdk8s-plus-27
```

2. Remove the previous version

```bash
@@ -34,12 +35,13 @@ Below is an example of upgrading from v1.27 to v1.28
```bash
npm install --save-dev @aws-cdk/lambda-layer-kubectl-v28
```

While also removing the old lambda-layer version

```bash
npm rm @aws-cdk/lambda-layer-kubectl-v27
```

2. Set the new Kubernetes version in `LinzEksCluster`

```typescript
@@ -50,9 +52,9 @@ Below is an example of upgrading from v1.27 to v1.28

```typescript
import { KubectlV28Layer } from '@aws-cdk/lambda-layer-kubectl-v28';
// ...
kubectlLayer: new KubectlV28Layer(this, 'KubeCtlLayer'),
```
@@ -64,9 +66,9 @@ Below is an example of upgrading from v1.27 to v1.28
workflow_maintainer_role="$(aws cloudformation describe-stacks --stack-name=TopographicSharedResourcesProd | jq --raw-output .Stacks[0].Outputs[0].OutputValue)"
npx cdk diff --context=maintainer-arns="${ci_role},${admin_role},${workflow_maintainer_role}" Workflows
```
The only changes should be Kubernetes version related.
```
Resources
[~] AWS::Lambda::LayerVersion KubeCtlLayer KubeCtlLayer replace
@@ -95,8 +97,9 @@ Below is an example of upgrading from v1.27 to v1.28
## Cycle out EC2 Nodes to the new version
<https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#version-deprecation>
> **Are Amazon EKS managed node groups automatically updated along with the cluster control plane version?**
No. A managed node group creates Amazon EC2 instances in your account. These instances aren't automatically upgraded when you or Amazon EKS update your control plane. For more information, see Updating a managed node group. We recommend maintaining the same Kubernetes version on your control plane and nodes.
> No. A managed node group creates Amazon EC2 instances in your account. These instances aren't automatically upgraded when you or Amazon EKS update your control plane. For more information, see Updating a managed node group. We recommend maintaining the same Kubernetes version on your control plane and nodes.
This process is necessary to avoid being blocked for a future Kubernetes version upgrade. For example, if Kubernetes gets upgraded from `1.27` to `1.28` but the nodes remain on `1.27`, the next upgrade, to `1.29`, will fail.
@@ -105,10 +108,11 @@ This process is necessary to avoid being blocked for a future Kubernetes version
```bash
node_group_name="$(aws eks list-nodegroups --cluster-name=Workflows | jq --raw-output '.nodegroups[]')"
```
2. Describe the nodegroup to validate the versions
By describing the node group you can check the current version, or you can use `k get nodes` to see what version is currently running
```bash
aws eks describe-nodegroup --cluster-name=Workflows --nodegroup-name="$node_group_name" | jq --raw-output .nodegroup.version
```
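
Alternatively, as mentioned above, a quick sketch of the `kubectl` route (assuming `k` is an alias for `kubectl`):

```bash
# Each node reports the kubelet version it is currently running.
kubectl get nodes
```
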
@@ -118,9 +122,9 @@ This process is necessary to avoid being blocked for a future Kubernetes version
```bash
aws eks update-nodegroup-version --cluster-name=Workflows --nodegroup-name="$node_group_name"
```
This step takes some time to run. You can wait for it to finish with this command:
```bash
aws eks wait nodegroup-active --cluster-name=Workflows --nodegroup-name="$node_group_name"
```
22 changes: 11 additions & 11 deletions docs/labels.md
@@ -8,11 +8,11 @@ The following list of labels should be used in conjunction with Kubernetes [well

## Workflows

| Label | Description | Examples |
| --------------------- | ---------------------------------------- |--------------------------------------|
| `linz.govt.nz/ticket` | JIRA Ticket number | `TDE-912`, `BM-37` |
| `linz.govt.nz/region` | Geographic region that object relates to | "wellington", "auckland" |
| `linz.govt.nz/category` | The LINZ group that owns the workflow | "basemaps", "raster", "test", "util" |
| Label | Description | Examples |
| ----------------------- | ---------------------------------------- | ------------------------------------ |
| `linz.govt.nz/ticket` | JIRA Ticket number | `TDE-912`, `BM-37` |
| `linz.govt.nz/region` | Geographic region that object relates to | "wellington", "auckland" |
| `linz.govt.nz/category` | The LINZ group that owns the workflow | "basemaps", "raster", "test", "util" |

For the type of data that is being processed

@@ -25,12 +25,12 @@ For the type of data that is being processed

Most other objects deployed via AWS-CDK and CDK8s should also include information about the CICD process that deployed it

| Label | Description | Examples |
| -------------------------- | ---------------------------------------- | ------------------------------------------ |
| `linz.govt.nz/git-hash` | git hash that deployed the object | "bb3dab2779922094d2b8ecd4c67f30c66b38613d" |
| `linz.govt.nz/git-version` | git version information | "v6.46.0", "v0.0.1-20-gbb3dab27" |
| `linz.govt.nz/git-repository` | git repository that the object came from | "linz\_\_topo-workflows" |
| `linz.govt.nz/build-id` | Unique ID of the build that deployed | "6806791032-1" |
| Label | Description | Examples |
| ----------------------------- | ---------------------------------------- | ------------------------------------------ |
| `linz.govt.nz/git-hash` | git hash that deployed the object | "bb3dab2779922094d2b8ecd4c67f30c66b38613d" |
| `linz.govt.nz/git-version` | git version information | "v6.46.0", "v0.0.1-20-gbb3dab27" |
| `linz.govt.nz/git-repository` | git repository that the object came from | "linz\_\_topo-workflows" |
| `linz.govt.nz/build-id` | Unique ID of the build that deployed | "6806791032-1" |
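
As an illustrative sketch (not taken from this repository), labels like these make it straightforward to select objects later — for example, using the ticket label from the workflows table above, and assuming the Argo Workflows `Workflow` CRD is installed in the cluster:

```bash
# Hypothetical example: list all workflows deployed for a given JIRA ticket.
kubectl get workflows --all-namespaces -l linz.govt.nz/ticket=TDE-912
```
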

## Label Usage

1 change: 1 addition & 0 deletions infra/README.md
@@ -30,6 +30,7 @@ Main entry point: [app](./cdk8s.ts)
```shell
npm install
```

- Login to AWS

### Deploy CDK
14 changes: 7 additions & 7 deletions templates/argo-tasks/stac-validate.yml
@@ -51,10 +51,10 @@ spec:
- name: AWS_ROLE_CONFIG_PATH
value: s3://linz-bucket-config/config.json
args:
- 'stac'
- 'validate'
- '--concurrency={{inputs.parameters.concurrency}}'
- '--recursive={{inputs.parameters.recursive}}'
- '--checksum-assets={{inputs.parameters.checksum_assets}}'
- '--checksum-links={{inputs.parameters.checksum_links}}'
- '{{inputs.parameters.uri}}'
- 'stac'
- 'validate'
- '--concurrency={{inputs.parameters.concurrency}}'
- '--recursive={{inputs.parameters.recursive}}'
- '--checksum-assets={{inputs.parameters.checksum_assets}}'
- '--checksum-links={{inputs.parameters.checksum_links}}'
- '{{inputs.parameters.uri}}'
