feat: Integrate Karpenter (openedx#41)
Integrates Karpenter and adds an infrastructure example for AWS users.

---------

Co-authored-by: lpm0073 <[email protected]>
Co-authored-by: Gábor Boros <[email protected]>
3 people committed Dec 5, 2023
1 parent 71f517f commit 3d3bc53
Showing 32 changed files with 1,892 additions and 7 deletions.
5 changes: 5 additions & 0 deletions .gitignore
infra-*/terraform.tfstate
infra-*/terraform.tfstate*
infra-*/.terraform*
infra-*/secrets.auto.tfvars
*kubeconfig
*terraform.tfstate*
*terraform.lock.*
.terraform
*secrets.auto.tfvars
my-notes
105 changes: 101 additions & 4 deletions README.md
## Technology stack and architecture

1. At the base is a Kubernetes cluster, which you must provide (e.g. using Terraform to provision Amazon EKS).
* Any cloud provider such as AWS or DigitalOcean should work. There are example Terraform configurations in the `infra-examples` folder, but they are just starting points and are not recommended for production use.
2. On top of that, this project's helm chart will install the shared resources you need - an ingress controller, monitoring, database clusters, etc. The following are included but can be disabled/replaced if you prefer an alternative:
* Ingress controller: [ingress-nginx](https://kubernetes.github.io/ingress-nginx/)
* Automatic HTTPS cert provisioning: [cert-manager](https://cert-manager.io/)
The [pod-autoscaling plugin](https://github.com/eduNEXT/tutor-contrib-pod-autoscaling) enables the implementation of HPA and
VPA to scale an installation's workloads. The variables for configuring the plugin are documented there.
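
As a rough sketch of the typical Tutor plugin flow (the exact package name and settings live in that plugin's README, so treat these names as assumptions):

```bash
# Install the plugin from its repository and enable it (package and plugin
# names are assumed from the plugin repository; check its README for specifics)
pip install git+https://github.com/eduNEXT/tutor-contrib-pod-autoscaling
tutor plugins enable pod-autoscaling
tutor config save
```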

#### Node autoscaling with Karpenter in EKS clusters

This section provides a guide on how to install and configure [Karpenter](https://karpenter.sh/) in an EKS cluster, using the
infrastructure examples included in this repo.

> Prerequisites:
>
> - An AWS account ID
> - kubectl 1.27
> - Terraform 1.5.x or higher
> - Helm
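
Before running Terraform, you can quickly confirm the prerequisites are in place; a minimal sketch:

```bash
aws sts get-caller-identity # verify AWS credentials and note the account ID
kubectl version --client    # expect a 1.27.x client
terraform version           # expect 1.5.x or higher
helm version
```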

1. Clone this repository and navigate to `./infra-examples/aws`. You'll find Terraform modules for the `vpc` and `k8s-cluster`
resources. Create the `vpc` resources first, followed by the `k8s-cluster` resources. Make sure to have the target
AWS account ID available, and then execute the following commands in each folder:

```bash
terraform init
terraform plan
terraform apply -auto-approve
```

This will create an EKS cluster in the new VPC, along with the Karpenter resources it requires.

2. Once the `k8s-cluster` resources are created, run `terraform output` in that module's folder and copy the following output values:

- cluster_name
- karpenter_irsa_role_arn
- karpenter_instance_profile_name

These variables will be required in the next steps.
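
For example, from the `k8s-cluster` folder you can print just these values (output names as listed above):

```bash
cd infra-examples/aws/k8s-cluster
terraform output cluster_name
terraform output karpenter_irsa_role_arn
terraform output karpenter_instance_profile_name
```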

3. Karpenter is a dependency of the Harmony chart that can be enabled or disabled. To include Karpenter in the Harmony chart,
**it is crucial** to configure these variables in your `values.yaml` file:

- `karpenter.enabled`: true
- `karpenter.serviceAccount.annotations.eks\.amazonaws\.com/role-arn`: "<`karpenter_irsa_role_arn` value from module>"
- `karpenter.settings.aws.defaultInstanceProfile`: "<`karpenter_instance_profile_name` value from module>"
- `karpenter.settings.aws.clusterName`: "<`cluster_name` value from module>"

Below is an example of the Karpenter section of a `values.yaml` file:

```yaml
karpenter:
  enabled: true
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "<karpenter_irsa_role_arn>"
  settings:
    aws:
      # -- Cluster name.
      clusterName: "<cluster_name>"
      # -- Cluster endpoint. If not set, will be discovered during startup (EKS only)
      # From version 0.25.0, the Karpenter helm chart allows the discovery of the cluster endpoint. More details in
      # https://github.com/aws/karpenter/blob/main/website/content/en/docs/upgrade-guide.md#upgrading-to-v0250
      # clusterEndpoint: "https://XYZ.eks.amazonaws.com"
      # -- The default instance profile name to use when launching nodes
      defaultInstanceProfile: "<karpenter_instance_profile_name>"
```

4. Now, install the Harmony chart in the new EKS cluster using [these instructions](#usage-instructions). This provides a
very basic Karpenter configuration with one [provisioner](https://karpenter.sh/docs/concepts/provisioners/) and one
[node template](https://karpenter.sh/docs/concepts/node-templates/). Please refer to the official documentation for
further details.
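
After the installation finishes, a quick sanity check is to confirm the Karpenter controller and its resources exist. This sketch assumes the chart is released into the `harmony` namespace and uses the upstream chart's standard labels:

```bash
kubectl get pods -n harmony -l app.kubernetes.io/name=karpenter # controller should be Running
kubectl get provisioners.karpenter.sh                           # expect the "default" provisioner
kubectl get awsnodetemplates.karpenter.k8s.aws                  # expect the "default" node template
```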

> **NOTE:**
> This Karpenter installation does not currently support multiple provisioners or node templates.
5. To test Karpenter, you can proceed with the instructions included in the
[official documentation](https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/#first-use).
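
As a minimal smoke test along the lines of that guide, you can deploy a `pause` workload whose requests exceed the current spare capacity and watch Karpenter provision a node (the namespace and labels below are assumptions based on the chart defaults above):

```bash
# Create a deployment whose total resource requests outgrow the existing nodes
kubectl create deployment inflate --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7
kubectl set resources deployment inflate --requests=cpu=1
kubectl scale deployment inflate --replicas=5

# Watch the Karpenter controller launch a spot node to fit the pending pods
kubectl logs -f -n harmony -l app.kubernetes.io/name=karpenter -c controller

# Clean up; the empty node should be removed after ttlSecondsAfterEmpty
kubectl delete deployment inflate
```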


<br><br><br>
Just run `helm uninstall --namespace harmony harmony` to uninstall this.
### How to create a cluster for testing on DigitalOcean

If you use DigitalOcean, you can use Terraform to quickly spin up a cluster, try this out, then shut it down again.
Here's how. First, put the following into `infra-examples/secrets.auto.tfvars` including a valid DigitalOcean access token:
```terraform
cluster_name = "harmony-test"
do_token = "digital-ocean-token"
```
Then run:
```bash
cd infra-examples/digitalocean
terraform init
terraform apply
cd ..
export KUBECONFIG=`pwd`/infra-examples/kubeconfig
```
Then follow steps 1-4 above. When you're done, run `terraform destroy` to clean
up everything.
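
Before installing the chart, it's worth confirming that kubectl is pointed at the new cluster:

```bash
kubectl get nodes # should list the DigitalOcean worker nodes
```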

## Appendix C: How to create a cluster for testing on AWS

Similarly, if you use AWS, you can use Terraform to spin up a cluster, try this out, then shut it down again.
Here's how. First, put the following into `infra-examples/aws/vpc/secrets.auto.tfvars` and `infra-examples/aws/k8s-cluster/secrets.auto.tfvars`:

```terraform
account_id = "012345678912"
aws_region = "us-east-1"
name = "tutor-multi-test"
```

Then run:

```bash
aws sts get-caller-identity # to verify that awscli is properly configured
cd infra-examples/aws/vpc
terraform init
terraform apply # run time is approximately 1 minute
cd ../k8s-cluster
terraform init
terraform apply # run time is approximately 30 minutes

# to configure kubectl
aws eks --region us-east-1 update-kubeconfig --name tutor-multi-test --alias tutor-multi-test
```

Then follow steps 1-4 above. When you're done, run `terraform destroy` in both the `k8s-cluster` and `vpc` modules (in that order) to clean up everything.
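
A teardown along those lines might look like this (destroying the cluster before the VPC it depends on):

```bash
cd infra-examples/aws/k8s-cluster
terraform destroy # remove the EKS cluster first; roughly as slow as the apply
cd ../vpc
terraform destroy # then remove the VPC
```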
7 changes: 5 additions & 2 deletions charts/harmony-chart/Chart.lock
dependencies:
- name: opensearch
  repository: https://opensearch-project.github.io/helm-charts
  version: 2.13.3
- name: karpenter
  repository: oci://public.ecr.aws/karpenter
  version: v0.29.2
digest: sha256:453b9f734e2d770948d3cbd36529d98da284b96de051581ea8d11a3c05e7a78e
generated: "2023-10-03T10:52:43.453442762-05:00"
7 changes: 6 additions & 1 deletion charts/harmony-chart/Chart.yaml
type: application
# This is the chart version. This version number should be incremented each time you make changes to the chart and its
# templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.3.0
# This is the version number of the application being deployed. This version number should be incremented each time you
# make changes to the application. Versions are not expected to follow Semantic Versioning. They should reflect the
# version the application is using. It is recommended to use it with quotes.
  version: "2.13.3"
  condition: opensearch.enabled
  repository: https://opensearch-project.github.io/helm-charts

- name: karpenter
  version: "v0.29.2"
  repository: oci://public.ecr.aws/karpenter
  condition: karpenter.enabled
15 changes: 15 additions & 0 deletions charts/harmony-chart/templates/karpenter/node-template.yaml
{{- if .Values.karpenter.enabled -}}
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: {{ .Values.karpenter.nodeTemplate.name }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
spec:
  subnetSelector:
    karpenter.sh/discovery: {{ .Values.karpenter.settings.aws.clusterName }}
  securityGroupSelector:
    karpenter.sh/discovery: {{ .Values.karpenter.settings.aws.clusterName }}
  tags:
    karpenter.sh/discovery: {{ .Values.karpenter.settings.aws.clusterName }}
{{- end }}
23 changes: 23 additions & 0 deletions charts/harmony-chart/templates/karpenter/provisioner.yaml
{{- if .Values.karpenter.enabled -}}
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: {{ .Values.karpenter.provisioner.name }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
spec:
  {{- if .Values.karpenter.provisioner.spec.requirements }}
  requirements: {{ toYaml .Values.karpenter.provisioner.spec.requirements | nindent 4 }}
  {{- end }}
  {{- if .Values.karpenter.provisioner.spec.limits.resources }}
  limits:
    resources:
      {{- range $key, $value := .Values.karpenter.provisioner.spec.limits.resources }}
      {{ $key }}: {{ $value | quote }}
      {{- end }}
  {{- end }}
  providerRef:
    name: {{ .Values.karpenter.nodeTemplate.name }}
  ttlSecondsUntilExpired: {{ .Values.karpenter.provisioner.spec.ttlSecondsUntilExpired }}
  ttlSecondsAfterEmpty: {{ .Values.karpenter.provisioner.spec.ttlSecondsAfterEmpty }}
{{- end }}
53 changes: 53 additions & 0 deletions charts/harmony-chart/values.yaml
      ".opendistro-notebooks",
      ".opendistro-asynchronous-search-response*",
    ]
karpenter:
  # add Karpenter node management for AWS EKS clusters. See: https://karpenter.sh/
  enabled: false
  serviceAccount:
    name: "karpenter"
    annotations:
      eks.amazonaws.com/role-arn: ""
  settings:
    aws:
      # -- Cluster name.
      clusterName: ""
      # -- Cluster endpoint. If not set, will be discovered during startup (EKS only)
      # From version 0.25.0, the Karpenter helm chart allows the discovery of the cluster endpoint. More details in
      # https://github.com/aws/karpenter/blob/main/website/content/en/docs/upgrade-guide.md#upgrading-to-v0250
      # clusterEndpoint: ""
      # -- The default instance profile name to use when launching nodes
      defaultInstanceProfile: ""
      # -- interruptionQueueName is disabled if not specified. Enabling interruption handling may
      # require additional permissions on the controller service account.
      interruptionQueueName: ""
  # ---------------------------------------------------------------------------
  # Provide sensible defaults for resource provisioning and lifecycle
  # ---------------------------------------------------------------------------
  # Requirements for the provisioner API.
  # More details in https://karpenter.sh/docs/concepts/provisioners/
  provisioner:
    name: "default"
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        # - key: node.kubernetes.io/instance-type
        #   operator: In
        #   values: ["t3.large", "t3.xlarge", "t3.2xlarge", "t2.xlarge", "t2.2xlarge"]
        # - key: kubernetes.io/arch
        #   operator: In
        #   values: ["amd64"]
      # The limits section controls the maximum amount of resources that the provisioner will manage.
      # More details in https://karpenter.sh/docs/concepts/provisioners/#speclimitsresources
      limits:
        resources:
          cpu: "200" # 50 nodes * 4 cpu
          memory: "800Gi" # 50 nodes * 16Gi
      # TTL in seconds. If nil, the feature is disabled, nodes will never terminate
      ttlSecondsUntilExpired: 2592000
      # TTL in seconds. If nil, the feature is disabled, nodes will never scale down
      # due to low utilization.
      ttlSecondsAfterEmpty: 30
  # Node template reference. More details in https://karpenter.sh/docs/concepts/node-templates/
  nodeTemplate:
    name: "default"
Binary file removed harmony-chart/charts/opensearch-2.11.4.tgz
33 changes: 33 additions & 0 deletions infra-examples/aws/README.md
# Reference Architecture for AWS

This directory contains Terraform modules that create AWS reference resources preconfigured to support Open edX, as well as [Karpenter](https://karpenter.sh/) for management of [AWS EC2 spot-priced](https://aws.amazon.com/ec2/spot/) compute nodes and enhanced pod bin packing.

## Virtual Private Cloud (VPC)

There are no explicit requirements for Karpenter within this VPC definition. However, there *are* several requirements for EKS which might vary from the VPC module defaults now or in the future. These include:

- defined sets of subnets for both private and public networks
- a NAT gateway
- enabling DNS host names
- custom resource tags for public and private subnets
- explicit assignments of AWS region and availability zones

See additional details here: [AWS VPC README](./vpc/README.rst)

## Elastic Kubernetes Service (EKS)

AWS EKS has grown more complex over time. This reference implementation is preconfigured as necessary to ensure that (a) you and others on your team can access the Kubernetes cluster from both the AWS Console and kubectl, (b) it will work for an Open edX deployment, and (c) it will work with Karpenter. With these goals in mind, please note the following configuration details:

- requirements detailed in the VPC section above are explicitly passed in to this module as inputs
- cluster endpoints for private and public access are enabled
- IAM Roles for Service Accounts (IRSA) is enabled
- Key Management Service (KMS) is enabled, encrypting all Kubernetes Secrets
- cluster access via aws-auth/configMap is enabled
- a karpenter.sh/discovery resource tag is added to the EKS instance
- various AWS EKS add-ons that are required by Open edX and/or Karpenter and/or its supporting systems (metrics-server, vpa) are included
- additional cluster node security configuration is added to allow node-to-node and pod-to-pod communication using internal DNS resolution
- a managed node group is added containing custom labels, IAM roles, and resource tags, all of which are required by Karpenter
- additional resources required by the AWS EBS CSI Driver add-on are added (the driver itself has been required by EKS since 1.22)
- additional EC2 security groups are added to enable pod shell access from kubectl

See additional details here: [AWS EKS README](./k8s-cluster/README.rst)
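
As an optional check, the `karpenter.sh/discovery` tag described above can be verified with the AWS CLI once the modules have been applied; a sketch, assuming your CLI is configured for the target account and region:

```bash
# List the subnets and security groups carrying the karpenter.sh/discovery tag
aws ec2 describe-subnets --filters "Name=tag-key,Values=karpenter.sh/discovery" \
  --query "Subnets[].SubnetId" --output text
aws ec2 describe-security-groups --filters "Name=tag-key,Values=karpenter.sh/discovery" \
  --query "SecurityGroups[].GroupId" --output text
```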