diff --git a/Makefile b/Makefile index f0d63f4404..44b025e25a 100644 --- a/Makefile +++ b/Makefile @@ -40,7 +40,7 @@ install-jq: sudo apt-get install jq -y install-terraform: - $(eval TERRAFORM_VERSION:=1.2.7) + $(eval TERRAFORM_VERSION:=1.4.5) curl "https://releases.hashicorp.com/terraform/$(TERRAFORM_VERSION)/terraform_$(TERRAFORM_VERSION)_linux_amd64.zip" -o "terraform.zip" unzip -o -q terraform.zip sudo install -o root -g root -m 0755 terraform /usr/local/bin/terraform diff --git a/deployments/cognito-rds-s3/terraform/variables.tf b/deployments/cognito-rds-s3/terraform/variables.tf index 9f4538a09e..2cc729b717 100644 --- a/deployments/cognito-rds-s3/terraform/variables.tf +++ b/deployments/cognito-rds-s3/terraform/variables.tf @@ -2,6 +2,11 @@ variable "cluster_name" { description = "Name of cluster" type = string + + validation { + condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19 + error_message = "The cluster name must be between [1, 19] characters" + } } variable "cluster_region" { diff --git a/deployments/cognito/terraform/variables.tf b/deployments/cognito/terraform/variables.tf index 57dfa08f22..896fb4a67f 100644 --- a/deployments/cognito/terraform/variables.tf +++ b/deployments/cognito/terraform/variables.tf @@ -2,6 +2,11 @@ variable "cluster_name" { description = "Name of cluster" type = string + + validation { + condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19 + error_message = "The cluster name must be between [1, 19] characters" + } } variable "cluster_region" { diff --git a/deployments/rds-s3/terraform/variables.tf b/deployments/rds-s3/terraform/variables.tf index 82c0378fee..97d35b8684 100644 --- a/deployments/rds-s3/terraform/variables.tf +++ b/deployments/rds-s3/terraform/variables.tf @@ -2,6 +2,11 @@ variable "cluster_name" { description = "Name of cluster" type = string + + validation { + condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19 + error_message = "The cluster 
name must be between [1, 19] characters" + } } variable "cluster_region" { diff --git a/deployments/vanilla/terraform/variables.tf b/deployments/vanilla/terraform/variables.tf index c6ea4a9467..b05bf4eae6 100644 --- a/deployments/vanilla/terraform/variables.tf +++ b/deployments/vanilla/terraform/variables.tf @@ -2,6 +2,11 @@ variable "cluster_name" { description = "Name of cluster" type = string + + validation { + condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19 + error_message = "The cluster name must be between [1, 19] characters" + } } variable "cluster_region" { diff --git a/website/content/en/docs/add-ons/load-balancer/guide.md b/website/content/en/docs/add-ons/load-balancer/guide.md index 73c91ef87d..a6eace9a38 100644 --- a/website/content/en/docs/add-ons/load-balancer/guide.md +++ b/website/content/en/docs/add-ons/load-balancer/guide.md @@ -10,6 +10,8 @@ This tutorial shows how to expose Kubeflow over a load balancer on AWS. Follow this guide only if you are **not** using `Cognito` as the authentication provider in your deployment. Cognito-integrated deployment is configured with the AWS Load Balancer controller by default to create an ingress-managed Application Load Balancer and exposes Kubeflow via a hosted domain. +> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below. + ## Background Kubeflow does not offer a generic solution for connecting to Kubeflow over a Load Balancer because this process is highly dependent on your environment and cloud provider. On AWS, we use the [AWS Load Balancer (ALB) controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/), which satisfies the Kubernetes [Ingress resource](https://kubernetes.io/docs/concepts/services-networking/ingress/) to create an [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) (ALB). 
When you create a Kubernetes `Ingress`, an ALB is provisioned that load balances application traffic. @@ -37,8 +39,15 @@ This guide assumes that you have: ## Create Load Balancer + +#### Setup for Manifest deployments + If you prefer to create a load balancer using automated scripts, you **only** need to follow the steps in the [automated script section](#automated-script). You can read the following sections in this guide to understand what happens when you run the automated script or to walk through all of the steps manually. +#### Setup for Terraform deployments + +Follow the manual steps below. + ### Create domain and certificates You need a registered domain and TLS certificate to use HTTPS with Load Balancer. Since your top level domain (e.g. `example.com`) can be registered at any service provider, for uniformity and taking advantage of the integration provided between Route53, ACM, and Application Load Balancer, you will create a separate [sudomain](https://en.wikipedia.org/wiki/Subdomain) (e.g. `platform.example.com`) to host Kubeflow and a corresponding hosted zone in Route53 to route traffic for this subdomain. To get TLS support, you will need certificates for both the root domain (`*.example.com`) and subdomain (`*.platform.example.com`) in the region where your platform will run (your EKS cluster region). @@ -86,7 +95,9 @@ If you choose DNS validation for the validation of the certificates, you will be ```bash printf 'certArn='$certArn'' > awsconfigs/common/istio-ingress/overlays/https/params.env ``` -### Configure Load Balancer controller +### Configure Load Balancer Controller + +> Important: Skip this step if you are using a Terraform deployment since the AWS Load Balancer Controller is installed by default unless you set `enable_aws_load_balancer_controller = false`. Set up resources required for the Load Balancer controller: @@ -103,6 +114,7 @@ Set up resources required for the Load Balancer controller: ``` - `kubernetes.io/role/internal-elb`. 
Add this tag only to private subnets. - `kubernetes.io/role/elb`. Add this tag only to public subnets. + 1. The Load balancer controller uses [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html)(IRSA) to access AWS services. An OIDC provider must exist for your cluster to use IRSA. Create an OIDC provider and associate it with your EKS cluster by running the following command if your cluster doesn’t already have one: ```bash eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --approve @@ -113,15 +125,30 @@ Set up resources required for the Load Balancer controller: export LBC_POLICY_ARN=$(aws iam create-policy --policy-name $LBC_POLICY_NAME --policy-document file://awsconfigs/infra_configs/iam_alb_ingress_policy.json --output text --query 'Policy.Arn') eksctl create iamserviceaccount --name aws-load-balancer-controller --namespace kube-system --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --attach-policy-arn ${LBC_POLICY_ARN} --override-existing-serviceaccounts --approve ``` + 1. Configure the parameters for [load balancer controller](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/common/aws-alb-ingress-controller/base/params.env) with the cluster name. ```bash printf 'clusterName='$CLUSTER_NAME'' > awsconfigs/common/aws-alb-ingress-controller/base/params.env ``` -### Build Manifests and deploy components -Run the following command to build and install the components specified in the Load Balancer [kustomize](https://github.com/awslabs/kubeflow-manifests/blob/main/deployments/add-ons/load-balancer/kustomization.yaml) file. +### Install Load Balancer Controller + +> Important: Skip this step if you are using a Terraform deployment since the AWS Load Balancer Controller is installed by default unless you set `enable_aws_load_balancer_controller = false`. 
+ +Run the following command to build and install the Load Balancer controller [kustomize](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/common/aws-alb-ingress-controller/base/kustomization.yaml) file. + ```bash -while ! kustomize build deployments/add-ons/load-balancer | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 30; done +kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f - +kubectl wait --for condition=established crd/ingressclassparams.elbv2.k8s.aws +kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f - +``` + +### Create Ingress + +Create an ingress that will use the certificate you specified in `certArn`. + +```bash +kustomize build awsconfigs/common/istio-ingress/overlays/https | kubectl apply -f - ``` ### Update the domain with ALB address @@ -140,6 +167,8 @@ while ! kustomize build deployments/add-ons/load-balancer | kubectl apply -f -; ### Automated script +> Important: Terraform deployment users should not follow these Automated setup instructions and should follow the [Manual setup instructions](#create-load-balancer). + 1. Install dependencies for the script ```bash cd tests/e2e @@ -198,6 +227,8 @@ while ! kustomize build deployments/add-ons/load-balancer | kubectl apply -f -; ## Clean up +> Important: Terraform deployment users should not follow these clean up steps and should manually delete resources created while following the [Manual setup instructions](#create-load-balancer). + To delete the resources created in this guide, run the following commands from the root of your repository: > Note: Make sure that you have the configuration file created by the script in `tests/e2e/utils/load_balancer/config.yaml`. If you did not use the script, plug in the name, ARN, or ID of the resources that you created in the configuration file by referring to the sample in Step 4 of the [previous section](#automated-script).
```bash diff --git a/website/content/en/docs/add-ons/storage/efs/guide.md b/website/content/en/docs/add-ons/storage/efs/guide.md index af786c951a..b3561c4db3 100644 --- a/website/content/en/docs/add-ons/storage/efs/guide.md +++ b/website/content/en/docs/add-ons/storage/efs/guide.md @@ -6,6 +6,8 @@ weight = 10 This guide describes how to use Amazon EFS as Persistent storage on top of an existing Kubeflow deployment. +> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below. + ## 1.0 Prerequisites For this guide, we assume that you already have an EKS Cluster with Kubeflow installed. The FSx CSI Driver can be installed and configured as a separate resource on top of an existing Kubeflow deployment. See the [deployment options]({{< ref "/docs/deployment" >}}) and [general prerequisites]({{< ref "/docs/deployment/vanilla/guide.md" >}}) for more information. @@ -37,9 +39,18 @@ export CLAIM_NAME= ## 2.0 Set up EFS +#### Setup for Manifest deployments + You can either use Automated or Manual setup to set up the resources required. If you choose the manual route, you get another choice between **static and dynamic provisioning**, so pick whichever suits you. On the other hand, for the automated script we currently only support **dynamic provisioning**. Whichever combination you pick, be sure to continue picking the appropriate sections through the rest of this guide. +#### Setup for Terraform deployments + +Follow the Manual setup to set up the resources required. As part of the Manual setup, you get another choice between **static and dynamic provisioning**, so pick whichever suits you. + ### 2.1 [Option 1] Automated setup + +> Important: Terraform deployment users should not follow these Automated setup instructions and should follow the [Manual setup instructions](#22-option-2-manual-setup). 
+ The script automates all the manual resource creation steps but is currently only available for **Dynamic Provisioning** option. It performs the required cluster configuration, creates an EFS file system and it also takes care of creating a storage class for dynamic provisioning. Once done, move to section 3.0. 1. Run the following commands from the `tests/e2e` directory: @@ -80,7 +91,11 @@ If you prefer to manually setup each component then you can follow this manual g export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text) ``` -#### 1. Install the EFS CSI driver +#### 1. Driver install and IAM configuration + +> Important: Skip this step if you are using a Terraform deployment since the EFS CSI driver is installed by default unless you set `enable_aws_efs_csi_driver = false`. + +##### 1.1 Install the EFS CSI driver We recommend installing the EFS CSI Driver v1.5.4 directly from the [the aws-efs-csi-driver github repo](https://github.com/kubernetes-sigs/aws-efs-csi-driver) as follows: ```bash @@ -95,7 +110,7 @@ NAME ATTACHREQUIRED PODINFOONMOUNT MODES AGE efs.csi.aws.com false false Persistent 5d17h ``` -#### 2. Create the IAM Policy for the CSI driver +##### 1.2 Create the IAM Policy for the CSI driver The CSI driver's service account (created during installation) requires IAM permission to make calls to AWS APIs on your behalf. Here, we will be annotating the Service Account `efs-csi-controller-sa` with an IAM Role which has the required permissions. 1. Download the IAM policy document from GitHub as follows. @@ -129,7 +144,7 @@ eksctl create iamserviceaccount \ kubectl describe -n kube-system serviceaccount efs-csi-controller-sa ``` -#### 3. Manually create an instance of the EFS filesystem +#### 2.
Manually create an instance of the EFS filesystem Please refer to the official [AWS EFS CSI Document](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-create-filesystem) for detailed instructions on creating an EFS filesystem. > Note: For this guide, we assume that you are creating your EFS Filesystem in the same VPC as your EKS Cluster. @@ -137,7 +152,7 @@ Please refer to the official [AWS EFS CSI Document](https://docs.aws.amazon.com/ #### Choose between dynamic and static provisioning In the following section, you have to choose between setting up [dynamic provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/) or setting up static provisioning. -#### 4. [Option 1] Dynamic provisioning +#### 3. [Option 1] Dynamic provisioning 1. Use the `$file_system_id` you recorded in section 3 above or use the AWS Console to get the filesystem id of the EFS file system you want to use. Now edit the `dynamic-provisioning/sc.yaml` file by chaning `` with your `fs-xxxxxx` file system id. You can also change it using the following command : ```bash file_system_id=$file_system_id yq e '.parameters.fileSystemId = env(file_system_id)' -i $GITHUB_STORAGE_DIR/efs/dynamic-provisioning/sc.yaml @@ -161,7 +176,7 @@ kubectl apply -f $GITHUB_STORAGE_DIR/efs/dynamic-provisioning/pvc.yaml Note : The `StorageClass` is a cluster scoped resource which means we only need to do this step once per cluster. -#### 4. [Option 2] Static Provisioning +#### 3. [Option 2] Static Provisioning Using [this sample](https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods), we provided the required spec files in the sample subdirectory. However, you can create the PVC another way. 1. Use the `$file_system_id` you recorded in section 3 above or use the AWS Console to get the filesystem id of the EFS file system you want to use. 
Now edit the last line of the static-provisioning/pv.yaml file to specify the `volumeHandle` field to point to your EFS filesystem. Replace `$file_system_id` if it is not already set. diff --git a/website/content/en/docs/add-ons/storage/fsx-for-lustre/guide.md b/website/content/en/docs/add-ons/storage/fsx-for-lustre/guide.md index 5ddcf4d853..f805f86eed 100644 --- a/website/content/en/docs/add-ons/storage/fsx-for-lustre/guide.md +++ b/website/content/en/docs/add-ons/storage/fsx-for-lustre/guide.md @@ -6,6 +6,8 @@ weight = 20 This guide describes how to use Amazon FSx as Persistent storage on top of an existing Kubeflow deployment. +> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below. + ## 1.0 Prerequisites For this guide, we assume that you already have an EKS Cluster with Kubeflow installed. The FSx CSI Driver can be installed and configured as a separate resource on top of an existing Kubeflow deployment. See the [deployment options]({{< ref "/docs/deployment" >}}) and [general prerequisites]({{< ref "/docs/deployment/vanilla/guide.md" >}}) for more information. @@ -36,9 +38,19 @@ export CLAIM_NAME= ``` ## 2.0 Setup FSx for Lustre + +#### Setup for Manifest deployments + You can either use Automated or Manual setup. We currently only support **Static provisioning** for FSx. +#### Setup for Terraform deployments + +Follow the Manual setup. We currently only support **Static provisioning** for FSx. + ### 2.1 [Option 1] Automated setup + +> Important: Terraform deployment users should not follow these Automated setup instructions and should follow the [Manual setup instructions](#22-option-2-manual-setup). + The script automates all the manual resource creation steps but is currently only available for **Static Provisioning** option. It performs the required cluster configuration, creates an FSx file system and it also takes care of creating a storage class for static provisioning. 
Once done, move to section 3.0. 1. Run the following commands from the `tests/e2e` directory: @@ -74,7 +86,11 @@ The script applies some default values for the file system name, performance mod ### 2.2 [Option 2] Manual setup If you prefer to manually setup each component then you can follow this manual guide. -#### 1. Install the FSx CSI Driver +#### 1. Driver install and IAM configuration + +> Important: Skip this step if you are using a Terraform deployment since the FSx CSI driver is installed by default unless you set `enable_aws_fsx_csi_driver = false`. + +##### 1. Install the FSx CSI Driver We recommend installing the FSx CSI Driver v0.9.0 directly from the [the aws-fsx-csi-driver GitHub repository](https://github.com/kubernetes-sigs/aws-fsx-csi-driver) as follows: ```bash @@ -89,7 +105,7 @@ NAME ATTACHREQUIRED PODINFOONMOUNT MODES AGE fsx.csi.aws.com false false Persistent 14s ``` -#### 2. Create the IAM Policy for the CSI Driver +##### 2. Create the IAM Policy for the CSI Driver The CSI driver's service account (created during installation) requires IAM permission to make calls to AWS APIs on your behalf. Here, we will be annotating the Service Account `fsx-csi-controller-sa` with an IAM Role which has the required permissions. 1. Create the policy using the json file provided as follows: @@ -117,12 +133,12 @@ eksctl create iamserviceaccount \ kubectl describe -n kube-system serviceaccount fsx-csi-controller-sa ``` -#### 3. Create an instance of the FSx Filesystem +#### 2. Create an instance of the FSx Filesystem Please refer to the official [AWS FSx CSI documentation](https://docs.aws.amazon.com/fsx/latest/LustreGuide/getting-started-step1.html) for detailed instructions on creating an FSx filesystem. Note: For this guide, we assume that you are creating your FSx Filesystem in the same VPC as your EKS Cluster. -#### 4. Static provisioning +#### 3.
Static provisioning [Using this sample from official Kubeflow Docs](https://www.kubeflow.org/docs/distributions/aws/customizing-aws/storage/#amazon-fsx-for-lustre) 1. Use the AWS Console to get the filesystem id of the FSx volume you want to use. You could also use the following command to list all the volumes available in your region. Either way, make sure that `file_system_id` is set. diff --git a/website/content/en/docs/component-guides/profiles.md b/website/content/en/docs/component-guides/profiles.md index 26dd8af9e1..f2df53aa39 100644 --- a/website/content/en/docs/component-guides/profiles.md +++ b/website/content/en/docs/component-guides/profiles.md @@ -43,8 +43,12 @@ You can find documentation about the `AwsIamForServiceAccount` plugin for specif ## Configuration steps +> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below. + After installing Kubeflow on AWS with one of the available [deployment options]({{< ref "/docs/deployment" >}}), you can configure Kubeflow Profiles with the following steps: +### 1. Set Up Environment Variables + 1. Define the following environment variables: ```bash @@ -55,7 +59,11 @@ After installing Kubeflow on AWS with one of the available [deployment options]( export PROFILE_CONTROLLER_POLICY_NAME= ``` -2. Create an IAM policy using the [IAM Profile controller policy](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/infra_configs/iam_profile_controller_policy.json) file. +### 2. Configure the Profile Controller + +> Important: Terraform deployment users should skip this step. + +1. Create an IAM policy using the [IAM Profile controller policy](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/infra_configs/iam_profile_controller_policy.json) file.
```bash aws iam create-policy \ @@ -66,7 +74,7 @@ After installing Kubeflow on AWS with one of the available [deployment options]( As a principle of least privilege, we recommend scoping the resources in the [IAM Profile controller policy](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/infra_configs/iam_profile_controller_policy.json) to the specific policy arns of the policies created in step 6. -3. Associate IAM OIDC with your cluster. +2. Associate IAM OIDC with your cluster. ```bash aws --region $CLUSTER_REGION eks update-kubeconfig --name $CLUSTER_NAME @@ -74,7 +82,7 @@ After installing Kubeflow on AWS with one of the available [deployment options]( eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --region $CLUSTER_REGION --approve ``` -4. Create an IRSA for the Profile controller using the policy. +3. Create an IRSA for the Profile controller using the policy. ```bash eksctl create iamserviceaccount \ @@ -87,7 +95,9 @@ After installing Kubeflow on AWS with one of the available [deployment options]( --approve ``` -5. Create an IAM trust policy to authorize federated requests from the OIDC provider. +### 3. Create a Profile + +1. Create an IAM trust policy to authorize federated requests from the OIDC provider. ```bash export OIDC_URL=$(aws eks describe-cluster --region $CLUSTER_REGION --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -c9-) @@ -113,9 +123,9 @@ After installing Kubeflow on AWS with one of the available [deployment options]( EOF ``` -6. [Create an IAM policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) to scope the permissions for the Profile. For simplicity, we will use the `arn:aws:iam::aws:policy/AmazonS3FullAccess` policy as an example. +2. [Create an IAM policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) to scope the permissions for the Profile. 
For simplicity, we will use the `arn:aws:iam::aws:policy/AmazonS3FullAccess` policy as an example. -7. [Create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) for the Profile using the scoped policy from the previous step. +3. [Create an IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) for the Profile using the scoped policy from the previous step. ```bash aws iam create-role --role-name $PROFILE_NAME-$CLUSTER_NAME-role --assume-role-policy-document file://trust.json @@ -123,7 +133,7 @@ After installing Kubeflow on AWS with one of the available [deployment options]( aws iam attach-role-policy --role-name $PROFILE_NAME-$CLUSTER_NAME-role --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess ``` -8. Create a user in your configured auth provider (e.g. Cognito or Dex) or use an existing user. +4. Create a user in your configured auth provider (e.g. Cognito or Dex) or use an existing user. Export the user as an environment variable. For simplicity, we will use the `user@example.com` user that is created by default by most of our provided deployment options. @@ -131,7 +141,7 @@ After installing Kubeflow on AWS with one of the available [deployment options]( export PROFILE_USER="user@example.com" ``` -9. Create a Profile using the `PROFILE_NAME`. +5. Create a Profile using the `PROFILE_NAME`. ```bash cat < profile_iam.yaml