Deploying Stateful Microservices with Amazon FSx for Lustre

Amazon FSx for Lustre is a fully managed service that provides cost-effective, high-performance storage for compute workloads. FSx for Lustre offers sub-millisecond latencies, up to hundreds of gigabytes per second of throughput, and millions of IOPS.

Amazon FSx for Lustre CSI driver

Set up the variables. The commands below assume that AWS_REGION and CLUSTER_NAME are already exported in your shell.

ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text --region $AWS_REGION)
VPC_ID=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.resourcesVpcConfig.vpcId" --output text --region $AWS_REGION)
SUBNET_ID=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.resourcesVpcConfig.subnetIds[0]" --output text --region $AWS_REGION)
CIDR_BLOCK=$(aws ec2 describe-vpcs --vpc-ids $VPC_ID --query "Vpcs[].CidrBlock" --output text --region $AWS_REGION)
S3_LOGS_BUCKET=eks-fsx-lustre-$(cat /dev/urandom | LC_ALL=C tr -dc "[:alpha:]" | tr '[:upper:]' '[:lower:]' | head -c 32)
# Use the cluster security group that EKS created for the cluster
SECURITY_GROUP_ID=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text --region $AWS_REGION)
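
Every later command depends on these variables, and an empty value (for example when CLUSTER_NAME is misspelled and describe-cluster fails) tends to surface only later as a confusing AWS CLI error. A minimal sketch of a guard, assuming a POSIX shell; the helper name require_vars is hypothetical and not part of any AWS tooling:

```shell
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
    rv_missing=0
    for rv_name in "$@"; do
        # Indirect lookup of the variable whose name is stored in rv_name.
        eval "rv_value=\${$rv_name:-}"
        # With --output text the AWS CLI prints "None" when a query matches nothing.
        if [ -z "$rv_value" ] || [ "$rv_value" = "None" ]; then
            echo "missing value for $rv_name" >&2
            rv_missing=1
        fi
    done
    return "$rv_missing"
}

# Usage, after running the assignments above:
# require_vars ACCOUNT_ID VPC_ID SUBNET_ID SECURITY_GROUP_ID CIDR_BLOCK S3_LOGS_BUCKET || exit 1
```

Checking everything once, right after the assignments, keeps the failure close to its cause.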

To deploy the Amazon FSx for Lustre CSI driver to an Amazon EKS cluster

  1. Create an AWS Identity and Access Management OIDC provider and associate it with your cluster.
eksctl utils associate-iam-oidc-provider \
    --region $AWS_REGION \
    --cluster $CLUSTER_NAME \
    --approve
  1. Create an IAM policy and service account that allow the driver to make calls to AWS APIs on your behalf.
cat << EOF >  fsx-csi-driver.json
  1. Create the policy.
aws iam create-policy \
    --policy-name Amazon_FSx_Lustre_CSI_Driver \
    --policy-document file://fsx-csi-driver.json \
    --region $AWS_REGION
  1. Create a Kubernetes service account for the driver and attach the policy to the service account.
eksctl create iamserviceaccount \
    --region $AWS_REGION \
    --name fsx-csi-controller-sa \
    --namespace kube-system \
    --cluster $CLUSTER_NAME \
    --attach-policy-arn arn:aws-cn:iam::$ACCOUNT_ID:policy/Amazon_FSx_Lustre_CSI_Driver \
    --approve --override-existing-serviceaccounts
  1. Save the role ARN. The stack name below assumes the eksctl naming scheme eksctl-<cluster>-addon-iamserviceaccount-<namespace>-<name>.
export ROLE_ARN=$(aws cloudformation describe-stacks --stack-name eksctl-${CLUSTER_NAME}-addon-iamserviceaccount-kube-system-fsx-csi-controller-sa --query "Stacks[0].Outputs[0].OutputValue" --output text --region $AWS_REGION)
  1. Deploy the aws-fsx-csi-driver
kubectl apply -k ""

kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
fsx-csi-controller-55bcb55d5d-hkn4p             2/2     Running   0          8d
fsx-csi-controller-55bcb55d5d-whff2             2/2     Running   0          8d
fsx-csi-node-8hqmk                              3/3     Running   0          8d
fsx-csi-node-v9zsw                              3/3     Running   0          8d
  1. Patch the driver's service account to add the IAM role annotation
kubectl annotate serviceaccount -n kube-system fsx-csi-controller-sa eks.amazonaws.com/role-arn=$ROLE_ARN --overwrite=true

kubectl get pods -n kube-system
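
The READY and STATUS columns in the pod listing can also be checked by a script rather than by eye. A minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...); the function name fsx_pods_ready is hypothetical:

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and succeed only
# if at least one fsx-csi pod is listed and all of them are Running and fully ready.
fsx_pods_ready() {
    awk '
        BEGIN { seen = 0; bad = 0 }
        $1 ~ /^fsx-csi-/ {
            seen++
            split($2, ready, "/")      # READY column, for example "2/2"
            if ($3 != "Running" || ready[1] != ready[2]) bad++
        }
        END {
            if (seen > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Usage:
# kubectl get pods -n kube-system | fsx_pods_ready && echo "FSx CSI driver pods are ready"
```

The function only parses text, so it can be tried against saved output before pointing it at a live cluster.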

Deploying the stateful service to use Amazon FSx for Lustre

  1. Deploy a Kubernetes storage class, persistent volume claim, and sample application to verify that the CSI driver is working
# 1. Create an Amazon S3 bucket
aws s3 mb s3://$S3_LOGS_BUCKET --region $AWS_REGION
echo test-file >> testfile
aws s3 cp testfile s3://$S3_LOGS_BUCKET/export/testfile --region $AWS_REGION

# 2. Edit the security group to allow inbound Lustre traffic (TCP port 988) from the VPC CIDR
aws ec2 authorize-security-group-ingress --group-id ${SECURITY_GROUP_ID} --protocol tcp --port 988 --cidr ${CIDR_BLOCK} --region $AWS_REGION
  1. Create the storageclass definition

If you only want to import data from Amazon S3 and read it, without modifying or creating files, you can omit s3ExportPath from your storageclass.yaml file.

cat << EOF > storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
    name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
    subnetId: ${SUBNET_ID}
    securityGroupIds: ${SECURITY_GROUP_ID}
    s3ImportPath: s3://${S3_LOGS_BUCKET}
    s3ExportPath: s3://${S3_LOGS_BUCKET}/export
    deploymentType: SCRATCH_2
mountOptions:
    - flock
EOF

kubectl apply -f storageclass.yaml
kubectl get storageclass fsx-sc
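
The import-only case mentioned above (no s3ExportPath) can be produced from the same template. A minimal sketch, assuming the fsx.csi.aws.com provisioner and the parameter names used in this walkthrough; the function name write_storageclass is hypothetical:

```shell
# Hypothetical helper: print a StorageClass manifest on stdout.
# $1 subnet ID, $2 security group ID, $3 S3 import path, $4 optional S3 export path.
# s3ExportPath is emitted only when a fourth argument is given.
write_storageclass() {
    cat << EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  subnetId: $1
  securityGroupIds: $2
  s3ImportPath: $3
EOF
    if [ -n "${4:-}" ]; then
        echo "  s3ExportPath: $4"
    fi
    cat << EOF
  deploymentType: SCRATCH_2
mountOptions:
  - flock
EOF
}

# Usage (read-only import):
# write_storageclass "$SUBNET_ID" "$SECURITY_GROUP_ID" "s3://$S3_LOGS_BUCKET" > storageclass.yaml
```

Passing the export path as a fourth argument yields the same manifest as the full example.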
  1. Persistent volume claim
  • Download the persistent volume claim manifest

    curl -o claim.yaml
  • Edit the claim.yaml file. Change the storage value to one of the increment values listed below, based on your storage requirements and the deploymentType that you selected in a previous step.

    storage: <1200Gi>
    SCRATCH_2 and PERSISTENT – 1.2 TiB, 2.4 TiB, or increments of 2.4 TiB over 2.4 TiB.
    SCRATCH_1 – 1.2 TiB, 2.4 TiB, 3.6 TiB, or increments of 3.6 TiB over 3.6 TiB.
  • Create the persistent volume claim.

    kubectl apply -f claim.yaml
  • Confirm that the file system is provisioned.

kubectl get pvc
kubectl get persistentvolumeclaims fsx-claim -w

NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
fsx-claim   Bound    pvc-fe62a8c9-7c96-49fd-a495-e8ca725b6571   1200Gi     RWX            fsx-sc         9m10s

The STATUS may show as Pending for 5 to 10 minutes before changing to Bound. Do not continue with the next step until the STATUS is Bound.
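
Instead of watching the claim by hand, the wait can be scripted. A minimal sketch of a polling loop, assuming kubectl is on the PATH and the claim is in the current namespace; the function name wait_for_pvc_bound and its default timeout and interval are hypothetical choices:

```shell
# Hypothetical helper: poll a PVC until its phase is Bound or a timeout expires.
# $1 claim name, $2 timeout in seconds (default 900), $3 poll interval in seconds (default 15)
wait_for_pvc_bound() {
    wp_claim=$1
    wp_timeout=${2:-900}
    wp_interval=${3:-15}
    wp_waited=0
    while :; do
        # .status.phase is Pending while the file system is being created.
        wp_phase=$(kubectl get pvc "$wp_claim" -o jsonpath='{.status.phase}' 2>/dev/null)
        if [ "$wp_phase" = "Bound" ]; then
            echo "$wp_claim is Bound"
            return 0
        fi
        if [ "$wp_waited" -ge "$wp_timeout" ]; then
            echo "timed out waiting for $wp_claim (last phase: ${wp_phase:-unknown})" >&2
            return 1
        fi
        sleep "$wp_interval"
        wp_waited=$((wp_waited + wp_interval))
    done
}

# Usage:
# wait_for_pvc_bound fsx-claim 900 15 || exit 1
```

A 15-minute default leaves some headroom over the 5 to 10 minutes quoted above.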

In the Amazon FSx console, you can find the newly created FSx for Lustre file system (screenshot: lustre_file_sys.png).
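
The capacity increments listed for claim.yaml are easy to get wrong, so it can help to validate the size before applying the claim. The sketch below is an assumption-laden illustration: the downloaded claim.yaml is not reproduced in this guide, so the manifest shape is inferred from the kubectl get pvc output above (name fsx-claim, access mode RWX, storage class fsx-sc), 1.2 TiB is treated as 1200Gi as in that output, and the function names valid_fsx_capacity and write_claim are hypothetical:

```shell
# Hypothetical helper: succeed if a size in GiB is a valid capacity for the
# given deployment type, following the increments listed in this guide.
valid_fsx_capacity() {
    vc_type=$1
    vc_gib=$2
    case "$vc_type" in
        SCRATCH_2|PERSISTENT|PERSISTENT_1)
            # 1.2 TiB, 2.4 TiB, or a multiple of 2.4 TiB
            [ "$vc_gib" -eq 1200 ] ||
                { [ "$vc_gib" -ge 2400 ] && [ $((vc_gib % 2400)) -eq 0 ]; }
            ;;
        SCRATCH_1)
            # 1.2 TiB, 2.4 TiB, 3.6 TiB, or a multiple of 3.6 TiB
            [ "$vc_gib" -eq 1200 ] || [ "$vc_gib" -eq 2400 ] ||
                { [ "$vc_gib" -ge 3600 ] && [ $((vc_gib % 3600)) -eq 0 ]; }
            ;;
        *)
            return 1
            ;;
    esac
}

# Hypothetical helper: print an assumed claim manifest on stdout.
# $1 deployment type, $2 size in GiB
write_claim() {
    valid_fsx_capacity "$1" "$2" || { echo "invalid capacity ${2}Gi for $1" >&2; return 1; }
    cat << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fsx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: fsx-sc
  resources:
    requests:
      storage: ${2}Gi
EOF
}

# Usage:
# write_claim SCRATCH_2 1200 > claim.yaml
```

Rejecting an invalid size locally is quicker than waiting for the claim to stay in Pending.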

Deploy the sample application.

  1. Prepare the example YAML file
cat >test-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: fsx-app
spec:
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \"hello from FSx\" >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  nodeSelector:
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: fsx-claim
EOF
  1. Deploy application
kubectl apply -f test-pod.yaml

kubectl get pods
fsx-app          1/1     Running   0          6m33s

kubectl describe pod fsx-app
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  57s   default-scheduler  Successfully assigned default/fsx-app to
  Normal  Pulling    56s   kubelet            Pulling image ""
  Normal  Pulled     52s   kubelet            Successfully pulled image ""
  Normal  Created    51s   kubelet            Created container app
  Normal  Started    51s   kubelet            Started container app

# kubectl logs should show no errors or exceptions
kubectl logs fsx-app -f
  1. Check the result
kubectl exec -ti fsx-app -- tail -f /data/out.txt
hello from FSx
hello from FSx
  1. Sample 2, using amazonlinux
curl -o sample-pod.yaml

# If your cluster mixes Windows and Linux node groups, edit sample-pod.yaml and add this nodeSelector under spec
  nodeSelector:
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64

kubectl apply -f sample-pod.yaml

kubectl get pods
fsx-app          1/1     Running   0          6m33s
sample-fsx-app   1/1     Running   0          3m55s

kubectl describe pod sample-fsx-app
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m43s  default-scheduler  Successfully assigned default/sample-fsx-app to
  Normal  Pulling    3m41s  kubelet            Pulling image ""
  Normal  Pulled     3m37s  kubelet            Successfully pulled image ""
  Normal  Created    3m36s  kubelet            Created container app
  Normal  Started    3m36s  kubelet            Started container app

# kubectl logs should show no errors or exceptions
kubectl logs sample-fsx-app -f

kubectl exec -ti sample-fsx-app -- tail -f /data/out.txt
  1. Troubleshooting
  • The FSx CSI driver does not support Windows nodes at the moment.
  • Make sure the aws-node pod is running on your Kubernetes nodes.
kubectl get pods -o wide --all-namespaces | grep aws-node
  • Make sure the aws-node and fsx-csi-node DaemonSets show the same count under DESIRED, READY, and AVAILABLE.
kubectl get daemonset --all-namespaces
  • Make sure the aws-node, fsx-csi-controller, and fsx-csi-node pods are running on the node.
kubectl describe node <YOUR LINUX NODE>
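
The DaemonSet check in the troubleshooting list can be automated by comparing the count columns. A minimal sketch, assuming the default columns of kubectl get daemonset --all-namespaces (NAMESPACE, NAME, DESIRED, CURRENT, READY, UP-TO-DATE, AVAILABLE, ...); the function name daemonsets_ready is hypothetical:

```shell
# Hypothetical helper: read `kubectl get daemonset --all-namespaces` output on
# stdin and fail if any named DaemonSet is missing or has READY or AVAILABLE
# different from DESIRED. Arguments are the DaemonSet names to check.
daemonsets_ready() {
    awk -v wanted="$*" '
        BEGIN {
            bad = 0
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        ($2 in want) {
            found[$2] = 1
            # $3 = DESIRED, $5 = READY, $7 = AVAILABLE
            if ($3 != $5 || $3 != $7) {
                print $2 " not ready: desired=" $3 " ready=" $5 " available=" $7
                bad = 1
            }
        }
        END {
            for (name in want) {
                if (!(name in found)) { print name " not found"; bad = 1 }
            }
            exit bad
        }
    '
}

# Usage:
# kubectl get daemonset --all-namespaces | daemonsets_ready aws-node fsx-csi-node
```

Any line it prints names the DaemonSet to inspect with kubectl describe.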

Access Amazon S3 files from the Amazon FSx for Lustre file system

  1. Verify that data was written to the Amazon FSx for Lustre file system by the sample app.
kubectl exec -it sample-fsx-app -- ls /data
export  out.txt
  1. Archive files to the s3ExportPath
  • Export the file /data/out.txt back to Amazon S3.
kubectl exec -ti sample-fsx-app -- lfs hsm_archive /data/out.txt
# Should show no errors or exceptions
  • Confirm that the out.txt file was written to the s3ExportPath folder in Amazon S3
aws s3api list-buckets | grep eks-fsx-lustre
export S3_LOGS_BUCKET=<YOUR BUCKET NAME>

aws s3 ls s3://$S3_LOGS_BUCKET/export/ --region $AWS_REGION
2021-01-29 18:22:34      57005 out.txt
2021-01-20 22:54:29         10 testfile
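
The export can take a moment to appear in the bucket, so a scripted check of the listing is handy when retrying. A minimal sketch, assuming the four-column layout of aws s3 ls shown above (date, time, size, key) and keys without spaces; the function name s3_listing_has is hypothetical:

```shell
# Hypothetical helper: read `aws s3 ls` output on stdin and succeed if the
# named key is listed with a non-zero size. $1 is the key name, e.g. out.txt.
s3_listing_has() {
    awk -v key="$1" '
        BEGIN { found = 0 }
        $4 == key && $3 > 0 { found = 1 }
        END {
            if (found) exit 0
            exit 1
        }
    '
}

# Usage:
# aws s3 ls "s3://$S3_LOGS_BUCKET/export/" --region "$AWS_REGION" | s3_listing_has out.txt \
#     && echo "out.txt was exported"
```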


Cleanup

kubectl delete -f test-pod.yaml
kubectl delete -f sample-pod.yaml
kubectl delete -f claim.yaml
kubectl delete -f storageclass.yaml
kubectl delete -k ""

aws s3 rm --recursive s3://$S3_LOGS_BUCKET
aws s3 rb  s3://$S3_LOGS_BUCKET

eksctl delete iamserviceaccount \
    --region $AWS_REGION \
    --name fsx-csi-controller-sa \
    --namespace kube-system \
    --cluster $CLUSTER_NAME

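# The ARN uses the aws-cn partition to match the service account step; use arn:aws outside the China regions
aws iam delete-policy \
    --policy-arn arn:aws-cn:iam::$ACCOUNT_ID:policy/Amazon_FSx_Lustre_CSI_Driver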
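
The policy ARN has to use the partition of the region you deploy to: aws-cn in the China regions, aws in the standard commercial regions. A minimal sketch that derives the partition from the region name so the same commands work in both; the function names aws_partition and fsx_policy_arn are hypothetical, and only the three public partitions are covered:

```shell
# Hypothetical helper: map an AWS region name to its ARN partition.
aws_partition() {
    case "$1" in
        cn-*)     echo aws-cn ;;
        us-gov-*) echo aws-us-gov ;;
        *)        echo aws ;;
    esac
}

# Hypothetical helper: build the ARN of the driver policy created earlier.
# $1 region, $2 account ID
fsx_policy_arn() {
    echo "arn:$(aws_partition "$1"):iam::$2:policy/Amazon_FSx_Lustre_CSI_Driver"
}

# Usage:
# aws iam delete-policy --policy-arn "$(fsx_policy_arn "$AWS_REGION" "$ACCOUNT_ID")"
```

The same function can feed --attach-policy-arn when creating the service account.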


