Incorrect allocatable volumes count in csinode for AWS vt1*/g4* instance types #2105
Comments
Hey @mpatlasov, thank you for raising this issue! We will add the count of accelerators for these instance types to node startup by the next release (as well as any other devices that we are missing). Really appreciate the detailed ramp-up and resources on this!

/assign @ElijahQuinones
@AndrewSirenko: GitHub didn't allow me to assign the following users: ElijahQuinones. Note that only kubernetes-sigs members with read permissions, repo collaborators, and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/priority important-soon
Hi @mpatlasov, the PR for GPUs not being factored in has already been merged, and the PR for accelerators is in review right now.

As for your observation:

> There must be other contributors (other than GPUs) because for vt1* instance types the actual number doesn't decrease monotonically

The VT1 instance type is special in that both the vt1.3xlarge and vt1.6xlarge have accelerators that take up two attachment slots each, while the vt1.24xlarge's accelerators do not take up any attachment slots at all. This is not well documented, and I have cut an internal documentation ticket to correct this. Please let me know if you have any further questions or concerns!
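Under that explanation, the observed limits line up arithmetically. A quick sketch (the accelerator counts per instance size are my assumption based on the AWS VT1 instance specs, not stated in this thread):

```shell
# vt1.3xlarge: 1 accelerator x 2 slots each -> 26 - 2
echo $((26 - 1*2))   # prints 24
# vt1.6xlarge: 2 accelerators x 2 slots each -> 26 - 4
echo $((26 - 2*2))   # prints 22
# vt1.24xlarge: accelerators consume no attachment slots
echo $((26 - 0))     # prints 26
```

This reproduces the observed <24, 22, 26> sequence exactly.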
/kind bug
What happened?
`kubectl get csinode <node-name> -o json | jq .spec.drivers`

says that `allocatable.count` is 26 for vt1* instance types and 25 for g4* ones, while the actual number of volumes that can be attached to the node is smaller:

| type | reported | actual |
| --- | --- | --- |
| g4dn.xlarge | 25 | 24 |
| g4ad.xlarge | 25 | 24 |
| vt1.3xlarge | 26 | 24 |
| vt1.6xlarge | 26 | 22 |
There are many other g4* instance types mentioned here, but I verified the issue only for g4dn.xlarge and g4ad.xlarge. The reported number for vt1.24xlarge (26) is correct, while the numbers for the other vt1* types are not.
What you expected to happen?
`kubectl get csinode` must report the correct maximum number of volumes that can be attached.

How to reproduce it (as minimally and precisely as possible)?
Apply the following StatefulSet with 26 replicas:
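The original manifest was not captured in this page. A minimal sketch of such a StatefulSet (the name, image, node label, and `ebs-sc` StorageClass are illustrative assumptions, not from the report):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ebs-limit-test        # hypothetical name
spec:
  serviceName: ebs-limit-test
  replicas: 26                # one replica (and one volume) per allocatable slot reported
  selector:
    matchLabels:
      app: ebs-limit-test
  template:
    metadata:
      labels:
        app: ebs-limit-test
    spec:
      # Pin all pods to the node under test so every volume attaches there.
      nodeSelector:
        kubernetes.io/hostname: <node-name>
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-sc   # assumes an aws-ebs-csi-driver StorageClass exists
        resources:
          requests:
            storage: 1Gi
```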
After a while, some pods get stuck in "ContainerCreating" status because their volumes are stuck in attaching state and cannot be attached to the node. The error for a stuck pod looks like this:
Anything else we need to know?:
Official doc "Amazon EBS volume limits for Amazon EC2 instances" states clearly that GPU (or accelerators) must be counted:
getVolumesLimit(), however, doesn't take them into account. It starts from availableAttachments=28 for Nitro instances, then applies the following arithmetic:
e.g. 28 - 1 - 1 - 1 == 25 for g4ad.xlarge.
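To put numbers on that arithmetic (a sketch; what the three subtracted slots stand for on g4ad.xlarge is my reading of the instance specs, not spelled out in the driver code quoted above):

```shell
# Driver's current arithmetic for g4ad.xlarge: 28 shared Nitro slots
# minus three non-GPU consumers (e.g. root volume, ENI, instance store volume).
echo $((28 - 1 - 1 - 1))      # prints 25 -- the reported count
# Subtracting the single GPU as well matches the observed limit:
echo $((28 - 1 - 1 - 1 - 1))  # prints 24 -- the actual count
```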
There must be other contributors (other than GPUs), because for vt1* instance types the actual number doesn't decrease monotonically:

| type | reported | actual |
| --- | --- | --- |
| vt1.3xlarge | 26 | 24 |
| vt1.6xlarge | 26 | 22 |
| vt1.24xlarge | 26 | 26 |

I.e., it's hard to explain <24, 22, 26> solely from number-of-accelerators considerations.
Environment

- Kubernetes version (use `kubectl version`):
- Driver version: compiled manually (by `docker build -t quay.io/rh_ee_mpatlaso/misc:aws-ebs-csi-drv-upstream -f Dockerfile .`) from the head of the master branch of https://github.com/kubernetes-sigs/aws-ebs-csi-driver