
Failed to create NFS provisioned volume in OpenShift Virtualization #341

Closed
justflite opened this issue Apr 8, 2023 · 9 comments

@justflite

Environment: OCP 4.12 deployed on bare metal with the OpenShift Virtualization operator installed. The cluster is connected to an HPE Primera storage array. The HPE CSI Driver for Kubernetes 2.2.0 and the HPE NFS Provisioner 3.0.0 were installed.

Issue:

When creating the following Data Volume:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv1
spec:
  source:
    blank: {}
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
    storageClassName: nfs-ssd

It failed with the following error message:

failed to provision volume with StorageClass "nfs-ssd": rpc error: code = Internal desc = Failed to create NFS provisioned volume pvc-c2a552e7-5d05-4fe0-b32f-bc79ba6fb1e3, err persistentvolumeclaims "hpe-nfs-c2a552e7-5d05-4fe0-b32f-bc79ba6fb1e3" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , , rollback status: success

nfs-ssd is a storage class backed by HPE NFS provisioner.
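
For reference, nfs-ssd looks roughly like this (a minimal sketch of an HPE CSI StorageClass with NFS resources enabled; the fstype and the secret name/namespace shown here are placeholders, not the exact values from this cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-ssd
provisioner: csi.hpe.com
parameters:
  # Ask the HPE CSI driver for an NFS-backed (RWX-capable) volume
  nfsResources: "true"
  csi.storage.k8s.io/fstype: xfs
  # Backend credentials (secret name/namespace are placeholders)
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-publish-secret-name: hpe-backend
  csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
reclaimPolicy: Delete
allowVolumeExpansion: true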

@datamattsson
Collaborator

This looks like a duplicate of #295. Is the PVC created by an Operator?

@justflite
Author

No, it is not exactly the same as issue #295, but the solutions you recommended also apply to this issue.

Yes, the PVC is created by the OpenShift Virtualization operator.

To solve issue #341, the following four problems have to be solved:

  1. Creating a PVC provisioned by the HPE NFS Provisioner in a namespace other than hpe-nfs, when the PVC is created by an operator
  2. "cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on"
  3. The HPE NFS Provisioner pod cannot be created in a namespace other than hpe-nfs when "nfsNamespace" is set to something other than hpe-nfs
  4. Creating a Data Volume makes CDI create two importer pods, one for the NFS-provisioned PVC and one for the internal PVC used by the HPE NFS Provisioner pod; the latter should not be created

Now I have solutions for the first three issues:

  1. Use the workaround described in issue #295 (NFS provisioning fails when PVC was created by an operator) and set the nfsNamespace parameter in the StorageClass (see the sketch after this list).
  2. Add the following rules to the clusterrole hpe-csi-provisioner-role:

    $ oc edit clusterrole hpe-csi-provisioner-role

    and add the following section:

    - apiGroups:
        - cdi.kubevirt.io
      resources:
        - datavolumes/finalizers
      verbs:
        - '*'
    - apiGroups:
        - kubevirt.io
      resources:
        - virtualmachines/finalizers
      verbs:
        - '*'

  3. Add an entry for every namespace you want to use as "nfsNamespace" to the SCC "hpe-csi-scc":

    ...
    users:
    - system:serviceaccount:hpe-storage:hpe-csi-controller-sa
    - system:serviceaccount:hpe-storage:hpe-csi-node-sa
    - system:serviceaccount:hpe-storage:hpe-csp-sa
    - system:serviceaccount:hpe-storage:hpe-csi-operator-sa
    - system:serviceaccount:hpe-nfs:hpe-csi-nfs-sa
    - system:serviceaccount:demo:hpe-csi-nfs-sa
    ...
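
For the first point, the only StorageClass change needed is the nfsNamespace parameter. A sketch (the namespace demo stands in for whatever namespace the operator creates PVCs in; the remaining parameters are the same as in a normal HPE CSI NFS StorageClass):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-sas-demo
provisioner: csi.hpe.com
parameters:
  nfsResources: "true"
  # Run the NFS Provisioner pod and its backing PVC in this namespace
  # instead of the default hpe-nfs
  nfsNamespace: demo
  # ...remaining csi.storage.k8s.io/* secret parameters omitted...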

But for the 4th issue, I don't have a solution yet.

@justflite
Author

Detailed description for the 4th issue:

Creating a Data Volume makes CDI create two importer pods: one for the NFS-provisioned PVC and one for the internal PVC used by the HPE NFS Provisioner pod. The latter should not be created.

Consider creating the following DV:
$ cat dv4.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv4
spec:
  source:
    blank: {}
  pvc:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 1Gi
    storageClassName: nfs-sas-demo

The PVC can be created successfully:

$ oc get pvc
NAME                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dv4                                            Bound    pvc-994f5584-ea32-4a79-a5d6-2793856faeb4   1Gi        RWX            nfs-sas-demo   56s
hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4   Bound    pvc-fcee0baa-6b56-4f3c-8ea2-9c67ea50c340   1Gi        RWO            nfs-sas-demo   56s

dv4 is the PVC I want to create, and PVC "hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4" is the underlying PVC used by the HPE NFS Provisioner.

But after 60 seconds, the underlying PVC went into the Terminating state:

$ oc get pvc
NAME                                           STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dv4                                            Bound         pvc-994f5584-ea32-4a79-a5d6-2793856faeb4   1Gi        RWX            nfs-sas-demo   21m
hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4   Terminating   pvc-fcee0baa-6b56-4f3c-8ea2-9c67ea50c340   1Gi        RWO            nfs-sas-demo   21m

This is because CDI also created an importer pod for the underlying PVC:

$ oc get pods
NAME                                                           READY   STATUS              RESTARTS   AGE
hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4-8b55dd86d-ctq2w   0/1     ContainerCreating   0          3s
importer-hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4          0/1     ContainerCreating   0          8s

Pod "importer-hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4" was created by CDI, but it should not be created because it is not the final PVC, it is the underlying PVC used by HPE NFS Provisioner.

Consider the following scenario:

Create a Data Volume from the following YAML file:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: "dv5"
spec:
  source:
    http:
      url: "http://172.16.17.242/rhel9.qcow2"
  pvc:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: "17Gi"
    storageClassName: nfs-sas-demo

CDI will create two importer pods, one for dv5 and one for the underlying PVC (which should not be created). The creation of the PVC will then fail because two pods, the importer for the underlying PVC and the HPE NFS Provisioner pod, try to mount the same PVC at the same time. The importer pod always starts faster than the HPE NFS Provisioner pod, so the creation of PVC dv5 fails because the HPE NFS Provisioner cannot mount the underlying PVC that is already mounted by the importer:

58m Warning FailedAttachVolume pod/hpe-nfs-fe636a61-6800-41c9-9bc8-454084591646-9c4bd48-bmbwg Multi-Attach error for volume "pvc-6eff39fd-7234-4d1a-8bfe-bad12307417b" Volume is already used by pod(s) importer-hpe-nfs-fe636a61-6800-41c9-9bc8-454084591646

So the question is:

How can CDI be told not to create an importer pod for the underlying PVC used by the HPE NFS Provisioner, and to create an importer pod only for the final RWX PVC?

CDI picks up a PVC and acts on it when the PVC carries matching CDI annotations. The HPE CSI driver copies those annotations onto its underlying PVC, which is what causes the problem.

$ oc get pvc
NAME                                           STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dv3                                            Pending                                                                        nfs-sas-demo   6s
hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49   Bound     pvc-8f61135c-451c-41d4-bbcc-63c11f8bd5ad   1Gi        RWO            nfs-sas-demo   6s

$ oc get pvc hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: ""
    cdi.kubevirt.io/storage.condition.running.reason: ContainerCreating
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.deleteAfterCompletion: "true"
    cdi.kubevirt.io/storage.import.importPodName: importer-hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49
    cdi.kubevirt.io/storage.import.source: none
    cdi.kubevirt.io/storage.pod.phase: Pending
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    csi.hpe.com/nfsPVC: "true"
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.hpe.com
    volume.kubernetes.io/storage-provisioner: csi.hpe.com
  creationTimestamp: "2023-04-13T13:33:22Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.12.0
  name: hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49
  namespace: demo
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: dv3
    uid: 6bd296a1-728a-4f17-ad1c-5df43a9c1f85
  resourceVersion: "97045588"
  uid: 8f61135c-451c-41d4-bbcc-63c11f8bd5ad
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-sas-demo
  volumeMode: Filesystem
  volumeName: pvc-8f61135c-451c-41d4-bbcc-63c11f8bd5ad
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound

$ oc get pvc dv3 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.deleteAfterCompletion: "true"
    cdi.kubevirt.io/storage.import.source: none
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    volume.beta.kubernetes.io/storage-provisioner: csi.hpe.com
    volume.kubernetes.io/storage-provisioner: csi.hpe.com
  creationTimestamp: "2023-04-13T13:33:22Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alerts.k8s.io/KubePersistentVolumeFillingUp: disabled
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.12.0
  name: dv3
  namespace: demo
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: dv3
    uid: 6bd296a1-728a-4f17-ad1c-5df43a9c1f85
  resourceVersion: "97045515"
  uid: 5adb0f43-ac9b-4e16-85fe-59b098572d49
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-sas-demo
  volumeMode: Filesystem
status:
  phase: Pending
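
To see the copied annotations at a glance, the CDI annotations on the two claims can be compared directly (PVC names as in the listing above):

$ oc get pvc dv3 -n demo -o yaml | grep cdi.kubevirt.io
$ oc get pvc hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49 -n demo -o yaml | grep cdi.kubevirt.io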

@datamattsson
Collaborator

Thanks for the additional details. To be completely honest, we have not qualified OpenShift Virtualization with the HPE CSI Driver. The operation you're performing should be done with a regular RWX PVC (without NFS resources) using volumeMode: Block. More on that in this issue: #323
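
A minimal sketch of such a Data Volume (volumeMode: Block, ReadWriteMany, no NFS resources; the StorageClass name hpe-standard-block is a placeholder for a block-backed HPE CSI StorageClass):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv-block
spec:
  source:
    blank: {}
  pvc:
    accessModes:
      - ReadWriteMany
    volumeMode: Block
    resources:
      requests:
        storage: 17Gi
    # Placeholder: a StorageClass served directly by the HPE CSI driver,
    # i.e. without nfsResources: "true"
    storageClassName: hpe-standard-block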

That said, this issue is a priority for HPE and we're currently working on getting it resolved for the next release of the CSI driver.

@justflite
Author

Thank you very much for your kind reply. Can we use a regular RWX PVC in block mode? On scod.hpedev.io it says a block-mode RWX PVC can be provisioned, but that the behavior can be unpredictable. Is there a success story of a block-mode RWX PVC used for a VM in order to enable the live migration feature?

I have tested a regular block-mode RWX PVC with an HPE Primera C630: the creation of the VM succeeded, but the live migration failed.
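
For reference, the VM in that test was wired up roughly like this (a sketch only; names, sizes and the referenced Data Volume dv-block are placeholders, not the exact manifest used):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-rwx-block
spec:
  running: true
  template:
    spec:
      # Live-migrate the VM instead of shutting it down on node drain
      evictionStrategy: LiveMigrate
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
        - name: rootdisk
          dataVolume:
            # Block-mode RWX Data Volume as in the earlier example
            name: dv-block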

What is the version of the next release of the CSI driver that can be expected to solve this issue? v2.3.0?

@justflite
Author

When I initiated a live migration of a VM that is based on a block-mode RWX PVC, the following errors occurred:

Generated from kubelet on worker11.openshift.lab
4 times in the last 1 minute
MapVolume.SetUpDevice failed for volume "pvc-8733eeb9-aced-47a9-944e-22f6cc7c2620" : rpc error: code = Internal desc = Failed to stage volume pvc-8733eeb9-aced-47a9-944e-22f6cc7c2620, err: rpc error: code = Internal desc = Error creating device for volume pvc-8733eeb9-aced-47a9-944e-22f6cc7c2620, err: device not found with serial 60002ac0000000000000209300029a7f or target

Generated from kubelet on worker11.openshift.lab
Unable to attach or mount volumes: unmounted volumes=[rootdisk], unattached volumes=[rootdisk hotplug-disks private public ephemeral-disks container-disks libvirt-runtime sockets]: timed out waiting for the condition

@datamattsson
Collaborator

It won't be in 2.3.0; it will be in a subsequent release.

@datamattsson
Collaborator

There's a beta chart available that fixes this for 3PAR-pedigree platforms. The GA release of the chart and the certified OpenShift operator is imminent.
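
For anyone who wants to try the beta ahead of GA, the chart can typically be pulled from the HPE Helm repo with the --devel flag (repo URL and chart name below are the usual ones for the HPE CSI driver; the exact beta version isn't stated here):

$ helm repo add hpe-storage https://hpe-storage.github.io/co-deployments
$ helm repo update
# List chart versions, including pre-releases
$ helm search repo hpe-storage/hpe-csi-driver --devel --versions
# Install a pre-release chart into the hpe-storage namespace
$ helm install hpe-csi-driver hpe-storage/hpe-csi-driver --devel -n hpe-storage --create-namespace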

@datamattsson
Collaborator

Fixed in v2.4.1 using RWX block.
