Version 1 PV does not get deleted when the remote cluster's filesystem is renamed #1131

Open
saurabhwani5 opened this issue Apr 16, 2024 · 0 comments
Labels
Customer Impact: Localized high impact (3) - Reduction of function. Significant impact to workload.
Customer Probability: Medium (3) - Issue occurs in normal path but specific limited timing window, or other mitigating factor.
Found In: 2.11.0
Severity: 2 - Indicates that the issue is critical and must be addressed before milestone.
Type: Bug - Indicates issue is an undesired behavior, usually caused by code error.

Comments


saurabhwani5 commented Apr 16, 2024

Describe the bug

When the remote filesystem is renamed and a PVC that was created on that filesystem is then deleted, the PV is not deleted and a misleading error is logged:

Error: [unable to list snapshots for fileset pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf. Error [rpc error: code = Internal desc = remote call failed with response &{[] {400 Invalid value in 'filesystem'} { 0 }}: GET request https://10.11.57.54:443/scalemgmt/v2/filesystems/fs0/filesets/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf/snapshots, user: remoteadmin9, param: <nil>

The remote cluster's filesystem was renamed from fs0 to newfs, but the request above still uses fs0.
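A quick way to confirm the stale name is to replay the failing REST call by hand against the owning cluster's GUI. This is only a sketch: the host, user, fileset, and endpoint are copied from the error above, and the password placeholder is hypothetical. The call against fs0 should return the same 400 "Invalid value in 'filesystem'", while the same call against the renamed filesystem should succeed:

# Old filesystem name, as still used by the driver: expected to fail with 400
curl -k -u 'remoteadmin9:<password>' \
  "https://10.11.57.54:443/scalemgmt/v2/filesystems/fs0/filesets/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf/snapshots"

# Renamed filesystem: expected to succeed
curl -k -u 'remoteadmin9:<password>' \
  "https://10.11.57.54:443/scalemgmt/v2/filesystems/newfs/filesets/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf/snapshots"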

How to Reproduce?

  1. Rename the filesystem on the remote cluster as follows (changing the name from fs0 to newfs); an optional REST verification is sketched after the output below:
[root@remote-saurabhwani1213-2 ~]# mmunmount fs0 -a
[root@remote-saurabhwani1213-2 ~]# mmunmount fs0 -f -C all_remote
Tue Apr 16 03:22:10 AM PDT 2024: mmunmount: Unmounting file systems ...
[root@remote-saurabhwani1213-2 ~]# mmchfs fs0 -W newfs
mmchfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
[root@remote-saurabhwani1213-2 ~]# mmmount newfs  -a
[root@saurabh7node-scalegui ~]# mmremotefs show all
Local Name  Remote Name  Cluster name       Mount Point        Mount Options    Automount  Drive  Priority
remoterenamed newfs        remote-saurabhwani1213.fyre.ibm.com /ibm/remoterenamed rw               no           -        0
[root@saurabh7node-scalegui ~]# mmlsmount remoterenamed -L

File system remoterenamed (remote-saurabhwani1213.fyre.ibm.com:newfs) is mounted on 9 nodes:
  10.11.59.38     remote-saurabhwani1213-3.fyre.ibm.com remote-saurabhwani1213.fyre.ibm.com
  10.11.53.28     remote-saurabhwani1213-1.fyre.ibm.com remote-saurabhwani1213.fyre.ibm.com
  10.11.57.54     remote-saurabhwani1213-2.fyre.ibm.com remote-saurabhwani1213.fyre.ibm.com
  10.11.15.139    saurabh7node-scalegui.fyre.ibm.com saurabh7nodespectrumscale.ibm.com
  10.11.15.123    saurabh7node-worker-1.fyre.ibm.com saurabh7nodespectrumscale.ibm.com
  10.11.15.137    saurabh7node-worker-5.fyre.ibm.com saurabh7nodespectrumscale.ibm.com
  10.11.15.130    saurabh7node-worker-3.fyre.ibm.com saurabh7nodespectrumscale.ibm.com
  10.11.15.125    saurabh7node-worker-2.fyre.ibm.com saurabh7nodespectrumscale.ibm.com
  10.11.15.135    saurabh7node-worker-4.fyre.ibm.com saurabh7nodespectrumscale.ibm.com
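Optionally, confirm that the owning cluster's REST API now reports the filesystem only under its new name; this is the same GUI that the CSI driver queries later. Sketch only, with a hypothetical password placeholder:

# List filesystems via the remote (owning) cluster GUI; newfs should appear and fs0 should not
curl -k -u 'remoteadmin9:<password>' "https://10.11.57.54:443/scalemgmt/v2/filesystems"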
  2. Install CSI 2.11.0; a sketch for checking the configured GUI endpoints follows the install output:
[root@saurabh7node-master ~]# oc apply -f ibm-spectrum-scale-csi-operator.yaml
deployment.apps/ibm-spectrum-scale-csi-operator created
clusterrole.rbac.authorization.k8s.io/ibm-spectrum-scale-csi-operator created
clusterrolebinding.rbac.authorization.k8s.io/ibm-spectrum-scale-csi-operator created
serviceaccount/ibm-spectrum-scale-csi-operator created
customresourcedefinition.apiextensions.k8s.io/csiscaleoperators.csi.ibm.com created
[root@saurabh7node-master ~]# oc apply -f csiscaleoperators.csi.ibm.com_cr.yaml
csiscaleoperator.csi.ibm.com/ibm-spectrum-scale-csi created
[root@saurabh7node-master ~]# oc get pods
NAME                                                  READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-9x6m4                          3/3     Running   0          30s
ibm-spectrum-scale-csi-attacher-7fb8cdf96d-5m6k9      1/1     Running   0          30s
ibm-spectrum-scale-csi-attacher-7fb8cdf96d-h9rdv      1/1     Running   0          30s
ibm-spectrum-scale-csi-m6bvf                          3/3     Running   0          30s
ibm-spectrum-scale-csi-operator-5895b8f98c-sqrw2      1/1     Running   0          63s
ibm-spectrum-scale-csi-provisioner-b5d455bf6-bhkgt    1/1     Running   0          30s
ibm-spectrum-scale-csi-resizer-5f77cbdc5f-h2xrw       1/1     Running   0          30s
ibm-spectrum-scale-csi-snapshotter-79877fd7f6-zkzg2   1/1     Running   0          30s
ibm-spectrum-scale-csi-x7kxx                          3/3     Running   0          30s
[root@saurabh7node-master ~]# oc get cso
NAME                     VERSION   SUCCESS
ibm-spectrum-scale-csi   2.11.0    True
[root@saurabh7node-master ~]# oc describe pod  | grep quay
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
  Normal  Pulled     37s   kubelet            Container image "quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166" already present on machine
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
  Normal  Pulled     37s   kubelet            Container image "quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166" already present on machine
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator@sha256:bd264199ac10d574163bfa32bb88844fd786ee6f794a56e235591d2f051c7807
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator@sha256:97adea43e18091d62a6cc6049106c6fbd860e62d6ccd952c98b626a6bb78fb92
      CSI_DRIVER_IMAGE:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
  Normal  Pulled     69s   kubelet            Container image "quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator@sha256:bd264199ac10d574163bfa32bb88844fd786ee6f794a56e235591d2f051c7807" already present on machine
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166
  Normal  Pulled     37s   kubelet            Container image "quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:b2bc343eadbc11d9ed74a8477d2cd0a7a8460a72203d3f6236d4662e68df1166" already present on machine
[root@saurabh7node-master ~]#
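It can also help to note which GUI endpoints the driver is configured with, since the failing snapshot query later goes to the remote (owning) cluster's GUI. A minimal sketch, assuming the CR name from the oc get cso output above and that the clusters section of the CSIScaleOperator CR carries a guiHost field per cluster:

# Show the GUI hosts configured for the local and remote clusters in the CR
oc get cso ibm-spectrum-scale-csi -o yaml | grep -B 2 -A 4 guiHost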
  3. Create a PVC on the renamed filesystem:
[root@saurabh7node-master ~]# cat pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scale-advance-pvc-1
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ibm-spectrum-scale-csi

---

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ibm-spectrum-scale-csi
provisioner: spectrumscale.csi.ibm.com
parameters:
   volBackendFs: "remoterenamed"
reclaimPolicy: Delete
[root@saurabh7node-master ~]# oc apply -f pvc.yaml
persistentvolumeclaim/scale-advance-pvc-1 created
storageclass.storage.k8s.io/ibm-spectrum-scale-csi created

  4. Check that the PVC is created and bound (a sketch for verifying the backing fileset on the remote cluster follows the output):
[root@saurabh7node-master ~]# oc get pvc -w
NAME                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS             AGE
scale-advance-pvc-1   Pending                                      ibm-spectrum-scale-csi   2s
scale-advance-pvc-1   Pending                                      ibm-spectrum-scale-csi   3s
scale-advance-pvc-1   Pending   pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf   0                         ibm-spectrum-scale-csi   59s
scale-advance-pvc-1   Bound     pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf   1Gi        RWX            ibm-spectrum-scale-csi   59s
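At this point the backing fileset should exist on the renamed filesystem of the remote cluster. A minimal check, assuming the CSI default of naming the fileset after the PV (run on the remote cluster):

# The fileset should be listed under the new device name, newfs
mmlsfileset newfs pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf -L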
  5. Delete the PVC:
[root@saurabh7node-master ~]# oc get pvc
NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS             AGE
scale-advance-pvc-1   Bound    pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf   1Gi        RWX            ibm-spectrum-scale-csi   87s
[root@saurabh7node-master ~]# oc delete pvc scale-advance-pvc-1
persistentvolumeclaim "scale-advance-pvc-1" deleted
  6. Check the PV description:
[root@saurabh7node-master ~]# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                               STORAGECLASS             REASON   AGE
pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf   1Gi        RWX            Delete           Released   ibm-spectrum-scale-csi-driver/scale-advance-pvc-1   ibm-spectrum-scale-csi            60s
[root@saurabh7node-master ~]# oc describe pv
Name:            pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: spectrumscale.csi.ibm.com
                 volume.kubernetes.io/provisioner-deletion-secret-name:
                 volume.kubernetes.io/provisioner-deletion-secret-namespace:
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ibm-spectrum-scale-csi
Status:          Released
Claim:           ibm-spectrum-scale-csi-driver/scale-advance-pvc-1
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            spectrumscale.csi.ibm.com
    FSType:            gpfs
    VolumeHandle:      0;2;4181049054023231843;263B0B0A:6589AA52;;pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf;/ibm/fs1/spectrum-scale-csi-volume-store/.volumes/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf
    ReadOnly:          false
    VolumeAttributes:      csi.storage.k8s.io/pv/name=pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf
                           csi.storage.k8s.io/pvc/name=scale-advance-pvc-1
                           csi.storage.k8s.io/pvc/namespace=ibm-spectrum-scale-csi-driver
                           storage.kubernetes.io/csiProvisionerIdentity=1713263606126-9774-spectrumscale.csi.ibm.com
                           volBackendFs=remoterenamed
Events:
  Type     Reason              Age   From                                                                                                               Message
  ----     ------              ----  ----                                                                                                               -------
  Warning  VolumeFailedDelete  17s   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-b5d455bf6-bhkgt_0bdfd1a0-ae1f-4e81-8212-1ce7f4078a39  rpc error: code = Internal desc = unable to list snapshot for fileset [pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf]. Error: [unable to list snapshots for fileset pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf. Error [rpc error: code = Internal desc = remote call failed with response &{[] {400 Invalid value in 'filesystem'} {  0 }}: GET request https://10.11.57.54:443/scalemgmt/v2/filesystems/fs0/filesets/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf/snapshots, user: remoteadmin9, param: <nil>, response: &{400 Bad Request 400 HTTP/1.1 1 1 map[Access-Control-Allow-Credentials:[true] Access-Control-Allow-Headers:[Authorization,Content-Type,Accept] Access-Control-Allow-Methods:[GET, POST, PUT, DELETE] Cache-Control:[no-cache="set-cookie, set-cookie2"] Content-Language:[en-US] Content-Length:[88] Content-Type:[application/json] Date:[Tue, 16 Apr 2024 10:59:21 GMT] Expires:[Thu, 01 Dec 1994 16:00:00 GMT] Set-Cookie:[JSESSIONID=0000rMgFpAZGUn9zOh3RyEKCAHr:138ac14a-9325-44c9-b901-f71b3128c461; Path=/; Secure; HttpOnly; SameSite=Strict] Strict-Transport-Security:[max-age=31536000] X-Content-Type-Options:[nosniff]] 0xc00024e400 88 [] true false map[] 0xc000376700 0xc000369080}]]
  Warning  VolumeFailedDelete  12s   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-b5d455bf6-bhkgt_0bdfd1a0-ae1f-4e81-8212-1ce7f4078a39  rpc error: code = Internal desc = unable to list snapshot for fileset [pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf]. Error: [unable to list snapshots for fileset pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf. Error [rpc error: code = Internal desc = remote call failed with response &{[] {400 Invalid value in 'filesystem'} {  0 }}: GET request https://10.11.57.54:443/scalemgmt/v2/filesystems/fs0/filesets/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf/snapshots, user: remoteadmin9, param: <nil>, response: &{400 Bad Request 400 HTTP/1.1 1 1 map[Access-Control-Allow-Credentials:[true] Access-Control-Allow-Headers:[Authorization,Content-Type,Accept] Access-Control-Allow-Methods:[GET, POST, PUT, DELETE] Cache-Control:[no-cache="set-cookie, set-cookie2"] Content-Language:[en-US] Content-Length:[88] Content-Type:[application/json] Date:[Tue, 16 Apr 2024 10:59:26 GMT] Expires:[Thu, 01 Dec 1994 16:00:00 GMT] Set-Cookie:[JSESSIONID=0000v4QGtL_5GFOVglfFkgJFgAs:138ac14a-9325-44c9-b901-f71b3128c461; Path=/; Secure; HttpOnly; SameSite=Strict] Strict-Transport-Security:[max-age=31536000] X-Content-Type-Options:[nosniff]] 0xc00024e460 88 [] true false map[] 0xc000376e00 0xc000369290}]]
  Warning  VolumeFailedDelete  3s    spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-b5d455bf6-bhkgt_0bdfd1a0-ae1f-4e81-8212-1ce7f4078a39  rpc error: code = Internal desc = unable to list snapshot for fileset [pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf]. Error: [unable to list snapshots for fileset pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf. Error [rpc error: code = Internal desc = remote call failed with response &{[] {400 Invalid value in 'filesystem'} {  0 }}: GET request https://10.11.57.54:443/scalemgmt/v2/filesystems/fs0/filesets/pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf/snapshots, user: remoteadmin9, param: <nil>, response: &{400 Bad Request 400 HTTP/1.1 1 1 map[Access-Control-Allow-Credentials:[true] Access-Control-Allow-Headers:[Authorization,Content-Type,Accept] Access-Control-Allow-Methods:[GET, POST, PUT, DELETE] Cache-Control:[no-cache="set-cookie, set-cookie2"] Content-Language:[en-US] Content-Length:[88] Content-Type:[application/json] Date:[Tue, 16 Apr 2024 10:59:35 GMT] Expires:[Thu, 01 Dec 1994 16:00:00 GMT] Set-Cookie:[JSESSIONID=0000YDjFCaP2RWwjKH-4vOHk8OE:138ac14a-9325-44c9-b901-f71b3128c461; Path=/; Secure; HttpOnly; SameSite=Strict] Strict-Transport-Security:[max-age=31536000] X-Content-Type-Options:[nosniff]] 0xc0000baea0 88 [] true false map[] 0xc00042d100 0xc0001dad10}]]
[root@saurabh7node-master ~]#

Expected behavior

The PV should be deleted when the PVC is deleted.
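Until this is fixed, a possible manual cleanup is to remove the fileset on the remote cluster and then delete the stuck PV object. This is a sketch only, assuming the fileset is named after the PV as above:

# On the remote (owning) cluster: unlink and delete the orphaned fileset
mmunlinkfileset newfs pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf -f
mmdelfileset newfs pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf -f

# On the OpenShift cluster: remove the Released PV that the driver could not clean up
oc delete pv pvc-9f35edb2-2e7c-4b21-88b1-b75985f3bddf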

Logs

/scale-csi/D.1131
csisnap.tar.gz

@saurabhwani5 saurabhwani5 added Severity: 2 Indicates that the issue is critical and must be addressed before milestone. Type: Bug Indicates issue is an undesired behavior, usually caused by code error. Customer Probability: Medium (3) Issue occurs in normal path but specific limited timing window, or other mitigating factor Customer Impact: Localized high impact (3) Reduction of function. Significant impact to workload. Found In: 2.11.0 labels Apr 16, 2024