PVC attached to a pod doesn't migrate across nodes when Kubelet Service is stopped #563
Comments
@rkomandu, the db pod was scheduled eventually, so could you provide the logs of the db pod's "init" init container?
Thank you!
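For reference, a minimal sketch of how those init-container logs could be collected (the openshift-storage namespace is an assumption based on later comments; adjust names as needed):

```
# List the init containers of the db pod (the container name "init" is taken from this thread)
oc get pod noobaa-db-pg-0 -n openshift-storage -o jsonpath='{.spec.initContainers[*].name}{"\n"}'

# Logs of the current init container instance
oc logs noobaa-db-pg-0 -c init -n openshift-storage

# If the pod restarted, the previous instance's logs are often more useful
oc logs noobaa-db-pg-0 -c init -n openshift-storage --previous
```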
I don't have the cluster in that state now. This was tried about a week back and was first opened in the NooBaa GitHub repo. It needs to be recreated; maybe you can try the steps mentioned, as it has nothing to do with NooBaa itself.
Hi @rkomandu
Hi @deeghuge, yes, the CSI attacher seems to have been running on worker0 for 39d. I am concerned about this scenario because we have a noobaa-endpoint running on each node, which means every node serves I/O internally when requests come from application nodes. For noobaa-db, since a PVC is attached, we observe that the pod cannot move when a failover is detected. Please see if there is a way to fix this. If there is no way to fix it, then it should be documented as a limitation in the field, which is risky, since the node serving noobaa-db can go down for various reasons (error injection or hardware problems on the node). Note: I tried stopping the kubelet service on other nodes (for example, nodes running the noobaa-core pod, which has no PVC, or a noobaa-endpoint) and there was no problem for our I/O; because the HA service is configured, the IP moved over and I/O was successful.
@baum, do you still need the kubectl logs for noobaa-db? Reason for asking: the noobaa-db pod restarted because the data it required was on the storage cluster (FS) whose worker node was down. The Fyre admin brought the node back to Active state six days ago.
@baum, it shows empty now: `kubectl logs noobaa-db-pg-0 -c init`
@deeghuge, re the StatefulSet comment: rescheduling a StatefulSet's pods upon a k8s node failure (caused by ...)
Best regards
From the CSI side, a CSI StatefulSet needs manual intervention to move from a failing node to another one, or you have to wait for Kubernetes to take corrective action, which can take a long time.
We are investigating a solution to fix this, but we have a very short runway for the next release, so we can't commit to a fix unless we complete the investigation.
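For context, the usual manual intervention for a StatefulSet pod stuck on a failed node is a force delete, so the controller can recreate it on a healthy node once the volume attachment is released. A sketch, assuming the pod and namespace names used elsewhere in this thread (force deletion bypasses normal StatefulSet safety guarantees, so use with care):

```
# Pod stuck in Terminating/Unknown on the node whose kubelet is down
oc get pods -n openshift-storage -o wide | grep noobaa-db-pg-0

# Force delete so the StatefulSet controller can recreate the pod elsewhere
oc delete pod noobaa-db-pg-0 -n openshift-storage --grace-period=0 --force

# Watch the replacement pod get scheduled
oc get pods -n openshift-storage -o wide -w
```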
One more observation: the default noobaa backing store has the same PVC-attach problem when the kubelet service is stopped.
@baum, for HPO (IBM SS) we use NSFS, as you may very well know; how does this backing store really affect us? Please see below.
As a result of the worker1 node's kubelet service being down, the rook-ceph-operator-74864f7c6f-k8l2w and ocs-operator-57d785c8c7-qtqdv pods are also in Terminating state. Could you talk to Nimrod about this?
This defect is kind of a blocker for GA, and it was discussed in the HPO DCT call.
Hi @rkomandu, this behaviour is expected if the node with the CSI attacher StatefulSet goes down. Since the attacher is down, pods moving from the kubelet-down node to other nodes will keep failing until the CSI attacher StatefulSet comes back. For robustness we do have a documented suggestion for the StatefulSet deployment; if it is followed, there is much less chance of getting into the issue you are seeing.
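A quick way to check whether this is the situation, i.e. whether the CSI attacher pod sits on the node whose kubelet was stopped (namespace as in the environment output at the end of this issue):

```
# Which node is the CSI attacher StatefulSet pod running on?
oc get pods -n ibm-spectrum-scale-csi -o wide | grep attacher

# Is that node still Ready, or NotReady because its kubelet is down?
oc get nodes

# VolumeAttachment objects the attacher would normally reconcile
oc get volumeattachments
```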
Hi @deeghuge, for the HPO GA solution we have 3MW (stacked master-worker) nodes, and failover/failback can occur on any node. Even power or node failures might happen, so how can this be addressed? Note: on Fyre we are testing with 3M+3W due to memory constraints, and the above exercise was done on that environment. Either way, it does affect the operation of HPO.
@deeghuge, we are using this volume as a PV for a Postgres DB. We need a mechanism for fully automated HA, for instance in case of OCP node failures. Is this doable for a compact OpenShift cluster, and are there best practices for configuring CSI?
There are two scenarios to the reported problem.
Also, while testing on Ravi's setup, the noobaa-db pods keep crashing with the following errors. What could be the reason for this?
@deeghuge - As a first step we need HA. Six minutes is not ideal, but at least it is a baseline. Over time we can think about how to reduce the failover time.
I tried to follow the steps for resolving this using https://access.redhat.com/solutions/5668581. Thanks to @aspalazz for providing the downloaded file (as I don't have access to a Red Hat subscription). I tried the steps as per the documentation, and they didn't get noobaa-db-pg-0 from CrashLoopBackOff into Running state. As there is no deployment for noobaa-db-pg-0, only a StatefulSet, I set the pod's replicas to "0" and then back to "1" with `oc scale statefulset noobaa-db-pg --replicas=0` followed by `oc scale statefulset noobaa-db-pg --replicas=1`. This resulted in the noobaa-db-pg-0 pod coming back.
A touch test of accounts worked for now. Will try new users and I/O next.
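For reference, the same scale-down/scale-up workaround as a runnable sequence (the openshift-storage namespace is an assumption based on later comments; verify before use):

```
# Scale the noobaa-db StatefulSet down and back up to force the pod to be recreated
oc scale statefulset noobaa-db-pg --replicas=0 -n openshift-storage
oc scale statefulset noobaa-db-pg --replicas=1 -n openshift-storage

# Confirm the pod comes back and note which node it lands on
oc get pods -n openshift-storage -o wide | grep noobaa-db-pg
```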
As per our discussion yesterday and @troppens' input, please continue to work on Scenario 1 & 2.
@deeghuge

Could upload to the already existing bucket:

```
[root@rkomandu-app-node1 scripts]# s3u5300 cp /bin/date s3://newbucket-u5300-01feb

noobaa-db-pg-0   1/1   Running   0   73m   10.254.23.217   worker2.rkomandu-ta.cp.fyre.ibm.com
```
@baum, would you check from the noobaa-db perspective? If the node is taken down, noobaa-db-pg-0 moves to the other node, but its functioning has stopped; for example, the following are failing:
@deeghuge, re `ERROR: tuple already updated by self`: during PostgreSQL startup this might indicate corrupted database data structures such as indexes/catalogs. It might be part of crash recovery. Do you see the DB eventually start operating?
@rkomandu, a DB restart causes an interruption for some period of time; however, once the DB is up and the noobaa core has reconnected, you should be able to create buckets, accounts, etc. Is the NooBaa CR phase Ready? For example: ➜ oc get noobaa noobaa
NAME MGMT-ENDPOINTS S3-ENDPOINTS STS-ENDPOINTS IMAGE PHASE AGE
noobaa ["https://192.168.65.4:31337"] ["https://192.168.65.4:31839"] ["https://192.168.65.4:32196"] noobaa/noobaa-core:5.10.0-20220120 Ready 5d4h
NooBaa is Ready; otherwise, how would the I/O have worked for uploading data to the existing buckets?
There is still a problem in the DB which the NooBaa team needs to investigate, as per my post above: the DB is running, but accounts and buckets can't be created.
@deeghuge,
Hi @rkomandu, we can close 6853 for now. If required in the future, after the CSI fix, we can reopen it.
Just a comment here: this is on 2.5.0 with csi-attacher-0/1 running as a StatefulSet on two different nodes (checked with `oc get lease -n ibm-spectrum-scale-csi` and `oc get pods -o wide`; output omitted). noobaa-db-pg-0 was running on the worker2 node. I took the worker2 node down and confirmed it with `oc get nodes`. The noobaa-db-pg-0 pod then took about 6m Xsec to migrate to the worker1 node, waiting in Init state for all those minutes:

noobaa-db-pg-0 1/1 Running 0 35m 10.254.12.27 worker1.rkomandu-513.cp.fyre.ibm.com

After 6m Xsec it moved from Init state to Running state on the worker1 node (migrated from worker2 --> worker1). With the StatefulSet in place for csi-attacher, is the volume still not taken into consideration? @troppens, we need to add this to our documentation.
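A sketch of the checks behind this kind of node-down test (the exact command set is inferred from the comment above; namespaces match this environment):

```
# Where are the attacher replicas and their leader-election leases?
oc get pods -n ibm-spectrum-scale-csi -o wide | grep attacher
oc get lease -n ibm-spectrum-scale-csi

# Take the node hosting noobaa-db-pg-0 down (run on that node):
#   systemctl stop kubelet

# Confirm the node goes NotReady and time how long the db pod takes to migrate
oc get nodes
oc get pods -n openshift-storage -o wide -w
```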
@nitishkumar4 @Jainbrt, this should be fixed by #722, right?
@rkomandu, could you please verify the fix with the latest CSI 2.6.0 images?
We are in the process of installing the CNSA 514 interim builds. Once we do that, we will be able to verify this.
@rkomandu, please help verify and close this.
After failback of the noobaa-db and noobaa-core pods that were running on the same node, noobaa-db does not come up; it remains in CrashLoopBackOff state. Steps to reproduce:
Output of systemctl stop kubelet command
After failover, noobaa-db is up and running. Output of openshift-storage pods after failover
Account creation output
Output of bucket create command
Failback
Openshift-storage pods
Account creation after failback
Output of oc describe pod
Snippet of noobaa log
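The outputs referenced above were attached to the original report and are not reproduced here. A minimal sketch of the failover/failback check sequence being described (node names and namespace are assumptions):

```
# Failover: on the node currently hosting noobaa-db-pg-0, stop the kubelet
#   systemctl stop kubelet
# then verify the pod comes up elsewhere and retry account/bucket creation
oc get pods -n openshift-storage -o wide

# Failback: restart the kubelet on the original node
#   systemctl start kubelet

# If noobaa-db-pg-0 ends up in CrashLoopBackOff, capture details
oc get pods -n openshift-storage -o wide
oc describe pod noobaa-db-pg-0 -n openshift-storage
oc logs noobaa-db-pg-0 -n openshift-storage --all-containers
```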
FYI @deeghuge
@nitishkumar4 Please take a look
@NeeshaPan, as discussed, the issue was fixed on the above cluster by gracefully stopping all Postgres services and starting them again. However, to confirm whether this issue is reproducible and linked to CSI, I will require a cluster to look into. Since the cluster where this issue was observed is being used for another purpose right now, please let me know whenever the cluster is available.
We have a similar issue, i.e. noobaa/noobaa-core#6953, opened in the noobaa repo.
@NeeshaPan, continue to try this and let us see whether the CSI functionality is working or not.
@deeghuge, could you please take a look into this issue? Attached is a file that has the output of pods & logs and the procedure followed for FOFB on the latest build.
Can we meet around 4 PM for 30 minutes to discuss the failures with the above fix? We discussed with the NooBaa team over the last two weeks, and they are saying that the volume being accessed from two different nodes is the reason the Postgres database is failing. Sending the invite for the same; please add anyone you need to join. @NeeshaPan FYI.
Thanks @rkomandu and @NeeshaPan for the discussion. Here is the summary of the discussion. The original issue was the db pod not getting into Running state because the attacher (StatefulSet) was unavailable. The attacher issue was fixed, and the newly uploaded logs show the same, so this is clearly a new issue, different from the original. Here are a few questions we need to get answered by the noobaa and k8s experts.
To debug the Multi-Attach error seen on the restarted db pod (which went away after some time), we should capture the output of ... Also please note that Spectrum Scale is a shared filesystem, so data is always available on all nodes. We rely on k8s for the ReadWriteOnce functionality, so it is k8s that makes sure only one pod is accessing an RWO volume at any given time.
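The exact commands were not captured in the thread; a plausible set for investigating a Multi-Attach error on a ReadWriteOnce volume (resource names from this environment, verify before use) would be:

```
# Which node does Kubernetes believe the volume is attached to?
oc get volumeattachments

# Map the db PVC to its PV so it can be matched against the VolumeAttachment list
oc get pvc -n openshift-storage
oc get pv

# The Multi-Attach warning itself appears in the pod's events
oc describe pod noobaa-db-pg-0 -n openshift-storage
```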
@NeeshaPan, take a look at Deepak's comment when collecting logs the next time for this situation. Secondly, also "describe the noobaa-db pod" after the failover and later once the failback is done. We need to understand when these Events occur.
@deeghuge, could you add yourself to the noobaa repository so you can respond to his questions in noobaa/noobaa-core#6953?
@Jainbrt, the issue is not resolved yet. A few emails were sent to the RH team; as Deepak said, this needs to be dealt with in the underlying K8s. There has been no response, and they are pointing back to CSI needing to implement something, as there was a mention of some RWO setting (if I recall) ...
Thanks @rkomandu for the update; I have removed the verification label from this issue.
Closing this issue as there has been no update for a long time. Please reopen if RH comes back with an analysis and changes are required in CSI.
Describe the bug
CNSA 5112 (CSI bundled with it)
For the HPO solution, the building blocks are CNSA, CSI, and on top of them the DAS operator (which internally installs NooBaa) for S3 object access.
The complete description of the bug was posted against the noobaa-core component, which is a public repository:
noobaa/noobaa-core#6853
This is a problem as I see it. Is there a way to get this resolved?
What this later means for the HPO team is that the database pod stays in Init state, and the HPO admin can't create any new accounts/exports, etc.
The temporary workaround that was done:
On worker0, I restarted the kubelet service that was stopped earlier, and then the noobaa-db-pg pod moved to worker2 without any problem. I understand that the movement of the pod is tied to the kubelet service.
Could you take a look at this defect and provide your thoughts/comments?
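For completeness, the temporary workaround expressed as commands (a sketch; node and namespace names are taken from this report and may need adjusting):

```
# On worker0 (the node whose kubelet was stopped earlier), restart the kubelet:
#   systemctl start kubelet

# Then watch the stuck noobaa-db-pg pod terminate and get rescheduled
# (in this case it landed on worker2)
oc get pods -n openshift-storage -o wide -w
```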
Data Collection and Debugging
Environmental output
What openshift/kubernetes version are you running, and the architecture?
oc version
Client Version: 4.9.5
Server Version: 4.9.5
Kubernetes Version: v1.22.0-rc.0+a44d0f0
kubectl get pods -o wide -n <csi driver namespace>
oc get pods -o wide -n ibm-spectrum-scale-csi
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ibm-spectrum-scale-csi-attacher-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.5 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-gqltw 3/3 Running 0 40h 10.17.127.141 worker2.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-h78rs 3/3 Running 0 4d2h 10.17.126.141 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-operator-d844fb754-7d9db 1/1 Running 23 (16h ago) 7d5h 10.254.19.7 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-provisioner-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.6 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-resizer-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.4 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-smp65 3/3 Running 0 4d16h 10.17.126.253 worker1.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-snapshotter-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.14 worker0.rkomandu-ta.cp.fyre.ibm.com
kubectl get nodes -o wide
oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master0.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.104.166 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
master1.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.113.80 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
master2.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.117.1 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker0.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.126.141 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker1.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.126.253 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker2.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.127.141 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
CNSA 5.1.2.1 (Dec 10th GA)
mmdiag --version
=== mmdiag: version ===
Current GPFS build: "5.1.2.1 ".
Built on Nov 11 2021 at 13:11:41
Running 4 days 17 hours 58 minutes 13 secs, pid 3060
Tool to collect the CSI snap:
./tools/spectrum-scale-driver-snap.sh -n <csi driver namespace>
./tools/spectrum-scale-driver-snap.sh -n <csi driver namespace> -v

This bug was opened earlier with the NooBaa team, but based on the Events posted above it was later redirected to CSI.
I don't have anything specifically collected, as this was originally opened against NooBaa. However, the above steps clearly show that the PVC attached to noobaa-db-pg doesn't get migrated to another worker node when the kubelet service is stopped.