
PVC attached to a pod doesn't migrate across nodes when Kubelet Service is stopped #563

Closed
rkomandu opened this issue Jan 18, 2022 · 50 comments
Labels
Customer Impact: Localized high impact (3) Reduction of function. Significant impact to workload. Customer Probability: Medium (3) Issue occurs in normal path but specific limited timing window, or other mitigating factor Severity: 2 Indicates that the issue is critical and must be addressed before milestone. Type: Bug Indicates issue is an undesired behavior, usually caused by code error.

Comments

@rkomandu

Describe the bug

CNSA - 5112 (CSI bundled with this)

For the HPO solution, the building blocks are CNSA and CSI, with the DAS operator (which internally installs NooBaa) on top for S3 object access.

The complete description of the bug was originally posted against the noobaa-core component, which is a public repository:

noobaa/noobaa-core#6853

MetalLB is configured on the cluster (which shouldn't matter for this problem description). The NooBaa core/db pods and the 3 endpoints are running on the respective worker nodes as shown below:

NAME                                               READY   STATUS    RESTARTS      AGE    IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running   0             20d    10.254.14.77    worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running   3 (17d ago)   20d    10.254.18.0     worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-a1bf952a   1/1     Running   0             20d    10.254.18.4     worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-7jzdf                    1/1     Running   0             3d1h   10.254.20.43    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-gxz5h                    1/1     Running   0             3d4h   10.254.15.112   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-mbfrj                    1/1     Running   0             3d4h   10.254.17.208   worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-5c46775cdd-vplhr                   1/1     Running   0             31d    10.254.16.22    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>

Step 2: Issued a kubelet service stop on the node where the noobaa-db-pg pod is running

[core@worker0 ~]$ sudo systemctl stop kubelet

Step 3: The noobaa-db-pg pod tries to migrate from worker0 to worker2, the noobaa operator restarted, and the noobaa endpoint on worker0 went into Pending state as expected

NAME                                               READY   STATUS              RESTARTS   AGE    IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running             0          20d    10.254.14.77    worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     0/1     Init:0/2            0          6s     <none>          worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-7jzdf                    1/1     Running             0          3d1h   10.254.20.43    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-gxz5h                    1/1     Running             0          3d4h   10.254.15.112   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-wlktz                    0/1     Pending             0          6s     <none>          <none>                                <none>           <none>
noobaa-operator-5c46775cdd-9mgxt                   0/1     ContainerCreating   0          6s     <none>          worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>

Step 4: The noobaa-db-pg pod continues to be in the Init state on worker2

NAME                                               READY   STATUS     RESTARTS   AGE     IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running    0          20d     10.254.14.77    worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     0/1     Init:0/2   0          7m52s   <none>          worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-7jzdf                    1/1     Running    0          3d1h    10.254.20.43    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-gxz5h                    1/1     Running    0          3d4h    10.254.15.112   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-bfffdd599-wlktz                    0/1     Pending    0          7m52s   <none>          <none>                                <none>           <none>
noobaa-operator-5c46775cdd-9mgxt                   1/1     Running    0          7m52s   10.254.20.72    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
Step 5: Describing noobaa-db-pg-0 shows that the PVC that was bound on the worker0 node can't be attached to the worker2 node.
  
  Events:
  Type     Reason              Age                    From                     Message
  ----     ------              ----                   ----                     -------
  Normal   Scheduled           11m                    default-scheduler        Successfully assigned openshift-storage/noobaa-db-pg-0 to worker2.rkomandu-ta.cp.fyre.ibm.com
  Warning  FailedAttachVolume  11m                    attachdetach-controller  Multi-Attach error for volume "pvc-3e03cdb0-a374-4aed-bc3f-6e6f9ba74bca" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount         2m42s (x4 over 9m31s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[db kube-api-access-89bwb noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume]: timed out waiting for the condition
  Warning  FailedMount         25s                    kubelet                  Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume db kube-api-access-89bwb]: timed out waiting for the condition
  Warning  FailedAttachVolume  16s (x10 over 4m31s)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-3e03cdb0-a374-4aed-bc3f-6e6f9ba74bca" : rpc error: code = Internal desc = ControllerPublishVolume : Error in getting filesystem Name for filesystem ID of 0D790B0A:61B0F1B9. Error [Get "https://ibm-spectrum-scale-gui.ibm-spectrum-scale:443/scalemgmt/v2/filesystems?filter=uuid=0D790B0A:61B0F1B9": context deadline exceeded (Client.Timeout exceeded while awaiting headers)]
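
For reference, a minimal sketch (using the volume name from the events above) of how to confirm which node the CSI attachment still points at; the VolumeAttachment object name itself will differ per cluster:

# List the cluster-scoped VolumeAttachment objects for this PV
oc get volumeattachment | grep pvc-3e03cdb0-a374-4aed-bc3f-6e6f9ba74bca

# The NODE column shows where the volume is still attached (worker0 in this case);
# describe the object returned above to see the attacher and detach status
oc describe volumeattachment <volumeattachment-name-from-previous-command>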

This is a problem as I see it. Is there a way to get this resolved?

What happens later for the HPO team is that the database stays in the Init state, so the HPO admin can't create any new accounts/exports etc.

Temporary workaround:
On worker0, restarted the kubelet service that was stopped earlier, and then the noobaa-db-pg pod moved to worker2 without any problem. I understand that the movement of the pod is linked to the kubelet service.
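
A minimal sketch of that workaround, assuming the same node and namespace as above:

# On the node whose kubelet was stopped earlier (worker0 here)
sudo systemctl start kubelet

# Watch the DB pod complete its move to the new node
oc get pods -n openshift-storage -o wide -w | grep noobaa-db-pg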

Could you take a look at this defect and provide your thoughts/comments?

Data Collection and Debugging

Environmental output

  • What openshift/kubernetes version are you running, and the architecture?
    oc version
    Client Version: 4.9.5
    Server Version: 4.9.5
    Kubernetes Version: v1.22.0-rc.0+a44d0f0

  • kubectl get pods -o wide -n < csi driver namespace>

oc get pods -o wide -n ibm-spectrum-scale-csi
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ibm-spectrum-scale-csi-attacher-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.5 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-gqltw 3/3 Running 0 40h 10.17.127.141 worker2.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-h78rs 3/3 Running 0 4d2h 10.17.126.141 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-operator-d844fb754-7d9db 1/1 Running 23 (16h ago) 7d5h 10.254.19.7 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-provisioner-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.6 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-resizer-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.4 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-smp65 3/3 Running 0 4d16h 10.17.126.253 worker1.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-snapshotter-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.14 worker0.rkomandu-ta.cp.fyre.ibm.com

  • kubectl get nodes -o wide

oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master0.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.104.166 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
master1.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.113.80 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
master2.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.117.1 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker0.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.126.141 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker1.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.126.253 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker2.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.127.141 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8

  • CNSA/Spectrum Scale version

CNSA 5.1.2.1 (Dec 10th GA)

  • Remote Spectrum Scale version
    mmdiag --version

=== mmdiag: version ===
Current GPFS build: "5.1.2.1 ".
Built on Nov 11 2021 at 13:11:41
Running 4 days 17 hours 58 minutes 13 secs, pid 3060

  • Output for ./tools/spectrum-scale-driver-snap.sh -n < csi driver namespace> -v
This bug was opened earlier with the NooBaa team, but based on the Events posted above it was redirected to CSI.

Tool to collect the CSI snap:

./tools/spectrum-scale-driver-snap.sh -n < csi driver namespace>

I don't have anything specifically collected, as this was originally opened against NooBaa. However, the above steps clearly show that the PVC attached to noobaa-db-pg doesn't get migrated to another worker node when the kubelet service is stopped.

Add labels

  • Component:
  • Severity:
  • Customer Impact:
  • Customer Probability:
  • Phase:

Note: See the labels added to this issue.

@baum

baum commented Jan 18, 2022

@rkomandu, the db pod was scheduled eventually, so could you provide logs of the "init" db init container:
This is how it looks on my side:

➜  noobaa-operator git:(master) ➜ kubectl logs noobaa-db-pg-0 -c init
uid change has been identified - will change from uid: 0 to new uid: 10001
setting permissions of /var/lib/pgsql for user 10001
changed permissions of /var/lib/pgsql successfully

real	0m0.003s
user	0m0.001s
sys	0m0.001s

Thank you!

@rkomandu
Author

I don't have the cluster in that state now. This was tried about a week back and first opened in the NooBaa GitHub repo. It needs to be recreated; maybe you can try with the steps mentioned, as it has nothing to do with NooBaa itself.

@deeghuge
Member

Hi @rkomandu
From the details pasted above, it looks like ibm-spectrum-scale-csi-attacher-0 was running on node worker0. This is the same node where the kubelet was shut down. Since the attacher is a StatefulSet, its failover is not as straightforward as for other pods. If the kubelet had been stopped on any node other than the one where the attacher is running, the noobaa pod should have failed over as expected.

ibm-spectrum-scale-csi-attacher-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.5 worker0.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-gqltw 3/3 Running 0 40h 10.17.127.141 worker2.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-h78rs 3/3 Running 0 4d2h 10.17.126.141 worker0.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-operator-d844fb754-7d9db 1/1 Running 23 (16h ago) 7d5h 10.254.19.7 worker0.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-provisioner-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.6 worker0.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-resizer-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.4 worker0.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-smp65 3/3 Running 0 4d16h 10.17.126.253 worker1.rkomandu-ta.cp.fyre.ibm.com 
ibm-spectrum-scale-csi-snapshotter-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.14 worker0.rkomandu-ta.cp.fyre.ibm.com

@rkomandu
Author

rkomandu commented Jan 19, 2022

Hi @deeghuge, yes, the CSI attacher has been running on worker0 for 39d. I am concerned about this scenario because we have a noobaa-endpoint running on each node, which means each node serves IO internally when requests come from application nodes.

For noobaa-db, since a PVC is attached, the observed problem is that it can't move when a failover is detected. Please see if there is a way to fix this. If there is no way to fix it, then it becomes a field limitation, which is risky, as the node serving noobaa-db can go down for various reasons such as error injection or a hardware problem on the node.

Note: I have tried stopping the kubelet service on other nodes (for example, nodes running the noobaa-core pod, which has no PVC, or noobaa-endpoint pods) and there was no problem for our IO; as the HA service is configured, the IP moved over and IO was successful.

@rkomandu
Author

@baum, do you still need the kubectl logs for noobaa-db?

Reason for asking: the noobaa-db pod restarted because the data it requires is on the storage cluster (FS) and the worker node was down. The Fyre admin got the node back to Active state 6 days ago.

 oc get pods -n openshift-storage
NAME                                               READY   STATUS    RESTARTS        AGE
noobaa-core-0                                      1/1     Running   0               8d
noobaa-db-pg-0                                     1/1     Running   1 (6d10h ago)   7d18h

@rkomandu
Author

@baum, it shows empty output now:

kubectl logs noobaa-db-pg-0 -c init

@baum

baum commented Jan 19, 2022

@deeghuge re StatefulSet comment

Rescheduling a StatefulSet's pods upon a k8s node failure (caused by stopping the kubelet) is a known and documented issue. To deal with it, the NooBaa operator (and, I believe, other components of ODF) implements a controller to force delete its pods from a failing node. Here is a short NooBaa feature document and the PR.

Best regards

@deeghuge
Member

deeghuge commented Jan 21, 2022

From the CSI side, the CSI StatefulSet needs manual intervention to move it from the failing node to another one, or you have to wait for Kubernetes to take corrective action, which might take a long time.
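
A minimal sketch of that manual intervention, assuming the attacher pod is the one stuck on the node whose kubelet is down; per the Kubernetes documentation, force deletion of StatefulSet pods should only be done once the node is confirmed to be down:

# Force delete the attacher pod so the StatefulSet controller recreates it on a healthy node
oc delete pod ibm-spectrum-scale-csi-attacher-0 -n ibm-spectrum-scale-csi --grace-period=0 --force

# Confirm it has been rescheduled
oc get pods -n ibm-spectrum-scale-csi -o wide | grep attacher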

@deeghuge
Member

We are investigating a solution to fix this, but we have a very short runway for the next release, so we can't commit to a fix unless we complete the investigation.

@rkomandu
Author

rkomandu commented Jan 24, 2022

@deeghuge

One more observation: the default noobaa backing store has the same PVC attachment problem when the kubelet service is stopped.


Step 1: noobaa-backing store is running on worker1 

NAME                                               READY   STATUS    RESTARTS   AGE     IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running   0          6h44m   10.254.21.59    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running   0          6h44m   10.254.21.60    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-0e46f0f9   1/1     Running   0          6h41m   10.254.14.233   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-4mm6l                    1/1     Running   0          85m     10.254.15.28    worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-b7fs2                    1/1     Running   0          6h40m   10.254.21.62    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-n46s4                    1/1     Running   0          85m     10.254.21.63    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-tc5nc                    1/1     Running   0          6h41m   10.254.14.234   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-vsh7j                    1/1     Running   0          85m     10.254.18.94    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-z2hfx                    1/1     Running   0          6h40m   10.254.18.36    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-54877b7dc9-rcvf8                   1/1     Running   0          4d1h    10.254.19.40    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>

Step 2: Kubelet service is down

[core@worker1 ~]$ sudo systemctl stop kubelet
[core@worker1 ~]$

Step 3: The backing store went to worker0
NAME                                               READY   STATUS              RESTARTS   AGE     IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running             0          6h47m   10.254.21.59    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running             0          6h47m   10.254.21.60    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-0e46f0f9   0/1     ContainerCreating   0          16s     <none>          worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-4465r                    0/1     Pending             0          16s     <none>          <none>                                <none>           <none>
noobaa-endpoint-8f4c64d67-b7fs2                    1/1     Running             0          6h43m   10.254.21.62    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-l2f8c                    0/1     Pending             0          16s     <none>          <none>                                <none>           <none>
noobaa-endpoint-8f4c64d67-n46s4                    1/1     Running             0          88m     10.254.21.63    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-vsh7j                    1/1     Running             0          88m     10.254.18.94    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-z2hfx                    1/1     Running             0          6h43m   10.254.18.36    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-54877b7dc9-rcvf8                   1/1     Running             0          4d1h    10.254.19.40    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>


Step 4: The backing store pod goes into CrashLoopBackOff state
NAME                                               READY   STATUS             RESTARTS        AGE     IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running            0               7h33m   10.254.21.59    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running            0               7h33m   10.254.21.60    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-0e46f0f9   0/1     CrashLoopBackOff   10 (5m4s ago)   46m     10.254.18.127   worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-4465r                    0/1     Pending            0               46m     <none>          <none>                                <none>           <none>
noobaa-endpoint-8f4c64d67-b7fs2                    1/1     Running            0               7h29m   10.254.21.62    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-l2f8c                    0/1     Pending            0               46m     <none>          <none>                                <none>           <none>
noobaa-endpoint-8f4c64d67-n46s4                    1/1     Running            0               134m    10.254.21.63    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-vsh7j                    1/1     Running            0               134m    10.254.18.94    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-z2hfx                    1/1     Running            0               7h29m   10.254.18.36    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-54877b7dc9-rcvf8                   1/1     Running            0               4d2h    10.254.19.40    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>

Step 5:  oc describe pod of backing store (snippet for Events log) 

Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Normal   Scheduled               41m                    default-scheduler        Successfully assigned openshift-storage/noobaa-default-backing-store-noobaa-pod-0e46f0f9 to worker0.rkomandu-ta.cp.fyre.ibm.com
  Warning  FailedAttachVolume      41m                    attachdetach-controller  Multi-Attach error for volume "pvc-18a7288d-11c5-456f-ae1f-b199a7716ce3" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount             39m                    kubelet                  Unable to attach or mount volumes: unmounted volumes=[noobaastorage], unattached volumes=[tmp-logs-vol kube-api-access-blp86 noobaastorage]: timed out waiting for the condition
  Warning  FailedMount             37m                    kubelet                  Unable to attach or mount volumes: unmounted volumes=[noobaastorage], unattached volumes=[noobaastorage tmp-logs-vol kube-api-access-blp86]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  35m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-18a7288d-11c5-456f-ae1f-b199a7716ce3"
  Normal   AddedInterface          35m                    multus                   Add eth0 [10.254.18.127/22] from openshift-sdn
  Normal   Pulled                  30m (x5 over 35m)      kubelet                  Container image "quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:5507f2c1074bfb023415f0fef16ec42fbe6e90c540fc45f1111c8c929e477910" already present on machine
  Normal   Created                 30m (x5 over 34m)      kubelet                  Created container noobaa-agent
  Normal   Started                 30m (x5 over 34m)      kubelet                  Started container noobaa-agent
  Warning  BackOff                 4m20s (x106 over 33m)  kubelet                  Back-off restarting failed container


So here the backing store cannot move from worker1 to worker0.

oc get pvc
NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
db-noobaa-db-pg-0                                  Bound    pvc-add68276-5bb5-496a-9e7b-c7faa4f88960   50Gi       RWO            ibm-spectrum-scale-sample   7h35m
noobaa-default-backing-store-noobaa-pvc-0e46f0f9   Bound    pvc-18a7288d-11c5-456f-ae1f-b199a7716ce3   50Gi       RWO            ibm-spectrum-scale-sample   7h33m
noobaa-s3resvol-pvc-4080029599                     Bound    noobaa-s3respv-4080029599                  50Gi       RWX                                        7h33m

@rkomandu
Author

rkomandu commented Jan 24, 2022

@baum, for HPO (IBM Spectrum Scale) we use NSFS, as you may well know. How does this backing store really affect us?

Please see below

oc get pods -o wide
NAME                                               READY   STATUS             RESTARTS         AGE     IP              NODE                                  NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running            0                7h38m   10.254.21.59    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running            0                7h38m   10.254.21.60    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-0e46f0f9   0/1     CrashLoopBackOff   11 (4m54s ago)   51m     10.254.18.127   worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-4465r                    0/1     Pending            0                51m     <none>          <none>                                <none>           <none>
noobaa-endpoint-8f4c64d67-b7fs2                    1/1     Running            0                7h35m   10.254.21.62    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-l2f8c                    0/1     Pending            0                51m     <none>          <none>                                <none>           <none>
noobaa-endpoint-8f4c64d67-n46s4                    1/1     Running            0                139m    10.254.21.63    worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-vsh7j                    1/1     Running            0                139m    10.254.18.94    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-8f4c64d67-z2hfx                    1/1     Running            0                7h35m   10.254.18.36    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-54877b7dc9-rcvf8                   1/1     Running            0                4d2h    10.254.19.40    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
ocs-metrics-exporter-7955bfc785-8dsgr              1/1     Running            0                4d2h    10.254.20.229   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
ocs-operator-57d785c8c7-q5smj                      1/1     Running            0                46m     10.254.18.124   worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
ocs-operator-57d785c8c7-qtqdv                      1/1     Terminating        19 (123m ago)    4d2h    10.254.14.140   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
odf-console-756c9c8bc7-4gtvv                       1/1     Running            0                4d2h    10.254.20.230   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
odf-operator-controller-manager-89746b599-27v9h    2/2     Running            19 (48m ago)     4d2h    10.254.20.228   worker2.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
rook-ceph-operator-74864f7c6f-8f8d2                1/1     Running            0                46m     10.254.18.125   worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>
rook-ceph-operator-74864f7c6f-k8l2w                1/1     Terminating        0                4d2h    10.254.14.141   worker1.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>


As a result of the kubelet service being down on the worker1 node, the rook-ceph-operator-74864f7c6f-k8l2w and ocs-operator-57d785c8c7-qtqdv pods are also stuck in Terminating state.

Could you talk to Nimrod about this?

@rkomandu
Author

@deeghuge ,

This defect is kind of a blocker for GA, and it was discussed in the HPO DCT call.

@deeghuge
Member

deeghuge commented Jan 24, 2022

Hi @rkomandu, this behaviour is expected if the node with the CSI attacher StatefulSet goes down. Since the attacher is down, pods moving from the kubelet-down node to other nodes will keep failing until the CSI attacher StatefulSet comes back.

For robustness we do have a documented suggestion for StatefulSet deployment. If followed, there is less chance of getting into the issue you are seeing.
https://www.ibm.com/docs/en/spectrum-scale-csi?topic=planning-deployment-considerations

Node selection for StatefulSets: CSI external attacher and CSI external provisioner are sidecar containers that run as two separate StatefulSets. These pods can be scheduled on any of the worker nodes by Kubernetes. As a best practice, it is recommended to run these pods on two separate stable nodes. The StatefulSets by design of Kubernetes do not automatically fail over to another node, hence it is recommended to schedule them to run on reliable nodes. On Red Hat OpenShift, if the infrastructure nodes are worker nodes, it is recommended to schedule the sidecar containers to run on infrastructure nodes. Scheduling them to run on specific nodes can be achieved by using nodes labels and nodeSelectors. For more information, see Using the node selector. IBM Spectrum Scale Container Storage Interface driver pod must also be scheduled on the nodes that run StatefulSets.
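
A minimal sketch of that recommendation, assuming two stable worker nodes are picked for the sidecars; the label key below is illustrative, and the exact node-selector fields to set in the CSI custom resource are the ones described in the linked documentation:

# Label two reliable nodes for the attacher/provisioner StatefulSets
oc label node worker0.rkomandu-ta.cp.fyre.ibm.com scale-csi-sidecar=yes
oc label node worker1.rkomandu-ta.cp.fyre.ibm.com scale-csi-sidecar=yes

# Verify before pointing the operator's node selectors at this label
oc get nodes -l scale-csi-sidecar=yes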

@rkomandu
Author

rkomandu commented Jan 24, 2022

Hi @deeghuge ,

For the HPO GA solution we have 3 MW (stacked master-worker) nodes, and failover/failback can occur on any node. Even a power or node failure might happen; how can that be addressed?

Note: On Fyre we are testing with 3M+3W due to memory constraints, and the above exercise is on this environment. Either way, it does affect the operation of HPO.

@troppens

@deeghuge, we are using this volume as a PV for a Postgres DB. We need a mechanism for fully automated HA, for instance in case of OCP node failures. Is this doable for a compact OpenShift cluster, and are there best practices for configuring CSI?

@Jainbrt
Member

Jainbrt commented Jan 24, 2022

@troppens, we already have the long-pending defect/request #65 to address HA for the attacher/provisioner sidecars, and due to resource limits it keeps getting de-prioritized.

@deeghuge
Member

deeghuge commented Jan 31, 2022

There are two scenarios to the reported problem:

  1. When the CSI attacher is running on the same node as the noobaa-db pod and the kubelet goes down, noobaa-db never comes up --> This is under investigation.
  2. When the CSI attacher and the noobaa-db pod run on different nodes and the kubelet goes down on the node where noobaa-db is running, the noobaa-db pod comes up, but it takes around ~6 minutes. I suspect this is Kubernetes behaviour around timeouts (see the sketch after this list). Any concern about this?
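
On the ~6 minutes in scenario 2, a minimal sketch for checking the timeouts involved, assuming Kubernetes defaults: pods get default not-ready/unreachable tolerations of 300 seconds before eviction, and as far as I know the attach/detach controller waits about 6 minutes before force-detaching a volume from a node it can no longer reach, which matches the delay observed here:

# Show the pod's default not-ready/unreachable tolerations (tolerationSeconds: 300)
oc get pod noobaa-db-pg-0 -n openshift-storage -o jsonpath='{.spec.tolerations}'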

Also, while testing on Ravi's setup, the noobaa-db pods keep crashing with the following errors. What could be the reason for this?

[[email protected] ~]# oc logs noobaa-db-pg-0
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-01-31 13:26:54.644 UTC [22] LOG:  starting PostgreSQL 12.9 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2022-01-31 13:26:54.647 UTC [22] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-01-31 13:26:54.651 UTC [22] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-01-31 13:26:54.721 UTC [22] LOG:  redirecting log output to logging collector process
2022-01-31 13:26:54.721 UTC [22] HINT:  Future log output will appear in directory "log".
 done
server started
/var/run/postgresql:5432 - accepting connections
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
ERROR:  tuple already updated by self

@rkomandu @baum

@troppens

@deeghuge - As a first step we need HA. 6 minutes is not ideal, but at least a baseline. Over time we can think about how to reduce the fail-over time.

@rkomandu
Author

rkomandu commented Feb 1, 2022

@deeghuge

I tried to follow the steps for resolving this using https://access.redhat.com/solutions/5668581. Thanks to @aspalazz for providing the downloaded file (as I don't have access to a Red Hat subscription).

I tried the steps as per the documentation, and it didn't get noobaa-db-pg-0 from CrashLoopBackOff into the Running state.

As there is no Deployment for noobaa-db-pg-0, only a StatefulSet, I tried setting the replicas of the pod to "0" and then back to "1":

[[email protected] ~]# oc scale statefulset noobaa-db-pg --replicas=0
statefulset.apps/noobaa-db-pg scaled

[[email protected] ~]# oc scale statefulset noobaa-db-pg --replicas=1
statefulset.apps/noobaa-db-pg scaled

This resulted in the noobaa-db-pg-0 pod coming back

noobaa-db-pg-0                                     1/1     Running   0               6m14s   10.254.18.22    worker0.rkomandu-ta.cp.fyre.ibm.com   <none>           <none>

A touch test of the accounts worked for now. Will try new users and IO.

FYI
 mmdas account list

 Name                   UID     GID     New buckets path
 ----                   ---     ---     ----------------
 [email protected]        5300    5555    /mnt/remote-sample/user-5300-bucket-27jan/
 [email protected]        5301    5555    /mnt/remote-sample/user-5301-bucket-27jan/
 [email protected]        5302    5555    /mnt/remote-sample/user-5302-bucket-27jan/

As per our discussion yesterday and @troppens' input, please continue to work on Scenarios 1 & 2.

[[email protected] ~]# oc scale statefulset noobaa-db-pg --replicas=0
statefulset.apps/noobaa-db-pg scaled
[[email protected] ~]# oc debug noobaa-db-pg-0
Error from server (NotFound): pods "noobaa-db-pg-0" not found
[[email protected] ~]# oc scale statefulset noobaa-db-pg --replicas=1
statefulset.apps/noobaa-db-pg scaled

@rkomandu
Author

rkomandu commented Feb 1, 2022

@deeghuge
For scenario 1, where noobaa-db-pg-0 and the csi-attacher are on different nodes, noobaa-db-pg-0 came back into the Running state in 6m X sec. However, there is a problem with new user creation and new bucket creation, even though noobaa-db-pg-0 is Active.

Can't create a new bucket even though noobaa-db-pg-0 is running
[root@rkomandu-app-node1 scripts]# s3u5300 mb s3://newbucket-worker1-noobaa-db-down-5300
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.17.127.178'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.17.127.178'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.17.127.178'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
make_bucket failed: s3://newbucket-worker1-noobaa-db-down-5300 Read timeout on endpoint URL: "https://10.17.127.178/newbucket-worker1-noobaa-db-down-5300"

Could upload to the already existing bucket

[root@rkomandu-app-node1 scripts]# s3u5300 cp /bin/date s3://newbucket-u5300-01feb
urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.17.127.178'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
upload: ../../bin/date to s3://newbucket-u5300-01feb/date

noobaa-db-pg-0 1/1 Running 0 73m 10.254.23.217 worker2.rkomandu-ta.cp.fyre.ibm.com

This is really a blocker. Now, since this is a DB issue, we have to ask the NooBaa team, or CSI will try to do a further check.

mmdas account create [email protected] --uid 5303 --gid 5555
Something went wrong while processing the request.
Check 'ibm-spectrum-scale-das-endpoint' pod logs in 'ibm-spectrum-scale-das' namespace for more details

@rkomandu
Author

rkomandu commented Feb 1, 2022

@baum, would you check from the noobaa-db perspective? When the node is brought down, noobaa-db-pg-0 moves to the other node, but some functionality has stopped.

For example, the following are failing:
-- any new bucket creation
-- any account creation

oc get nodes
NAME                                  STATUS     ROLES    AGE   VERSION
master0.rkomandu-ta.cp.fyre.ibm.com   Ready      master   53d   v1.22.0-rc.0+a44d0f0
master1.rkomandu-ta.cp.fyre.ibm.com   Ready      master   53d   v1.22.0-rc.0+a44d0f0
master2.rkomandu-ta.cp.fyre.ibm.com   Ready      master   53d   v1.22.0-rc.0+a44d0f0
worker0.rkomandu-ta.cp.fyre.ibm.com   Ready      worker   53d   v1.22.0-rc.0+a44d0f0
worker1.rkomandu-ta.cp.fyre.ibm.com   NotReady   worker   53d   v1.22.0-rc.0+a44d0f0
worker2.rkomandu-ta.cp.fyre.ibm.com   Ready      worker   53d   v1.22.0-rc.0+a44d0f0

@baum

baum commented Feb 1, 2022

@deeghuge, re: ERROR: tuple already updated by self during PostgreSQL startup - this might indicate corrupted database data structures such as indexes/catalogs. It might be part of crash recovery. Do you see the DB eventually start operating?

@baum

baum commented Feb 1, 2022

@rkomandu, a DB restart would cause an interruption for some period of time; however, once the DB is up and the noobaa core has reconnected, you should be able to create buckets, accounts, etc.

Is the NooBaa CR phase Ready?

➜  oc get noobaa noobaa
NAME     MGMT-ENDPOINTS                   S3-ENDPOINTS                     STS-ENDPOINTS                    IMAGE                                PHASE   AGE
noobaa   ["https://192.168.65.4:31337"]   ["https://192.168.65.4:31839"]   ["https://192.168.65.4:32196"]   noobaa/noobaa-core:5.10.0-20220120   Ready   5d4h

@rkomandu
Author

rkomandu commented Feb 1, 2022

@rkomandu, a DB restart would cause an interruption for some period of time; however, once the DB is up and the noobaa core has reconnected, you should be able to create buckets, accounts, etc.

Is the NooBaa CR phase Ready?

➜  oc get noobaa noobaa
NAME     MGMT-ENDPOINTS                   S3-ENDPOINTS                     STS-ENDPOINTS                    IMAGE                                PHASE   AGE
noobaa   ["https://192.168.65.4:31337"]   ["https://192.168.65.4:31839"]   ["https://192.168.65.4:32196"]   noobaa/noobaa-core:5.10.0-20220120   Ready   5d4h

NooBaa is Ready; otherwise, how would the IO have worked for uploading data to the existing buckets?

oc get noobaa noobaa
NAME     MGMT-ENDPOINTS                    S3-ENDPOINTS                                                                                  IMAGE                                                                                                            PHASE   AGE
noobaa   ["https://10.17.127.141:32227"]   ["https://10.17.126.253:32532","https://10.17.126.141:32532","https://10.17.127.141:32532"]   quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:5507f2c1074bfb023415f0fef16ec42fbe6e90c540fc45f1111c8c929e477910   Ready   5d10h


There is still a problem in the DB which the NooBaa team needs to investigate as per my post above: the DB is running, but accounts and buckets can't be created.

@rkomandu
Author

rkomandu commented Feb 7, 2022

@deeghuge ,
The NooBaa team wants to close bug 6853, which originated there and was then moved to CSI as this issue. Can I close that defect, or do you still need anything from them?

@deeghuge
Member

deeghuge commented Feb 7, 2022

Hi @rkomandu, we can close 6853 for now. If required in the future, after the CSI fix, we can reopen it.

@deeghuge deeghuge added this to the v2.6.0 milestone Mar 10, 2022
@rkomandu
Author

rkomandu commented Mar 31, 2022

@deeghuge

Just a comment here: this is on 2.5.0 with csi-attacher-0/1 running as a StatefulSet on two different nodes

NAME READY AGE
statefulset.apps/ibm-spectrum-scale-csi-attacher 2/2 8d

oc get lease -n ibm-spectrum-scale-csi
NAME HOLDER AGE
external-attacher-leader-spectrumscale-csi-ibm-com ibm-spectrum-scale-csi-attacher-1 8d
ibm-spectrum-scale-csi-operator ibm-spectrum-scale-csi-operator-9c4684b76-m49sp_1c5098e4-e3d3-4a36-843d-0e9056bec625 8d

NAME                                READY   STATUS    RESTARTS         AGE   IP             NODE                                   NOMINATED NODE   READINESS GATES
ibm-spectrum-scale-csi-attacher-0 1/1 Running 132 (142m ago) 8d 10.254.16.26 worker0.rkomandu-513.cp.fyre.ibm.com
ibm-spectrum-scale-csi-attacher-1 1/1 Running 134 (142m ago) 8d 10.254.12.24 worker1.rkomandu-513.cp.fyre.ibm.com

noobaa-db-pg-0 is running on the worker2 node. Brought the worker2 node down.

oc get nodes
NAME STATUS ROLES AGE VERSION
worker0.rkomandu-513.cp.fyre.ibm.com Ready worker 8d v1.22.3+e790d7f
worker1.rkomandu-513.cp.fyre.ibm.com Ready worker 8d v1.22.3+e790d7f
worker2.rkomandu-513.cp.fyre.ibm.com NotReady worker 8d v1.22.3+e790d7f

Now the noobaa-db-pg-0 pod takes about 6m X sec to migrate to the worker1 node, as it waits in the Init state for that long.

noobaa-db-pg-0 1/1 Running 0 35m 10.254.12.27 worker1.rkomandu-513.cp.fyre.ibm.com

Events showed the following:

  Warning  FailedMount             4m23s      kubelet                  Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[db kube-api-access-ntdws noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume]: timed out waiting for the condition
  Warning  FailedMount             2m7s       kubelet                  Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume db kube-api-access-ntdws]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  25s        attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-d8bbc960-29cc-4723-a6ee-dab2bdc56ec2"

After 6m X sec it moved from the Init state to the Running state on the worker1 node (migrated from worker2 --> worker1).

With the StatefulSet in place for the csi-attacher, isn't the volume taken into consideration?

@troppens , we need to add this into our documentation.

@Jainbrt Jainbrt added Customer Probability: Medium (3) Issue occurs in normal path but specific limited timing window, or other mitigating factor Customer Impact: Localized high impact (3) Reduction of function. Significant impact to workload. Severity: 2 Indicates that the issue is critical and must be addressed before milestone. labels Apr 7, 2022
@deeghuge
Member

@nitishkumar4 @Jainbrt This should be fixed by #722, right?

@Jainbrt
Member

Jainbrt commented May 26, 2022

@rkomandu could you please verify the fix with the latest CSI 2.6.0 images?

@amdabhad amdabhad added the Type: Needs Verification by Originator A fix (code, doc update, workaround) has been given and needs verification + closure if agreed label May 31, 2022
@rkomandu
Author

rkomandu commented Jun 7, 2022

We are in the process of installing the CNSA 514 interim builds. Once we do that, we will be able to verify this.

@deeghuge
Member

@rkomandu Please help verify and close this

@deeghuge deeghuge linked a pull request Jun 30, 2022 that will close this issue
@NeeshaPan

NeeshaPan commented Jul 8, 2022

After failback of the noobaa-db and noobaa-core pods that were running on the same node, noobaa-db does not come up; it remains in CrashLoopBackOff state.

Steps to reproduce:

  1. Fail over the node that has both the noobaa-db and noobaa-core pods running, using the systemctl stop kubelet command.
  2. Try to run I/O and create an account.
  3. After that, do the failback using the systemctl start kubelet command.
  4. After failback, the noobaa-db pod is in CrashLoopBackOff state.

Output of systemctl stop kubelet command

[core@hpo-app15 ~]$ sudo systemctl stop kubelet
[core@hpo-app15 ~]$
[core@hpo-app15 ~]$ sudo systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
   Active: inactive (dead) since Thu 2022-07-07 10:07:21 UTC; 8s ago

After failover noobaa-db is up and running

Output of openshift-storage pods after failover

Every 5.0s: oc get pods -n openshift-storage -o wide; oc get pods -n ibm-spectrum-scale-das -o wide; oc get svc -A |g...  hpo-app11: Thu Jul  7 06:16:44 2022
NAME                                               READY   STATUS        RESTARTS      AGE     IP             NODE        NOMINATED NODE   READINESS GATES
csi-addons-controller-manager-5cf89687fb-7cknl     2/2     Terminating   0             3d1h    10.128.5.69    hpo-app15   <none>           <none>
csi-addons-controller-manager-5cf89687fb-xklml     2/2     Running       0             3m36s   10.128.2.157   hpo-app13   <none>           <none>
noobaa-core-0                                      1/1     Running       0             8m42s   10.128.2.119   hpo-app13   <none>           <none>
noobaa-db-pg-0                                     1/1     Running       0             8m42s   10.128.2.164   hpo-app13   <none>           <none>
noobaa-default-backing-store-noobaa-pod-7db3b453   1/1     Running       0             5d15h   10.128.1.8     hpo-app12   <none>           <none>
noobaa-endpoint-8cf8d9bfc-4b5sr                    1/1     Running       1 (44h ago)   5d15h   10.128.1.10    hpo-app12   <none>           <none>
noobaa-endpoint-8cf8d9bfc-gv4fq                    1/1     Running       0             3h39m   10.128.2.103   hpo-app13   <none>           <none>
noobaa-endpoint-8cf8d9bfc-mp9q8                    0/1     Pending       0             8m42s   <none>         <none>      <none>           <none>
noobaa-operator-58789697c6-92t7g                   1/1     Running       0             3d1h    10.128.1.138   hpo-app12   <none>           <none>
ocs-metrics-exporter-77d97594f4-2fpjs              1/1     Running       0             3m36s   10.128.2.163   hpo-app13   <none>           <none>
ocs-metrics-exporter-77d97594f4-zsbc5              1/1     Terminating   0             3d1h    10.128.5.73    hpo-app15   <none>           <none>
ocs-operator-8668749db6-gss52                      1/1     Running       0             3d1h    10.128.1.141   hpo-app12   <none>           <none>
odf-console-5f886c99d6-7fj7w                       1/1     Running       0             5d16h   10.128.1.0     hpo-app12   <none>           <none>
odf-operator-controller-manager-7bfb6545cd-mvnfj   2/2     Running       0             3d1h    10.128.1.140   hpo-app12   <none>           <none>
rook-ceph-operator-86698f57bc-kv9hf                1/1     Running       0             3m36s   10.128.0.141   hpo-app12   <none>           <none>
rook-ceph-operator-86698f57bc-zjbpf                1/1     Terminating   0             3d1h    10.128.5.70    hpo-app15   <none>           <none>
NAME                                                         READY   STATUS              RESTARTS        AGE     IP             NODE        NOMINATED NODE   READINESS GATES
ibm-spectrum-scale-das-controller-manager-5fffd98fcf-4g4v9   0/2     ContainerCreating   0               3m36s   <none>         hpo-app13   <none>           <none>
ibm-spectrum-scale-das-controller-manager-5fffd98fcf-qbccz   2/2     Terminating         2 (3h39m ago)   5d16h   10.128.5.72    hpo-app15   <none>           <none>
ibm-spectrum-scale-das-endpoint-879746999-2ph77              1/1     Running             0               3m36s   10.128.2.155   hpo-app13   <none>           <none>
ibm-spectrum-scale-das-endpoint-879746999-67h25              1/1     Running             0               5d15h   10.128.1.9     hpo-app12   <none>           <none>
ibm-spectrum-scale-das-endpoint-879746999-nfwwr              1/1     Terminating         0               3d1h    10.128.5.74    hpo-app15   <none>           <none>
openshift-storage   das-s3-hpo-app12   LoadBalancer   172.30.231.104   10.49.0.109   80:30328/TCP,443:31631/TCP,8444:31176/TCP,7004:30190/TCP   5d15h
openshift-storage   das-s3-hpo-app13   LoadBalancer   172.30.220.173   10.49.0.110   80:30546/TCP,443:31607/TCP,8444:30223/TCP,7004:30416/TCP   5d15h
openshift-storage   das-s3-hpo-app15   LoadBalancer   172.30.42.127    10.49.0.111   80:32735/TCP,443:30824/TCP,8444:32166/TCP,7004:30013/TCP   5d15h
openshift-storage   noobaa-mgmt        LoadBalancer   172.30.159.60    <pending>     80:31396/TCP,443:30132/TCP,8445:30397/TCP,8446:31731/TCP   5d15h
openshift-storage   s3                 LoadBalancer   172.30.206.147   <pending>     80:30934/TCP,443:30179/TCP,8444:32357/TCP,7004:32195/TCP   5d15h

Account creation output

 [root@hpo-app11 ~]# mmdas account create s3user2tmp --uid 8092  --gid 9002  --newBucketsPath /mnt/remote-sample/export-user2
Account is created successfully. The secret and access keys are as follows.
 Secret Key                                     Access Key
 ----------                                     -----------
 686pN2fJCKKAlBBaxaW0bd6j6K9CKkWPe/T/jxVj       O8awlIyjFi9N4ODg11b7

Output of bucket create command

root@hpo-app11 cnsa-514-builds]# s3user1 mb s3://newbuckettemp
/usr/local/aws/lib/python3.6/site-packages/urllib3/connectionpool.py:1050: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.49.0.109'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
make_bucket: newbuckettemp

[root@hpo-app11 cnsa-514-builds]# s3user2 mb s3://newbucket
/usr/local/aws/lib/python3.6/site-packages/urllib3/connectionpool.py:1050: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.49.0.110'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
make_bucket: newbucket

[root@hpo-app11 ~]# s3user3  mb s3://newbucketuser3
/usr/local/aws/lib/python3.6/site-packages/urllib3/connectionpool.py:1050: InsecureRequestWarning: Unverified HTTPS request is being made to host '10.49.0.111'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
make_bucket: newbucketuser3

Failback

[core@hpo-app15 ~]$ sudo systemctl start kubelet
[core@hpo-app15 ~]$ sudo systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-mco-default-madv.conf, 20-logging.conf, 20-nodenet.conf
   Active: active (running) since Thu 2022-07-07 10:23:56 UTC; 10s ago
  Process: 2012301 ExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state (code=exited, status=0/SUCCESS)
  Process: 2012298 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 2012294 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS

Openshift-storage pods

Every 5.0s: oc get pods -n openshift-storage -o wide; oc get pods -n ibm-spectrum-scale-das -o wide; oc get svc -A |g...  hpo-app11: Thu Jul  7 06:43:16 2022

NAME                                               READY   STATUS             RESTARTS        AGE     IP             NODE        NOMINATED NODE   READINESS GATES
csi-addons-controller-manager-5cf89687fb-xklml     2/2     Running            0               30m     10.128.2.157   hpo-app13   <none>           <none>
noobaa-core-0                                      1/1     Running            0               35m     10.128.2.119   hpo-app13   <none>           <none>
noobaa-db-pg-0                                     0/1     CrashLoopBackOff   8 (2m47s ago)   35m     10.128.2.164   hpo-app13   <none>           <none>
noobaa-default-backing-store-noobaa-pod-7db3b453   1/1     Running            0               5d16h   10.128.1.8     hpo-app12   <none>           <none>
noobaa-endpoint-8cf8d9bfc-4b5sr                    1/1     Running            2 (3m8s ago)    5d16h   10.128.1.10    hpo-app12   <none>           <none>
noobaa-endpoint-8cf8d9bfc-gv4fq                    1/1     Running            0               4h5m    10.128.2.103   hpo-app13   <none>           <none>
noobaa-endpoint-8cf8d9bfc-mp9q8                    1/1     Running            0               35m     10.128.5.144   hpo-app15   <none>           <none>
noobaa-operator-58789697c6-92t7g                   1/1     Running            0               3d2h    10.128.1.138   hpo-app12   <none>           <none>
ocs-metrics-exporter-77d97594f4-2fpjs              1/1     Running            0               30m     10.128.2.163   hpo-app13   <none>           <none>
ocs-operator-8668749db6-gss52                      1/1     Running            0               3d2h    10.128.1.141   hpo-app12   <none>           <none>
odf-console-5f886c99d6-7fj7w                       1/1     Running            0               5d16h   10.128.1.0     hpo-app12   <none>           <none>
odf-operator-controller-manager-7bfb6545cd-mvnfj   2/2     Running            0               3d2h    10.128.1.140   hpo-app12   <none>           <none>
rook-ceph-operator-86698f57bc-kv9hf                1/1     Running            0               30m     10.128.0.141   hpo-app12   <none>           <none>
NAME                                                         READY   STATUS    RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
ibm-spectrum-scale-das-controller-manager-5fffd98fcf-4g4v9   2/2     Running   0          30m     10.128.2.156   hpo-app13   <none>           <none>
ibm-spectrum-scale-das-endpoint-879746999-2ph77              1/1     Running   0          30m     10.128.2.155   hpo-app13   <none>           <none>
ibm-spectrum-scale-das-endpoint-879746999-67h25              1/1     Running   0          5d16h   10.128.1.9     hpo-app12   <none>           <none>

Account creation after failback

[root@hpo-app11 ~]# mmdas account create s3user2tmp1 --uid 8092  --gid 9002  --newBucketsPath /mnt/remote-sample/export-user2
this.begin() must be called before sending queries on this transaction

Output of oc describe pod

Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               33m                 default-scheduler        Successfully assigned openshift-storage/noobaa-db-pg-0 to hpo-app13
  Warning  FailedAttachVolume      33m                 attachdetach-controller  Multi-Attach error for volume "pvc-d682f776-a1a5-4b10-8b3b-6dcf60aa07dd" Volume is already exclusively attached to one node and can't be attached to another
  Warning  FailedMount             31m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-config-volume db kube-api-access-c4hgw noobaa-postgres-initdb-sh-volume]: timed out waiting for the condition
  Warning  FailedMount             28m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[db kube-api-access-c4hgw noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume]: timed out waiting for the condition
  Normal   SuccessfulAttachVolume  27m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-d682f776-a1a5-4b10-8b3b-6dcf60aa07dd"
  Normal   Pulled                  27m                 kubelet                  Container image "quay.io/rhceph-dev/odf4-mcg-core-rhel8@sha256:130374df22aea4a27219f2f927d2e786f95f8ffe639bc373397e9335594c662b" already present on machine
  Normal   AddedInterface          27m                 multus                   Add eth0 [10.128.2.164/23] from openshift-sdn
  Normal   Pulling                 27m                 kubelet                  Pulling image "quay.io/rhceph-dev/rhel8-postgresql-12@sha256:82d171ab0ce78a0157408662155b53d4f637947a303bfecb684f6132f5f468be"
  Normal   Created                 27m                 kubelet                  Created container init
  Normal   Started                 27m                 kubelet                  Started container init
  Normal   Pulled                  26m                 kubelet                  Successfully pulled image "quay.io/rhceph-dev/rhel8-postgresql-12@sha256:82d171ab0ce78a0157408662155b53d4f637947a303bfecb684f6132f5f468be" in 10.700943271s
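
One way to investigate the Multi-Attach error above (a hedged sketch; the PV name is the one from the events, and the pod name placeholder is illustrative) is to check which node still holds the VolumeAttachment and whether the old pod instance is still listed on the node whose kubelet was stopped:

# Which node does the attach/detach controller think still holds the volume?
oc get volumeattachment | grep pvc-d682f776-a1a5-4b10-8b3b-6dcf60aa07dd
# Is an old noobaa-db pod instance still listed on the node whose kubelet was stopped?
oc get pods -n openshift-storage -o wide | grep noobaa-db-pg
# If so, force-deleting it allows the controller to detach the RWO volume and
# attach it on the new node (this skips graceful termination, so use with care):
oc delete pod <old-noobaa-db-pod> -n openshift-storage --grace-period=0 --force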

Snippet of the noobaa-db pod log

[root@hpo-app11 ~]# oc logs noobaa-db-pg-0 -n openshift-storage
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-07-07 12:28:09.645 UTC [22] LOG:  starting PostgreSQL 12.11 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-07-07 12:28:09.645 UTC [22] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-07-07 12:28:09.646 UTC [22] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-07-07 12:28:09.670 UTC [22] LOG:  redirecting log output to logging collector process
2022-07-07 12:28:09.670 UTC [22] HINT:  Future log output will appear in directory "log".
 done
server started
/var/run/postgresql:5432 - accepting connections
=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
ERROR:  tuple concurrently updated
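
The "another server might be running" warning together with the "tuple concurrently updated" error suggests PostgreSQL found state left behind by the previous pod instance in the data directory on the shared PV. A minimal diagnostic sketch (assuming the usual PGDATA layout of the RHEL PostgreSQL 12 image; the exact path may differ) is to look for a stale postmaster.pid:

# Show the data directory and any postmaster.pid left behind by the previous instance
oc -n openshift-storage exec noobaa-db-pg-0 -- bash -c 'echo "PGDATA=$PGDATA"; cat "$PGDATA/postmaster.pid"'
# A PID and start time recorded by the old pod would explain why the
# failed-over instance believes another server might still be running.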

@NeeshaPan

FYI @deeghuge

@deeghuge
Member

@nitishkumar4 Please take a look

@nitishkumar4
Contributor

nitishkumar4 commented Jul 14, 2022

@NeeshaPan As discussed, the issue was fixed in the above cluster by gracefully stopping all PostgreSQL services and starting them again. However, to confirm whether this issue is reproducible and is linked to CSI, I will need a cluster to investigate on. Since the cluster where this issue was observed is currently being used for other purposes, please let me know whenever the cluster is available.

@NeeshaPan

We have a similar issue, i.e. noobaa/noobaa-core#6953, opened in the noobaa repo.

@rkomandu
Author

rkomandu commented Aug 1, 2022

@NeeshaPan, please continue to try this and let us see whether the CSI functionality is working or not.

@NeeshaPan

@deeghuge Could you please take a look at this issue?
As mentioned in the above comment, i.e. #563 (comment), noobaa-db remains in CrashLoopBackOff state after failback. After discussing this with the noobaa team, we found that noobaa-db stays in CrashLoopBackOff because the mounted volume is still attached to the old node. This issue is reproducible on BM, but not on Fyre.

Attached is a file that contains the output of the pods & logs and the procedure followed for FOFB on the latest build.
OutputsCapturedDuringFOFB.txt

@rkomandu
Author

rkomandu commented Sep 6, 2022

@deeghuge ,

Can we meet around 4 PM for 30 minutes to discuss the failures seen with the above fix? We have discussed with the Noobaa team over the last 2 weeks, and they believe the volume being accessed from 2 different nodes is the reason the Postgres database is failing.

Sending the invite for the same. Please add anyone else you need to join.

@NeeshaPan FYI.

@deeghuge deeghuge assigned deeghuge and unassigned nitishkumar4 Sep 6, 2022
@deeghuge
Member

deeghuge commented Sep 6, 2022

Thanks @rkomandu and @NeeshaPan for the discussion. Here is a summary of the discussion.

The original issue was that the db pod was not getting into the Running state because the attacher (statefulset) was unavailable. The attacher issue was fixed, the newly uploaded logs show the same, and what we are seeing now is clearly a different issue from the original one.

Here are a few questions we need to get answered by the noobaa and k8s experts.

  1. Why was the noobaa db restarted when the kubelet on the app7 node was started?
  2. Why is the noobaa db in CrashLoopBackOff? You mentioned that it is because there are two db writers on different nodes. Other than the noobaa db pod, no other entity such as the CSI driver or Spectrum Scale writes anything to any volume. So the question is: is the old db pod on the app7 node, which should have been cleaned up by k8s while the kubelet was down, causing that write once the kubelet on the app7 node is started again? I think we need a kubernetes/kubelet expert to help here.
  3. I also noticed in one of the comments that this is reproducible only on a compact cluster, so we should check with an expert whether shutting down one master node causes any interference in the cleanup process.

To further debug the multi-attach error seen on the restarted db pod (which went away after some time), we should capture the output of oc get volumeattachment at various stages, for example as sketched below.

Also, please note that Spectrum Scale is a shared filesystem, so the data is always available on all the nodes. We rely on k8s for the ReadWriteOnce functionality, so it is k8s that makes sure only one pod accesses an RWO volume at any given time.
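
One possible way to capture the requested data consistently (an illustrative sketch; the stage and file names are placeholders) is to run the same set of commands before the kubelet stop, after the failover, and after the failback:

STAGE=after-failover   # e.g. before-kubelet-stop | after-failover | after-failback
oc get volumeattachment -o wide > volumeattachment-${STAGE}.txt
oc get pods -n openshift-storage -o wide > pods-${STAGE}.txt
oc describe pod noobaa-db-pg-0 -n openshift-storage > describe-noobaa-db-${STAGE}.txt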

@rkomandu
Author

rkomandu commented Sep 7, 2022

@NeeshaPan, take a look at Deepak's comment when collecting logs for this situation next time.

Secondly, also "describe the noobaa-db pod" after the failover and later once the failback is done. We need to understand when these Events occur.

@rkomandu
Author

rkomandu commented Sep 9, 2022

@deeghuge
Alex from the Noobaa team has responded to the above questions. He is still pointing to the PV for the noobaa database.

Could you add yourself to the noobaa repository, so you can respond to his questions in noobaa/noobaa-core#6953.

@Jainbrt
Member

Jainbrt commented Nov 18, 2022

@deeghuge & @rkomandu, can we close this issue if the original problem is fixed, and open a new one if anything remains?

@rkomandu
Author

@Jainbrt, the issue is not resolved yet. A few emails have been sent to the RH team, since Deepak said this has to be dealt with in the underlying K8s, but there has been no response; they are pointing to something CSI needs to implement, as there was a mention of some RWO setting (if I recall) ...

@Jainbrt Jainbrt removed the "Type: Needs Verification by Originator" label Nov 18, 2022
@Jainbrt
Member

Jainbrt commented Nov 18, 2022

Thanks @rkomandu for the update; I have removed the verification label from the same.

@deeghuge
Member

Closing this issue as there has been no update for a long time. Please reopen if RH comes back with an analysis and changes are required in CSI.
