PVC attached to a pod doesn't migrate across nodes when Kubelet Service is stopped #563
Comments
@rkomandu, the db pod was scheduled eventually, so could you provide the logs of the db pod's "init" init container?
Thank you!
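For reference, a minimal sketch of how those init-container logs could be collected (the openshift-storage namespace is an assumption based on later comments; adjust names as needed):

```
# List the init containers of the db pod (the container name "init" is taken from this thread)
oc get pod noobaa-db-pg-0 -n openshift-storage -o jsonpath='{.spec.initContainers[*].name}{"\n"}'

# Logs of the current init container instance
oc logs noobaa-db-pg-0 -c init -n openshift-storage

# If the pod restarted, the previous instance's logs are often more useful
oc logs noobaa-db-pg-0 -c init -n openshift-storage --previous
```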
I don't have the cluster in that state now. This was tried about a week back and was first opened in the NooBaa GitHub repo. It needs to be recreated; maybe you can try the steps mentioned, as it has nothing to do with NooBaa itself.
Hi @rkomandu
Hi @deeghuge, yes, the CSI attacher seems to have been running on worker0 for 39d. I am concerned about this scenario because we have a noobaa-endpoint running on each node, which means every node serves I/O internally when requests come from application nodes. For noobaa-db, since a PVC is attached, we observe that the pod cannot move when a failover is detected. Please see if there is a way to fix this. If there is no way to fix it, then it should be documented as a limitation in the field, which is risky, since the node serving noobaa-db can go down for various reasons (error injection or hardware problems on the node). Note: I tried stopping the kubelet service on other nodes (for example, nodes running the noobaa-core pod, which has no PVC, or a noobaa-endpoint) and there was no problem for our I/O; because the HA service is configured, the IP moved over and I/O was successful.
@baum, do you still need the kubectl logs for noobaa-db? Reason for asking: the noobaa-db pod restarted because the data it required was on the storage cluster (FS) whose worker node was down. The Fyre admin brought the node back to Active state six days ago.
@baum, it shows empty now: `kubectl logs noobaa-db-pg-0 -c init`
@deeghuge, re the StatefulSet comment: rescheduling a StatefulSet's pods upon a k8s node failure (caused by ...)
Best regards
From the CSI side, a CSI StatefulSet needs manual intervention to move from a failing node to another one, or you have to wait for Kubernetes to take corrective action, which can take a long time.
We are investigating a solution to fix this, but we have a very short runway for the next release, so we can't commit to a fix unless we complete the investigation.
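For context, the usual manual intervention for a StatefulSet pod stuck on a failed node is a force delete, so the controller can recreate it on a healthy node once the volume attachment is released. A sketch, assuming the pod and namespace names used elsewhere in this thread (force deletion bypasses normal StatefulSet safety guarantees, so use with care):

```
# Pod stuck in Terminating/Unknown on the node whose kubelet is down
oc get pods -n openshift-storage -o wide | grep noobaa-db-pg-0

# Force delete so the StatefulSet controller can recreate the pod elsewhere
oc delete pod noobaa-db-pg-0 -n openshift-storage --grace-period=0 --force

# Watch the replacement pod get scheduled
oc get pods -n openshift-storage -o wide -w
```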
One more observation: the default noobaa backing store has the same PVC-attach problem when the kubelet service is stopped.
@baum, for HPO (IBM SS) we use NSFS, as you may very well know; how does this backing store really affect us? Please see below.
As a result of the worker1 node's kubelet service being down, the rook-ceph-operator-74864f7c6f-k8l2w and ocs-operator-57d785c8c7-qtqdv pods are also in Terminating state. Could you talk to Nimrod about this?
This defect is kind of a blocker for GA, and it was discussed in the HPO DCT call.
Hi @rkomandu, this behaviour is expected if the node with the CSI attacher StatefulSet goes down. Since the attacher is down, pods moving from the kubelet-down node to other nodes will keep failing until the CSI attacher StatefulSet comes back. For robustness we do have a documented suggestion for the StatefulSet deployment; if it is followed, there is much less chance of getting into the issue you are seeing.
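A quick way to check whether this is the situation, i.e. whether the CSI attacher pod sits on the node whose kubelet was stopped (namespace as in the environment output at the end of this issue):

```
# Which node is the CSI attacher StatefulSet pod running on?
oc get pods -n ibm-spectrum-scale-csi -o wide | grep attacher

# Is that node still Ready, or NotReady because its kubelet is down?
oc get nodes

# VolumeAttachment objects the attacher would normally reconcile
oc get volumeattachments
```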
Hi @deeghuge, for the HPO GA solution we have 3MW (stacked master-worker) nodes, and failover/failback can occur on any node. Even power or node failures might happen, so how can this be addressed? Note: on Fyre we are testing with 3M+3W due to memory constraints, and the above exercise was done on that environment. Either way, it does affect the operation of HPO.
@deeghuge, we are using this volume as a PV for a Postgres DB. We need a mechanism for fully automated HA, for instance in case of OCP node failures. Is this doable for a compact OpenShift cluster, and are there best practices for configuring CSI?
There are two scenarios to the reported problem.
Also, while testing on Ravi's setup, the noobaa-db pods keep crashing with the following errors. What could be the reason for this?
@deeghuge - As a first step we need HA. Six minutes is not ideal, but at least it is a baseline. Over time we can think about how to reduce the failover time.
I tried to follow the steps for resolving this using https://access.redhat.com/solutions/5668581. Thanks to @aspalazz for providing the downloaded file (as I don't have access to a Red Hat subscription). I tried the steps as per the documentation, and they didn't get noobaa-db-pg-0 from CrashLoopBackOff into Running state. As there is no deployment for noobaa-db-pg-0, only a StatefulSet, I set the pod's replicas to "0" and then back to "1" with `oc scale statefulset noobaa-db-pg --replicas=0` followed by `oc scale statefulset noobaa-db-pg --replicas=1`. This resulted in the noobaa-db-pg-0 pod coming back.
A touch test of accounts worked for now. Will try new users and I/O next.
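For reference, the same scale-down/scale-up workaround as a runnable sequence (the openshift-storage namespace is an assumption based on later comments; verify before use):

```
# Scale the noobaa-db StatefulSet down and back up to force the pod to be recreated
oc scale statefulset noobaa-db-pg --replicas=0 -n openshift-storage
oc scale statefulset noobaa-db-pg --replicas=1 -n openshift-storage

# Confirm the pod comes back and note which node it lands on
oc get pods -n openshift-storage -o wide | grep noobaa-db-pg
```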
As per our discussion yesterday and @troppens' input, please continue to work on Scenario 1 & 2.
@deeghuge

Could upload to the already existing bucket:

```
[root@rkomandu-app-node1 scripts]# s3u5300 cp /bin/date s3://newbucket-u5300-01feb

noobaa-db-pg-0   1/1   Running   0   73m   10.254.23.217   worker2.rkomandu-ta.cp.fyre.ibm.com
```
@baum, would you check from the noobaa-db perspective? If the node is taken down, noobaa-db-pg-0 moves to the other node, but its functioning has stopped; for example, the following are failing:
@deeghuge, re `ERROR: tuple already updated by self`: during PostgreSQL startup this might indicate corrupted database data structures such as indexes/catalogs. It might be part of crash recovery. Do you see the DB eventually start operating?
@rkomandu, a DB restart causes an interruption for some period of time; however, once the DB is up and the noobaa core has reconnected, you should be able to create buckets, accounts, etc. Is the NooBaa CR phase Ready? For example: ➜ oc get noobaa noobaa
NAME MGMT-ENDPOINTS S3-ENDPOINTS STS-ENDPOINTS IMAGE PHASE AGE
noobaa ["https://192.168.65.4:31337"] ["https://192.168.65.4:31839"] ["https://192.168.65.4:32196"] noobaa/noobaa-core:5.10.0-20220120 Ready 5d4h
NooBaa is Ready; otherwise, how would the I/O have worked for uploading data to the existing buckets?
There is still a problem in the DB which the NooBaa team needs to investigate, as per my post above: the DB is running, but accounts and buckets can't be created.
@deeghuge,
Hi @rkomandu, we can close 6853 for now. If required in the future, after the CSI fix, we can reopen it.
Just a comment here: this is on 2.5.0 with csi-attacher-0/1 running as a StatefulSet on two different nodes (checked with `oc get lease -n ibm-spectrum-scale-csi` and `oc get pods -o wide`; output omitted). noobaa-db-pg-0 was running on the worker2 node. I took the worker2 node down and confirmed it with `oc get nodes`. The noobaa-db-pg-0 pod then took about 6m Xsec to migrate to the worker1 node, waiting in Init state for all those minutes:

noobaa-db-pg-0 1/1 Running 0 35m 10.254.12.27 worker1.rkomandu-513.cp.fyre.ibm.com

After 6m Xsec it moved from Init state to Running state on the worker1 node (migrated from worker2 --> worker1). With the StatefulSet in place for csi-attacher, is the volume still not taken into consideration? @troppens, we need to add this to our documentation.
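A sketch of the checks behind this kind of node-down test (the exact command set is inferred from the comment above; namespaces match this environment):

```
# Where are the attacher replicas and their leader-election leases?
oc get pods -n ibm-spectrum-scale-csi -o wide | grep attacher
oc get lease -n ibm-spectrum-scale-csi

# Take the node hosting noobaa-db-pg-0 down (run on that node):
#   systemctl stop kubelet

# Confirm the node goes NotReady and time how long the db pod takes to migrate
oc get nodes
oc get pods -n openshift-storage -o wide -w
```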
@nitishkumar4 @Jainbrt, this should be fixed by #722, right?
@rkomandu, could you please verify the fix with the latest CSI 2.6.0 images?
We are in the process of installing the CNSA 514 interim builds. Once we do that, we will be able to verify this.
@rkomandu, please help verify and close this.
After failback of the noobaa-db and noobaa-core pods that were running on the same node, noobaa-db does not come up; it remains in CrashLoopBackOff state. Steps to reproduce:
Output of systemctl stop kubelet command
After failover, noobaa-db is up and running. Output of openshift-storage pods after failover
Account creation output
Output of bucket create command
Failback
Openshift-storage pods
Account creation after failback
Output of oc describe pod
Snippet of noobaa log
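The outputs referenced above were attached to the original report and are not reproduced here. A minimal sketch of the failover/failback check sequence being described (node names and namespace are assumptions):

```
# Failover: on the node currently hosting noobaa-db-pg-0, stop the kubelet
#   systemctl stop kubelet
# then verify the pod comes up elsewhere and retry account/bucket creation
oc get pods -n openshift-storage -o wide

# Failback: restart the kubelet on the original node
#   systemctl start kubelet

# If noobaa-db-pg-0 ends up in CrashLoopBackOff, capture details
oc get pods -n openshift-storage -o wide
oc describe pod noobaa-db-pg-0 -n openshift-storage
oc logs noobaa-db-pg-0 -n openshift-storage --all-containers
```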
FYI @deeghuge
@nitishkumar4 Please take a look
@NeeshaPan, as discussed, the issue was fixed on the above cluster by gracefully stopping all Postgres services and starting them again. However, to confirm whether this issue is reproducible and linked to CSI, I will require a cluster to look into. Since the cluster where this issue was observed is being used for another purpose right now, please let me know whenever the cluster is available.
We have a similar issue, i.e. noobaa/noobaa-core#6953, opened in the noobaa repo.
@NeeshaPan, continue to try this and let us see whether the CSI functionality is working or not.
@deeghuge, could you please take a look into this issue? Attached is a file that has the output of pods & logs and the procedure followed for FOFB on the latest build.
Can we meet around 4 PM for 30 minutes to discuss the failures with the above fix? We discussed with the NooBaa team over the last two weeks, and they are saying that the volume being accessed from two different nodes is the reason the Postgres database is failing. Sending the invite for the same; please add anyone you need to join. @NeeshaPan FYI.
Thanks @rkomandu and @NeeshaPan for the discussion. Here is the summary of the discussion. The original issue was the db pod not getting into Running state because the attacher (StatefulSet) was unavailable. The attacher issue was fixed, and the newly uploaded logs show the same, so this is clearly a new issue, different from the original. Here are a few questions we need to get answered by the noobaa and k8s experts.
To debug the Multi-Attach error seen on the restarted db pod (which went away after some time), we should capture the output of ... Also please note that Spectrum Scale is a shared filesystem, so data is always available on all nodes. We rely on k8s for the ReadWriteOnce functionality, so it is k8s that makes sure only one pod is accessing an RWO volume at any given time.
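The exact commands were not captured in the thread; a plausible set for investigating a Multi-Attach error on a ReadWriteOnce volume (resource names from this environment, verify before use) would be:

```
# Which node does Kubernetes believe the volume is attached to?
oc get volumeattachments

# Map the db PVC to its PV so it can be matched against the VolumeAttachment list
oc get pvc -n openshift-storage
oc get pv

# The Multi-Attach warning itself appears in the pod's events
oc describe pod noobaa-db-pg-0 -n openshift-storage
```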
@NeeshaPan, take a look at Deepak's comment when collecting logs the next time for this situation. Secondly, also "describe the noobaa-db pod" after the failover and later once the failback is done. We need to understand when these Events occur.
@deeghuge, could you add yourself to the noobaa repository so you can respond to his questions in noobaa/noobaa-core#6953?
@Jainbrt, the issue is not resolved yet. A few emails were sent to the RH team; as Deepak said, this needs to be dealt with in the underlying K8s. There has been no response, and they are pointing back to CSI needing to implement something, as there was a mention of some RWO setting (if I recall) ...
Thanks @rkomandu for the update; I have removed the verification label from this issue.
Closing this issue as there has been no update for a long time. Please reopen if RH comes back with an analysis and changes are required in CSI.
Describe the bug
CNSA 5112 (CSI bundled with it)
For the HPO solution, the building blocks are CNSA, CSI, and on top of them the DAS operator (which internally installs NooBaa) for S3 object access.
The complete description of the bug was posted against the noobaa-core component, which is a public repository:
noobaa/noobaa-core#6853
This is a problem as I see it. Is there a way to get this resolved?
What this later means for the HPO team is that the database pod stays in Init state, and the HPO admin can't create any new accounts/exports, etc.
The temporary workaround that was done:
On worker0, I restarted the kubelet service that was stopped earlier, and then the noobaa-db-pg pod moved to worker2 without any problem. I understand that the movement of the pod is tied to the kubelet service.
Could you take a look at this defect and provide your thoughts/comments?
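For completeness, the temporary workaround expressed as commands (a sketch; node and namespace names are taken from this report and may need adjusting):

```
# On worker0 (the node whose kubelet was stopped earlier), restart the kubelet:
#   systemctl start kubelet

# Then watch the stuck noobaa-db-pg pod terminate and get rescheduled
# (in this case it landed on worker2)
oc get pods -n openshift-storage -o wide -w
```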
Data Collection and Debugging
Environmental output
What openshift/kubernetes version are you running, and the architecture?
oc version
Client Version: 4.9.5
Server Version: 4.9.5
Kubernetes Version: v1.22.0-rc.0+a44d0f0
kubectl get pods -o wide -n <csi driver namespace>
oc get pods -o wide -n ibm-spectrum-scale-csi
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ibm-spectrum-scale-csi-attacher-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.5 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-gqltw 3/3 Running 0 40h 10.17.127.141 worker2.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-h78rs 3/3 Running 0 4d2h 10.17.126.141 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-operator-d844fb754-7d9db 1/1 Running 23 (16h ago) 7d5h 10.254.19.7 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-provisioner-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.6 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-resizer-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.4 worker0.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-smp65 3/3 Running 0 4d16h 10.17.126.253 worker1.rkomandu-ta.cp.fyre.ibm.com
ibm-spectrum-scale-csi-snapshotter-0 1/1 Running 12 (4d2h ago) 39d 10.254.16.14 worker0.rkomandu-ta.cp.fyre.ibm.com
kubectl get nodes -o wide
oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master0.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.104.166 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
master1.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.113.80 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
master2.rkomandu-ta.cp.fyre.ibm.com Ready master 39d v1.22.0-rc.0+a44d0f0 10.17.117.1 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker0.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.126.141 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker1.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.126.253 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
worker2.rkomandu-ta.cp.fyre.ibm.com Ready worker 39d v1.22.0-rc.0+a44d0f0 10.17.127.141 Red Hat Enterprise Linux CoreOS 49.84.202110220538-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0-74.rhaos4.9.gitd745cab.el8
CNSA 5.1.2.1 (Dec 10th GA)
mmdiag --version
=== mmdiag: version ===
Current GPFS build: "5.1.2.1 ".
Built on Nov 11 2021 at 13:11:41
Running 4 days 17 hours 58 minutes 13 secs, pid 3060
Tool to collect the CSI snap:
./tools/spectrum-scale-driver-snap.sh -n <csi driver namespace>
./tools/spectrum-scale-driver-snap.sh -n <csi driver namespace> -v

This bug was opened earlier with the NooBaa team, but based on the Events posted above it was later redirected to CSI.
I don't have anything specifically collected, as this was originally opened against NooBaa. However, the above steps clearly show that the PVC attached to noobaa-db-pg doesn't get migrated to another worker node when the kubelet service is stopped.