Bricks are failing to connect to the volume post gluster node reboot #1457
Comments
@atinmu This is due to a delay in brick sign-in, I believe. @PrasadDesala Can you give the bricks some more time and check after a while whether they still show port 0?
@vpandey-RH It's been more than 45 minutes, and I still see the bricks trying to reconnect.
Is there any change in the number of bricks that were previously showing port 0?
@PrasadDesala Seems like there is no glusterfsd running on the node that was rebooted. Can you check it once?
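For anyone following along, one way to run that check without first attaching a shell to the pod is via kubectl. This is only a sketch: the "gcs" namespace is an assumption, and the pod name is taken from the report below.

# Sketch: confirm whether any glusterfsd brick processes exist in the rebooted pod
# ("gcs" namespace is an assumption; pod name comes from the report).
kubectl -n gcs exec gluster-kube1-0 -- ps -ef | grep -i glusterfsd
# List the per-brick unix sockets glusterd2 created on this node.
kubectl -n gcs exec gluster-kube1-0 -- ls -l /var/run/glusterd2/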
Yes, it seems the brick process is not running after the gluster node reboot, so the port is showing as '0' for the bricks on that node. Below is a snippet of the volume status output for one volume after the node reboot:
Taking this out of the GCS/1.0 tag, considering we're not going to make brick multiplexing a default option in the GCS/1.0 release.
Bricks are failing to connect to the volume post gluster node reboot.
Observed behavior
On a system with 102 PVCs and brick-mux enabled, I rebooted the gluster-kube1-0 pod. After some time the gluster pod came back online and is connected to the trusted pool, but the bricks on that gluster node are failing to connect to the volume.
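For reference, the reboot step can presumably be reproduced along these lines (a sketch only; the "gcs" namespace, the use of kubectl delete to restart the StatefulSet pod, and the wait timeout are assumptions on my part), after which the checks shown below are re-run from inside the pod:

# Sketch: restart the gluster pod to simulate the node reboot (namespace is an assumption).
kubectl -n gcs delete pod gluster-kube1-0
# Wait for the StatefulSet to recreate the pod, then attach and re-run the checks below.
kubectl -n gcs wait --for=condition=Ready pod/gluster-kube1-0 --timeout=10m
kubectl -n gcs exec -it gluster-kube1-0 -- /bin/bash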
[root@gluster-kube1-0 /]# ps -ef | grep -i glusterfsd
root 30332 59 0 09:52 pts/3 00:00:00 grep --color=auto -i glusterfsd
[root@gluster-kube1-0 /]# glustercli volume status pvc-db2b6e88-0f29-11e9-aaf6-525400933534
Volume : pvc-db2b6e88-0f29-11e9-aaf6-525400933534
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+-------+------+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+-------+------+
| 129ac9de-9e60-4227-99df-48d7e17238f9 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-db2b6e88-0f29-11e9-aaf6-525400933534/subvol1/brick1/brick | true | 35692 | 4034 |
| 46a34351-19a2-4fd2-b692-ea07fbe4f71d | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-db2b6e88-0f29-11e9-aaf6-525400933534/subvol1/brick2/brick | false | 0 | 0 |
| 0935a101-2e0d-4c5f-914f-0e4562602950 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-db2b6e88-0f29-11e9-aaf6-525400933534/subvol1/brick3/brick | true | 39067 | 4115 |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+-------+------+
I am continuously seeing the messages below in the glusterd2 logs:
time="2019-01-03 09:52:57.982317" level=error msg="failed to connect to brick, aborting volume profile operation" brick="6257213e-de5c-4ae5-867d-38e0fd5abc0e:/var/run/glusterd2/bricks/pvc-81d554b4-0f27-11e9-aaf6-525400933534/subvol1/brick1/brick" error="dial unix /var/run/glusterd2/e70300fdb0bea4a4.socket: connect: connection refused" reqid=63bce8cc-c403-4978-8137-bb3ae361b496 source="[volume-profile.go:246:volumes.txnVolumeProfile]" txnid=e763af77-19f2-4935-bd02-9c65be68657a
time="2019-01-03 09:52:57.982371" level=error msg="Step failed on node." error="dial unix /var/run/glusterd2/e70300fdb0bea4a4.socket: connect: connection refused" node=6257213e-de5c-4ae5-867d-38e0fd5abc0e reqid=63bce8cc-c403-4978-8137-bb3ae361b496 source="[step.go:120:transaction.runStepFuncOnNodes]" step=volume.Profile txnid=e763af77-19f2-4935-bd02-9c65be68657a
time="2019-01-03 09:52:57.997172" level=info msg="client connected" address="10.233.64.5:48521" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2019-01-03 09:52:57.998020" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-82196ac3-0f27-11e9-aaf6-525400933534/subvol1/brick1/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-82196ac3-0f27-11e9-aaf6-525400933534/subvol1/brick1/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2019-01-03 09:52:57.998383" level=info msg="client disconnected" address="10.233.64.5:48521" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
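The "connect: connection refused" in the dial error above is consistent with nothing listening on the brick's unix socket, which matches the missing glusterfsd process. A quick confirmation from inside the rebooted pod might look like this (a sketch; the socket path is copied from the log line above):

# Sketch: the socket file likely still exists (hence "connection refused" rather than
# "no such file or directory"), but nothing should be listening on it.
ls -l /var/run/glusterd2/e70300fdb0bea4a4.socket
# List listening unix-domain sockets; with the brick down, no entry for this socket is expected.
ss -xlp | grep glusterd2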
Expected/desired behavior
Post gluster pod reboot, bricks should connect back to the volume without any issues.
Details on how to reproduce (minimal and precise)
Information about the environment: