This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

Documenting solutions to common issues faced while restarting glusterd2 service #885

Open
wants to merge 3 commits into
base: master
51 changes: 51 additions & 0 deletions doc/quick-start-user-guide.md
@@ -209,3 +209,54 @@ Verify that `glusterfsd` process is running on both nodes.
* Issues with 2 node clusters
* Restarting glusterd2 does not restore the cluster
* Peer remove doesn't work

### Solutions to common issues

> Note: These are hacks or temporary solutions to the problems. Use them only if the issues are faced during re-setup.

Review comment (Contributor), on "These are hacks or temporary solutions to the problems.": Doesn't instil confidence in glusterd2, from a user's perspective ;)

* If the glusterd2 service fails with the error: "failed to start embedded store"

Review comment (Contributor): You don't have to start every sentence with `If glusterd service fails with error`. Just the symptom should suffice.


Sample output:
```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this, clean up the glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).

Review comment (Member): Deleting the workdir will erase all of the cluster's data. Suggest the workaround only if this issue is faced during re-setup.

Review comment (Contributor): It's localstatedir, and not working directory.

The default directory used by glusterd2 is "/var/lib/glusterd2/". If you are using a custom config file, use the working directory path set there instead of "/var/lib/glusterd2/".

```sh
# rm -rf /var/lib/glusterd2/*
```
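
Because cleaning this directory erases the cluster's stored state, a more cautious variant is to move the directory aside instead of deleting it. The sketch below is only an illustration: `backup_statedir` is a hypothetical helper, not part of glusterd2, and it assumes glusterd2 is stopped, that you run it as root, and that a freshly created empty directory with default permissions is acceptable to glusterd2 (none of which this guide confirms).

```shell
# Hypothetical helper (not part of glusterd2): move a state directory aside
# instead of deleting it, then recreate an empty directory at the same path.
backup_statedir() {
    dir="${1%/}"                       # strip one trailing slash, if present
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    backup="$dir.bak.$(date +%s)"      # timestamped backup name
    if [ -e "$backup" ]; then
        echo "backup path already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" || return 1
    mkdir -p "$dir" || return 1        # assumption: default permissions are fine
    echo "$backup"                     # print where the old data now lives
}

# Example (as root, with glusterd2 stopped):
# backup_statedir /var/lib/glusterd2
```

If the restart then succeeds and the old state is no longer needed, the backup directory can be removed by hand.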

* If the glusterd2 service fails with the error: "Failed to create pid file"

Sample Output:
```log
FATA[2018-06-04 06:07:22.605017] Failed to create pid file error="Process is already running" source="[main.go:87:main.main]"
```

Review comment (Member): This should not occur if the pid file exists and the process is not running. Please recheck.

This issue occurs because the pid file already exists. To solve it, delete the existing glusterd2 pid file. The path to the pid file is given in the log messages just before this error message.

Sample Output:
```log
DEBU[2018-06-04 06:07:22.604369] running with configuration cert-file= clientaddress=":24007" config=glusterd2.toml.example defaultpeerport=24008 etcdcurls="http://{IP}:2379" etcdendpoints=" []" etcdpurls="http://{IP}:2380" hooksdir=/var/lib/glusterd/hooks key-file= localstatedir=/var/lib/glusterd logdir=/usr/local/var/log/glusterd2 logfile=STDOUT loglevel=debug noembed=false peeraddress="{IP}:24008" pidfile=/usr/local/var/run/glusterd2/glusterd2.pid rundir=/usr/local/var/run/glusterd2 source="[config.go:125:main.dumpConfigToLog]" statedump=true version=false workdir=/root/src/github.com/gluster/glusterd2
FATA[2018-06-04 06:07:22.605017] Failed to create pid file error="Process is already running" source="[main.go:87:main.main]"
```

To delete the pid file for the given output:

```sh
# rm /usr/local/var/run/glusterd2/glusterd2.pid
```

Review comment (Member): If another instance of glusterd2 is running, removing the pid file is not sufficient.
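
Before deleting the pid file, it is worth confirming that the process it names is really gone. The following is a minimal sketch, assuming the pid file holds a single numeric PID as plain text (this guide does not confirm the file format). `remove_stale_pidfile` is a hypothetical helper name, not a glusterd2 command.

```shell
# Hypothetical helper (not part of glusterd2): remove a pid file only when
# the process it names is no longer alive.
# Run as root: for a process owned by another user, `kill -0` fails with a
# permission error and the pid file would wrongly be treated as stale.
remove_stale_pidfile() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "no pid file at $pidfile"
        return 0
    fi
    pid=$(cat "$pidfile")              # assumption: file contains just the PID
    # `kill -0` sends no signal; it only tests whether the PID can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; not removing $pidfile" >&2
        return 1
    fi
    rm -f "$pidfile"
    echo "removed stale pid file $pidfile"
}

# Example, using the path from the sample output above:
# remove_stale_pidfile /usr/local/var/run/glusterd2/glusterd2.pid
```

If the helper reports that the process is still running, stop that glusterd2 instance first rather than removing the file.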

* If the glusterd2 service fails with the error: "failed to listen"

Review comment (Contributor): "failed to listen" can also mean another glusterd2 process is running. You have to be a little more specific about the socket file and what caused it.


Sample Output:
```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by the old glusterd2 service. To resolve it, delete the socket file to free the socket for the new glusterd2 service.

Review comment (Member): Also add a note to check if any other instance of glusterd2 is running before removing the socket file.

Review comment (Contributor): The socket file isn't cleaned up on ungraceful or abrupt shutdown such as SIGKILL or a crash.

Review comment (Contributor), on "because the socket address is already in use by old glusterd service.": It's not. Either:

  * It was in use but wasn't cleaned up on shutdown
  * There's another glusterd2 instance running

```sh
# rm /usr/local/var/run/glusterd2/glusterd2.socket
```
The path to the socket file is mentioned in the error message itself.

> Note: Check whether any other instance of glusterd2 is running before removing the socket file.
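
The check and the removal can be combined so that the socket file is only deleted when no glusterd2 process is found. This is a rough sketch under stated assumptions: the process name is taken to be `glusterd2`, `pgrep` (from procps) is assumed to be installed, and `remove_stale_socket` is a hypothetical helper name rather than anything shipped with glusterd2.

```shell
# Hypothetical helper (not part of glusterd2): remove a leftover socket file
# only when no process with the given name is running.
remove_stale_socket() {
    sock="$1"
    procname="${2:-glusterd2}"         # assumption: process name is "glusterd2"
    if ! command -v pgrep >/dev/null 2>&1; then
        echo "pgrep not found; cannot check for a running $procname" >&2
        return 2
    fi
    # -x matches the process name exactly, not as a substring.
    if pgrep -x "$procname" >/dev/null; then
        echo "$procname is still running; not removing $sock" >&2
        return 1
    fi
    rm -f "$sock"
}

# Example, using the path from the error message above:
# remove_stale_socket /usr/local/var/run/glusterd2/glusterd2.socket
```

A non-zero return means nothing was removed: either another instance is running and should be stopped first, or the check itself could not be performed.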