Documenting solutions to common issues faced while restarting glusterd2 service #885
base: master
@atinmu I have documented solutions to some of the issues faced while restarting the glusterd2 service. Is this a suitable way of documenting solutions to common issues?
doc/quick-start-user-guide.md
Outdated
The path to default directory used by glusterd2 is ["/var/lib/glusterd/"](https://github.com/gluster/glusterd2/blob/master/glusterd2.toml.example#L1), if using custom config file then please provide working directory path instead of "/var/lib/glusterd/"

```sh
# rm /var/lib/glusterd/*
```
Do we need to delete the complete glusterd2 directory? That will delete the old ETCD data, which is stored in the ETCD folder. The ETCD config is stored in store.toml (I think clearing this should be enough). If store.toml is empty, glusterd2 should regenerate it based on the configuration.
This comment is still not addressed?
@Madhu-1 I will add a warning, and will send a new PR after testing whether it's sufficient to delete store.toml.
This is definitely a good start, and we should make this a continuous process to capture all the troubleshooting experience. However, what I am additionally looking for in the document is more concrete direction along the lines of "Do these x, y, z things before setting up the environment".
@atinmu Is this PR good to go, or would you like the documentation reformatted?
I don't see any reason why this PR cannot get in. We need to get into the good habit of refreshing this document periodically, or whenever we stumble upon an issue that a user is likely to face frequently.
doc/quick-start-user-guide.md
Outdated
```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this you will be required to clean up glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).
The path to default directory used by glusterd2 is ["/var/lib/glusterd/"](https://github.com/gluster/glusterd2/blob/master/glusterd2.toml.example#L1), if using custom config file then please provide working directory path instead of "/var/lib/glusterd/"
The default is /var/lib/glusterd2.
Sample Output:
```log
FATA[2018-06-04 06:07:22.605017] Failed to create pid file error="Process is already running" source="[main.go:87:main.main]"
```
This should not occur if the pid file exists but the process is not running. Please recheck.
To delete the pid file for the given output:

```sh
# rm /usr/local/var/run/glusterd2/glusterd2.pid
```
If another instance of glusterd2 is running, removing the pid file is not sufficient.
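As a rough sketch of the check being asked for here: the helper below removes a pid file only when the pid recorded in it no longer exists. The name `remove_stale_pidfile` is made up for illustration and is not something glusterd2 ships. It assumes the file holds a bare pid and that it is run as root, since `kill -0` also fails for a live process owned by another user. The pid may also have been reused by an unrelated process, so a "still running" result is a prompt to look closer rather than proof.

```sh
#!/bin/sh
# Hypothetical helper: remove a pid file only if the process it names is gone.
# Returns 0 if the file was removed (or did not exist), 1 if the pid is alive.
remove_stale_pidfile() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only reports whether the pid exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; stop it before removing $pidfile" >&2
        return 1
    fi
    rm -f "$pidfile"
}

# Example call, using the pid file path from the log output in this PR:
# remove_stale_pidfile /usr/local/var/run/glusterd2/glusterd2.pid
```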
```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by old glusterd service. To resolve this issue you will have to delete the socket file to free the socket for the new glusterd service.
Also add a note to check whether any other instance of glusterd2 is running before removing the socket file.
```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this you will be required to clean up glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).
Deleting the workdir will erase all of the cluster's data. Suggest this workaround only if the issue is faced during re-setup.
@aravindavk Please review, I have made the changes.
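One way to make this workaround less destructive, sketched under the assumption that the working directory is a plain directory on a local filesystem: move it aside under a timestamped name instead of deleting it, so the old data can be put back if needed. `set_aside_dir` is a hypothetical helper name; the sketch only changes how the directory is cleared, not what glusterd2 does on the next start.

```sh
#!/bin/sh
# Hypothetical helper: move a directory aside instead of deleting it.
# Prints the backup path on success; returns 1 if nothing was moved.
set_aside_dir() {
    dir="${1%/}"    # tolerate a trailing slash
    [ -d "$dir" ] || { echo "$dir is not a directory" >&2; return 1; }
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    # Refuse to reuse an existing backup name (mv would nest into it).
    [ ! -e "$backup" ] || { echo "$backup already exists" >&2; return 1; }
    mv "$dir" "$backup" && echo "$backup"
}

# Example call, using the default path mentioned in this review:
# set_aside_dir /var/lib/glusterd2
```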
doc/quick-start-user-guide.md
Outdated
The path to default directory used by glusterd2 is "/var/lib/glusterd2/", if using custom config file then please provide working directory path instead of "/var/lib/glusterd2/"

```sh
# rm /var/lib/glusterd/2*
```
Is this path correct? Above you write "/var/lib/glusterd2/" but here the last slash and the 2 are transposed.
Some of the issues aren't major and may seem obvious from symptoms and error messages. We'll have to focus on documenting cluster restart and quorum loss.
```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by old glusterd service. To resolve this issue you will have to delete the socket file to free the socket for the new glusterd service.
The socket file isn't cleaned up on ungraceful or abrupt shutdown such as SIGKILL or a crash.
### Solutions to common issues

> Note: These are hacks or temporary solutions to the problems. Only use these solutions only if the issues are faced during re-setup.
> These are hacks or temporary solutions to the problems.

This doesn't instil confidence in glusterd2 from a user's perspective ;)
> Note: These are hacks or temporary solutions to the problems. Only use these solutions only if the issues are faced during re-setup.

* If glusterd service fails with error: "failed to start embedded store"
You don't have to start every sentence with `If glusterd service fails with error`. Just the symptom should suffice.
```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this you will be required to clean up glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).
It's localstatedir, and not working directory.
```sh
# rm /usr/local/var/run/glusterd2/glusterd2.pid
```

* If glusterd service fails with error: "failed to listen"
"failed to listen" can also mean another glusterd2 process is running. You have to be a little more specific about the socket file and what caused it.
```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by old glusterd service. To resolve this issue you will have to delete the socket file to free the socket for the new glusterd service.
> because the socket address is already in use by old glusterd service.

It's not. Either:
- It was in use but wasn't cleaned up on shutdown
- There's another glusterd2 instance running
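A sketch of how the doc could tell these two cases apart before touching the socket file. It assumes the daemon's process name is exactly `glusterd2` and that `pgrep` is available; `remove_stale_socket` is a hypothetical helper, not part of glusterd2.

```sh
#!/bin/sh
# Hypothetical helper: delete a leftover unix socket file only when no
# process with the given name is running.
# Returns 0 if removed, 1 if an instance is running, 2 if pgrep is missing.
remove_stale_socket() {
    procname="$1"
    sockfile="$2"
    command -v pgrep >/dev/null 2>&1 || { echo "pgrep not found" >&2; return 2; }
    # pgrep -x matches the process name exactly.
    if pgrep -x "$procname" >/dev/null 2>&1; then
        echo "another $procname instance is running; stop it instead of deleting $sockfile" >&2
        return 1
    fi
    # No live instance: the file was left behind by a crash or SIGKILL.
    rm -f "$sockfile"
}

# Example call, using the socket path from the log output in this PR:
# remove_stale_socket glusterd2 /usr/local/var/run/glusterd2/glusterd2.socket
```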
Issue: #883