This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

Documenting solutions to common issues faced while restarting glusterd2 service #885

Open: wants to merge 3 commits into `master`.

rishubhjain (Contributor)

Issue: #883

@rishubhjain (Contributor Author)

@atinmu I have just documented solutions to some of the issues faced while restarting the glusterd2 service. Is this a suitable way of documenting solutions to common issues?

@rishubhjain changed the title from "Documenting etcd cleanups" to "Documenting solutions to common issues faced while restarting glusterd2 service" on Jun 14, 2018.
The default directory used by glusterd2 is ["/var/lib/glusterd/"](https://github.com/gluster/glusterd2/blob/master/glusterd2.toml.example#L1); if you are using a custom config file, provide your working directory path instead of "/var/lib/glusterd/".

```sh
# rm /var/lib/glusterd/*
```

Member

Do we need to delete the complete glusterd2 directory? It will delete the old etcd data, which is stored in the etcd folder. The etcd config is stored in store.toml (I think clearing this should be enough). If store.toml is empty, glusterd2 should regenerate it based on the configuration.

Contributor

This comment is still not addressed?

Contributor Author

@Madhu-1 I will provide a warning, and will send a new PR after testing whether it's sufficient to delete store.toml.

@atinmu (Contributor) commented Jun 14, 2018

> @atinmu I have just documented solutions to some of the issues faced while restarting the glusterd2 service. Is this a suitable way of documenting solutions to common issues?

This is definitely a good start, and we should make this a continuous process to capture all of our troubleshooting experience. However, what I am additionally looking for in the document is a more concrete direction along the lines of "Do these x, y, z things before setting up the environment".

@rishubhjain (Contributor Author)

@atinmu Is this PR good to go, or should I reformat the documentation?

@atinmu (Contributor) commented Jun 25, 2018

> @atinmu Is this PR good to go, or should I reformat the documentation?

I don't see any reason why this PR cannot get in. We need to get into the habit of refreshing this document periodically, or whenever we stumble upon issues that a user is likely to face frequently.

```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this, you will need to clean up the glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).
The default directory used by glusterd2 is ["/var/lib/glusterd/"](https://github.com/gluster/glusterd2/blob/master/glusterd2.toml.example#L1); if you are using a custom config file, provide your working directory path instead of "/var/lib/glusterd/".

Member

The default is /var/lib/glusterd2.


Sample Output:
```log
FATA[2018-06-04 06:07:22.605017] Failed to create pid file error="Process is already running" source="[main.go:87:main.main]"
```

Member

This should not occur if the pid file exists but the process is not running. Please recheck.

To delete the pid file for the given output:

```sh
# rm /usr/local/var/run/glusterd2/glusterd2.pid
```
Member

If another instance of glusterd2 is running, removing the pid file is not sufficient.
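Following up on this review point, a minimal sketch of a safer cleanup: check whether the PID recorded in the pid file still belongs to a running process before deleting the file. It assumes a Linux host (it looks for `/proc/<pid>`) and defaults to the pid path from the `rm` command above; `remove_stale_pidfile` is a hypothetical helper, not something glusterd2 ships.

```sh
# Hypothetical helper (not part of glusterd2): delete a glusterd2 pid file
# only when the PID recorded in it no longer belongs to a live process.
# Assumes Linux, where each live process has a /proc/<pid> directory.
remove_stale_pidfile() {
    _pidfile=${1:-/usr/local/var/run/glusterd2/glusterd2.pid}

    if [ ! -f "$_pidfile" ]; then
        echo "no pid file at $_pidfile" >&2
        return 0
    fi

    _pid=$(cat "$_pidfile")
    case $_pid in
        ''|*[!0-9]*)
            ;;  # empty or non-numeric content: nothing to protect
        *)
            # Some process still owns this PID. It may be another glusterd2
            # instance (or an unrelated process that reused the PID): refuse.
            if [ -d "/proc/$_pid" ]; then
                echo "PID $_pid is running; stop it instead of deleting $_pidfile" >&2
                return 1
            fi
            ;;
    esac

    rm -f "$_pidfile"
    echo "removed stale pid file $_pidfile" >&2
}
```

Run it as root and start glusterd2 only if it returns 0. The check is deliberately conservative: a reused PID makes it refuse rather than delete.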

```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by the old glusterd service. To resolve this issue, you will have to delete the socket file to free the socket for the new glusterd service.

Member

Also add a note to check whether any other instance of glusterd2 is running before removing the socket file.

```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this, you will need to clean up the glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).

Member

Deleting the workdir will erase all of the cluster's data. Suggest this workaround only if the issue is faced during re-setup.
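If the workaround is used during a re-setup, a more cautious variant is to move the working directory aside rather than delete it, so the old etcd data and `store.toml` can be restored later. A minimal sketch, assuming a Linux host (for the `/proc` check that glusterd2 is stopped) and the default `/var/lib/glusterd2` path mentioned in this review; `backup_workdir` is a hypothetical helper, and recreating the empty directory afterwards is an assumption that should be harmless.

```sh
# Hypothetical helper (not part of glusterd2): move the glusterd2 working
# directory aside instead of deleting it, so the old cluster data (etcd data,
# store.toml) can be restored if the re-setup goes wrong.
backup_workdir() {
    _dir=${1:-/var/lib/glusterd2}
    _dir=${_dir%/}                                  # drop a trailing slash
    _backup="$_dir.bak.$(date +%Y%m%d%H%M%S)"

    # Linux-only process check; refuse if /proc is unavailable.
    if [ ! -r /proc/self/comm ]; then
        echo "cannot inspect processes: /proc not available" >&2
        return 2
    fi
    for _comm in /proc/[0-9]*/comm; do
        if [ "$(cat "$_comm" 2>/dev/null)" = "glusterd2" ]; then
            echo "glusterd2 is running; stop it first" >&2
            return 1
        fi
    done

    if [ ! -d "$_dir" ]; then
        echo "no working directory at $_dir" >&2
        return 1
    fi
    if [ -e "$_backup" ]; then
        echo "backup $_backup already exists; retry in a second" >&2
        return 1
    fi

    # Move the old data aside, recreate an empty directory (default
    # permissions), and print where the backup went.
    mv "$_dir" "$_backup" && mkdir -p "$_dir" && echo "$_backup"
}
```

The printed backup path can be removed by hand once the re-setup has succeeded.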

@rishubhjain (Contributor Author)

@aravindavk Please review; I have made the changes.

The default directory used by glusterd2 is "/var/lib/glusterd2/"; if you are using a custom config file, provide your working directory path instead of "/var/lib/glusterd2/".

```sh
# rm /var/lib/glusterd/2*
```

Contributor

Is this path correct? Above you write "/var/lib/glusterd2/" but here the last slash and the 2 are transposed.

@prashanthpai (Contributor) left a comment

Some of the issues aren't major and may seem obvious from symptoms and error messages. We'll have to focus on documenting cluster restart and quorum loss.

```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by the old glusterd service. To resolve this issue, you will have to delete the socket file to free the socket for the new glusterd service.

Contributor

The socket file isn't cleaned up on ungraceful or abrupt shutdown such as SIGKILL or a crash.


### Solutions to common issues

> Note: These are hacks or temporary solutions to the problems. Use these solutions only if the issues are faced during re-setup.

Contributor

> These are hacks or temporary solutions to the problems.

Doesn't instil confidence in glusterd2, from a user's perspective ;)

* If glusterd service fails with error: "failed to start embedded store"

Contributor

You don't have to start every sentence with `If glusterd service fails with error`. Just the symptom should suffice.

```log
ERRO[2018-06-04 06:05:41.020313] failed to start embedded store error="dial tcp {IP}: connect: connection refused" source="[embed.go:36:store.newEmbedStore]"
```
To solve this, you will need to clean up the glusterd2 [working directory](https://github.com/gluster/glusterd2/blob/master/doc/quick-start-user-guide.md#running-glusterd2).

Contributor

It's localstatedir, not the working directory.

```sh
# rm /usr/local/var/run/glusterd2/glusterd2.pid
```

* If glusterd service fails with error: "failed to listen"

Contributor

"failed to listen" can also mean another glusterd2 process is running. You have to be a little more specific about the socket file and what caused it.

```log
FATA[2018-06-04 06:07:38.348333] failed to listen error="listen unix /usr/local/var/run/glusterd2/glusterd2.socket: bind: address already in use" socket=glusterd2.socket source="[server.go:76:sunrpc.NewMuxed]"
```
This issue occurs because the socket address is already in use by the old glusterd service. To resolve this issue, you will have to delete the socket file to free the socket for the new glusterd service.

Contributor

> because the socket address is already in use by old glusterd service.

It's not.

Either:

* It was in use but wasn't cleaned up on shutdown
* There's another glusterd2 instance running
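A minimal sketch that handles both causes listed above: it refuses to touch the socket while any process named `glusterd2` is running, and otherwise removes the leftover file. It assumes Linux (`/proc/<pid>/comm` holds each process's name) and the default socket path from the log above; `process_running` and `remove_stale_socket` are hypothetical helpers, not part of glusterd2.

```sh
# Hypothetical helpers (not part of glusterd2).
# process_running NAME: succeed if any process's /proc/<pid>/comm equals NAME.
process_running() {
    for _comm in /proc/[0-9]*/comm; do
        # Processes can exit while we iterate; ignore read errors.
        if [ "$(cat "$_comm" 2>/dev/null)" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# remove_stale_socket [SOCKET] [NAME]: delete a leftover socket file only
# when no process called NAME (default: glusterd2) is running.
remove_stale_socket() {
    _sock=${1:-/usr/local/var/run/glusterd2/glusterd2.socket}
    _name=${2:-glusterd2}

    if [ ! -r /proc/self/comm ]; then
        echo "cannot inspect processes: /proc not available" >&2
        return 2
    fi

    # Second cause above: another instance is running and owns the socket.
    if process_running "$_name"; then
        echo "a $_name process is running; stop it instead of deleting $_sock" >&2
        return 1
    fi

    # First cause above: left behind by an unclean shutdown (SIGKILL, crash).
    if [ -e "$_sock" ]; then
        rm -f "$_sock"
        echo "removed stale socket $_sock" >&2
    fi
}
```

The optional process-name argument mainly makes the check testable; a real cleanup would rely on the `glusterd2` default.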
