Recommended method of cluster management #473
Unanswered
stratacast
asked this question in
Q&A
-
@stratacast I don't see a connection between the two. Chances are your node ran out of disk space because the log wasn't truncated, due to how you use Ra; we do not know what specifically. Or maybe your cluster did not have a majority of replicas online at that moment, and by the time you decided to "reset" a node, it did.
What you likely need is monitoring of Raft-related metrics and free disk space, not another function to start a cluster.
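To make the monitoring point concrete, here is a minimal sketch of such a check, not a recommended implementation. The module name `ra_health_sketch` and its functions are hypothetical. It assumes `ra:members/1` returns `{ok, Members, Leader}` on success, and that the `os_mon` application is started so that `disksup:get_disk_data/0` is available. Node reachability is only a rough proxy for a Ra server being up.

```erlang
-module(ra_health_sketch).
-export([has_quorum/2, check/1, disk_usage/0]).

%% Members is a list of Ra server ids, i.e. {Name, Node} tuples.
%% Reachable is the list of nodes considered up, e.g. [node() | nodes()].
%% Returns true when a strict majority of members sit on reachable nodes.
has_quorum(Members, Reachable) ->
    Up = [M || {_Name, Node} = M <- Members, lists:member(Node, Reachable)],
    length(Up) >= (length(Members) div 2) + 1.

%% Ask a known server for the membership and report the leader and
%% whether a majority of member nodes is currently connected.
check(ServerId) ->
    case ra:members(ServerId) of
        {ok, Members, Leader} ->
            {ok, #{leader => Leader,
                   quorum => has_quorum(Members, [node() | nodes()])}};
        Other ->
            %% {error, _} or {timeout, _}: no leader could be reached
            {error, Other}
    end.

%% Requires the os_mon application to be running.
%% Returns [{MountPoint, TotalKiB, PercentUsed}].
disk_usage() ->
    disksup:get_disk_data().
```

Calling something like `ra_health_sketch:check({my_cluster, node()})` periodically (the server id here is made up) and alerting on `quorum => false` or on high disk usage would surface both conditions before a node needs manual intervention.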
-
I've been running ra for a while and ran into an issue where I couldn't get a node to reconnect until I cleared out the .segment files in my radata folder. The partitioned node seemed to run happily all alone while the rest of the cluster did its thing. This incident led me to think I am not doing this clustering thing quite right. I use `ra:start_cluster/4` to start a 3-member cluster. I've had it once where I had to shut down the cluster entirely to get everything reconnected, along with this segment file clearing (which may have been the solution previously).

With some of this jank in my code, I think I am starting my cluster in an unreliable way. So, is the right way to start as a single node and then add members? When a node disconnects, do I remove it, then wait for an event, like when the server shows up in the `nodes()` list, and add the member back? There are a lot of ways to skin the cat with ra. What do you all suggest?