broken leader re-election after killing most of cluster nodes #78
Comments
Currently a node resets its leader only when it receives a message from another node. This is not a bug: Raft requires more than half of the cluster to be alive to elect a new leader, so in your scenario this is normal behaviour. |
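For reference, the majority rule mentioned above can be sketched in a few lines. This is a minimal illustration of the arithmetic only, not PySyncObj's actual code; the function names `majority` and `can_elect_leader` are made up for this example.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, alive: int) -> bool:
    """True if enough nodes are alive to form a majority (hypothetical helper)."""
    return alive >= majority(cluster_size)


if __name__ == "__main__":
    # Print the required majority for the cluster sizes discussed in this thread.
    for size in (3, 4):
        print(size, "nodes need", majority(size), "alive to elect a leader")
```

By this rule a 3-node cluster with a single survivor cannot elect a leader, while a 4-node cluster with 3 nodes alive should still be able to.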
@bakwc I tested with 4 nodes, 3 of them alive, and the re-election didn't happen. |
I saw this behaviour too, but in my case
now I kill the leader
I am running the servers in tmux; the code where we test restarting the leader leaves everything in limbo. |
Thanks for the report! I'll try to reproduce it. How long does it take to get into this situation? Does it reproduce only when you have a password-protected cluster? |
I'll try with a non-password-protected cluster, I'll do it now. |
Yes indeed, that seems to be the issue; without a password I cannot reproduce it. |
@xmonader, did you use a password? You created a 4-node cluster, killed only one node (the leader), and a new leader was not elected? |
Please try increasing the following config options; set them to:
|
Sorry, I was away for a while; I will try it. |
Do you use python2 or python3? |
3
|
Any update on this? |
Could you please provide more details? What are your reproduction steps? What is the cluster size? How many nodes were alive? |
My cluster size is 4 |
Also, when I removed a node from the cluster, it remained in otherNodes on the non-leader nodes. It may also be related to this. |
Thanks for the report, I'll check it. What script did you use for testing? Could you post it somewhere (pastebin.com)? Does it reproduce always or only from time to time?
|
Checked multiple times, can't reproduce. Could you please provide detailed step-by-step instructions for your actions? |
Actually I did what you explained. |
Also, if you want, I can share my screen on a call. |
When adding nodes you need to specify all current cluster nodes manually. Added #112 to implement auto-discovery. |
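To illustrate what "specify all current cluster nodes manually" means in practice, here is a small sketch that derives, for each node, the list of its partners from one full address list. The helper name `partner_lists` and the `localhost` host are assumptions for this example (the ports are the ones from this issue); each pair would then be handed to that node's `SyncObj(selfAddr, otherAddrs)` constructor.

```python
from typing import Dict, List


def partner_lists(cluster: List[str]) -> Dict[str, List[str]]:
    """Map each node address to the addresses of all other cluster nodes.

    Hypothetical helper: every node must be started with the complete list
    of its current partners, so the lists are built from one source of truth.
    """
    return {
        addr: [other for other in cluster if other != addr]
        for addr in cluster
    }


if __name__ == "__main__":
    # Assumed addresses; only the port numbers appear in the issue.
    cluster = ["localhost:6000", "localhost:6001", "localhost:6002"]
    for self_addr, others in partner_lists(cluster).items():
        # Each (self_addr, others) pair is what one node would be started with.
        print(self_addr, "->", others)
```

Building every node's partner list from a single shared list makes it harder to start a node with a stale or incomplete view of the cluster.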
Thank you, |
I was randomly killing some nodes of a 3-node cluster to verify an issue with leader re-election, and checking the status using
Here the leader should've been set to 6001, but I got a None value instead.
And I reached this very interesting state where node 6000 has leader 6001, but that leader isn't even active?
It was fixed after launching 6002.