-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: use next instead of return to continue listen for events #867
base: main
Are you sure you want to change the base?
Conversation
But wait, I think the thinking was that if this node is the leader it should crash/close when it looses its leadership? Are we not doing that? A leader restarts, resumes its leadership, it should need to listen for other leadership changes because when it looses its leadership it should exit the process? |
Yeah, well, it's stopping for some reason. I think I may have found another bug now which is the real issue! |
Hm, no. Because it returns from And because it's stuck in leader election without following, it means message loss if we publish to the new leader and it dies. No data has been synced and the "idling" non-follower will be leader but with old data. |
how can the idling non-followr become leader? it should check the ISR before trying to become leader |
It's still in the ISR set, so it will just continue to leader election. edit: let me check this again, i misinterpreted your comment at first... |
I think this is the flow:
|
Maybe we should add a check to verify the node is in ISR when it's elected, else revoke lease and start over? |
yes! |
I'll wait for #871 to be merged and test together with that. |
WHAT is this pull request doing?
When restarting a leader it could end up in a idling state, not turning into a follower. This because if it starts up fast enough it hasn't lost its leadership according to etcd, and the first election event will return the node as leader, and it can't be follower to itself.
Chaning the
return
tonext
will make it will wait for the next event that should be another node being promoted.HOW can this pull request be tested?
Failed to create specs for this so testing manually for now.