[WIP] HA failover raft #2751
base: main
The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).
Claude finished @alpe's task
Comprehensive Code Review: HA Failover with Raft
I've completed a detailed review of the HA failover implementation. Below are my findings organized by focus area.
1. Code Quality & Architecture
)

// failoverState collect the components to reset when switching modes.
type failoverState struct {
This type contains all stateful services that are reset when switching from sync to aggregator mode.
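A minimal sketch of the idea, not the PR's actual type: the `stopper` interface, `namedService`, and `failoverStateSketch` below are hypothetical names chosen for illustration. It shows one way to group stateful services so that they can be stopped together before the node restarts in the other mode.

```go
package main

import (
	"errors"
	"fmt"
)

// stopper is a hypothetical interface for any service that holds state.
type stopper interface {
	Stop() error
}

// namedService is a hypothetical stand-in for a real stateful component.
type namedService struct {
	name    string
	stopped bool
}

// Stop marks the service as stopped and fails if it was stopped before.
func (s *namedService) Stop() error {
	if s.stopped {
		return fmt.Errorf("%s already stopped", s.name)
	}
	s.stopped = true
	return nil
}

// failoverStateSketch (assumed shape) collects the services to reset on a mode switch.
type failoverStateSketch struct {
	services []stopper
}

// reset stops all services in reverse order, drops them, and returns the joined errors.
func (f *failoverStateSketch) reset() error {
	var errs []error
	for i := len(f.services) - 1; i >= 0; i-- {
		if err := f.services[i].Stop(); err != nil {
			errs = append(errs, err)
		}
	}
	f.services = nil
	return errors.Join(errs...)
}

func main() {
	p2p := &namedService{name: "p2p"}
	syncer := &namedService{name: "syncer"}
	state := &failoverStateSketch{services: []stopper{p2p, syncer}}
	if err := state.reset(); err != nil {
		fmt.Println("reset failed:", err)
		return
	}
	fmt.Println("all services stopped, remaining:", len(state.services))
}
```

Stopping in reverse order is an assumption here; it mirrors the common convention that services started last depend on those started first.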
})

// optional Raft node details
if raftNode != nil {
Quick solution to add some maintenance functions for the Raft cluster. Proto probably makes more sense.
@@ -0,0 +1,17 @@
package raft

// todo: refactor to use proto
RaftBlockState is currently serialized and deserialized as JSON for transport via Raft.
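For illustration, a JSON round trip of such a state could look roughly like the sketch below. The fields `Height` and `Hash` and the helpers `encodeState` and `decodeState` are assumptions made for this example; the real `RaftBlockState` in the PR may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RaftBlockState is a hypothetical stand-in; the actual fields are not shown in this PR excerpt.
type RaftBlockState struct {
	Height uint64 `json:"height"`
	Hash   []byte `json:"hash"` // encoding/json writes []byte as a base64 string
}

// encodeState serializes the state into the bytes that would be handed to Raft.
func encodeState(s RaftBlockState) ([]byte, error) {
	return json.Marshal(s)
}

// decodeState parses bytes received from the Raft log back into a state.
func decodeState(data []byte) (RaftBlockState, error) {
	var s RaftBlockState
	if err := json.Unmarshal(data, &s); err != nil {
		return RaftBlockState{}, fmt.Errorf("decode raft block state: %w", err)
	}
	return s, nil
}

func main() {
	raw, err := encodeState(RaftBlockState{Height: 42, Hash: []byte{0xab, 0xcd}})
	if err != nil {
		panic(err)
	}
	got, err := decodeState(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip height:", got.Height)
}
```

Keeping the encode and decode steps behind two small functions would make a later switch to proto, as the todo suggests, a local change.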
} else if !isLeader && isCurrentlyLeader { // lost leadership
workerCancel()
n.logger.Info().Msg("lost leadership")
return ErrLeadershipLost
The node is shut down in the downstream code to ensure a clean state.
logfile, err := os.Create(logfileName)
require.NoError(s.t, err)
errReader = io.NopCloser(io.TeeReader(errReader, logfile))
outReader = io.NopCloser(io.TeeReader(outReader, logfile))
I am storing logs to help trace down issues. This works only with non-temporary work dirs.
🚧 WIP - early version
E2E test: test/e2e/failover_e2e_test.go
Overview