Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline + NoCache 情况下,可能出现 Can't truncate logs before _applied_id=22, last_log_kept=0 #421

Open
CkTD opened this issue Sep 13, 2023 · 5 comments

Comments

@CkTD
Copy link

CkTD commented Sep 13, 2023

场景:
1. leader 给某个 peer 安装快照 [last_included_index=22],完成后 _next_index = 23
2. 然后leader 先后发送两个 append entries rpc , 分别复制log [23, 28] 和 [29,32]

如果网络原因导致第二个 append entries rpc 先到达。在 NoCache 情况下,会失败。 replicator 把 _next_index 重置并取消掉第一个 rpc。 下面的这个检查又进入到了 truncate log 的分枝。_next_index--,也就是设置为 22。但实际上peer上仅仅是缺少了日志,应该是第一个分支。

// prev_log_index and prev_log_term doesn't match
r->_reset_next_index();
if (response->last_log_index() + 1 < r->_next_index) {
BRAFT_VLOG << "Group " << r->_options.group_id
<< " last_log_index at peer=" << r->_options.peer_id
<< " is " << response->last_log_index();
// The peer contains less logs than leader
r->_next_index = response->last_log_index() + 1;
} else {
// The peer contains logs from old term which should be truncated,
// decrease _last_log_at_peer by one to test the right index to keep
if (BAIDU_LIKELY(r->_next_index > 1)) {
BRAFT_VLOG << "Group " << r->_options.group_id
<< " log_index=" << r->_next_index << " mismatch";
--r->_next_index;
} else {
LOG(ERROR) << "Group " << r->_options.group_id
<< " peer=" << r->_options.peer_id
<< " declares that log at index=0 doesn't match,"
" which is not supposed to happen";
}
}

后续 append entries 的时候, prev_index = 21 , peer 上是没有的,仍然会失败。然后不停地减小 _next_index 重试,直到变成0触发了 peer 上的 truncate suffix。

@ehds
Copy link
Contributor

ehds commented Sep 14, 2023

取消 rpc 为什么进入下面的分支? 应该是判断 if (cntl->Failed())分支,所以 next_index 不会 --?

if (cntl->Failed()) {
ss << " fail, sleep.";
BRAFT_VLOG << ss.str();
// TODO: Should it be VLOG?
LOG_IF(WARNING, (r->_consecutive_error_times++) % 10 == 0)
<< "Group " << r->_options.group_id
<< " fail to issue RPC to " << r->_options.peer_id
<< " _consecutive_error_times=" << r->_consecutive_error_times
<< ", " << cntl->ErrorText();
// If the follower crashes, any RPC to the follower fails immediately,
// so we need to block the follower for a while instead of looping until
// it comes back or be removed
// dummy_id is unlock in block
r->_reset_next_index();
return r->_block(start_time_us, cntl->ErrorCode());
}

@CkTD
Copy link
Author

CkTD commented Sep 14, 2023

取消 rpc 为什么进入下面的分支? 应该是判断 if (cntl->Failed())分支,所以 next_index 不会 --?

if (cntl->Failed()) {
ss << " fail, sleep.";
BRAFT_VLOG << ss.str();
// TODO: Should it be VLOG?
LOG_IF(WARNING, (r->_consecutive_error_times++) % 10 == 0)
<< "Group " << r->_options.group_id
<< " fail to issue RPC to " << r->_options.peer_id
<< " _consecutive_error_times=" << r->_consecutive_error_times
<< ", " << cntl->ErrorText();
// If the follower crashes, any RPC to the follower fails immediately,
// so we need to block the follower for a while instead of looping until
// it comes back or be removed
// dummy_id is unlock in block
r->_reset_next_index();
return r->_block(start_time_us, cntl->ErrorCode());
}

被取消的第一个rpc应该是进入了if (cntl->Failed())

感觉出问题的是先到达的第二个 rpc, cntl->Failed() 没有失败,但是由于 log gap存在,Cache 又未开启,所以 response->success() 是 false, response->last_log_index() + 1 又等于 r->_next_index ,就进入了 truncate log 分枝。

@ehds
Copy link
Contributor

ehds commented Sep 15, 2023

取消 rpc 为什么进入下面的分支? 应该是判断 if (cntl->Failed())分支,所以 next_index 不会 --?

if (cntl->Failed()) {
ss << " fail, sleep.";
BRAFT_VLOG << ss.str();
// TODO: Should it be VLOG?
LOG_IF(WARNING, (r->_consecutive_error_times++) % 10 == 0)
<< "Group " << r->_options.group_id
<< " fail to issue RPC to " << r->_options.peer_id
<< " _consecutive_error_times=" << r->_consecutive_error_times
<< ", " << cntl->ErrorText();
// If the follower crashes, any RPC to the follower fails immediately,
// so we need to block the follower for a while instead of looping until
// it comes back or be removed
// dummy_id is unlock in block
r->_reset_next_index();
return r->_block(start_time_us, cntl->ErrorCode());
}

被取消的第一个rpc应该是进入了if (cntl->Failed())

感觉出问题的是先到达的第二个 rpc, cntl->Failed() 没有失败,但是由于 log gap存在,Cache 又未开启,所以 response->success() 是 false, response->last_log_index() + 1 又等于 r->_next_index ,就进入了 truncate log 分枝。

确实会存在这个问题,能否提个PR fix 这个问题呢 👍 ?

@CkTD
Copy link
Author

CkTD commented Sep 20, 2023

取消 rpc 为什么进入下面的分支? 应该是判断 if (cntl->Failed())分支,所以 next_index 不会 --?

if (cntl->Failed()) {
ss << " fail, sleep.";
BRAFT_VLOG << ss.str();
// TODO: Should it be VLOG?
LOG_IF(WARNING, (r->_consecutive_error_times++) % 10 == 0)
<< "Group " << r->_options.group_id
<< " fail to issue RPC to " << r->_options.peer_id
<< " _consecutive_error_times=" << r->_consecutive_error_times
<< ", " << cntl->ErrorText();
// If the follower crashes, any RPC to the follower fails immediately,
// so we need to block the follower for a while instead of looping until
// it comes back or be removed
// dummy_id is unlock in block
r->_reset_next_index();
return r->_block(start_time_us, cntl->ErrorCode());
}

被取消的第一个rpc应该是进入了if (cntl->Failed())
感觉出问题的是先到达的第二个 rpc, cntl->Failed() 没有失败,但是由于 log gap存在,Cache 又未开启,所以 response->success() 是 false, response->last_log_index() + 1 又等于 r->_next_index ,就进入了 truncate log 分枝。

确实会存在这个问题,能否提个PR fix 这个问题呢 👍 ?

这里还不能直接把条件改成 response->last_log_index() + 1 == r->_next_index。 这个条件成立,也有可能是非乱序,但确实是有冲突情况,需要 truncate log。 还得想个办法区分这两种情况,没想到很好的方法来 fix。

@ehds
Copy link
Contributor

ehds commented Sep 20, 2023

--- a/src/braft/replicator.cpp
+++ b/src/braft/replicator.cpp
@@ -450,9 +450,11 @@ void Replicator::_on_rpc_returned(ReplicatorId id, brpc::Controller* cntl,
             // The peer contains logs from old term which should be truncated,
             // decrease _last_log_at_peer by one to test the right index to keep
             if (BAIDU_LIKELY(r->_next_index > 1)) {
-                BRAFT_VLOG << "Group " << r->_options.group_id 
-                           << " log_index=" << r->_next_index << " mismatch";
+              BRAFT_VLOG << "Group " << r->_options.group_id
+                         << " log_index=" << r->_next_index << " mismatch";
+              if (request->prev_log_index() == r->_next_index - 1) {
                 --r->_next_index;
+              }
             } else {
                 LOG(ERROR) << "Group " << r->_options.group_id 
                            << " peer=" << r->_options.peer_id

这样可行吗? 把相等判断放到下面,只有当 request->prev_log_index() == r->_next_index - 1 即 AppendEntries 请求的 prev_log_index 是当前 _flying_append_entries 中最小的还未被确认的,才能去缩小 _next_index.

首先最小的请求一定会被处理,如果是 cntl.Failed , 恢复最初始的状态继续执行,如果是不匹配则向下执行:

如果是 leader 安装快照的场景,这个条件就不会被触发,因为最小的 prev_log_index 一定能匹配 term。

其他情况也不会有影响,因为当前最小的请求的 term 不匹配,那么之后的请求也一定不匹配。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants