Question about a state machine exception: FSMCaller is overload #472

Closed
Linary opened this issue Jun 28, 2020 · 3 comments

Linary commented Jun 28, 2020

Does this error mean that the follower node cannot apply log entries as fast as the leader sends them?

[JRaft-LogManager-Disruptor-0] [ERROR] com.baidu.hugegraph.backend.store.raft.StoreStateMachine [] - Raft error: FSMCaller is overload.
com.alipay.sofa.jraft.error.RaftException: FSMCaller is overload.
        at com.alipay.sofa.jraft.core.FSMCallerImpl.enqueueTask(FSMCallerImpl.java:236) [jraft-core-1.3.1.jar:?]
        at com.alipay.sofa.jraft.core.FSMCallerImpl.onCommitted(FSMCallerImpl.java:245) [jraft-core-1.3.1.jar:?]
        at com.alipay.sofa.jraft.core.BallotBox.setLastCommittedIndex(BallotBox.java:241) [jraft-core-1.3.1.jar:?]
        at com.alipay.sofa.jraft.core.NodeImpl$FollowerStableClosure.run(NodeImpl.java:1828) [jraft-core-1.3.1.jar:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$AppendBatcher.flush(LogManagerImpl.java:469) [jraft-core-1.3.1.jar:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:565) [jraft-core-1.3.1.jar:?]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl$StableClosureEventHandler.onEvent(LogManagerImpl.java:496) [jraft-core-1.3.1.jar:?]
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) [disruptor-3.3.7.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]

@killme2008 (Contributor)

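@Linary The state machine is overloaded: the tasks submitted to FSMCaller exceed its processing capacity. First, check the metrics to see where the state machine spends its time; see section 8 (metric monitoring) of https://www.sofastack.tech/projects/sofa-jraft/jraft-user-guide/.
Second, you can moderately increase disruptorBufferSize in RaftOptions (default 16384).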
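
As a rough illustration of the overload described above, here is a toy model rather than JRaft code. It assumes the FSMCaller task queue behaves like a bounded buffer that rejects new tasks when it is full instead of blocking the producer. The class `OverloadSketch` and the method `rejected` are hypothetical names, and `ArrayBlockingQueue` stands in for the real ring buffer.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Toy model of a bounded task queue. NOT JRaft code; all names are hypothetical.
class OverloadSketch {

    /**
     * Offers `burst` tasks to a queue of size `capacity` while no consumer
     * is draining it, and returns how many tasks were rejected.
     */
    static int rejected(int capacity, int burst) {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < burst; i++) {
            // offer() never blocks: it returns false when the queue is full,
            // which is the situation this sketch treats as "overload".
            if (!queue.offer(i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(rejected(16384, 20000)); // prints 3616
        System.out.println(rejected(32768, 20000)); // prints 0
    }
}
```

Under this assumption, a larger buffer only absorbs a temporary burst. If the state machine is persistently slower than the incoming commit rate, any finite buffer eventually fills, which is why checking the metrics comes first.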

Linary commented Jun 29, 2020

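@killme2008 Thanks for the reply. I have another question, this time about log replication.

After receiving a client request, the leader appends the entry to its local log and sends AppendEntries RPCs to all followers. Once a majority has responded, it applies the entry to the state machine and returns the response to the client.

I traced the code starting from the apply(Task) method of NodeImpl and saw:

  1. The Task is wrapped in a LogEntryAndClosure and added to applyQueue;
  2. The consumer LogEntryAndClosureHandler runs executeApplyingTasks;
  3. The entries are wrapped in a LeaderStableClosure, and logManager.appendEntries is called;
  4. The LeaderStableClosure is wrapped in a StableClosureEvent and added to diskQueue;
  5. The consumer StableClosureEventHandler runs ab.flush(), where ab is an AppendBatcher;
  6. AppendBatcher's flush runs appendToStorage to append the log entries, then calls StableClosure.run() in a for loop;
  7. LeaderStableClosure's run method calls ballotBox.commitAt, updates commitIndex, and then calls FSMCaller's onCommitted method;
  8. FSMCaller's onCommitted wraps the Task in an ApplyTaskEvent and adds it to taskQueue;
  9. The consumer ApplyTaskHandler runs runApplyTask, then calls doCommitted to update lastAppliedIndex and invokes the state machine's apply(iter) method.

In this chain I did not see any code where the leader sends AppendEntries RPCs to all followers. Did I miss something while tracing, or is there some other mechanism that sends the RPC messages?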
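
The nine steps traced above amount to three queues handing work to one another. The sketch below is a single-threaded toy model under loud assumptions: it is not JRaft code, it models a single-node group so a local flush counts as committed, and it borrows only the queue names (applyQueue, diskQueue, taskQueue) from the steps. The class `ApplyPipelineSketch` and its `run` method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy, single-threaded model of the traced hand-offs. NOT JRaft code.
class ApplyPipelineSketch {

    /** Pushes tasks through the three queues and returns what the state machine applied. */
    static List<String> run(List<String> tasks) {
        ArrayDeque<String> applyQueue = new ArrayDeque<>(tasks); // step 1
        ArrayDeque<List<String>> diskQueue = new ArrayDeque<>();
        ArrayDeque<Integer> taskQueue = new ArrayDeque<>();
        List<String> log = new ArrayList<>();
        List<String> stateMachine = new ArrayList<>();

        // Steps 2-4: drain applyQueue into one batch and hand it to the disk queue.
        List<String> batch = new ArrayList<>();
        while (!applyQueue.isEmpty()) {
            batch.add(applyQueue.poll());
        }
        diskQueue.add(batch);

        // Steps 5-7: flush each batch to the log, then publish the committed entry count.
        // Assumption: single-node group, so no quorum check is modelled here.
        while (!diskQueue.isEmpty()) {
            log.addAll(diskQueue.poll());
            taskQueue.add(log.size());
        }

        // Steps 8-9: apply every entry between lastApplied and the committed count.
        int lastApplied = 0;
        while (!taskQueue.isEmpty()) {
            int committed = taskQueue.poll();
            for (int i = lastApplied; i < committed; i++) {
                stateMachine.add(log.get(i));
            }
            lastApplied = committed;
        }
        return stateMachine;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c"))); // prints [a, b, c]
    }
}
```

Nothing in this path sends anything over the network, which matches the observation in the question: the model covers only local persistence and apply.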

@killme2008 (Contributor)

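AppendEntries is handled entirely by the replicator. When a node becomes leader, it starts one replicator for each follower/learner, and that replicator is responsible for log replication.

For more on replication, see this article:
https://www.cnblogs.com/luozhiyun/p/12005975.html

For more articles, see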
#327
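
The replicator model described in the reply can be sketched as follows. This is a minimal illustration and not JRaft's implementation. It assumes each replicator simply remembers the next index to send and ships entries from the leader's log independently of the apply path. `ReplicatorSketch`, `Replicator`, `replicate`, and `committed` are hypothetical names. The list copy stands in for an AppendEntries RPC, and `committed` is a simplified stand-in for the majority check that the thread attributes to ballotBox.commitAt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of per-follower replication. NOT JRaft code; all names are hypothetical.
class ReplicatorSketch {

    /** One instance per follower: tracks the next log index to send. */
    static class Replicator {
        final List<String> followerLog = new ArrayList<>();
        int nextIndex = 0;

        /** Ships up to maxBatch entries and returns how many entries the follower now stores. */
        int replicate(List<String> leaderLog, int maxBatch) {
            int end = Math.min(leaderLog.size(), nextIndex + maxBatch);
            // Stand-in for an AppendEntries RPC carrying entries [nextIndex, end).
            followerLog.addAll(leaderLog.subList(nextIndex, end));
            nextIndex = end;
            return nextIndex;
        }
    }

    /** Largest entry count that a majority of nodes has stored. */
    static int committed(int[] stored) {
        int[] sorted = stored.clone();
        Arrays.sort(sorted);
        // In ascending order, the element at (n - 1) / 2 is held by at least a majority.
        return sorted[(sorted.length - 1) / 2];
    }

    public static void main(String[] args) {
        List<String> leaderLog = List.of("e1", "e2", "e3", "e4", "e5");
        Replicator fast = new Replicator();
        Replicator slow = new Replicator();
        int a = fast.replicate(leaderLog, 5); // fast follower receives all 5 entries
        int b = slow.replicate(leaderLog, 2); // slow follower receives only 2
        // Leader (5), fast follower (5), slow follower (2): a majority holds 5.
        System.out.println(committed(new int[] {leaderLog.size(), a, b})); // prints 5
    }
}
```

Because each replicator advances on its own, a slow follower does not hold back the others, and the leader can commit as soon as a majority has stored the entry.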
