Emphasize the importance of the input to unsafe recovery #18628

Open
wants to merge 4 commits into
base: master
Changes from 2 commits
4 changes: 4 additions & 0 deletions online-unsafe-recovery.md
@@ -56,6 +56,10 @@ pd-ctl -u <pd_addr> unsafe remove-failed-stores <store_id1,store_id2,...>

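If PD has gone through a disaster recovery operation with [`pd-recover`](/pd-recover.md) and has lost the store information of the unrecoverable TiKV nodes, so that the store IDs to pass cannot be determined, you can specify the `--auto-detect` parameter, which allows an empty store ID list to be passed. In this mode, every store ID that is not in the PD store list is considered unrecoverable and is removed.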
Collaborator

@v01dstar Is the "PD store list" in the line above the `<store_id1,store_id2,...>` in the command `pd-ctl -u <pd_addr> unsafe remove-failed-stores --auto-detect <store_id1,store_id2,...>`?

Contributor Author

Clarified this a bit, PTAL.


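> **Note:**
>
> Make sure to pass **all** failed TiKV nodes and TiFlash nodes in a single invocation. If some failed nodes are omitted, the recovery may be blocked. If Online Unsafe Recovery has already been run once within a short period (for example, within one day), make sure that subsequent runs still include the failed TiKV nodes that were already handled. If you cannot determine all failed nodes, you can use the `--auto-detect` mode, in which PD removes all replicas that are not in the current store list.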
v01dstar marked this conversation as resolved.
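
The two invocation modes described in the note can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper `build_unsafe_recovery_cmd` that only prints the `pd-ctl` command line and does not run it. The PD address `127.0.0.1:2379` and the store IDs `4,5,7` are placeholders, not values from this PR.

```shell
# Hypothetical helper (not part of pd-ctl): prints the command line only.
# $1 = PD address, $2 = comma-separated IDs of ALL failed stores (may be empty).
build_unsafe_recovery_cmd() {
    pd_addr="$1"
    store_ids="$2"
    if [ -z "$store_ids" ]; then
        # Store IDs unknown: rely on --auto-detect with an empty store ID list.
        printf 'pd-ctl -u %s unsafe remove-failed-stores --auto-detect\n' "$pd_addr"
    else
        # Store IDs known: pass every failed TiKV and TiFlash store in one call.
        printf 'pd-ctl -u %s unsafe remove-failed-stores %s\n' "$pd_addr" "$store_ids"
    fi
}

# Placeholder address and store IDs, for illustration only.
build_unsafe_recovery_cmd "127.0.0.1:2379" "4,5,7"
build_unsafe_recovery_cmd "127.0.0.1:2379" ""
```

On a repeat run within a short period, the list would still carry the stores handled earlier (here `4,5,7`) in addition to any newly failed store.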

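> **Note:**
>
> - Because this command needs to collect information from all Peers, it may cause a noticeable short-term increase in PD memory usage (100,000 Peers are expected to use about 500 MiB of memory).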