Emphasis the importance of input of unsafe recovery #18628
base: master
Signed-off-by: Yang Zhang <[email protected]>
Hi @v01dstar. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @overvenus
Co-authored-by: Neil Shen <[email protected]>
/ok-to-test
Signed-off-by: Yang Zhang <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
online-unsafe-recovery.md
Outdated
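@@ -56,6 +56,10 @@ pd-ctl -u <pd_addr> unsafe remove-failed-stores <store_id1,store_id2,...>

If PD has gone through a disaster recovery with [`pd-recover`](/pd-recover.md) and has lost the store information of the unrecoverable TiKV nodes, so that you cannot determine which store IDs to pass, you can specify the `--auto-detect` flag, which allows an empty store ID list. In this mode, every store ID that is not in the PD store list is considered unrecoverable and is removed.

> **Note:**
>
> Make sure to pass in **all** failed TiKV and TiFlash nodes at once. If any failed node is omitted, the recovery may be blocked. If Online Unsafe Recovery has already been run within a short period (for example, within the last day), make sure that subsequent runs still include the failed TiKV and TiFlash nodes that were already handled. If you cannot determine all the failed nodes, you can use the `--auto-detect` mode, in which PD removes all replicas that are not on stores in the current store list.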
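The sentence "If you cannot determine all the failed nodes, you can use the `--auto-detect` mode, in which PD removes all replicas that are not on stores in the current store list" seems to overlap somewhat with L57. Could the two be merged?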
Reorganized it, PTAL.
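For illustration, the rule in the note under review (every run must list all failed stores, including those handled in an earlier run) can be sketched as a small shell helper. This is a minimal sketch, not part of the PR: the function name `build_recovery_cmd`, the PD address `127.0.0.1:2379`, and the store IDs are hypothetical placeholders. The helper only prints the `pd-ctl` command and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) the pd-ctl command for one recovery run.
# Arguments: the PD address, then every failed store ID, including IDs from earlier runs.
build_recovery_cmd() {
  local pd_addr="$1"
  shift
  if [ "$#" -eq 0 ]; then
    # An empty list is only valid together with --auto-detect, which this sketch does not choose for you.
    echo "error: no store IDs given; list all failed stores or use --auto-detect" >&2
    return 1
  fi
  local ids
  # Deduplicate and sort numerically, then join with commas.
  ids="$(printf '%s\n' "$@" | sort -n -u | paste -s -d, -)"
  echo "pd-ctl -u ${pd_addr} unsafe remove-failed-stores ${ids}"
}

# Assumed scenario: stores 4 and 5 were handled in an earlier run and store 7 failed afterwards.
# All three IDs are passed again in the same command.
build_recovery_cmd "127.0.0.1:2379" 4 5 7
```

Printing the command instead of running it leaves room to review the store list before an unsafe operation is started.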
online-unsafe-recovery.md
Outdated
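@@ -56,6 +56,10 @@ pd-ctl -u <pd_addr> unsafe remove-failed-stores <store_id1,store_id2,...>

If PD has gone through a disaster recovery with [`pd-recover`](/pd-recover.md) and has lost the store information of the unrecoverable TiKV nodes, so that you cannot determine which store IDs to pass, you can specify the `--auto-detect` flag, which allows an empty store ID list. In this mode, every store ID that is not in the PD store list is considered unrecoverable and is removed.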
@v01dstar Does the "PD store list" in the line above refer to the `<store_id1,store_id2,...>` in the command `pd-ctl -u <pd_addr> unsafe remove-failed-stores --auto-detect <store_id1,store_id2,...>`?
Clarified it, PTAL.
Signed-off-by: Yang Zhang <[email protected]>
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions (in Chinese).
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?