Add translations for disaster recovery
Showing 6 changed files with 83 additions and 29 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -81,3 +81,46 @@ $ emqx ctl ds set_replicas messages <Site ID 1> <Site ID 2> ... | |
``` | ||
|
||
This approach minimizes the amount of data transferred between sites while maintaining the replication factor as far as possible. | |
|
||
## Disaster Recovery | |
|
||
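When a disaster strikes, knowing how to recover nodes efficiently is essential for maintaining service continuity. This section provides guidance on recovering nodes from common disaster scenarios. | |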
|
||
### Complete Loss of a Node | |
|
||
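One of the most common disaster scenarios is the complete loss of a node, which can be caused by an unrecoverable hardware failure, disk corruption, or human error. | |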
|
||
1. Restore availability by reallocating shards. | |
|
||
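If a node is completely lost, cluster availability is affected to some degree. The first step is to restore availability by reallocating the lost node's shards to the other nodes in the cluster. | |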
|
||
You can do this with the standard `leave` command. The command works even if the lost node is unreachable, but the transitions may take longer to complete. | |
|
||
```shell | ||
$ emqx ctl ds leave messages 5C6028D6CE9459C7 # 5C6028D6CE9459C7 is the Site ID of the lost node | |
``` | ||
|
||
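2. Monitor the cluster status and wait for all shard transitions to complete successfully. Before proceeding to the next step, make sure no transitions remain. | |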
|
||
```shell | ||
$ emqx ctl ds info | ||
<...> | ||
SITES: | ||
D8894F95DC86DFDB '[email protected]' up | ||
5C6028D6CE9459C7 '[email protected]' (x) down | ||
<...> | ||
REPLICA TRANSITIONS: | ||
Shard Transitions | ||
messages/0 -5C6028D6CE9459C7 +D8894F95DC86DFDB | ||
<...> | ||
``` | ||
|
||
3. Once all shard transitions have completed, tell the cluster that the lost node will not return. | |
|
||
```shell | ||
$ emqx ctl ds forget messages 5C6028D6CE9459C7 | ||
``` | ||
|
||
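This step is essential if you plan to replace the lost node using the original node name. Skipping it can cause the cluster to see the same node name under two different Site IDs, leading to serious confusion and potential problems. |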
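
The three steps above can be chained in a small script. The following is a minimal sketch, not part of the EMQX tooling: the helper names `has_pending_transitions` and `recover_lost_site` are hypothetical, and the parsing assumes that `emqx ctl ds info` prints pending transitions as `<db>/<shard>  -<Site ID> +<Site ID>` lines under `REPLICA TRANSITIONS:`, as in the sample output of step 2. The actual output format may differ between versions, so check it before relying on this.

```shell
#!/usr/bin/env bash
# Sketch only: automates leave -> wait -> forget for a lost site.
# ASSUMPTION: pending transitions appear as "<db>/<shard>  -<ID> +<ID>"
# lines after the "REPLICA TRANSITIONS:" header of `emqx ctl ds info`.

# Hypothetical helper: reads `emqx ctl ds info` output on stdin and
# succeeds (exit 0) if at least one transition line is found.
has_pending_transitions() {
  awk '
    /REPLICA TRANSITIONS:/ { in_section = 1; next }
    in_section && /^[[:space:]]*[^[:space:]]+\/[0-9]+[[:space:]]+[-+][0-9A-F]+/ { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Hypothetical helper: recover_lost_site <db> <site-id> [poll-seconds]
recover_lost_site() {
  local db="$1" site="$2" interval="${3:-10}"
  local info

  # Step 1: move the lost site's shards to the remaining sites.
  emqx ctl ds leave "$db" "$site" || return 1

  # Step 2: poll until no transitions are listed. Abort if the status
  # command itself fails, rather than treating that as "done".
  while true; do
    info="$(emqx ctl ds info)" || return 1
    printf '%s\n' "$info" | has_pending_transitions || break
    sleep "$interval"
  done

  # Step 3: tell the cluster the lost site will not return.
  emqx ctl ds forget "$db" "$site"
}
```

Usage would look like `recover_lost_site messages 5C6028D6CE9459C7`. Since `forget` cannot easily be undone, it is worth confirming the `emqx ctl ds info` output manually before the final step instead of trusting the parser alone.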