fix: etcd
ryan4yin committed Dec 8, 2023
1 parent d38bdf0 commit 3261090
Showing 2 changed files with 52 additions and 90 deletions.
19 changes: 16 additions & 3 deletions datastore/etcd/README.md
@@ -7,7 +7,12 @@

```shell
# Connect to all nodes so that the leader is guaranteed to be among them
export ENDPOINTS=http://node1:2379,http://node2:2379,http://node3:2379
HOST_1=xxx
HOST_2=xxx
HOST_3=xxx
export ETCDCTL_API=3
export ENDPOINT=http://$HOST_1:2379
export ENDPOINTS=http://$HOST_1:2379,http://$HOST_2:2379,http://$HOST_3:2379
etcdctl --endpoints $ENDPOINTS endpoint status --write-out=table
```
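
The `ENDPOINTS` value above is a comma-separated list of `http://HOST:2379` URLs. As a minimal sketch, assuming the node hostnames are known in advance, the list can be assembled by a small helper instead of being typed by hand. The function name `build_endpoints` is hypothetical and not part of etcd or etcdctl; `node1` to `node3` are placeholders.

```shell
# Hypothetical helper: join hosts into a comma-separated etcd endpoint list.
# Usage: build_endpoints PORT HOST...
build_endpoints() {
  port="$1"
  shift
  out=""
  for host in "$@"; do
    # ${out:+$out,} adds a comma only when out is already non-empty
    out="${out:+$out,}http://${host}:${port}"
  done
  printf '%s\n' "$out"
}

# Placeholder hostnames; replace with the real node hostnames or IPs.
ENDPOINTS=$(build_endpoints 2379 node1 node2 node3)
export ENDPOINTS
echo "$ENDPOINTS"
```

The resulting variable can be passed to `etcdctl --endpoints $ENDPOINTS` exactly as in the command above.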

@@ -55,11 +60,19 @@ etcd supports password authentication; the root user must be created before enabling it, and this

```shell
# Create the root user
$ etcdctl --cacert ca.crt --cert peer.crt --key peer.key user add root
$ etcdctl --cacert ca.crt --cert peer.crt --key peer.key --endpoints $ENDPOINTS user add root
Password of root:

# The root user must have the root role and is allowed to change anything inside etcd.
$ etcdctl --cacert ca.crt --cert peer.crt --key peer.key --endpoints $ENDPOINTS user grant-role root root

# Enable authentication
$ etcdctl --cacert ca.crt --cert peer.crt --key peer.key auth enable
$ etcdctl --cacert ca.crt --cert peer.crt --key peer.key --endpoints $ENDPOINTS auth enable


# All subsequent operations require a username and password
$ etcdctl --cacert ca.crt --cert peer.crt --key peer.key --endpoints $ENDPOINTS --user root user list
Password of root:
```

## Etcd cluster operations notes
123 changes: 36 additions & 87 deletions datastore/etcd/etcd_with_systemd.md
@@ -5,101 +5,35 @@

Following the official documentation, download etcd with the commands below:
```shell
ETCD_VER=v3.4.14
# Exit the root shell before running the commands below!
ETCD_VER=v3.5.9

DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
TAR_NAME=etcd-${ETCD_VER}-linux-arm64.tar.gz

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-downloaded && mkdir -p /tmp/etcd-downloaded

mkdir -p /tmp/etcd
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd --strip-components=1
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/${TAR_NAME} -o /tmp/${TAR_NAME}
tar xzvf /tmp/${TAR_NAME} -C /tmp/etcd-downloaded --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-downloaded/etcd --version
/tmp/etcd-downloaded/etcdctl version

mkdir /data/bin
mv /tmp/etcd/etcd* /data/bin
rm -rf /tmp/etcd
mv /tmp/etcd-downloaded/etcd* /data/bin
rm -rf /tmp/etcd-downloaded
```

### 2. Deploy the Etcd cluster

Assume all data and configuration are stored under /data, which may be a separate data disk:


`/data/etcd.env` contains the following. Across the three nodes, only the `ETCD_NAME` and `THIS_IP` parameters need to change; every other setting is identical:

```conf
NAME_1=node1
NAME_2=node2
NAME_3=node3
HOST_1=172.16.238.100
HOST_2=172.16.238.101
HOST_3=172.16.238.102
ETCD_NAME=${NAME_1}
THIS_IP=${HOST_1}
# Consider enabling mutual TLS authentication for better security
# ETCD_TRUSTED_CA_FILE="/etc/etcd/etcd-ca.crt"
# ETCD_CERT_FILE="/etc/etcd/server.crt"
# ETCD_KEY_FILE="/etc/etcd/server.key"
# ETCD_PEER_CLIENT_CERT_AUTH=true
# ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/etcd-ca.crt"
# ETCD_PEER_KEY_FILE="/etc/etcd/server.key"
# ETCD_PEER_CERT_FILE="/etc/etcd/server.crt"
ETCD_INITIAL_CLUSTER_TOKEN=<random_token>
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_DATA_DIR=/data/etcd.data
ETCD_INITIAL_CLUSTER="${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380"
ETCD_LISTEN_PEER_URLS=http://${THIS_IP}:2380
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${THIS_IP}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${THIS_IP}:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://${THIS_IP}:2379"
```


`/data/etcd.service` contains the following; this file is identical on all three nodes:

```conf
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=simple
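# EnvironmentFile does not support ${xxx} variable interpolation, so it is not suitable here
# EnvironmentFile=/data/etcd.env
# -a (allexport) exports the sourced variables to etcd's environment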
ExecStart=/bin/bash -ac '. /data/etcd.env; /data/bin/etcd'
Restart=always
RestartSec=5s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
```

Then run the following commands on each of the three nodes to start an etcd cluster:

```shell
# Note: do not use `ln -s` here; after a reboot systemd fails to recognize the unit and reports very strange errors!
cp /data/etcd.service /usr/lib/systemd/system/etcd.service
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
```


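### 3. Startup optimization

In `Type=simple` mode, systemd cannot know etcd's real state; it simply assumes etcd is running as soon as the script starts.

To let etcd correctly notify systemd of its startup state, switch to `Type=notify`. In this mode `/data/bin/etcd` itself must be the start program; `/bin/bash` can no longer be used.

An example follows:


`/data/etcd.env` contains the following. Across the three nodes, only the `ETCD_NAME` and `THIS_IP` parameters need to change; every other setting is identical: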

```conf
@@ -113,6 +47,7 @@ systemctl start etcd
# ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/etcd-ca.crt"
# ETCD_PEER_KEY_FILE="/etc/etcd/server.key"
# ETCD_PEER_CERT_FILE="/etc/etcd/server.crt"
ETCD_INITIAL_CLUSTER_TOKEN=<random_token>
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_DATA_DIR=/data/etcd.data
@@ -131,6 +66,8 @@ HOST_3=172.16.238.102

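`/data/etcd.service` contains the following; this file is identical on all three nodes:

> To let etcd correctly notify systemd of its startup state, `Type=notify` is used here. In this mode `/data/bin/etcd` itself must be the start program.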
```conf
[Unit]
Description=etcd key-value store
@@ -142,10 +79,10 @@ Type=notify
EnvironmentFile=/data/etcd.env
# ${Xxx} interpolation syntax can be used in ExecXXX commands
ExecStart=/data/bin/etcd \
--initial-advertise-peer-urls http://${THIS_IP}:2380 \
--listen-peer-urls http://${THIS_IP}:2380 \
--advertise-client-urls http://${THIS_IP}:2379 \
--listen-client-urls http://${THIS_IP}:2379 \
--advertise-client-urls http://${THIS_IP}:2379 \
--listen-peer-urls http://${THIS_IP}:2380 \
--initial-advertise-peer-urls http://${THIS_IP}:2380 \
--initial-cluster "${NAME_1}=http://${HOST_1}:2380,${NAME_2}=http://${HOST_2}:2380,${NAME_3}=http://${HOST_3}:2380"
Restart=always
RestartSec=5s
@@ -155,7 +92,19 @@ LimitNOFILE=40000
WantedBy=multi-user.target
```
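
The `--initial-cluster` value in the unit above follows a fixed `NAME=http://HOST:2380` pattern per node, joined by commas. As a rough sketch, assuming the node names and IPs from this document, the string can be generated and compared against the unit file to catch typos. The function name `build_initial_cluster` is hypothetical and not part of etcd.

```shell
# Hypothetical helper: build the --initial-cluster value from NAME=HOST pairs.
# Usage: build_initial_cluster PEER_PORT NAME=HOST...
build_initial_cluster() {
  peer_port="$1"
  shift
  out=""
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    host="${pair#*=}"    # text after the first '='
    out="${out:+$out,}${name}=http://${host}:${peer_port}"
  done
  printf '%s\n' "$out"
}

# Node names and IPs taken from the example environment file.
build_initial_cluster 2380 \
  node1=172.16.238.100 node2=172.16.238.101 node3=172.16.238.102
```

Every node must be started with the same member list, so generating the value once and copying it to all three machines avoids mismatched cluster definitions.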

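This approach sets etcd parameters through both environment variables and command-line flags.
Then run the following commands on each of the three nodes to start an etcd cluster: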

```shell
# Note: do not use `ln -s` here; after a reboot systemd fails to recognize the unit and reports very strange errors!
cp /data/etcd.service /usr/lib/systemd/system/etcd.service
systemctl daemon-reload
systemctl enable etcd

# On the first two nodes this command blocks because etcd is not ready yet; the three nodes only become healthy together once the third node has started
systemctl start etcd

systemctl status etcd
```
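
Because the first nodes block until the cluster has enough members, it can help to poll the cluster state from a separate shell rather than watch `systemctl start`. The following is a minimal sketch, assuming `etcdctl endpoint health` exits non-zero while any endpoint is unhealthy. The function name `wait_healthy` and the `ETCDCTL` override variable are hypothetical, introduced only for this illustration.

```shell
# Hypothetical helper: poll until every endpoint reports healthy.
# Usage: wait_healthy ENDPOINTS [RETRIES] [DELAY_SECONDS]
# ETCDCTL may be set to point at a different etcdctl binary.
wait_healthy() {
  endpoints="$1"
  retries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    # Assumption: a non-zero exit status means at least one endpoint is unhealthy
    if "${ETCDCTL:-etcdctl}" --endpoints "$endpoints" endpoint health >/dev/null 2>&1; then
      echo "cluster healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "cluster not healthy after $retries attempts" >&2
  return 1
}
```

A typical call would be `wait_healthy "$ENDPOINTS" 60 2`, run after `systemctl start etcd` has been issued on all three nodes.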

## References
