diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3731356
--- /dev/null
+++ b/README.md
@@ -0,0 +1,36 @@
+[![Build Status](https://travis-ci.org/radondb/xenon.png)](https://travis-ci.org/radondb/xenon)
+[![Go Report Card](https://goreportcard.com/badge/github.com/radondb/xenon)](https://goreportcard.com/report/github.com/radondb/xenon)
+[![codecov.io](https://codecov.io/gh/radondb/xenon/graphs/badge.svg)](https://codecov.io/gh/radondb/xenon/branch/master)
+
+# Xenon
+
+![](docs/images/xenon.png)
+
+## Overview
+
+`Xenon` is a MySQL HA and replication management tool built on the Raft protocol.
+
+* Fast failover with no lost transactions
+* Streaming, high-speed backup and restore
+* MySQL operation and maintenance
+* No central control, easy to deploy
+* Runs well as a cloud app
+
+## Documentation
+
+- [build_and_run](docs/how_to_build_and_run_xenon.md) : How to build and run Xenon.
+- [client_commands](docs/xenoncli_commands.md) : Xenon client commands.
+- [how_xenon_works](docs/how_xenon_works.md) : How Xenon works.
+
+## Status
+
+Xenon is production ready; it is already used in production by products such as [MySQL Plus](https://www.qingcloud.com/products/mysql-plus/).
+
+## Issues
+
+The [integrated GitHub issue tracker](https://github.com/radondb/xenon/issues)
+is used for this project.
+
+## License
+
+Xenon is released under the GPLv3. See [LICENSE](LICENSE).
diff --git a/docs/how_to_build_and_run_xenon.md b/docs/how_to_build_and_run_xenon.md
new file mode 100644
index 0000000..9e8c271
--- /dev/null
+++ b/docs/how_to_build_and_run_xenon.md
@@ -0,0 +1,467 @@
+[TOC]
+
+# How to build and run xenon
+
+## Requirements
+1. `Xenon` is a self-contained binary that does not require additional system libraries at the operating system level. It is built and run on Linux; Windows and OS X are not supported. It manages a `MySQL` backend, so a local mysqld installation is required.
+2. Xenon uses `GTID`-based parallel replication, so MySQL `5.7 or higher` is strongly recommended.
+3. [Go](http://golang.org) version 1.8 or newer is required (`sudo apt install golang` on Ubuntu, `yum install golang` on CentOS/RedHat).
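+
+Because xenon relies on GTID replication, multi-threaded apply (MTS) and semi-sync replication (see [how_xenon_works](how_xenon_works.md)), mysqld must be configured accordingly. The my.cnf sketch below is illustrative only — these are common MySQL 5.7 settings assumed here, not values mandated by xenon; a complete sample configuration is linked from Step6.3:
+
+```
+[mysqld]
+# GTID-based replication; the GTID is the log index used by Raft+
+gtid-mode                = ON
+enforce-gtid-consistency = ON
+log-bin                  = mysql-bin
+log-slave-updates        = ON
+binlog-format            = ROW
+# multi-threaded slave (MTS) for parallel replay
+slave-parallel-type      = LOGICAL_CLOCK
+slave-parallel-workers   = 8
+# semi-sync plugins as shipped with MySQL 5.7
+plugin-load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
+```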
+
+## Step1. Download src code from github
+
+```
+$ git clone https://github.com/radondb/xenon
+```
+
+## Step2. Build
+
+### Step2.1 make build
+Cloning the source creates a directory named "xenon". Enter it and build:
+```
+$ cd xenon
+$ make build
+```
+The binaries are generated in the "bin" directory:
+```
+$ ls bin/
+
+xenon xenoncli
+```
+
+### Step2.2 make test
+```
+$ make test
+```
+Next is a brief look at how we test xenon. (You can skip `Step 2.2` and `Step 2.3` and continue reading from `Step 3`.)
+
+For xenon we developed a distributed test framework that makes distributed testing exceptionally easy. It can simulate MySQL server failures, network flapping, split-brain and other infrastructure faults, so it is easy to build a Raft+ cluster with 511 nodes, repeatedly kill the Leader, inject split-brain and recover, and after a while check the cluster status to make sure the logic is correct.
+
+For example, with a 511-node Raft+ cluster, we do the following:
+
+1. Kill the Leader and wait for a new Leader to be born.
+
+2. Force all members of the cluster to Candidate state, then confirm that the cluster ends up with exactly one Leader.
+
+3. Force all members of the cluster to Leader state, then confirm that the cluster ends up with exactly one Leader.
+
+The logic of the code is as follows:
+
+```
+log := xlog.NewStdLog(xlog.Level(xlog.DEBUG))
+_, rafts, cleanup := MockRafts(log, 511)
+defer cleanup()
+
+// Start the Raft+ cluster.
+for _, raft := range rafts {
+    raft.Start()
+}
+
+// Wait for a Leader to be born.
+MockWaitLeaderEggs(rafts, 1)
+
+// Case1: Stop the Leader (mock it to IDLE).
+MockStateTransition(leader, IDLE)
+
+// Case2: Force all 511 nodes to Candidate state, then check the cluster.
+for _, raft := range rafts {
+    MockStateTransition(raft, CANDIDATE)
+}
+MockWaitLeaderEggs(rafts, 1)
+
+// Case3: Force all 511 nodes to Leader state, then check the cluster.
+for _, raft := range rafts {
+    MockStateTransition(raft, LEADER)
+}
+MockWaitLeaderEggs(rafts, 1)
+... ...
+```
+
+### Step2.3 Coverage Test
+
+```
+$ make coverage
+```
+
+## Step3. Config
+xenon uses a configuration file, xenon.conf.json. The repository includes a file called conf/xenon-sample.conf.json with basic settings. Xenon does not require any manual MySQL setup; you only need to have the MySQL service installed.
+
+Suppose you have already installed mysqld; if not, please refer to [MySQL 5.7 Install](https://dev.mysql.com/doc/refman/5.7/en/installing.html).
+
+### Step3.1 Prepare the configuration file
+* Copy xenon/conf/xenon-sample.conf.json to /etc/xenon/xenon.json
+```
+$ sudo cp xenon/conf/xenon-sample.conf.json /etc/xenon/xenon.json
+```
+* Replace each of the "${YOUR-...}" placeholders:
+```
+$ sudo vi /etc/xenon/xenon.json
+```
+
+```
+{
+    "server":
+    {
+        "endpoint":"${YOUR-HOST}:8801"
+    },
+
+    "raft":
+    {
+        "meta-datadir":"raft.meta",
+        "leader-start-command":"${YOUR-LEADER-START-COMMAND}",
+        "leader-stop-command":"${YOUR-LEADER-STOP-COMMAND}"
+    },
+
+    "mysql":
+    {
+        "admin":"root",
+        "passwd":"",
+        "host":"localhost",
+        "port":${YOUR-MYSQL-PORT},
+        "basedir":"${YOUR-MYSQL-BIN-DIR}",
+        "defaults-file":"${YOUR-MYSQL-CNF-PATH}"
+    },
+
+    "replication":
+    {
+        "user":"${YOUR-MYSQL-REPL-USER}",
+        "passwd":"${YOUR-MYSQL-REPL-PWD}"
+    },
+    "backup":
+    {
+        "ssh-host":"${YOUR-HOST}",
+        "ssh-user":"${YOUR-SSH-USER}",
+        "ssh-passwd":"${YOUR-SSH-PWD}",
+        "basedir":"${YOUR-MYSQL-BIN-DIR}",
+        "backup-dir":"${YOUR-BACKUP-DIR}",
+        "xtrabackup-bindir":"${YOUR-XTRABACKUP-BIN-DIR}"
+    },
+
+    "rpc":
+    {
+        "request-timeout":500
+    },
+
+    "log":
+    {
+        "level":"INFO"
+    }
+}
+```
+Here's a [simple template](config/xenon-simple.conf.json) for your reference.
+
+### Step3.2 Configuration instructions
+
+All fields marked `${YOUR-...}` above need to be replaced with your own values before starting.
+
+These options:
+```
+server:
+    "endpoint":"${YOUR-HOST}:8801"                    --IP of the machine running xenon
+
+raft:
+    "leader-start-command":"${YOUR-START-VIP-CMD}"    --command to start the VIP
+    "leader-stop-command":"${YOUR-STOP-VIP-CMD}"      --command to stop the VIP
+
+mysql:
+    "port":${YOUR-MYSQL-PORT}                         --the local MySQL port that xenon manages. Default is 3306
+    "basedir":"${YOUR-MYSQL-BIN-DIR}"                 --basedir of the MySQL installation.
+    "defaults-file":"${YOUR-MYSQL-CNF-PATH}"          --MySQL config file path; xenon uses it to start mysqld.
+
+replication:
+    "user":"${YOUR-MYSQL-REPL-USER}"                  --MySQL replication user. It can be created automatically
+    "passwd":"${YOUR-MYSQL-REPL-PWD}"                 --MySQL replication password. It can be created automatically
+
+backup:
+    "ssh-host":"${YOUR-HOST}"                         --intranet IP of this node, used for backups
+    "ssh-user":"${YOUR-SSH-USER}"                     --ssh user for backups; rebuildme uses it to fetch backups
+    "ssh-passwd":"${YOUR-SSH-PWD}"                    --ssh password for backups; rebuildme uses it to fetch backups
+    "basedir":"${YOUR-MYSQL-BIN-DIR}"                 --basedir of the MySQL installation.
+    "backup-dir":"${YOUR-BACKUP-DIR}"                 --backup directory; it can be the same as MySQL's datadir or another path.
+    "xtrabackup-bindir":"${YOUR-XTRABACKUP-BIN-DIR}"  --path to the xtrabackup binaries.
+```
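+
+`leader-start-command` and `leader-stop-command` are plain shell commands that xenon invokes when this node gains or loses leadership; binding and releasing a VIP is the usual use. A hedged sketch — the interface `eth0` and VIP `192.168.0.252` are illustrative values, not defaults:
+
+```
+"leader-start-command":"/sbin/ip addr add 192.168.0.252/32 dev eth0 && /sbin/arping -c 3 -A 192.168.0.252 -I eth0",
+"leader-stop-command":"/sbin/ip addr del 192.168.0.252/32 dev eth0"
+```
+
+The arping announcement is optional, but it helps neighbors pick up the moved VIP faster after a failover.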
+
+### Step3.3 Account Description
+
+Be aware that the OS account running xenon must be consistent with the one running MySQL: if, say, the ubuntu account starts xenon, that account must be able to start mysqld and must have permissions on the MySQL directories.
+
+This differs from a traditional MySQL deployment: there is no separate mysql account; the account that runs xenon is also the MySQL account.
+
+**Note :** Following is a synopsis of command line samples. For simplicity, we assume `xenon` is in your path. If not, replace `xenon` with `/path/to/xenon`.
+
+
+
+## Step4 Start xenon
+
+```
+# mkdir /data/
+
+# echo "/etc/xenon/xenon.json" > xenon/bin/config.path
+
+# ./xenon -c /etc/xenon/xenon.json > /data/xenon.log 2>&1 &
+
+# cat /data/xenon.log
+```
+
+**Note**:
+```
+The directory containing the xenon binary needs a file called config.path holding the absolute path of the xenon.json file. Also be sure to specify the config file location with `-c` or `--config`.
+```
+
+If the configuration is correct, after boot xenon will:
+* Detect mysqld and start it if the process does not exist
+* Wait until MySQL is serviceable, then check that the replication accounts exist and create them if they do not
+
+Now that xenon has started successfully, the final step is the keepalived configuration.
+
+## Step5 Keepalived configuration and start
+
+Keepalived is routing software written in C. The main goal of the project is to provide simple and robust facilities for load balancing and high availability on Linux systems and Linux-based infrastructures.
+
+The following steps assume keepalived is already installed. If not, refer to [Install](http://www.keepalived.org/doc/installing_keepalived.html).
+
+For more information, see its [official website](http://www.keepalived.org/).
+
+**Note**: All of the operations below are done as root.
+
+### Step5.1 LVS
+
+LVS (Linux Virtual Server) is load-balancing software for Linux kernel-based operating systems.
+
+A group of servers is connected via a high-speed LAN (Local Area Network) or a geographically distributed wide area network. At their front end sits a Load Balancer that seamlessly dispatches network requests to the real servers.
+
+The structure of the server cluster is therefore transparent to the user, who accesses the cluster's network service as if it were a single high-performance, highly available server.
+
+Here are the specific operations:
+```
+$ sudo su -
+
+# vip=${YOUR-VIP}
+
+# /sbin/ifconfig lo down;
+
+# /sbin/ifconfig lo up;
+
+# echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore;
+
+# echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce;
+
+# echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore;
+
+# echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce;
+
+# /sbin/ifconfig lo:0 ${vip} broadcast ${vip} netmask 255.255.255.255 up;
+
+# /sbin/route add -host ${vip} dev lo:0;
+
+# MySQL_port=${YOUR-MYSQL-PORT}
+
+# M_MAC=${YOUR-MASTER-MAC}
+
+# iptables -t mangle -I PREROUTING -d ${vip} -p tcp -m tcp --dport ${MySQL_port} -m mac ! --mac-source ${M_MAC} -j MARK --set-mark 0x1;
+
+# S1_MAC=${YOUR-SLAVE1-MAC}
+
+# iptables -t mangle -I PREROUTING -d ${vip} -p tcp -m tcp --dport ${MySQL_port} -m mac ! --mac-source ${S1_MAC} -j MARK --set-mark 0x1;
+
+# S2_MAC=${YOUR-SLAVE2-MAC}
+
+# iptables -t mangle -I PREROUTING -d ${vip} -p tcp -m tcp --dport ${MySQL_port} -m mac ! --mac-source ${S2_MAC} -j MARK --set-mark 0x1;
+```
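+
+Note that the arp_ignore/arp_announce values written to /proc above are runtime-only and are lost on reboot. A hedged sketch of making them persistent, assuming a standard /etc/sysctl.conf layout (distro layouts may differ):
+
+```
+# append the settings to /etc/sysctl.conf, then reload
+cat >> /etc/sysctl.conf <<'EOF'
+net.ipv4.conf.lo.arp_ignore = 1
+net.ipv4.conf.lo.arp_announce = 2
+net.ipv4.conf.all.arp_ignore = 1
+net.ipv4.conf.all.arp_announce = 2
+EOF
+sysctl -p
+```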
+### Step5.2 Write keepalived.conf
+
+If you want to see a simple configuration, there is a [template](config/192.168.0.11_keepalived.md). If you want to know more, see the [keepalived configuration synopsis](http://www.keepalived.org/doc/configuration_synopsis.html).
+
+### Step5.3 Start keepalived
+
+```
+# /etc/init.d/keepalived start
+```
+
+After that, `ipvsadm -ln` helps check whether the configuration is correct.
+
+
+## Step6 An easy example: starting Xenon with MySQL
+
+**Note**: Following is a synopsis of command line samples. For simplicity, we assume `xenon` is in your path. If not, replace `xenon` with `/path/to/xenon`. The operating system user is root.
+
+### Step6.1 Machine Condition
+
+First create three machines (Ubuntu 16.04 in this example). They all have the mysqld service installed.
+
+| HostName           | IP           | Role   |
+| ------------------ | ------------ | ------ |
+| i-lf9g3f5n(Master) | 192.168.0.11 | Master |
+| i-0dc5giev(Slave1) | 192.168.0.2  | Slave  |
+| i-arb90jhc(Slave2) | 192.168.0.3  | Slave  |
+
+### Step6.2 Mutual Trust
+
+Set up mutual SSH trust between the three machines to reduce the chance of problems later.
+
+* On i-lf9g3f5n(M):
+
+```
+# vi /etc/hosts
+  add these at the end:
+  192.168.0.2  i-0dc5giev
+  192.168.0.3  i-arb90jhc
+# ssh-keygen
+# ssh-copy-id ubuntu@i-0dc5giev
+# ssh-copy-id ubuntu@i-arb90jhc
+```
+
+* On i-0dc5giev(S1):
+
+```
+# vi /etc/hosts
+  add these at the end:
+  192.168.0.3  i-arb90jhc
+  192.168.0.11 i-lf9g3f5n
+
+# ssh-keygen
+# ssh-copy-id ubuntu@i-arb90jhc
+# ssh-copy-id ubuntu@i-lf9g3f5n
+```
+
+* On i-arb90jhc(S2):
+
+```
+# vi /etc/hosts
+  add these at the end:
+  192.168.0.2  i-0dc5giev
+  192.168.0.11 i-lf9g3f5n
+
+# ssh-keygen
+# ssh-copy-id ubuntu@i-0dc5giev
+# ssh-copy-id ubuntu@i-lf9g3f5n
+```
+
+### Step6.3 Start Mysqld
+
+Start mysqld on each machine.
+
+For a sample configuration, see [my.cnf](config/MySQL.md).
+
+```
+# mysqld_safe --defaults-file=/etc/mysql/mysqld.conf.d/mysqld.conf &
+```
+
+### Step6.4 Start Xenon
+
+**Note :** Before starting xenon, make sure the mysqld service is up and running.
+
+Start xenon on each machine, then have each of the three nodes add the other two nodes' `ip:port`.
+
+For sample configurations, see [192.168.0.11_xenon](config/192.168.0.11_xenon.md), [192.168.0.2_xenon](config/192.168.0.2_xenon.md) and [192.168.0.3_xenon](config/192.168.0.3_xenon.md).
+
+For more information on starting xenon, please refer to `Step3` and `Step4`.
+
+* On each node
+
+```
+# mkdir -p /etc/xenon/
+
+# mkdir -p /data/raft
+
+# mkdir -p /data/mysql
+
+# mkdir -p /opt/xtrabackup/
+
+# mkdir -p /data/log
+
+# touch /etc/xenon/xenon.json
+
+# ./xenon -c /etc/xenon/xenon.json > /data/log/xenon.log 2>&1 &
+```
+
+* On Master(192.168.0.11)
+
+```
+./xenoncli cluster add 192.168.0.2:8801,192.168.0.3:8801
+```
+
+* On Slave1(192.168.0.2)
+
+```
+./xenoncli cluster add 192.168.0.11:8801,192.168.0.3:8801
+```
+
+* On Slave2(192.168.0.3)
+
+```
+./xenoncli cluster add 192.168.0.11:8801,192.168.0.2:8801
+```
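+
+Once all three nodes have added each other, it is worth verifying that a Leader has been elected before moving on. `xenoncli cluster status` (documented in [xenoncli_commands](xenoncli_commands.md)) prints each node's raft role and MySQL state:
+
+```
+# ./xenoncli cluster status
+```
+
+Expect one node reported as LEADER in READWRITE mode and the other two as FOLLOWER in READONLY mode.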
+### Step6.5 Start Keepalived
+
+**Note :** I only configured the keepalived service on `Master` and `Slave1`. You can follow my configuration, or adapt it to your own design (for more detail about configuring and starting keepalived, refer to `Step5`).
+
+For sample configurations, see [192.168.0.11_keepalived](config/192.168.0.11_keepalived.md) and [192.168.0.2_keepalived](config/192.168.0.2_keepalived.md).
+
+For more information on configuring and starting keepalived, please refer to [Keepalived-Configuration](keepalived.md).
+
+* On each node
+
+```
+# /sbin/ifconfig lo down;
+
+# /sbin/ifconfig lo up;
+
+# echo 1 >/proc/sys/net/ipv4/conf/lo/arp_ignore;
+
+# echo 2 >/proc/sys/net/ipv4/conf/lo/arp_announce;
+
+# echo 1 >/proc/sys/net/ipv4/conf/all/arp_ignore;
+
+# echo 2 >/proc/sys/net/ipv4/conf/all/arp_announce;
+
+# /sbin/ifconfig lo:0 192.168.0.252 broadcast 192.168.0.252 netmask 255.255.255.255 up;
+
+# /sbin/route add -host 192.168.0.252 dev lo:0;
+```
+
+* On Master(192.168.0.11)
+
+```
+# iptables -t mangle -I PREROUTING -d 192.168.0.252 -p tcp -m tcp --dport 3306 -m mac ! --mac-source 52:54:dc:da:f0:cd -j MARK --set-mark 0x1
+
+# ipvsadm --set 5 4 120
+
+# /etc/init.d/keepalived start
+```
+
+* On Slave1(192.168.0.2)
+
+```
+# iptables -t mangle -I PREROUTING -d 192.168.0.252 -p tcp -m tcp --dport 3306 -m mac ! --mac-source 52:54:df:87:51:63 -j MARK --set-mark 0x1
+
+# iptables -t mangle -I PREROUTING -d 192.168.0.252 -p tcp -m tcp --dport 3306 -m mac ! --mac-source 52:54:77:db:fc:ee -j MARK --set-mark 0x1
+
+# ipvsadm --set 5 4 120
+
+# /etc/init.d/keepalived start
+```
diff --git a/docs/how_xenon_works.md b/docs/how_xenon_works.md
new file mode 100644
index 0000000..9ff2118
--- /dev/null
+++ b/docs/how_xenon_works.md
@@ -0,0 +1,459 @@
+[TOC]
+
+# internal mechanisms of xenon
+
+## overview
+MySQL is a very important RDS (Relational Database Service) in the field of cloud computing and is widely used, but operating and maintaining MySQL is complicated. To provide a better service, we developed Xenon. It makes a MySQL cluster more available and raises strong consistency to a new level. With high automation and no human intervention, O&M (Operation and Maintenance) becomes easier and cheaper.
+
+Xenon is a decentralized agent with no intrusive access to the MySQL sources. One xenon manages one MySQL instance, and it does not care where that instance is deployed as long as the network is reachable.
+
+It uses LVS + Raft + GTID parallel replication for master election and data synchronization. More importantly, xenon rescues a good number of operations staff: killing a production master is now a casual affair rather than a crisis.
+
+`Xenon` is a MySQL replication topology HA, management and visualization tool, allowing for:
+
+**Discovery**
+
+`Xenon` actively crawls through your topologies and maps them. It reads basic MySQL info such as replication status and configuration.
+
+**Refactoring**
+
+`Xenon` understands replication rules. It knows about binlog file:position, GTID, and Binlog Servers.
+
+Refactoring replication topologies can be a matter of dragging & dropping a replica under another master. Moving replicas around is safe: `xenon` will reject an illegal refactoring attempt.
+
+**Recovery**
+
+`Xenon` uses a holistic approach to detect master and intermediate-master failures. Based on information gained from the topology itself, it recognizes a variety of failure scenarios.
+
+Optionally, it can restore the failed node (and it also allows the user to specify the recovery node).
+
+## 1 Xenon Raft+
+
+The following describes how xenon builds on raft.
+
+### 1.1 Highly Available
+
+In order to make the cluster highly available and the data reliable, we developed a new protocol based on the raft distributed consensus protocol: **`Raft+`**
+
+`Raft+` is the combination of MySQL GTID parallel replication technology and the `distributed consensus protocol raft`.
+
+If the cluster master fails, `Raft+` switches over automatically within seconds. It guarantees zero data loss after the switch, and the cluster remains available.
+
+### 1.2 Raft+ Introduction
+
+In `Raft+`, we use the MySQL GTID (Global Transaction Identifier) as the log index for the `Raft protocol`, in conjunction with MySQL's Multi-Threaded Slave (MTS). Log entries are copied and replayed in parallel, replay takes exceptionally little time, and the cluster can serve traffic immediately after a failover.
+
+At the same time, `Raft+` uses Semi-Sync-Replication to ensure that at least one slave is completely synchronized with the master. After the master fails, the slave whose data is completely synchronized is selected as the new master.
+
+This ensures zero data loss and high availability.
+
+### 1.3 How Raft+ Works
+
+Set up a three-node cluster: one master and two slaves.
+
+**The following is the GTID synchronization state:**
+
+```
+{Master, GTID:{1,2,3,4,5}}
+{Slave1, GTID:{1,2,3,4,5}}
+{Slave2, GTID:{1,2,3}}
+```
+
+* When the Master becomes unserviceable, Slave1 and Slave2 immediately start a new election.
+
+* Xenon always ensures that the node that has synchronized the larger GTID set becomes the new master — here, `Slave1`.
+
+* During the `VoteRequest` process, Slave1 simply rejects Slave2's `VoteRequest`, so Slave2 moves on to the next round of `VoteRequest` and waits for Slave1 to be elected. The new Master's data is therefore fully synchronized with the old Master's, which guarantees zero data loss.
+
+* When Slave2 receives Slave1's heartbeat, it automatically runs `CHANGE MASTER TO` Slave1 and then copies data according to GTID.
+
+**At this point, the cluster status changes to:**
+
+```
+{Master(down),    GTID:{1,2,3,4,5}}
+{Master(Slave1),  GTID:{1,2,3,4,5}}
+{Slave2,          GTID:{1,2,3,4,5}}
+```
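+
+The election basis can be observed directly from the client: `xenoncli cluster gtid` (see [xenoncli_commands](xenoncli_commands.md)) prints every node's Executed_GTID_Set, which is exactly the data the vote compares:
+
+```
+$ xenoncli cluster gtid
+```
+
+The node whose Executed_GTID_Set covers the others' is the only one able to win the election.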
+
+### 1.4 Raft+ Cluster Monitoring
+
+To monitor the cluster status of `Raft+`, we provide the `xenoncli cluster` functionality.
+
+```
+$ xenoncli cluster status
++-------------+-------------------------------+---------+---------------------+----------------+
+| ID          | Raft                          | Mysqld  | Mysql               | IO/SQL_RUNNING |
++-------------+-------------------------------+---------+---------------------+----------------+
+| 192.168.0.2 | [ViewID:2 EpochID:0]@LEADER   | RUNNING | [ALIVE] [READWRITE] | [true/true]    |
+|             |                               |         |                     |                |
++-------------+-------------------------------+---------+---------------------+----------------+
+| 192.168.0.3 | [ViewID:2 EpochID:0]@FOLLOWER | RUNNING | [ALIVE] [READONLY]  | [true/true]    |
+|             |                               |         |                     |                |
++-------------+-------------------------------+---------+---------------------+----------------+
+| 192.168.0.4 | [ViewID:2 EpochID:0]@FOLLOWER | RUNNING | [ALIVE] [READONLY]  | [true/true]    |
+|             |                               |         |                     |                |
++-------------+-------------------------------+---------+---------------------+----------------+
+```
+
+#### 1.4.1 RAFT Status
+
+```
+type RaftStats struct {
+    // How many times Ping has been called
+    Pings uint64
+
+    // How many times HaEnable has been called
+    HaEnables uint64
+
+    // How many times the candidate has been promoted to a leader
+    LeaderPromotes uint64
+
+    // How many times the leader has been degraded to a follower
+    LeaderDegrades uint64
+
+    // How many times the leader got a heartbeat request from another leader
+    LeaderGetHeartbeatRequests uint64
+
+    // How many times the leader got vote requests from other candidates
+    LeaderGetVoteRequests uint64
+
+    // How many times the leader got only a minority of heartbeat acks
+    LessHearbeatAcks uint64
+
+    // How many times the follower has been promoted to a candidate
+    CandidatePromotes uint64
+
+    // How many times the candidate has been degraded to a follower
+    CandidateDegrades uint64
+
+    // How long the current state has been up
+    StateUptimes uint64
+
+    // The state of mysql: READONLY/WRITEREAD/DEAD
+    RaftMysqlStatus RAFTMYSQL_STATUS
+}
+```
+
+#### 1.4.2 MySQL Status
+
+```
+type GTID struct {
+    // Mysql master log file which the slave is reading
+    Master_Log_File string
+
+    // Mysql master log position which the slave has read
+    Read_Master_Log_Pos uint64
+
+    // Slave IO thread state
+    Slave_IO_Running bool
+
+    // Slave SQL thread state
+    Slave_SQL_Running bool
+
+    // The GTID sets which the slave has received
+    Retrieved_GTID_Set string
+
+    // The GTID sets which the slave has executed
+    Executed_GTID_Set string
+
+    // Seconds_Behind_Master in 'show slave status'
+    Seconds_Behind_Master string
+
+    // Slave_SQL_Running_State in 'show slave status'
+    // The value is identical to the State value of the SQL thread as displayed by SHOW PROCESSLIST
+    Slave_SQL_Running_State string
+
+    // The Last_Error suggests that there may be more failures
+    // in the other worker threads, which can be seen in the replication_applier_status_by_worker table
+    // that shows each worker thread's status
+    Last_Error string
+}
+```
+
+#### 1.4.3 MySQLD Status
+
+```
+type MysqldStats struct {
+    // How many times mysqld has been started by xenon
+    MysqldStarts uint64
+
+    // How many times mysqld has been stopped by xenon
+    MysqldStops uint64
+
+    // How many times the monitor has been started by xenon
+    MonitorStarts uint64
+
+    // How many times the monitor has been stopped by xenon
+    MonitorStops uint64
+}
+```
+
+#### 1.4.4 Backup Status
+
+```
+type BackupStats struct {
+    // How many times backup has been called
+    Backups uint64
+
+    // How many times backup has failed
+    BackupErrs uint64
+
+    // How many times apply-log has been called
+    AppLogs uint64
+
+    // How many times apply-log has failed
+    AppLogErrs uint64
+
+    // How many times cancel has been called
+    Cancels uint64
+
+    // The last error message of backup/applylog
+    LastError string
+
+    // The last backup command we called
+    LastCMD string
+}
+```
+
+#### 1.4.5 Config Status
+
+```
+type ConfigStatus struct {
+    // log
+    LogLevel string
+
+    // backup
+    BackupDir        string
+    BackupIOPSLimits int
+    XtrabackupBinDir string
+
+    // mysqld
+    MysqldBaseDir      string
+    MysqldDefaultsFile string
+
+    // mysql
+    MysqlAdmin       string
+    MysqlHost        string
+    MysqlPort        int
+    MysqlReplUser    string
+    MysqlPingTimeout int
+
+    // raft
+    RaftDataDir           string
+    RaftHeartbeatTimeout  int
+    RaftElectionTimeout   int
+    RaftRPCRequestTimeout int
+    RaftProtectionMode    string
+    RaftStartVipCommand   string
+    RaftStopVipCommand    string
+}
+```
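+
+All of these counters are also reachable per node from the command line; a quick sketch using the status subcommands listed in [xenoncli_commands](xenoncli_commands.md):
+
+```
+$ xenoncli raft status     # raft state in JSON: LEADER/CANDIDATE/FOLLOWER/IDLE
+$ xenoncli mysql status    # mysqld and slave IO/SQL state in JSON
+```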
+
+### 1.5 Raft+ Readonly Status
+
+In addition to raft's three states `Leader`/`Candidate`/`Follower`, Raft+ also provides an `Idle` state:
+
+* **Idle state :** the node does not take part in leader election, but it does perceive Leader changes and re-points its replication channel accordingly. The `Idle` state is suitable for a disaster-recovery instance deployed in a remote computer room.
+
+Through the `Idle` setting, different xenon nodes can be regrouped to provide services; we call such a group a `Semi-Raft Group`.
+
+For example, computer room A has 3 nodes forming a `Semi-Raft Group`. The states are:
+
+```
+[A1:Leader, A2:Follower, A3:Follower]
+```
+
+Room B has 3 disaster-recovery nodes (another Semi-Raft Group):
+
+```
+[B1:Idle, B2:Idle, B3:Idle]
+```
+
+If room A loses power and takes a long time to recover, we can switch the three instances in room B from Idle to Follower.
+
+In this way, room B's Semi-Raft Group elects a master and provides service externally. Combined with a `BinlogServer`, its data is exactly the same as A's.
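+
+A hedged sketch of that room-B promotion, assuming the `xenoncli raft enable`/`disable` subcommands (listed in [xenoncli_commands](xenoncli_commands.md)) are what move a node into and out of raft control:
+
+```
+# on each room-B node: put the node under raft control so it can join elections
+$ xenoncli raft enable
+
+# to send a node back out of raft control (e.g. back to disaster-recovery duty)
+$ xenoncli raft disable
+```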
+
+
+## 2 High Availability
+
+### 2.1 Ways to be HA
+
+HA is achieved by choosing either:
+
+* a xenon/keepalived setup, where xenon switches a VIP for service.
+
+* a xenon/raft setup, where xenon nodes communicate by raft consensus. Each xenon node has a private database backend.
+
+### 2.2 HA via Keepalived
+
+HA can be achieved with keepalived, routing software based on the VRRP (Virtual Router Redundancy Protocol) that implements high-availability setups.
+
+Keepalived is used to avoid single points of failure. A service runs keepalived on at least 2 servers: one master server (MASTER) and one backup server (BACKUP), which present a single VIP (Virtual IP) to the outside. The MASTER periodically sends a specific message to the BACKUP;
+when the BACKUP stops receiving this message, it concludes the MASTER is down, takes over the VIP, and continues to provide the service, thus ensuring high availability.
+
+### 2.3 HA via Raft+
+Xenon nodes communicate directly via the `Raft+` consensus algorithm. Each xenon node has its own private backend MySQL.
+
+Only one xenon node assumes leadership, and it is always part of a consensus. All other nodes are nevertheless independently active and keep polling your topologies.
+
+It is recommended to run a 3-node setup. If there are only two nodes, the replication between the databases is asynchronous.
+
+To access your MySQL service you should only speak to the RVIP/WVIP.
+
+* Use xenon/bin/xenoncli to check on your proxy.
+
+## 3 Retake Slave
+
+High-concurrency OLTP leads us to a master-slave replication architecture. In practice, however, a slave's replication thread often breaks for all sorts of reasons, and adding a new slave node is tedious. For these reasons, xenon provides a rebuild-slave function: one simple command run on the slave repairs its replication and puts it quickly back into use.
+
+### 3.1 Analysis Process
+
+* Xenon provides streaming backup: over ssh, the data is written directly into the MySQL data directory on the target machine, so no extra staging space is needed and a standby can be rebuilt quickly.
+
+* Assume Slave1 is broken and needs to be rebuilt:
+
+```
+     Master(A)
+    /        \
+Slave1(B)  Slave2(C)
+```
+
+The following is the operation process:
+
+1. B-xenon selects the best backup source (the MySQL most in sync with the master's data); assume it is C-xenon.
+
+2. B-xenon kills B-mysql and empties its data directory.
+
+3. B-xenon sends a hot-backup request to C-xenon, passing along its own ssh-user/ssh-passwd/iops settings.
+
+4. C-xenon starts backing up and streams the data into the B-mysql data directory managed by B-xenon.
+
+5. B-xenon finishes receiving C-xenon's backup.
+
+6. B-xenon applies the logs.
+
+7. B-xenon starts the MySQL service.
+
+8. B-xenon changes the master-slave relationship, pointing at the current master.
+
+9. Replication starts.
+
+10. The slave retake has succeeded.
+
+### 3.2 Actual Operation
+
+In actual production, master-slave replication problems may be the most common kind.
+
+When a replication problem occurs and its cause is clear, we use `xenoncli mysql rebuildme` for a fast rebuild.
+
+* The following is a complete rebuildme log:
+
+```plain
+ $ xenoncli mysql rebuildme
+
+ 2017/10/17 10:59:02.391964 mysql.go:177: [WARNING] =====prepare.to.rebuildme=====
+ IMPORTANT: Please check that the backup run completes successfully.
+            At the end of a successful backup run innobackupex
+            prints "completed OK!".
+
+ 2017/10/17 10:59:02.392296 mysql.go:187: [WARNING] S1-->check.raft.leader
+ 2017/10/17 10:59:02.399614 callx.go:140: [WARNING] rebuildme.found.best.slave[192.168.0.4:8801].leader[192.168.0.2:8801]
+ 2017/10/17 10:59:02.399633 mysql.go:203: [WARNING] S2-->prepare.rebuild.from[192.168.0.4:8801]....
+ 2017/10/17 10:59:02.400324 mysql.go:214: [WARNING] S3-->check.bestone[192.168.0.4:8801].is.OK....
+ 2017/10/17 10:59:02.400336 mysql.go:219: [WARNING] S4-->disable.raft
+ 2017/10/17 10:59:02.400869 mysql.go:227: [WARNING] S5-->stop.monitor
+ 2017/10/17 10:59:02.402494 mysql.go:233: [WARNING] S6-->kill.mysql
+ 2017/10/17 10:59:02.443844 mysql.go:250: [WARNING] S7-->check.bestone[192.168.0.4:8801].is.OK....
+ 2017/10/17 10:59:03.494280 mysql.go:264: [WARNING] S8-->rm.datadir[/home/mysql/data3306/]
+ 2017/10/17 10:59:03.494321 mysql.go:269: [WARNING] S9-->xtrabackup.begin....
+ 2017/10/17 10:59:03.494837 callx.go:386: [WARNING] rebuildme.backup.from[192.168.0.4:8801]
+ 2017/10/17 10:59:21.375151 mysql.go:273: [WARNING] S9-->xtrabackup.end....
+ 2017/10/17 10:59:21.375184 mysql.go:278: [WARNING] S10-->apply-log.begin....
+ 2017/10/17 10:59:22.781295 mysql.go:281: [WARNING] S10-->apply-log.end....
+ 2017/10/17 10:59:22.781575 mysql.go:286: [WARNING] S11-->start.mysql.begin...
+ 2017/10/17 10:59:22.782444 mysql.go:290: [WARNING] S11-->start.mysql.end...
+ 2017/10/17 10:59:22.782459 mysql.go:295: [WARNING] S12-->wait.mysqld.running.begin....
+ 2017/10/17 10:59:25.795803 callx.go:349: [WARNING] wait.mysqld.running...
+ 2017/10/17 10:59:25.810427 mysql.go:297: [WARNING] S12-->wait.mysqld.running.end....
+ 2017/10/17 10:59:25.810470 mysql.go:302: [WARNING] S13-->wait.mysql.working.begin....
+ 2017/10/17 10:59:28.811584 callx.go:583: [WARNING] wait.mysql.working...
+ 2017/10/17 10:59:28.812049 mysql.go:304: [WARNING] S13-->wait.mysql.working.end....
+ 2017/10/17 10:59:28.812219 mysql.go:309: [WARNING] S14-->reset.slave.begin....
+ 2017/10/17 10:59:28.816761 mysql.go:313: [WARNING] S14-->reset.slave.end....
+ 2017/10/17 10:59:28.816797 mysql.go:319: [WARNING] S15-->reset.master.begin....
+ 2017/10/17 10:59:28.822253 mysql.go:321: [WARNING] S15-->reset.master.end....
+ 2017/10/17 10:59:28.822322 mysql.go:326: [WARNING] S15-->set.gtid_purged[194758cd-b21c-11e7-80b7-5254281e57de:1-9245708].begin....
+ 2017/10/17 10:59:28.824089 mysql.go:330: [WARNING] S15-->set.gtid_purged.end....
+ 2017/10/17 10:59:28.824112 mysql.go:340: [WARNING] S16-->enable.raft.begin...
+ 2017/10/17 10:59:28.824680 mysql.go:344: [WARNING] S16-->enable.raft.done...
+ 2017/10/17 10:59:28.824717 mysql.go:350: [WARNING] S17-->wait[4000 ms].change.to.master...
+ 2017/10/17 10:59:28.824746 mysql.go:356: [WARNING] S18-->start.slave.begin....
+ 2017/10/17 10:59:29.058472 mysql.go:360: [WARNING] S18-->start.slave.end....
+ 2017/10/17 10:59:29.058555 mysql.go:364: [WARNING] completed OK!
+ 2017/10/17 10:59:29.058571 mysql.go:365: [WARNING] rebuildme.all.done....
+```
+
+If the problem is not clear and needs deeper analysis, we can instead remove the broken node and add new nodes, so that the majority can keep serving. This is very flexible.
+
+**Note :**
+```
+1. Before a rebuild, make sure the master is alive.
+   Quickly adding a new node is also done through the `rebuildme` function.
+
+2. If there is an error, analyze the logs as prompted.
+   The main things to look at are the logs of the rebuilt node and of the backup-source node.
+```
+
+## 4 Failover
+
+### 4.1 Leader election conditions
+
+xenon elects the master using the raft protocol. The election is based on:
+ * Master_Log_File
+ * Read_Master_Log_Pos
+ * Slave_SQL_Running
+
+Whichever slave has caught up on the binlog and has no replication error is the new master candidate.
+
+### 4.2 Leader election process
+
+Suppose the cluster is deployed as one master and two slaves (in 3 separate containers):
+
+```
+     Master(A)
+    /        \
+Slave1(B)  Slave2(C)
+```
+
+* A-xenon (the xenon managing A) periodically sends heartbeats to B-/C-xenon, reporting on A-mysql's health and maintaining the master-slave relationship.
+
+* When A-mysql becomes unavailable (mysql hangs, or even the whole container dies), B-/C-xenon trigger a new master election if no A-xenon heartbeat arrives within a certain period (configurable, default 3s).
+
+* **Suppose C-xenon initiates the election first. The normal process is as follows:**
+
+```
+1. Send vote requests to A and B at the same time.
+
+2. Receive a majority of votes (favor-num >= n/2+1) and no objection. (A negative vote would mean C-mysql has less data than its opponent.)
+
+3. Promote itself to master.
+
+4. Invoke the vip start command.
+```
+
+When A-xenon then receives C-xenon's heartbeat, it needs to do the following:
+
+```
+1. Change the master-slave relationship (if mysql is available) and start replicating from C-mysql.
+
+2. Invoke the vip stop command.
+```
+
+When B-xenon receives C-xenon's heartbeat, it needs to do the following:
+
+```
+1. Change the master-slave relationship and start replicating from C-mysql.
+```
+
+**The whole election process is very short, usually taking `3-6 seconds` to complete.**
+
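+
+For testing or maintenance, an election can also be driven by hand. A sketch using the raft subcommands listed in [xenoncli_commands](xenoncli_commands.md):
+
+```
+$ xenoncli raft nodes         # list the raft peers
+$ xenoncli raft trytoleader   # propose this node as leader
+$ xenoncli cluster raft       # confirm the cluster ends up with exactly one LEADER
+```
+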
diff --git a/docs/images/xenon.png b/docs/images/xenon.png
new file mode 100644
index 0000000..a0a3633
Binary files /dev/null and b/docs/images/xenon.png differ
diff --git a/docs/xenoncli_commands.md b/docs/xenoncli_commands.md
new file mode 100644
index 0000000..58c501b
--- /dev/null
+++ b/docs/xenoncli_commands.md
@@ -0,0 +1,214 @@
+[TOC]
+
+# commands of xenoncli
+
+## overview
+
+Xenoncli provides rich management functionality for external invocation, which is what makes automated operations possible.
+
+Make sure xenon is up, then enter the following command. Every level of operation has its own xenoncli instructions; `rebuildme`, for example, lives under the `mysql` subcommand.
+
+```
+# ./xenoncli -h
+A simple command line client for xenon
+
+Usage:
+  xenoncli [command]
+
+Available Commands:
+  cluster     cluster related commands
+  init        init the xenon config file
+  mysql       mysql related commands
+  perf        perf related commands
+  raft        raft related commands
+  version     Print the version number of xenon client
+  xenon       xenon related commands
+```
+
+## 1 Cluster Status
+
+```
+# ./xenoncli cluster -h
+cluster related commands
+
+Usage:
+  xenoncli cluster [command]
+
+Available Commands:
+  add         add peers to leader(if there is no leader, add to local)
+  gtid        show cluster gtid status
+  log         merge cluster xenon.log from logdir
+  mysql       show cluster mysql status
+  raft        show cluster raft status
+  remove      remove peers from leader(if there is no leader, remove from local)
+  status      show cluster status
+  xenon       show cluster xenon status
+```
+
+### 1.1 Add cluster node
+
+Assuming the cluster has 3 nodes:
+```
+xenon-1: 192.168.0.2:8801
+xenon-2: 192.168.0.3:8801
+xenon-3: 192.168.0.5:8801
+```
+Execute the following command:
+```
+./xenoncli cluster add 192.168.0.2:8801,192.168.0.3:8801,192.168.0.5:8801
+```
+***xenon allows adding duplicate nodes: if a node is already in the cluster, the add is a no-op.***
+
+### 1.2 Check cluster status
+
+```
+$./xenoncli cluster status
++------------------+-------------------------------+---------+---------+----------------------------+---------------------+----------------+------------------+
+| ID               | Raft                          | Mysqld  | Monitor | Backup                     | Mysql               | IO/SQL_RUNNING | MyLeader         |
++------------------+-------------------------------+---------+---------+----------------------------+---------------------+----------------+------------------+
+| 192.168.0.2:8801 | [ViewID:1 EpochID:0]@FOLLOWER | RUNNING | ON      | state:[NONE]               | [ALIVE] [READONLY]  | [true/true]    | 192.168.0.5:8801 |
+|                  |                               |         |         | LastError:                 |                     |                |                  |
++------------------+-------------------------------+---------+---------+----------------------------+---------------------+----------------+------------------+
+| 192.168.0.3:8801 | [ViewID:1 EpochID:0]@FOLLOWER | RUNNING | ON      | state:[NONE]               | [ALIVE] [READONLY]  | [true/true]    | 192.168.0.5:8801 |
+|                  |                               |         |         | LastError:                 |                     |                |                  |
++------------------+-------------------------------+---------+---------+----------------------------+---------------------+----------------+------------------+
+| 192.168.0.5:8801 | [ViewID:1 EpochID:0]@LEADER   | RUNNING | ON      | state:[NONE]               | [ALIVE] [READWRITE] | [true/true]    | 192.168.0.5:8801 |
+|                  |                               |         |         | LastError:                 |                     |                |                  |
++------------------+-------------------------------+---------+---------+----------------------------+---------------------+----------------+------------------+
+(3 rows)
+```
+### 1.3 Check cluster raft status
+
+```
+$./xenoncli cluster raft
++------------------+----------+-----------+-----------+----------------+-----------+-----------+-----------+------------+-------------------+
+| ID               | Raft     | LPromotes | LDegrades | LGetHeartbeats | LGetVotes | CPromotes | CDegrades | Raft@Mysql | StateUptimes(sec) |
++------------------+----------+-----------+-----------+----------------+-----------+-----------+-----------+------------+-------------------+
+| 192.168.0.2:8801 | FOLLOWER | 0         | 0         | 0              | 0         | 0         | 0         |            | 4                 |
++------------------+----------+-----------+-----------+----------------+-----------+-----------+-----------+------------+-------------------+
+| 192.168.0.3:8801 | FOLLOWER | 0         | 0         | 0              | 0         | 1         | 0         |            | 19155             |
++------------------+----------+-----------+-----------+----------------+-----------+-----------+-----------+------------+-------------------+
+| 192.168.0.5:8801 | LEADER   | 1         | 0         | 0              | 0         | 1         | 0         |            | 19150             |
++------------------+----------+-----------+-----------+----------------+-----------+-----------+-----------+------------+-------------------+
+(3 rows)
+```
+### 1.4 Check cluster mysql status
+
+```
+$./xenoncli cluster mysql
++------------------+----------+-------+-----------+------------------------------+----------------+----------------+------------+
+| ID               | Raft     | Mysql | Option    | Master_Log_File/Pos          | IO/SQL_Running | Seconds_Behind | Last_Error |
++------------------+----------+-------+-----------+------------------------------+----------------+----------------+------------+
+| 192.168.0.2:8801 | FOLLOWER | ALIVE | READONLY  | [mysql-bin.000027/740423004] | [true/true]    | 502            |            |
++------------------+----------+-------+-----------+------------------------------+----------------+----------------+------------+
+| 192.168.0.3:8801 | FOLLOWER | ALIVE | READONLY  | [mysql-bin.000027/740423004] | [true/true]    | 480            |            |
++------------------+----------+-------+-----------+------------------------------+----------------+----------------+------------+
+| 192.168.0.5:8801 | LEADER   | ALIVE | READWRITE | [mysql-bin.000027/740468486] | [true/true]    |                |            |
++------------------+----------+-------+-----------+------------------------------+----------------+----------------+------------+
+(3 rows)
+```
+### 1.5 Check cluster gtid status
+
+```
+$./xenoncli cluster gtid
++------------------+----------+-------+------------------------------------------------+------------------------------------------------------+
+| ID               | Raft     | Mysql | Executed_GTID_Set                              | Retrieved_GTID_Set                                   |
++------------------+----------+-------+------------------------------------------------+------------------------------------------------------+
+| 192.168.0.2:8801 | FOLLOWER | ALIVE | 91ad5418-967a-11e6-a0b3-525482b1ed69:1-1634089 | 91ad5418-967a-11e6-a0b3-525482b1ed69:1542637-2968736 |
++------------------+----------+-------+------------------------------------------------+------------------------------------------------------+
+| 192.168.0.3:8801 | FOLLOWER | ALIVE | 91ad5418-967a-11e6-a0b3-525482b1ed69:1-1691280 | 91ad5418-967a-11e6-a0b3-525482b1ed69:661-2968737     |
++------------------+----------+-------+------------------------------------------------+------------------------------------------------------+
+| 192.168.0.5:8801 | LEADER   | ALIVE | 91ad5418-967a-11e6-a0b3-525482b1ed69:1-2968742 |                                                      |
++------------------+----------+-------+------------------------------------------------+------------------------------------------------------+
+(3 rows)
+```
+
+## 2 MySQL Operation
+
+```
+# ./xenoncli mysql -h
+mysql related commands
+
+Usage:
+  xenoncli mysql [command]
+
+Available Commands:
+  backup               backup this mysql to backupdir
+  cancelbackup
+  changepassword       update mysql normal user password
+  createsuperuser      create mysql super user
+  createuser           create mysql normal user
+  createuserwithgrants create mysql normal user with privileges
+  dropuser             drop mysql normal user
+  kill                 kill mysql pid(becareful!)
+  rebuildme            rebuild a slave --from=endpoint
+  shutdown
+  start                start mysql
+  startmonitor         start mysqld monitor
+  status               mysql status in JSON(mysqld/slave_SQL/IO is running)
+  stopmonitor          stop mysqld monitor
+  sysvar               set global variables
+```
+
+For example: although there is a brief description above, I suggest consulting the help before a `rebuildme` operation — caution never hurts.
+```
+# ./xenoncli mysql rebuildme --help
+rebuild a slave --from=endpoint
+
+Usage:
+  xenoncli mysql rebuildme [--from=endpoint] [flags]
+
+Flags:
+      --from string   --from=endpoint
+```
+
+* By default, the rebuildme operation automatically finds a slave whose data matches the master's to use as the backup source, so the master is barely affected and write traffic is not disturbed.
+
+* If you use `--from=IP:XENON_PORT`, you specify explicitly which database to back up from.
+
+We think most problems can be solved with the default, but if you insist on using --from, that is allowed too.
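+
+A quick example of both forms (the endpoint is illustrative):
+
+```
+# let xenon pick the best backup source
+$ xenoncli mysql rebuildme
+
+# back up explicitly from the node at 192.168.0.3
+$ xenoncli mysql rebuildme --from=192.168.0.3:8801
+```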
+
+
+## 3 MySQL Stack Info
+
+We capture the MySQL process stack with quickstack to see what mysqld is invoking, which greatly simplifies subsequent problem analysis.
+
+
+The `quickstack` capture is fast and has little impact on the process.
+
+```
+# ./xenoncli perf -h
+perf related commands
+
+Usage:
+  xenoncli perf [command]
+
+Available Commands:
+  quickstack  capture the stack of mysqld using quickstack
+```
+
+## 4 Raft+ Operation
+
+```
+# ./xenoncli raft -h
+raft related commands
+
+Usage:
+  xenoncli raft [command]
+
+Available Commands:
+  add                add peers to local
+  disable            enable the node out control of raft
+  disablepurgebinlog disable leader to purge binlog
+  enable             enable the node in control of raft
+  enablepurgebinlog  enable leader to purge binlog(default)
+  nodes              show raft nodes
+  remove             remove peers from local
+  status             status in JSON(state(LEADER/CANDIDATE/FOLLOWER/IDLE))
+  trytoleader        propose this raft as leader
+```
+
+
+## Help
+Xenoncli has many more features; the above is just a list of the commonly used ones.
+* Use "xenoncli [command] --help" for more information about a command.