You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What steps will reproduce the problem?
1. Have master-slave replication running ok.
2. Manually stop mysql on master
3. Let failover run
What is the expected output? What do you see instead?
Expect failover to continue despite error
What version of the product are you using? On what operating system?
manager version 0.55
node version 0.56
Please provide any additional information below.
Tue Mar 17 23:38:28 2015 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Tue Mar 17 23:38:28 2015 - [info] Reading application default configurations
from /etc/mhatest.cnf..
Tue Mar 17 23:38:28 2015 - [info] Reading server configurations from
/etc/mhatest.cnf..
Tue Mar 17 23:38:28 2015 - [info] MHA::MasterMonitor version 0.55.
Tue Mar 17 23:38:32 2015 - [info] Dead Servers:
Tue Mar 17 23:38:32 2015 - [info] Alive Servers:
Tue Mar 17 23:38:32 2015 - [info] MASTERNODE(xx.xx.xx.xx:3306)
Tue Mar 17 23:38:32 2015 - [info] SLAVENODE(xx.xx.xx.xy:3306)
Tue Mar 17 23:38:32 2015 - [info] Alive Slaves:
Tue Mar 17 23:38:32 2015 - [info] SLAVENODE(xx.xx.xx.xy:3306)
Version=5.6.18-enterprise-commercial-advanced-log (oldest major version between
s
laves) log-bin:enabled
Tue Mar 17 23:38:32 2015 - [info] Replicating from
MASTERNODE(xx.xx.xx.xx:3306)
Tue Mar 17 23:38:32 2015 - [info] Current Alive Master:
MASTERNODE(xx.xx.xx.xx:3306)
Tue Mar 17 23:38:32 2015 - [info] Checking slave configurations..
Tue Mar 17 23:38:32 2015 - [warning] relay_log_purge=0 is not set on slave
SLAVENODE(xx.xx.xx.xy:3306).
Tue Mar 17 23:38:32 2015 - [info] Checking replication filtering settings..
Tue Mar 17 23:38:32 2015 - [info] binlog_do_db= REPORTS,SG, binlog_ignore_db=
Tue Mar 17 23:38:32 2015 - [info] Replication filtering check ok.
Tue Mar 17 23:38:32 2015 - [info] Starting SSH connection tests..
Tue Mar 17 23:38:35 2015 - [info] All SSH connection tests passed successfully.
Tue Mar 17 23:38:35 2015 - [info] Checking MHA Node version..
Tue Mar 17 23:38:36 2015 - [info] Version check ok.
Tue Mar 17 23:38:36 2015 - [info] Checking SSH publickey authentication
settings on the current master..
Tue Mar 17 23:38:38 2015 - [info] HealthCheck: SSH to MASTERNODE is reachable.
Tue Mar 17 23:38:40 2015 - [info] Master MHA Node version is 0.56.
Tue Mar 17 23:38:40 2015 - [info] Checking recovery script configurations on
the current master..
Tue Mar 17 23:38:40 2015 - [info] Executing command: save_binary_logs
--command=test --start_pos=4 --binlog_dir=/application/mysql_data
--output_file=/var/log/mha/
save_binary_logs_test --manager_version=0.55 --start_file=mysql-bin.000905
Tue Mar 17 23:38:40 2015 - [info] Connecting to
mhauser@MASTERNODE(MASTERNODE)..
Creating /var/log/mha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /application/mysql_data, up to mysql-bin.000905
Tue Mar 17 23:38:42 2015 - [info] Master setting check done.
Tue Mar 17 23:38:42 2015 - [info] Checking SSH publickey authentication and
checking recovery script configurations on all alive slave servers..
Tue Mar 17 23:38:42 2015 - [info] Executing command : apply_diff_relay_logs
--command=test --slave_user='mha_mon' --slave_host=SLAVENODE --slave_i
p=xx.xx.xx.xy --slave_port=3306 --workdir=/var/log/mha
--target_version=5.6.18-enterprise-commercial-advanced-log
--manager_version=0.55 --relay_log_info=/usr/serv
ergraph/mysql_data/relay-log.info --relay_dir=/usr/application/mysql_data/
--slave_pass=xxx
Tue Mar 17 23:38:42 2015 - [info] Connecting to
[email protected](SLAVENODE:22)..
Checking slave recovery environment settings..
Opening /usr/application/mysql_data/relay-log.info ... ok.
Relay log found at /usr/application/mysql_data, up to mysql-relay-bin.000491
Temporary relay log file is /usr/application/mysql_data/mysql-relay-bin.000491
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Mar 17 23:38:43 2015 - [info] Slaves settings check done.
Tue Mar 17 23:38:43 2015 - [info]
MASTERNODE (current master)
+--SLAVENODE
Tue Mar 17 23:38:43 2015 - [info] Checking master_ip_failover_script status:
Tue Mar 17 23:38:43 2015 - [info] /root/scripts/master_ip_failover_tt
--command=status --ssh_user=mhauser --orig_master_host=MASTERNODE --orig_maste
r_ip=xx.xx.xx.xx --orig_master_port=3306
Tue Mar 17 23:38:43 2015 - [info] OK.
Tue Mar 17 23:38:43 2015 - [warning] shutdown_script is not defined.
Tue Mar 17 23:38:43 2015 - [info] Set master ping interval 3 seconds.
Tue Mar 17 23:38:43 2015 - [warning] secondary_check_script is not defined. It
is highly recommended setting it to check master reachability from two or more
routes.
Tue Mar 17 23:38:43 2015 - [info] Starting ping health check on
MASTERNODE(xx.xx.xx.xx:3306)..
Tue Mar 17 23:38:43 2015 - [info] Ping(SELECT) succeeded, waiting until MySQL
doesn't respond..
Wed Mar 18 00:25:25 2015 - [warning] Got error on MySQL select ping: 2006
(MySQL server has gone away)
Wed Mar 18 00:25:25 2015 - [info] Executing SSH check script: save_binary_logs
--command=test --start_pos=4 --binlog_dir=/application/mysql_data
--output_file=/var/log/mha/save_binary_logs_test --manager_version=0.55
--binlog_prefix=mysql-bin
Creating /var/log/mha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /application/mysql_data, up to mysql-bin.000906
Wed Mar 18 00:25:27 2015 - [info] HealthCheck: SSH to MASTERNODE is reachable.
Wed Mar 18 00:25:28 2015 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Wed Mar 18 00:25:28 2015 - [warning] Connection failed 1 time(s)..
Wed Mar 18 00:25:31 2015 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Wed Mar 18 00:25:31 2015 - [warning] Connection failed 2 time(s)..
Wed Mar 18 00:25:34 2015 - [warning] Got error on MySQL connect: 2013 (Lost
connection to MySQL server at 'reading initial communication packet', system
error: 111)
Wed Mar 18 00:25:34 2015 - [warning] Connection failed 3 time(s)..
Wed Mar 18 00:25:34 2015 - [warning] Master is not reachable from health
checker!
Wed Mar 18 00:25:34 2015 - [warning] Master MASTERNODE(xx.xx.xx.xx:3306) is not
reachable!
Wed Mar 18 00:25:34 2015 - [warning] SSH is reachable.
Wed Mar 18 00:25:34 2015 - [info] Connecting to a master server failed. Reading
configuration file /etc/masterha_default.cnf and /etc/mhatest.cnf again, and
trying to connect to all servers to check server status..
Wed Mar 18 00:25:34 2015 - [warning] Global configuration file
/etc/masterha_default.cnf not found. Skipping.
Wed Mar 18 00:25:34 2015 - [info] Reading application default configurations
from /etc/mhatest.cnf..
Wed Mar 18 00:25:34 2015 - [info] Reading server configurations from
/etc/mhatest.cnf..
Wed Mar 18 00:25:36 2015 -
[error][/usr/lib/perl5/site_perl/5.10.0/MHA/Server.pm, ln885] SQL Thread is
stopped(error) on SLAVENODE(xx.xx.xx.xy:3306)! Errno:1053, Error:Query
partially completed on the master (error on master: 1053) and was aborted.
There is a chance that your master is inconsistent at this point. If you are
sure that your master is ok, run this query manually on the slave and then
restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; .
Query: 'DELETE FROM admin_commands_history WHERE servername = 'somethingelse'
AND date < DATE_SUB( NOW(), INTERVAL 1 DAY ) AND (UPPER(command) REGEXP
'QUERY|SELECT' ) = 1'
Wed Mar 18 00:25:36 2015 -
[error][/usr/lib/perl5/site_perl/5.10.0/MHA/ServerManager.pm, ln193] There is
no alive slave. We can't do failover
Wed Mar 18 00:25:36 2015 - [warning] Got Error: at
/usr/lib/perl5/site_perl/5.10.0/MHA/MasterMonitor.pm line 516.
Wed Mar 18 00:25:36 2015 - [info] Got exit code 1 (Not master dead).
Original issue reported on code.google.com by [email protected] on 18 Mar 2015 at 1:17
The text was updated successfully, but these errors were encountered:
Wed Mar 18 00:25:36 2015 -
[error][/usr/lib/perl5/site_perl/5.10.0/MHA/Server.pm, ln885] SQL Thread is
stopped(error) on SLAVENODE(xx.xx.xx.xy:3306)! Errno:1053,
This means SQL thread stopped on the host with error 1053. When master was
down, it was expected IO thread stopped but SQL thread should not have stopped.
If SQL thread stops, MHA aborts and does not start failover. To fix this
problem, investigate the root cause of the SQL thread error (MHA should not be
relevant here).
Original issue reported on code.google.com by
[email protected]
on 18 Mar 2015 at 1:17The text was updated successfully, but these errors were encountered: