MHA快速配置

来源:互联网 发布:新网域名转出 编辑:程序博客网 时间:2024/05/01 16:01

环境说明

IP 主机名 角色 备注 mha4mysql-node mha4mysql-manager 10.4.2.41 test1 master 读写 √ 10.4.2.42 test2 slave 只读,候选主库 √ 10.4.2.43 test3 slave 只读,监控节点 √ √

配置主从复制


本次实验为GTID复制

mha4mysql-manager-0.55-0.el6.noarch
mha4mysql-node-0.54-0.el6.noarch
不支持GTID切换,通过查看generl log,change master是基于file position的,手动执行,发现报错auto_position=1

安装epel源(为了装perl模块)

wget -P /etc/yum.repos.d/ http://mirrors.aliyun.com/repo/epel-6.repoyum clean allyum makecache

安装依赖包

可以只在test3装,这些模块是manager用来执行管理脚本所需要的

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

参数设置

relay_log_purge=0relay_log=./relay #如果找不到relay,很可能是这里配置有问题

MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

MHA节点中包含了pure_relay_logs命令工具,它可以为中继日志创建硬链接,执行SET GLOBAL relay_log_purge=1,等待几秒钟以便SQL线程切换到新的中继日志,再执行SET GLOBAL relay_log_purge=0。

http://www.cnblogs.com/gomysql/p/3675429.html

后面会通过脚本定期清理relay log

安装MHA

创建目录

mkdir -p /data/masterha/app1/logchown -R mysql:mysql /data/masterhachown mysql:mysql /etc/masterha

配置hosts文件

[root@test1 soft]# more /etc/hosts127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4::1         localhost localhost.localdomain localhost6 localhost6.localdomain610.4.2.41       test110.4.2.42       test210.4.2.43       test3

安装包下载
链接:http://pan.baidu.com/s/1eS1mHeM 密码:p20p

所有节点

rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm

test3

rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm

test1添加vip

[root@test1 soft]# ip addr add 10.4.2.100/24 dev eth0[root@test1 soft]# ip addr1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00    inet 127.0.0.1/8 scope host lo    inet6 ::1/128 scope host        valid_lft forever preferred_lft forever2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000    link/ether 00:50:56:a6:01:5c brd ff:ff:ff:ff:ff:ff    inet 10.4.2.41/24 brd 10.4.2.255 scope global eth0    inet 10.4.2.100/24 scope global secondary eth0    inet6 fe80::250:56ff:fea6:15c/64 scope link        valid_lft forever preferred_lft forever

创建软连接,不然后面会报command not found

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlogln -s /usr/local/mysql/bin/mysql /usr/bin/mysql

创建用户

CREATE USER 'mysql'@'10.4.2.%' IDENTIFIED BY 'mysql';GRANT ALL ON *.* to 'mysql'@'10.4.2.%'; #权限配置有点粗犷

创建配置文件

cat >> /etc/masterha/app1.cnf << ![server default] #此区块内的配置会影响下面所有区块#manager日志manager_log=/data/masterha/app1/log/manager.log #manager工作目录manager_workdir=/data/masterha/app1#设置master保存binlog的位置,以便MHA可以找到master的日志master_binlog_dir=/data/mysqldata/3306/binlog#自动failover时切换vip的脚本master_ip_failover_script="/data/masterha/master_ip_failover.pl"#收到切换主库时切换vip的脚本master_ip_online_change_script="/data/masterha/master_ip_online_change.pl"#上面创建的用户,failover时会执行stop/reset/start slave change master等命令user=mysqlpassword=mysql#设置监控主库发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railoverping_interval=1#设置远端mysql在发生切换时binlog的保存位置remote_workdir=/tmp#复制用户repl_password=replrepl_user=repl#failover后发送邮件的脚本report_script="/data/masterha/send_report.pl"#For checking master availability from multiple network routes secondary_check_script=/usr/bin/masterha_secondary_check -s test2 -s test3#设置故障发生后关闭故障主机脚本,主要为了避免脑裂shutdown_script=""#ssh登录用户名,执行scp的用户ssh_user=root[server1]hostname=test1port=3306[server2]hostname=test2#candidate_master=1   #设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave#如果不设置candidate,默认按照配置文件中的顺序和复制延迟选择新主库#check_repl_delay=0   #默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的masterport=3306[server3]hostname=test3port=3306!

配置ssh互信

当mha Manager管理器进行故障切换时,mha manager通过SSH连接mha node执行管理命令

When MHA Manager does failover, MHA manager connects MHA Node via SSH and
executes MHA Node commands when needed.

mha node会通过SSH(scp)发送差异relay log

MHA Node programs also send differential relay log files to other non-latest
slaves via SSH (scp). To make these procedures non-interactive, it is necessary to
setup SSH public key authentication.

所有节点

/usr/bin/ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''/usr/bin/ssh-keygen -t dsa -f ~/.ssh/id_rsa -P ''

test1

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keyscat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys将test2,test3的id_r/dsa.pub追加写入到test1的authorized_keys文件里ssh test2 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keysssh test2 cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keysssh test3 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keysssh test3 cat /root/.ssh/id_dsa.pub >> /root/.ssh/authorized_keys将写有了所有节点id_r/dsa.pub的authorized_keys发送到test2,test3scp ~/.ssh/authorized_keys test2:~/.ssh/scp ~/.ssh/authorized_keys test3:~/.ssh/

检查SSH互相是否配置成功
masterha_check_ssh –conf=/etc/masterha/app1.cnf

Mon Jul 17 15:37:06 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Mon Jul 17 15:37:06 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Mon Jul 17 15:37:06 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Mon Jul 17 15:37:06 2017 - [info] Starting SSH connection tests..Mon Jul 17 15:37:07 2017 - [debug] Mon Jul 17 15:37:06 2017 - [debug]  Connecting via SSH from root@test1(10.4.2.41:22) to root@test2(10.4.2.42:22)..Mon Jul 17 15:37:06 2017 - [debug]   ok.Mon Jul 17 15:37:06 2017 - [debug]  Connecting via SSH from root@test1(10.4.2.41:22) to root@test3(10.4.2.43:22)..Mon Jul 17 15:37:07 2017 - [debug]   ok.Mon Jul 17 15:37:07 2017 - [debug] Mon Jul 17 15:37:07 2017 - [debug]  Connecting via SSH from root@test2(10.4.2.42:22) to root@test1(10.4.2.41:22)..Mon Jul 17 15:37:07 2017 - [debug]   ok.Mon Jul 17 15:37:07 2017 - [debug]  Connecting via SSH from root@test2(10.4.2.42:22) to root@test3(10.4.2.43:22)..Mon Jul 17 15:37:07 2017 - [debug]   ok.Mon Jul 17 15:37:08 2017 - [debug] Mon Jul 17 15:37:07 2017 - [debug]  Connecting via SSH from root@test3(10.4.2.43:22) to root@test1(10.4.2.41:22)..Mon Jul 17 15:37:07 2017 - [debug]   ok.Mon Jul 17 15:37:07 2017 - [debug]  Connecting via SSH from root@test3(10.4.2.43:22) to root@test2(10.4.2.42:22)..Mon Jul 17 15:37:08 2017 - [debug]   ok.Mon Jul 17 15:37:08 2017 - [info] All SSH connection tests passed successfully.

检查复制状态

[root@test3 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnfMon Jul 17 15:42:09 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Mon Jul 17 15:42:09 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Mon Jul 17 15:42:09 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Mon Jul 17 15:42:09 2017 - [info] MHA::MasterMonitor version 0.57.Mon Jul 17 15:42:09 2017 - [info] GTID failover mode = 1Mon Jul 17 15:42:09 2017 - [info] Dead Servers:Mon Jul 17 15:42:09 2017 - [info] Alive Servers:Mon Jul 17 15:42:09 2017 - [info]   test1(10.4.2.41:3306)Mon Jul 17 15:42:09 2017 - [info]   test2(10.4.2.42:3306)Mon Jul 17 15:42:09 2017 - [info]   test3(10.4.2.43:3306)Mon Jul 17 15:42:09 2017 - [info] Alive Slaves:Mon Jul 17 15:42:09 2017 - [info]   test2(10.4.2.42:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:42:09 2017 - [info]     GTID ONMon Jul 17 15:42:09 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:42:09 2017 - [info]   test3(10.4.2.43:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:42:09 2017 - [info]     GTID ONMon Jul 17 15:42:09 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:42:09 2017 - [info] Current Alive Master: test1(10.4.2.41:3306)Mon Jul 17 15:42:09 2017 - [info] Checking slave configurations..Mon Jul 17 15:42:09 2017 - [info]  read_only=1 is not set on slave test2(10.4.2.42:3306).Mon Jul 17 15:42:09 2017 - [info]  read_only=1 is not set on slave test3(10.4.2.43:3306).Mon Jul 17 15:42:09 2017 - [info] Checking replication filtering settings..Mon Jul 17 15:42:09 2017 - [info]  binlog_do_db= , binlog_ignore_db= Mon Jul 17 15:42:09 2017 - [info]  Replication filtering check ok.Mon Jul 17 15:42:09 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.Mon Jul 17 15:42:09 2017 - [info] Checking SSH publickey authentication settings on the current master..Mon Jul 17 15:42:09 2017 - [info] HealthCheck: SSH to test1 is reachable.Mon Jul 17 15:42:09 2017 - [info] test1(10.4.2.41:3306) (current master) +--test2(10.4.2.42:3306) +--test3(10.4.2.43:3306)Mon Jul 17 15:42:09 2017 - [info] Checking replication health on test2..Mon Jul 17 15:42:09 2017 - [info]  ok.Mon Jul 17 15:42:09 2017 - [info] Checking replication health on test3..Mon Jul 17 15:42:09 2017 - [info]  ok.Mon Jul 17 15:42:09 2017 - [info] Checking master_ip_failover_script status:Mon Jul 17 15:42:09 2017 - [info]   /data/masterha/master_ip_failover.pl --command=status --ssh_user=root --orig_master_host=test1 --orig_master_ip=10.4.2.41 --orig_master_port=3306 Mon Jul 17 15:42:09 2017 - [info]  OK.Mon Jul 17 15:42:09 2017 - [warning] shutdown_script is not defined.Mon Jul 17 15:42:09 2017 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.

启停mha manager

[root@test3 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/app1/log/manager.log 2>&1 &  [1] 21419检查mha manager监控状态[root@test3 ~]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 (pid:21419) is running(0:PING_OK), master:test1[root@test3 ~]# masterha_stop --conf=/etc/masterha/app1.cnfStopped app1 successfully.[1]+  Exit 1                  nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/app1/log/manager.log 2>&1[root@test3 ~]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 is stopped(2:NOT_RUNNING).

模拟主库挂掉

test3

nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/app1/log/manager.log 2>&1 &  

test1

[root@test1 soft]# mysqladmin shutdown

查看日志

Mon Jul 17 15:58:56 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)Mon Jul 17 15:58:56 2017 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s test2 -s test3  --user=root  --master_host=test1  --master_ip=10.4.2.41  --master_port=3306 --master_user=mysql --master_password=mysql --ping_type=SELECTMon Jul 17 15:58:56 2017 - [info] Executing SSH check script: exit 0Mon Jul 17 15:58:56 2017 - [info] HealthCheck: SSH to test1 is reachable.Monitoring server test2 is reachable, Master is not reachable from test2. OK.Monitoring server test3 is reachable, Master is not reachable from test3. OK.Mon Jul 17 15:58:56 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.Mon Jul 17 15:58:57 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Mon Jul 17 15:58:57 2017 - [warning] Connection failed 2 time(s)..Mon Jul 17 15:58:58 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Mon Jul 17 15:58:58 2017 - [warning] Connection failed 3 time(s)..Mon Jul 17 15:58:59 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Mon Jul 17 15:58:59 2017 - [warning] Connection failed 4 time(s)..Mon Jul 17 15:58:59 2017 - [warning] Master is not reachable from health checker!Mon Jul 17 15:58:59 2017 - [warning] Master test1(10.4.2.41:3306) is not reachable!Mon Jul 17 15:58:59 2017 - [warning] SSH is reachable.Mon Jul 17 15:58:59 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..Mon Jul 17 15:58:59 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Mon Jul 17 15:58:59 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Mon Jul 17 15:58:59 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Mon Jul 17 15:58:59 2017 - [info] GTID failover mode = 1Mon Jul 17 15:58:59 2017 - [info] Dead Servers:Mon Jul 17 15:58:59 2017 - [info]   test1(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info] Alive Servers:Mon Jul 17 15:58:59 2017 - [info]   test2(10.4.2.42:3306)Mon Jul 17 15:58:59 2017 - [info]   test3(10.4.2.43:3306)Mon Jul 17 15:58:59 2017 - [info] Alive Slaves:Mon Jul 17 15:58:59 2017 - [info]   test2(10.4.2.42:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info]   test3(10.4.2.43:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info] Checking slave configurations..Mon Jul 17 15:58:59 2017 - [info]  read_only=1 is not set on slave test2(10.4.2.42:3306).Mon Jul 17 15:58:59 2017 - [info]  read_only=1 is not set on slave test3(10.4.2.43:3306).Mon Jul 17 15:58:59 2017 - [info] Checking replication filtering settings..Mon Jul 17 15:58:59 2017 - [info]  Replication filtering check ok.Mon Jul 17 15:58:59 2017 - [info] Master is down!Mon Jul 17 15:58:59 2017 - [info] Terminating monitoring script.Mon Jul 17 15:58:59 2017 - [info] Got exit code 20 (Master dead).Mon Jul 17 15:58:59 2017 - [info] MHA::MasterFailover version 0.57.Mon Jul 17 15:58:59 2017 - [info] Starting master failover.Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] * Phase 1: Configuration Check Phase..Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] GTID failover mode = 1Mon Jul 17 15:58:59 2017 - [info] Dead Servers:Mon Jul 17 15:58:59 2017 - [info]   test1(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info] Checking master reachability via MySQL(double check)...Mon Jul 17 15:58:59 2017 - [info]  ok.Mon Jul 17 15:58:59 2017 - [info] Alive Servers:Mon Jul 17 15:58:59 2017 - [info]   test2(10.4.2.42:3306)Mon Jul 17 15:58:59 2017 - [info]   test3(10.4.2.43:3306)Mon Jul 17 15:58:59 2017 - [info] Alive Slaves:Mon Jul 17 15:58:59 2017 - [info]   test2(10.4.2.42:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info]   test3(10.4.2.43:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info] Starting GTID based failover.Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] ** Phase 1: Configuration Check Phase completed.Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] Forcing shutdown so that applications never connect to the current master..Mon Jul 17 15:58:59 2017 - [info] Executing master IP deactivation script:Mon Jul 17 15:58:59 2017 - [info]   /data/masterha/master_ip_failover.pl --orig_master_host=test1 --orig_master_ip=10.4.2.41 --orig_master_port=3306 --command=stopssh --ssh_user=root  Mon Jul 17 15:58:59 2017 - [info]  done.Mon Jul 17 15:58:59 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Mon Jul 17 15:58:59 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] * Phase 3: Master Recovery Phase..Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000033:194Mon Jul 17 15:58:59 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Mon Jul 17 15:58:59 2017 - [info]   test2(10.4.2.42:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info]   test3(10.4.2.43:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000033:194Mon Jul 17 15:58:59 2017 - [info] Oldest slaves:Mon Jul 17 15:58:59 2017 - [info]   test2(10.4.2.42:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info]   test3(10.4.2.43:3306)  Version=5.7.18-log (oldest major version between slaves) log-bin:enabledMon Jul 17 15:58:59 2017 - [info]     GTID ONMon Jul 17 15:58:59 2017 - [info]     Replicating from 10.4.2.41(10.4.2.41:3306)Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] * Phase 3.3: Determining New Master Phase..Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] Searching new master from slaves..Mon Jul 17 15:58:59 2017 - [info]  Candidate masters from the configuration file:Mon Jul 17 15:58:59 2017 - [info]  Non-candidate masters:Mon Jul 17 15:58:59 2017 - [info] New master is test2(10.4.2.42:3306)Mon Jul 17 15:58:59 2017 - [info] Starting master failover..Mon Jul 17 15:58:59 2017 - [info] From:test1(10.4.2.41:3306) (current master) +--test2(10.4.2.42:3306) +--test3(10.4.2.43:3306)To:test2(10.4.2.42:3306) (new master) +--test3(10.4.2.43:3306)Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info] * Phase 3.3: New Master Recovery Phase..Mon Jul 17 15:58:59 2017 - [info] Mon Jul 17 15:58:59 2017 - [info]  Waiting all logs to be applied.. Mon Jul 17 15:58:59 2017 - [info]   done.Mon Jul 17 15:58:59 2017 - [info] Getting new master's binlog name and position..Mon Jul 17 15:58:59 2017 - [info]  mysql-bin.000003:194Mon Jul 17 15:58:59 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='test2 or 10.4.2.42', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Mon Jul 17 15:58:59 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000003, 194, 663e66b3-6148-11e7-bddf-005056a6015c:1-1456965Mon Jul 17 15:58:59 2017 - [info] Executing master IP activate script:Mon Jul 17 15:58:59 2017 - [info]   /data/masterha/master_ip_failover.pl --command=start --ssh_user=root --orig_master_host=test1 --orig_master_ip=10.4.2.41 --orig_master_port=3306 --new_master_host=test2 --new_master_ip=10.4.2.42 --new_master_port=3306 --new_master_user='mysql'   --new_master_password=xxxSet read_only=0 on the new master.Creating app user on the new master..RTNETLINK answers: Cannot assign requested addressRTNETLINK answers: File existsMon Jul 17 15:59:00 2017 - [info]  OK.Mon Jul 17 15:59:00 2017 - [info] ** Finished master recovery successfully.Mon Jul 17 15:59:00 2017 - [info] * Phase 3: Master Recovery Phase completed.Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] * Phase 4: Slaves Recovery Phase..Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] -- Slave recovery on host test3(10.4.2.43:3306) started, pid: 21758. Check tmp log /data/masterha/app1/test3_3306_20170717155859.log if it takes time..Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] Log messages from test3 ...Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info]  Resetting slave test3(10.4.2.43:3306) and starting replication from the new master test2(10.4.2.42:3306)..Mon Jul 17 15:59:00 2017 - [info]  Executed CHANGE MASTER.Mon Jul 17 15:59:00 2017 - [info]  Slave started.Mon Jul 17 15:59:00 2017 - [info]  gtid_wait(663e66b3-6148-11e7-bddf-005056a6015c:1-1456965) completed on test3(10.4.2.43:3306). Executed 0 events.Mon Jul 17 15:59:00 2017 - [info] End of log messages from test3.Mon Jul 17 15:59:00 2017 - [info] -- Slave on host test3(10.4.2.43:3306) started.Mon Jul 17 15:59:00 2017 - [info] All new slave servers recovered successfully.Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] * Phase 5: New master cleanup phase..Mon Jul 17 15:59:00 2017 - [info] Mon Jul 17 15:59:00 2017 - [info] Resetting slave info on the new master..Mon Jul 17 15:59:00 2017 - [info]  test2: Resetting slave info succeeded.Mon Jul 17 15:59:00 2017 - [info] Master failover to test2(10.4.2.42:3306) completed successfully.Mon Jul 17 15:59:00 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .Mon Jul 17 15:59:00 2017 - [info] ----- Failover Report -----app1: MySQL Master failover test1(10.4.2.41:3306) to test2(10.4.2.42:3306) succeededMaster test1(10.4.2.41:3306) is down!Check MHA Manager logs at test3:/data/masterha/app1/log/manager.log for details.Started automated(non-interactive) failover.Invalidated master IP address on test1(10.4.2.41:3306)Selected test2(10.4.2.42:3306) as a new master.test2(10.4.2.42:3306): OK: Applying all logs succeeded.test2(10.4.2.42:3306): OK: Activated master IP address.test3(10.4.2.43:3306): OK: Slave started, replicating from test2(10.4.2.42:3306)test2(10.4.2.42:3306): Resetting slave info succeeded.Master failover to test2(10.4.2.42:3306) completed successfully.Mon Jul 17 15:59:00 2017 - [info] Sending mail..Unknown option: conf

failoverover后监控会退出

[root@test3 ~]# [1]+  Done                    nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/app1/log/manager.log 2>&1[root@test3 ~]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 is stopped(2:NOT_RUNNING).

配置文件中test1会被删除

[root@test3 ~]# more /etc/masterha/app1.cnf [server default]manager_log=/data/masterha/app1/log/manager.logmanager_workdir=/data/masterha/app1master_binlog_dir=/data/mysqldata/3306/binlogmaster_ip_failover_script="/data/masterha/master_ip_failover.pl"master_ip_online_change_script="/data/masterha/master_ip_online_change.pl"password=mysqlping_interval=1remote_workdir=/tmprepl_password=replrepl_user=replreport_script="/data/masterha/send_report.pl"secondary_check_script=/usr/bin/masterha_secondary_check -s test2 -s test3shutdown_script=""ssh_user=rootuser=mysql[server2]hostname=test2port=3306[server3]hostname=test3port=3306

vip会漂移到新主
test2

[root@test2 ~]# ip addr1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00    inet 127.0.0.1/8 scope host lo    inet6 ::1/128 scope host        valid_lft forever preferred_lft forever2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN qlen 1000    link/ether 00:50:56:a6:3f:e7 brd ff:ff:ff:ff:ff:ff    inet 10.4.2.42/24 brd 10.4.2.255 scope global eth0    inet 10.4.2.100/24 scope global secondary eth0    inet6 fe80::250:56ff:fea6:3fe7/64 scope link 

邮件内容

Master test1(10.4.2.41:3306) is down!Check MHA Manager logs at test3:/data/masterha/app1/log/manager.log for details.Started automated(non-interactive) failover.Invalidated master IP address on test1(10.4.2.41:3306)Selected test2(10.4.2.42:3306) as a new master.test2(10.4.2.42:3306): OK: Applying all logs succeeded.test2(10.4.2.42:3306): OK: Activated master IP address.test3(10.4.2.43:3306): OK: Slave started, replicating from test2(10.4.2.42:3306)test2(10.4.2.42:3306): Resetting slave info succeeded.Master failover to test2(10.4.2.42:3306) completed successfully.

脚本

这里通过命令为eth0添加/删除vip, 主库要在一开始添加好vip
master_ip_failover

[root@test3 masterha]# cat master_ip_failover.pl #!/usr/bin/env perl#  Copyright (C) 2011 DeNA Co.,Ltd.##  This program is free software; you can redistribute it and/or modify#  it under the terms of the GNU General Public License as published by#  the Free Software Foundation; either version 2 of the License, or#  (at your option) any later version.##  This program is distributed in the hope that it will be useful,#  but WITHOUT ANY WARRANTY; without even the implied warranty of#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the#  GNU General Public License for more details.##  You should have received a copy of the GNU General Public License#   along with this program; if not, write to the Free Software#  Foundation, Inc.,#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA## Note: This is a sample script and is not complete. Modify the script based on your environment.use strict;use warnings FATAL => 'all';use Getopt::Long;use MHA::DBHelper;### 自定义配置开始 #### 该组机器的vipmy $vip = "10.4.2.100";my $if = "eth0";### 自定义配置结束 ###my (  $command,        $ssh_user,         $orig_master_host,  $orig_master_ip, $orig_master_port, $new_master_host,  $new_master_ip,  $new_master_port,  $new_master_user,  $new_master_password);GetOptions(  'command=s'             => \$command,  'ssh_user=s'            => \$ssh_user,  'orig_master_host=s'    => \$orig_master_host,  'orig_master_ip=s'      => \$orig_master_ip,  'orig_master_port=i'    => \$orig_master_port,  'new_master_host=s'     => \$new_master_host,  'new_master_ip=s'       => \$new_master_ip,  'new_master_port=i'     => \$new_master_port,  'new_master_user=s'     => \$new_master_user,  'new_master_password=s' => \$new_master_password,);### 自定义配置开始 ###sub add_vip {  my $output1 = `ssh $orig_master_host /sbin/ip addr del $vip/24 dev $if`;  my $output2 = `ssh $new_master_host /sbin/ip addr add $vip/24 dev $if`;}### 自定义配置结束 ###exit &main();sub main {  if ( $command eq "stop" || $command eq "stopssh" ) {    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.    # If you manage master ip address at global catalog database,    # invalidate orig_master_ip here.    my $exit_code = 1;    eval {      # updating global catalog, etc      $exit_code = 0;    };    if ($@) {      warn "Got Error: $@\n";      exit $exit_code;    }    exit $exit_code;  }  elsif ( $command eq "start" ) {    # all arguments are passed.    # If you manage master ip address at global catalog database,    # activate new_master_ip here.    # You can also grant write access (create user, set read_only=0, etc) here.    my $exit_code = 10;    eval {      my $new_master_handler = new MHA::DBHelper();      # args: hostname, port, user, password, raise_error_or_not      $new_master_handler->connect( $new_master_ip, $new_master_port,        $new_master_user, $new_master_password, 1 );      ## Set read_only=0 on the new master      $new_master_handler->disable_log_bin_local();      print "Set read_only=0 on the new master.\n";      $new_master_handler->disable_read_only();      ## Creating an app user on the new master      print "Creating app user on the new master..\n";      ### 自定义配置开始 ###      #FIXME_xxx_create_user( $new_master_handler->{dbh} );      ### 自定义配置结束 ###      $new_master_handler->enable_log_bin_local();      $new_master_handler->disconnect();      ## Update master ip on the catalog database, etc      #FIXME_xxx;      ### 自定义配置开始 ###      &add_vip();      ### 自定义配置结束 ###      $exit_code = 0;    };    if ($@) {      warn $@;      # If you want to continue failover, exit 10.      exit $exit_code;    }    exit $exit_code;  }  elsif ( $command eq "status" ) {    # do nothing    exit 0;  }  else {    &usage();    exit 1;  }}sub usage {  print"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";}

master_ip_online_change

[root@test3 masterha]# cat master_ip_online_change.pl #!/usr/bin/env perl#  Copyright (C) 2011 DeNA Co.,Ltd.##  This program is free software; you can redistribute it and/or modify#  it under the terms of the GNU General Public License as published by#  the Free Software Foundation; either version 2 of the License, or#  (at your option) any later version.##  This program is distributed in the hope that it will be useful,#  but WITHOUT ANY WARRANTY; without even the implied warranty of#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the#  GNU General Public License for more details.##  You should have received a copy of the GNU General Public License#   along with this program; if not, write to the Free Software#  Foundation, Inc.,#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA## Note: This is a sample script and is not complete. Modify the script based on your environment.use strict;use warnings FATAL => 'all';use Getopt::Long;use MHA::DBHelper;use MHA::NodeUtil;use Time::HiRes qw( sleep gettimeofday tv_interval );use Data::Dumper;my $_tstart;my $_running_interval = 0.1;### 自定义配置开始 #### 该组机器的vipmy $vip = "10.4.2.100";my $if = "eth0";### 自定义配置结束 ###my (  $command,              $orig_master_is_new_slave, $orig_master_host,  $orig_master_ip,       $orig_master_port,         $orig_master_user,  $orig_master_password, $orig_master_ssh_user,     $new_master_host,  $new_master_ip,        $new_master_port,          $new_master_user,  $new_master_password,  $new_master_ssh_user,);GetOptions(  'command=s'                => \$command,  'orig_master_is_new_slave' => \$orig_master_is_new_slave,  'orig_master_host=s'       => \$orig_master_host,  'orig_master_ip=s'         => \$orig_master_ip,  'orig_master_port=i'       => \$orig_master_port,  'orig_master_user=s'       => \$orig_master_user,  'orig_master_password=s'   => \$orig_master_password,  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,  'new_master_host=s'        => \$new_master_host,  'new_master_ip=s'          => \$new_master_ip,  'new_master_port=i'        => \$new_master_port,  'new_master_user=s'        => \$new_master_user,  'new_master_password=s'    => \$new_master_password,  'new_master_ssh_user=s'    => \$new_master_ssh_user,);exit &main();### 自定义配置开始 ###sub drop_vip {  my $output = `ssh $orig_master_host /sbin/ip addr del $vip/24 dev $if`;  #mysql里的连接全部干掉  #FIXME}sub add_vip {  my $output = `ssh $new_master_host /sbin/ip addr add $vip/24 dev $if`;}### 自定义配置结束 ###sub current_time_us {  my ( $sec, $microsec ) = gettimeofday();  my $curdate = localtime($sec);  return $curdate . " " . sprintf( "%06d", $microsec );}sub sleep_until {  my $elapsed = tv_interval($_tstart);  if ( $_running_interval > $elapsed ) {    sleep( $_running_interval - $elapsed );  }}sub get_threads_util {  my $dbh                    = shift;  my $my_connection_id       = shift;  my $running_time_threshold = shift;  my $type                   = shift;  $running_time_threshold = 0 unless ($running_time_threshold);  $type                   = 0 unless ($type);  my @threads;  my $sth = $dbh->prepare("SHOW PROCESSLIST");  $sth->execute();  while ( my $ref = $sth->fetchrow_hashref() ) {    my $id         = $ref->{Id};    my $user       = $ref->{User};    my $host       = $ref->{Host};    my $command    = $ref->{Command};    my $state      = $ref->{State};    my $query_time = $ref->{Time};    my $info       = $ref->{Info};    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);    next if ( $my_connection_id == $id );    next if ( defined($query_time) && $query_time < $running_time_threshold );    next if ( defined($command)    && $command eq "Binlog Dump" );    next if ( defined($user)       && $user eq "system user" );    next      if ( defined($command)      && $command eq "Sleep"      && defined($query_time)      && $query_time >= 1 );    if ( $type >= 1 ) {      next if ( defined($command) && $command eq "Sleep" );      next if ( defined($command) && $command eq "Connect" );    }    if ( $type >= 2 ) {      next if ( defined($info) && $info =~ m/^select/i );      next if ( defined($info) && $info =~ m/^show/i );    }    push @threads, $ref;  }  return @threads;}sub main {  if ( $command eq "stop" ) {    ## Gracefully killing connections on the current master    # 1. Set read_only= 1 on the new master    # 2. DROP USER so that no app user can establish new connections    # 3. Set read_only= 1 on the current master    # 4. Kill current queries    # * Any database access failure will result in script die.    my $exit_code = 1;    eval {      ## Setting read_only=1 on the new master (to avoid accident)      my $new_master_handler = new MHA::DBHelper();      # args: hostname, port, user, password, raise_error(die_on_error)_or_not      $new_master_handler->connect( $new_master_ip, $new_master_port,        $new_master_user, $new_master_password, 1 );      print current_time_us() . " Set read_only on the new master.. ";      $new_master_handler->enable_read_only();      if ( $new_master_handler->is_read_only() ) {        print "ok.\n";      }      else {        die "Failed!\n";      }      $new_master_handler->disconnect();      # Connecting to the orig master, die if any database error happens      my $orig_master_handler = new MHA::DBHelper();      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,        $orig_master_user, $orig_master_password, 1 );      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand      $orig_master_handler->disable_log_bin_local();      print current_time_us() . " Drpping app user on the orig master..\n";      #FIXME_xxx_drop_app_user($orig_master_handler);      &drop_vip();      ## Waiting for N * 100 milliseconds so that current connections can exit      my $time_until_read_only = 15;      $_tstart = [gettimeofday];      my @threads = get_threads_util( $orig_master_handler->{dbh},        $orig_master_handler->{connection_id} );      while ( $time_until_read_only > 0 && $#threads >= 0 ) {        if ( $time_until_read_only % 5 == 0 ) {          printf"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",            current_time_us(), $#threads + 1, $time_until_read_only * 100;          if ( $#threads < 5 ) {            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"              foreach (@threads);          }        }        sleep_until();        $_tstart = [gettimeofday];        $time_until_read_only--;        @threads = get_threads_util( $orig_master_handler->{dbh},          $orig_master_handler->{connection_id} );      }      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write      print current_time_us() . " Set read_only=1 on the orig master.. ";      $orig_master_handler->enable_read_only();      if ( $orig_master_handler->is_read_only() ) {        print "ok.\n";      }      else {        die "Failed!\n";      }      ## Waiting for M * 100 milliseconds so that current update queries can complete      my $time_until_kill_threads = 5;      @threads = get_threads_util( $orig_master_handler->{dbh},        $orig_master_handler->{connection_id} );      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {        if ( $time_until_kill_threads % 5 == 0 ) {          printf"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;          if ( $#threads < 5 ) {            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"              foreach (@threads);          }        }        sleep_until();        $_tstart = [gettimeofday];        $time_until_kill_threads--;        @threads = get_threads_util( $orig_master_handler->{dbh},          $orig_master_handler->{connection_id} );      }      ## Terminating all threads      print current_time_us() . " Killing all application threads..\n";      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );      print current_time_us() . " done.\n";      $orig_master_handler->enable_log_bin_local();      $orig_master_handler->disconnect();      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK      $exit_code = 0;    };    if ($@) {      warn "Got Error: $@\n";      exit $exit_code;    }    exit $exit_code;  }  elsif ( $command eq "start" ) {    ## Activating master ip on the new master    # 1. Create app user with write privileges    # 2. Moving backup script if needed    # 3. Register new master's ip to the catalog database# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.# If exit code is 0 or 10, MHA does not abort    my $exit_code = 10;    eval {      my $new_master_handler = new MHA::DBHelper();      # args: hostname, port, user, password, raise_error_or_not      $new_master_handler->connect( $new_master_ip, $new_master_port,        $new_master_user, $new_master_password, 1 );      ## Set read_only=0 on the new master      $new_master_handler->disable_log_bin_local();      print current_time_us() . " Set read_only=0 on the new master.\n";      $new_master_handler->disable_read_only();      ## Creating an app user on the new master      print current_time_us() . " Creating app user on the new master..\n";      #FIXME_xxx_create_app_user($new_master_handler);      &add_vip();      $new_master_handler->enable_log_bin_local();      $new_master_handler->disconnect();      ## Update master ip on the catalog database, etc      $exit_code = 0;    };    if ($@) {      warn "Got Error: $@\n";      exit $exit_code;    }    exit $exit_code;  }  elsif ( $command eq "status" ) {    # do nothing    exit 0;  }  else {    &usage();    exit 1;  }}sub usage {  print"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";  die;}

send_report

[root@test3 masterha]# cat send_report.pl #!/usr/bin/perl#  Copyright (C) 2011 DeNA Co.,Ltd.##  This program is free software; you can redistribute it and/or modify#  it under the terms of the GNU General Public License as published by#  the Free Software Foundation; either version 2 of the License, or#  (at your option) any later version.##  This program is distributed in the hope that it will be useful,#  but WITHOUT ANY WARRANTY; without even the implied warranty of#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the#  GNU General Public License for more details.##  You should have received a copy of the GNU General Public License#   along with this program; if not, write to the Free Software#  Foundation, Inc.,#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA## Note: This is a sample script and is not complete. Modify the script based on your environment.use strict;use warnings FATAL => 'all';use Mail::Sender;use Getopt::Long;#new_master_host and new_slave_hosts are set only when recovering master succeededmy ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );my $smtp='smtp.163.com';my $mail_from='xxx@163.com';my $mail_user='xxx@163.com';my $mail_pass='passwordxxx';my $mail_to=['xxx@163.com'];GetOptions(  'orig_master_host=s' => \$dead_master_host,  'new_master_host=s'  => \$new_master_host,  'new_slave_hosts=s'  => \$new_slave_hosts,  'subject=s'          => \$subject,  'body=s'             => \$body,);mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);sub mailToContacts {    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;    open my $DEBUG, "> /tmp/monitormail.log"        or die "Can't open the debug      file:$!\n";    my $sender = new Mail::Sender {        ctype       => 'text/plain; charset=utf-8',        encoding    => 'utf-8',        smtp        => $smtp,        from        => $mail_from,        auth        => 'LOGIN',        TLS_allowed => '0',        authid      => $user,        authpwd     => $passwd,        to          => $mail_to,        subject     => $subject,        debug       => $DEBUG    };    $sender->MailMsg(        {   msg   => $msg,            debug => $DEBUG        }    ) or print $Mail::Sender::Error;    return 1;}# Do whatever you want hereexit 0;

参考:

https://yq.aliyun.com/articles/58920
http://www.cnblogs.com/gomysql/p/3675429.html
http://blog.csdn.net/longyinyushi/article/details/50925638

原创粉丝点击