MHA Failover Log


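This log lives on the MHA manager host, in the log file that you specified in the MHA configuration file.

Once MHA has determined that the master is down, failover proceeds through the following steps: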

Thu Jul 27 19:19:56 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 27 19:19:56 2017 - [info]
Thu Jul 27 19:19:56 2017 - [info] GTID failover mode = 1
Thu Jul 27 19:19:56 2017 - [info] Dead Servers:
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:56 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 27 19:19:56 2017 - [info]  ok.
Thu Jul 27 19:19:56 2017 - [info] Alive Servers:
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.2(172.25.67.2:3306)
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.4(172.25.67.4:3306)
Thu Jul 27 19:19:56 2017 - [info] Alive Slaves:
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.2(172.25.67.2:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Thu Jul 27 19:19:56 2017 - [info]     GTID ON
Thu Jul 27 19:19:56 2017 - [info]     Replicating from 172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.4(172.25.67.4:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Thu Jul 27 19:19:56 2017 - [info]     GTID ON
Thu Jul 27 19:19:56 2017 - [info]     Replicating from 172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:56 2017 - [info] Starting GTID based failover.
Thu Jul 27 19:19:56 2017 - [info]
Thu Jul 27 19:19:56 2017 - [info] ** Phase 1: Configuration Check Phase completed.

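This is Phase 1, the configuration check. From the log above we can see that the dead master is 172.25.67.3, that the surviving slaves are 172.25.67.2 and 172.25.67.4, that both run version 5.7.17-log, and that GTID is enabled.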
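When reading a long manager log, it can help to pull the dead and alive servers out of Phase 1 programmatically. The following is a minimal Python sketch and not part of MHA; the function name `parse_servers` and the regular expressions are assumptions based only on the log format shown above.

```python
import re

# A few lines copied from the Phase 1 log above.
PHASE1_LOG = """\
Thu Jul 27 19:19:56 2017 - [info] Dead Servers:
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:56 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Jul 27 19:19:56 2017 - [info]  ok.
Thu Jul 27 19:19:56 2017 - [info] Alive Servers:
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.2(172.25.67.2:3306)
Thu Jul 27 19:19:56 2017 - [info]   172.25.67.4(172.25.67.4:3306)
Thu Jul 27 19:19:56 2017 - [info] Alive Slaves:
"""

# Timestamp and level prefix, e.g. "Thu Jul 27 19:19:56 2017 - [info] ".
PREFIX = re.compile(r"^\w{3} \w{3} +\d+ [\d:]{8} \d{4} - \[\w+\] ?")
HEADER = re.compile(r"^(Dead Servers|Alive Servers):$")
# Indented server entry, e.g. "  172.25.67.3(172.25.67.3:3306)".
SERVER = re.compile(r"^\s+[\w.\-]+\(([\d.]+):(\d+)\)")


def parse_servers(log_text):
    """Return the (ip, port) pairs listed under each server header."""
    result = {"Dead Servers": [], "Alive Servers": []}
    current = None
    for line in log_text.splitlines():
        message = PREFIX.sub("", line)
        header = HEADER.match(message)
        server = SERVER.match(message)
        if header:
            current = header.group(1)
        elif current and server:
            result[current].append((server.group(1), int(server.group(2))))
        else:
            # Any other message ends the current server list.
            current = None
    return result


servers = parse_servers(PHASE1_LOG)
print(servers["Dead Servers"])
print(servers["Alive Servers"])
```

The sketch only looks at the two headers that appear in this log; other sections such as "Alive Slaves" are deliberately ignored.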

Thu Jul 27 19:19:56 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 27 19:19:56 2017 - [info]
Thu Jul 27 19:19:56 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 27 19:19:56 2017 - [info] Executing master IP deactivation script:
Thu Jul 27 19:19:56 2017 - [info]   /etc/mha/master_ip_failover --orig_master_host=172.25.67.3 --orig_master_ip=172.25.67.3 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.25.67.100/24===
Disabling the VIP on old master: 172.25.67.3
SIOCSIFFLAGS: Cannot assign requested address
Thu Jul 27 19:19:57 2017 - [info]  done.
Thu Jul 27 19:19:57 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 27 19:19:57 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.

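Phase 2 takes the virtual IP (VIP) away from the dead master so that applications can no longer connect to it. Because shutdown_script is not set, MHA skips explicitly shutting down the dead master. Phase 3 follows: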

Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:194
Thu Jul 27 19:19:57 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 27 19:19:57 2017 - [info]   172.25.67.2(172.25.67.2:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Thu Jul 27 19:19:57 2017 - [info]     GTID ON
Thu Jul 27 19:19:57 2017 - [info]     Replicating from 172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:57 2017 - [info]   172.25.67.4(172.25.67.4:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Thu Jul 27 19:19:57 2017 - [info]     GTID ON
Thu Jul 27 19:19:57 2017 - [info]     Replicating from 172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:57 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:194
Thu Jul 27 19:19:57 2017 - [info] Oldest slaves:
Thu Jul 27 19:19:57 2017 - [info]   172.25.67.2(172.25.67.2:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Thu Jul 27 19:19:57 2017 - [info]     GTID ON
Thu Jul 27 19:19:57 2017 - [info]     Replicating from 172.25.67.3(172.25.67.3:3306)
Thu Jul 27 19:19:57 2017 - [info]   172.25.67.4(172.25.67.4:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Thu Jul 27 19:19:57 2017 - [info]     GTID ON
Thu Jul 27 19:19:57 2017 - [info]     Replicating from 172.25.67.3(172.25.67.3:3306)

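Phase 3 is master recovery, which builds on the data of the most up-to-date slave. It starts by comparing the binlog file and position that the latest and the oldest slaves have received from the master. Here both are at mysql-bin.000003, position 194, so no slave is behind the others.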
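The comparison described above can be reproduced from the log text. This is a small Python sketch under stated assumptions: the helper names `slave_positions` and `slaves_in_sync` are hypothetical, and the pattern relies only on the two "binary log file/position" messages that appear in this log.

```python
import re

# The two position messages copied from the Phase 3.1 log above.
PHASE31_LOG = """\
Thu Jul 27 19:19:57 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:194
Thu Jul 27 19:19:57 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:194
"""

POSITION = re.compile(
    r"The (latest|oldest) binary log file/position on all slaves is (\S+):(\d+)"
)


def slave_positions(log_text):
    """Map 'latest' and 'oldest' to the (binlog file, position) MHA reported."""
    positions = {}
    for kind, filename, pos in POSITION.findall(log_text):
        positions[kind] = (filename, int(pos))
    return positions


def slaves_in_sync(log_text):
    """True when both messages are present and report the same file and position."""
    positions = slave_positions(log_text)
    return "latest" in positions and positions.get("latest") == positions.get("oldest")


print(slave_positions(PHASE31_LOG))
print(slaves_in_sync(PHASE31_LOG))
```

If the two positions differ, at least one slave has received fewer events from the master than another, which is the case the later recovery phases have to handle.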

Thu Jul 27 19:19:57 2017 - [info]  * Phase 3.2

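Normally there would also be a Phase 3.2, which fetches the binlog events that the dead master had not yet sent and saves them to a designated location. In this run the slaves were already fully in sync with the master (and the failover is GTID-based), so the log contains no Phase 3.2 output.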

Thu Jul 27 19:19:57 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] Searching new master from slaves..
Thu Jul 27 19:19:57 2017 - [info]  Candidate masters from the configuration file:
Thu Jul 27 19:19:57 2017 - [info]  Non-candidate masters:
Thu Jul 27 19:19:57 2017 - [info] New master is 172.25.67.2(172.25.67.2:3306)
Thu Jul 27 19:19:57 2017 - [info] Starting master failover..
Thu Jul 27 19:19:57 2017 - [info]
From:
172.25.67.3(172.25.67.3:3306) (current master)
 +--172.25.67.2(172.25.67.2:3306)
 +--172.25.67.4(172.25.67.4:3306)
To:
172.25.67.2(172.25.67.2:3306) (new master)
 +--172.25.67.4(172.25.67.4:3306)
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info]  Waiting all logs to be applied..
Thu Jul 27 19:19:57 2017 - [info]   done.
Thu Jul 27 19:19:57 2017 - [info] Getting new master's binlog name and position..
Thu Jul 27 19:19:57 2017 - [info]  mysql-bin.000011:194
Thu Jul 27 19:19:57 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.25.67.2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='redhat', MASTER_PASSWORD='xxx';
Thu Jul 27 19:19:57 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000011, 194, e1b8a5b6-7274-11e7-8d14-525400813e46:1-3,e9a0c0f6-7274-11e7-8dde-525400022470:1
Thu Jul 27 19:19:57 2017 - [info] Executing master IP activate script:
Thu Jul 27 19:19:57 2017 - [info]   /etc/mha/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.25.67.3 --orig_master_ip=172.25.67.3 --orig_master_port=3306 --new_master_host=172.25.67.2 --new_master_ip=172.25.67.2 --new_master_port=3306 --new_master_user='root' --new_master_password='Gmoon+007'
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.25.67.100/24===
Enabling the VIP - 172.25.67.100/24 on the new master - 172.25.67.2
Thu Jul 27 19:19:57 2017 - [info]  OK.
Thu Jul 27 19:19:57 2017 - [info] ** Finished master recovery successfully.
Thu Jul 27 19:19:57 2017 - [info] * Phase 3: Master Recovery Phase completed.

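Here a new master is promoted, and the other slaves are to be pointed at it. The master_ip_failover script then performs the IP failover by moving the VIP to 172.25.67.2, which completes its promotion to master. The two "Unknown option" messages come from the script when it is passed --new_master_user and --new_master_password; the VIP switch still reports OK.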
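To check what the remaining slaves will be told to do, the CHANGE MASTER statement printed by MHA can be broken into its options. This is an illustrative Python sketch only; `parse_change_master` is a hypothetical helper, and the pattern covers just the quoted-string and integer options seen in this log, not the full MySQL syntax.

```python
import re

# The statement exactly as it appears in the Phase 3.3 log above.
STATEMENT = (
    "CHANGE MASTER TO MASTER_HOST='172.25.67.2', MASTER_PORT=3306, "
    "MASTER_AUTO_POSITION=1, MASTER_USER='redhat', MASTER_PASSWORD='xxx';"
)

# Matches MASTER_NAME='text' or MASTER_NAME=123.
OPTION = re.compile(r"(MASTER_\w+)=('([^']*)'|\d+)")


def parse_change_master(statement):
    """Return the MASTER_* options as a dict, with integers converted."""
    options = {}
    for name, raw, quoted in OPTION.findall(statement):
        options[name] = quoted if raw.startswith("'") else int(raw)
    return options


options = parse_change_master(STATEMENT)
print(options["MASTER_HOST"], options["MASTER_PORT"])
# MASTER_AUTO_POSITION=1 means the slave locates its start point via GTIDs
# rather than an explicit binlog file and position.
print(options["MASTER_AUTO_POSITION"] == 1)
```

Here the host and port match the new master chosen in the log, and MASTER_AUTO_POSITION=1 is consistent with the GTID-based failover reported in Phase 1.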

Thu Jul 27 19:19:57 2017 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Jul 27 19:19:57 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info] -- Slave recovery on host 172.25.67.4(172.25.67.4:3306) started, pid: 2827. Check tmp log /usr/local/mha/172.25.67.4_3306_20170727191956.log if it takes time..
Thu Jul 27 19:20:00 2017 - [info]
Thu Jul 27 19:20:00 2017 - [info] Log messages from 172.25.67.4 ...
Thu Jul 27 19:20:00 2017 - [info]
Thu Jul 27 19:19:57 2017 - [info]  Resetting slave 172.25.67.4(172.25.67.4:3306) and starting replication from the new master 172.25.67.2(172.25.67.2:3306)..
Thu Jul 27 19:19:59 2017 - [info]  Executed CHANGE MASTER.
Thu Jul 27 19:20:00 2017 - [info]  Slave started.
Thu Jul 27 19:20:00 2017 - [info]  gtid_wait(e1b8a5b6-7274-11e7-8d14-525400813e46:1-3,e9a0c0f6-7274-11e7-8dde-525400022470:1) completed on 172.25.67.4(172.25.67.4:3306). Executed 0 events.
Thu Jul 27 19:20:00 2017 - [info] End of log messages from 172.25.67.4.
Thu Jul 27 19:20:00 2017 - [info] -- Slave on host 172.25.67.4(172.25.67.4:3306) started.
Thu Jul 27 19:20:00 2017 - [info] All new slave servers recovered successfully.

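This phase recovers the remaining slaves, starting from the oldest one. Once the oldest slave has been brought level with the relay logs of the latest slave, MHA sends it the part of the master's binlog that is still missing to complete its data, and then resets the replication relationship. In this run 172.25.67.4 was already caught up (gtid_wait executed 0 events), so the log only shows CHANGE MASTER being executed and the slave being started.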

Thu Jul 27 19:20:00 2017 - [info] * Phase 5: New master cleanup phase..
Thu Jul 27 19:20:00 2017 - [info]
Thu Jul 27 19:20:00 2017 - [info] Resetting slave info on the new master..
Thu Jul 27 19:20:01 2017 - [info]  172.25.67.2: Resetting slave info succeeded.
Thu Jul 27 19:20:01 2017 - [info] Master failover to 172.25.67.2(172.25.67.2:3306) completed successfully.
Thu Jul 27 19:20:01 2017 - [info]
----- Failover Report -----
app: MySQL Master failover 172.25.67.3(172.25.67.3:3306) to 172.25.67.2(172.25.67.2:3306) succeeded
Master 172.25.67.3(172.25.67.3:3306) is down!
Check MHA Manager logs at server1:/usr/local/mha/mha.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 172.25.67.3(172.25.67.3:3306)
Selected 172.25.67.2(172.25.67.2:3306) as a new master.
172.25.67.2(172.25.67.2:3306): OK: Applying all logs succeeded.
172.25.67.2(172.25.67.2:3306): OK: Activated master IP address.
172.25.67.4(172.25.67.4:3306): OK: Slave started, replicating from 172.25.67.2(172.25.67.2:3306)
172.25.67.2(172.25.67.2:3306): Resetting slave info succeeded.
Master failover to 172.25.67.2(172.25.67.2:3306) completed successfully.

The final phase resets the slave info on the new master and prints the failover report.
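For monitoring or record keeping, the key facts of the failover report can be extracted with a short script. The sketch below is a hedged Python example, not an MHA feature; `parse_failover_report` is a hypothetical name, and the patterns match only the "succeeded" summary line and the "OK:" lines seen in this particular report.

```python
import re

# Lines copied from the failover report above.
REPORT = """\
----- Failover Report -----
app: MySQL Master failover 172.25.67.3(172.25.67.3:3306) to 172.25.67.2(172.25.67.2:3306) succeeded
Master 172.25.67.3(172.25.67.3:3306) is down!
Selected 172.25.67.2(172.25.67.2:3306) as a new master.
172.25.67.2(172.25.67.2:3306): OK: Applying all logs succeeded.
172.25.67.2(172.25.67.2:3306): OK: Activated master IP address.
172.25.67.4(172.25.67.4:3306): OK: Slave started, replicating from 172.25.67.2(172.25.67.2:3306)
172.25.67.2(172.25.67.2:3306): Resetting slave info succeeded.
"""

# "<app>: MySQL Master failover old(ip:port) to new(ip:port) succeeded"
SUMMARY = re.compile(
    r"^(\w+): MySQL Master failover [^\s(]+\(([^)]+)\) to [^\s(]+\(([^)]+)\) succeeded$",
    re.MULTILINE,
)
# "host(ip:port): OK: message"
STEP = re.compile(r"^([^\s(]+)\([^)]+\): OK: (.+)$", re.MULTILINE)


def parse_failover_report(report):
    """Return app name, old/new master and the per-host OK steps, or None."""
    summary = SUMMARY.search(report)
    if summary is None:
        return None
    return {
        "app": summary.group(1),
        "old_master": summary.group(2),
        "new_master": summary.group(3),
        "ok_steps": STEP.findall(report),
    }


result = parse_failover_report(REPORT)
print(result["old_master"], "->", result["new_master"])
for host, message in result["ok_steps"]:
    print(host, message)
```

A report whose summary line does not end in "succeeded" makes the function return None, so the caller can treat that as a signal to read the full manager log by hand.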
