Exploring MySQL High Availability

Source: Internet    Editor: 程序博客网    Time: 2024/05/01 21:31

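1. The MMM approach

MMM (Master-Master Replication Manager for MySQL) is a set of scripts for monitoring, failover, and management of MySQL master-master replication.
The same scripts can also balance reads across any number of slaves in an ordinary master-slave setup, so MMM can be used to float a group of virtual IPs over a replication topology. It also provides data backup and resynchronization between nodes.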

IP
DB1:192.168.11.198
DB2:192.168.11.88
DB3:192.168.11.238
MONITOR:192.168.11.116

Architecture diagram (image not preserved):

db1
[mysql@localhost ~]$ sudo cat /etc/my.cnf
[sudo] password for mysql:
[client]
socket=/tmp/mysql.sock

[mysqld]
server-id=2
datadir=/mysql/data
socket=/tmp/mysql.sock
user=mysql
default_storage_engine=innodb
character_set_server=utf8
slow_query_log=1
slow_query_log_file=/mysql/slowquery.log
long_query_time=2
log-queries-not-using-indexes
log-slow-admin-statements
innodb_buffer_pool_size=50M
innodb_flush_log_at_trx_commit=1
max_allowed_packet=100M
binlog_format=mixed
log-bin=/mysql/log/mysql-bin
log_bin_trust_function_creators = 1
innodb_fast_shutdown = 0
binlog-do-db=test
replicate-do-db=test
log-slave-updates=on
[mysqld_safe]
log-error=/mysql/mysqld.log
pid-file=/mysql/mysqld.pid

db2
[mysql@localhost ~]$ sudo cat /etc/my.cnf
[client]
socket=/tmp/mysql.sock

[mysqld]
server-id=4
datadir=/mysql/data
socket=/tmp/mysql.sock
user=mysql
default_storage_engine=InnoDB
character_set_server=utf8
slow_query_log=1
slow_query_log_file=/mysql/slowquery.log
long_query_time=2
log-queries-not-using-indexes
log-slow-admin-statements
log_bin_trust_function_creators = 1
log-bin=/mysql/log/mysql-bin
report_host=192.168.23.164
binlog_format=mixed
log-bin=/mysql/log/mysql-bin
binlog-do-db=test
replicate-do-db=test
log-slave-updates=on
slave-skip-errors=1007,1050,1146,1051
[mysqld_safe]
log-error=/mysql/mysqld.log
pid-file=/mysql/mysqld.pid

db3
[client]
socket=/tmp/mysql.sock
port=3306

[mysqld]
server-id=3
port=3306
basedir=/usr/local/mysql
datadir=/mysql/data
socket=/tmp/mysql.sock
user=mysql
default_storage_engine=innodb
character_set_server=utf8
log-bin=/mysql/log/mysql-bin
slave-skip-errors=1007,1050
slow_query_log=1
slow_query_log_file=/mysql/slowquery.log
long_query_time=2
relay-log=relay-bin
relay-log-index=relay-bin.index
binlog_format=mixed
log-slave-updates=on
replicate-do-db=test
slave-skip-errors=1146
[mysqld_safe]
log-error=/mysql/mysqld.log
pid-file=/mysql/mysqld.pid

III. Replication setup (master1 = db1 and master2 = db2 replicate to each other as master-master; slave1 = db3 is a slave of master1)
1. Grant privileges on master1:
grant replication slave on *.* to repl@'192.168.11.198' identified by "XXXX";
grant replication slave on *.* to repl@'192.168.11.88' identified by "XXXX";
grant replication slave on *.* to repl@'192.168.11.238' identified by "XXXX";flush privileges;

2. Grant privileges on master2:
grant replication slave on *.* to repl@'192.168.11.198' identified by "XXXX";
grant replication slave on *.* to repl@'192.168.11.88' identified by "XXXX";
grant replication slave on *.* to repl@'192.168.11.238' identified by "XXXX";flush privileges;

Run on master2 and slave1:
change master to master_host='192.168.11.198', master_port=3306, master_user='repl', master_password='XXXX';start slave;
Then make master1 a slave of master2 (run on master1):
change master to master_host='192.168.11.88', master_port=3306, master_user='repl', master_password='XXXX';start slave;

Check the replication status on each machine:
db1
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.11.88
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000024
Read_Master_Log_Pos: 124156
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 1068
Relay_Master_Log_File: mysql-bin.000024
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: test
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 124156
Relay_Log_Space: 1242
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 4
Master_UUID: a7e4c60d-62ca-11e3-8710-080027e08a30
Master_Info_File: /mysql/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)

db2
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.11.198
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000064
Read_Master_Log_Pos: 3324
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 1870
Relay_Master_Log_File: mysql-bin.000064
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: test
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3324
Relay_Log_Space: 2044
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 69a73914-62ca-11e3-870f-080027dff846
Master_Info_File: /mysql/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)

db3
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.11.198
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000064
Read_Master_Log_Pos: 3324
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 2655
Relay_Master_Log_File: mysql-bin.000064
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: test
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 3324
Relay_Log_Space: 2822
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 69a73914-62ca-11e3-870f-080027dff846
Master_Info_File: /mysql/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.01 sec)

mysql>
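
Reading three long status dumps by eye gets tedious, so here is a minimal sketch of a check that can be scripted. The function name replication_ok is made up for this example and is not part of MySQL or MMM; it only assumes the "Field: value" layout shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper: reads the text printed by `SHOW SLAVE STATUS\G` on
# stdin and succeeds only if both replication threads report "Yes".
replication_ok() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }              # drop the left padding mysql adds
        $1 == "Slave_IO_Running"  { io  = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible use on each node would be: mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy". Credentials are left out here and would have to be supplied the way your environment requires.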

Installing mysql-mmm
1. DB nodes:
yum -y install mysql-mmm-agent

2. Monitor node:
yum -y install mysql-mmm*

Configuring mysql-mmm:
1. Grant privileges on all three DB nodes:
grant super, replication client, process on *.* to 'mmm_agent'@'192.168.11.%' identified by 'XXXX';
grant replication client on *.* to 'mmm_monitor'@'192.168.11.%' identified by 'XXXX';flush privileges;

Edit the configuration files
sudo vim /etc/mysql-mmm/mmm_common.conf (edit on the DB nodes and on the monitor)

[mysql@localhost ~]$ sudo cat /etc/mysql-mmm/mmm_common.conf
[sudo] password for mysql:
active_master_role writer

<host default>
cluster_interface eth0
pid_path /var/run/mysql-mmm/mmm_agentd.pid
bin_path /usr/libexec/mysql-mmm/
replication_user repl
replication_password XXXX
agent_user mmm_agent
agent_password 123456
</host>

<host db1>
ip 192.168.11.198
mysql_port 3306
mode master
peer db2
</host>

<host db2>
ip 192.168.11.88
mysql_port 3306
mode master
peer db1
</host>

<host db3>
ip 192.168.11.238
mysql_port 3306
mode slave
peer db3
</host>

<role writer>
hosts db1, db2
ips 192.168.11.170
mode exclusive
</role>

<role reader>
hosts db2 db3
ips 192.168.11.171,192.168.11.172
mode balanced
</role>

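peer names a host's counterpart: db1 and db2 are peers, that is, the two equal masters.
ips lists the virtual IPs (VIPs) assigned to the role.
mode takes one of two values. With exclusive, only one host can hold the role at any time.
With balanced, several hosts can hold the role at once. Typically writer is exclusive and reader is balanced.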

sudo vim /etc/mysql-mmm/mmm_agent.conf (edit on master1, master2, and slave1, setting the "this" line to this db1, this db2, and this db3 respectively)
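
Since the three agent files differ only in the host name, they can be generated instead of edited by hand. This is a small sketch; agent_conf is a hypothetical helper name, and the two lines it prints are the include line and the "this" line that the packaged mmm_agent.conf normally contains.

```shell
#!/bin/bash
# Hypothetical helper: prints the contents of mmm_agent.conf for one host.
# $1 is the host name as declared in mmm_common.conf (db1, db2 or db3).
agent_conf() {
    printf 'include mmm_common.conf\nthis %s\n' "$1"
}
```

On db2, for example, the output could be written in place with: agent_conf db2 | sudo tee /etc/mysql-mmm/mmm_agent.conf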

sudo vim /etc/mysql-mmm/mmm_mon.conf (monitor node only)
[mysql@localhost bin]$ sudo cat /etc/mysql-mmm/mmm_mon.conf
include mmm_common.conf

<monitor>
ip 127.0.0.1
pid_path /var/run/mysql-mmm/mmm_mond.pid
bin_path /usr/libexec/mysql-mmm
status_path /var/lib/mysql-mmm/mmm_mond.status
ping_ips 192.168.11.198,192.168.11.88
auto_set_online 60

# The kill_host_bin does not exist by default, though the monitor will
# throw a warning about it missing. See the section 5.10 "Kill Host
# Functionality" in the PDF documentation.
#
# kill_host_bin /usr/libexec/mysql-mmm/monitor/kill_host
#
</monitor>

<host default>
monitor_user mmm_monitor
monitor_password 123456
</host>

debug 0

Starting MMM
1. DB nodes:
[mysql@localhost mysql-mmm]$ sudo /etc/init.d/mysql-mmm-agent start
[sudo] password for mysql:
Starting MMM Agent Daemon: [ OK ]

2. Monitor node:
[mysql@localhost bin]$ sudo /etc/init.d/mysql-mmm-monitor start
Starting MMM Monitor Daemon: [ OK ]

[mysql@localhost ~]$ sudo mmm_control show
db1(192.168.11.198) master/ONLINE. Roles: writer(192.168.11.170)
db2(192.168.11.88) master/ONLINE. Roles:
db3(192.168.11.238) slave/ONLINE. Roles: reader(192.168.11.171), reader(192.168.11.172)

[mysql@localhost bin]$ sudo mmm_control checks all
db2 ping [last change: 2014/05/06 17:53:36] OK
db2 mysql [last change: 2014/05/06 17:53:36] OK
db2 rep_threads [last change: 2014/05/06 17:53:36] OK
db2 rep_backlog [last change: 2014/05/06 17:53:36] OK: Backlog is null
db3 ping [last change: 2014/05/06 17:53:36] OK
db3 mysql [last change: 2014/05/06 19:04:39] OK
db3 rep_threads [last change: 2014/05/06 19:04:36] OK
db3 rep_backlog [last change: 2014/05/06 19:04:39] OK: Backlog is null
db1 ping [last change: 2014/05/06 17:53:36] OK
db1 mysql [last change: 2014/05/06 17:53:36] OK
db1 rep_threads [last change: 2014/05/06 17:53:36] OK
db1 rep_backlog [last change: 2014/05/06 17:53:36] OK: Backlog is null

Test:
Stop DB1 and check whether the writer VIP 192.168.11.170 moves to DB2, and whether DB3's replication master changes from DB1 to DB2.

DB1:
[mysql@localhost ~]$ mysqladmin -u root -pXXXXXX shutdown
Warning: Using a password on the command line interface can be insecure.
140522 16:38:47 mysqld_safe mysqld from pid file /mysql/mysqld.pid ended
[1]+ Done mysqld_safe

MONITOR:
[mysql@localhost ~]$ sudo mmm_control show
db1(192.168.23.198) master/ONLINE. Roles: writer(192.168.11.170)
db2(192.168.23.88) master/ONLINE. Roles:
db3(192.168.23.238) slave/ONLINE. Roles: reader(192.168.11.171), reader(192.168.11.172)

[mysql@localhost ~]$ sudo mmm_control show
db1(192.168.23.198) master/HARD_OFFLINE. Roles:
db2(192.168.23.88) master/ONLINE. Roles:
db3(192.168.23.238) slave/ONLINE. Roles: reader(192.168.11.171), reader(192.168.11.172)

[mysql@localhost ~]$ sudo mmm_control show
db1(192.168.23.198) master/HARD_OFFLINE. Roles:
db2(192.168.23.88) master/ONLINE. Roles: writer(192.168.11.170)
db3(192.168.23.238) slave/ONLINE. Roles: reader(192.168.11.171), reader(192.168.11.172)
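
Rather than rerunning mmm_control show and reading the output each time, the writer's current owner can be extracted with a short filter. This is only a sketch: writer_host is a made-up name, and it assumes the line layout shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper: reads `mmm_control show` output on stdin and prints
# the name of the host that currently holds the writer role (nothing if none).
writer_host() {
    awk '/Roles:.*writer\(/ { sub(/\(.*/, "", $1); print $1 }'
}
```

For example, sudo mmm_control show | writer_host prints nothing while the role is unassigned (the HARD_OFFLINE snapshot above) and the new owner's name once the VIP has moved.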

DB3:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Connecting to master
Master_Host: 192.168.11.88
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000065
Read_Master_Log_Pos: 120
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000065
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: test
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 120
Relay_Log_Space: 120
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 4
Master_UUID:
Master_Info_File: /mysql/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)

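2. The MHA approach

I. A brief introduction to MHA
MHA is written in Perl and provides high availability for MySQL master-slave replication through external scripts.
MHA automatically detects a MySQL master failure. When the master goes down, it elects a new master within 10 to 30 seconds, applies the missing binlog events to every slave, and repoints all slaves to the new master.
Besides automatic failure detection, MHA can also switch the master interactively, which is quite useful in routine database maintenance.
MHA itself only handles the master switch; applications do not learn that the master has changed. To cover this, MHA exposes several script hooks that can notify applications of the new master, either by moving a virtual IP or by updating a global configuration file.
At the time of writing, MHA was an active project with many production users, including large companies. The author released new versions frequently, and the latest release already supported GTID.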

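II. How MHA works
The MHA architecture is as follows:
        MySQL master1 (MHA manager, MHA node)
                         |
                 ________|________
                |                 |
     MySQL slave1 (node)   MySQL slave2 (node)

First, the architecture. The diagram above is crude, so please rely on the text below.
MHA only supports a two-tier replication topology like the one above. If MySQL slave1 had slaves of its own, they would form a third tier, which MHA cannot manage.
An MHA node must be installed on every MySQL server.
There is one MHA manager for the whole setup. The manager needs to reach MySQL on every node with the account from its configuration file, and to SSH non-interactively into every node's operating system, which is why SSH keys are required.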

The MHA manager node includes these programs:
masterha_manager (monitors the master and fails over automatically when it goes down)
masterha_master_switch (performs a failover or master switch manually or interactively)
masterha_master_switch --conf=/etc/app1.cnf --master_state=dead --dead_master_host=192.168.153.150
masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.153.151
masterha_check_status (checks whether masterha_manager is running)
masterha_check_repl (checks whether the replication environment is set up correctly)
masterha_stop (stops MHA)
masterha_conf_host
masterha_check_ssh (checks whether each node can be reached over SSH)
purge_relay_logs (removes relay logs that are no longer needed, without delaying replication)
masterha_secondary_check (checks over other network routes whether the master is really down)
masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152 --user=root --master_host=localhost --master_ip=192.168.153.150 --master_port=3306
Master is reachable from 192.168.153.151!

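The MHA node includes these four programs:
save_binary_logs (saves and copies the binary logs of the failed master)
apply_diff_relay_logs (identifies differential relay log events and applies them to the other slaves)
purge_relay_logs (purges relay log files)
filter_mysqlbinlog (this script is now deprecated)
The MHA node package must be installed on every MySQL server, and on the MHA manager server as well, because the manager modules depend on the node modules. The manager connects to the MySQL servers over SSH and runs the node scripts there.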

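The MHA failover flow:
#Preparation before startup
#Check the database servers' state and collect the relevant parameter settings
#Test the SSH connections
#Test that the MHA node is usable
#Create the MHA log directory
#Check permissions for applying differential logs on the slaves
#Determine the current replication topology
#Check master_ip_failover_script
#Check shutdown_script
#Set up the secondary-check hosts (masterha_secondary_check)
#MHA startup complete; enter monitoring state
#Monitoring detects that the DB1 server is down
#Confirm through the configured secondary check whether the master is really down
#Master confirmed down; enter the failover flow
#Try again to connect to the master and to SSH into it
#Check the state of the other slaves according to the MHA configuration file
#Check again whether the slave configuration has changed and still meets the failover conditions
#Failover officially starts
#Check the slave configuration once more
#Run master_ip_failover_script and shutdown_script against the old master
#Start differential log recovery: get the latest binlog position the slaves received
#Fetch the old master's binlog
#Determine the new master
#Apply the differential binlog events on the new master
#Get the new master's binlog position
#If master_ip_failover_script is set, assign the VIP to the new master
#Recover the other slaves, again by comparing against the old master's binlog
#After the differential logs are applied, repoint all slaves to the new master
#Failover complete; generate the failover report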

III. Installation and configuration
Environment:
Host  Role    IP               Installed software
db1   MASTER  192.168.153.150  mysql, mha manager, mha node, keepalived
db2   SLAVE1  192.168.153.151  mysql, mha node, keepalived (candidate MASTER)
db3   SLAVE2  192.168.153.152  mysql, mha node
VIP (virtual IP): 192.168.153.100

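Outline of the installation:
1. Disable SELinux and iptables
2. Install the development and base libraries, the related build tools, and the Perl libraries
3. Set up SSH public keys for passwordless login
4. Install and configure MySQL, and grant privileges
5. Install the MHA node
6. Install the MHA manager
7. Edit the MHA configuration file
8. Test MHA switching
9. Install, configure, and test keepalived
10. Combine MHA and keepalived, add the related scripts, and debug them together.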

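1. Edit /etc/sysconfig/selinux
and set SELINUX=disabled
2. Flush the iptables rules:
#iptables -F INPUT
#service iptables save
#iptables -xvnL //confirm that no rules remain. If you really need iptables rules, add them after MHA is installed and working, then retest to see whether they affect MHA.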

3. Set up SSH keys for passwordless login.
db1:
#ssh-keygen -t rsa //press Enter at every prompt
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.150
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.151
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.152

db2:
#ssh-keygen -t rsa //press Enter at every prompt
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.150
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.152

db3:
#ssh-keygen -t rsa //press Enter at every prompt
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.150
#ssh-copy-id -i .ssh/id_rsa.pub root@192.168.153.151

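4. Install MySQL and grant privileges
Install MySQL server on all machines, edit the configuration files, and set up master-slave replication across the three machines. That takes a fair amount of description, so please refer to my notes on installing mysql-mmm:
"mysql-mmm安装手册" (mysql-mmm installation manual) http://isadba.com/?p=142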

5. Install the MHA node on every machine
Download the latest RPM or source package from https://code.google.com/p/mysql-master-ha/. I used the RPM; if it reports missing dependencies, install them with yum.
#rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm

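6. Install the MHA manager on db1.
I first tried the RPM here too but hit two compatibility problems: my yum repository did not have the required packages, and after I installed them from CPAN the RPM did not recognize them. So I built from source instead. Overall, the MHA packages are fairly easy to install.
#tar -zxvf mha4mysql-manager-0.55.tar.gz
#cd mha4mysql-manager-0.55
#ls
#perl Makefile.PL
#make install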

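7. Edit the configuration file. It only needs to exist on the MHA manager machine.
Templates ship with the source package, at the following location.
/root/soft/mha4mysql-manager-0.55/samples/conf contains two files, app1.cnf and masterha_default.cnf. masterha_manager reads both.
app1.cnf mainly holds the per-node settings, and masterha_default.cnf mainly holds the global (server-side) settings. The usual practice, though, is to skip masterha_default.cnf and put its settings into app1.cnf.
My app1.cnf is as follows: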

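# cat /etc/app1.cnf
[server default]
# MySQL account MHA uses to read configuration and status from the databases
user=mha
password=mha
# OS user for SSH key login
ssh_user=root
# account and password used for MySQL replication
repl_user=slave
repl_password=slave
# directory for MHA state, logs, and saved differential logs
manager_workdir=/var/log/masterha/app1
# MHA log file
manager_log=/var/log/masterha/app1/manager.log
# working directory on the node hosts
remote_workdir=/var/log/masterha/app1
# Secondary check: the manager connects to 192.168.153.151 and 192.168.153.152
# and tests from there whether the master is reachable, to avoid split-brain.
secondary_check_script="masterha_secondary_check -s 192.168.153.151 -s 192.168.153.152"
# The script hooks below are covered later; they stay disabled for now.
# failover script that controls the VIP
#master_ip_failover_script="/opt/master_ip_failover.sh"
# script called during an interactively triggered online switch
#master_ip_online_change_script=""
# shutdown script
#shutdown_script="/opt/master_ip_failover.sh"
# notification script
#report_script=""
# Per-node settings follow
[server1]
hostname=192.168.153.150
candidate_master=1
[server2]
hostname=192.168.153.151
candidate_master=1
[server3]
hostname=192.168.153.152
no_master=1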

8. Test MHA
Start with two small checks:
#Check that the SSH keys work

# masterha_check_ssh --conf=/etc/app1.cnf
Sun Sep 28 14:39:57 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 14:39:57 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 14:39:57 2014 - [info] Reading server configurations from /etc/app1.cnf..
Sun Sep 28 14:39:57 2014 - [info] Starting SSH connection tests..
Sun Sep 28 14:40:02 2014 - [debug]
Sun Sep 28 14:39:58 2014 - [debug]  Connecting via SSH from root@192.168.153.150(192.168.153.150:22) to root@192.168.153.151(192.168.153.151:22)..
Sun Sep 28 14:40:01 2014 - [debug]   ok.
Sun Sep 28 14:40:01 2014 - [debug]  Connecting via SSH from root@192.168.153.150(192.168.153.150:22) to root@192.168.153.152(192.168.153.152:22)..
Sun Sep 28 14:40:02 2014 - [debug]   ok.
Sun Sep 28 14:40:02 2014 - [debug]
Sun Sep 28 14:39:58 2014 - [debug]  Connecting via SSH from root@192.168.153.151(192.168.153.151:22) to root@192.168.153.150(192.168.153.150:22)..
Sun Sep 28 14:40:01 2014 - [debug]   ok.
Sun Sep 28 14:40:01 2014 - [debug]  Connecting via SSH from root@192.168.153.151(192.168.153.151:22) to root@192.168.153.152(192.168.153.152:22)..
Sun Sep 28 14:40:02 2014 - [debug]   ok.
Sun Sep 28 14:40:03 2014 - [debug]
Sun Sep 28 14:39:59 2014 - [debug]  Connecting via SSH from root@192.168.153.152(192.168.153.152:22) to root@192.168.153.150(192.168.153.150:22)..
Sun Sep 28 14:40:02 2014 - [debug]   ok.
Sun Sep 28 14:40:02 2014 - [debug]  Connecting via SSH from root@192.168.153.152(192.168.153.152:22) to root@192.168.153.151(192.168.153.151:22)..
Sun Sep 28 14:40:03 2014 - [debug]   ok.
Sun Sep 28 14:40:03 2014 - [info] All SSH connection tests passed successfully.

#Check the replication environment

# masterha_check_repl --conf=/etc/app1.cnf
Sun Sep 28 14:40:43 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 14:40:43 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 14:40:43 2014 - [info] Reading server configurations from /etc/app1.cnf..
Sun Sep 28 14:40:43 2014 - [info] MHA::MasterMonitor version 0.55.
Sun Sep 28 14:40:53 2014 - [info] Checking replication health on 192.168.153.151..
(several lines omitted)
Sun Sep 28 14:40:53 2014 - [info]  ok.
Sun Sep 28 14:40:53 2014 - [info] Checking replication health on 192.168.153.152..
Sun Sep 28 14:40:53 2014 - [info]  ok.
Sun Sep 28 14:40:53 2014 - [warning] master_ip_failover_script is not defined.
Sun Sep 28 14:40:53 2014 - [info] Checking shutdown script status:
Sun Sep 28 14:40:53 2014 - [info]   /opt/master_ip_failover.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150
Sun Sep 28 14:40:53 2014 - [info]  OK.
Sun Sep 28 14:40:53 2014 - [info] Got exit code 0 (Not master dead).

If both checks pass, the environment and configuration are basically fine, and we can start MHA:

# masterha_manager  --conf=/etc/app1.cnf
Sun Sep 28 14:42:43 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Sep 28 14:42:43 2014 - [info] Reading application default configurations from /etc/app1.cnf..
Sun Sep 28 14:42:43 2014 - [info] Reading server configurations from /etc/app1.cnf..

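The masterha_manager process exits on its own once a switch has been triggered, so I recommend running it inside screen.
Now look at the logs generated under /var/log/masterha/app1/ (# cd /var/log/masterha/app1/).
If there are no serious errors, you are ready to try a failover.

#Start the failover
Stop the MySQL service on the master and watch how replication changes on db2 and db3.
# service mysql stop
Shutting down MySQL (Percona Server)….. SUCCESS!
Check /var/log/masterha/app1/manager.log. If you see the following, the failover succeeded.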

----- Failover Report -----
app1: MySQL Master failover 192.168.153.150 to 192.168.153.151 succeeded
Master 192.168.153.150 is down!
Check MHA Manager logs at localhost.localdomain:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.153.151(192.168.153.151:3306) has all relay logs for recovery.
Selected 192.168.153.151 as a new master.
192.168.153.151: OK: Applying all logs succeeded.
192.168.153.152: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.153.152: OK: Applying all logs succeeded. Slave started, replicating from 192.168.153.151.
192.168.153.151: Resetting slave info succeeded.
Master failover to 192.168.153.151(192.168.153.151:3306) completed successfully.
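
If the result of a failover needs to be picked up by another script, the new master can be read from the report's closing message. This is a sketch only: new_master_from_report is a made-up name, and it assumes the log holds one message per line, as manager.log normally does, with the "Master failover to ... completed successfully." wording quoted above.

```shell
#!/bin/bash
# Hypothetical helper: reads manager.log (or just the failover report) on
# stdin and prints the new master's host if the failover completed.
new_master_from_report() {
    sed -n 's/^Master failover to \([^(]*\)(.*) completed successfully\.$/\1/p'
}
```

For example: new_master_from_report < /var/log/masterha/app1/manager.log. Empty output means no completed failover was found in the log.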

Next, check replication on db2 and db3:
db2:

(root:hostname)[(none)]> show slave status\G
Empty set (0.00 sec)

db3:

(root:hostname)[(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.153.151
Master_User: slave
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql-bin.000016
Read_Master_Log_Pos: 688
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 283
Relay_Master_Log_File: mysql-bin.000016
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 688
Relay_Log_Space: 452
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 151
Master_UUID: dd079e18-4244-11e4-b851-000c29da163e
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
1 row in set (0.00 sec)

9. Install and test keepalived on db1 and db2.
For background on keepalived, see "LVS+Keepalived使用总结" (a summary of using LVS with Keepalived) http://isadba.com/?p=67
or search for "keepalived权威指南" (the keepalived guide).
Download the keepalived package:

Get the latest tar.gz package from http://www.keepalived.org/download.html.
#yum install kernel-devel
#tar -zxvf keepalived-1.2.13.tar.gz
#cd keepalived-1.2.13
#./configure --prefix=/ --with-kernel-dir=/usr/src/kernels/2.6.32-431.29.2.el6.x86_64/
# make && make install

After installation, edit the configuration file. Below is the file on db1; on db2, lower the priority by 50 (to 100).

# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0           # network interface keepalived uses
    virtual_router_id 51
    priority 150             # the higher priority gets the virtual IP first
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.153.100      # virtual IP
    }
}

Test that keepalived works.

db1#service keepalived restart
db2#service keepalived restart
db1#ip add
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:da:16:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.153.150/24 brd 192.168.153.255 scope global eth0
    inet 192.168.153.100/32 scope global eth0
    inet6 fe80::20c:29ff:feda:163d/64 scope link
       valid_lft forever preferred_lft forever

Now kill the keepalived process on db1:

db1# killall keepalived
db1#ip add
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:da:16:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.153.150/24 brd 192.168.153.255 scope global eth0
    inet6 fe80::20c:29ff:feda:163d/64 scope link
       valid_lft forever preferred_lft forever
db2# ip add
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:da:16:3e brd ff:ff:ff:ff:ff:ff
    inet 192.168.153.151/24 brd 192.168.153.255 scope global eth0
    inet 192.168.153.100/32 scope global eth0
    inet6 fe80::20c:29ff:feda:163e/64 scope link
       valid_lft forever preferred_lft forever

As you can see, the virtual IP moved to db2 almost instantly. Debug messages can be found in /var/log/messages.
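
The same check can be scripted instead of read off the ip output. A minimal sketch, where has_vip is a made-up helper name and the only assumption is the "inet ADDRESS/PREFIX" layout that ip prints:

```shell
#!/bin/bash
# Hypothetical helper: reads `ip addr` output on stdin and succeeds if the
# address given as $1 is configured on the host.
has_vip() {
    # -F matches the string literally, so the dots in the address are not
    # treated as regex wildcards; the trailing "/" stops prefix matches.
    grep -qF "inet $1/"
}
```

For example, on db2: ip addr show eth0 | has_vip 192.168.153.100 && echo "VIP is on this host"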

The point of using keepalived here is that when MHA detects that the master is down, it calls shutdown_script to kill the keepalived process, so that the virtual IP moves to the new master.

10. Debugging MHA together with keepalived.
Before debugging, we need to be clear about the scripts we commented out above: what each one does and when it is called.
#master_ip_failover_script:
It is first called at startup:
/opt/master_ip_failover_script.sh --command=status --ssh_user=root --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306
It then runs again in the second phase of the actual failover, the Dead Master Shutdown Phase:
/opt/master_ip_failover_script.sh --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --command=stopssh --ssh_user=root
It runs once more at step 3.4 of the failover (after the new master has been elected and the differential binlog events applied):
/opt/master_ip_failover_script.sh --command=start --ssh_user=root --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --new_master_host=192.168.153.151 --new_master_ip=192.168.153.151 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'

#master_ip_online_change_script:
It is called when you switch the master deliberately with masterha_master_switch --conf=/etc/app1.cnf --master_state=alive --new_master_host=192.168.153.151.
It runs in the second phase of the online switch, when writes to the old master are being rejected:
/opt/master_ip_online_change_script.sh --command=stop --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mha' --new_master_host=192.168.153.151 --new_master_ip=192.168.153.151 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'
It is then run for the new master:
/opt/master_ip_online_change_script.sh --command=start --orig_master_host=192.168.153.150 --orig_master_ip=192.168.153.150 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mha' --new_master_host=192.168.153.151 --new_master_ip=192.168.153.151 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'

#shutdown_script:
It is first run at startup, right after the first run of master_ip_failover_script:
/opt/shutdown_script.sh --command=status --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150
Its second run comes right after the second run of master_ip_failover_script:
/opt/shutdown_script.sh --command=stopssh --ssh_user=root --host=192.168.153.150 --ip=192.168.153.150 --port=3306

#report_script="" //notification script
This script is called once at the very end of an automatic switch by masterha_manager.
report_script.sh --orig_master_host=(dead master's hostname) --new_master_host=(new master's hostname) --new_slave_hosts=(new slaves' hostnames, delimited by commas) --subject=(mail subject) --body=(body)

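The samples/scripts/ directory of the mha4mysql-manager source package contains several example scripts. They are written in Perl, which I do not know well. If you are in the same position, you can reimplement them in shell or Python based on the call arguments shown above.
Two points to keep in mind when reimplementing these scripts:
1. Accept the same arguments MHA passes wherever possible, so the script is easier to use
2. The script must return 0 or 10. Otherwise MHA treats it as a script error, skips the remaining steps, and aborts the failover.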
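
As a starting point for such a reimplementation, here is a minimal shell skeleton that only parses the arguments and dispatches on --command. It is a sketch, not MHA's own script: parse_args and handle_command are names invented for this example, the echo lines stand in for the real VIP handling, and only the argument names and the return-code rule come from the text above.

```shell
#!/bin/bash
# Sketch of an MHA hook script (hypothetical function names).

# Collect the --key=value arguments MHA passes into shell variables.
parse_args() {
    COMMAND="" ORIG_MASTER_IP="" NEW_MASTER_IP=""
    local arg
    for arg in "$@"; do
        case "$arg" in
            --command=*)        COMMAND=${arg#*=} ;;
            --orig_master_ip=*) ORIG_MASTER_IP=${arg#*=} ;;
            --new_master_ip=*)  NEW_MASTER_IP=${arg#*=} ;;
        esac
    done
}

# Dispatch on the command. The echo lines are placeholders for real work.
handle_command() {
    case "$COMMAND" in
        status)       echo "status check for $ORIG_MASTER_IP" ;;
        stop|stopssh) echo "would remove the VIP from $ORIG_MASTER_IP" ;;
        start)        echo "would bring the VIP up on $NEW_MASTER_IP" ;;
        *)            echo "unknown command: $COMMAND" >&2; return 1 ;;
    esac
    return 0    # MHA continues only if the script returns 0 (or 10)
}

# In a real script file, finish with:
#   parse_args "$@"; handle_command; exit $?
```

Keeping the parsing separate from the actions makes it possible to run the script by hand with the exact command lines listed earlier and see what it would do before MHA ever calls it.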

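We now need to write our own shutdown_script. It checks whether MySQL on the master is really down and, if so, kills the keepalived process on the master to trigger the VIP move.
Uncomment shutdown_script in app1.cnf and point it at that script. My shutdown_script.sh is published at the end of this article. The simplest possible shutdown_script only has to do two things: check whether MySQL is down and, if it is, run killall keepalived.

Now let's debug the two together.
Check replication on the three MySQL servers
Start keepalived on the master and the standby master, and check that the virtual IP is on the master.
Start mha_manager
Stop MySQL on the master
Check replication on the slaves and whether the VIP has moved.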

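TIPS: two data-safety settings worth tuning
1. Set read_only=on on all slaves
With this set, you need master_ip_failover_script and master_ip_online_change_script to set read_only=off when the new master is initialized. The main purpose is to guard against the case where the master's OS crashes and keepalived moves the VIP to the new master before MHA has finished switching.
2. Set relay_log_purge=0 on all slaves
With this set, relay logs that have been fully applied are no longer purged automatically. The main purpose is to make sure that in failover phases 3.3 and 4.1, a slave's already-applied relay logs still exist when the diff logs need them. The side effect is
that relay logs keep growing, and cleaning them up may block replication. So the MHA node provides the purge_relay_logs script, which cleans up relay logs without blocking.
Add a scheduled job on each slave:
[app@slave_host1]$ cat /etc/cron.d/purge_relay_logs
# purge relay logs at 5am
0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1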

Below is my shutdown_script. It mainly uses the stopssh method; the stop method is defined but never called, so adapt it yourself if you need it.
shutdown_script.sh:

[root@localhost opt]# cat shutdown_script.sh
#!/bin/bash
#       masterha shutdown_script.
#       version:        2013-11-06       first version
#
#                               by andy.feng
#                               copy right
LANG=C

# Parse the --ip=, --command= and --port= arguments passed by MHA.
for arg in "$@"
do
        case "$arg" in
                --ip=*)      IP=${arg#*=} ;;
                --command=*) CMD=${arg#*=} ;;
                --port=*)    MYSQL_PORT=${arg#*=} ;;
        esac
done

MYSQL_USER="mha"
MYSQL_PASSWORD="mha"

# Succeeds if MySQL on $IP still answers a trivial query.
function mysql_alive {
        mysql -s -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -h"$IP" -P"$MYSQL_PORT" -e 'select count(*) as c from mysql.user;' &> /dev/null
}

# If MySQL is down, kill keepalived on the old master so the VIP moves.
function stopssh {
        if ! mysql_alive
        then
                if ! ssh "$IP" 'killall keepalived'
                then
                        echo "$IP killall keepalived fail....."
                        return 1
                fi
        fi
        return 0
}

# If MySQL is down, power off the old master. Not called by default.
function stop {
        if ! mysql_alive
        then
                if ! ssh "$IP" 'shutdown -h now'
                then
                        echo "$IP shutdown  fail....."
                        return 1
                fi
        fi
        return 0
}

if [ "$CMD" = 'stopssh' ]
then
        stopssh
fi

3. Percona XtraDB Cluster plus Redis Sentinel

Still under investigation.

0 0