Automatic failover with dual-master MySQL + heartbeat


author:skate
time:2012/07/12

 



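heartbeat mainly handles host-level failover; it does not fail over when only a service fails. To fail over on a service failure, you have to write your own script that checks the state of the service and, if the service is unhealthy, calls heartbeat's switchover script to perform the failover.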

Environment:
os:rht5.2
mysql:percona5.5
heartbeat:2.1.3


1. First, install MySQL on both machines
Reference: http://blog.csdn.net/wyzxg/article/details/7695663

Add the following to my.cnf on master1.skate.com:

server-id       = 1
log_bin=/data/mysql/binlog/master_binlog.log
binlog-do-db=skate
log_slave_updates=1
auto_increment_increment=2
auto_increment_offset=1
binlog_format=mixed
expire_logs_days=7


Add the following to my.cnf on master2.skate.com:

server-id       = 2
log_bin=/data/mysql/binlog/master_binlog.log
binlog-do-db=skate
log_slave_updates=1
auto_increment_increment=2
auto_increment_offset=2
binlog_format=mixed
expire_logs_days=7
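
The point of auto_increment_increment and auto_increment_offset is that the two masters never generate the same AUTO_INCREMENT value. A minimal sketch, assuming an empty table, an increment of 2, and offsets 1 and 2 on the two masters (the function name auto_inc_sequence is made up for illustration):

```shell
#!/bin/bash
# Print the first COUNT values a server would hand out for an AUTO_INCREMENT
# column on an empty table, given its offset and increment settings.
auto_inc_sequence() {
    local offset=$1 increment=$2 count=$3
    local i=0 out=""
    while [ "$i" -lt "$count" ]; do
        out="$out$((offset + i * increment)) "
        i=$((i + 1))
    done
    echo "${out% }"   # drop the trailing space
}

auto_inc_sequence 1 2 5   # master with offset 1: 1 3 5 7 9
auto_inc_sequence 2 2 5   # master with offset 2: 2 4 6 8 10
```

If both masters used the same offset, the two sequences would be identical and concurrent inserts on both sides could collide on the same key.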

Create the replication account:
mysql>GRANT REPLICATION SLAVE ON *.* TO 'rep'@'%' IDENTIFIED BY 'rep';


2. Then create the slave online
Reference: http://www.percona.com/doc/percona-xtrabackup/howtos/setting_up_replication.html

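3. Once the slave has been created, check that it is working.
To keep the data consistent, first run the following on the slave:
mysql> FLUSH TABLES WITH READ LOCK; // this blocks replication on the slave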
mysql> show master status;

4.
Because the whole slave is now locked and will not be updated, run the following on the master:
mysql> stop slave;
mysql> CHANGE MASTER TO
    ->     MASTER_HOST='master2.skate.com',
    ->     MASTER_USER='rep',
    ->     MASTER_PASSWORD='rep',
    ->     MASTER_PORT=3306,
    ->     MASTER_LOG_FILE='master-bin.001',  // File value from show master status on the slave
    ->     MASTER_LOG_POS=4;                  // Position value from show master status on the slave
 
mysql> start slave;
mysql> show slave status\G

Finally, unlock the tables on the slave:
mysql> unlock tables;
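
The File and Position values printed by show master status in step 3 are exactly what goes into MASTER_LOG_FILE and MASTER_LOG_POS in step 4. As a rough sketch of that hand-off, the hypothetical helper below (build_change_master is not part of MySQL) reads one line of `mysql -N -e 'show master status'` output on stdin and prints the matching statement. It assumes port 3306 and whitespace-separated columns.

```shell
#!/bin/bash
# Hypothetical helper: read "File Position ..." from stdin (one line of
# `mysql -N -e 'show master status'`) and print a CHANGE MASTER TO statement.
build_change_master() {
    local host=$1 user=$2 password=$3
    local file pos rest
    # Only the first two columns are needed; the rest (Binlog_Do_DB, ...) is ignored.
    read -r file pos rest
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_PORT=3306, MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$host" "$user" "$password" "$file" "$pos"
}

# Example (run on the locked node, then execute the printed statement on the other one):
#   mysql -N -e 'show master status' | build_change_master master2.skate.com rep rep
```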

 

At this point the dual-master setup is complete. Next, configure heartbeat for automatic failover.


Environment details
vip:     192.168.211.163
master1: eth0/192.168.211.127   public
         eth1/172.16.0.11       heartbeat
master2: eth0/192.168.211.199   public
         eth1/172.16.0.12       heartbeat
  
The following files need to be checked and modified:
/etc/hosts
/etc/host.conf
/etc/resolv.conf
/etc/sysconfig/network
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts/ifcfg-eth1

hosts on master1
[root@master1 mysql]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 master1.skate.com master1 localhost.localdomain localhost
192.168.211.127 master1.skate.com
192.168.211.199 master2.skate.com
::1     localhost6.localdomain6 localhost6

hosts on master2
[root@master2 ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 master2.skate.com master2 localhost.localdomain localhost
192.168.211.127 master1.skate.com
192.168.211.199 master2.skate.com
::1     localhost6.localdomain6 localhost6

host.conf on master1 and master2
# more /etc/host.conf
order hosts,bind

resolv.conf on master1 and master2
# more /etc/resolv.conf
nameserver 202.106.0.20
nameserver 202.106.196.115
search localhost

network on master1
[root@master1 mysql]# more /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master1.skate.com
GATEWAY="192.168.211.1"
GATEWAYDEV="eth0" // NIC used for the gateway
ONBOOT=YES
FORWARD_IPV4="yes"


network on master2
[root@master2 ~]# more /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master2.skate.com
GATEWAY="192.168.211.1"
GATEWAYDEV="eth0"  // NIC used for the gateway
ONBOOT=YES
FORWARD_IPV4="yes"

ifcfg-eth0 and ifcfg-eth1 on master1
[root@master1 mysql]# more /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
HWADDR=08:00:27:3e:3e:09
NETMASK=255.255.255.0
IPADDR=192.168.211.127
GATEWAY=192.168.211.1
TYPE=Ethernet

[root@master1 mysql]# more /etc/sysconfig/network-scripts/ifcfg-eth1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
HWADDR=08:00:27:9a:96:11
TYPE=Ethernet
NETMASK=255.255.255.0
IPADDR=172.16.0.11
USERCTL=no
IPV6INIT=no
PEERDNS=yes

ifcfg-eth0 and ifcfg-eth1 on master2
[root@master2 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=none
HWADDR=08:00:27:a8:84:fc
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
NETMASK=255.255.255.0
IPADDR=192.168.211.199
GATEWAY=192.168.211.1

[root@master2 ~]# more /etc/sysconfig/network-scripts/ifcfg-eth1
# Intel Corporation 82540EM Gigabit Ethernet Controller
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
HWADDR=08:00:27:38:98:26
NETMASK=255.255.255.0
IPADDR=172.16.0.12
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes

 

Check each host's own hostname, then ping the other machine by name:

[root@master2 ~]# uname -n
master2.skate.com
[root@master2 ~]# ping master1.skate.com
PING master1.skate.com (192.168.211.127) 56(84) bytes of data.
64 bytes from master1.skate.com (192.168.211.127): icmp_seq=1 ttl=64 time=0.551 ms

 

The hosts are now configured. Next, install heartbeat and its dependencies.

1.
Install the packages on both master1 and master2:
# yum install heartbeat
# yum install ipvsadm
# yum install libnet


heartbeat has three configuration files:
  - ha.cf
  - authkeys
  - haresources

[root@master1 mysql]# cd  /usr/share/doc/heartbeat-2.1.3/
[root@master1 heartbeat-2.1.3]# cp ha.cf /etc/ha.d/
[root@master1 heartbeat-2.1.3]# cp haresources /etc/ha.d/
[root@master1 heartbeat-2.1.3]# cp authkeys /etc/ha.d/

2.
First configure ha.cf (identical on both nodes)
[root@master1 ha.d]# pwd
/etc/ha.d
[root@master1 ha.d]# more ha.cf
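logfile /var/log/ha-log   # where heartbeat writes its log file; create the directory by hand if it does not exist
logfacility local0       # syslog facility used for heartbeat's log messages
keepalive 2             # heartbeat interval: one heartbeat every 2 seconds
warntime 4              # seconds without a heartbeat before a warning is logged
deadtime 20             # seconds without a heartbeat before the peer is considered dead
initdead 60            # grace period after a reboot (for example while the network is still coming up: heartbeats cannot get through yet, but no failover should happen)
# Send heartbeats as UDP broadcast on eth1
#bcast eth1
# Send heartbeats as UDP unicast on eth1; the IP must be the peer's address. Unicast is recommended. If several clusters like this share one network segment, unicast is required, otherwise each cluster sees the other clusters' nodes and reports errors.
#ucast eth1 172.16.0.12
## Use UDP port 694 for heartbeats
udpport 694
auto_failback off    # whether to switch back automatically once the failed node recovers; usually off
## HOSTNAME of node 1; must match the output of uname -n
node master1.skate.com
## HOSTNAME of node 2
node master2.skate.com
## Ping the gateway to check whether the heartbeat is healthy
ping 172.16.0.12
hopfudge 1
# time after which a ping node is considered dead
deadping 10
# processes started and stopped together with heartbeat
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
# whether to use v2-style mode; must be turned on with three or more nodes
#crm on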
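
Since every node name in ha.cf has to match what uname -n prints on that machine, a quick check before starting heartbeat can save some debugging. The following is only a sketch; ha_cf_has_node is a made-up helper name, and it assumes the file lives at /etc/ha.d/ha.cf as above.

```shell
#!/bin/bash
# Return 0 if the given ha.cf lists the given hostname in a "node" directive.
ha_cf_has_node() {
    local ha_cf=$1 name=$2
    # A "node" line may list several names separated by whitespace;
    # commented-out lines start with "#" and therefore never match.
    awk -v want="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    ' "$ha_cf"
}

# Example, to be run on each node:
#   ha_cf_has_node /etc/ha.d/ha.cf "$(uname -n)" || echo "this host is missing from ha.cf"
```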

3.
Edit the file that authenticates the two nodes to each other: authkeys
[root@master1 ha.d]# more authkeys
auth 1
1 crc
[root@master1 ha.d]# chmod 600 authkeys    // authkeys must have mode 600
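
The crc method only detects corrupted packets and does not authenticate the peer, so it is really only suitable for a trusted, dedicated heartbeat link. If the link is shared, authkeys also accepts sha1 with a shared secret. A minimal sketch of generating such a file, where write_authkeys is a hypothetical helper and the key length is an arbitrary choice:

```shell
#!/bin/bash
# Write an authkeys file that uses sha1 with a random shared secret.
write_authkeys() {
    local target=$1 key
    # 16 random bytes rendered as 32 hex characters
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # The subshell keeps the restrictive umask local to this write
    ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Example: write_authkeys /etc/ha.d/authkeys
```

The same file, with the same key, has to be copied to the other node; generating it separately on each node would give the two nodes different secrets.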

4.
Edit the cluster resource file: haresources (what the standby node has to do on failover)

[root@master1 ha.d]# more haresources
#node-name resource1 resource2 ... resourceN
master1.skate.com 192.168.211.163 test

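# Here 192.168.211.163 is the VIP. The test script is required: without it heartbeat cannot run. The script performs whatever actions the switchover needs. Because this is a dual-master setup, only the VIP has to move on failover.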

[root@master1 ha.d]# more /etc/init.d/test
#!/bin/bash

echo "" $ > /dev/null

In a master/slave setup, the test script would have to turn the slave into the master on start and turn the master back into a slave on stop.
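
For the master/slave case, the overall shape of such a resource script could look like the sketch below. heartbeat calls the script with start, stop or status; the two functions promote_to_master and demote_to_slave are placeholders invented for this example and only print what a real implementation would have to do. The status branch is also a placeholder and would need to report the actual state.

```shell
#!/bin/bash
# Skeleton of a haresources-style resource script for a master/slave pair.
# The promote/demote bodies are placeholders, not working MySQL logic.
promote_to_master() { echo "promote: stop replication and start accepting writes"; }
demote_to_slave()   { echo "demote: stop accepting writes and resume replication"; }

resource_action() {
    case "$1" in
        start)  promote_to_master ;;   # this node takes over the resources
        stop)   demote_to_slave ;;     # this node gives the resources up
        status) echo "running" ;;      # placeholder: report the real state here
        *)      echo "usage: $0 {start|stop|status}" >&2; return 1 ;;
    esac
}

# In the real script the last line would be:
#   resource_action "$1"
```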

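The heartbeat + dual-master mode is still the recommended one, because it keeps data loss to a minimum. Use the InnoDB storage engine and set innodb_flush_log_at_trx_commit = 1, so that almost every committed transaction is recorded in ib_logfile* and can be recovered on the secondary node, which reduces the loss.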

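5.
Testing
A. Unplug the heartbeat network cable to simulate a network failure
B. Shut down the host to simulate a host going down
C. Cut power to the host to simulate a failure
D. Switch over manually: call the script "/usr/lib/heartbeat/hb_standby" so that heartbeat tells the peer node that this node asks to become the standby. The VIP then moves to the peer node.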
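
After each of these tests it helps to confirm which node actually holds the VIP. A small sketch, assuming the iproute2 `ip` command is available and using a made-up helper name has_vip, that scans the address list for the VIP:

```shell
#!/bin/bash
# Return 0 if the address list on stdin (output of `ip -o -4 addr show`)
# contains the given VIP.
has_vip() {
    # -F: treat the dots in the address literally; the trailing "/" avoids
    # matching a longer address that merely starts with the same digits.
    grep -qF "inet $1/"
}

# Example, to be run on both nodes after a test:
#   ip -o -4 addr show | has_vip 192.168.211.163 && echo "VIP is on $(uname -n)"
```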

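If the MySQL service has a problem while the host itself is healthy, the setup so far cannot fail over. We can write our own script for that: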

cat /usr/local/mysql/bin/moniter.sh
#!/bin/bash
# Check the local MySQL service: give up the VIP if it is down, and report broken replication.
mysql_path=/usr/local/mysql/bin/
user="root"
password="skate"
logfile=/var/log/moniter.log
date=$(date +%y-%m-%d--%H:%M:%S)
sleeptime=30
# First non-loopback IPv4 address, used in the alert message
ip=$(/sbin/ifconfig | grep "inet addr" | grep -v "127.0.0.1" | awk '{print $2;}' | awk -F':' '{print $2;}' | head -1)
# State of the two replication threads as reported by show slave status
Slave_IO_Running=$(mysql -u"$user" -p"$password" -e 'show slave status\G' | grep "Slave_IO_Running" | awk '{print $2}')
Slave_SQL_Running=$(mysql -u"$user" -p"$password" -e 'show slave status\G' | grep "Slave_SQL_Running" | awk '{print $2}')

# If MySQL cannot be reached, hand the resources over to the peer node
mysql -u"$user" -p"$password" -e "use test;"
if [[ $? != 0 ]]
then
  /usr/share/heartbeat/hb_standby
  echo "release vip, and become standby!!"
  # alerting script goes here
else
  echo "mysql is ok"
fi

if [ "$Slave_IO_Running" = "Yes" -a "$Slave_SQL_Running" = "Yes" ]
then
  echo "Slave is running!" >/dev/null
else
  echo "${ip}_replicate error please fix it "
  # alerting script goes here
fi

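With this, the script detects a MySQL service failure and switches over, and it can also raise an alert (using whatever alerting method you have).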
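
One detail of the replication check is worth tightening: a plain grep for "Slave_SQL_Running" matches any line containing that text, and later MySQL versions also print a Slave_SQL_Running_State field. The sketch below compares the field names exactly instead. slave_threads_ok is a made-up name, and it expects the output of show slave status\G on stdin.

```shell
#!/bin/bash
# Return 0 only if both replication threads report "Yes".
# stdin: output of  mysql -e 'show slave status\G'
slave_threads_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example:
#   mysql -uroot -p"$password" -e 'show slave status\G' | slave_threads_ok \
#       || echo "replication error, please fix it"
```

Empty input (for example when the server is not configured as a slave or cannot be reached) also counts as a failure, because neither field is ever set to "Yes".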


6. Maintenance
Starting and stopping heartbeat:
# /etc/init.d/heartbeat start  or  service heartbeat start
# /etc/init.d/heartbeat stop   or  service heartbeat stop


 

 

 

------end-------

 

 

 
