Mysql 主主复制失败

来源:互联网 发布:销售系统源码 编辑:程序博客网 时间:2024/05/21 17:22

Mysql 主主复制失败


  • Mysql 主主复制失败
    • 故障描述
    • 架构信息
    • 节点信息
    • 故障分析
    • 同步AIPPRD2的从环境
    • 同步AIPPRD1的从环境


故障描述

原因描述 因为机柜PDU老化, 导致整个机柜掉电. 故障时间 20160923-10:09 发现时间 20160929-13:56

架构信息

Tomcat Memcache Keepalive Mysql主主复制

节点信息

序号 节点名称 IP地址 报错信息 1 aipprd1 10.66.1.52 Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000084’ at 91941417, the last event read from ‘/aip/mysql/data/log/mysql-bin.000084’ at 91941783, the last byte read from ‘/aip/mysql/data/log/mysql-bin.000084’ at 91942912.’ 2 aipprd2 10.66.1.51 Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000082’ at 6369026, the last event read from ‘/aip/mysql/data/log/mysql-bin.000082’ at 6369026, the last byte read from ‘/aip/mysql/data/log/mysql-bin.000082’ at 6369280.’

故障分析

  • 由于Zabbix的Mysql监控脚本的缘故, 没有触发事件, 所以直至20160929检查Zabbix日志的时候, 才发现该故障, 当时KeepaliveVIPaipprd1上, 这个节点上, 数据库是对外服务的;
  • 那么首先以aipprd1为主, 先将aipprd2的从环境同步起来;
  • aipprd2从环境同步完成后, 再将aipprd1的从环境同步起来.

同步AIPPRD2的从环境

  • 检查aipprd2的Slave状态
    根据代码区的描述, Last_Errno的错误代码为1062, 需要手动修改Position.

    [root@aipprd2 ~]# mysql -uzabbixmoniter -ppassw0rd -hlocalhost -e "show slave status\G;"*************************** 1. row ***************************           Slave_IO_State:               Master_Host: 10.66.1.52              Master_User: root              Master_Port: 3388            Connect_Retry: 60          Master_Log_File: mysql-bin.000082      Read_Master_Log_Pos: 6369026           Relay_Log_File: mysql-relay-bin.000257            Relay_Log_Pos: 8999    Relay_Master_Log_File: mysql-bin.000082         Slave_IO_Running: No        Slave_SQL_Running: No          Replicate_Do_DB:       Replicate_Ignore_DB:        Replicate_Do_Table:    Replicate_Ignore_Table:   Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table:                Last_Errno: 1062               Last_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'             Skip_Counter: 0      Exec_Master_Log_Pos: 6367510          Relay_Log_Space: 11099          Until_Condition: None           Until_Log_File:             Until_Log_Pos: 0       Master_SSL_Allowed: No       Master_SSL_CA_File:        Master_SSL_CA_Path:           Master_SSL_Cert:         Master_SSL_Cipher:            Master_SSL_Key:     Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No            Last_IO_Errno: 1236            Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000082' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.000082' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.000082' at 6369280.'           Last_SQL_Errno: 1062           Last_SQL_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'Replicate_Ignore_Server_Ids:          Master_Server_Id: 1
  • aipprd2上按照Last_IO_Error中的PosfilePos修改.
    按照报错给出的提示PosfilePos修改后, 报错依旧.

    mysql> slave stop;mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root',     master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6369026;mysql> slave start;
  • aipprd2检查Mysql的日志
    Mysql的日志中记录了Crash开始的时间, 并给出了建议, Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510

    160923 12:35:22 [Note] Starting crash recovery...160923 12:35:22 [Note] Crash recovery finished.160923 12:36:27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510
  • aipprd2上按照日志中建议的PosfilePos修改.
    按照日志中建议的PosfilePos修改后, 报错依旧.

    mysql> slave stop;mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6367510;mysql> slave start;
  • aipprd1上检查posfile
    首先检查show slave status\G;中给出的pos, 发现日志中根本不存在.

    [root@aipprd1 log]# mysqlbinlog --no-defaults --start-position=6369026 mysql-bin.000083 /*!40019 SET @@session.max_insert_delayed_threads=0*/;/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;DELIMITER /*!*/;# at 4#160923 12:38:59 server id 1  end_log_pos 107  Start: binlog v 4, server v 5.5.24-log           created 160923 12:38:59 at startupROLLBACK/*!*/;BINLOG '47HkVw8BAAAAZwAAAGsAAAAAAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADjseRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA=='/*!*/;ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 912208946, event_type:      71ERROR: Could not read entry at offset 6369026: Error in log format or read error.DELIMITER # End of log fileROLLBACK /* added by mysqlbinlog */;/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    此后检查Mysql日志中建议的pos, 发现posfile中是存在此记录的, 而此posfile的最后一个pos6368660, 而show slave status\G;pos6369026, 显然不存在日志文件中.

    [root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6367510 mysql-bin.000082 /*!40019 SET @@session.max_insert_delayed_threads=0*/;/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;DELIMITER /*!*/;# at 4#160923 10:39:15 server id 1  end_log_pos 107  Start: binlog v 4, server v 5.5.24-log created 160923 10:39:15 at startup# Warning: this binlog is either in use or was not closed properly.ROLLBACK/*!*/;BINLOG '05XkVw8BAAAAZwAAAGsAAAABAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADTleRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA=='/*!*/;    # at 6367510    # at 6367585    # at 6367963    # at 6367990    # at 6368073    # at 6368163    # at 6368275    # at 6368358    # at 6368558    # at 6368585    # at 6368660    # End of log fileROLLBACK /* added by mysqlbinlog */;/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    最后在mysql-bin.000083中检查show slave status\G;提示pos:6369026, 也不存在.

    [root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6369026 mysql-bin.000083 
  • aipprd2上重新发起修改posfilepos操作
    检查aipprd1上的日志, 既然mysql-bin.000082日志末不存在pos:6369026, 且mysql-bin.000083为下一个日志, 那么重新发起修改posfilepos的操作.

    [root@aipprd1 log]# lltotal 526908-rw-rw---- 1 mysql mysql 104858346 Sep 19 23:04 mysql-bin.000079-rw-rw---- 1 mysql mysql 104859112 Sep 22 13:01 mysql-bin.000080-rw-rw---- 1 mysql mysql  42558345 Sep 23 10:23 mysql-bin.000081-rw-rw---- 1 mysql mysql   6369280 Sep 23 12:13 mysql-bin.000082-rw-rw---- 1 mysql mysql 104857953 Sep 26 15:02 mysql-bin.000083-rw-rw---- 1 mysql mysql  70844542 Sep 27 17:42 mysql-bin.000084-rw-rw---- 1 mysql mysql 104858026 Sep 29 15:48 mysql-bin.000085-rw-rw---- 1 mysql mysql    297019 Sep 29 15:54 mysql-bin.000086-rw-rw---- 1 mysql mysql       296 Sep 29 15:48 mysql-bin.log.index-rw-rw---- 1 mysql mysql       107 Sep 27 18:05 mysql-relay-bin.000182-rw-rw---- 1 mysql mysql        43 Sep 27 18:05 mysql-relay-bin.index

    重新修改posfilemysql-bin.000083, pos0, 启动Slave后, 现在同步正常.

    mysql> slave stop;mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000083', master_log_pos=0;mysql> slave start;mysql> show slave status\G;*************************** 1. row ***************************           Slave_IO_State: Waiting for master to send event              Master_Host: 10.66.1.52              Master_User: root              Master_Port: 3388            Connect_Retry: 60          Master_Log_File: mysql-bin.000086      Read_Master_Log_Pos: 18585           Relay_Log_File: mysql-relay-bin.000002            Relay_Log_Pos: 8119286    Relay_Master_Log_File: mysql-bin.000083         Slave_IO_Running: Yes        Slave_SQL_Running: Yes          Replicate_Do_DB:       Replicate_Ignore_DB:        Replicate_Do_Table:    Replicate_Ignore_Table:   Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table:                Last_Errno: 0               Last_Error:              Skip_Counter: 0      Exec_Master_Log_Pos: 8119140          Relay_Log_Space: 280580804          Until_Condition: None           Until_Log_File:             Until_Log_Pos: 0       Master_SSL_Allowed: No       Master_SSL_CA_File:        Master_SSL_CA_Path:           Master_SSL_Cert:         Master_SSL_Cipher:            Master_SSL_Key:     Seconds_Behind_Master: 521370Master_SSL_Verify_Server_Cert: No            Last_IO_Errno: 0            Last_IO_Error:            Last_SQL_Errno: 0           Last_SQL_Error: Replicate_Ignore_Server_Ids:          Master_Server_Id: 11 row in set (0.00 sec)
  • 在同步的过程中, 发现有几个Last_SQL_Error: Error ‘Duplicate entry 1026SQL Error, 这个是因为重复主键导致Slave停止工作, 执行以下操作解决(如果有多条重复的主键, 需要执行多次):

    mysql> slave stop;mysql> set GLOBAL SQL_SLAVE_SKIP_COUNTER=1;mysql> slave start;

    还有另一种办法就是修改mysql配置文件/etc/my.cnf[mysqld]下加一行slave_skip_errors = 1062 ,保存后重启mysql, mysql slave可以正常同步了.


同步AIPPRD1的从环境

  • 检查aipprd1的Slave状态

    mysql> show slave status\G;*************************** 1. row ***************************           Slave_IO_State:               Master_Host: 10.66.1.51              Master_User: root              Master_Port: 3388            Connect_Retry: 60          Master_Log_File: mysql-bin.000084      Read_Master_Log_Pos: 91941783           Relay_Log_File: mysql-relay-bin.000002            Relay_Log_Pos: 296    Relay_Master_Log_File: mysql-bin.000084         Slave_IO_Running: No        Slave_SQL_Running: Yes          Replicate_Do_DB:       Replicate_Ignore_DB:        Replicate_Do_Table:    Replicate_Ignore_Table:   Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table:                Last_Errno: 0               Last_Error:              Skip_Counter: 0      Exec_Master_Log_Pos: 91941783          Relay_Log_Space: 452          Until_Condition: None           Until_Log_File:             Until_Log_Pos: 0       Master_SSL_Allowed: No       Master_SSL_CA_File:        Master_SSL_CA_Path:           Master_SSL_Cert:         Master_SSL_Cipher:            Master_SSL_Key:     Seconds_Behind_Master: NULLMaster_SSL_Verify_Server_Cert: No            Last_IO_Errno: 1236            Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000084' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.000084' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.000084' at 91942912.'           Last_SQL_Errno: 0           Last_SQL_Error: Replicate_Ignore_Server_Ids:          Master_Server_Id: 21 row in set (0.00 sec)
  • aipprd2上检查日志文件
    aipprd1上检查show slave status\G;后, 提示需要修改posfile为mysql-bin.000084pos91942912, 因为在aipprd2同步完成后, 实际同步的数据是从aipprd1过来的, 这些数据在aipprd1上本身就存在的.

    [root@aipprd2 log]# lltotal 653832-rw-rw---- 1 mysql mysql 104857774 Sep 21 13:12 mysql-bin.000083-rw-rw---- 1 mysql mysql  91942912 Sep 23 10:21 mysql-bin.000084-rw-rw---- 1 mysql mysql   6295552 Sep 23 12:10 mysql-bin.000085-rw-rw---- 1 mysql mysql      8250 Sep 27 17:39 mysql-bin.000086-rw-rw---- 1 mysql mysql 104858014 Sep 29 17:20 mysql-bin.000087-rw-rw---- 1 mysql mysql 104857971 Sep 29 19:18 mysql-bin.000088-rw-rw---- 1 mysql mysql  55744926 Sep 29 20:23 mysql-bin.000089-rw-rw---- 1 mysql mysql       259 Sep 29 19:18 mysql-bin.log.index-rw-rw---- 1 mysql mysql  70844718 Sep 29 15:35 mysql-relay-bin.000005-rw-rw---- 1 mysql mysql 104857729 Sep 29 15:44 mysql-relay-bin.000006-rw-rw---- 1 mysql mysql       708 Sep 29 15:44 mysql-relay-bin.000007-rw-rw---- 1 mysql mysql       199 Sep 29 15:44 mysql-relay-bin.000008-rw-rw---- 1 mysql mysql  25188420 Sep 30 08:40 mysql-relay-bin.000009-rw-rw---- 1 mysql mysql       215 Sep 29 18:42 mysql-relay-bin.index

    aipprd2上检查show master status;后, 记录Posfilemysql-bin.000089, 既然aipprd1的数据为最新的, 且aipprd2已经从aipprd1后同步完成了(通过检查show slave status\G;中的Seconds_Behind_Master:, 如果此项值很小, 应该是同步完成了.), 那么两边的数据应该差不多的.

    mysql> show master status;+------------------+-----------+--------------+------------------+| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |+------------------+-----------+--------------+------------------+| mysql-bin.000089 | 104341982 |              |                  |+------------------+-----------+--------------+------------------+1 row in set (0.00 sec)
  • aipprd1上发起修改posfilepos操作
    所以在此用posfilemysql-bin.000089pos0来修改, 启动Slave后, 开始同步.

    mysql> slave stop;Query OK, 0 rows affected (0.11 sec)mysql> CHANGE MASTER TO master_host='10.66.1.51', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000089', master_log_pos=0;Query OK, 0 rows affected (0.06 sec)mysql> slave start;Query OK, 0 rows affected (0.00 sec)mysql> show slave status\G;*************************** 1. row ***************************            Slave_IO_State: Waiting for master to send event              Master_Host: 10.66.1.51              Master_User: root              Master_Port: 3388            Connect_Retry: 60          Master_Log_File: mysql-bin.000089      Read_Master_Log_Pos: 55744926           Relay_Log_File: mysql-relay-bin.000002            Relay_Log_Pos: 253    Relay_Master_Log_File: mysql-bin.000089         Slave_IO_Running: Yes        Slave_SQL_Running: Yes          Replicate_Do_DB:       Replicate_Ignore_DB:        Replicate_Do_Table:    Replicate_Ignore_Table:   Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table:                Last_Errno: 0               Last_Error:              Skip_Counter: 0      Exec_Master_Log_Pos: 55744926          Relay_Log_Space: 409          Until_Condition: None           Until_Log_File:             Until_Log_Pos: 0       Master_SSL_Allowed: No       Master_SSL_CA_File:        Master_SSL_CA_Path:           Master_SSL_Cert:         Master_SSL_Cipher:            Master_SSL_Key:     Seconds_Behind_Master: 0Master_SSL_Verify_Server_Cert: No            Last_IO_Errno: 0            Last_IO_Error:            Last_SQL_Errno: 0           Last_SQL_Error: Replicate_Ignore_Server_Ids:          Master_Server_Id: 21 row in set (0.00 sec)

0 0
原创粉丝点击
热门问题 老师的惩罚 人脸识别 我在镇武司摸鱼那些年 重生之率土为王 我在大康的咸鱼生活 盘龙之生命进化 天生仙种 凡人之先天五行 春回大明朝 姑娘不必设防,我是瞎子 小孩不吃饭怎么办偏瘦 我21岁现在缺钱怎么办 斗鱼直播卡顿怎么办 斗鱼怎么办卡怎么设置 无线网不能用了怎么办 率土之滨流浪后怎么办 率土之滨围洛阳怎么办 率土之滨没资源怎么办 竞彩改期的比赛怎么办 变性人老了会怎么办 冰箱放阳台直晒怎么办 喝了冰牛奶胃痛怎么办 空腹喝咖啡胃疼怎么办 空腹吃柿子胃疼怎么办 mac重启卡住了怎么办 快玩下载速度慢怎么办 微信消息不提示怎么办 车被物业锁了怎么办 去香港办的团签怎么办 香港个人签证g签怎么办 诛仙3挂机掉线怎么办 大神f2开不了机怎么办 昆仑账号忘记了怎么办 三星手机wifi信号弱怎么办 6个月的宝宝缺钙怎么办 半岁宝宝坐不稳怎么办 你宝宝老不睡觉怎么办 三岁宝宝摸咪咪怎么办 宝宝很困又不睡怎么办 两岁宝宝摸咪咪怎么办 失眠怎么办如何快速睡眠小偏方 9岁儿童睡不着觉怎么办 2岁儿童睡不着觉怎么办 多梦头晕没精神怎么办 晚上睡眠不好老做梦怎么办 睡醒后喉咙干痛怎么办 感冒了喉咙干痛怎么办 老公想你了你该怎么办 赌博赢来的100万怎么办 赌博输了60万怎么办 网赌欠债10几万怎么办