记一次从库主机断电重启后同步故障处理
来源:互联网 发布:指向二维数组的指针 编辑:程序博客网 时间:2024/06/05 03:01
一.基本信息
主库:10.24.65.30
从库:10.24.65.33
版本:mysql 5.7.14
mysql参数设置:
binlog_format:row
tx_isolation:READ-COMMITTED
master_info_repository:FILE
relay_log_info_repository:FILE
sync_master_info:10000
sync_relay_log_info:10000
relay_log_recovery :off
从库同步情况:
(product)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.24.65.30
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 338
Relay_Log_File: relay-bin.000003
Relay_Log_Pos: 551
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 338
二.模拟故障发生
1.主库模拟数据写入:
sysbench --test=/data/software/sysbench-0.5/sysbench/tests/db/oltp.lua --mysql-table-engine=innodb --oltp-table-size=1000000 --max-requests=10000 \
--num-threads=100 --oltp_tables_count=10 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=admin --mysql-password=Dmjxjbgc6u \
--mysql-db=test --max-time=600 --mysql-socket=/tmp/mysql3306.sock prepare
2.从库操作:
将从库主机断电
3.从库主机断电恢复后,启动mysql和slave
同步发生了故障,如下:
(product)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: 10.24.65.30
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000135
Read_Master_Log_Pos: 29046651
Relay_Log_File: relay-bin.000272
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000128
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1594
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 154
三.修复故障
1.修复binlog损坏
上面显示binlog或relay log损坏,检查并记录确认上次sql thread最后正常执行的relay_master_log_file和exec_master_log_pos:
stop slave;
change master to
master_log_file='mysql-bin.000128',
master_log_pos=154;
start slave;
(product)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: 10.24.65.30
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000138
Read_Master_Log_Pos: 123
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000128
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test.sbtest5; Duplicate entry '32257' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000128, end_log_pos 532773
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 1348664795
2.修复1062错误
出现上面报错,是因为master.info和relay log.info两文件的信息没有及时刷盘更新,导致从master中读取或执行了重复的log,产生主键冲突报1062错误,先尝试跳过错误:
stop slave;
set global sql_slave_skip_counter=1;
start slave;
(product)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.24.65.30
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000145
Read_Master_Log_Pos: 10398871
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 1037690
Relay_Master_Log_File: mysql-bin.000128
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table test.sbtest5; Duplicate entry '34945' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000128, end_log_pos 1570143
Skip_Counter: 0
Exec_Master_Log_Pos: 1037524
Relay_Log_Space: 2303127120
继续跳:
stop slave;
set global sql_slave_skip_counter=1;
start slave;
第二次跳过错误后,依然还报错。
直接设置stop slave;
set global slave_exec_mode='IDEMPOTENT'; # 默认值是STRICT(严格模式)
start slave;
设置完毕后,当出现1062错误(主键重复)或1023错误(记录没找到)时,等几分钟后,会自动跳过该错误,并且记录到错误日志里。
其实和slave_skip_errors这个参数的作用是一样的,只不过slave_skip_errors参数必须加到配置文件my.cnf里然后重启MySQL,而slave_exec_mode参数可以动态设置。
(product)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.24.65.30
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000145
Read_Master_Log_Pos: 10398871
Relay_Log_File: relay-bin.000039
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000145
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 10398871
上面显示已没有报错,且已同步正常。
没有错误后,需要参数还原: set global slave_exec_mode=default ;或者 set global slave_exec_mode='STRICT';
虽然没有错误了,还是需要用 pt-sync使用检查主从数据一致性。
四.pt-table-checksum检查主从是否一致
在test库中建表,并插入从库的信息.
CREATE TABLE `dsns` ( `id` int(11) NOT NULL AUTO_INCREMENT, `parent_id` int(11) DEFAULT NULL, `dsn` varchar(255) NOT NULL, PRIMARY KEY (`id`) );
写入从库信息
INSERT INTO dsns (parent_id,dsn) values(1,"h=10.24.65.33,u=pc_checksum,p=percona,P=3306");
检查命令:
cd /data/tool/percona-toolkit-2.2.10/bin
perl pt-table-checksum --no-check-binlog-format --nocheck-replication-filters --ignore-databases=mysql,\
performance_schema,information_schema,log,common_schema --replicate=test.checksums --host=127.0.0.1 \
--port 3306 -upc_checksum -ppercona --engines=InnoDB,MyISAM --recurse=1 --chunk-size-limit=0 --recursion-method=dsn=D=host,D=test,t=dsns
检查操作已经完成,查看输出结果,是否有不一致的记录。
-bash-4.1$ perl pt-table-checksum --no-check-binlog-format --nocheck-replication-filters --ignore-databases=mysql,\
> performance_schema,information_schema,log,common_schema --replicate=test.checksums --host=127.0.0.1 \
> --port 3306 -upc_checksum -ppercona --engines=InnoDB,MyISAM --recurse=1 --chunk-size-limit=0 --recursion-method=dsn=D=host,D=test,t=dsns
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-31T15:33:06 0 0 6 1 0 0.275 sys.sys_config
08-31T15:33:06 0 0 1 1 0 0.017 test.dsns
08-31T15:33:10 0 0 1000000 8 0 3.425 test.sbtest1
08-31T15:33:13 0 0 1000000 6 0 3.323 test.sbtest10
08-31T15:33:17 0 0 1000000 6 0 3.736 test.sbtest2
08-31T15:33:20 0 0 1000000 6 0 3.358 test.sbtest3
08-31T15:33:23 0 0 1000000 6 0 3.306 test.sbtest4
08-31T15:33:27 0 0 1000000 6 0 3.132 test.sbtest5
08-31T15:33:30 0 0 1000000 6 0 3.112 test.sbtest6
08-31T15:33:33 0 0 1000000 6 0 3.254 test.sbtest7
08-31T15:33:36 0 0 1000000 6 0 3.094 test.sbtest8
08-31T15:33:39 0 0 1000000 6 0 2.998 test.sbtest9
08-31T15:33:39 0 0 2 1 0 0.014 test.t1
08-31T15:33:39 0 0 2 1 0 0.012 test.t10
08-31T15:33:39 0 0 0 1 0 0.012 test.t10_tmp
08-31T15:33:39 0 0 2 1 0 0.012 test.t2
08-31T15:33:39 0 0 3 1 0 0.263 test.t3
08-31T15:33:39 0 0 2 1 0 0.013 test.t4
08-31T15:33:39 0 0 2 1 0 0.014 test.t5
08-31T15:33:39 0 0 3 1 0 0.012 test.t6
08-31T15:33:39 0 0 2 1 0 0.014 test.t7
08-31T15:33:39 0 0 1 1 0 0.013 test.t8
08-31T15:33:39 0 0 7 1 0 0.014 test.zeng
08-31T15:33:39 0 0 7 1 0 0.013 test.zeng1
上面显示两个库的数据是一致的。
- 记一次从库主机断电重启后同步故障处理
- 一次断电引发的svn数据库故障
- 记一次redis故障处理
- 记一次服务器故障处理
- Linux:记一次异常断电导致的系统无法正常启动(文件系统故障)
- 【故障处理】一次RAC故障处理过程
- 一次Oracle故障处理过程
- 一次svn的故障处理
- 一次ibm服务器故障处理
- 记一次断电恢复ORA-01033错误
- 记一次特殊的“ORA-04030”故障处理
- 一次逻辑备库延时故障的处理
- 记一次Exadata故障
- mysql主从同步故障处理
- mysql同步字符集故障处理
- 一次ORA-60死锁故障的处理
- 一次oracle 10g rac故障处理
- 一次oracle介质故障的处理
- jquery.form.js实现将form提交转为ajax方式提交的方法
- react-native-gradle-2.10-all.zip-下载慢
- 用exp_imp远程操作数据库,附帶Oracle数据导入导出imp/exp詳解
- android spinner下拉框组件使用
- Agile实践之Kanban工具: Wekan
- 记一次从库主机断电重启后同步故障处理
- 北京折叠——一部刚要开始就已经结束的科幻小说
- POJ 2000 Gold Coins(模拟)
- mysql slave的安全恢复
- 欢迎使用CSDN-markdown编辑器
- Oracle数据库开发笔记 lesson2
- Java中的日期、时间类
- linux命令——patch(含diff)
- POJ 2081 Recaman's Sequence (递推)