VKTM导致ASMB终止导致ORALCE实例宕

来源:互联网 发布:eclipse发布java程序 编辑:程序博客网 时间:2024/05/08 02:51

    昨天在查看数据库备份情况时,查看日志发现:

RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_24/thread_1_seq_595.499.883235121 thread=1 sequence=595
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_25/thread_1_seq_596.498.883270839 thread=1 sequence=596
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_25/thread_1_seq_597.497.883270839 thread=1 sequence=597
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_26/thread_1_seq_598.496.883357239 thread=1 sequence=598
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_26/thread_1_seq_599.495.883357239 thread=1 sequence=599
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_27/thread_1_seq_600.494.883443639 thread=1 sequence=600
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_27/thread_1_seq_601.493.883443639 thread=1 sequence=601
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_28/thread_1_seq_602.492.883530039 thread=1 sequence=602
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_28/thread_1_seq_603.491.883530039 thread=1 sequence=603
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_29/thread_1_seq_604.490.883616439 thread=1 sequence=604
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_29/thread_1_seq_605.489.883616439 thread=1 sequence=605
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=+DATA/iokbl/archivelog/2015_06_30/thread_1_seq_606.488.883702839 thread=1 sequence=606

数据库版本为11.2.0.4,linux环境,用的是DATA GUARD,通过日志可以看出,归档日志未传递到备库,日志不能删除,而主库一直正常运行,说明备库出了问题,于是用sqlplus / as sysdba登录备库,提示连接到一个空实例,说明备库宕了。

查看alert日志:

Wed Jun 24 01:00:01 2015
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Wed Jun 24 15:05:41 2015
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Wed Jun 24 15:05:42 2015
NOTE: ASMB terminating
Errors in file /u01/app/oracle/diag/rdbms/standby/standby/trace/standby_asmb_114106.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1261 Serial number: 11
Errors in file /u01/app/oracle/diag/rdbms/standby/standby/trace/standby_asmb_114106.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 1261 Serial number: 11
ASMB (ospid: 114106): terminating the instance due to error 15064
Wed Jun 24 15:05:42 2015
System state dump requested by (instance=1, osid=114106 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/standby/standby/trace/standby_diag_114080_20150624150542.trc
Dumping diagnostic data in directory=[cdmp_20150624150542], requested by (instance=1, osid=114106 (ASMB)), summary=[abnormal instance termination].
Instance terminated by ASMB, pid = 114106


        原来6月24号就已经挂了,因为备份在主库上做,策略是删除前7天的归档日志,也未看备库信息,所以一直没发现问题,看来以后要随时观察主备库情况,有问题及时解决

        那么就先把备库起来再说

SQL> startup;
ORACLE instance started.


Total System Global Area  732352512 bytes
Fixed Size    1367144 bytes
Variable Size  331350936 bytes
Database Buffers  377487360 bytes
Redo Buffers   22147072 bytes
Database mounted.
Database opened.

启日志应用:

SQL> altter database recover managed standby database disconnect from session;

接下来分析日志吧,先看当时的VKTM trace file:

*** 2015-06-22 20:00:01.102
kstmchkdrift (kstmhighrestimecntkeeper:highres): Stall, backward drift ended at 1434974497506826 drift: 1071
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time stalled at 1434974401
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Stall, backward drift ended at 1434974402 drift: 1
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at 1435024898106410


*** 2015-06-23 10:00:01.197
kstmchkdrift (kstmhighrestimecntkeeper:highres): Stall, backward drift ended at 1435024898107666 drift: 1256
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at 1435028498622049


*** 2015-06-23 11:00:01.108
kstmchkdrift (kstmhighrestimecntkeeper:highres): Stall, backward drift ended at 1435028498623142 drift: 1093
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time stalled at 1435028401
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Stall, backward drift ended at 1435028402 drift: 1
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at 1435078899223760


*** 2015-06-24 01:00:01.205
kstmchkdrift (kstmhighrestimecntkeeper:highres): Stall, backward drift ended at 1435078899224947 drift: 1187
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at 1435082499729169


*** 2015-06-24 02:00:01.106
kstmchkdrift (kstmhighrestimecntkeeper:highres): Stall, backward drift ended at 1435082499730249 drift: 1080
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by (339800056)usec at (1435129640131631) whereas (1000000) is allowed

一直有关于时间不同步的警告,查看service ntpd tatus,发现ntp已经停了,于是先启动ntp,service ntpd start

根据alert中的信息,把systemstate dump信息转换下,用aas.awk工具:

awk -f ass1038.awk /u01/app/oracle/diag/rdbms/standby/standby/trace/standby_diag_114080_20150624150542.trc


Starting Systemstate 1
.................................................................
Ass.Awk Version 1.0.38
~~~~~~~~~~~~~~~~~~~~~~
Source file : /u01/app/oracle/diag/rdbms/standby/standby/trace/standby_diag_114080_20150624150542.trc


System State 1 (2015-06-24 15:05:42.666)
~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
1:                                      [DEAD] 
2:  waiting for 'pmon timer'            [DEAD] 
3:  waiting for 'rdbms ipc message'     [DEAD] 
4:  waiting for 'VKTM Logical Idle Wait' [DEAD] 
5:  waiting for 'rdbms ipc message'     [DEAD] 
6:  waiting for 'DIAG idle wait'        
7:  waiting for 'rdbms ipc message'     [DEAD] 
8:  waiting for 'DIAG idle wait'        [DEAD] 
9:  waiting for 'rdbms ipc message'     [DEAD] 
10: waiting for 'rdbms ipc message'     
11: waiting for 'rdbms ipc message'     
12: waiting for 'rdbms ipc message'     
13: waiting for 'rdbms ipc message'     
14: waiting for 'rdbms ipc message'     [DEAD] 
15: waiting for 'rdbms ipc message'     [DEAD] 
16: waiting for 'smon timer'            [DEAD] 
17: waiting for 'rdbms ipc message'     [DEAD] 
18: waiting for 'rdbms ipc message'     [DEAD] 
19:                                     
20: waiting for 'rdbms ipc message'     [DEAD] 
21: waiting for 'rdbms ipc message'     [DEAD] 
22: waiting for 'wait for unread message on broadcast channel' [DEAD] 
23: waiting for 'SQL*Net message from client' [DEAD] 
24: waiting for 'SQL*Net message from client' [DEAD] 
26: waiting for 'rdbms ipc message'     [DEAD] 
27: waiting for 'rdbms ipc message'     [DEAD] 
28: waiting for 'rdbms ipc message'     [DEAD] 
29: waiting for 'rdbms ipc message'     [DEAD] 
30: waiting for 'rdbms ipc message'     [DEAD] 
32: waiting for 'SQL*Net message from client' [DEAD] 
34: waiting for 'SQL*Net message from client' [DEAD] 
36: waiting for 'class slave wait'      [DEAD] 
37: waiting for 'MRP wait on archivelog arrival' [DEAD] 
38: waiting for 'parallel recovery slave next change' [DEAD] 
39: waiting for 'parallel recovery slave next change' [DEAD] 
40: waiting for 'parallel recovery slave next change' [DEAD] 
41: waiting for 'parallel recovery slave next change' [DEAD] 
42: waiting for 'parallel recovery slave next change' [DEAD] 
43: waiting for 'parallel recovery slave next change' [DEAD] 
44: waiting for 'parallel recovery slave next change' [DEAD] 
45: waiting for 'parallel recovery slave next change' [DEAD] 
46: waiting for 'parallel recovery slave next change' [DEAD] 
47: waiting for 'parallel recovery slave next change' [DEAD] 
48: waiting for 'parallel recovery slave next change' [DEAD] 
49: waiting for 'parallel recovery slave next change' [DEAD] 
50: waiting for 'parallel recovery slave next change' [DEAD] 
51: waiting for 'parallel recovery slave next change' [DEAD] 
52: waiting for 'parallel recovery slave next change' [DEAD] 
53: waiting for 'parallel recovery slave next change' [DEAD] 
54: waiting for 'parallel recovery slave next change' [DEAD] 
55: waiting for 'parallel recovery slave next change' [DEAD] 
56: waiting for 'parallel recovery slave next change' [DEAD] 
57: waiting for 'parallel recovery slave next change' [DEAD] 
58: waiting for 'parallel recovery slave next change' [DEAD] 
59: waiting for 'parallel recovery slave next change' [DEAD] 
60: waiting for 'parallel recovery slave next change' [DEAD] 
61: waiting for 'parallel recovery slave next change' [DEAD] 
62: waiting for 'parallel recovery slave next change' [DEAD] 
63: waiting for 'parallel recovery slave next change' [DEAD] 
64: waiting for 'parallel recovery slave next change' [DEAD] 
65: waiting for 'parallel recovery slave next change' [DEAD] 
66: waiting for 'parallel recovery slave next change' [DEAD] 
67: waiting for 'parallel recovery slave next change' [DEAD] 
68: waiting for 'parallel recovery slave next change' [DEAD] 
69: waiting for 'parallel recovery slave next change' [DEAD] 


NO BLOCKING PROCESSES FOUND


                   ------------------ooOoo------------------
For the LATEST version of this utility see
  http://dlsunuk11.uk.oracle.com/Public/Utils.html#ass


For additional documentation see
  http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html


Suggested improvements, bugs etc. should be sent to kevin.p.quinn@oracle.com


End of report. 49578 Lines Processed.

由于ASMB进程死掉,导致oralce实例宕,于是不能接收从主库传递过来的日志信息,起因就是ntp出了问题

具体可见MOS ORA-15064 as per ASM instance founds sessions unresponsive for over 3 mins (文档 ID 1615459.1)

根据MOS文档 Base Bug 11837095介绍,此bug已在11.2.0.3中fixed.

根据MOS文档 ID 1347586.1介绍,可以设置事件来解决此问题:

alter system set event="10795 trace name context forever, level 2" scope=spfile;

然后重启实例。后期要注意观察。另可见大神惜纷飞的博客也有说明:http://www.oratping.com/?p=359


0 0