Instance Crash & Bug 13846531

来源:互联网 发布:as网络语言什么意思 编辑:程序博客网 时间:2024/06/12 17:07

1.  故障现象

一客户决策支持数据库在3月6日、9日节点1的实例Crash。

2.   故障分析

3月6日下午17点27分,发现决策支持数据库节点1 ASM实例后台报ORA-04031错误,该错误表示共享内存不足,检查ASM实例时发现已HANG,该节点已被集群踢出,分析是在跑批处理加载的时候ASM实例shared pool碎片太多导致ORA-04031错误,该故障现象与Bug 13846531 : ASM INSTANCE CRASHED WITH ORA-4031 ON EXADATA类似,需安装相应的patch或设置几个参数绕过。

3月9日下午14点28分,该数据库节点1再次出现Crash,日志如下:

Sat Mar 09 14:28:35 2013Errors in file/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/trace/bjscj1dw1_lms1_8230.trc  (incident=384220):ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey],[32767], [1123700], [0], [0], [], [], [], [], [], [], []Incident details in:/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/incident/incdir_384220/bjscj1dw1_lms1_8230_i384220.trcUse ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packagingdetails.Sat Mar 09 14:28:38 2013Dumping diagnostic data in directory=[cdmp_20130309142838],requested by (instance=1, osid=8230 (LMS1)), summary=[incident=384220].Sat Mar 09 14:28:38 2013Sweep [inc][384220]: completedSweep [inc2][384220]: completedErrors in file/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/trace/bjscj1dw1_lms1_8230.trc:ORA-00600: internal error code, arguments: [kjbldrmrpst:pkey],[32767], [1123700], [0], [0], [], [], [], [], [], [], []LMS1 (ospid: 8230): terminating the instance due to error 484System state dump requested by (instance=1, osid=8230 (LMS1)),summary=[abnormal instance termination].System State dumped to trace file/u01/app/oracle/diag/rdbms/bjscj1dw/bjscj1dw1/trace/bjscj1dw1_diag_8214.trcSat Mar 09 14:28:50 2013Instance terminated by LMS1, pid = 8230Sat Mar 09 14:28:56 2013Starting ORACLE instance(normal)LICENSE_MAX_SESSION = 0LICENSE_SESSIONS_WARNING = 0


从日志中可以看出,数据库在Crash前出现ORA-00600 [kjbldrmrpst:pkey] 错误,此时实例被LMS1进程终止,6秒后该实例被集群自动启动。从错误信息来看,数据库很可能遭遇Bug 13397104 : INSTANCE CRASH WITH ORA-600 [KJBLPKEYDRMQSCCHK:PKEY],该BUG与11g Dynamic Remastering新引入的Read-mostly locking功能有关,为了提高性能,该功能默认是开启的,若频繁引起实例重启,建议关闭该功能,绕开这个Bug。

3.   处理总结

关于ORA-04031错误,需在asm spfile增加两个隐含参数,如下所示:

Thefollowing work around has helped in reported incident. (The ideal solution isto fix the application code to use bind variables instead of literals in thecalls it make)

1.Set SGA_TARGET=2G  (optional)

2._enable_shared_pool_durations= false

(Thiswill allow unpinned SQLA, KGLH0, and KGLHD memory to be freed to make room fornew permanent allocations)

3._library_cache_advice = false

  (To eliminate the allocations from kglsim,this parameter setting will work:)

Pleaseset all the above three change in the ASM SPFILE restart clusterware( We needto restart ASM instance).

关于ORA-00600 [kjbldrmrpst:pkey],若频繁出现实例重启的现象,建议关闭Dynamic Remastering的Read-mostly locking功能,设置下列隐含参数可以关闭。详细请参考Note 13397104.1。

_gc_read_mostly_locking=FALSE

原创粉丝点击