遭遇ora-600[19004]

来源:互联网 发布:家庭网络交换机 编辑:程序博客网 时间:2024/05/01 06:45

今天本来是休假半天,正在外面办事的时候,突然收到pushmail的告警,oracle数据库中有500多个ora-的报错,心里咯噔了一下,怎么会有那么多的报错!但是又没收到数据库down的pushmail,难道是程序遇到bug了?赶紧打了电话给同事,请他们帮忙看看。

同时也顾不上吃午饭了,匆匆赶到了公司,了解情况。Team Leader已经把该故障升级为一级故障,已经影响到了部分应用模块的正常使用。同事已经在metalink上开了SR,另外也开了一个电话会议,向客户介绍了当前情况。

赶紧先把SR扫了一遍,大致了解了情况,是ora-600[19004]的报错,oracle要求提供explain plan和10053的trace,于是一边登录现网执行语句,查看情况,另一边也介入了电话会议,确认了影响的范围,安抚客户情绪,并表达了针对这个问题的初步诊断分析,以及对策。

在alertlog中,我们看到有大量的报错:

Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1482830.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:13 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1789954.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:16 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1507510.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:29 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1884392.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:31 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1372166.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:35 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1388784.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:37 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1388784.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:37 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1884392.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:16:43 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1105974.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:17 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1482830.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:24 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1773588.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:30 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1527844.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:33 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1527844.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:39 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1773588.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:40 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1388784.trc:
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
Thu Jan 20 14:17:47 2011
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/udump/micprd_ora_1589298.trc:

从tracefile中看,语句是类似的,只是带入的变量有所不同。其sql是:

SELECTa.attr2AS attr2,
      
d.valuenameAS pccodename,
      
a.workingscopeAS workingscope,
      
h.currencynameAS currencyname,
      
a.globalpccodeAS globalpccode,
      
a.gradeAS grade,
      
a.faxAS fax,
      
b.zmclpatternAS zmclpattern1,
      
a.orgtypeAS orgtype,
      
a.industrytypeAS industry,
      
a.scagreementtypeAS scagreementtype,
      
a.regioncodeAS regioncode,
      
a.booksetidAS booksetid,
      
a.areaidAS areaid,
      
a.citycodeAS citycode,
      
g.languagenameAS languagename,
      
a.relationtypeAS relationtype,
      
a.saleschannelidAS saleschannelid,
      
a.corporatepersonAS corporateperson,
      
j.citynameAS cityname,
      
a.attr3AS attr3,
      
f.countrynameAS countryname,
      
a.isbottomAS isbottom,
      
a.paymentmethodAS paymentmethod,
      
n.orgnameAS relatedorgname,
      
a.bgorgidAS bgorgid,
      
a.shipmenttermAS shipmentterm,
      
a.shipmentmethodAS shipmentmethod,
      
'' AS tyresize,
      
c.areacodeAS areacode,
      
a.parentorgidAS parentorgid,
      
'' AS selectedcustom,
      
j.citycodeAS citycode1,
      
a.commentsAS comments,
      
a.ownerorgidAS ownerorgid,
      
a.registerfundAS registerfund,
      
a.telephoneAS telephone,
      
a.orgidAS orgid,
      
a.opendateAS opendate,
      
o.valuenameAS solutiontypename,
      
a.orgnameAS orgname,
      
l.valuenameAS statusname,
      
a.solutiontypeAS solutiontype,
      
a.initflagAS initflag,
      
a.datelastupdatedAS datelastupdated,
      
a.createdateAS createdate,
      
a.targettypeAS targettype,
      
a.urlAS url,
      
a.attr1AS attr1,
      
d.languagecodeAS languagecode,
      
a.portcodeAS portcode,
      
a.preferedcurrencyAS preferedcurrency,
      
c.areanameAS areaname,
      
a.countrycodeAS orgcountrycode,
      
a.statusAS status,
      
i.saleschannelnameAS saleschannelname,
      
a.paymenttermAS paymentterm,
      
e.valuenameAS corptypename,
      
b.orgcodeAS parentorgcode,
      
a.booksetopendateAS booksetopendate,
      
a.relatedorgidAS relatedorgid,
      
b.shortnameAS parentorgname,
      
a.corporationtypeAS corporationtype,
      
a.orgcodeAS orgcode,
      
a.shortnameAS shortname,
      
a.emailAS email,
      
a.defaultlanguageAS defaultlanguage,
      
j.languagecodeAS languagecode7,
      
1 AS COUNT
 
FROM table_abcd         d,
      
table_abcd         o,
      
table_112233       c,
      
table_aaaa_bbbb_cch,
      
lang               g,
      
table_abcd         l,
      
table_xyzzxy       b,
      
saleschannel       i,
      
table_xyzzxy       a,
      
table_abcd_1234_a f,
      
city               j,
      
table_xyzzxy       n,
      
table_abcd         e
 
WHERE 1 =1
  
AND h.languagecode =j.languagecode
  
AND a.defaultlanguage =g.languagecode
  
AND d.languagecode =h.languagecode
  
AND l.TYPE ='Status'
  
AND a.status =l.VALUE
  
AND d.TYPE ='GlobalPCCode'
  
AND n.orgid(+) = a.relatedorgid
  
AND d.languagecode =o.languagecode
  
AND o.TYPE ='SolutionType'
  
AND a.solutiontype =o.VALUE
  
AND a.globalpccode =d.VALUE
  
AND d.languagecode =f.languagecode
  
AND i.saleschannelid(+) = a.saleschannelid
  
AND a.preferedcurrency =h.currencycode
  
AND j.citycode(+) = a.citycode
  
AND c.areaid(+) = a.areaid
  
AND e.TYPE ='CorporationType'
  
AND b.orgid(+) = a.parentorgid
  
AND a.countrycode =f.countrycode
  
AND a.corporationtype =e.VALUE
  
AND d.languagecode =l.languagecode
  
AND d.languagecode =e.languagecode
  
AND d.languagecode ='cn'
  
AND a.orgid =100001419

这个sql无论在做explain plan for还是10053的时候,都会报600的错。

在metalink中查到,这个ora-600的报错是一个oracle的bug,在oracle 9208,10105,10203版本的几乎所有平台都会发生。这个bug已经有patch了:patch 4899105

当然,幸好oracle也提供了临时解决方法:

Gather statistics without histograms

在故障现场,执行了以下2个语句之后,问题解决:

--删除原有的统计信息:
exec dbms_stats.delete_table_stats(ownname=>'user_name',tabname=>'table_name',cascade_indexes=>true);
 
--重新收集不带直方图的统计信息:
exec dbms_stats.gather_table_stats(ownname=>'user_name',tabname=>'table_name',cascade=>true,method_opt=>'for all columns size 1');

临时的解决故障后,之后的步骤,就是和客户商量时间打上这个重要的patch了。

其实我还是有点纳闷的是,为什么会触发这个bug,难道是统计信息不够精确?但是我的数据库是使用oracle的默认收集统计信息的设置,而且某个月还会手工收集一次,应该不会有很旧的统计信息。另外,如果表的直方图信息有问题,这样的表应该不止一个,为什么直方图的信息会导致600的报错?为什么只是在执行一个sql的时候报错了600,而不是所有信息不准确的且multi join的sql都报错?

原文地址:http://www.oracleblog.org/working-case/suffering-ora-600-19004/