ORA-00600: internal error code, arguments: [15764] 引起的宕机及处理过程

来源:互联网 发布:周相 与太祖 知乎 编辑:程序博客网 时间:2024/06/17 09:07
 

事情經過:

1.      2011-09-06下午17:25為了提供三廠packing速度, 徐鳴為askey runcard表添加 Wo key 和carton no的聯合索引 創建方式為online。但是110分鐘後仍然無法創建成功,而系統application越來越高,三廠很多條線down線,為了不影響生產,決定cancel create index。 取消後數據庫運行正常。(smon會在後臺做善後工作) 觀察到20:40系統無異常 下班去吃飯。

2.      23:50正打算睡覺時 突然看到系統sfc2rac2 拋出很多 ora-00600錯誤,是很多,同時os也提示errpt 也拋出 SOFTWARE PROGRAM ABNORMALLY TERMINATED 錯誤,查看詳細記錄發現錯誤是程式是 oracle,這時感覺到問題的嚴重性,00:00整 果然發現二號節點可以連接上,但是執行查詢時會拋出ORA-00600錯誤。 一號節點正常。 首先想到的是重啟該節點能否解決

重啟後 database可以正常open,但是alert log中還是不斷拋出 ora-00600錯誤,無法查詢。  頭大~… 沒有辦法只好打電話找茂哥求救,茂哥一時也想不出該如何處理,google /baidu 信息很少 可能是oracle bug ,不知道和下午索引創建失敗是否有關係。 為了不影響生產 將sfc3rac2 shutdown,靠一號節點撐住。 02:00 實在是找不出問題 ,將log 整理出來發BBS求助。

 感冒搞的頭暈腦脹 撐不住  上床睡覺

3.      早上起來查看bbs 沒有人回覆,繼續google /baidu . 找到一些有用的信息

Metalink ID 839536.1

An ORA-600 [15764] is highly transient in nature. Most bugs filed for this issue have beenclosed as not reproducible.

看來應該是oracle的一個bug,但是如何解決呢?繼續找 找到

MOS 上给的解决方案是重启所有节点上的instance.
Solution
Workaround:
Bounce all instances in the RAC cluster.

 

4.      上午10點左右有人反映無法打開MES 查看報錯  ORA-12516: TNS: 监听程序找不到符合协议堆栈要求的可用处理程' 出現這個問題是因為操作系統已經達到最大進程數,查詢sfc3rac1的進程數

select count(*) from v$process;     count=1980   結果在(當前進程1980上下)

select value from v$parameter where name = 'processes'; value=2000(进程数的上限)

已經達到進程上限,看來需要儘快將二號節點修復。不然將影響應用。

5.      發緊急停機mail通知用戶 11:40—12:10  數據庫停機維護

將一號節點shutdown immediate;  正常

將一號節點 startup;                           正常

將二號節點 startup;                           正常

觀察一段事件一切正常, 12:00問題解決。

6.      如metalink上所述 ORA-00600[15764] 和parallel query 有关,應該是oracle的一個bug(10g,11g) 需要重啟所有節點解決。 沒有證據表面次問題和創建索引失敗有關(經過檢查smon已經處理好善後工作)還需要查看更多的信息。

7.      終於體會到DBA的壓力了!不斷的學習 積累知識和經驗 才能扛得住!

 

 

附上一些log:

2011-09-06  23:40開始出現N多ora-00600錯誤

Alert.log

ORA-00600: internal error code, arguments: [15764], [], [], [], [], [], [], []
Wed Sep 7 00:26:42 2011
Errors in file /apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:
ORA-00600: internal error code, arguments: [15764], [], [], [], [], [], [], []
Wed Sep 7 00:26:44 2011
Trace dumping is performing id=[cdmp_20110907002644]
Wed Sep 7 00:27:10 2011
Errors in file /apps/oracle/admin/sfc3db/udump/sfc3db2_ora_2756730.trc:
ORA-00600: internal error code, arguments: [15764], [], [], [], [], [], [], []
Wed Sep 7 00:28:11 2011
Errors in file /apps/oracle/admin/sfc3db/udump/sfc3db2_ora_2756730.trc:
ORA-00600: internal error code, arguments: [15764], [], [], [], [], [], [], []

 

Os.log

 

sfc3rac2:/home/oracle$ errpt

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION

A924A5FC   0907001311 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED

A924A5FC   0907000211 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED

A924A5FC   0906234011 P S SYSPROC        SOFTWARE PROGRAM ABNORMALLY TERMINATED

 

詳細OSlog

sfc3rac2:/home/oracle$ errpt -aj A924A5FC|more

---------------------------------------------------------------------------

LABEL:          CORE_DUMP

IDENTIFIER:     A924A5FC

 

Date/Time:       Wed Sep  7 00:13:35 GMT+08:00 2011

Sequence Number: 2190

Machine Id:      00C8B9724C00

Node Id:         sfc3rac2

Class:           S

Type:            PERM

WPAR:            Global

Resource Name:   SYSPROC

 

Description

SOFTWARE PROGRAM ABNORMALLY TERMINATED

 

Probable Causes

SOFTWARE PROGRAM

 

User Causes

USER GENERATED SIGNAL

 

        Recommended Actions

        CORRECT THEN RETRY

 

Failure Causes

SOFTWARE PROGRAM

 

        Recommended Actions

        RERUN THE APPLICATION PROGRAM

        IF PROBLEM PERSISTS THEN DO THE FOLLOWING

        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
  6
USER'S PROCESS ID:
  5452002
FILE SYSTEM SERIAL NUMBER
  14
INODE NUMBER
  931776
CORE FILE NAME
/apps/oracle/admin/sfc3db/cdump/core_5452002/core
PROGRAM NAME
oracle
STACK EXECUTION DISABLED
  0
COME FROM ADDRESS REGISTER
splay 0

PROCESSOR ID
  hw_fru_id: 0
  hw_cpu_id: 1

ADDITIONAL INFORMATION
slcra 3EC
??
ksedmp 330
ksfdmp 1C
kgeriv 10C
kgesiv 84
ksesic0 64
kxfprigdb 2F0
kxfpssg 3D4
kxfpqsod_ 610
kxfpqsod 31C
kxfrdel 1D0
kxsffir 244
kxsFreeWo 100
kxsFreeEx 2C
kksumc 8D8
opifch2 1C9C
opiefn0 3E4
opiefn E0
opiodr AE4
ttcpip 1024
opitsk 1128
opiino 994
opiodr AE4
opidrv 488
sou2o 94
opimai_re 1C0
main 9C
__start 74

Symptom Data
REPORTABLE
1
INTERNAL ERROR
0
SYMPTOM CODE
PCSS/SPI2 FLDS/oracle SIG/6 FLDS/slcra VALU/3ec

 

查看trc 文件中的有用信息

 

/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: calls aborted: 0, num est exec limit hit: 0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:1
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECONNECT', 2 , 'PREPARSE', 4 , 0), decode(failover_type, NULL, 1 , 'NONE', 1 , 'SESSION', 2, 'SELECT', 4, 1), failover_retries, failover_delay, flags from service$ where name = :1
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('USERENV', 'DB_UNIQUE_NAME'), SYS_CONTEXT('USERENV', 'INSTANCE_NAME'), SYS_CONTEXT('USERENV', 'SERVICE_NAME'), INSTANCE_NUMBER, STARTUP_TIME, SYS_CONTEXT('USERENV', 'DB_DOMAIN') from v$instance where INSTANCE_NAME=SYS_CONTEXT('USERENV', 'INSTANCE_NAME')
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=select value$ from props$ where name = 'GLOBAL_DB_NAME'
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=select privilege#,level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: name=select privilege# from sysauth$ where (grantee#=:1 or grantee#=1) and privilege#>0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: parallel_execution_enabled = false
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: bitmap_merge_area_size = 1048576
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: _enable_type_dep_selectivity = true
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: _optimizer_sortmerge_join_enabled = true
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: _optimizer_correct_sq_selectivity = true
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: _update_bji_ipdml_enabled = 0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: _optimizer_complex_pred_selectivity = true
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: _optimizer_sortmerge_join_inequality = true
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: Chunk 70000007c60e920 sz= 48 freeable "execution fro i"
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: sqltxt(70000008e304be8)=select privilege#,level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: sqltxt(70000008e308988)=select privilege# from sysauth$ where (grantee#=:1 or grantee#=1) and privilege#>0
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: sqltxt(70000008e3739a8)=select value$ from props$ where name = 'GLOBAL_DB_NAME'
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('USERENV', 'DB_UNIQUE_NAME'), SYS_CONTEXT('USERENV', 'INSTANCE_NAME'), SYS_CONTEXT('USERENV', 'SERVICE_NAME'), INSTANCE_NUMBER, STARTUP_TIME, SYS_CONTEXT('USERENV', 'DB_DOMAIN') from v$instance where INSTANCE_NAME=SYS_CONTEXT('USERENV', 'INSTANCE_NAME')
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:select decode(failover_method, NULL, 0 , 'BASIC', 1, 'PRECONNECT', 2 , 'PREPARSE', 4 , 0), decode(failover_type, NULL, 1 , 'NONE', 1 , 'SESSION', 2, 'SELECT', 4, 1), failover_retries, failover_delay, flags from service$ where name = :1
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc: sqltxt(70000008e39b380)=update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:1
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:boolean kmgs_spfile_update_ [11022E910, 11022E914) = 00000000
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:word kcf_exec_read_count_ [11022ED98, 11022ED9C) = 0000024D
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:1100004C0 3730362E 74726300 00000000 00000000 [706.trc.........]
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:word xsaggrmergethresh_ [11022D500, 11022D504) = 00000558
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:word xsoqshareexecutors_ [11022D61C, 11022D620) = 00000591
/apps/oracle/admin/sfc3db/udump/sfc3db2_ora_5054706.trc:word xsoqsqlenableawqdrmerge_ [11022D618, 11022D61C) = 0000059B

 

METALINK 关于ORA-00600[15764]文件原文

Parallel Query Fails with ORA-600 [15764] [ID 839536.1]


 Modified 13-SEP-2011     Type PROBLEM     Status MODERATED 

In this Document
  Symptoms
  Cause
  Solution
  References


 

This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.3 to 11.2.0.1 - Release: 10.2 to 11.2
Information in this document applies to any platform.

Symptoms

A select statement running in parallel failed with ORA-600 [15764].

 

The tracefile showed the following call stack:

 

----- Call Stack Trace -----
kxfprigdb <- kxfpqrgdb <- kxfxgs <- kxfxcw <- qerpxFetch <- opifch2 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main <- start

 

The tracefile also showed the process state was busy holding a child latch:

 

=================================================== 
PROCESS STATE 
------------- 
Process global information: 
     process: 7000004748eaa00, call: 700000443d33870, xact: 0, curses: 700000474e203c8, usrses: 700000474e203c8 
  ---------------------------------------- 
  SO: 7000004748eaa00, type: 2, owner: 0, flag: INIT/-/-/0x00 
  (process) Oracle pid=157, calls cur/top: 700000443d33870/700000443d33870, flag: (0) - 
            int error: 0, call error: 0, sess error: 0, txn error 0 
  (post info) last post received: 0 0 249 
              last post received-location: kxfprienq: QC 
              last process to post me: 700000474914f40 186 0 
              last post sent: 0 0 250 
              last post sent-location: kxfprienq: slave 
              last process posted by me: 700000474914f40 186 0 
    (latch info) wait_event=0 bits=10 
      holding    (efd=7) 700000472e4a838 Child process queue reference level=4 child#=99   
      Location from where latch is held: kxfprigdb: KSLBEGIN: addr qref  <---
        Context saved from call: 504403177372952360 
        state=busy, wlstate=free  <----
    Process Group: DEFAULT, pseudo proc: 700000474a384f0 
    O/S info: user: oracle, term: UNKNOWN, ospid: 2351144 
    OSD pid info: Unix process pid: 2351144, image: oraclePPSOLTP1@psoldbap003

 

Cause

The error is caused by unpublished Bug 9825798: "DBMV2:SQL EXEC PART:ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [15764]," which was closed as unpublished Bug 9792010: "DBMV2: STARETL_EXADATA: INSTANCE TERMINATES AFTER MULTIPLE PQ ERRORS." 

This error has been seen as early as 10.2.0.3 and is fixed by the 11.2.0.2 patch set.  The error is not RAC specific.


Solution

The following solutions are available:

  1. Upgrade to 12.1 (Future Release).
  2. Apply the 11.2.0.2 Server Patch Set.
  3. Apply the 11.2.0.1 Bundle Patch 8 (Exadata Database only).
  4. Apply one-off Patch 9792010 if available for your RDBMS version and OS platform.
  5. Workaround:
    Increase parallel message size to avoid join message fragmentation.
    The join message will be sent in one message buffer making the join atomic
    from the QC point of view.


 

Note:  Although the patch name contains the word "Exadata," this is a generic RDBMS parallel execution bug and is not exclusive to Exadata databases.


 

References

NOTE:9792010.8 - Bug 9792010 - Dumps / ORA-600 in PQ slaves due to race condition

Show Related InformationRelated


Products
  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
Keywords
PARALLEL EXECUTION; PARALLEL QUERY; FAILED ACTIVITYErrors
ORA-600[15764]; ORA-600[[15764]; ERROR 0

原创粉丝点击