库被HANG住的具体查看（使用HNAGANALYZE）及真正的死锁

来源：互联网发布：淘宝店铺等级怎么升编辑：程序博客网时间：2024/05/29 16:50

session 1

UPDATE DEPT SET DEPT.DEPTNO = 70 WHERE DEPT.DEPTNO = 50;

未提交

session 2

UPDATE DEPT SET DEPT.DEPTNO = 60 WHERE DEPT.DEPTNO = 50;

库被HANG住

session 3

--查看当前会话ID

SELECT SYS_CONTEXT('USERENV','sid') from dual;

138

--查看当前会话产生的跟踪文件路径及名称
SELECT D.VALUE || '\' || LOWER(RTRIM(I.INSTANCE, CHR(0))) || '_ora_' ||
       P.SPID || '.trc' AS "trace_file_name"
FROM (SELECT P.SPID
          FROM V$MYSTAT M, V$SESSION S, V$PROCESS P
         WHERE M.STATISTIC# = 1
           AND S.SID = M.SID
           AND P.ADDR = S.PADDR) P,
       (SELECT T.INSTANCE
          FROM V$THREAD T, V$PARAMETER V
         WHERE V.NAME = 'thread'
           AND (V.VALUE = 0 OR T.THREAD# = TO_NUMBER(V.VALUE))) I,
       (SELECT VALUE FROM V$PARAMETER WHERE NAME = 'user_dump_dest') D;

E:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\UDUMP\orcl_ora_7168.trc

--便于查找跟踪文件

ALTER SESSION SET tracefile_identifier='YJG';

--把HANG住的信息dump到跟踪文件
ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level 3';

查看跟踪文件，内容如下：

Dump file e:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_7168.trc
Fri Apr 26 16:13:06 2013
ORACLE V10.2.0.3.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 3
CPU : 2 - type 586, 2 Physical Cores
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:1332M/3070M, Ph+PgF:2180M/4961M, VA:1048M/2047M
Instance name: orcl

Redo thread mounted by this instance: 1

Oracle process number: 17

Windows thread id: 7168, image: ORACLE.EXE (SHAD)

*** ACTION NAME:(SQL 窗口 - ALTER SESSION SET tra) 2013-04-26 16:13:06.000
*** MODULE NAME:(PL/SQL Developer) 2013-04-26 16:13:06.000
*** SERVICE NAME:(orcl) 2013-04-26 16:13:06.000
*** SESSION ID:(138.2924) 2013-04-26 16:13:06.000
*** 2013-04-26 16:13:06.000
==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/139/2808/0x41e5231c/4836/SQL*Net message from client>
-- <0/136/2803/0x41e5290c/7520/enq: TX - row lock contention>
Other chains found:
Chain 2 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/137/2/0x41e55e7c/7132/wait for unread message on broad>
Chain 3 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/138/2924/0x41e50b5c/7168/No Wait>
Chain 4 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/142/2/0x41e534ec/5128/Streams AQ: waiting for time man>
Chain 5 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/146/625/0x41e4ff7c/4316/jobq slave wait>
Chain 6 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/150/1/0x41e51d2c/6624/Streams AQ: qmn coordinator idle>
Chain 7 : <cnode/sid/sess_srno/proc_ptr/ospid/wait_event> :
    <0/152/40/0x41e53adc/7240/Streams AQ: qmn slave idle wait>
Extra information that will be dumped at higher levels:
[level 4] :   1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] :   6 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 6] :   1 node dumps -- [NLEAF]
[level 10] : 18 node dumps -- [IGN]

State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[132]/0/133/60/0x41f2561c/1896/IGN/1/2//none
[133]/0/134/2935/0x41f26904/864/IGN/3/4//none
[134]/0/135/2/0x41f27bec/5648/IGN/5/6//none
[135]/0/136/2803/0x41f28ed4/7520/NLEAF/7/10/[138]/none
[136]/0/137/2/0x41f2a1bc/7132/SINGLE_NODE/11/12//none
[137]/0/138/2924/0x41f2b4a4/7168/SINGLE_NODE_NW/13/14//none
[138]/0/139/2808/0x41f2c78c/4836/LEAF/8/9//135
[140]/0/141/4/0x41f2ed5c/6368/IGN/15/16//none
[141]/0/142/2/0x41f30044/5128/SINGLE_NODE/17/18//none
[145]/0/146/625/0x41f34be4/4316/SINGLE_NODE/19/20//none
[147]/0/148/24/0x41f371b4/6908/IGN/21/22//none
[149]/0/150/1/0x41f39784/6624/SINGLE_NODE/23/24//none
[151]/0/152/40/0x41f3bd54/7240/SINGLE_NODE/25/26//none
[154]/0/155/3/0x41f3f60c/5104/IGN/27/28//none
[155]/0/156/7/0x41f408f4/6488/IGN/29/30//none
[159]/0/160/1/0x41f45494/400/IGN/31/32//none
[160]/0/161/1/0x41f4677c/2028/IGN/33/34//none
[161]/0/162/1/0x41f47a64/5292/IGN/35/36//none
[162]/0/163/1/0x41f48d4c/5492/IGN/37/38//none
[163]/0/164/1/0x41f4a034/4844/IGN/39/40//none
[164]/0/165/1/0x41f4b31c/5600/IGN/41/42//none
[165]/0/166/1/0x41f4c604/8152/IGN/43/44//none
[166]/0/167/1/0x41f4d8ec/5720/IGN/45/46//none
[167]/0/168/1/0x41f4ebd4/1340/IGN/47/48//none
[168]/0/169/1/0x41f4febc/4668/IGN/49/50//none
[169]/0/170/1/0x41f511a4/7968/IGN/51/52//none
====================
END OF HANG ANALYSIS
====================

说明是行锁竞争引起；

详细介绍HANGANALYZE :http://dev.21tx.com/2007/08/18/10786.html

1.为什么要使用hanganalyze?

　　Oracle数据库“真的”hang住了，可以理解为数据库内部发生死锁。因为普通的DML死锁，oracle服务器会自动监测他们的依赖关系，并回滚其中一个操作，终止这种相互等待的局面。而当这种死锁发生在争夺内核级别的资源（比如说是pins或latches）时，Oracle并不能自动的监测并处理这种死锁。

　　其实很多时候数据库并没有hang住，而只是由于数据库的性能问题，处理的时间比较长而已。

　　Hanganalyze工具使用内核调用检测会话在等待什么资源，报告出占有者和等待者的相互关系。另外，它还会将一些比较”interesting”的进程状态dump出来，这个取决于我们使用hanganalyze的分析级别。

　　使用hanganalyze

　　hanganalyze工具从oracle8i第二版开始提供，到9i增强了诊断RAC环境下的“集群范围”的信息，这意味着它将会报告出整个集群下的所有会话的信息。

　　目前有三种使用hanganalyze的方法:

　　一种是会话级别的：

ALTER SESSION SET EVENTS 'immediate
trace name HANGANALYZE level <level>';

　　一种是实例级别:

ORADEBUG hanganalyze <level>

　　一种是集群范围的：

ORADEBUG setmypid
ORADEBUG setinst all
ORADEBUG -g def hanganalyze <level>

　　先解释下各个level的含义：

　　1-2:只有hanganalyze输出，不dump任何进程

　　3：Level2+Dump出在IN_HANG状态的进程

　　4：Level3+Dump出在等待链里面的blockers(状态为LEAF/LEAF_NW/IGN_DMP)

　　5：Level4+Dump出所有在等待链中的进程（状态为NLEAF）

　　hanganalyze报告会分作许多片断，会话片断信息总是由一个header详尽描述被提取的的会话信息。Oracle8i和9i的信息略有不同：

sid是 Session ID
sess_srno是serial#
proc_ptr是Process Pointer
ospid 是OS Process ID
cnode是Node Id，Oracle9i才用
Nodenum是hanganalyze
自己为了记录这些会话而定制的编号，从0开始排起。
State 是node的状态
Adjlist是临近的node(通常代表一个blocker node)
Predecessor是Predecessor node ,通常代表一个 waiter node

　　接着解释一下比较重要的一些node state:

　　IN_HANG：这表示该node处于死锁状态，通常还有其他node（blocker）也处于该状态

　　LEAF/LEAF_NW：该node通常是blocker。通过条目的”predecessor”列可以判断这个node是否是blocker。LEAF说明该NODE没有等待其他资源，而LEAF_NW则可能是没有等待其他资源或者是在使用CPU.

　　如下的实例说明了node16阻塞了node19的资源：

nodenum]/cnode/sid/sess_srno/session/
ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186

　　NLEAF：通常可以看作这些会话是被阻塞的资源。发生这种情况一般说明数据库发生性能问题而不是数据库hang

　　IGN/IGN_DMP：这类会话通常被认为是空闲会话，除非其adjlist列里存在node。如果是非空闲会话则说明其adjlist里的node正在等待其他node释放资源。

SINGLE_NODE/SINGLE_NODE_NW：近似于空闲会话

　实战hanganalyze:

生成跟踪文件
SQL> oradebug setmypid
已处理的语句
SQL> oradebug hanganalyze 3

　　查看跟踪文件

==============
HANG ANALYSIS:
==============
Open chains found:
Chain 1 : <sid/sess_srno/proc_ptr/ospid/wait_event> :
<5/1/0x843630f8/2531/No Wait>
Chain 2 : <sid/sess_srno/proc_ptr/ospid/wait_event> :
<12/1/0x84364e48/2521/instance state change>

　　有一个等待事件，操作系统进程是2521，经检查，是smon，占用了一个CPU的98％的资源

Other chains found:
Extra information that will be dumped at higher levels:
[level　4] :　 2 node dumps -- [LEAF] [LEAF_NW] [IGN_DMP]
[level 10] :　 6 node dumps -- [IGN]
State of nodes
([nodenum]/sid/sess_srno/session/state/
start/finish/[adjlist]/predecessor):
[0]/1/1/0x84405460/IGN/1/2//none
[1]/2/1/0x84405de0/IGN/3/4//none
[2]/3/1/0x84406760/IGN/5/6//none
[3]/4/1/0x844070e0/IGN/7/8//none
[4]/5/1/0x84407a60/LEAF_NW/9/10//none
[5]/6/1/0x844083e0/IGN/11/12//none
[10]/11/1/0x8440b360/IGN/13/14//none
[11]/12/1/0x8440bce0/LEAF/15/16//none
Dumping System_State and
Fixed_SGA in process with ospid 2521
Dumping Process information
for process with ospid 2531
Dumping Process information
for process with ospid 2521
*** 2004-12-09 00:07:49.680
====================
END OF HANG ANALYSIS
====================

　　我们看到，并没有发生死锁，Smon进程忙，说明在正确释放资源。

真正的死锁从一道OCP试题讲解

The session of user SCOTT receives the following error after executing an UPDATE
command on the EMP table:

ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
On investigation, you find that a session opened by user JIM has a transaction that caused the deadlock.

Which two statements are true regarding the session of SCOTT in this scenario? (Choose two.)
选项
A.The session is terminated after receiving the error and JIM can continue with his transaction.

B.SCOTT should perform a COMMIT or ROLLBACK to allow JIM to continue with his transaction.

C.The session is rolled back after receiving the error and JIM can continue with his transaction.

D.SCOTT has to reexecute the last command in the transaction after he commits the transaction.

哪个用户发现的死锁，哪个用户需要提交或回滚；并且最后一条事务需要重新执行才能成功；
数据库hang住，是使用的资源别人也在使用，而死锁是你使用别人正在使用的资源时，别人也想使用你正在使用的资源；

所以此题的答案是B和D

库被HANG住的具体查看（使用HNAGANALYZE） 及真正的死锁

库被HANG住的具体查看（使用HNAGANALYZE）及真正的死锁