Summary of UNDO-Related Issues (Part 3)

Source: Internet; published by 十金数据下载; edited by 程序博客网; time: 2024/05/05 07:29

Reposted from http://blog.csdn.net/oradh/article/details/24966437

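After coming back from the May 1 holiday, I realized that one common undo-related problem had not yet been covered in this series: enq: US - contention. This post gives a brief description of that class of problem.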

Problem description

This event indicates the session is currently waiting on the Undo Segments. In other words, the transaction run by the session is queued, waiting for an undo segment.

Common causes

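1. Insufficient undo space. When a transaction requests undo space to store before-image data and the undo tablespace has no space available to allocate, the session joins the wait queue.

2. Too many active sessions. When a transaction requests undo space to store before-image data and the locks used for allocating undo space are all held, the session joins the wait queue. In this situation a check usually shows a very large number of online undo segments, and a single undo segment may be shared by several transactions.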
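
The two causes can often be told apart with a quick look at the live system. The queries below are a minimal sketch rather than part of the original post; they assume Oracle 10g or later and SELECT access to v$session, v$transaction, and dba_rollback_segs.

```sql
-- Sessions currently waiting on the event.
select sid, serial#, username, sql_id, seconds_in_wait
  from v$session
 where event = 'enq: US - contention';

-- Number of undo segments per tablespace and status (ONLINE, OFFLINE, ...).
select tablespace_name, status, count(*) as segments
  from dba_rollback_segs
 group by tablespace_name, status
 order by tablespace_name, status;

-- Undo segments that are shared by more than one active transaction.
select xidusn, count(*) as active_transactions
  from v$transaction
 group by xidusn
having count(*) > 1
 order by active_transactions desc;
```

A large ONLINE count combined with several transactions per segment points toward the second cause; few waiters and a nearly full undo tablespace point toward the first.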

Solutions (for reference only)

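1. Identify the specific cause. Query the usage of the undo tablespace to determine whether the problem comes from insufficient undo space.

2. If insufficient undo space is confirmed, add a data file to the undo tablespace as an emergency measure to relieve the problem, then continue with the next step to find the root cause.

3. Query the undo usage of each session in the system (v$session, v$transaction), and check whether any transaction is being rolled back (v$fast_start_transactions or v$fast_start_servers) or whether a dead transaction is being rolled back (x$ktuxe).

4. Once the specific process has been identified, resolve it with the methods described in my previous post ("Summary of UNDO-Related Issues (Part 2)", section "Ways to speed up transaction rollback (for reference only)"). Finally, brief the application, development, and operations staff on this kind of problem so that it does not happen again.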

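5. Determine whether the enq: US - contention waits are caused by too many active sessions. This usually has to be judged from several angles, as follows (for reference only):

a. enq: US - contention waits caused by too many active sessions are generally a victim, a knock-on effect of some other problem. For example, a highly concurrent SQL statement suddenly becomes less efficient, which lengthens the whole transaction, which in turn increases the number of active transactions (sessions) in the database and may trigger enq: US - contention waits.
b. The enq: US - contention wait itself consumes no CPU or I/O, yet at such times CPU or I/O resources on the database server are usually under pressure.
c. The database wait profile usually shows a large number of other waits as well, for example row cache, latch, scattered read, RAC-related waits, and so on.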

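6. If the enq: US - contention waits are judged to be caused by too many active sessions, the previous step tells us that the fix is to resolve the source of the problem (the bottleneck). Because the source can show up in many different ways, it has to be analyzed case by case and is outside the scope of this post.

Note: if the symptoms have already disappeared, the source mentioned in step 6 can be analyzed from the AWR and ASH reports covering the few minutes before the enq: US - contention waits first appeared (the source comes first and the enq: US - contention waits follow from it).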
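
As a complement to the AWR and ASH reports, the retained ASH samples can be queried directly for the window just before the waits began. This is only a sketch under stated assumptions: the two timestamps are hypothetical placeholders for the minutes preceding the first enq: US - contention sample, and dba_hist_active_sess_history requires the Diagnostics Pack license.

```sql
-- Hypothetical window: replace both timestamps with the minutes just
-- before enq: US - contention first appeared on your system.
select nvl(event, 'ON CPU') as event,
       sql_id,
       count(*) as samples
  from dba_hist_active_sess_history
 where sample_time between timestamp '2014-05-04 09:50:00'
                       and timestamp '2014-05-04 10:00:00'
 group by nvl(event, 'ON CPU'), sql_id
 order by samples desc;
```

The events and SQL IDs at the top of the result are candidates for the source problem; enq: US - contention itself would be expected to rank low or be absent in this earlier window.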

Queries (for reference only)

1. Query for undo tablespace usage:

select b.tablespace_name,
       nvl(used_undo, 0) "USED_UNDO(M)",
       total_undo "Total_undo(M)",
       trunc(nvl(used_undo, 0) / total_undo * 100, 2) || '%' used_PCT
  from (select nvl(sum(bytes / 1024 / 1024), 0) used_undo, tablespace_name
          from dba_undo_extents
         where status in ('ACTIVE', 'UNEXPIRED')
         group by tablespace_name) a,
       (select tablespace_name, sum(bytes / 1024 / 1024) total_undo
          from dba_data_files
         where tablespace_name in
               (select value
                  from v$spparameter
                 where name = 'undo_tablespace'
                   and (sid = (select instance_name from v$instance) or
                        sid = '*'))
         group by tablespace_name) b
 where a.tablespace_name (+) = b.tablespace_name;

2. Query for the amount of undo used by each session:

SELECT r.name rbs,
       nvl(s.username, 'None') oracle_user,
       s.osuser client_user,
       p.username unix_user,
       s.sid,
       s.serial#,
       p.spid unix_pid,
       s.MACHINE,
       s.PROGRAM,
       s.MODULE,
       -- used_ublk is counted in blocks; convert to MB with db_block_size
       t.used_ublk * TO_NUMBER(x.value) / 1024 / 1024 as undo_mb,
       TO_CHAR(s.logon_time, 'mm/dd/yy hh24:mi:ss') as login_time,
       TO_CHAR(sysdate - (s.last_call_et) / 86400, 'mm/dd/yy hh24:mi:ss') as last_txn,
       t.START_TIME transaction_starttime
  FROM v$process p,
       v$rollname r,
       v$session s,
       v$transaction t,
       v$parameter x
 WHERE s.taddr = t.addr
   AND s.paddr = p.addr
   -- the original "(+)" on t.xidusn had no effect, because s.taddr = t.addr
   -- already restricts the result to sessions with a transaction
   AND r.usn = t.xidusn
   AND x.name = 'db_block_size'
 ORDER by undo_mb desc;

3. Queries for transactions being rolled back and their rollback progress:

Query 1.

alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';

select usn, state, undoblockstotal "Total", undoblocksdone "Done",
       undoblockstotal-undoblocksdone "ToDo",
       decode(cputime, 0, 'unknown',
              sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
  from v$fast_start_transactions;

Query 2.

select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta
  from x$ktuxe
 where ktuxecfl = 'DEAD';
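
When only a rough progress figure is needed, a simpler variant of the v$fast_start_transactions query can report the percentage of undo blocks already applied. This is a sketch, not part of the original post; the pct_done alias is made up for illustration, and nullif is used so that a row with a zero block total returns NULL instead of raising a divide-by-zero error.

```sql
-- Rollback progress per recovering transaction, as a percentage.
select usn,
       state,
       undoblockstotal,
       undoblocksdone,
       round(100 * undoblocksdone / nullif(undoblockstotal, 0), 2) as pct_done
  from v$fast_start_transactions;
```

For dead transactions seen in x$ktuxe, running the second query above twice a few minutes apart and comparing ktuxesiz gives a similar idea of how fast the remaining undo is shrinking.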

MOS document

How to correct performance issues with enq: US - contention related to undo segments [ID 1332738.1]

Purpose

Assist in correcting performance issues related to "enq: US - contention" on undo segments.
You have many offline undo segments and the workload starts to online many undo segments over a short period of time. This can lead to high 'latch: row cache objects' contention on dc_rollback_segments, together with high 'enq: US - contention' waits, when using system managed undo with an auto tuned undo retention period.
Sessions attempting to online undo segments should show ktusmous_online_undoseg() in their call stack.
Another aspect of the problem can be long running queries, which can raise tuned_undoretention to very high values and exhaust the undo tablespace, resulting in ORA-1628.
A real world case:
A query is being executed and some rows are fetched from the cursor and then the user stops working on that query (e.g. does not press the "next" button on the application screen) and works on something else (e.g. in a different window). After some time the user continues working on the query ... auto-tune starts tracking the query from this point and the maxquerylen is quite large now, hence also the tuned_undoretention (that depends directly on the maxquerylen).
NOTE: The Siebel application can allow for this problem to happen.
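
Whether long running queries are pushing the tuned retention up can be checked from v$undostat, where each row summarizes one ten-minute interval. The query below is a minimal sketch added for illustration, not part of the MOS note; it assumes Oracle 10g or later, where the TUNED_UNDORETENTION column is available.

```sql
alter session set nls_date_format = 'mm/dd/yy hh24:mi:ss';

-- maxquerylen and tuned_undoretention are both in seconds;
-- nospaceerrcnt counts failed requests for undo space in the interval.
select begin_time,
       end_time,
       maxquerylen,
       maxqueryid,
       tuned_undoretention,
       nospaceerrcnt
  from v$undostat
 order by begin_time desc;
```

A tuned_undoretention value that climbs in step with maxquerylen, followed by a nonzero nospaceerrcnt, matches the scenario described above.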

Last Review Date

June 24, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

The wait event "enq: US - contention" is associated with contention on the latch in the row cache (dc_rollback_seg). Enqueue US - contention can become a bottleneck for performance if the workload dictates that a lot of offlined undo segments must be onlined over a short period of time. The latch on the row cache can be unable to keep up with the workload.
This can happen for a number of reasons and some scenarios are legitimate workload demands.
Solution:
Ensure that peaks in onlined undo segments do not happen (see workaround #2). That is not always feasible.
Workarounds:
1. Bounce the instance.
2. Set _rollback_segment_count to a high number to keep undo segments online.
alter system set "_rollback_segment_count"=;
3. Set _undo_autotune to false
alter system set "_undo_autotune" = false;
NOTE: Simply using _smu_debug_mode=33554432 may not be enough to stop the problem, but it is a valid fix for bug 5387030.
4. A fix to bug 7291739 is to set a new hidden parameter, _highthreshold_undoretention to set a high threshold for undo retention completely distinct from maxquerylen.
alter system set "_highthreshold_undoretention"=;
If problems persist, please file a Service Request with Oracle Support.
Diagnosis

Should the workarounds and/or configuration changes not help to alleviate the problems, development would need the following diagnostics data:

a. Provide alert.log which shows the last instance startup parameters through the time of the latest issues.

b. AWR and/or ASH report of 30 or 60 minutes interval.

c. Following query output:

alter session set nls_date_format='mm/dd/yy hh24:mi:ss';
select begin_time, MAXQUERYID, MAXQUERYLEN from v$undostat;

d. While the error is ongoing:

On single instance:

sqlplus / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump systemstate 266

wait for 5 seconds

oradebug dump systemstate 266

wait for 2 minutes

sqlplus / as sysdba
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 3
oradebug dump systemstate 266

wait for 5 seconds

oradebug dump systemstate 266

On RAC get tracing on all nodes

sqlplus / as sysdba
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266

wait for 5 seconds

oradebug -g all dump systemstate 266

wait for 2 minutes

sqlplus / as sysdba
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266

wait for 5 seconds

oradebug -g all dump systemstate 266

0 0