wait for a undo record模拟和解决
来源:互联网 发布:软件测试未来发展方向 编辑:程序博客网 时间:2024/04/30 15:32
今天在生产环境的数据库被hang住,检查当时数据库的等待事件:wait for a undo record,并且有大量的的Txn Recovery Start。现在模拟和记录下这个问题的解决思路和过程
1.故障模拟
SQL> create tablespace ydtbs datafile '/u01/oradata/orcl/dytbs01.dbf' size 5000M;
Tablespace created.
SQL> create table test tablespace ydtbs as select * from dba_objects;
Table created
SQL> insert into test select * from test;
15919040 rows created.
Elapsed: 00:08:20.53
SQL>
SQL> select count(*) from test;
COUNT(*)
----------
31838080
在另一个会话里面:
SQL> col sql_text format a50
SQL> l
1 select SQL_ID,HASH_VALUE,sql_text from v$sql
2* where sql_text like '%delete%test%'
SQL> /
SQL_ID HASH_VALUE SQL_TEXT
------------- ---------- --------------------------------------------------
b11zktbz980db 4271112619 select SQL_ID,HASH_VALUE,sql_text from v$sql where
sql_text like '%delete%test%'
cb5m2z04qzgzn 158318580 delete from test
SQL> select sid,serial# from v$session where sql_hash_value = 158318580;
SID SERIAL#
---------- ----------
159 3
SQL> alter system kill session '159,3';
第一个会话
SQL> delete from test;
delete from test
*
ERROR at line 1:
ORA-00028: your session has been killed
在第二个会话检查等待事件:
SQL> col event format a50
SQL> select sid, event, p1,p2,p3 from v$session_wait;
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
142 wait for a undo record 0 0 0
143 db file sequential read 5 59452 1
145 db file sequential read 5 59450 1
146 SQL*Net message from client 1650815232 1 0
147 SQL*Net message to client 1650815232 1 0
148 db file sequential read 5 59451 1
159 SQL*Net message from client 1650815232 1 0
160 rdbms ipc message 100 0 0
161 rdbms ipc message 300 0 0
162 rdbms ipc message 300 0 0
163 rdbms ipc message 500 0 0
164 rdbms ipc message 180000 0 0
165 wait for stopper event to be increased 0 0 0
166 log file parallel write 1 1906 1
170 pmon timer 300 0 0
21 rows selected.
SQL>
SQL> select pid, state, undoblocksdone from v$fast_start_servers;
PID STATE UNDOBLOCKSDONE
---------- ----------- --------------
21 RECOVERING 6407
22 RECOVERING 0
23 RECOVERING 0
24 RECOVERING 0
SQL> select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL='DEAD';
ADDR KTUXEUSN KTUXESLT KTUXESQN KTUXESIZ
-------- ---------- ---------- ---------- ----------
B7F38D9C 8 35 278 1116399
SQL> select pid, state, undoblocksdone from v$fast_start_servers;
PID STATE UNDOBLOCKSDONE
---------- ----------- --------------
17 RECOVERING 8686
18 IDLE 0
19 IDLE 0
20 IDLE 0
SQL> col event format a50
select sid, event, p1,p2,p3 from v$session_wait;
SQL>
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
145 Streams AQ: waiting for time management or cleanup 0 0 0
tasks
148 Streams AQ: qmn slave idle wait 0 0 0
151 Streams AQ: qmn coordinator idle wait 0 0 0
152 db file sequential read 2 100220 1
153 wait for a undo record 0 0 0
154 db file sequential read 5 41930 1
155 wait for a undo record 0 0 0
158 SQL*Net message from client 1650815232 1 0
159 SQL*Net message to client 1650815232 1 0
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
160 rdbms ipc message 100 0 0
161 rdbms ipc message 300 0 0
162 rdbms ipc message 500 0 0
163 rdbms ipc message 180000 0 0
164 wait for stopper event to be increased 0 0 0
165 rdbms ipc message 300 0 0
166 rdbms ipc message 152 0 0
167 rdbms ipc message 300 0 0
168 rdbms ipc message 300 0 0
169 rdbms ipc message 300 0 0
170 pmon timer 300 0 0
20 rows selected.
2.故障解决
解决思路:
wait for a undo record是cpu并行恢复的资源争用undo记录的等待,调整相关参数
fast_start_parallel_rollback 为false,进行串行恢复。
调整参数 fast_start_parallel_rollback 为false,这样回滚的进程就只会有一个,速度有时候反而比并行回滚还快。
SQL> show parameter fast_start_parallel_rollback
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
SQL>
alter system set fast_start_parallel_rollback=false;
关闭并行恢复,启用串行恢复
SQL> alter system set fast_start_parallel_rollback=false;
System altered.
SQL> show parameter fast_start_parallel_rollback
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string FALSE
SQL> set linesize 300
SQL> select sid, event, p1,p2,p3 from v$session_wait
SQL> /
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
170 pmon timer 72 0 0
169 rdbms ipc message 198 0 0
160 rdbms ipc message 300 0 0
161 rdbms ipc message 76 0 0
167 rdbms ipc message 239 0 0
163 rdbms ipc message 1152 0 0
165 rdbms ipc message 258 0 0
166 rdbms ipc message 241 0 0
168 rdbms ipc message 207 0 0
162 rdbms ipc message 300 0 0
159 db file sequential read 1 5226 1
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
164 db file sequential read 5 35991 1
156 SQL*Net message to client 1650815232 1 0
13 rows selected.
SQL> select pid, state, undoblocksdone from v$fast_start_servers;
no rows selected
问题解决
-----------
结论:
1.大事务运行过程中被异常终止是一件比较严重的事情,死事务的回滚可能会占严重消耗的系统资源。
2.cpu争用回滚段造成等待事件:wait for a undo record
3. fast_start_parallel_rollback参数
false -- 即是关闭parallel rollback功能
low -- 也是10g的默认值,该值含有是最大的rollback进程为2*cpu_count个
high -- 当设置为该值时,最大的rollback进程为4*cpu_count个
参数其最大值是要受参数parallel_max_servers的限制的,如果是rac环境,那么还跟参数parallel_threads_per_cpu有关系,这里需要说明一点的是,该参数跟recovery_parallelism不同的,recovery_parallelism参数是指在进行instance crash recovery时的并行恢复进程个数。
1.故障模拟
SQL> create tablespace ydtbs datafile '/u01/oradata/orcl/dytbs01.dbf' size 5000M;
Tablespace created.
SQL> create table test tablespace ydtbs as select * from dba_objects;
Table created
SQL> insert into test select * from test;
15919040 rows created.
Elapsed: 00:08:20.53
SQL>
SQL> select count(*) from test;
COUNT(*)
----------
31838080
在另一个会话里面:
SQL> col sql_text format a50
SQL> l
1 select SQL_ID,HASH_VALUE,sql_text from v$sql
2* where sql_text like '%delete%test%'
SQL> /
SQL_ID HASH_VALUE SQL_TEXT
------------- ---------- --------------------------------------------------
b11zktbz980db 4271112619 select SQL_ID,HASH_VALUE,sql_text from v$sql where
sql_text like '%delete%test%'
cb5m2z04qzgzn 158318580 delete from test
SQL> select sid,serial# from v$session where sql_hash_value = 158318580;
SID SERIAL#
---------- ----------
159 3
SQL> alter system kill session '159,3';
第一个会话
SQL> delete from test;
delete from test
*
ERROR at line 1:
ORA-00028: your session has been killed
在第二个会话检查等待事件:
SQL> col event format a50
SQL> select sid, event, p1,p2,p3 from v$session_wait;
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
142 wait for a undo record 0 0 0
143 db file sequential read 5 59452 1
145 db file sequential read 5 59450 1
146 SQL*Net message from client 1650815232 1 0
147 SQL*Net message to client 1650815232 1 0
148 db file sequential read 5 59451 1
159 SQL*Net message from client 1650815232 1 0
160 rdbms ipc message 100 0 0
161 rdbms ipc message 300 0 0
162 rdbms ipc message 300 0 0
163 rdbms ipc message 500 0 0
164 rdbms ipc message 180000 0 0
165 wait for stopper event to be increased 0 0 0
166 log file parallel write 1 1906 1
170 pmon timer 300 0 0
21 rows selected.
SQL>
SQL> select pid, state, undoblocksdone from v$fast_start_servers;
PID STATE UNDOBLOCKSDONE
---------- ----------- --------------
21 RECOVERING 6407
22 RECOVERING 0
23 RECOVERING 0
24 RECOVERING 0
SQL> select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL='DEAD';
ADDR KTUXEUSN KTUXESLT KTUXESQN KTUXESIZ
-------- ---------- ---------- ---------- ----------
B7F38D9C 8 35 278 1116399
SQL> select pid, state, undoblocksdone from v$fast_start_servers;
PID STATE UNDOBLOCKSDONE
---------- ----------- --------------
17 RECOVERING 8686
18 IDLE 0
19 IDLE 0
20 IDLE 0
SQL> col event format a50
select sid, event, p1,p2,p3 from v$session_wait;
SQL>
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
145 Streams AQ: waiting for time management or cleanup 0 0 0
tasks
148 Streams AQ: qmn slave idle wait 0 0 0
151 Streams AQ: qmn coordinator idle wait 0 0 0
152 db file sequential read 2 100220 1
153 wait for a undo record 0 0 0
154 db file sequential read 5 41930 1
155 wait for a undo record 0 0 0
158 SQL*Net message from client 1650815232 1 0
159 SQL*Net message to client 1650815232 1 0
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
160 rdbms ipc message 100 0 0
161 rdbms ipc message 300 0 0
162 rdbms ipc message 500 0 0
163 rdbms ipc message 180000 0 0
164 wait for stopper event to be increased 0 0 0
165 rdbms ipc message 300 0 0
166 rdbms ipc message 152 0 0
167 rdbms ipc message 300 0 0
168 rdbms ipc message 300 0 0
169 rdbms ipc message 300 0 0
170 pmon timer 300 0 0
20 rows selected.
2.故障解决
解决思路:
wait for a undo record是cpu并行恢复的资源争用undo记录的等待,调整相关参数
fast_start_parallel_rollback 为false,进行串行恢复。
调整参数 fast_start_parallel_rollback 为false,这样回滚的进程就只会有一个,速度有时候反而比并行回滚还快。
SQL> show parameter fast_start_parallel_rollback
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
SQL>
alter system set fast_start_parallel_rollback=false;
关闭并行恢复,启用串行恢复
SQL> alter system set fast_start_parallel_rollback=false;
System altered.
SQL> show parameter fast_start_parallel_rollback
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string FALSE
SQL> set linesize 300
SQL> select sid, event, p1,p2,p3 from v$session_wait
SQL> /
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
170 pmon timer 72 0 0
169 rdbms ipc message 198 0 0
160 rdbms ipc message 300 0 0
161 rdbms ipc message 76 0 0
167 rdbms ipc message 239 0 0
163 rdbms ipc message 1152 0 0
165 rdbms ipc message 258 0 0
166 rdbms ipc message 241 0 0
168 rdbms ipc message 207 0 0
162 rdbms ipc message 300 0 0
159 db file sequential read 1 5226 1
SID EVENT P1 P2 P3
---------- -------------------------------------------------- ---------- ---------- ----------
164 db file sequential read 5 35991 1
156 SQL*Net message to client 1650815232 1 0
13 rows selected.
SQL> select pid, state, undoblocksdone from v$fast_start_servers;
no rows selected
问题解决
-----------
结论:
1.大事务运行过程中被异常终止是一件比较严重的事情,死事务的回滚可能会占严重消耗的系统资源。
2.cpu争用回滚段造成等待事件:wait for a undo record
3. fast_start_parallel_rollback参数
false -- 即是关闭parallel rollback功能
low -- 也是10g的默认值,该值含有是最大的rollback进程为2*cpu_count个
high -- 当设置为该值时,最大的rollback进程为4*cpu_count个
参数其最大值是要受参数parallel_max_servers的限制的,如果是rac环境,那么还跟参数parallel_threads_per_cpu有关系,这里需要说明一点的是,该参数跟recovery_parallelism不同的,recovery_parallelism参数是指在进行instance crash recovery时的并行恢复进程个数。
关于fast_start_parallel_rollback相关学习,请参考roger大师的blog,具体链接如下:
http://www.killdb.com/2011/10/07/%E5%85%B3%E4%BA%8Eparallel-rollback%E7%9A%84%E4%B8%80%E7%82%B9%E6%80%BB%E7%BB%93.html
=========================================================================
-- The End --
- wait for a undo record模拟和解决
- wait for a undo record
- wait for a undo record等待事件
- wait for a undo record 等待事件
- oracle 数据库服务器断电重启导致数据库性能急剧下降之等待事件wait for a undo record
- A record for OF installation
- The record for a starter making a printed circuit board.
- UPDATE FOR WAIT 和 UPDATE FOR NOWAIT
- Record和Packed record
- 用wait和notify模拟阻塞队列
- 模拟阻塞队列-使用notify和wait
- 使用wait和notify模拟queue
- MYSQL REPLACE和WAIT FOR
- Wait for
- for record in file
- record for read ·
- Wait for me和waitting for me有什么区别?
- Just a Daily Record
- Fuck MySQL (2) Linux环境下MySQL用户登陆命令
- 初试picture控件
- 回车和换行的区别
- JS document练习
- 大 学 十 年 励志
- wait for a undo record模拟和解决
- 关系型数据库基础之:组合查询
- 58同城姚劲波:以域名起家 从创业小子到亿万富翁
- 危机过后,雷军为小米寻找新“同盟”?
- UTMS 信令培训(16)
- Linux目录文件操作
- Makefile编写------------实例演示
- 编译Android4.0.4
- C++内存管理