High "gc buffer busy acquire" wait after Upgrade to 11.2 (Doc ID 2047582.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Information in this document applies to any platform.
SYMPTOMS
After a database was migrated (using export/import) from a 10.2.0.4 RAC cluster to a new cluster with Grid Infrastructure 12.1.0.2 and RDBMS 11.2.0.4, customer observed "gc buffer busy acquire" as the top wait event whenever the load increased significantly.
This behaviour was not observed in the "old" 10.2 environment; there was no change in either the amount of data or the application.
AWR report from instance 1 shows the following during times of high load:
Top 10 Foreground Events by Total Wait Time
The AWR report from instance 2 shows similar performance:
Top 10 Foreground Events by Total Wait Time
The top SQL looks like:
SELECT * FROM SPROD.LOKS WHERE...
It is a SELECT ... FOR UPDATE statement, and there are many similar SQL statements.
The database alert log shows that NUMA is enabled at the OS level:
NUMA status: NUMA system w/ 2 process groups
cellaffinity.ora status: cannot find affinity map at '/etc/oracle/cell/network-config/cellaffinity.ora' (see trace file for details)
CELL communication will use 1 IP group(s):
Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
ARCH: Warning; less destinations available than specified
by LOG_ARCHIVE_MIN_SUCCEED_DEST init.ora parameter
WARNING: db_recovery_file_dest is same as db_create_online_log_dest_2
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
NUMA system with 2 nodes detected
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
...
The NUMA structure can also be verified with OS commands:
# numactl --hardware
available: 2 nodes (0-1) <<< 2 nodes here, without NUMA enabled, it will show available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 49143 MB
node 0 free: 33241 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 49152 MB
node 1 free: 34444 MB
node distances:
node 0 1
0: 10 21
1: 21 10
# dmesg |grep -i numa
NUMA: Initialized distance table, cnt=2 <<< without NUMA enabled, it will show "No NUMA configuration found"
NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000)
NUMA: Node 0 [0,80000000) + [100000000,c80000000) -> [0,c80000000)
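The node-count check above can be scripted so it is easy to run across all cluster nodes. A minimal sketch; the `parse_numa_nodes` helper is hypothetical and reads `numactl --hardware` output on stdin so the parsing can be exercised offline:

```shell
#!/bin/sh
# Count NUMA nodes from `numactl --hardware` output.
# More than one node means NUMA is enabled at the OS level.
parse_numa_nodes() {
  awk '/^available:/ {print $2}'
}

# Example against the output captured in this note:
printf 'available: 2 nodes (0-1)\n' | parse_numa_nodes   # prints 2

# On a live system you would pipe the real command instead:
#   numactl --hardware | parse_numa_nodes
```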
CHANGES
Migrated a database from a 10.2 cluster to a new 11.2 cluster.
CAUSE
The issue was caused by NUMA being configured at the OS level. This caused the LMS processes to be blocked in status 'D' (uninterruptible sleep, usually I/O) for excessive amounts of time, which in turn led to excessive gc-related waits.
OSWatcher shows that during this time, lms1 on instance 2 went into "D" status:
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
oracle 24205 1 19 1.3 2.4 31730396 4762712 ? D 10:20:01 00:03:26 ora_lms1_RACDB2 <<<
zzz ***Fri Jul 24 14:43:35 NZST 2015
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
oracle 24205 1 19 1.3 2.4 31751004 4786908 poll_s S 10:20:01 00:03:27 ora_lms1_RACDB2
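A quick way to spot this pattern in OSWatcher archives is to filter the ps output for LMS processes in "D" state. A minimal sketch, assuming the column layout shown above (state in column 10, command in the last field); `find_stuck_lms` is a hypothetical helper name:

```shell
#!/bin/sh
# Flag LMS background processes in uninterruptible sleep ("D").
# Column layout follows the OSWatcher header above:
#   USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
find_stuck_lms() {
  awk '$10 == "D" && $NF ~ /^ora_lms/ {print $2, $NF}'
}

# Sample line from the OSWatcher capture in this note:
printf 'oracle 24205 1 19 1.3 2.4 31730396 4762712 ? D 10:20:01 00:03:26 ora_lms1_RACDB2\n' \
  | find_stuck_lms   # prints: 24205 ora_lms1_RACDB2
```

Running this over each archived ps snapshot shows exactly when and for how long an LMS process stayed in "D" state.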
RACDB2_lmhb_24292.trc reports that the lms1 process did not move for 23 seconds due to the "D" status:
==============================
LMS1 (ospid: 24205) has not moved for 23 sec (1437705793.1437705770)
kjfmGCR_HBCheckAll: LMS1 (ospid: 24205) has status 2
: Not in wait; last wait ended 22 secs ago.
*** 2015-07-24 14:43:16.860
loadavg : 7.77 5.32 4.26
Memory (Avail / Total) = 46962.26M / 193658.71M
Swap (Avail / Total) = 16384.00M / 16384.00M
skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 24205' | /bin/grep -v grep timed out after 2.500 seconds
*** 2015-07-24 14:43:19.367
Short stack dump: ORA-32516: cannot wait for process 'Unix process pid: 24205, image: oracle@racnode2.com (LMS1)' to finish executing ORADEBUG command 'SHORT_STACK'; wait time exceeds 2500 ms
Many traces on all instances report messages similar to the following, as a result of the lms1 process not moving:
*** ACTION NAME:() 2015-07-24 14:43:24.700
kcbzwb: try to cancel ongoing current request (tsn 7 rdba 0xe5f547a dataobj# 81085 obj# 81085) if any because it got timeout for 'gc buffer busy acquire' (timeout 100 csec) 30 times, to resolve potential deadlock
The dataobj# varies in the traces.
Per the ASH data dump, all "gc buffer busy acquire" sessions are blocked by a session waiting for "gc cr block 2-way", "gc cr block 3-way", "gc cr block congested", etc. There is no blocking session recorded for those gc waits; sometimes several sessions wait for "gc buffer busy acquire" on the same block and are blocked by the same session.
Those gc wait spikes are caused by the LMS process intermittently getting stuck in "D" status for 20-40 seconds.
Further debugging at the OS level to find out why the process was stuck in "D" status confirmed that it was due to OS NUMA activity.
Other tests show that with NUMA enabled at the OS layer, performance is equally slow whether or not NUMA is enabled at the Oracle layer. Only turning NUMA off completely at the OS layer resolves the issue.
SOLUTION
Engage the system administrator to turn off NUMA at the OS layer. This is platform specific, but in the case of Linux it can be as simple as specifying the boot parameter "numa=off" in grub.conf.
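For a legacy-GRUB Linux system, the change amounts to appending "numa=off" to the kernel line. A sketch only; the title, kernel version, and root-device paths below are placeholders and will differ on your system:

```
# /boot/grub/grub.conf (illustrative entry; paths are placeholders)
title Oracle Linux Server
    root (hd0,0)
    kernel /vmlinuz-2.6.32-xxx ro root=/dev/mapper/vg_root-lv_root numa=off
    initrd /initramfs-2.6.32-xxx.img
```

After the reboot, `numactl --hardware` should report "available: 1 nodes (0)", matching the disabled case noted earlier.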
In this case, the performance issue was resolved after turning off NUMA.
OSWatcher no longer shows LMS processes stuck in "D" status, and the lmhb traces no longer report "has not moved" messages.
The post change AWR report shows:
Top 10 Foreground Events by Total Wait Time
The previous top SQL now shows:
45K executions now wait only 12 seconds on cluster waits, compared to 5752 seconds of cluster waits for 42K executions of the same SQL before the change.