11g RAC: terminating the instance due to error 481 (node eviction)
Source: Internet. Editor: 程序博客网. Time: 2024/06/05 22:15
Version
AIX 6.1
Oracle 11.2.0.3.0 RAC
Problem description
On a 2-node RAC, one node could not start: startup nomount failed with ORA-03113: end-of-file on communication channel
The alert log contained the following:
PMON (ospid: nnnn): terminating the instance due to error 481
The text of Doc ID 1383737.1 is pasted here for the record.
ASM on Non First Node (Second or Other Node) Fails to Come up With: PMON (ospid: nnnn): terminating the instance due to error 481 (Doc ID 1383737.1)
Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.
Purpose
This note lists common causes of ASM start up failure with the following error on non-first node (second or others):
alert_<ASMn>.log from non-first node
lmon registered with NM - instance number 2 (internal mem no 1)
Tue Dec 06 06:16:15 2011
System state dump requested by (instance=2, osid=19095 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /g01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_diag_19138.trc
Tue Dec 06 06:16:15 2011
PMON (ospid: 19095): terminating the instance due to error 481
Dumping diagnostic data in directory=[cdmp_20111206061615], requested by (instance=2, osid=19095 (PMON)), summary=[abnormal instance termination].
Tue Dec 06 06:16:15 2011
ORA-1092 : opitsk aborting process
Note: ASM instance terminates shortly after "lmon registered with NM"
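The signature above can be checked mechanically. The following is a minimal sketch, not part of the Oracle note: classify_asm_failure is a hypothetical helper name, and it only looks for the two message strings quoted in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): reads an ASM alert log on stdin
# and reports whether "error 481" appears, and whether it appears after the
# "lmon registered with NM" message, as in the excerpt above.
classify_asm_failure() {
    awk '
        /lmon registered with NM/ { lmon = 1 }
        /terminating the instance due to error 481/ {
            hit = 1
            after_lmon = lmon   # remember whether LMON had registered first
        }
        END {
            if (!hit)            print "no error 481"
            else if (after_lmon) print "error 481 after LMON registration"
            else                 print "error 481 without LMON registration"
        }
    '
}

# Usage (the path is a placeholder, adjust to your diag directory):
#   classify_asm_failure < /path/to/alert_+ASM2.log
```

The function only reads text, so it can be run against a copy of the alert log on any machine.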
If ASM on non-first node was running previously, likely the following will be in alert.log when it failed originally:
..
IPC Send timeout detected. Sender: ospid 32231 [oracle@ftdcslsedw01b (PING)]
..
ORA-29740: evicted by instance number 1, group incarnation 10
..
diag trace from non-first ASM (+ASMn_diag_<pid>.trc)
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE])
alert_<ASMn>.log from first node
LMON (ospid: 15986) detects hung instances during IMR reconfiguration
LMON (ospid: 15986) tries to kill the instance 2 in 37 seconds.
Please check instance 2's alert log and LMON trace file for more details.
..
Remote instance kill is issued with system inc 64
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Reconfiguration started (old inc 64, new inc 66)
If the issue happens while running the root script (root.sh or rootupgrade.sh) as part of the Grid Infrastructure installation/upgrade process, the following symptoms will be present:
root script screen output
Start of resource "ora.asm" failed
CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/ocw/grid/log/racnode1/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'racnode1' failed
..
Failed to start ASM at /ispiris-qa/app/11.2.0.3/crs/install/crsconfig_lib.pm line 1272
$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<nodename>.log
2011-11-29 15:56:48: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl start resource ora.asm -init
..
> CRS-2672: Attempting to start 'ora.asm' on 'racnode1'
> CRS-5017: The resource action "ora.asm start" encountered the following error:
> ORA-03113: end-of-file on communication channel
> Process ID: 0
> Session ID: 0 Serial number: 0
> . For details refer to "(:CLSN00107:)" in "/ispiris-qa/app/11.2.0.3/log/racnode1/agent/ohasd/oraagent_grid/oraagent_grid.log".
> CRS-2674: Start of 'ora.asm' on 'racnode1' failed
> CRS-2679: Attempting to clean 'ora.asm' on 'racnode1'
> CRS-2681: Clean of 'ora.asm' on 'racnode1' succeeded
..
> CRS-4000: Command Start failed, or completed with errors.
>End Command output
2011-11-29 15:59:00: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl check resource ora.asm -init
2011-11-29 15:59:00: Executing cmd: /ispiris-qa/app/11.2.0.3/bin/crsctl status resource ora.asm -init
2011-11-29 15:59:01: Checking the status of ora.asm
..
2011-11-29 15:59:53: Start of resource "ora.asm" failed
Details
Case1: link local IP (169.254.x.x) is being used by other adapter/network
Symptoms:
$GRID_HOME/log/<nodename>/alert<nodename>.log
[/ocw/grid/bin/orarootagent.bin(4813)]CRS-5018:(:CLSN00037:) Removed unused HAIP route: 169.254.95.0 / 255.255.255.0 / 0.0.0.0 / usb0
OS messages (optional)
Dec 6 06:11:14 racnode1 dhclient: DHCPREQUEST on usb0 to 255.255.255.255 port 67
Dec 6 06:11:14 racnode1 dhclient: DHCPACK from 169.254.95.118
ifconfig -a
..
usb0 Link encap:Ethernet HWaddr E6:1F:13:AD:EE:D3
inet addr:169.254.95.120 Bcast:169.254.95.255 Mask:255.255.255.0
..
Note: it's usb0 in this case, but it can be any other adapter which uses link local
Solution:
Link local IP must not be used by any other network on cluster nodes. In this case, a USB network device (usb0) obtained a link local IP (169.254.95.120) from a DHCP server at 169.254.95.118, which disrupted HAIP routing. The solution is to blacklist the device in udev so that it is not activated automatically.
On Sun T series servers, ILOM (adapter name usbecm0) uses link local by default. It can be deconfigured with the command "ilomconfig disable interconnect" or reconfigured to use other IP addresses; refer to http://docs.oracle.com/cd/E20451_01/html/E25303/mpclt.gkggr.html#scrolltoc for details.
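To spot a conflicting adapter like usb0, the "ifconfig -a" output can be filtered for 169.254.x.x addresses that do not belong to the private interconnect adapter. This is a rough sketch under stated assumptions: foreign_link_local is a hypothetical name, the parser expects the older Linux net-tools layout shown in the excerpt (indented "inet addr:" lines), and other platforms such as AIX or Solaris print a different layout that it does not handle.

```shell
#!/bin/sh
# Hypothetical helper: reads `ifconfig -a` output on stdin and prints every
# adapter that holds a 169.254.x.x address but is not the private
# interconnect adapter passed as $1. HAIP itself appears as an alias of the
# private adapter (for example eth1:1), so aliases of $1 are skipped.
foreign_link_local() {
    awk -v priv="$1" '
        # An adapter block starts with a non-indented line; field 1 is its name.
        /^[^ \t]/ { ifname = $1 }
        /inet addr:169\.254\./ {
            base = ifname
            sub(/:.*/, "", base)        # eth1:1 -> eth1
            if (base != priv) print ifname
        }
    '
}

# Usage (eth1 stands for your private adapter name):
#   /sbin/ifconfig -a | foreign_link_local eth1
```

Any adapter name it prints is a candidate for the blacklist or reconfiguration described above.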
Case2: firewall exists between nodes on private network (iptables etc)
No firewall is allowed on the private network (cluster_interconnect) between nodes, including software firewalls such as iptables or ipmon.
Case3: HAIP is up on some nodes but not on all
Symptoms:
alert_<+ASMn>.log for some instances
Cluster communication is configured to use the following interface(s) for this instance
10.1.0.1
alert_<+ASMn>.log for other instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.201.65
Note: some instances are using HAIP while others are not, so they cannot talk to each other
Solution:
The solution is to bring up HAIP on all nodes.
To find out HAIP status, execute the following on all nodes:
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init
If it's offline, try to bring it up as root:
$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init
If HAIP fails to start, refer to note 1210883.1 for known issues.
If the "up node" is not using HAIP and no outage is allowed, the workaround is to set the init.ora/spfile parameter cluster_interconnects to the private IP of each node so that ASM/DB can come up on the "down node". Once a maintenance window is planned, the parameter must be removed to allow HAIP to work.
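The Case3 symptom comes down to comparing the address each instance reports after "Cluster communication is configured to use the following interface(s)". A small sketch of that comparison, assuming the addresses have already been copied out of each alert log by hand (haip_usage is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: takes one interconnect address per instance as
# arguments and reports whether the instances agree on using HAIP.
# Addresses in 169.254.x.x are treated as HAIP, everything else as a
# plain private IP.
haip_usage() {
    haip=0
    other=0
    for ip in "$@"; do
        case "$ip" in
            169.254.*) haip=$((haip + 1)) ;;
            *)         other=$((other + 1)) ;;
        esac
    done
    if [ "$haip" -gt 0 ] && [ "$other" -gt 0 ]; then
        echo "mixed"        # the Case3 situation: instances cannot talk
    elif [ "$haip" -gt 0 ]; then
        echo "all HAIP"
    else
        echo "no HAIP"
    fi
}

# Usage with the two addresses from the excerpts above:
#   haip_usage 10.1.0.1 169.254.201.65     # prints "mixed"
```

A "mixed" result matches the symptom in this case; "all HAIP" with the failure still present points toward Case4 instead.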
Case4: HAIP is up on all nodes but some do not have route info
Symptoms:
alert_<+ASMn>.log for all instances
Cluster communication is configured to use the following interface(s) for this instance
169.254.xxx.xxx
"netstat -rn" for some nodes (surviving nodes) missing HAIP route
netstat -rn
Destination Gateway Genmask Flags MSS Window irtt Iface
161.130.90.0 0.0.0.0 255.255.248.0 U 0 0 0 bond0
160.131.11.0 0.0.0.0 255.255.255.0 U 0 0 0 bond2
0.0.0.0 160.11.80.1 0.0.0.0 UG 0 0 0 bond0
The line for HAIP is missing, i.e:
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
Note: As HAIP route info is missing on some nodes, HAIP is not pingable; a newly restarted node will usually have HAIP route info
Solution:
The solution is to manually add HAIP route info on the nodes that are missing it:
4.1. Execute "netstat -rn" on any node that has HAIP route info and locate the following:
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 bond2
Note: the first field is HAIP subnet ID and will start with 169.254.xxx.xxx, the third field is HAIP subnet netmask and the last field is private network adapter name
4.2. Execute the following as root on the node that's missing HAIP route:
# route add -net <HAIP subnet ID> netmask <HAIP subnet netmask> dev <private network adapter>
i.e.
# route add -net 169.254.0.0 netmask 255.255.0.0 dev bond2
4.3. Start ora.crsd as root on the node that's partially up:
# $GRID_HOME/bin/crsctl start res ora.crsd -init
The other workaround is to restart GI on the node that's missing the HAIP route with the "crsctl stop crs -f" and "crsctl start crs" commands as root.
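Steps 4.1 and 4.2 can be sketched as a filter that turns the "netstat -rn" output of a healthy node into the matching "route add" command. This is an illustration only: haip_route_cmd is a hypothetical name, it assumes the Linux column layout shown above (destination first, netmask third, adapter last), and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -rn` output on stdin and prints a
# `route add` command for each 169.254.x.x route it finds. Returns a
# non-zero status when no HAIP route is present, which is the symptom
# described in this case.
haip_route_cmd() {
    awk '
        $1 ~ /^169\.254\./ {
            # field 1 = HAIP subnet ID, field 3 = netmask, last field = adapter
            print "route add -net " $1 " netmask " $3 " dev " $NF
            found = 1
        }
        END { if (!found) exit 1 }
    '
}

# Usage on a node that has the HAIP route:
#   netstat -rn | haip_route_cmd
# Usage as a check on a suspect node:
#   netstat -rn | haip_route_cmd >/dev/null || echo "HAIP route missing"
```

For the healthy-node line quoted in step 4.1 it prints "route add -net 169.254.0.0 netmask 255.255.0.0 dev bond2", which is the command from step 4.2. Review the printed command before running it as root on the node that lacks the route.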