After the node 2 host was shut down, the VIP did not fail over to node 1
Symptom:
After the node 2 host was shut down, its VIP did not fail over to node 1.
[root@MAA01 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
inet addr:10.8.32.111 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:e2ab/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12896778 errors:0 dropped:0 overruns:0 frame:0
TX packets:9488933 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2875695560 (2.6 GiB) TX bytes:2411913446 (2.2 GiB)
Interrupt:114 Memory:d6000000-d6012800
eth0:1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
inet addr:10.8.32.115 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:114 Memory:d6000000-d6012800
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AD
inet addr:192.168.127.101 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:e2ad/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:203865 errors:0 dropped:0 overruns:0 frame:0
TX packets:309076 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:217218808 (207.1 MiB) TX bytes:66031839 (62.9 MiB)
Interrupt:122 Memory:d8000000-d8012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2264776 errors:0 dropped:0 overruns:0 frame:0
TX packets:2264776 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3249461652 (3.0 GiB) TX bytes:3249461652 (3.0 GiB)
[oracle@MAA01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE maa01
ora....01.lsnr application ONLINE ONLINE maa01
ora....t01.gsd application ONLINE ONLINE maa01
ora....t01.ons application ONLINE ONLINE maa01
ora....t01.vip application ONLINE ONLINE maa01
ora....SM2.asm application ONLINE OFFLINE
ora....02.lsnr application ONLINE OFFLINE
ora....t02.gsd application ONLINE OFFLINE
ora....t02.ons application ONLINE OFFLINE
ora....t02.vip application ONLINE OFFLINE
ora.rac.db application ONLINE ONLINE maa01
ora....c1.inst application ONLINE ONLINE maa01
ora....c2.inst application ONLINE OFFLINE
ora...._taf.cs application OFFLINE OFFLINE
ora....ac1.srv application OFFLINE OFFLINE
ora....ac2.srv application OFFLINE OFFLINE
ora....rac1.cs application OFFLINE OFFLINE
ora....ac1.srv application OFFLINE OFFLINE
ora....rac2.cs application OFFLINE OFFLINE
ora....ac2.srv application OFFLINE OFFLINE
[oracle@MAA01 ~]$
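In the listing above, a VIP that had failed over successfully would show State ONLINE with maa01 as its host. As a minimal sketch (the function name `offline_resources` and the sample text are illustrative, not part of any Oracle tool), the following Python snippet picks out the resources whose Target is ONLINE but whose State is OFFLINE from `crs_stat -t` output:

```python
def offline_resources(crs_stat_output):
    """Return names of resources with Target ONLINE but State OFFLINE."""
    stuck = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows look like: name, type, target, state, [host]
        if len(fields) >= 4 and fields[0].startswith("ora."):
            name, _type, target, state = fields[:4]
            if target == "ONLINE" and state == "OFFLINE":
                stuck.append(name)
    return stuck


# A few rows copied from the crs_stat -t output above
sample = """\
Name Type Target State Host
------------------------------------------------------------
ora....t01.vip application ONLINE ONLINE maa01
ora....t02.vip application ONLINE OFFLINE
ora....c2.inst application ONLINE OFFLINE
ora...._taf.cs application OFFLINE OFFLINE
"""

for name in offline_resources(sample):
    print(name)
```

Resources whose Target is OFFLINE (such as the TAF services here) are skipped on purpose, since CRS is not trying to keep them running.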
At this point, pinging node 2 from node 1 fails:
[oracle@MAA01 ~]$ ping 10.8.32.112
PING 10.8.32.112 (10.8.32.112) 56(84) bytes of data.
From 10.8.32.111 icmp_seq=1 Destination Host Unreachable
From 10.8.32.111 icmp_seq=2 Destination Host Unreachable
Analysis:
I checked the listener configuration file on node 1 and found nothing abnormal. The files examined were:
$CRS_HOME/log/<nodename>/*.log
$CRS_HOME/log/<nodename>/crsd/*.log
$CRS_HOME/log/<nodename>/cssd/*.log
$ORACLE_HOME/network/admin/listener.ora
[oracle@MAA01 ~]$
[oracle@MAA01 ~]$ cd $ORACLE_HOME
[oracle@MAA01 db]$ cd network/admin/
[oracle@MAA01 admin]$ cat listener.ora
# listener.ora.maa01 Network Configuration File: /oracle/app/11gR1/db/network/admin/listener.ora.maa01
# Generated by Oracle configuration tools.
INBOUND_CONNECT_TIMEOUT_LISTENER_MAA01=180
LISTENER_MAA01 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = MAA01-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.8.32.111)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)
The logs on node 1 show that CRS attempted to fail over the VIP, but the attempt failed.
[oracle@MAA01 admin]$
alertmaa01.log:
[crsd(5072)]CRS-1201:CRSD started on node maa01.
2013-08-09 17:18:57.513
[crsd(5072)]CRS-1205:Auto-start failed for the CRS resource . Details in maa01.
2013-08-09 17:28:01.175
[cssd(5555)]CRS-1612:node joadbtest02 (2) at 50% heartbeat fatal, eviction in 14.102 seconds
2013-08-09 17:28:02.177
[cssd(5555)]CRS-1612:node joadbtest02 (2) at 50% heartbeat fatal, eviction in 13.102 seconds
2013-08-09 17:28:09.181
[cssd(5555)]CRS-1611:node joadbtest02 (2) at 75% heartbeat fatal, eviction in 6.102 seconds
2013-08-09 17:28:13.179
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 2.102 seconds
2013-08-09 17:28:14.181
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 1.102 seconds
2013-08-09 17:28:15.183
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 0.092 seconds <--------------heart beat loss
2013-08-09 17:28:16.045
[cssd(5555)]CRS-1607:CSSD evicting node joadbtest02. Details in /oracle/app/11gR1/crs/log/maa01/cssd/ocssd.log.
[cssd(5555)]CRS-1601:CSSD Reconfiguration complete. Active nodes are maa01 . <----------------------------------------------------Node2 was evicted
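The CRS-1612, CRS-1611, and CRS-1610 messages above form a countdown to the eviction of node 2. A minimal sketch, assuming the message layout shown in this excerpt (the regular expression and the name `heartbeat_warnings` are illustrative), extracts the node name, heartbeat percentage, and remaining seconds:

```python
import re

# Matches the missed-heartbeat warnings shown in the excerpt above
HEARTBEAT = re.compile(
    r"CRS-161[0-2]:node (?P<node>\S+) \((?P<num>\d+)\) "
    r"at (?P<pct>\d+)% heartbeat fatal, "
    r"eviction in (?P<secs>[\d.]+) seconds"
)


def heartbeat_warnings(alert_text):
    """Return (node, percent, seconds_to_eviction) for each warning line."""
    warnings = []
    for line in alert_text.splitlines():
        m = HEARTBEAT.search(line)
        if m:
            warnings.append(
                (m.group("node"), int(m.group("pct")), float(m.group("secs")))
            )
    return warnings


# Three lines copied from the excerpt above
sample = """\
[cssd(5555)]CRS-1612:node joadbtest02 (2) at 50% heartbeat fatal, eviction in 14.102 seconds
[cssd(5555)]CRS-1611:node joadbtest02 (2) at 75% heartbeat fatal, eviction in 6.102 seconds
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 0.092 seconds
"""

for node, pct, secs in heartbeat_warnings(sample):
    print(node, pct, secs)
```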
ocssd.log:
[ CSSD]2013-08-09 17:28:31.188 [1158809920] >TRACE: clssnmUpdateNodeState: node 2, state (5/0) unique (1371182914/1371182914) prevConuni(1371182914) birth (244117402/244117402) (old/new)
[ CSSD]2013-08-09 17:28:31.188 [1158809920] >TRACE: clssnmDeactivateNode: node 2 (joadbtest02) left cluster
crsd.log:
2013-08-09 17:18:57.506: [ CRSRES][1488656704] startRunnable: setting CLI values
2013-08-09 17:18:57.512: [ CRSRES][1486555456] maa01 : CRS-1019: Resource ora.joadbtest02.LISTENER_JOADBTEST02.lsnr (application) cannot run on maa01
2013-08-09 17:18:57.519: [ CRSRES][1488656704] Attempting to start `ora.maa01.ASM1.asm` on member `maa01`
2013-08-09 17:18:57.531: [ CRSRES][1490757952] startRunnable: setting CLI values
2013-08-09 17:18:57.541: [ CRSRES][1490757952] Attempting to start `ora.maa01.vip` on member `maa01`
2013-08-09 17:19:01.054: [ CRSRES][1490757952] Start of `ora.maa01.vip` on member `maa01` succeeded.
2013-08-09 17:19:01.079: [ CRSRES][1490757952] startRunnable: setting CLI values
2013-08-09 17:19:01.093: [ CRSRES][1490757952] Attempting to start `ora.maa01.LISTENER_MAA01.lsnr` on member `maa01`
2013-08-09 17:19:04.660: [ CRSRES][1490757952] Start of `ora.maa01.LISTENER_MAA01.lsnr` on member `maa01` succeeded.
2013-08-09 17:19:05.204: [ CRSRES][1513838912] CRS-1002: Resource 'ora.maa01.LISTENER_MAA01.lsnr' is already running on member 'maa01'
2013-08-09 17:28:31.192: [ OCRMAS][1213802816]th_master:13: I AM THE NEW OCR MASTER at incar 14. Node Number 1 <---Node 1 is master.
2013-08-09 17:28:31.194: [ CRSCOMM][1486555456] CLEANUP: Searching for connections to failed node joadbtest02
2013-08-09 17:28:31.194: [ CRSEVT][1486555456] Processing member leave for joadbtest02, incarnation: 244117407
2013-08-09 17:28:31.195: [ CRSD][1486555456] SM: recovery in process: 8
2013-08-09 17:28:31.195: [ CRSEVT][1486555456] Do failover for: joadbtest02 <------- the failover fails at this point.
2013-08-09 17:28:31.399: [ CRSRES][1513838912] startRunnable: setting CLI values
2013-08-09 17:28:31.414: [ CRSRES][1513838912] Attempting to start `ora.joadbtest02.vip` on member `maa01` <--- attempt to fail over the VIP to node 1
2013-08-09 17:28:31.421: [ CRSRES][1530632512] startRunnable: setting CLI values
2013-08-09 17:28:31.434: [ CRSRES][1530632512] Attempting to start `ora.rac.db` on member `maa01`
2013-08-09 17:28:31.542: [ CRSRES][1530632512] Start of `ora.rac.db` on member `maa01` succeeded.
2013-08-09 17:28:37.863: [ CRSAPP][1513838912] StartResource error for ora.joadbtest02.vip error code = 1
2013-08-09 17:28:41.057: [ CRSRES][1513838912] Start of `ora.joadbtest02.vip` on member `maa01` failed. <---------VIP failover failed.
2013-08-09 17:28:41.085: [ CRSEVT][1486555456] Post recovery done evmd event for: joadbtest02
2013-08-09 17:28:41.085: [ CRSD][1486555456] SM: recoveryDone: 0
2013-08-09 17:28:41.098: [ CRSEVT][1486555456] Processing RecoveryDone
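The CRSRES messages in the excerpt above report the outcome of every start attempt. As a rough sketch, assuming the "Start of ... on member ... succeeded/failed" layout seen here (the name `failed_starts` is illustrative), the failed starts can be filtered out like this:

```python
import re

# Matches the outcome lines written by CRSRES in the excerpt above
RESULT = re.compile(
    r"Start of `(?P<res>[^`]+)` on member `(?P<node>[^`]+)` "
    r"(?P<outcome>succeeded|failed)"
)


def failed_starts(log_text):
    """Return (timestamp, resource, node) for every failed start."""
    failures = []
    for line in log_text.splitlines():
        m = RESULT.search(line)
        if m and m.group("outcome") == "failed":
            # The timestamp is everything before the first ": "
            timestamp = line.split(": ", 1)[0]
            failures.append((timestamp, m.group("res"), m.group("node")))
    return failures


# Two lines copied from the excerpt above
sample = (
    "2013-08-09 17:28:31.542: [ CRSRES][1530632512] "
    "Start of `ora.rac.db` on member `maa01` succeeded.\n"
    "2013-08-09 17:28:41.057: [ CRSRES][1513838912] "
    "Start of `ora.joadbtest02.vip` on member `maa01` failed.\n"
)

for entry in failed_starts(sample):
    print(entry)
```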
Next, look at the log file for ora.joadbtest02.vip:
ora.joadbtest02.vip:
2013-08-09 17:28:34.723: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: checkIf: interface eth2 is down <--- is this the clue?
Invalid parameters, or failed to bring up VIP (host=MAA01)
2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip start joadbtest02
2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.150s
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip check joadbtest02
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: end for resource = ora.joadbtest02.vip, action = start, status = 1, time = 6.350s
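This is the clue: the problem lies with the network interface. On node 1 the public IP is on eth0, but for some unknown reason the public IP on node 2 is on eth2. The ifconfig output above shows that node 1 has only eth0, eth1, and lo, so when CRS tries to start ora.joadbtest02.vip on node 1 it looks for eth2, which does not exist there, and the start fails.
The customer's earlier messages logs were not kept, and neither were the older Oracle and cluster logs, so why the public interfaces differ between the two nodes cannot be determined.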
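The check described above can be sketched in Python: take the interface name that the VIP log complains about and see whether it appears in the surviving node's `ifconfig` output. This is only an illustration, assuming the legacy net-tools `ifconfig` layout and the `checkIf` message shown earlier; the function names are hypothetical.

```python
import re


def interfaces_from_ifconfig(ifconfig_output):
    """Return the set of base interface names in legacy ifconfig output."""
    names = set()
    for line in ifconfig_output.splitlines():
        # Lines that introduce an interface have "Link encap:" after the name
        m = re.match(r"\s*(\S+)\s+Link encap:", line)
        if m:
            # eth0:1 is an alias on eth0; keep only the base device
            names.add(m.group(1).split(":")[0])
    return names


def interface_in_vip_log(log_line):
    """Return the interface named in a 'checkIf: interface X is down' line."""
    m = re.search(r"checkIf: interface (\S+) is down", log_line)
    return m.group(1) if m else None


# Trimmed copies of the node 1 ifconfig output and the VIP log line
ifconfig_sample = """\
eth0 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
inet addr:10.8.32.111 Bcast:10.0.15.255 Mask:255.255.255.0
eth0:1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AD
lo Link encap:Local Loopback
"""
log_sample = "[ora.joadbtest02.vip]: checkIf: interface eth2 is down"

local = interfaces_from_ifconfig(ifconfig_sample)
wanted = interface_in_vip_log(log_sample)
if wanted not in local:
    print("VIP interface", wanted, "is missing on this node; found", sorted(local))
```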
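Solution:
Configure the public IP on the same network interface on both nodes. For the detailed steps, see an earlier article of mine:
VIP fails to start with error CRS-1006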
http://blog.csdn.net/zhou1862324/article/details/17268339