Oracle CRS fails to start; the log reports: "has a disk HB, but no network HB, DHB has rcfg..."

Symptoms:
-- Check the CRS status
#/u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager


[root@ntrac1 ~]# /u01/app/oracle/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown   
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                                                   
ora.crf
      1        ONLINE  OFFLINE                                                   
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  OFFLINE                               STARTING            
ora.cssdmonitor
      1        ONLINE  ONLINE       ntrac1                                       
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  OFFLINE                                                   
ora.gipcd
      1        ONLINE  ONLINE       ntrac1                                       
ora.gpnpd
      1        ONLINE  ONLINE       ntrac1                                       
ora.mdnsd
      1        ONLINE  ONLINE       ntrac1                     
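
With a long resource list it can help to print only the resources whose TARGET is ONLINE but whose STATE is OFFLINE. The following is a minimal sketch, assuming the two-line-per-resource layout of `crsctl stat res -t -init` shown above; the function name `offline_init_resources` is made up for this example and is not part of Grid Infrastructure.

```shell
# Hypothetical helper: read "crsctl stat res -t -init" output on stdin and
# print the resources that should be running (TARGET ONLINE) but are not
# (STATE OFFLINE). Assumes the resource name is on its own line, followed
# by a line of the form "<instance> <TARGET> <STATE> ...".
offline_init_resources() {
    awk '
        /^ora\./ { name = $1; next }                       # remember the resource name
        name != "" && $2 == "ONLINE" && $3 == "OFFLINE" {  # wanted online, currently offline
            print name
        }
    '
}
```

It could be used as `crsctl stat res -t -init | offline_init_resources`. Resources such as ora.diskmon, whose target is OFFLINE, are deliberately skipped.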


-- Check the Grid Infrastructure alert log
#tail -100f $GRID_HOME/log/ntrac1/alertntrac1.log
2014-08-06 12:29:59.627: 
[/u01/app/oracle/grid/bin/cssdagent(32145)]CRS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/oracle/grid/log/ntrac1/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2014-08-06 12:29:59.628: 
[cssd(32230)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log
2014-08-06 12:29:59.628: 
[cssd(32230)]CRS-1603:CSSD on node ntrac1 shutdown by user.
2014-08-06 12:30:04.791: 
[ohasd(20111)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'ntrac1'.
2014-08-06 12:30:06.569: 
[cssd(36385)]CRS-1713:CSSD daemon is started in clustered mode
2014-08-06 12:30:08.191: 
[ohasd(20111)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2014-08-06 12:30:22.814: 
[cssd(36385)]CRS-1707:Lease acquisition for node ntrac1 number 1 completed
2014-08-06 12:30:24.103: 
[cssd(36385)]CRS-1605:CSSD voting file is online: /dev/mapper/CML_OCR02; details in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log.
2014-08-06 12:30:24.111: 
[cssd(36385)]CRS-1605:CSSD voting file is online: /dev/mapper/CML_OCR03; details in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log.
2014-08-06 12:30:24.122: 
[cssd(36385)]CRS-1605:CSSD voting file is online: /dev/mapper/CML_OCR01; details in /u01/app/oracle/grid/log/ntrac1/cssd/ocssd.log.


-- Check the ocssd log
#tail -100f $GRID_HOME/log/ntrac1/cssd/ocssd.log
2014-08-06 14:45:04.140: [    CSSD][483813120]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2014-08-06 14:45:04.623: [    CSSD][488544000]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2014-08-06 14:45:04.968: [    CSSD][502802176]clssnmvDHBValidateNcopy: node 2, ntrac2, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3361203, LATS 5085774, lastSeqNo 3361200, uniqueness 1406193376, timestamp 1407307504/1113874084
2014-08-06 14:45:04.968: [    CSSD][502802176]clssnmvDHBValidateNcopy: node 3, ntrac3, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3360733, LATS 5085774, lastSeqNo 3360730, uniqueness 1406193385, timestamp 1407307504/1113864924
2014-08-06 14:45:05.105: [    CSSD][498054912]clssnmvDHBValidateNcopy: node 2, ntrac2, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3361205, LATS 5085904, lastSeqNo 3361202, uniqueness 1406193376, timestamp 1407307504/1113874544
2014-08-06 14:45:05.105: [    CSSD][498054912]clssnmvDHBValidateNcopy: node 3, ntrac3, has a disk HB, but no network HB, DHB has rcfg 301688209, wrtcnt, 3360735, LATS 5085904, lastSeqNo 3360732, uniqueness 1406193385, timestamp 1407307504/1113864974
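
The repeated clssnmvDHBValidateNcopy lines indicate that ntrac1 still sees nodes 2 and 3 writing disk heartbeats to the voting files, but receives no network heartbeat from them over the private interconnect. The following is a rough sketch for summarising which nodes are reported this way, assuming the log line format shown above; the function name `nodes_without_network_hb` is invented for this example.

```shell
# Hypothetical helper: read ocssd.log lines on stdin and print
# "<node number> <node name>" once for every node reported as
# "has a disk HB, but no network HB".
nodes_without_network_hb() {
    sed -n 's/.*clssnmvDHBValidateNcopy: node \([0-9][0-9]*\), \([^,]*\), has a disk HB, but no network HB.*/\1 \2/p' |
        sort -u   # collapse the repeated messages for the same node
}
```

It could be run as `nodes_without_network_hb < $GRID_HOME/log/ntrac1/cssd/ocssd.log`. If every other cluster node shows up in the output, the local node's interconnect is the more likely suspect than the remote nodes.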


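Resolution:

-- Check the network connectivity to locate the problem
Pinging the failed node's private (interconnect) IP address from the other nodes got no reply.

This pointed to a network problem, possibly a faulty network card. It turned out that the cable on one of the failed node's private network interfaces was unplugged. After the cable was plugged back in and CRS was restarted, the stack came up normally.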
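
The connectivity check described above can be sketched as a small loop. This is only an illustration, assuming Linux `ping` (iputils); the function name `check_private_ips` and the `PING_CMD` override are invented for this example, and the private IP addresses are not given in the original, so they must be supplied by the reader.

```shell
# Hypothetical helper: ping each private (interconnect) IP given as an
# argument and report whether it answers. PING_CMD may be overridden,
# for example for a dry run; it defaults to the system ping.
check_private_ips() {
    for ip in "$@"; do
        # -c 2: send two probes; -W 2: wait at most 2 seconds for each reply
        if ${PING_CMD:-ping} -c 2 -W 2 "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
        fi
    done
}
```

Run it from a healthy node with the private addresses of all cluster nodes as arguments. If only the failed node's address reports FAIL, check the physical link on that node; on Linux, `ethtool <interface>` prints "Link detected: no" when the cable is unplugged.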


References:

http://sqlsewer.blogspot.com/2013/07/oracle-crs-is-not-starting-has-disk-hb.html

http://t.askmaclean.com/thread-3709-1-1.html
