Why Grid Infrastructure Rebootless Node Fencing Fails (Doc ID 1502282.1)
Source: Internet. Editor: 程序博客网. Time: 2024/06/11 01:59
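Applies to:
Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
Purpose:
Rebootless fencing was introduced in 11.2.0.2 Grid Infrastructure. When an eviction occurs, it tries to stop GI gracefully on the evicted node instead of rebooting the node. If rebootless fencing fails, the evicted node is rebooted anyway. This document lists common causes of rebootless fencing failure.
Details:
1. A resource cannot be stopped.
If one or more resources cannot be stopped, rebootless fencing fails and the node is rebooted.
In this example, rebootless fencing failed on node 2 after a split brain, and node2 was rebooted:
Evicted node, <GI_HOME>/log/<node>/alert<node>.log: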
..
2012-09-11 12:04:34.363
[cssd(18834)]CRS-1610:Network communication with node racnode1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.020 seconds
2012-09-11 12:04:36.379
[cssd(18834)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /ocw/grid/log/racnode2/cssd/ocssd.log.
2012-09-11 12:04:36.379
[cssd(18834)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /ocw/grid/log/racnode2/cssd/ocssd.log
2012-09-11 12:04:36.399
[cssd(18834)]CRS-1652:Starting clean up of CRSD resources.
2012-09-11 12:04:36.586
[crsd(26115)]CRS-5833:Cleaning resource 'zDRMON.sh.racnode2 1 1' failed as part of reboot-less node fencing
2012-09-11 12:04:36.588
[cssd(18834)]CRS-1653:The clean up of the CRSD resources failed. ##>>user resource fails to be cleaned
2012-09-11 12:04:37.042
[ohasd(16821)]CRS-2765:Resource 'ora.evmd' has failed on server 'racnode2'.
2012-09-11 12:04:37.052
[/ocw/grid/bin/scriptagent.bin(27696)]CRS-5822:Agent '/ocw/grid/bin/scriptagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:4:10} in /ocw/grid/log/racnode2/agent/crsd/scriptagent_oracle/scriptagent_oracle.log.
2012-09-11 12:04:37.062
[ohasd(16821)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode2'. ##>>node rebooted after this message, in some cases, this message won't be there
2012-09-11 12:10:47.356
[ohasd(16677)]CRS-2112:The OLR service started on node racnode2.
2012-09-11 12:10:47.521
[ohasd(16677)]CRS-1301:Oracle High Availability Service started on node racnode2.
2012-09-11 12:10:47.539
[ohasd(16677)]CRS-8011:reboot advisory message from host: racnode2, component: cssagent, with time stamp: L-2012-09-11-12:04:37.140 ##>>reboot advisory shows both cssdagent and cssdmonitor took the action to reboot
[ohasd(16677)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-09-11 12:10:47.594
[ohasd(16677)]CRS-8011:reboot advisory message from host: racnode2, component: cssmonit, with time stamp: L-2012-09-11-12:04:37.139
[ohasd(16677)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2012-09-11 12:10:47.605
[ohasd(16677)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 2 were announced and 0 errors occurred
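As a rough reading aid for an alert log like the one above, the following Python sketch pulls out the three facts that matter for this failure mode: which resource failed to clean up (CRS-5833), whether CSS reported that CRSD cleanup failed (CRS-1653), and which components left a reboot advisory (CRS-8011). The function name `summarize_alert_log` and the regular expressions are illustrative assumptions, not an Oracle tool. The sketch assumes the layout shown in the excerpt, where a timestamp sits on its own line ahead of the messages it applies to.

```python
import re

# Assumed layout (taken from the excerpt): a timestamp on its own line,
# followed by one or more "[process(pid)]CRS-nnnn:text" message lines.
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")
MSG_RE = re.compile(r"^\[(?P<proc>[^\]]+)\]CRS-(?P<code>\d+):\s*(?P<text>.*)$")
RESOURCE_RE = re.compile(r"Cleaning resource '([^']+)' failed")
COMPONENT_RE = re.compile(r"component:\s*(\w+)")


def summarize_alert_log(lines):
    """Collect rebootless-fencing failure indicators from alert log lines."""
    summary = {
        "failed_resources": [],   # (timestamp, resource) from CRS-5833
        "cleanup_failed": False,  # True once CRS-1653 is seen
        "reboot_components": [],  # component names from CRS-8011
    }
    last_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            last_ts = line
            continue
        msg = MSG_RE.match(line)
        if not msg:
            continue
        code, text = msg.group("code"), msg.group("text")
        if code == "5833":
            found = RESOURCE_RE.search(text)
            if found:
                summary["failed_resources"].append((last_ts, found.group(1)))
        elif code == "1653":
            summary["cleanup_failed"] = True
        elif code == "8011":
            found = COMPONENT_RE.search(text)
            if found:
                summary["reboot_components"].append(found.group(1))
    return summary
```

Feeding it the lines of the evicted node's alert log (for example `summarize_alert_log(open(path))`, where `path` is wherever the log lives on your system) returns a dictionary that names the resource to investigate next.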
When a resource cannot be stopped, cssdagent, cssdmonitor, or both will attempt to reboot the node. Sample logs follow.
<GI_HOME>/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got posted
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: shutdown initiated by CSS, requested to sync
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got HB signal
2012-09-11 12:04:36.400: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.035: [ CSSCLNT][1095805248]clsssRecvMsg: got a disconnect from the server while waiting for message type 22
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssRecvMsg: got a disconnect from the server while waiting for message type 27
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcInternalSend: connection not valid for send operation endp 0x8e3e60 [00000000000001b7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=3165a05b-7e7139a5-18801))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=7e7139a5-3165a05b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2012-09-11 12:04:37.035: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:37.035: [ CSSCLNT][1077418304]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8
2012-09-11 12:04:37.036: [ USRTHRD][1077418304] clsnomon_status: Communications failure with CSS detected. Waiting for sync to complete...
2012-09-11 12:04:37.036: [GIPCXCPT][1098959168]gipcSendSyncF [clsssServerRPC : clsss.c : 6272]: EXCEPTION[ ret gipcretConnectionLost (12) ] failed to send on endp 0x8e3e60 [00000000000001b7] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=3165a05b-7e7139a5-18801))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=7e7139a5-3165a05b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, addr 0000000000000000, buf 0x4180bd80, len 80, flags 0x8000000
2012-09-11 12:04:37.036: [ CSSCLNT][1098959168]clsssServerRPC: send failed with err 12, msg type 7
2012-09-11 12:04:37.036: [ CSSCLNT][1098959168]clsssCommonClientExit: RPC failure, rc 3
2012-09-11 12:04:37.139: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.139: [ USRTHRD][1097382208] clsnSyncComplete: posting omon
<GI_HOME>/agent/ohasd/oracssdagent_root/oracssdagent_root.log
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got posted
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: shutdown initiated by CSS, requested to sync
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:36.400: [ USRTHRD][1095805248] clsnpollmsg_main: got HB signal
2012-09-11 12:04:36.400: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
2012-09-11 12:04:36.413: [ USRTHRD][1097382208] clsnwork_process_work: sync completed
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssRecvMsg: got a disconnect from the server while waiting for message type 27
2012-09-11 12:04:37.035: [ CSSCLNT][1095805248]clsssRecvMsg: got a disconnect from the server while waiting for message type 22
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcInternalSend: connection not valid for send operation endp 0x2aaab4014900 [00000000000001c0] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=561e3f6b-a0a3602e-18817))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=a0a3602e-561e3f6b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnwork_queue: posting worker thread
2012-09-11 12:04:37.035: [ USRTHRD][1095805248] clsnpollmsg_main: exiting check loop
2012-09-11 12:04:37.035: [GIPCXCPT][1098959168]gipcSendSyncF [clsssServerRPC : clsss.c : 6272]: EXCEPTION[ ret gipcretConnectionLost (12) ] failed to send on endp 0x2aaab4014900 [00000000000001c0] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=561e3f6b-a0a3602e-18817))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode2_)(GIPCID=a0a3602e-561e3f6b-18834))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 18834, flags 0x3861e, usrFlags 0x20010 }, addr 0000000000000000, buf 0x4180bd80, len 80, flags 0x8000000
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssServerRPC: send failed with err 12, msg type 7
2012-09-11 12:04:37.035: [ CSSCLNT][1098959168]clsssCommonClientExit: RPC failure, rc 3
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: communications failed (0/3/-1)
2012-09-11 12:04:37.036: [ CSSCLNT][1077418304]clssgsGroupGetStatus: returning 8
2012-09-11 12:04:37.036: [ USRTHRD][1077418304] clsnomon_status: Communications failure with CSS detected. Waiting for sync to complete...
2012-09-11 12:04:37.036: [ USRTHRD][1097382208] clsnwork_process_work: calling sync
Because a CRSD resource (a user resource) could not be stopped, crsd.log is a good starting point for further debugging.
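One way to narrow down crsd.log is to keep only the lines that mention the failed resource and fall close to the time of the CRS-5833 message. The Python sketch below does that. It assumes crsd.log lines start with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as the agent logs quoted in this note do; that layout, the helper name `lines_near_failure`, and the 60-second default window are assumptions made for illustration.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2012-09-11 12:04:36.586")  # width of the assumed timestamp prefix


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def lines_near_failure(lines, resource, failure_time, window_seconds=60):
    """Keep lines that mention `resource` within the window around `failure_time`."""
    center = datetime.strptime(failure_time, TS_FORMAT)
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines without a timestamp are skipped
        if resource in line and abs(ts - center) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

For the case in this note, the call would look like `lines_near_failure(open(crsd_log_path), "zDRMON.sh.racnode2", "2012-09-11 12:04:36.586")`, where `crsd_log_path` is a placeholder for the location of crsd.log on the evicted node.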