Oracle 11.2.0.1 RAC Grid Fails to Start: Oracle High Availability Services startup failed



 

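I installed an 11.2.0.1 RAC on virtual machines. I chose 11.2.0.1 because of an issue with the public IP and private network segments. While the instance was being created, the computer froze, and after the reboot CRS would not start.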

 

[root@rac1 bin]# ./crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.

 

[root@rac1 bin]# ps -ef|grep has

root     8081     1  0 03:14 ?        00:00:00 /u01/app/grid/11.2.0/bin/ohasd.bin reboot

root     8137  4230  1 03:23 pts/0    00:00:00 grep has

[root@rac1 bin]# kill -9 8081

[root@rac1 bin]# ./crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.

 

Check the logs:

[grid@rac2 rac2]$ ll

total 72

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:38 admin

drwxrwxr-t 4 root oinstall 4096 Nov 21 00:38 agent

-rw-rw-r-- 1 root root     9693 Nov 21 02:26 alertrac2.log

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:43 client

drwxr-x--- 2 root oinstall 4096 Nov 21 00:42 crsd

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:39 cssd

drwxr-x--- 2 root oinstall 4096 Nov 21 00:41 ctssd

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:39 diskmon

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:42 evmd

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:38 gipcd

drwxr-x--- 2 root oinstall 4096 Nov 21 00:38 gnsd

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:40 gpnpd

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:38 mdnsd

drwxr-x--- 2 root oinstall 4096 Nov 21 00:39 ohasd

drwxrwxr-t 5 grid oinstall 4096 Nov 21 00:38 racg

drwxr-x--- 2 grid oinstall 4096 Nov 21 00:42 srvm
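Comparing timestamps by eye in a listing like this gets tedious. As a rough sketch, a small helper around find can list only the files changed in the last N minutes. The function name recent_logs and the example path are my own assumptions, not part of any Oracle tooling, and -mmin relies on GNU find.

```shell
# List regular files under DIR that were modified within the last MINUTES
# minutes (default 60). Relies on the -mmin test of GNU find.
# Usage: recent_logs DIR [MINUTES]
recent_logs() {
    dir=$1
    minutes=${2:-60}
    find "$dir" -type f -mmin "-$minutes" -print
}

# Example (hypothetical path; adjust to your Grid home and node name):
# recent_logs /u01/app/grid/11.2.0/log/rac2 30
```

Unlike ll on the top-level directory, this also descends into the per-daemon subdirectories (crsd, cssd, ohasd and so on), so a freshly written log inside one of them is not missed.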

 

Apart from alertrac2.log, which was updated at the time of the crash, none of the other files had been updated. I then tried a restart on node 1:

 

[root@rac1 client]# ll

total 124

-rw-r--r-- 1 root root       193 Nov 21 00:31 clscfg.log

-rw-rw-rw- 1 root root     28635 Nov 21 00:32 crsctl.log

-rw-r--r-- 1 root root       114 Nov 21 00:32 crsctl.trc

-rw-r--r-- 1 grid oinstall   663 Nov 21 03:08 css.log

-rw-r--r-- 1 grid oinstall  1051 Nov 21 00:28 gpnptool_11653.log

-rw-r--r-- 1 grid oinstall   114 Nov 21 00:28 gpnptool_11653.trc

-rw-r--r-- 1 grid oinstall  1461 Nov 21 00:28 gpnptool_11660.log

-rw-r--r-- 1 grid oinstall   114 Nov 21 00:28 gpnptool_11660.trc

-rw-r--r-- 1 grid oinstall   551 Nov 21 00:35 oclskd.log

-rw-r----- 1 root root      6100 Nov 21 00:27 ocrconfig_11312.log

-rw-r--r-- 1 root root      3170 Nov 21 00:31 ocrconfig_12191.log

-rw-r----- 1 root root       342 Nov 21 00:37 ocrconfig_13798.log

-rw-r--r-- 1 grid oinstall 33862 Nov 21 00:45 oifcfg.log

-rw-r--r-- 1 grid oinstall   114 Nov 21 00:45 oifcfg.trc

-rw-r--r-- 1 root root      1067 Nov 21 00:36 olsnodes.log

-rw-r--r-- 1 grid oinstall   114 Nov 21 00:37 olsnodes.trc

 

-- css.log contains only the following errors:

[root@rac1 client]# cat css.log

Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.

2012-11-21 03:08:22.764: [ CSSCLNT][4171966208]clssscConnect: gipc request failed with 29 (0x13)

2012-11-21 03:08:22.764: [ CSSCLNT][4171966208]clsssInitNative: connect failed, rc 29

2012-11-21 03:08:28.140: [ CSSCLNT][4171966208]clssscConnect: gipc request failed with 29 (0x13)

2012-11-21 03:08:28.140: [ CSSCLNT][4171966208]clsssInitNative: connect failed, rc 29

2012-11-21 03:08:37.908: [ CSSCLNT][4171966208]clssscConnect: gipc request failed with 29 (0x13)

2012-11-21 03:08:37.908: [ CSSCLNT][4171966208]clsssInitNative: connect failed, rc 29
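When a log is longer than this one, it can help to count how often the CSS client failed to connect instead of scrolling through it. The following is only a sketch: count_css_connect_failures is a name I made up, and the pattern is taken from the clsssInitNative lines shown above.

```shell
# Count CSS client connect failures in a Clusterware log file.
# Prints the number of matching lines (0 if there are none).
# Usage: count_css_connect_failures FILE
count_css_connect_failures() {
    # " *" tolerates logs that omit the space after the colon.
    grep -c 'clsssInitNative: *connect failed' "$1"
}

# Example (run from the client log directory shown above):
# count_css_connect_failures css.log
```

Note that grep -c exits with status 1 when the count is 0, so check the printed number, not the exit status, if you call this from a script that uses set -e.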

 

 

According to the MOS note:

How to Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]

http://blog.csdn.net/tianlesoftware/article/details/6013763

 

1. ocssd is fully up

If ocssd.bin is not fully up, crsd.log will show messages like following:

2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clssscConnect: gipc request failed with 29 (0x16)
2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clsssInitNative: connect failed, rc 29
2010-02-03 22:37:51.639: [  CRSRTI][1548456880] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

 

So the OCSSD process cannot start. But why can't OCSSD start? Let's run strace against the ohasd process:

 

[root@rac1 client]# ps -ef|grep has

root    12192     1  0 12:44 ?        00:00:00 /u01/app/grid/11.2.0/bin/ohasd.bin reboot

root    12281  8085  0 13:05 pts/2    00:00:00 grep has

[root@rac1 client]# strace -p 12192 -o dave.log

Process 12192 attached - interrupt to quit

 

quit

Process 12192 detached

[root@rac1 client]#

[root@rac1 client]# ls

clscfg.log dave.log           gpnptool_11660.trc  ocrconfig_13798.log  olsnodes.trc

crsctl.log gpnptool_11653.log oclskd.log           oifcfg.log

crsctl.trc gpnptool_11653.trc ocrconfig_11312.log  oifcfg.trc

css.log    gpnptool_11660.log ocrconfig_12191.log  olsnodes.log

[root@rac1 client]# cat dave.log

open("/var/tmp/.oracle/npohasd", O_WRONLY <unfinished ...>

 

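This is an important clue: ohasd is stuck in an unfinished open() on the file /var/tmp/.oracle/npohasd. We also run into this file when installing 11.2.0.1 RAC, and the behavior is arguably an 11.2.0.1 bug.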
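My reading of the strace output, which the MOS note quoted above does not state, is that npohasd is a named pipe (FIFO). On Linux, opening a FIFO for writing blocks until some process opens it for reading, which would explain the unfinished open(). A quick way to check the file type is sketched below; is_fifo is a helper name I made up.

```shell
# Succeed (exit status 0) if the given path is a named pipe (FIFO).
# Usage: is_fifo PATH
is_fifo() {
    [ -p "$1" ]
}

# Example on an affected node:
# is_fifo /var/tmp/.oracle/npohasd && echo "npohasd is a named pipe"
```

If the check fails, the file is missing or is not a FIFO, and the explanation above would not apply.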

 

Reference:

Oracle 11g RAC: ohasd failed to start at /u01/app/11.2.0/grid/crs/install/rootcrs.pl line 443 (solution)

http://blog.csdn.net/tianlesoftware/article/details/7697366

 

 

So before starting CRS, first run the dd command on both nodes:

[root@rac1 client]# /bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
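To illustrate why a single dd read can unblock things, here is a self-contained demonstration that uses a throwaway FIFO in a temporary directory and does not touch the Oracle files. It assumes npohasd behaves like an ordinary named pipe; demo_fifo_block is a hypothetical name, and the background writer only stands in for ohasd.

```shell
# Demonstrate that a process writing to a FIFO stays blocked until a
# reader (here: dd, as in the workaround above) opens the other end.
demo_fifo_block() {
    tmp=$(mktemp -d) || return 1
    mkfifo "$tmp/pipe"

    # Stand-in for ohasd: blocks in open() because nobody is reading yet.
    ( echo "hello" > "$tmp/pipe" ) &
    writer=$!

    sleep 1
    if kill -0 "$writer" 2>/dev/null; then
        echo "writer still blocked"
    fi

    # Reading one block from the pipe releases the writer.
    dd if="$tmp/pipe" of=/dev/null bs=1024 count=1 2>/dev/null
    wait "$writer"
    echo "writer finished"

    rm -rf "$tmp"
}

# Example:
# demo_fifo_block
```

The writer only completes once dd opens the pipe for reading, which mirrors how the dd command above lets the stuck ohasd continue.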

 

Then start CRS. This time it succeeds:

[root@rac1 bin]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.


[root@rac2 bin]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

 

[root@rac2 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager


[root@rac1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

 

 

[root@rac1 bin]# ./crsctl start cluster -all

CRS-5702: Resource 'ora.crsd' is already running on 'rac1'

CRS-5702: Resource 'ora.crsd' is already running on 'rac2'


[root@rac1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online


[root@rac2 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

 

 

-- Check the resources: they have all been brought up. Note that 11g processes are somewhat slow to start, so wait a while.

[root@rac2 u01]# sh crs_stat.sh

Name                           Target     State     Host     

------------------------------ ---------- --------- -------

ora.DATA.dg                    ONLINE     ONLINE    rac1     

ora.FRA.dg                     ONLINE     ONLINE    rac1     

ora.LISTENER.lsnr              ONLINE     ONLINE    rac1     

ora.LISTENER_SCAN1.lsnr        ONLINE     ONLINE    rac2      

ora.OCRVOTING.dg               ONLINE     ONLINE    rac1     

ora.asm                        ONLINE     ONLINE    rac1     

ora.dave.db                    OFFLINE    OFFLINE             

ora.eons                       ONLINE     ONLINE    rac1     

ora.gsd                        OFFLINE    OFFLINE             

ora.net1.network               ONLINE     ONLINE    rac1     

ora.oc4j                       OFFLINE    OFFLINE             

ora.ons                        ONLINE     ONLINE    rac1     

ora.rac1.ASM1.asm              ONLINE     ONLINE    rac1     

ora.rac1.LISTENER_RAC1.lsnr    ONLINE     ONLINE    rac1     

ora.rac1.gsd                   OFFLINE    OFFLINE             

ora.rac1.ons                   ONLINE     ONLINE    rac1     

ora.rac1.vip                   ONLINE     ONLINE    rac1     

ora.rac2.ASM2.asm              ONLINE     ONLINE    rac2     

ora.rac2.LISTENER_RAC2.lsnr    ONLINE     ONLINE    rac2     

ora.rac2.gsd                   OFFLINE    OFFLINE             

ora.rac2.ons                   ONLINE     ONLINE    rac2     

ora.rac2.vip                   ONLINE     ONLINE    rac2     

ora.scan1.vip                  ONLINE     ONLINE    rac2     
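Because the 11g stack takes a while to come up, re-running the check by hand gets tedious. A generic retry loop is one possible way to wait for it. This is only a sketch: retry and css_online are names I made up, and the commented example assumes that seeing CRS-4529 in the crsctl check crs output (as in the transcripts above) is a good enough signal that CSS is online.

```shell
# Retry a command until it succeeds or the attempts run out.
# Returns 0 on the first success, 1 if every attempt failed.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (run from the Grid home bin directory; assumption: CRS-4529
# in the output means Cluster Synchronization Services is online):
# css_online() { ./crsctl check crs | grep -q 'CRS-4529'; }
# retry 30 10 css_online && echo "CSS is online"
```

With 30 attempts and a 10 second delay, the example would wait roughly five minutes before giving up.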

 

 

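Now I can get back to working on the instance. Once that is done, I will upgrade to 11.2.0.3.4 so that I do not run into this problem every time.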

 

 

 

---------------------------------------------------------------------------------------

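All rights reserved. This article may be reposted, but the source address must be credited with a link; otherwise legal action will be pursued.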

Skype:    tianlesoftware

QQ:       tianlesoftware@gmail.com

Email:    tianlesoftware@gmail.com

Blog:     http://blog.csdn.net/tianlesoftware

Weibo:    http://weibo.com/tianlesoftware

Twitter:  http://twitter.com/tianlesoftware

Facebook: http://www.facebook.com/tianlesoftware

Linkedin: http://cn.linkedin.com/in/tianlesoftware

