Leftover init.ohasd after uninstalling 11g R2 RAC breaks the 10g CRS installation


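After a failed upgrade of Oracle 10g RAC to 11g on SUSE, the 10g RAC environment had to be restored. The first step was to uninstall the 11g grid and rdbms products with the deinstall tool; once it finished normally, the 11g grid and base directories were simply emptied.

However, when installing the 10g CRS, running "$ORA_CRS_HOME/root.sh" kept printing "Waiting for the Oracle CRSD and EVMD to start". The full output was: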


[root@dwdb1 ~]# /u01/app/oracle/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oracle/oraInventory to 770.
Changing groupname of /u01/app/oracle/oraInventory to oinstall.
The execution of the script is complete
[root@dwdb1 ~]# /u01/app/oracle/10gR2/crs/root.sh
WARNING: directory '/u01/app/oracle/10gR2' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/10gR2' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: dwdb1 dwdb1-priv dwdb1
node 2: dwdb2 dwdb2-priv dwdb2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw2
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        dwdb1
        dwdb2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start

Checking the CRS processes showed that the evmd process was not running:

# ps -ef|grep d.bin
root     10990 10364  0 15:03 ?        00:00:00 /u01/app/oracle/10gR2/crs/bin/crsd.bin restart
root     11485 11024  0 15:03 ?        00:00:00 /u01/app/oracle/10gR2/crs/bin/oprocd.bin run -t 1000 -m 500
oracle   11609 11090  0 15:03 ?        00:00:00 /u01/app/oracle/10gR2/crs/bin/ocssd.bin
root     15875  8055  0 15:13 pts/0    00:00:00 grep d.bin
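The same check can be scripted so that a missing daemon stands out immediately. This is only a minimal sketch: the function name missing_crs_daemons is hypothetical, and the list of expected daemons (crsd.bin, ocssd.bin, evmd.bin) is an assumption based on the 10g processes discussed in this post.

```shell
#!/bin/sh
# Read a process listing on stdin and print every expected CRS daemon
# that does not appear in it.
# Assumption: crsd.bin, ocssd.bin and evmd.bin are the daemons to expect.
missing_crs_daemons() {
    listing=$(cat)
    for daemon in crsd.bin ocssd.bin evmd.bin; do
        case $listing in
            *"/bin/$daemon"*) ;;        # daemon binary found in the listing
            *) echo "$daemon" ;;        # not found: report it
        esac
    done
}
```

It would be used as "ps -ef | missing_crs_daemons"; the "grep d.bin" line itself is not counted, because the function matches on the "/bin/" path prefix.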


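Neither the CRS alert.log nor crsd.log pointed to the cause. Cleaning up again according to the Metalink document and reinstalling did not help either. I had installed Oracle 10g RAC many times; it was the same environment and the same procedure, yet after the failed 11g upgrade it simply would not install.

The system log /var/log/messages finally gave a clue: every failed root.sh run was accompanied by the message 'init: /etc/inittab[56]: duplicate ID field "h1"'. The details: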

May  7 15:00:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8714.
May  7 15:00:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8776.
May  7 15:00:47 dwdb1 logger: autorun file for ohasd is missing
May  7 15:01:27 dwdb1 last message repeated 4 times
May  7 15:01:37 dwdb1 logger: autorun file for ohasd is missing
May  7 15:01:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8776.
May  7 15:01:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8714.
May  7 15:01:47 dwdb1 logger: autorun file for ohasd is missing
May  7 15:02:27 dwdb1 last message repeated 4 times
May  7 15:02:37 dwdb1 logger: autorun file for ohasd is missing
May  7 15:02:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8776.
May  7 15:02:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8714.
May  7 15:02:47 dwdb1 logger: autorun file for ohasd is missing
May  7 15:03:27 dwdb1 last message repeated 4 times
May  7 15:03:37 dwdb1 logger: autorun file for ohasd is missing
May  7 15:03:39 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May  7 15:03:39 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May  7 15:03:39 dwdb1 logger: Running CRSD with TZ =
May  7 15:03:39 dwdb1 logger: Oracle CSS Family monitor starting.
May  7 15:03:40 dwdb1 logger: Filesystem containing /etc/oracle/scls_scr/dwdb1/root/cssrun vanished.
May  7 15:03:40 dwdb1 logger: Unpredictable behavior from Oracle CRS may ensue.
May  7 15:03:45 dwdb1 root: Oracle Cluster Ready Services starting by user request.
May  7 15:03:45 dwdb1 root: Cluster Ready Services completed waiting on dependencies.
May  7 15:03:45 dwdb1 init: Re-reading inittab
May  7 15:03:47 dwdb1 logger: autorun file for ohasd is missing
May  7 15:03:55 dwdb1 init: Re-reading inittab
May  7 15:03:55 dwdb1 init: /etc/inittab[56]: duplicate ID field "h1"
May  7 15:03:56 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May  7 15:03:56 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May  7 15:03:56 dwdb1 logger: Running CRSD with TZ =
May  7 15:03:56 dwdb1 logger: Oracle CSS Family monitor restarting.
May  7 15:03:57 dwdb1 logger: autorun file for ohasd is missing
May  7 15:03:57 dwdb1 logger: Oracle CSS restart. 0, 1
May  7 15:04:07 dwdb1 logger: autorun file for ohasd is missing
May  7 15:04:47 dwdb1 last message repeated 4 times
May  7 15:05:57 dwdb1 last message repeated 7 times
May  7 15:07:07 dwdb1 last message repeated 7 times
May  7 15:08:17 dwdb1 last message repeated 7 times
May  7 15:09:27 dwdb1 last message repeated 7 times
May  7 15:10:37 dwdb1 last message repeated 7 times
May  7 15:11:07 dwdb1 last message repeated 3 times
May  7 15:11:13 dwdb1 sz[14772]: [root] crslog.tgz/ZMODEM: 185142 Bytes, 216293 BPS
May  7 15:11:17 dwdb1 logger: autorun file for ohasd is missing
May  7 15:11:57 dwdb1 last message repeated 4 times
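The one relevant line is easy to miss among the repeated ohasd and CRS messages. A minimal sketch of a filter that keeps only init's complaints about /etc/inittab follows; the function name inittab_complaints is hypothetical, and the pattern assumes the syslog message format seen in this log.

```shell
#!/bin/sh
# Print only the syslog lines in which init reports a problem with a
# specific /etc/inittab line, e.g. "init: /etc/inittab[56]: ...".
# Assumption: messages follow the format seen in this post's log.
inittab_complaints() {
    grep -E ' init: /etc/inittab\[[0-9]+\]: ' "$1"
}
```

For example, "inittab_complaints /var/log/messages" would be expected to print just the duplicate ID line on the system described here.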



Checking /etc/inittab showed the following CRS-related entries (the leading number is the line number within the file):

55 h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
56 h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
57 h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
58 h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
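The clash that init reports for line 56 can be reproduced without waiting for the next syslog message. The following is a minimal sketch, assuming the usual id:runlevels:action:process layout of inittab entries; the function name find_duplicate_inittab_ids is hypothetical.

```shell
#!/bin/sh
# Print "<line number> <id>" for every inittab entry whose ID field
# (the text before the first ':') was already used by an earlier entry.
# Assumption: entries have the form id:runlevels:action:process.
find_duplicate_inittab_ids() {
    awk -F: '
        /^[ \t]*#/ || NF < 4 { next }          # skip comments and non-entries
        seen[$1]++ { printf "%d %s\n", FNR, $1 }  # ID already seen: report it
    ' "$1"
}
```

It would be run as "find_duplicate_inittab_ids /etc/inittab"; commented-out entries are ignored, so the check can be repeated after editing the file.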


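/etc/init.d/init.ohasd does not exist in Oracle 10g or earlier versions. Listing /etc/init.d/ showed that, besides the 10g scripts init.evmd, init.cssd, init.crsd and init.crs, the 11g-specific init.ohasd was still present in that directory.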

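More information on init.ohasd can be found in the official Oracle documentation. ohasd.bin is the daemon, on Linux and AIX, of Oracle High Availability Services, a clusterware component newly introduced in Oracle 11g.

The 11g deinstall tool did not remove the ohasd entry it had added to /etc/inittab. When root.sh of the 10g CRS installation ran, init.ohasd could not start ohasd.bin (hence the repeated "autorun file for ohasd is missing" messages), and evmd failed to start. The syslog message suggests the mechanism: the leftover init.ohasd entry and the init.evmd entry added by 10g both use the ID h1, so init rejects line 56 as a duplicate and most likely never spawns init.evmd. (The exact cause would still need to be confirmed with an Oracle engineer.)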

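Solution:

1. Comment out the line "h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null" in /etc/inittab.
2. Deleting "/etc/init.d/init.ohasd" is also recommended, although in testing, leaving the file in place did not affect the installation.

Running root.sh again then completed without problems.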
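Step 1 can also be done non-interactively. This is a minimal sketch, not the procedure used in this post (the line was edited by hand there); the function name disable_ohasd_entry and the ".bak" backup suffix are hypothetical.

```shell
#!/bin/sh
# Comment out every active inittab entry that launches init.ohasd,
# keeping a backup copy of the original file next to it.
# Assumption: "<file>.bak" is an acceptable backup name.
disable_ohasd_entry() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    # Only lines with no '#' before "init.ohasd" are prefixed with '#',
    # so running the function a second time changes nothing.
    sed '/^[^#]*init\.ohasd/s/^/#/' "$file.bak" > "$file"
}
```

It would be invoked as root with "disable_ohasd_entry /etc/inittab". On a SysV init system, "telinit q" asks init to re-read the file afterwards.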