ORA-12547: TNS:lost contact when adding a node on 11.2.0.4

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 16:48

Environment:
A two-node 11.2.0.4 RAC on RHEL 6 Update 5

[root@rac2 ~]# uname -a
Linux rac2 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@rac2 ~]# uname -r
2.6.32-431.el6.x86_64

[oracle@rac2 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.188.18  rac1
192.168.188.19  rac2
192.168.188.20  rac3
192.168.188.118  rac1-vip
192.168.188.119  rac2-vip
192.168.188.120  rac3-vip
192.168.182.18    rac1-priv
192.168.182.19    rac2-priv
192.168.182.20    rac3-priv
192.168.188.105   scan
[oracle@rac2 ~]$

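While running dbca to add the instance on the third node, the following error appeared and the third database instance could not be added.

Part of the errors in /u01/app/11.2.0/grid/log/rac3/agent/crsd/oraagent_oracle/oraagent_oracle.log: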

2015-09-10 01:38:21.978: [ora.orcl.db][3571566336]{1:28142:484} [start] crsHome = /u01/app/11.2.0/grid
2015-09-10 01:38:21.978: [ora.orcl.db][3571566336]{1:28142:484} [start] oracleHome = /u02/app/oracle/product/11.2.0/dbhome_1
2015-09-10 01:38:21.978: [ora.orcl.db][3571566336]{1:28142:484} [start] command = '/u01/app/11.2.0/grid/bin/setasmgidwrap oracle_binary_path=/u02/app/oracle/product/11.2.0/dbhome_1/bin/oracle'
2015-09-10 01:38:21.979: [ora.orcl.db][3571566336]{1:28142:484} [start] start dependency = hard(ora.DATA.dg) weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns,ora.FRA.dg) pullup(ora.DATA.dg)
2015-09-10 01:38:21.979: [ora.orcl.db][3571566336]{1:28142:484} [start] ASM disk group dependency found
2015-09-10 01:38:21.979: [ora.orcl.db][3571566336]{1:28142:484} [start] Utils:execCmd action = 1 flags = 6 ohome = /u01/app/11.2.0/grid cmdname = setasmgidwrap.
2015-09-10 01:38:23.937: [    AGFW][3567363840]{1:28142:484} Agent received the message: RESOURCE_MODIFY_ATTR[ora.orcl.db 3 1] ID 4355:671
2015-09-10 01:38:50.992: [ora.orcl.db][3571566336]{1:28142:484} [start] execCmd ret = 0
2015-09-10 01:38:50.992: [ USRTHRD][3571566336]{1:28142:484} InstConnection::initMutex AttachLock 00ae3210 DetachLock 00ae3228
2015-09-10 01:38:50.994: [ora.orcl.db][3571566336]{1:28142:484} [start] clsnInstConnection::makeConnectStr UsrOraEnv  m_oracleHome /u02/app/oracle/product/11.2.0/dbhome_1 Crshome /u01/app/11.2.0/grid
2015-09-10 01:38:50.994: [ora.orcl.db][3571566336]{1:28142:484} [start] makeConnectStr = (DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u02/app/oracle/product/11.2.0/dbhome_1/bin/oracle)(ARGV0=oracleorcl3)(ENVS='ORACLE_HOME=/u02/app/oracle/product/11.2.0/dbhome_1,ORACLE_SID=orcl3,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=orcl3)))
2015-09-10 01:38:51.223: [ora.orcl.db][3571566336]{1:28142:484} [start] Container:start oracle home /u02/app/oracle/product/11.2.0/dbhome_1
2015-09-10 01:38:51.224: [ora.orcl.db][3571566336]{1:28142:484} [start] InstConnection::connectInt: server not attached
2015-09-10 01:38:52.996: [ora.orcl.db][3571566336]{1:28142:484} [start] ORA-12547: TNS:lost contact
2015-09-10 01:38:53.030: [ora.orcl.db][3571566336]{1:28142:484} [start] InstConnection::connectInt (1) Exception OCIException
2015-09-10 01:38:53.032: [ora.orcl.db][3571566336]{1:28142:484} [start] InstConnection:connect:excp OCIException OCI error 12547
2015-09-10 01:38:53.033: [ora.orcl.db][3571566336]{1:28142:484} [start] InstConnection::connectInt: server not attached
2015-09-10 01:38:53.712: [ora.orcl.db][3571566336]{1:28142:484} [start] ORA-12547: TNS:lost contact
2015-09-10 01:38:53.713: [ora.orcl.db][3571566336]{1:28142:484} [start] InstConnection::connectInt (1) Exception OCIException
2015-09-10 01:38:53.713: [ora.orcl.db][3571566336]{1:28142:484} [start] InstAgent::start: 1 errcode 12547
2015-09-10 01:38:53.713: [ora.orcl.db][3571566336]{1:28142:484} [start] ConnectionPool::resetConnection  s_statusOfConnectionMap 00ae9760
2015-09-10 01:38:53.713: [ora.orcl.db][3571566336]{1:28142:484} [start] ConnectionPool::resetConnection sid orcl3 status  2
2015-09-10 01:38:53.713: [ora.orcl.db][3571566336]{1:28142:484} [start] Gimh::check OH /u02/app/oracle/product/11.2.0/dbhome_1 SID orcl3
2015-09-10 01:38:53.754: [ora.orcl.db][3571566336]{1:28142:484} [start] GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:open failed with status: 2
GIM-00091: OS failure message: No such file or directory
GIM-00092: OS failure occurred at: sskgmsmr_7
2015-09-10 01:38:53.754: [ora.orcl.db][3571566336]{1:28142:484} [start] (:CLSN00007:)DbAgent::check failed gimh state 0
2015-09-10 01:38:53.763: [ora.orcl.db][3571566336]{1:28142:484} [start] clsnDbAgent:checkCbk clsagfw_res_status ret 5
2015-09-10 01:38:53.763: [ora.orcl.db][3571566336]{1:28142:484} [start] ConnectionPool::stopConnection
2015-09-10 01:38:53.763: [ora.orcl.db][3571566336]{1:28142:484} [start] ConnectionPool::removeConnection connection count 0
2015-09-10 01:38:53.763: [ora.orcl.db][3571566336]{1:28142:484} [start] ConnectionPool::removeConnection freed 0
2015-09-10 01:38:53.763: [ora.orcl.db][3571566336]{1:28142:484} [start] ConnectionPool::stopConnection sid orcl3 status  1
2015-09-10 01:38:53.763: [ora.orcl.db][3571566336]{1:28142:484} [start] InstAgent::check 1 prev clsagfw_res_status 0 current clsagfw_res_status 5
2015-09-10 01:38:53.764: [ora.orcl.db][3571566336]{1:28142:484} [start] InstAgent::start not  logged on check state details Abnormal Termination
2015-09-10 01:38:53.764: [ora.orcl.db][3571566336]{1:28142:484} [start] InstAgent::start: ORA-1012 or Lost Contact try cleanOracleIpc and start force
2015-09-10 01:38:53.764: [ USRTHRD][3571566336]{1:28142:484} InstConnection:~InstConnection: this b00070c0
2015-09-10 01:38:53.766: [ora.orcl.db][3571566336]{1:28142:484} [start] InstAgent::start call sysresv
2015-09-10 01:38:53.766: [ora.orcl.db][3571566336]{1:28142:484} [start] Container:start scls_clean_oracle_ipc Container orcl3 dbHome /u02/app/oracle/product/11.2.0/dbhome_1

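I searched MOS using the errors above but found nothing useful.
So I changed approach and tried logging in with sqlplus / as sysdba to see what error it would report: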

[oracle@rac3 oracle]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Thu Sep 10 12:09:13 2015
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
ERROR:
ORA-12547: TNS:lost contact
Enter user-name:
ERROR:
ORA-12547: TNS:lost contact
Enter user-name:
ERROR:
ORA-12547: TNS:lost contact
SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus
[oracle@rac3 oracle]$

Following the hint in the MOS note SYSDBA Connections Fail With ORA-12547 Error (Doc ID 782276.1),
I found many trc files under $ORACLE_HOME/rdbms/log. An excerpt of one of them follows:

---- You may wonder at this point: why not look under bdump? The instance has not been created yet, so there is no bdump directory.

[oracle@rac3 log]$ more orcl3_ora_14292.trc
Dump file /u02/app/oracle/product/11.2.0/dbhome_1/rdbms/log/orcl3_ora_14292.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u02/app/oracle/product/11.2.0/dbhome_1
System name:    Linux
Node name:      rac3
Release:        2.6.32-431.el6.x86_64
Version:        #1 SMP Sun Nov 10 22:19:54 EST 2013
Machine:        x86_64
Instance name: orcl3
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
Unix process pid: 14292, image: oracle@rac3

*** 2015-09-10 11:32:38.641
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [500], [1200], [], [], [], [], [], [], [], [], []
----- SQL Statement (None) -----
Current SQL information unavailable - no SGA.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+41        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFB8AFF650 ? 7FFFB8AFF728 ?
                                                   7FFFB8B041D0 ? 000000002 ?
ksedst1()+103        call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFB8AFF650 ? 7FFFB8AFF728 ?
                                                   7FFFB8B041D0 ? 000000002 ?

This revealed the key error:

[spstp: ORACLE_HOME uid does not match euid], [500], [1200], [], [], [], [], [], [], [], [], []
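
With many trc files in one directory, grepping for the message text is a quick way to see which of them carry this error. A minimal sketch, assuming the traces all sit in a single directory with a .trc suffix; the function name find_spstp_traces is made up for illustration and is not part of any Oracle tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the .trc files in a directory that contain the
# "spstp: ORACLE_HOME uid does not match euid" message.
find_spstp_traces() {
    local log_dir=$1
    # -l prints only the names of matching files; -F treats the pattern as a fixed string
    grep -lF 'spstp: ORACLE_HOME uid does not match euid' "$log_dir"/*.trc 2>/dev/null
}

# Example call (path taken from the post):
#   find_spstp_traces "$ORACLE_HOME/rdbms/log"
```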

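Searching MOS turned up the note ORA-600 [spstp: ORACLE_HOME uid does not match euid] When Changing Permissions On $ORACLE_HOME/bin/oracle (Doc ID 747456.1),
which gives the following information: in this error, 500 is the uid (of ORACLE_HOME, per the message text) and 1200 is the euid.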
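
To make the two numbers easier to compare against the system, they can be pulled out of the error line mechanically. This is only a sketch: parse_spstp_args is a hypothetical name, and the field labels home_uid and euid simply follow the reading of the error given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the first two numeric arguments from an
# "spstp: ORACLE_HOME uid does not match euid" error line.
parse_spstp_args() {
    # $1: one line of text containing the ORA-00600 spstp error
    printf '%s\n' "$1" | sed -n \
        's/.*\[spstp: ORACLE_HOME uid does not match euid], \[\([0-9][0-9]*\)], \[\([0-9][0-9]*\)].*/home_uid=\1 euid=\2/p'
}

# Example call with the line from the trace file:
#   parse_spstp_args 'ORA-00600: internal error code, arguments: [spstp: ORACLE_HOME uid does not match euid], [500], [1200], [], []'
```

The extracted values can then be set against `id -u oracle` and the numeric owner shown by `ls -ldn "$ORACLE_HOME"`.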

So I checked the ids of the oracle and grid users on this node:

[oracle@rac3 oracle]$ id oracle
uid=1200(oracle) gid=1000(oinstall) groups=1000(oinstall),1200(dba),1201(oper),1300(asmdba)
[oracle@rac3 oracle]$ id grid
uid=1100(grid) gid=1000(oinstall) groups=1000(oinstall),1200(dba),1100(asmadmin),1301(asmoper),1300(asmdba)
[oracle@rac3 oracle]$

There is no 500 anywhere in that output. So where does the 500 come from? Checking the owner of the Oracle database home next revealed the problem:

[oracle@rac3 ~]$ pwd
/home/oracle
[oracle@rac3 ~]$ cd /u02/app/oracle/product/11.2.0/
[oracle@rac3 11.2.0]$ ls -lrt
total 4
drwxrwxr-x 74 500 oinstall 4096 Sep 10 01:12 dbhome_1
[oracle@rac3 11.2.0]$ cd ..
[oracle@rac3 product]$ ls -lrt
total 4
drwxrwxr-x 3 500 oinstall 4096 Sep  9 21:46 11.2.0
[oracle@rac3 product]$ cd ..
[oracle@rac3 oracle]$ ls -lrt
total 12
drwxrwxr-x 3    500 oinstall 4096 Sep  9 21:36 product  --------->the owner of product is 500 here; problem located
drwxr-xr-x 3 oracle oinstall 4096 Sep 10 01:37 cfgtoollogs
drwxr-xr-x 3 oracle oinstall 4096 Sep 10 11:31 admin
[oracle@rac3 oracle]$ pwd
/u02/app/oracle
[oracle@rac3 oracle]$
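
The manual walk above (cd .. followed by ls -lrt at each level) can be sketched as a small loop that prints the numeric owner of a directory and of each of its ancestors. This is an illustrative sketch only; owner_chain is a made-up name, and the optional second argument is an assumed convention for where to stop climbing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "uid path" for a directory and each ancestor,
# stopping at the directory given as $2 (default: /).
owner_chain() {
    local path=$1 stop=${2:-/} parent
    while :; do
        # ls -n shows numeric ids; the owner uid is the third column
        printf '%s %s\n' "$(ls -ldn -- "$path" | awk '{print $3}')" "$path"
        [ "$path" = "$stop" ] && break
        parent=$(dirname "$path")
        [ "$parent" = "$path" ] && break   # reached / (or .), nothing left to climb
        path=$parent
    done
}

# Example call (paths taken from the post):
#   owner_chain /u02/app/oracle/product/11.2.0/dbhome_1 /u02/app/oracle
```

An owner that ls prints as a bare number, as with 500 here, means no account on the node maps to that uid; `find /u02/app/oracle -xdev -nouser` lists every such entry below a directory.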

                                                  
After changing the owner to oracle, adding the node went through without problems.
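
The post does not show the command used for the ownership change. One possible way, sketched here under the assumption that only entries still carrying the stale numeric uid should be touched (so that files legitimately owned by other users stay as they are), is to let find select them; reown_stale_uid is a hypothetical name, and the call would have to be run as root:

```shell
#!/usr/bin/env bash
# Hypothetical helper: give every entry under a directory that is still owned
# by a stale numeric uid to a new owner. The group is left untouched.
reown_stale_uid() {
    local dir=$1 old_uid=$2 new_owner=$3
    # -xdev stays on one filesystem; -h changes a symlink itself, not its target
    find "$dir" -xdev -uid "$old_uid" -exec chown -h "$new_owner" {} +
}

# Example call as root (values taken from the post):
#   reown_stale_uid /u02/app/oracle 500 oracle
```

On Linux, chown can clear the setuid and setgid bits of executables, so it may be worth rechecking the mode of $ORACLE_HOME/bin/oracle afterwards; that is a general property of chown rather than something the post reports.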

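To summarize: /u02/app/oracle/product showed owner 500 because the oracle user on rac3 originally had uid 500, while the oracle user on the other two nodes had uid 1200. As is well known, the uid must be the same on all RAC nodes. So the uid on rac3 was changed to match, but the owner of /u02/app/oracle/product was not updated before the node addition was started. The rest is as described above.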
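
A uid mismatch like this can be caught before adding a node by comparing the numeric uid of the same user on every node. The following is a rough sketch, assuming passwordless ssh to each node from wherever it runs; collect_uids and uids_consistent are made-up names, and the node names in the example are the ones from the /etc/hosts listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "node uid" for one user on each listed node.
# Assumes passwordless ssh; BatchMode makes ssh fail rather than prompt.
collect_uids() {
    local user=$1 node
    shift
    for node in "$@"; do
        printf '%s %s\n' "$node" "$(ssh -o BatchMode=yes "$node" id -u "$user")"
    done
}

# Hypothetical helper: read "node uid" lines on stdin and succeed only if
# every line carries the same uid.
uids_consistent() {
    [ "$(awk '{print $2}' | sort -u | wc -l)" -eq 1 ]
}

# Example call:
#   collect_uids oracle rac1 rac2 rac3 | uids_consistent || echo "uid mismatch across nodes"
```

The same check applies to the grid user and, with `id -g`, to the group ids.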
 
