IP冲突导致的ORA-12170

来源：互联网发布：乌班图软件仓库编辑：程序博客网时间：2024/06/05 11:15

IP冲突导致的ORA-12170
开发反馈应用程序连不上数据库，并且应用日志中报错：
2014-03-24 09:56:03,095 [WARN] ==== http-bio-8010-exec-3: org.hibernate.util.JDBCExceptionReporter.logExceptions(77)
        SQL Error: 20, SQLState: 61000

2014-03-24 09:56:03,096 [ERROR] ==== http-bio-8010-exec-3: org.hibernate.util.JDBCExceptionReporter.logExceptions(78)
        The Network Adapter could not establish the connection
DB环境为RAC环境，登陆上两台服务器分别检查，发现RAC上的资源都显示为正常状态，包括instance,监听等，并且从服务器端sqlplus / as sysdba连进去坐了些switch logfile之类的操作，都没有问题。
于是尝试从我的个人电脑上Plsql连接DB，半天不响应，之后报了：ORA-12170,TNS:Connect timeout occurred。看看sqlnet.log的信息：
***********************************************************************
Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version 10.2.0.1.0 - Production
Time: 24-3月 -2014 09:43:08
Tracing not turned on.
Tns error struct:
    ns main err code: 12535
    TNS-12535: TNS: 操作超时
    ns secondary err code: 12560
    nt main err code: 505
    TNS-00505: 操作超时
    nt secondary err code: 60
    nt OS err code: 0
Client address: <unknown>

***********************************************************************
各个错误代码如下：
[oracle@bssdb02 ~]$ oerr ora 12170
12170, 00000, "TNS:Connect timeout occurred"
// *Cause: The server shut down because connection establishment or
// communication with a client failed to complete within the allotted time
// interval. This may be a result of network or system delays; or this may
// indicate that a malicious client is trying to cause a Denial of Service
// attack on the server.
// *Action: If the error occurred because of a slow network or system,
// reconfigure one or all of the parameters SQLNET.INBOUND_CONNECT_TIMEOUT,
// SQLNET.SEND_TIMEOUT, SQLNET.RECV_TIMEOUT in sqlnet.ora to larger values.
// If a malicious client is suspected, use the address in sqlnet.log to
// identify the source and restrict access. Note that logged addresses may
// not be reliable as they can be forged (e.g. in TCP/IP).
[oracle@bssdb02 ~]$ oerr ora 12535
12535, 00000, "TNS:operation timed out"
// *Cause: The requested operation could not be completed within the time out
// period.
// *Action: Look at the documentation on the secondary errors for possible
// remedy. See SQLNET.LOG to find secondary error if not provided explicitly.
// Turn on tracing to gather more information.
[oracle@bssdb02 ~]$ oerr ora 12560
12560, 00000, "TNS:protocol adapter error"
// *Cause: A generic protocol adapter error occurred.
// *Action: Check addresses used for proper protocol specification. Before
// reporting this error, look at the error stack and check for lower level
// transport errors. For further details, turn on tracing and reexecute the
// operation. Turn off tracing when the operation is complete.
[oracle@bssdb02 ~]$

很诡异，这种现象一般是服务名或者连接串配置有误，但同时多个用户突然出现这种错误，应该可以排除这方面的原因，防火墙都是关闭状态，这方面影响因素也可以排除，于是做了level 16的客户端的tns跟踪：
trace_level_client = 16
trace_file_client = client
trace_directory_client = d:\tmp

从trace内容来看，应该是在NODE1将用户连接转发到NODE2之后，用户连接在连接NODE2的时候出现问题导致的。
[24-3月 -2014 09:40:59:687] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.23.135)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=RAC)(FAILOVER_MODE=(TYPE=SESSION)(METHOD=BASIC))(CID=(PROGRAM=C:\oracle\product\10.2.0\db_1\bin\sqlplus.exe)(HOST=BPIT-10003657)(USER=pengmd))))
[24-3月 -2014 09:40:59:687] nttbnd2addr: entry
[24-3月 -2014 09:40:59:687] snlinGetAddrInfo: entry
[24-3月 -2014 09:40:59:687] snlinGetAddrInfo: exit
[24-3月 -2014 09:40:59:687] nttbnd2addr: using host IP address: 10.0.23.135
[24-3月 -2014 09:40:59:687] snlinFreeAddrInfo: entry
[24-3月 -2014 09:40:59:687] snlinFreeAddrInfo: exit
[24-3月 -2014 09:40:59:687] nttbnd2addr: exit
[24-3月 -2014 09:40:59:687] nsc2addr: normal exit
[24-3月 -2014 09:40:59:688] nsopen: entry
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 736 bytes at 0x1bbf068
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsopenmplx: entry
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsiorini: entry
[24-3月 -2014 09:40:59:688] nsbal: entry
[24-3月 -2014 09:40:59:688] nsbgetfl: entry
[24-3月 -2014 09:40:59:688] nsbgetfl: normal exit
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 84 bytes at 0x1a2feb8
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsbal: normal exit
[24-3月 -2014 09:40:59:688] nsiorini: exit (0)
[24-3月 -2014 09:40:59:688] nscpxget: entry
[24-3月 -2014 09:40:59:688] nscpxget: normal exit
[24-3月 -2014 09:40:59:688] nsopenalloc_nsntx: nlhthput on mplx_ht_nsgbu:ctx=1bbf068, nsntx=1bbf350
[24-3月 -2014 09:40:59:688] nsopenmplx: normal exit
[24-3月 -2014 09:40:59:688] nsopen: opening transport...
[24-3月 -2014 09:40:59:688] nttcon: entry
[24-3月 -2014 09:40:59:688] nttcon: toc = 1
[24-3月 -2014 09:40:59:688] nttcnp: entry
[24-3月 -2014 09:40:59:688] nttcnp: creating a socket.
[24-3月 -2014 09:40:59:689] nttcnp: exit
[24-3月 -2014 09:40:59:689] nttcni: entry
[24-3月 -2014 09:40:59:689] nttcni: trying to connect to socket 288.
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: entry
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: Using numeric form of host's address 10.25.192.23
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: exit
[24-3月 -2014 09:40:59:691] nttcni: connected on ipaddr 10.25.192.23
[24-3月 -2014 09:40:59:691] nttcni: exit
[24-3月 -2014 09:40:59:691] nttcon: NT layer TCP/IP connection has been established.   <==========NODE1已经跟客户端已经成功建立了连接
[24-3月 -2014 09:40:59:691] nttcon: set TCP_NODELAY on 288
[24-3月 -2014 09:40:59:691] nttcon: exit
[24-3月 -2014 09:40:59:691] nsopen: transport is open

[24-3月 -2014 09:40:59:703] nscall: connecting...
[24-3月 -2014 09:40:59:703] nsc2addr: entry
[24-3月 -2014 09:40:59:703] nsc2addr: (address=(protocol=tcp)(host=10.0.23.136)(port=1521)) <===========NODE1将连接转到NODE2
[24-3月 -2014 09:40:59:703] nttbnd2addr: entry
[24-3月 -2014 09:40:59:703] snlinGetAddrInfo: entry
[24-3月 -2014 09:40:59:703] snlinGetAddrInfo: exit
[24-3月 -2014 09:40:59:703] nttbnd2addr: using host IP address: 10.0.23.136             <======用户发起到NODE2的连接
[24-3月 -2014 09:40:59:703] snlinFreeAddrInfo: entry
[24-3月 -2014 09:40:59:703] snlinFreeAddrInfo: exit
[24-3月 -2014 09:40:59:703] nttbnd2addr: exit
[24-3月 -2014 09:40:59:703] nsc2addr: normal exit
[24-3月 -2014 09:40:59:703] nsopen: entry
[24-3月 -2014 09:40:59:703] nsmal: entry
[24-3月 -2014 09:40:59:703] nsmal: 736 bytes at 0x1bbf068
[24-3月 -2014 09:40:59:703] nsmal: normal exit
[24-3月 -2014 09:40:59:703] nsopenmplx: entry
[24-3月 -2014 09:40:59:703] nsmal: entry
[24-3月 -2014 09:40:59:703] nsmal: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:40:59:703] nsmal: normal exit
[24-3月 -2014 09:40:59:703] nsiorini: entry
[24-3月 -2014 09:40:59:703] nsbal: entry
[24-3月 -2014 09:40:59:703] nsbgetfl: entry
[24-3月 -2014 09:40:59:703] nsbgetfl: normal exit
[24-3月 -2014 09:40:59:703] nsbal: normal exit
[24-3月 -2014 09:40:59:703] nsiorini: exit (0)
[24-3月 -2014 09:40:59:703] nscpxget: entry
[24-3月 -2014 09:40:59:703] nscpxget: normal exit
[24-3月 -2014 09:40:59:703] nsopenalloc_nsntx: nlhthput on mplx_ht_nsgbu:ctx=1bbf068, nsntx=1bbf350
[24-3月 -2014 09:40:59:704] nsopenmplx: normal exit
[24-3月 -2014 09:40:59:704] nsopen: opening transport...
[24-3月 -2014 09:40:59:704] nttcon: entry
[24-3月 -2014 09:40:59:704] nttcon: toc = 1
[24-3月 -2014 09:40:59:704] nttcnp: entry
[24-3月 -2014 09:40:59:704] nttcnp: creating a socket.
[24-3月 -2014 09:40:59:704] nttcnp: exit
[24-3月 -2014 09:40:59:704] nttcni: entry
[24-3月 -2014 09:40:59:704] nttcni: trying to connect to socket 288.
[24-3月 -2014 09:41:20:711] ntt2err: entry
[24-3月 -2014 09:41:20:711] ntt2err: soc 288 error - operation=1, ntresnt[0]=505, ntresnt[1]=60, ntresnt[2]=0 <====连接失败并报错
[24-3月 -2014 09:41:20:711] ntt2err: exit
[24-3月 -2014 09:41:20:811] nttcni: exit
[24-3月 -2014 09:41:20:811] nttcon: exit
[24-3月 -2014 09:41:20:811] nserror: entry
[24-3月 -2014 09:41:20:811] nserror: nsres: id=0, op=65, ns=12535, ns2=12560; nt[0]=505, nt[1]=60, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[24-3月 -2014 09:41:20:811] nsopen: unable to open transport
[24-3月 -2014 09:41:20:811] nsiocancel: entry
[24-3月 -2014 09:41:20:811] nsiofrrg: entry
[24-3月 -2014 09:41:20:811] nsbfr: entry
[24-3月 -2014 09:41:20:811] nsbaddfl: entry
[24-3月 -2014 09:41:20:811] nsbaddfl: normal exit
[24-3月 -2014 09:41:20:811] nsbfr: normal exit
[24-3月 -2014 09:41:20:811] nsiofrrg: exit
[24-3月 -2014 09:41:20:811] nsiocancel: exit
[24-3月 -2014 09:41:20:811] nsopenfree_nsntx: nlhthdel from mplx_ht_nsgbu, ctx=1bbf068 nsntx=1bbf350
[24-3月 -2014 09:41:20:811] nsiocancel: entry
[24-3月 -2014 09:41:20:812] nsiofrrg: entry
[24-3月 -2014 09:41:20:812] nsiofrrg: exit
[24-3月 -2014 09:41:20:812] nsiocancel: exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 736 bytes at 0x1bbf068
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsopen: error exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 238 bytes at 0x1bcf1d8
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 212 bytes at 0x1a2fce0
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nladtrm: entry
[24-3月 -2014 09:41:20:812] nladtrm: exit
[24-3月 -2014 09:41:20:812] nscall: error exit
[24-3月 -2014 09:41:20:813] nioqper: error from nscall
[24-3月 -2014 09:41:20:813] nioqper:    ns main err code: 12535
[24-3月 -2014 09:41:20:813] nioqper:    ns (2) err code: 12560
[24-3月 -2014 09:41:20:813] nioqper:    nt main err code: 505
[24-3月 -2014 09:41:20:813] nioqper:    nt (2) err code: 60
[24-3月 -2014 09:41:20:813] nioqper:    nt OS   err code: 0
[24-3月 -2014 09:41:20:813] niomapnserror: entry
[24-3月 -2014 09:41:20:813] niqme: entry
[24-3月 -2014 09:41:20:813] niqme: reporting NS-12535 error as ORA-12535
[24-3月 -2014 09:41:20:813] niqme: exit
[24-3月 -2014 09:41:20:813] niomapnserror: exit
[24-3月 -2014 09:41:20:813] niotns: Couldn't connect, returning 12170

大致明白了到NODE2的连接有问题，但前面步骤已经对NODE2进行了检查，没发现异常状况，再看看日志，发现crsd.log持续不断的报这种错：
2014-03-22 15:19:58.318: [ CRSRES][1532291392] Attempting to start `ora.bssdb02.vip` on member `bssdb02`
2014-03-22 15:20:20.592: [ CRSAPP][1532291392] StartResource error for ora.bssdb02.vip error code = 1
2014-03-22 15:20:20.990: [ CRSRES][1532291392] Start of `ora.bssdb02.vip` on member `bssdb02` failed.
2014-03-22 15:25:21.452: [ CRSRES][1532291392] startRunnable: setting CLI values
/var/log/messages也是持续不断报错:
Mar 23 04:05:02 bssdb02 avahi-daemon[7483]: Registering new address record for 10.0.23.136 on eth2.
Mar 23 04:05:02 bssdb02 avahi-daemon[7483]: Withdrawing address record for 10.0.23.136 on eth2.
看来VIP有问题，但ifconfig显示VIP是正常的。
[root@bssdb02 ~]# ifconfig
eth1      Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
          inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
          inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:385015102 errors:0 dropped:0 overruns:0 frame:0
          TX packets:447034056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:231813004471 (215.8 GiB) TX bytes:322103900599 (299.9 GiB)
          Interrupt:146 Memory:d8000000-d8012800

eth2      Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
          inet addr:10.0.23.132 Bcast:10.0.15.255 Mask:255.255.255.0
          inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:362482320 errors:0 dropped:0 overruns:0 frame:0
          TX packets:322680748 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:60578409873 (56.4 GiB) TX bytes:322706932324 (300.5 GiB)
          Interrupt:154 Memory:da000000-da012800

eth2:1    Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
          inet addr:10.0.23.136 Bcast:10.0.15.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:154 Memory:da000000-da012800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:107105620 errors:0 dropped:0 overruns:0 frame:0
          TX packets:107105620 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:183491493683 (170.8 GiB) TX bytes:183491493683 (170.8 GiB)

[root@bssdb02 ~]#
既然客户端连接NODE2有问题，那么抓下包呢，用tcpdump配合客户端的连接，但显示到NODE2-VIP没有包过来：
[root@bssdb02 ~]# tcpdump -i eth2 -Z oracle src host 10.25.192.23 and dst host 10.0.23.136 and port 1521
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes

0 packets captured
0 packets received by filter
0 packets dropped by kernel
修改下tcpdump设置，但从NODE2-PUBLICIP还是有网路包过来的，如下：
[root@bssdb02 ~]# tcpdump -i eth2 -Z oracle src host 10.25.192.23 and port 1521
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
11:47:41.951599 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:47:42.451180 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:47:42.951154 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,nop,sackOK>
11:48:12.823245 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: S 3262807939:3262807939(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:48:12.823833 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: . ack 2306706224 win 16425
11:48:20.065630 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: . ack 2 win 16425
11:48:20.066305 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: F 0:0(0) ack 2 win 16425

问题到这基本上可以知道原因了，继而在客户端telnet NODE2-VIP 1521端口不通，但telnet NODE2-PUBLICIP 1521没问题。
于是将NODE2的VIP重启（crs_start ora..vip），但重启失败。
找到网络工程师，查了下NODE2-VIP，发现局域网内IP冲突导致这个问题。

2 4