IP冲突导致的ORA-12170
来源:互联网 发布:乌班图软件仓库 编辑:程序博客网 时间:2024/06/05 11:15
IP冲突导致的ORA-12170
开发反馈应用程序连不上数据库,并且应用日志中报错:
2014-03-24 09:56:03,095 [WARN] ==== http-bio-8010-exec-3: org.hibernate.util.JDBCExceptionReporter.logExceptions(77)
SQL Error: 20, SQLState: 61000
2014-03-24 09:56:03,096 [ERROR] ==== http-bio-8010-exec-3: org.hibernate.util.JDBCExceptionReporter.logExceptions(78)
The Network Adapter could not establish the connection
DB环境为RAC环境,登陆上两台服务器分别检查,发现RAC上的资源都显示为正常状态,包括instance,监听等,并且从服务器端sqlplus / as sysdba连进去坐了些switch logfile之类的操作,都没有问题。
于是尝试从我的个人电脑上Plsql连接DB,半天不响应,之后报了:ORA-12170,TNS:Connect timeout occurred。看看sqlnet.log的信息:
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version 10.2.0.1.0 - Production
Time: 24-3月 -2014 09:43:08
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS: 操作超时
ns secondary err code: 12560
nt main err code: 505
TNS-00505: 操作超时
nt secondary err code: 60
nt OS err code: 0
Client address: <unknown>
***********************************************************************
各个错误代码如下:
[oracle@bssdb02 ~]$ oerr ora 12170
12170, 00000, "TNS:Connect timeout occurred"
// *Cause: The server shut down because connection establishment or
// communication with a client failed to complete within the allotted time
// interval. This may be a result of network or system delays; or this may
// indicate that a malicious client is trying to cause a Denial of Service
// attack on the server.
// *Action: If the error occurred because of a slow network or system,
// reconfigure one or all of the parameters SQLNET.INBOUND_CONNECT_TIMEOUT,
// SQLNET.SEND_TIMEOUT, SQLNET.RECV_TIMEOUT in sqlnet.ora to larger values.
// If a malicious client is suspected, use the address in sqlnet.log to
// identify the source and restrict access. Note that logged addresses may
// not be reliable as they can be forged (e.g. in TCP/IP).
[oracle@bssdb02 ~]$ oerr ora 12535
12535, 00000, "TNS:operation timed out"
// *Cause: The requested operation could not be completed within the time out
// period.
// *Action: Look at the documentation on the secondary errors for possible
// remedy. See SQLNET.LOG to find secondary error if not provided explicitly.
// Turn on tracing to gather more information.
[oracle@bssdb02 ~]$ oerr ora 12560
12560, 00000, "TNS:protocol adapter error"
// *Cause: A generic protocol adapter error occurred.
// *Action: Check addresses used for proper protocol specification. Before
// reporting this error, look at the error stack and check for lower level
// transport errors. For further details, turn on tracing and reexecute the
// operation. Turn off tracing when the operation is complete.
[oracle@bssdb02 ~]$
很诡异,这种现象一般是服务名或者连接串配置有误,但同时多个用户突然出现这种错误,应该可以排除这方面的原因,防火墙都是关闭状态,这方面影响因素也可以排除,于是做了level 16的客户端的tns跟踪:
trace_level_client = 16
trace_file_client = client
trace_directory_client = d:\tmp
从trace内容来看,应该是在NODE1将用户连接转发到NODE2之后,用户连接在连接NODE2的时候出现问题导致的。
[24-3月 -2014 09:40:59:687] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.23.135)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=RAC)(FAILOVER_MODE=(TYPE=SESSION)(METHOD=BASIC))(CID=(PROGRAM=C:\oracle\product\10.2.0\db_1\bin\sqlplus.exe)(HOST=BPIT-10003657)(USER=pengmd))))
[24-3月 -2014 09:40:59:687] nttbnd2addr: entry
[24-3月 -2014 09:40:59:687] snlinGetAddrInfo: entry
[24-3月 -2014 09:40:59:687] snlinGetAddrInfo: exit
[24-3月 -2014 09:40:59:687] nttbnd2addr: using host IP address: 10.0.23.135
[24-3月 -2014 09:40:59:687] snlinFreeAddrInfo: entry
[24-3月 -2014 09:40:59:687] snlinFreeAddrInfo: exit
[24-3月 -2014 09:40:59:687] nttbnd2addr: exit
[24-3月 -2014 09:40:59:687] nsc2addr: normal exit
[24-3月 -2014 09:40:59:688] nsopen: entry
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 736 bytes at 0x1bbf068
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsopenmplx: entry
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsiorini: entry
[24-3月 -2014 09:40:59:688] nsbal: entry
[24-3月 -2014 09:40:59:688] nsbgetfl: entry
[24-3月 -2014 09:40:59:688] nsbgetfl: normal exit
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 84 bytes at 0x1a2feb8
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsbal: normal exit
[24-3月 -2014 09:40:59:688] nsiorini: exit (0)
[24-3月 -2014 09:40:59:688] nscpxget: entry
[24-3月 -2014 09:40:59:688] nscpxget: normal exit
[24-3月 -2014 09:40:59:688] nsopenalloc_nsntx: nlhthput on mplx_ht_nsgbu:ctx=1bbf068, nsntx=1bbf350
[24-3月 -2014 09:40:59:688] nsopenmplx: normal exit
[24-3月 -2014 09:40:59:688] nsopen: opening transport...
[24-3月 -2014 09:40:59:688] nttcon: entry
[24-3月 -2014 09:40:59:688] nttcon: toc = 1
[24-3月 -2014 09:40:59:688] nttcnp: entry
[24-3月 -2014 09:40:59:688] nttcnp: creating a socket.
[24-3月 -2014 09:40:59:689] nttcnp: exit
[24-3月 -2014 09:40:59:689] nttcni: entry
[24-3月 -2014 09:40:59:689] nttcni: trying to connect to socket 288.
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: entry
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: Using numeric form of host's address 10.25.192.23
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: exit
[24-3月 -2014 09:40:59:691] nttcni: connected on ipaddr 10.25.192.23
[24-3月 -2014 09:40:59:691] nttcni: exit
[24-3月 -2014 09:40:59:691] nttcon: NT layer TCP/IP connection has been established. <==========NODE1已经跟客户端已经成功建立了连接
[24-3月 -2014 09:40:59:691] nttcon: set TCP_NODELAY on 288
[24-3月 -2014 09:40:59:691] nttcon: exit
[24-3月 -2014 09:40:59:691] nsopen: transport is open
[24-3月 -2014 09:40:59:703] nscall: connecting...
[24-3月 -2014 09:40:59:703] nsc2addr: entry
[24-3月 -2014 09:40:59:703] nsc2addr: (address=(protocol=tcp)(host=10.0.23.136)(port=1521)) <===========NODE1将连接转到NODE2
[24-3月 -2014 09:40:59:703] nttbnd2addr: entry
[24-3月 -2014 09:40:59:703] snlinGetAddrInfo: entry
[24-3月 -2014 09:40:59:703] snlinGetAddrInfo: exit
[24-3月 -2014 09:40:59:703] nttbnd2addr: using host IP address: 10.0.23.136 <======用户发起到NODE2的连接
[24-3月 -2014 09:40:59:703] snlinFreeAddrInfo: entry
[24-3月 -2014 09:40:59:703] snlinFreeAddrInfo: exit
[24-3月 -2014 09:40:59:703] nttbnd2addr: exit
[24-3月 -2014 09:40:59:703] nsc2addr: normal exit
[24-3月 -2014 09:40:59:703] nsopen: entry
[24-3月 -2014 09:40:59:703] nsmal: entry
[24-3月 -2014 09:40:59:703] nsmal: 736 bytes at 0x1bbf068
[24-3月 -2014 09:40:59:703] nsmal: normal exit
[24-3月 -2014 09:40:59:703] nsopenmplx: entry
[24-3月 -2014 09:40:59:703] nsmal: entry
[24-3月 -2014 09:40:59:703] nsmal: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:40:59:703] nsmal: normal exit
[24-3月 -2014 09:40:59:703] nsiorini: entry
[24-3月 -2014 09:40:59:703] nsbal: entry
[24-3月 -2014 09:40:59:703] nsbgetfl: entry
[24-3月 -2014 09:40:59:703] nsbgetfl: normal exit
[24-3月 -2014 09:40:59:703] nsbal: normal exit
[24-3月 -2014 09:40:59:703] nsiorini: exit (0)
[24-3月 -2014 09:40:59:703] nscpxget: entry
[24-3月 -2014 09:40:59:703] nscpxget: normal exit
[24-3月 -2014 09:40:59:703] nsopenalloc_nsntx: nlhthput on mplx_ht_nsgbu:ctx=1bbf068, nsntx=1bbf350
[24-3月 -2014 09:40:59:704] nsopenmplx: normal exit
[24-3月 -2014 09:40:59:704] nsopen: opening transport...
[24-3月 -2014 09:40:59:704] nttcon: entry
[24-3月 -2014 09:40:59:704] nttcon: toc = 1
[24-3月 -2014 09:40:59:704] nttcnp: entry
[24-3月 -2014 09:40:59:704] nttcnp: creating a socket.
[24-3月 -2014 09:40:59:704] nttcnp: exit
[24-3月 -2014 09:40:59:704] nttcni: entry
[24-3月 -2014 09:40:59:704] nttcni: trying to connect to socket 288.
[24-3月 -2014 09:41:20:711] ntt2err: entry
[24-3月 -2014 09:41:20:711] ntt2err: soc 288 error - operation=1, ntresnt[0]=505, ntresnt[1]=60, ntresnt[2]=0 <====连接失败并报错
[24-3月 -2014 09:41:20:711] ntt2err: exit
[24-3月 -2014 09:41:20:811] nttcni: exit
[24-3月 -2014 09:41:20:811] nttcon: exit
[24-3月 -2014 09:41:20:811] nserror: entry
[24-3月 -2014 09:41:20:811] nserror: nsres: id=0, op=65, ns=12535, ns2=12560; nt[0]=505, nt[1]=60, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[24-3月 -2014 09:41:20:811] nsopen: unable to open transport
[24-3月 -2014 09:41:20:811] nsiocancel: entry
[24-3月 -2014 09:41:20:811] nsiofrrg: entry
[24-3月 -2014 09:41:20:811] nsbfr: entry
[24-3月 -2014 09:41:20:811] nsbaddfl: entry
[24-3月 -2014 09:41:20:811] nsbaddfl: normal exit
[24-3月 -2014 09:41:20:811] nsbfr: normal exit
[24-3月 -2014 09:41:20:811] nsiofrrg: exit
[24-3月 -2014 09:41:20:811] nsiocancel: exit
[24-3月 -2014 09:41:20:811] nsopenfree_nsntx: nlhthdel from mplx_ht_nsgbu, ctx=1bbf068 nsntx=1bbf350
[24-3月 -2014 09:41:20:811] nsiocancel: entry
[24-3月 -2014 09:41:20:812] nsiofrrg: entry
[24-3月 -2014 09:41:20:812] nsiofrrg: exit
[24-3月 -2014 09:41:20:812] nsiocancel: exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 736 bytes at 0x1bbf068
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsopen: error exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 238 bytes at 0x1bcf1d8
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 212 bytes at 0x1a2fce0
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nladtrm: entry
[24-3月 -2014 09:41:20:812] nladtrm: exit
[24-3月 -2014 09:41:20:812] nscall: error exit
[24-3月 -2014 09:41:20:813] nioqper: error from nscall
[24-3月 -2014 09:41:20:813] nioqper: ns main err code: 12535
[24-3月 -2014 09:41:20:813] nioqper: ns (2) err code: 12560
[24-3月 -2014 09:41:20:813] nioqper: nt main err code: 505
[24-3月 -2014 09:41:20:813] nioqper: nt (2) err code: 60
[24-3月 -2014 09:41:20:813] nioqper: nt OS err code: 0
[24-3月 -2014 09:41:20:813] niomapnserror: entry
[24-3月 -2014 09:41:20:813] niqme: entry
[24-3月 -2014 09:41:20:813] niqme: reporting NS-12535 error as ORA-12535
[24-3月 -2014 09:41:20:813] niqme: exit
[24-3月 -2014 09:41:20:813] niomapnserror: exit
[24-3月 -2014 09:41:20:813] niotns: Couldn't connect, returning 12170
大致明白了到NODE2的连接有问题,但前面步骤已经对NODE2进行了检查,没发现异常状况,再看看日志,发现crsd.log持续不断的报这种错:
2014-03-22 15:19:58.318: [ CRSRES][1532291392] Attempting to start `ora.bssdb02.vip` on member `bssdb02`
2014-03-22 15:20:20.592: [ CRSAPP][1532291392] StartResource error for ora.bssdb02.vip error code = 1
2014-03-22 15:20:20.990: [ CRSRES][1532291392] Start of `ora.bssdb02.vip` on member `bssdb02` failed.
2014-03-22 15:25:21.452: [ CRSRES][1532291392] startRunnable: setting CLI values
/var/log/messages也是持续不断报错:
Mar 23 04:05:02 bssdb02 avahi-daemon[7483]: Registering new address record for 10.0.23.136 on eth2.
Mar 23 04:05:02 bssdb02 avahi-daemon[7483]: Withdrawing address record for 10.0.23.136 on eth2.
看来VIP有问题,但ifconfig显示VIP是正常的。
[root@bssdb02 ~]# ifconfig
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:385015102 errors:0 dropped:0 overruns:0 frame:0
TX packets:447034056 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:231813004471 (215.8 GiB) TX bytes:322103900599 (299.9 GiB)
Interrupt:146 Memory:d8000000-d8012800
eth2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.23.132 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:362482320 errors:0 dropped:0 overruns:0 frame:0
TX packets:322680748 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:60578409873 (56.4 GiB) TX bytes:322706932324 (300.5 GiB)
Interrupt:154 Memory:da000000-da012800
eth2:1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.23.136 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:154 Memory:da000000-da012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:107105620 errors:0 dropped:0 overruns:0 frame:0
TX packets:107105620 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:183491493683 (170.8 GiB) TX bytes:183491493683 (170.8 GiB)
[root@bssdb02 ~]#
既然客户端连接NODE2有问题,那么抓下包呢,用tcpdump配合客户端的连接,但显示到NODE2-VIP没有包过来:
[root@bssdb02 ~]# tcpdump -i eth2 -Z oracle src host 10.25.192.23 and dst host 10.0.23.136 and port 1521
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
修改下tcpdump设置,但从NODE2-PUBLICIP还是有网路包过来的,如下:
[root@bssdb02 ~]# tcpdump -i eth2 -Z oracle src host 10.25.192.23 and port 1521
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
11:47:41.951599 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:47:42.451180 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:47:42.951154 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,nop,sackOK>
11:48:12.823245 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: S 3262807939:3262807939(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:48:12.823833 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: . ack 2306706224 win 16425
11:48:20.065630 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: . ack 2 win 16425
11:48:20.066305 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: F 0:0(0) ack 2 win 16425
问题到这基本上可以知道原因了,继而在客户端telnet NODE2-VIP 1521端口不通,但telnet NODE2-PUBLICIP 1521没问题。
于是将NODE2的VIP重启(crs_start ora..vip),但重启失败。
找到网络工程师,查了下NODE2-VIP,发现局域网内IP冲突导致这个问题。
开发反馈应用程序连不上数据库,并且应用日志中报错:
2014-03-24 09:56:03,095 [WARN] ==== http-bio-8010-exec-3: org.hibernate.util.JDBCExceptionReporter.logExceptions(77)
SQL Error: 20, SQLState: 61000
2014-03-24 09:56:03,096 [ERROR] ==== http-bio-8010-exec-3: org.hibernate.util.JDBCExceptionReporter.logExceptions(78)
The Network Adapter could not establish the connection
DB环境为RAC环境,登陆上两台服务器分别检查,发现RAC上的资源都显示为正常状态,包括instance,监听等,并且从服务器端sqlplus / as sysdba连进去坐了些switch logfile之类的操作,都没有问题。
于是尝试从我的个人电脑上Plsql连接DB,半天不响应,之后报了:ORA-12170,TNS:Connect timeout occurred。看看sqlnet.log的信息:
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 32-bit Windows: Version 10.2.0.1.0 - Production
Time: 24-3月 -2014 09:43:08
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS: 操作超时
ns secondary err code: 12560
nt main err code: 505
TNS-00505: 操作超时
nt secondary err code: 60
nt OS err code: 0
Client address: <unknown>
***********************************************************************
各个错误代码如下:
[oracle@bssdb02 ~]$ oerr ora 12170
12170, 00000, "TNS:Connect timeout occurred"
// *Cause: The server shut down because connection establishment or
// communication with a client failed to complete within the allotted time
// interval. This may be a result of network or system delays; or this may
// indicate that a malicious client is trying to cause a Denial of Service
// attack on the server.
// *Action: If the error occurred because of a slow network or system,
// reconfigure one or all of the parameters SQLNET.INBOUND_CONNECT_TIMEOUT,
// SQLNET.SEND_TIMEOUT, SQLNET.RECV_TIMEOUT in sqlnet.ora to larger values.
// If a malicious client is suspected, use the address in sqlnet.log to
// identify the source and restrict access. Note that logged addresses may
// not be reliable as they can be forged (e.g. in TCP/IP).
[oracle@bssdb02 ~]$ oerr ora 12535
12535, 00000, "TNS:operation timed out"
// *Cause: The requested operation could not be completed within the time out
// period.
// *Action: Look at the documentation on the secondary errors for possible
// remedy. See SQLNET.LOG to find secondary error if not provided explicitly.
// Turn on tracing to gather more information.
[oracle@bssdb02 ~]$ oerr ora 12560
12560, 00000, "TNS:protocol adapter error"
// *Cause: A generic protocol adapter error occurred.
// *Action: Check addresses used for proper protocol specification. Before
// reporting this error, look at the error stack and check for lower level
// transport errors. For further details, turn on tracing and reexecute the
// operation. Turn off tracing when the operation is complete.
[oracle@bssdb02 ~]$
很诡异,这种现象一般是服务名或者连接串配置有误,但同时多个用户突然出现这种错误,应该可以排除这方面的原因,防火墙都是关闭状态,这方面影响因素也可以排除,于是做了level 16的客户端的tns跟踪:
trace_level_client = 16
trace_file_client = client
trace_directory_client = d:\tmp
从trace内容来看,应该是在NODE1将用户连接转发到NODE2之后,用户连接在连接NODE2的时候出现问题导致的。
[24-3月 -2014 09:40:59:687] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.0.23.135)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=RAC)(FAILOVER_MODE=(TYPE=SESSION)(METHOD=BASIC))(CID=(PROGRAM=C:\oracle\product\10.2.0\db_1\bin\sqlplus.exe)(HOST=BPIT-10003657)(USER=pengmd))))
[24-3月 -2014 09:40:59:687] nttbnd2addr: entry
[24-3月 -2014 09:40:59:687] snlinGetAddrInfo: entry
[24-3月 -2014 09:40:59:687] snlinGetAddrInfo: exit
[24-3月 -2014 09:40:59:687] nttbnd2addr: using host IP address: 10.0.23.135
[24-3月 -2014 09:40:59:687] snlinFreeAddrInfo: entry
[24-3月 -2014 09:40:59:687] snlinFreeAddrInfo: exit
[24-3月 -2014 09:40:59:687] nttbnd2addr: exit
[24-3月 -2014 09:40:59:687] nsc2addr: normal exit
[24-3月 -2014 09:40:59:688] nsopen: entry
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 736 bytes at 0x1bbf068
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsopenmplx: entry
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsiorini: entry
[24-3月 -2014 09:40:59:688] nsbal: entry
[24-3月 -2014 09:40:59:688] nsbgetfl: entry
[24-3月 -2014 09:40:59:688] nsbgetfl: normal exit
[24-3月 -2014 09:40:59:688] nsmal: entry
[24-3月 -2014 09:40:59:688] nsmal: 84 bytes at 0x1a2feb8
[24-3月 -2014 09:40:59:688] nsmal: normal exit
[24-3月 -2014 09:40:59:688] nsbal: normal exit
[24-3月 -2014 09:40:59:688] nsiorini: exit (0)
[24-3月 -2014 09:40:59:688] nscpxget: entry
[24-3月 -2014 09:40:59:688] nscpxget: normal exit
[24-3月 -2014 09:40:59:688] nsopenalloc_nsntx: nlhthput on mplx_ht_nsgbu:ctx=1bbf068, nsntx=1bbf350
[24-3月 -2014 09:40:59:688] nsopenmplx: normal exit
[24-3月 -2014 09:40:59:688] nsopen: opening transport...
[24-3月 -2014 09:40:59:688] nttcon: entry
[24-3月 -2014 09:40:59:688] nttcon: toc = 1
[24-3月 -2014 09:40:59:688] nttcnp: entry
[24-3月 -2014 09:40:59:688] nttcnp: creating a socket.
[24-3月 -2014 09:40:59:689] nttcnp: exit
[24-3月 -2014 09:40:59:689] nttcni: entry
[24-3月 -2014 09:40:59:689] nttcni: trying to connect to socket 288.
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: entry
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: Using numeric form of host's address 10.25.192.23
[24-3月 -2014 09:40:59:691] snlinGetNameInfo: exit
[24-3月 -2014 09:40:59:691] nttcni: connected on ipaddr 10.25.192.23
[24-3月 -2014 09:40:59:691] nttcni: exit
[24-3月 -2014 09:40:59:691] nttcon: NT layer TCP/IP connection has been established. <==========NODE1已经跟客户端已经成功建立了连接
[24-3月 -2014 09:40:59:691] nttcon: set TCP_NODELAY on 288
[24-3月 -2014 09:40:59:691] nttcon: exit
[24-3月 -2014 09:40:59:691] nsopen: transport is open
[24-3月 -2014 09:40:59:703] nscall: connecting...
[24-3月 -2014 09:40:59:703] nsc2addr: entry
[24-3月 -2014 09:40:59:703] nsc2addr: (address=(protocol=tcp)(host=10.0.23.136)(port=1521)) <===========NODE1将连接转到NODE2
[24-3月 -2014 09:40:59:703] nttbnd2addr: entry
[24-3月 -2014 09:40:59:703] snlinGetAddrInfo: entry
[24-3月 -2014 09:40:59:703] snlinGetAddrInfo: exit
[24-3月 -2014 09:40:59:703] nttbnd2addr: using host IP address: 10.0.23.136 <======用户发起到NODE2的连接
[24-3月 -2014 09:40:59:703] snlinFreeAddrInfo: entry
[24-3月 -2014 09:40:59:703] snlinFreeAddrInfo: exit
[24-3月 -2014 09:40:59:703] nttbnd2addr: exit
[24-3月 -2014 09:40:59:703] nsc2addr: normal exit
[24-3月 -2014 09:40:59:703] nsopen: entry
[24-3月 -2014 09:40:59:703] nsmal: entry
[24-3月 -2014 09:40:59:703] nsmal: 736 bytes at 0x1bbf068
[24-3月 -2014 09:40:59:703] nsmal: normal exit
[24-3月 -2014 09:40:59:703] nsopenmplx: entry
[24-3月 -2014 09:40:59:703] nsmal: entry
[24-3月 -2014 09:40:59:703] nsmal: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:40:59:703] nsmal: normal exit
[24-3月 -2014 09:40:59:703] nsiorini: entry
[24-3月 -2014 09:40:59:703] nsbal: entry
[24-3月 -2014 09:40:59:703] nsbgetfl: entry
[24-3月 -2014 09:40:59:703] nsbgetfl: normal exit
[24-3月 -2014 09:40:59:703] nsbal: normal exit
[24-3月 -2014 09:40:59:703] nsiorini: exit (0)
[24-3月 -2014 09:40:59:703] nscpxget: entry
[24-3月 -2014 09:40:59:703] nscpxget: normal exit
[24-3月 -2014 09:40:59:703] nsopenalloc_nsntx: nlhthput on mplx_ht_nsgbu:ctx=1bbf068, nsntx=1bbf350
[24-3月 -2014 09:40:59:704] nsopenmplx: normal exit
[24-3月 -2014 09:40:59:704] nsopen: opening transport...
[24-3月 -2014 09:40:59:704] nttcon: entry
[24-3月 -2014 09:40:59:704] nttcon: toc = 1
[24-3月 -2014 09:40:59:704] nttcnp: entry
[24-3月 -2014 09:40:59:704] nttcnp: creating a socket.
[24-3月 -2014 09:40:59:704] nttcnp: exit
[24-3月 -2014 09:40:59:704] nttcni: entry
[24-3月 -2014 09:40:59:704] nttcni: trying to connect to socket 288.
[24-3月 -2014 09:41:20:711] ntt2err: entry
[24-3月 -2014 09:41:20:711] ntt2err: soc 288 error - operation=1, ntresnt[0]=505, ntresnt[1]=60, ntresnt[2]=0 <====连接失败并报错
[24-3月 -2014 09:41:20:711] ntt2err: exit
[24-3月 -2014 09:41:20:811] nttcni: exit
[24-3月 -2014 09:41:20:811] nttcon: exit
[24-3月 -2014 09:41:20:811] nserror: entry
[24-3月 -2014 09:41:20:811] nserror: nsres: id=0, op=65, ns=12535, ns2=12560; nt[0]=505, nt[1]=60, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[24-3月 -2014 09:41:20:811] nsopen: unable to open transport
[24-3月 -2014 09:41:20:811] nsiocancel: entry
[24-3月 -2014 09:41:20:811] nsiofrrg: entry
[24-3月 -2014 09:41:20:811] nsbfr: entry
[24-3月 -2014 09:41:20:811] nsbaddfl: entry
[24-3月 -2014 09:41:20:811] nsbaddfl: normal exit
[24-3月 -2014 09:41:20:811] nsbfr: normal exit
[24-3月 -2014 09:41:20:811] nsiofrrg: exit
[24-3月 -2014 09:41:20:811] nsiocancel: exit
[24-3月 -2014 09:41:20:811] nsopenfree_nsntx: nlhthdel from mplx_ht_nsgbu, ctx=1bbf068 nsntx=1bbf350
[24-3月 -2014 09:41:20:811] nsiocancel: entry
[24-3月 -2014 09:41:20:812] nsiofrrg: entry
[24-3月 -2014 09:41:20:812] nsiofrrg: exit
[24-3月 -2014 09:41:20:812] nsiocancel: exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 1644 bytes at 0x1bbf350
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 736 bytes at 0x1bbf068
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsopen: error exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 238 bytes at 0x1bcf1d8
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nsmfr: entry
[24-3月 -2014 09:41:20:812] nsmfr: 212 bytes at 0x1a2fce0
[24-3月 -2014 09:41:20:812] nsmfr: normal exit
[24-3月 -2014 09:41:20:812] nladtrm: entry
[24-3月 -2014 09:41:20:812] nladtrm: exit
[24-3月 -2014 09:41:20:812] nscall: error exit
[24-3月 -2014 09:41:20:813] nioqper: error from nscall
[24-3月 -2014 09:41:20:813] nioqper: ns main err code: 12535
[24-3月 -2014 09:41:20:813] nioqper: ns (2) err code: 12560
[24-3月 -2014 09:41:20:813] nioqper: nt main err code: 505
[24-3月 -2014 09:41:20:813] nioqper: nt (2) err code: 60
[24-3月 -2014 09:41:20:813] nioqper: nt OS err code: 0
[24-3月 -2014 09:41:20:813] niomapnserror: entry
[24-3月 -2014 09:41:20:813] niqme: entry
[24-3月 -2014 09:41:20:813] niqme: reporting NS-12535 error as ORA-12535
[24-3月 -2014 09:41:20:813] niqme: exit
[24-3月 -2014 09:41:20:813] niomapnserror: exit
[24-3月 -2014 09:41:20:813] niotns: Couldn't connect, returning 12170
大致明白了到NODE2的连接有问题,但前面步骤已经对NODE2进行了检查,没发现异常状况,再看看日志,发现crsd.log持续不断的报这种错:
2014-03-22 15:19:58.318: [ CRSRES][1532291392] Attempting to start `ora.bssdb02.vip` on member `bssdb02`
2014-03-22 15:20:20.592: [ CRSAPP][1532291392] StartResource error for ora.bssdb02.vip error code = 1
2014-03-22 15:20:20.990: [ CRSRES][1532291392] Start of `ora.bssdb02.vip` on member `bssdb02` failed.
2014-03-22 15:25:21.452: [ CRSRES][1532291392] startRunnable: setting CLI values
/var/log/messages也是持续不断报错:
Mar 23 04:05:02 bssdb02 avahi-daemon[7483]: Registering new address record for 10.0.23.136 on eth2.
Mar 23 04:05:02 bssdb02 avahi-daemon[7483]: Withdrawing address record for 10.0.23.136 on eth2.
看来VIP有问题,但ifconfig显示VIP是正常的。
[root@bssdb02 ~]# ifconfig
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C2
inet addr:192.168.127.102 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:385015102 errors:0 dropped:0 overruns:0 frame:0
TX packets:447034056 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:231813004471 (215.8 GiB) TX bytes:322103900599 (299.9 GiB)
Interrupt:146 Memory:d8000000-d8012800
eth2 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.23.132 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:eac4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:362482320 errors:0 dropped:0 overruns:0 frame:0
TX packets:322680748 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:60578409873 (56.4 GiB) TX bytes:322706932324 (300.5 GiB)
Interrupt:154 Memory:da000000-da012800
eth2:1 Link encap:Ethernet HWaddr A4:BA:DB:13:EA:C4
inet addr:10.0.23.136 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:154 Memory:da000000-da012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:107105620 errors:0 dropped:0 overruns:0 frame:0
TX packets:107105620 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:183491493683 (170.8 GiB) TX bytes:183491493683 (170.8 GiB)
[root@bssdb02 ~]#
既然客户端连接NODE2有问题,那么抓下包呢,用tcpdump配合客户端的连接,但显示到NODE2-VIP没有包过来:
[root@bssdb02 ~]# tcpdump -i eth2 -Z oracle src host 10.25.192.23 and dst host 10.0.23.136 and port 1521
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel
修改下tcpdump设置,但从NODE2-PUBLICIP还是有网路包过来的,如下:
[root@bssdb02 ~]# tcpdump -i eth2 -Z oracle src host 10.25.192.23 and port 1521
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
11:47:41.951599 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:47:42.451180 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:47:42.951154 IP bpit-10003657.sany.com.cn.63079 > bssdb02.ncube-lm: S 1037287069:1037287069(0) win 8192 <mss 1460,nop,nop,sackOK>
11:48:12.823245 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: S 3262807939:3262807939(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
11:48:12.823833 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: . ack 2306706224 win 16425
11:48:20.065630 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: . ack 2 win 16425
11:48:20.066305 IP bpit-10003657.sany.com.cn.63088 > bssdb02.ncube-lm: F 0:0(0) ack 2 win 16425
问题到这基本上可以知道原因了,继而在客户端telnet NODE2-VIP 1521端口不通,但telnet NODE2-PUBLICIP 1521没问题。
于是将NODE2的VIP重启(crs_start ora..vip),但重启失败。
找到网络工程师,查了下NODE2-VIP,发现局域网内IP冲突导致这个问题。
2 4
- IP冲突导致的ORA-12170
- js冲突导致的问题
- 另一种导致原因导致的ORA-01658
- ORA-12170引发的ORA-3136所导致的数据库问题
- 一个服务器搭多个tomcat导致session丢失,或者同一个IP不同端口,多个应用的session会冲突解决方法
- 一个服务器搭多个tomcat导致session丢失,或者同一个IP不同端口,多个应用的session会冲突解决方法
- 一个服务器搭多个tomcat导致session丢失,或者同一个IP不同端口,多个应用的session会冲突解决方法
- 一个服务器搭多个tomcat导致session丢失,或者同一个IP不同端口,多个应用的session会冲突解决方法
- 一个服务器搭多个tomcat导致session丢失,或者同一个IP不同端口,多个应用的session会冲突解决方法
- IP冲突后的处理
- 查IP 冲突的电脑
- IP冲突的检测原理
- QQ和键盘的冲突导致蓝屏
- Windows域认证导致的ORA-01041
- UNION ALL导致的ORA-07445错误
- 一个空格导致的ORA-12516错误
- v$resource_limit导致的ORA-12537
- ORA-16179 粗心导致的小错误
- ASM在线迁移LUN遇到的问题
- hibernate的QHL查找函数
- org.springframework.beans.NotWritablePropertyException 错误
- mysql 创建、删除 索引 key 、primary key 、unique key 与index区别
- spring配置读取方式给数据迁移带来的风险
- IP冲突导致的ORA-12170
- 从1.5k到18k, 一个程序员的5年成长之路
- 推荐一下PyScripter
- UVa 11177 Fighting Against a Polygonal Monster 圆与多边形公共面积,二分
- LeetCode(Rotate List) 旋转链表
- AppWidgetProvider实现类的配置问题
- MFC menu 事件 dialog 对话框 传递 传值
- (12)html表单<form>之一
- 关于2的补码