Resolving Slow Connections Through the Oracle Listener


I. The complete connection flow

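1. A. Client: (1) A-->B initiates the connection; (9) interacts with the Server Process to complete the connection

2. B. Listener process: (2) B-->C forks a child process and waits; (7) B-->D passes the client information

3. C. Listener child process 1: (3) C-->D forks a child process; (4) C-->B the child process exits

4. D. Child process 2 (Server Process): (5) D-->D exec Oracle; (6) D-->B sends data to the listener; (8) D-->A interacts with the client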
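The fork, fork again, then wait-on-a-pipe pattern in this flow can be sketched in a few lines of Python. This is only an illustration of the pattern, not the listener's actual code: the function name double_fork_handshake and the message b"ready" are made up for the example, and where the real "child process 2" would exec the Oracle binary, the sketch simply writes to the pipe and exits.

```python
import os


def double_fork_handshake(message: bytes = b"ready") -> bytes:
    """Mimic the fork/fork/pipe pattern and return what the parent reads."""
    # Pipe from the future grandchild back to the parent ("listener").
    read_fd, write_fd = os.pipe()

    child = os.fork()
    if child == 0:
        # "Listener child process 1": fork once more, then exit at once.
        grandchild = os.fork()
        if grandchild == 0:
            # "Child process 2": a real listener would exec Oracle here.
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
            os._exit(0)
        os._exit(0)

    # Parent: close the unused write end, reap child 1, then block on the pipe.
    os.close(write_fd)
    os.waitpid(child, 0)
    data = os.read(read_fd, 1024)
    os.close(read_fd)
    return data


if __name__ == "__main__":
    print(double_fork_handshake())
```

Because the first child exits immediately, the parent's wait returns quickly; the read on the pipe is where the parent actually blocks until the grandchild reports back.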

 

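II. The listener handles a connection in the following steps:

Trace the listener with an operating system tool:

strace -rf -o /gyj/lsnr.log -p 4913

1. The listener accepts the client's TCP connection and receives the TNS packet sent by the client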

 4926      0.000053 getsockname(8, {sa_family=AF_INET6, sin6_port=htons(1521),inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0,sin6_scope_id=0}, [9169787475114065948]) = 0

4926      0.000226 getpeername(8, 0x7fff2c68e5f8, [9169787475114065948]) = -1 ENOTCONN (Transport endpoint is not connected)

4926      0.000055 accept(8, {sa_family=AF_INET6, sin6_port=htons(42055),inet_pton(AF_INET6, "::ffff:192.168.0.103", &sin6_addr),sin6_flowinfo=0, sin6_scope_id=0}, [120259084316]) = 12

4926      0.000063 getsockname(12, {sa_family=AF_INET6, sin6_port=htons(1521),inet_pton(AF_INET6, "::ffff:192.168.0.103", &sin6_addr),sin6_flowinfo=0, sin6_scope_id=0}, [120259084316]) = 0

4926       0.000051 fcntl(12, F_SETFL,O_RDONLY|O_NONBLOCK) = 0

4926      0.000034 getsockopt(12, SOL_SOCKET, SO_SNDBUF, [3200064202492396996],[4]) = 0

4926      0.000033 getsockopt(12, SOL_SOCKET, SO_RCVBUF, [3200064202492433792],[4]) = 0

4926      0.000036 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0

4926      0.000087 fcntl(12, F_SETFD, FD_CLOEXEC) = 0

 

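2. The listener opens the pipes it will use to communicate with its child and forks a child process, the one called "listener child process 1" above; here its PID is 10209. The listener then waits until child 10209 exits.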

4926      0.000053 pipe([13, 14])      = 0

4926      0.000037 pipe([15, 16])      = 0

4926      0.000042 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b3d814b1320) = 10209

4926      0.000765 wait4(10209,  <unfinished ...>

 

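3. While the listener waits for child 10209 to exit, child 10209 itself does very little: it only forks another child, the "child process 2" from above, whose PID is 10210. As soon as it has forked 10210, child 10209 exits: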

10209     0.000116 clone(child_stack=0,flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,child_tidptr=0x2b3d814b1320) = 10210

10209     0.001169 exit_group(0)       = ?

 

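4. Back in the main listener process: after child 10209 exits, the listener waits to read data from the pipe. This is a blocking operation that returns only once data has arrived on the pipe: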

4926      0.000567 <... wait4 resumed> [{WIFEXITED(s) &&WEXITSTATUS(s) == 0}], 0, NULL) = 10209

4926      0.000046 --- SIGCHLD (Child exited) @ 0 (0) ---

4926      0.000040 close(13)           = 0

4926      0.000055 close(16)           = 0

4926      0.000063 fcntl(15, F_SETFD, FD_CLOEXEC) = 0

4926      0.000056 fcntl(14, F_SETFD, FD_CLOEXEC) = 0

4926      0.000127 fcntl(12, F_SETFD, FD_CLOEXEC) = 0

4926      0.000270 poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11,events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=14,events=0}], 4, -1 <unfinished ...>

10210     0.000197 close(14)           = 0

10210     0.000073 close(15)           = 0

 

5. While the listener is blocked, "child process 2", the process with PID 10210, calls exec and becomes the Oracle Server Process:

10210     0.000319 setsid()            = 10210

10210     0.000088 geteuid()           = 500

10210     0.000112 setsid()            = -1 EPERM (Operation not permitted)

10210     0.000169 execve("/u01/app/oracle/product/11g/bin/oracle",["oracleocp", "(LOCAL=NO)"], [/* 29 vars */]) = 0

 

6. The Server Process performs its initialization and then writes data to the pipe:

10210     0.000041 fstat(3, {st_mode=S_IFREG|0644, st_size=12755, ...}) = 0

10210     0.000043 mmap(NULL, 1053208, PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b9880bc3000

10210     0.000031 mprotect(0x2b9880bc5000, 1044480, PROT_NONE) = 0

10210     0.000030 mmap(0x2b9880cc4000, 4096, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x2b9880cc4000

10210     0.000036 close(3)            = 0

10210     0.000054 open("/u01/app/oracle/product/11g/lib/libocr11.so",O_RDONLY) = 3

10210     0.000040 read(3,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\302\0\0\0\0\0\0"...,832) = 832

10210     0.000039 fstat(3, {st_mode=S_IFREG|0644, st_size=1590995, ...}) = 0

10210     0.000043 mmap(NULL, 4096, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b9880cc5000

10210     0.000046 mmap(NULL, 1743432, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,3, 0) = 0x2b9880cc6000

10210     0.000031 mprotect(0x2b9880d6d000, 1048576, PROT_NONE) = 0

10210     0.000031 mmap(0x2b9880e6d000, 12288, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa7000) = 0x2b9880e6d000

10210      0.000044 close(3)            = 0

10210     0.000032 open("/u01/app/oracle/product/11g/lib/libocrb11.so",O_RDONLY) = 3

10210     0.000039 read(3,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340{\0\0\0\0\0\0"...,832) = 832

7. Up to this point nothing looks abnormal. The next part of the trace, however, shows where the problem lies:

10210 0.000042 uname({sys="Linux", node="localhost.localdomain", ...}) = 0

10210 0.000112 open("/etc/resolv.conf", O_RDONLY) = 9

10210 0.000047 read(9, "search localdomain\nnameserver 10"..., 4096) = 43

 

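These calls mean that child 10210 obtains the node name localhost.localdomain and then opens /etc/resolv.conf, the resolver configuration file. In the following read(9, "search localdomain\nnameserver 10"..., 4096) = 43, the truncated part beginning with 10 should be the IP address of the name server, which shows that names are resolved through that server.

Next comes: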

10210 0.0000057 connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.54.170.70")}, 28) = 0

10210 0.0000056 poll([{fd=9, events=POLLIN}], 1, 5000 <unfinished ...>

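These calls mean that child 10210 sends a request to IP address 10.54.170.70 on UDP port 53, the DNS port, to resolve the name localhost.localdomain.

The poll call is child 10210 waiting for the reply, with a timeout of 5000 ms, i.e. 5 s. Note that the call is shown as unfinished: resolving localhost.localdomain ran into trouble, and the process sat waiting for up to 5000 ms.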

Then:

10210 4.055269 <... poll resumed> ) = 0 (Timeout)

10210 0.000119 poll([{fd=9, events=POLLIN}], 1, 5000 <unfinished ...>

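This shows that child 10210 timed out in poll and then called poll again.

Counting these calls in the trace, child 10210 polled 4 times in total, each time timing out after waiting 5 s, so it waited 20 s altogether.

This is the root cause of why every connection to this database, of whatever kind, had to wait 20 s before it was established. There is no need to analyze the rest of the listener flow, because we have already found the answer.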
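When the trace file is long, the large gaps can be found mechanically instead of by eye. The sketch below assumes the log was written with strace -r, so each line has the form "PID relative-seconds text"; the function name slow_lines and the 1-second threshold are arbitrary choices for this example, and the two sample lines are tidied copies of lines from the trace above.

```python
import re

# "PID  relative-seconds  rest-of-line", as written by strace -rf -o <file>.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+\.\d+)\s+(.*)$")


def slow_lines(log_lines, threshold=1.0):
    """Return (pid, seconds, text) for lines whose relative timestamp >= threshold."""
    hits = []
    for line in log_lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unparsable lines
        pid = int(match.group(1))
        seconds = float(match.group(2))
        if seconds >= threshold:
            hits.append((pid, seconds, match.group(3)))
    return hits


sample = [
    '10210 0.000112 open("/etc/resolv.conf", O_RDONLY) = 9',
    "10210 4.055269 <... poll resumed> ) = 0 (Timeout)",
]

if __name__ == "__main__":
    for pid, seconds, text in slow_lines(sample):
        print(pid, seconds, text)
```

For a real log, something like slow_lines(open("/gyj/lsnr.log")) would list every line that followed a pause of a second or more, which is usually enough to spot the timed-out poll calls.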

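Check the DNS settings. If the server is on an internal network and does not need Internet access, simply remove the DNS server entries from /etc/resolv.conf; if it does need Internet access, specify the IP address of a name server that is actually reachable.

Once a correct DNS server was configured, the connection performance problem no longer occurred.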
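A quick way to confirm the fix without tracing the listener again is to time a name lookup on the database server itself. The following is a minimal sketch using Python's standard resolver call socket.getaddrinfo; the helper name time_lookup and the injectable resolver parameter are assumptions made for the example, and the lookup path inside the Oracle server process may differ in detail.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, ok) for one attempt to resolve hostname."""
    start = time.monotonic()
    try:
        resolver(hostname, None)
        ok = True
    except OSError:
        # socket.gaierror is a subclass of OSError: the lookup failed.
        ok = False
    return time.monotonic() - start, ok


if __name__ == "__main__":
    name = socket.gethostname()
    elapsed, ok = time_lookup(name)
    print(f"{name}: {elapsed:.3f}s, resolved={ok}")
```

If the lookup of the server's own host name takes several seconds, the resolver configuration is the first thing to check.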




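The trace above is basically enough to pinpoint the problem. Now let's simulate the slow connection: only after reproducing the symptom do you see how simple the problem really is.

Only /etc/resolv.conf is modified, deliberately entering a wrong IP address for the DNS server; nothing else changes.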

vi /etc/resolv.conf

; generated by /sbin/dhclient-script
search localdomain
nameserver 192.168.217.131


# Note: 192.168.217.131 here is not a real DNS server; it is just an arbitrary IP address.


Now connect with sqlplus:

[oracle@ocm ~]$ sqlplus gyj/gyj@ocm

The connection is very slow, taking about 10 s, so be patient. Eventually the connection succeeds, and everything after that works normally.

[oracle@ocm ~]$ date;sqlplus gyj/gyj@ocm <<EOF;date
> exit
> EOF
Mon Apr 29 21:54:45 CST 2013

SQL*Plus: Release 11.2.0.1.0 Production on Mon Apr 29 21:54:45 2013

Copyright (c) 1982, 2009, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

gyj@OCM> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Mon Apr 29 21:54:56 CST 2013



Compare the two timestamps below: they differ by 11 s.

Mon Apr 29 21:54:45 CST 2013

Mon Apr 29 21:54:56 CST 2013
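The difference can also be computed rather than read off. The sketch below parses the two date outputs shown above; the helper name elapsed_seconds is made up for the example, the time zone label CST is matched as literal text rather than interpreted, and an English locale is assumed for the day and month names.

```python
from datetime import datetime

# Format of the `date` output above; "CST" is matched literally.
DATE_FORMAT = "%a %b %d %H:%M:%S CST %Y"


def elapsed_seconds(before, after, fmt=DATE_FORMAT):
    """Return the number of seconds between two `date` output strings."""
    start = datetime.strptime(before, fmt)
    end = datetime.strptime(after, fmt)
    return (end - start).total_seconds()


if __name__ == "__main__":
    print(elapsed_seconds("Mon Apr 29 21:54:45 CST 2013",
                          "Mon Apr 29 21:54:56 CST 2013"))  # prints 11.0
```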



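Solution: configure the correct DNS IP in resolv.conf. If the database server is not connected to an external network, simply remove the line nameserver 192.168.217.131.

If the entry in resolv.conf is set to 8.8.8.8, the connection time becomes 30 seconds, somewhat longer than before (for this test 8.8.8.8 must not be pingable from the machine, so make sure external network access is cut off).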

****************************************************************************************************

What if a different error shows up instead?


The error is as follows:

[oracle@ocm ~]$ sqlplus gyj/gyj@ocm

SQL*Plus: Release 11.2.0.1.0 Production on Mon Apr 29 20:14:30 2013

Copyright (c) 1982, 2009, Oracle. All rights reserved.

ERROR:
ORA-12545: Connect failed because target host or object does not exist


Enter user-name:
ERROR:
ORA-01017: invalid username/password; logon denied


Enter user-name:
ERROR:
ORA-01017: invalid username/password; logon denied


SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus



Check the following configuration items one by one:

1. Check the /etc/nsswitch.conf configuration

[root@ocm ~]# more /etc/nsswitch.conf

hosts: files dns


2. Check /etc/hosts

[root@ocm ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.217.130 ocm.example.com ocm


3. Check /etc/resolv.conf

[root@ocm ~]# more /etc/resolv.conf
; generated by /sbin/dhclient-script
search localdomain
nameserver 192.168.217.131


4. Check DNS

[root@ocm named]# more /var/named/chroot/var/named/example.file

$TTL 86400
@ IN SOA server1.example.com. root (
42 ; serial (d. adams)
3H ; refresh
15M ; retry
1W ; expiry
1D ) ; minimum
IN NS server1.example.com.
server1 IN A 192.168.217.130
ocm IN A 192.168.217.130
ocp IN A 172.34.45.57



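/etc/nsswitch.conf defines the order in which name-resolution sources are consulted, although not every application follows the order given there
/etc/hosts is the first file the system consults for name resolution by default
/etc/resolv.conf defines the IP addresses of the DNS servers
The last one, example.file, is the zone file responsible for resolving all of example.com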
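The first and third of these files are easy to inspect programmatically. The sketch below extracts the lookup order from the hosts line of nsswitch.conf and the name server addresses from resolv.conf; the function names hosts_lookup_order and nameservers are made up for the example, and the parsing is deliberately simplified (for instance, action items such as [NOTFOUND=return] in nsswitch.conf are returned as plain tokens).

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources listed on the 'hosts:' line, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def nameservers(resolv_text):
    """Return the IP addresses from all 'nameserver' lines."""
    servers = []
    for raw in resolv_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as f:
        print(hosts_lookup_order(f.read()))
    with open("/etc/resolv.conf") as f:
        print(nameservers(f.read()))
```

With the files shown above, the first call would report files before dns, and the second would report the single configured name server.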


**************************************************

Simulating the slowness is simple (the goal is to force lookups to go through DNS):

1. Set up a DNS server

For details see: http://blog.csdn.net/guoyjoe/article/details/16982179

[root@mydb named]# vi /var/named/chroot/var/named/example.file


$TTL    86400
@               IN SOA  guoyjoe.example.com. root (
                                        42              ; serial (d. adams)
                                        3H              ; refresh
                                        15M             ; retry
                                        1W              ; expiry
                                        1D )            ; minimum
                IN NS           guoyjoe.example.com.
guoyjoe         IN A            192.168.153.129
mydb            IN A            192.168.153.129

2. /etc/nsswitch.conf

hosts:       dns files    -- resolve through DNS first (originally: hosts:      files  dns)

3. vi /etc/resolv.conf

; generated by /sbin/dhclient-script
search localdomain
nameserver 192.168.153.130  ---- a wrong DNS server (the correct one is 192.168.153.129)

4. vi /etc/hosts

192.168.153.129  mydb.example.com        mydb
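With dns listed before files, the entry in /etc/hosts only comes into play after the DNS attempt has failed, which is consistent with the connection being slow yet eventually succeeding. To check what the files source alone would answer, a lookup against the hosts file can be sketched as follows; the function name hosts_file_lookup is made up for the example, and the matching is simplified compared with the real resolver (for instance, it is case-sensitive).

```python
def hosts_file_lookup(hosts_text, name):
    """Return the first IP address whose line lists name, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        # fields[0] is the address; the rest are the canonical name and aliases.
        if name in fields[1:]:
            return fields[0]
    return None


sample_hosts = """\
127.0.0.1 localhost.localdomain localhost
192.168.153.129  mydb.example.com        mydb
"""

if __name__ == "__main__":
    print(hosts_file_lookup(sample_hosts, "mydb"))
```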




********** All content on this blog is original; if you repost it, please credit the author and the source! **********
Name:    guoyJoe

QQ:        252803295

Email:    oracledba_cn@hotmail.com

Blog:      http://blog.csdn.net/guoyJoe

ITPUB:   http://www.itpub.net/space-uid-28460966.html

OCM:     http://education.oracle.com/education/otn/YGuo.HTM