[Reposted from the official Oracle technical blog] ORA-12518: TNS:listener could not hand off client connection on SLES12 SP2

By: Andy Zhang

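This problem occurred on a Linux host running SLES12 SP2. It is not common, but it suggests some new lines of investigation.

The symptom: once the number of database processes reached about 300, no further connections to the database could be made, and the following error was reported.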

ERROR:
ORA-12518: TNS:listener could not hand off client connection


15-AUG-2017 01:40:01 * (CONNECT_DATA=(CID=(PROGRAM=myapp)(HOST=__jdbc__)(USER=admin))(SERVER=DEDICATED)(SERVICE_NAME=oracle)) * (ADDRESS=(PROTOCOL=tcp)(HOST=11.22.33.44)(PORT=1521)) * establish * oracle * 12518
TNS-12518: TNS:listener could not hand off client connection
 TNS-12536: TNS:operation would block
  TNS-12560: TNS:protocol adapter error
  TNS-00506: Operation would block
  Linux Error: 11: Resource temporarily unavailable

The problem could be reproduced every time, but the user could not find where the limit came from. ulimit -a showed no obvious restriction:

sa-server-0:grid:+ASM1 # ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 513378
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1000000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The per-process limits showed nothing abnormal either:

sa-server-0:~ # cat /proc/5497/limits
Limit                     Soft Limit           Hard Limit           Units    
Max cpu time              unlimited            unlimited            seconds  
Max file size             unlimited            unlimited            bytes    
Max data size             unlimited            unlimited            bytes    
Max stack size            33554432             unlimited            bytes    
Max core file size        unlimited            unlimited            bytes    
Max resident set          unlimited            unlimited            bytes    
Max processes             513378               513378               processes
Max open files            65536                65536                files    
Max locked memory         unlimited            unlimited            bytes    
Max address space         unlimited            unlimited            bytes    
Max file locks            unlimited            unlimited            locks    
Max pending signals       513378               513378               signals  
Max msgqueue size         819200               819200               bytes    
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

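We asked the user to collect an strace of the listener. It confirmed that the clone() call was failing with EAGAIN (Resource temporarily unavailable), meaning a resource limit had been hit: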

STRACE
-------------------
filename=listener.strace

11404      0.000022 poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=14, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=16, events=POLLIN|POLLRDNORM}, {fd=17, events=POLLIN|POLLRDNORM}, {fd=3, events=POLLIN|POLLRDNORM}], 8, 60000) = 2 ([{fd=15, revents=POLLIN|POLLRDNORM}, {fd=3, revents=POLLIN|POLLRDNORM}]) <0.000012>
11404      0.000043 read(3, "\0\367\0\0\1\0\0\0\0016\1,\fA \0\177\377O\230\0\0\0\1\0\275\0:\0\0\0\0"..., 8208) = 247 <0.000010>
11404      0.000028 fcntl(3, F_GETFL)   = 0x802 (flags O_RDWR|O_NONBLOCK) <0.000008>
11404      0.000021 fcntl(3, F_SETFL, O_RDWR) = 0 <0.000008>
11404      0.000023 times({tms_utime=5483, tms_stime=2588, tms_cutime=440, tms_cstime=60}) = 1720115043 <0.000009>
11404      0.000096 fcntl(3, F_SETFD, 0) = 0 <0.000010>
11404      0.000027 pipe([18, 19])      = 0 <0.000012>
11404      0.000026 pipe([20, 21])      = 0 <0.000011>
11404      0.000024 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4e29b769d0) = -1 EAGAIN (Resource temporarily unavailable) <0.000197>  <============
11404      0.000219 close(18)           = 0 <0.000011>
11404      0.000022 close(19)           = 0 <0.000010>
11404      0.000023 close(20)           = 0 <0.000009>
11404      0.000021 close(21)           = 0 <0.000009>

Checking the OS log turned up a clue:

2017-08-16T02:36:55.560027+08:00 server-0 kernel: [ 165.619978] cgroup: fork rejected by pids controller in /system.slice/ohasd.service

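'fork rejected by pids controller' shows that the pids cgroup controller refused the fork, so there is a limit on the number of tasks in the ohasd.service unit.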
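The cgroup path at the end of that kernel message identifies the unit whose task limit was hit. Below is a minimal shell sketch for pulling it out of a log line. The helper name rejected_cgroup is made up for illustration, and the commented-out inspection commands assume a cgroup v1 layout with the pids controller mounted at /sys/fs/cgroup/pids, which should be verified on the host.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the cgroup
# path from any "fork rejected by pids controller" message.
rejected_cgroup() {
    sed -n 's|.*fork rejected by pids controller in \(/[^[:space:]]*\).*|\1|p'
}

line='2017-08-16T02:36:55.560027+08:00 server-0 kernel: [ 165.619978] cgroup: fork rejected by pids controller in /system.slice/ohasd.service'
cg=$(printf '%s\n' "$line" | rejected_cgroup)
echo "$cg"   # prints /system.slice/ohasd.service

# On the affected host (cgroup v1 mount point is an assumption):
# cat "/sys/fs/cgroup/pids${cg}/pids.max"       # limit enforced on the unit
# cat "/sys/fs/cgroup/pids${cg}/pids.current"   # tasks currently in the unit
# systemctl show -p TasksCurrent -p TasksMax "$(basename "$cg")"
```

If pids.current sits at or near pids.max when the connections start failing, the unit's task limit is the likely culprit rather than any ulimit setting.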

 

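The root cause is that SUSE 12 introduced systemd resource control, and one of its parameters had been left at its default:

DefaultTasksMax was at its default value (512).
systemd limits the maximum number of tasks that may be created in a unit.

This value caps the number of tasks (processes and threads) that each unit may create. The cap is enforced through the pids cgroup controller, independently of the ulimit settings, which is why the checks above showed nothing. Setting the parameter to unlimited resolved the problem: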

Edit /etc/systemd/system.conf

Set DefaultTasksMax to 'infinity', then reboot the host.
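The edit described above could be scripted roughly as follows. This is only a sketch: the function name set_default_tasks_max is hypothetical, it assumes GNU sed, and it assumes that [Manager] is the only section in the file so that appending a line is safe. Try it on a copy before touching the real file.

```shell
# Hypothetical helper: set DefaultTasksMax in a systemd manager config file.
#   $1  path to the config file (on the affected host: /etc/systemd/system.conf)
#   $2  value to set, e.g. infinity
set_default_tasks_max() {
    conf=$1
    value=$2
    if grep -Eq '^[#[:space:]]*DefaultTasksMax=' "$conf"; then
        # Replace the existing line, whether active or commented out.
        sed -i -E "s|^[#[:space:]]*DefaultTasksMax=.*|DefaultTasksMax=${value}|" "$conf"
    else
        # No such line yet: append it (assumes [Manager] is the last section).
        printf 'DefaultTasksMax=%s\n' "$value" >> "$conf"
    fi
}

# Usage, as root, followed by a reboot as described above:
# cp /etc/systemd/system.conf /etc/systemd/system.conf.bak
# set_default_tasks_max /etc/systemd/system.conf infinity
```

After the reboot, `systemctl show --property DefaultTasksMax` should report the new manager default.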
