A similar case of dying processes: Process com.midea.mmp2 died.
Source: Internet  Published: 美国文献数据库  Editor: 程序博客网  Time: 2024/05/02 02:33
08:56:03,273 INFO – Executing Do func=[GetSeqNo] keyNam=[keynam];KeyVal=[PRYPAYBILSYSTRACKNO20130125];SeqNam=[keyval];tblName=[pryseqrec];len=[6];circleString=[1];colName=[null]
08:56:03,296 ERROR – Failed to get database connection! : Cannot create PoolableConnectionFactory (ORA-01034: ORACLE not available
ORA-27123: unable to attach to shared memory segment
Linux Error: 22: Invalid argument
Additional information: 7
Additional information: 2162692
)
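OK, I can see the ORA-27123: unable to attach to shared memory segment error, so my first guess is that this is most likely memory-related.
The machine is a PC server running Oracle 11.2.0.1 on CentOS 5.5.
First I checked the alert log and found that quite a few processes had died recently, as shown below: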
...
Fri Jan 25 00:00:46 2013
Process m001 died, see its trace file
Fri Jan 25 00:07:36 2013
Process W000 died, see its trace file
Fri Jan 25 01:00:48 2013
Process m000 died, see its trace file
Fri Jan 25 01:07:45 2013
Process W000 died, see its trace file
Fri Jan 25 01:55:40 2013
Process m000 died, see its trace file
Fri Jan 25 02:07:50 2013
Process W000 died, see its trace file
Fri Jan 25 02:35:28 2013
Process m000 died, see its trace file
Fri Jan 25 03:05:45 2013
Process m000 died, see its trace file
Fri Jan 25 03:37:56 2013
Process W000 died, see its trace file
Fri Jan 25 03:58:01 2013
Process W000 died, see its trace file
Fri Jan 25 04:05:34 2013
Process m000 died, see its trace file
Fri Jan 25 04:25:51 2013
Process m000 died, see its trace file
Fri Jan 25 05:00:07 2013
Process m001 died, see its trace file
Fri Jan 25 05:09:22 2013
Process m000 died, see its trace file
Fri Jan 25 05:45:57 2013
Process m000 died, see its trace file
Fri Jan 25 06:01:04 2013
Thread 1 cannot allocate new log, sequence 1781
Private strand flush not complete
Current log# 1 seq# 1780 mem# 0: /opt/11g/oracle/oradata/orcl/redo01.log
Thread 1 advanced to log sequence 1781 (LGWR switch)
Current log# 2 seq# 1781 mem# 0: /opt/11g/oracle/oradata/orcl/redo02.log
Fri Jan 25 06:20:45 2013
Process m000 died, see its trace file
Fri Jan 25 07:00:15 2013
Process m001 died, see its trace file
…...
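To see at a glance which background processes are dying and how often, the alert log can be tallied by process name. The following is only a minimal sketch and not something taken from the original investigation: the function name count_died is hypothetical, and it assumes the log uses exactly the "Process <name> died" wording shown in the excerpt above.

```shell
#!/bin/sh
# count_died (hypothetical helper): read alert-log text on stdin and print
# "<process name> <count>" for every line of the form "Process <name> died".
count_died() {
    awk '/^Process [A-Za-z0-9]+ died/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | LC_ALL=C sort
}

# Example call (the alert log path is an assumption; use your own):
#   count_died < /path/to/alert_orcl.log
```

Applied to the excerpt above, such a tally would show m000 nine times, W000 five times and m001 three times, which makes it clear that the deaths are spread across several short-lived background processes rather than one.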
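There is plenty of discussion online about messages such as Process m001 died, see its trace file and Process W000 died, see its trace file; in most cases the cause is that the number of processes has hit its limit.
So I also checked the database parameter settings and the current usage: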
SQL> col RESOURCE_NAME for a20
SQL> col LIMIT_VALUE for a20
SQL> select resource_name,MAX_UTILIZATION,LIMIT_VALUE from v$resource_limit where resource_name in ('processes','sessions');

RESOURCE_NAME        MAX_UTILIZATION LIMIT_VALUE
-------------------- --------------- --------------------
processes                        281 500
sessions                         282 792
--As the output shows, peak usage is still far below the limits, so this is not what caused the problem.
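--To make "far below the limits" concrete, the peak utilization can be expressed as a percentage of the limit. This is a small sketch; the function name utilization_pct is hypothetical, and the numbers fed to it are the MAX_UTILIZATION and LIMIT_VALUE columns from the query above.

```shell
#!/bin/sh
# utilization_pct (hypothetical helper): print used/limit as a percentage
# with one decimal place. $1 = peak usage, $2 = configured limit.
utilization_pct() {
    awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", used / limit * 100 }'
}

# Values taken from the v$resource_limit output above:
#   utilization_pct 281 500    # processes
#   utilization_pct 282 792    # sessions
```

--With the values above this gives 56.2 for processes and 35.6 for sessions, so neither resource was close to exhaustion even at its peak.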
--I looked at top and the server was mostly idle; free did not show any problem with memory usage either.
[root@orcl ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         12172      10245       1926          0        363       8288
-/+ buffers/cache:       1593      10579
Swap:         5535         68       5467
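--The "used" figure of 10245 MB looks alarming until buffers and page cache are taken out of it, which is what the "-/+ buffers/cache" row does. The sketch below reproduces that arithmetic; the function name reclaimable_mb is hypothetical, and the result can differ from the free output by a megabyte or two because free -m rounds each column separately.

```shell
#!/bin/sh
# reclaimable_mb (hypothetical helper): memory that is free or can be
# reclaimed from buffers and page cache.
# $1 = free, $2 = buffers, $3 = cached (all in MB, as printed by free -m).
reclaimable_mb() {
    echo $(( $1 + $2 + $3 ))
}

# Values from the free -m output above:
#   reclaimable_mb 1926 363 8288
```

--For this server the result is 10577 MB, in line with the 10579 MB reported in the "-/+ buffers/cache" row, so the operating system itself was not short of memory.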
--I manually switched the log file and forced a checkpoint, and Process died messages showed up in the alert log again.
SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

Process m001 died, see its trace file
Fri Jan 25 10:09:34 2013
Process m000 died, see its trace file
Fri Jan 25 10:17:39 2013
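--I checked /dev/shm. It is sized at only 6G on a server with 12G of physical memory, which strikes me as a bit small, but as things stand it should not cause any serious problem.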
[root@orcl ~]# df -Th /dev/shm
Filesystem    Type    Size  Used Avail Use% Mounted on
tmpfs        tmpfs    6.0G  8.1M  6.0G   1% /dev/shm
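--Next I checked the shmmax setting. This parameter defines the maximum size of a single shared memory segment, in bytes. If it is set inappropriately, we can run into ORA-27123.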
[root@orcl ~]# more /etc/sysctl.conf | grep shmmax
kernel.shmmax = 1073741824

[root@orcl ~]# more /proc/sys/kernel/shmmax
1073741824
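--Because shmmax is expressed in bytes, it is easy to misread. A tiny conversion helper makes the value easier to compare with Oracle's memory parameters. This is a sketch only; the function name bytes_to_mb is hypothetical.

```shell
#!/bin/sh
# bytes_to_mb (hypothetical helper): convert a byte count to whole MB
# (1 MB = 1024 * 1024 bytes), truncating any remainder.
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

# On a live Linux system the current kernel value could be converted with:
#   bytes_to_mb "$(cat /proc/sys/kernel/shmmax)"
```

--For the value shown above, bytes_to_mb 1073741824 gives 1024, that is, exactly 1G.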
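--I found that this parameter was set to 1G. I felt that on a system using AMM (that was my assumption before I had finished checking everything) this value ought to be raised. I did not change it right away and kept investigating.
--After going through the OS-level items above I went back to the database and looked at the memory-related settings. What I found came as a shock: the system I had assumed was using AMM turned out not to be: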
SQL> show parameter memory

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
hi_shared_memory_address             integer                0
memory_max_target                    big integer            0
memory_target                        big integer            0
shared_memory_address                integer                0
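--OK. Oracle 11g introduced the AMM feature, and Oracle recommends using it. That does not mean the ASMM approach from 10g cannot be used; some systems do run that way because of their particular circumstances. Still, almost all of the 11g databases I look after use AMM, and it works well.
--Checking the SGA and PGA settings: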
SQL> show parameter sga

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
lock_sga                             boolean                FALSE
pre_page_sga                         boolean                FALSE
sga_max_size                         big integer            1G
sga_target                           big integer            1G

SQL> show parameter pga

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
pga_aggregate_target                 big integer            3844M
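--At this point the cause has basically surfaced: the SGA is too small to meet demand.
--The AWR report for the period when the problem occurred also shows a very low library hit ratio and fairly heavy hard parsing; the proportion of soft parses is very low as well.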
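--To put the configured values in perspective, the SGA and PGA targets can be added up and set against the 12172 MB of physical memory reported by free -m earlier. The sketch below does that; the function name size_to_mb is hypothetical, and it only understands the "G" and "M" suffixes that appear in the show parameter output above.

```shell
#!/bin/sh
# size_to_mb (hypothetical helper): convert an Oracle-style size string
# such as "1G" or "3844M" to MB. Other suffixes are rejected.
size_to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Sum of sga_target and pga_aggregate_target from the output above:
#   echo $(( $(size_to_mb 1G) + $(size_to_mb 3844M) ))
```

--The sum is 4868 MB, of which only 1024 MB belongs to the SGA, so well under half of the server's memory was assigned to Oracle and the SGA received the smaller share.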
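With the cause identified, fixing the problem is straightforward.
First I gave my colleagues in development a rough explanation of the cause, then requested a downtime window from the development lead. After notifying the other developers, I changed the parameters and restarted the database, which resolved the problem.
--The steps were as follows:
1. Raise the shmmax parameter to 2G: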
[root@orcl ~]# vi /etc/sysctl.conf
kernel.shmmax = 2147483648

[root@orcl ~]# sysctl -p

[root@orcl ~]# more /etc/sysctl.conf | grep shmmax
kernel.shmmax = 2147483648
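--Note that grepping /etc/sysctl.conf only confirms what is written in the file, not what the running kernel is using. A quick way to cross-check the two is sketched below; the function name shmmax_from_conf is hypothetical, and the parsing assumes a plain "kernel.shmmax = N" line like the one above.

```shell
#!/bin/sh
# shmmax_from_conf (hypothetical helper): print the value of the last
# uncommented "kernel.shmmax = N" line found in the file given as $1.
shmmax_from_conf() {
    awk -F= '/^[ \t]*kernel\.shmmax[ \t]*=/ { gsub(/[ \t]/, "", $2); v = $2 }
             END { if (v != "") print v }' "$1"
}

# On a live Linux system the file and the running kernel could be compared with:
#   [ "$(shmmax_from_conf /etc/sysctl.conf)" = "$(cat /proc/sys/kernel/shmmax)" ] \
#       && echo "in sync" || echo "sysctl -p has not taken effect"
```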
2. Enable AMM and set memory_target to 6G:
SQL> create pfile='/home/oracle/pfile_20130125.bk' from spfile;

File created.

SQL> alter system set memory_target=6G scope=spfile;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area 2042241024 bytes
Fixed Size                  1337548 bytes
Variable Size            1392510772 bytes
Database Buffers          637534208 bytes
Redo Buffers               10858496 bytes
Database mounted.
Database opened.

-- Verify
SQL> show parameter memory

NAME_COL_PLUS_SHOW_PARAM       TYPE                   VALUE_COL_PLUS_SHOW_PARAM
------------------------------ ---------------------- ------------------------------
hi_shared_memory_address       integer                0
memory_max_target              big integer            6G
memory_target                  big integer            6G
shared_memory_address          integer                0

SQL> select * from v$sgainfo;

NAME                                                                  BYTES RESIZE
---------------------------------------------------------------- ---------- ------
Fixed SGA Size                                                      1337548 No
Redo Buffers                                                       10858496 No
Buffer Cache Size                                                 520093696 Yes
Shared Pool Size                                                  486539264 Yes
Large Pool Size                                                    16777216 Yes
Java Pool Size                                                     16777216 Yes
Streams Pool Size                                                  16777216 Yes
Shared IO Pool Size                                                       0 Yes
Granule Size                                                       16777216 No
Maximum SGA Size                                                 2042241024 No
Startup overhead in Shared Pool                                   167772160 No
Free SGA Memory Available                                         973078528

12 rows selected.

SQL> show parameter sga

NAME_COL_PLUS_SHOW_PARAM       TYPE                   VALUE_COL_PLUS_SHOW_PARAM
------------------------------ ---------------------- ------------------------------
lock_sga                       boolean                FALSE
pre_page_sga                   boolean                FALSE
sga_max_size                   big integer            1952M
sga_target                     big integer            1G
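--As a final sanity check, the new Maximum SGA Size can be compared with the old and new shmmax values. This is purely an arithmetic sketch with a hypothetical function name, fits_within; as far as I know, with AMM enabled on Linux the SGA is backed by /dev/shm, so the same comparison between memory_target and the size of /dev/shm is worth making too.

```shell
#!/bin/sh
# fits_within (hypothetical helper): succeed (exit status 0) when the
# required size $1 does not exceed the limit $2. Both values are in bytes.
fits_within() {
    [ "$1" -le "$2" ]
}

# Maximum SGA Size after the restart versus the shmmax values in this post:
#   fits_within 2042241024 1073741824 || echo "does not fit in the old 1G shmmax"
#   fits_within 2042241024 2147483648 && echo "fits in the new 2G shmmax"
```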