Case study: Oracle database fails to start after upgrading from 11.2.0.3.7 to 11.2.0.3.11 - ORA-00600 [kfioTranslateIO03] and [17090]


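1. Environment

A batch of databases was being prepared for go-live. They had been installed as 11.2.0.3 and patched to PSU 11.2.0.3.7, but the latest PSU for that release had since reached 11. To avoid downtime for patching after go-live (for example, after a security scan), we decided to apply the latest PSU, 11.2.0.3.11 (Patch ID: 18522512), before go-live.

Blog URL: http://blog.csdn.net/hw_libo/article/details/39672901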


2. Alert log

Mon Sep 29 14:50:23 2014
ALTER DATABASE   MOUNT
Mon Sep 29 14:50:26 2014
Sweep [inc][280114]: completed
Sweep [inc][280113]: completed
Sweep [inc2][280114]: completed
Sweep [inc2][280113]: completed
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/diskgroup/dg_ora"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
SUCCESS: diskgroup DG_ORA was mounted
ERROR: failed to establish dependency between database NDADB and diskgroup resource ora.DG_ORA.dg
Errors in file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_ckpt_15674.trc  (incident=288113):
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/ndadb/NDADB/incident/incdir_288113/NDADB_ckpt_15674_i288113.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_ckpt_15674.trc  (incident=288114):
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/ndadb/NDADB/incident/incdir_288114/NDADB_ckpt_15674_i288114.trc
Dumping diagnostic data in directory=[cdmp_20140929145027], requested by (instance=1, osid=15674 (CKPT)), summary=[incident=288113].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 15674
Dumping diagnostic data in directory=[cdmp_20140929145028], requested by (instance=1, osid=15674 (CKPT)), summary=[incident=288114].
PMON (ospid: 15585): terminating the instance due to error 469
System state dump requested by (instance=1, osid=15585 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_diag_15634.trc
Dumping diagnostic data in directory=[cdmp_20140929145030], requested by (instance=1, osid=15585 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 15585
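In a long alert log, the error codes that matter here (ORA-15025, ORA-27037 and ORA-00600) are easy to miss. As a minimal sketch, the helper below lists the distinct ORA- codes found in log text read from standard input. The function name list_ora_codes is hypothetical and is not part of the original post or of any Oracle tooling.

```shell
# Hypothetical helper (not from the original post or any Oracle tool):
# print each distinct ORA-nnnnn code found in the text on stdin, sorted.
list_ora_codes() {
    # -o prints only the matching part, -E enables extended regex syntax
    grep -oE 'ORA-[0-9]+' | sort -u
}
```

It could be fed the alert log directly, for example list_ora_codes < alert_NDADB.log, where the file name is an assumption based on the instance name NDADB shown in the log above.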

Check the resource status:

NDADB01:~ # crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DG_DATA.dg ora....up.type ONLINE    ONLINE    ndadb01
ora.DG_ORA.dg  ora....up.type ONLINE    ONLINE    ndadb01
ora....ER.lsnr ora....er.type ONLINE    ONLINE    ndadb01
ora.asm        ora.asm.type   ONLINE    ONLINE    ndadb01
ora.cssd       ora.cssd.type  ONLINE    ONLINE    ndadb01
ora.diskmon    ora....on.type OFFLINE   OFFLINE
ora.evmd       ora.evm.type   ONLINE    ONLINE    ndadb01
ora.ons        ora.ons.type   OFFLINE   OFFLINE

Note: the database is brought up by VCS (a two-node HA cluster), so the rdbms resource group does not appear here.

The CRS log and the ASM log were also checked, and both looked normal.


3. Resolving the problem with the MOS note

A search on MOS turned up:

ORA-00600 [kfioTranslateIO03] [17090] (Doc ID 1336846.1)

Key check point:

Case #1 ] Group permission of "oracle" executable from RDBMS home should have the same group information for ASM devices according to note 1084186.1.

$ ls -l $GRID_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 228954465 Jul 1 13:37 /oh1/grid/product/11.2.0/bin/oracle
$ ls -l $RDBMS_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 228954465 Jul 1 13:37 /oh1/oracle/product/11.2.0/bin/oracle

The cause of the problem is that the OS group owning the oracle executable must have read/write permission on the ASM disk devices.

Solution:

Please execute the following action plan from note 1084186.1.

$ su - grid
$ cd <Grid Home>/bin
$ ./setasmgidwrap o=<11.2 RDBMS Home>/bin/oracle

On inspection, the permissions on $ORACLE_HOME/bin/oracle under the oracle user were indeed wrong:

## $ORACLE_HOME/bin/oracle under the grid user has the correct permissions
NDADB01:/dev/diskgroup # su - grid
grid@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 204902468 2014-09-29 10:37 /opt/oracrs/product/11gR2/grid/bin/oracle

## $ORACLE_HOME/bin/oracle under the oracle user has the wrong permissions
NDADB01:/dev/diskgroup # su - oracle
oracle@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwxr-x--x 1 oracle oinstall 233461759 2014-09-29 11:53 /opt/oracle/product/11gR2/db/bin/oracle

## It should look like this:
oracle@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 233461759 2014-09-29 15:39 /opt/oracle/product/11gR2/db/bin/oracle

The fix, following the MOS note:

NDADB01:/dev/diskgroup # su - grid
grid@NDADB01:~> cd $ORACLE_HOME/bin
grid@NDADB01:/opt/oracrs/product/11gR2/grid/bin> ./setasmgidwrap o=/opt/oracle/product/11gR2/db/bin/oracle  ## the argument is $ORACLE_HOME/bin/oracle of the oracle user
grid@NDADB01:/opt/oracrs/product/11gR2/grid/bin> ls -l /opt/oracle/product/11gR2/db/bin/oracle
-rwsr-s--x 1 oracle asmadmin 233461759 2014-09-29 11:45 /opt/oracle/product/11gR2/db/bin/oracle
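The comparison above comes down to two things on the RDBMS oracle binary: the setuid/setgid mode -rwsr-s--x and the group. As a rough sketch, the check can be scripted against a line of ls -l output. The function name check_oracle_perms is hypothetical, and the expected group (asmadmin in this environment) is passed in as a parameter because it may differ on other systems.

```shell
# Hypothetical helper (not from the original post or the MOS note):
# check one line of `ls -l` output for the oracle binary.
#   $1 = the ls -l line, $2 = expected group (asmadmin in this case)
check_oracle_perms() {
    ls_line=$1
    expected_group=$2
    set -f              # disable globbing while the line is word-split
    set -- $ls_line     # fields: mode links owner group size date time path
    set +f
    mode=$1
    group=$4
    case $mode in
        -rwsr-s--x*) mode_ok=yes ;;   # setuid and setgid bits both present
        *)           mode_ok=no ;;
    esac
    if [ "$mode_ok" = yes ] && [ "$group" = "$expected_group" ]; then
        echo "OK"
        return 0
    fi
    echo "BAD: mode=$mode group=$group"
    return 1
}
```

A typical call would be check_oracle_perms "$(ls -l "$ORACLE_HOME/bin/oracle")" asmadmin, run as the oracle user. The check only reports the state; it changes nothing and is not a substitute for setasmgidwrap.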

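Note: I first tried correcting the permissions on this file by hand with chmod u+s and chmod g+s, but the database still would not start, presumably because the file's group was still oinstall rather than asmadmin. Setting the mode bits alone does not solve the problem.

Then restart HAS (this is a two-node HA setup, not RAC):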

NDADB01:~ # crsctl stop has -f

NDADB01:~ # crsctl start has

After the restart, the database status was normal and no data was lost. Problem solved.

-- Bosco  QQ:375612082

---- END ----

