RMAN backup fails with ORA-19501: locating the problem

Source: Internet   Posted in: streaming media software   Editor: 程序博客网   Time: 2024/05/08 16:57


A database raised ORA-19501 during backup. Below is a brief outline of my analysis.
Environment: Linux + Oracle 10.1.0.4.2
The error was as follows:

RMAN> run {
2> backup database format '/XXX/flash_recovery_area/prod/backupset/%U.dbf';
3> }
Starting backup at 01-AUG-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=296 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00046 name=/XXX/oradata/prod/datafile/o1_mf_esbigtbl_1q6k0sp9_.dbf
input datafile fno=00002 name=/XXX/oradata/prod/datafile/o1_mf_undotbs1_1q6jqcko_.dbf
input datafile fno=00063 name=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
............
channel ORA_DISK_1: starting piece 1 at 01-AUG-13
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/01/2013 15:48:46
ORA-19501: read error on file "/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf", blockno 15233 (blocksize=8192)
ORA-27072: File I/O error
Linux Error: 2: No such file or directory
Additional information: 15232

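Starting from the error messages above:
1. I checked the datafile and found that it physically exists.
2. I looked up the Oracle error numbers to dig out more information: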


[oracle@infra bin]$ oerr ora 19501
19501, 00000, "read error on file \"%s\", blockno %s (blocksize=%s)"
// *Cause:  read error on input file
// *Action: check the file
[oracle@infra bin]$ oerr ora 27072
27072, 00000, "File I/O error"
// *Cause:  read/write/readv/writev system call returned error, additional
//          information indicates starting block number of I/O
// *Action: check errno
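According to the oerr text, the "Additional information" of ORA-27072 is the starting block number of the failed I/O, so the failing read started at block 15232 while Oracle reported block 15233. Below is a minimal sketch for re-reading that block outside Oracle, assuming the datafile sits on an ordinary file system and Oracle block N starts at byte N * blocksize (block 0 being the OS header block). `block_offset` and `read_block` are hypothetical helper names, not Oracle utilities; only shell arithmetic and dd are used.

```shell
#!/bin/bash
# Hypothetical helpers (not Oracle tools). Assumption: Oracle block N of a
# file-system datafile starts at byte offset N * blocksize.

# Print the byte offset at which block BLOCKNO starts.
block_offset() {
    local blockno=$1 blocksize=$2
    echo $(( blockno * blocksize ))
}

# Write block BLOCKNO of FILE to stdout using dd. dd's own messages are
# discarded; its exit status is non-zero if the read fails, for example
# with "Input/output error".
read_block() {
    local file=$1 blockno=$2 blocksize=$3
    dd if="$file" bs="$blocksize" skip="$blockno" count=1 2>/dev/null
}

# Example (path as it appears in the error stack):
#   block_offset 15233 8192
#   f=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
#   if read_block "$f" 15233 8192 > /dev/null; then
#       echo "block 15233 readable"
#   else
#       echo "block 15233 NOT readable"
#   fi
```

With an 8 KB block size, block 15233 starts at byte offset 124788736 under this assumption.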

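Analysis: the file could not be read, so I suspected a corrupt block. But is it physical or logical corruption?
3. I checked the alert log. It contained no errors and offered nothing of value.
So I started from the corrupt block.
4. Find the tablespace and object that the corrupt block belongs to: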


SQL> r
  1  SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE, TABLESPACE_NAME, A.PARTITION_NAME
  2    FROM DBA_EXTENTS A
  3    WHERE FILE_ID = &FILE_ID
  4*   AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
Enter value for file_id: 63
old   3:   WHERE FILE_ID = &FILE_ID
new   3:   WHERE FILE_ID = 63
Enter value for block_id: 15233
old   4:   AND &BLOCK_ID BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1
new   4:   AND 15233 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1

OWNER                SEGMENT_NAME         SEGMENT_TYPE       TABLESPACE_NAME      PARTITION_NAME
-------------------- -------------------- ------------------ -------------------- ------------------------------
CONTENT              DR$IFS_TEXT$I        TABLE              CONTENT_IFS_CTX_K

SQL> select count(*) from CONTENT.DR$IFS_TEXT$I;

  COUNT(*)
----------
   3257212

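Analysis: the data of the table that owns the corrupt block can still be queried. There are two possibilities:
(1) All of the table's data is in memory, so the query uses only logical reads. In that case it is impossible to tell whether the table has logical or physical corruption.
(2) The table's data is partly in memory and partly on disk, so part of the query uses physical reads.
    Working hypothesis: since it is unclear whether the corruption is physical or logical, assume that it is logical.
    To verify that it is logical corruption, I did the following:
5. Use the dbv tool to check for logical corruption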


[oracle@infra bin]$ dbv file=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf  blocksize=8192

DBVERIFY: Release 10.1.0.4.2 - Production on Thu Aug 1 16:26:07 2013

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 15258
Total Pages Processed (Data) : 13427
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 20
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1707
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 104
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 1978186924 (0.1978186924)

RMAN> run {
2> backup validate datafile 63 format '/XXX/flash_recovery_area/prod/backupset/%U.dbf';
3> }
Starting backup at 01-AUG-13
using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=367 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00063 name=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/01/2013 16:32:35
ORA-19501: read error on file "/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf", blockno 15233 (blocksize=8192)
ORA-27072: File I/O error
Additional information: 15232

RMAN> backup check logical validate datafile 63;

Starting backup at 01-AUG-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00063 name=/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 08/01/2013 17:42:10
ORA-19501: read error on file "/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf", blockno 15233 (blocksize=8192)
ORA-27072: File I/O error
Additional information: 15232


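Analysis: dbv found no logical corruption, so why does validation under RMAN still fail?
      There is only one answer: the corruption is not logical but physical.
      To verify that it is physical corruption, I did the following:
6. Use cp to check for physical damage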


[oracle@infra datafile]$ cp  /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf   /tmp/1.dbf
cp: reading `/XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf': Input/output error
[oracle@infra datafile]$ cp /XXX/oradata/prod/datafile/rman01.dbf /tmp/2.dbf   -- no error
[oracle@infra datafile]$ cp /XXX/oradata/prod/datafile/o1_mf_ovfmetri_1q6jw7hm_.dbf  /tmp/3.dbf   -- no error
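cp only reports that the copy failed somewhere in the file. A minimal sketch to narrow the failure down to individual blocks reads the file one block at a time with dd and prints every block number that cannot be read; the function name `scan_blocks` is hypothetical. It reads through the OS page cache, so a block that is still cached may read successfully even if the disk under it is bad; GNU dd's `iflag=direct` can bypass the cache on file systems that support it, but it is not used here.

```shell
#!/bin/bash
# Hypothetical helper: read FILE one BLOCKSIZE-byte block at a time (block 0
# first) and print the number of every block that dd fails to read.
# Returns 0 if all blocks were read, 1 if any failed, 2 if FILE is unreadable.
scan_blocks() {
    local file=$1 blocksize=$2
    [ -r "$file" ] || return 2
    local size nblocks i bad=0
    size=$(wc -c < "$file")
    nblocks=$(( size / blocksize ))    # a trailing partial block is not checked
    for (( i = 0; i < nblocks; i++ )); do
        if ! dd if="$file" of=/dev/null bs="$blocksize" skip="$i" count=1 2>/dev/null; then
            echo "$i"
            bad=$(( bad + 1 ))
        fi
    done
    echo "checked $nblocks blocks, $bad unreadable" >&2
    [ "$bad" -eq 0 ] || return 1
}

# Example:
#   scan_blocks /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf 8192
```

Starting one dd per block is slow, but for a file of about 15,000 blocks like this one it stays practical, and each failure is attributed to exactly one block.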

Analysis: these results made me suspect that the disk was failing. To test this, I ran the following:
7. Check whether the disk is healthy


[oracle@infra oracle]$ dmesg
0    0   0   0    0    0    00
17 00F 0F  1    1    0   1   0    1    1    A9
IO APIC #9......
.... register #00: 09000000
.......    : physical APIC id: 09
.......    : Delivery Type: 0
.......    : LTS          : 0
.... register #01: 00178020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 1
.......     : IO APIC version: 0020
.... register #03: 00000001
.......     : Boot DT    : 1
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 000 00  1    0    0   0   0    0    0    00
02 00F 0F  1    1    0   1   0    1    1    B1
03 000 00  1    0    0   0   0    0    0    00
04 000 00  1    0    0   0   0    0    0    00
05 000 00  1    0    0   0   0    0    0    00
06 000 00  1    0    0   0   0    0    0    00
07 000 00  1    0    0   0   0    0    0    00
08 000 00  1    0    0   0   0    0    0    00
09 000 00  1    0    0   0   0    0    0    00
0a 000 00  1    0    0   0   0    0    0    00
0b 000 00  1    0    0   0   0    0    0    00
0c 000 00  1    0    0   0   0    0    0    00
0d 000 00  1    0    0   0   0    0    0    00
0e 000 00  1    0    0   0   0    0    0    00
0f 000 00  1    0    0   0   0    0    0    00
10 000 00  1    0    0   0   0    0    0    00
11 000 00  1    0    0   0   0    0    0    00
12 000 00  1    0    0   0   0    0    0    00
13 000 00  1    0    0   0   0    0    0    00
14 000 00  1    0    0   0   0    0    0    00
15 000 00  1    0    0   0   0    0    0    00
16 000 00  1    0    0   0   0    0    0    00
17 000 00  1    0    0   0   0    0    0    00
IRQ to pin mappings:
IRQ0 -> 0:2
IRQ1 -> 0:1
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ8 -> 0:8
IRQ10 -> 0:10
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ23 -> 0:23
IRQ26 -> 1:2
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 2792.9879 MHz.
..... host bus clock speed is 199.4990 MHz.
cpu: 0, clocks: 1994990, slice: 398998
CPU0<T0:1994976,T1:1595968,D:10,S:398998,C:1994990>
...
audit subsystem ver 0.1 initialized
mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining
mtrr: type mismatch for fd000000,800000 old: uncachable new: write-combining
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18790976
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18790984
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18790992
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791000
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791008
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791016
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791024
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791032
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791040
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791048
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791056
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791064
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791072
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791080
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791088
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096
...... (the same error for sector 18791096 repeats many more times)
application bug: sqlplus(2014) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791048
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791056
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791064
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791072
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791080
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791088
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 25040001 I/O error: dev 08:05, sector 18791096
application bug: sqlplus(3841) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.
application bug: sqlplus(3841) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.
application bug: sqlplus(5580) has SIGCHLD set to SIG_IGN but calls wait().
(see the NOTES section of 'man 2 wait'). Workaround activated.
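The relevant lines in this output are the "SCSI disk error ... I/O error: dev 08:05, sector N" messages. The small sketch below (hypothetical function name `scsi_error_sectors`, using only grep, awk and sort) pulls each distinct device/sector pair out of kernel log text, which makes it easier to see which area of the disk is affected. The regular expression assumes the message text looks exactly like the lines above.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and print each distinct
# "device sector" pair found in messages of the form
#   "I/O error: dev 08:05, sector 18790976"
# sorted by device, then numerically by sector.
scsi_error_sectors() {
    grep -o 'I/O error: dev [0-9a-fA-F]*:[0-9a-fA-F]*, sector [0-9]*' |
        awk '{ sub(/,$/, "", $4); print $4, $6 }' |
        sort -u -k1,1 -k2,2n
}

# Example:
#   dmesg | scsi_error_sectors
#   # number of distinct sectors, first and last:
#   dmesg | scsi_error_sectors |
#       awk 'NR == 1 { first = $2 } { last = $2 } END { print NR, "sectors:", first, "-", last }'
```

Applied to the messages above, this yields 16 distinct sectors on dev 08:05, from 18790976 to 18791096 in steps of 8.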

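Conclusion: these results confirmed my inference. The disk is failing and has developed bad sectors, so the backup hit an error when it physically read this datafile.
      But this raises a new question: if the disk is bad, the logical structure of the datafile on the bad sectors should also be damaged, that is, there should be
      logical corruption, yet there was none. Moreover, after a database restart the database ran normally and reported no related errors.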
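The conclusion links the bad sectors to this datafile by inference. One way to check the link directly, sketched here under several assumptions, is to convert a reported sector into a file-system block number, ask debugfs (from e2fsprogs) which inode owns that block, and compare it with the datafile's inode from `ls -i`. The assumptions, repeated in the code comments, are: the kernel's sector number is relative to the partition dev 08:05, sectors are 512 bytes, and the file system is ext2/ext3 with 4096-byte blocks. `sector_to_fsblock` is a hypothetical helper, and /dev/sda5 is only the usual device name for 08:05.

```shell
#!/bin/bash
# Hypothetical helper. Assumptions (not verified in the post):
#   - the sector printed by the kernel is relative to the partition (dev 08:05),
#   - a sector is 512 bytes,
#   - FSBLOCK is the file system block size in bytes (commonly 4096 on ext3).
sector_to_fsblock() {
    local sector=$1 fsblock=$2
    echo $(( sector * 512 / fsblock ))
}

# Example (as root; /dev/sda5 assumed to be device 08:05):
#   blk=$(sector_to_fsblock 18790976 4096)
#   debugfs -R "icheck $blk" /dev/sda5      # inode that owns this block
#   ls -i /XXX/oradata/prod/datafile/o1_mf_content__1q6k05ym_.dbf   # inode of the datafile
```

Under these assumptions, sectors 18790976 to 18791096 fall in file-system blocks 2348872 to 2348887; if debugfs reports the datafile's inode for them, the bad sectors really lie inside this datafile.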