LAD(Log Archive Dest)配置不当引起备份失败

来源:互联网 发布:卷积矩阵运算法则 编辑:程序博客网 时间:2024/05/16 07:41

一.问题起因

2014/10/14接某客户反馈,备份数据库的crontab执行失败。远程连接分析后发现是因为2014/09/13灾备演练过后dataguard参数没有正确调整导致的归档未清理,过多归档备份时因空间不足而失败。详细过程如下

二.日志分析
1.登陆后检查备份日志后发现数据文件备份成功但是备份归档时失败:

including current SPFILE in backup setchannel c1: starting piece 1 at 13-OCT-14channel c1: finished piece 1 at 13-OCT-14piece handle=/backup/addrrman/full_ADDRPROD_20141013_14004_1 tag=TAG20141013T220005 comment=NONEchannel c1: backup set complete, elapsed time: 00:00:01channel c2: finished piece 1 at 13-OCT-14piece handle=/backup/addrrman/full_ADDRPROD_20141013_14001_1 tag=TAG20141013T220005 comment=NONEchannel c2: backup set complete, elapsed time: 01:45:12channel c3: finished piece 1 at 13-OCT-14piece handle=/backup/addrrman/full_ADDRPROD_20141013_14002_1 tag=TAG20141013T220005 comment=NONEchannel c3: backup set complete, elapsed time: 01:46:01Finished backup at 13-OCT-14sql statement: alter system archive log current。。。。skip .....released channel: c1released channel: c2released channel: c3RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03009: failure of backup command on c3 channel at 10/14/2014 00:30:34<span style="color:#ff0000;">ORA-19502: write error on file "/backup/addrrman/arch_ADDRPROD_20141014_14093_1", block number 442369 (block size=512)ORA-27063: number of bytes read/written is incorrectIBM AIX RISC System/6000 Error: 28: No space left on deviceAdditional information: -1Additional information: 1048576</span>


2.检查数据文件备份集大小发现数据量未剧增

oracle@p740a:/backup/addrrman[addr11g1]$ls -ltrtotal 143197088-rw-------    1 oracle   oinstall         98 Aug 21 18:53 nohup.out-rw-r--r--    1 oracle   oinstall       7702 Oct 13 22:00 analyze.lst-rw-r-----    1 oracle   asmadmin 23931797504 Oct 13 23:44 full_ADDRPROD_20141013_14000_1-rw-r-----    1 oracle   asmadmin    7847936 Oct 13 23:44 full_ADDRPROD_20141013_14003_1-rw-r-----    1 oracle   asmadmin      98304 Oct 13 23:44 full_ADDRPROD_20141013_14004_1-rw-r-----    1 oracle   asmadmin 23550468096 Oct 13 23:45 full_ADDRPROD_20141013_14001_1-rw-r-----    1 oracle   asmadmin 25820962816 Oct 13 23:46 full_ADDRPROD_20141013_14002_1-rw-r--r--    1 oracle   oinstall    2659758 Oct 14 00:34 rman_delete.log-rw-r--r--    1 oracle   oinstall     803655 Oct 14 00:37 delete_local_std_arch.log-rw-r--r--    1 oracle   oinstall    1210456 Oct 14 00:38 rman_bk.log-rw-r--r--    1 oracle   oinstall        527 Oct 14 00:38 delete_cd_std_arch.log

3.检查归档删除日志发现9/13日归档因为没有在所有standby去apply

RMAN-08120: WARNING: archived log not deleted, not yet applied by standbyarchived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13079.1905.858179699 thread=1 sequence=13079RMAN-08120: WARNING: archived log not deleted, not yet applied by standbyarchived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13080.1618.858181499 thread=1 sequence=13080<span style="color:#ff0000;">RMAN-08120: WARNING: archived log not deleted, not yet applied by standby</span>archived log file name=+ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13081.1619.858182367 thread=1 sequence=13081

4.结合归档删除脚本中的archivelog删除策略

rman target / nocatalog log /backup/addrrman/rman_delete.log<<EOFallocate channel for maintenance type disk connect 'sys/xxxx@addr11g1';allocate channel for maintenance type disk connect 'sys/xxxx@addr11g2';CONFIGURE RETENTION POLICY TO REDUNDANCY 1;<span style="color:#ff0000;">CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;-->在所有standby应用后才能删除</span>crosscheck backup;crosscheck archivelog all;delete noprompt archivelog until time 'sysdate-7';delete noprompt obsolete;delete noprompt expired backup;exitEOF

5.检查log_archive_dest和log_archive_dest_state发现有defer的LAD

NAME                                 TYPE        VALUE------------------------------------ ----------- ------------------------------log_archive_dest                     stringlog_archive_dest_1                   string      LOCATION=+ARCHDG VALID_FOR=(AL                                                 L_LOGFILES,ALL_ROLES) DB_UNIQU                                                 E_NAME=addrprodlog_archive_dest_3                   string      service=ADDRCD arch async vali                                                 d_for=(ONLINE_LOGFILES,PRIMARY                                                 _ROLE) reopen=60 db_unique_nam                                                 e=ADDRCDlog_archive_dest_4                   string      service=ADDRPROD_STD arch asyn                                                 c valid_for=(ONLINE_LOGFILES,P                                                 RIMARY_ROLE) reopen=60 db_uniq                                                 ue_name=ADDRPROD_STDlog_archive_dest_state_1             string      ENABLE<span style="background-color: rgb(255, 255, 0);">log_archive_dest_state_3             string      defer</span>log_archive_dest_state_4             string      enable

三.问题解决

清理log_archive_dest_3后重新手工删除archivelog 成功:

SQL> show parameter log_archive_dest_3;NAME                                 TYPE       VALUE------------------------------------ ---------- ------------------------------log_archive_dest_3                   string     service=ADDRCD arch async vali                                                d_for=(ONLINE_LOGFILES,PRIMARY                                                _ROLE) reopen=60 db_unique_nam                                                e=ADDRCDlog_archive_dest_30                  stringlog_archive_dest_31                  stringSQL> alter system set log_archive_dest_3='' scope=both sid='*';System altered.SQL> show parameter log_archive_dest_3;NAME                                 TYPE       VALUE------------------------------------ ---------- ------------------------------log_archive_dest_3                   stringlog_archive_dest_30                  stringlog_archive_dest_31                  string

删除归档时未再报错:RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;delete noprompt archivelog until time 'sysdate-7';using target database control file instead of recovery catalogold RMAN configuration parameters:CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;new RMAN configuration parameters:CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON ALL STANDBY;new RMAN configuration parameters are successfully storedRMAN>allocated channel: ORA_DISK_1channel ORA_DISK_1: SID=963 instance=addr11g1 device type=DISKallocated channel: ORA_DISK_2channel ORA_DISK_2: SID=1717 instance=addr11g1 device type=DISKallocated channel: ORA_DISK_3channel ORA_DISK_3: SID=1908 instance=addr11g1 device type=DISKallocated channel: ORA_DISK_4channel ORA_DISK_4: SID=2189 instance=addr11g1 device type=DISKList of Archived Log Copies for database with db_unique_name ADDRPROD=====================================================================Key     Thrd Seq     S Low Time------- ---- ------- - ---------168624  1    13079   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13079.1905.858179699168643  1    13080   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13080.1618.858181499168646  1    13081   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13081.1619.858182367168648  1    13082   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13082.1620.858182411168656  1    13083   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13083.1625.858182901168658  1    13084   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13084.1624.858182903168662  1    13085   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13085.1627.858182967168666  1    13086   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13086.1629.858184767168670  1    13087   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13087.1631.858186569168674  1    13088   A 13-SEP-14        Name: +ARCHDG/addrprod/archivelog/2014_09_13/thread_1_seq_13088.1633.858188367

四.小结

这种临时性操作的收尾不干净导致的问题应该也不少见,本次没有引起重大故障(当然并不意味着每次都不会引起重大故障)。所以,日常工作中我们还是需要从多方面入手确保系统的正常运行,例如:

1).足够熟悉系统环境,清楚掌握各个临时操作之后如何恢复回去;

2).当然以上一点纯粹不靠谱啦,都说好记性不如烂笔头,最好还是有标准化的OM咯;

3).相关临时操作完成后需要对系统进行一次完整的检查。


0 0