Oracle redo概述

来源:互联网 发布:易百淘软件 编辑:程序博客网 时间:2024/05/16 11:21

实验-比较批量提交与单独提交:

--构造环境

drop table t1 purge;

create table t1(x int);

drop table t2 purge;

create table t2(x int);

select * from v$mystat where rownum=1;

--开始测试

set timing on 

--语句1

begin

for i in 1 .. 100000 loop

insert into t1 values (i); 

commit;  

end loop;

end;

/

--语句2

begin

for i in 1 .. 100000 loop

insert into t2 values (i); 

end loop;

commit;  

end;

/

--语句3(找出提交过于频繁的SESSION)

select t1.sid, t1.value, t2.name

from v$sesstat t1, v$statname t2

where t2.name like '%user commits%' --可以只选user commits,其他系统级的先不关心

and t1.STATISTIC# = t2.STATISTIC#

and value >= 10

order by value desc;

1. redo与相关latch:

redo alloction latch:用于管理log buffer内存空间分配

redo copy latch :用于从pga中的内存写入log buffer

数据从pga写入log buffer时,LGWR进程处于等待状态,直到进程拷贝完毕数据才把log buffer写出到磁盘。

查看保护数据复制的redo copy latch个数:

sys@HX> select x.ksppinm ,y.ksppstvl ,x.ksppdesc 

from x$ksppi x,x$ksppcv y

where x.indx=y.indx 

and  x.ksppinm like '%_log_simultaneous_copies%';

KSPPINM                       |KSPPSTVL  |KSPPDESC

------------------------------|----------|----------------------------------------------------------------------

_log_simultaneous_copies      |2         |number of simultaneous copies into redo buffer(# of copy latches)

1 row selected.

oracle 8.1.3以后,这个数变成了cpu数的两倍,我这个是单cpu,所以_log_simultaneous_copies为2:

[~]$cat /proc/cpuinfo | grep processor

processor       : 0

从v$latch视图查看redo copy latch信息:

sys@HX> select name,gets,immediate_gets,immediate_misses,spin_gets 

from v$latch where name like 'redo copy%'; 

NAME                          |      GETS|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS

------------------------------|----------|--------------|----------------|----------

redo copy                     |        27|        406615|             165|         0

1 row selected.

从v$latch_children视图获取子latch信息:

sys@HX> select name,gets,immediate_gets,immediate_misses,spin_gets 

from v$latch_children where name like 'redo% copy';

NAME                          |      GETS|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS

------------------------------|----------|--------------|----------------|----------

redo copy                     |         5|          9903|              30|         0

redo copy                     |         1|        396712|             135|         0

2 rows selected.

一共两个。

server process进程获取redo copy latch后,紧接着需要获得Redo allocation latch,分配redo空间,分配完成后释放Redo allocation latch,接着把pga中的数据拷贝到redo log buffer,拷贝完成后redo copy latch释放。

数据拷贝完成后,server进程可能需要通知lgwr进程把数据写出(例如commit触发)。为了避免lgwr被不必要通知,server进程需要首先获得redo writting latch,以检查lgwr是否已经激活或者已经被通知。如果lgwr已经激活,那么redo writting latch释放。所以,如果用户提交过于频繁(因为要不断价差LGWR是否激活),可能会导致redo writting latch竞争。

sys@HX> l

select name,gets,immediate_gets,immediate_misses,spin_gets

from v$latch where name like 'redo writing'

sys@HX> /

NAME                          |      GETS|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS

------------------------------|----------|--------------|----------------|----------

redo writing                  |     70942|             0|               0|         0

1 row selected.

在执行redo copy的过程中,server进程以log file sync等待事件处于等待状态。拷贝完毕后,进程苏醒,获得redo alloction latch检查相应redo是否已经写入redo log file,如果还没写出那就继续等待。

所以内存的分配(log buffer 获得)与redo写出(log buffer释放)都需要获得redo alloction latch。在繁忙的系统中,这个latch竞争也会很激烈。

2.  9iR2 redo 增强

9iR2中,oracle通过log_parallelism定义redo allocation并发级别,如果定义大于1,将分配多个共享redo log buffer区域,每个log buffer都有自己的 redo alloction latch进行保护,这提高了redo的并发性能,类似子缓冲池。9I中这种多 redo log buffer的机制成为PBRS(public redo  strands),使用高端服务器,cpu数量多时,可以考虑启用并行redo。通过视图v$latch可以观察redo alloction latch竞争累计等待时间:

select name,gets,misses,immediate_gets,immediate_misses,spin_gets

from v$latch  

where name in ('redo allocation','redo copy')

或者

select substr(b.name,1,20),gets,misses,immediate_gets,immediate_misses,spin_gets

from v$latch a,v$latchname b 

where b.name in ('redo alloction','redo copy') and a.latch#=b.latch#

NAME                          |      GETS|    MISSES|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS

------------------------------|----------|----------|--------------|----------------|----------

redo copy                     |        42|         0|        408269|             165|         0

redo allocation               |    250550|        10|        408273|               0|         0

2 rows selected.

如果misses/gets比率超过1%,或者IMMEDIATE_MISSES/(IMMEDIATE_MISSES+IMMEDIATE_GETS)超过1%,那么通常认为存在latch竞争。

如果主机cpu在16-64,通常log_parallelism可以设置为1-8个。9i中这个参数默认是1,缺省redo allocation latch也只有一个。

3. 10g redo增强

sys@HX> select x.ksppinm ,y.ksppstvl ,x.ksppdesc 

from x$ksppi x,x$ksppcv y

where x.indx=y.indx 

and  x.ksppinm like '%_log_para%';

KSPPINM                       |KSPPSTVL  |KSPPDESC

------------------------------|----------|----------------------------------------------------------------------

_log_parallelism_max          |1         |Maximum number of log buffer strands

_log_parallelism_dynamic      |TRUE      |Enable dynamic strands

2 rows selected.

10g以后引入参数_log_parallelism_max 和_log_parallelism_dynamic,_log_parallelism_dynamic参数默认被设置为true,表示并行度自动分配,但是不大于_log_parallelism_max。

相比9I 的PBRS,oracle 10g增加了PVRS(private redolog strands)机制,实际就是在共享池中增加了一些小的内存空间,每个空间约65-128k大小,这些内存空间通过各自的 redo allocation latch保护,这样重做日志就不再需要通过pga到log buffer 的拷贝过程,redo copy latch也就不再需要了。新机制下,进行redo写出时,lgwr将pbrs与pvrs中的内容写出。发生flush redo时,所有public redo alloction latch需要被获取,所有public strands 的redo copy latch 需要被检查,所有包含活动事务的private strands需要被持有。PVRS并不试用与RAC系统。

查看大小:

sys@HX> select * from v$sgastat where name  like 'private%';          

POOL        |NAME                          |     BYTES

------------|------------------------------|----------

shared pool |private strands               |   3677184

所以,在10g中,可以看到多个的redo allocation latch;

sys@HX> l

select name,gets,misses,immediate_gets,immediate_misses,spin_gets

from v$latch_children

where name in ('redo allocation')

sys@HX> /

NAME                          |      GETS|    MISSES|IMMEDIATE_GETS|IMMEDIATE_MISSES| SPIN_GETS

------------------------------|----------|----------|--------------|----------------|----------

redo allocation               |      1079|         0|             0|               0|         0

redo allocation               |      1079|         0|             0|               0|         0

redo allocation               |      1079|         0|             0|               0|         0

…………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………

redo allocation               |      1089|         0|             0|               0|         0

redo allocation               |      1677|         0|             0|               0|         0

redo allocation               |    190555|         0|             0|               0|         0

redo allocation               |      7885|         0|             0|               0|         0

redo allocation               |     28013|        10|        409699|               0|         0

28 rows selected.

检查与PVRS相关的隐含参数:

sys@HX>  select x.ksppinm ,y.ksppstvl ,x.ksppdesc 

from x$ksppi x,x$ksppcv y

where x.indx=y.indx 

and  x.ksppinm like '%_log_private%';

KSPPINM                       |KSPPSTVL  |KSPPDESC

------------------------------|----------|----------------------------------------------------------------------

_log_private_parallelism_mul  |10        |Active sessions multiplier to deduce number of private strands

_log_private_mul              |5         |Private strand multiplier for log space preallocation

2 rows selected.

由于IMP与PVRS息息相关,可以查看IMU信息统计信息,注意到其中的redo allocation size部分。

sys@HX> select name,value from v$sysstat where upper(name) like '%IMU%';

NAME                                                                  |     VALUE

----------------------------------------------------------------------|----------

doubling up with imu segment                                          |         0

IMU commits                                                           |     78802

IMU Flushes                                                           |        98

IMU contention                                                        |        15

IMU recursive-transaction flush                                       |         3

IMU undo retention flush                                              |         0

IMU ktichg flush                                                      |         1

IMU bind flushes                                                      |         0

IMU mbu flush                                                         |         0

IMU pool not allocated                                                |     21513

IMU CR rollbacks                                                      |         0

IMU undo allocation size                                              |  47199448

IMU Redo allocation size                                              |    855624

IMU- failed to get a private strand                                   |     21513

cell simulated physical IO bytes eligible for predicate offload       |         0

cell simulated physical IO bytes returned by predicate offload        |         0

OS Maximum resident set size                                          |         0

17 rows selected.

一旦内存到redo log file写出没完成,就可能会出现log file switch等待事件。

sys@HX> select name from v$event_name where upper(name) like '%STRAND%';

NAME

----------------------------------------------------------------------

log file switch (private strand flush incomplete)

4. 利用update语句分析改变向量与重做记录:

sys@HX> select * from test;

ID|NAME

----------|------------------------------

1|ok1

2|ok2

3|ok3

4|ok4

5|ok5

5 rows selected.

update t1 set x=10 where name='ok5';

修改步骤:

(1)找到name='ok5'的记录所在的块,读入buffer cache

(2)在回滚表空间的相应回滚事务表上分配事务槽,记录redo信息,这是第一个改变向量

(3)从回滚段读入buffer cache,创建id=5的前映象,这个也需要产生redo 信息,redo信息从pga复制到log buffer。

(4)修改x=10,这个也需要记录重做日志信息

(5)commit,redo log buffer中记录提交信息,并在回滚段标记事务为非激活,通知lgwr把信息写出到redo file。

sys@HX> alter system switch logfile;

sys@HX> select * from v$logfile where group#=(select group# from v$log where status='CURRENT');

GROUP#|STATUS |TYPE   |MEMBER                                  |IS_

----------|-------|-------|----------------------------------------|---

2|       |ONLINE |/u01/oradata2/hx/redo02.log             |NO

1 row selected.

发出更新操作:

sys@HX> update test set id=10 where name='ok5'; 

1 row updated.

sys@HX> commit;

Commit complete.

sys@HX> select sid,serial#,username from v$session where sid=(select sid from v$mystat where rownum<=1);

SID|   SERIAL#|USERNAME

----------|----------|---------------

31|       459|HR

1 row selected.

在另外的session转储日志文件:

sys@HX> alter system  dump logfile '/u01/oradata2/hx/redo02.log';  

sys@HX> select value from v$diag_info where name like 'Default%';

VALUE

------------------------------------------------------------

/u01/diag/rdbms/hexel/hexel/trace/hexel_ora_6271.trc

分析转储文件:

找到update事务的相关redo信息:

这部分貌似和书上讲解的不太一样,待定:

5. 查看redo大小

(1)sqltrace:

hr@HX> set autotrace on;

hr@HX> insert into test values(11,'ok11');

Statistics

----------------------------------------------------------

1  recursive calls

22  db block gets

6  consistent gets

2  physical reads

1252  redo size

1132  bytes sent via SQL*Net to client

1278  bytes received via SQL*Net from client

4  SQL*Net roundtrips to/from client

1  sorts (memory)

0  sorts (disk)

1  rows processed

上面的redo size 部分就是这个操作产生的redo大小。

(2)记录当前session的redo size:

select a.name,b.value

from v$statname a,v$mystat b

where a.statistic#=b.statistic# and a.name='redo size';

NAME                          |     VALUE

------------------------------|----------

redo size                     |      5144

(3)记录实例启动以来产生的redo:

hr@HX> select name,value from v$sysstat where name='redo size';

NAME                          |     VALUE

------------------------------|----------

redo size                     | 181370372

(4)查询归档日志大小:

把数据库设置归档:

sys@HX> startup mount;

sys@HX> show parameter log_archive_dest_1 ;

NAME                                |TYPE       |VALUE

------------------------------------|-----------|------------------------------

log_archive_dest_1                  |string     |LOCATION=USE_DB_RECOVERY_FILE_DEST

sys@HX> alter system set DB_RECOVERY_FILE_DEST_SIZE=4g;

System altered.

sys@HX> alter system set db_recovery_file_dest ='/u01/oradata2/archived';

System altered.

sys@HX> alter database archivelog;

Database altered.

sys@HX> archive log list;

Database log mode              Archive Mode

Automatic archival             Enabled

Archive destination            USE_DB_RECOVERY_FILE_DEST

Oldest online log sequence     57

Next log sequence to archive   59

Current log sequence           59

sys@HX> alter database open;

如果是9i,还需要设置参数log_archive_start,否则无法自动归档:

log_archive_start = true

查看所有的归档日志:

sys@HX> select name,completion_time,blocks*block_size/1024/1024 mb from v$archived_log where  name is not null; 

NAME                                              |COMPLETION_TIME    |        MB

--------------------------------------------------|-------------------|----------

/u01/oradata2/archived/HEXEL/archivelog/2013_11_23|2013-11-23 12:07:42|.221191406

/o1_mf_1_59_990bjg94_.arc                         |                   |

/u01/oradata2/archived/HEXEL/archivelog/2013_11_23|2013-11-23 12:11:38|.009765625

/o1_mf_1_60_990bqt71_.arc                         |                   |

查看归档目录可用性,通常无法归档时由于尝试过度,相应dest的status可能会变为error:

SYS> select dest_id,dest_name,status from v$archive_dest;

DEST_ID|DEST_NAME                     |STATUS

----------|------------------------------|---------

1|LOG_ARCHIVE_DEST_1            |VALID

2|LOG_ARCHIVE_DEST_2            |INACTIVE

3|LOG_ARCHIVE_DEST_3            |INACTIVE

4|LOG_ARCHIVE_DEST_4            |INACTIVE

31 rows selected.

6. redo 写触发条件:

(1)每3s超时:

lgwr空闲时,处于rdbms ipc message等待,当3s超时时间到时,如果lgwr发现有redo写出,那么他将执行写出,这个时候log file paralel write等待事件出现:

可以使用10046事件跟踪跟踪;

hr@HX> select event#,event_id,name from v$event_name 

where name='rdbms ipc message' 

or name like '%log file parallel write%';;

EVENT#|  EVENT_ID|NAME

----------|----------|------------------------------

7| 866018717|rdbms ipc message

134|3999721902|log file parallel write

2 rows selected.

hr@HX> alter session set events '10046 trace name context forever ,level 12' ; 

hr@HX> update test set id=1 where id=2;

hr@HX>  alter session set events '10046 trace name context off' ;

(2)阈值达到

redo log buffer满1/3或者具有1m脏数据,

实际上是因为redo log buffer 达到隐含参数_log_io_size大小(这个参数缺省是1/3 log buffer大小,上限是1m)激活lgwr写出数据。注意log buffer size是以log block来衡量的。

sys@HX> select x.ksppinm,y.ksppstvl,x.ksppdesc 

from x$ksppi x, x$ksppcv y where x.indx=y.indx and x.ksppinm='_log_io_size';;

KSPPINM                       |KSPPSTVL  |KSPPDESC

------------------------------|----------|----------------------------------------------------------------------

_log_io_size                  |0         |automatically initiate log write if this many redo blocks in buffer

1 row selected.

因此,实际上我们可以设置log buffer为3m,如果太小,则lgwr会过度被调用:

(3)用户提交:

当事务提交时,在redo stream中将会记录事务提交标识,在事务提交返回成功标识前必须等待lgwr写出数据,并且以log file sync等待事件开始休眠,所以,过度提交肯定会有更多的LOG FILE sync等待事件。可以设置参数,避免出现这个等待事件。但是这样无法保证事务恢复性,因为无法保证log 写出完成。

sys@HX> select x.ksppinm,y.ksppstvl,x.ksppdesc 

from x$ksppi x, x$ksppcv y 

where x.indx=y.indx and x.ksppinm like '_wait_for_sync%';

KSPPINM                       |KSPPSTVL  |KSPPDESC

------------------------------|----------|----------------------------------------------------------------------

_wait_for_sync                |TRUE      |wait for sync on commit MUST BE ALWAYS TRUE

1 row selected.

有变量用于记录redo线程需要同步的log block number,如果多个提交在唤醒LGWR之前发生,此变量记录最高 log block number,之前的所有redo都会被写入磁盘,这时候被成为组提交(group commit)。

(4)在dbwr写出之前:

在增量检查点中,如果dbwr要写出low rba与high rba之间的数据,必须要先通知lgwr去执行写出,即先写日志,再写数据。显然,未提交的数据是无法进行实例恢复的(需要回滚)。

7. redo log buffer大小设置

redo log buffer 是轻量级数据,缺省值是max(512kb,128*cpu_count),如果log buffer space等待事件出现并且很显著,可以考虑增大log_buffer较少竞争,log_buffer大小也要以granule为基础:

8. 日志文件状态:

(1)current:

当前使用的日志:

当前使用的,是实例崩溃恢复必须的日志文件。如果当前日志被删除,恢复稍微会麻烦一些。

(2)active:

活动的非当前文件:这种日志可能未完成归档,包含了尚未完成检查点的重做信息,也是实例崩溃恢复必须的。如果日志文件循环使用到这个文件,他将会被覆盖,这个时候需要执行检查点,从而产生log file switch 等待事件:

(3)inactive

非活动日志,不包含实例恢复需要的数据,但是包含介质恢复需要的数据,如果数据库运行在归档模式下,如果切换到这个文件,但是文件还未归档,也会发生log file switch 等待事件。

(4)unused

刚添加,未使用的日志就是这个状态(restlogs启动也会有文件在这个状态)。

9.日志块大小:

虽然log_buffer中的redo entries是以bytes为单位,但是lgwr仍是按照block把数据写出到磁盘。redo block size与操作系统相关,可以通过下面方法查看:

sys@HX> select max(lebsz) from x$kccle;

MAX(LEBSZ)

----------

512

1 row selected

粗略计算:

sys@HX> select name,value from v$sysstat where name in ('redo size','redo wastage','redo blocks written');

NAME                          |     VALUE

------------------------------|----------

redo size                     |  13342316

redo wastage                  |    146300

redo blocks written           |     27345

每个块头占用16bytes

sys@HX> select ceil((13342316+146300)/27345+16) from dual; 

CEIL((13342316+146300)/27345+16)

--------------------------------

510

1 row selected.

[~]$dbfsize /u01/oradata2/hx/redo01.log 

Database file: /u01/oradata2/hx/redo01.log

Database file type: file system

Database file size: 204800 512 byte blocks

Database file: /u01/oradata2/hx/system.dbf

Database file type: file system

Database file size: 126720 8192 byte blocks

可以转储重做文件头查看:

sys@HX> alter session set events 'immediate trace name redohdr level 10';

Session altered.

LOG FILE #1:

name #3: /u01/oradata2/hx/redo01.log

Thread 1 redo log links: forward: 2 backward: 0

siz: 0x32000 seq: 0x00000043 hws: 0x2 bsz: 512 nab: 0x13 flg: 0x1 dup: 1

10.重做日志文件相关操作:

(1)增加日志组:

由于日志不能改变大小,所以一般需要增加日志组:

sys@HX> select * from v$logfile;
GROUP#|STATUS |TYPE   |MEMBER                                  |IS_
----------|-------|-------|----------------------------------------|---
3|       |ONLINE |/u01/oradata2/hx/redo03.log             |NO
2|       |ONLINE |/u01/oradata2/hx/redo02.log             |NO
1|       |ONLINE |/u01/oradata2/hx/redo01.log             |NO

3 rows selected.

sys@HX> alter database add logfile group 4 '/u01/oradata2/hx/redo04.log' size 20m;

Database altered.

sys@HX> alter database add logfile group 5 '/u01/oradata2/hx/redo05.log' size 20m;  

Database altered.

sys@HX> select * from v$logfile;

GROUP#|STATUS |TYPE   |MEMBER                                  |IS_

----------|-------|-------|----------------------------------------|---

3|       |ONLINE |/u01/oradata2/hx/redo03.log             |NO

2|       |ONLINE |/u01/oradata2/hx/redo02.log             |NO

1|       |ONLINE |/u01/oradata2/hx/redo01.log             |NO

4|       |ONLINE |/u01/oradata2/hx/redo04.log             |NO

5|       |ONLINE |/u01/oradata2/hx/redo05.log             |NO

5 rows selected.

(2)删除inactive状态的日志文件:(需要先归档,但是11g下貌似只要切换日志就归档呢)

sys@HX> alter database drop logfile group 1;

Database altered.

sys@HX> select * from v$logfile;

GROUP#|STATUS |TYPE   |MEMBER                                  |IS_

----------|-------|-------|----------------------------------------|---

3|       |ONLINE |/u01/oradata2/hx/redo03.log             |NO

2|       |ONLINE |/u01/oradata2/hx/redo02.log             |NO

4|       |ONLINE |/u01/oradata2/hx/redo04.log             |NO

5|       |ONLINE |/u01/oradata2/hx/redo05.log             |NO

sys@HX> alter system archive log sequence 68;

alter system archive log sequence 68

*

ERROR at line 1:

ORA-16013: 日志 2 sequence# 68 不需要归档

手动删除文件(一定注意先归档):

sys@HX> ho rm /u01/oradata2/hx/redo02.log

重新初始化原来的日志文件,使用原来的group#:

sys@HX> alter database add logfile group 2 '/u01/oradata2/hx/redo02.log' size 20m reuse;                   

Database altered.

强制日志模式下,任何操作都会产生日志:

sys@HX> select force_logging from v$database;

FOR

---

NO

1 row selected.

sys@HX> alter database force logging;

Database altered.

sys@HX> select force_logging from v$database;

FOR

---

YES

1 row selected.

sys@HX> alter database no force logging;

Database altered.

11. oracle redo相关故障恢复;

(1)丢失非活动日志文件:

在数据库运行时候,如果删除了inactive状态的日志文件,切换到这个文件时后,数据库可以检测到错误,直接跳到下一个文件:

sys@HX> select a.group#,a.status,b.member from v$log a,v$logfile b where a.group#=b.group#;

GROUP#|STATUS          |MEMBER

----------|----------------|----------------------------------------

3|INACTIVE        |/u01/oradata2/hx/redo03.log

2|INACTIVE        |/u01/oradata2/hx/redo02.log

4|CURRENT         |/u01/oradata2/hx/redo04.log

5|INACTIVE        |/u01/oradata2/hx/redo05.log

删除日志文件

sys@HX> ho mv /u01/oradata2/hx/redo02.log  /u01/oradata2/hx/redo02.log.bak 

sys@HX> alter system switch logfile;

System altered.

对于inactive状态文件,由于已经完成检查点,不会丢失数据,采用清除方法,重新建立即可:

sys@HX> alter database clear logfile  group  2;

Database altered.

clear并重新建立文件后,这个文件是unused状态,可以继续使用

sys@HX> select a.group#,a.status,b.member from v$log a,v$logfile b where a.group#=b.group#;

GROUP#|STATUS          |MEMBER

----------|----------------|----------------------------------------

3|INACTIVE        |/u01/oradata2/hx/redo03.log

2|UNUSED          |/u01/oradata2/hx/redo02.log

4|CURRENT         |/u01/oradata2/hx/redo04.log

5|INACTIVE        |/u01/oradata2/hx/redo05.log

4 rows selected.

如果是归档模式:

则需要使用命令:

sys@HX> alter database clear unarchived logfile  group  2;

(2)丢失active和current状态的文件:

a. 数据库正常关闭情况:

由于正常关闭,日志文件是inactive和current状态,实例不需要恢复,由于无法对current状态的文件执行clear操作,这个时候可以启动到mount状态,clear inactive状态的文件,再使用until cancel恢复,然后使用restlogs打开数据库:

sys@HX> select a.group#,a.status,b.member from v$log a,v$logfile b where a.group#=b.group#;

GROUP#|STATUS          |MEMBER

----------|----------------|----------------------------------------

2|UNUSED          |/u01/oradata2/hx/redo02.log

5|INACTIVE        |/u01/oradata2/hx/redo05.log

4|CURRENT         |/u01/oradata2/hx/redo04.log

3|INACTIVE        |/u01/oradata2/hx/redo03.log

4 rows selected.

sys@HX> ho rm /u01/oradata2/hx/redo* -rf

sys@HX> alter database open;

ERROR:

ORA-03114: 未连接到 ORACLE

alter database open

*

ERROR at line 1:

ORA-03113: 通信通道的文件结尾

进程 ID: 8542

会话 ID: 1 序列号: 5

实例会自己挂掉,查看告警日志文件:

alter database open

Block change tracking file is current.

Sun Nov 24 15:19:33 2013

Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_lgwr_8486.trc:

ORA-00313: open failed for members of log group 2 of thread 1

ORA-00312: online log 2 thread 1: '/u01/oradata2/hx/redo02.log'

ORA-27037: unable to obtain file status

Linux-x86_64 Error: 2: No such file or directory

Additional information: 3

Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_lgwr_8486.trc:

ORA-00313: open failed for members of log group 2 of thread 1

ORA-00312: online log 2 thread 1: '/u01/oradata2/hx/redo02.log'

ORA-27037: unable to obtain file status

Linux-x86_64 Error: 2: No such file or directory

Additional information: 3

Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_ora_8542.trc:

ORA-00313: 无法打开日志组 1 (用于线程 ) 的成员

ORA-00312: 联机日志 2 线程 1: '/u01/oradata2/hx/redo02.log'

Sun Nov 24 15:19:34 2013

System state dump requested by (instance=1, osid=8542), summary=[abnormal instance termination].

System State dumped to trace file /u01/diag/rdbms/hexel/hexel/trace/hexel_diag_8476.trc

USER (ospid: 8542): terminating the instance due to error 313

Dumping diagnostic data in directory=[cdmp_20131124151934], requested by (instance=1, osid=8542), summary=[abnormal instance termination].

Instance terminated by USER, pid = 8542

重新启动到mount状态:进行恢复:

SYS> startup mount;

SYS> select a.group#,a.status,b.member from v$log a,v$logfile b where a.group#=b.group#;

GROUP#|STATUS          |MEMBER

----------|----------------|----------------------------------------

2|UNUSED          |/u01/oradata2/hx/redo02.log

5|INACTIVE        |/u01/oradata2/hx/redo05.log

4|CURRENT         |/u01/oradata2/hx/redo04.log

3|INACTIVE        |/u01/oradata2/hx/redo03.log

4 rows selected.

SYS> alter database clear logfile group 2;

Database altered.

SYS> alter database clear logfile group 5;

Database altered.

SYS> alter database clear logfile group 3;

Database altered.

[~]$ls /u01/oradata2/hx/redo*  -lh

-rw-r----- 1 oracle oinstall  21M 11月 24 15:22 /u01/oradata2/hx/redo02.log

-rw-r----- 1 oracle oinstall 101M 11月 24 15:22 /u01/oradata2/hx/redo03.log

-rw-r----- 1 oracle oinstall  21M 11月 24 15:22 /u01/oradata2/hx/redo05.log

SYS> recover database until cancel;

Media recovery complete.

SYS> alter database open resetlogs;

b.数据库异常关闭:

数据库异常关闭时,active状态和current状态的日志文件都有实例恢复需要的数据。如果有备份,可以利用备份和归档日志文件,实现不完全恢复,然后使用resetlogs打开数据库,这样就只失去丢失的日志文件中的内容。如果没有备份,那就只能在破坏一致性前提下强制重置日志。使用参数_allow_restlog_corruption可以使所有数据文件以最旧的scn打开,这需要system表空间拥有最旧的scn。

SYS> shutdown abort;

ORACLE instance shut down.

SYS> startup mount;

SYS> ho rm /u01/oradata2/hx/redo* -rf

SYS> alter system set "_allow_resetlogs_corruption"=true scope=spfile;

SYS> select CHECKPOINT_CHANGE# from v$datafile;

CHECKPOINT_CHANGE#

------------------

15480905

15480905

15480905

15480905

15480905

15480905

15480905

7 rows selected.

SYS> recover database using backup controlfile until cancel;

ORA-00279: 更改 15480905 (在 11/24/2013 15:40:46 生成) 对于线程 1 是必需的

ORA-00289: 建议: /u01/oradata2/archived/HEXEL/archivelog/2013_11_24/o1_mf_1_6_%u_.arc

ORA-00280: 更改 15480905 (用于线程 1) 在序列 #6 中

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

cancel

ORA-01547: 警告: RECOVER 成功但 OPEN RESETLOGS 将出现如下错误

ORA-01194: 文件 1 需要更多的恢复来保持一致性

ORA-01110: 数据文件 1: '/u01/oradata2/hx/system.dbf'

ORA-01112: 未启动介质恢复

观察日志:

ALTER DATABASE RECOVER  database using backup controlfile until cancel  

Media Recovery Start

Serial Media Recovery started

WARNING! Recovering data file 1 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

WARNING! Recovering data file 2 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

WARNING! Recovering data file 3 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

WARNING! Recovering data file 4 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

WARNING! Recovering data file 5 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

WARNING! Recovering data file 6 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

WARNING! Recovering data file 7 from a fuzzy file. If not the current file

it might be an online backup taken without entering the begin backup command.

ORA-279 signalled during: ALTER DATABASE RECOVER  database using backup controlfile until cancel  ...

Sun Nov 24 15:52:34 2013

ALTER DATABASE RECOVER    CANCEL  

ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...

ALTER DATABASE RECOVER CANCEL 

ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...

Sun Nov 24 15:52:39 2013

Checker run found 8 new persistent data failures

SYS> alter database open resetlogs;

Database altered.

继续观察日志(摘取):

Sun Nov 24 15:54:25 2013

alter database open resetlogs

RESETLOGS is being done without consistancy checks. This may result

in a corrupted database. The database should be recreated.

下面是最后恢复到的change号,实际就是上次完成检查点的scn。

RESETLOGS after incomplete recovery UNTIL CHANGE 15480905

Errors in file /u01/diag/rdbms/hexel/hexel/trace/hexel_ora_9537.trc:

ORA-00313: 无法打开日志组 2 (用于线程 1) 的成员

ORA-00312: 联机日志 2 线程 1: '/u01/oradata2/hx/redo02.log'

ORA-27037: 无法获得文件状态

………………………………………………………………………………………………………………

最后,关闭参数,由于数据库以不一致状态打开,略去很多一致性检验,可能会遭遇一些ora600,所以最好逻辑备份,重新建库。

SYS> alter system set "_allow_resetlogs_corruption"=false scope=spfile;        

System altered.

SYS> startup force;


0 0
原创粉丝点击