在没有备份的情况下通过rowid提取数据文件损坏的表空间中的部分数据

来源：互联网发布：java中的类编辑：程序博客网时间：2024/05/29 16:56

在企业中，由于成本限制会出现多套数据库环境使用一套存储的情况出现。而共用存储在带来成本优势的同时，也带来了操作上的风险，万一将数据库正在使用中的存储划分给其他主机，则会带来严重的后果。
笔者曾经遭遇过一次，系统管理员误将正在使用中的四个LUN划分给了一套新数据库使用，结果造成另外一套数据库的部分数据无法访问。随即客户要求尽量抢救数据，因为数据库没有采用归档，也没有备份，只能另寻途径。
因为损坏的是表空间的部分数据文件，所以具体表现就是一张表上的某些数据无法读取，对于分区表来说就是一个segment上的数据无法访问，而每次访问到这些无法读取的块时都会抛出

ORA-01116: error in opening database file XXX

错误，那么接下来要做的事情就是通过rowid绕过损坏的文件，将这些表中能用的数据抽取到新的一张表中。

具体脚本来自于MOS上的文章：

Extract rows from a CORRUPT table creating ROWID from DBA_EXTENTS (文档 ID 422547.1)

脚本内容：

connect / as sysdba set serveroutput on set concat off         DECLARE   nrows number;  rid rowid;  dobj number;  ROWSPERBLOCK number; BEGIN  ROWSPERBLOCK:=<VALUE CALCULATED IN STEP 1>;  nrows:=0;  select data_object_id  into dobj   from dba_objects   where owner = '&&table_owner'   and object_name = '&&table_name' -- and subobject_name = '<table partition>'  Add this condition if table is partitioned   ; for i in (select relative_fno, block_id, block_id+blocks-1 totblocks                       from dba_extents                       where owner = '&&table_owner'                           and segment_name = '&&table_name'  -- and partition_name = '<table partition>' Add this condition if table is partitioned -- and file_id != <OFFLINED DATAFILE> This condition is only used if a datafile needs to be skipped due to ORA-376 (A)           order by extent_id)   loop   for br in i.block_id..i.totblocks loop     for j in 1..ROWSPERBLOCK loop     begin       rid := dbms_rowid.ROWID_CREATE(1,dobj,i.relative_fno, br , j-1);       insert into <OWNER.NEW_TABLE> (<columns here>)              select /*+ ROWID(A) */ <columns here>              from &&table_owner.&&table_name A        where rowid = rid;                if sql%rowcount = 1 then nrows:=nrows+1; end if;       if (mod(nrows,10000)=0) then commit; end if;     exception when others then null;     end;     end loop;   end loop;  end loop;  COMMIT; dbms_output.put_line('Total rows: '||to_char(nrows)); END; /

下面我们通过实验来重现一下场景：

创建表空间，为了能够快速填满表空间，数据文件设置为20m非自动扩展：

SQL>  create tablespace DATATBS datafile  '/oradata/mydb/datatbs01.dbf' size 20m autoextend off;Tablespace created.

创建用户并授权：

SQL> create user user01 identified by oracle default tablespace DATATBS;User created.SQL> grant connect,resource to user01;Grant succeeded.

登录user01用户

SQL> conn user01/oracleConnected.

为了简单起见，在表空间中只有一个数据文件的情况下，我们先创建一张普通表

SQL> create table t1 (col1 int,col2 varchar2(50),col3 date);Table created.SQL> create index ind_t1_col1 on t1(col1);Index created.

向普通表中插入数据

SQL> insert into t1 values(0,'zhangsan',sysdate);1 row created.SQL> commit;Commit complete.

查看extent的情况

SQL>  select owner,segment_name,partition_name,TABLESPACE_NAME,file_id from dba_extents where owner='USER01';USER01                         IND_T1_COL1                                                      DATATBS                5USER01                         T1                                                                        DATATBS                5

随后我们为DATATBS表空间添加另外一个数据文件，之后向T1表插入大量数据，使得数据分布在两个数据文件上：

SQL> alter tablespace DATATBS add datafile '/oradata/mydb/datatbs02.dbf' size 20m autoextend off;Tablespace altered.

记录下两个数据文件的文件号：

SQL>  col file_name for a40SQL> select FILE_ID,RELATIVE_FNO,FILE_NAME,TABLESPACE_NAME from dba_data_files where tablespace_name='DATATBS';         5            5 /oradata/mydb/datatbs01.dbf              DATATBS         6            6 /oradata/mydb/datatbs02.dbf              DATATBS

像t1表中插入大量数据：

SQL> beginfor i in 1..100000 loop   insert into t1 values(0,'zhangsan',sysdate);   end loop;   commit;   end;/  2    3    4    5    6    7PL/SQL procedure successfully completed.

这时查询extent分布情况，通过file_id来看，两个文件都有分布了：

SQL> col segment_name for a30SQL> run  1*  select owner,segment_name,TABLESPACE_NAME,file_id from dba_extents where owner='USER01'OWNER                       SEGMENT_NAME                      TABLESPACE_NAME                     FILE_ID------------------------ ------------------------------       ----------------------------         ----------USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   6USER01                         T1                                                   DATATBS                   6USER01                         T1                                                   DATATBS                   6USER01                         T1                                                   DATATBS                   6USER01                         T1                                                   DATATBS                   6USER01                         T1                                                   DATATBS                   6USER01                         T1                                                   DATATBS                   5USER01                         T1                                                   DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   6USER01                         IND_T1_COL1                                          DATATBS                   5USER01                         IND_T1_COL1                                          DATATBS                   6

然后我们对数据文件进行破坏，直接删除 datatbs02.dbf ，也就是6号文件：

[oracle@oelhost1 mydb]$ rm datatbs02.dbf [oracle@oelhost1 mydb]$ ls -ltrtotal 1563608-rw-r----- 1 oracle oinstall  30416896 Dec 12 13:39 temp01.dbf-rw-r----- 1 oracle oinstall  52429312 Dec 12 13:43 redo01.log-rw-r----- 1 oracle oinstall   5251072 Dec 12 13:44 users01.dbf-rw-r----- 1 oracle oinstall 555753472 Dec 12 13:44 sysaux01.dbf-rw-r----- 1 oracle oinstall  31465472 Dec 12 13:44 datatbs01.dbf-rw-r----- 1 oracle oinstall 744497152 Dec 12 13:44 system01.dbf-rw-r----- 1 oracle oinstall  52429312 Dec 12 13:44 redo02.log-rw-r----- 1 oracle oinstall  73408512 Dec 12 13:44 undotbs01.dbf-rw-r----- 1 oracle oinstall  52429312 Dec 12 13:45 redo03.log-rw-r----- 1 oracle oinstall   9748480 Dec 12 13:46 control01.ctl

由于实验环境数据库负载很小，要对buffer cache进行刷新：

SQL> conn / as sysdbaSQL>  alter system flush BUFFER_CACHE;System altered.

这时候执行查询就会抛出文件无法访问的错误：

SQL> conn user01/oracleConnected.SQL> select count(*) from t1;select count(*) from t1*ERROR at line 1:ORA-01116: error in opening database file 6ORA-01110: data file 6: '/oradata/mydb/datatbs02.dbf'ORA-27041: unable to open fileLinux Error: 2: No such file or directoryAdditional information: 3

而直接访问表中的数据则会在遇到丢失块的时候才会报错：

SQL> select * from t1;      COL1 COL2                                               COL3---------- -------------------------------------------------- ------------         0 zhangsan                                           12-DEC-16         0 zhangsan                                           12-DEC-16此处省略2W行。。。ERROR:ORA-01116: error in opening database file 6ORA-01110: data file 6: '/oradata/mydb/datatbs02.dbf'ORA-27041: unable to open fileLinux Error: 2: No such file or directoryAdditional information: 322035 rows selected.

接下来我们就要通过文章开头处的脚本来尽可能的找出剩余的数据，并且插入到一张新表中，访问这张新表就不会再报错了。该脚本的主要思路是通过rowid绕过损坏的数据文件，这里就涉及到ROWID的组成结构了，我们一起复习一下：
Oracle从8版本开始使用扩展的rowid，由四部分组成，格式为OOOOOOFFFBBBBBBRRR：

OOOOOO ：前6位表示数据库中segment段的对象ID，如AAAAao。需要注意的是对于聚簇表等使用相同段的对象，其对象ID也相同。
FFF：数据所在数据文件的相对文件号，如AAT。
BBBBBB：数据行所在数据块的号码，例如AAABrX。数据块号码是相对于它们所在的数据文件来说的，而非表空间。因此，在同一个表空间中可能存在两行具有相同块号的数据分布于不同的两个数据文件中。
RRR: 最后三位表示数据行号。

根据上面的描述，我们可以利用对象ID，没有丢失的数据文件号，通过dba_extents中的block_id，以及行号构建rowid，通过rowid就可以直接访问可用的数据了。那么，在这里存在一个问题，我们如何知道行号呢？答案是不知道，得靠猜。怎么猜呢，如果该对象有统计信息，那么查询该表的平均行长度，根据blocksize大小，就可以估算一个块中总行数，保险起见，建议将估算的总行数乘以2，就是一个数据块中可能存在最大行数，例如，块大小8K，
–查看平均行长度

select avg_row_len from  dba_tables where table_name='TABLE_NAME';

假如平均行长度为22，则结果为
8192/22*2=745.

如果之前没有统计信息，那么就得将这个值尽量估算的高一些，可以根据表中存储的列进行大致估算。

接下来创建一个新的表空间存储新表

SQL>  create tablespace DATATBS_NEW datafile '/oradata/mydb/datatbs_new01.dbf' size 10m autoextend on;Tablespace created.

创建新表

SQL>  create table user01.t1_new  tablespace DATATBS_NEW  as select * from user01.t1 where 1=2 ;Table created.

修改脚本并执行，注释部分为修改内容：

set serveroutput on set concat off         DECLARE   nrows number;  rid rowid;  dobj number;  ROWSPERBLOCK number; BEGIN  ROWSPERBLOCK:=745; --估算的每块最大行数乘以2 nrows:=0;  select data_object_id  into dobj   from dba_objects   where owner = '&&table_owner'   and object_name = '&&table_name' -- and subobject_name = '<table partition>'  如果是分区表需要追加上这个条件   ; for i in (select relative_fno, block_id, block_id+blocks-1 totblocks                       from dba_extents                       where owner = '&&table_owner'                           and segment_name = '&&table_name'  -- and partition_name = '<table partition>' Add this condition if table is partitioned  and file_id not in (6)  --此处填写需要跳过的文件号码，本例中是6号文件          order by extent_id)   loop   for br in i.block_id..i.totblocks loop     for j in 1..ROWSPERBLOCK loop     begin       rid := dbms_rowid.ROWID_CREATE(1,dobj,i.relative_fno, br , j-1);       insert into user01.t1_new      --新表的名称      select /*+ ROWID(A) */ *             from &&table_owner.&&table_name A        where rowid = rid;                if sql%rowcount = 1 then nrows:=nrows+1; end if;       if (mod(nrows,10000)=0) then commit; end if;     exception when others then null;     end;     end loop;   end loop;  end loop;  COMMIT; dbms_output.put_line('Total rows: '||to_char(nrows)); END;/SQL> SQL> SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19                                                                 20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40                                                                 41   42   43Enter value for table_owner: USER01old  12:  where owner = '&&table_owner'new  12:  where owner = 'USER01'Enter value for table_name: T1old  13:  and object_name = '&&table_name'new  13:  and object_name = 'T1'old  19:            where owner = '&&table_owner'new  19:            where owner = 'USER01'old  20:              and segment_name = '&&table_name'new  20:              and segment_name = 'T1'old  31:       from &&table_owner.&&table_name Anew  31:       from USER01.T1 ATotal rows: 60098PL/SQL procedure successfully completed.SQL> select count(*) from t1_new;  COUNT(*)----------     60098

如果把每个数据块中最大总行数估算过小，则会丢失部分本来可以恢复的数据。将ROWSPERBLOCK改为300，再执行该脚本，结果是59700，少了几百行数据：

truncate table user01.t1_new;省略执行过程 ROWSPERBLOCK:=300; SQL>  select count(*) from t1_new;  COUNT(*)----------     59700

而将ROWSPERBLOCK改为1500，估算值的4倍，再执行该脚本，结果还是60098，没有进一步的变化，说明估算值的2倍可以满足要求：

truncate table user01.t1_new;省略执行过程 ROWSPERBLOCK:=1500; SQL>  select count(*) from t1_new;  COUNT(*)----------     60098

至此，抽取已经完成，注意此脚本的效率较低，每行数据需要一次循环，因为ROWSPERBLOCK也不宜过大。后续需要继续处理损坏的受影响的表空间以及表和索引等，不再赘述。

0 0