How To Validate ASM Diskgroup Consistency/State After ASM Reclamation Utility (ASRU) Execution Abort



APPLIES TO:

Oracle Database - Standard Edition - Version 10.2.0.1 to 12.1.0.1 [Release 10.2 to 12.1]
Oracle Database - Enterprise Edition - Version 10.2.0.1 to 12.1.0.1 [Release 10.2 to 12.1]
Information in this document applies to any platform.

GOAL

This document walks through an example of the tasks and validations required when the ASM Reclamation Utility (ASRU) aborts (e.g. the OS shell session/window is disconnected, or the ASRU process is killed by accident).

SOLUTION

1) The +DATA diskgroup was created to hold the database files. It shows 14,290 MB free:

SQL*Plus: Release 11.2.0.4.0 Production on Thu May 1 20:15:53 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Automatic Storage Management option

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB
------------ ------------------------------ ----------- ---------- ----------
           2 DATA                           MOUNTED          14350      14290

2) These are its member disks:

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH              TOTAL_MB    FREE_MB
------------------------------ --------------- ---------- ----------
DATA_0004                      /dev/raw/raw6         2870       2858
DATA_0003                      /dev/raw/raw5         2870       2858
DATA_0002                      /dev/raw/raw3         2870       2858
DATA_0001                      /dev/raw/raw2         2870       2858
DATA_0000                      /dev/raw/raw1         2870       2858

3) The original space allocation (in the +DATA diskgroup), before the datafiles were created, is as follows:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB
------------ ------------------------------ ----------- ---------- ----------
           2 DATA                           MOUNTED          14350      14290

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH              TOTAL_MB    FREE_MB
------------------------------ --------------- ---------- ----------
DATA_0004                      /dev/raw/raw6         2870       2858
DATA_0003                      /dev/raw/raw5         2870       2858
DATA_0002                      /dev/raw/raw3         2870       2858
DATA_0001                      /dev/raw/raw2         2870       2858
DATA_0000                      /dev/raw/raw1         2870       2858

4) Then the following tablespaces (6 GB each) were created in the “+DATA” diskgroup:

SQL> CREATE BIGFILE TABLESPACE "6GBTS_1" DATAFILE '+DATA' SIZE 6G LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

Tablespace created.

SQL> CREATE BIGFILE TABLESPACE "6GBTS_2" DATAFILE '+DATA' SIZE 6G LOGGING EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

Tablespace created.

5) After the tablespaces were created, the new space allocation (in the +DATA diskgroup) is as follows:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB
------------ ------------------------------ ----------- ---------- ----------
           2 DATA                           MOUNTED          14350        210

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH                   TOTAL_MB    FREE_MB
------------------------------ -------------------- ---------- ----------
DATA_0004                      /dev/raw/raw6              2870         43
DATA_0003                      /dev/raw/raw5              2870         44
DATA_0002                      /dev/raw/raw3              2870         39
DATA_0001                      /dev/raw/raw2              2870         41
DATA_0000                      /dev/raw/raw1              2870         43

6) Then one of the tablespaces was dropped to release 6 GB of space from the “+DATA” diskgroup:

SQL> drop tablespace "6GBTS_1" ;

Tablespace dropped.

7) ASM shows 6GB were released from the +DATA diskgroup:

SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB
------------ ------------------------------ ----------- ---------- ----------
           2 DATA                           MOUNTED          14350       6356

SQL> select name, path, total_mb, free_mb from v$asm_disk where group_number like 2;

NAME                           PATH                   TOTAL_MB    FREE_MB
------------------------------ -------------------- ---------- ----------
DATA_0004                      /dev/raw/raw6              2870       1272
DATA_0003                      /dev/raw/raw5              2870       1273
DATA_0002                      /dev/raw/raw3              2870       1269
DATA_0001                      /dev/raw/raw2              2870       1270
DATA_0000                      /dev/raw/raw1              2870       1272
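
As a quick sanity check, the per-disk FREE_MB values should add up to the FREE_MB reported for the diskgroup. The following is a minimal sketch, assuming the query output has been saved as plain text; the function name `sum_free_mb` is hypothetical and not part of ASM or ASRU.

```shell
#!/bin/sh
# sum_free_mb: add up the FREE_MB column (last field) of v$asm_disk output.
# Reads the named file(s), or standard input when no file is given.
# Only rows whose first field looks like an ASM disk name (e.g. DATA_0004)
# and whose last field is numeric are counted, so header rows are skipped.
sum_free_mb() {
    awk '$1 ~ /^[A-Z0-9_]+_[0-9]+$/ && $NF ~ /^[0-9]+$/ { total += $NF }
         END { print total + 0 }' "$@"
}

# Per-disk rows from step 7, after the tablespace was dropped.
sum_free_mb <<'EOF'
NAME                           PATH                   TOTAL_MB    FREE_MB
------------------------------ -------------------- ---------- ----------
DATA_0004                      /dev/raw/raw6              2870       1272
DATA_0003                      /dev/raw/raw5              2870       1273
DATA_0002                      /dev/raw/raw3              2870       1269
DATA_0001                      /dev/raw/raw2              2870       1270
DATA_0000                      /dev/raw/raw1              2870       1272
EOF
```

With the rows above this prints 6356, which matches the FREE_MB value that v$asm_diskgroup reports in step 7.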

8) Next, ASRU was executed against the “+DATA” diskgroup to reclaim the space at the physical LUN level:

8.1) ASRU was executed as follows. While ASRU was part-way through zero-filling the 5 disks, the OS shell session/window was intentionally closed to simulate an ASRU crash/disconnection/abort (the point is marked in the output below):

[ceintcb15]/refresh/asmsupt/home/asru> ASRU DATA
Checking the system ...done
Calculating the sizes of the disks ...done
Writing the data to a file ...done
Resizing the disks...done
Calculating the sizes of the disks ...done

/refresh/asmsupt/app/oracle/product/ASMdbA/perl/bin/perl -I /refresh/asmsupt/app/oracle/product/ASMdbA/perl/lib/5.10.0 /refresh/asmsupt/home/asru/zerofill 1 /dev/raw/raw5 1998 2870 /dev/raw/raw2 1999 2870 /dev/raw/raw1 2000 2870 /dev/raw/raw3 2001 2870 /dev/raw/raw6 2001 2870
872+0 records in
872+0 records out
914358272 bytes (914 MB) copied, 19.3352 seconds, 47.3 MB/s
871+0 records in
871+0 records out
913309696 bytes (913 MB) copied, 18.8338 seconds, 48.5 MB/s
870+0 records in
870+0 records out   <(=========   <<<<<<(Session was closed here)>>>>>>
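
The record counts printed by dd follow from the zerofill arguments shown above: each disk is passed as a path, a start offset and an end offset in MB, and the number of 1 MB blocks written is the difference between the two. The sketch below only reproduces that arithmetic as a dry run. It is inferred from the output in this example, not taken from the ASRU source, and the function name `zerofill_cmd` is hypothetical.

```shell
#!/bin/sh
# zerofill_cmd: print (never execute) the dd command implied by one
# "<path> <start_mb> <end_mb>" triple from the zerofill argument list.
# Assumption inferred from this example: count = end_mb - start_mb,
# with a block size of 1 MB (bs=1024k).
zerofill_cmd() {
    path=$1
    start_mb=$2
    end_mb=$3
    count=$((end_mb - start_mb))
    echo "/bin/dd if=/dev/zero of=$path seek=$start_mb bs=1024k count=$count"
}

# First triple from the zerofill command line in step 8.1.
zerofill_cmd /dev/raw/raw5 1998 2870
```

For the first disk this prints a dd command with count=872, which agrees with the "872+0 records out" line and with 872 x 1,048,576 = 914,358,272 bytes copied.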

8.2) The trace file (ASRU.trc) shows that ASRU did not complete the storage reclamation in the background after the session was closed. The trace stops right after batch 4 started, so the zero-fill (dd) never reached the fifth disk:

Making diskgroup DATA thin provision friendly
Fri May  2 15:32:59 2014 
ASM_POWER_LIMIT is 1
Fri May  2 15:32:59 2014 
Checking the system ...
No traces from the previous execution found
Fri May  2 15:32:59 2014 
Calculating the sizes of the disks..
Fri May  2 15:32:59 2014 
Executing /* ASRU */SELECT D.NAME,D.TOTAL_MB,D.FREE_MB,G.ALLOCATION_UNIT_SIZE 
            FROM V$ASM_DISK D, 
            V$ASM_DISKGROUP G WHERE 
            D.GROUP_NUMBER = G.GROUP_NUMBER AND G.NAME='DATA'
Calculated sizes : 
DATA_0003 : total:2870 free:1273 used:1597 new:1997 
DATA_0001 : total:2870 free:1272 used:1598 new:1998 
DATA_0000 : total:2870 free:1271 used:1599 new:1999 
DATA_0002 : total:2870 free:1270 used:1600 new:2000 
DATA_0004 : total:2870 free:1270 used:1600 new:2000 
Fri May  2 15:33:00 2014 
Fri May  2 15:33:00 2014 
Writing the data to a file ...
Data to be recorded in the tp file :  DATA_0003 2870 DATA_0001 2870 DATA_0000 2870 DATA_0002 2870 DATA_0004 2870 
Fri May  2 15:33:00 2014 
Resizing the disks...
Fri May  2 15:33:00 2014 
Executing ALTER DISKGROUP DATA  RESIZE DISK DATA_0003 SIZE 1997M DISK DATA_0001 SIZE 1998M DISK DATA_0000 SIZE 1999M DISK DATA_0002 SIZE 2000M DISK DATA_0004 SIZE 2000M REBALANCE WAIT/* ASRU */
Fri May  2 15:33:17 2014 
Calculating the sizes of the disks..
Fri May  2 15:33:17 2014 
Executing /* ASRU */SELECT D.NAME,D.TOTAL_MB,D.FREE_MB,G.ALLOCATION_UNIT_SIZE 
            FROM V$ASM_DISK D, 
            V$ASM_DISKGROUP G WHERE 
            D.GROUP_NUMBER = G.GROUP_NUMBER AND G.NAME='DATA'
Disk sizes after first resize: 
DATA_0003 : 1997
DATA_0001 : 1998
DATA_0000 : 1999
DATA_0002 : 2000
DATA_0004 : 2000
Checking whether the resize is done successfully or not..
Fri May  2 15:33:17 2014 
Fri May  2 15:33:17 2014 
Fri May  2 15:33:17 2014 
Power given to the free function : 1
Retrieving the paths:
DATA_0003 : /dev/raw/raw5
DATA_0001 : /dev/raw/raw2
DATA_0000 : /dev/raw/raw1
DATA_0002 : /dev/raw/raw3
DATA_0004 : /dev/raw/raw6
Executing the zerofill at /refresh/asmsupt/home/asru/zerofill
Completed parsing disk and their ranges which are to be zeroed
Batch number 1 started
Executing /bin/dd if=/dev/zero of=/dev/raw/raw5 seek=1998 bs=1024k count=872
Batch number 1 ended
Batch number 2 started
Executing /bin/dd if=/dev/zero of=/dev/raw/raw2 seek=1999 bs=1024k count=871
Batch number 2 ended
Batch number 3 started
Executing /bin/dd if=/dev/zero of=/dev/raw/raw1 seek=2000 bs=1024k count=870
Batch number 3 ended
Batch number 4 started
Executing /bin/dd if=/dev/zero of=/dev/raw/raw3 seek=2001 bs=1024k count=869
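
One way to spot an interrupted run in ASRU.trc is to look for batches that have a "started" line but no matching "ended" line. The following is a minimal sketch that relies only on the line format visible in the trace above; the function name `unfinished_batches` is hypothetical.

```shell
#!/bin/sh
# unfinished_batches: list the batch numbers that were started but never
# ended in an ASRU trace. Reads the named file(s), or standard input.
# The third whitespace-separated field of a "Batch number N ..." line is N.
unfinished_batches() {
    awk '/^Batch number [0-9]+ started/ { started[$3] = 1 }
         /^Batch number [0-9]+ ended/   { delete started[$3] }
         END { for (n in started) print n }' "$@" | sort -n
}

# Batch lines from the trace in step 8.2.
unfinished_batches <<'EOF'
Batch number 1 started
Batch number 1 ended
Batch number 2 started
Batch number 2 ended
Batch number 3 started
Batch number 3 ended
Batch number 4 started
EOF
```

For this trace the function prints 4. A trace from a complete run would produce no output.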

8.3) Then “check all repair” was executed on the +DATA diskgroup to validate it, i.e. to confirm or rule out inconsistencies:

SQL> alter diskgroup data CHECK ALL REPAIR;

Diskgroup altered.

8.4) “CHECK ALL REPAIR” did not report any ASM inconsistency (output from the ASM alert.log):

SQL> alter diskgroup data CHECK ALL REPAIR
NOTE: starting check of diskgroup DATA
Fri May 02 16:00:41 2014
GMON checking disk 0 for group 1 at 9 for pid 22, osid 9360
GMON checking disk 1 for group 1 at 10 for pid 22, osid 9360
GMON checking disk 2 for group 1 at 11 for pid 22, osid 9360
GMON checking disk 3 for group 1 at 12 for pid 22, osid 9360
GMON checking disk 4 for group 1 at 13 for pid 22, osid 9360
SUCCESS: check of diskgroup DATA found no errors
SUCCESS: alter diskgroup data check all repair
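
Because the alert log accumulates messages over time, it helps to look only at the most recent check of the diskgroup. The sketch below assumes the two message formats shown above ("NOTE: starting check of diskgroup ..." and "SUCCESS: check of diskgroup ... found no errors"). It does not try to recognise error messages, so any result other than `clean` means the log should be read manually. The function name `last_check_result` is hypothetical.

```shell
#!/bin/sh
# last_check_result: report the outcome of the most recent diskgroup check
# found in alert log text. Usage: last_check_result <DISKGROUP> [file...]
# Prints one of:
#   clean           the last check was followed by "found no errors"
#   unknown         a check started but no success line followed it
#   no-check-found  neither message appears for this diskgroup
last_check_result() {
    dg=$1
    shift
    awk -v dg="$dg" '
        { sub(/[ \t\r]+$/, "") }
        $0 == ("NOTE: starting check of diskgroup " dg)              { result = "unknown" }
        $0 == ("SUCCESS: check of diskgroup " dg " found no errors") { result = "clean" }
        END { print (result == "" ? "no-check-found" : result) }
    ' "$@"
}

# Alert log lines from step 8.4.
last_check_result DATA <<'EOF'
NOTE: starting check of diskgroup DATA
Fri May 02 16:00:41 2014
GMON checking disk 0 for group 1 at 9 for pid 22, osid 9360
SUCCESS: check of diskgroup DATA found no errors
SUCCESS: alter diskgroup data check all repair
EOF
```

For the log lines above this prints `clean`.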

8.5) Then, as a second health check, the diskgroup was dismounted and “AMDU” was executed against it:

SQL> alter diskgroup data dismount;

Diskgroup altered.

SQL> select name, state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED

[ceintcb15]/refresh/asmsupt/home/asru> amdu -diskstring '/dev/raw/raw*' -dump 'DATA'
amdu_2014_05_02_16_34_12/
[ceintcb15]/refresh/asmsupt/home/asru>   


[ceintcb15]/refresh/asmsupt/home/asru> cd amdu_2014_05_02_16_34_12/

[ceintcb15]/refresh/asmsupt/home/asru/amdu_2014_05_02_16_34_12> ls -l
total 64608
-rw-r--r-- 1 asmsupt asmsupt 66068480 May  2 16:34 DATA_0001.img
-rw-r--r-- 1 asmsupt asmsupt     5680 May  2 16:34 DATA.map
-rw-r--r-- 1 asmsupt asmsupt     8868 May  2 16:34 report.txt
[ceintcb15]/refresh/asmsupt/home/asru/amdu_2014_05_02_16_34_12>

8.6) AMDU did not report any corrupted blocks on the +DATA diskgroup (summary from report.txt):

------------------------- SUMMARY FOR DISKGROUP DATA -------------------------
           Allocated AU's: 7994
                Free AU's: 6356
       AU's read for dump: 71
       Block images saved: 16130
        Map lines written: 71
          Heartbeats seen: 0
  Corrupt metadata blocks: 0   <(====
        Corrupt AT blocks: 0   <(====


******************************* END OF REPORT ********************************
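
The two "Corrupt ... blocks" counters can also be extracted from report.txt with a short script. This is a sketch that assumes the summary lines look like the ones shown above; the function name `corrupt_block_total` is hypothetical and not part of AMDU.

```shell
#!/bin/sh
# corrupt_block_total: add up the "Corrupt metadata blocks" and
# "Corrupt AT blocks" counters of an AMDU report.
# Reads the named file(s), or standard input when no file is given.
# Prints the total; prints "no-counters-found" and returns 1 if neither
# counter line is present.
corrupt_block_total() {
    awk -F: '
        /Corrupt (metadata|AT) blocks/ { total += $2 + 0; seen++ }
        END {
            if (seen == 0) { print "no-counters-found"; exit 1 }
            print total + 0
        }' "$@"
}

# Counter lines from the report summary in step 8.6.
corrupt_block_total <<'EOF'
          Heartbeats seen: 0
  Corrupt metadata blocks: 0   <(====
        Corrupt AT blocks: 0   <(====
EOF
```

For the summary above this prints 0. Any other value means report.txt should be reviewed in detail before ASRU is rerun.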

9) After the +DATA diskgroup was validated and confirmed to be in good shape, ASRU was executed again as follows:

9.1) DATA diskgroup was mounted back:

SQL> alter diskgroup data mount;

Diskgroup altered.

9.2) Then ASRU was executed again. The results below clearly show that ASRU started over from scratch and resized and zero-filled (dd) all 5 disks again:

[ceintcb15]/refresh/asmsupt/home/asru> ASRU DATA
Checking the system ...done
Calculating the sizes of the disks ...done
Writing the data to a file ...done
Resizing the disks...done
Calculating the sizes of the disks ...done

/refresh/asmsupt/app/oracle/product/ASMdbA/perl/bin/perl -I /refresh/asmsupt/app/oracle/product/ASMdbA/perl/lib/5.10.0 /refresh/asmsupt/home/asru/zerofill 1 /dev/raw/raw5 1999 2870 /dev/raw/raw2 1999 2870 /dev/raw/raw1 2000 2870 /dev/raw/raw3 2001 2870 /dev/raw/raw6 2000 2870
871+0 records in
871+0 records out
913309696 bytes (913 MB) copied, 24.8047 seconds, 36.8 MB/s
871+0 records in
871+0 records out
913309696 bytes (913 MB) copied, 20.2713 seconds, 45.1 MB/s
870+0 records in
870+0 records out
912261120 bytes (912 MB) copied, 20.4338 seconds, 44.6 MB/s
869+0 records in
869+0 records out
911212544 bytes (911 MB) copied, 18.9407 seconds, 48.1 MB/s
870+0 records in
870+0 records out
912261120 bytes (912 MB) copied, 19.1543 seconds, 47.6 MB/s

Calculating the sizes of the disks ...done
Resizing the disks...done
Calculating the sizes of the disks ...done
Dropping the file ...done
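
If the console output of a run has been captured to a file, the number of completed dd runs can be compared with the number of disks. The sketch below counts the "bytes ... copied" summary lines that dd prints when it finishes, and it assumes the dd output format shown in this example. The function name `dd_runs_complete` is hypothetical.

```shell
#!/bin/sh
# dd_runs_complete: compare the number of dd completion summaries in
# captured ASRU console output with the expected number of disks.
# Usage: dd_runs_complete <expected_count> [file...]
# Prints a one-line summary and returns 0 only when the counts match.
dd_runs_complete() {
    expected=$1
    shift
    done_count=$(grep -c 'bytes .* copied' "$@" || true)
    echo "$done_count of $expected dd runs reported completion"
    [ "$done_count" -eq "$expected" ]
}

# dd summary lines from the second ASRU run in step 9.2.
dd_runs_complete 5 <<'EOF'
913309696 bytes (913 MB) copied, 24.8047 seconds, 36.8 MB/s
913309696 bytes (913 MB) copied, 20.2713 seconds, 45.1 MB/s
912261120 bytes (912 MB) copied, 20.4338 seconds, 44.6 MB/s
911212544 bytes (911 MB) copied, 18.9407 seconds, 48.1 MB/s
912261120 bytes (912 MB) copied, 19.1543 seconds, 47.6 MB/s
EOF
```

For the second run this prints "5 of 5 dd runs reported completion". The output captured in step 8.1 contains only two such summary lines, so the same check would fail for the interrupted run.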

Notes:

1) ASRU execution should not be interrupted. During this operation the disks are shrunk and resized, so an interruption could cause corruption on the ASM physical disks.

2) It is recommended to run ASRU from a VNC session or directly on the console, so that it is not interrupted by session disconnections.

3) If ASRU is cancelled or aborted, it must be started over, because ASRU currently cannot resume an interrupted run. Before rerunning it, do the following:

3.1) Run the following health check on the diskgroup:

SQL> alter diskgroup <diskgroup name> check all repair;

3.2) Then review the ASM alert.log, which records the results of the previous command, and look for any corruption issue.

3.3) Obtain an AMDU dump of the affected diskgroup as follows (run it as the grid OS user):

$> <ASM Oracle Home>/bin/amdu -diskstring '/dev/oracleasm/disks/*' -dump '<diskgroup name>'

Note 1: A new directory (e.g. amdu_2013_07_20_16_03_15/) containing three files (<diskgroup name>_0001.img, <diskgroup name>.map and report.txt) is created per diskgroup.

Note 2: Review the report.txt file and look for any corruption issue reported by AMDU.

3.4) Also review the OS logs from all the nodes and look for any disk I/O issue:

=> How To Gather The OS Logs For Each Specific OS Platform. (Doc ID 1349613.1)

3.5) If the previous health checks report no issues, then rerun the ASRU utility as described in the following document:

http://www.oracle.com/us/products/database/oracle-asru-3par.pdf
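
The commands from steps 3.1 and 3.3 can be generated for a specific diskgroup so that they are ready to paste. The following sketch only prints the commands and runs nothing against ASM. The function name `print_validation_steps` is hypothetical, and the disk string in the example is the one used earlier in this document.

```shell
#!/bin/sh
# print_validation_steps: print (never execute) the health-check commands
# from steps 3.1 and 3.3 for one diskgroup.
# Usage: print_validation_steps <diskgroup> <amdu_diskstring>
print_validation_steps() {
    dg=$1
    diskstring=$2
    printf '%s\n' "SQL> alter diskgroup $dg check all repair;"
    printf '%s\n' "\$ amdu -diskstring '$diskstring' -dump '$dg'"
}

# Example with the diskgroup and disk string used in this document.
print_validation_steps DATA '/dev/raw/raw*'
```

The alert.log, report.txt and OS log reviews (steps 3.2 to 3.4) still have to be done by hand before ASRU is rerun.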

0 0