Oracle 11g: Using dbms_parallel_execute to Update a Large Table in Parallel


1. About dbms_parallel_execute

Updating Large Tables in Parallel

       The DBMS_PARALLEL_EXECUTE package enables you to incrementally update the data in a large table in parallel, in two high-level steps:

       (1) Group sets of rows in the table into smaller chunks.

       (2) Apply the desired UPDATE statement to the chunks in parallel, committing each time you have finished processing a chunk.

       -- In other words, the dbms_parallel_execute package works in two steps: first it splits the large table into many small chunks, then it processes those chunks in parallel.

 

       This technique is recommended whenever you are updating a lot of data. Its advantages are:

       (1) You lock only one set of rows at a time, for a relatively short time, instead of locking the entire table.

       (2) You do not lose work that has been done if something fails before the entire operation finishes.

       (3) You reduce rollback space consumption.

       (4) You improve performance.

 

See Also:

       Oracle Database PL/SQL Packages and Types Reference for more information about the DBMS_PARALLEL_EXECUTE package

       http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_parallel_ex.htm#ARPLS233

       -- This link has detailed usage instructions for the package.

 

       Parallelism can improve SQL performance to some extent. My blog covers parallel execution in this post:

       Oracle Parallel Execution

       http://blog.csdn.net/xujinyang/article/details/6832630

 

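I mention that article because of one point it raises:

       As described there, Oracle restricts parallel DELETE, UPDATE, and MERGE: it starts parallel execution for them only when the target object is a partitioned table. The reason given is that, for a partitioned table, Oracle assigns one parallel server process to each partition so that the partitions are processed at the same time, which has no meaning for a non-partitioned table.

 

       If we need to update a large table that is not partitioned, we can use the dbms_parallel_execute package to do the work in parallel.

       The dbms_parallel_execute package splits the large table into many small chunks and then processes the chunks in parallel, which is similar to turning a non-partitioned table into a partitioned one.

       Note that this package is available only in Oracle 11g Release 2 (11.2) and later.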

 

2. Usage

The following content is reproduced from:

       http://www.oracle-base.com/articles/11g/dbms_parallel_execute_11gR2.php

 

2.1 The operation requires the CREATE JOB privilege, so grant it first

SQL> conn / as sysdba;

Connected.

SQL> grant create job to icd;

Grant succeeded.

SQL> conn icd/icd;

Connected.

 

2.2 Create the test table and insert data

SQL> CREATE TABLE test_tab (
  2    id          NUMBER,
  3    description VARCHAR2(50),
  4    num_col     NUMBER,
  5    CONSTRAINT test_tab_pk PRIMARY KEY (id)
  6  );
Table created.
 
SQL> INSERT /*+ APPEND */ INTO test_tab
  2  SELECT level,
  3         'Description for ' || level,
  4         CASE
  5           WHEN MOD(level, 5) = 0 THEN 10
  6           WHEN MOD(level, 3) = 0 THEN 20
  7           ELSE 30
  8         END
  9  FROM   dual
10  CONNECT BY level <= 500000;
500000 rows created.
SQL> commit;
Commit complete.
 

2.3 Gather statistics

SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'TEST_TAB', cascade => TRUE);
PL/SQL procedure successfully completed.
 
SQL> SELECT num_col, COUNT(*)
  2      FROM   test_tab
  3      GROUP BY num_col
  4      ORDER BY num_col;
 
   NUM_COL   COUNT(*)
---------- ----------
        10     100000
        20     133333
        30     266667
 

2.4 Create the task

       The CREATE_TASK procedure is used to create a new task. It requires a task name to be specified, but can also include an optional task comment.

 

SQL> BEGIN

 2   DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');

 3  END;

 4  /

PL/SQL procedure successfully completed.

 

       Information about existing tasks is displayed using the [DBA|USER]_PARALLEL_EXECUTE_TASKS views.

 

SQL> COLUMN task_name FORMAT A10

SQL> SELECT task_name,

 2         status

 3  FROM   user_parallel_execute_tasks;

 

TASK_NAME STATUS

---------- -------------------

test_task CREATED

 

       The GENERATE_TASK_NAME function returns a unique task name if you do not want to name the task manually.

 

SQL> SELECT DBMS_PARALLEL_EXECUTE.generate_task_name FROM dual;

 

GENERATE_TASK_NAME

-----------------------------------------------------

TASK$_1
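
       As a minimal sketch of how the generated name might be used, the block below stores it in a variable and passes it to CREATE_TASK. The variable name l_task and the DBMS_OUTPUT call are illustrative assumptions, not part of the original article, and SERVEROUTPUT is assumed to be enabled so the name is displayed.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_task VARCHAR2(30);  -- illustrative variable name (assumption)
BEGIN
  -- Ask the package for a unique name instead of choosing one by hand.
  l_task := DBMS_PARALLEL_EXECUTE.generate_task_name;

  DBMS_PARALLEL_EXECUTE.create_task(task_name => l_task);
  DBMS_OUTPUT.put_line('Created task: ' || l_task);

  -- Drop the task again so the sketch leaves nothing behind.
  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
```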

 

2.5 Split the workload into chunks

       There are three ways to split a large table into chunks:

       (1) CREATE_CHUNKS_BY_ROWID

       (2) CREATE_CHUNKS_BY_NUMBER_COL

       (3) CREATE_CHUNKS_BY_SQL

 

       Chunks that have been created can be removed with DROP_CHUNKS.

 

2.5.1 CREATE_CHUNKS_BY_ROWID

       The CREATE_CHUNKS_BY_ROWID procedure splits the data by rowid into chunks specified by the CHUNK_SIZE parameter. If the BY_ROW parameter is set to TRUE, the CHUNK_SIZE refers to the number of rows, otherwise it refers to the number of blocks.

 

SQL> BEGIN
  2    DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => 'test_task',
  3                                                 table_owner => 'icd',
  4                                                 table_name  => 'test_tab',
  5                                                 by_row      => TRUE,
  6                                                 chunk_size  => 10000);
  7  END;
  8  /

PL/SQL procedure successfully completed.

 

Once the chunks have been created, the status of the task changes to 'CHUNKED'.

SQL> COLUMN task_name FORMAT A10

SQL> SELECT task_name,

 2         status

 3  FROM   user_parallel_execute_tasks;

 

TASK_NAME STATUS

---------- -------------------

test_task CHUNKED

 

       The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

 

SQL> SELECT chunk_id, status,start_rowid, end_rowid

 2  FROM   user_parallel_execute_chunks

 3  WHERE  task_name = 'test_task'

 4  ORDER BY chunk_id;

 

  CHUNK_ID STATUS               START_ROWID        END_ROWID
---------- -------------------- ------------------ ------------------
         2 UNASSIGNED           AAATMCAAMAABSMIAAA AAATMCAAMAABSMPCcP
         3 UNASSIGNED           AAATMCAAMAABSMgAAA AAATMCAAMAABSMnCcP
         4 UNASSIGNED           AAATMCAAMAABSMoAAA AAATMCAAMAABSMvCcP
...
        73 UNASSIGNED           AAATMCAAMAABS0yAAA AAATMCAAMAABS1jCcP
        74 UNASSIGNED           AAATMCAAMAABS1kAAA AAATMCAAMAABS1/CcP

 

73 rows selected.

 

Drop the chunks:

SQL> begin

 2  dbms_parallel_execute.drop_chunks('test_task');

 3  end;

 4  /

PL/SQL procedure successfully completed.

 

Checking the task status again shows that it has reverted to 'CREATED'.

SQL> SELECT task_name,

 2             status

 3     FROM   user_parallel_execute_tasks;

 

TASK_NAME STATUS

---------- -------------------

test_task CREATED
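
       The BY_ROW parameter described above can also be set to FALSE, in which case CHUNK_SIZE counts blocks rather than rows. The following is only a sketch under assumptions: the chunk size of 100 blocks is an arbitrary illustrative value, USER is used so the call targets the current schema, and the resulting number of chunks depends on how the table is stored, so no output is shown.

```sql
BEGIN
  -- by_row => FALSE: chunk_size is interpreted as a number of blocks.
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => 'test_task',
                                               table_owner => USER,
                                               table_name  => 'TEST_TAB',
                                               by_row      => FALSE,
                                               chunk_size  => 100);
END;
/

-- See how many chunks that produced for this task.
SELECT COUNT(*) AS chunk_count
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task';

-- Drop them again so the task goes back to CREATED.
EXEC DBMS_PARALLEL_EXECUTE.drop_chunks('test_task');
```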

 

2.5.2  CREATE_CHUNKS_BY_NUMBER_COL

       The CREATE_CHUNKS_BY_NUMBER_COL procedure divides the workload up based on a number column. It uses the specified column's min and max values along with the chunk size to split the data into approximately equal chunks. For the chunks to be equally sized the column must contain a continuous sequence of numbers, like that generated by a sequence.

 

BEGIN
  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => 'test_task',
                                                    table_owner  => 'ICD',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);
END;

/

 

       The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

 

SQL> SELECT chunk_id, status, start_id,end_id

 2  FROM   user_parallel_execute_chunks

 3  WHERE  task_name = 'test_task'

 4  ORDER BY chunk_id;

 

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
        75 UNASSIGNED                    1      10000
        76 UNASSIGNED                10001      20000
        77 UNASSIGNED                20001      30000
        78 UNASSIGNED                30001      40000
...
       122 UNASSIGNED               470001     480000
       123 UNASSIGNED               480001     490000
       124 UNASSIGNED               490001     500000

 

50 rows selected.
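
       To see where the 50 chunks come from, the range arithmetic can be reproduced with a plain query. This is a rough estimate written for illustration, not the package's internal algorithm; the column aliases are assumptions. For the test table above (IDs 1 to 500000, chunk size 10000) the expression evaluates to 50, which matches the row count reported by the view.

```sql
-- Estimate the number of ID ranges of width 10000 between MIN(id) and MAX(id).
SELECT MIN(id)                               AS min_id,
       MAX(id)                               AS max_id,
       CEIL((MAX(id) - MIN(id) + 1) / 10000) AS expected_chunks
FROM   test_tab;
```

       If the ID column had large gaps, the number of ranges would stay the same but some chunks would contain far fewer rows than others, which is why a continuous sequence gives the most even split.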

 

2.5.3 CREATE_CHUNKS_BY_SQL

       The CREATE_CHUNKS_BY_SQL procedure divides the workload based on a user-defined query. If the BY_ROWID parameter is set to TRUE, the query must return a series of start and end rowids. If it's set to FALSE, the query must return a series of start and end IDs.

 

Drop the previously created chunks:

SQL> exec dbms_parallel_execute.drop_chunks('test_task');

PL/SQL procedure successfully completed.

 

DECLARE

 l_stmt CLOB;

BEGIN

  l_stmt:= 'SELECT DISTINCT num_col, num_col FROM test_tab';

 

 DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',

                                            sql_stmt  => l_stmt,

                                            by_rowid  => FALSE);

END;

/

 

       The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

 

SQL> SELECT chunk_id, status, start_id,end_id

 2  FROM   user_parallel_execute_chunks

 3  WHERE  task_name = 'test_task'

 4  ORDER BY chunk_id;

 

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       141 UNASSIGNED                   10         10
       142 UNASSIGNED                   30         30
       143 UNASSIGNED                   20         20
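
       The chunking query does not have to return one value per chunk as it does above; any query that returns a start ID and an end ID per row will do. As a hedged sketch, the block below uses the NTILE analytic function to cut the ID column into 10 ranges with roughly equal row counts. The bucket count of 10 and the alias grp are illustrative assumptions. A task chunked this way would need a statement that filters on id BETWEEN :start_id AND :end_id.

```sql
DECLARE
  l_stmt CLOB;
BEGIN
  -- Remove the chunks created above before re-chunking the same task.
  DBMS_PARALLEL_EXECUTE.drop_chunks('test_task');

  -- Each row returned by the query defines one chunk: (start_id, end_id).
  l_stmt := 'SELECT MIN(id), MAX(id)
             FROM   (SELECT id, NTILE(10) OVER (ORDER BY id) AS grp
                     FROM   test_tab)
             GROUP BY grp';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/
```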

 

2.6 Run the task

       Running a task involves running a specific statement for each defined chunk of work. The documentation only shows examples using updates of the base table, but this is not the only use of this functionality. The statement associated with the task can be a procedure call, as shown in one of the examples at the end of the article.

       There are two ways to run a task and several procedures to control a running task.

 

2.6.1 RUN_TASK

       The RUN_TASK procedure runs the specified statement in parallel by scheduling jobs to process the workload chunks. The statement specifying the actual work to be done must include a reference to the ':start_id' and ':end_id', which represent a range of rowids or column IDs to be processed, as specified in the chunk definitions. The degree of parallelism is controlled by the number of scheduled jobs, not the number of chunks defined. The scheduled jobs take an unassigned workload chunk, process it, then move on to the next unassigned chunk.

 

DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  -- :start_id and :end_id are bound to the rowid range of each chunk.
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
END;
/

 

       The RUN_TASK procedure waits for the task to complete. On completion, the status of the task must be assessed to know what action to take next.
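
       When the chunks are ID ranges rather than rowid ranges (for example after CREATE_CHUNKS_BY_NUMBER_COL on the ID column), the same two placeholders are bound to numbers instead. The following is a minimal sketch under that assumption; the parallel level of 4 is an arbitrary illustrative value.

```sql
DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  -- Assumes 'test_task' is currently chunked on TEST_TAB.ID,
  -- so :start_id and :end_id are the numeric bounds of each chunk.
  l_sql_stmt := 'UPDATE test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  t.id BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 4);
END;
/
```

       The predicate has to match the way the task was chunked: a rowid predicate against ID-range chunks, or the reverse, would not process the intended rows.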

2.6.2 User-defined framework

       The DBMS_PARALLEL_EXECUTE package allows you to manually code the task run. The GET_ROWID_CHUNK and GET_NUMBER_COL_CHUNK procedures return the next available unassigned chunk. You can then manually process the chunk and set its status. The example below shows the processing of a workload chunked by rowid.

 

DECLARE
  l_sql_stmt    VARCHAR2(32767);
  l_chunk_id    NUMBER;
  l_start_rowid ROWID;
  l_end_rowid   ROWID;
  l_any_rows    BOOLEAN;
BEGIN
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_rowid_chunk(task_name   => 'test_task',
                                          chunk_id    => l_chunk_id,
                                          start_rowid => l_start_rowid,
                                          end_rowid   => l_end_rowid,
                                          any_rows    => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;

    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_rowid, l_end_rowid;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;
END;
/

 

2.6.3 Task control

       A running task can be stopped and restarted using the STOP_TASK and RESUME_TASK procedures respectively.

       The PURGE_PROCESSED_CHUNKS procedure deletes all chunks with a status of 'PROCESSED' or 'PROCESSED_WITH_ERROR'.

       The ADM_DROP_CHUNKS, ADM_DROP_TASK, ADM_TASK_STATUS and ADM_STOP_TASK routines have the same function as their namesakes, but they allow the operations to be performed on tasks owned by other users. In order to use these routines the user must have been granted the ADM_PARALLEL_EXECUTE_TASK role.
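
       The control procedures named above each take the task name. The calls below are a sketch of how they might be issued from SQL*Plus. It assumes that RUN_TASK is still executing in another session when STOP_TASK is called, since RUN_TASK blocks the session that started it.

```sql
-- Session 2, while RUN_TASK is still executing in session 1:
EXEC DBMS_PARALLEL_EXECUTE.stop_task('test_task');

-- Later: restart the task so the remaining chunks are processed.
EXEC DBMS_PARALLEL_EXECUTE.resume_task('test_task');

-- Optionally remove chunks already marked PROCESSED or PROCESSED_WITH_ERROR.
EXEC DBMS_PARALLEL_EXECUTE.purge_processed_chunks('test_task');
```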

2.7 Check the task status

       The simplest way to check the status of a task is to use the TASK_STATUS function. After execution of the task, the only possible return values are the 'FINISHED' or 'FINISHED_WITH_ERROR' constants. If the status is not 'FINISHED', then the task can be resumed using the RESUME_TASK procedure.

 

DECLARE
  l_try    NUMBER;
  l_status NUMBER;
BEGIN
  -- If there is an error, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task('test_task');
    l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  END LOOP;
END;
/
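
       TASK_STATUS returns a number, so it has to be compared with the package constants to be readable. The block below is an illustrative sketch that prints a label for the two outcomes mentioned above and falls back to the raw code for anything else. The DBMS_OUTPUT call and the wording of the labels are assumptions added for illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
  l_status NUMBER;
BEGIN
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');

  -- Map the numeric status to a label using the package constants.
  DBMS_OUTPUT.put_line(
    CASE l_status
      WHEN DBMS_PARALLEL_EXECUTE.FINISHED            THEN 'FINISHED'
      WHEN DBMS_PARALLEL_EXECUTE.FINISHED_WITH_ERROR THEN 'FINISHED_WITH_ERROR'
      ELSE 'Other status code: ' || l_status
    END);
END;
/
```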

 

The status of the task and the chunks can also be queried.

COLUMN task_name FORMAT A10

SELECT task_name,

      status

FROM  user_parallel_execute_tasks;

 

TASK_NAME STATUS

---------- -------------------

test_task FINISHED

 

If there were errors, the chunks can be queried to identify the problems.

 

SELECT status, COUNT(*)

FROM  user_parallel_execute_chunks

GROUP BY status

ORDER BY status;

 
STATUS                 COUNT(*)
-------------------- ----------
PROCESSED_WITH_ERROR          3
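
       The count only shows that some chunks failed. To see why, the error columns of the same view can be listed for the failed chunks. This query is a sketch that assumes the task was chunked by ID (for rowid chunks, START_ROWID and END_ROWID would be the columns of interest); the column width is an arbitrary formatting choice.

```sql
COLUMN error_message FORMAT A60

-- List each failed chunk with the error recorded for it.
SELECT chunk_id, start_id, end_id, error_code, error_message
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
AND    status    = 'PROCESSED_WITH_ERROR'
ORDER BY chunk_id;
```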
 

       The [DBA|USER]_PARALLEL_EXECUTE_TASKS views contain a record of the JOB_PREFIX used when scheduling the chunks of work.

 

SELECT job_prefix

FROM  user_parallel_execute_tasks

WHERE task_name = 'test_task';

 

JOB_PREFIX

------------------------------

TASK$_368

 

       This value can be used to query information about the individual jobs used during the process. The number of jobs scheduled should match the degree of parallelism specified in the RUN_TASK procedure.

 

COLUMN job_name FORMAT A20

 

SELECT job_name, status

FROM  user_scheduler_job_run_details

WHERE job_name LIKE (SELECT job_prefix || '%'

                      FROM   user_parallel_execute_tasks

                      WHERE  task_name = 'test_task');

 

JOB_NAME             STATUS
-------------------- ------------------------------

TASK$_205_3          SUCCEEDED

TASK$_205_9          SUCCEEDED

TASK$_205_5          SUCCEEDED

TASK$_205_7          SUCCEEDED

TASK$_205_1          SUCCEEDED

TASK$_205_2          SUCCEEDED

TASK$_205_6          SUCCEEDED

TASK$_205_8          SUCCEEDED

TASK$_205_4          SUCCEEDED

TASK$_205_10         SUCCEEDED

 

2.8 Drop the task

       Once the job is complete you can drop the task, which will drop the associated chunk information also.

 

BEGIN

 DBMS_PARALLEL_EXECUTE.drop_task('test_task');

END;

/

 

3. Examples

3.1 Test 1

The following example shows the processing of a workload chunked by rowid.

 

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  -- 'TEST' is the schema that owns TEST_TAB (ICD in the walkthrough above).
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => 'TEST',
                                               table_name  => 'TEST_TAB',
                                               by_row      => TRUE,
                                               chunk_size  => 10000);

  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
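
       Because the example drops the task at the end, the chunk views can no longer be used to confirm the result. One way to check it, sketched here as a suggestion rather than as part of the original example, is to repeat the grouping query from section 2.3. On a freshly loaded table where every chunk succeeded exactly once, each NUM_COL value should be 10 higher than before (20, 30 and 40) while the counts stay the same.

```sql
-- Compare against the distribution shown in section 2.3.
SELECT num_col, COUNT(*)
FROM   test_tab
GROUP BY num_col
ORDER BY num_col;
```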

 

3.2 Test 2

       The following example shows the processing of a workload chunked by a number column. Notice that the workload is actually a stored procedure in this case.

 

CREATE OR REPLACE PROCEDURE process_update (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
  -- Update one chunk: the rows whose ID falls in the given range.
  UPDATE test_tab t
  SET    t.num_col = t.num_col + 10
  WHERE  t.id BETWEEN p_start_id AND p_end_id;
END;
/

 

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  -- 'TEST' is the schema that owns TEST_TAB (ICD in the walkthrough above).
  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => l_task,
                                                    table_owner  => 'TEST',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);

  l_sql_stmt := 'BEGIN process_update(:start_id, :end_id); END;';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

 

3.3 Test 3

       The following example shows a workload chunked by an SQL statement and processed by a user-defined framework.

 

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_stmt     CLOB;
  l_sql_stmt VARCHAR2(32767);
  l_chunk_id NUMBER;
  l_start_id NUMBER;
  l_end_id   NUMBER;
  l_any_rows BOOLEAN;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);

  l_sql_stmt := 'UPDATE test_tab t
                 SET    t.num_col = t.num_col
                 WHERE  num_col BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_number_col_chunk(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               start_id  => l_start_id,
                                               end_id    => l_end_id,
                                               any_rows  => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;

    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_id, l_end_id;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

 

