Oracle 11g: Using DBMS_PARALLEL_EXECUTE to Update a Large Table in Parallel

Source: Internet. Published by: 奥飞数据股票. Editor: 程序博客网. Time: 2024/06/04 17:49

1. About DBMS_PARALLEL_EXECUTE

Updating Large Tables in Parallel

The DBMS_PARALLEL_EXECUTE package enables you to incrementally update the data in a large table in parallel, in two high-level steps:

（1）Group sets of rows in the table into smaller chunks.

（2）Apply the desired UPDATE statement to the chunks in parallel, committing each time you have finished processing a chunk.

-- In other words, the DBMS_PARALLEL_EXECUTE package parallelizes the work in two steps: first it splits the large table into many small chunks, then it processes those chunks in parallel.

This technique is recommended whenever you are updating a lot of data. Its advantages are:

（1）You lock only one set of rows at a time, for a relatively short time, instead of locking the entire table.

（2）You do not lose work that has been done if something fails before the entire operation finishes.

（3）You reduce rollback space consumption.

（4）You improve performance.

2. Usage

The following content is adapted from:

http://www.oracle-base.com/articles/11g/dbms_parallel_execute_11gR2.php

2.1 The operation requires the CREATE JOB privilege, so grant it first

SQL> conn / as sysdba;

Connected.

SQL> grant create job to icd;

Grant succeeded.

SQL> conn icd/icd;

Connected.

2.2 Create the test table and insert data

SQL> CREATE TABLE test_tab (
       id          NUMBER,
       description VARCHAR2(50),
       num_col     NUMBER,
       CONSTRAINT test_tab_pk PRIMARY KEY (id)
     );

Table created.

SQL> INSERT /*+ APPEND */ INTO test_tab
     SELECT level,
            'Description for ' || level,
            CASE
              WHEN MOD(level, 5) = 0 THEN 10
              WHEN MOD(level, 3) = 0 THEN 20
              ELSE 30
            END
     FROM   dual
     CONNECT BY level <= 500000;

500000 rows created.

SQL> commit;

Commit complete.

2.3 Gather statistics

SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'TEST_TAB', cascade => TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT num_col, COUNT(*)
     FROM   test_tab
     GROUP BY num_col
     ORDER BY num_col;

   NUM_COL   COUNT(*)
---------- ----------
        10     100000
        20     133333
        30     266667

2.4 Create the task

The CREATE_TASK procedure is used to create a new task. It requires a task name to be specified, but can also include an optional task comment.

SQL> BEGIN
       DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');
     END;
     /

PL/SQL procedure successfully completed.
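
The optional comment mentioned above is passed as the second argument of CREATE_TASK. The block below is a minimal sketch rather than part of the original walkthrough: the task name test_task_doc and the comment text are hypothetical, chosen so the call does not clash with test_task, and the task is dropped again at the end.

```sql
BEGIN
  -- 'test_task_doc' and the comment text are hypothetical, for illustration only.
  -- First argument: task name. Second argument: optional task comment.
  DBMS_PARALLEL_EXECUTE.create_task('test_task_doc',
                                    'Adds 10 to NUM_COL in TEST_TAB');

  -- Clean up so the illustration leaves nothing behind.
  DBMS_PARALLEL_EXECUTE.drop_task('test_task_doc');
END;
/
```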

Information about existing tasks is displayed using the [DBA|USER]_PARALLEL_EXECUTE_TASKS views.

SQL> COLUMN task_name FORMAT A10

SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CREATED

The GENERATE_TASK_NAME function returns a unique task name if you do not want to name the task manually.

SQL> SELECT DBMS_PARALLEL_EXECUTE.generate_task_name FROM dual;

GENERATE_TASK_NAME

-----------------------------------------------------

TASK$_1
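
In practice the generated name is usually captured in a variable and passed straight to CREATE_TASK. The following is a minimal sketch under that assumption; the variable name l_task is illustrative, and the task is dropped at the end because the block only demonstrates naming. Run SET SERVEROUTPUT ON first to see the printed name.

```sql
DECLARE
  l_task VARCHAR2(128);
BEGIN
  -- Ask the package for a unique name instead of hard-coding one.
  l_task := DBMS_PARALLEL_EXECUTE.generate_task_name;

  DBMS_PARALLEL_EXECUTE.create_task(task_name => l_task);
  DBMS_OUTPUT.put_line('Created task: ' || l_task);

  -- Drop the task again; this sketch only demonstrates naming.
  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
```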

2.5 Split the workload into chunks

There are three ways to split a large table into chunks:

（1）CREATE_CHUNKS_BY_ROWID

（2）CREATE_CHUNKS_BY_NUMBER_COL

（3）CREATE_CHUNKS_BY_SQL

Chunks that have already been created can be removed with DROP_CHUNKS.

2.5.1 CREATE_CHUNKS_BY_ROWID

The CREATE_CHUNKS_BY_ROWID procedure splits the data by rowid into chunks specified by the CHUNK_SIZE parameter. If the BY_ROW parameter is set to TRUE, the CHUNK_SIZE refers to the number of rows, otherwise it refers to the number of blocks.

SQL> BEGIN
       DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => 'test_task',
                                                    table_owner => 'ICD',
                                                    table_name  => 'TEST_TAB',
                                                    by_row      => TRUE,
                                                    chunk_size  => 10000);
     END;
     /

PL/SQL procedure successfully completed.

Once the chunks have been created, the task status changes to CHUNKED.

SQL> COLUMN task_name FORMAT A10

SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CHUNKED

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SQL> SELECT chunk_id, status, start_rowid, end_rowid
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;

  CHUNK_ID STATUS               START_ROWID        END_ROWID
---------- -------------------- ------------------ ------------------
         2 UNASSIGNED           AAATMCAAMAABSMIAAA AAATMCAAMAABSMPCcP
         3 UNASSIGNED           AAATMCAAMAABSMgAAA AAATMCAAMAABSMnCcP
         4 UNASSIGNED           AAATMCAAMAABSMoAAA AAATMCAAMAABSMvCcP
...
        73 UNASSIGNED           AAATMCAAMAABS0yAAA AAATMCAAMAABS1jCcP
        74 UNASSIGNED           AAATMCAAMAABS1kAAA AAATMCAAMAABS1/CcP

73 rows selected.

Drop the chunks:

SQL> BEGIN
       DBMS_PARALLEL_EXECUTE.drop_chunks('test_task');
     END;
     /

PL/SQL procedure successfully completed.

Checking the task status again shows that it has gone back to CREATED.

SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CREATED

2.5.2 CREATE_CHUNKS_BY_NUMBER_COL

The CREATE_CHUNKS_BY_NUMBER_COL procedure divides the workload up based on a number column. It uses the specified column's min and max values along with the chunk size to split the data into approximately equal chunks. For the chunks to be equally sized the column must contain a continuous sequence of numbers, like that generated by a sequence.

BEGIN
  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => 'test_task',
                                                    table_owner  => 'ICD',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);
END;
/

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SQL> SELECT chunk_id, status, start_id, end_id
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
        75 UNASSIGNED                    1      10000
        76 UNASSIGNED                10001      20000
        77 UNASSIGNED                20001      30000
        78 UNASSIGNED                30001      40000
...
       122 UNASSIGNED               470001     480000
       123 UNASSIGNED               480001     490000
       124 UNASSIGNED               490001     500000

50 rows selected.
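
The ranges in this listing follow from simple arithmetic on the column's minimum, its maximum and the chunk size. The query below is only a model of that arithmetic, not the package's internal implementation. It assumes the values from this test (minimum ID 1, maximum ID 500000, chunk size 10000) and returns 50 ranges, from 1..10000 up to 490001..500000.

```sql
-- Model of the range arithmetic only; the literals 1, 10000 and 500000
-- are the minimum ID, chunk size and maximum ID assumed from this test.
SELECT 1 + (LEVEL - 1) * 10000      AS start_id,
       LEAST(LEVEL * 10000, 500000) AS end_id
FROM   dual
CONNECT BY LEVEL <= CEIL(500000 / 10000)
ORDER BY start_id;
```

If the IDs had gaps, the ranges would keep the same width but hold different numbers of rows, which is why a continuous sequence is needed for evenly sized chunks.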

2.5.3 CREATE_CHUNKS_BY_SQL

The CREATE_CHUNKS_BY_SQL procedure divides the workload based on a user-defined query. If the BY_ROWID parameter is set to TRUE, the query must return a series of start and end rowids. If it's set to FALSE, the query must return a series of start and end IDs.

Drop the previously created chunks first:

SQL> exec dbms_parallel_execute.drop_chunks('test_task');

PL/SQL procedure successfully completed.

DECLARE
  l_stmt CLOB;
BEGIN
  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SQL> SELECT chunk_id, status, start_id, end_id
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       141 UNASSIGNED                   10         10
       142 UNASSIGNED                   30         30
       143 UNASSIGNED                   20         20
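
The chunk query is not limited to DISTINCT values; any query that returns one start value and one end value per row can define the chunks. As an illustrative sketch (the NTILE bucketing and the bucket count of 20 are assumptions made for this example, not part of the original article), the block below defines 20 ID ranges that each hold roughly the same number of rows, even if the IDs have gaps. It assumes test_task exists and currently holds chunks, which are dropped first.

```sql
DECLARE
  l_stmt CLOB;
BEGIN
  -- Remove the chunks defined earlier so the task can be re-chunked.
  DBMS_PARALLEL_EXECUTE.drop_chunks('test_task');

  -- One row per bucket: the lowest and highest ID that fall into it.
  -- The bucket count (20) is an arbitrary value chosen for illustration.
  l_stmt := 'SELECT MIN(id), MAX(id)
             FROM   (SELECT id, NTILE(20) OVER (ORDER BY id) AS bucket
                     FROM   test_tab)
             GROUP BY bucket';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/
```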

2.6 Run the task

Running a task involves running a specific statement for each defined chunk of work. The documentation only shows examples using updates of the base table, but this is not the only use of this functionality. The statement associated with the task can be a procedure call, as shown in one of the examples at the end of the article.

There are two ways to run a task and several procedures to control a running task.

2.6.1 RUN_TASK

The RUN_TASK procedure runs the specified statement in parallel by scheduling jobs to process the workload chunks. The statement specifying the actual work to be done must include a reference to the ':start_id' and ':end_id' placeholders, which represent a range of rowids or column IDs to be processed, as specified in the chunk definitions. The degree of parallelism is controlled by the number of scheduled jobs, not the number of chunks defined. The scheduled jobs take an unassigned workload chunk, process it, then move on to the next unassigned chunk.

DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
END;
/

The RUN_TASK procedure waits for the task to complete. On completion, the status of the task must be assessed to know what action to take next.
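
When the task was chunked by a number column, or by an SQL query with by_rowid => FALSE, the same :start_id and :end_id placeholders carry column values instead of rowids, so the predicate has to use that column. A minimal sketch, assuming test_task is currently chunked on TEST_TAB.ID:

```sql
DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  -- :start_id and :end_id are bound to each chunk's START_ID and END_ID,
  -- so the WHERE clause filters on the ID column rather than on rowid.
  l_sql_stmt := 'UPDATE test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  t.id BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
END;
/
```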

2.6.2 User-defined framework

The DBMS_PARALLEL_EXECUTE package allows you to manually code the task run. The GET_ROWID_CHUNK and GET_NUMBER_COL_CHUNK procedures return the next available unassigned chunk. You can then manually process the chunk and set its status. The example below shows the processing of a workload chunked by rowid.

DECLARE
  l_sql_stmt    VARCHAR2(32767);
  l_chunk_id    NUMBER;
  l_start_rowid ROWID;
  l_end_rowid   ROWID;
  l_any_rows    BOOLEAN;
BEGIN
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_rowid_chunk(task_name   => 'test_task',
                                          chunk_id    => l_chunk_id,
                                          start_rowid => l_start_rowid,
                                          end_rowid   => l_end_rowid,
                                          any_rows    => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;

    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_rowid, l_end_rowid;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;
END;
/

2.6.3 Task control

A running task can be stopped and restarted using the STOP_TASK and RESUME_TASK procedures respectively.

The PURGE_PROCESSED_CHUNKS procedure deletes all chunks with a status of 'PROCESSED' or 'PROCESSED_WITH_ERROR'.

The ADM_DROP_CHUNKS, ADM_DROP_TASK, ADM_TASK_STATUS and ADM_STOP_TASK routines have the same function as their namesakes, but they allow the operations to be performed on tasks owned by other users. In order to use these routines the user must have been granted the ADM_PARALLEL_EXECUTE_TASK role.

2.7 Check the task status

The simplest way to check the status of a task is to use the TASK_STATUS function. After execution of the task, the only possible return values are the 'FINISHED' or 'FINISHED_WITH_ERROR' constants. If the status is not 'FINISHED', then the task can be resumed using the RESUME_TASK procedure.

DECLARE
  l_try    NUMBER;
  l_status NUMBER;
BEGIN
  -- If there is an error, RESUME it at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');

  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task('test_task');
    l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  END LOOP;
END;
/
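
TASK_STATUS returns a number, so it can be convenient to translate it back into a readable name by comparing it against the package constants. The block below is a small sketch of that idea; it covers only some of the status constants and falls back to printing the raw number for anything else. Run SET SERVEROUTPUT ON first to see the output.

```sql
DECLARE
  l_status NUMBER;
BEGIN
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');

  -- Compare against the package constants rather than hard-coded numbers.
  DBMS_OUTPUT.put_line(
    CASE l_status
      WHEN DBMS_PARALLEL_EXECUTE.CREATED             THEN 'CREATED'
      WHEN DBMS_PARALLEL_EXECUTE.CHUNKED             THEN 'CHUNKED'
      WHEN DBMS_PARALLEL_EXECUTE.PROCESSING          THEN 'PROCESSING'
      WHEN DBMS_PARALLEL_EXECUTE.FINISHED            THEN 'FINISHED'
      WHEN DBMS_PARALLEL_EXECUTE.FINISHED_WITH_ERROR THEN 'FINISHED_WITH_ERROR'
      WHEN DBMS_PARALLEL_EXECUTE.CRASHED             THEN 'CRASHED'
      ELSE 'OTHER (' || l_status || ')'
    END);
END;
/
```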

The status of the task and the chunks can also be queried.

COLUMN task_name FORMAT A10

SELECT task_name,
       status
FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  FINISHED

If there were errors, the chunks can be queried to identify the problems.

SELECT status, COUNT(*)
FROM   user_parallel_execute_chunks
GROUP BY status
ORDER BY status;

STATUS                 COUNT(*)

-------------------- ----------

PROCESSED_WITH_ERROR          3
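
The count shows that something failed but not why. The chunk view also records the error raised for each failed chunk, so a follow-up query along these lines (a sketch, assuming the task is still named test_task) lists the failing ranges together with their error code and message:

```sql
-- List each failed chunk with the error recorded for it.
SELECT chunk_id,
       start_id,
       end_id,
       error_code,
       error_message
FROM   user_parallel_execute_chunks
WHERE  task_name = 'test_task'
AND    status    = 'PROCESSED_WITH_ERROR'
ORDER BY chunk_id;
```

For a task chunked by rowid, select start_rowid and end_rowid instead of start_id and end_id.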

The [DBA|USER]_PARALLEL_EXECUTE_TASKS views contain a record of the JOB_PREFIX used when scheduling the chunks of work.

SELECT job_prefix
FROM   user_parallel_execute_tasks
WHERE  task_name = 'test_task';

JOB_PREFIX

------------------------------

TASK$_368

This value can be used to query information about the individual jobs used during the process. The number of jobs scheduled should match the degree of parallelism specified in the RUN_TASK procedure.

COLUMN job_name FORMAT A20

SELECT job_name, status
FROM   user_scheduler_job_run_details
WHERE  job_name LIKE (SELECT job_prefix || '%'
                      FROM   user_parallel_execute_tasks
                      WHERE  task_name = 'test_task');

JOB_NAME             STATUS
-------------------- ------------------------------
TASK$_205_3          SUCCEEDED
TASK$_205_9          SUCCEEDED
TASK$_205_5          SUCCEEDED
TASK$_205_7          SUCCEEDED
TASK$_205_1          SUCCEEDED
TASK$_205_2          SUCCEEDED
TASK$_205_6          SUCCEEDED
TASK$_205_8          SUCCEEDED
TASK$_205_4          SUCCEEDED
TASK$_205_10         SUCCEEDED

2.8 Drop the task

Once the job is complete you can drop the task, which will also drop the associated chunk information.

BEGIN
  DBMS_PARALLEL_EXECUTE.drop_task('test_task');
END;
/

3. Examples

3.1 Test 1

The following example shows the processing of a workload chunked by rowid.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => 'ICD',
                                               table_name  => 'TEST_TAB',
                                               by_row      => TRUE,
                                               chunk_size  => 10000);

  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);

  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

3.2 Test 2

The following example shows the processing of a workload chunked by a number column. Notice that the workload is actually a stored procedure in this case.

CREATE OR REPLACE PROCEDURE process_update (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
  UPDATE test_tab t
  SET    t.num_col = t.num_col + 10
  WHERE  id BETWEEN p_start_id AND p_end_id;
END;
/

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => l_task,
                                                    table_owner  => 'ICD',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);

  l_sql_stmt := 'BEGIN process_update(:start_id, :end_id); END;';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it at most 2 times.
  l_try    := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);

  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

3.3 Test 3

The following example shows a workload chunked by an SQL statement and processed by a user-defined framework.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_stmt     CLOB;
  l_sql_stmt VARCHAR2(32767);
  l_chunk_id NUMBER;
  l_start_id NUMBER;
  l_end_id   NUMBER;
  l_any_rows BOOLEAN;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);

  l_sql_stmt := 'UPDATE test_tab t
                 SET    t.num_col = t.num_col
                 WHERE  num_col BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_number_col_chunk(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               start_id  => l_start_id,
                                               end_id    => l_end_id,
                                               any_rows  => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;

    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_id, l_end_id;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/
