EXPDP/IMPDP 中的并行度PARALLEL参数

来源：互联网发布：linux更改管理员密码编辑：程序博客网时间：2024/06/06 02:22

如果设置 EXPDP parallel=4 必须要设置4个EXPDP文件，不然PARALLEL是有问题的，同时EXPDP会使用一个WORKER进程导出METADATA，其他WORKER进程会同时出数据。如果EXPDP作业低于250M，只会启动一个WORKER进程。如果是500M会启动2个，1000M及会启动4个WOKER进程。一般来说加上%U来设置多个文件。

而IMPDP有所不同，会先启动一个WOKER进程METADATA导入,然后启动多个WORKER进程导入，所以再前期只会看到WOKER在导入METADATA，而且IMPDP如果PARALLE=4也需要不少于4个DMP文件，也可以使用%U来进行导入。

nohup expdp system/**** PARALLEL=2 JOB_NAME=full_bak_job full=y dumpfile=exptest:back_%U.dmp logfile=exptest:back.log &

impdp system/*** PARALLEL=2 EXCLUDE=STATISTICS JOB_NAME=full_imp cluster=no full=y dumpfile=test:back_%U.dmp logfile=test:back_imp.log;

而在11GR2后EXPDP和IMDP的WORKER进程会在多个INSTANCE启动，所以DIRECTORY必须在共享磁盘上，如果没有设置共享磁盘还是指定cluster=no 来防止报错。

当观察EXPDP/IMPDP woker的时候如下：

Import> status

Job: FULL_IMP
Operation: IMPORT
Mode: FULL
State: EXECUTING
Bytes Processed: 150,300,713,536
Percent Done: 80
Current Parallelism: 6
Job Error Count: 0
Dump File: /expdp/back_%u.dmp
Dump File: /expdp/back_01.dmp
Dump File: /expdp/back_02.dmp
Dump File: /expdp/back_03.dmp
Dump File: /expdp/back_04.dmp
Dump File: /expdp/back_05.dmp
Dump File: /expdp/back_06.dmp
Dump File: /expdp/back_07.dmp
Dump File: /expdp/back_08.dmp

Worker 1 Status:
Process Name: DW00
State: EXECUTING
Object Schema: ACRUN
Object Name: T_PLY_UNDRMSG
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 3
Completed Rows: 3,856,891
Completed Bytes: 1,134,168,200
Percent Done: 83
Worker Parallelism: 1

Worker 2 Status:
Process Name: DW01
State: EXECUTING
Object Schema: ACRUN
Object Name: T_FIN_PAYDUE
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 5
Completed Rows: 2,646,941
Completed Bytes: 1,012,233,224
Percent Done: 93
Worker Parallelism: 1

Worker 3 Status:
Process Name: DW02
State: EXECUTING
Object Schema: ACRUN
Object Name: MLOG$_T_FIN_CLMDUE
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 6
Completed Bytes: 382,792,584
Worker Parallelism: 1

Worker 4 Status:
Process Name: DW03
State: EXECUTING
Object Schema: ACRUN
Object Name: T_PAY_CONFIRM_INFO
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 5
Completed Rows: 2,443,790
Completed Bytes: 943,310,104
Percent Done: 83
Worker Parallelism: 1

Worker 5 Status:
Process Name: DW04
State: EXECUTING
Object Schema: ACRUN
Object Name: T_PLY_TGT
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 6
Completed Rows: 2,285,353
Completed Bytes: 822,501,496
Percent Done: 64
Worker Parallelism: 1

Worker 6 Status:
Process Name: DW05
State: EXECUTING
Object Schema: ACRUN
Object Name: T_FIN_PREINDRCT_CLMFEE
Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
Completed Objects: 5
Completed Rows: 6,042,384
Completed Bytes: 989,435,088
Percent Done: 79
Worker Parallelism: 1

英文如下：

For Data Pump Export, the value that is specified for the parallel parameter should be less than or equal to the number of files in the dump file set. Each worker or Parallel Execution Process requires exclusive access to the dump file, so having fewer dump files than the degree of parallelism will mean that some workers or PX processes will be unable to write the information they are exporting. If this occurs, the worker processes go into an idle state and will not be doing any work until more files are added to the job. See the explanation of the DUMPFILE parameter in the Database Utilities guide for details on how to specify multiple dump files for a Data Pump export job.
For Data Pump Import, the workers and PX processes can all read from the same files. However, if there are not enough dump files, the performance may not be optimal because multiple threads of execution will be trying to access the same dump file. The performance impact of multiple processes sharing the dump files depends on the I/O subsystem containing the dump files. For this reason, Data Pump Import should not have a value for the PARALLEL parameter that is significantly larger than the number of files in the dump file set.

In a typical export that includes both data and metadata, the first worker process will unload the metadata: tablespaces, schemas, grants, roles, tables, indexes, and so on. This single worker unloads the metadata, and all the rest unload the data, all at the same time. If the metadata worker finishes and there are still data objects to unload, it will start unloading the data too. The examples in this document assume that there is always one worker busy unloading metadata while the rest of the workers are busy unloading table data objects.

If the external tables method is chosen, Data Pump will determine the maximum number of PX processes that can work on a table data object. It does this by dividing the estimated size of the table data object by 250 MB and rounding the result down. If the result is zero or one, then PX processes are not used to unload the table

The PARALLEL parameter works a bit differently in Import than Export. Because there are various dependencies that exist when creating objects during import, everything must be done in order. For Import, no data loading can occur until the tables are created because data cannot be loaded into tables that do not yet exist

0 0