The merge statement in oracle 9i

来源:互联网 发布:ipad淘宝版本 编辑:程序博客网 时间:2024/05/07 17:14

This article introduces the new MERGE SQL command (sometimes referred to as "UPSERT"). MERGE is a DML command that enables us to optionally update or insert data into a target table, depending on whether matching records already exist. In versions prior to 9i, we would have to code this scenario either in separate bulk SQL statements or in PL/SQL. We will compare MERGE to these methods later in this article.

getting started: sample data

Before we begin, we will create two scratch tables to work with; a source table and a target table. We will start by creating the source table as a selection of ALL_OBJECTS data.

SQL> CREATE TABLE source_table  2  NOLOGGING  3  AS  4     SELECT ROWNUM AS object_id  5     ,      object_name  6     ,      object_type  7     FROM   all_objects;Table created.

To add some volume and some data with different characteristics, we will add to our source table as follows.

SQL> INSERT /*+ APPEND */ INTO source_table  2  SELECT ROWNUM + (COUNT(*) OVER () * 5) AS object_id  3  ,      LOWER(object_name)              AS object_name  4  ,      SUBSTR(object_type,1,1)         AS object_type  5  FROM   all_objects;28963 rows created.SQL> COMMIT;Commit complete.SQL> INSERT /*+ APPEND */ INTO source_table  2  SELECT ROWNUM + (COUNT(*) OVER() * 10) AS object_id  3  ,      INITCAP(object_name)            AS object_name  4  ,      SUBSTR(object_type,-1)          AS object_type  5  FROM   all_objects;28963 rows created.SQL> COMMIT;Commit complete.

Finally, we will add a primary key as follows.

SQL> ALTER TABLE source_table ADD PRIMARY KEY (object_id);Table altered.

Our target table will be a 50% sample of the source table. Like the source table, it will also have a primary key on OBJECT_ID.

SQL> CREATE TABLE target_table  2  NOLOGGING  3  AS  4     SELECT *  5     FROM   source_table SAMPLE (50);Table created.SQL> ALTER TABLE target_table ADD PRIMARY KEY (object_id);Table altered.

the merge statement

We can now see an example of the MERGE statement. In the following example, we will merge the source table into the target table. We will capture a count of the target table rows before and after the merge.

SQL> SELECT COUNT(*) FROM target_table;  COUNT(*)----------     434851 row selected.SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_id = tgt.object_id )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET   tgt.object_name = src.object_name  9     ,     tgt.object_type = src.object_type 10  WHEN NOT MATCHED 11  THEN 12     INSERT ( tgt.object_id 13            , tgt.object_name 14            , tgt.object_type ) 15     VALUES ( src.object_id 16            , src.object_name 17            , src.object_type );86889 rows merged.

The syntax at first looks a little daunting, but if we read through from top to bottom, it is quite intuitive. Note the following clauses:

  • MERGE (line 1): as stated previously, this is now the 4th DML statement in Oracle. Any hints we might wish to add directly follow this keyword (i.e. MERGE /*+ HINT */);
  • INTO (line 2): this is how we specify the target for the MERGE. The target must be either a table or an updateable view (an in-line view cannot be used here);
  • USING (line 3): the USING clause represents the source dataset for the MERGE. This can be a single table (as in our example) or an in-line view;
  • ON () (line 4): the ON clause is where we supply the join between the source dataset and target table. Note that the join conditions must be in parentheses;
  • WHEN MATCHED (line 5): this clause is where we instruct Oracle on what to do when we already have a matching record in the target table (i.e. there is a join between the source and target datasets). We obviously want an UPDATE in this case. One of the restrictions of this clause is that we cannot update any of the columns used in the ON clause (though of course we don't need to as they already match). Any attempt to include a join column will raise an unintuitive invalid identifier exception; and
  • WHEN NOT MATCHED (line 10): this clause is where we INSERT records for which there is no current match.

Note that sqlplus reports the number of rows merged. This includes both the updates and inserts. Oracle treats MERGE as a MERGE and not an UPDATE+INSERT statement. The same is true of SQL%ROWCOUNT in PL/SQL.

As a rough sanity check, we can report the record count in the target table following the MERGE. We can see that this is the same as the MERGE count.

SQL> SELECT COUNT(*) FROM target_table;  COUNT(*)----------     868891 row selected.

merge performance considerations

MERGE is useful for combining larger source and target datasets, particularly for slowly-changing dimensions in data warehouses. If we explain a MERGE statement, we can see the mechanism Oracle uses to determine whether source and target rows match. The following output is an Autotrace explain plan for our original MERGE statement from above.

Execution Plan----------------------------------------------------------   0      MERGE STATEMENT Optimizer=CHOOSE (Cost=194 Card=86889 Bytes=9557790)   1    0   MERGE OF 'TARGET_TABLE'   2    1     VIEW   3    2       HASH JOIN (OUTER) (Cost=194 Card=86889 Bytes=7038009)   4    3         TABLE ACCESS (FULL) OF 'SOURCE_TABLE' (Cost=46 Card=86889 Bytes=2867337)   5    3         TABLE ACCESS (FULL) OF 'TARGET_TABLE' (Cost=24 Card=18950 Bytes=909600)

We can see that Oracle performs an outer join between the source dataset and target table (in our case we happen to have a hash join). This creates a non-mergeable view (this is an unfortunate coincidence in terminology) that is applied back to the target table. Without the outer join, Oracle would need to implement a "two-pass" solution such as we might code ourselves with a separate INSERT and UPDATE statement.

Major performance gains will be achieved by tuning the source-target join (for example, using indexes, hints, partitioning etc) or by tuning the USING clause if it is a complex in-line view. In addition, we can achieve some minor gains by ordering the WHEN clauses in a MERGE according to which event is the most likely (i.e. most frequent first), although at our sample data volume, the effects of this are not generally noticeable.

merge vs bulk insert + update

We can compare the overall performance of MERGE against a pre-9i SQL solution. In the following example, we will use a variation of Tom Kyte's RUNSTATS utility to compare our original MERGE with a solution that runs a bulk update followed by a bulk insert. We will pause the statistics between the two runs to reset the data. We will begin with the MERGE.

SQL> exec runstats_pkg.rs_start();PL/SQL procedure successfully completed.SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_id = tgt.object_id )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET   tgt.object_name = src.object_name  9     ,     tgt.object_type = src.object_type 10  WHEN NOT MATCHED 11  THEN 12     INSERT ( tgt.object_id 13            , tgt.object_name 14            , tgt.object_type ) 15     VALUES ( src.object_id 16            , src.object_name 17            , src.object_type );86889 rows merged.

Now we will pause runstats, rollback the merged data to reset it, and resume our resource snapshots.

SQL> exec runstats_pkg.rs_pause();PL/SQL procedure successfully completed.SQL> ROLLBACK;Rollback complete.SQL> exec runstats_pkg.rs_resume();PL/SQL procedure successfully completed.

We will now run a bulk update and insert as separate SQL statements. The update is written as an updateable in-line view which is often the fastest technique for bulk updating one table from another. The sqlplus feedback gives us the breakdown of the previous merge rowcount.

SQL> UPDATE ( SELECT src.object_name AS src_name  2           ,      src.object_type AS src_type  3           ,      tgt.object_name AS tgt_name  4           ,      tgt.object_type AS tgt_type  5           FROM   source_table src  6           ,      target_table tgt  7           WHERE  src.object_id = tgt.object_id )  8  SET tgt_name = src_name  9  ,   tgt_type = src_type;43485 rows updated.SQL> INSERT INTO target_table tgt  2  SELECT *  3  FROM   source_table src  4  WHERE  NOT EXISTS ( SELECT NULL  5                      FROM   target_table tgt  6                      WHERE  src.object_id = tgt.object_id );43404 rows created.

Finally we can report the differences in time and resources with runstats. We will only report major differences to keep the output to a minimum, as follows.

SQL> exec runstats_pkg.rs_stop(1000);Run1 ran in 1239 hsecsRun2 ran in 522 hsecsRun1 ran in 237.36% of the timeName                                  Run1        Run2        DiffSTAT..hot buffers moved to hea           2       1,024       1,022LATCH.simulator hash latch           5,423       7,820       2,397LATCH.checkpoint queue latch        12,435       8,966      -3,469STAT..session pga memory max         7,904           0      -7,904STAT..redo entries                  87,326      44,564     -42,762LATCH.redo allocation               87,793      44,871     -42,922LATCH.cache buffers chains         580,501     499,486     -81,015STAT..db block changes             175,708      89,825     -85,883STAT..db block gets                132,934      46,622     -86,312STAT..index fetch by key                 0      86,889      86,889STAT..table scan rows gotten       130,374     217,263      86,889STAT..consistent gets - examin         230      87,120      86,890STAT..buffer is not pinned cou         978      88,317      87,339STAT..no work - consistent rea         969      88,312      87,343STAT..session logical reads        134,389     222,507      88,118STAT..consistent gets                1,455     175,885     174,430STAT..redo size                 23,369,732  15,606,560  -7,763,172Run1 latches total versus run2 -- difference and pctRun1        Run2        Diff        Pct693,976     568,252    -125,724    122.12%PL/SQL procedure successfully completed.

We can see that MERGE performed less well than our two-part SQL solution; with it taking over twice as long. It generated more redo and used more latches. We can repeat the test against a typical PL/SQL-coded merge (common in older applications). We will replace the two-part SQL solution with a PL/SQL loop that will attempt an update first and insert only if the update affects no rows. The alternative to this would be to insert first and only update when a DUP_VAL_ON_INDEX exception was raised (a primary or unique key is required for this to work).

SQL> exec runstats_pkg.rs_start();PL/SQL procedure successfully completed.SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_id = tgt.object_id )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET   tgt.object_name = src.object_name  9     ,     tgt.object_type = src.object_type 10  WHEN NOT MATCHED 11  THEN 12     INSERT ( tgt.object_id 13            , tgt.object_name 14            , tgt.object_type ) 15     VALUES ( src.object_id 16            , src.object_name 17            , src.object_type );86889 rows merged.SQL> exec runstats_pkg.rs_pause();PL/SQL procedure successfully completed.SQL> ROLLBACK;Rollback complete.SQL> exec runstats_pkg.rs_resume();PL/SQL procedure successfully completed.SQL> DECLARE  2     i PLS_INTEGER := 0;  3     u PLS_INTEGER := 0;  4  BEGIN  5     FOR r IN ( SELECT * FROM source_table )  6     LOOP  7  8        UPDATE target_table tgt  9        SET    tgt.object_name = r.object_name 10        ,      tgt.object_type = r.object_type 11        WHERE  tgt.object_id = r.object_id; 12 13        u := u + SQL%ROWCOUNT; 14 15        IF SQL%ROWCOUNT = 0 THEN 16           INSERT INTO target_table 17              ( object_id, object_name, object_type ) 18           VALUES 19              ( r.object_id, r.object_name, r.object_type ); 20           i := i + 1; 21        END IF; 22 23     END LOOP; 24 25     DBMS_OUTPUT.PUT_LINE( u || ' rows updated.' ); 26     DBMS_OUTPUT.PUT_LINE( i || ' rows inserted.' ); 27  END; 28  /43485 rows updated.43404 rows inserted.PL/SQL procedure successfully completed.

We can now report the differences between the MERGE and the PL/SQL solution.

SQL> exec runstats_pkg.rs_stop(1000);Run1 ran in 1197 hsecsRun2 ran in 2617 hsecsRun1 ran in 45.74% of the timeName                                  Run1        Run2        DiffSTAT..CPU used by this session       1,087       2,283       1,196STAT..CPU used when call start       1,087       2,283       1,196STAT..hot buffers moved to hea          13       1,299       1,286LATCH.cache buffers lru chain        2,742       4,825       2,083STAT..recursive cpu usage                2       2,137       2,135LATCH.checkpoint queue latch         8,576      15,028       6,452LATCH.simulator hash latch           5,406      18,595      13,189STAT..shared hash latch upgrad           8      43,092      43,084STAT..redo entries                  87,340     130,585      43,245LATCH.redo allocation               87,790     131,215      43,425STAT..table scan rows gotten       130,374      86,889     -43,485STAT..db block gets                132,942     177,148      44,206STAT..Cached Commit SCN refere         286      56,854      56,568STAT..table scan blocks gotten         951      86,889      85,938STAT..buffer is not pinned cou         975      86,924      85,949STAT..db block changes             175,740     262,562      86,822STAT..index fetch by key                 0      86,891      86,891STAT..no work - consistent rea         967     129,991     129,024STAT..execute count                     13     130,338     130,325STAT..calls to get snapshot sc          28     130,355     130,327STAT..consistent gets - examin         230     130,962     130,732STAT..recursive calls                  567     218,074     217,507STAT..consistent gets                1,469     260,988     259,519LATCH.library cache pin                119     261,164     261,045LATCH.library cache                    166     261,675     261,509STAT..session logical reads        134,411     438,136     303,725LATCH.cache buffers chains         579,034   1,190,456     611,422STAT..redo size                 23,372,796  32,740,676   9,367,880Run1 latches total versus run2 -- difference and pctRun1        Run2        Diff        Pct687,702   1,889,744   1,202,042     36.39%PL/SQL procedure successfully completed.

It is probably no surprise that the MERGE was significantly faster than the PL/SQL solution. We could speed up the latter by using bulk processing, but we wouldn't be able to achieve a reduction of two-thirds required to match the MERGE.

merge and key preservation

MERGE is a deterministic, key-preserved operation. This means that for each source row, Oracle needs to be able to identify a single target record for update. The simplest method of ensuring that the MERGE is key-preserved is to join source and target according to the primary key of the target. We can demonstrate what happens if we cannot ensure key-preservation by modifying our MERGE to join on a column other than that of the primary key.

SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_name = tgt.object_name )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET    tgt.object_type = src.object_type  9  WHEN NOT MATCHED 10  THEN 11     INSERT ( tgt.object_id 12            , tgt.object_name 13            , tgt.object_type ) 14     VALUES ( src.object_id 15            , src.object_name 16            , src.object_type );   USING source_table src         *ERROR at line 3:ORA-30926: unable to get a stable set of rows in the source tables

This error message is slightly confusing. It has been raised because Oracle found more than one target row that matched a single source row. MERGE will allow multiple updates of a single target row, however, as long as the join is key-preserved. In the following example, we'll remove the primary key from the source table and duplicate some source data. We will revert to the key-preserved join and this time our MERGE should be successful.

SQL> ALTER TABLE source_table DROP PRIMARY KEY;Table altered.SQL> INSERT INTO source_table  2  SELECT *  3  FROM   target_table  4  WHERE  ROWNUM <= 10000;10000 rows created.SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_id = tgt.object_id )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET    tgt.object_type = src.object_type  9     ,      tgt.object_name = src.object_name 10  WHEN NOT MATCHED 11  THEN 12     INSERT ( tgt.object_id 13            , tgt.object_name 14            , tgt.object_type ) 15     VALUES ( src.object_id 16            , src.object_name 17            , src.object_type );96889 rows merged.

This simply means that a single target row might be updated multiple times, which could give rise to some data quality issues (for example, if compound arithmetic such as summing is included in the update of some columns).

merge in pl/sql

MERGE can be used with in PL/SQL (even row-by-row) but should generally be considered a set operation. The following example shows how we might use MERGE with FORALL.

SQL> DECLARE  2  3     TYPE aat_id IS TABLE OF source_table.object_id%TYPE  4        INDEX BY PLS_INTEGER;  5     TYPE aat_name IS TABLE OF source_table.object_name%TYPE  6        INDEX BY PLS_INTEGER;  7     TYPE aat_type IS TABLE OF source_table.object_type%TYPE  8        INDEX BY PLS_INTEGER;  9 10     aa_ids   aat_id; 11     aa_names aat_name; 12     aa_types aat_type; 13 14     CURSOR c_source IS 15        SELECT object_id 16        ,      object_name 17        ,      object_type 18        FROM   source_table; 19 20     n PLS_INTEGER := 0; 21 22  BEGIN 23 24     OPEN c_source; 25     LOOP 26        FETCH c_source BULK COLLECT INTO aa_ids, aa_names, aa_types LIMIT 1000; 27 28        --[some processing]-- 29 30        FORALL i IN 1 .. aa_ids.COUNT 31           MERGE 32              INTO  target_table tgt 33              USING ( 34                     SELECT aa_ids(i) AS object_id 35                     ,      aa_names(i) AS object_name 36                     ,      aa_types(i) AS object_type 37                     FROM   dual 38                    ) src 39              ON   ( src.object_id = tgt.object_id ) 40           WHEN MATCHED 41           THEN 42              UPDATE 43              SET   tgt.object_name = src.object_name 44              ,     tgt.object_type = src.object_type 45           WHEN NOT MATCHED 46           THEN 47              INSERT ( tgt.object_id 48                     , tgt.object_name 49                     , tgt.object_type ) 50              VALUES ( src.object_id 51                     , src.object_name 52                     , src.object_type ); 53 54        n := n + SQL%ROWCOUNT; 55 56        EXIT WHEN c_source%NOTFOUND; 57     END LOOP; 58     CLOSE c_source; 59 60     DBMS_OUTPUT.PUT_LINE(n || ' rows merged.'); 61  END; 62  /86889 rows merged.PL/SQL procedure successfully completed.

We can see that this is quite cumbersome. To use MERGE with either row-by-row PL/SQL or FORALL (as we saw above), each source record must be selected from DUAL to generate a rowsource. We cannot use the variables in direct assignments within the update or insert sections of the MERGE itself. This will generally perform better than row-by-row "insert-else-update" or "update-else-insert", but it is not generally recommended as an approach for reasonable data volumes.

merge and triggers

The MERGE statement will cause before and after event triggers to fire. Ironically, we have stated more than once that MERGE is a DML statement in its own right, yet Oracle will fire UPDATE and INSERT triggers if these events occur within the merge. We can demonstrate this quite easily below. We will create separate insert and update triggers and simply output a message from each.

SQL> CREATE OR REPLACE TRIGGER insert_trigger  2     AFTER INSERT  3     ON    target_table  4  BEGIN  5     DBMS_OUTPUT.PUT_LINE('Insert trigger...');  6  END;  7  /Trigger created.SQL> CREATE OR REPLACE TRIGGER update_trigger  2     AFTER UPDATE  3     ON    target_table  4  BEGIN  5     DBMS_OUTPUT.PUT_LINE('Update trigger...');  6  END;  7  /Trigger created.SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_id = tgt.object_id )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET   tgt.object_name = src.object_name  9     ,     tgt.object_type = src.object_type 10  WHEN NOT MATCHED 11  THEN 12     INSERT ( tgt.object_id 13            , tgt.object_name 14            , tgt.object_type ) 15     VALUES ( src.object_id 16            , src.object_name 17            , src.object_type );Update trigger...Insert trigger...86889 rows merged.

As an aside, we can compile a MERGE trigger, but it doesn't seem to do anything (and the documentation doesn't contain any information on this either).

SQL> DROP TRIGGER insert_trigger;Trigger dropped.SQL> DROP TRIGGER update_trigger;Trigger dropped.SQL> CREATE OR REPLACE TRIGGER merge_trigger  2     AFTER MERGE  3     ON    target_table  4  BEGIN  5     DBMS_OUTPUT.PUT_LINE('Merge trigger...');  6  END;  7  /Trigger created.SQL> MERGE  2     INTO  target_table tgt  3     USING source_table src  4     ON  ( src.object_id = tgt.object_id )  5  WHEN MATCHED  6  THEN  7     UPDATE  8     SET   tgt.object_name = src.object_name  9     ,     tgt.object_type = src.object_type 10  WHEN NOT MATCHED 11  THEN 12     INSERT ( tgt.object_id 13            , tgt.object_name 14            , tgt.object_type ) 15     VALUES ( src.object_id 16            , src.object_name 17            , src.object_type );86889 rows merged.

further reading

For more background on MERGE, read the SQL Reference. For an example of how to break down a MERGE rowcount into separate update and insert counts, read this oracle-developer.net article. The oracle-developer.net variation of Tom Kyte's RUNSTATS utility can be downloaded here.

source code

The source code for the examples in this article can be downloaded from here.