Note 821722 - Performance of the join on MSEG and MKPF

来源:互联网 发布:淘宝皇冠店月收入多少 编辑:程序博客网 时间:2024/05/11 14:50

Summary

Symptom

The database times required to execute a join using MSEG and MKPF are high.

Other terms

column statistics; BUDAT; performance

Reason and Prerequisites


The system runs on Microsoft SQL Server 2000, SQL Server 2005, or SQL Server 2008.
The MKPF table contains data for material documents. There is a secondary index on the BUDAT column (posting date) that usually only contains values up to the current day. New entries are always attached at the end, according to this index. The new values are not yet entered in the column statistics on field BUDAT. SQL Server refreshes the statistics automatically, but only when a particular section of the table is changed.
Let us say, for example, the column statistics were last refreshed one month ago. In the last two weeks, 10,000 entries were entered in MKPF. If you now select all entries from MKPF with BUDAT in the last two weeks, the SQL Server Query Analyzer comes up with an estimate that is far too low based on the column statistics (just a few entries).
This incorrect estimate only has a major impact on performance if the optimizer decides on an access path with field BUDAT when a join using MKPF and MSET is executed. If a large number of entries is found, the nested loop takes a very long time.

Examples


A.The SQL statement looks as follows:
           SELECT ... FROM

           MKPF INNER JOIN MSEG ON MANDT AND MBLNR AND MJAHR

           WHERE MANDT = ...  AND WERKS =... AND MATNR =... AND BUDAT >=...


The join is processed with a nested loop. During this operation, the SQL Server Query Optimizer decides to access either table MKPF with an index using BUDAT, or table MSEG with an index on MATNR.
If you use MKPF for the access, the access plan has the following structure (this plan is mistakenly used if the statistics are too old):
           SELECT

           Nested Loops(Inner Join, OUTER REFERENCES:([T_00]. [MJAHR], [T_00].[MBLNR]))

           Index Seek(OBJECT:([MKPF~BUD] AS [T_00]),

           SEEK:([MANDT]= '100' AND [BUDAT] >= '20051015') )

           Clustered Index Seek(OBJECT:([MSEG~0] AS [T_01]),

           SEEK:([MANDT]='100' AND [MBLNR]=[T_00]. [MBLNR] AND [MJAHR]=[T_00].[MJAHR]),

           WHERE:([MATNR]= 'MR07105' AND [WERKS]='WS01'))


If you use MSEG for the access, the access plan has the following structure:
           SELECT

           Nested Loops(Inner Join, OUTER REFERENCES:([T_01]. [MJAHR], [T_01].[MBLNR]))

           Bookmark Lookup(BOOKMARK:([Bmk1001]), OBJECT:([MSEG] AS [T_01]))

           Index Seek(OBJECT:([MSEG~M] AS [T_01]),

           SEEK:([MANDT]='100' AND [MATNR]='MR07105' AND [WERKS]='WS01') )

           Clustered Index Seek(OBJECT:([MKPF~0] AS [T_00]),

           SEEK:([MANDT]='100' AND [MBLNR]=[T_01]. [MBLNR] AND [MJAHR]= [T_01].[MJAHR]),

           WHERE:([BUDAT]>='20051015'))


The first access path may be correct if the selection is restricted with a posting date that actually only has a few entries. It should be compared with the number of entries in table MSEG for the specified material and plant.

B. The system sometimes reads MKPF and MSEG even with an unselective WHERE condition. In this case, the runtime of the SQL statement is always long (several seconds). However, the join strategy selected can make a big difference to the runtime.
If a large number of data records is to be read, the MERGE join is the best strategy. Instead the SQL Server Query Analyzer selects a NESTED LOOP if it underestimates the number of entries with corresponding BUDAT.
Essentially, the access plan with NESTED LOOP appears as follows:
           SELECT

           Hash Match(HASH:([Expr1002])=([T_01].[WERKS]))

           Constant Scan

           Nested Loops(Inner Join:([T_00].[MBLNR],[T_00]. [MJAHR]))

           Bookmark Lookup([MKPF] AS [T_00])

           Index Seek([MKPF~BUD]),

           SEEK:([MANDT]='800' AND [BUDAT]>= '20031001'))

           Clustered Index Seek([MSEG~0] AS[T_01]),

           SEEK:([MANDT]='800' AND[MBLNR]=[T_00]. [MBLNR] AND [MJAHR]=[T_00].[MJAHR]),

           WHERE:([SOBKZ]=' ' AND [LGORT]<>' '))


The access plan with join MERGE has the following elements:
           SELECT

           Hash Match(HASH:([Expr1002])=([T_01].[WERKS]))

           Constant Scan

           Merge Join(Inner Join,

           MERGE:([T_00].[MBLNR],[T_00].[MJAHR])=([T_01].[MBLNR], [T_01]. [MJAHR]))

           Clustered Index Seek([MKPF~0] AS [T_00]),

           SEEK:([MANDT]='800'), WHERE:([BUDAT]>='20010101'))

           Clustered Index Seek([MSEG~0] AS [T_01]),

           SEEK:([MANDT]='800'),

           WHERE:(([LGORT]<' ' OR [LGORT]> ' ') AND [SOBKZ]=' '))


Since the dataset is large, a long runtime is unavoidable. Which join strategy is more appropriate depends on the individual case.

Solution


The statistics on column BUDAT of table MKPF must be updated. The optimizer can then estimate the costs more precisely for the access using index MKPF~BUD and decide on another access path, if necessary.
Regularly refresh the statistics on column BUDAT. You can do this manually in DB02 or you can schedule an SQL Server job:

    1. In transaction DB02, call "Detail Analysis" and enter "MKPF".
    2. Search for the row with the column statistics on BUDAT and make a note of the exact name ('_WA_Sys_BUDAT_...').
    (To implement the update manually, select the row with the column statistics on BUDAT and select "UPDATE". Select a statistic for this in the "sample" dialog box. The sample rate can be accepted as is.)
    3. In the SQL Server Enterprise Manager, create a job that contains the following command as a step:
    UPDATE STATISTICS MKPF [_WA_Sys_BUDAT_...]
    4. Schedule the job regularly. How regularly depends on when new entries are created in MKPF and how many. Estimate how long it will take for 5000 new entries to be made in MKPF. Begin with this value as a repeat interval for the job and then use tests to adapt the value to your system requirements.
  • If the calling program is a customer-specific program (Z program), you can use hints in ABAP to force the optimal access as an alternative to the statistics update (for example, with LOOP JOIN and INDEX hint). To do so, you are required to change the ABAP program. For more information, see Notes 129385 and 133381.


Make sure that you update only the column statistics as described in the section above, but not the statistics of all indexes of the table MKPF. Note that a statistics update results in a recompilation. This means that all execution plans for which the (column) statistics are relevant are indicated as invalid by the SQL Server and are regenerated during the next execution of the SQL Server optimizer. As a result, the corresponding runtime information in the SQL statement cache (sys.dm_exec_query_stats) may be deleted and important information that is required for the statement ANALYZE may be lost. If all of the indexes of the table MKPF are updated, the runtime information of all SQL statements that access the table MKPF may be lost. This also happens if SQL Server automatically updates statistics. However, this happens far less frequently than the forced update that is described in this note.

原创粉丝点击