Introduction to the Optimizer



This chapter discusses SQL processing, optimization methods, and how the optimizer chooses a specific plan to execute SQL.

The chapter contains the following sections:

  • Overview of SQL Processing
  • Overview of the Optimizer
  • Choosing an Optimizer Approach and Goal
  • Understanding the Cost-Based Optimizer
  • Understanding Access Paths for the CBO
  • Understanding Joins
  • Setting Cost-Based Optimizer Parameters
  • Overview of the Extensible Optimizer

Overview of SQL Processing

SQL processing uses the following main components to execute a SQL query:

  • The Parser performs both syntax and semantic analysis of the statement.
  • The Optimizer uses costing methods (the cost-based optimizer, or CBO) or internal rules (the rule-based optimizer, or RBO) to determine the most efficient way of producing the result of the query.
  • The Row Source Generator receives the optimal plan from the optimizer and outputs the execution plan for the SQL statement.
  • The SQL Execution Engine operates on the execution plan associated with a SQL statement and then produces the results of the query.

Figure 1-1 illustrates SQL processing.

Figure 1-1 SQL Processing Overview


Overview of the Optimizer

The optimizer determines the most efficient way to execute a SQL statement after considering many factors related to the objects referenced and the conditions specified in the query. This determination is an important step in the processing of any SQL statement and can greatly affect execution time.

A SQL statement can be executed in many different ways, including the following:

  • Full table scans
  • Index scans
  • Nested loops
  • Hash joins


    Note:

    The optimizer might not make the same decisions from one version of Oracle to the next. In recent versions, the optimizer might make different decisions, because better information is available.


The output from the optimizer is a plan that describes an optimum method of execution. The Oracle server provides cost-based (CBO) and rule-based (RBO) optimization. In general, use the cost-based approach. Oracle Corporation is continually improving the CBO, and new features require the CBO.


Note:

Oracle Corporation strongly advises the use of cost-based optimization. Rule-based optimization is available for backward compatibility with legacy applications and will be deprecated in a future release.


You can influence the optimizer's choices by setting the optimizer approach and goal, and by gathering representative statistics for the CBO. The optimizer goal is either throughput or response time. See "Choosing an Optimizer Approach and Goal".

Sometimes, the application designer, who has more information about a particular application's data than is available to the optimizer, can choose a more effective way to execute a SQL statement. The application designer can use hints in SQL statements to specify how the statement should be executed.

See Also:
  • "Choosing an Optimizer Approach and Goal" for more information on optimization goals
  • Chapter 3, "Gathering Optimizer Statistics" for information on gathering and using statistics
  • Chapter 5, "Optimizer Hints" for more information about using hints in SQL statements

Features that Require the CBO

The following features require use of the CBO:

  • Partitioned tables and indexes
  • Index-organized tables
  • Reverse key indexes
  • Function-based indexes
  • SAMPLE clauses in a SELECT statement
  • Parallel query and parallel DML
  • Star transformations and star joins
  • Extensible optimizer
  • Query rewrite with materialized views
  • Enterprise Manager progress meter
  • Hash joins
  • Bitmap indexes and bitmap join indexes
  • Index skip scans


    Note:

    Using any of these features enables the CBO, even if the parameter OPTIMIZER_MODE is set to RULE.


Optimizer Operations

For any SQL statement processed by Oracle, the optimizer performs the operations listed in Table 1-1.

Table 1-1  Optimizer Operations
Operation Description

Evaluation of expressions and conditions

The optimizer first evaluates expressions and conditions containing constants as fully as possible. See "How the Optimizer Performs Operations".

Statement transformation

For complex statements involving, for example, correlated subqueries or views, the optimizer might transform the original statement into an equivalent join statement. See "How the Optimizer Transforms SQL Statements".

Choice of optimizer approaches

The optimizer chooses either a cost-based or rule-based approach and determines the goal of optimization. See "Choosing an Optimizer Approach and Goal".

Choice of access paths

For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain table data. See "Understanding Access Paths for the CBO".

Choice of join orders

For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on. See "How the CBO Chooses Execution Plans for Join Types".

Choice of join methods

For any join statement, the optimizer chooses an operation to use to perform the join. See "How the CBO Chooses the Join Method".

Choosing an Optimizer Approach and Goal

By default, the goal of the CBO is the best throughput. This means that it chooses the least amount of resources necessary to process all rows accessed by the statement. Oracle can also optimize a statement with the goal of best response time. This means that it uses the least amount of resources necessary to process the first row accessed by a SQL statement.

See Also:

"How the CBO Optimizes SQL Statements for Fast Response"

The execution plan produced by the optimizer can vary depending on the optimizer's goal. Optimizing for best throughput is more likely to result in a full table scan rather than an index scan, or a sort merge join rather than a nested loop join. Optimizing for best response time usually results in an index scan or a nested loop join.

For example, suppose you have a join statement that can be executed with either a nested loops operation or a sort merge operation. The sort merge operation might return the entire query result faster, while the nested loops operation might return the first row faster. If your goal is to improve throughput, then the optimizer is more likely to choose a sort merge join. If your goal is to improve response time, then the optimizer is more likely to choose a nested loop join.

Choose a goal for the optimizer based on the needs of your application:

  • For applications performed in batch, such as Oracle Reports applications, optimize for best throughput. Usually, throughput is more important in batch applications, because the user initiating the application is only concerned with the time necessary for the application to complete. Response time is less important, because the user does not examine the results of individual statements while the application is running.
  • For interactive applications, such as Oracle Forms applications or SQL*Plus queries, optimize for best response time. Usually, response time is important in interactive applications, because the interactive user is waiting to see the first row or first few rows accessed by the statement.

The optimizer's behavior when choosing an optimization approach and goal for a SQL statement is affected by the following factors:

  • OPTIMIZER_MODE Initialization Parameter
  • CBO Statistics in the Data Dictionary
  • Optimizer SQL Hints for Changing the CBO Goal

OPTIMIZER_MODE Initialization Parameter

The OPTIMIZER_MODE initialization parameter establishes the default behavior for choosing an optimization approach for the instance. The possible values and descriptions are listed in Table 1-2.

Table 1-2  OPTIMIZER_MODE Parameter Values
Value Description

CHOOSE

The optimizer chooses between a cost-based approach and a rule-based approach, depending on whether statistics are available. This is the default value.

  • If the data dictionary contains statistics for at least one of the accessed tables, then the optimizer uses a cost-based approach and optimizes with a goal of best throughput.
  • If the data dictionary contains statistics for only some of the accessed tables, then the cost-based approach is still used, but the optimizer must guess the statistics for the objects without any statistics. This can result in suboptimal execution plans.
  • If the data dictionary contains no statistics for any of the accessed tables, then the optimizer uses a rule-based approach.

ALL_ROWS

The optimizer uses a cost-based approach for all SQL statements in the session regardless of the presence of statistics and optimizes with a goal of best throughput (minimum resource use to complete the entire statement).

FIRST_ROWS_n

The optimizer uses a cost-based approach, regardless of the presence of statistics, and optimizes with a goal of best response time to return the first n rows; n can equal 1, 10, 100, or 1000.

FIRST_ROWS

The optimizer uses a mix of cost and heuristics to find a best plan for fast delivery of the first few rows.

Note: Using heuristics sometimes leads the CBO to generate a plan with a cost that is significantly larger than the cost of a plan without applying the heuristic. FIRST_ROWS is available for backward compatibility and plan stability.

RULE

The optimizer chooses a rule-based approach for all SQL statements regardless of the presence of statistics.

You can change the goal of the CBO for all SQL statements in a session by changing the parameter value in the initialization file or by issuing the ALTER SESSION SET OPTIMIZER_MODE statement. For example:

  • The following statement in an initialization parameter file changes the goal of the CBO for all sessions of the instance to best response time:



    OPTIMIZER_MODE = FIRST_ROWS_1
  • The following SQL statement changes the goal of the CBO for the current session to best response time:



    ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_1;
If the optimizer uses the cost-based approach for a SQL statement, and if some tables accessed by the statement have no statistics, then the optimizer uses internal information, such as the number of data blocks allocated to these tables, to estimate other statistics for these tables.

Optimizer SQL Hints for Changing the CBO Goal

To specify the goal of the CBO for an individual SQL statement, use one of the hints in the following list. Any of these hints in an individual SQL statement can override the OPTIMIZER_MODE initialization parameter for that SQL statement.

  • FIRST_ROWS(n), where n equals any positive integer
  • FIRST_ROWS
  • ALL_ROWS
  • CHOOSE
  • RULE

    See Also:

    Chapter 5, "Optimizer Hints" for information on how to use hints

CBO Statistics in the Data Dictionary

The statistics used by the CBO are stored in the data dictionary. You can collect exact or estimated statistics about physical storage characteristics and data distribution in these schema objects by using the DBMS_STATS package or the ANALYZE statement.


Note:

Oracle Corporation strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine-tune your statistics collection in other ways. Further, the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS. See Oracle9i Supplied PL/SQL Packages and Types Reference for more information on this package.

However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:

  • To use the VALIDATE or LIST CHAINED ROWS clauses
  • To collect information on freelist blocks

To maintain the effectiveness of the CBO, you must have statistics that are representative of the data. For table columns that contain values with large variations in the number of duplicates, called skewed data, you should collect histograms.

The resulting statistics provide the CBO with information about data uniqueness and distribution. Using this information, the CBO is able to compute plan costs with a high degree of accuracy. This enables the CBO to choose the best execution plan based on the least cost.

See Also:

Chapter 3, "Gathering Optimizer Statistics"

How the CBO Optimizes SQL Statements for Fast Response

The CBO can optimize a SQL statement for fast response when the parameter OPTIMIZER_MODE is set to FIRST_ROWS_n (where n is 1, 10, 100, or 1000) or FIRST_ROWS. A FIRST_ROWS(n) hint (where n is any positive integer) or the FIRST_ROWS hint can be used to optimize an individual SQL statement for fast response.

Fast-response optimization is suitable for online users, such as those using Oracle Forms or Web access. Typically, online users are interested in seeing the first few rows and seldom look at the entire query result, especially when the result size is large. For such users, it makes sense to optimize the query to produce the first few rows as quickly as possible, even if the time to produce the entire query result is not minimized.

With fast-response optimization, the CBO generates a plan with the lowest cost to produce the first row or the first few rows. The CBO employs two different fast-response optimizations, referred to here as the old and new methods. The old method is used with the FIRST_ROWS hint or parameter value. With the old method, the CBO uses a mixture of costs and rules to produce a plan. It is retained for backward compatibility reasons.

The new method is totally based on costs, and it is sensitive to the value of n. With small values of n, the CBO tends to generate plans that consist of nested loop joins with index lookups. With large values of n, the CBO tends to generate plans that consist of hash joins and full table scans.

The value of n should be chosen based on the online user requirement and depends specifically on how the result is displayed to the user. Generally, Oracle Forms users see the result one row at a time and are typically interested in seeing the first few screens. Other online users see the result one group of rows at a time.

With the fast-response method, the CBO explores different plans and computes the cost to produce the first n rows for each. It picks the plan that produces the first n rows at lowest cost. Remember that with fast-response optimization, a plan that produces the first n rows at lowest cost might not be the optimal plan to produce the entire result. If the requirement is to obtain the entire result of a query, then fast-response optimization should not be used. Instead, use the ALL_ROWS parameter value or hint.

Understanding the Cost-Based Optimizer

The CBO determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The CBO also considers hints, which are optimization suggestions placed in a comment in the statement.

See Also:

Chapter 5, "Optimizer Hints" for detailed information on hints

The CBO performs the following steps:

  1. The optimizer generates a set of potential plans for the SQL statement based on available access paths and hints.
  2. The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement.

    The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, which include I/O, CPU, and memory.

    Serial plans with higher costs take more time to execute than those with smaller costs. When using a parallel plan, however, resource use is not directly related to elapsed time.

  3. The optimizer compares the costs of the plans and chooses the one with the lowest cost.
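The final step above is a minimum-cost selection. The following sketch illustrates that idea only; the plan names and cost figures are invented for the example, and this is not Oracle's implementation:

```python
def choose_plan(candidates):
    """Pick the candidate plan with the lowest estimated cost.

    candidates: dict mapping plan description -> estimated cost
    (abstract work units produced by the estimator).
    """
    return min(candidates, key=candidates.get)

# Hypothetical costs for three candidate plans of one statement
costs = {
    "full table scan + hash join": 420,
    "index scan + nested loops": 95,
    "sort merge join": 310,
}
print(choose_plan(costs))  # index scan + nested loops
```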

Components of the CBO

The CBO consists of the following three main components:

  • Query Transformer
  • Estimator
  • Plan Generator

CBO components are illustrated in Figure 1-2.

Figure 1-2 Cost-Based Optimizer Components

Text description of pfgrf184.gif follows
Text description of the illustration pfgrf184.gif

Query Transformer

The input to the query transformer is a parsed query, which is represented by a set of query blocks. The query blocks are nested or interrelated to each other. The form of the query determines how the query blocks are interrelated. The main objective of the query transformer is to determine if it is advantageous to change the form of the query so that it enables generation of a better query plan. Four different query transformation techniques are employed by the query transformer:

  • View Merging
  • Predicate Pushing
  • Subquery Unnesting
  • Query Rewrite with Materialized Views

Any combination of these transformations can be applied to a given query.

View Merging

Each view referenced in a query is expanded by the parser into a separate query block. The query block essentially represents the view definition, and therefore the result of a view. One option for the optimizer is to analyze the view query block separately and generate a view subplan. The optimizer then processes the rest of the query by using the view subplan in the generation of an overall query plan. This technique usually leads to a suboptimal query plan, because the view is optimized separately from the rest of the query.

The query transformer avoids this potentially suboptimal plan by merging the view query block into the query block that contains the view. Most types of views are merged. When a view is merged, the query block representing the view is merged into the containing query block. Generating a subplan is no longer necessary, because the view query block is eliminated.

Predicate Pushing

For those views that are not merged, the query transformer can push the relevant predicates from the containing query block into the view query block. This technique improves the subplan of the nonmerged view, because the pushed-in predicates can be used either to access indexes or to act as filters.

Subquery Unnesting

Like a view, a subquery is represented by a separate query block. Because a subquery is nested within the main query or another subquery, the plan generator is constrained in trying out different possible plans before it finds a plan with the lowest cost. For this reason, the query plan produced might not be the optimal one. The restrictions due to the nesting of subqueries can be removed by unnesting the subqueries and converting them into joins. Most subqueries are unnested by the query transformer. For those subqueries that are not unnested, separate subplans are generated. To improve execution speed of the overall query plan, the subplans are ordered in an efficient manner.

Query Rewrite with Materialized Views

A materialized view is like a query with a result that is materialized and stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. This technique improves the execution of the user query, because most of the query result has been precomputed. The query transformer looks for any materialized views that are compatible with the user query and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has a lower cost than the plan generated with the materialized views.

See Also:
  • "How the Optimizer Transforms SQL Statements"
  • Oracle9i Data Warehousing Guide for more information on query rewrite

Estimator

The estimator generates three different types of measures:

  • Selectivity
  • Cardinality
  • Cost

These measures are related to each other, and one is derived from another. The end goal of the estimator is to estimate the overall cost of a given plan. If statistics are available, then the estimator uses them to compute the measures. The statistics improve the degree of accuracy of the measures.

Selectivity

The first measure, selectivity, represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a GROUP BY operator. The selectivity is tied to a query predicate, such as last_name = 'Smith', or a combination of predicates, such as last_name = 'Smith' AND job_type = 'Clerk'. A predicate acts as a filter that filters a certain number of rows from a row set. Therefore, the selectivity of a predicate indicates how many rows from a row set will pass the predicate test. Selectivity lies in a value range from 0.0 to 1.0. A selectivity of 0.0 means that no rows will be selected from a row set, and a selectivity of 1.0 means that all rows will be selected.

The estimator uses an internal default value for selectivity if no statistics are available. Different internal defaults are used, depending on the predicate type. For example, the internal default for an equality predicate (last_name = 'Smith') is lower than the internal default for a range predicate (last_name > 'Smith'). The estimator makes this assumption because an equality predicate is expected to return a smaller fraction of rows than a range predicate.

When statistics are available, the estimator uses them to estimate selectivity. For example, for an equality predicate (last_name = 'Smith'), selectivity is set to the reciprocal of the number n of distinct values of last_name, because the query selects rows that all contain one out of n distinct values. If a histogram is available on the last_name column, then the estimator uses it instead of the number of distinct values. The histogram captures the distribution of different values in a column, so it yields better selectivity estimates. Having histograms on columns that contain skewed data (in other words, values with large variations in number of duplicates) greatly helps the CBO generate good selectivity estimates.
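The two estimates described above (uniform 1/n default versus histogram-based) can be sketched as follows. This is a simplified model for illustration, not the estimator's actual code; the names and counts are invented:

```python
def equality_selectivity(num_distinct, histogram=None, value=None):
    """Estimate selectivity of an equality predicate (col = value).

    Without a histogram, assume a uniform distribution: 1 / num_distinct.
    With a frequency histogram (value -> row count), use the actual
    fraction of rows holding the value.
    """
    if histogram is not None:
        return histogram.get(value, 0) / sum(histogram.values())
    return 1.0 / num_distinct

# Uniform assumption: 4 distinct last_name values -> 0.25
print(equality_selectivity(4))  # 0.25
# Skewed data: 'Smith' appears in 70 of 100 rows -> 0.7, far from 0.25
hist = {"Smith": 70, "Jones": 10, "Lee": 10, "Kim": 10}
print(equality_selectivity(4, hist, "Smith"))  # 0.7
```

The gap between 0.25 and 0.7 is exactly why histograms on skewed columns matter: the uniform default badly underestimates the popular value.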

Cardinality

Cardinality represents the number of rows in a row set. Here, the row set can be a base table, a view, or the result of a join or GROUP BY operator.

  • Base cardinality is the number of rows in a base table. The base cardinality can be captured by analyzing the table. If table statistics are not available, then the estimator uses the number of extents occupied by the table to estimate the base cardinality.
  • Effective cardinality is the number of rows that are selected from a base table. The effective cardinality depends on the predicates specified on different columns of a base table, with each predicate acting as a successive filter on the rows of the base table. The effective cardinality is computed as the product of the base cardinality and the combined selectivity of all predicates specified on a table. When there is no predicate on a table, its effective cardinality equals its base cardinality.
  • Join cardinality is the number of rows produced when two row sets are joined together. A join is a Cartesian product of two row sets, with the join predicate applied as a filter to the result. Therefore, the join cardinality is the product of the cardinalities of the two row sets, multiplied by the selectivity of the join predicate.
  • Distinct cardinality is the number of distinct values in a column of a row set. The distinct cardinality of a row set is based on the data in the column. For example, in a row set of 100 rows, if distinct column values are found in 20 rows, then the distinct cardinality is 20.
  • Group cardinality is the number of rows produced from a row set after the GROUP BY operator is applied. The effect of the GROUP BY operator is to decrease the number of rows in a row set. The group cardinality depends on the distinct cardinality of each of the grouping columns and on the number of rows in the row set. For an illustration of group cardinality, see Example 1-1.
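The effective and join cardinality definitions above reduce to simple arithmetic. A sketch, with invented row counts and selectivities:

```python
from math import prod

def effective_cardinality(base_cardinality, selectivities):
    """Base cardinality times the combined selectivity of all predicates
    on the table (predicates act as successive filters)."""
    return base_cardinality * prod(selectivities)

def join_cardinality(card_a, card_b, join_selectivity):
    """Cartesian product of the two row sets, filtered by the
    selectivity of the join predicate."""
    return card_a * card_b * join_selectivity

# 1000-row table with two predicates of selectivity 0.1 and 0.5
print(effective_cardinality(1000, [0.1, 0.5]))
# Joining 100-row and 200-row sets with join-predicate selectivity 0.01
print(join_cardinality(100, 200, 0.01))
```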

Example 1-1 Group Cardinality

If a row set of 100 rows is grouped by colx, which has a distinct cardinality of 30, then the group cardinality is 30.

However, suppose the same row set of 100 rows is grouped by colx and coly, which have distinct cardinalities of 30 and 60, respectively. In this case, the group cardinality lies between the maximum of the distinct cardinalities of colx and coly, and the lower of the product of the distinct cardinalities of colx and coly and the number of rows in the row set.

Group cardinality in this example can be represented by the following formula:




    group cardinality lies between max ( dist. card. colx , dist. card. coly )
    and min ( (dist. card. colx * dist. card. coly) , num rows in row set )
Substituting the numbers from the example, the group cardinality is between the maximum of (30 and 60) and the minimum of (30*60 and 100). In other words, the group cardinality is between 60 and 100.
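The bound from Example 1-1 can be computed directly from the formula above:

```python
from math import prod

def group_cardinality_bounds(num_rows, distinct_cards):
    """Bounds on GROUP BY cardinality, per the formula in the text:
    between max(distinct cardinalities) and
    min(product of distinct cardinalities, number of rows)."""
    lower = max(distinct_cards)
    upper = min(prod(distinct_cards), num_rows)
    return lower, upper

# 100 rows grouped by colx (30 distinct) and coly (60 distinct)
print(group_cardinality_bounds(100, [30, 60]))  # (60, 100)
```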

Cost

The cost represents units of work or resource used. The CBO uses disk I/O, CPU usage, and memory usage as units of work. So, the cost used by the CBO represents an estimate of the number of disk I/Os and the amount of CPU and memory used in performing an operation. The operation can be scanning a table, accessing rows from a table by using an index, joining two tables together, or sorting a row set. The cost of a query plan is the number of work units that are expected to be incurred when the query is executed and its result produced.

The access path determines the number of units of work required to get data from a base table. The access path can be a table scan, a fast full index scan, or an index scan. During a table scan or fast full index scan, multiple blocks are read from the disk in a single I/O operation. Therefore, the cost of a table scan or a fast full index scan depends on the number of blocks to be scanned and the multiblock read count value. The cost of an index scan depends on the levels in the B-tree, the number of index leaf blocks to be scanned, and the number of rows to be fetched using the rowid in the index keys. The cost of fetching rows using rowids depends on the index clustering factor.

Although the clustering factor is a property of the index, the clustering factor actually relates to the spread of similar indexed column values within data blocks in the table. A lower clustering factor indicates that the individual rows are concentrated within fewer blocks in the table. Conversely, a high clustering factor indicates that the individual rows are scattered more randomly across blocks in the table. Therefore, a high clustering factor means that it costs more to use a range scan to fetch rows by rowid, because more blocks in the table need to be visited to return the data. Example 1-2 shows how the clustering factor can affect cost.

Example 1-2 Effects of Clustering Factor on Cost

Assume the following situation:

  • There is a table with 9 rows.
  • There is a nonunique index on col1 in tab1.
  • The col1 column currently stores the values A, B, and C.
  • The table only has three Oracle blocks.

Case 1: The index clustering factor is low for the rows as they are arranged in the following diagram.




    Block 1       Block 2       Block 3
    -------       -------       -------
    A A A         B B B         C C C
This is because the rows that have the same indexed column value for col1 are located within the same physical blocks in the table. The cost of using a range scan to return all of the rows that have the value A is low, because only one block in the table needs to be read.

Case 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than colocated), then the index clustering factor is higher.




    Block 1       Block 2       Block 3
    -------       -------       -------
    A B C         A B C         A B C
This is because all three blocks in the table must be read in order to retrieve all rows with the value A in col1.
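One common way to think about the clustering factor is as the number of block transitions seen while visiting the table rows in index-key order. The sketch below (an illustration of that idea, not Oracle's algorithm) reproduces the two cases above:

```python
def clustering_factor(rows):
    """rows: (index_key, block_number) pairs in table order.
    Walk them in key order and count how often the block changes."""
    factor, prev_block = 0, None
    for _, block in sorted(rows, key=lambda r: r[0]):
        if block != prev_block:
            factor += 1
            prev_block = block
    return factor

# Case 1: equal values colocated in the same block -> low factor
case1 = [("A", 1)] * 3 + [("B", 2)] * 3 + [("C", 3)] * 3
# Case 2: values scattered across all three blocks -> high factor
case2 = [(v, b) for b in (1, 2, 3) for v in ("A", "B", "C")]
print(clustering_factor(case1))  # 3
print(clustering_factor(case2))  # 9
```

The low-factor arrangement visits each block once per distinct value; the scattered arrangement revisits every block for every value, which is why the range scan for A costs three block reads instead of one.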

The join cost represents the combination of the individual access costs of the two row sets being joined. In a join, one row set is called inner, and the other is called outer.

  • In a nested loop join, for every row in the outer row set, the inner row set is accessed to find all the matching rows to join. Therefore, in a nested loop join, the inner row set is accessed as many times as the number of rows in the outer row set.



    cost = outer access cost + (inner access cost * outer cardinality)
  • In a sort merge join, the two row sets being joined are sorted by the join keys if they are not already in key order.



    cost = outer access cost + inner access cost + sort costs (if sort is used)
  • In a hash join, the inner row set is hashed into memory, and a hash table is built using the join key. Each row from the outer row set is then hashed, and the hash table is probed to join all matching rows. If the inner row set is very large, then only a portion of it is hashed into memory. This portion is called a hash partition.

    Each row from the outer row set is hashed to probe matching rows in the hash partition. The next portion of the inner row set is then hashed into memory, followed by a probe from the outer row set. This process is repeated until all partitions of the inner row set are exhausted.




    cost = (outer access cost * # of hash partitions) + inner access cost
    See Also:

    "Understanding Joins" for more information on joins

Plan Generator

The main function of the plan generator is to try out different possible plans for a given query and pick the one that has the lowest cost. Many different plans are possible because of the various combinations of different access paths, join methods, and join orders that can be used to access and process data in different ways and produce the same result.

A join order is the order in which different join items, such as tables, are accessed and joined together. For example, in a join order of t1, t2, and t3, table t1 is accessed first. Next, t2 is accessed, and its data is joined to t1 data to produce a join of t1 and t2. Finally, t3 is accessed, and its data is joined to the result of the join between t1 and t2.

The plan for a query is established by first generating subplans for each of the nested subqueries and nonmerged views. Each nested subquery or nonmerged view is represented by a separate query block. The query blocks are optimized separately in a bottom-up order. That is, the innermost query block is optimized first, and a subplan is generated for it. The outermost query block, which represents the entire query, is optimized last.

The plan generator explores various plans for a query block by trying out different access paths, join methods, and join orders. The number of possible plans for a query block is proportional to the number of join items in the FROM clause. This number rises exponentially with the number of join items.

The plan generator uses an internal cutoff to reduce the number of plans it tries when finding the one with the lowest cost. The cutoff is based on the cost of the current best plan. If the current best cost is large, then the plan generator tries harder (in other words, explores more alternate plans) to find a better plan with lower cost. If the current best cost is small, then the plan generator ends the search swiftly, because further cost improvement will not be significant.

The cutoff works well if the plan generator starts with an initial join order that produces a plan with cost close to optimal. Finding a good initial join order is a difficult problem. The plan generator uses a simple heuristic for the initial join order. It orders the join items by their effective cardinalities. The join item with the smallest effective cardinality goes first, and the join item with the largest effective cardinality goes last.
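The initial join order heuristic just described amounts to a sort by effective cardinality, smallest first. The table names and cardinalities in this sketch are invented:

```python
def initial_join_order(effective_cards):
    """effective_cards: dict mapping join item -> effective cardinality.
    The heuristic places the smallest effective cardinality first and
    the largest last."""
    return sorted(effective_cards, key=effective_cards.get)

# t2 filters down to 50 rows, t1 to 500, t3 to 5000
print(initial_join_order({"t1": 500, "t2": 50, "t3": 5000}))
# ['t2', 't1', 't3']
```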

Understanding Execution Plans

To execute a SQL statement, Oracle might need to perform many steps. Each of these steps either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle uses to execute a statement is called an execution plan. An execution plan includes an access path for each table that the statement accesses and an ordering of the tables (the join order) with the appropriate join method.

See Also:
  • "Understanding Access Paths for the CBO"
  • "Understanding Access Paths for the RBO"
  • Chapter 9, "Using EXPLAIN PLAN"

Overview of EXPLAIN PLAN

You can examine the execution plan chosen by the optimizer for a SQL statement by using the EXPLAIN PLAN statement. When the statement is issued, the optimizer chooses an execution plan and then inserts data describing the plan into a database table. Simply issue the EXPLAIN PLAN statement and then query the output table.

These are the basics of using the EXPLAIN PLAN statement:

  • Use the SQL script UTLXPLAN.SQL to create a sample output table called PLAN_TABLE in your schema. See "Creating the PLAN_TABLE Output Table".
  • Include the EXPLAIN PLAN FOR clause prior to the SQL statement. See "Running EXPLAIN PLAN".
  • After issuing the EXPLAIN PLAN statement, use one of the scripts provided by Oracle to display the most recent plan table output. See "Displaying PLAN_TABLE Output".
  • The execution order in EXPLAIN PLAN output begins with the line that is the furthest indented to the right. The next step is the parent of that line. If two lines are indented equally, then the top line is executed first.
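If you prefer not to use the supplied scripts, the plan can also be read with an ordinary hierarchical query against PLAN_TABLE. This is only a sketch; the supplied utlxpls.sql script produces a more complete display:

```sql
EXPLAIN PLAN FOR
SELECT * FROM employees WHERE employee_id = 100;

-- Walk the plan tree from the root (id = 0), indenting child steps.
SELECT LPAD(' ', 2 * (LEVEL - 1)) || operation || ' ' || options AS step,
       object_name
FROM plan_table
START WITH id = 0
CONNECT BY PRIOR id = parent_id;
```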


    Notes:
    • The EXPLAIN PLAN output tables in this chapter were displayed with the utlxpls.sql script.
    • The steps in the EXPLAIN PLAN output tables in this chapter may be different on your system. Depending on database configurations, the optimizer may choose different execution plans.

Example 1-3 uses EXPLAIN PLAN to examine a SQL statement that selects the employee_id, job_title, salary, and department_name for the employees whose IDs are less than 103.

Example 1-3 Using EXPLAIN PLAN




EXPLAIN PLAN FOR
SELECT e.employee_id, j.job_title, e.salary, d.department_name
FROM employees e, jobs j, departments d
WHERE e.employee_id < 103
AND e.job_id = j.job_id
AND e.department_id = d.department_id;
The resulting output table in Example 1-4 shows the execution plan chosen by the optimizer to execute the SQL statement in the example:

Example 1-4 EXPLAIN PLAN Output




-----------------------------------------------------------------------------------
| Id | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |             |     3 |   189 |    10  (10)|
|  1 |  NESTED LOOPS                 |             |     3 |   189 |    10  (10)|
|  2 |   NESTED LOOPS                |             |     3 |   141 |     7  (15)|
|* 3 |    TABLE ACCESS FULL          | EMPLOYEES   |     3 |    60 |     4  (25)|
|  4 |    TABLE ACCESS BY INDEX ROWID| JOBS        |    19 |   513 |     2  (50)|
|* 5 |     INDEX UNIQUE SCAN         | JOB_ID_PK   |     1 |       |            |
|  6 |   TABLE ACCESS BY INDEX ROWID | DEPARTMENTS |    27 |   432 |     2  (50)|
|* 7 |    INDEX UNIQUE SCAN          | DEPT_ID_PK  |     1 |       |            |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("E"."EMPLOYEE_ID"<103)
   5 - access("E"."JOB_ID"="J"."JOB_ID")
   7 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

Oracle provides additional graphical tools for displaying EXPLAIN PLAN output. Figure 6-1, "Oracle SQL Analyze" is an example of the SQL statement displayed in Oracle SQL Analyze. Figure 1-3 shows a graphical EXPLAIN PLAN for the SQL statement in Example 1-3, generated in an Oracle Enterprise Manager SQL Scratchpad window.

Figure 1-3 Graphical View of SQL Explain Plan in SQL Scratchpad


Note that the execution steps in Figure 1-3 are identified by the order of execution, rather than an ID as in Example 1-4.

See Also:

For more information about Oracle Enterprise Manager and its optional applications, see Oracle Enterprise Manager Concepts Guide, Oracle Enterprise Manager Administrator's Guide, and Database Tuning with the Oracle Tuning Pack.

Steps in the Execution Plan

Each row in the output table corresponds to a single step in the execution plan. Note that the step Ids with asterisks are listed in the Predicate Information section.

See Also:
  • "Steps in the Execution Plan"
  • Chapter 9, "Using EXPLAIN PLAN"

Each step of the execution plan returns a set of rows that either is used by the next step or, in the last step, is returned to the user or application issuing the SQL statement. A set of rows returned by a step is called a row set.

The numbering of the step Ids reflects the order in which they are displayed in response to the EXPLAIN PLAN statement. Each step of the execution plan either retrieves rows from the database or accepts rows from one or more row sources as input.

  • The following steps in Example 1-4 physically retrieve data from an object in the database:
    • Step 3 reads all rows of the employees table.
    • Step 5 looks up each job_id in the JOB_ID_PK index and finds the rowids of the associated rows in the jobs table.
    • Step 4 retrieves the rows with rowids that were returned by Step 5 from the jobs table.
    • Step 7 looks up each department_id in the DEPT_ID_PK index and finds the rowids of the associated rows in the departments table.
    • Step 6 retrieves the rows with rowids that were returned by Step 7 from the departments table.
  • The following steps in Example 1-4 operate on rows returned by the previous row source:
    • Step 2 performs the nested loop operation on job_id in the jobs and employees tables, accepting row sources from Steps 3 and 4, joining each row from the Step 3 source to its corresponding row in Step 4, and returning the resulting rows to Step 1.
    • Step 1 performs the nested loop operation, accepting row sources from Step 2 and Step 6, joining each row from the Step 2 source to its corresponding row in Step 6, and returning the resulting rows to the user or application issuing the SQL statement.

      See Also:
      • "Understanding Access Paths for the CBO" and "Understanding Access Paths for the RBO" for more information on access paths
      • "Understanding Joins" for more information on the methods by which Oracle joins row sources

Understanding Execution Order

The steps of the execution plan are not performed in the order in which they are numbered in Example 1-4. Oracle first performs the steps that appear indented most to the right in the EXPLAIN PLAN output. In Figure 1-3, the steps are numbered in the order that they are performed. The rows returned by each step become the row sources of its parent step. Oracle then performs the parent steps.

Oracle performs the following steps in Example 1-4 to execute the statement in Example 1-3:

  • Oracle performs Step 3, returning the resulting rows, one by one, to Step 2. For each row returned by Step 3, Oracle performs Step 5, returning the resulting rowid to Step 4, which retrieves the matching row from the jobs table. Step 2 joins the rows and returns the result to Step 1.
  • For each row returned by Step 2, Oracle performs Step 7, returning the resulting rowid to Step 6, which retrieves the matching row from the departments table. Step 1 joins each row from Step 2 with the corresponding row from Step 6 and returns the resulting rows, if any, to the user issuing the SQL statement.

If a parent step requires only a single row from its child step before it can be executed, then Oracle performs the parent step as soon as a single row has been returned from the child step. If the parent of that parent step also can be activated by the return of a single row, then it is executed as well.

Statement execution can cascade up the tree, possibly to encompass the rest of the execution plan. Oracle performs the parent step and all cascaded steps once for each row retrieved by the child step. The parent steps that are triggered for each row returned by a child step include table accesses, index accesses, nested loop joins, and filters.

If a parent step requires all rows from its child step before it can be executed, then Oracle cannot perform the parent step until all rows have been returned from the child step. Such parent steps include sorts, sort merge joins, and aggregate functions.

Understanding Access Paths for the CBO

Access paths are ways in which data is retrieved from the database. In general, index access paths should be used for statements that retrieve a small subset of table rows, while full scans are more efficient when accessing a large portion of the table. Online transaction processing (OLTP) applications, which consist of short-running SQL statements with high selectivity, often are characterized by the use of index access paths. Decision support systems, on the other hand, tend to use partitioned tables and perform full scans of the relevant partitions.

This section describes the data access paths that can be used to locate and retrieve any row in any table.

  • Full Table Scans
  • Rowid Scans
  • Index Scans
  • Cluster Scans
  • Hash Scans
  • Sample Table Scans
  • How the CBO Chooses an Access Path

Full Table Scans

This type of scan reads all rows from a table and filters out those that do not meet the selection criteria. During a full table scan, all blocks in the table that are under the high water mark are scanned. Each row is examined to determine whether it satisfies the statement's WHERE clause.

When Oracle performs a full table scan, the blocks are read sequentially. Because the blocks are adjacent, I/O calls larger than a single block can be used to speed up the process. The size of the read calls ranges from one block to the number of blocks indicated by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT. Using multiblock reads means a full table scan can be performed very efficiently. Each block is read only once.

Example 1-4, "EXPLAIN PLAN Output" contains an example of a full table scan on the employees table.

Why a Full Table Scan Is Faster for Accessing Large Amounts of Data

Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in a table. This is because full table scans can use larger I/O calls, and making fewer large I/O calls is cheaper than making many smaller calls.

When the Optimizer Uses Full Table Scans

The optimizer uses a full table scan in any of the following cases:

Lack of Index

If the query is unable to use any existing indexes, then it uses a full table scan. For example, if there is a function used on the indexed column in the query, the optimizer is unable to use the index and instead uses a full table scan, as in Example 1-5.

Example 1-5 Full Table Scan




SELECT last_name, first_name
FROM employees
WHERE UPPER(last_name) LIKE :b1;

If you need to use the index for case-independent searches, then either do not permit mixed-case data in the search columns or create a function-based index, such as UPPER(last_name), on the search column. See "Using Function-based Indexes".
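As a sketch, such a function-based index might be created as follows (the index name is illustrative; in Oracle9i, using function-based indexes also requires appropriate session settings, such as QUERY_REWRITE_ENABLED):

```sql
-- Hypothetical index supporting case-independent searches on last_name.
CREATE INDEX emp_upper_name_ix ON employees (UPPER(last_name));

-- This predicate can now use an index range scan instead of a full table scan.
SELECT last_name, first_name
FROM employees
WHERE UPPER(last_name) LIKE :b1;
```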

Large Amount of Data

If the optimizer thinks that the query will access most of the blocks in the table, then it uses a full table scan, even though indexes might be available.

Small Table

If a table contains fewer than DB_FILE_MULTIBLOCK_READ_COUNT blocks under the high water mark, so that it can be read in a single I/O call, then a full table scan might be cheaper than an index range scan, regardless of the fraction of the table being accessed or the indexes present.

High Degree of Parallelism

A high degree of parallelism for a table skews the optimizer toward full table scans over range scans. Examine the DEGREE column in ALL_TABLES for the table to determine the degree of parallelism.
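For example, the degree of parallelism can be checked with a query such as the following:

```sql
-- A degree greater than 1 biases the optimizer toward full table scans.
SELECT table_name, degree
FROM all_tables
WHERE table_name = 'EMPLOYEES';
```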

Full Table Scan Hints

Use the hint FULL(table alias) if you want to force the use of a full table scan. For more information on the FULL hint, see "FULL".

Example 1-6 shows a query that uses an index range scan. Example 1-7 shows the same query using the FULL hint to force a full table scan.

Example 1-6 Before Using the FULL Hint




SELECT employee_id, last_name
FROM employees
WHERE last_name LIKE :b1;

---------------------------------------------------------------------------------
| Id | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |             |     5 |    95 |     3  (34)|
|  1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |     5 |    95 |     3  (34)|
|* 2 |   INDEX RANGE SCAN          | EMP_NAME_IX |     2 |       |     3  (34)|
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EMPLOYEES"."LAST_NAME" LIKE :Z)
       filter("EMPLOYEES"."LAST_NAME" LIKE :Z)

Example 1-7 shows the Example 1-6 query using the FULL hint to force a full table scan.

Example 1-7 After Using the FULL Hint




SELECT /*+ FULL(e) */ employee_id, last_name
FROM employees e
WHERE last_name LIKE :b1;

-------------------------------------------------------------------------
| Id | Operation         | Name      | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |           |     5 |    95 |     4  (25)|
|* 1 |  TABLE ACCESS FULL| EMPLOYEES |     5 |    95 |     4  (25)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("E"."LAST_NAME" LIKE :Z)



Assessing I/O for Blocks, not Rows

Oracle does I/O by blocks. Therefore, the optimizer's decision to use full table scans is influenced by the percentage of blocks accessed, not rows. This is called the index clustering factor. If blocks contain single rows, then rows accessed and blocks accessed are the same.

However, most tables have multiple rows in each block. Consequently, the desired number of rows could be clustered together in a few blocks, or they could be spread out over a larger number of blocks.

See Also:

"Estimator" for more information on the index clustering factor

High Water Mark in DBA_TABLES

The data dictionary keeps track of the blocks that have been populated with rows. The high water mark is used as the end marker during a full table scan. The high water mark is stored in DBA_TABLES.BLOCKS. It is reset when the table is dropped or truncated.

For example, consider a table that had a large number of rows in the past. Most of the rows have been deleted, and now most of the blocks under the high water mark are empty. A full table scan on this table exhibits poor performance, because all the blocks under the high water mark are scanned.
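The effect can be observed and corrected with statements such as the following (big_table is a hypothetical table name):

```sql
-- BLOCKS reflects the high water mark, not the current row count.
SELECT blocks, num_rows
FROM dba_tables
WHERE table_name = 'BIG_TABLE';

-- Deleting rows does not lower the high water mark; truncating the table does.
TRUNCATE TABLE big_table;
```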

Parallel Query Execution

When a full table scan is required, response time can be improved by using multiple parallel execution servers for scanning the table. Parallel queries are generally used in low-concurrency data warehousing environments, because of the potential resource usage.

See Also:

Oracle9i Data Warehousing Guide

Rowid Scans

The rowid of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by specifying its rowid is the fastest way to retrieve a single row, because the exact location of the row in the database is specified.

To access a table by rowid, Oracle first obtains the rowids of the selected rows, either from the statement's WHERE clause or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its rowid.

In Example 1-4, "EXPLAIN PLAN Output", an index scan is performed on the jobs and departments tables. The rowids retrieved are used to return the row data.

When the Optimizer Uses Rowids

This is generally the second step after retrieving the rowid from an index. The table access might be required for any columns in the statement not present in the index.

Access by rowid does not need to follow every index scan. If the index contains all the columns needed for the statement, then table access by rowid might not occur.


Note:

Rowids are an internal Oracle representation of where data is stored. They can change between versions. Accessing data based on position is not recommended, because rows can move around due to row migration and chaining, and also after export and import. Foreign keys should be based on primary keys. For more information on rowids, see Oracle9i Application Developer's Guide - Fundamentals.


Index Scans

In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.

The index contains not only the indexed value, but also the rowids of rows in the table having that value. Therefore, if the statement accesses other columns in addition to the indexed columns, then Oracle can find the rows in the table by using either a table access by rowid or a cluster scan.

An index scan can be one of the following types:

  • Index Unique Scans
  • Index Range Scans
  • Index Range Scans Descending
  • Index Skip Scans
  • Full Scans
  • Fast Full Index Scans
  • Index Joins
  • Bitmap Joins

Index Unique Scans

This scan returns, at most, a single rowid. Oracle performs a unique scan if a statement contains a UNIQUE or a PRIMARY KEY constraint that guarantees that only a single row is accessed.

In Example 1-4, "EXPLAIN PLAN Output", an index scan is performed on the jobs and departments tables, using the job_id_pk and dept_id_pk indexes respectively.

When the Optimizer Uses Index Unique Scans

This access path is used when all columns of a unique (B-tree) index are specified with equality conditions.
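For example, assuming employee_id is the primary key of employees, an equality condition on it allows an index unique scan:

```sql
-- At most one row can match, so the optimizer can use an index unique scan
-- on the primary key index.
SELECT employee_id, last_name
FROM employees
WHERE employee_id = 100;
```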

See Also:

Oracle9i Database Concepts for more details on index structures and for detailed information on how a B-tree is searched

Index Unique Scan Hints

In general, you should not need to use a hint to do a unique scan. There might be cases where the table is across a database link and being accessed from a local table, or where the table is small enough for the optimizer to prefer a full table scan.

The hint INDEX(alias index_name) specifies the index to use, but not an access path (range scan or unique scan). For more information on the INDEX hint, see "INDEX".

Index Range Scans

An index range scan is a common operation for accessing selective data. It can be bounded (bounded on both sides) or unbounded (on one or both sides). Data is returned in the ascending order of index columns. Multiple rows with identical values are sorted in ascending order by rowid.

If data must be returned in sorted order, then use the ORDER BY clause, and do not rely on an index. If an index can be used to satisfy an ORDER BY clause, then the optimizer uses this option and avoids a sort.

In Example 1-8, the order has been imported from a legacy system, and you are querying the order by the reference used in the legacy system. Assume this reference is the order_date.

Example 1-8 Index Range Scan




SELECT order_status, order_id
FROM orders
WHERE order_date = :b1;

---------------------------------------------------------------------------------------
| Id | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                   |     1 |    20 |     3  (34)|
|  1 |  TABLE ACCESS BY INDEX ROWID| ORDERS            |     1 |    20 |     3  (34)|
|* 2 |   INDEX RANGE SCAN          | ORD_ORDER_DATE_IX |     1 |       |     2  (50)|
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORDERS"."ORDER_DATE"=:Z)










This should be a highly selective query, and you should see the query using the index on the column to retrieve the desired rows. The data returned is sorted in ascending order by the rowids for the order_date. Because the index column order_date is identical for the selected rows here, the data is sorted by rowid.

When the Optimizer Uses Index Range Scans

The optimizer uses a range scan when it finds one or more leading columns of an index specified in conditions, such as the following:

  • col1 = :b1
  • col1 < :b1
  • col1 > :b1
  • AND combination of the preceding conditions for leading columns in the index
  • col1 LIKE 'ASD%' (wildcard searches must not be in a leading position; the condition col1 LIKE '%ASD' does not result in a range scan)

Range scans can use unique or nonunique indexes. Range scans avoid sorting when index columns constitute the ORDER BY/GROUP BY clause.
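As a sketch, assuming a hypothetical composite index on (last_name, first_name), the following predicates illustrate when a range scan is possible:

```sql
-- Leading column specified: a range scan is possible.
SELECT employee_id FROM employees WHERE last_name = :b1;

-- AND combination on leading columns: a range scan is possible.
SELECT employee_id FROM employees
WHERE last_name = :b1 AND first_name > :b2;

-- Leading column not specified: no range scan on this index
-- (although an index skip scan might apply).
SELECT employee_id FROM employees WHERE first_name = :b2;
```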

Index Range Scan Hints

A hint might be required if the optimizer chooses some other index or uses a full table scan. The hint INDEX(table_alias index_name) specifies the index to use. For more information on the INDEX hint, see "INDEX".

Suppose that order_id has a skewed distribution. The column has histograms, so the optimizer knows about the distribution. However, with a bind variable, the optimizer does not know the value and could choose a full table scan. You have two options:

  • Use literals rather than bind variables, which can cause problems due to nonsharing of SQL statements.
  • Use hints in order to share the statements.

Example 1-9 shows a query before using the INDEX hint.

Example 1-9 Before Using the INDEX Hint




SELECT l.line_item_id, order_id, l.unit_price * l.quantity
FROM order_items l
WHERE l.order_id = :b1;

--------------------------------------------------------------------------
| Id | Operation         | Name        | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |             |     6 |    90 |     4  (25)|
|* 1 |  TABLE ACCESS FULL| ORDER_ITEMS |     6 |    90 |     4  (25)|
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("L"."ORDER_ID"=TO_NUMBER(:Z))










Example 1-10 shows the Example 1-9 query using the INDEX hint.

Example 1-10 Using Bind Variables and the INDEX Hint




SELECT /*+ INDEX(l item_order_ix) */ l.line_item_id, order_id,
       l.unit_price * l.quantity
FROM order_items l
WHERE l.order_id = :b1;

-----------------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |               |     6 |    90 |    10  (10)|
|  1 |  TABLE ACCESS BY INDEX ROWID| ORDER_ITEMS   |     6 |    90 |    10  (10)|
|* 2 |   INDEX RANGE SCAN          | ITEM_ORDER_IX |     6 |       |     2  (50)|
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("L"."ORDER_ID"=TO_NUMBER(:Z))



Index Range Scans Descending

An index range scan descending is identical to an index range scan, except that the data is returned in descending order. Indexes, by default, are stored in ascending order. Usually, this scan is used when ordering data in a descending order to return the most recent data first, or when seeking a value less than a specified value.

Example 1-11 uses a two-column unique index on order_id, line_item_id.

Example 1-11 Index Range Scan Descending Using Two-Column Unique Index




SELECT line_item_id, order_id
FROM order_items
WHERE order_id < :b1
ORDER BY order_id DESC;

------------------------------------------------------------------------------------
| Id | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |               |    33 |   231 |     3  (34)|
|  1 |  TABLE ACCESS BY INDEX ROWID | ORDER_ITEMS   |    33 |   231 |     3  (34)|
|* 2 |   INDEX RANGE SCAN DESCENDING| ITEM_ORDER_IX |     6 |       |     3  (34)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORDER_ITEMS"."ORDER_ID"<TO_NUMBER(:Z))
       filter("ORDER_ITEMS"."ORDER_ID"<TO_NUMBER(:Z))










The data is sorted in descending order by the order_id, line_item_id, rowid of the selected rows. However, because there is only one row for each order_id, line_item_id (item_order_ix is a unique index on the two columns), the rows are sorted by order_id, line_item_id.

When the Optimizer Uses Index Range Scans Descending

The optimizer uses an index range scan descending when an ORDER BY ... DESC clause can be satisfied by an index.

Index Range Scan Descending Hints

The hint INDEX_DESC(table_alias index_name) is used for this access path. For more information on the INDEX_DESC hint, see "INDEX_DESC".
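A sketch of the hint, using the ORD_ORDER_DATE_IX index from Example 1-8:

```sql
-- Request a descending range scan so the most recent orders come back first.
SELECT /*+ INDEX_DESC(o ord_order_date_ix) */ order_id, order_date
FROM orders o
WHERE order_date < :b1;
```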

Index Skip Scans

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.

Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.

The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Example 1-12 Index Skip Scan

Consider, for example, a table employees (sex, employee_id, address) with a composite index on (sex, employee_id). Splitting this composite index would result in two logical subindexes, one for M and one for F.

For this example, suppose you have the following index data:




('F',98)
('F',100)
('F',102)
('F',104)
('M',101)
('M',103)
('M',105)










The index is split logically into the following two subindexes:

  • The first subindex has the keys with the value F.
  • The second subindex has the keys with the value M.

Figure 1-4 Index Skip Scan Illustration


The column sex is skipped in the following query:




SELECT *
FROM employees
WHERE employee_id = 101;










A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.

Full Scans

A full scan is available if a predicate references one of the columns in the index. The predicate does not need to be an index driver. A full scan is also available when there is no predicate, if both of the following conditions are met:

  • All of the columns in the table referenced in the query are included in the index.
  • At least one of the index columns is not null.

A full scan can be used to eliminate a sort operation, because the data is ordered by the index key. It reads the blocks singly.

Fast Full Index Scans

Fast full index scans are an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT NULL constraint. A fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation, because the data is not ordered by the index key. It reads the entire index using multiblock reads, unlike a full index scan, and can be parallelized.

Fast full scan is available only with the CBO. You can specify it with the initialization parameter OPTIMIZER_FEATURES_ENABLE or the INDEX_FFS hint. Fast full index scans cannot be performed against bitmap indexes.

A fast full scan is faster than a normal full index scan in that it can use multiblock I/O and can be parallelized just like a table scan.

Fast Full Index Scan Hints

The fast full scan has a special index hint, INDEX_FFS, which has the same format and arguments as the regular INDEX hint. For more information on the INDEX_FFS hint, see "INDEX_FFS".
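For example, assuming the EMP_NAME_IX index on last_name from Example 1-6, and that last_name is NOT NULL, a fast full scan of that index can answer the following query without touching the table:

```sql
-- All referenced columns are in the index, so the whole index is read
-- with multiblock I/O instead of scanning the table.
SELECT /*+ INDEX_FFS(e emp_name_ix) */ last_name
FROM employees e;
```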

Fast Full Index Scan Restrictions

Fast full index scans have the following restrictions:

  • At least one indexed column of the table must have the NOT NULL constraint.
  • There must be a parallel clause on the index if you want to perform a fast full index scan in parallel. The parallel degree of the index is set independently. The index does not inherit the degree of parallelism of the table.
  • You must have analyzed the index; otherwise, the optimizer might decide not to use it.

Index Joins

An index join is a hash join of several indexes that together contain all the table columns that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation. The index join is available only with the CBO.

Index Join Hints

You can specify an index join with the initialization parameter OPTIMIZER_FEATURES_ENABLE or the INDEX_JOIN hint. For more information on the INDEX_JOIN hint, see "INDEX_JOIN".
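As a sketch, if employees had separate indexes covering employee_id and last_name (an assumption for illustration), the optimizer could hash-join the two indexes and skip the table entirely:

```sql
-- No table access: all referenced columns come from the joined indexes.
SELECT /*+ INDEX_JOIN(e) */ employee_id, last_name
FROM employees e;
```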

Bitmap Joins

A bitmap join uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a WHERE clause, using Boolean operations to resolve AND and OR conditions.

Bitmap access is available only with the CBO.


Note:

Bitmap indexes and bitmap join indexes are available only if you have purchased the Oracle9i Enterprise Edition.


See Also:

Oracle9i Data Warehousing Guide for more information about bitmap indexes

Cluster Scans

A cluster scan is used to retrieve, from a table stored in an indexed cluster, all rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data block. To perform a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.

Hash Scans

A hash scan is used to locate rows in a hash cluster, based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data block. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

Sample Table Scans

A sample table scan retrieves a random sample of data from a table. This access path is used when a statement's FROM clause includes the SAMPLE clause or the SAMPLE BLOCK clause. To perform a sample table scan when sampling by rows (the SAMPLE clause), Oracle reads a specified percentage of rows in the table. To perform a sample table scan when sampling by blocks (the SAMPLE BLOCK clause), Oracle reads a specified percentage of table blocks.

Oracle does not support sample table scans when the query involves a join or a remote table. However, you can perform an equivalent operation by using a CREATE TABLE AS SELECT query to materialize a sample of an underlying table. You then rewrite the original query to refer to the newly created table sample. Additional queries can be written to materialize samples for other tables. Sample table scans require the CBO.
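The workaround described above can be sketched as follows (the table name employees_sample and the join are illustrative):

```sql
-- Materialize a 1% row sample so it can participate in joins.
CREATE TABLE employees_sample AS
  SELECT * FROM employees SAMPLE (1);

-- Rewrite the original join to use the materialized sample.
SELECT s.last_name, d.department_name
FROM employees_sample s, departments d
WHERE s.department_id = d.department_id;
```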

Example 1-13 uses a sample table scan to access 1% of the employees table, sampling by blocks.

Example 1-13 Sample Table Scan




SELECT *
FROM employees SAMPLE BLOCK (1);

The EXPLAIN PLAN output for this statement might look like this:




-------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |     1 |    68 |     3  (34)|
|   1 | TABLE ACCESS SAMPLE | EMPLOYEES |     1 |    68 |     3  (34)|
-------------------------------------------------------------------------



How the CBO Chooses an Access Path

The CBO chooses an access path based on the following factors:

  • The available access paths for the statement
  • The estimated cost of executing the statement, using each access path or combination of paths

To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause and its FROM clause for the SAMPLE or SAMPLE BLOCK clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan, using the statistics for the index, columns, and tables accessible to the statement. Finally, the optimizer chooses the execution plan with the lowest estimated cost.

When choosing an access path, the CBO is influenced by the following:

  • Optimizer Hints

    The optimizer's choice among available access paths can be overridden with hints, except when the statement's FROM clause contains SAMPLE or SAMPLE BLOCK.

    See Also:

    Chapter 5, "Optimizer Hints" for information about hints in SQL statements

  • Old Statistics

    For example, if a table has not been analyzed since it was created, and if it has fewer than DB_FILE_MULTIBLOCK_READ_COUNT blocks under the high water mark, then the optimizer thinks that the table is small and uses a full table scan. Review the LAST_ANALYZED and BLOCKS columns in the ALL_TABLES view to see what the statistics reflect.

Choosing an Access Path Examples

This section discusses how the optimizer chooses an access path.

Case 1

In Example 1-14 the query uses an equality condition in its WHERE clause to select all employees named Jackson.

Example 1-14 Access Path Selection




SELECT *
FROM employees
WHERE last_name = 'JACKSON';

If the last_name column is a unique or primary key, then the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key.

Case 2

Consider again the query in Example 1-14. If the last_name column is not a unique or primary key, then the optimizer can use the following statistics to estimate the query's selectivity:

  • USER_TAB_COLUMNS.NUM_DISTINCT, which is the number of values for each column in the table
  • USER_TABLES.NUM_ROWS, which is the number of rows in each table

By dividing the number of rows in the employees table by the number of distinct values in the last_name column, the optimizer estimates what percentage of employees have the same name. By assuming that the last_name values are distributed uniformly, the optimizer uses this percentage as the estimated selectivity of the query.
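The arithmetic described above can be sketched in Python. This is a simplified illustration of the uniform-distribution assumption, not Oracle's actual code, and the statistic values below are made-up examples standing in for USER_TABLES.NUM_ROWS and USER_TAB_COLUMNS.NUM_DISTINCT:

```python
# Sketch: equality-predicate selectivity under the uniform-distribution assumption.

def equality_selectivity(num_distinct: int) -> float:
    """Estimated fraction of rows matching `column = <value>`,
    assuming values are distributed uniformly across the column."""
    return 1.0 / num_distinct

def estimated_rows(num_rows: int, num_distinct: int) -> float:
    """Estimated cardinality of the equality predicate."""
    return num_rows * equality_selectivity(num_distinct)

# Hypothetical statistics: 107 rows in employees, 98 distinct last_name values.
print(round(estimated_rows(107, 98), 2))  # about 1.09 rows expected per name
```

With skewed data this estimate can be badly wrong, which is why histograms exist; the sketch only captures the default uniform assumption.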

Case 3

The following query selects all employees with employee ID numbers less than 7500:




SELECT *
FROM employees
WHERE employee_id < 7500;

To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the employee_id column, if available. These statistics can be found in the USER_TAB_COL_STATISTICS view or the USER_TAB_COLUMNS view. The optimizer assumes that employee_id values are distributed evenly in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.
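The linear-interpolation step above can be sketched as follows. The LOW_VALUE and HIGH_VALUE figures are hypothetical statistics, chosen only to make the arithmetic visible:

```python
# Sketch: selectivity of `column < boundary`, assuming values are spread
# evenly between the column's LOW_VALUE and HIGH_VALUE statistics.

def less_than_selectivity(boundary: float, low_value: float, high_value: float) -> float:
    fraction = (boundary - low_value) / (high_value - low_value)
    return min(max(fraction, 0.0), 1.0)   # clamp to a valid selectivity

# If employee_id ranges from 7000 to 8000, `employee_id < 7500`
# is estimated to select half the rows:
print(less_than_selectivity(7500, 7000, 8000))  # 0.5
```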

Case 4

The following query uses a bind variable rather than a literal value for the boundary value in the WHERE clause condition:




SELECT *
FROM employees
WHERE employee_id < :e1;

The optimizer does not know the value of the bind variable e1. The value of e1 might be different each time the query is executed. For this reason, the optimizer cannot use the means described in the previous example to determine selectivity of this query. In this case, the optimizer heuristically guesses a small value for the selectivity, using an internal default value. The optimizer makes this assumption whenever a bind variable is used as a boundary value in a condition with one of the following operators: <, >, <=, or >=.

The optimizer's treatment of bind variables can cause it to choose different execution plans for SQL statements that differ only in the use of bind variables rather than constants. For example, the optimizer might choose different execution plans for an embedded SQL statement with a bind variable in an Oracle precompiler program and the same SQL statement with a constant in SQL*Plus.

Case 5

The following query uses two bind variables as boundary values in the condition with the BETWEEN operator:




SELECT *
FROM employees
WHERE employee_id BETWEEN :low_e AND :high_e;

The optimizer rewrites the BETWEEN condition as the following two conditions:




employee_id >= :low_e
employee_id <= :high_e

The optimizer heuristically estimates a small selectivity (an internal default value) for indexed columns in order to favor the use of the index.

Case 6

The following query uses the BETWEEN operator to select all employees with employee ID numbers between 7500 and 7800:




SELECT *
FROM employees
WHERE employee_id BETWEEN 7500 AND 7800;

To determine the selectivity of this query, the optimizer rewrites the WHERE clause condition into the following two conditions:




employee_id >= 7500
employee_id <= 7800

The optimizer estimates the individual selectivity of each condition, using the means described in Case 3 (both boundary values here are literals, not bind variables). The optimizer then uses these selectivities (S1 and S2) and the absolute value function (ABS) to estimate the selectivity (S) of the BETWEEN condition, as follows:




S = ABS(S1 + S2 - 1)
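Plugging numbers into this formula shows how the two one-sided selectivities combine. The sketch below uses the same hypothetical LOW_VALUE/HIGH_VALUE statistics as the earlier cases:

```python
# Sketch: BETWEEN selectivity via S = ABS(S1 + S2 - 1), with S1 and S2
# computed by linear interpolation between LOW_VALUE and HIGH_VALUE.

def between_selectivity(low: float, high: float,
                        low_value: float, high_value: float) -> float:
    width = high_value - low_value
    s1 = (high_value - low) / width    # selectivity of col >= low
    s2 = (high - low_value) / width    # selectivity of col <= high
    return abs(s1 + s2 - 1)

# With LOW_VALUE=7000 and HIGH_VALUE=8000, BETWEEN 7500 AND 7800
# covers 300 of the 1000-wide range:
print(round(between_selectivity(7500, 7800, 7000, 8000), 3))  # 0.3
```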



Understanding Joins

Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE clause.

This section discusses:

  • How the CBO Executes Join Statements
  • How the CBO Chooses the Join Method
  • How the CBO Chooses Execution Plans for Join Types
  • Join Methods:
    • Nested Loop Joins
    • Hash Joins
    • Sort Merge Joins
    • Cartesian Joins
    • Outer Joins

      See Also:

      A discussion of joins in the Oracle9i SQL Reference

How the CBO Executes Join Statements

To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:

Access Paths

As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement.

Join Method

To join each pair of row sources, Oracle must perform a join operation. Join methods include nested loop, sort merge, Cartesian, and hash joins.

Join Order

To execute a statement that joins more than two tables, Oracle joins two of the tables and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.

See Also:

"Understanding Access Paths for the RBO"

"Understanding Access Paths for the CBO"

How the CBO Chooses the Join Method

The optimizer estimates the cost of each join method and chooses the method with the least cost. If a join returns many rows, then the optimizer considers the following three factors:

  • A nested loop join is inefficient when a join returns a large number of rows (typically, more than 10,000 rows is considered large), and the optimizer might choose not to use it. The cost of a nested loop join is calculated by the following formula:



    cost = access cost of A + (access cost of B * number of rows from A)
  • If you are using the CBO, then a hash join is the most efficient join when a join returns a large number of rows. The cost of a hash join is calculated by the following formula:



    cost = (access cost of A * number of hash partitions of B) + access cost of B
  • If you are using the RBO, then a sort merge join is the most efficient join when a join returns a large number of rows. The cost of a sort merge join is calculated by the following formula:



    cost = access cost of A + access cost of B + (sort cost of A + sort cost of B)
    When the data is presorted, the sort costs are both zero.


    Note:

    Oracle Corporation strongly advises the use of cost-based optimization. Rule-based optimization will be deprecated in a future release.

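The three cost formulas above can be compared side by side in a small sketch. The access and sort costs below are illustrative numbers, not values Oracle would actually compute:

```python
# Sketch: the three join cost formulas, transcribed directly from the text.

def nested_loop_cost(access_a: float, access_b: float, rows_a: int) -> float:
    return access_a + access_b * rows_a

def hash_join_cost(access_a: float, access_b: float, partitions_b: int) -> float:
    return access_a * partitions_b + access_b

def sort_merge_cost(access_a: float, access_b: float,
                    sort_a: float = 0, sort_b: float = 0) -> float:
    # When the data is presorted, sort_a and sort_b are both zero.
    return access_a + access_b + (sort_a + sort_b)

# With a large outer row count, the nested loop term dominates:
print(nested_loop_cost(10, 2, 10_000))  # 20010
print(hash_join_cost(10, 100, 1))       # 110
print(sort_merge_cost(10, 100, 5, 50))  # 165
```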

How the CBO Chooses Execution Plans for Join Types

The following considerations apply to both the cost-based and rule-based approaches:

  • The optimizer first determines whether joining two or more tables definitely results in a row source containing at most one row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY constraints on the tables. If such a situation exists, then the optimizer places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables.
  • For join statements with outer join conditions, the table with the outer join operator must come after the other table in the condition in the join order. The optimizer does not consider join orders that violate this rule.

With the CBO, the optimizer generates a set of execution plans, according to possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in the following ways:

  • The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory. The optimizer estimates these costs using the statistics in the data dictionary.
  • The cost of a sort merge join is based largely on the cost of reading all the sources into memory and sorting them.

The optimizer also considers other factors when determining the cost of each operation. For example:

  • A smaller sort area size is likely to increase the cost for a sort merge join because sorting takes more CPU time and I/O in a smaller sort area. Sort area size is specified by the initialization parameter SORT_AREA_SIZE.


    Note:

    Oracle does not recommend using the SORT_AREA_SIZE parameter unless the instance is configured with the shared server option. Oracle recommends instead that you enable automatic sizing of SQL work areas by setting PGA_AGGREGATE_TARGET. SORT_AREA_SIZE is retained only for backward compatibility.


  • A larger multiblock read count is likely to decrease the cost for a sort merge join in relation to a nested loop join. If a large number of sequential blocks can be read from disk in a single I/O, then an index on the inner table for the nested loop join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT.

With the CBO, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for an outer join, then the optimizer ignores the hint and chooses another order. Also, you can override the optimizer's choice of join method with hints.

See Also:

Chapter 5, "Optimizer Hints" for more information about optimizer hints

How the CBO Executes Anti-joins

An anti-join returns rows from the left side of the predicate for which there are no corresponding rows on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments:

Example 1-15 Anti-join on Employees and Departments




SELECT * FROM employees
WHERE department_id NOT IN
  (SELECT department_id FROM departments
   WHERE location_id = 1700);

The optimizer uses a nested loops algorithm for NOT IN subqueries by default. However, if the MERGE_AJ, HASH_AJ, or NL_AJ hint is used and various required conditions are met, the NOT IN uncorrelated subquery can be changed into a sort merge or hash anti-join.
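A hash anti-join can be sketched as follows: hash the subquery's keys, then return only the driving rows whose key is absent. The data and function names are hypothetical; note the NULL wrinkle of NOT IN, which the sketch models with None:

```python
# Sketch: hash anti-join with NOT IN semantics.

def hash_anti_join(left_rows, key, subquery_keys):
    keys = set(subquery_keys)
    # NOT IN semantics: if the subquery returns a NULL, no left row can
    # be proven "not in" the set, so nothing qualifies.
    if None in keys:
        return []
    return [row for row in left_rows if key(row) not in keys]

# Hypothetical data: (last_name, department_id) and subquery department_ids.
employees = [("King", 90), ("Hall", 50), ("Fay", 20)]
depts_at_1700 = [90, 50]
print(hash_anti_join(employees, lambda e: e[1], depts_at_1700))
# [('Fay', 20)]
```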

How the CBO Executes Semi-joins

A semi-join returns rows that match an EXISTS subquery without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example:

Example 1-16 Semi-join on Employees and Departments




SELECT * FROM departments
WHERE EXISTS
  (SELECT * FROM employees
   WHERE departments.department_id = employees.department_id
   AND employees.salary > 2500);

In this query, only one row needs to be returned from departments, even though many rows in employees might match the subquery. If there is no index on the salary column in employees, then a semi-join can be used to improve query performance.

The optimizer uses a nested loops algorithm by default for IN or EXISTS subqueries that cannot be merged with the containing query. However, if the MERGE_SJ, HASH_SJ, or NL_SJ hint is used and various required conditions are met, the subquery can be changed into a sort merge or hash semi-join.
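A hash semi-join can be sketched the same way as the anti-join, with the test inverted: each driving row is returned at most once, no matter how many subquery rows match it. Data and names below are hypothetical:

```python
# Sketch: hash semi-join with EXISTS semantics.

def hash_semi_join(outer_rows, key, subquery_keys):
    keys = set(subquery_keys)   # duplicates on the subquery side collapse here
    return [row for row in outer_rows if key(row) in keys]

departments = [(10, "Admin"), (20, "Marketing"), (30, "IT")]
# department_ids of employees with salary > 2500; duplicates are harmless:
emp_dept_ids = [20, 20, 30]
print(hash_semi_join(departments, lambda d: d[0], emp_dept_ids))
# [(20, 'Marketing'), (30, 'IT')]
```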

See Also:

Chapter 5, "Optimizer Hints" for information about optimizer hints


Note:

Semi-join transformation cannot be done if the subquery is on an OR branch. This also applies to anti-joins.


How the CBO Executes Star Queries

Some data warehouses are designed around a star schema, which includes a large fact table and several small dimension (lookup) tables. The fact table stores primary information. Each dimension table stores information about an attribute in the fact table.

A star query is a join between a fact table and a number of lookup tables. Each lookup table is joined by its primary keys to the corresponding foreign keys of the fact table, but the lookup tables are not joined to each other.

The CBO recognizes star queries and generates efficient execution plans for them. Star queries are not recognized by the RBO.

A typical fact table contains keys and measures. For example, a simple fact table might contain the measure Sales, and the keys Time, Product, and Market. In this case there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, typically contains information about each product number that appears in the fact table.

A star join uses a join of foreign keys in a fact table to the corresponding primary keys in dimension tables. The fact table normally has a concatenated index on the foreign key columns to facilitate this type of join, or it has a separate bitmap index on each foreign key column.

See Also:

Oracle9i Data Warehousing Guide for more information about tuning star queries

Nested Loop Joins

Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.

It is very important to ensure that the inner table is driven from (dependent on) the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop, degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.

See Also:

"Cartesian Joins"

A nested loop join involves the following steps:

  1. The optimizer determines the driving table and designates it as the outer table.
  2. The other table is designated as the inner table.
  3. For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in the outer table and the inner loop is for every row in the inner table. The outer loop appears before the inner loop in the execution plan, as follows:



    NESTED LOOPS
      outer_loop
        inner_loop

Nested Loop Example

This section discusses the outer and inner loops for the following nested loops in the query in Example 1-3.




...
|   2 |  NESTED LOOPS                 |           |    3 |   141 |     7  (15)|
|*  3 |   TABLE ACCESS FULL           | EMPLOYEES |    3 |    60 |     4  (25)|
|   4 |   TABLE ACCESS BY INDEX ROWID | JOBS      |   19 |   513 |     2  (50)|
|*  5 |    INDEX UNIQUE SCAN          | JOB_ID_PK |    1 |       |            |
...

In this example, the outer loop retrieves all the rows of the employees table. For every employee retrieved by the outer loop, the inner loop retrieves the associated row in the jobs table.

Outer loop

In the execution plan in Example 1-4, the outer loop and the equivalent statement are as follows:




3 | TABLE ACCESS FULL | EMPLOYEES

SELECT e.employee_id, e.salary
FROM employees e
WHERE e.employee_id < 103

Inner loop

The execution plan in Example 1-4 shows the inner loop being iterated for every row fetched from the outer loop, as follows:




4 | TABLE ACCESS BY INDEX ROWID | JOBS
5 |  INDEX UNIQUE SCAN          | JOB_ID_PK

SELECT j.job_title
FROM jobs j
WHERE e.job_id = j.job_id


When the Optimizer Uses Nested Loop Joins

The optimizer uses nested loop joins when joining a small number of rows with a good driving condition between the two tables. You drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.

The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.

The inner loop is iterated for every row returned from the outer loop, ideally by an index scan. If the access path for the inner loop is not dependent on the outer loop, then you can end up with a Cartesian product; for every iteration of the outer loop, the inner loop produces the same set of rows. Therefore, you should use other join methods when two independent row sources are joined together.
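The outer/inner iteration described above can be sketched directly: a full scan of the driving row source, with an index-style lookup into the inner table for each driving row. The table contents are hypothetical, and the dict stands in for an index on the inner table's join key:

```python
# Sketch: nested loop join with an indexed inner table.

employees = [(100, "AD_PRES"), (101, "AD_VP"), (102, "AD_VP")]    # outer (driving)
jobs_index = {"AD_PRES": "President", "AD_VP": "Vice President"}  # inner, keyed on job_id

def nested_loop_join(outer_rows, inner_index):
    result = []
    for emp_id, job_id in outer_rows:     # outer loop: every driving row
        title = inner_index.get(job_id)   # inner lookup: like an index unique scan
        if title is not None:
            result.append((emp_id, title))
    return result

print(nested_loop_join(employees, jobs_index))
# [(100, 'President'), (101, 'Vice President'), (102, 'Vice President')]
```

If `inner_index` were replaced by a full scan of the inner table on every iteration, the cost would grow with the product of the two row counts, which is exactly why the driving condition matters.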

Nested Loop Join Hints

If the optimizer is choosing to use some other join method, you can use the USE_NL(table1 table2) hint, where table1 and table2 are the aliases of the tables being joined.

For some SQL examples, the data is small enough for the optimizer to prefer full table scans and use hash joins. This is the case for the SQL example shown in Example 1-17, "Hash Joins". However, you can add a USE_NL hint that changes the join method to nested loop. For more information on the USE_NL hint, see "USE_NL".

Nesting Nested Loops

The outer loop of a nested loop can be a nested loop itself. You can nest two or more outer loops together to join as many tables as needed. Each loop is a data access method, as follows:




SELECT STATEMENT
 NESTED LOOP 3
  NESTED LOOP 2          (OUTER LOOP 3.1)
   NESTED LOOP 1         (OUTER LOOP 2.1)
    OUTER LOOP 1.1 - #1
    INNER LOOP 1.2 - #2
   INNER LOOP 2.2 - #3
  INNER LOOP 3.2 - #4

Hash Joins

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

However, if the hash table grows too big to fit into memory, then the optimizer breaks it up into different partitions. As the partitions exceed allocated memory, parts are written to temporary segments on disk. Larger temporary extent sizes lead to improved I/O when writing the partitions to disk; the recommended temporary extent size is about 1 MB. Temporary extent size is specified by INITIAL and NEXT for permanent tablespaces and by UNIFORM SIZE for temporary tablespaces.

After the hash table is complete, the following processes occur:

  1. The second, larger table is scanned.
  2. It is broken up into partitions like the smaller table.
  3. The partitions are written to disk.

When the hash table build is complete, it is possible that an entire hash table partition is resident in memory. Then, you do not need to build the corresponding partition for the second (larger) table. When that table is scanned, rows that hash to the resident hash table partition can be joined and returned immediately.

Each hash table partition is then read into memory, and the following processes occur:

  1. The corresponding partition for the second table is scanned.
  2. The hash table is probed to return the joined rows.

This process is repeated for the rest of the partitions. The cost can increase to two read passes over the data and one write pass over the data.

If the hash table does not fit in memory, it is possible that parts of it may need to be swapped in and out, depending on the rows retrieved from the second table. Performance for this scenario can be extremely poor.
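The in-memory build and probe phases can be sketched as follows. This ignores partitioning and spills to disk, so it models only the best case where the build side fits in memory; the data is hypothetical:

```python
# Sketch: in-memory hash join (build phase + probe phase).
from collections import defaultdict

def hash_join(build_rows, build_key, probe_rows, probe_key):
    # Build phase: hash the smaller input on the join key.
    table = defaultdict(list)
    for row in build_rows:
        table[build_key(row)].append(row)
    # Probe phase: scan the larger input and look up matches.
    return [(b, p) for p in probe_rows for b in table.get(probe_key(p), [])]

orders = [(1, "cust_a"), (2, "cust_b")]        # smaller: build side
order_items = [(1, 10.0), (1, 5.0), (2, 7.5)]  # larger: probe side
pairs = hash_join(orders, lambda o: o[0], order_items, lambda l: l[0])
print(len(pairs))  # 3 joined rows
```

Each input is read exactly once, which is the "single read pass over the data for the two tables" cost described above.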

When the Optimizer Uses Hash Joins

The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions is true:

  • A large amount of data needs to be joined.
  • A large fraction of the table needs to be joined.

In Example 1-17, the table orders is used to build the hash table, and order_items is the larger table, which is scanned later.

Example 1-17 Hash Joins




SELECT o.customer_id, l.unit_price * l.quantity
FROM orders o, order_items l
WHERE l.order_id = o.order_id;

--------------------------------------------------------------------------
| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |   665 | 13300 |     8  (25)|
|*  1 |  HASH JOIN           |             |   665 | 13300 |     8  (25)|
|   2 |   TABLE ACCESS FULL  | ORDERS      |   105 |   840 |     4  (25)|
|   3 |   TABLE ACCESS FULL  | ORDER_ITEMS |   665 |  7980 |     4  (25)|
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("L"."ORDER_ID"="O"."ORDER_ID")



Hash Join Hints

Apply the USE_HASH hint to advise the optimizer to use a hash join when joining two tables together. If you are having trouble getting the optimizer to use hash joins, investigate the values for the HASH_AREA_SIZE and HASH_JOIN_ENABLED parameters.


Note:

Oracle does not recommend using the HASH_AREA_SIZE parameter unless the instance is configured with the shared server option. Oracle recommends instead that you enable automatic sizing of SQL work areas by setting PGA_AGGREGATE_TARGET. HASH_AREA_SIZE is retained only for backward compatibility.


For more information on the USE_HASH hint, see "USE_HASH".

Sort Merge Joins

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

  • The row sources are sorted already.
  • A sort operation does not have to be done.

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.

Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) such as <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.

In a merge join, there is no concept of a driving table. The join consists of two steps:

  1. Sort join operation: Both the inputs are sorted on the join key.
  2. Merge join operation: The sorted lists are merged together.

If the input is already sorted by the join column, then a sort join operation is not performed for that row source.
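The two steps above, sort then merge, can be sketched as follows. The merge advances a cursor on each sorted input and emits all pairs for each run of equal keys; the data is hypothetical:

```python
# Sketch: sort merge equijoin on the first field of each tuple.

def sort_merge_join(left, right, key=lambda r: r[0]):
    # Sort join operation: both inputs sorted on the join key.
    left, right = sorted(left, key=key), sorted(right, key=key)
    out, i, j = [], 0, 0
    # Merge join operation: advance the side with the smaller key.
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit all pairs.
            i_end, j_end = i, j
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            while j_end < len(right) and key(right[j_end]) == rk:
                j_end += 1
            out += [(l, r) for l in left[i:i_end] for r in right[j:j_end]]
            i, j = i_end, j_end
    return out

pairs = sort_merge_join([(2, "b"), (1, "a"), (2, "c")], [(2, "x"), (3, "y")])
print(pairs)  # [((2, 'b'), (2, 'x')), ((2, 'c'), (2, 'x'))]
```

If either input arrives presorted, its `sorted` call is the part that could be skipped, which is exactly the saving the text describes.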

When the Optimizer Uses Sort Merge Joins

The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

  • The join condition between two tables is not an equi-join.
  • OPTIMIZER_MODE is set to RULE.
  • HASH_JOIN_ENABLED is false.
  • Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
  • The optimizer thinks that the cost of a hash join is higher, based on the settings of HASH_AREA_SIZE and SORT_AREA_SIZE.


    Note:

    Oracle does not recommend using the HASH_AREA_SIZE and SORT_AREA_SIZE parameters unless the instance is configured with the shared server option. Oracle recommends instead that you enable automatic sizing of SQL work areas by setting PGA_AGGREGATE_TARGET. HASH_AREA_SIZE and SORT_AREA_SIZE are retained only for backward compatibility.


Sort Merge Join Hints

To advise the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.

There are situations where it is better to override the optimizer with the USE_MERGE hint. For example, the optimizer can choose a full scan on a table and avoid a sort operation in a query. However, there is an increased cost because a large table is accessed through an index and single block reads, as opposed to faster access through a full table scan.

For more information on the USE_MERGE hint, see "USE_MERGE".

Cartesian Joins

A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.

When the Optimizer Uses Cartesian Joins

The optimizer uses Cartesian joins when it is asked to join two tables with no join conditions. In some cases, a common filter condition between the two tables could be picked up by the optimizer as a possible join condition. This is even more dangerous, because the joins are not flagged in the execution plan as being a Cartesian product.

Cartesian joins generally result from poorly written SQL. You should specify join criteria and avoid Cartesian products. In some situations, a query can involve a large number of tables and an extra table is contained in the FROM clause, but not in the WHERE clause. With such queries, a DISTINCT clause can weed out multiple rows. However, while the DISTINCT clause can be used to remove the extra tuples generated by the Cartesian product, the performance can be severely degraded.

In a SQL query where the inner table of a nested loop operation is not driven from the outer table, but from an independent row source, the rows accessed can be the same as in a Cartesian product. Because the join condition is present but is applied after accessing the table, the result is not a Cartesian product. However, the cost of accessing the table (rows accessed) is about the same.

Cartesian Join Hints

Applying the ORDERED hint causes the optimizer to use a Cartesian join. By specifying a table before its join table is specified, the optimizer does a Cartesian join. For more information on the ORDERED hint, see "ORDERED".

Outer Joins

An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition and also returns some or all of those rows from one table for which no rows from the other satisfy the join condition.

Nested Loop Outer Joins

This operation is used when an outer join is used between two tables. The outer join returns the outer (preserved) table rows, even when there are no corresponding rows in the inner (optional) table.

In a regular outer join, the optimizer chooses the order of tables (driving and driven) based on the cost. However, in a nested loop outer join, the order of tables is determined by the join condition. The outer table, with rows that are being preserved, is used to drive to the inner table.

The optimizer uses nested loop joins to process an outer join in the following circumstances:

  • It is possible to drive from the outer table to inner table.
  • Data volume is low enough to make the nested loop method efficient.

For an example of a nested loop outer join, you can add the USE_NL hint to Example 1-18 to ensure that a nested loop is used. For example:




SELECT /*+ USE_NL(c o) */ cust_last_name, sum(nvl2(o.customer_id,0,1)) "Count"



Hash Join Outer Joins

The optimizer uses hash joins for processing an outer join if the data volume is high enough to make the hash join method efficient or if it is not possible to drive from the outer table to the inner table.

As with a nested loop outer join, the order of tables is not determined by the cost, but by the join condition. The outer table (with preserved rows) is used to build the hash table, and the inner table is used to probe the hash table.

Example 1-18 shows a typical hash join outer join query. In this example, all the customers with credit limits greater than 1000 are queried. An outer join is needed so that you do not miss the customers who do not have any orders.

Example 1-18 Hash Join Outer Joins




SELECT cust_last_name, sum(nvl2(o.customer_id,0,1)) "Count"
FROM customers c, orders o
WHERE c.credit_limit > 1000
AND c.customer_id = o.customer_id(+)
GROUP BY cust_last_name;




-------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |   168 |  3192 |    11  (28)|
|   1 |  SORT GROUP BY       |           |   168 |  3192 |    11  (28)|
|*  2 |   HASH JOIN OUTER    |           |   260 |  4940 |    10  (20)|
|*  3 |    TABLE ACCESS FULL | CUSTOMERS |   260 |  3900 |     6  (17)|
|*  4 |    TABLE ACCESS FULL | ORDERS    |   105 |   420 |     4  (25)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C"."CUSTOMER_ID"="O"."CUSTOMER_ID"(+))
   3 - filter("C"."CREDIT_LIMIT">1000)
   4 - filter("O"."CUSTOMER_ID"(+)>0)



The query looks for customers which satisfy various conditions. An outer join returns NULL for the inner table columns along with the outer (preserved) table rows when it does not find any corresponding rows in the inner table. This operation finds all the customers rows that do not have any orders rows.

In this case, the outer join condition is the following:




customers.customer_id = orders.customer_id(+)










The components of this condition represent the following:

  • The outer table is customers.
  • The inner table is orders.
  • The join preserves the customers rows, including those rows without a corresponding row in orders.
  • The hash table is built using customers.
  • The hash table is probed using orders.

You could use a NOT EXISTS subquery to return the rows. However, because you are querying all the rows in the table, the hash join performs better (unless the NOT EXISTS subquery is not nested).
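The build/probe/NULL-extend behavior listed above can be sketched as follows. The hash table is built on the preserved (customers) side and probed with orders; unmatched preserved rows are emitted with None standing in for the NULL-extended inner columns. Data and names are hypothetical:

```python
# Sketch: hash outer join that builds on the preserved side.

def hash_outer_join(preserved, pres_key, optional, opt_key):
    table = {}
    for row in preserved:                  # build phase on the preserved side
        table.setdefault(pres_key(row), []).append(row)
    out, matched = [], set()
    for row in optional:                   # probe phase with the optional side
        k = opt_key(row)
        for p in table.get(k, []):
            out.append((p, row))
            matched.add(k)
    for k, rows in table.items():          # preserved rows with no match
        if k not in matched:
            out.extend((p, None) for p in rows)
    return out

customers = [(1, "Fox"), (2, "Roy")]
orders = [(101, 1)]                        # (order_id, customer_id)
print(hash_outer_join(customers, lambda c: c[0], orders, lambda o: o[1]))
# [((1, 'Fox'), (101, 1)), ((2, 'Roy'), None)]
```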

In Example 1-19, the outer join is to a multitable view. The optimizer cannot drive into the view as in a normal join or push the predicates, so it builds the entire row set of the view.

Example 1-19 Outer Join to a Multitable View

SELECT c.cust_last_name, SUM(revenue)
  FROM customers c, v_orders o
 WHERE c.credit_limit > 2000
   AND o.customer_id(+) = c.customer_id
 GROUP BY c.cust_last_name;
----------------------------------------------------------------------------
| Id | Operation              | Name        | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |             |  144 |  4608 |    16 (32) |
|  1 |  SORT GROUP BY         |             |  144 |  4608 |    16 (32) |
|* 2 |   HASH JOIN OUTER      |             |  663 | 21216 |    15 (27) |
|* 3 |    TABLE ACCESS FULL   | CUSTOMERS   |  195 |  2925 |     6 (17) |
|  4 |    VIEW                | V_ORDERS    |  665 | 11305 |            |
|  5 |     SORT GROUP BY      |             |  665 | 15960 |     9 (34) |
|* 6 |      HASH JOIN         |             |  665 | 15960 |     8 (25) |
|* 7 |       TABLE ACCESS FULL| ORDERS      |  105 |   840 |     4 (25) |
|  8 |       TABLE ACCESS FULL| ORDER_ITEMS |  665 | 10640 |     4 (25) |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access("O"."CUSTOMER_ID"(+)="C"."CUSTOMER_ID")
  3 - filter("C"."CREDIT_LIMIT">2000)
  6 - access("O"."ORDER_ID"="L"."ORDER_ID")
  7 - filter("O"."CUSTOMER_ID">0)

The view definition is as follows:

CREATE OR REPLACE VIEW v_orders AS
SELECT l.product_id, SUM(l.quantity*unit_price) revenue,
       o.order_id, o.customer_id
  FROM orders o, order_items l
 WHERE o.order_id = l.order_id
 GROUP BY l.product_id, o.order_id, o.customer_id;

Sort Merge Outer Joins

When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use a hash join or nested loops join. In that case, it uses a sort merge outer join to perform the join operation.

The optimizer uses sort merge for an outer join if a nested loop join is inefficient. A nested loop join can be inefficient because of data volumes; or, because of sorts already required by other operations, the optimizer may find it cheaper to use a sort merge than a hash join.

Full Outer Joins

A full outer join acts like a combination of the left and right outer joins. In addition to the inner join, rows from both tables that have not been returned in the result of the inner join are preserved and extended with nulls. In other words, full outer joins let you join tables together, yet still show rows that do not have corresponding rows in the joined tables.

The query in Example 1-20 retrieves all departments and all employees in each department, but also includes:

  • Any employees without departments
  • Any departments without employees

Example 1-20 Full Outer Join

SELECT d.department_id, e.employee_id
  FROM employees e
  FULL OUTER JOIN departments d
    ON e.department_id = d.department_id
 ORDER BY d.department_id;
The statement produces the following output:

DEPARTMENT_ID EMPLOYEE_ID
------------- -----------
           10         200
           20         201
           20         202
           30         114
           30         115
           30         116
...
          270
          280
                      178
                      207

125 rows selected.



Setting Cost-Based Optimizer Parameters

This section describes some of the initialization parameters specific to the optimizer. The following sections are especially useful when tuning Oracle applications.

Enabling CBO Features

You enable optimizer features by setting the OPTIMIZER_FEATURES_ENABLE initialization parameter.

OPTIMIZER_FEATURES_ENABLE Parameter

The OPTIMIZER_FEATURES_ENABLE parameter acts as an umbrella parameter for the CBO. It can be used to enable a series of CBO-related features, depending on the release. It accepts one of a list of valid string values corresponding to release numbers, such as 8.0.4, 8.1.5, and so on. For example, the following setting enables the optimizer features of Oracle9i, Release 2 (9.2):

OPTIMIZER_FEATURES_ENABLE=9.2.0;

This setting causes the Release 2 (9.2) optimizer features to be used in generating query plans. For example, you can use the ALL_ROWS or FIRST_ROWS optimizer mode for recursive SQL generated by PL/SQL procedures. Prior to Oracle8i Release 8.1.6, only the RULE or CHOOSE optimizer mode was used for such recursive SQL; even when the user explicitly set the OPTIMIZER_MODE parameter to FIRST_ROWS or ALL_ROWS, CHOOSE mode was used instead.

The OPTIMIZER_FEATURES_ENABLE parameter was introduced in Oracle8, release 8.0.4. Its main goal is to let customers upgrade the Oracle server yet preserve the old behavior of the CBO after the upgrade. For example, when you upgrade the Oracle server from release 8.1.5 to release 8.1.6, the default value of the OPTIMIZER_FEATURES_ENABLE parameter changes from 8.1.5 to 8.1.6. This upgrade results in the CBO enabling optimization features based on 8.1.6, as opposed to 8.1.5.

For plan stability or backward compatibility reasons, you might not want query plans to change because of new optimizer features in a new release. In such a case, you can set the OPTIMIZER_FEATURES_ENABLE parameter to an earlier version. For example, to preserve the behavior of the CBO at release 8.1.5, set the parameter as follows:

OPTIMIZER_FEATURES_ENABLE=8.1.5;

This setting disables all new optimizer features that were added in releases after 8.1.5.


Note:

If you upgrade to a new release and you want to enable the features available with that release, then you do not need to explicitly set the OPTIMIZER_FEATURES_ENABLE parameter.


Oracle Corporation does not recommend explicitly setting the OPTIMIZER_FEATURES_ENABLE parameter to an earlier release. Instead, execution plan or query performance issues should be resolved on a case-by-case basis.

Table 1-3 describes some of the optimizer features that are enabled when you set the OPTIMIZER_FEATURES_ENABLE parameter to each of the following release values.

Table 1-3  Features Included with the OPTIMIZER_FEATURES_ENABLE Parameter

Set to Value  New Features Include
8.0.4         Index fast full scan
              Ordered nested loop join method
8.0.5         No new features
8.0.6         Improved outer join cardinality estimation
8.1.4         No new features
8.1.5         Improved verification of NULLs inclusion in B-tree indexes
8.1.6         Use of FIRST_ROWS or ALL_ROWS mode for user recursive SQL
              Random distribution of the left input of a nested loop join
              Improved row-length calculation
              Improved method of computing selectivity based on histograms
              Partition pruning based on predicates in a subquery
8.1.7         Common subexpression optimization
              Use of statistics of a column embedded in some selected
              functions, such as TO_CHAR, to compute selectivity
              Improved partition statistics aggregation
9.0.1         Peeking of user-defined bind variables
              Complex view merging
              Push-join predicate
              Consideration of bitmap access paths for tables with only
              B-tree indexes
              Subquery unnesting
              Index joins
9.0.2         Parallel broadcast in parallel query using joins to small
              reference or lookup tables

Peeking of User-Defined Bind Variables

The CBO peeks at the values of user-defined bind variables on the first invocation of a cursor. This feature lets the optimizer determine the selectivity of any WHERE clause condition as if literals had been used instead of bind variables. On subsequent invocations of the cursor, no peeking takes place, and the cursor is shared, based on the standard cursor-sharing criteria, even if subsequent invocations use different bind values.

When bind variables are used in a statement, it is assumed that cursor sharing is intended and that different invocations are supposed to use the same execution plan. If different invocations of the cursor would significantly benefit from different execution plans, then bind variables may have been used inappropriately in the SQL statement.
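The peeking behavior described above can be illustrated with a SQL*Plus sketch (the table and values are illustrative only, assuming the sample HR schema):

```sql
VARIABLE dept NUMBER

EXEC :dept := 10
-- First execution: the optimizer peeks at :dept = 10 and
-- builds a plan optimized for that value.
SELECT COUNT(*) FROM employees WHERE department_id = :dept;

EXEC :dept := 50
-- Subsequent executions share the cursor and reuse the same
-- plan, even though the bind value is now different.
SELECT COUNT(*) FROM employees WHERE department_id = :dept;
```

If the data in department_id is heavily skewed, the plan chosen for the first value may be a poor fit for later values, which is why bind variables suit statements meant to share one plan.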

Controlling the Behavior of the CBO

This section lists some initialization parameters that can be used to control the behavior of the cost-based optimizer. These parameters can be used to enable various optimizer features in order to improve the performance of SQL execution.

CURSOR_SHARING

This parameter converts literal values in SQL statements to bind variables. Converting the values improves cursor sharing and can affect the execution plans of SQL statements. The optimizer generates the execution plan based on the presence of the bind variables and not the actual literal values.
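For example, with CURSOR_SHARING set to FORCE, statements that differ only in their literals can share one cursor; a sketch (the rewritten bind name shown in the comment is only indicative of the system-generated form):

```sql
ALTER SESSION SET cursor_sharing = FORCE;

SELECT * FROM employees WHERE employee_id = 101;
SELECT * FROM employees WHERE employee_id = 102;

-- Both statements are processed internally as one shared form,
-- roughly:  SELECT * FROM employees WHERE employee_id = :"SYS_B_0"
```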

DB_FILE_MULTIBLOCK_READ_COUNT

This parameter specifies the number of blocks that are read in a single I/O during a full table scan or index fast full scan. The optimizer uses the value of DB_FILE_MULTIBLOCK_READ_COUNT to cost full table scans and index fast full scans. Larger values result in a cheaper cost for full table scans and can result in the optimizer choosing a full table scan over an index scan.

HASH_AREA_SIZE

This parameter specifies the amount of memory (in bytes) to be used for hash joins. The CBO uses this parameter to cost a hash join operation. Larger values for HASH_AREA_SIZE reduce the cost of hash joins.


Note:

Oracle does not recommend using the HASH_AREA_SIZE parameter unless the instance is configured with the shared server option. Oracle recommends instead that you enable automatic sizing of SQL work areas by setting PGA_AGGREGATE_TARGET. HASH_AREA_SIZE is retained only for backward compatibility.
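A minimal init.ora sketch of the recommended alternative (the target value shown is illustrative, not a recommendation):

```
# init.ora fragment: size all SQL work areas automatically
# instead of setting HASH_AREA_SIZE or SORT_AREA_SIZE manually.
PGA_AGGREGATE_TARGET = 100M
WORKAREA_SIZE_POLICY = AUTO
```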


HASH_JOIN_ENABLED

This parameter can be used to enable or disable hash joins as a join method chosen by the optimizer. When set to true, the optimizer considers hash joins as a possible join method. The CBO chooses a hash join if its cost is better than that of other join methods, such as nested loops or sort merge joins.

OPTIMIZER_INDEX_CACHING

This parameter controls the costing of an index probe in conjunction with a nested loop. The range of values 0 to 100 for OPTIMIZER_INDEX_CACHING indicates the percentage of index blocks assumed to be in the buffer cache, which modifies the optimizer's assumptions about index caching for nested loops and IN-list iterators. A value of 100 infers that 100% of the index blocks are likely to be found in the buffer cache, and the optimizer adjusts the cost of an index probe or nested loop accordingly. Use caution with this parameter, because execution plans can change in favor of index caching.

OPTIMIZER_INDEX_COST_ADJ

This parameter can be used to adjust the cost of index probes. The range of values is 1 to 10000. The default value is 100, which means that indexes are evaluated as an access path based on the normal costing model. A value of 10 means that the cost of an index access path is one-tenth the normal cost of an index access path.
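The adjustment can be tried at session level before changing it instance-wide; a sketch (the value 10 is illustrative):

```sql
-- Make index access paths appear one-tenth of their normal cost
-- for the current session only.
ALTER SESSION SET optimizer_index_cost_adj = 10;
```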

OPTIMIZER_MAX_PERMUTATIONS

This parameter controls the maximum number of permutations that the CBO considers when generating execution plans for SQL statements with joins. The range of values is 4 to 80000. A value of 80000 corresponds to no limit. Setting this parameter to a value less than 1000 normally ensures parse times of a few seconds or less.

The OPTIMIZER_MAX_PERMUTATIONS parameter can be used to reduce parse times for complex SQL statements that join a large number of tables. However, reducing its value can result in the optimizer missing an optimal join permutation.

OPTIMIZER_MODE

This initialization parameter sets the mode of the optimizer at instance startup. The possible values are RULE, CHOOSE, ALL_ROWS, FIRST_ROWS_n, and FIRST_ROWS. For a description of these parameter values, see "OPTIMIZER_MODE Initialization Parameter".
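The mode can also be changed for a single session, which is a common way to try an optimizer goal before setting it at instance startup; a sketch:

```sql
-- Optimize for fastest delivery of the first 10 rows
ALTER SESSION SET optimizer_mode = FIRST_ROWS_10;

-- Or optimize for best overall throughput
ALTER SESSION SET optimizer_mode = ALL_ROWS;
```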


Note:

Oracle Corporation strongly advises the use of cost-based optimization. Rule-based optimization will be deprecated in a future release.


PARTITION_VIEW_ENABLED

This parameter enables the partition view pruning feature. If set to true, then the CBO scans only the required partitions, based on the view predicates or filters.

QUERY_REWRITE_ENABLED

This parameter enables the query rewrite feature, which works in conjunction with materialized views. If set to true, then the CBO considers query rewrites, using materialized views to satisfy the original query. This parameter also controls whether or not function-based indexes are used.
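A sketch of the mechanism, using the order_items table from the earlier examples (the materialized view name is hypothetical):

```sql
ALTER SESSION SET query_rewrite_enabled = TRUE;

-- Hypothetical summary eligible for query rewrite
CREATE MATERIALIZED VIEW mv_product_qty
  ENABLE QUERY REWRITE AS
  SELECT product_id, SUM(quantity) AS total_qty
    FROM order_items
   GROUP BY product_id;

-- A query that aggregates order_items by product_id may now be
-- rewritten by the CBO to read mv_product_qty instead.
```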

SORT_AREA_SIZE

This parameter specifies the amount of memory (in bytes) that is used to perform sorts. If a sort operation is performed and the amount of data to be sorted exceeds the value of SORT_AREA_SIZE, then data beyond that value is written to the temporary tablespace. The CBO uses the value of SORT_AREA_SIZE to cost sort operations, including sort merge joins. Larger values for SORT_AREA_SIZE result in cheaper CBO costs for sort operations.


Note:

Oracle does not recommend using the SORT_AREA_SIZE parameter unless the instance is configured with the shared server option. Oracle recommends instead that you enable automatic sizing of SQL work areas by setting PGA_AGGREGATE_TARGET. SORT_AREA_SIZE is retained only for backward compatibility.


STAR_TRANSFORMATION_ENABLED

This parameter, if set to true, enables the CBO to cost a star transformation for star queries. The star transformation combines the bitmap indexes on the various fact table columns rather than using a Cartesian approach.

See Also:

Oracle9i Database Reference for complete information about each parameter

Overview of the Extensible Optimizer

The extensible optimizer is part of the CBO. It allows the authors of user-defined functions and domain indexes to control the three main components that the CBO uses to select an execution plan: statistics, selectivity, and cost evaluation.

The extensible optimizer lets you:

  • Associate cost function and default costs with domain indexes, indextypes, packages, and standalone functions
  • Associate selectivity function and default selectivity with methods of object types, package functions, and standalone functions
  • Associate statistics collection functions with domain indexes and columns of tables
  • Order predicates with functions based on cost
  • Select a user-defined access path (domain index) for a table based on access cost
  • Use the DBMS_STATS package or the ANALYZE statement to invoke user-defined statistics collection and deletion functions


    Note:

    Oracle Corporation strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine-tune your statistics collection in other ways. Further, the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS. See Oracle9i Supplied PL/SQL Packages and Types Reference for more information on this package.

    However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:

    • To use the VALIDATE or LIST CHAINED ROWS clauses
    • To collect information on freelist blocks

  • Use new data dictionary views to include information about the statistics collection, cost, or selectivity functions associated with columns, domain indexes, indextypes, or functions
  • Add a hint to preserve the order of evaluation for function predicates

    See Also:

    Oracle9i Data Cartridge Developer's Guide for details about the extensible optimizer
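Following the note above, the division of labor between the two tools can be sketched as follows (the schema and table names are illustrative):

```sql
-- Optimizer statistics: use DBMS_STATS
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'OE',
                                tabname => 'CUSTOMERS',
                                cascade => TRUE);  -- also gather index stats
END;
/

-- Non-optimizer tasks: ANALYZE is still required, for example
ANALYZE TABLE customers VALIDATE STRUCTURE;
```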

Understanding User-Defined Statistics

You can define statistics collection functions for domain indexes, individual columns of a table, and user-defined datatypes.

Whenever a domain index is analyzed to gather statistics, Oracle calls the associated statistics collection function. Whenever a column of a table is analyzed, Oracle collects the standard statistics for that column and calls any associated statistics collection function. If a statistics collection function exists for a datatype, then Oracle calls it for each column that has that datatype in the table being analyzed.
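The association itself is made with the ASSOCIATE STATISTICS statement; a sketch, where my_stats_type is a hypothetical statistics type implementing the extensibility interface:

```sql
-- Hypothetical statistics type my_stats_type; Oracle calls its
-- collection methods whenever the column is analyzed.
ASSOCIATE STATISTICS WITH COLUMNS customers.cust_address
  USING my_stats_type;
```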

Understanding User-Defined Selectivity

The selectivity of a predicate in a SQL statement is used to estimate the cost of a particular access path; it is also used to determine the optimal join order. The optimizer cannot compute an accurate selectivity for predicates that contain user-defined operators, because it does not have any information about these operators.

You can define selectivity functions for predicates containing user-defined operators, standalone functions, package functions, or type methods. The optimizer calls the user-defined selectivity function whenever it encounters a predicate that contains the operator, function, or method in one of the following relations with a constant: <, <=, =, >=, >, or LIKE.

Understanding User-Defined Costs

The optimizer cannot compute an accurate estimate of the cost of a domain index, because it does not know the internal storage structure of the index. Also, the optimizer might underestimate the cost of a user-defined function that invokes PL/SQL, uses recursive SQL, accesses a BFILE, or is CPU-intensive.

You can define costs for domain indexes and for user-defined standalone functions, package functions, and type methods. These user-defined costs can be in the form of default costs that the optimizer simply looks up, or they can be full-fledged cost functions that the optimizer calls to compute the cost.
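Default costs and selectivity can also be attached with ASSOCIATE STATISTICS; a sketch for a hypothetical function my_expensive_fn (the numbers are illustrative):

```sql
-- Default cost is (CPU cost, I/O cost, network cost) per call;
-- default selectivity is the percentage of rows expected to
-- satisfy a predicate on the function.
ASSOCIATE STATISTICS WITH FUNCTIONS my_expensive_fn
  DEFAULT COST (10000, 5, 0), DEFAULT SELECTIVITY 1;
```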
