Oracle: On BULK COLLECT

Source: Internet. Published by: 奥一网 (Aoyi Net) online public-inquiry platform. Editor: 程序博客网 (Program Blog Network). Date: 2024/06/05 11:42

By Steven Feuerstein

Best practices for knowing your LIMIT and kicking %NOTFOUND

I have started using BULK COLLECT whenever I need to fetch large volumes of data. This has caused me some trouble with my DBA, however. He is complaining that although my programs might be running much faster, they are also consuming way too much memory. He refuses to approve them for a production rollout. What's a programmer to do?

The most important thing to remember when you learn about and start to take advantage of features such as BULK COLLECT is that there is no free lunch. There is almost always a trade-off to be made somewhere. The trade-off with BULK COLLECT, like so many other performance-enhancing features, is "run faster but consume more memory."

Specifically, memory for collections is stored in the program global area (PGA), not the system global area (SGA). SGA memory is shared by all sessions connected to Oracle Database, but PGA memory is allocated separately for each session. Thus, if a program requires 5MB of memory to populate a collection and there are 100 simultaneous connections, that program causes the consumption of 500MB of PGA memory, in addition to the memory allocated to the SGA.

Fortunately, PL/SQL makes it easy for developers to control the amount of memory used in a BULK COLLECT operation by using the LIMIT clause.

Suppose I need to retrieve all the rows from the employees table and then perform some compensation analysis on each row. I can use BULK COLLECT as follows:

PROCEDURE process_all_rows
IS
   TYPE employees_aat 
   IS TABLE OF employees%ROWTYPE
      INDEX BY PLS_INTEGER;
   l_employees employees_aat;
BEGIN
   SELECT *
   BULK COLLECT INTO l_employees
      FROM employees;

   FOR indx IN 1 .. l_employees.COUNT
   LOOP
      analyze_compensation (l_employees(indx));
   END LOOP;
END process_all_rows;

Very concise, elegant, and efficient code. If, however, my employees table contains tens of thousands of rows, each of which contains hundreds of columns, this program can cause excessive PGA memory consumption, because every row is held in the collection at the same time.

Consequently, you should avoid this sort of "unlimited" use of BULK COLLECT. Instead, move the SELECT statement into an explicit cursor declaration and then use a simple loop to fetch many, but not all, rows from the table with each execution of the loop body, as shown in Listing 1.

Code Listing 1: Using BULK COLLECT with LIMIT clause

PROCEDURE process_all_rows (limit_in IN PLS_INTEGER DEFAULT 100)
IS
    CURSOR employees_cur 
    IS 
        SELECT * FROM employees;

    TYPE employees_aat IS TABLE OF employees_cur%ROWTYPE
        INDEX BY PLS_INTEGER;

    l_employees employees_aat;
BEGIN   
    OPEN employees_cur;
    LOOP
        FETCH employees_cur 
            BULK COLLECT INTO l_employees LIMIT limit_in;

        FOR indx IN 1 .. l_employees.COUNT 
        LOOP
            analyze_compensation (l_employees(indx));
        END LOOP;

        EXIT WHEN l_employees.COUNT < limit_in;

    END LOOP;

    CLOSE employees_cur;
END process_all_rows;

The process_all_rows procedure in Listing 1 requests that up to limit_in rows be fetched at a time. PL/SQL will reuse the same limit_in elements in the collection each time the data is fetched and thus also reuse the same memory. Even if my table grows in size, the PGA consumption will remain stable. The loop ends as soon as a fetch returns fewer than limit_in rows, which can happen only on the last batch.

How do you decide what number to use in the LIMIT clause? Theoretically, you will want to figure out how much memory you can afford to consume in the PGA and then adjust the limit to be as close to that amount as possible.
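One way to find out how much PGA a given LIMIT actually costs is to look at your session's PGA statistics before and after a fetch. The block below is a minimal sketch under stated assumptions: it reads the 'session pga memory' and 'session pga memory max' statistics from V$MYSTAT and V$STATNAME, which requires SELECT access on those views (usually granted by your DBA), and it reuses the employees table from the listings above. The helper show_pga is a hypothetical name, and the LIMIT of 1000 is just an example value.

```sql
DECLARE
   CURSOR employees_cur IS SELECT * FROM employees;

   TYPE employees_aat IS TABLE OF employees_cur%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_employees employees_aat;

   -- Hypothetical helper: prints the current and peak PGA for this session.
   -- Assumes SELECT access on V$MYSTAT and V$STATNAME.
   PROCEDURE show_pga (label_in IN VARCHAR2)
   IS
   BEGIN
      FOR rec IN (SELECT sn.name, ms.value
                    FROM v$mystat ms
                    JOIN v$statname sn ON sn.statistic# = ms.statistic#
                   WHERE sn.name IN ('session pga memory',
                                     'session pga memory max'))
      LOOP
         DBMS_OUTPUT.put_line (label_in || ': ' || rec.name || ' = ' || rec.value);
      END LOOP;
   END show_pga;
BEGIN
   show_pga ('Before fetch');

   OPEN employees_cur;
   -- One batch is enough to see what a single LIMIT-sized collection costs.
   FETCH employees_cur BULK COLLECT INTO l_employees LIMIT 1000;
   CLOSE employees_cur;

   show_pga ('After fetching ' || l_employees.COUNT || ' rows');
END;
/
```

Run it from SQL*Plus with SET SERVEROUTPUT ON and try a few different LIMIT values; the difference between the two readings, multiplied by the number of sessions you expect to run the program at once, gives a rough idea of the PGA cost of that limit.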

From tests I (and others) have performed, however, it appears that you will see roughly the same performance no matter what value you choose for the limit, as long as it is at least 25. The test_diff_limits.sql script, included with the sample code for this column, at otn.oracle.com/oramag/oracle/08-mar/o28plsql.zip, demonstrates this behavior, using the ALL_SOURCE data dictionary view on an Oracle Database 11g instance. Here are the results I saw (in hundredths of seconds) when fetching all the rows (a total of 470,000):

Elapsed CPU time for limit of 1 = 1839
Elapsed CPU time for limit of 5 = 716
Elapsed CPU time for limit of 25 = 539
Elapsed CPU time for limit of 50 = 545
Elapsed CPU time for limit of 75 = 489
Elapsed CPU time for limit of 100 = 490
Elapsed CPU time for limit of 1000 = 501
Elapsed CPU time for limit of 10000 = 478
Elapsed CPU time for limit of 100000 = 527
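If you cannot get hold of the original script, a comparison along the same lines might look like the sketch below. It is not test_diff_limits.sql itself, only a hypothetical reconstruction of the idea: time_limit is an illustrative name, and it assumes Oracle Database 10g or later, where DBMS_UTILITY.GET_CPU_TIME reports CPU time in hundredths of a second. Your own timings will differ from the ones above.

```sql
DECLARE
   CURSOR source_cur IS SELECT * FROM all_source;

   TYPE source_aat IS TABLE OF source_cur%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_source source_aat;
   l_start  NUMBER;

   -- Hypothetical helper: fetches every row of ALL_SOURCE in batches of
   -- limit_in rows and reports the CPU time used, in hundredths of a second.
   PROCEDURE time_limit (limit_in IN PLS_INTEGER)
   IS
   BEGIN
      l_start := DBMS_UTILITY.get_cpu_time;

      OPEN source_cur;
      LOOP
         FETCH source_cur BULK COLLECT INTO l_source LIMIT limit_in;
         EXIT WHEN l_source.COUNT = 0;
      END LOOP;
      CLOSE source_cur;

      DBMS_OUTPUT.put_line ('Elapsed CPU time for limit of ' || limit_in
                            || ' = ' || (DBMS_UTILITY.get_cpu_time - l_start));
   END time_limit;
BEGIN
   time_limit (1);
   time_limit (25);
   time_limit (100);
   time_limit (1000);
END;
/
```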

Kicking the %NOTFOUND Habit

I was very happy to learn that Oracle Database 10g will automatically optimize my cursor FOR loops to perform at speeds comparable to BULK COLLECT. Unfortunately, my company is still running on Oracle9i Database, so I have started converting my cursor FOR loops to BULK COLLECTs. I have run into a problem: I am using a LIMIT of 100, and my query retrieves a total of 227 rows, but my program processes only 200 of them. [The program is shown in Listing 2.] What am I doing wrong?

Code Listing 2: BULK COLLECT, %NOTFOUND, and missing rows

PROCEDURE process_all_rows
IS
   CURSOR table_with_227_rows_cur 
   IS 
      SELECT * FROM table_with_227_rows;

   TYPE table_with_227_rows_aat IS 
      TABLE OF table_with_227_rows_cur%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_table_with_227_rows table_with_227_rows_aat;
BEGIN   
   OPEN table_with_227_rows_cur;
   LOOP
      FETCH table_with_227_rows_cur 
         BULK COLLECT INTO l_table_with_227_rows LIMIT 100;

      EXIT WHEN table_with_227_rows_cur%NOTFOUND;     /* cause of missing rows */

      FOR indx IN 1 .. l_table_with_227_rows.COUNT 
      LOOP
         analyze_compensation (l_table_with_227_rows(indx));
      END LOOP;
   END LOOP;

   CLOSE table_with_227_rows_cur;
END process_all_rows;

You came so close to a completely correct conversion from your cursor FOR loop to BULK COLLECT! Your only mistake was that you didn't give up the habit of using the %NOTFOUND cursor attribute in your EXIT WHEN clause.

The statement

EXIT WHEN 
table_with_227_rows_cur%NOTFOUND;

makes perfect sense when you are fetching your data one row at a time. With BULK COLLECT, however, that line of code can result in incomplete data processing, precisely as you described.
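For contrast, here is a sketch of the row-by-row pattern in which %NOTFOUND is the right test. It reuses table_with_227_rows and analyze_compensation from Listing 2; the procedure name process_one_row_at_a_time is hypothetical.

```sql
PROCEDURE process_one_row_at_a_time
IS
   CURSOR table_with_227_rows_cur
   IS
      SELECT * FROM table_with_227_rows;

   l_row table_with_227_rows_cur%ROWTYPE;
BEGIN
   OPEN table_with_227_rows_cur;
   LOOP
      -- Each FETCH retrieves at most one row...
      FETCH table_with_227_rows_cur INTO l_row;

      -- ...so %NOTFOUND becomes TRUE only when a fetch finds nothing,
      -- and no fetched row is ever skipped.
      EXIT WHEN table_with_227_rows_cur%NOTFOUND;

      analyze_compensation (l_row);
   END LOOP;

   CLOSE table_with_227_rows_cur;
END process_one_row_at_a_time;
```

Because each fetch brings back zero or one row, "no row found" and "nothing left to process" mean the same thing here. With BULK COLLECT they do not.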

Let's examine what happens when you run your program and why those last 27 rows are left out. After opening the cursor and entering the loop, here is what occurs:

1. The fetch statement retrieves rows 1 through 100.
2. table_with_227_rows_cur%NOTFOUND evaluates to FALSE, and the rows are processed.
3. The fetch statement retrieves rows 101 through 200.
4. table_with_227_rows_cur%NOTFOUND evaluates to FALSE, and the rows are processed.
5. The fetch statement retrieves rows 201 through 227.
6. table_with_227_rows_cur%NOTFOUND evaluates to TRUE (because this fetch returned fewer than the 100 rows requested by LIMIT), and the loop is terminated, with 27 rows left to process!
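You can watch this sequence happen by printing the batch size, the running %ROWCOUNT, and %NOTFOUND after every fetch. The anonymous block below is a diagnostic sketch that reuses the cursor and collection declarations from Listing 2; it assumes SET SERVEROUTPUT ON in SQL*Plus and exits on an empty collection so that every batch is reported.

```sql
DECLARE
   CURSOR table_with_227_rows_cur
   IS
      SELECT * FROM table_with_227_rows;

   TYPE table_with_227_rows_aat IS
      TABLE OF table_with_227_rows_cur%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_table_with_227_rows table_with_227_rows_aat;
BEGIN
   OPEN table_with_227_rows_cur;
   LOOP
      FETCH table_with_227_rows_cur
         BULK COLLECT INTO l_table_with_227_rows LIMIT 100;

      -- COUNT is the size of this batch, %ROWCOUNT is the running total,
      -- and %NOTFOUND turns TRUE once a fetch returns fewer rows than LIMIT.
      DBMS_OUTPUT.put_line (
            'Batch size = ' || l_table_with_227_rows.COUNT
         || ', total so far = ' || table_with_227_rows_cur%ROWCOUNT
         || ', %NOTFOUND = '
         || CASE WHEN table_with_227_rows_cur%NOTFOUND THEN 'TRUE' ELSE 'FALSE' END);

      EXIT WHEN l_table_with_227_rows.COUNT = 0;
   END LOOP;
   CLOSE table_with_227_rows_cur;
END;
/
```

With exactly 227 rows in the table, the batch sizes reported should be 100, 100, 27, and finally 0, and %NOTFOUND should first show TRUE on the batch of 27, which is exactly the batch Listing 2 never processes.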

When you are using BULK COLLECT and collections to fetch data from your cursor, you should never rely on the cursor attributes to decide whether to terminate your loop and data processing.

So, to make sure that your program processes all 227 rows, replace this statement:

EXIT WHEN 
table_with_227_rows_cur%NOTFOUND; 

with

EXIT WHEN 
l_table_with_227_rows.COUNT = 0;
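Applied to Listing 2, that one-line change gives the procedure below. Everything else, including the table and analyze_compensation, is unchanged from your original program.

```sql
PROCEDURE process_all_rows
IS
   CURSOR table_with_227_rows_cur
   IS
      SELECT * FROM table_with_227_rows;

   TYPE table_with_227_rows_aat IS
      TABLE OF table_with_227_rows_cur%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_table_with_227_rows table_with_227_rows_aat;
BEGIN
   OPEN table_with_227_rows_cur;
   LOOP
      FETCH table_with_227_rows_cur
         BULK COLLECT INTO l_table_with_227_rows LIMIT 100;

      -- Stop only when a fetch returns no rows at all, so the final
      -- partial batch of 27 rows is still processed.
      EXIT WHEN l_table_with_227_rows.COUNT = 0;

      FOR indx IN 1 .. l_table_with_227_rows.COUNT
      LOOP
         analyze_compensation (l_table_with_227_rows(indx));
      END LOOP;
   END LOOP;

   CLOSE table_with_227_rows_cur;
END process_all_rows;
```

This version performs one extra fetch that comes back empty. The style used in Listing 1, which processes the batch first and then exits with EXIT WHEN ... COUNT < limit, avoids that extra round trip while still relying only on the collection's COUNT.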

Generally, you should keep all of the following in mind when working with BULK COLLECT:

The collection is always filled sequentially, starting from index value 1.
It is always safe (that is, you will never raise a NO_DATA_FOUND exception) to iterate through a collection from 1 to collection.COUNT when it has been filled with BULK COLLECT.
The collection is empty when no rows are fetched.
Always check the contents of the collection (with the COUNT method) to see if there are more rows to process.
Ignore the values returned by the cursor attributes, especially %NOTFOUND.
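A quick way to convince yourself of the points about empty collections is to bulk-collect from a query that cannot return any rows. The block below is a small sketch that reuses the employees table and analyze_compensation from the listings above; the WHERE 1 = 0 predicate simply guarantees an empty result.

```sql
DECLARE
   TYPE employees_aat IS TABLE OF employees%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_employees employees_aat;
BEGIN
   -- A query that matches no rows: with BULK COLLECT this does not raise
   -- NO_DATA_FOUND; it simply leaves the collection empty.
   SELECT *
     BULK COLLECT INTO l_employees
     FROM employees
    WHERE 1 = 0;

   DBMS_OUTPUT.put_line ('Rows fetched: ' || l_employees.COUNT);

   -- Iterating from 1 to COUNT is safe: with COUNT = 0 the body never runs.
   FOR indx IN 1 .. l_employees.COUNT
   LOOP
      analyze_compensation (l_employees(indx));
   END LOOP;
END;
/
```

With SET SERVEROUTPUT ON, this prints "Rows fetched: 0" and completes without an exception, whereas a single-row SELECT ... INTO against the same query would raise NO_DATA_FOUND.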