【semi join】IN/EXISTS子查询的区别及其优化方式
来源:互联网 发布:淘宝培训费用 编辑:程序博客网 时间:2024/05/22 08:28
【semi join】IN/EXISTS子查询的区别及其优化方式2013-04-27 12:45:18 在ORACLE 11G大行其道的今天,还有很多人受早期版本的影响,记住一些既定的规则,
1.子查询结果集小,用IN
2.外表小,子查询表大,用EXISTS
这是完全错误的观点。在8i时代,这经常是正确的,但是现在已经11G了,马上12C就要面世了。其实在ORACLE 9i CBO就已经优化了IN,EXISTS的区别,ORACLE优化器有个查询转换器,很多SQL虽然写法不同,但是ORACLE优化器会根据既定规则进行查询重写,重写为优化器觉得效率最高的SQL,所以可能SQL写法不同,但是执行计划却是完全一样的。
IN与EXISTS有一点要记住:IN一般是用于非相关子查询,而EXISTS一般用于相关子查询。当然IN也可以用于相关子查询,EXISTS也可以用于非相关子查询。但是这区别很重要,虽然优化器很强大,但是查询转换是有一定的限制的,在EXISTS性能低下,无法进行相关查询转换,比如不能UNNEST SUBQUERY,那么可能我们需要改写SQL,通常可以用IN/JOIN等改写相关子查询,或对于含有OR的SEMI JOIN改为UNION ALL/UNION的形式。
下面就用例子说明一下:
DROP TABLE a;
DROP TABLE b;
CREATE TABLE a AS SELECT * FROM hr.employees;
CREATE TABLE b AS SELECT * FROM hr.employees;
--反复插入,构造20万行+
INSERT INTO a SELECT * FROM a;
INSERT INTO b SELECT * FROM a;
COMMIT;
BEGIN
dbms_stats.gather_table_stats(ownname => USER,tabname => 'a',estimate_percent => 100,cascade => TRUE);
dbms_stats.gather_table_stats(ownname => USER,tabname => 'b',estimate_percent => 100,cascade => TRUE);
END;
/
1.测试IN,EXISTS在简单查询中,是等价的
SQL> set autotrace traceonly exp
SQL> SELECT * FROM a
2 WHERE EXISTS(
3 SELECT 1 FROM b WHERE a.employee_id=b.employee_id);
执行计划
----------------------------------------------------------
Plan hash value: 2317816356
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 15M| 1375 (2)| 00:00:17 |
|* 1 | HASH JOIN | | 217K| 15M| 1375 (2)| 00:00:17 |
| 2 | SORT UNIQUE | | 217K| 848K| 126 (1)| 00:00:02 |
| 3 | INDEX FAST FULL SCAN| IDX2_B | 217K| 848K| 126 (1)| 00:00:02 |
| 4 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."EMPLOYEE_ID"="B"."EMPLOYEE_ID")
SQL> SELECT * FROM a
2 WHERE a.employee_id IN (SELECT b.employee_id FROM b);
执行计划
----------------------------------------------------------
Plan hash value: 2317816356
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 15M| 1375 (2)| 00:00:17 |
|* 1 | HASH JOIN | | 217K| 15M| 1375 (2)| 00:00:17 |
| 2 | SORT UNIQUE | | 217K| 848K| 126 (1)| 00:00:02 |
| 3 | INDEX FAST FULL SCAN| IDX2_B | 217K| 848K| 126 (1)| 00:00:02 |
| 4 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."EMPLOYEE_ID"="B"."EMPLOYEE_ID")
可以看出,两个计划完全没有区别。类似于ORACLE查询重写为:
SELECT a.* FROM a,(SELECT DISTINCT b.employee_id FROM b) b1 WHERE a.employee_id=b1.employee_id;
看看8i时代的区别:
SQL> SELECT/*+optimizer_features_enable('8.1.7')*/ * FROM a
2 WHERE EXISTS(
3 SELECT 1 FROM b WHERE a.employee_id=b.employee_id);
已用时间: 00: 00: 00.00
执行计划
----------------------------------------------------------
Plan hash value: 3422092984
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10854 | 731K| 344 |
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL| A | 10854 | 731K| 344 |
|* 3 | INDEX RANGE SCAN | IDX2_B | 2049 | 8196 | 5 |
-------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "B" "B" WHERE
"B"."EMPLOYEE_ID"=:B1))
3 - access("B"."EMPLOYEE_ID"=:B1)
Note
-----
- cpu costing is off (consider enabling it)
SQL> SELECT/*+optimizer_features_enable('8.1.7')*/ * FROM a
2 WHERE a.employee_id IN (SELECT b.employee_id FROM b);
已用时间: 00: 00: 00.00
执行计划
----------------------------------------------------------
Plan hash value: 1679318093
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 16M| | 1126 |
|* 1 | HASH JOIN | | 217K| 16M| | 1126 |
| 2 | VIEW | VW_NSO_1 | 106 | 1378 | | 779 |
| 3 | SORT UNIQUE | | 106 | 424 | 2576K| 779 |
| 4 | TABLE ACCESS FULL| B | 217K| 848K| | 344 |
| 5 | TABLE ACCESS FULL | A | 217K| 14M| | 344 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."EMPLOYEE_ID"="EMPLOYEE_ID")
Note
-----
- cpu costing is off (consider enabling it)
2.遇到优化器限制的做法:改写SQL
ORACLE优化器虽然已经很强大,但是还有很多限制,比如无法UNNEST的限制如子查询有CONNECT BY,SET操作,ROWNUM,关联子查询内部包含分组函数等。还比如SEMI JOIN.ANTI JOIN条件带OR的形式等。这时候,我们常用的优化方式就是SQL的等价改写,这要根据具体的业务和数据特点,来重写等价的SQL,千万别改写成结果不等价,那就糟糕了。
比如这个SQL:
SELECT * FROM a
WHERE EXISTS(
SELECT 1 FROM b WHERE a.department_id=b.department_id GROUP BY b.department_id HAVING a.salary>=MAX(b.salary));
SQL> @display_cursor
SQL_ID dgc8b80sxwct2, child number 1
-------------------------------------
SELECT * FROM a WHERE EXISTS( SELECT 1 FROM b WHERE
a.department_id=b.department_id GROUP BY b.department_id HAVING
a.salary>=MAX(b.salary))
Plan hash value: 705769378
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 22528 |00:09:13.61 | 32M|
|* 1 | FILTER | | 1 | | 22528 |00:09:13.61 | 32M|
| 2 | TABLE ACCESS FULL | A | 1 | 217K| 217K|00:00:00.20 | 3733 |
|* 3 | FILTER | | 14403 | | 2058 |00:09:13.20 | 32M|
| 4 | SORT GROUP BY NOSORT| | 14403 | 1 | 14403 |00:09:13.17 | 32M|
|* 5 | TABLE ACCESS FULL | B | 14403| 19745 | 973M|00:06:17.21 | 32M|
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( IS NOT NULL)
3 - filter(MAX("B"."SALARY")<=:B1)
5 - filter("B"."DEPARTMENT_ID"=:B1)
这是个很简单的SQL,但是因为使用了EXISTS关联子查询,并且内部有分组操作,无法进行有效的查询转换,走了FILTER操作,FILTER操作类似于NESTED LOOPS,但是不同于NESTED LOOPS的是,他还可以通过条件判断,是否走子步骤。这里全表扫描B 14403次(这是无法忍受的,特别遇到大表,甭想跑出来了)。
那么如何优化这种SQL呢,要改写,改写为JOIN形式:
SQL> SELECT * FROM a,(SELECT department_id,MAX(b.salary) max_salary FROM b GROUP BY b.department_id) b1
2 WHERE a.department_id=b1.department_id AND a.salary>=b1.max_salary;
已选择22528行。
已用时间: 00: 00: 00.64
执行计划
----------------------------------------------------------
Plan hash value: 774961296
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10854 | 1006K| 1250 (2)| 00:00:16 |
|* 1 | HASH JOIN | | 10854 | 1006K| 1250 (2)| 00:00:16 |
| 2 | VIEW | | 11 | 286 | 629 (3)| 00:00:08 |
| 3 | HASH GROUP BY | | 11 | 88 | 629 (3)| 00:00:08 |
| 4 | TABLE ACCESS FULL| B | 217K| 1696K| 620 (1)| 00:00:08 |
| 5 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DEPARTMENT_ID"="B1"."DEPARTMENT_ID")
filter("A"."SALARY">="B1"."MAX_SALARY")
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
5986 consistent gets
0 physical reads
0 redo size
1851483 bytes sent via SQL*Net to client
16926 bytes received via SQL*Net from client
1503 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
22528 rows processed
从上面看出,逻辑读相对于没有改写之前的32M,时间9分钟,效率现在大幅度提升,时间变为00: 00: 00.64,l逻辑读变为5986次,因为走了HASH JOIN,相当于两表各扫描1次。
上面是通过将子查询改为JOIN的形式来优化,当然改写方式多样,下面用一个EXISTS改写为IN的方式来提高效率:
SELECT * FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.department_id=b.department_id AND a.employee_id=b.employee_id
START WITH b.employee_id =202 CONNECT BY PRIOR b.employee_id=b.manager_id);
SQL> set autotrace traceonly exp
SQL> SELECT * FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.department_id=b.department_id AND a.employee_id=b.employee_id
2 START WITH b.employee_id =202 CONNECT BY PRIOR b.employee_id=b.manager_id);
已用时间: 00: 00: 00.00
执行计划
----------------------------------------------------------
Plan hash value: 985844456
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 12834 | 723K (1)| 02:24:37 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
|* 3 | FILTER | | | | | |
|* 4 | CONNECT BY NO FILTERING WITH SW (UNIQUE)| | | | | |
| 5 | TABLE ACCESS FULL | B | 217K| 2545K| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "B" "B" WHERE "B"."DEPARTMENT_ID"=:B1 AND
"B"."EMPLOYEE_ID"=:B2 START WITH "B"."EMPLOYEE_ID"=202 CONNECT BY "B"."MANAGER_ID"=PRIOR
"B"."EMPLOYEE_ID"))
3 - filter("B"."DEPARTMENT_ID"=:B1 AND "B"."EMPLOYEE_ID"=:B2)
4 - access("B"."MANAGER_ID"=PRIOR "B"."EMPLOYEE_ID")
filter("B"."EMPLOYEE_ID"=202)
又是和前面类似的FILTER操作,这条SQL也要运行很长时间,FILTER操作不是不好,就像NESTED LOOPS一样,也有高效的时候,如果FILTER操作做的次数不多,而且分支操作可以高效实用索引,那么也是高效的,这得注意。
改写,以上查询可以很容易改写为IN的形式:
SQL> SELECT * FROM a WHERE (a.department_id,a.employee_id) IN
2 (SELECT b.department_id,b.employee_id FROM b START WITH b.employee_id =202 CONNECT BY PRIOR b.employee_id=b.manager_id );
已用时间: 00: 00: 00.13
执行计划
----------------------------------------------------------
Plan hash value: 1584203533
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2048 | 190K| | 2466 (1)| 00:00:30 |
|* 1 | HASH JOIN RIGHT SEMI | | 2048 | 190K| 8064K| 2466 (1)| 00:00:30 |
| 2 | VIEW | VW_NSO_1 | 217K| 5514K| | 1617 (62)| 00:00:20 |
|* 3 | CONNECT BY NO FILTERING WITH START-WITH| | | | | | |
| 4 | TABLE ACCESS FULL | B | 217K| 2545K| | 620 (1)| 00:00:08 |
| 5 | TABLE ACCESS FULL | A | 217K| 14M| | 620 (1)| 00:00:08 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DEPARTMENT_ID"="DEPARTMENT_ID" AND "A"."EMPLOYEE_ID"="EMPLOYEE_ID")
3 - access("B"."MANAGER_ID"=PRIOR "B"."EMPLOYEE_ID")
filter("B"."EMPLOYEE_ID"=202)
走了HASH JOIN,这条SQL效率大增,基本在<1s内会返回结果。
在学习过程中,一定要亲自实践,不能遵循于从某个地方看到的规则,特别是N年前的规则,规则是有用的,但是规则也会不断滴更新的,如果规则发生了更新,但是在你的脑子里没有更新,你却不经过实践,就永远遵循这规则,那你对这方面的知识永远知之甚少或知而不全。
分类: Oracle
1.子查询结果集小,用IN
2.外表小,子查询表大,用EXISTS
这是完全错误的观点。在8i时代,这经常是正确的,但是现在已经11G了,马上12C就要面世了。其实在ORACLE 9i CBO就已经优化了IN,EXISTS的区别,ORACLE优化器有个查询转换器,很多SQL虽然写法不同,但是ORACLE优化器会根据既定规则进行查询重写,重写为优化器觉得效率最高的SQL,所以可能SQL写法不同,但是执行计划却是完全一样的。
IN与EXISTS有一点要记住:IN一般是用于非相关子查询,而EXISTS一般用于相关子查询。当然IN也可以用于相关子查询,EXISTS也可以用于非相关子查询。但是这区别很重要,虽然优化器很强大,但是查询转换是有一定的限制的,在EXISTS性能低下,无法进行相关查询转换,比如不能UNNEST SUBQUERY,那么可能我们需要改写SQL,通常可以用IN/JOIN等改写相关子查询,或对于含有OR的SEMI JOIN改为UNION ALL/UNION的形式。
下面就用例子说明一下:
DROP TABLE a;
DROP TABLE b;
CREATE TABLE a AS SELECT * FROM hr.employees;
CREATE TABLE b AS SELECT * FROM hr.employees;
--反复插入,构造20万行+
INSERT INTO a SELECT * FROM a;
INSERT INTO b SELECT * FROM a;
COMMIT;
BEGIN
dbms_stats.gather_table_stats(ownname => USER,tabname => 'a',estimate_percent => 100,cascade => TRUE);
dbms_stats.gather_table_stats(ownname => USER,tabname => 'b',estimate_percent => 100,cascade => TRUE);
END;
/
1.测试IN,EXISTS在简单查询中,是等价的
SQL> set autotrace traceonly exp
SQL> SELECT * FROM a
2 WHERE EXISTS(
3 SELECT 1 FROM b WHERE a.employee_id=b.employee_id);
执行计划
----------------------------------------------------------
Plan hash value: 2317816356
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 15M| 1375 (2)| 00:00:17 |
|* 1 | HASH JOIN | | 217K| 15M| 1375 (2)| 00:00:17 |
| 2 | SORT UNIQUE | | 217K| 848K| 126 (1)| 00:00:02 |
| 3 | INDEX FAST FULL SCAN| IDX2_B | 217K| 848K| 126 (1)| 00:00:02 |
| 4 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."EMPLOYEE_ID"="B"."EMPLOYEE_ID")
SQL> SELECT * FROM a
2 WHERE a.employee_id IN (SELECT b.employee_id FROM b);
执行计划
----------------------------------------------------------
Plan hash value: 2317816356
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 15M| 1375 (2)| 00:00:17 |
|* 1 | HASH JOIN | | 217K| 15M| 1375 (2)| 00:00:17 |
| 2 | SORT UNIQUE | | 217K| 848K| 126 (1)| 00:00:02 |
| 3 | INDEX FAST FULL SCAN| IDX2_B | 217K| 848K| 126 (1)| 00:00:02 |
| 4 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."EMPLOYEE_ID"="B"."EMPLOYEE_ID")
可以看出,两个计划完全没有区别。类似于ORACLE查询重写为:
SELECT a.* FROM a,(SELECT DISTINCT b.employee_id FROM b) b1 WHERE a.employee_id=b1.employee_id;
看看8i时代的区别:
SQL> SELECT/*+optimizer_features_enable('8.1.7')*/ * FROM a
2 WHERE EXISTS(
3 SELECT 1 FROM b WHERE a.employee_id=b.employee_id);
已用时间: 00: 00: 00.00
执行计划
----------------------------------------------------------
Plan hash value: 3422092984
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10854 | 731K| 344 |
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL| A | 10854 | 731K| 344 |
|* 3 | INDEX RANGE SCAN | IDX2_B | 2049 | 8196 | 5 |
-------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "B" "B" WHERE
"B"."EMPLOYEE_ID"=:B1))
3 - access("B"."EMPLOYEE_ID"=:B1)
Note
-----
- cpu costing is off (consider enabling it)
SQL> SELECT/*+optimizer_features_enable('8.1.7')*/ * FROM a
2 WHERE a.employee_id IN (SELECT b.employee_id FROM b);
已用时间: 00: 00: 00.00
执行计划
----------------------------------------------------------
Plan hash value: 1679318093
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 16M| | 1126 |
|* 1 | HASH JOIN | | 217K| 16M| | 1126 |
| 2 | VIEW | VW_NSO_1 | 106 | 1378 | | 779 |
| 3 | SORT UNIQUE | | 106 | 424 | 2576K| 779 |
| 4 | TABLE ACCESS FULL| B | 217K| 848K| | 344 |
| 5 | TABLE ACCESS FULL | A | 217K| 14M| | 344 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."EMPLOYEE_ID"="EMPLOYEE_ID")
Note
-----
- cpu costing is off (consider enabling it)
显然,在8i时代,还是有明显的区别,EXISTS是主表驱动,走的FILTER,如果主表返回行很多,那么必然效率会低,如果主表小,内表,内表能走索引,是很好的,IN走HASH JOIN,受内表
驱动,内表返回行少,效率高。当然具体情况肯定会很复杂,这里不做研究了,因为8i已经是过去式了。
驱动,内表返回行少,效率高。当然具体情况肯定会很复杂,这里不做研究了,因为8i已经是过去式了。
2.遇到优化器限制的做法:改写SQL
ORACLE优化器虽然已经很强大,但是还有很多限制,比如无法UNNEST的限制如子查询有CONNECT BY,SET操作,ROWNUM,关联子查询内部包含分组函数等。还比如SEMI JOIN.ANTI JOIN条件带OR的形式等。这时候,我们常用的优化方式就是SQL的等价改写,这要根据具体的业务和数据特点,来重写等价的SQL,千万别改写成结果不等价,那就糟糕了。
比如这个SQL:
SELECT * FROM a
WHERE EXISTS(
SELECT 1 FROM b WHERE a.department_id=b.department_id GROUP BY b.department_id HAVING a.salary>=MAX(b.salary));
SQL> @display_cursor
SQL_ID dgc8b80sxwct2, child number 1
-------------------------------------
SELECT * FROM a WHERE EXISTS( SELECT 1 FROM b WHERE
a.department_id=b.department_id GROUP BY b.department_id HAVING
a.salary>=MAX(b.salary))
Plan hash value: 705769378
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 22528 |00:09:13.61 | 32M|
|* 1 | FILTER | | 1 | | 22528 |00:09:13.61 | 32M|
| 2 | TABLE ACCESS FULL | A | 1 | 217K| 217K|00:00:00.20 | 3733 |
|* 3 | FILTER | | 14403 | | 2058 |00:09:13.20 | 32M|
| 4 | SORT GROUP BY NOSORT| | 14403 | 1 | 14403 |00:09:13.17 | 32M|
|* 5 | TABLE ACCESS FULL | B | 14403| 19745 | 973M|00:06:17.21 | 32M|
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( IS NOT NULL)
3 - filter(MAX("B"."SALARY")<=:B1)
5 - filter("B"."DEPARTMENT_ID"=:B1)
这是个很简单的SQL,但是因为使用了EXISTS关联子查询,并且内部有分组操作,无法进行有效的查询转换,走了FILTER操作,FILTER操作类似于NESTED LOOPS,但是不同于NESTED LOOPS的是,他还可以通过条件判断,是否走子步骤。这里全表扫描B 14403次(这是无法忍受的,特别遇到大表,甭想跑出来了)。
那么如何优化这种SQL呢,要改写,改写为JOIN形式:
SQL> SELECT * FROM a,(SELECT department_id,MAX(b.salary) max_salary FROM b GROUP BY b.department_id) b1
2 WHERE a.department_id=b1.department_id AND a.salary>=b1.max_salary;
已选择22528行。
已用时间: 00: 00: 00.64
执行计划
----------------------------------------------------------
Plan hash value: 774961296
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10854 | 1006K| 1250 (2)| 00:00:16 |
|* 1 | HASH JOIN | | 10854 | 1006K| 1250 (2)| 00:00:16 |
| 2 | VIEW | | 11 | 286 | 629 (3)| 00:00:08 |
| 3 | HASH GROUP BY | | 11 | 88 | 629 (3)| 00:00:08 |
| 4 | TABLE ACCESS FULL| B | 217K| 1696K| 620 (1)| 00:00:08 |
| 5 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DEPARTMENT_ID"="B1"."DEPARTMENT_ID")
filter("A"."SALARY">="B1"."MAX_SALARY")
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
5986 consistent gets
0 physical reads
0 redo size
1851483 bytes sent via SQL*Net to client
16926 bytes received via SQL*Net from client
1503 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
22528 rows processed
从上面看出,逻辑读相对于没有改写之前的32M,时间9分钟,效率现在大幅度提升,时间变为00: 00: 00.64,l逻辑读变为5986次,因为走了HASH JOIN,相当于两表各扫描1次。
上面是通过将子查询改为JOIN的形式来优化,当然改写方式多样,下面用一个EXISTS改写为IN的方式来提高效率:
SELECT * FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.department_id=b.department_id AND a.employee_id=b.employee_id
START WITH b.employee_id =202 CONNECT BY PRIOR b.employee_id=b.manager_id);
SQL> set autotrace traceonly exp
SQL> SELECT * FROM a WHERE EXISTS(SELECT 1 FROM b WHERE a.department_id=b.department_id AND a.employee_id=b.employee_id
2 START WITH b.employee_id =202 CONNECT BY PRIOR b.employee_id=b.manager_id);
已用时间: 00: 00: 00.00
执行计划
----------------------------------------------------------
Plan hash value: 985844456
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 12834 | 723K (1)| 02:24:37 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
|* 3 | FILTER | | | | | |
|* 4 | CONNECT BY NO FILTERING WITH SW (UNIQUE)| | | | | |
| 5 | TABLE ACCESS FULL | B | 217K| 2545K| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "B" "B" WHERE "B"."DEPARTMENT_ID"=:B1 AND
"B"."EMPLOYEE_ID"=:B2 START WITH "B"."EMPLOYEE_ID"=202 CONNECT BY "B"."MANAGER_ID"=PRIOR
"B"."EMPLOYEE_ID"))
3 - filter("B"."DEPARTMENT_ID"=:B1 AND "B"."EMPLOYEE_ID"=:B2)
4 - access("B"."MANAGER_ID"=PRIOR "B"."EMPLOYEE_ID")
filter("B"."EMPLOYEE_ID"=202)
又是和前面类似的FILTER操作,这条SQL也要运行很长时间,FILTER操作不是不好,就像NESTED LOOPS一样,也有高效的时候,如果FILTER操作做的次数不多,而且分支操作可以高效实用索引,那么也是高效的,这得注意。
改写,以上查询可以很容易改写为IN的形式:
SQL> SELECT * FROM a WHERE (a.department_id,a.employee_id) IN
2 (SELECT b.department_id,b.employee_id FROM b START WITH b.employee_id =202 CONNECT BY PRIOR b.employee_id=b.manager_id );
已用时间: 00: 00: 00.13
执行计划
----------------------------------------------------------
Plan hash value: 1584203533
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2048 | 190K| | 2466 (1)| 00:00:30 |
|* 1 | HASH JOIN RIGHT SEMI | | 2048 | 190K| 8064K| 2466 (1)| 00:00:30 |
| 2 | VIEW | VW_NSO_1 | 217K| 5514K| | 1617 (62)| 00:00:20 |
|* 3 | CONNECT BY NO FILTERING WITH START-WITH| | | | | | |
| 4 | TABLE ACCESS FULL | B | 217K| 2545K| | 620 (1)| 00:00:08 |
| 5 | TABLE ACCESS FULL | A | 217K| 14M| | 620 (1)| 00:00:08 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."DEPARTMENT_ID"="DEPARTMENT_ID" AND "A"."EMPLOYEE_ID"="EMPLOYEE_ID")
3 - access("B"."MANAGER_ID"=PRIOR "B"."EMPLOYEE_ID")
filter("B"."EMPLOYEE_ID"=202)
走了HASH JOIN,这条SQL效率大增,基本在<1s内会返回结果。
在学习过程中,一定要亲自实践,不能遵循于从某个地方看到的规则,特别是N年前的规则,规则是有用的,但是规则也会不断滴更新的,如果规则发生了更新,但是在你的脑子里没有更新,你却不经过实践,就永远遵循这规则,那你对这方面的知识永远知之甚少或知而不全。
- 【semi join】IN/EXISTS子查询的区别及其优化方式
- 【semi join】IN/EXISTS子查询的区别及其优化方式
- 【semi join】IN/EXISTS子查询的区别及其优化方式
- [Hive]用left semi join替代in子查询的方式
- NOT IN、NOT EXISTS的相关子查询改用LEFT JOIN--sql2000性能优化
- IN、EXISTS的相关子查询用INNER JOIN 代替--sql2000性能优化
- MySQL优化案例---半连接(semi join)优化方式 导致的查询性能低下
- 使用exists(Semi-Join)优化distinct语句
- IN&EXISTS 与 NOT IN&NOT EXISTS 子查询 区别
- in子查询、exists子查询、连接,效率的探讨
- in子查询、exists子查询、连接,效率的探讨
- JOIN与EXISTS(子查询)的效率研究
- IN的相关子查询用EXISTS代替--sql2000性能优化
- exists和in的区别和优化
- Esper系列(六)子查询、Exists、In/not in、Any/Some、Join
- mysql之exists子查询和in查询的对比
- oracle sql 多表 嵌套子查询 连接查询, join where exist in 的区别
- 嵌套查询及其与join的区别
- 简评用PHP开发大型系统的缺点
- 2012蓝桥杯省赛第二题 古堡算式(暴力解决)
- 备忘录(memento)模式
- PHP5中的异常处理详解
- Android面试题收集(有详细答案)
- 【semi join】IN/EXISTS子查询的区别及其优化方式
- PHP 5中新增加日期(date)函数的常量
- VS2010的helplibrary问题
- 【Oracle index】B*Tree索引之NULL统计技巧
- 一道阿里巴巴笔试题
- 2012蓝桥杯省赛第三题 比酒量(暴力解决)
- 二十五条励志金句,让你迅速提升自己的价值!
- 不要以你的现状来判断你的未来—俞敏洪6月2号在同济大学的演讲
- 30岁之前需要知道的10个人生底线,你知道几个?