将“相关值”导致的Nested Loop优化成Hash Join
来源:互联网 发布:中星睿典软件注册码 编辑:程序博客网 时间:2024/05/22 07:51
原文地址:http://www.itpub.net/thread-1376537-1-2.html
“相关值”是我剽窃的一个词,与之相反的是“不相关值”,我借用来描述以下这种情况:在SQL中有两个表A和B要join,join条件是A.name=B.first_name,判定符“=”左右两边的值是“不相关”的,而substr(A.name,1,length(B.first_name))=B.first_name,判定符“=”左右两边的值是“相关”的,因为要通过“=”右边的值才能求出左边的值(相反亦然)。
由于这种“相关值”的情况,A、B表join的算法注定只能走Nested Loop了,因为每一个不同的驱动值(驱动表中的值,也就是B表中的值)将改变A表负责join的值,执行计划如下:
1
select
a.*,b.object_name
from
TEST_OBJECT a, TEST_OBJ1 b
2
where
substr(a.object_name,1,length(b.object_name))=b.object_name;
01
已选择108612行。
02
03
已用时间: 00: 00: 21.54
04
05
---------------------------------------------------------------------
06
| Id | Operation | Name | Rows | Bytes | Cost |
07
---------------------------------------------------------------------
08
| 0 | SELECT STATEMENT | | | | |
09
| 1 | NESTED LOOPS | | | | |
10
| 2 | TABLE ACCESS FULL | TEST_OBJ1 | | | |
11
|* 3 | TABLE ACCESS FULL | TEST_OBJECT | | | |
12
---------------------------------------------------------------------
13
14
Predicate Information (identified by operation id):
15
---------------------------------------------------
16
17
3 - filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJ
18
ECT_NAME")))
19
20
统计信息
21
----------------------------------------------------------
22
1 recursive calls
23
0 db block gets
24
9262734 consistent gets
这里展开一下,Nested Loop这个算法复杂度是 O(B*A) (B表示驱动表的记录数,A表示内部表),如果A、B都很大,那就杯具了,关于Nested Loop驱动表的选取就不展开了。
网友anlinew提供了一种非常精妙的,决的优化方法,将语句改写成:
1
select
a.*,b.object_name
from
TEST_OBJECT a, TEST_OBJ1 b
2
where
substr(a.object_name,1,length(b.object_name))=b.object_name
3
and
substr(a.object_name,1,4)=substr(b.object_name,1,4)
4
and
length(b.object_name)>3
5
union
all
6
select
a.*,b.object_name
from
TEST_OBJECT a, TEST_OBJ1 b
7
where
substr(a.object_name,1,length(b.object_name))=b.object_name
8
and
length(b.object_name)<4;
1
01
已选择108612行。
02
03
已用时间: 00: 06: 51.24
04
05
-----------------------------------------------------------------------------------
06
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
07
-----------------------------------------------------------------------------------
08
| 0 | SELECT STATEMENT | | 3468 | 365K| 90997 (100)| 00:18:12 |
09
| 1 | UNION-ALL | | | | | |
10
|* 2 | HASH JOIN | | 34 | 3672 | 176 (2)| 00:00:03 |
11
|* 3 | TABLE ACCESS FULL| TEST_OBJ1 | 659 | 12521 | 35 (0)| 00:00:01 |
12
| 4 | TABLE ACCESS FULL| TEST_OBJECT | 52544 | 4566K| 139 (1)| 00:00:02 |
13
| 5 | NESTED LOOPS | | 3434 | 362K| 90821 (1)| 00:18:10 |
14
|* 6 | TABLE ACCESS FULL| TEST_OBJ1 | 659 | 12521 | 35 (0)| 00:00:01 |
15
|* 7 | TABLE ACCESS FULL| TEST_OBJECT | 5 | 445 | 138 (1)| 00:00:02 |
16
-----------------------------------------------------------------------------------
17
18
Predicate Information (identified by operation id):
19
---------------------------------------------------
20
21
2 - access(SUBSTR("A"."OBJECT_NAME",1,4)=SUBSTR("B"."OBJECT_NAME",1,4))
22
filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJE
23
CT_NAME")))
24
3 - filter(LENGTH("B"."OBJECT_NAME")>3)
25
6 - filter(LENGTH("B"."OBJECT_NAME")<4)
26
7 - filter("B"."OBJECT_NAME"=SUBSTR("A"."OBJECT_NAME",1,LENGTH("B"."OBJE
27
CT_NAME")))
28
29
统计信息
30
----------------------------------------------------------
31
1 recursive calls
32
0 db block gets
33
33946 consistent gets
执行计划虽然变复杂了,但是耗时大幅减少,consistent gets也大幅降低,作出巨大贡献的是Hash Join的引入。
这里再展开一下,Hash Join的复杂度是O(A+B),简单来说就是对A、B表各扫描一次,如果A、B都比较大的情况来看,无疑Hash Join要比Nested Loop 优越很多。
扯远了,回到anlinew的具体方法上吧,导致Hash Join出现的关键因素是一个谓词的引入:
1
and
substr(a.object_name,1,4)=substr(b.object_name,1,4)
套用我剽窃的那个词来说,这是“不相关值”的对比!
anlinew的核心思想是将数据“分片”,该例子中分片的依据是多少位首字母(这里是4),其中“大头”由Hash Join处理,而“小头”走Nested Loop,这种“抓大放小”的做法直接就从复杂度上进行了优化。
这种“分片”的思想非常值得借鉴,将“相关值”判断转化成“不相关值”的判断也是处理问题的一种有效手法。
- 将“相关值”导致的Nested Loop优化成Hash Join
- oracle的表连接hash join、nested loop join
- oracle hash join和nested loop下的驱动表相关测试
- Oracle优化器、优化模式、表的连接方式(Hash Join、Nested Loop、Sort Merge Join)
- HASH JOIN ,SORT MERGE JOIN ,NESTED LOOP
- HASH JOIN/MERGE JOIN/NESTED LOOP
- 描述HASH JOIN ,MERGE JOIN ,NESTED LOOP
- Hash Join, Nested Loop, Sort-Merge Join
- Hash Join 与 Nested-Loop Join
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- hash join、nested loop,sort merge join
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- HASH JOIN , MERGE JOIN ,NESTED LOOP
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- HASH JOIN ,MERGE JOIN ,NESTED LOOP 比较
- cmd命令
- 试图运行项目时出错:无法在 Web 服务器上启动调试.拒绝访问
- Solaris 10 安装oracle 11g 参数设置
- DataStage读数据库的写进参数的问题
- centos 5.5 vnc server配置方法
- 将“相关值”导致的Nested Loop优化成Hash Join
- 各种浏览器内核说明
- TortoiseSVN的命令行更新方式
- SharePoint - Error: Service Unavailable HTTP Error 503. The service is unavailable
- 让黄晓明代言凡客是否合适?
- CUDA学习(三)
- 以后每周都写一篇博文
- 解决负号后置问题
- Android不同的Activity怎么联系