转IN&EXISTS&NOTIN&NOTEXISTS

来源：互联网发布：算法初步高考题编辑：程序博客网时间：2024/06/05 08:53

1.1、IN 和 EXISTS
2.
3.   IN和EXISTS的处理流程是不一样的：
4.
5. IN的执行流程
6. select * from T1 where x in ( select y from T2 )
7. 可以理解为:
8. select * from t1, ( select distinct y from t2 ) t2 where t1.x = t2.y;
9.
10. EXISTS的执行流程
11. select * from t1 where exists ( select null from t2 where y = x )
12. 可以理解为:
13.     for x in ( select * from t1 )
14.     loop
15.        if ( exists ( select null from t2 where y = x.x )
16.        then
17.           OUTPUT THE RECORD
18.        end if
19.     end loop
20.
21.   创建测试用例表big(4000 row)和small(400 row)
22.
23. create table big as select * from dba_objects where rownum <= 10000;
24. insert into big select * from big;
25. insert into big select * from big;
26. commit;
27. create table small as select * from dba_objects where rownum <= 400;
28.
29.当内层表为small,外层表为big时,两种语法的查询如下：
30.
31. SQL> select count(1) from big a where a.object_id in
32.       (select b.object_id from sall b);
33.
34.    COUNT(1)
35. ----------
36.        1600
37.
38. 已用时间: 00: 00: 00.56
39.
40. Execution Plan
41. -----------------------------------------------------
42.     0      SELECT STATEMENT Optimizer=CHOOSE
43.     1    0   SORT (AGGREGATE)
44.     2    1     MERGE JOIN
45.     3    2       SORT (JOIN)
46.     4    3         TABLE ACCESS (FULL) OF 'BIG'
47.     5    2       SORT (JOIN)
48.     6    5         VIEW OF 'VW_NSO_1'
49.     7    6           SORT (UNIQUE)
50.     8    7             TABLE ACCESS (FULL) OF 'SMALL'
51.
52. Statistics
53. -----------------------------------------------------
54.            0 recursive calls
55.            0 db block gets
56.          543 consistent gets
57.            0 physical reads
58.
59. SQL> select count(1) from big a where exists
60.        (select 1 from small b where a.object_id=b.object_id);
61.
62.    COUNT(1)
63. ----------
64.         1600
65.
66.   已用时间: 00: 00: 03.10
67.
68. Execution Plan
69. -----------------------------------------------------
70.      0      SELECT STATEMENT Optimizer=CHOOSE
71.      1    0   SORT (AGGREGATE)
72.      2    1     FILTER
73.      3    2       TABLE ACCESS (FULL) OF 'BIG'
74.      4    2       TABLE ACCESS (FULL) OF 'SMALL'
75.
76. Statistics
77. -----------------------------------------------------
78.            0 recursive calls
79.            0 db block gets
80.       312157 consistent gets
81.            0 physical reads
82.
83.当内层表为big,外层表为small时,两种语法的查询如下：
84.
85. SQL> select count(1) from small a where a.object_id in
86.       (select b.object_id from big b);
87.
88.    COUNT(1)
89. ----------
90.         400
91.
92. 已用时间: 00: 00: 00.56
93.
94. Execution Plan
95. -----------------------------------------------------
96.     0      SELECT STATEMENT Optimizer=CHOOSE
97.     1    0   SORT (AGGREGATE)
98.     2    1     MERGE JOIN
99.     3    2       SORT (JOIN)
100.     4    3         TABLE ACCESS (FULL) OF 'SMALL'
101.     5    2       SORT (JOIN)
102.     6    5         VIEW OF 'VW_NSO_1'
103.     7    6           SORT (UNIQUE)
104.     8    7             TABLE ACCESS (FULL) OF 'BIG'
105.
106. Statistics
107. -----------------------------------------------------
108.            0 recursive calls
109.            0 db block gets
110.          543 consistent gets
111.            0 physical reads
112.
113. SQL> select count(1) from small a where exists
114.       (select null from big b where a.bject_id=b.object_id);
115.
116.    COUNT(1)
117. ----------
118.         400
119.
120. 已用时间: 00: 00: 00.25
121.
122. Execution Plan
123. -----------------------------------------------------
124.     0      SELECT STATEMENT Optimizer=CHOOSE
125.     1    0   SORT (AGGREGATE)
126.     2    1     FILTER
127.     3    2       TABLE ACCESS (FULL) OF 'SMALL'
128.     4    2       TABLE ACCESS (FULL) OF 'BIG'
129.
130. Statistics
131. -----------------------------------------------------
132.            0 recursive calls
133.            0 db block gets
134.         2562 consistent gets
135.            0 physical reads
136.
137.
138.在对表big、small进行分析后,发现CBO下两种语法的执行计划是一样的,都使用hash连接或者hash半连接
139.
140. SQL> analyze table big compute statistics;
141. SQL> analyze table small compute statistics;
142.
143. SQL> select count(1) from big a where a.object_id in
144.         (select b.object_id from small b);
145.
146.    COUNT(1)
147. ----------
148.        1600
149.
150. 已用时间: 00: 00: 00.09
151.
152. Execution Plan
153. -------------------------------------------------------
154.     0      SELECT STATEMENT Optimizer=CHOOSE (Cost=58
155.                                       Card=1 Bytes=8)
156.     1    0   SORT (AGGREGATE)
157.     2    1     HASH JOIN (Cost=58 Card=1600 Bytes=12800)
158.     3    2       SORT (UNIQUE)
159.     4    3         TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
160.     5    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
161.
162. Statistics
163. -------------------------------------------------------
164.            0 recursive calls
165.            0 db block gets
166.          543 consistent gets
167.            0 physical reads
168.
169. SQL> select count(1) from big a where exists
170.       (select 1 from small b where a.object_id=b.object_id);
171.
172.    COUNT(1)
173. ----------
174.        1600
175.
176. 已用时间: 00: 00: 00.09
177.
178. Execution Plan
179. ----------------------------------------------------------
180.     0      SELECT STATEMENT Optimizer=CHOOSE (Cost=58 Card=1 Bytes=8)
181.     1    0   SORT (AGGREGATE)
182.     2    1     HASH JOIN (Cost=58 Card=1600 Bytes=12800)
183.     3    2       SORT (UNIQUE)
184.     4    3         TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
185.     5    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
186.
187. Statistics
188. --------------------------------------------------------
189.            0 recursive calls
190.            0 db block gets
191.          543 consistent gets
192.            0 physical reads
193.
194.
195. SQL> select count(1) from small a where a.object_id in
196.       (select b.object_id from big b);
197.
198.    COUNT(1)
199. ----------
200.         400
201.
202. 已用时间: 00: 00: 00.09
203.
204. Execution Plan
205. ------------------------------------------------------
206.     0      SELECT STATEMENT Optimizer=CHOOSE (Cost=56 Card=1 Bytes=8)
207.     1    0   SORT (AGGREGATE)
208.     2    1     HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)
209.     3    2       TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
210.     4    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
211.
212. Statistics
213. ------------------------------------------------------
214.            0 recursive calls
215.            0 db block gets
216.          543 consistent gets
217.            0 physical reads
218.
219. SQL> select count(1) from small a where exists
220.       (select 1 from big b where a.object_id=b.object_id);
221.
222.    COUNT(1)
223. ----------
224.         400
225.
226. 已用时间: 00: 00: 00.09
227.
228. Execution Plan
229. -------------------------------------------------------
230.     0      SELECT STATEMENT Optimizer=CHOOSE (Cost=56 Card=1 Bytes=8)
231.     1    0   SORT (AGGREGATE)
232.     2    1     HASH JOIN (SEMI) (Cost=56 Card=400 Bytes=3200)
233.     3    2       TABLE ACCESS (FULL) OF 'SMALL' (Cost=2 Card=400 Bytes=1600)
234.     4    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=53 Card=40000 Bytes=160000)
235.
236. Statistics
237. -------------------------------------------------------
238.            0 recursive calls
239.            0 db block gets
240.          543 consistent gets
241.            0 physical reads
242.
243.删除表分析,使用提示/*+ use_hash(a,b) */ 或者 /*+ hash_sj */,
244.两种语法都可以达到CBO的执行计划
245.
246.SQL> analyze table big delete statistics;
247.SQL> analyze table small delete statistics;
248.
249.SQL> select /*+ use_hash(a,b) */count(1) from big a where a.object_id in
250.     (select b.object_id from small b);
251.
252.SQL> select /*+ use_hash(a,b) */count(1) from big a where exists
253.     (select 1 from small b where a.object_id=b.object_id);
254.
255.SQL> select count(1) from small a where a.object_id in
256.     (select /*+ hash_sj */ b.object_id from big b);
257.
258.SQL> select count(1) from small a where exists
259.     (select /*+ hash_sj */ 1 from big b where a.object_id=b.object_id);
260.
261.下表列出了各种情况下的速度情况：
262.┌───────────┬──────────────────────────┬─────────────────────────┬─────────────┐
263.│           │ outer big,inner small   │ outer small,inner big │ table rows │
264.├───────────┼──────────┬───────────────┼──────────┬──────────────┼─────────────┤
265.│           │ IN SQL │ EXISTS SQL   │ IN SQL │ EXISTS SQL │             │
266.├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
267.│un-analyze │ 0.56s   │ 3.10s        │ 0.56s   │ 0.25s       │ big=40000 │
268.├───────────┼──────────┼───────────────┼──────────┼──────────────┤     and     │
269.│ analyzed │ 0.09s   │ 0.09s        │ 0.09s   │ 0.09s       │ small=400 │
270.├───────────┼──────────┼───────────────┼──────────┼──────────────┼─────────────┤
271.│un-analyze │ 0.72s   │ 3.53s        │ 0.25s   │ 2.97s       │ big=5000   │
272.├───────────┼──────────┼───────────────┼──────────┼──────────────┤     and     │
273.│ analyzed │ 0.09s   │ 0.09s        │ 0.09s   │ 0.09s       │ small=4000 │
274.└───────────┴──────────┴───────────────┴──────────┴──────────────┴─────────────┘
275.
276.结论：在未对表进行分析前,若两个表数据量差异很大,则外层表是大表时使用IN较快,
277.     外层表是小表时使用EXISTS较快;若两表数据量接近,则使用IN较快;
278.     分析表后无论用IN还是EXISTS都变得更快,由于执行计划一样,所以速度一样;
279.
280.     所以：无论使用IN还是EXISTS,只要使用散列连接,即提示/*+ use_hash(a,b) */,
281.          或者在子句中散列半连接提示/*+ hash_sj */, 就使其达到最优速度;
282.
283.附注：半连接的提示有hash_sj、merge_sj、nl_sj
284.
285.
286.***********************************************************************************************************************
287.***********************************************************************************************************************
288.
289.
290.2、NOT IN 和 NOT EXISTS
291.
292. NOT EXISTS的执行流程
293. select .....
294.   from rollup R
295. where not exists ( select 'Found' from title T
296.                              where R.source_id = T.Title_ID);
297. 可以理解为:
298. for x in ( select * from rollup )
299.       loop
300.           if ( not exists ( that query ) ) then
301.                  OUTPUT
302.           end if;
303.        end;
304.
305.注意:NOT EXISTS 与 NOT IN 不能完全互相替换，看具体的需求。如果选择的列可以为空，则不能被替换。
306.
307.对于not in 和 not exists的性能区别：
308.   not in 只有当子查询中，select 关键字后的字段有not null约束或者有这种暗示时用not in,另外如果主查询中表大，
309.   子查询中的表小但是记录多，则应当使用not in,并使用anti hash join.
310.
311.   如果主查询表中记录少，子查询表中记录多，并有索引，可以使用not exists，另外not in最好也可以用/*+ HASH_AJ */
312.   或者外连接+is null，NOT IN 在基于成本的应用中较好
313.
314.   创建测试用例表big(40000 row)和small(1000 row)：
315.
316.   truncate table big;
317.   truncate table small;
318.   insert into big   select * from dba_objects where rownum <=20000;
319.   insert into big   select * from dba_objects where rownum <=20000;
320.   insert into small select * from dba_objects where rownum <=1000;
321.   commit;
322.
323.   基本句型：
324.   <1> not in
325.   SQL> select count(1) from big a where a.object_id not in (select b.object_id from small b);
326.
327.   COUNT(1)
328. ----------
329.      38000
330.
331. 已用时间: 00: 00: 12.56
332.
333. Execution Plan
334. -----------------------------------------------------
335.    0      SELECT STATEMENT Optimizer=CHOOSE
336.    1    0   SORT (AGGREGATE)
337.    2    1     FILTER
338.    3    2       TABLE ACCESS (FULL) OF 'BIG'
339.    4    2       TABLE ACCESS (FULL) OF 'SMALL'
340.
341.
342. Statistics
343. --------------------------------