与友人就\"Exists 子查询\"技术的讨论

来源：互联网发布：openwrt golang 编辑：程序博客网时间：2024/05/14 02:46

且看Pg的一个例子(其他待续):

dm=# select version();
version
-------------------------------------------------------------
PostgreSQL 9.3.4, compiled by Visual C++ build 1700, 64-bit
(1 row)

数据准备:

dm=# create table t1(c1 int, c2 int);
CREATE TABLE
dm=# create table t2(d1 int, d2 int);
CREATE TABLE
dm=# insert into t1 values(generate_series(1,100000,1), generate_series(1,100000,1)*10);
INSERT 0 100000
dm=# insert into t2 values(generate_series(1,100,1), generate_series(1,100,1)*10);
INSERT 0 100

察看执行计划,hash 连接. 红色字体的数字,预估,不精确.

dm=# explain select * from t1 where exists (select * from t2 where c1 = d1);
                               QUERY PLAN
------------------------------------------------------------------------
Hash Join (cost=41.25..2735.80 rows=47401 width=8)
   Hash Cond: (t1.c1 = t2.d1)
   -> Seq Scan on t1 (cost=0.00..1391.02 rows=94802 width=8)
   -> Hash (cost=38.75..38.75 rows=200 width=4)
         -> HashAggregate (cost=36.75..38.75 rows=200 width=4)
               -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=4)
(6 rows)

获得精确数据:

dm=# vacuum t1;
VACUUM
dm=# vacuum t2;
VACUUM

再次查看执行计划,数据已经精确, 不需要多做HashAggregate :
dm=# explain select * from t1 where exists (select * from t2 where c1 = d1);
                           QUERY PLAN
----------------------------------------------------------------
Hash Semi Join (cost=3.25..1709.86 rows=100 width=8)
   Hash Cond: (t1.c1 = t2.d1)
   -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=8)
   -> Hash (cost=2.00..2.00 rows=100 width=4)
         -> Seq Scan on t2 (cost=0.00..2.00 rows=100 width=4)
(5 rows)

以上的语句, 不存在索引,所以效率不高. 建立索引,看PG对三种表连接算法的选用情况,如下:
dm=# create index t2_d1 on t2(d1);
CREATE INDEX

依然使用了hash joing,观察cost=3.25..1709.86 等待与下面的其他示例对比.
dm=# explain select * from t1 where exists (select * from t2 where c1 = d1);
                           QUERY PLAN
----------------------------------------------------------------
Hash Semi Join (cost=3.25..1709.86 rows=100 width=8)
   Hash Cond: (t1.c1 = t2.d1)
   -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=8)
   -> Hash (cost=2.00..2.00 rows=100 width=4)
         -> Seq Scan on t2 (cost=0.00..2.00 rows=100 width=4)
(5 rows)

禁止hash join, Pg选择了排序后连接,对比其代价计算11121.69..11123.62 , 高于hash join:

dm=# SET enable_hashjoin = off;
SET
dm=# explain select * from t1 where exists (select * from t2 where c1 = d1);
                              QUERY PLAN
----------------------------------------------------------------------
Merge Semi Join (cost=11121.69..11123.62 rows=100 width=8)
   Merge Cond: (t1.c1 = t2.d1)
   -> Sort (cost=11116.32..11366.32 rows=100000 width=8)
         Sort Key: t1.c1
         -> Seq Scan on t1 (cost=0.00..1443.00 rows=100000 width=8)
   -> Sort (cost=5.32..5.57 rows=100 width=4)
         Sort Key: t2.d1
         -> Seq Scan on t2 (cost=0.00..2.00 rows=100 width=4)
(8 rows)

0 0