postgresql explain query中的materialize

来源：互联网发布：mac如何安装zip文件编辑：程序博客网时间：2024/05/22 04:32

今天在GREENPLUM的执行计划中看到了materialize，不知道这个步骤是做什么的，查了下文档，没有相关解释。没事，GREENPLUM不行还有POSTGRESQL呢，GOOGLE了一把，找到了答案。

What does materialize do? I'm joining two tables, not views or anything like that.

A materialize node means the output of whatever is below it in the tree (which can be a scan, or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case, the planner is determining that the result of a scan on one of your tables will fit in memory, and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.

具体的SQL如下

aligputf8=# select count(1) from ttt1;
count
-------
10000
(1 row)

aligputf8=#
aligputf8=# select count(1) from ttt2;
count
-------
10000
(1 row)

aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..1409.07 rows=1667 width=8)
   -> Nested Loop (cost=0.00..1409.07 rows=1667 width=8)
         -> Broadcast Motion 1:6 (slice1; segments: 1) (cost=0.00..137.07 rows=6 width=4)
               -> Seq Scan on ttt1 a (cost=0.00..137.00 rows=1 width=4)
                     Filter: id = 3
         -> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(6 rows)

aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
aligputf8=#
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id<3000;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=122.00..3599268.93 rows=4998334 width=8)
   -> Nested Loop (cost=122.00..3599268.93 rows=4998334 width=8)
         -> Broadcast Motion 6:6 (slice1; segments: 6) (cost=0.00..346.93 rows=2999 width=4)
               -> Seq Scan on ttt1 a (cost=0.00..137.00 rows=500 width=4)
                     Filter: id < 3000
         -> Materialize (cost=122.00..222.00 rows=1667 width=4)
               -> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(7 rows)

我的理解就是PG为了加快nested loop循环的速度，把b表的数据缓存在了内存中。我们可以看到前面一个SQL，只对b表数据进行1次扫描，因此并不需要进行缓存；后者需要进行2999次扫描。