PostgreSQL (User defined Operator) UDO & Operator Optimization Information

来源：互联网发布：java设计模式书籍编辑：程序博客网时间：2024/06/08 11:31

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大，国内顶级PostgreSQL数据库专家将悉数到场，并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
Postgres-XL的项目发起人Mason Sharp
pgpool的作者石井达夫(Tatsuo Ishii)
PG-Strom的作者海外浩平(Kaigai Kohei)
Greenplum研发总监姚延栋
周正中(德哥), PostgreSQL中国用户会创始人之一
汪洋，平安科技数据库技术部经理
……

2015年度PG大象会报名地址：http://postgres2015.eventdove.com/
PostgreSQL中国社区： http://postgres.cn/
PostgreSQL专业1群： 3336901（已满）
PostgreSQL专业2群： 100910388
PostgreSQL专业3群： 150657323

PostgreSQL 支持自定义操作符，本质上是调用函数来实现的。

语法如下：

CREATE OPERATOR name (    PROCEDURE = function_name    [, LEFTARG = left_type ] [, RIGHTARG = right_type ]    [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]    [, RESTRICT = res_proc ] [, JOIN = join_proc ]    [, HASHES ] [, MERGES ])

例如创建一个求两个值的平均值的操作符：

首选要创建函数


postgres=# create function f_avg(numeric,numeric) returns numeric as $$
postgres$#   select ($1+$2)/2;
postgres$# $$ language sql strict;
CREATE FUNCTION

验证函数


postgres=# select f_avg(1,null);
 f_avg 
-------
      
(1 row)
postgres=# select f_avg(1,2);
       f_avg        
--------------------
 1.5000000000000000
(1 row)

创建操作符，指定左右参数类型，调用的函数名，commutator是一个和优化器相关的选项，我后面会重点介绍：


postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');
CREATE OPERATOR
postgres=# select 1 ## 2;
      ?column?      
--------------------
 1.5000000000000000
(1 row)

注意到在创建操作符的语法中有6个和优化器有关的关键字：


    [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ]
    [, RESTRICT = res_proc ] [, JOIN = join_proc ]
    [, HASHES ] [, MERGES ]

介绍如下：

假设x表示操作符左侧的参数，y表示操作符右侧的参数

1. commutator，指明x op1 y等效于y op2 x，即操作数调换，返回的值一样。例如2>1 和1<2结果是一致的。那么>就是<的commutator或者反之。又例如1+2和2+1是等价的，那么+就是+的commutator。commutator只需要在创建其中一个操作符时指定，创建另一个对应的操作符时可以不需要指定，PostgreSQL会自动建立这个关系。例如创建>操作符时指定了它的commutator是<，那么在创建<操作符时可以不需要指定>是它的commutator。

另外需要注意，有commutator操作符的操作符的左右两侧的参数类型必须一致，这样才能满足x op1 y等价于y op2 x。

优化器如何利用commutator呢？例如索引扫描，必须列在操作符的左侧才能使用索引。1 > tbl.c这个条件，如果>没有commutator的话，是不能使用索引的。

例子，以int4的>和<操作符为例，实验一下：

>和<在PostgreSQL中是一对commutator


postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
    oprcom    
--------------
 pg_catalog.<
(1 row)
postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
    oprcom    
--------------
 pg_catalog.>
(1 row)

记录他们的oprcom对应的OID


postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;
 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc
ode |   oprrest   |     oprjoin     
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----
----+-------------+-----------------
 >       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |     97 |       523 | int4
gt  | scalargtsel | scalargtjoinsel
(1 row)
postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;
 oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc
ode |   oprrest   |     oprjoin     
---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----
----+-------------+-----------------
 <       |           11 |       10 | b       | f           | f          |      23 |       23 |        16 |    521 |       525 | int4
lt  | scalarltsel | scalarltjoinsel
(1 row)

接下来我要通过更新pg_operator解除他们的commutator关系，设置为0即可。


postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1

创建测试表，插入测试数据，创建索引：


postgres=# create table tbl(id int);
CREATE TABLE
postgres=# insert into tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx_tbl_id on tbl(id);
CREATE INDEX

将列放在条件的左边可以走索引，但是放在右边不走索引。因为优化器不能决定>,<是否为commutator


postgres=# explain select * from tbl where id<10;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)
   Index Cond: (id < 10)
(2 rows)
postgres=# explain select * from tbl where 10>id;
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..1361.00 rows=33333 width=4)
   Filter: (10 > id)
(2 rows)

重新建立这两个 operator的commutator关系后，优化器会自动将10>id转换为id<10，并且走索引了：


postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;
UPDATE 1
postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;
UPDATE 1
postgres=# explain select * from tbl where 10>id;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.45 rows=9 width=4)
   Index Cond: (id < 10)
(2 rows)

2. negator，指x op1 y 等价于 not(y op2 x)，或者x op1等价于not( y op2)，或者op1 x 等价于not(op2 y)，因此negator支持一元和二元操作符。

例子:

如果=和<>是一对negator操作符，NOT (x = y) 可以简化为 x <> y。


postgres=# explain select * from tbl where 10=id;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)
   Index Cond: (id = 10)
(2 rows)
postgres=# explain select * from tbl where not(10<>id);
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Index Only Scan using idx_tbl_id on tbl  (cost=0.29..8.31 rows=1 width=4)
   Index Cond: (id = 10)
(2 rows)

同样，操作符两侧参数x,y的类型必须一致。并且仅适用于返回布尔逻辑类型的操作符。

3. restrict，是用于评估选择性的函数，仅适用于二元操作符，例如where col>100，这个查询条件，如何评估选择性呢？是通过操作符的restrict来指定的，选择性乘以pg_class.reltuples就可以评估得到这个查询条件的行数。

选择性函数的代码在 src/backend/utils/adt/

包括

-rw-r--r--. 1 1107 1107 33191 Jun 10 03:29 array_selfuncs.c

-rw-r--r--. 1 1107 1107 2316 Jun 10 03:29 geo_selfuncs.c

-rw-r--r--. 1 1107 1107 720 Jun 10 03:29 network_selfuncs.c

-rw-r--r--. 1 1107 1107 33895 Jun 10 03:29 rangetypes_selfuncs.c

-rw-r--r--. 1 1107 1107 218809 Jun 10 03:29 selfuncs.c

选择性函数，还需要依赖数据库的统计信息，从而计算选择性，常见的选择性计算函数有：

postgres=# select distinct oprrest from pg_operator order by 1;

oprrest

--------------

eqsel 相等

neqsel 不相等

scalarltsel 小于等于

scalargtsel 大于等于

areasel

positionsel

contsel

iclikesel

icnlikesel

regexeqsel

likesel

icregexeqsel

regexnesel

nlikesel

icregexnesel

rangesel

networksel

tsmatchsel

arraycontsel

(20 rows)

当然，用户如果自定义数据类型的话，也可以自定义选择性函数，或者使用以上标准的选择性函数，只是可能需要实现一下类型转换。

源码中的介绍：

src/backend/utils/adt/selfuncs.c

/*----------

* Operator selectivity estimation functions are called to estimate the

* selectivity of WHERE clauses whose top-level operator is their operator.

* We divide the problem into two cases:

* Restriction clause estimation: the clause involves vars of just

* one relation. 一种是符合WHERE条件的选择性（百分比）。

* Join clause estimation: the clause involves vars of multiple rels.

* Join selectivity estimation is far more difficult and usually less accurate

* than restriction estimation. -- JOIN的选择性评估通常没有WHERE条件的选择性准确。

* When dealing with the inner scan of a nestloop join, we consider the

* join's joinclauses as restriction clauses for the inner relation, and

* treat vars of the outer relation as parameters (a/k/a constants of unknown

* values). So, restriction estimators need to be able to accept an argument

* telling which relation is to be treated as the variable.

在使用nestloop JOIN时，一个表的字段将作为变量，另一个表的字段（及其统计信息）与操作符作为JOIN评估子句。

* The call convention for a restriction estimator (oprrest function) is

* Selectivity oprrest (PlannerInfo *root,

* Oid operator,

* List *args,

* int varRelid);

* 评估选择性需要4个参数:

* root: general information about the query (rtable and RelOptInfo lists

* are particularly important for the estimator). plannerinfo信息。

* operator: OID of the specific operator in question. 操作符的OID

* args: argument list from the operator clause. 操作符子句中的参数列表

* varRelid: if not zero, the relid (rtable index) of the relation to

* be treated as the variable relation. May be zero if the args list

* is known to contain vars of only one relation. 表示where条件所包含的参数来自哪些relation。

* This is represented at the SQL level (in pg_proc) as

* float8 oprrest (internal, oid, internal, int4); 在pg_proc数据字典中表示为oprrest指定的函数。

* The result is a selectivity, that is, a fraction (0 to 1) of the rows

* of the relation that are expected to produce a TRUE result for the

* given operator. 选择性函数的评估结果就是一个百分比。乘以pg_class.reltuples就可以得到记录数。

* The call convention for a join estimator (oprjoin function) is similar

* except that varRelid is not needed, and instead join information is

* supplied:

* JOIN选择性的计算函数与WHERE选择性的计算函数参数有轻微差别，么有varRelid, 增加了join信息的参数。

* Selectivity oprjoin (PlannerInfo *root,

* Oid operator,

* List *args,

* JoinType jointype,

* SpecialJoinInfo *sjinfo);

* float8 oprjoin (internal, oid, internal, int2, internal);

* (Before Postgres 8.4, join estimators had only the first four of these

* parameters. That signature is still allowed, but deprecated.) The

* relationship between jointype and sjinfo is explained in the comments for

* clause_selectivity() --- the short version is that jointype is usually

* best ignored in favor of examining sjinfo.

* Join selectivity for regular inner and outer joins is defined as the

* fraction (0 to 1) of the cross product of the relations that is expected

* to produce a TRUE result for the given operator. For both semi and anti (半连接与预连接)

* joins, however, the selectivity is defined as the fraction of the left-hand

* side relation's rows that are expected to have a match (ie, at least one

* row with a TRUE result) in the right-hand side.

* For both oprrest and oprjoin functions, the operator's input collation OID

* (if any) is passed using the standard fmgr mechanism, so that the estimator

* function can fetch it with PG_GET_COLLATION(). Note, however, that all

* statistics in pg_statistic are currently built using the database's default

* collation. Thus, in most cases where we are looking at statistics, we

* should ignore the actual operator collation and use DEFAULT_COLLATION_OID.

* We expect that the error induced by doing this is usually not large enough

* to justify complicating matters.

*----------

4. join，是joinsel即join的选择性计算函数。

对应pg_operator.oprjoin

postgres=# select distinct oprjoin from pg_operator order by 1;

oprjoin

------------------

eqjoinsel

neqjoinsel

scalarltjoinsel

scalargtjoinsel

areajoinsel

positionjoinsel

contjoinsel

iclikejoinsel

icnlikejoinsel

regexeqjoinsel

likejoinsel

icregexeqjoinsel

regexnejoinsel

nlikejoinsel

icregexnejoinsel

networkjoinsel

tsmatchjoinsel

arraycontjoinsel

(19 rows)

5. hashes

6. merges

hashes和merges表示该操作符是否允许hash join和merge join, 只有返回布尔逻辑值的二元操作符满足这个要求。

我们在pg_operator这个catalog中也可以查看到对应的介绍：

NameTypeReferencesDescriptionoidoid Row identifier (hidden attribute; must be explicitly selected)oprnamename Name of the operatoroprnamespaceoidpg_namespace.oidThe OID of the namespace that contains this operatoroprowneroidpg_authid.oidOwner of the operatoroprkindchar b = infix ("between"), l = prefix ("left"), r = postfix ("right")
指定操作符在什么位置，例如中间，左侧，右侧oprcanmergebool This operator supports merge joins
此操作符是否支持merge joinoprcanhashbool This operator supports hash joins
此操作符是否支持hash joinoprleftoidpg_type.oidType of the left operand
操作符左侧的数据类型oprrightoidpg_type.oidType of the right operand
操作符右侧的数据类型oprresultoidpg_type.oidType of the result
返回结果的数据类型oprcomoidpg_operator.oidCommutator of this operator, if any

oprnegateoidpg_operator.oidNegator of this operator, if anyoprcoderegprocpg_proc.oidFunction that implements this operatoroprrestregprocpg_proc.oidRestriction selectivity estimation function for this operatoroprjoinregprocpg_proc.oidJoin selectivity estimation function for this operator

[参考]

1. http://www.postgresql.org/docs/9.4/static/sql-createoperator.html

2. http://www.postgresql.org/docs/9.4/static/xoper-optimization.html

0 0