PostgreSQL (User defined Operator) UDO & Operator Optimization Information
来源:互联网 发布:java设计模式书籍 编辑:程序博客网 时间:2024/06/08 11:31
Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
语法如下:
CREATE OPERATOR name ( PROCEDURE = function_name [, LEFTARG = left_type ] [, RIGHTARG = right_type ] [, COMMUTATOR = com_op ] [, NEGATOR = neg_op ] [, RESTRICT = res_proc ] [, JOIN = join_proc ] [, HASHES ] [, MERGES ])
例如创建一个求两个值的平均值的操作符:
首选要创建函数
postgres=# create function f_avg(numeric,numeric) returns numeric as $$postgres$# select ($1+$2)/2;postgres$# $$ language sql strict;CREATE FUNCTION
验证函数
postgres=# select f_avg(1,null);f_avg-------(1 row)postgres=# select f_avg(1,2);f_avg--------------------1.5000000000000000(1 row)
创建操作符,指定左右参数类型,调用的函数名,commutator是一个和优化器相关的选项,我后面会重点介绍:
postgres=# create operator ## (procedure=f_avg, leftarg=numeric, rightarg=numeric, commutator='##');CREATE OPERATORpostgres=# select 1 ## 2;?column?--------------------1.5000000000000000(1 row)
注意到在创建操作符的语法中有6个和优化器有关的关键字:
[, COMMUTATOR = com_op ] [, NEGATOR = neg_op ][, RESTRICT = res_proc ] [, JOIN = join_proc ][, HASHES ] [, MERGES ]
介绍如下:
假设x表示操作符左侧的参数,y表示操作符右侧的参数
1. commutator,指明x op1 y等效于y op2 x,即操作数调换,返回的值一样。例如2>1 和1<2结果是一致的。那么>就是<的commutator或者反之。又例如1+2和2+1是等价的,那么+就是+的commutator。commutator只需要在创建其中一个操作符时指定,创建另一个对应的操作符时可以不需要指定,PostgreSQL会自动建立这个关系。例如创建>操作符时指定了它的commutator是<,那么在创建<操作符时可以不需要指定>是它的commutator。
另外需要注意,有commutator操作符的操作符的左右两侧的参数类型必须一致,这样才能满足x op1 y等价于y op2 x。
优化器如何利用commutator呢?例如索引扫描,必须列在操作符的左侧才能使用索引。1 > tbl.c这个条件,如果>没有commutator的话,是不能使用索引的。
例子,以int4的>和<操作符为例,实验一下:
>和<在PostgreSQL中是一对commutator
postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;oprcom--------------pg_catalog.<(1 row)postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;oprcom--------------pg_catalog.>(1 row)
记录他们的oprcom对应的OID
postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode | oprrest | oprjoin---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------> | 11 | 10 | b | f | f | 23 | 23 | 16 | 97 | 523 | int4gt | scalargtsel | scalargtjoinsel(1 row)postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprcode | oprrest | oprjoin---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+---------+-------------+-----------------< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4lt | scalarltsel | scalarltjoinsel(1 row)
接下来我要通过更新pg_operator解除他们的commutator关系,设置为0即可。
postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;UPDATE 1postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;UPDATE 1
创建测试表,插入测试数据,创建索引:
postgres=# create table tbl(id int);CREATE TABLEpostgres=# insert into tbl select generate_series(1,100000);INSERT 0 100000postgres=# create index idx_tbl_id on tbl(id);CREATE INDEX
将列放在条件的左边可以走索引,但是放在右边不走索引。因为优化器不能决定>,<是否为commutator
postgres=# explain select * from tbl where id<10;QUERY PLAN---------------------------------------------------------------------------Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)Index Cond: (id < 10)(2 rows)postgres=# explain select * from tbl where 10>id;QUERY PLAN----------------------------------------------------------Seq Scan on tbl (cost=0.00..1361.00 rows=33333 width=4)Filter: (10 > id)(2 rows)
重新建立这两个 operator的commutator关系后,优化器会自动将10>id转换为id<10,并且走索引了:
postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;UPDATE 1postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;UPDATE 1postgres=# explain select * from tbl where 10>id;QUERY PLAN---------------------------------------------------------------------------Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.45 rows=9 width=4)Index Cond: (id < 10)(2 rows)
2. negator,指x op1 y 等价于 not(y op2 x),或者x op1等价于not( y op2),或者op1 x 等价于not(op2 y),因此negator支持一元和二元操作符。
例子:
如果=和<>是一对negator操作符,NOT (x = y) 可以简化为 x <> y。
postgres=# explain select * from tbl where 10=id;QUERY PLAN---------------------------------------------------------------------------Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)Index Cond: (id = 10)(2 rows)postgres=# explain select * from tbl where not(10<>id);QUERY PLAN---------------------------------------------------------------------------Index Only Scan using idx_tbl_id on tbl (cost=0.29..8.31 rows=1 width=4)Index Cond: (id = 10)(2 rows)
同样,操作符两侧参数x,y的类型必须一致。并且仅适用于返回布尔逻辑类型的操作符。
3. restrict,是用于评估选择性的函数,仅适用于二元操作符,例如where col>100,这个查询条件,如何评估选择性呢?是通过操作符的restrict来指定的,选择性乘以pg_class.reltuples就可以评估得到这个查询条件的行数。
选择性函数的代码在 src/backend/utils/adt/
包括
-rw-r--r--. 1 1107 1107 33191 Jun 10 03:29 array_selfuncs.c-rw-r--r--. 1 1107 1107 2316 Jun 10 03:29 geo_selfuncs.c-rw-r--r--. 1 1107 1107 720 Jun 10 03:29 network_selfuncs.c-rw-r--r--. 1 1107 1107 33895 Jun 10 03:29 rangetypes_selfuncs.c-rw-r--r--. 1 1107 1107 218809 Jun 10 03:29 selfuncs.c
选择性函数,还需要依赖数据库的统计信息,从而计算选择性,常见的选择性计算函数有:
postgres=# select distinct oprrest from pg_operator order by 1;oprrest---------------eqsel 相等neqsel 不相等scalarltsel 小于等于scalargtsel 大于等于areaselpositionselcontseliclikeselicnlikeselregexeqsellikeselicregexeqselregexneselnlikeselicregexneselrangeselnetworkseltsmatchselarraycontsel(20 rows)
当然,用户如果自定义数据类型的话,也可以自定义选择性函数,或者使用以上标准的选择性函数,只是可能需要实现一下类型转换。
源码中的介绍:
src/backend/utils/adt/selfuncs.c
/*----------* Operator selectivity estimation functions are called to estimate the* selectivity of WHERE clauses whose top-level operator is their operator.* We divide the problem into two cases:* Restriction clause estimation: the clause involves vars of just* one relation. 一种是符合WHERE条件的选择性(百分比)。* Join clause estimation: the clause involves vars of multiple rels.* Join selectivity estimation is far more difficult and usually less accurate* than restriction estimation. -- JOIN的选择性评估通常没有WHERE条件的选择性准确。** When dealing with the inner scan of a nestloop join, we consider the* join's joinclauses as restriction clauses for the inner relation, and* treat vars of the outer relation as parameters (a/k/a constants of unknown* values). So, restriction estimators need to be able to accept an argument* telling which relation is to be treated as the variable.在使用nestloop JOIN时,一个表的字段将作为变量,另一个表的字段(及其统计信息)与操作符作为JOIN评估子句。** The call convention for a restriction estimator (oprrest function) is** Selectivity oprrest (PlannerInfo *root,* Oid operator,* List *args,* int varRelid);* 评估选择性需要4个参数:* root: general information about the query (rtable and RelOptInfo lists* are particularly important for the estimator). plannerinfo信息。* operator: OID of the specific operator in question. 操作符的OID* args: argument list from the operator clause. 操作符子句中的参数列表* varRelid: if not zero, the relid (rtable index) of the relation to* be treated as the variable relation. May be zero if the args list* is known to contain vars of only one relation. 表示where条件所包含的参数来自哪些relation。** This is represented at the SQL level (in pg_proc) as** float8 oprrest (internal, oid, internal, int4); 在pg_proc数据字典中表示为oprrest指定的函数。** The result is a selectivity, that is, a fraction (0 to 1) of the rows* of the relation that are expected to produce a TRUE result for the* given operator. 选择性函数的评估结果就是一个百分比。乘以pg_class.reltuples就可以得到记录数。** The call convention for a join estimator (oprjoin function) is similar* except that varRelid is not needed, and instead join information is* supplied:* JOIN选择性的计算函数与WHERE选择性的计算函数参数有轻微差别,么有varRelid, 增加了join信息的参数。* Selectivity oprjoin (PlannerInfo *root,* Oid operator,* List *args,* JoinType jointype,* SpecialJoinInfo *sjinfo);** float8 oprjoin (internal, oid, internal, int2, internal);** (Before Postgres 8.4, join estimators had only the first four of these* parameters. That signature is still allowed, but deprecated.) The* relationship between jointype and sjinfo is explained in the comments for* clause_selectivity() --- the short version is that jointype is usually* best ignored in favor of examining sjinfo.** Join selectivity for regular inner and outer joins is defined as the* fraction (0 to 1) of the cross product of the relations that is expected* to produce a TRUE result for the given operator. For both semi and anti (半连接与预连接)* joins, however, the selectivity is defined as the fraction of the left-hand* side relation's rows that are expected to have a match (ie, at least one* row with a TRUE result) in the right-hand side.** For both oprrest and oprjoin functions, the operator's input collation OID* (if any) is passed using the standard fmgr mechanism, so that the estimator* function can fetch it with PG_GET_COLLATION(). Note, however, that all* statistics in pg_statistic are currently built using the database's default* collation. Thus, in most cases where we are looking at statistics, we* should ignore the actual operator collation and use DEFAULT_COLLATION_OID.* We expect that the error induced by doing this is usually not large enough* to justify complicating matters.*----------
4. join,是joinsel即join的选择性计算函数。
对应pg_operator.oprjoin
postgres=# select distinct oprjoin from pg_operator order by 1;oprjoin-------------------eqjoinselneqjoinselscalarltjoinselscalargtjoinselareajoinselpositionjoinselcontjoinseliclikejoinselicnlikejoinselregexeqjoinsellikejoinselicregexeqjoinselregexnejoinselnlikejoinselicregexnejoinselnetworkjoinseltsmatchjoinselarraycontjoinsel(19 rows)
5. hashes
6. merges
hashes和merges表示该操作符是否允许hash join和merge join, 只有返回布尔逻辑值的二元操作符满足这个要求。
我们在pg_operator这个catalog中也可以查看到对应的介绍:
指定操作符在什么位置,例如中间,左侧,右侧oprcanmergebool This operator supports merge joins
此操作符是否支持merge joinoprcanhashbool This operator supports hash joins
此操作符是否支持hash joinoprleftoidpg_type.oidType of the left operand
操作符左侧的数据类型oprrightoidpg_type.oidType of the right operand
操作符右侧的数据类型oprresultoidpg_type.oidType of the result
返回结果的数据类型oprcomoidpg_operator.oidCommutator of this operator, if any
oprnegateoidpg_operator.oidNegator of this operator, if anyoprcoderegprocpg_proc.oidFunction that implements this operatoroprrestregprocpg_proc.oidRestriction selectivity estimation function for this operatoroprjoinregprocpg_proc.oidJoin selectivity estimation function for this operator
1. http://www.postgresql.org/docs/9.4/static/sql-createoperator.html
2. http://www.postgresql.org/docs/9.4/static/xoper-optimization.html
0 0
- PostgreSQL (User defined Operator) UDO & Operator Optimization Information
- operator->*
- operator[]
- operator
- operator
- operator.*()
- operator[]
- Operator
- operator
- operator
- operator ->
- operator
- operator ++
- .operator
- operator*() 和 operator->()
- operator++() 和 operator++(int)
- operator 与&operator
- operator和&operator
- geodatabase中如何获取feature class的ObjectID
- Attention model 的一些概念
- 51nod 1515:明辨是非 并查集合并
- Homework1
- 自动化测试软件UFT实验
- PostgreSQL (User defined Operator) UDO & Operator Optimization Information
- android sqlite的SqliteDataBase.insert()方法,以及如何在事务管理中使用
- Caffe matlab之基于Alex network的特征提取
- Word2Vec
- iOS8中使用CoreLocation定位
- 在linux系统中安装MySQL
- Eclipse操作技巧
- orafce extension for PostgreSQL, Oracle's compatibility functions and packages
- Manacher-字符串处理