PostgreSQL 传统 hash 分区方法和性能

来源:互联网 发布:网络验证源码 编辑:程序博客网 时间:2024/06/05 09:45

标签

PostgreSQL , hash , list, range , hashtext , 哈希函数 , 取模 , 传统分区方法 , trigger , rule , pg_pathman , 内置分区 , general 分区


我有几张阿里云幸运券分享给你,用券购买或者升级阿里云相应产品会有特惠惊喜哦!把想要买的产品的幸运券都领走吧!快下手,马上就要抢光了

背景

除了传统的基于trigger和rule的分区,PostgreSQL 10开始已经内置了分区功能(目前仅支持list和range),使用pg_pathman则支持hash分区。

从性能角度,目前最好的还是pg_pathman分区。

但是,传统的分区手段,依旧是最灵活的,在其他方法都不奏效时,可以考虑传统方法。

如何创建传统的hash分区

1、创建父表

create table tbl (id int, info text, crt_time timestamp);  

2、创建分区表,增加约束

do language plpgsql $$  declare    parts int := 4;  begin    for i in 0..parts-1 loop      execute format('create table tbl%s (like tbl including all) inherits (tbl)', i);      execute format('alter table tbl%s add constraint ck check(mod(id,%s)=%s)', i, parts, i);    end loop;  end;  $$;  

3、创建触发器函数,内容为数据路由,路由后返回NULL(即不写本地父表)

create or replace function ins_tbl() returns trigger as $$  declare  begin    case abs(mod(NEW.id,4))      when 0 then        insert into tbl0 values (NEW.*);      when 1 then        insert into tbl1 values (NEW.*);      when 2 then        insert into tbl2 values (NEW.*);      when 3 then        insert into tbl3 values (NEW.*);      else        return NEW;  -- 如果是NULL则写本地父表      end case;      return null;  end;  $$ language plpgsql strict;  

4、创建before触发器

create trigger tg1 before insert on tbl for each row when (NEW.id is not null) execute procedure ins_tbl();  

5、验证

postgres=# insert into tbl values (1);  INSERT 0 0  postgres=# insert into tbl values (null);  INSERT 0 1  postgres=# insert into tbl values (0);  INSERT 0 0  postgres=# insert into tbl values (1);  INSERT 0 0  postgres=# insert into tbl values (2);  INSERT 0 0  postgres=# insert into tbl values (3);  INSERT 0 0  postgres=# insert into tbl values (4);  INSERT 0 0      postgres=# select  tableoid::regclass, * from tbl;   tableoid | id | info | crt_time   ----------+----+------+----------   tbl      |    |      |    tbl0     |  0 |      |    tbl0     |  4 |      |    tbl1     |  1 |      |    tbl1     |  1 |      |    tbl2     |  2 |      |    tbl3     |  3 |      |   (7 rows)  

6、查询时,只要提供了约束条件,会自动过滤到子表,不会扫描不符合约束条件的其他子表。

postgres=# explain select * from tbl where abs(mod(id,4)) = abs(mod(1,4)) and id=1;                                  QUERY PLAN                                  --------------------------------------------------------------------------   Append  (cost=0.00..979127.84 rows=3 width=45)     ->  Seq Scan on tbl  (cost=0.00..840377.67 rows=2 width=45)           Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))     ->  Seq Scan on tbl1  (cost=0.00..138750.17 rows=1 width=45)           Filter: ((id = 1) AND (abs(mod(id, 4)) = 1))  (5 rows)  

传统分区性能 对比 非分区表

传统分区表性能

性能相比没有分区有一定下降。(CPU开销略有提升)

1、创建压测脚本

vi test.sql  \set id random(1,100000)  insert into tbl values (:id);  

2、压测

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120    transaction type: ./test.sql  scaling factor: 1  query mode: prepared  number of clients: 56  number of threads: 56  duration: 120 s  number of transactions actually processed: 21277635  latency average = 0.316 ms  latency stddev = 0.170 ms  tps = 177290.033472 (including connections establishing)  tps = 177306.915203 (excluding connections establishing)  script statistics:   - statement latencies in milliseconds:           0.002  \set id random(1,100000)           0.315  insert into tbl values (:id);  

3、资源开销

last pid: 36817;  load avg:  32.9,  15.7,  7.27;       up 15+00:46:36                                                                                                                                                               17:59:17  63 processes: 34 running, 29 sleeping  CPU states: 42.3% user,  0.0% nice, 20.4% system, 37.1% idle,  0.2% iowait  Memory: 192G used, 29G free, 116M buffers, 186G cached  DB activity: 168654 tps,  0 rollbs/s, 928 buffer r/s, 99 hit%,    176 row r/s, 168649 row w/  DB I/O:     0 reads/s,     0 KB/s,     0 writes/s,     0 KB/s    DB disk: 1455.4 GB total, 425.2 GB free (70% used)  Swap:   

未分区表性能

postgres=# drop trigger tg1 on tbl ;  

1、TPS

transaction type: ./test.sql  scaling factor: 1  query mode: prepared  number of clients: 56  number of threads: 56  duration: 120 s  number of transactions actually processed: 31188395  latency average = 0.215 ms  latency stddev = 0.261 ms  tps = 259884.798007 (including connections establishing)  tps = 259896.495810 (excluding connections establishing)  script statistics:   - statement latencies in milliseconds:           0.002  \set id random(1,100000)           0.214  insert into tbl values (:id);  

2、资源开销

last pid: 36964;  load avg:  31.7,  18.7,  8.89;       up 15+00:47:41                                                                                                                                                               18:00:22  63 processes: 45 running, 18 sleeping  CPU states: 33.3% user,  0.0% nice, 26.8% system, 39.8% idle,  0.1% iowait  Memory: 194G used, 26G free, 118M buffers, 188G cached  DB activity: 256543 tps,  0 rollbs/s, 1006 buffer r/s, 99 hit%,    176 row r/s, 256538 row w  DB I/O:     0 reads/s,     0 KB/s,     0 writes/s,     0 KB/s    DB disk: 1455.4 GB total, 424.8 GB free (70% used)  Swap:   

非整型字段,如何实现哈希分区

1、PostgreSQL内部提供了类型转换的哈希函数,可以将任意类型转换为整型。

                                   List of functions     Schema   |      Name      | Result data type |     Argument data types     |  Type    ------------+----------------+------------------+-----------------------------+--------   pg_catalog | hash_aclitem   | integer          | aclitem                     | normal   pg_catalog | hash_array     | integer          | anyarray                    | normal   pg_catalog | hash_numeric   | integer          | numeric                     | normal   pg_catalog | hash_range     | integer          | anyrange                    | normal   pg_catalog | hashbpchar     | integer          | character                   | normal   pg_catalog | hashchar       | integer          | "char"                      | normal   pg_catalog | hashenum       | integer          | anyenum                     | normal   pg_catalog | hashfloat4     | integer          | real                        | normal   pg_catalog | hashfloat8     | integer          | double precision            | normal   pg_catalog | hashinet       | integer          | inet                        | normal   pg_catalog | hashint2       | integer          | smallint                    | normal   pg_catalog | hashint4       | integer          | integer                     | normal   pg_catalog | hashint8       | integer          | bigint                      | normal   pg_catalog | hashmacaddr    | integer          | macaddr                     | normal   pg_catalog | hashmacaddr8   | integer          | macaddr8                    | normal   pg_catalog | hashname       | integer          | name                        | normal   pg_catalog | hashoid        | integer          | oid                         | normal   pg_catalog | hashoidvector  | integer          | oidvector                   | normal   pg_catalog | hashtext       | integer          | text                        | normal   pg_catalog | hashvarlena    | integer          | internal                    | normal   pg_catalog | interval_hash  | integer          | interval                    | normal   pg_catalog | jsonb_hash     | integer          | jsonb                       | normal   pg_catalog | pg_lsn_hash    | integer          | pg_lsn                      | normal   pg_catalog | time_hash      | integer          | time without time zone      | normal   pg_catalog | timestamp_hash | integer          | timestamp without time zone | normal   pg_catalog | timetz_hash    | integer          | time with time zone         | normal   pg_catalog | uuid_hash      | integer          | uuid                        | normal  

2、其他字段类型的哈希表方法如下

如 hashtext

drop table tbl;    create table tbl (id text, info text, crt_time timestamp);    do language plpgsql $$  declare    parts int := 4;  begin    for i in 0..parts-1 loop      execute format('create table tbl%s (like tbl including all) inherits (tbl)', i);      execute format('alter table tbl%s add constraint ck check(abs(mod(hashtext(id),%s))=%s)', i, parts, i);    end loop;  end;  $$;    create or replace function ins_tbl() returns trigger as $$  declare  begin    case abs(mod(hashtext(NEW.id),4))      when 0 then        insert into tbl0 values (NEW.*);      when 1 then        insert into tbl1 values (NEW.*);      when 2 then        insert into tbl2 values (NEW.*);      when 3 then        insert into tbl3 values (NEW.*);      else        return NEW;      end case;      return null;  end;  $$ language plpgsql strict;    create trigger tg1 before insert on tbl for each row when (NEW.id is not null) execute procedure ins_tbl();  

性能与整型一样。

传统分区性能 对比 非分区表 - 性能结果

1、性能

模式insert N 行/s基于trigger的hash分区17.7 万未分区26 万

2、CPU资源开销

模式usersystemidle基于trigger的hash分区42.3%20.4%37.1%未分区33.3%26.8%39.8%

小结

除了传统的基于trigger和rule的分区,PostgreSQL 10开始已经内置了分区功能(目前仅支持list和range),使用pg_pathman则支持hash分区。

从性能角度,目前最好的还是pg_pathman分区。

《PostgreSQL 10 内置分区 vs pg_pathman perf profiling》

《PostgreSQL 10.0 preview 功能增强 - 内置分区表》

《PostgreSQL 9.5+ 高效分区表实现 - pg_pathman》

但是,传统的分区手段,依旧是最灵活的,在其他方法都不奏效时,可以考虑传统方法。

传统手段中,最懒散的做法(当然是以牺牲性能为前提),例子:

《PostgreSQL general public partition table trigger》

阅读原文http://click.aliyun.com/m/35570/