PostgreSQL zhparser & english parser, query, insert performance
来源:互联网 发布:银行数据管理制度 编辑:程序博客网 时间:2024/06/06 01:22
Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
本文主要测试一下PostgreSQL 中英文分词的性能,以及带索引的情况下的插入,查询性能。
测试环境:
CPU:Intel(R) Xeon(R) CPU X7460 @ 2.66GHzMEM:60GOS:CentOS 6.x x64PostgreSQL 9.4.1
中文分词使用 http://blog.163.com/digoal@126/blog/static/163877040201422410175698/
性能数据汇总:
英语分词性能:~ 900万 words每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
中文分词性能:~ 400万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
英文分词+插入性能:~ 666万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
中文分词+插入性能:~ 290万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
查询性能和查询条件,数据量都有关系,没有很好的评估标准,大多数查询可以在毫秒级返回。
分词性能测试过程如下:
为了测试效果,首先要修改to_tsvector的函数稳定性,避免一次计算多次调用。
对稳定性不了解的话,介绍请参考我以前写的文章或视频。
digoal=> \df+ to_tsvectorList of functionsSchema | Name | Result data type | Argument data types | Type | Security | Volatility | Owner | Language | Sourcecode | Description------------+-------------+------------------+---------------------+--------+----------+------------+----------+----------+------------------+-----------------------pg_catalog | to_tsvector | tsvector | regconfig, text | normal | invoker | immutable | postgres | internal | to_tsvector_byid | transform to tsvectorpg_catalog | to_tsvector | tsvector | text | normal | invoker | stable | postgres | internal | to_tsvector | transform to tsvector(2 rows)digoal=# alter function to_tsvector(regconfig,text) stable;ALTER FUNCTIONTime: 0.632 ms
创建英语和中文的测试函数:
英文272个单词,中文包含291个字或单词.
create or replace function f_testts_en() returns void as $$declarebeginperform to_tsvector('english', 'Before you can use PostgreSQL you need to install it, of course. It is possible that PostgreSQL is already installed at your site, either because it was included in your operating system distribution or because the system administrator already installed it. If that is the case, you should obtain information from the operating system documentation or your system administrator about how to access PostgreSQL.If you are not sure whether PostgreSQL is already available or whether you can use it for your experimentation then you can install it yourself. Doing so is not hard and it can be a good exercise. PostgreSQL can be installed by any unprivileged user; no superuser (root) access is required.If you are installing PostgreSQL yourself, then refer to Chapter 15 for instructions on installation, and return to this guide when the installation is complete. Be sure to follow closely the section about setting up the appropriate environment variables.If your site administrator has not set things up in the default way, you might have some more work to do. For example, if the database server machine is a remote machine, you will need to set the PGHOST environment variable to the name of the database server machine. The environment variable PGPORT might also have to be set. The bottom line is this: if you try to start an application program and it complains that it cannot connect to the database, you should consult your site administrator or, if that is you, the documentation to make sure that your environment is properly set up. If you did not understand the preceding paragraph then read the next section.');return;end;$$ language plpgsql strict;
分词结果:
'15':126 'access':63,113 'administr':38,59,158,242 'alreadi':19,39,73 'also':209 'applic':225 'appropri':152 'avail':74 'bottom':215 'cannot':232 'case':46 'chapter':125 'close':145 'complain':229 'complet':140 'connect':233 'consult':239 'cours':12 'databas':180,201,236 'default':166 'distribut':33 'document':55,249 'either':24 'environ':153,194,205,255 'exampl':177 'exercis':101 'experiment':83 'follow':144 'good':100 'guid':135 'hard':94 'includ':28 'inform':50 'instal':9,20,40,87,105,119,130,138 'instruct':128 'line':216 'machin':182,186,203 'make':251 'might':169,208 'name':198 'need':7,189 'next':271 'obtain':49 'oper':31,53 'paragraph':267 'pghost':193 'pgport':207 'possibl':15 'postgresql':5,17,64,71,102,120 'preced':266 'program':226 'proper':257 'read':269 'refer':123 'remot':185 'requir':115 'return':132 'root':112 'section':147,272 'server':181,202 'set':149,161,191,213,258 'site':23,157,241 'start':223 'superus':111 'sure':69,142,252 'system':32,37,54,58 'thing':162 'tri':221 'understand':264 'unprivileg':108 'use':4,79 'user':109 'variabl':154,195,206 'way':167 'whether':70,76 'work':173(1 row)
create or replace function f_testts_zh() returns void as $$declarebeginperform to_tsvector('testzhcfg','<PostgreSQL发展历程,社区介绍,资源介绍 2小时><PostgreSQL SQL基础 2天>第一天PostgreSQL的安装, 架构基础介绍, 如何创建和访问数据库;SQL语言基础(创建表, 操作表数据, 表关联查询, 聚合查询);SQL高级应用(视图, 外键, 事务, 窗口函数, 继承);SQL语义结构介绍(关键字, 常量, 操作符, 特殊字符, 注释, 操作符的优先级);值表达式介绍(列, 位置参数, 数组下标, field引用, 操作符调用, 函数调用, 聚合表达式, 窗口函数调用, 类型转换, collate表达式, 标量子查询, 数组构造器, 行构造器, 表达式逻辑运算规则);函数调用用法介绍(位置参数调用, 命名参数调用, 混合调用);数据定义(默认值, 约束, 系统列, 表结构和定义修改, 权限, schema, 继承, 分区, 外部数据);第二天数据查询(from子句, where子句, group by having子句, 窗口子句, select子句, 结果集合操作, 排序, 限制, 位移, values子句, CET用法);数据类型(数字, 货币, 字符, 字节, 时间, 布尔, 枚举, 几何, 网络, 比特, 文本搜索, UUID, XML, JSON, 数组, 复合类型, 范围类型, 对象类型, 伪类型);函数与操作符(逻辑, 比较, 数学运算, 字符串, 字节流, 比特位, 样式匹配, 格式化, 时间, 枚举, 几何, 网络, 全文检索, XML, JSON, 序列, 条件, 数组, 范围, 聚合, 窗口, 子查询, 行与数组的比较, SRF函数, 系统信息函数, 系统管理函数, 触发器函数, 事件触发器函数);类型转换;索引用法;全文检索用法;并行控制;SQL性能优化;<PostgreSQL 数据库管理 3天>第一天PostgreSQL源代码安装(源码结构介绍, 配置介绍, 自定义FLAG介绍, 内核参数优化, 安装);数据库服务器配置(OS USER, 初始化集群, 启动数据库集群, 关闭数据库集群, 升级数据库软件, 防止欺骗, 数据封装方法, SSL, SSH隧道)数据库集群进程结构介绍(postmaster, logger, checkpointer, writer, wal writer, autovacuum launcher, autovacuum worker, stats collector, backend, worker);认证和连接(客户端认证方法介绍, 认证配置, 常用认证方法使用举例, 常见认证错误排错);pgAdmin III 使用介绍(安装, 使用, 调试函数, 维护数据);数据库配置(guc.c, 参数优先级, 参数含义, 开发参数, guc隐含参数, 如何修改参数并使之生效, 如何查看当前参数值, 如何查看参数值的范围);用户管理(创建角色, 角色权限管理, 角色membership管理, 角色成员权限管理);数据库逻辑结构(cluster, database, schema, object, field, access privilege);数据库物理结构(tablespace, datafile, segment, block, controlfile, xlog, archivelog);第二天数据库管理(创建表空间, 创建数据库, 创建数据库模板, 数据库配置, 删除数据库, 跨数据库的数据访问);基于角色的对象权限管理(表, 列, 序列, 数据库, 域, 外部数据, 函数, 语言, 大对象, schema, 表空间, 类型);默认权限和继承权限(default privilege, inherit privilege);数据库安全(认证, 网络, 数据存储, 密码, 注入, 欺骗);数据库监控(状态监控, 趋势监控, 预警, 常用监控工具和插件nagios, zabbix, pg_statsinfo);多版本并发控制和垃圾回收;日常维护(垃圾回收, 数据重组, 索引维护, VM维护, 预防XID溢出, 日志维护);第三天数据迁移(逻辑备份和还原, 增量数据迁移的方法, 异构数据迁移的方法(如oracle to postgresql));如何打数据库补丁;数据库版本升级(小版本升级方法, 大版本升级方法);建模与Benchmark(如何根据业务形态建立数据库测试模型, 测试工具的使用, 根据测试模型和硬件标准输出benchmark);数据库日志分析(错误代码介绍, 日志的筛选过滤, 日志的维护);数据库审计(参数层面的审计开关, 定制化审计(如触发器结合HSTORE), 审计数据的维护手段);本地化(字符集, collate, 编码转换, 如何防止乱码);数据库性能分析(OS性能分析报表, SQL性能分析报表);数据库巡检(如何定制巡检项, 指标, 如何分析巡检报告);');return;end;$$ language plpgsql strict;
分词结果:
'access':352 'archivelog':363 'autovacuum':271,273 'backend':277 'benchmark':486,500 'block':360 'by':123 'c':309 'cet':137 'checkpointer':267 'cluster':347 'collate':75,530 'collector':276 'controlfile':361 'database':348 'datafile':358 'default':405 'field':63,351 'flag':230 'from':118 'group':122 'guc':308,316 'having':124 'hstore':523 'iii':297 'inherit':407 'json':154,181 'launcher':272'logger':266 'membership':338 'nagios':429 'object':350 'oracle':468 'os':239,537 'pg_statsinfo':431 'pgadmin':296 'postgresql':1,9,15,215,221,470 'postmaster':265 'privilege':353,406,408 'schema':112,349,397 'segment':359 'select':127 'sql':10,23,35,43,212,540 'srf':191 'ssh':258 'ssl':257 'stats':275 'tablespace':357 'to':469 'user':240 'uuid':152 'values':135 'vm':444 'wal':269 'where':120'worker':274,278 'writer':268,270 'xid':447 'xlog':362 'xml':153,180 'zabbix':430 '下标':62 '业务':487 '举例':291 '乱码':534 '事件':201 '事务':40 '二天':116,364 '介绍':5,7,19,46,57,91,226,228,231,264,284,299,506 '代码':505 '优先级':54,311 '优化':214,234 '位移':134'位置':59,92 '使':321 '使用':290,298,301,494 '信息':194 '修改':110,319 '值':55,103,325,328 '全文检索':179,208 '关联':31 '关键字':47'关闭':246 '内核':232 '几何':148,177 '函数':41,67,71,88,163,192,195,198,200,203,303,393 '分区':114 '分析':503,548 '列':58,106,388 '创建':20,25,332,367,370,372 '初始化':241 '删除':377 '匹配':173 '升级':249,476,479,483 '历程':3 '参数':60,93,96,233,310,312,315,318,320,324,327,514 '发展':2 '含义':313 '启动':243 '命名':95 '回收':436,439 '垃圾':435,438 '域':391 '基础':11,18 '增量':458 '备份':456 '合':156 '大':395,481 '天':12,14,218,220,452 '如':467,520 '子句':119,121,125,126,128,136 '字':170 '字符':51,143 '字符串':169 '字符集':529 '字节':144 '存储':414 '安全':410 '安装':16,223,235,300 '定义':101,109,229 '定制':518,545 '审计':513,516,519,524 '客户端':281 '码':415 '对象':160,384,396 '封装':255 '小':477 '小时':8 '层面':515 '巡检':544,546,549 '工具':427 '布尔':146 '常用':287,425 '常见':292 '常量':48 '并发':433 '并行':210 '序列':182,389 '应用':37 '建模':485 '建立':489 '开关':517 '开发':314 '异':462 '引用':64 '形态':488'性能':213 '性能分析':536,538,541 '成员':341 '手段':527 '打':471 '报告':550 '报表':539,542 '指标':547 '排序':132 '排错':295 '控制':211,434 '插件':428 '搜索':151 '操作':27,131 '操作符':49,53,65,164 '数字':141 '数学':167 '数据':29,100,115,139,254,305,381,392,413,440,453,459,464,525 '数据库':22,216,236,244,247,250,260,306,344,354,365,371,373,375,378,380,390,409,418,472,474,490,501,512,535,543 '据查询':117 '数组':61,80,155,184,189 '文本':150 '方法':256,283,289,461,466,480,484 '日志':449,502,507,510 '时间':145,175 '服务器':237 '本地化':528 '权限':111,335,342,385,402,404 '条件':183 '构':463 '构造器':81,83 '枚举':147,176 '架构':17 '查看':323,326 '查询':32,34,79,187 '标':77 '标准':498 '样式':172 '格式化':174 '模型':492,496 '模板':374 '欺骗':253,417 '比较':166,190 '注入':416 '注释':52 '测试':491,495 '测试工具':493 '混合':98 '源代码':222 '源码':224 '溢出':448 '版本':432,475,478,482 '物理':355 '特殊':50 '状态':420 '生效':322 '用户':330 '用法':90,138,207,209 '监控':419,421,423,426 '硬件':497 '社区':4 '空间':369,399 '第一':13,219 '第三':451 '筛选':508'管理':197,217,331,336,339,343,366,386 '类型':73,140,157,159,161,162,204,400 '系统':105,193,196 '索引':206,442 '约束':104 '结合':522 '结构':45,108,225,263,346,356 '结果':129 '继承':42,113,403 '维护':304,437,443,445,450,511,526 '编码':531 '网络':149,178,412 '聚合':33,69,186 '节流':171 '范围':158,185,329 '行':82,188 '补丁':473 '表':26,28,30,107,368,387,398 '表达式':56,70,76,84 '规则':87 '视图':38 '角色':333,334,337,340,383 '触发器':199,202,521 '认证':279,282,285,288,293,411 '访问':21,382 '语义':44 '语言':394 '语言基础':24 '调用':66,68,72,89,94,97,99 '调试':302 '货币':142 '资源':6 '趋势':422 '跨':379 '转换':74,205,532 '软件':251 '输出':499 '迁移':454,460,465 '过滤':509 '运算':86,168 '还原':457 '进程':262 '连接':280 '逻辑':85,165,345,455 '配置':227,238,286,307,376 '重组':441 '量子':78'错误':294,504 '键':39 '防止':252,533 '限制':133 '隐含':317 '隧道':259 '集合':130 '集群':242,245,248,261 '预警':424 '预防':446 '高':36 '默认':102,401(1 row)
英文分词性能测试结果:
英语分词性能:~ 900万 words每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> vi test.sqlselect f_testts_en();postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000progress: 5.0 s, 33072.6 tps, lat 1.083 ms stddev 0.371progress: 10.0 s, 33218.9 tps, lat 1.082 ms stddev 0.366progress: 15.0 s, 33213.4 tps, lat 1.083 ms stddev 0.365progress: 20.0 s, 33202.3 tps, lat 1.083 ms stddev 0.364progress: 25.0 s, 33204.5 tps, lat 1.083 ms stddev 0.364progress: 30.0 s, 33215.0 tps, lat 1.083 ms stddev 0.366progress: 35.0 s, 33220.8 tps, lat 1.082 ms stddev 0.367progress: 40.0 s, 33210.7 tps, lat 1.083 ms stddev 0.364progress: 45.0 s, 33220.4 tps, lat 1.082 ms stddev 0.365progress: 50.0 s, 33217.2 tps, lat 1.083 ms stddev 0.365progress: 55.0 s, 33231.6 tps, lat 1.082 ms stddev 0.367progress: 60.0 s, 33234.4 tps, lat 1.082 ms stddev 0.367progress: 65.0 s, 33232.7 tps, lat 1.082 ms stddev 0.367progress: 70.0 s, 33212.2 tps, lat 1.083 ms stddev 0.369progress: 75.0 s, 33232.4 tps, lat 1.082 ms stddev 0.367progress: 80.0 s, 33222.7 tps, lat 1.082 ms stddev 0.367progress: 85.0 s, 33221.1 tps, lat 1.082 ms stddev 0.365progress: 90.0 s, 33220.4 tps, lat 1.082 ms stddev 0.365progress: 95.0 s, 33213.6 tps, lat 1.083 ms stddev 0.364progress: 100.0 s, 33227.5 tps, lat 1.082 ms stddev 0.368progress: 105.0 s, 33230.8 tps, lat 1.082 ms stddev 0.366progress: 110.0 s, 33228.1 tps, lat 1.082 ms stddev 0.366
中文分词性能测试结果:
中文分词性能:~ 400万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> vi test.sqlselect f_testts_zh();postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000progress: 5.0 s, 13874.6 tps, lat 2.582 ms stddev 0.892progress: 10.0 s, 13947.0 tps, lat 2.580 ms stddev 0.875progress: 15.0 s, 13949.0 tps, lat 2.580 ms stddev 0.874progress: 20.0 s, 13950.1 tps, lat 2.579 ms stddev 0.871progress: 25.0 s, 13951.1 tps, lat 2.579 ms stddev 0.876progress: 30.0 s, 13950.6 tps, lat 2.579 ms stddev 0.873progress: 35.0 s, 13950.5 tps, lat 2.579 ms stddev 0.877progress: 40.0 s, 13951.8 tps, lat 2.579 ms stddev 0.876progress: 45.0 s, 13953.2 tps, lat 2.579 ms stddev 0.878progress: 50.0 s, 13949.9 tps, lat 2.579 ms stddev 0.876progress: 55.0 s, 13947.4 tps, lat 2.580 ms stddev 0.875progress: 60.0 s, 13947.8 tps, lat 2.580 ms stddev 0.877progress: 65.0 s, 13948.4 tps, lat 2.580 ms stddev 0.877progress: 70.0 s, 13952.1 tps, lat 2.579 ms stddev 0.878progress: 75.0 s, 13948.1 tps, lat 2.580 ms stddev 0.877progress: 80.0 s, 13950.3 tps, lat 2.579 ms stddev 0.875progress: 85.0 s, 13951.3 tps, lat 2.579 ms stddev 0.875progress: 90.0 s, 13950.9 tps, lat 2.579 ms stddev 0.879progress: 95.0 s, 13949.6 tps, lat 2.579 ms stddev 0.876progress: 100.0 s, 13950.1 tps, lat 2.579 ms stddev 0.876progress: 105.0 s, 13951.9 tps, lat 2.579 ms stddev 0.874progress: 110.0 s, 13950.7 tps, lat 2.579 ms stddev 0.878
分词+插入性能测试:
使用gist或gin索引,索引的适用场景和优化手段可以参考我以前写的文章。
(静态数据考虑GIN,动态数据考虑GIST,GIN可以通过增加WORK_MEM优化动态性能)。
digoal=> create table test_ts(ts tsvector);CREATE TABLEdigoal=> create index idx_test_ts on test_ts using gist(ts);CREATE INDEXcreate or replace function f_testts_en() returns void as $$declarebegininsert into test_ts values (to_tsvector('Before you can use PostgreSQL you need to install it, of course. It is possible that PostgreSQL is already installed at your site, either because it was included in your operating system distribution or because the system administrator already installed it. If that is the case, you should obtain information from the operating system documentation or your system administrator about how to access PostgreSQL.If you are not sure whether PostgreSQL is already available or whether you can use it for your experimentation then you can install it yourself. Doing so is not hard and it can be a good exercise. PostgreSQL can be installed by any unprivileged user; no superuser (root) access is required.If you are installing PostgreSQL yourself, then refer to Chapter 15 for instructions on installation, and return to this guide when the installation is complete. Be sure to follow closely the section about setting up the appropriate environment variables.If your site administrator has not set things up in the default way, you might have some more work to do. For example, if the database server machine is a remote machine, you will need to set the PGHOST environment variable to the name of the database server machine. The environment variable PGPORT might also have to be set. The bottom line is this: if you try to start an application program and it complains that it cannot connect to the database, you should consult your site administrator or, if that is you, the documentation to make sure that your environment is properly set up. If you did not understand the preceding paragraph then read the next section.'));return;end;$$ language plpgsql strict;create or replace function f_testts_zh() returns void as $$declarebegininsert into test_ts values ( to_tsvector('testzhcfg','<PostgreSQL发展历程,社区介绍,资源介绍 2小时><PostgreSQL SQL基础 2天>第一天PostgreSQL的安装, 架构基础介绍, 如何创建和访问数据库;SQL语言基础(创建表, 操作表数据, 表关联查询, 聚合查询);SQL高级应用(视图, 外键, 事务, 窗口函数, 继承);SQL语义结构介绍(关键字, 常量, 操作符, 特殊字符, 注释, 操作符的优先级);值表达式介绍(列, 位置参数, 数组下标, field引用, 操作符调用, 函数调用, 聚合表达式, 窗口函数调用, 类型转换, collate表达式, 标量子查询, 数组构造器, 行构造器, 表达式逻辑运算规则);函数调用用法介绍(位置参数调用, 命名参数调用, 混合调用);数据定义(默认值, 约束, 系统列, 表结构和定义修改, 权限, schema, 继承, 分区, 外部数据);第二天数据查询(from子句, where子句, group by having子句, 窗口子句, select子句, 结果集合操作, 排序, 限制, 位移, values子句, CET用法);数据类型(数字, 货币, 字符, 字节, 时间, 布尔, 枚举, 几何, 网络, 比特, 文本搜索, UUID, XML, JSON, 数组, 复合类型, 范围类型, 对象类型, 伪类型);函数与操作符(逻辑, 比较, 数学运算, 字符串, 字节流, 比特位, 样式匹配, 格式化, 时间, 枚举, 几何, 网络, 全文检索, XML, JSON, 序列, 条件, 数组, 范围, 聚合, 窗口, 子查询, 行与数组的比较, SRF函数, 系统信息函数, 系统管理函数, 触发器函数, 事件触发器函数);类型转换;索引用法;全文检索用法;并行控制;SQL性能优化;<PostgreSQL 数据库管理 3天>第一天PostgreSQL源代码安装(源码结构介绍, 配置介绍, 自定义FLAG介绍, 内核参数优化, 安装);数据库服务器配置(OS USER, 初始化集群, 启动数据库集群, 关闭数据库集群, 升级数据库软件, 防止欺骗, 数据封装方法, SSL, SSH隧道)数据库集群进程结构介绍(postmaster, logger, checkpointer, writer, wal writer, autovacuum launcher, autovacuum worker, stats collector, backend, worker);认证和连接(客户端认证方法介绍, 认证配置, 常用认证方法使用举例, 常见认证错误排错);pgAdmin III 使用介绍(安装, 使用, 调试函数, 维护数据);数据库配置(guc.c, 参数优先级, 参数含义, 开发参数, guc隐含参数, 如何修改参数并使之生效, 如何查看当前参数值, 如何查看参数值的范围);用户管理(创建角色, 角色权限管理, 角色membership管理, 角色成员权限管理);数据库逻辑结构(cluster, database, schema, object, field, access privilege);数据库物理结构(tablespace, datafile, segment, block, controlfile, xlog, archivelog);第二天数据库管理(创建表空间, 创建数据库, 创建数据库模板, 数据库配置, 删除数据库, 跨数据库的数据访问);基于角色的对象权限管理(表, 列, 序列, 数据库, 域, 外部数据, 函数, 语言, 大对象, schema, 表空间, 类型);默认权限和继承权限(default privilege, inherit privilege);数据库安全(认证, 网络, 数据存储, 密码, 注入, 欺骗);数据库监控(状态监控, 趋势监控, 预警, 常用监控工具和插件nagios, zabbix, pg_statsinfo);多版本并发控制和垃圾回收;日常维护(垃圾回收, 数据重组, 索引维护, VM维护, 预防XID溢出, 日志维护);第三天数据迁移(逻辑备份和还原, 增量数据迁移的方法, 异构数据迁移的方法(如oracle to postgresql));如何打数据库补丁;数据库版本升级(小版本升级方法, 大版本升级方法);建模与Benchmark(如何根据业务形态建立数据库测试模型, 测试工具的使用, 根据测试模型和硬件标准输出benchmark);数据库日志分析(错误代码介绍, 日志的筛选过滤, 日志的维护);数据库审计(参数层面的审计开关, 定制化审计(如触发器结合HSTORE), 审计数据的维护手段);本地化(字符集, collate, 编码转换, 如何防止乱码);数据库性能分析(OS性能分析报表, SQL性能分析报表);数据库巡检(如何定制巡检项, 指标, 如何分析巡检报告);'));return;end;$$ language plpgsql strict;
英语分词+插入性能:
英文分词+插入性能:~ 666万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000progress: 5.0 s, 24825.6 tps, lat 1.443 ms stddev 1.082progress: 10.0 s, 24609.2 tps, lat 1.462 ms stddev 1.112progress: 15.0 s, 24766.2 tps, lat 1.452 ms stddev 1.106progress: 20.0 s, 24578.8 tps, lat 1.463 ms stddev 1.095progress: 25.0 s, 24761.1 tps, lat 1.453 ms stddev 1.106progress: 30.0 s, 24558.6 tps, lat 1.465 ms stddev 1.129progress: 35.0 s, 22420.1 tps, lat 1.604 ms stddev 1.376progress: 40.0 s, 25480.7 tps, lat 1.412 ms stddev 1.108progress: 45.0 s, 25528.4 tps, lat 1.409 ms stddev 1.097progress: 50.0 s, 25239.3 tps, lat 1.425 ms stddev 1.113progress: 55.0 s, 25236.4 tps, lat 1.425 ms stddev 1.117progress: 60.0 s, 25392.1 tps, lat 1.417 ms stddev 1.079progress: 65.0 s, 25521.0 tps, lat 1.409 ms stddev 1.100progress: 70.0 s, 25154.2 tps, lat 1.430 ms stddev 1.117progress: 75.0 s, 25260.6 tps, lat 1.424 ms stddev 1.105progress: 80.0 s, 25364.2 tps, lat 1.418 ms stddev 1.086progress: 85.0 s, 25383.5 tps, lat 1.417 ms stddev 1.094progress: 90.0 s, 25392.8 tps, lat 1.417 ms stddev 1.116progress: 95.0 s, 25651.1 tps, lat 1.402 ms stddev 1.102progress: 100.0 s, 25663.7 tps, lat 1.401 ms stddev 1.082progress: 105.0 s, 24486.3 tps, lat 1.469 ms stddev 1.204progress: 110.0 s, 22847.7 tps, lat 1.574 ms stddev 1.362progress: 115.0 s, 24687.0 tps, lat 1.457 ms stddev 1.248progress: 120.0 s, 25433.6 tps, lat 1.414 ms stddev 1.095progress: 125.0 s, 25667.7 tps, lat 1.401 ms stddev 1.094progress: 130.0 s, 25581.7 tps, lat 1.406 ms stddev 1.116progress: 135.0 s, 25411.5 tps, lat 1.415 ms stddev 1.128progress: 140.0 s, 25507.5 tps, lat 1.410 ms stddev 1.105progress: 145.0 s, 25503.0 tps, lat 1.410 ms stddev 1.087progress: 150.0 s, 25533.5 tps, lat 1.409 ms stddev 1.110progress: 155.0 s, 25708.8 tps, lat 1.399 ms stddev 1.087progress: 160.0 s, 25574.1 tps, lat 1.406 ms stddev 1.118progress: 165.0 s, 25580.5 tps, lat 1.406 ms stddev 1.098progress: 170.0 s, 25428.9 tps, lat 1.414 ms stddev 1.111progress: 175.0 s, 25522.0 tps, lat 1.409 ms stddev 1.102progress: 180.0 s, 23641.7 tps, lat 1.521 ms stddev 1.276progress: 185.0 s, 23065.6 tps, lat 1.559 ms stddev 1.425progress: 190.0 s, 24623.5 tps, lat 1.461 ms stddev 1.218progress: 195.0 s, 25486.2 tps, lat 1.411 ms stddev 1.142progress: 200.0 s, 25507.4 tps, lat 1.410 ms stddev 1.108
中文分词+插入性能:
中文分词+插入性能:~ 290万 字每秒 ( Intel(R) Xeon(R) CPU X7460 @ 2.66GHz )
postgres@db-192-168-173-33-> pgbench -M prepared -n -f test.sql -P 5 -c 36 -j 36 -T 30000000progress: 5.0 s, 10934.3 tps, lat 3.277 ms stddev 2.349progress: 10.0 s, 10855.9 tps, lat 3.315 ms stddev 2.359progress: 15.0 s, 10828.8 tps, lat 3.323 ms stddev 2.426progress: 20.0 s, 10849.6 tps, lat 3.316 ms stddev 2.361progress: 25.0 s, 10909.1 tps, lat 3.299 ms stddev 2.320progress: 30.0 s, 10855.7 tps, lat 3.315 ms stddev 2.388progress: 35.0 s, 10857.4 tps, lat 3.315 ms stddev 2.367progress: 40.0 s, 10799.4 tps, lat 3.332 ms stddev 2.503progress: 45.0 s, 10962.9 tps, lat 3.283 ms stddev 2.390progress: 50.0 s, 10843.2 tps, lat 3.319 ms stddev 2.466progress: 55.0 s, 9665.7 tps, lat 3.723 ms stddev 3.115progress: 60.0 s, 9950.5 tps, lat 3.618 ms stddev 2.918progress: 65.0 s, 10822.1 tps, lat 3.324 ms stddev 2.470progress: 70.0 s, 10764.6 tps, lat 3.343 ms stddev 2.517progress: 75.0 s, 10887.8 tps, lat 3.306 ms stddev 2.481progress: 80.0 s, 10897.6 tps, lat 3.303 ms stddev 2.474progress: 85.0 s, 10814.6 tps, lat 3.328 ms stddev 2.458progress: 90.0 s, 10805.0 tps, lat 3.330 ms stddev 2.552progress: 95.0 s, 10921.5 tps, lat 3.296 ms stddev 2.422progress: 100.0 s, 10820.0 tps, lat 3.326 ms stddev 2.518progress: 105.0 s, 9833.8 tps, lat 3.659 ms stddev 3.212progress: 110.0 s, 9609.7 tps, lat 3.744 ms stddev 3.084progress: 115.0 s, 10159.1 tps, lat 3.543 ms stddev 2.917progress: 120.0 s, 10849.9 tps, lat 3.316 ms stddev 2.520progress: 125.0 s, 10820.0 tps, lat 3.327 ms stddev 2.505progress: 130.0 s, 10828.0 tps, lat 3.323 ms stddev 2.549progress: 135.0 s, 10819.1 tps, lat 3.326 ms stddev 2.536
查询性能:
查询性能视查询条件,数据内容而定。一般可以在毫秒级返回。当然还要关注实际查询条件和索引存储结构,有些查询组合可能使查询时间较长,因为需要扫描的数据链路更长。
digoal=> explain analyze select * from test_ts where ts @@ tsquery '源代码' limit 1;QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------------Limit (cost=0.41..0.85 rows=1 width=4426) (actual time=0.156..0.156 rows=1 loops=1)-> Index Scan using idx_test_ts on test_ts (cost=0.41..1039026.29 rows=2371193 width=4426) (actual time=0.154..0.154 rows=1 loops=1)Index Cond: (ts @@ '''源代码'''::tsquery)Planning time: 0.222 msExecution time: 0.236 ms(5 rows)Time: 0.881 ms
digoal=> explain analyze select * from test_ts where ts @@ tsquery '源代码 & !集群' limit 1;QUERY PLAN--------------------------------------------------------------------------------------------------------------------------Limit (cost=178.00..180.01 rows=1 width=4426) (actual time=363.066..363.066 rows=0 loops=1)-> Bitmap Heap Scan on test_ts (cost=178.00..180.01 rows=1 width=4426) (actual time=363.064..363.064 rows=0 loops=1)Recheck Cond: (ts @@ '''源代码'' & !''集群'''::tsquery)-> Bitmap Index Scan on idx_1 (cost=0.00..178.00 rows=1 width=0) (actual time=363.061..363.061 rows=0 loops=1)Index Cond: (ts @@ '''源代码'' & !''集群'''::tsquery)Planning time: 0.167 msExecution time: 363.105 ms(7 rows)Time: 363.683 ms
digoal=> explain analyze select * from test_ts where ts @@ tsquery '德哥' limit 1;QUERY PLAN----------------------------------------------------------------------------------------------------------------------------Limit (cost=181.88..183.68 rows=1 width=4426) (actual time=0.019..0.019 rows=0 loops=1)-> Bitmap Heap Scan on test_ts (cost=181.88..21478.97 rows=11856 width=4426) (actual time=0.018..0.018 rows=0 loops=1)Recheck Cond: (ts @@ '''德哥'''::tsquery)-> Bitmap Index Scan on idx_1 (cost=0.00..178.92 rows=11856 width=0) (actual time=0.015..0.015 rows=0 loops=1)Index Cond: (ts @@ '''德哥'''::tsquery)Planning time: 0.180 msExecution time: 0.052 ms(7 rows)Time: 0.592 ms
存储空间:
digoal=> select count(*) from test_ts;count---------2371193(1 row)约6.9亿中文字,分词后约6.4亿中英文词组。分词占用10GB,索引占用5GBdigoal=> \dt+ test_tsList of relationsSchema | Name | Type | Owner | Size | Description--------+---------+-------+--------+-------+-------------digoal | test_ts | table | digoal | 10 GB |(1 row)digoal=> \di+List of relationsSchema | Name | Type | Owner | Table | Size | Description--------+-------------+-------+--------+---------+---------+-------------digoal | idx_test_ts | index | digoal | test_ts | 4959 MB |(1 row)
1. http://blog.163.com/digoal@126/blog/static/163877040201422410175698/
2. http://blog.163.com/digoal@126/blog/static/163877040201552102814822/
0 0
- PostgreSQL zhparser & english parser, query, insert performance
- Postgresql中文分词zhparser学习(一)
- Lucene Query Parser Syntax
- postgresql insert
- Remote query performance tunning
- Apache Lucene - Query Parser Syntax
- Solr 使用自定义 Query Parser
- Solr 使用自定义 Query Parser
- BULK INSERT Performance
- ADO Insert Performance Issue
- PostgreSQL Random Query Tuning
- 安装postgresQL Query Cache
- PostgreSQL Query 优化体验
- postgresql query user
- PostgreSQL配置文件--QUERY TUNING
- Performance Tuning-The Query Optimizer
- batch insert in PostgreSQL
- PostgreSql Partition + Hibernate Insert
- 直方图均衡化的作用是图像增强原理
- SQL语句执行顺序
- llvm之IR手册翻译(2)
- Hadoop
- 【软件工程】软件测试
- PostgreSQL zhparser & english parser, query, insert performance
- matlab unique找出矩阵中不同元素
- 为何nested loop要求小表驱动,hash join又为何要求小表hashed?(转)
- android开源项目
- 可乐卤蛋的做法
- VMware提示:已将该虚拟机配置为使用 64 位客户机操作系统。但是,无法执行 64 位操作。
- 教你如何迅速秒杀掉:99%的海量数据处理面试题
- don't shutdown before pg_stop_backup()
- 关于chrome插件的开发