Postgresql数据库count(distinct)优化
来源:互联网 发布:淘宝差评多久会不显示 编辑:程序博客网 时间:2024/06/15 08:41
基本信息
基本情况
表共800W数据,从260W的结果集中计算出不同的案件数量(130万),需要执行20多秒原SQL内容
select count(distinct c_bh_aj) as ajcount from db_znspgl.t_zlglpt_wt where d_cjrq between '20160913' and '20170909';
- 表信息和数据量
znspgl=# \d+ db_znspgl.t_zlglpt_wt Table "db_znspgl.t_zlglpt_wt" Column | Type | Modifiers | Storage | Stats target | Description ---------+------------------------+-----------+----------+--------------+------------- c_bh | character(32) | not null | extended | | 编号 c_bh_aj | character(32) | | extended | | 案件编号 n_ajbs | numeric(15,0) | | main | | 案件标识 c_zjgz | character varying(600) | | extended | | 质检规则 c_zjxm | character varying(300) | | extended | | 质检项目 d_cjrq | date | | plain | | 创建日期Indexes: "pk_zlglpt_wt" PRIMARY KEY, btree (c_bh) "i_t_zlglpt_wt_ajbs" btree (n_ajbs) "i_t_zlglpt_wt_bh_aj" btree (c_bh_aj) "i_t_zlglpt_wt_cjrq" btree (d_cjrq)znspgl=# select count(*) from db_znspgl.t_zlglpt_wtznspgl-# ; count --------- 8000000(1 row)
- 数据库版本信息
znspgl=# select version(); version -------------------------------------------------------------------------------------------- PostgreSQL 9.5.5 (ArteryBase 3.5.3, Thunisoft). on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit(1 row)
- 执行计划
znspgl=# explain analyze select count(distinct c_bh_aj) as ajcount from db_znspgl.t_zlglpt_wt where d_cjrq between '20160913' and '20170909'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=313357.40..313357.41 rows=1 width=33) (actual time=23478.562..23478.563 rows=1 loops=1) -> Bitmap Heap Scan on t_zlglpt_wt (cost=55811.21..306782.09 rows=2630125 width=33) (actual time=366.909..3946.452 rows=2644330 loops=1) Recheck Cond: ((d_cjrq >= '2016-09-13'::date) AND (d_cjrq <= '2017-09-09'::date)) Rows Removed by Index Recheck: 2670504 Heap Blocks: exact=105741 lossy=105694 -> Bitmap Index Scan on i_t_zlglpt_wt_cjrq (cost=0.00..55153.68 rows=2630125 width=0) (actual time=341.468..341.468 rows=2644330 loops=1) Index Cond: ((d_cjrq >= '2016-09-13'::date) AND (d_cjrq <= '2017-09-09'::date)) Planning time: 0.143 ms Execution time: 23478.624 ms
尝试增加覆盖索引
- 增加索引
create index i_zlglpt_wt_zh01 on db_znspgl.t_zlglpt_wt (d_cjrq,c_bh_aj);
- 再次查看执行计划
znspgl=# explain analyze select count(distinct c_bh_aj) as ajcount from db_znspgl.t_zlglpt_wt where d_cjrq between '20160913' and '20170909'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=134006.11..134006.12 rows=1 width=33) (actual time=21696.556..21696.557 rows=1 loops=1) -> Index Only Scan using i_zlglpt_wt_zh01 on t_zlglpt_wt (cost=0.56..127480.16 rows=2610380 width=33) (actual time=0.055..2684.807 rows=2644330 loops=1) Index Cond: ((d_cjrq >= '2016-09-13'::date) AND (d_cjrq <= '2017-09-09'::date)) Heap Fetches: 0 Planning time: 0.318 ms Execution time: 21696.604 ms
- 思考
1、SQL速度提升很少!
2、时间主要话费在Aggregate上了,时间从2648一下子升级到21696。
3、理论上200W的count(distinct) 不应该花费19秒那么长时间,而且c_bh_aj还是有序的(建立索引了)
伪loose index scan
从网上看到一片帖子《分析MySQL中优化distinct的技巧》,count distinct 慢的原因是因为扫描编号时会扫描到很多重复的项,可以通过loose index scan避免这些重复的扫描(前提distinct项是有序的!),mysql 和 abase虽然不支持原生的loose index scan(oracle支持),但是可以通过改写SQL达到!
- 重新建立索引
drop index db_znspgl.i_zlglpt_wt_zh01;create index i_zlglpt_wt_zh01 on db_znspgl.t_zlglpt_wt (c_bh_aj,d_cjrq);
- 改写SQL
select count(*) from ( select distinct(c_bh_aj) from db_znspgl.t_zlglpt_wt where d_cjrq between '20160913' and '20170909' ) t;
- 查看执行计划
znspgl=# explain analyze select count(*) from (select distinct(c_bh_aj) from db_znspgl.t_zlglpt_wt where d_cjrq between '20160913' and '20170909' ) t; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=347567.23..347567.24 rows=1 width=0) (actual time=6954.845..6954.846 rows=1 loops=1) -> Unique (cost=0.56..343310.31 rows=340554 width=33) (actual time=0.034..5969.209 rows=1322165 loops=1) -> Index Only Scan using i_zlglpt_wt_zh01 on t_zlglpt_wt (cost=0.56..336784.36 rows=2610380 width=33) (actual time=0.031..2840.502 rows=2644330 loops=1) Index Cond: ((d_cjrq >= '2016-09-13'::date) AND (d_cjrq <= '2017-09-09'::date)) Heap Fetches: 0 Planning time: 0.172 ms Execution time: 6954.890 ms(7 rows)
- 通过timing 计算SQL执行时间
znspgl=# \timing onTiming is on.znspgl=# select count(*) from (select distinct(c_bh_aj) from db_znspgl.t_zlglpt_wt where d_cjrq between '20160913' and '20170909' ) t; count --------- 1322165(1 row)Time: 1322.715 ms
总结
通过伪loose index scan的SQL处理可以有效提高count(distinct)的执行速度!
阅读全文
0 0
- Postgresql数据库count(distinct)优化
- Hive SQL优化之 Count Distinct
- Hive SQL优化之 Count Distinct
- sql优化之:count(distinct xxxx)
- Hive SQL优化之 Count Distinct
- count(distinct)
- Postgresql distinct/ distinct on
- Sql优化(二) 快速计算Distinct Count
- 【SQL优化】使用子查询可提升 COUNT DISTINCT
- SQL优化(二) 快速计算Distinct Count
- AS COUNT,Having ,DISTINCT,ALL 在数据库中的使用
- distinct() vs count()
- SQL(select distinct count)
- 处理 Distinct Count
- Mysql中count(*),DISTINCT
- SQL COUNT DISTINCT 函数
- mongodb count,distinct,group
- hive count distinct
- patch文件如何合并到源码目录中
- Unity_Unity的出现的Bug_087
- AndroidStudio导出jar包
- python: 切片,迭代,列表生成器,生成器,迭代器
- LeetCode 284. Peeking Iterator--实现Iterator接口的hashNext和next方法,来实现peek方法
- Postgresql数据库count(distinct)优化
- 根据iTOP-4412驱动实验手册实验03Menuconfig_Kconfig,P59配置内核,输入命令make menuconfig出错???
- 【菜鸟级】H5多人网络在线格斗游戏简单Demo
- Golang请求网页返回数据乱码处理一例
- git命令之git tag 给当前分支打标签
- unity的scriptableObject的使用
- tabbar视图控制器之间的切换
- MFC禁止对话框的拖动
- Android事件传递