Presto查询性能测试记录

来源:互联网 发布:essential phone 知乎 编辑:程序博客网 时间:2024/05/21 22:56
一、测试环境
HDFS 七个节点,五个dn,两个nn。Hive配置在其中一个nn,使用MySQL远程元数据库,同一个节点同时开启hiveserver2和 metastore。所有节点内存一致,30GB,CPU为Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz (x86_64)。所有节点通通过两台真实物理机虚拟(使用XenServer虚拟)。

Presto配置

二、样本数据
数据存储在HDFS,格式为parquet,存在分区。每条记录如下:
     id |  <tm, k1..k350, t351..t500>
根据id分区,分区数为1000。字段类型:id int, tm String, kn 为int, tn 为double。
按业务场景,每秒每个分区(设备)产生一条数据,共501个字段(500个测点+timestamp),每天每分区产生172800条数据。共有两天的数据,样本总容量为172800*1000*2=345600000。

三、Hive表建立
(1)创建内部表
     create table wt_csv_newtb (tm string,
         k1 int, k2 int .. k350 int, t351 double,
          t352 double .. t500 double)
     partitioned by (id int) stored as parquet;

(2)创建1000分区
分别执行1000个分区:
     alter table wt_csv_newtb add partition (id = x);
x 取值{1,2,3 .. 1000}

(3)数据创建
使用程序生成数据,并导入此表中,此处略。

四、测试查询
1、数据完整性校验
SQL语句      select count(*) from wt_csv_newtb where id = <dev_id>
序号
id
查询时间(秒)
1
1

2
560

3
2

4
100

5
999

以上所有测试都在毫秒时间内完成,时间很短,无法测试。

SQL语句      select count(*) from wt_csv_newtb;  -- 全分区查询

查询4次:
34秒
21秒
19秒
17秒

2、全量数据多维度查询
SQL语句     SELECT t380 FROM wt_csv_newtb WHERE t451>999.9994 and id>23 and id<(23+100) order by t380;
测试三次(结果集大小为9):
8秒
6秒
3.98秒

3、全量数据多维度聚合查询
(1)
SQL语句     select SUM(k200),AVG(t400),AVG(t400+3),AVG(t400+5) FROM wt_csv_newtb WHERE t451>999.88 and id IN (select id from table_r where wfid = ‘3’) group by id
2分27秒 (11风机)

(2)
SQL语句     select SUM(k200),AVG(t400),AVG(t400+3),AVG(t400+5) FROM wt_csv_newtb WHERE t451>999.88
and id IN (select id from table_r where wfid = ‘5’) group by id
1分55秒 (106风机)
分别执行两个查询:
子查询时间很短,此处不记。
SQL语句同    select SUM(k200),AVG(t400),AVG(t400+3),AVG(t400+5) FROM wt_csv_newtb WHERE t451>999.88 and id between 119 and 224 group by id
12秒 (106风机)

(3)
SQL语句     select sum(t489), wt_csv_newtb.id from wt_csv_newtb join table_r on wt_csv_newtb.id=table_r.id where t489>110 and t442>77 and T378>162 and T410<67 and T351<125 group by wt_csv_newtb.id
3分33秒(1000个结果)
3分52秒(1000个结果)

select sum(t489), wt_csv_newtb.id, table_r.wfid as wfid from wt_csv_newtb join table_r on wt_csv_newtb.id=table_r.id where t489>110 and t442>77 and T378>162 and T410<67 and T351<125 group by wt_csv_newtb.id, table_r.wfid;
3分13秒(1007个结果)
3分15秒(1007结果)

4、其他复杂的SQL查询
SQL语句     select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-03 10:30:00' and B.tm < '2017-04-03 19:30:00' ) as N where N.t360 > 999.30 and N.t400 > 983.60
3分1秒(0个结果)

SQL语句     select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 800 and N.t400 > 800;
2分36秒(0个结果)

SQL语句     select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 100 and N.t400 > 400;
2分32秒(0结果)

SQL语句     select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 100 and N.t400 > 100;

SQL语句     select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 199.999 and N.t400 > 199.980;
2分03秒(490结果)

SQL语句     select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:00:00' and A.tm < '2017-04-02 18:00:00' and B.tm > '2017-04-02 20:28:00' and B.tm < '2017-04-03 05:28:00' ) as N where N.t360 > 199.999 and N.t400 > 199.980;
2分6秒(499结果)

5、排序测试
SQL     select T380 from wt_csv_newtb where tm > '2017-04-02 08:00:00' and tm < '2017-04-02 20:00:00' order by T380 limit 1000;
59秒(1000结果)

SQL     select T380 from wt_csv_newtb where tm > '2017-04-02 09:00:00' and tm < '2017-04-02 21:00:00' order by T380 limit 10;
57秒(10结果)

SQL     select T380 from wt_csv_newtb where tm > '2017-04-03 08:00:00' and tm < '2017-04-03 20:00:00' order by T380 limit 1000;
50秒(1000结果)
原创粉丝点击