Presto查询性能测试记录

来源：互联网发布：essential phone 知乎编辑：程序博客网时间：2024/05/21 22:56

一、测试环境

HDFS 七个节点，五个dn，两个nn。Hive配置在其中一个nn，使用MySQL远程元数据库，同一个节点同时开启hiveserver2和 metastore。所有节点内存一致，30GB，CPU为Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz （x86_64）。所有节点通通过两台真实物理机虚拟（使用XenServer虚拟）。

Presto配置

二、样本数据

数据存储在HDFS，格式为parquet，存在分区。每条记录如下：

id | <tm, k1..k350, t351..t500>

根据id分区，分区数为1000。字段类型：id int， tm String， kn 为int， tn 为double。

按业务场景，每秒每个分区（设备）产生一条数据，共501个字段（500个测点+timestamp），每天每分区产生172800条数据。共有两天的数据，样本总容量为172800*1000*2=345600000。

三、Hive表建立

（1）创建内部表

create table wt_csv_newtb (tm string,

k1 int, k2 int .. k350 int, t351 double,

t352 double .. t500 double)

partitioned by (id int) stored as parquet;

（2）创建1000分区

分别执行1000个分区：

alter table wt_csv_newtb add partition (id = x);

x 取值{1,2,3 .. 1000}

（3）数据创建

使用程序生成数据，并导入此表中，此处略。

四、测试查询

1、数据完整性校验

SQL语句 select count(*) from wt_csv_newtb where id = <dev_id>

序号
id
查询时间（秒）
1
1

2
560

3
2

4
100

5
999

以上所有测试都在毫秒时间内完成，时间很短，无法测试。

SQL语句 select count(*) from wt_csv_newtb; -- 全分区查询

查询4次：

34秒

21秒

19秒

17秒

2、全量数据多维度查询

SQL语句 SELECT t380 FROM wt_csv_newtb WHERE t451>999.9994 and id>23 and id<(23+100) order by t380;

测试三次（结果集大小为9）：

8秒

6秒

3.98秒

3、全量数据多维度聚合查询

（1）

SQL语句 select SUM(k200),AVG(t400),AVG(t400+3),AVG(t400+5) FROM wt_csv_newtb WHERE t451>999.88 and id IN (select id from table_r where wfid = ‘3’) group by id

2分27秒（11风机）

（2）

SQL语句 select SUM(k200),AVG(t400),AVG(t400+3),AVG(t400+5) FROM wt_csv_newtb WHERE t451>999.88

and id IN (select id from table_r where wfid = ‘5’) group by id

1分55秒（106风机）

分别执行两个查询：

子查询时间很短，此处不记。

SQL语句同 select SUM(k200),AVG(t400),AVG(t400+3),AVG(t400+5) FROM wt_csv_newtb WHERE t451>999.88 and id between 119 and 224 group by id

12秒（106风机）

（3）

SQL语句 select sum(t489), wt_csv_newtb.id from wt_csv_newtb join table_r on wt_csv_newtb.id=table_r.id where t489>110 and t442>77 and T378>162 and T410<67 and T351<125 group by wt_csv_newtb.id

3分33秒（1000个结果）

3分52秒（1000个结果）

select sum(t489), wt_csv_newtb.id, table_r.wfid as wfid from wt_csv_newtb join table_r on wt_csv_newtb.id=table_r.id where t489>110 and t442>77 and T378>162 and T410<67 and T351<125 group by wt_csv_newtb.id, table_r.wfid;

3分13秒（1007个结果）

3分15秒（1007结果）

4、其他复杂的SQL查询

SQL语句 select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-03 10:30:00' and B.tm < '2017-04-03 19:30:00' ) as N where N.t360 > 999.30 and N.t400 > 983.60

3分1秒（0个结果）

SQL语句 select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 800 and N.t400 > 800;

2分36秒（0个结果）

SQL语句 select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 100 and N.t400 > 400;

2分32秒（0结果）

SQL语句 select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 100 and N.t400 > 100;

SQL语句 select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:30:00' and A.tm < '2017-04-02 18:30:00' and B.tm > '2017-04-02 20:30:00' and B.tm < '2017-04-03 05:30:00' ) as N where N.t360 > 199.999 and N.t400 > 199.980;

2分03秒（490结果）

SQL语句 select N.id from ( select A.id, A.t360, B.t400 from wt_csv_newtb as A join wt_csv_newtb as B on A.id = B.id where A.tm > '2017-04-02 09:00:00' and A.tm < '2017-04-02 18:00:00' and B.tm > '2017-04-02 20:28:00' and B.tm < '2017-04-03 05:28:00' ) as N where N.t360 > 199.999 and N.t400 > 199.980;

2分6秒（499结果）

5、排序测试

SQL select T380 from wt_csv_newtb where tm > '2017-04-02 08:00:00' and tm < '2017-04-02 20:00:00' order by T380 limit 1000;

59秒（1000结果）

SQL select T380 from wt_csv_newtb where tm > '2017-04-02 09:00:00' and tm < '2017-04-02 21:00:00' order by T380 limit 10;

57秒（10结果）

SQL select T380 from wt_csv_newtb where tm > '2017-04-03 08:00:00' and tm < '2017-04-03 20:00:00' order by T380 limit 1000;

50秒（1000结果）

阅读全文

0 0