GreenPlum简单性能测试与分析--续

来源:互联网 发布:mysql中格式化时间 编辑:程序博客网 时间:2024/05/29 14:19

之前对GreenPlum与Mysql进行了TPC-H类的对比测试,发现同等资源配比条件下,GreenPlum的性能远好于Mysql,有部分原因是得益于GreenPlum本身采用了更高效的算法,比如说做多表join时,采用的是hash join方式。如果采用同样高效的算法,两者的性能又如何?由于GreenPlum是由PostgreSQL演变而来,完全采用了PostgreSQL的优化算法,这次,我们将GreenPlum与PostgreSQL进行对比测试,在同等资源配比条件下,查看GreenPlum(分布式PostgreSQL)和单机版PostgreSQL的性能表现。

一.目的

  1. 比较在同等资源条件下具有分布式属性的GreenPlum与PostgreSQL在进行TPC-H类测试的性能区别。
  2. 分析和总结两种DB造成性能区别的原因。

二.测试环境与配置信息

测试环境:腾讯云
测试对象:GreenPlum、PostgreSQL,两者的配置信息统计如下:

表1 GreenPlum集群服务器

 Master HostSegment HostSegment Host操作系统CentOS 6.7 64位CentOS 6.7 64位CentOS 6.7 64位CPUIntel(R) Xeon(R) CPU E5-26xx v3 2核Intel(R) Xeon(R) CPU E5-26xx v3 2核Intel(R) Xeon(R) CPU E5-26xx v3 2核内存8GB8GB8GB公网带宽100Mbps100Mbps100MbpsIP123.207.228.40123.207.228.21123.207.85.105Segment数量022版本greenplum-db-4.3.8.1-build-1-RHEL5-x86_64greenplum-db-4.3.8.1-build-1-RHEL5-x86_64greenplum-db-4.3.8.1-build-1-RHEL5-x86_64

表2 PostgreSQL服务器

指标参数操作系统CentOS 6.7 64位cpuIntel(R) Xeon(R) CPU E5-26xx v3 8核内存24GB公网带宽100MbpsIP119.29.229.209版本PostgreSQL 9.5.4

三.测试结果与分析

1.总测试数据量为1G时
结果统计信息如下:

表3 总量为1GB时各测试表数据量统计

表名称数据条数customer150000lineitem6001215nation25orders1500000part200000partsupp800000region5supplier10000

表4 总量为1GB时22条sql执行时间统计

执行的sqlGeenPlum执行时间(单位:秒)PostgreSQL执行时间(单位:秒)Q14.0112.93Q20.500.62Q31.351.29Q40.110.52Q50.190.72Q60.010.79Q76.061.84Q81.460.59Q94.007.04Q100.142.19Q110.300.18Q120.082.15Q131.044.05Q140.040.42Q150.071.66Q160.510.80Q173.2123.07Q1814.235.86Q190.950.17Q200.163.10Q217.232.22Q220.960.28

分析:从以上的表4可以看出,PostgreSQL在22条sql中有8条sql的执行时间比GreenPlum少,接近一半的比例,我们直接放大10倍的测试数据量进行下一步测试。

2.总测试数据量为10G时
结果统计如下:

表5 总量为10GB时各测试表数据量统计

表名称数据条数customer1500000lineitem59986052nation25orders15000000part2000000partsupp8000000region5supplier100000

表6 总量为10GB时22条sql执行时间统计

执行的sqlGeenPlum执行时间(单位:秒)PostgreSQL执行时间(单位:秒)Q136.98130.61Q23.1017.08Q314.39117.83Q40.116.81Q50.20114.46Q60.0111.08Q780.1242.96Q86.6145.13Q949.72118.36Q100.1640.51Q112.283.06Q120.0821.47Q1319.2968.83Q140.0536.28Q150.0923.16Q166.3012.77Q17134.22127.79Q18168.03199.48Q196.251.96Q200.5452.10Q2184.68190.59Q2217.932.98

分析:放大数据量到10G后可以明显看出,PostgreSQL执行测试sql的时间大幅度增多,性能下降比较厉害,但仍有3条测试sql快于GreenPlum,我们选取其中一条对比查看下两者的性能区别原因。
这里我们以Q7为例,Greenplum的执行时间大约是PostgreSQL的两倍,Q7如下:

图1 Q7表示的sql语句

在PostgreSQL上执行explain Q7,得到结果如下:

图2 数据量为10G时PostgreSQL上执行explain Q7的结果

对执行进行分析,可以看出,整个过程最耗时的部分如上图红色框部分标识,对应的条件查询操作分别是:
1).在lineitem表上对l_shipdata字段按条件查询,因为在字段有索引,采用了高效的Bitmap索引查询(Bitmap索引查询分两步:1.建位图;2.扫表。详细了解可看http://kb.cnblogs.com/page/515258/ )。
2).lineitem和orders表hash join操作。
为了方便进一步分析,我们加上analyze参数,获取详细的执行时间,由于内容过多,这里只截取部分重要信息如下:

图3 数据量为10G时PostgreSQL上执行explain analyze Q7的部分结果

根据以上信息,我们可以得出这两部分操作的具体执行时间,但由于PostgreSQL采取多任务并行,因此,我们需要对每步操作计算出一个滞留时间(该时间段内系统只执行该步操作),缩短滞留时间可直接提升执行速度,每步的滞留时间为前步的结束时间与该步结束时间之差。两部分的滞留时间分别为:

1).Bitmap Heap Scan:20197-2233=17964ms
2).Hash join:42889-26200=16689ms

PostgreSQL执行Q7的总时间为42963ms,因此,可以印证系统的耗时主要集中在上述两步操作上。
接下来,我们在GreenPlum上执行explain Q7,结果如下:


图4 数据量为10G时GreenPlum上执行explain Q7的结果

与PostgreSQL不同的是,GreenPlum的耗时多了数据重分布部分。同样,我们通过analyze参数得到详细的执行时间如下:

图5 数据量为10G时GreenPlum上执行explain analyze Q7的部分结果

根据执行计划信息,选出耗时最长的三步操作,计算出在一个segment(耗时最长的)上这三部分的滞留时间为:
1).Scan lineitem: 6216ms
2).Redistribute: 36273ms
3).Hash join: 29885ms

GreenPlum执行Q7的总时间为80121ms,可见数据重分布的时间占据了整个执行时间的一半,进行Hash join操作的时间占比也较多,主要是segment的内存不足,引起了磁盘的IO。

小结:对比PostgreSQL和GreenPlum在Q7的执行计划,GreenPlum的耗时较多的原因主要是数据重分布的大量时间消耗和hash join时超出内存引起磁盘IO。虽然GreenPlum各segment并行扫lineitem表节省了时间,但占比较小,对总时间的消耗影响较小。

基于此,是否可以减少数据重分布操作的耗时占比?我们尝试进一步增加测试的数据量,比较10G的测试数据对于真实的OLAP场景还是过少,扩大5倍的测试量,继续查看耗时情况是否有所改变。

3. 总测试数据量为50G时
表7 总量为50GB时各测试表数据量统计

表名称数据条数customer7500000lineitem300005811nation25orders75000000part10000000partsupp40000000region5supplier500000

表8 总量为50GB时22条sql执行时间统计

执行的sqlGeenPlum执行时间(单位:秒)PostgreSQL执行时间(单位:秒)Q1212.27802.24Q216.53164.20Q3156.312142.18Q40.132934.76Q50.232322.92Q60.016439.26Q7535.6611906.74Q876.769171.83Q9313.91>26060.36Q100.411905.13Q117.7117.65Q120.19>3948.07Q13108.05354.59Q140.058054.72Q150.07>2036.03Q1634.74221.49Q17862.90>9010.56Q18913.973174.24Q19129.148666.38Q202.289389.21Q211064.67>26868.31Q2290.901066.44

分析:从结果表可明显看出,在22条SQL中,GreenPlum的执行效率都比PostgreSQL高出很多,我们还是以Q7为例,查看两种数据量下执行效率不一致的直接原因。

经过对执行计划的分析,发现区别还是集中在步骤2提到的几个部分,这里就不再重复给出整体的查询计划,直接查看耗时较多的部分如下:


图6 数据量为50G时PostgreSQL上执行explain analyze Q7的部分结果

图7 数据量为50G时GreenPlum上执行explain analyze Q7的部分结果

PostgreSQL的主要滞留时间有:
1).Bitmap Heap Scan: 9290197ms
2).Hash join: 713138ms

总执行时间为10219009ms,可见主要的耗时集中在Bitmap Heap Scan上,
GreenPlum的主要滞留时间有:
1).Scan lineitem: 130397ms
2).Redistribute: 140685ms
3).Hash join: 211456ms

总的执行时间为537134ms,相比步骤2的10G测试数据量,数据重分布的耗时占比明显下降,主要耗时已集中在hash join操作上。

GreenPlum和PostgreSQL在执行同样的wheret条件时,扫表的方式不一样,原因在于GreenPlum里的lineitem表为列存储,直接扫表更方便更快。

对比PostgreSQL两次的测试结果,发现Bitmao Heap Scan操作的性能下降比较明显,第一次扫18188314 行用时17秒,而第二次扫90522811行用时9190秒。

小结:增大数据量,会减少数据重分布耗时对整体执行时间的影响比重,主要耗时集中在内部数据的计算上。由于扫表涉及到磁盘IO,GreenPlum将扫表任务分割给多个segment同时进行,减少了单个节点要执行的扫表量,相当于并行IO操作,对整体的性能提升较大。

四.总结

通过对不同数据量(1G,10G,50G)的测试对比以及分析,可以看出,在TPC-H类的测试时,数据量越大,GreenPlum性能越好于单机版的PostgreSQL。由于GreenPlum采用分布式架构,为了实现各节点并行计算能力,需要在节点间进行广播或者数据重分布,对整体的性能有一定影响,当数据量较小时,计算量小,广播或者重分布耗时占总耗时比例大,影响整体的执行效率,可能会出现GreenPlum不如单机版PostgreSQL效率高;当数据量较大时,整体计算的量很大,广播或者重分布耗时不再是影响性能的关键因素,分布式属性的GreenPlum在关于复杂语句执行查询效率较高,原因在于,一是多节点同时进行计算(hash join、sort等),提升计算速度,且可以充分利用系统CPU资源;二是扫表时,将任务分派到多节点,减少了单个节点的IO次数,达到并行IO的目的,更适用于OLAP场景。

五.其他事项

  1. 由于原生的TPC-H的测试用例不直接支持GreenPlum和PostgreSQL,因此需要修改测试脚本,生成新的建表语句如《附录一》所示,测试sql如《附录二》。

  2. GreenPlum的数据导入可以使用GreenPlum自带的gpfdist工具,搭建多个gpfdsit文件服务器并行导入,但文件服务器的数量不能多于segment数量,这点官方文档并未说明。

附录一:建表语句

GreenPlum:BEGIN;        CREATE TABLE PART (                P_PARTKEY               SERIAL8,                P_NAME                  VARCHAR(55),                P_MFGR                  CHAR(25),                P_BRAND                 CHAR(10),                P_TYPE                  VARCHAR(25),                P_SIZE                  INTEGER,                P_CONTAINER             CHAR(10),                P_RETAILPRICE   DECIMAL,                P_COMMENT               VARCHAR(23)        ) with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (p_partkey);        COPY part FROM '/tmp/dss-data/part.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE REGION (                R_REGIONKEY     SERIAL8,                R_NAME          CHAR(25),                R_COMMENT       VARCHAR(152)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (r_regionkey);        COPY region FROM '/tmp/dss-data/region.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE NATION (                N_NATIONKEY             SERIAL8,                N_NAME                  CHAR(25),                N_REGIONKEY             BIGINT NOT NULL,  -- references R_REGIONKEY                N_COMMENT               VARCHAR(152)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (n_nationkey);        COPY nation FROM '/tmp/dss-data/nation.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE SUPPLIER (                S_SUPPKEY               SERIAL8,                S_NAME                  CHAR(25),                S_ADDRESS               VARCHAR(40),                S_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY                S_PHONE                 CHAR(15),                S_ACCTBAL               DECIMAL,                S_COMMENT               VARCHAR(101)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (s_suppkey);        COPY supplier FROM '/tmp/dss-data/supplier.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE CUSTOMER (                C_CUSTKEY               SERIAL8,                C_NAME                  VARCHAR(25),                C_ADDRESS               VARCHAR(40),                C_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY                C_PHONE                 CHAR(15),                C_ACCTBAL               DECIMAL,                C_MKTSEGMENT    CHAR(10),                C_COMMENT               VARCHAR(117)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (c_custkey);        COPY customer FROM '/tmp/dss-data/customer.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE PARTSUPP (                PS_PARTKEY              BIGINT NOT NULL, -- references P_PARTKEY                PS_SUPPKEY              BIGINT NOT NULL, -- references S_SUPPKEY                PS_AVAILQTY             INTEGER,                PS_SUPPLYCOST   DECIMAL,                PS_COMMENT              VARCHAR(199)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (ps_partkey,ps_suppkey);        COPY partsupp FROM '/tmp/dss-data/partsupp.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE ORDERS (                O_ORDERKEY              SERIAL8,                O_CUSTKEY               BIGINT NOT NULL, -- references C_CUSTKEY                O_ORDERSTATUS   CHAR(1),                O_TOTALPRICE    DECIMAL,                O_ORDERDATE             DATE,                O_ORDERPRIORITY CHAR(15),                O_CLERK                 CHAR(15),                O_SHIPPRIORITY  INTEGER,                O_COMMENT               VARCHAR(79)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (o_orderkey);        COPY orders FROM '/tmp/dss-data/orders.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE LINEITEM (                L_ORDERKEY              BIGINT NOT NULL, -- references O_ORDERKEY                L_PARTKEY               BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP)                L_SUPPKEY               BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP)                L_LINENUMBER    INTEGER,                L_QUANTITY              DECIMAL,                L_EXTENDEDPRICE DECIMAL,                L_DISCOUNT              DECIMAL,                L_TAX                   DECIMAL,                L_RETURNFLAG    CHAR(1),                L_LINESTATUS    CHAR(1),                L_SHIPDATE              DATE,                L_COMMITDATE    DATE,                L_RECEIPTDATE   DATE,                L_SHIPINSTRUCT  CHAR(25),                L_SHIPMODE              CHAR(10),                L_COMMENT               VARCHAR(44)        )  with (APPENDONLY=true,BLOCKSIZE=2097152,ORIENTATION=COLUMN,CHECKSUM=true,OIDS=false) DISTRIBUTED BY (l_orderkey, l_linenumber);        COPY lineitem FROM '/tmp/dss-data/lineitem.csv' WITH csv DELIMITER '|';COMMIT;PostgreSQL:BEGIN;        CREATE TABLE PART (                P_PARTKEY               SERIAL,                P_NAME                  VARCHAR(55),                P_MFGR                  CHAR(25),                P_BRAND                 CHAR(10),                P_TYPE                  VARCHAR(25),                P_SIZE                  INTEGER,                P_CONTAINER             CHAR(10),                P_RETAILPRICE   DECIMAL,                P_COMMENT               VARCHAR(23)        );        COPY part FROM '/tmp/dss-data-copy/part.csv' WITH csv DELIMITER '|';COMMIT;BEGIN;        CREATE TABLE REGION (                R_REGIONKEY     SERIAL,                R_NAME          CHAR(25),                R_COMMENT       VARCHAR(152)        );        COPY region FROM '/tmp/dss-data-copy/region.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;BEGIN;        CREATE TABLE NATION (                N_NATIONKEY             SERIAL,                N_NAME                  CHAR(25),                N_REGIONKEY             BIGINT NOT NULL,  -- references R_REGIONKEY                N_COMMENT               VARCHAR(152)        );        COPY nation FROM '/tmp/dss-data-copy/nation.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;BEGIN;        CREATE TABLE SUPPLIER (                S_SUPPKEY               SERIAL,                S_NAME                  CHAR(25),                S_ADDRESS               VARCHAR(40),                S_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY                S_PHONE                 CHAR(15),                S_ACCTBAL               DECIMAL,                S_COMMENT               VARCHAR(101)        );        COPY supplier FROM '/tmp/dss-data-copy/supplier.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;BEGIN;        CREATE TABLE CUSTOMER (                C_CUSTKEY               SERIAL,                C_NAME                  VARCHAR(25),                C_ADDRESS               VARCHAR(40),                C_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY                C_PHONE                 CHAR(15),                C_ACCTBAL               DECIMAL,                C_MKTSEGMENT    CHAR(10),                C_COMMENT               VARCHAR(117)        );        COPY customer FROM '/tmp/dss-data-copy/customer.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;BEGIN;        CREATE TABLE PARTSUPP (                PS_PARTKEY              BIGINT NOT NULL, -- references P_PARTKEY                PS_SUPPKEY              BIGINT NOT NULL, -- references S_SUPPKEY                PS_AVAILQTY             INTEGER,                PS_SUPPLYCOST   DECIMAL,                PS_COMMENT              VARCHAR(199)        );        COPY partsupp FROM '/tmp/dss-data-copy/partsupp.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;BEGIN;        CREATE TABLE ORDERS (                O_ORDERKEY              SERIAL,                O_CUSTKEY               BIGINT NOT NULL, -- references C_CUSTKEY                O_ORDERSTATUS   CHAR(1),                O_TOTALPRICE    DECIMAL,                O_ORDERDATE             DATE,                O_ORDERPRIORITY CHAR(15),                O_CLERK                 CHAR(15),                O_SHIPPRIORITY  INTEGER,                O_COMMENT               VARCHAR(79)        );        COPY orders FROM '/tmp/dss-data-copy/orders.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;BEGIN;        CREATE TABLE LINEITEM (                L_ORDERKEY              BIGINT NOT NULL, -- references O_ORDERKEY                L_PARTKEY               BIGINT NOT NULL, -- references P_PARTKEY (compound fk to PARTSUPP)                L_SUPPKEY               BIGINT NOT NULL, -- references S_SUPPKEY (compound fk to PARTSUPP)                L_LINENUMBER    INTEGER,                L_QUANTITY              DECIMAL,                L_EXTENDEDPRICE DECIMAL,                L_DISCOUNT              DECIMAL,                L_TAX                   DECIMAL,                L_RETURNFLAG    CHAR(1),                L_LINESTATUS    CHAR(1),                L_SHIPDATE              DATE,                L_COMMITDATE    DATE,                L_RECEIPTDATE   DATE,                L_SHIPINSTRUCT  CHAR(25),                L_SHIPMODE              CHAR(10),                L_COMMENT               VARCHAR(44)        );        COPY lineitem FROM '/tmp/dss-data-copy/lineitem.csv' WITH (FORMAT csv, DELIMITER '|');COMMIT;

附录二:查询语句

Q1:-- using 1471398061 as a seed to the RNGselect        l_returnflag,        l_linestatus,        sum(l_quantity) as sum_qty,        sum(l_extendedprice) as sum_base_price,        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,        avg(l_quantity) as avg_qty,        avg(l_extendedprice) as avg_price,        avg(l_discount) as avg_disc,        count(*) as count_orderfrom        lineitemwhere        l_shipdate <= date '1998-12-01' - interval '85' daygroup by        l_returnflag,        l_linestatusorder by        l_returnflag,        l_linestatusLIMIT 1;Q2:-- using 1471398061 as a seed to the RNGselect        s_acctbal,        s_name,        n_name,        p_partkey,        p_mfgr,        s_address,        s_phone,        s_commentfrom        part,        supplier,        partsupp,        nation,        regionwhere        p_partkey = ps_partkey        and s_suppkey = ps_suppkey        and p_size = 48        and p_type like '%STEEL'        and s_nationkey = n_nationkey        and n_regionkey = r_regionkey        and r_name = 'AFRICA'        and ps_supplycost = (                select                        min(ps_supplycost)                from                        partsupp,                        supplier,                        nation,                        region                where                        p_partkey = ps_partkey                        and s_suppkey = ps_suppkey                        and s_nationkey = n_nationkey                        and n_regionkey = r_regionkey                        and r_name = 'AFRICA'        )order by        s_acctbal desc,        n_name,        s_name,        p_partkeyLIMIT 100;Q3:-- using 1471398061 as a seed to the RNGselect        l_orderkey,        sum(l_extendedprice * (1 - l_discount)) as revenue,        o_orderdate,        o_shippriorityfrom        customer,        orders,        lineitemwhere        c_mktsegment = 'HOUSEHOLD'        and c_custkey = o_custkey        and l_orderkey = o_orderkey        and o_orderdate < date '1995-03-03'        and l_shipdate > date '1995-03-03'group by        l_orderkey,        o_orderdate,        o_shippriorityorder by        revenue desc,        o_orderdateLIMIT 10;Q4:-- using 1471398061 as a seed to the RNGselect        o_orderpriority,        count(*) as order_countfrom        orderswhere        o_orderdate >= date '1993-06-01'        and o_orderdate < date '1993-06-01' + interval '3' month        and exists (                select                        *                from                        lineitem                where                        l_orderkey = o_orderkey                        and l_commitdate < l_receiptdate        )group by        o_orderpriorityorder by        o_orderpriorityLIMIT 1;Q5:-- using 1471398061 as a seed to the RNGselect        n_name,        sum(l_extendedprice * (1 - l_discount)) as revenuefrom        customer,        orders,        lineitem,        supplier,        nation,        regionwhere        c_custkey = o_custkey        and l_orderkey = o_orderkey        and l_suppkey = s_suppkey        and c_nationkey = s_nationkey        and s_nationkey = n_nationkey        and n_regionkey = r_regionkey        and r_name = 'AMERICA'        and o_orderdate >= date '1993-01-01'        and o_orderdate < date '1993-01-01' + interval '1' yeargroup by        n_nameorder by        revenue descLIMIT 1;Q6:-- using 1471398061 as a seed to the RNGselect        sum(l_extendedprice * l_discount) as revenuefrom        lineitemwhere        l_shipdate >= date '1993-01-01'        and l_shipdate < date '1993-01-01' + interval '1' year        and l_discount between 0.02 - 0.01 and 0.02 + 0.01        and l_quantity < 24LIMIT 1;Q7:-- using 1471398061 as a seed to the RNGselect        supp_nation,        cust_nation,        l_year,        sum(volume) as revenuefrom        (                select                        n1.n_name as supp_nation,                        n2.n_name as cust_nation,                        extract(year from l_shipdate) as l_year,                        l_extendedprice * (1 - l_discount) as volume                from                        supplier,                        lineitem,                        orders,                        customer,                        nation n1,                        nation n2                where                        s_suppkey = l_suppkey                        and o_orderkey = l_orderkey                        and c_custkey = o_custkey                        and s_nationkey = n1.n_nationkey                        and c_nationkey = n2.n_nationkey                        and (                                (n1.n_name = 'BRAZIL' and n2.n_name = 'INDONESIA')                                or (n1.n_name = 'INDONESIA' and n2.n_name = 'BRAZIL')                        )                        and l_shipdate between date '1995-01-01' and date '1996-12-31'        ) as shippinggroup by        supp_nation,        cust_nation,        l_yearorder by        supp_nation,        cust_nation,        l_yearLIMIT 1;Q8:-- using 1471398061 as a seed to the RNGselect        o_year,        sum(case                when nation = 'INDONESIA' then volume                else 0        end) / sum(volume) as mkt_sharefrom        (                select                        extract(year from o_orderdate) as o_year,                        l_extendedprice * (1 - l_discount) as volume,                        n2.n_name as nation                from                        part,                        supplier,                        lineitem,                        orders,                        customer,                        nation n1,                        nation n2,                        region                where                        p_partkey = l_partkey                        and s_suppkey = l_suppkey                        and l_orderkey = o_orderkey                        and o_custkey = c_custkey                        and c_nationkey = n1.n_nationkey                        and n1.n_regionkey = r_regionkey                        and r_name = 'ASIA'                        and s_nationkey = n2.n_nationkey                        and o_orderdate between date '1995-01-01' and date '1996-12-31'                        and p_type = 'ECONOMY BURNISHED BRASS'        ) as all_nationsgroup by        o_yearorder by        o_yearLIMIT 1;Q9:-- using 1471398061 as a seed to the RNGselect        nation,        o_year,        sum(amount) as sum_profitfrom        (                select                        n_name as nation,                        extract(year from o_orderdate) as o_year,                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount                from                        part,                        supplier,                        lineitem,                        partsupp,                        orders,                        nation                where                        s_suppkey = l_suppkey                        and ps_suppkey = l_suppkey                        and ps_partkey = l_partkey                        and p_partkey = l_partkey                        and o_orderkey = l_orderkey                        and s_nationkey = n_nationkey                        and p_name like '%powder%'        ) as profitgroup by        nation,        o_yearorder by        nation,        o_year descLIMIT 1;-- using 1471398061 as a seed to the RNGQ10select        c_custkey,        c_name,        sum(l_extendedprice * (1 - l_discount)) as revenue,        c_acctbal,        n_name,        c_address,        c_phone,        c_commentfrom        customer,        orders,        lineitem,        nationwhere        c_custkey = o_custkey        and l_orderkey = o_orderkey        and o_orderdate >= date '1993-06-01'        and o_orderdate < date '1993-06-01' + interval '3' month        and l_returnflag = 'R'        and c_nationkey = n_nationkeygroup by        c_custkey,        c_name,        c_acctbal,        c_phone,        n_name,        c_address,        c_commentorder by        revenue descLIMIT 20;Q11-- using 1471398061 as a seed to the RNGselect        ps_partkey,        sum(ps_supplycost * ps_availqty) as valuefrom        partsupp,        supplier,        nationwhere        ps_suppkey = s_suppkey        and s_nationkey = n_nationkey        and n_name = 'PERU'group by        ps_partkey having                sum(ps_supplycost * ps_availqty) > (                        select                                sum(ps_supplycost * ps_availqty) * 0.0001000000                        from                                partsupp,                                supplier,                                nation                        where                                ps_suppkey = s_suppkey                                and s_nationkey = n_nationkey                                and n_name = 'PERU'                )order by        value descLIMIT 1;-- using 1471398061 as a seed to the RNGQ12select        l_shipmode,        sum(case                when o_orderpriority = '1-URGENT'                        or o_orderpriority = '2-HIGH'                        then 1                else 0        end) as high_line_count,        sum(case                when o_orderpriority <> '1-URGENT'                        and o_orderpriority <> '2-HIGH'                        then 1                else 0        end) as low_line_countfrom        orders,        lineitemwhere        o_orderkey = l_orderkey        and l_shipmode in ('REG AIR', 'RAIL')        and l_commitdate < l_receiptdate        and l_shipdate < l_commitdate        and l_receiptdate >= date '1993-01-01'        and l_receiptdate < date '1993-01-01' + interval '1' yeargroup by        l_shipmodeorder by        l_shipmodeLIMIT 1;-- using 1471398061 as a seed to the RNGQ13select        c_count,        count(*) as custdistfrom        (                select                        c_custkey,                        count(o_orderkey)                from                        customer left outer join orders on                                c_custkey = o_custkey                                and o_comment not like '%pending%packages%'                group by                        c_custkey        ) as c_orders (c_custkey, c_count)group by        c_countorder by        custdist desc,        c_count descQ14LIMIT 1;-- using 1471398061 as a seed to the RNGselect        100.00 * sum(case                when p_type like 'PROMO%'                        then l_extendedprice * (1 - l_discount)                else 0        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenuefrom        lineitem,        partwhere        l_partkey = p_partkey        and l_shipdate >= date '1993-09-01'        and l_shipdate < date '1993-09-01' + interval '1' monthLIMIT 1;Q15-- using 1471398061 as a seed to the RNGcreate view revenue0 (supplier_no, total_revenue) as        select                l_suppkey,                sum(l_extendedprice * (1 - l_discount))        from                lineitem        where                l_shipdate >= date '1994-11-01'                and l_shipdate < date '1994-11-01' + interval '3' month        group by                l_suppkey;select        s_suppkey,        s_name,        s_address,        s_phone,        total_revenuefrom        supplier,        revenue0where        s_suppkey = supplier_no        and total_revenue = (                select                        max(total_revenue)                from                        revenue0        )order by        s_suppkeyLIMIT 1;Q16drop view revenue0;-- using 1471398061 as a seed to the RNGselect        p_brand,        p_type,        p_size,        count(distinct ps_suppkey) as supplier_cntfrom        partsupp,        partwhere        p_partkey = ps_partkey        and p_brand <> 'Brand#22'        and p_type not like 'STANDARD PLATED%'        and p_size in (34, 17, 18, 16, 15, 49, 1, 48)        and ps_suppkey not in (                select                        s_suppkey                from                        supplier                where                        s_comment like '%Customer%Complaints%'        )group by        p_brand,        p_type,        p_sizeorder by        supplier_cnt desc,        p_brand,        p_type,        p_sizeLIMIT 1;Q17:-- using 1471398061 as a seed to the RNGselect        sum(l_extendedprice) / 7.0 as avg_yearlyfrom        lineitem,        part,        (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_aggwhere        p_partkey = l_partkey        and agg_partkey = l_partkey        and p_brand = 'Brand#21'        and p_container = 'JUMBO JAR'        and l_quantity < avg_quantityLIMIT 1;Q18-- using 1471398061 as a seed to the RNGselect        c_name,        c_custkey,        o_orderkey,        o_orderdate,        o_totalprice,        sum(l_quantity)from        customer,        orders,        lineitemwhere        o_orderkey in (                select                        l_orderkey                from                        lineitem                group by                        l_orderkey having                                sum(l_quantity) > 312        )        and c_custkey = o_custkey        and o_orderkey = l_orderkeygroup by        c_name,        c_custkey,        o_orderkey,        o_orderdate,        o_totalpriceorder by        o_totalprice desc,        o_orderdateLIMIT 100;-- using 1471398061 as a seed to the RNGQ19select        sum(l_extendedprice* (1 - l_discount)) as revenuefrom        lineitem,        partwhere        (                p_partkey = l_partkey                and p_brand = 'Brand#42'                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')                and l_quantity >= 7 and l_quantity <= 7 + 10                and p_size between 1 and 5                and l_shipmode in ('AIR', 'AIR REG')                and l_shipinstruct = 'DELIVER IN PERSON'        )        or        (                p_partkey = l_partkey                and p_brand = 'Brand#22'                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')                and l_quantity >= 20 and l_quantity <= 20 + 10                and p_size between 1 and 10                and l_shipmode in ('AIR', 'AIR REG')                and l_shipinstruct = 'DELIVER IN PERSON'        )        or        (                p_partkey = l_partkey                and p_brand = 'Brand#25'                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')                and l_quantity >= 21 and l_quantity <= 21 + 10                and p_size between 1 and 15                and l_shipmode in ('AIR', 'AIR REG')                and l_shipinstruct = 'DELIVER IN PERSON'        )LIMIT 1;Q20-- using 1471398061 as a seed to the RNGselect        s_name,        s_addressfrom        supplier,        nationwhere        s_suppkey in (                select                        ps_suppkey                from                        partsupp,                        (                                select                                        l_partkey agg_partkey,                                        l_suppkey agg_suppkey,                                        0.5 * sum(l_quantity) AS agg_quantity                                from                                        lineitem                                where                                        l_shipdate >= date '1994-01-01'                                        and l_shipdate < date '1994-01-01' + interval '1' year                                group by                                        l_partkey,                                        l_suppkey                        ) agg_lineitem                where                        agg_partkey = ps_partkey                        and agg_suppkey = ps_suppkey                        and ps_partkey in (                                select                                        p_partkey                                from                                        part                                where                                        p_name like 'forest%'                        )                        and ps_availqty > agg_quantity        )        and s_nationkey = n_nationkey        and n_name = 'FRANCE'order by        s_nameLIMIT 1;Q21-- using 1471398061 as a seed to the RNGselect        s_name,        count(*) as numwaitfrom        supplier,        lineitem l1,        orders,        nationwhere        s_suppkey = l1.l_suppkey        and o_orderkey = l1.l_orderkey        and o_orderstatus = 'F'        and l1.l_receiptdate > l1.l_commitdate        and exists (                select                        *                from                        lineitem l2                where                        l2.l_orderkey = l1.l_orderkey                        and l2.l_suppkey <> l1.l_suppkey        )        and not exists (                select                        *                from                        lineitem l3                where                        l3.l_orderkey = l1.l_orderkey                        and l3.l_suppkey <> l1.l_suppkey                        and l3.l_receiptdate > l3.l_commitdate        )        and s_nationkey = n_nationkey        and n_name = 'GERMANY'group by        s_nameorder by        numwait desc,        s_nameLIMIT 100;Q22-- using 1471398061 as a seed to the RNGselect        cntrycode,        count(*) as numcust,        sum(c_acctbal) as totacctbalfrom        (                select                        substring(c_phone from 1 for 2) as cntrycode,                        c_acctbal                from                        customer                where                        substring(c_phone from 1 for 2) in                                ('16', '10', '34', '26', '33', '18', '11')                        and c_acctbal > (                                select                                        avg(c_acctbal)                                from                                        customer                                where                                        c_acctbal > 0.00                                        and substring(c_phone from 1 for 2) in                                                ('16', '10', '34', '26', '33', '18', '11')                        )                        and not exists (                                select                                        *                                from                                        orders                                where                                        o_custkey = c_custkey                        )        ) as custsalegroup by        cntrycodeorder by        cntrycodeLIMIT 1;
0 0
原创粉丝点击