Hive分析窗口函数(一) SUM,AVG,MIN,MAX
来源:互联网 发布:数据迁移重要程度 编辑:程序博客网 时间:2024/05/29 19:08
Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。
今天先看几个基础的,SUM、AVG、MIN、MAX。
用于实现分组内所有和连续累积的统计。
Hive版本为 apache-hive-0.13.1
数据准备:
CREATE EXTERNAL TABLE lxw1234 (cookieid string,createtime string, --daypv INT) ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','stored as textfile location '/tmp/lxw11/'; DESC lxw1234;cookieid STRINGcreatetime STRINGpv INT hive> select * from lxw1234;OKcookie1 2015-04-10 1cookie1 2015-04-11 5cookie1 2015-04-12 7cookie1 2015-04-13 3cookie1 2015-04-14 2cookie1 2015-04-15 4cookie1 2015-04-16 4
SUM:
注意,结果和ORDER BY相关,默认为升序
SELECT cookieid,createtime,pv,SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1 1 26 1 6 26cookie1 2015-04-11 5 6 6 26 6 13 25cookie1 2015-04-12 7 13 13 26 13 16 20cookie1 2015-04-13 3 16 16 26 16 18 13cookie1 2015-04-14 2 18 18 26 17 21 10cookie1 2015-04-15 4 22 22 26 16 20 8cookie1 2015-04-16 4 26 26 26 13 13 4pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号pv2: 同pv1pv3: 分组内(cookie1)所有的pv累加pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN (WINDOW子句)含义:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点。
--其他AVG,MIN,MAX,和SUM用法一样。
SELECT cookieid,createtime,pv,AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234; cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0cookie1 2015-04-13 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0
SELECT cookieid,createtime,pv,MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1 1 1 1 1 1cookie1 2015-04-11 5 1 1 1 1 1 2cookie1 2015-04-12 7 1 1 1 1 1 2cookie1 2015-04-13 3 1 1 1 1 1 2cookie1 2015-04-14 2 1 1 1 2 2 2cookie1 2015-04-15 4 1 1 1 2 2 4cookie1 2015-04-16 4 1 1 1 2 2 4
SELECT cookieid,createtime,pv,MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 MAX(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1 1 7 1 5 7cookie1 2015-04-11 5 5 5 7 5 7 7cookie1 2015-04-12 7 7 7 7 7 7 7cookie1 2015-04-13 3 7 7 7 7 7 4cookie1 2015-04-14 2 7 7 7 7 7 4cookie1 2015-04-15 4 7 7 7 7 7 4cookie1 2015-04-16 4 7 7 7 4 4 4
转自: http://029bigdata.com/?tag=hive%E5%88%86%E6%9E%90%E5%87%BD%E6%95%B0
更多的分析函数,如:NTILE,ROW_NUMBER,RANK,DENSE_RANK
CUME_DIST,PERCENT_RANK
LAG,LEAD,FIRST_VALUE,LAST_VALUE
GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。
今天先看几个基础的,SUM、AVG、MIN、MAX。
用于实现分组内所有和连续累积的统计。
Hive版本为 apache-hive-0.13.1
数据准备:
CREATE EXTERNAL TABLE lxw1234 (cookieid string,createtime string, --daypv INT) ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','stored as textfile location '/tmp/lxw11/'; DESC lxw1234;cookieid STRINGcreatetime STRINGpv INT hive> select * from lxw1234;OKcookie1 2015-04-10 1cookie1 2015-04-11 5cookie1 2015-04-12 7cookie1 2015-04-13 3cookie1 2015-04-14 2cookie1 2015-04-15 4cookie1 2015-04-16 4
SUM:
注意,结果和ORDER BY相关,默认为升序
SELECT cookieid,createtime,pv,SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1 1 26 1 6 26cookie1 2015-04-11 5 6 6 26 6 13 25cookie1 2015-04-12 7 13 13 26 13 16 20cookie1 2015-04-13 3 16 16 26 16 18 13cookie1 2015-04-14 2 18 18 26 17 21 10cookie1 2015-04-15 4 22 22 26 16 20 8cookie1 2015-04-16 4 26 26 26 13 13 4pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号pv2: 同pv1pv3: 分组内(cookie1)所有的pv累加pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN (WINDOW子句)含义:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点。
--其他AVG,MIN,MAX,和SUM用法一样。
SELECT cookieid,createtime,pv,AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234; cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0cookie1 2015-04-13 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0
SELECT cookieid,createtime,pv,MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1 1 1 1 1 1cookie1 2015-04-11 5 1 1 1 1 1 2cookie1 2015-04-12 7 1 1 1 1 1 2cookie1 2015-04-13 3 1 1 1 1 1 2cookie1 2015-04-14 2 1 1 1 2 2 2cookie1 2015-04-15 4 1 1 1 2 2 4cookie1 2015-04-16 4 1 1 1 2 2 4
SELECT cookieid,createtime,pv,MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 MAX(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行 FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 -----------------------------------------------------------------------------cookie1 2015-04-10 1 1 1 7 1 5 7cookie1 2015-04-11 5 5 5 7 5 7 7cookie1 2015-04-12 7 7 7 7 7 7 7cookie1 2015-04-13 3 7 7 7 7 7 4cookie1 2015-04-14 2 7 7 7 7 7 4cookie1 2015-04-15 4 7 7 7 7 7 4cookie1 2015-04-16 4 7 7 7 4 4 4
转自: http://029bigdata.com/?tag=hive%E5%88%86%E6%9E%90%E5%87%BD%E6%95%B0
更多的分析函数,如:NTILE,ROW_NUMBER,RANK,DENSE_RANK
CUME_DIST,PERCENT_RANK
LAG,LEAD,FIRST_VALUE,LAST_VALUE
GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
0 0
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- HIVE分析窗口函数:SUM,AVG,MIN,MAX
- Hive分析窗口函数之SUM,AVG,MIN和MAX
- 连续求和分析函数max(...)/min(...)/avg(...)/sum(...) over ... ——分析函数1
- Oracle分析函数三——SUM,AVG,MIN,MAX,COUNT
- Oracle分析函数三——SUM,AVG,MIN,MAX,COUNT
- Oracle分析函数三——SUM,AVG,MIN,MAX,COUNT
- Oracle分析函数三——SUM,AVG,MIN,MAX,COUNT
- Oracle分析函数三——SUM,AVG,MIN,MAX,COUNT
- 常用集函数,count(),sum(),avg(),max(),min()
- Mysq创建触发器
- Android面试题-与XMPP相关试题一
- 欢迎使用CSDN-markdown编辑器
- linux学习笔记
- no python interpreter configured for the project
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
- TensorFlow入门(三)多层 CNNs 实现 mnist分类
- C++作业四
- javaFighting_小银考呀考不过四级
- Maven的pom.xml文件详解------Build Settings
- python调用matplotlib
- Android自定义View构造函数详解
- TCP(HTTP)长连接和短连接区别和怎样维护长连接
- Python使用pyquery抓取数据实例