hive功能 Cube, Rollup介绍
来源:互联网 发布:mac log4j的输出路径 编辑:程序博客网 时间:2024/06/02 01:48
GROUPING SETS
GROUPING SETS作为GROUP BY的子句,允许开发人员在GROUP BY语句后面指定多个统计选项,可以简单理解为多条group by语句通过union all把查询结果聚合起来结合起来,下面是几个实例可以帮助我们了解,
以acorn_3g.test_xinyan_reg为例:
[dp@YZSJHL19-87 xjob]$ hive -e <span class="code-quote" style="color: rgb(0, 145, 0); background-color: inherit;">"use acorn_3g;desc test_xinyan_reg;"</span>user_id bigint None device_id <span class="code-object" style="color: rgb(145, 0, 145); background-color: inherit;">int</span> None 手机,平板 os_id <span class="code-object" style="color: rgb(145, 0, 145); background-color: inherit;">int</span> None 操作系统类型 app_id <span class="code-object" style="color: rgb(145, 0, 145); background-color: inherit;">int</span> None 手机app_id client_version string None 客户端版本 from_id <span class="code-object" style="color: rgb(145, 0, 145); background-color: inherit;">int</span> None 四级渠道
几个demo帮助大家了解:
SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_idselect device_id,os_id,app_id,count(user_id) from test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id,os_id))SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_idselect device_id,os_id,app_id,count(user_id) from test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id,os_id),(device_id))SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id
UNION ALL
SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_idselect device_id,os_id,app_id,count(user_id) from test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id),(os_id),(device_id,os_id),())SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_id
UNION ALL
SELECT null,os_id,null,count(user_id) FROM test_xinyan_reg group by os_id
UNION ALL
SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id
UNION ALL
SELECT null,null,null,count(user_id) FROM test_xinyan_reg
CUBE函数
cube简称数据魔方,可以实现hive多个任意维度的查询,cube(a,b,c)则首先会对(a,b,c)进行group by,然后依次是(a,b),(a,c),(a),(b,c),(b),(c),最后在对全表进行group by,他会统计所选列中值的所有组合的聚合
select device_id,os_id,app_id,client_version,from_id,count(user_id) from test_xinyan_reg group by device_id,os_id,app_id,client_version,from_id with cube;
手工实现需要写的hql语句(写个程序自己生成的,手写累死):
SELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by os_idUNION ALLSELECT device_id,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,os_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by app_idUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,app_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by os_id,app_idUNION ALLSELECT device_id,os_id,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by client_versionUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,client_versionUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by os_id,client_versionUNION ALLSELECT device_id,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,client_versionUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by app_id,client_versionUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,app_id,client_versionUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,app_id,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by os_id,app_id,client_versionUNION ALLSELECT device_id,os_id,app_id,client_version,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id,client_versionUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by from_idUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,from_idUNION ALLSELECT device_id,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by app_id,from_idUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,app_id,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,app_id,from_idUNION ALLSELECT device_id,os_id,app_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by client_version,from_idUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,client_version,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,client_version,from_idUNION ALLSELECT device_id,os_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,client_version,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by app_id,client_version,from_idUNION ALLSELECT device_id,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,app_id,client_version,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,os_id,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,app_id,client_version,from_idUNION ALLSELECT device_id,os_id,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id,client_version,from_idUNION ALLSELECT <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span>,<span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> ,count(user_id) FROM test_xinyan_reg
看着很蛋疼是不是,体会到cube的强大了吗!(低版本hive可以通过union all方式解决,算是没有办法的办法)
ROLL UP函数
rollup可以实现从右到做递减多级的统计,显示统计某一层次结构的聚合。
select device_id,os_id,app_id,client_version,from_id,count(user_id) from test_xinyan_reg group by device_id,os_id,app_id,client_version,from_id with rollup;
等价以下sql语句:
select device_id,os_id,app_id,client_version,from_id,count(user_id) from test_xinyan_reg group by device_id,os_id,app_id,client_version,from_id grouping sets ((device_id,os_id,app_id,client_version,from_id),(device_id,os_id,app_id,client_version),(device_id,os_id,app_id),(device_id,os_id),(device_id),());
Grouping_ID函数
当我们没有统计某一列时,它的值显示为null,这可能与列本身就有null值冲突,这就需要一种方法区分是没有统计还是值本来就是null。(写一个排列组合的算法,就马上理解了,grouping_id其实就是所统计各列二进制和)
直接拿官方文档一个例子,O(∩_∩)O哈哈~
1
2
2
3
3
3
NULL
4
5
hql统计:
SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP
统计结果如下:
GROUPING__ID转变为二进制,如果对应位上有值为null,说明这列本身值就是null。(通过类DataFilterNull.py 扫描,可以筛选过滤掉列中null、“”统计结果),
窗口函数
hive窗口函数,感觉大部分都是在模仿oracle,有对oracle熟悉的,应该看下就知道怎么用。
具体参见:http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/language_manual/ptf-window.html
主要围绕..over( partitoin by ..) ..
3g业务求新增激活时候,有的一部手机,可能注册多个渠道,这时候就要按时间顺序求第一个:
select f.udid,f.from_id,f.ins_date from (select /* +MAPJOIN(u) */ u.device_id as udid ,g.device_id as gdid,u.from_id,u.ins_date,row_number() over (partition by u.device_id order by u.ins_date asc) as row_number from user_device_info u left <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">outer</span> join (select device_id from 3g_device_id where log_date<'2013-07-25') g on ( u.device_id = g.device_id ) where u.log_date='2013-07-25' and u.device_id is not <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> and u.device_id <> '') f where f.gdid is <span class="code-keyword" style="color: rgb(0, 0, 145); background-color: inherit;">null</span> and row_number=1
参考资料
apache hive窗口函数官方介绍:http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/language_manual/ptf-window.html
apache hive官方:cube、rollup函数介绍:https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
oracle窗口函数介绍:http://www.blogjava.net/pengpenglin/archive/2012/04/12/211334.html
- hive功能 Cube, Rollup介绍
- hive新功能 Cube, Rollup介绍
- Hive 0.10 支持cube rollup
- Hive Cube RollUP 分组问题
- ORACLE ROLLUP和CUBE介绍
- hive Enhanced Aggregation, Cube, Grouping and Rollup
- 【Oracle】ROLLUP和CUBE的“小计”功能
- GROUPING SETS,ROLLUP,CUBE用法介绍
- Rollup & Cube
- ROLLUP&CUBE
- ROLLUP & CUBE
- HIVE分析窗口函数: GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
- Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
- Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
- Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
- Hive分析窗口函数之GROUPING SETS,CUBE和ROLLUP
- Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
- Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
- 程序员的自我修养--链接,装载,库
- 采用HttpURLConnection的GET方式实现登陆案例
- 在php中调用接口以及编写接口
- 误删电脑配置信息还原
- 关于权限的问题,记录一下
- hive功能 Cube, Rollup介绍
- JDK配置 笔记
- 顺序表
- 读书总结-linux系统的信号机制
- 浏览器的渲染原理简介
- 2016-9-11 CCF第四题
- C++11 并发指南一(C++11 多线程初探)
- 会计算的calc()
- Raptor实践参考:求和