HiveSQL 进阶指南
来源:互联网 发布:逆回购 知乎 编辑:程序博客网 时间:2024/06/06 08:38
SQL进阶指南
数据准备
环境:
hadoop 2.4.1 +
hive 1.2.1+
create table tmp.yws_example as select 1 as showing_id,'A' as broker_id ,'c001' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160101 as start_time union all select 2 as showing_id,'A' as broker_id ,'c002' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160105 as start_time union all select 3 as showing_id,'A' as broker_id ,'c003' as cust_id,'h002' as housedel_id ,2 as biz_type ,20160108 as start_time union all select 4 as showing_id,'A' as broker_id ,'c005' as cust_id,'h002' as housedel_id ,2 as biz_type ,20160209 as start_time union all select 5 as showing_id,'A' as broker_id ,'c001' as cust_id,'h003' as housedel_id ,1 as biz_type ,20160218 as start_time union all select 6 as showing_id,'A' as broker_id ,'c004' as cust_id,'h004' as housedel_id ,2 as biz_type ,20160214 as start_time union all select 7 as showing_id,'A' as broker_id ,'c008' as cust_id,'h004' as housedel_id ,2 as biz_type ,20160307 as start_time union all select 8 as showing_id,'A' as broker_id ,'c007' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160306 as start_time union all select 9 as showing_id,'A' as broker_id ,'c004' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160329 as start_time union all select 10 as showing_id,'A' as broker_id ,'c002' as cust_id,'h002' as housedel_id ,2 as biz_type ,20160411 as start_time union all select 11 as showing_id,'A' as broker_id ,'c002' as cust_id,'h002' as housedel_id ,2 as biz_type ,20160419 as start_time union all select 12 as showing_id,'A' as broker_id ,'c002' as cust_id,'h003' as housedel_id ,1 as biz_type ,20160425 as start_time union all select 13 as showing_id,'B' as broker_id ,'c009' as cust_id,'h002' as housedel_id ,2 as biz_type ,20160523 as start_time union all select 14 as showing_id,'B' as broker_id ,'c005' as cust_id,'h002' as housedel_id ,2 as biz_type ,20160724 as start_time union all select 15 as showing_id,'B' as broker_id ,'c007' as cust_id,'h003' as housedel_id ,1 as biz_type ,20160829 as start_time union all select 16 as showing_id,'B' as broker_id ,'c006' as cust_id,'h004' as housedel_id ,2 as biz_type ,20160917 as start_time union all select 17 as showing_id,'B' as broker_id ,'c007' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160323 as start_time union all select 18 as showing_id,'B' as broker_id ,'c009' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160422 as start_time union all select 19 as showing_id,'B' as broker_id ,'c004' as cust_id,'h003' as housedel_id ,1 as biz_type ,20161011 as start_time union all select 20 as showing_id,'B' as broker_id ,'c003' as cust_id,'h002' as housedel_id ,2 as biz_type ,20161112 as start_time union all select 21 as showing_id,'B' as broker_id ,'c001' as cust_id,'h002' as housedel_id ,2 as biz_type ,20161213 as start_time union all select 22 as showing_id,'B' as broker_id ,'c007' as cust_id,'h003' as housedel_id ,1 as biz_type ,20160530 as start_time union all select 23 as showing_id,'B' as broker_id ,'c006' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160809 as start_time union all select 24 as showing_id,'B' as broker_id ,'c002' as cust_id,'h001' as housedel_id ,1 as biz_type ,20160907 as start_time ;
CASE WHEN
例子
select case when biz_type=1 then '买卖' when biz_type=2 then '租赁' else '未知' end as case_name ,if(biz_type=1,'买卖',if(biz_type=2,'租赁','未知')) as if_name ,count(1) ,sum(1) ,max(start_time) ,min(start_time) -- avgfrom tmp.yws_examplegroup by case when biz_type=1 then '买卖' when biz_type=2 then '租赁' else '未知' end ,if(biz_type=1,'买卖',if(biz_type=2,'租赁','未知'))
CASE WHEN与IF()\NVL()比较
1、case when 语法中when 语句可以多次使用,最后可以使用else ,类似if --else if --else if --else . 2、if()可以写成与case when 等价。但条理性比较差,不容易组织代码 3、nvl() 更简单,但功能有限。完全可以使用case when /if()替换
常见用法
1、编码转换/异常转换 —- 如上例代码,不多表述
2、连续数据离散化
举例如房子面积来分小/中/大户型select case when build_area>0 and build_area<90 then '小户型' when build_area>=90 and build_area<144 then '中户型' when build_area>=144 then '大户型' else '未知' end as build_type_name .....from tab_namewhere pt='20170424000000'and dp in ('4','5','6')
3、分类合并
-- 22:本科(实习生),50:本科(统招),40:专科(统招),30:大专以下-- ,60:硕士研究生,45:45-本科,17:专科(非统招),70:博士研究生-- ,20:硕士研究生(实习生),10:初中及以下,18:专科(实习生),07:未知,A:未知,''::未知select case when highest_degree in (22,50,45) then '本科' when highest_degree in (40,17,18) then '专科' when highest_degree in (60,20) then '硕士研究生' when highest_degree=70 then '博士研究生' else '其它' end as degree_type_name .......from table_name where pt='20170424000000'
4、case when 经常与聚合函数配合使用,实现分支计算
select pt ,sum(case when request_url regexp '/ershoufang/' then 1 end) ershoufang_pv ,sum(case when request_url regexp '^http://xx.faxxxng.xxxxx.cxxxom/' then 1 end) xinfang_pv ,sum(case when request_url regexp '/zufang/' then 1 end ) zufang_pv .......from log_table_name where pt='20170424000000'group by pt
5、与聚合函数配合使用,实现行转列功能
select broker_id ,max(case when biz_type=1 then start_time else 10000101 end ) ,min(case when biz_type=1 then start_time else 99990101 end ) ,max(case when biz_type=2 then start_time else 10000101 end ) ,min(case when biz_type=2 then start_time else 99990101 end )from tmp.yws_examplegroup by broker_id
窗口函数
URL: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
LEAD/LAG–相对取值
ead 向后取指定位置列值,如果没有则填默认值
lag 与lead向反,是向前取值
lead(取值列,窗口内偏移量,默认值)
lag(取值列,窗口内偏移量,默认值)
over(partition by 分窗口的列 order by 排序列 )
first_value/last_value–绝对取值
count/sum/min/max/avg–相对聚合
分析函数
RANK/ROW_NUMBER/DENSE_RANK
cume_dist/percent_rank/ntile
聚合函数、case when 与row_number结合
列拆行 行合列
参考URL:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
注意:
如果要求合并的字段顺序和原来拆分时相同就得写自定义函数了
数据立方体 rollup/cube/grouping sets
参考链接:
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup
注意:上图中多个group by 同时只能存在一个
0 0
- HiveSQL 进阶指南
- HiveSQL学习
- hiveSQL记录
- HiveSQL备忘
- HiveSQL-get_json_object
- hiveSQL操作
- 开发人员进阶指南..........
- 构图进阶指南
- ACM进阶指南
- ACM进阶指南
- Acm进阶指南
- ACM进阶指南
- ACM进阶指南
- ACM进阶指南
- 7 MongoDB进阶指南
- ACM进阶指南
- ACM进阶指南
- ACM进阶训练指南
- ZOJ 3474 Taekwondo
- CSS font-family常见中文字体对应的英文名称
- 【笔试面试】简单总结笔试和面试中的海量数据问题
- 解决高德地图the map must have a size 错误
- android http post提交数据
- HiveSQL 进阶指南
- 内核并发竞态解决机制
- Android---keycode
- 跟小博老师一起学习数据库 ——JDBC-上篇
- hadoop日志
- JAVA-练习MyShopping升级菜单切换
- Chrome学习笔记(二):UI组件,皮肤引擎
- linux下shell脚本学习
- hadoop三种启动方式