hive中的分析函数的典型应用

来源:互联网 发布:lol老是网络断开连接 编辑:程序博客网 时间:2024/06/05 20:21

大家:

  好!今天看到了一个hive的分析函数的题,感觉很有意思。把答案整理了整理,分享出来,希望对大家有用。需求如下所示:


   起初我看到这道题时,感觉应该要用到分析函数,但就不知道怎么用。好在最终,写出来了,思路如下所示:

----测试表的表结构以及数据如下所示:
hive> desc sales;
OK
id                   int                                    
produce_name         string                                 
start_time           date                                   
end_time             date                                   
days                 int                                    
Time taken: 0.354 seconds, Fetched: 5 row(s)
hive> select * from sales;
OK
1 nike 2011-09-01 2011-09-055
2 nike 2011-09-03 2011-09-064
3 nike 2011-09-09 2011-09-157
4 oppo 2011-08-04 2011-08-052
5 oppo 2011-08-04 2011-08-1512
6 vivo 2011-08-15 2011-08-217
7 vivo 2011-09-02 2011-09-1211
Time taken: 0.223 seconds, Fetched: 7 row(s)




---第一步:求出每个开始时间的前一次促销的开始时间和结束时间
select id,produce_name,start_time,end_time,
lag(start_time) over(partition by produce_name order by id) before_start_time,
lag(end_time) over(partition by produce_name order by id) before_end_time
from sales;
---第一步:执行结果(第一步)
1 nike 2011-09-01 2011-09-05NULL NULL
2 nike 2011-09-03 2011-09-062011-09-01 2011-09-05
3 nike 2011-09-09 2011-09-152011-09-03 2011-09-06
4 oppo 2011-08-04 2011-08-05NULL NULL
5 oppo 2011-08-04 2011-08-152011-08-04 2011-08-05
6 vivo 2011-08-15 2011-08-21NULL NULL
7 vivo 2011-09-02 2011-09-122011-08-15 2011-08-21




---第二步  依据前一次促销的开始时间和结束时间,开始时间统一为最早的,为后面分组做准备
select produce_name,start_time,max(end_time) end_time from 
(select id,produce_name,case when start_time>=before_start_time and start_time<=before_end_time then before_start_time else start_time end as start_time, end_time
from (select id,produce_name,start_time,end_time,
lag(start_time) over(partition by produce_name order by id) before_start_time,
lag(end_time) over(partition by produce_name order by id) before_end_time
from sales) t) d
group by produce_name,start_time;


--第二步 执行结果(第二步)
nike 2011-09-012011-09-06
nike 2011-09-092011-09-15
oppo 2011-08-042011-08-15
vivo 2011-08-152011-08-21
vivo 2011-09-022011-09-12




---第三步  依据合并后的开始时间,算出每个时间段内的促销天数之和
select produce_name,start_time,end_time,datediff(end_time,start_time)+1 days from
(select produce_name,start_time,max(end_time) end_time from 
(select id,produce_name,case when start_time>=before_start_time and start_time<=before_end_time then before_start_time else start_time end as start_time, end_time
from (select id,produce_name,start_time,end_time,
lag(start_time) over(partition by produce_name order by id) before_start_time,
lag(end_time) over(partition by produce_name order by id) before_end_time
from sales) t) d
group by produce_name,start_time) e;


--执行结果(第三步)
nike 2011-09-012011-09-06 6
nike 2011-09-092011-09-15 7
oppo 2011-08-042011-08-15 12
vivo 2011-08-152011-08-21 7
vivo 2011-09-022011-09-12 11




--第四步:依据产品名称,求出最终的促销天数之和
select produce_name,sum(days) from
(select produce_name,start_time,end_time,datediff(end_time,start_time)+1 days from
(select produce_name,start_time,max(end_time) end_time from 
(select id,produce_name,case when start_time>=before_start_time and start_time<=before_end_time then before_start_time else start_time end as start_time, end_time
from (select id,produce_name,start_time,end_time,
lag(start_time) over(partition by produce_name order by id) before_start_time,
lag(end_time) over(partition by produce_name order by id) before_end_time
from sales) t) d
group by produce_name,start_time) e) f
group by produce_name;


---执行结果(第四步)
nike 13
oppo 12
vivo 18


说明:  个人的见解,希望对大家有帮助!


原创粉丝点击