Hive经排序后查询前N条记录

来源:互联网 发布:大智慧 cyc 源码 编辑:程序博客网 时间:2024/05/14 15:22

基础知识:

hive的排序查询的语法是

row_number() over (partition by 字段a order by 计算项b desc ) rank 

--这里rank是别名

partition by:类似hive的建表,分区的意思;

order by :排序,默认是升序,加desc降序;

这里按字段a分区,对计算项b进行降序排序

row_number() 可以给每一行一个行号,而后面的over是给点行号的条件

参考

hive 分组+组内排序 , 求topN - 幽灵工作室 - 博客频道 - CSDN.NET  http://blog.csdn.net/u010670689/article/details/49337137

实例一:
相当于按muid分组,然后同muid组内按muid,time排序
select channel,muid,time, row_number() over (partition by muid order by muid,time desc ) rank
from dc_dev.tmp_row_num
group by channel,muid,time
实例二:
相当于按muid分组,然后同muid组内按time排序
select channel,muid,time, row_number() over (partition by channel order by time desc ) rank
from dc_dev.tmp_row_num
group by channel,muid,time
实例三:
相当于按channel,muid分组,然后同channel,muid组内按time排序
select channel,muid,time, row_number() over (partition by channel,muid order by time desc ) rank
from dc_dev.tmp_row_num
group by channel,muid,time



如上图所示是我们需要查询的信息

WITH CON AS (SELECT * FROM CONVERT)
INSERT OVERWRITE LOCAL DIRECTORY "/home/hadoop/zhu/test/2065-t"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
SELECT C.V,C1.V,T2.S3B,T2.G1,T2.S6,T2.S11,T2.S12,T2.ATTIMG1,T2.JU,R.S4,CASE FM 
WHEN -1 THEN "中性"
WHEN 0 THEN "正面"
WHEN 1 THEN "负面"
END AS FM FROM(  
SELECT T1.* FROM( 
SELECT PID,FM,S2,S3B,G1,S6,S11,S12,ATTIMG1,Ju,cBRDINT3,ROW_NUMBER() OVER (PARTITION BY PID ORDER BY PID,FM DESC) RANK 
FROM (
SELECT  S2,S3B,G1,S6,S11,S12,ATTIMG1,Ju,cBRDINT3,PID,(CASE FM
WHEN "0" THEN -1
WHEN "1" THEN 1
WHEN "2" THEN 0
END)AS FM FROM  Car_Data_BCATRSTest
LATERAL VIEW explode(BRDINT3) tBRDINT3 AS cBRDINT3
WHERE FM IS NOT NULL AND S9=1 AND array_contains(BRDINT3,"2065")=true) T0
GROUP BY PID,FM)T1
WHERE T1.RANK<=1)T2
JOIN CON C ON (concat(upper("brdint3"),cBRDINT3)=C.K)
JOIN CON C1 ON (concat(upper("s2"),T2.S2)=C1.K)
JOIN Car_Data_RawSTRS R ON (T2.PID=R.ROWKEY);







0 0
原创粉丝点击