Hive: top N per group
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 01:51
hive> show create table jxl_report;
OK
CREATE TABLE `jxl_report`(
`id` bigint COMMENT 'primary key',
。。。
`user_name` string COMMENT 'user name',
`phone_no` string COMMENT 'user phone number',
`create_by` bigint COMMENT 'created by',
`update_by` bigint COMMENT 'updated by',
`valid` boolean,
`create_time` string COMMENT 'creation time',
`row_key` bigint,
`cid` bigint COMMENT 'application ID')
COMMENT 'report details'
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://nwdservice/user/hive/warehouse/dataimport.db/jxl_report'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='18',
'numRows'='4589964',
'rawDataSize'='5764994784',
'totalSize'='613551501',
'transient_lastDdlTime'='1512378856')
Time taken: 0.064 seconds, Fetched: 34 row(s)
# row_number() over
create table tmp_distinct_rpt as select id from ( select *,row_number() over (partition by cid order by report_update_time desc ) as od from jxl_report ) t1 where od <=1;
# rank() over
select *, rank() over (partition by sub order by score) as od from t;
create table tmp_distinct_rpt as select id from ( select *,rank() over (partition by cid order by report_update_time desc ) as od from jxl_report ) t1 where od <=1;
# dense_rank() over
create table tmp_distinct_rpt as select id from ( select *,dense_rank() over (partition by cid order by report_update_time desc ) as od from jxl_report ) t1 where od <=1;
Group by cid and, within each group, take the report with the latest report_update_time.
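The three queries above differ only when two reports in the same cid share the same report_update_time. The sketch below illustrates that difference on made-up data. It is not run against Hive: it uses Python's built-in sqlite3 module as a stand-in (assuming the bundled SQLite is 3.25 or newer, which added window functions), and the four sample rows and the helper name top1_ids are hypothetical. Only the table and column names are taken from the queries above.

```python
import sqlite3

# Hypothetical sample rows: (id, cid, report_update_time).
# ids 2 and 3 belong to the same cid and tie on report_update_time.
rows = [
    (1, 100, "2017-12-01 10:00:00"),
    (2, 100, "2017-12-03 09:00:00"),
    (3, 100, "2017-12-03 09:00:00"),
    (4, 200, "2017-12-02 08:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table jxl_report (id integer, cid integer, report_update_time text)"
)
conn.executemany("insert into jxl_report values (?, ?, ?)", rows)


def top1_ids(func):
    """Return the ids kept by `where od <= 1` for the given ranking function."""
    sql = f"""
        select id from (
            select id,
                   {func}() over (partition by cid
                                  order by report_update_time desc) as od
            from jxl_report
        ) t1
        where od <= 1
        order by id
    """
    return [r[0] for r in conn.execute(sql)]


row_number_ids = top1_ids("row_number")  # exactly one id per cid
rank_ids = top1_ids("rank")              # tied rows all get rank 1
dense_rank_ids = top1_ids("dense_rank")  # same as rank for the top position

print(row_number_ids)
print(rank_ids)
print(dense_rank_ids)
```

With row_number() each cid contributes exactly one row, but which of the tied rows survives is not defined; adding a tiebreaker such as `order by report_update_time desc, id desc` would make the choice deterministic. With rank() and dense_rank() both tied rows are kept, so the result is not strictly one row per cid.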
Reference: http://www.mamicode.com/info-detail-849458.html