Hive SQL: working around the GROUP BY column restriction with collect_set/collect_list
FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'userid'
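Hive requires userid to appear among the GROUP BY columns as well, because every non-aggregated expression in the SELECT list must be part of the GROUP BY key.
This differs from MySQL, which accepts such a statement without complaint (at least when the ONLY_FULL_GROUP_BY SQL mode is not enabled).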
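The statement that triggered the error is not shown in this post. As a rough sketch of the kind of query that Hive rejects this way, assuming T_BZ has a userid column (an assumption, the post never lists the table schema):

```sql
-- Hypothetical reconstruction, not the author's original statement.
-- userid is selected but is neither a GROUP BY key nor wrapped in an
-- aggregate function, so Hive cannot decide which value to return per group.
SELECT userid, sequnce, actiontime
FROM T_BZ
GROUP BY sequnce, actiontime;
```

Either adding userid to the GROUP BY clause or wrapping it in an aggregate function makes the statement valid.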
The workaround is as follows:
hive> SELECT sequnce,actiontime,collect_set(pagecode),collect_set(actioncode) FROM T_BZ GROUP BY Sequnce ,ActionTime limit 100;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1407387657227_0043, Tracking URL = http://n1.hadoop:8089/proxy/application_1407387657227_0043/
Kill Command = /app/prog/hadoop/bin/hadoop job -kill job_1407387657227_0043
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-07 20:07:12,881 Stage-1 map = 0%, reduce = 0%
2014-08-07 20:07:24,192 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.84 sec
2014-08-07 20:07:29,347 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 20.71 sec
MapReduce Total cumulative CPU time: 20 seconds 710 msec
Ended Job = job_1407387657227_0043
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 20.71 sec HDFS Read: 96397668 HDFS Write: 6969 SUCCESS
Total MapReduce CPU Time Spent: 20 seconds 710 msec
OK
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:20:33 [] ["A0001"]
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:20:37 ["P001"] ["A0001"]
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:20:45 ["P003","P001"] ["A0002","A0001"]
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:07 ["P003"] ["A0011"]
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:11 ["P003","P001"] ["A0017","A0001"]
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:13 ["P001","P002"] ["A0003","A0001"]
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:22 ["P002"] ["A0006"]
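As the output shows, each aggregated column comes back as a collection (an array) of the distinct values in the group.
If you do not want a collection, you can write the query like this instead and take the first element of each collection: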
hive> SELECT sequnce,actiontime,collect_set(pagecode)[0],collect_set(actioncode)[0] FROM T_BZ GROUP BY Sequnce ,ActionTime limit 100;
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:20:33 A0001
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:20:37 P001 A0001
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:20:45 P003 A0002
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:07 P003 A0011
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:11 P003 A0017
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:13 P001 A0003
00015a21-ef6d-4f05-b04e-ffd98fab2922 2014-07-24 01:21:22 P002 A0006
This result now has the same shape as what MySQL returns.
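One caveat: collect_set does not guarantee the order of its elements, so [0] picks an arbitrary value from each group (the sets printed earlier, such as ["P003","P001"], are not sorted). If a reproducible value matters, a plain aggregate is one possible alternative. The following is only a sketch, and the column aliases are made up for illustration:

```sql
-- Sketch: pick a deterministic representative per group instead of an
-- arbitrary set element. min() returns the smallest value by the column
-- type's ordering, and NULL when the group has no non-NULL value.
SELECT sequnce,
       actiontime,
       min(pagecode)   AS first_pagecode,    -- alias is illustrative
       min(actioncode) AS first_actioncode   -- alias is illustrative
FROM T_BZ
GROUP BY sequnce, actiontime
LIMIT 100;
```

This returns one scalar per column, like the [0] version, but the value no longer depends on how the set happened to be built.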
If you do not want the values deduplicated, you can use collect_list instead. Both are built-in Hive aggregate functions (UDAFs).
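To make the difference concrete, here is a minimal sketch that runs both functions side by side on the same table. The aliases are illustrative, and the concat_ws call assumes pagecode is a string column:

```sql
-- collect_list keeps every value in the group, duplicates included;
-- collect_set keeps each distinct value once.
SELECT sequnce,
       actiontime,
       collect_list(pagecode)              AS pagecodes_all,
       collect_set(pagecode)               AS pagecodes_distinct,
       size(collect_set(pagecode))         AS distinct_page_count,
       concat_ws(',', collect_set(pagecode)) AS pagecodes_csv  -- array<string> flattened to one string
FROM T_BZ
GROUP BY sequnce, actiontime
LIMIT 100;
```

Flattening with concat_ws is handy when the result is exported to a system that has no array type.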