Hive中order by,sort by,distribute by,cluster by的区别
来源:互联网 发布:手机房屋设计软件 编辑:程序博客网 时间:2024/06/05 17:10
Hive中常见的高级查询包括:group by、Order by、join、distribute by、sort by、cluster by、Union all。今天我们来看看order by操作,Order by表示按照某些字段排序,语法如下:
- select col,col2...
- from tableName
- where condition
- order by col1,col2 [asc|desc]
(1):order by后面可以有多列进行排序,默认按字典排序。
(2):order by为全局排序。
(3):order by需要reduce操作,且只有一个reduce,无法配置(因为多个reduce无法完成全局排序)。
order by操作会受到如下属性的制约:
- set hive.mapred.mode=nonstrict; (default value / 默认值)
- set hive.mapred.mode=strict;
下面我们通过一个示例来深入体会order by的用法:
数据库有一个employees表,数据如下:
- hive> select * from employees;
- OK
- lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
- liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
- zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
现在我要按第二列(salary)降序排列:
- hive> select * from employees order by salary desc;
- //执行MapReduce的过程
- Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.62 sec HDFS Read: 415 HDFS Write: 245 SUCCESS
- Total MapReduce CPU Time Spent: 2 seconds 620 msec
- OK
- zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
- liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
- lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
- Time taken: 20.484 seconds
- hive>
此时的hive.mapred.mode属性为:
- hive> set hive.mapred.mode;
- hive.mapred.mode=nonstrict
- hive>
现在我们将它改为strict,然后再使用order by进行查询:
- hive> set hive.mapred.mode=strict;
- hive> select * from employees order by salary desc;
- FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'salary'
- hive>
- hive> select * from employees order by salary desc limit 3;
- FAILED: Error in semantic analysis: No partition predicate found for Alias "employees" Table "employees"
先来看看分区:
- hive> show partitions employees;
- OK
- date_time=2015-01-24/type=love
- Time taken: 0.096 seconds
在strict模式先使用order by查询:
- hive> select * from employees where partition(date_time='2015-01-24',type='love') order by salary desc limit 3;
- FAILED: Parse Error: line 1:30 cannot recognize input near 'partition' '(' 'date_time' in expression specification
- hive
- > select * from employees where date_time='2015-01-24' and type='love' order by salary desc limit 3;
- //执行MapReduce程序
- Total MapReduce CPU Time Spent: 3 seconds 510 msec
- OK
- zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
- liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
- lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
- Time taken: 19.861 seconds
- hive>
1 0
- Hive中order by,sort by,distribute by,cluster by
- hive中order by,distribute by,sort by,cluster by
- hive中order by,sort by,distribute by,Cluster By的区别
- Hive中order by,sort by,distribute by,cluster by的区别
- Hive中order by,sort by,distribute by,cluster by的区别
- hive中order by、distribute by、sort by和cluster by的区别和联系
- Hive中order by,sort by,distribute by,cluster by的区别
- Hive中order by,sort by,distribute by,cluster by的区别
- Hive中order by,sort by,distribute by,cluster by的区别
- Hive中order by、sort by、distribute by、cluster by的区别
- hive Sort By/Order By/Cluster By/Distribute By
- hive中的order by+sort by+distribute by+cluster by
- Hive Sort by/Order By/Cluster By/Distribute By
- hive sort by,order by ,distribute by,cluster by
- Hive order by/sort by/distribute by/cluster by作用
- hive中的order by , sort by, distribute by, cluster by
- Hive-2.HiveQL查询中ORDER BY 和SORT BY 语句|包含SORT BY 的DISTRIBUTE BY|CLUSTER BY
- hive 中的Sort By、 Order By、Cluster By、Distribute By 区别<转>
- 教你如何制作OS X Lion启动U盘与使用U盘安装系统
- CloudFoundry in 1 Box简介:Lattice篇
- java实现将毫秒数转化为yyyy-MM-dd格式的数据
- 如何让Activiti-Explorer使用sql server数据库
- MySQL查询--使用LEFT JOIN解决同一张表查询
- Hive中order by,sort by,distribute by,cluster by的区别
- 土地利用分类详细教程——以高分一号影像为例(上)
- Xcode插件安装与管理
- 最小生成树(prime算法、kruskal算法) 和 最短路径算法(floyd、dijkstra)
- 全面了解Activity
- poj1565-Skew Binary
- SIGPIPE导致进程终止
- 51nod1066bash游戏
- iOS UITextField 使用详解