Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.7 HQL:数据查询)(草稿)
来源:互联网 发布:中国古代名将 知乎 编辑:程序博客网 时间:2024/05/16 08:38
第11章 Hive:SQL on Hadoop
11.7 HQL:数据查询
本节操作使用默认default数据库。
hive> use default;OKTime taken: 0.035 secondshive>
11.7.1 limit 子句
hive> select * from emp limit 5;OK7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 107839 KING PRESIDENT 0 1981-11-17 5000.0 0.0 107934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 107369 SMITH CLERK 7902 1980-12-17 800.0 0.0 207566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20Time taken: 2.18 seconds, Fetched: 5 row(s)hive>
hive> select * from emp limit 1,3;OK7839 KING PRESIDENT 0 1981-11-17 5000.0 0.0 107934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 107369 SMITH CLERK 7902 1980-12-17 800.0 0.0 20Time taken: 0.303 seconds, Fetched: 3 row(s)hive>
11.7.2 case when then 语句
hive> select eid,ename, > case > when did=10 then 'd10' > when did=20 then 'd20' > when did=30 then 'd30' > end > from emp;OK7782 CLARK d107839 KING d107934 MILLER d107369 SMITH d207566 JONES d207902 FORD d207499 ALLEN d307521 WARD d307654 MARTIN d307698 BLAKE d307844 TURNER d307900 JAMES d30Time taken: 0.469 seconds, Fetched: 12 row(s)hive>
11.7.3 where 子句
可以用or
和and
连接多个表达式。
A=B
基本类型 A等于B返回true,否则返回false A<>B
,A!=B
基本类型 A或B为null,返回null; A和B都不空时,A不等于B返回true,否则返回false A<B
基本类型 A或B为null,返回null; A小于B返回true,否则返回false A<=B
基本类型 A或B为null,返回null; A小于等于B返回true,否则返回false A>B
基本类型 A或B为null,返回null; A大于B返回true,否则返回false A>=B
基本类型 A或B为null,返回null; A大于等于B返回true,否则返回false A Between B And C 基本类型 筛选 A 的值处于 B 和 C 之间 A not between B and C 基本类型 筛选 A 的值不处于 B 和 C 之间 A is null 所有类型 如果A等于null,返回true,否则返回false。注意,不能使用A=null A is not null 所有类型 筛选 A 值不是null的 A like B string B是一个SQL正则表达式,%
表示一个或者多个字符,_
表示一个字符 A not like B string 与正则表达式B相反结果 A rlike B string 正则匹配hive> select * from emp > where did=30 and comm<>0.0;OK7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 307521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 307654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30Time taken: 0.667 seconds, Fetched: 3 row(s)hive>
hive> select * from emp > where hiredate between '1981-01-01' and '1981-12-30';OK7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 107839 KING PRESIDENT 0 1981-11-17 5000.0 0.0 107566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 207902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 207499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 307521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 307654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 307698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 307844 TURNER SALESMAN 7698 1981-09-08 1500.0 0.0 307900 JAMES CLERK 7698 1981-12-03 950.0 0.0 30Time taken: 0.311 seconds, Fetched: 10 row(s)hive>
11.7.4 group by 子句
【例】查询每个部门的人数
hive> select count(*) from emp > group by did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170814101839_fe4cccbd-62c4-48cb-986b-a70f93c19535Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>In order to set a constant number of reducers: set mapreduce.job.reduces=<number>Starting Job = job_1502717288397_0003, Tracking URL = http://node1:8088/proxy/application_1502717288397_0003/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502717288397_0003Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12017-08-14 10:19:10,549 Stage-1 map = 0%, reduce = 0%2017-08-14 10:19:27,526 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.86 sec2017-08-14 10:19:42,984 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.84 secMapReduce Total cumulative CPU time: 6 seconds 840 msecEnded Job = job_1502717288397_0003MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.84 sec HDFS Read: 9667 HDFS Write: 129 SUCCESSTotal MapReduce CPU Time Spent: 6 seconds 840 msecOK336Time taken: 66.588 seconds, Fetched: 3 row(s)hive>
与Oracle数据库一样,如果使用group by子句,那么查询的字段如果没有出现在group by子句的后面,则必须使用聚合函数。
hive> select ename,avg(sal) from emp group by did;FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'ename'hive>
如果想对分组结果进行过滤,可以使用having子句
hive> select avg(sal) from emp > group by did > having avg(sal)>2000;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170814103035_2106518a-aab4-4cdd-9b39-7b08fb1a44b6Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>In order to set a constant number of reducers: set mapreduce.job.reduces=<number>Starting Job = job_1502717288397_0007, Tracking URL = http://node1:8088/proxy/application_1502717288397_0007/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502717288397_0007Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12017-08-14 10:31:08,601 Stage-1 map = 0%, reduce = 0%2017-08-14 10:31:25,888 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.88 sec2017-08-14 10:31:43,122 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.34 secMapReduce Total cumulative CPU time: 8 seconds 340 msecEnded Job = job_1502717288397_0007MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.34 sec HDFS Read: 10807 HDFS Write: 149 SUCCESSTotal MapReduce CPU Time Spent: 8 seconds 340 msecOK2916.66666666666652258.3333333333335Time taken: 70.202 seconds, Fetched: 2 row(s)hive>
11.7.5 内连接
(1)准备阶段
部门数据
[root@node3 data]# vi dept[root@node3 data]# cat dept10,ACCOUNTING,NEW YORK20,RESEARCH,DALLAS30,SALES,CHICAGO40,OPERATIONS,BOSTON[root@node3 data]#
创建部门表dept,并导入数据
hive> create table if not exists dept( > did int, > dname string, > dloc string) > row format delimited fields terminated by ',';OKTime taken: 0.185 secondshive> load data local inpath '/root/data/dept' into table dept;Loading data to table default.deptOKTime taken: 0.705 secondshive> select * from dept;OK10 ACCOUNTING NEW YORK20 RESEARCH DALLAS30 SALES CHICAGO40 OPERATIONS BOSTONTime taken: 0.155 seconds, Fetched: 4 row(s)hive>
(2)等值连接
连接的两个表中,只有同时满足连接条件的记录才会放入结果表中。执行查询命令select eid,ename,dname from emp,dept where emp.did=dept.did;
hive> select eid,ename,dname > from emp,dept > where emp.did=dept.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170815101902_18fb066b-11f2-438d-8a92-9d4490b80ec2Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-15 10:19:13 Starting to launch local task to process map join; maximum memory = 5189795842017-08-15 10:19:17 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-19-02_412_1947825427115470836-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable2017-08-15 10:19:17 Uploaded 1 File to: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-19-02_412_1947825427115470836-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (373 bytes)2017-08-15 10:19:17 End of local task; Time Taken: 3.36 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502805552881_0002, Tracking URL = http://node1:8088/proxy/application_1502805552881_0002/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502805552881_0002Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-15 10:19:38,347 Stage-3 map = 0%, reduce = 0%2017-08-15 10:19:52,530 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.69 secMapReduce Total cumulative CPU time: 2 seconds 690 msecEnded Job = job_1502805552881_0002MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.69 sec HDFS Read: 7746 HDFS Write: 459 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 690 msecOK7782 CLARK ACCOUNTING7839 KING ACCOUNTING7934 MILLER ACCOUNTING7369 SMITH RESEARCH7566 JONES RESEARCH7902 FORD RESEARCH7499 ALLEN SALES7521 WARD SALES7654 MARTIN SALES7698 BLAKE SALES7844 TURNER SALES7900 JAMES SALESTime taken: 52.488 seconds, Fetched: 12 row(s)hive>
内连接另一种形式:select e.eid,e.ename,d.dname from emp e join dept d on e.did=d.did;
其中,e和d分别是表emp和dept的别名。
hive> select e.eid,e.ename,d.dname > from emp e join dept d > on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170815102454_0ca5cc0f-1802-4b4f-8156-d055482971b7Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-15 10:25:06 Starting to launch local task to process map join; maximum memory = 5189795842017-08-15 10:25:10 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-24-54_978_8533839352082414207-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable2017-08-15 10:25:10 Uploaded 1 File to: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-24-54_978_8533839352082414207-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (373 bytes)2017-08-15 10:25:10 End of local task; Time Taken: 3.864 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502805552881_0003, Tracking URL = http://node1:8088/proxy/application_1502805552881_0003/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502805552881_0003Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-15 10:25:27,803 Stage-3 map = 0%, reduce = 0%2017-08-15 10:25:38,323 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.24 secMapReduce Total cumulative CPU time: 2 seconds 240 msecEnded Job = job_1502805552881_0003MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.24 sec HDFS Read: 7746 HDFS Write: 459 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 240 msecOK7782 CLARK ACCOUNTING7839 KING ACCOUNTING7934 MILLER ACCOUNTING7369 SMITH RESEARCH7566 JONES RESEARCH7902 FORD RESEARCH7499 ALLEN SALES7521 WARD SALES7654 MARTIN SALES7698 BLAKE SALES7844 TURNER SALES7900 JAMES SALESTime taken: 44.558 seconds, Fetched: 12 row(s)hive>
11.7.6 外连接
内连接只显示了俩张表都存在的数据,而外连接则是显示出所有的数据 。外连接又分为左外连接、右外连接以及全外连接,可以简称左连接、右连接和全连接。其中左连接是左边的表输出完整数据。右连接是右边的表输出完整数据,全连接就是两张表的所有行全部输出。
(1)补充数据
为了测试外连接,需要在雇员表中添加一个没有部门的新员工信息
hive> insert into emp(eid,ename,hiredate) values(8888,'HADRON','2016-08-31'); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816091805_3f28cfe1-278a-486e-9ed4-34ee791f5322Total jobs = 3Launching Job 1 out of 3Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0005, Tracking URL = http://node1:8088/proxy/application_1502887685471_0005/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502887685471_0005Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02017-08-16 09:18:20,468 Stage-1 map = 0%, reduce = 0%2017-08-16 09:18:32,863 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.48 secMapReduce Total cumulative CPU time: 2 seconds 480 msecEnded Job = job_1502887685471_0005Stage-4 is selected by condition resolver.Stage-3 is filtered out by condition resolver.Stage-5 is filtered out by condition resolver.Moving data to directory hdfs://cetc/user/hive/warehouse/emp/.hive-staging_hive_2017-08-16_09-18-05_560_4497324549163312419-1/-ext-10000Loading data to table default.empMapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.48 sec HDFS Read: 4861 HDFS Write: 105 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 480 msecOKTime taken: 30.102 secondshive>
(2)左连接
hive> select e.eid,e.ename,d.dname from emp e left join dept d on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816091950_1483c055-0461-4293-9387-3221eb15b448Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:20:01 Starting to launch local task to process map join; maximum memory = 5189795842017-08-16 09:20:04 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-19-50_831_2141891134326428737-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable2017-08-16 09:20:04 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-19-50_831_2141891134326428737-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (373 bytes)2017-08-16 09:20:04 End of local task; Time Taken: 2.606 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0006, Tracking URL = http://node1:8088/proxy/application_1502887685471_0006/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502887685471_0006Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:20:20,596 Stage-3 map = 0%, reduce = 0%2017-08-16 09:20:30,466 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.82 secMapReduce Total cumulative CPU time: 1 seconds 820 msecEnded Job = job_1502887685471_0006MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 1.82 sec HDFS Read: 7158 HDFS Write: 486 SUCCESSTotal MapReduce CPU Time Spent: 1 seconds 820 msecOK7782 CLARK ACCOUNTING7839 KING ACCOUNTING7934 MILLER ACCOUNTING7369 SMITH RESEARCH7566 JONES RESEARCH7902 FORD RESEARCH7499 ALLEN SALES7521 WARD SALES7654 MARTIN SALES7698 BLAKE SALES7844 TURNER SALES7900 JAMES SALES8888 HADRON NULLTime taken: 41.877 seconds, Fetched: 13 row(s)hive>
(3)右连接
hive> select e.eid,e.ename,d.dname from emp e right join dept d on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816092117_f97eea90-4039-45f9-8097-900c457f50f3Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:21:28 Starting to launch local task to process map join; maximum memory = 5189795842017-08-16 09:21:31 Dump the side-table for tag: 0 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-21-17_901_4210650042244478897-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable2017-08-16 09:21:31 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-21-17_901_4210650042244478897-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable (498 bytes)2017-08-16 09:21:31 End of local task; Time Taken: 2.599 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0007, Tracking URL = http://node1:8088/proxy/application_1502887685471_0007/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502887685471_0007Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:21:48,072 Stage-3 map = 0%, reduce = 0%2017-08-16 09:21:58,404 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.95 secMapReduce Total cumulative CPU time: 1 seconds 950 msecEnded Job = job_1502887685471_0007MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 1.95 sec HDFS Read: 6169 HDFS Write: 488 SUCCESSTotal MapReduce CPU Time Spent: 1 seconds 950 msecOK7782 CLARK ACCOUNTING7839 KING ACCOUNTING7934 MILLER ACCOUNTING7369 SMITH RESEARCH7566 JONES RESEARCH7902 FORD RESEARCH7499 ALLEN SALES7521 WARD SALES7654 MARTIN SALES7698 BLAKE SALES7844 TURNER SALES7900 JAMES SALESNULL NULL OPERATIONSTime taken: 42.768 seconds, Fetched: 13 row(s)hive>
(4)全连接
hive> select e.eid,e.ename,d.dname from emp e full join dept d on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816092230_3b3f51e8-2753-41a2-9f41-af98770847faTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>In order to set a constant number of reducers: set mapreduce.job.reduces=<number>Starting Job = job_1502887685471_0008, Tracking URL = http://node1:8088/proxy/application_1502887685471_0008/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502887685471_0008Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 12017-08-16 09:22:45,280 Stage-1 map = 0%, reduce = 0%2017-08-16 09:23:06,469 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.28 sec2017-08-16 09:23:16,137 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.52 secMapReduce Total cumulative CPU time: 8 seconds 520 msecEnded Job = job_1502887685471_0008MapReduce Jobs Launched: Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 8.52 sec HDFS Read: 16627 HDFS Write: 515 SUCCESSTotal MapReduce CPU Time Spent: 8 seconds 520 msecOK8888 HADRON NULL7782 CLARK ACCOUNTING7934 MILLER ACCOUNTING7839 KING ACCOUNTING7902 FORD RESEARCH7566 JONES RESEARCH7369 SMITH RESEARCH7654 MARTIN SALES7521 WARD SALES7900 JAMES SALES7499 ALLEN SALES7844 TURNER SALES7698 BLAKE SALESNULL NULL OPERATIONSTime taken: 47.921 seconds, Fetched: 14 row(s)hive>
(5)左半连接
hive> select e.eid,e.ename from emp e > left semi join dept d > on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816093310_97649892-9cb8-4b26-954a-1a301976184dTotal jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:33:22 Starting to launch local task to process map join; maximum memory = 5189795842017-08-16 09:33:25 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-33-10_356_3322488615725018370-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile51--.hashtable2017-08-16 09:33:25 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-33-10_356_3322488615725018370-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile51--.hashtable (332 bytes)2017-08-16 09:33:25 End of local task; Time Taken: 2.642 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0010, Tracking URL = http://node1:8088/proxy/application_1502887685471_0010/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502887685471_0010Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:33:43,441 Stage-3 map = 0%, reduce = 0%2017-08-16 09:33:55,350 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.93 secMapReduce Total cumulative CPU time: 2 seconds 930 msecEnded Job = job_1502887685471_0010MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.93 sec HDFS Read: 7152 HDFS Write: 363 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 930 msecOK7782 CLARK7839 KING7934 MILLER7369 SMITH7566 JONES7902 FORD7499 ALLEN7521 WARD7654 MARTIN7698 BLAKE7844 TURNER7900 JAMESTime taken: 46.179 seconds, Fetched: 12 row(s)hive>
左半连接和内连接的结果一样,只不过使用IN查询。
hive> select e.eid,e.ename from emp e > where e.did in(select d.did from dept d);WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816092716_b479bf9e-fd03-48a1-be72-fb64d90f2efcTotal jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:27:27 Starting to launch local task to process map join; maximum memory = 5189795842017-08-16 09:27:30 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-27-16_222_8300160340562225012-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile41--.hashtable2017-08-16 09:27:30 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-27-16_222_8300160340562225012-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile41--.hashtable (332 bytes)2017-08-16 09:27:30 End of local task; Time Taken: 2.987 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0009, Tracking URL = http://node1:8088/proxy/application_1502887685471_0009/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job -kill job_1502887685471_0009Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:27:46,861 Stage-3 map = 0%, reduce = 0%2017-08-16 09:27:59,352 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.91 secMapReduce Total cumulative CPU time: 2 seconds 910 msecEnded Job = job_1502887685471_0009MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.91 sec HDFS Read: 7152 HDFS Write: 363 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 910 msecOK7782 CLARK7839 KING7934 MILLER7369 SMITH7566 JONES7902 FORD7499 ALLEN7521 WARD7654 MARTIN7698 BLAKE7844 TURNER7900 JAMESTime taken: 44.323 seconds, Fetched: 12 row(s)hive>
注意,左半连接通常比内连接要高效,因为对于左表的一条 指定的记录,在右表一旦找到匹配的记录,Hive就会停止扫描。
(4)
阅读全文
0 0
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.7 HQL:数据查询)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.8 HQL:排序)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.5 HQL:DDL数据定义)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.6 HQL:DML数据操纵)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.1 Hive 介绍)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.2 Hive安装与配置)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.4 数据类型和存储格式)(草稿)
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.3 Hive 快速入门)
- Hadoop基础教程-第12章 Hive:进阶(12.4 Hive Metastore)(草稿)
- Hadoop基础教程-第12章 Hive:进阶(12.5 Hive外表)(草稿)
- Hadoop基础教程-第12章 Hive:进阶(12.1 内置函数)(草稿)
- Hadoop基础教程-第12章 Hive:进阶(12.2 自定义函数)(草稿)
- Hadoop基础教程-第12章 Hive:进阶(12.3 HiveServer2)(草稿)
- Hadoop基础教程-第10章 HBase:Hadoop数据库(10.1 NoSQL介绍)(草稿)
- Hadoop基础教程-第10章 HBase:Hadoop数据库(10.2 HBase基本概念、框架)(草稿)
- Hadoop基础教程-第10章 HBase:Hadoop数据库(10.3 HBase安装与配置)(草稿)
- Hadoop基础教程-第10章 HBase:Hadoop数据库(10.4 NTP时间同步)(草稿)
- Hadoop基础教程-第10章 HBase:Hadoop数据库(10.5 HBase Shell)(草稿)
- 实例解释inner join 和 natural join 的区别
- Python-爬取2345电影并写入文件
- 浏览器如何渲染页面?
- 扩展SQLite使其能从apk文件中读取db
- 学生管理系统·
- Hadoop基础教程-第11章 Hive:SQL on Hadoop(11.7 HQL:数据查询)(草稿)
- 替换空格
- Rsync远程数据同步工具
- 网络编程
- 朴素贝叶斯:分母相对于所有类别为常数
- 网站开发(三)网站前台的导入及控制器对应
- 进程间通讯--动态库的共享数据
- gopher meetup
- Centos 添加yum源