Hadoop基础教程-第11章 Hive：SQL on Hadoop（11.7 HQL：数据查询）（草稿）

来源：互联网发布：中国古代名将知乎编辑：程序博客网时间：2024/05/16 08:38

第11章 Hive：SQL on Hadoop

11.7 HQL：数据查询

本节操作使用默认default数据库。

hive> use default;OKTime taken: 0.035 secondshive>

11.7.1 limit 子句

hive> select * from emp limit 5;OK7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 107839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 107934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 107369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 207566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20Time taken: 2.18 seconds, Fetched: 5 row(s)hive>

hive> select * from emp limit 1,3;OK7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 107934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 107369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20Time taken: 0.303 seconds, Fetched: 3 row(s)hive>

11.7.2 case when then 语句

hive> select eid,ename,    > case    > when did=10 then 'd10'    > when did=20 then 'd20'    > when did=30 then 'd30'    > end    > from emp;OK7782    CLARK   d107839    KING    d107934    MILLER  d107369    SMITH   d207566    JONES   d207902    FORD    d207499    ALLEN   d307521    WARD    d307654    MARTIN  d307698    BLAKE   d307844    TURNER  d307900    JAMES   d30Time taken: 0.469 seconds, Fetched: 12 row(s)hive>

11.7.3 where 子句

可以用or和and连接多个表达式。

操作符数据类型说明 A=B 基本类型 A等于B返回true，否则返回false A<>B，A!=B 基本类型 A或B为null，返回null; A和B都不空时，A不等于B返回true，否则返回false A<B 基本类型 A或B为null，返回null; A小于B返回true，否则返回false A<=B 基本类型 A或B为null，返回null; A小于等于B返回true，否则返回false A>B 基本类型 A或B为null，返回null; A大于B返回true，否则返回false A>=B 基本类型 A或B为null，返回null; A大于等于B返回true，否则返回false A Between B And C 基本类型筛选 A 的值处于 B 和 C 之间 A not between B and C 基本类型筛选 A 的值不处于 B 和 C 之间 A is null 所有类型如果A等于null，返回true,否则返回false。注意，不能使用A=null A is not null 所有类型筛选 A 值不是null的 A like B string B是一个SQL正则表达式，%表示一个或者多个字符，_表示一个字符 A not like B string 与正则表达式B相反结果 A rlike B string 正则匹配

hive> select * from emp    > where did=30 and comm<>0.0;OK7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   307521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   307654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30Time taken: 0.667 seconds, Fetched: 3 row(s)hive>

hive> select * from emp    > where hiredate between '1981-01-01' and '1981-12-30';OK7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 107839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 107566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 207902    FORD    ANALYST 7566    1981-12-03  3000.0  0.0 207499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   307521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   307654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  307698    BLAKE   MANAGER 7839    1981-05-01  2850.0  0.0 307844    TURNER  SALESMAN    7698    1981-09-08  1500.0  0.0 307900    JAMES   CLERK   7698    1981-12-03  950.0   0.0 30Time taken: 0.311 seconds, Fetched: 10 row(s)hive>

11.7.4 group by 子句

【例】查询每个部门的人数

hive> select count(*) from emp    > group by did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170814101839_fe4cccbd-62c4-48cb-986b-a70f93c19535Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1502717288397_0003, Tracking URL = http://node1:8088/proxy/application_1502717288397_0003/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502717288397_0003Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12017-08-14 10:19:10,549 Stage-1 map = 0%,  reduce = 0%2017-08-14 10:19:27,526 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.86 sec2017-08-14 10:19:42,984 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.84 secMapReduce Total cumulative CPU time: 6 seconds 840 msecEnded Job = job_1502717288397_0003MapReduce Jobs Launched: Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.84 sec   HDFS Read: 9667 HDFS Write: 129 SUCCESSTotal MapReduce CPU Time Spent: 6 seconds 840 msecOK336Time taken: 66.588 seconds, Fetched: 3 row(s)hive>

与Oracle数据库一样，如果使用group by子句，那么查询的字段如果没有出现在group by子句的后面，则必须使用聚合函数。

hive> select ename,avg(sal) from emp group by did;FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'ename'hive>

如果想对分组结果进行过滤，可以使用having子句

hive> select avg(sal) from emp     > group by did    > having avg(sal)>2000;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170814103035_2106518a-aab4-4cdd-9b39-7b08fb1a44b6Total jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1502717288397_0007, Tracking URL = http://node1:8088/proxy/application_1502717288397_0007/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502717288397_0007Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12017-08-14 10:31:08,601 Stage-1 map = 0%,  reduce = 0%2017-08-14 10:31:25,888 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.88 sec2017-08-14 10:31:43,122 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.34 secMapReduce Total cumulative CPU time: 8 seconds 340 msecEnded Job = job_1502717288397_0007MapReduce Jobs Launched: Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.34 sec   HDFS Read: 10807 HDFS Write: 149 SUCCESSTotal MapReduce CPU Time Spent: 8 seconds 340 msecOK2916.66666666666652258.3333333333335Time taken: 70.202 seconds, Fetched: 2 row(s)hive>

11.7.5 内连接

（1）准备阶段
部门数据

[root@node3 data]# vi dept[root@node3 data]# cat dept10,ACCOUNTING,NEW YORK20,RESEARCH,DALLAS30,SALES,CHICAGO40,OPERATIONS,BOSTON[root@node3 data]#

创建部门表dept，并导入数据

hive> create table if not exists dept(    > did    int,    > dname string,    > dloc string)    > row format delimited fields terminated by ',';OKTime taken: 0.185 secondshive> load data local inpath '/root/data/dept' into table dept;Loading data to table default.deptOKTime taken: 0.705 secondshive> select * from dept;OK10  ACCOUNTING  NEW YORK20  RESEARCH    DALLAS30  SALES   CHICAGO40  OPERATIONS  BOSTONTime taken: 0.155 seconds, Fetched: 4 row(s)hive>

（2）等值连接
连接的两个表中，只有同时满足连接条件的记录才会放入结果表中。执行查询命令select eid,ename,dname from emp,dept where emp.did=dept.did;

hive> select eid,ename,dname    > from emp,dept    > where emp.did=dept.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170815101902_18fb066b-11f2-438d-8a92-9d4490b80ec2Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-15 10:19:13 Starting to launch local task to process map join;  maximum memory = 5189795842017-08-15 10:19:17 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-19-02_412_1947825427115470836-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable2017-08-15 10:19:17 Uploaded 1 File to: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-19-02_412_1947825427115470836-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (373 bytes)2017-08-15 10:19:17 End of local task; Time Taken: 3.36 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502805552881_0002, Tracking URL = http://node1:8088/proxy/application_1502805552881_0002/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502805552881_0002Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-15 10:19:38,347 Stage-3 map = 0%,  reduce = 0%2017-08-15 10:19:52,530 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.69 secMapReduce Total cumulative CPU time: 2 seconds 690 msecEnded Job = job_1502805552881_0002MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 2.69 sec   HDFS Read: 7746 HDFS Write: 459 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 690 msecOK7782    CLARK   ACCOUNTING7839    KING    ACCOUNTING7934    MILLER  ACCOUNTING7369    SMITH   RESEARCH7566    JONES   RESEARCH7902    FORD    RESEARCH7499    ALLEN   SALES7521    WARD    SALES7654    MARTIN  SALES7698    BLAKE   SALES7844    TURNER  SALES7900    JAMES   SALESTime taken: 52.488 seconds, Fetched: 12 row(s)hive>

内连接另一种形式：select e.eid,e.ename,d.dname from emp e join dept d on e.did=d.did;
其中，e和d分别是表emp和dept的别名。

hive> select e.eid,e.ename,d.dname    > from emp e join dept d    > on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170815102454_0ca5cc0f-1802-4b4f-8156-d055482971b7Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-15 10:25:06 Starting to launch local task to process map join;  maximum memory = 5189795842017-08-15 10:25:10 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-24-54_978_8533839352082414207-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable2017-08-15 10:25:10 Uploaded 1 File to: file:/tmp/root/6e48493b-cca6-4657-a459-7ff36c6137b1/hive_2017-08-15_10-24-54_978_8533839352082414207-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (373 bytes)2017-08-15 10:25:10 End of local task; Time Taken: 3.864 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502805552881_0003, Tracking URL = http://node1:8088/proxy/application_1502805552881_0003/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502805552881_0003Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-15 10:25:27,803 Stage-3 map = 0%,  reduce = 0%2017-08-15 10:25:38,323 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.24 secMapReduce Total cumulative CPU time: 2 seconds 240 msecEnded Job = job_1502805552881_0003MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 2.24 sec   HDFS Read: 7746 HDFS Write: 459 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 240 msecOK7782    CLARK   ACCOUNTING7839    KING    ACCOUNTING7934    MILLER  ACCOUNTING7369    SMITH   RESEARCH7566    JONES   RESEARCH7902    FORD    RESEARCH7499    ALLEN   SALES7521    WARD    SALES7654    MARTIN  SALES7698    BLAKE   SALES7844    TURNER  SALES7900    JAMES   SALESTime taken: 44.558 seconds, Fetched: 12 row(s)hive>

11.7.6 外连接

内连接只显示了俩张表都存在的数据，而外连接则是显示出所有的数据。外连接又分为左外连接、右外连接以及全外连接，可以简称左连接、右连接和全连接。其中左连接是左边的表输出完整数据。右连接是右边的表输出完整数据，全连接就是两张表的所有行全部输出。

（1）补充数据
为了测试外连接，需要在雇员表中添加一个没有部门的新员工信息

hive> insert into emp(eid,ename,hiredate) values(8888,'HADRON','2016-08-31'); WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816091805_3f28cfe1-278a-486e-9ed4-34ee791f5322Total jobs = 3Launching Job 1 out of 3Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0005, Tracking URL = http://node1:8088/proxy/application_1502887685471_0005/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0005Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02017-08-16 09:18:20,468 Stage-1 map = 0%,  reduce = 0%2017-08-16 09:18:32,863 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.48 secMapReduce Total cumulative CPU time: 2 seconds 480 msecEnded Job = job_1502887685471_0005Stage-4 is selected by condition resolver.Stage-3 is filtered out by condition resolver.Stage-5 is filtered out by condition resolver.Moving data to directory hdfs://cetc/user/hive/warehouse/emp/.hive-staging_hive_2017-08-16_09-18-05_560_4497324549163312419-1/-ext-10000Loading data to table default.empMapReduce Jobs Launched: Stage-Stage-1: Map: 1   Cumulative CPU: 2.48 sec   HDFS Read: 4861 HDFS Write: 105 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 480 msecOKTime taken: 30.102 secondshive>

（2）左连接

hive> select e.eid,e.ename,d.dname from emp e left join dept d on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816091950_1483c055-0461-4293-9387-3221eb15b448Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:20:01 Starting to launch local task to process map join;  maximum memory = 5189795842017-08-16 09:20:04 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-19-50_831_2141891134326428737-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable2017-08-16 09:20:04 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-19-50_831_2141891134326428737-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (373 bytes)2017-08-16 09:20:04 End of local task; Time Taken: 2.606 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0006, Tracking URL = http://node1:8088/proxy/application_1502887685471_0006/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0006Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:20:20,596 Stage-3 map = 0%,  reduce = 0%2017-08-16 09:20:30,466 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.82 secMapReduce Total cumulative CPU time: 1 seconds 820 msecEnded Job = job_1502887685471_0006MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 1.82 sec   HDFS Read: 7158 HDFS Write: 486 SUCCESSTotal MapReduce CPU Time Spent: 1 seconds 820 msecOK7782    CLARK   ACCOUNTING7839    KING    ACCOUNTING7934    MILLER  ACCOUNTING7369    SMITH   RESEARCH7566    JONES   RESEARCH7902    FORD    RESEARCH7499    ALLEN   SALES7521    WARD    SALES7654    MARTIN  SALES7698    BLAKE   SALES7844    TURNER  SALES7900    JAMES   SALES8888    HADRON  NULLTime taken: 41.877 seconds, Fetched: 13 row(s)hive>

（3）右连接

hive> select e.eid,e.ename,d.dname from emp e right join dept d on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816092117_f97eea90-4039-45f9-8097-900c457f50f3Total jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:21:28 Starting to launch local task to process map join;  maximum memory = 5189795842017-08-16 09:21:31 Dump the side-table for tag: 0 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-21-17_901_4210650042244478897-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable2017-08-16 09:21:31 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-21-17_901_4210650042244478897-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable (498 bytes)2017-08-16 09:21:31 End of local task; Time Taken: 2.599 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0007, Tracking URL = http://node1:8088/proxy/application_1502887685471_0007/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0007Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:21:48,072 Stage-3 map = 0%,  reduce = 0%2017-08-16 09:21:58,404 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.95 secMapReduce Total cumulative CPU time: 1 seconds 950 msecEnded Job = job_1502887685471_0007MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 1.95 sec   HDFS Read: 6169 HDFS Write: 488 SUCCESSTotal MapReduce CPU Time Spent: 1 seconds 950 msecOK7782    CLARK   ACCOUNTING7839    KING    ACCOUNTING7934    MILLER  ACCOUNTING7369    SMITH   RESEARCH7566    JONES   RESEARCH7902    FORD    RESEARCH7499    ALLEN   SALES7521    WARD    SALES7654    MARTIN  SALES7698    BLAKE   SALES7844    TURNER  SALES7900    JAMES   SALESNULL    NULL    OPERATIONSTime taken: 42.768 seconds, Fetched: 13 row(s)hive>

（4）全连接

hive> select e.eid,e.ename,d.dname from emp e full join dept d on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816092230_3b3f51e8-2753-41a2-9f41-af98770847faTotal jobs = 1Launching Job 1 out of 1Number of reduce tasks not specified. Estimated from input data size: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1502887685471_0008, Tracking URL = http://node1:8088/proxy/application_1502887685471_0008/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0008Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 12017-08-16 09:22:45,280 Stage-1 map = 0%,  reduce = 0%2017-08-16 09:23:06,469 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.28 sec2017-08-16 09:23:16,137 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.52 secMapReduce Total cumulative CPU time: 8 seconds 520 msecEnded Job = job_1502887685471_0008MapReduce Jobs Launched: Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 8.52 sec   HDFS Read: 16627 HDFS Write: 515 SUCCESSTotal MapReduce CPU Time Spent: 8 seconds 520 msecOK8888    HADRON  NULL7782    CLARK   ACCOUNTING7934    MILLER  ACCOUNTING7839    KING    ACCOUNTING7902    FORD    RESEARCH7566    JONES   RESEARCH7369    SMITH   RESEARCH7654    MARTIN  SALES7521    WARD    SALES7900    JAMES   SALES7499    ALLEN   SALES7844    TURNER  SALES7698    BLAKE   SALESNULL    NULL    OPERATIONSTime taken: 47.921 seconds, Fetched: 14 row(s)hive>

（5）左半连接

hive> select e.eid,e.ename from emp e     > left semi join dept d    > on e.did=d.did;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816093310_97649892-9cb8-4b26-954a-1a301976184dTotal jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:33:22 Starting to launch local task to process map join;  maximum memory = 5189795842017-08-16 09:33:25 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-33-10_356_3322488615725018370-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile51--.hashtable2017-08-16 09:33:25 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-33-10_356_3322488615725018370-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile51--.hashtable (332 bytes)2017-08-16 09:33:25 End of local task; Time Taken: 2.642 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0010, Tracking URL = http://node1:8088/proxy/application_1502887685471_0010/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0010Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:33:43,441 Stage-3 map = 0%,  reduce = 0%2017-08-16 09:33:55,350 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.93 secMapReduce Total cumulative CPU time: 2 seconds 930 msecEnded Job = job_1502887685471_0010MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 2.93 sec   HDFS Read: 7152 HDFS Write: 363 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 930 msecOK7782    CLARK7839    KING7934    MILLER7369    SMITH7566    JONES7902    FORD7499    ALLEN7521    WARD7654    MARTIN7698    BLAKE7844    TURNER7900    JAMESTime taken: 46.179 seconds, Fetched: 12 row(s)hive>

左半连接和内连接的结果一样，只不过使用IN查询。

hive> select e.eid,e.ename from emp e     > where e.did in(select d.did from dept d);WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = root_20170816092716_b479bf9e-fd03-48a1-be72-fb64d90f2efcTotal jobs = 1SLF4J: Class path contains multiple SLF4J bindings.SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]2017-08-16 09:27:27 Starting to launch local task to process map join;  maximum memory = 5189795842017-08-16 09:27:30 Dump the side-table for tag: 1 with group count: 4 into file: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-27-16_222_8300160340562225012-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile41--.hashtable2017-08-16 09:27:30 Uploaded 1 File to: file:/tmp/root/50703eac-08fc-4214-9bd0-488ad09f51e2/hive_2017-08-16_09-27-16_222_8300160340562225012-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile41--.hashtable (332 bytes)2017-08-16 09:27:30 End of local task; Time Taken: 2.987 sec.Execution completed successfullyMapredLocal task succeededLaunching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1502887685471_0009, Tracking URL = http://node1:8088/proxy/application_1502887685471_0009/Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502887685471_0009Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 02017-08-16 09:27:46,861 Stage-3 map = 0%,  reduce = 0%2017-08-16 09:27:59,352 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.91 secMapReduce Total cumulative CPU time: 2 seconds 910 msecEnded Job = job_1502887685471_0009MapReduce Jobs Launched: Stage-Stage-3: Map: 1   Cumulative CPU: 2.91 sec   HDFS Read: 7152 HDFS Write: 363 SUCCESSTotal MapReduce CPU Time Spent: 2 seconds 910 msecOK7782    CLARK7839    KING7934    MILLER7369    SMITH7566    JONES7902    FORD7499    ALLEN7521    WARD7654    MARTIN7698    BLAKE7844    TURNER7900    JAMESTime taken: 44.323 seconds, Fetched: 12 row(s)hive>

注意，左半连接通常比内连接要高效，因为对于左表的一条指定的记录，在右表一旦找到匹配的记录，Hive就会停止扫描。
（4）

阅读全文

0 0