Hive Practice Exercise 1: Creating a Table and Querying It

First, a look at the first five lines of the raw data, then at the table-creation script:

[root@master exercise]# head -n 5 visits_data.txt
BUCKLEY SUMMER 10/12/2010 14:48 10/12/2010 14:45 WH
CLOONEY GEORGE 10/12/2010 14:47 10/12/2010 14:45 WH
PRENDERGAST JOHN 10/12/2010 14:48 10/12/2010 14:45 WH
LANIER JAZMIN 10/13/2010 13:00 WH BILL SIGNING/
MAYNARD ELIZABETH 10/13/2010 12:34 10/13/2010 13:00 WH BILL SIGNING/

[root@master exercise]# cat visits.hive
-- visits.hive: table over the tab-delimited visitor-log data
create table people_visits (
    last_name        string,
    first_name       string,
    arrival_time     string,
    scheduled_time   string,
    meeting_location string,
    info_comment     string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

At first this CREATE TABLE statement meant nothing to me. Read piece by piece: it defines a table with six string columns, and ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tells Hive that the table's data files are plain text whose fields are separated by tab characters, which matches the layout of visits_data.txt above.


[root@master ~]# ./hive -f /opt/visits.hive
bash: ./hive: No such file or directory
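The ./hive form fails simply because there is no hive executable in the current directory. Two standard ways around it, with the install path assumed from the hive-common-1.2.2.jar log line further down:

# call the launcher by its full path
[root@master ~]# /opt/apache-hive-1.2.2-bin/bin/hive -f /opt/visits.hive

# or put Hive's bin directory on the PATH once, then call it from anywhere
[root@master ~]# export PATH=$PATH:/opt/apache-hive-1.2.2-bin/bin
[root@master ~]# hive -f /opt/visits.hive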

hive> ./hive -f /opt/exercise/visits.hive
> ;
NoViableAltException(17@[])



hive> -f /opt/exercise/visits.hive
> ;
NoViableAltException(299@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1074)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:397)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1145)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1193)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:1 cannot recognize input near '-' 'f' '/'
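Both attempts fail for the same reason: -f is an option of the hive launcher script, not an HQL statement, so the SQL parser chokes on it. From inside an already-running CLI session, the equivalent of hive -f is the source command:

hive> source /opt/exercise/visits.hive;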






[root@master exercise]# hive -f /opt/exercise/visits.hive

Logging initialized using configuration in jar:file:/opt/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
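This MetaException comes from the metastore rather than from the script itself. A common cause on Hive 1.2 is an uninitialized or mismatched metastore schema; one common fix (a sketch, assuming the default embedded Derby metastore; use the matching -dbType for MySQL and the like) is to initialize the schema with the bundled schematool:

[root@master ~]# /opt/apache-hive-1.2.2-bin/bin/schematool -dbType derby -initSchema

In any case, in a fresh session the table is there: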



[root@master ~]# hive

Logging initialized using configuration in jar:file:/opt/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive (default)> show tables;
OK
tab_name
people_visits
Time taken: 1.281 seconds, Fetched: 1 row(s)
hive (default)> describe people_visits;
OK
col_name data_type comment
last_name string
first_name string
arrival_time string
scheduled_time string
meeting_location string
info_comment string
Time taken: 0.544 seconds, Fetched: 6 row(s)
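The next step is to copy the data file into the table's directory on HDFS. Rather than guessing where that is (the guess below turns out to be wrong), the table's actual location can be read straight from its metadata; the Location: field of this output names the exact HDFS directory:

hive (default)> describe formatted people_visits;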


[root@master exercise]# hadoop fs -put visits_data.txt /data/hive/warehouse/people_visits
put: `/data/hive/warehouse/people_visits/': No such file or directory


(An answer to a similar question online explains it: you are getting the error because no such directory exists at that path; -put does not create missing parent directories, and relative paths are resolved against the user's HDFS home directory. Create the directory first with:
bin/hadoop fs -mkdir input
and then re-run the -put command.)



[root@master data]# hadoop fs -mkdir /hive
--OK

[root@master data]# hadoop fs -mkdir /data/hive/warehouse/people_visits
mkdir: `/data/hive/warehouse/people_visits': No such file or directory
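This mkdir fails because the intermediate directories do not exist, and plain -mkdir will not create them. Two commands cover both problems here: -ls -R to see the whole tree, and -mkdir -p to create every missing parent in one go:

# list the HDFS directory tree recursively
[root@master data]# hadoop fs -ls -R /

# create the full nested path, including missing parents
[root@master data]# hadoop fs -mkdir -p /data/hive/warehouse/people_visits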

How do you inspect the HDFS directory tree, and how do you create nested HDFS directories? (The two commands above are the answer, which I did not know at the time.) For now I just put the file under the top-level /hive directory:
hadoop fs -put visits_data.txt /hive



17/08/23 10:56:03 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hive/visits_data.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.


put: File /hive/visits_data.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.



First I confirmed the firewall was already off. My cluster is two machines, master and slave, and dfs.replication was set to 2; changing it to 1 in hdfs-site.xml fixed it:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
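Note that the error message itself says 0 datanode(s) running, so before touching replication it is worth confirming that the datanode daemons are actually up:

# per-node: list the running Hadoop JVMs (a DataNode should appear on each worker)
[root@master ~]# jps

# cluster-wide: live datanodes, capacity, and usage
[root@master ~]# hdfs dfsadmin -report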

[root@master exercise]# hadoop fs -ls /hive
Found 1 items
-rw-r--r-- 2 root supergroup 989239 2017-08-23 11:04 /hive/visits_data.txt


hive (default)> select * from people_visits limit 5;
OK
people_visits.last_name people_visits.first_name people_visits.arrival_time people_visits.scheduled_time people_visits.meeting_location people_visits.info_comment
Time taken: 2.319 seconds

The data I just put in cannot be found! Is this a configuration problem?

It is. Hive looks for a managed table's files under the warehouse directory set by hive.metastore.warehouse.dir in hive-site.xml, and the file had been copied to /hive instead:

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>


Then copy the file into the table's directory under the warehouse:

[root@master exercise]# hadoop fs -put visits_data.txt /user/hive/warehouse/people_visits


[root@master exercise]# hadoop fs -ls /user/hive/warehouse/people_visits
Found 1 items
-rw-r--r-- 2 root supergroup 989239 2017-08-24 15:08 /user/hive/warehouse/people_visits/visits_data.txt
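For the record, Hive can also do this copy itself, which avoids having to know the warehouse path at all. A sketch, assuming the data file sits in the shell's current directory on the local filesystem:

hive (default)> LOAD DATA LOCAL INPATH 'visits_data.txt' INTO TABLE people_visits;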


hive (default)> select * from people_visits limit 5;
OK
people_visits.last_name people_visits.first_name people_visits.arrival_time people_visits.scheduled_time people_visits.meeting_location people_visits.info_comment
BUCKLEY SUMMER 10/12/2010 14:48 10/12/2010 14:45 WH
CLOONEY GEORGE 10/12/2010 14:47 10/12/2010 14:45 WH
PRENDERGAST JOHN 10/12/2010 14:48 10/12/2010 14:45 WH
LANIER JAZMIN 10/13/2010 13:00 WH BILL SIGNING/
MAYNARD ELIZABETH 10/13/2010 12:34 10/13/2010 13:00 WH BILL SIGNING/
Time taken: 0.631 seconds, Fetched: 5 row(s)


hive (default)> select count(*) from people_visits;
Query ID = root_20170824153813_390ff406-b9bb-4b83-99bf-a2f5bf9092dc
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1503560054683_0001, Tracking URL = http://master:8088/proxy/application_1503560054683_0001/
Kill Command = /opt/hadoop-2.6.5/bin/hadoop job -kill job_1503560054683_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-24 15:38:39,084 Stage-1 map = 0%, reduce = 0%
2017-08-24 15:38:47,164 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.88 sec
2017-08-24 15:38:53,717 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.32 sec
MapReduce Total cumulative CPU time: 5 seconds 320 msec
Ended Job = job_1503560054683_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.32 sec HDFS Read: 996386 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 320 msec
OK
_c0
17977
Time taken: 43.436 seconds, Fetched: 1 row(s)
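With 17,977 rows in place, the table is ready for real queries. A natural follow-up that fits this exercise (column names from the DDL above; not run as part of this session) is to find the busiest meeting locations:

hive (default)> select meeting_location, count(*) as visit_count
              > from people_visits
              > group by meeting_location
              > order by visit_count desc
              > limit 5;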

