Four ways to export data from a Hive table, and how to customize the export column delimiter
This article is reproduced from http://blog.csdn.net/niityzu/article/details/42238483
Guiding questions:
1. What are the four ways to export Hive table data?
2. What does LOCAL do in the export command, and what changes when it is omitted?
3. Can INTO be used in the export command, as it is in the import command?
4. How do you customize the column delimiter of the exported file?
5. What do hive's -e and -f options do, and how can they be used to export data?
6. What does the source command do in the hive shell?
We use table test1 to demonstrate the export methods. Its contents are as follows:
- hive> select * from test1;
- OK
- Tom 24.0 NanJing Nanjing University
- Jack 29.0 NanJing Southeast China University
- Mary Kake 21.0 SuZhou Suzhou University
- John Doe 24.0 YangZhou YangZhou University
- Bill King 23.0 XuZhou Xuzhou Normal University
- Time taken: 0.064 seconds, Fetched: 5 row(s)
1. Exporting data to the local file system
(1) Export the data
- hive> insert overwrite local directory "/home/hadoopUser/data"
- > select name,age,address
- > from test1;
/home/hadoopUser/data is a directory; one or more files will be written into it, the exact number depending on how many reducers are invoked. LOCAL here means the data is exported to the local file system; without LOCAL, it is exported to the distributed file system instead.
(2) View the exported data
- Tom^A24.0^ANanJing
- Jack^A29.0^ANanJing
- Mary Kake^A21.0^ASuZhou
- John Doe^A24.0^AYangZhou
- Bill King^A23.0^AXuZhou
A file named 000000_0 was generated in the local directory /home/hadoopUser/data, with the contents shown above. Columns are separated by ^A (octal \001).
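A file delimited by ^A can be inspected and converted with standard Unix tools. A minimal sketch (the sample line is recreated from the output above; the /tmp path is purely illustrative):

```shell
# Recreate one line of Hive's default output: columns joined by ^A (octal \001).
printf 'Tom\00124.0\001NanJing\n' > /tmp/000000_0

# Make the invisible delimiter visible (control-A is rendered as ^A):
cat -v /tmp/000000_0

# Convert ^A to tabs for easier downstream processing:
tr '\001' '\t' < /tmp/000000_0
```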
2. Exporting data to the distributed file system
(1) Export the data
- hive> insert overwrite directory "/output"
- > select name,age,address
- > from test1;
The only difference from the first method is the absence of the LOCAL keyword. The directory /output is on the distributed file system; it can also be written out in full as hdfs://master-server/output.
(2) View the exported data
The result is the same as with the first method.
Note: what happens if we change OVERWRITE to INTO, as when importing data into a table?
- hive> insert into local directory "/home/hadoopUser/data"
- > select name,age,address
- > from test1;
- NoViableAltException(85@[184:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
- at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
- at org.antlr.runtime.DFA.predict(DFA.java:116)
- at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4945)
- at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:40208)
- at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:10233)
- at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:40210)
- at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:39685)
- at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:37647)
- at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:36898)
- at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:36774)
- at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1338)
- at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
- at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
- at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
- at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
- at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
- at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
- at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
- at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
- at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
- at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
- at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
- at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
- at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
- at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
- at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
- at java.lang.reflect.Method.invoke(Method.java:606)
- at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
- FAILED: ParseException line 1:12 missing TABLE at 'local' near 'into' in table name
Unlike importing data into a Hive table, you cannot use an INSERT INTO command to export table data.
3. Exporting data from one table into another
This method is similar to one covered in another post of mine (inserting query results into a table), so it is not repeated here. Link: inserting data into a table via a query statement.
4. Using hive -e and -f to export data
(1) Exporting data with hive -e
Sometimes you want to run one or more queries (separated by semicolons) and have the hive CLI exit as soon as they finish. The -e option lets hive execute one or more statements directly from the terminal; the statements are enclosed in quotes, and the query result can be redirected to a file.
Note: this is not run inside the Hive shell; run it directly from the terminal.
You must first run "use hive" to switch to the corresponding database (here the database is named hive), otherwise an error is raised. For example:
- [hadoopUser@secondmgt ~]$ hive -e "use hive;select * from test1" > /home/hadoopUser/data/myquery
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
- 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
-
- Logging initialized using configuration in file:/home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
- OK
- Time taken: 0.597 seconds
- OK
- Time taken: 0.66 seconds, Fetched: 5 row(s)
View the exported file:
- Tom 24.0 NanJing Nanjing University
- Jack 29.0 NanJing Southeast China University
- Mary Kake 21.0 SuZhou Suzhou University
- John Doe 24.0 YangZhou YangZhou University
- Bill King 23.0 XuZhou Xuzhou Normal University
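Because the redirected output of hive -e is tab-delimited, multi-word values such as "Mary Kake" remain a single field, so the file can be processed with awk straight away. A small sketch (a simplified two-column sample recreated from the output above; the /tmp path is illustrative):

```shell
# Recreate a few tab-delimited name/age rows like those in the exported file:
printf 'Tom\t24.0\nJack\t29.0\nMary Kake\t21.0\n' > /tmp/myquery

# Average of the age column; -F'\t' keeps "Mary Kake" in a single field:
awk -F'\t' '{ sum += $2; n++ } END { printf "%.1f\n", sum / n }' /tmp/myquery
```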
Note: the default column delimiter is \t.
(2) Executing a query file with hive -f
Sometimes we put multiple query statements into a single file; hive -f executes the statement or statements in that file. By convention, such Hive query files are saved with a .q or .hql extension.
Edit query.hql with the following contents:
- use hive;
- select name,age from test1;
Run the command; the process is as follows:
- [hadoopUser@secondmgt ~]$ hive -f data/query.hql
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
- 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
-
- Logging initialized using configuration in file:/home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties
- OK
- Time taken: 0.564 seconds
- Total jobs = 1
- Launching Job 1 out of 1
- Number of reduce tasks is set to 0 since there's no reduce operator
- Starting Job = job_1419317102229_0038, Tracking URL = http://secondmgt:8088/proxy/application_1419317102229_0038/
- Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job -kill job_1419317102229_0038
- Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
- 2014-12-29 19:07:55,168 Stage-1 map = 0%, reduce = 0%
- 2014-12-29 19:08:06,671 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.93 sec
- MapReduce Total cumulative CPU time: 2 seconds 930 msec
- Ended Job = job_1419317102229_0038
- MapReduce Jobs Launched:
- Job 0: Map: 1 Cumulative CPU: 2.93 sec HDFS Read: 415 HDFS Write: 63 SUCCESS
- Total MapReduce CPU Time Spent: 2 seconds 930 msec
- OK
- Tom 24.0
- Jack 29.0
- Mary Kake 21.0
- John Doe 24.0
- Bill King 23.0
- Time taken: 27.17 seconds, Fetched: 5 row(s)
(3) Executing a script file with the source command in the Hive shell
- hive> source /home/hadoopUser/data/query.hql;
- OK
- Time taken: 0.394 seconds
- Total jobs = 1
- Launching Job 1 out of 1
- Number of reduce tasks is set to 0 since there's no reduce operator
- Starting Job = job_1419317102229_0039, Tracking URL = http://secondmgt:8088/proxy/application_1419317102229_0039/
- Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job -kill job_1419317102229_0039
- Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
- 2014-12-29 19:13:03,718 Stage-1 map = 0%, reduce = 0%
- 2014-12-29 19:13:15,459 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.7 sec
- MapReduce Total cumulative CPU time: 2 seconds 700 msec
- Ended Job = job_1419317102229_0039
- MapReduce Jobs Launched:
- Job 0: Map: 1 Cumulative CPU: 2.7 sec HDFS Read: 415 HDFS Write: 63 SUCCESS
- Total MapReduce CPU Time Spent: 2 seconds 700 msec
- OK
- Tom 24.0
- Jack 29.0
- Mary Kake 21.0
- John Doe 24.0
- Bill King 23.0
- Time taken: 27.424 seconds, Fetched: 5 row(s)
Appendix:
Customizing the export column delimiter
By default, files exported from a Hive table use ^A as the column delimiter, which is hard to read; Hive therefore allows the column delimiter to be specified. We again use the first export method above as the example.
(1) Export the table data
- hive> insert overwrite local directory "/home/hadoopUser/data"
- > row format delimited
- > fields terminated by '\t'
- > select * from test1;
Here \t is used as the column delimiter.
(2) View the export result
- Tom 24.0 NanJing Nanjing University
- Jack 29.0 NanJing Southeast China University
- Mary Kake 21.0 SuZhou Suzhou University
- John Doe 24.0 YangZhou YangZhou University
- Bill King 23.0 XuZhou Xuzhou Normal University
Compared with the first method's default ^A delimiter, this is much more readable.
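With a tab delimiter, the exported file also works directly with column-oriented tools such as cut. A minimal sketch (sample rows recreated from the result above; the /tmp path is illustrative):

```shell
# Recreate two rows of the tab-delimited export:
printf 'Tom\t24.0\tNanJing\tNanjing University\n' > /tmp/export
printf 'Mary Kake\t21.0\tSuZhou\tSuzhou University\n' >> /tmp/export

# Select just the name and city columns:
cut -f1,3 /tmp/export
```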