几种保存Hive查询结果的方法
来源:互联网 发布:对流风和穿堂风知乎 编辑:程序博客网 时间:2024/04/28 10:47
很多时候,我们需要将Hive的查询(select)结果保存起来,方便进一步处理或查看。
在Hive里面提供了不同的方式来保存查询结果,在这里做下总结:
一、保存结果到本地
方法1:调用hive标准输出,将查询结果写到指定的文件中
这个方法最为常见,笔者也经常使用。sql的查询结果将直接保存到/tmp/out.txt中$ hive -e "select user, login_timestamp from user_login" > /tmp/out.txt当然我们也可以查询保存到某个文件file.sql中,按下面的方式执行查询,并保存结果$ hive -f file.sql > /tmp/out.txt下面是file.sql的内容:$ cat file.sqlselect user, login_timestamp from user_loginhive客户的的详细使用方法可以参考hive的官方文档《Hive Batch Mode Commands》hive -e '<query-string>' executes the query string.hive -f <filepath> executes one or more SQL queries from a file.
方法2:使用INSERT OVERWRITE LOCAL DIRECTORY结果到本地
hive> insert overwrite local directory "/tmp/out/" > select user, login_time from user_login;上面的命令会将select user, login_time from user_login的查询结果保存到/tmp/out/本地目录下。
我们查看一下/tmp/out/目录下的文件,发现命令执行后,多了两个文件:
$ find /tmp/out/ -type f/tmp/out/.000000_0.crc/tmp/out/000000_0这两个文件存放的内容不一样,其中000000_0存放查询的结果,带有crc后缀的存放那个文件的crc32校验。用vim打开查看下000000_0的内容:
vim /tmp/out/000000_0 1 user_1^A20140701 2 user_2^A20140701 3 user_2^A20140701可以看到,导出的查询结果字段之间是用^A(Ctrl+A)作为分割符,行与行之间用\n作为分割。
默认的字段分割符有时候可能不太方便,幸好Hive提供了修改分割符号的方法,我们只要在导出时指定就可以了:
hive> insert overwrite local directory "/tmp/out/" > row format delimited fields terminated by "\t" > select user, login_time from user_login;$ vim /tmp/out/000000_01 user_1 201407012 user_2 201407013 user_2 20140701可以看到字段分割符已经变成了tab(人眼看起来更舒服^-^)。
同样,我们也可以指定复杂类型(collection、map)的输出格式
更多关于INSERT OVERWRITE LOCAL DIRECTORY的语法,可以参考HIVE的官方文档《Writing data into the filesystem from queries》。
Standard syntax:INSERT OVERWRITE <span style="color:#ff0000;">[LOCAL] </span>DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0) SELECT ... FROM ... Hive extension (multiple inserts):FROM from_statementINSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ... row_format : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
二、保存结果到hdfs
保存查询结果到hdfs很简单,使用INSERT OVERWRITE DIRECTORY就可以完成操作:
hive> insert overwrite directory "/tmp/out/" > row format delimited fields terminated by "\t" > select user, login_time from user_login;需要注意的是,跟保存到本地文件系统的差别是,保存到hdfs时命令不需要指定LOCAL项
更多关于INSERT OVERWRITE DIRECTORY的语法,可以参考HIVE的官方文档《Writing data into the filesystem from queries》。
三、保存结果到HIVE表
方法1、已经建好结果表,使用INSERT OVERWRITE TABLE以覆盖方式写入结果表
如果结果表已经建好,可以使用INSERT OVERWRITE TABLE将结果写入结果表:
hive> desc query_result;OKuser string,login_time biginthive> insert overwrite table query_result > select user, login_time from user_login;hive> select * from query_result; OKuser_120140701user_220140701user_320140701
当然,HIVE也提供了追加方式INSERT TABLE,可以在原有数据后面加上新的查询结果。在上面这个例子基础上,我们再追加一个查询结果:
hive> insert into table query_result > select * from query_result;hive> select * from query_result;OKuser_120140701user_220140701user_320140701<span style="color:#ff0000;">user_120140701user_220140701user_320140701</span>注意标红的部分,使用INSERT TABLE后,query_result增加了三行数据更多关于INSERT OVERWRITE TABLE的语法,可以参考HIVE官方文档《Inserting data into Hive Tables from queries》
Standard syntax:INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement; Hive extension (multiple inserts):FROM from_statementINSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2][INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;FROM from_statementINSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2][INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...; Hive extension (dynamic partition inserts):INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
方法2、如果需要新建一个表,用于存放查询结果,可以使用CREATE TABLE AS SELECT语法
hive> create table query_result > as > select user, login_time from user_login;hive> select * from query_result; OKuser_120140701user_220140701user_320140701更多关于CREATE TABLE AS SELECT的语法,可以参考HIVE官方文档《Create Table As Select (CTAS)》CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name (Note: TEMPORARY available starting with Hive 0.14.0) [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...]) [STORED AS DIRECTORIES] (Note: Only available starting with Hive 0.10.0)] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] (Note: Only available starting with Hive 0.6.0) ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] (Note: Only available starting with Hive 0.6.0) [AS select_statement] (Note: Only available starting with Hive 0.5.0, and not supported when creating external tables.) CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path] data_type : primitive_type | array_type | map_type | struct_type | union_type (Note: Only available starting with Hive 0.7.0) primitive_type : TINYINT | SMALLINT | INT | BIGINT | BOOLEAN | FLOAT | DOUBLE | STRING | BINARY (Note: Only available starting with Hive 0.8.0) | TIMESTAMP (Note: Only available starting with Hive 0.8.0) | DECIMAL (Note: Only available starting with Hive 0.11.0) | DECIMAL(precision, scale) (Note: Only available starting with Hive 0.13.0) | VARCHAR (Note: Only available starting with Hive 0.12.0) | CHAR (Note: Only available starting with Hive 0.13.0) array_type : ARRAY < data_type > map_type : MAP < primitive_type, data_type > struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ...> union_type : UNIONTYPE < data_type, data_type, ... > (Note: Only available starting with Hive 0.7.0) row_format : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13) | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)] file_format: : SEQUENCEFILE | TEXTFILE | RCFILE (Note: Only available starting with Hive 0.6.0) | ORC (Note: Only available starting with Hive 0.11.0) | AVRO (Note: Only available starting with Hive 0.14.0) | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
四、使用hdfs直接导出表
Hive是构建在hdfs上的,因此,我们可以使用hdfs的命令hadoop dfs -get直接导出表。
首先、我们先找到要导出的表存放到哪个目录下:
hive> show create table user_login;OKCREATE TABLE `user_login`( `user` string, `login_time` bigint)ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'<span style="color:#ff0000;">LOCATION 'file:/user/hive/warehouse/test.db/user_login'</span>TBLPROPERTIES ( 'totalSize'='160', 'numRows'='10', 'rawDataSize'='150', 'COLUMN_STATS_ACCURATE'='true', 'numFiles'='1', 'transient_lastDdlTime'='1411544983')Time taken: 0.174 seconds, Fetched: 18 row(s)可以看到,user_login表存放到在file:/user/hive/warehouse/test.db/user_login。
接下来,直接利用hadoop dfs -get导出到本地:
hadoop dfs -get file:/user/hive/warehouse/test.db/user_login /tmp/out/更多关于hadoop dfs -get命令,可以参考hadoop dfs命令界面文档《File System Shell》
0 0
- 几种保存Hive查询结果的方法
- 几种保存Hive查询结果的方法
- 几种保存Hive查询结果的方法
- 保存Hive查询的方法
- hive中查询星期几的方法
- 利用python将hive查询结果保存到mysql
- mysql查询结果转换为PHP数组的几种方法的区别
- C++操作SQLITE获得查询结果集的几种方法总结
- C++操作SQLITE获得查询结果集的几种方法总结
- C++操作SQLITE获得查询结果集的几种方法总结
- Hive几种参数配置的方法
- Hive学习笔记 --- 执行Hive操作的几种方法
- hive查询结果导出
- hive 查询结果压缩
- 保存图片的几种方法
- 网上图片的几种保存方法
- MATLAB 保存图片的几种方法
- 保存对象信息的几种方法
- 两个字符串的最大公共子串
- 设置打开Finder的快捷键
- IOS UITable 插入和删除的动画
- poj 1847 最短路径的dijkstra算法
- android调用照相机
- 几种保存Hive查询结果的方法
- 数字集成电路设计-13-常用模块集锦
- 如何在socket编程的Tcp连接中实现心跳协议
- 一个关于char在不同平台下 的bug
- 推荐一本新手学习使用GitHub的必读书
- 页面间值的获取
- iOS开发 UI篇 — 懒加载
- Java error: Implicit super constructor is undefined for default constructor
- 一道淘汰85%面试者的百度开发者面试题