Basic Hive Usage
Source: Internet · Editor: 程序博客网
1. Generate test data
ASCII 0x06 is used as the field delimiter; in bash the character can be typed with Ctrl+V Ctrl+F:

seq 1 9999999 | awk '{print $1"^F"$1"aaaaaaaaaaaaaaaa"}' > a.txt

2. Create the table
hive (dcenter)> create table t(id int, msg string) row format delimited fields terminated by '\006' stored as textfile;
OK
Time taken: 0.191 seconds

-- Inspect the table definition
hive (dcenter)> desc formatted t;
OK
# col_name    data_type    comment
id            int          None
msg           string       None

# Detailed Table Information
Database:        dcenter
Owner:           hc
CreateTime:      Thu Dec 26 11:00:04 CST 2013
LastAccessTime:  UNKNOWN
Protect Mode:    None
Retention:       0
Location:        hdfs://hc1:9000/hive/warehouse/dcenter.db/t
Table Type:      MANAGED_TABLE
Table Parameters:
    transient_lastDdlTime  1388026804

# Storage Information
SerDe Library:   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:     org.apache.hadoop.mapred.TextInputFormat
OutputFormat:    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:      No
Num Buckets:     -1
Bucket Columns:  []
Sort Columns:    []
Storage Desc Params:
    field.delim           \u0006
    serialization.format  \u0006
Time taken: 0.183 seconds

3. Load data into the table
-- Load from the local file system
hive (dcenter)> load data local inpath '/tmp/a.txt' overwrite into table t;
Copying data from file:/tmp/a.txt
Copying file: file:/tmp/a.txt
Loading data to table dcenter.t
Deleted hdfs://hc1:9000/hive/warehouse/dcenter.db/t
OK
Time taken: 28.181 seconds

-- Load from HDFS
hive (dcenter)> load data inpath '/tmp/a.*' overwrite into table t;
Loading data to table dcenter.t
Deleted hdfs://hc1:9000/hive/warehouse/dcenter.db/t
OK
Time taken: 0.768 seconds

4. Inspect the table file
-- Loading from HDFS actually moves the file into the warehouse directory
hive (dcenter)> dfs -ls /hive/warehouse/dcenter.db/t;
Found 1 items
-rw-r--r--   2 hc supergroup  317777760 2013-12-26 11:28 /hive/warehouse/dcenter.db/t/a.txt

5. Create an external table
-- An external table's location must be in HDFS; Hive cannot read files that exist only on the client node
hive (dcenter)> create external table t1(id int, msg string) row format delimited fields terminated by '\006' stored as textfile location '/tmp/b.txt';
OK
Time taken: 0.213 seconds
hive (dcenter)> desc formatted t1;
OK
# col_name    data_type    comment
id            int          None
msg           string       None

# Detailed Table Information
Database:        dcenter
Owner:           hc
CreateTime:      Thu Dec 26 11:37:17 CST 2013
LastAccessTime:  UNKNOWN
Protect Mode:    None
Retention:       0
Location:        hdfs://hc1:9000/tmp/b.txt
Table Type:      EXTERNAL_TABLE
Table Parameters:
    EXTERNAL               TRUE
    transient_lastDdlTime  1388029037

# Storage Information
SerDe Library:   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:     org.apache.hadoop.mapred.TextInputFormat
OutputFormat:    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:      No
Num Buckets:     -1
Bucket Columns:  []
Sort Columns:    []
Storage Desc Params:
    field.delim           \u0006
    serialization.format  \u0006
Time taken: 0.18 seconds

6. Create a partitioned table
-- A partitioned table physically stores each partition in its own directory; partitions can be created dynamically at load time
hive (dcenter)> create table t2 (id int, msg string) partitioned by (indate string) row format delimited fields terminated by '\006' stored as textfile;
OK
Time taken: 1.524 seconds
hive (dcenter)> load data local inpath '/tmp/b.txt' overwrite into table t2 partition(indate="20131001");
Copying data from file:/tmp/b.txt
Copying file: file:/tmp/b.txt
Loading data to table dcenter.t2 partition (indate=20131001)
OK
Time taken: 28.818 seconds
hive (dcenter)> load data local inpath '/tmp/b.txt' overwrite into table t2 partition(indate="20131002");
Copying data from file:/tmp/b.txt
Copying file: file:/tmp/b.txt
Loading data to table dcenter.t2 partition (indate=20131002)
OK
Time taken: 28.348 seconds

Partitions are queried through the partition column:
hive (dcenter)> select count(1) from t2 where indate>='20131001' and indate<'20131002';
Total MapReduce jobs = 1
Launching Job 1 out of 1
...

7. Export table data
hive (dcenter)> insert overwrite local directory '/tmp/t' select * from t;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...

-- Sanity-check the exported data
hive (dcenter)> !wc -l /tmp/t/000000_0;
9999 /tmp/t/000000_0
[hc@hc8 t]$ hexdump -C /tmp/t/000000_0 | head -2
00000000  31 01 31 61 61 61 61 61  61 61 61 61 61 61 61 61  |1.1aaaaaaaaaaaaa|
00000010  61 61 61 0a 32 01 32 61  61 61 61 61 61 61 61 61  |aaa.2.2aaaaaaaaa|
[hc@hc8 t]$ hexdump -C /tmp/a.txt | head -2
00000000  31 06 31 61 61 61 61 61  61 61 61 61 61 61 61 61  |1.1aaaaaaaaaaaaa|
00000010  61 61 61 0a 32 06 32 61  61 61 61 61 61 61 61 61  |aaa.2.2aaaaaaaaa|

Although the input file used 0x06 as its delimiter, the exported file uses 0x01, Hive's default column delimiter. This default can be changed.
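The delimiter swap visible in the two hexdumps can be reproduced off-cluster. A minimal Java sketch (class and method names are mine, not from the post) that rewrites a 0x06-delimited record into Hive's default 0x01 (Ctrl-A) form:

```java
public class DelimConvert {
    // Rewrite a record from the 0x06 input delimiter to Hive's default
    // output delimiter 0x01 (Ctrl-A), mirroring what the export produced.
    static String toHiveDefault(String line) {
        return line.replace('\u0006', '\u0001');
    }

    public static void main(String[] args) {
        String in = "1\u00061aaaaaaaaaaaaaaaa";   // byte 0x06 between the two fields
        String out = toHiveDefault(in);
        System.out.println((int) out.charAt(1));  // prints 1, i.e. the 0x01 byte
    }
}
```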
8. Create a view
-- Views reduce the complexity of the SQL we have to write, e.g. the following transformation
hive (dcenter)> from (select * from t where id<100 order by id asc) a
              > select id, msg where id>98;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
hive (dcenter)> create view v_t as select * from t where id<100 order by id asc;
OK
Time taken: 0.284 seconds
hive (dcenter)> select * from v_t where id>98;
OK
99    99aaaaaaaaaaaaaaaa
Time taken: 33.617 seconds

9. Using explain
hive (dcenter)> explain select count(1) from t a join t1 b on (a.id=b.id);
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME t) a) (TOK_TABREF (TOK_TABNAME t1) b) (= (. (TOK_TABLE_OR_COL a) id) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION count 1)))))

STAGE DEPENDENCIES:   -- dependencies between stages
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce   -- this stage scans tables t and t1; the map side tags each record with its source
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 0
        b
          TableScan
            alias: b
            Reduce Output Operator
              key expressions:
                    expr: id
                    type: int
              sort order: +
              Map-reduce partition columns:
                    expr: id
                    type: int
              tag: 1
      Reduce Operator Tree:   -- the reduce side merges records with the same key from the differently tagged inputs
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0
            1
          handleSkewJoin: false
          Select Operator
            Group By Operator
              aggregations:
                    expr: count(1)
              bucketGroup: false
              mode: hash
              outputColumnNames: _col0
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

  Stage: Stage-2   -- this stage aggregates the partial counts produced by Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        hdfs://hc1:9000/tmp/hive-hc/hive_2013-12-26_15-54-38_047_2645242837849245691/-mr-10002
            Reduce Output Operator
              sort order:
              tag: -1
              value expressions:
                    expr: _col0
                    type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: bigint
            outputColumnNames: _col0
            File Output Operator
              compressed: false
              GlobalTableId: 0
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0   -- fetch the result
    Fetch Operator
      limit: -1
Time taken: 0.262 seconds
(The original post illustrates the Stage-1 operations with a diagram here.)
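The plan above tags each mapped record with its source table (tag 0 for t, tag 1 for t1), shuffles by the join key, and joins on the reduce side. A toy model of that reducer step, not Hive's actual code, assuming all tag values for one join key arrive together:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSideJoin {
    // Toy model of the Stage-1 reducer: given the tags of every record
    // that shares one join key, an inner join emits one output row per
    // (tag 0, tag 1) pair; Stage-2 then sums these into count(1).
    static long reduceKey(List<Integer> tags) {
        long t0 = 0, t1 = 0;
        for (int tag : tags) {
            if (tag == 0) t0++; else t1++;
        }
        return t0 * t1;  // joined rows produced for this key
    }

    public static void main(String[] args) {
        // key seen once in t (tag 0) and twice in t1 (tag 1) -> 2 joined rows
        System.out.println(reduceKey(Arrays.asList(0, 1, 1)));  // prints 2
    }
}
```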
10. Alter the table schema
-- Data to load; b.txt has one extra column compared with a.txt (the 0x06 delimiter is invisible in the terminal output)
[root@hc8 tmp]# cat a.txt
11aaaaaaaaaaaaaaaa
22aaaaaaaaaaaaaaaa
[root@hc8 tmp]# cat b.txt
11aaaaaaaaaaaaaaaa6
22aaaaaaaaaaaaaaaa7

-- Create a partitioned table
hive (dcenter)> create table t (id int, msg string) partitioned by (indate string) row format delimited fields terminated by '\006' stored as textfile;
OK
Time taken: 0.453 seconds

-- Load one partition
hive (dcenter)> load data local inpath '/tmp/a.txt' overwrite into table t partition(indate="1001");
Copying data from file:/tmp/a.txt
Copying file: file:/tmp/a.txt
Loading data to table dcenter.t partition (indate=1001)
OK
Time taken: 1.111 seconds

-- Query the data
hive (dcenter)> select * from t;
OK
1    1aaaaaaaaaaaaaaaa    1001
2    2aaaaaaaaaaaaaaaa    1001
Time taken: 0.408 seconds

-- Alter the table schema
hive (dcenter)> alter table t add columns (id2 int);
OK
Time taken: 0.371 seconds

-- Load another partition
hive (dcenter)> load data local inpath '/tmp/b.txt' overwrite into table t partition(indate="1002");
Copying data from file:/tmp/b.txt
Copying file: file:/tmp/b.txt
Loading data to table dcenter.t partition (indate=1002)
OK
Time taken: 0.918 seconds

-- Query again: rows loaded before the column was added return NULL for it
hive (dcenter)> select * from t;
OK
1    1aaaaaaaaaaaaaaaa    NULL    1001
2    2aaaaaaaaaaaaaaaa    NULL    1001
1    1aaaaaaaaaaaaaaaa    6       1002
2    2aaaaaaaaaaaaaaaa    7       1002
Time taken: 0.436 seconds

11. User-defined functions
1) Write a function class that extends UDF; Hive calls its evaluate method when the function is invoked
package com.zy.hive.function;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDF;

public class range extends UDF {
    // Build the IP lookup table
    private static List<IpRange> ipLib = new ArrayList<IpRange>();
    static {
        for (int i = 0; i < 5695104; i++) {
            ipLib.add(new IpRange(i, i + 5, "USA" + i));
        }
    }

    // Called by Hive for each row
    public String evaluate(int ip) {
        IpRange ir;
        for (int i = 0; i < ipLib.size(); i++) {
            ir = ipLib.get(i);
            if (ip >= ir.getStartip() && ip <= ir.getEndip()) {
                return ir.getCountry();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        range a = new range();
        for (int i = 0; i < 100; i++)
            System.out.println(a.evaluate(2));
    }
}

// IP range holder
class IpRange {
    private int startip;
    private int endip;
    private String country;

    public IpRange(int startip, int endip, String country) {
        this.startip = startip;
        this.endip = endip;
        this.country = country;
    }

    public int getStartip() { return startip; }
    public void setStartip(int startip) { this.startip = startip; }
    public int getEndip() { return endip; }
    public void setEndip(int endip) { this.endip = endip; }
    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}

2) Export the jar file
3) Copy it to a directory on the Hive client node
4) Load the jar file
hive (dcenter)> add jar /tmp/lihm_udf1.jar;
Added /tmp/lihm_udf1.jar to class path
Added resource: /tmp/lihm_udf1.jar

5) Create a temporary function
hive (dcenter)> create temporary function findip as 'com.zy.hive.function.range';
OK
Time taken: 5.669 seconds

6) Call the function
hive (dcenter)> select id, findip(id) from t;
1    USA0
2    USA0
1    USA0
2    USA0
Time taken: 28.983 seconds

7) Drop the function
hive (dcenter)> drop temporary function findip;
OK
Time taken: 0.013 seconds
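Note that evaluate() above scans roughly 5.7 million ranges per call. If the ranges are kept sorted by start IP and do not overlap (an assumption; the demo ranges above do overlap), the lookup can be a binary search instead. A hypothetical sketch, not part of the original UDF:

```java
public class RangeLookup {
    // Binary-search replacement for the linear scan in evaluate():
    // find the index of the range containing ip, or -1 if none does.
    // Assumes starts[] is sorted ascending and ranges do not overlap.
    static int find(int[] starts, int[] ends, int ip) {
        int lo = 0, hi = starts.length - 1, hit = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= ip) { hit = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        // hit is the last range whose start <= ip; verify containment
        return (hit >= 0 && ip <= ends[hit]) ? hit : -1;
    }

    public static void main(String[] args) {
        int[] starts = {0, 10, 20};
        int[] ends   = {5, 15, 25};
        System.out.println(find(starts, ends, 12));  // prints 1
    }
}
```

This drops the per-row cost from O(n) to O(log n), which matters when the UDF runs once per input record.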
12. Using the Hive API to operate on a Hive instance
package com.zy.hive.function;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveServerTest {
    public static void main(String[] args) throws Exception {
        String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
        Class.forName(driverName);
        Connection conn = DriverManager.getConnection("jdbc:hive://192.168.1.1:10000/dcenter", "", "");
        Statement stmt = conn.createStatement();
        stmt.executeQuery("use dcenter");

        // Query records from a table
        ResultSet rst = stmt.executeQuery("select id,msg from t");
        while (rst.next()) {
            System.out.println("id:" + rst.getInt(1) + " msg:" + rst.getString(2));
        }

        // Show tables
        rst = stmt.executeQuery("show tables");
        while (rst.next()) {
            System.out.println(rst.getString(1) + "\t");
        }
        conn.close();
    }
}