转载:Hive的安装以及使用
来源:互联网 发布:萧山网络问政平台下载 编辑:程序博客网 时间:2024/06/04 19:12
目录
- Hive的安装
- Hive的基本使用:CRUD
- Hive交互式模式
- 数据导入
- 数据导出
- Hive查询HiveQL
- Hive视图
- Hive分区表
1. Hive的安装
系统环境
装好hadoop的环境后,我们可以把Hive装在namenode机器上(c1)。
hadoop的环境,请参考:让Hadoop跑在云端系列文章,RHadoop实践系列之一:Hadoop环境搭建
下载: hive-0.9.0.tar.gz
解压到: /home/cos/toolkit/hive-0.9.0
hive配置
~ cd /home/cos/toolkit/hive-0.9.0~ cp hive-default.xml.template hive-site.xml~ cp hive-log4j.properties.template hive-log4j.properties
修改hive-site.xml配置文件
把Hive的元数据存储到MySQL中
~ vi conf/hive-site.xmljavax.jdo.option.ConnectionURLjdbc:mysql://c1:3306/hive_metadata?createDatabaseIfNotExist=trueJDBC connect string for a JDBC metastorejavax.jdo.option.ConnectionDriverNamecom.mysql.jdbc.DriverDriver class name for a JDBC metastorejavax.jdo.option.ConnectionUserNamehiveusername to use against metastore databasejavax.jdo.option.ConnectionPasswordhivepassword to use against metastore databasehive.metastore.warehouse.dir/user/hive/warehouselocation of default database for the warehouse
修改hive-log4j.properties
#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounterlog4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
设置环境变量
~ sudo vi /etc/environmentPATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/cos/toolkit/ant184/bin:/home/cos/toolkit/jdk16/bin:/home/cos/toolkit/maven3/bin:/home/cos/toolkit/hadoop-1.0.3/bin:/home/cos/toolkit/hive-0.9.0/bin"JAVA_HOME=/home/cos/toolkit/jdk16ANT_HOME=/home/cos/toolkit/ant184MAVEN_HOME=/home/cos/toolkit/maven3HADOOP_HOME=/home/cos/toolkit/hadoop-1.0.3HIVE_HOME=/home/cos/toolkit/hive-0.9.0CLASSPATH=/home/cos/toolkit/jdk16/lib/dt.jar:/home/cos/toolkit/jdk16/lib/tools.jar
在hdfs上面,创建目录
$HADOOP_HOME/bin/hadoop fs -mkidr /tmp$HADOOP_HOME/bin/hadoop fs -mkidr /user/hive/warehouse$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
在MySQL中创建数据库
create database hive_metadata;grant all on hive_metadata.* to hive@'%' identified by 'hive';grant all on hive_metadata.* to hive@localhost identified by 'hive';ALTER DATABASE hive_metadata CHARACTER SET latin1;
手动上传mysql的jdbc库到hive/lib
~ ls /home/cos/toolkit/hive-0.9.0/libmysql-connector-java-5.1.22-bin.jar
启动hive
#启动metastore服务~ bin/hive --service metastore &Starting Hive Metastore Server#启动hiveserver服务~ bin/hive --service hiveserver &Starting Hive Thrift Server#启动hive客户端~ bin/hive shellLogging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.propertiesHive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txthive> show tablesOK
查询MySQL数据库中的元数据
~ mysql -uroot -pmysql> use hive_metadata;Database changedmysql> show tables;+-------------------------+| Tables_in_hive_metadata |+-------------------------+| BUCKETING_COLS || CDS || COLUMNS_V2 || DATABASE_PARAMS || DBS || IDXS || INDEX_PARAMS || PARTITIONS || PARTITION_KEYS || PARTITION_KEY_VALS || PARTITION_PARAMS || PART_COL_PRIVS || PART_PRIVS || SDS || SD_PARAMS || SEQUENCE_TABLE || SERDES || SERDE_PARAMS || SORT_COLS || TABLE_PARAMS || TBLS || TBL_COL_PRIVS || TBL_PRIVS |+-------------------------+23 rows in set (0.00 sec)
Hive已经成功安装,下面是hive的使用攻略。
2. Hive的基本使用
1. 进入hive控制台
~ cd /home/cos/toolkit/hive-0.9.0~ bin/hive shellLogging initialized using configuration in file:/home/cos/toolkit/hive-0.9.0/conf/hive-log4j.propertiesHive history file=/tmp/cos/hive_job_log_cos_201307160003_95040367.txthive>
新建表
#创建数据(文本以tab分隔)~ vi /home/cos/demo/t_hive.txt16 2 361 12 1341 2 3117 21 371 2 311 12 3411 2 34#创建新表hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';OKTime taken: 0.489 seconds#导入数据t_hive.txt到t_hive表hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive ;Copying data from file:/home/cos/demo/t_hive.txtCopying file: file:/home/cos/demo/t_hive.txtLoading data to table default.t_hiveDeleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hiveOKTime taken: 0.397 seconds
查看表和数据
#查看表 hive> show tables;OKt_hiveTime taken: 0.099 seconds#正则匹配表名hive>show tables '*t*';OKt_hiveTime taken: 0.065 seconds#查看表数据hive> select * from t_hive;OK16 2 361 12 1341 2 3117 21 371 2 311 12 3411 2 34Time taken: 0.264 seconds#查看表结构hive> desc t_hive;OKa intb intc intTime taken: 0.1 seconds
修改表
#增加一个字段hive> ALTER TABLE t_hive ADD COLUMNS (new_col String);OKTime taken: 0.186 secondshive> desc t_hive;OKa intb intc intnew_col stringTime taken: 0.086 seconds#重命令表名~ ALTER TABLE t_hive RENAME TO t_hadoop;OKTime taken: 0.45 secondshive> show tables;OKt_hadoopTime taken: 0.07 seconds
删除表
hive> DROP TABLE t_hadoop;OKTime taken: 0.767 secondshive> show tables;OKTime taken: 0.064 seconds
3. Hive交互式模式
- quit,exit: 退出交互式shell
- reset: 重置配置为默认值
- set = : 修改特定变量的值(如果变量名拼写错误,不会报错)
- set : 输出用户覆盖的hive配置变量
- set -v : 输出所有Hadoop和Hive的配置变量
- add FILE[S] *, add JAR[S] *, add ARCHIVE[S] * : 添加 一个或多个 file, jar, archives到分布式缓存
- list FILE[S], list JAR[S], list ARCHIVE[S] : 输出已经添加到分布式缓存的资源。
- list FILE[S] *, list JAR[S] *,list ARCHIVE[S] * : 检查给定的资源是否添加到分布式缓存
- delete FILE[S] *,delete JAR[S] *,delete ARCHIVE[S] * : 从分布式缓存删除指定的资源
- ! : 从Hive shell执行一个shell命令
- dfs : 从Hive shell执行一个dfs命令
- : 执行一个Hive 查询,然后输出结果到标准输出
- source FILE : 在CLI里执行一个hive脚本文件
4. 数据导入
还以刚才的t_hive为例。
#创建表结构hive> CREATE TABLE t_hive (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
从操作本地文件系统加载数据(LOCAL)
hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hive.txt' OVERWRITE INTO TABLE t_hive ;Copying data from file:/home/cos/demo/t_hive.txtCopying file: file:/home/cos/demo/t_hive.txtLoading data to table default.t_hiveDeleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hiveOKTime taken: 0.612 seconds#在HDFS中查找刚刚导入的数据~ hadoop fs -cat /user/hive/warehouse/t_hive/t_hive.txt16 2 361 12 1341 2 3117 21 371 2 311 12 3411 2 34
从HDFS加载数据
创建表t_hive2hive> CREATE TABLE t_hive2 (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';#从HDFS加载数据hive> LOAD DATA INPATH '/user/hive/warehouse/t_hive/t_hive.txt' OVERWRITE INTO TABLE t_hive2;Loading data to table default.t_hive2Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2OKTime taken: 0.325 seconds#查看数据hive> select * from t_hive2;OK16 2 361 12 1341 2 3117 21 371 2 311 12 3411 2 34Time taken: 0.287 seconds
从其他表导入数据
hive> INSERT OVERWRITE TABLE t_hive2 SELECT * FROM t_hive ;Total MapReduce jobs = 2Launching Job 1 out of 2Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_201307131407_0002, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0002Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0002Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02013-07-16 10:32:41,979 Stage-1 map = 0%, reduce = 0%2013-07-16 10:32:48,034 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec2013-07-16 10:32:49,050 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec2013-07-16 10:32:50,068 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec2013-07-16 10:32:51,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec2013-07-16 10:32:52,093 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec2013-07-16 10:32:53,102 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.03 sec2013-07-16 10:32:54,112 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.03 secMapReduce Total cumulative CPU time: 1 seconds 30 msecEnded Job = job_201307131407_0002Ended Job = -314818888, job is filtered out (removed at runtime).Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-32-31_323_5732404975764014154/-ext-10000Loading data to table default.t_hive2Deleted hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hive2Table default.t_hive2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]7 Rows loaded to t_hive2MapReduce Jobs Launched:Job 0: Map: 1 Cumulative CPU: 1.03 sec HDFS Read: 273 HDFS Write: 56 SUCCESSTotal MapReduce CPU Time Spent: 1 seconds 30 msecOKTime taken: 23.227 secondshive> select * from t_hive2;OK16 2 361 12 1341 2 3117 21 371 2 311 12 3411 2 34Time taken: 0.134 seconds
创建表并从其他表导入数据
#删除表hive> DROP TABLE t_hive;#创建表并从其他表导入数据hive> CREATE TABLE t_hive AS SELECT * FROM t_hive2 ;Total MapReduce jobs = 2Launching Job 1 out of 2Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_201307131407_0003, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0003Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0003Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02013-07-16 10:36:48,612 Stage-1 map = 0%, reduce = 0%2013-07-16 10:36:54,648 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec2013-07-16 10:36:55,657 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec2013-07-16 10:36:56,666 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec2013-07-16 10:36:57,673 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec2013-07-16 10:36:58,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec2013-07-16 10:36:59,691 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.13 secMapReduce Total cumulative CPU time: 1 seconds 130 msecEnded Job = job_201307131407_0003Ended Job = -670956236, job is filtered out (removed at runtime).Moving data to: hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10001Moving data to: hdfs://c1.wtmart.com:9000/user/hive/warehouse/t_hiveTable default.t_hive stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 56, raw_data_size: 0]7 Rows loaded to hdfs://c1.wtmart.com:9000/tmp/hive-cos/hive_2013-07-16_10-36-39_986_1343249562812540343/-ext-10000MapReduce Jobs Launched:Job 0: Map: 1 Cumulative CPU: 1.13 sec HDFS Read: 272 HDFS Write: 56 SUCCESSTotal MapReduce CPU Time Spent: 1 seconds 130 msecOKTime taken: 20.13 secondshive> select * from t_hive;OK16 2 361 12 1341 2 3117 21 371 2 311 12 3411 2 34Time taken: 0.109 seconds
仅复制表结构不导数据
hive> CREATE TABLE t_hive3 LIKE t_hive;hive> select * from t_hive3;OKTime taken: 0.077 seconds
从MySQL数据库导入数据
我们将在介绍Sqoop时讲。
5. 数据导出
从HDFS复制到HDFS其他位置
~ hadoop fs -cp /user/hive/warehouse/t_hive /~ hadoop fs -ls /t_hiveFound 1 items-rw-r--r-- 1 cos supergroup 56 2013-07-16 10:41 /t_hive/000000_0~ hadoop fs -cat /t_hive/000000_016236112134123117213712311123411234
通过Hive导出到本地文件系统
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/t_hive' SELECT * FROM t_hive;Total MapReduce jobs = 1Launching Job 1 out of 1Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_201307131407_0005, Tracking URL = http://c1.wtmart.com:50030/jobdetails.jsp?jobid=job_201307131407_0005Kill Command = /home/cos/toolkit/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=hdfs://c1.wtmart.com:9001 -kill job_201307131407_0005Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02013-07-16 10:46:24,774 Stage-1 map = 0%, reduce = 0%2013-07-16 10:46:30,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec2013-07-16 10:46:31,833 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec2013-07-16 10:46:32,844 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec2013-07-16 10:46:33,856 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec2013-07-16 10:46:34,865 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec2013-07-16 10:46:35,873 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.87 sec2013-07-16 10:46:36,884 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.87 secMapReduce Total cumulative CPU time: 870 msecEnded Job = job_201307131407_0005Copying data to local directory /tmp/t_hiveCopying data to local directory /tmp/t_hive7 Rows loaded to /tmp/t_hiveMapReduce Jobs Launched:Job 0: Map: 1 Cumulative CPU: 0.87 sec HDFS Read: 271 HDFS Write: 56 SUCCESSTotal MapReduce CPU Time Spent: 870 msecOKTime taken: 23.369 seconds#查看本地操作系统hive> ! cat /tmp/t_hive/000000_0;hive> 16236112134123117213712311123411234
6. Hive查询HiveQL
注:以下代码将去掉map,reduce的日志输出部分。
普通查询:排序,列别名,嵌套子查询
hive> FROM ( > SELECT b,c as c2 FROM t_hive > ) t > SELECT t.b, t.c2 > WHERE b>2 > LIMIT 2;12 1321 3
连接查询:JOIN
hive> SELECT t1.a,t1.b,t2.a,t2.b > FROM t_hive t1 JOIN t_hive2 t2 on t1.a=t2.a > WHERE t1.c>10;1 12 1 1211 2 11 241 2 41 261 12 61 1271 2 71 2
聚合查询1:count, avg
hive> SELECT count(*), avg(a) FROM t_hive;7 31.142857142857142
聚合查询2:count, distinct
hive> SELECT count(DISTINCT b) FROM t_hive;3
聚合查询3:GROUP BY, HAVING
#GROUP BYhive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c16.0 2 356.0 2 6211.0 2 3461.0 12 131.0 12 3417.0 21 3#HAVINGhive> SELECT avg(a),b,sum(c) FROM t_hive GROUP BY b,c HAVING sum(c)>3056.0 2 6211.0 2 341.0 12 34
7. Hive视图
Hive视图和数据库视图的概念是一样的,我们还以t_hive为例。
hive> CREATE VIEW v_hive AS SELECT a,b FROM t_hive where c>30;hive> select * from v_hive;41 271 21 1211 2
删除视图
hive> DROP VIEW IF EXISTS v_hive;OKTime taken: 0.495 seconds
8. Hive分区表
分区表是数据库的基本概念,但很多时候数据量不大,我们完全用不到分区表。Hive是一种OLAP数据仓库软件,涉及的数据量是非常大的,所以分区表在这个场景就显得非常重要!!
下面我们重新定义一个数据表结构:t_hft
创建数据
~ vi /home/cos/demo/t_hft_20130627.csv000001,092023,9.76000002,091947,8.99000004,092002,9.79000005,091514,2.2000001,092008,9.70000001,092059,9.45~ vi /home/cos/demo/t_hft_20130628.csv000001,092023,9.76000002,091947,8.99000004,092002,9.79000005,091514,2.2000001,092008,9.70000001,092059,9.45
创建数据表
DROP TABLE IF EXISTS t_hft;CREATE TABLE t_hft(SecurityID STRING,tradeTime STRING,PreClosePx DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
创建分区数据表
根据业务:按天和股票ID进行分区设计
DROP TABLE IF EXISTS t_hft;CREATE TABLE t_hft(SecurityID STRING,tradeTime STRING,PreClosePx DOUBLE) PARTITIONED BY (tradeDate INT)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
导入数据
#20130627hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130627.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130627);Copying data from file:/home/cos/demo/t_hft_20130627.csvCopying file: file:/home/cos/demo/t_hft_20130627.csvLoading data to table default.t_hft partition (tradedate=20130627)#20130628hive> LOAD DATA LOCAL INPATH '/home/cos/demo/t_hft_20130628.csv' OVERWRITE INTO TABLE t_hft PARTITION (tradeDate=20130628);Copying data from file:/home/cos/demo/t_hft_20130628.csvCopying file: file:/home/cos/demo/t_hft_20130628.csvLoading data to table default.t_hft partition (tradedate=20130628)
查看分区表
hive> SHOW PARTITIONS t_hft;tradedate=20130627tradedate=20130628Time taken: 0.082 seconds
查询数据
hive> select * from t_hft where securityid='000001';000001 092023 9.76 20130627000001 092008 9.7 20130627000001 092059 9.45 20130627000001 092023 9.76 20130628000001 092008 9.7 20130628000001 092059 9.45 20130628hive> select * from t_hft where tradedate=20130627 and PreClosePx<9;000002 091947 8.99 20130627000005 091514 2.2 20130627
Hive基于使用完成,这些都是日常的操作。后面我会继续讲一下,HiveQL优化及Hive的运维。
- 转载:Hive的安装以及使用
- hive的安装以及简单的使用
- hadoop集群安装以及Hive、sqoop的使用
- hive与mysql安装 以及使用
- Hive的安装与使用
- Hive的安装与使用
- hive的安装使用总结
- [转载]hive的查询注意事项以及优…
- hive安装以及配置
- Hive语言的使用,以及性能优化
- 数据仓库以及Hive的使用总结
- Hadoop详解(七)——Hive的原理和安装配置和UDF,flume的安装和配置以及简单使用,flume+hive+Hadoop进行日志处理
- hive 使用TEZ的安装配置
- hive的安装与简单使用
- hive简介以及安装配置
- hive安装过程中遇到的一些问题以及解决方案
- Hive的安装配置和连接mysql以及基础操作
- mysql以及hive安装以及部署
- CSS浮动 (比较详细、生动、经典)
- 提高数据库查询性能浅谈
- 设计模式读书笔记-----原型模式
- 第10,11周 项目一(2)
- html复习第七天 京东首页布局完成
- 转载:Hive的安装以及使用
- OS X 快捷键——上篇
- 3D 标签云
- 一个主机两个域名对应两个网站
- RecycleView初尝试
- 设计模式读书笔记-----单例模式
- 如何利用github打造博客专属域名
- 选择排序C++实现
- 中文字符集编码Unicode ,gb2312 , cp936 ,GBK,GB18030