Hive 0.13.1 Installation Tutorial on Linux


       Hive can be viewed as a user-friendly interface layered on top of Hadoop and HDFS. That interface comes in several flavors, including the command-line shell, a Web UI, and JDBC/ODBC (all covered in this article). Hive's installation therefore depends on Hadoop. The sections below describe how to download, install, configure, and use Hive.

I. Hive Installation and Configuration

1. Prerequisites

       Hadoop is already installed on Ubuntu (I installed Hadoop 2.4.0 on Ubuntu 13.10).

2. Download the Hive Package

       The latest Hive release at the time of writing is 0.13.1; it can be downloaded from the Apache site at the link below:

       http://apache.fayea.com/apache-mirror/hive/

       The link provides two packages:

       apache-hive-0.13.1-bin.tar.gz: the binary release, already compiled; just download and extract it, and it is ready to use.

       apache-hive-0.13.1-src.tar.gz: the source release, which must be compiled (with Maven) before it can be used.

       Download the binary release and extract it into the directory where your Hadoop lives (any directory actually works, but since Hive belongs with Hadoop, keeping them together makes management easier; mine is under /opt, so in the paths below replace /opt with wherever your Hive lives).

        Extract: tar -zxvf apache-hive-0.13.1-bin.tar.gz -C /opt

        Rename it: mv /opt/apache-hive-0.13.1-bin /opt/hive-0.13.1

3. Configure the System Environment Variables in /etc/profile or /root/.bashrc

        export HIVE_HOME=/opt/hive-0.13.1

        export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf

        Run source /etc/profile to make the new settings take effect.

4. Configure Hive

        Hive's configuration files live under $HIVE_HOME/conf, which contains four default configuration templates:

          hive-default.xml.template               the default configuration template
          hive-env.sh.template                    defaults for hive-env.sh
          hive-exec-log4j.properties.template     defaults for exec logging
          hive-log4j.properties.template          defaults for logging
        Hive can run without any of these being modified: by default, the metastore is kept in an embedded Derby database. Most people are not familiar with Derby, though, so we will switch to MySQL for the metastore; we also want to change where data and logs are stored. All of this means configuring our own environment, as described below.
        (1) Create the configuration files by copying the default templates and editing the copies; user-defined settings override the defaults.
         cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
       cp $HIVE_HOME/conf/hive-env.sh.template $HIVE_HOME/conf/hive-env.sh
       cp $HIVE_HOME/conf/hive-exec-log4j.properties.template $HIVE_HOME/conf/hive-exec-log4j.properties
       cp $HIVE_HOME/conf/hive-log4j.properties.template $HIVE_HOME/conf/hive-log4j.properties

        (2) Edit hive-env.sh
         vi $HIVE_HOME/conf/hive-env.sh 
       export HADOOP_HOME=/opt/hadoop-2.4.0
       export HIVE_CONF_DIR=/opt/hive-0.13.1/conf

        (3) Edit hive-log4j.properties
          mkdir $HIVE_HOME/logs
        vi $HIVE_HOME/conf/hive-log4j.properties
        hive.log.dir=/opt/hive-0.13.1/logs

        (4) Edit hive-site.xml
         vi $HIVE_HOME/conf/hive-site.xml
     <configuration>
     <property>
       <name>hive.metastore.warehouse.dir</name>
       <value>/hive/warehouse</value>
     </property>
     <property>
       <name>hive.exec.scratchdir</name>
       <value>/hive/scratchdir</value>
     </property>
     <property>
       <name>hive.querylog.location</name>
       <value>/opt/hive-0.13.1/logs</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionDriverName</name>
       <value>com.mysql.jdbc.Driver</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionUserName</name>
       <value>root</value>
     </property>
     <property>
       <name>javax.jdo.option.ConnectionPassword</name>
       <value>123456</value>
     </property>
     <property>
       <name>hive.aux.jars.path</name>
       <value>file:///opt/hive/lib/hive-hbase-handler-0.13.1.jar,file:///opt/hive/lib/hbase-client-0.98.2-hadoop2.jar,file:///opt/hive/lib/hbase-common-0.98.2-hadoop2.jar,file:///opt/hive/lib/hbase-common-0.98.2-hadoop2-tests.jar,file:///opt/hive/lib/hbase-protocol-0.98.2-hadoop2.jar,file:///opt/hive/lib/hbase-server-0.98.2-hadoop2.jar,file:///opt/hive/lib/htrace-core-2.04.jar,file:///opt/hive/lib/zookeeper-3.4.6.jar,file:///opt/hive/lib/protobuf-java-2.5.0.jar,file:///opt/hive/lib/guava-12.0.1.jar</value>
     </property>
     </configuration>

        Here is an explanation of some of the important configuration properties:
         hive.metastore.warehouse.dir: the HDFS directory where Hive stores its data; default: /user/hive/warehouse
         hive.exec.scratchdir: the HDFS directory for Hive's temporary data; default: /tmp/hive-${user.name} (both directories can be created up front, as sketched below)
         javax.jdo.option.ConnectionURL: the JDBC connection string of the metastore database
         javax.jdo.option.ConnectionDriverName: the class name of the JDBC driver
         hive.aux.jars.path: the jars required for HBase integration; must be set when integrating with HBase
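
         As a convenience, the two HDFS directories that these properties point to can be created ahead of time. A quick sketch (the paths match the hive-site.xml values above; adjust them if you changed the configuration):

         hadoop fs -mkdir -p /hive/warehouse
         hadoop fs -mkdir -p /hive/scratchdir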
5. Copy the HBase Libraries into Hive's lib Directory
         cp $HBASE_HOME/lib/hbase-* $HIVE_HOME/lib/
         cp $HBASE_HOME/lib/htrace-core-2.04.jar $HIVE_HOME/lib/
         cp $HBASE_HOME/lib/zookeeper-3.4.6.jar $HIVE_HOME/lib/
         cp $HBASE_HOME/lib/protobuf-java-2.5.0.jar $HIVE_HOME/lib/
         cp $HBASE_HOME/lib/guava-12.0.1.jar $HIVE_HOME/lib/
6. Add the MySQL JDBC Driver
         By default Hive stores its metadata in Derby and ships with the Derby database and its driver jar, but since we switched to MySQL we also need the MySQL JDBC driver.
         Download link: http://dev.mysql.com/downloads/connector/j/5.0.html
          Extract: tar -zxvf mysql-connector-java-5.1.32.tar.gz
          Copy the driver jar into $HIVE_HOME/lib: cp mysql-connector-java-5.1.32-bin.jar $HIVE_HOME/lib
7. Install and Configure MySQL
          If MySQL is not installed yet, install it first.
          Install MySQL: sudo apt-get install mysql-server
          During installation, set the user to root and the password to 123456, matching the hive-site.xml configuration above.
          After installation only the root user exists. Next, create a user for the Hive system and grant it privileges, as follows:
          (1) Create the user
           CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
          (2) Grant privileges
           GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
          (3) Flush the privilege tables
           flush privileges;
           In addition, to let remote users access MySQL, edit /etc/mysql/my.cnf and comment out the bind-address line; that setting restricts access to local connections only.
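           For reference, the relevant line in /etc/mysql/my.cnf looks like this (127.0.0.1 is the stock default; prefix the line with # to disable it):

           # bind-address = 127.0.0.1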
           After configuring, restart MySQL: sudo /etc/init.d/mysql restart
           At this point, the Hive configuration is complete.

II. Hive Shell

           1. Create test data and the warehouse directory

          

vi /opt/hive-0.13.1/testdata001.dat
luffy,20
zero,21

hadoop fs -mkdir -p /hive/warehouse


           2. Test Hive from the shell
          
root@Ubuntu-Kylin:/opt/hive# hive
14/08/28 21:26:13 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/opt/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
test1
test2
test3
Time taken: 1.375 seconds, Fetched: 4 row(s)
hive> create database test4;
OK
Time taken: 0.946 seconds
hive> show databases;
OK
default
test1
test2
test3
test4
Time taken: 0.043 seconds, Fetched: 5 row(s)
hive> use test4;
OK
Time taken: 0.047 seconds
hive> create table testtable (name string,age int) row format delimited fields terminated by ',' stored as textfile;
OK
Time taken: 0.813 seconds
hive> show tables;
OK
testtable
Time taken: 0.056 seconds, Fetched: 1 row(s)
hive> load data local inpath '/opt/hive/testdata001.dat' overwrite into table testtable;
Copying data from file:/opt/hive/testdata001.dat
Copying file: file:/opt/hive/testdata001.dat
Loading data to table test4.testtable
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://localhost:9000/hive/warehouse/test4.db/testtable
Table test4.testtable stats: [numFiles=1, numRows=0, totalSize=17, rawDataSize=0]
OK
Time taken: 2.532 seconds
hive> select * from testtable;
OK
luffy    20
zero    21
Time taken: 1.061 seconds, Fetched: 2 row(s)
           At this point, the Hive test has passed.
           3. Hive to HBase (importing Hive table data into HBase)
          
hive> use test4;
OK
Time taken: 0.043 seconds
hive> create table hive2hbase_1(key string,value int) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hive2hbase_1");
OK
Time taken: 6.56 seconds
hive> show tables;
OK
hive2hbase_1
testtable
Time taken: 0.049 seconds, Fetched: 2 row(s)
           4. Import the data from testtable into hive2hbase_1; it is synced to HBase automatically
hive> insert overwrite table hive2hbase_1 select * from testtable;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1409199368386_0002, Tracking URL = http://Ubuntu-Kylin:8088/proxy/application_1409199368386_0002/
Kill Command = /opt/hadoop-2.4.0/bin/hadoop job  -kill job_1409199368386_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-08-28 22:05:00,025 Stage-0 map = 0%,  reduce = 0%
2014-08-28 22:05:34,630 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 6.74 sec
MapReduce Total cumulative CPU time: 6 seconds 740 msec
Ended Job = job_1409199368386_0002
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 8.11 sec   HDFS Read: 242 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 110 msec
OK
Time taken: 89.487 seconds
hive> select * from hive2hbase_1;
OK
luffy    20
zero    21
Time taken: 0.79 seconds, Fetched: 2 row(s)
           5. Connect to HBase via its shell and check that the data from Hive has arrived
root@Ubuntu-Kylin:/opt/hive# hbase shell
2014-08-28 22:34:47,023 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,240 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,322 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,395 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-08-28 22:34:47,464 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014
hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase-0.98.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hbase2hive
hive2hbase
hive2hbase_1
student
teacher
test
6 row(s) in 5.5730 seconds

=> ["hbase2hive", "hive2hbase", "hive2hbase_1", "student", "teacher", "test"]
hbase(main):002:0> scan 'hive2hbase_1'
ROW                   COLUMN+CELL
 luffy                column=cf1:val, timestamp=1409234737169, value=20
 zero                 column=cf1:val, timestamp=1409234737169, value=21
2 row(s) in 0.8880 seconds
           At this point, the Hive-to-HBase flow works correctly.
           6. HBase to Hive (importing HBase table data into Hive)
       (1) Create the table hbase2hive_1 in HBase
hbase(main):003:0> create 'hbase2hive_1','name','age'
0 row(s) in 1.1060 seconds

=> Hbase::Table - hbase2hive_1
hbase(main):004:0> put 'hbase2hive_1','lucy','age','19'
0 row(s) in 0.2000 seconds
hbase(main):005:0> put 'hbase2hive_1','lazi','age','20'
0 row(s) in 0.0160 seconds
hbase(main):006:0> scan 'hbase2hive_1'
ROW                   COLUMN+CELL
 lazi                 column=age:, timestamp=1409236970365, value=20
 lucy                 column=age:, timestamp=1409236934766, value=19
2 row(s) in 0.0430 seconds
           (2) Create a Hive table linked to the HBase table
root@Ubuntu-Kylin:/opt/hive# hive
14/08/28 22:45:12 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/opt/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
test1
test2
test4
Time taken: 1.495 seconds, Fetched: 4 row(s)
hive> use test4;
OK
Time taken: 0.048 seconds
hive> show tables;
OK
hive2hbase_1
testtable
Time taken: 0.059 seconds, Fetched: 2 row(s)
hive> create external table hbase2hive_1(key string,value map<string,string>) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "age:") TBLPROPERTIES ("hbase.table.name" = "hbase2hive_1");
OK
Time taken: 0.385 seconds
hive> select * from hbase2hive_1;
OK
lazi    {"":"20"}
lucy    {"":"19"}
Time taken: 0.661 seconds, Fetched: 2 row(s)
           At this point, the HBase-to-Hive test has passed.

III. Hive Web Interface (Web UI)

           To use Hive's web interface, you need to add the following to hive-site.xml:
           <property>
           <name>hive.hwi.war.file</name>
           <value>lib/hive-hwi-0.13.1.war</value>
           <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
           </property>

           <property>
           <name>hive.hwi.listen.host</name>
           <value>0.0.0.0</value>
           <description>This is the host address the Hive Web Interface will listen on</description>
           </property>

           <property>
           <name>hive.hwi.listen.port</name>
           <value>9999</value>
           <description>This is the port the Hive Web Interface will listen on</description>
           </property>
           Note that the Hive 0.13.1 release does not ship the hive-hwi-0.13.1.war file. You can find one online; my solution was to go into the hwi/web/ directory of the source release, package it with zip, and then rename the extension:
wget http://apache.fayea.com/apache-mirror/hive/hive-0.13.1/apache-hive-0.13.1-src.tar.gz
tar -zxvf apache-hive-0.13.1-src.tar.gz
cd apache-hive-0.13.1-src
cd hwi/web
zip hive-hwi-0.13.1.zip ./*
Rename the extension to .war and move the file into $HIVE_HOME/lib:
mv hive-hwi-0.13.1.zip hive-hwi-0.13.1.war
mv hive-hwi-0.13.1.war $HIVE_HOME/lib
                 Once configured, start the service:
root@Ubuntu-Kylin:/opt/hive# hive --service hwi
14/08/29 00:11:07 INFO hwi.HWIServer: HWI is starting up
14/08/29 00:11:10 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
14/08/29 00:11:10 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/08/29 00:11:10 INFO mortbay.log: jetty-6.1.26
14/08/29 00:11:10 INFO mortbay.log: Extract /opt/hive/lib/hive-hwi-0.13.1.war to /tmp/Jetty_0_0_0_0_9999_hive.hwi.0.13.1.war__hwi__.xvnhjk/webapp
14/08/29 00:11:11 INFO mortbay.log: Started SocketConnector@0.0.0.0:9999
          You can now access Hive from a browser at http://localhost:9999/hwi (screenshot of the HWI page omitted).

          As you can see, the web interface brings users closer to the system: you can create sessions and run queries through it.

IV. Hive JDBC Interface

          Eclipse environment setup
          Create a new Java project in Eclipse; I named mine HiveJdbcClient.
Then right-click the project, choose Build Path -> Configure Build Path -> Libraries, and add all the jars under $HIVE_HOME/lib, plus hadoop-common-2.4.0.jar, to the project.
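          If you prefer the command line to Eclipse, the same classpath can be passed to javac and java directly. A sketch, assuming Java 6+ wildcard classpaths and the stock Hadoop 2.4.0 directory layout (HiveJdbcClient.java is the sample class shown further below):

          javac -cp "$HIVE_HOME/lib/*" HiveJdbcClient.java
          java -cp ".:$HIVE_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.0.jar" HiveJdbcClient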
          To run the Hive client from Eclipse, Hive must be listening for client connections. Start the server from a terminal:

root@Ubuntu-Kylin:/opt/hive# hive --service hiveserver
Starting Hive Thrift Server
                Below is a Java JDBC client code sample:
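A minimal sketch, assuming the HiveServer1 JDBC driver org.apache.hadoop.hive.jdbc.HiveDriver and its default port 10000 (matching the hive --service hiveserver command above); the u1_data table layout and the local file path are taken from the console output below and should be adjusted to your setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
    // HiveServer1 driver class; HiveServer2 would use a different driver and URL.
    private static final String DRIVER_NAME = "org.apache.hadoop.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // HiveServer1 ignores the user name and password.
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();

        // Recreate the test table; the column layout matches the 'describe' output below.
        String tableName = "u1_data";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName
                + " (userid int, movieid int, rating int, city string, viewtime string)"
                + " row format delimited fields terminated by '\\t' stored as textfile");

        String sql = "show tables";
        System.out.println("Running: " + sql + ":");
        ResultSet res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }

        sql = "describe " + tableName;
        System.out.println("Running: " + sql + ":");
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // Adjust this path to wherever your test data file lives.
        String filepath = "/home/dashengong/workspace/u1_data.dat";
        sql = "load data local inpath '" + filepath + "' overwrite into table " + tableName;
        System.out.println("Running: " + sql + ":");
        stmt.execute(sql);

        sql = "select * from " + tableName + " limit 5";
        System.out.println("Running: " + sql + ":");
        res = stmt.executeQuery(sql);
        while (res.next()) {
            // Print a couple of the columns as an example.
            System.out.println(res.getString(1) + "\t" + res.getString(4));
        }

        sql = "select count(*) from " + tableName;
        System.out.println("Running: " + sql + ":");
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }

        con.close();
    }
}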
           Run the project against the running Hadoop stack; it succeeds!
Terminal output:
OK
OK
OK
OK
Copying data from file:/home/dashengong/workspace/u1_data.dat
Copying file: file:/home/dashengong/workspace/u1_data.dat
Loading data to table default.u1_data
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://localhost:9000/hive/warehouse/u1_data
Table default.u1_data stats: [numFiles=1, numRows=0, totalSize=52, rawDataSize=0]
OK
OK
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1409199368386_0003, Tracking URL = http://Ubuntu-Kylin:8088/proxy/application_1409199368386_0003/
Kill Command = /opt/hadoop-2.4.0/bin/hadoop job  -kill job_1409199368386_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-29 00:38:12,026 Stage-1 map = 0%,  reduce = 0%
2014-08-29 00:38:48,126 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.95 sec
2014-08-29 00:39:17,752 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.47 sec
MapReduce Total cumulative CPU time: 6 seconds 470 msec
Ended Job = job_1409199368386_0003
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 6.47 sec   HDFS Read: 262 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 470 msec
OK
Eclipse console output:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Running: show tables:
u1_data
Running: describe u1_data:
userid              int
movieid             int
rating              int
city                string
viewtime            string
Running: load data local inpath '/home/dashengong/workspace/u1_data.dat' overwrite into table u1_data:
Running: select * from u1_data limit 5:
90beijing
85chengdu
Running: select count(*) from u1_data:
2
Check HDFS: (screenshot of the HDFS warehouse directory omitted)
           At this point, Hive works from Eclipse.

Note: for all of the tests above, make sure Hadoop, HBase, and MySQL are up and running! Finally, if you spot any mistakes, corrections are welcome!

 