Integrating Hive with HBase, with a Test Walkthrough


Background reading:
Querying massive data on a Hadoop + Hive architecture: http://blog.csdn.net/kunshan_shenbin/article/details/7105319
HBase 0.90.5 + Hadoop 1.0.0 integration: http://blog.csdn.net/kunshan_shenbin/article/details/7209990


Step 1: start Hadoop: ./start-all.sh

Step 2: start Hive.

Copy hbase-0.94.12.jar and zookeeper-3.4.2.jar into hive/lib.
Note: if hive/lib already contains other versions of these jars (for example zookeeper-3.3.1.jar), delete them and use the versions that ship with your HBase.
Edit hive/conf/hive-site.xml and append the following at the bottom:

<!--
<property>
  <name>hive.exec.scratchdir</name>
  <value>/usr/local/hive/tmp</value>
</property>
-->
<property>
  <name>hive.querylog.location</name>
  <value>/usr/local/hadoop/hive/logs</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hadoop/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hadoop/hive/lib/hbase-0.94.12.jar,file:///usr/local/hadoop/hive/lib/zookeeper-3.4.2.jar</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>nistest.node1,nistest.node2,nistest.node3,nistest.node4</value>
</property>

 

Note: if hive-site.xml does not exist, create it, or rename the hive-default.xml.template file and use that.

 

With hbase.zookeeper.quorum configured, no extra options are needed at startup; just run bin/hive. (show tables will list nothing at first: Hive and HBase keep separate catalogs, so HBase tables do not appear until they are mapped.)

If the quorum is not configured, it must be passed on the command line at startup:

bin/hive --hiveconf hbase.zookeeper.quorum=nistest.node1,nistest.node2,nistest.node3,nistest.node4

 

Step 3: start HBase: ./start-hbase.sh

Step 4: create the mapped table. The table we want to query already exists in HBase, so we use CREATE EXTERNAL TABLE.

First create the HBase table (in the HBase shell, started with hbase shell):

create 'hbase_table_1','cf1'

Then insert a row:

put 'hbase_table_1','1','cf1:val','fjsh1'

Then, in Hive, create the external table mapped to it:

CREATE EXTERNAL TABLE hbase_table_2(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
TBLPROPERTIES("hbase.table.name" = "hbase_table_1");

1. hbase.columns.mapping maps each Hive column to an HBase column family:qualifier; for multiple columns in one family, use data:1,data:2; for multiple families, data1:1,data2:1.

2. hbase.table.name names the HBase table being mapped.

3. hbase_table_2(key string, value string) is the mapped (external) Hive table.

4. Let's look at the structure of the HBase table we want to query:

DESCRIPTION                                                            ENABLED
 'hbase_table_1', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE',       true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.1380 seconds

And the data in the table:

ROW                    COLUMN+CELL
 1                     column=cf1:val, timestamp=1397729052180, value=value1
 2                     column=cf1:val, timestamp=1397533269763, value=fjsh
 3                     column=cf1:val, timestamp=1397729085322, value=value3
3 row(s) in 0.1000 seconds
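The hbase.columns.mapping rules described above can be sketched in a few lines of Java: the value is a comma-separated list with one family:qualifier entry per Hive column. The mapping string below is illustrative, not from the walkthrough:

```java
// Sketch: split an hbase.columns.mapping value into (family, qualifier)
// pairs, one entry per Hive column. The mapping string is an example.
public class MappingDemo {
    public static void main(String[] args) {
        String mapping = "cf1:val,data1:1,data2:1";
        for (String entry : mapping.split(",")) {
            // Split on the first ':' only: family, then qualifier.
            String[] parts = entry.split(":", 2);
            System.out.println("family=" + parts[0] + ", qualifier=" + parts[1]);
        }
    }
}
```

A three-column Hive table would need exactly three such entries, matched to its columns in declaration order.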

 

(An explanatory diagram from the web appeared here in the original post.)


First, query hbase_table_2 from the Hive command line:

hive> select * from hbase_table_2;

OK

1       value1

2       fjsh

3       value3

Time taken: 1.045 seconds

This matches the data in HBase. Next, insert one more row on the HBase side:

put 'hbase_table_1','4','cf1:val','fjsh4'

Then check the result in Hive:

select * from hbase_table_2;

OK

1       value1

2       fjsh

3       value3

4       fjsh4

Time taken: 0.222 seconds

Again, correct.

Now query hbase_table_2 for the row whose value is fjsh4:

select * from hbase_table_2 where value='fjsh4';

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404151531_0011, Tracking URL = http://nistest.master:50030/jobdetails.jsp?jobid=job_201404151531_0011
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=nistest.master:8021 -kill job_201404151531_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 20:18:53,597 Stage-1 map = 0%,  reduce = 0%
2014-04-17 20:19:00,670 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
2014-04-17 20:19:01,682 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
2014-04-17 20:19:02,692 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
2014-04-17 20:19:03,702 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
2014-04-17 20:19:04,710 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.03 sec
MapReduce Total cumulative CPU time: 2 seconds 30 msec
Ended Job = job_201404151531_0011
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.03 sec   HDFS Read: 267 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 30 msec

OK

4       fjsh4

Time taken: 51.066 seconds

 

The above were queries from the command line; our real goal is to query out useful data from Java code, which is also straightforward.

First, Hive must be started a little differently, as a server:

./hive --service hiveserver -p 10000

or, using the default port (10000):

./hive --service hiveserver

The metastore configuration can be checked in hive-site.xml. (Hive defaults to the embedded Derby database, but this particular setup points the metastore at MySQL:)

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://slave1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>

The database connection URL can then use the default: jdbc:hive://localhost:10000/default
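The default URL has the shape jdbc:hive://&lt;host&gt;:&lt;port&gt;/&lt;database&gt;; as a small sketch, it can be assembled from its parts (host and port here match the server startup above, but adjust for your own deployment):

```java
// Sketch: assemble the Hive JDBC URL from host, port, and database.
// The values are the defaults used in this walkthrough, not requirements.
public class HiveUrlDemo {
    public static void main(String[] args) {
        String host = "localhost"; // HiveServer host (example)
        int port = 10000;          // port passed via --service hiveserver -p
        String database = "default";
        String url = "jdbc:hive://" + host + ":" + port + "/" + database;
        System.out.println(url);   // jdbc:hive://localhost:10000/default
    }
}
```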

With that preparation done, we can read the data from Java, as follows:

 

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.log4j.Logger;

public class HiveTest {
    private static final String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
    private static final String url = "jdbc:hive://192.168.2.161:10000/default";
    private static final String user = "sa";
    private static final String password = "sa";
    private static final Logger log = Logger.getLogger(HiveTest.class);
    private static Connection con;

    public static void main(String[] args) throws Exception {
        setUp();
        testSelect();
    }

    public static void testSelect() throws SQLException {
        Statement stmt = con.createStatement();
        ResultSet res = stmt.executeQuery("select * from hbase_table_2");
        while (res.next()) {
            System.out.println(res.getString(1) + "," + res.getString(2));
        }
    }

    protected static void setUp() throws Exception {
        // Load the Hive JDBC driver, then open the connection.
        Class.forName(driverName);
        con = DriverManager.getConnection(url, user, password);
    }
}

Output:

1,value1

2,fjsh

3,value3                                            

4,fjsh4

 
