Integrating HBase with Hive

Source: Internet  Editor: 程序博客网  Time: 2024/05/21 08:49

Environment: hadoop-2.6.0 (Master, Slave1, Slave2), hbase-0.98.6-hadoop2, hive-1.2.1

1. Integrating Hive with HBase requires seven jars: guava, hbase-common, hbase-server, hbase-client, hbase-protocol, hbase-it, and htrace-core.

Go to $HIVE_HOME/lib and $HBASE_HOME/lib and check whether Hive and HBase ship the same version of the guava jar. If the versions differ, run the following command in hive/lib:

rm -rf guava-XX.jar
This deletes the guava jar that ships with Hive. Then, from hbase/lib, copy guava-12.0.1.jar into hive/lib, and copy the other six jars into hive/lib as well:

[root@Master lib]# cp guava-12.0.1.jar /usr/soft/hive-1.2.1/lib/

[root@Master lib]# cp hbase-common-0.98.6-hadoop2.jar /usr/soft/hive-1.2.1/lib/
[root@Master lib]# cp hbase-server-0.98.6-hadoop2.jar /usr/soft/hive-1.2.1/lib/
[root@Master lib]# cp hbase-client-0.98.6-hadoop2.jar /usr/soft/hive-1.2.1/lib/
[root@Master lib]# cp hbase-protocol-0.98.6-hadoop2.jar /usr/soft/hive-1.2.1/lib/
[root@Master lib]# cp hbase-it-0.98.6-hadoop2.jar /usr/soft/hive-1.2.1/lib/
[root@Master lib]# cp htrace-core-2.04.jar /usr/soft/hive-1.2.1/lib/
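The manual steps above can be sketched as a small shell function. This is only a sketch under stated assumptions, not the procedure the article ran: the name sync_hbase_jars is hypothetical, jars are matched by name prefix rather than by exact version, and any *-tests.jar files found in the HBase lib directory are skipped.

```shell
# Hypothetical helper: copy the seven integration jars from HBase's lib
# directory into Hive's lib directory, replacing a mismatched guava jar.
# Usage: sync_hbase_jars <hbase_lib_dir> <hive_lib_dir>
sync_hbase_jars() {
    hbase_lib=$1
    hive_lib=$2

    # Delete any guava jar in Hive that has no same-named jar in HBase.
    for jar in "$hive_lib"/guava-*.jar; do
        [ -e "$jar" ] || continue              # the glob matched nothing
        if [ ! -e "$hbase_lib/$(basename "$jar")" ]; then
            rm -f "$jar"
        fi
    done

    # Copy each required jar; the version suffix is matched with a glob.
    for prefix in guava hbase-common hbase-server hbase-client \
                  hbase-protocol hbase-it htrace-core; do
        found=0
        for jar in "$hbase_lib/$prefix"-[0-9]*.jar; do
            [ -e "$jar" ] || continue
            case $jar in *-tests.jar) continue ;; esac   # skip test jars
            cp "$jar" "$hive_lib/" && found=1
        done
        [ "$found" -eq 1 ] || { echo "missing jar: $prefix" >&2; return 1; }
    done
}
```

With the layout used here it would be called as sync_hbase_jars "$HBASE_HOME/lib" "$HIVE_HOME/lib". The function returns a non-zero status as soon as one of the seven jars cannot be found, so a partial copy does not go unnoticed.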

2. Edit the hive-site.xml configuration file under hive/conf and add the following property at the end, inside the <configuration> element:

<property>
        <name>hbase.zookeeper.quorum</name>
        <value>Master</value>
</property>
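To confirm that the property was picked up from the file, its value can be read back from the command line. The following is a minimal sketch: get_conf_value is a hypothetical helper, not a Hive tool, and it assumes that each <name> and <value> element sits on a single line, with <value> following <name> inside the same <property> block.

```shell
# Hypothetical helper: print the <value> of a named property from a
# Hadoop-style XML configuration file such as hive-site.xml.
# Usage: get_conf_value <file> <property-name>
get_conf_value() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name seen.
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # First value that follows the wanted name: print it and stop.
            value = $0
            sub(/.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}
```

For the configuration above, get_conf_value "$HIVE_HOME/conf/hive-site.xml" hbase.zookeeper.quorum should print Master. The function prints nothing when the property is absent.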


3. Start Hive. There are two ways to integrate HBase with Hive. The first is to create a managed table, hbase_table_1, whose data is stored in an HBase table:


hive (default)> CREATE TABLE hbase_table_1(key int, value string)
              > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
              > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
              > TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 4.938 seconds

Check in HBase that the table xyz was created:

hbase(main):004:0> list
TABLE
basic
sub_user
test
xyz
4 row(s) in 0.0310 seconds

=> ["basic", "sub_user", "test", "xyz"]


Insert data into the Hive table hbase_table_1:


hive (default)> insert overwrite table hbase_table_1 select empno, ename from emp;
Query ID = root_20170825100051_a6ec3c4e-f78c-4a63-9db3-291d9e73c0f9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502944428192_0002, Tracking URL = http://10.226.118.24:8888/proxy/application_1502944428192_0002/
Kill Command = /usr/soft/hadoop-2.6.0/bin/hadoop job  -kill job_1502944428192_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2017-08-25 10:01:09,798 Stage-0 map = 0%,  reduce = 0%
2017-08-25 10:01:12,978 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1502944428192_0002
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1   Cumulative CPU: 1.72 sec   HDFS Read: 9729 HDFS Write: 263148 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
empno	ename
Time taken: 22.516 seconds


Check whether the HBase table xyz now contains the data:

hbase(main):006:0> scan 'xyz'
ROW                   COLUMN+CELL
 7369                 column=cf1:val, timestamp=1503626471989, value=SMITH
 7499                 column=cf1:val, timestamp=1503626471989, value=ALLEN
 7521                 column=cf1:val, timestamp=1503626471989, value=WARD
 7566                 column=cf1:val, timestamp=1503626471989, value=JONES
4 row(s) in 0.0800 seconds

4. The second way is to create an external table, hbase_test, over the table test that already exists in HBase:

hbase(main):004:0> list
TABLE
basic
sub_user
test
xyz
4 row(s) in 0.0240 seconds

=> ["basic", "sub_user", "test", "xyz"]
hbase(main):003:0> scan 'test'
ROW                   COLUMN+CELL
 10002                column=cf:age, timestamp=1502847463784, value=56
 10002                column=cf:name, timestamp=1502847451295, value=zhangsan
 10003                column=cf:age, timestamp=1503279594383, value=35
 10003                column=cf:name, timestamp=1502847534361, value=zhaoliu
2 row(s) in 0.2230 seconds

hive (default)> create external table hbase_test(id int, name string, age int)
              > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
              > with serdeproperties ("hbase.columns.mapping" = ":key,cf:name,cf:age")
              > tblproperties ("hbase.table.name" = "test");
OK
Time taken: 2.619 seconds

hive (default)> select * from hbase_test ;
OK
hbase_test.id	hbase_test.name	hbase_test.age
10002	zhangsan	56
10003	zhaoliu	35
Time taken: 0.595 seconds, Fetched: 2 row(s)
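In both CREATE TABLE statements above, hbase.columns.mapping has one entry per Hive column, in the same order: (key, value) maps to ":key,cf1:val", and (id, name, age) maps to ":key,cf:name,cf:age". A quick sanity check before running the DDL might look like the sketch below. The helper names count_fields and check_mapping are hypothetical, and the check only compares the number of entries. It does not validate column families or types.

```shell
# Hypothetical helper: count the comma-separated fields in a non-empty string.
count_fields() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Hypothetical helper: succeed only if the Hive column list and the
# hbase.columns.mapping string have the same number of entries.
# Usage: check_mapping "<comma-separated Hive columns>" "<mapping string>"
check_mapping() {
    [ "$(count_fields "$1")" -eq "$(count_fields "$2")" ]
}
```

For example, check_mapping "id,name,age" ":key,cf:name,cf:age" succeeds, while check_mapping "id,name,age" ":key,cf:name" fails because the age column has no mapping entry.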







