Integrating Hive with HBase
Source: Internet · Editor: 程序博客网 · Date: 2024/05/08 23:35
We have started using HBase for real-time queries, but analysis jobs are still handled by Hive, and Hive's results are then loaded back into HBase.
Hive ships a few jars that help us to:
- create a table shared with HBase (both sides see the table and its data)
- map an existing HBase table into Hive
- load Hive query results directly into HBase
Starting Hive
Start Hive with the command below; the key points are specifying the handler jars and the address of the ZooKeeper quorum that HBase uses:
bin/hive --auxpath /opt/CDH/hive/lib/hive-hbase-handler-0.10.0-cdh4.3.2.jar,\
/opt/CDH/hive/lib/hbase-0.94.6-cdh4.3.2.jar,\
/opt/CDH/hive/lib/zookeeper-3.4.5-cdh4.3.2.jar,\
/opt/CDH/hive/lib/guava-11.0.2.jar \
-hiveconf hbase.zookeeper.quorum=192.168.253.119,192.168.253.130
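As an alternative to passing --auxpath on every launch, the same jars can be registered once through Hive's HIVE_AUX_JARS_PATH environment variable. A sketch, reusing the CDH 4.3.2 paths from the command above (adjust them to your own installation):

```shell
# Register the integration jars once via the environment instead of --auxpath.
# Paths are the CDH 4.3.2 ones from the command above; adjust for your install.
export HIVE_AUX_JARS_PATH=/opt/CDH/hive/lib/hive-hbase-handler-0.10.0-cdh4.3.2.jar,/opt/CDH/hive/lib/hbase-0.94.6-cdh4.3.2.jar,/opt/CDH/hive/lib/zookeeper-3.4.5-cdh4.3.2.jar,/opt/CDH/hive/lib/guava-11.0.2.jar

# The ZooKeeper quorum can likewise live in hive-site.xml instead of -hiveconf:
#   <property>
#     <name>hbase.zookeeper.quorum</name>
#     <value>192.168.253.119,192.168.253.130</value>
#   </property>

# Show what a subsequent `bin/hive` launch would pick up.
echo "$HIVE_AUX_JARS_PATH"
```

With this in place, `bin/hive` can be started without the long --auxpath argument.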
Test table
First create a test table in Hive:
-- create hive tmp table
CREATE TABLE pokes (foo INT, bar STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- test.txt data format: 1<TAB>hello
-- load the data into the hive table
LOAD DATA INPATH '/user/mapred/test.txt' OVERWRITE INTO TABLE pokes;
Creating the Hive-HBase table
When creating the table in Hive, specify the HBase table it maps to; by default both sides use the same table name.
-- create a table shared with hbase
hive> CREATE TABLE hbase_hive_table(key int, value string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val");
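A note on hbase.columns.mapping: the :key entry binds the HBase row key, and each remaining family:qualifier entry pairs up with the Hive columns in declaration order. A hypothetical wider example (all names here are made up; hbase.table.name overrides the default same-name rule):

```sql
-- Hypothetical mapping: :key is the row key; info:name and info:age pair with
-- the Hive columns name and age; hbase.table.name picks the HBase table name.
CREATE TABLE hbase_users(id INT, name STRING, age INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age")
TBLPROPERTIES ("hbase.table.name" = "users");
```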
Switch to the hbase shell and check whether the table exists:
hbase(main):007:0> describe 'hbase_hive_table'
DESCRIPTION                                                        ENABLED
 {NAME => 'hbase_hive_table', FAMILIES => [{NAME => 'cf1',         true
 DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0800 seconds
Write test
-- insert test
hive> INSERT OVERWRITE TABLE hbase_hive_table SELECT * FROM pokes WHERE foo=1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201407241659_0007, Tracking URL = http://centos149:50030/jobdetails.jsp?jobid=job_201407241659_0007
Kill Command = /opt/CDH/hadoop/share/hadoop/mapreduce1/bin/hadoop job -kill job_201407241659_0007
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-08-07 16:15:14,505 Stage-0 map = 0%, reduce = 0%
2014-08-07 16:15:20,010 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2014-08-07 16:15:21,087 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2014-08-07 16:15:22,190 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2014-08-07 16:15:23,200 Stage-0 map = 100%, reduce = 100%, Cumulative CPU 2.46 sec
MapReduce Total cumulative CPU time: 2 seconds 460 msec
Ended Job = job_201407241659_0007
1 Rows loaded to hbase_hive_table
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 2.46 sec   HDFS Read: 196   HDFS Write: 0   SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 460 msec
OK
Time taken: 34.594 seconds
Switch to the hbase shell and check whether the row has been written:
hbase(main):005:0> scan 'hbase_hive_table'
ROW                   COLUMN+CELL
 1                    column=cf1:val, timestamp=1407399353262, value=hello
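Since the table is shared, the row should also be readable back from the Hive side, as a quick sanity check in the session above:

```sql
-- Read the shared table from Hive; the row written above should come back.
SELECT * FROM hbase_hive_table;
```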
To speed up writes into the HBase table, you can disable the WAL (write-ahead log) with the setting below. Note that this trades durability for speed: writes not yet flushed are lost if a region server crashes.
-- hbase writes may be slow because of the WAL, so turn it off
set hive.hbase.wal.enabled=false;
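The walkthrough above covers creating a shared table and loading Hive results into HBase; the remaining case from the list at the top, mapping a table that already exists in HBase into Hive, uses an EXTERNAL table. A sketch, with hypothetical table and column names:

```sql
-- Map a pre-existing HBase table 'existing_table' (hypothetical name) into
-- Hive. EXTERNAL means dropping the Hive table leaves the HBase table intact.
CREATE EXTERNAL TABLE hbase_existing(key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "existing_table");
```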
Reference
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration