hive 访问hbase 表

来源:互联网 发布:c 语言编译器安卓版 编辑:程序博客网 时间:2024/06/05 07:26

1.背景

大部分同学可能不会用hbase ,或者用hbase 不习惯,但是对sql 却很熟悉,比如要统计表的记录数,用sql 可能知道怎么写,用hbase 可能不知道,或者知道但是很慢,性能有问题等,这时我们可以通过hive 建立与hbase 表的关联关系没映射hbase 表到hive。


2.建hive表


2.1 hbase 表

  hbase 已存在product 表三个列簇EXT(id ,desc ),computer(name,price),food (name ,price)没个列簇有多个列

hbase(main):007:0> desc 'product'Table product is ENABLEDproductCOLUMN FAMILIES DESCRIPTION{NAME => 'EXT', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}{NAME => 'computer', BLOOMFILTER => 'ROW', VERSIONS => '5', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}{NAME => 'food', BLOOMFILTER => 'ROW', VERSIONS => '7', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE =>'65536', REPLICATION_SCOPE => '0'}3 row(s) in 0.3780 seconds


有4条记录如下:

hbase(main):009:0> scan 'product'ROW               COLUMN+CELL rowkey001        column=EXT:desc, timestamp=1509610518837, value=ext1 rowkey001        column=EXT:id, timestamp=1509610507780, value=1 rowkey001        column=computer:name, timestamp=1507369786304, value=ThinkPad E550 rowkey001        column=computer:price, timestamp=1509609996387, value=4200 rowkey001        column=food:name, timestamp=1507370035248, value=orange rowkey001        column=food:price, timestamp=1507371767465, value=10 rowkey002        column=EXT:desc, timestamp=1509610896463, value=buy at 2017-11-03 rowkey002        column=EXT:id, timestamp=1509610861714, value=0001 rowkey002        column=computer:name, timestamp=1509610842778, value=ACER S6720 rowkey002        column=computer:price, timestamp=1509610793222, value=4100 rowkey002        column=food:name, timestamp=1509678372445, value=apple rowkey002        column=food:price, timestamp=1509610776529, value=12 rowkey003        column=EXT:desc, timestamp=1509617309557, value=MAC product rowkey003        column=EXT:id, timestamp=1509617309613, value=00001 rowkey003        column=computer:name, timestamp=1509617309642, value=MAC S6826 rowkey003        column=computer:price, timestamp=1509617309706, value=6500 rowkey003        column=food:name, timestamp=1509678262882, value=mangosteen rowkey003        column=food:price, timestamp=1509617309740, value=6.5 rowkey004        column=EXT:desc, timestamp=1509618190303, value=ext0001 rowkey004        column=EXT:id, timestamp=1509618190345, value=Pera rowkey004        column=computer:name, timestamp=1509618190371, value=AIGO S60 rowkey004        column=computer:price, timestamp=1509618190401, value=2500 rowkey004        column=food:name, timestamp=1509618192426, value=Pear rowkey004        column=food:price, timestamp=1509618190478, value=13.24 row(s) in 0.4020 seconds

hive 建外部表

create external table product(rowkey varchar(100),fname  varchar(100),fprice float,cname  varchar(100),cprice float,eid varchar(100),edesc varchar(100)) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,food:name,food:price,computer:name,computer:price,EXT:id,EXT:desc")TBLPROPERTIES("hbase.table.name" = "product");


说明:hbase.columns.mapping 字段与建表字段顺序一致。

3.查看hive表

hive> select * from product;OKrowkey001       orange  10.0    ThinkPad E550   4200.0  1       ext1rowkey002       apple   12.0    ACER S6720      4100.0  0001    buy at 2017-11-03rowkey003       mangosteen      6.5     MAC S6826       6500.0  00001   MAC productrowkey004       Pear    13.2    AIGO S60        2500.0  Pera    ext0001Time taken: 0.508 seconds, Fetched: 4 row(s)



4.变成本地表

hive> create table  product_1 as  select * from product;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.Query ID = hadoop_20171103112618_d1137b8c-7d4c-4bb5-b60f-1cc20c325581Total jobs = 3Launching Job 1 out of 3Number of reduce tasks is set to 0 since there's no reduce operatorStarting Job = job_1509675916323_0003, Tracking URL = http://master:18088/proxy/application_1509675916323_0003/Kill Command = /home/hadoop/hadoop-2.8.1//bin/hadoop job  -kill job_1509675916323_0003Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 02017-11-03 11:26:53,261 Stage-1 map = 0%,  reduce = 0%2017-11-03 11:27:16,905 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.51 secMapReduce Total cumulative CPU time: 3 seconds 510 msecEnded Job = job_1509675916323_0003Stage-4 is selected by condition resolver.Stage-3 is filtered out by condition resolver.Stage-5 is filtered out by condition resolver.Moving data to directory hdfs://master:9000/user/hive/warehouse/.hive-staging_hive_2017-11-03_11-26-18_130_15224692176553118-1/-ext-10002Moving data to directory hdfs://master:9000/user/hive/warehouse/product_1MapReduce Jobs Launched: Stage-Stage-1: Map: 1   Cumulative CPU: 3.51 sec   HDFS Read: 5736 HDFS Write: 296 SUCCESSTotal MapReduce CPU Time Spent: 3 seconds 510 msecOK

查看数据:

hive> select * from product_1;OKrowkey001       orange  10.0    ThinkPad E550   4200.0  1       ext1rowkey002       apple   12.0    ACER S6720      4100.0  0001    buy at 2017-11-03rowkey003       mangosteen      6.5     MAC S6826       6500.0  00001   MAC productrowkey004       Pear    13.2    AIGO S60        2500.0  Pera    ext0001Time taken: 0.155 seconds, Fetched: 4 row(s)


外部表的数据会跟随hbase 表中数据变化而变化,但是hive本地表不是,是固化的。

hbase 修改rowkey002 food name为apple1;--------------------------------------------hbase(main):010:0> put 'product','rowkey002','food:name','apple1'0 row(s) in 0.3230 secondshive> select * from product;OKrowkey001       orange  10.0    ThinkPad E550   4200.0  1       ext1rowkey002       apple1  12.0    ACER S6720      4100.0  0001    buy at 2017-11-03rowkey003       mangosteen      6.5     MAC S6826       6500.0  00001   MAC productrowkey004       Pear    13.2    AIGO S60        2500.0  Pera    ext0001Time taken: 0.728 seconds, Fetched: 4 row(s)hive> select * from product_1;OKrowkey001       orange  10.0    ThinkPad E550   4200.0  1       ext1rowkey002       apple   12.0    ACER S6720      4100.0  0001    buy at 2017-11-03rowkey003       mangosteen      6.5     MAC S6826       6500.0  00001   MAC productrowkey004       Pear    13.2    AIGO S60        2500.0  Pera    ext0001Time taken: 0.425 seconds, Fetched: 4 row(s)
原创粉丝点击