Hive连接Hbase操作数据

来源:互联网 发布:ipad程序员必备app 编辑:程序博客网 时间:2024/05/21 09:04
Hive整合HBase原理

Hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供完整的sql查询功能,可以将sql语句转换为MapReduce任务进行运行。 其优点是学习成本低,可以通过类SQL语句快速实现简单的MapReduce统计,不必开发专门的MapReduce应用,十分适合数据仓库的统计分析。

Hive与HBase整合的实现是利用两者本身对外的API接口互相进行通信,相互通信主要是依靠Hive安装包lib/hive-hbase-handler.jar工具类,它负责Hbase和Hive进行通信的。


hadoop,hbase,hive都已经集群正常安装。

hadoop,hbase,hive都已正常启动。


命令行模式连接连接hbase   

[root@node1 bin]# ./hbase shell

list 看下table

hbase(main):006:0* listTABLE                                                                                                                                                                                test                                                                                                                                                                                 user                                                                                                                                                                                 2 row(s) in 0.4750 seconds
看下表结构

hbase(main):007:0> describe 'user'Table user is ENABLED                                                                                                                                                                user                                                                                                                                                                                 COLUMN FAMILIES DESCRIPTION                                                                                                                                                          {NAME => 'account', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                        {NAME => 'address', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                        {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                           {NAME => 'userid', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                         4 row(s) in 0.7020 seconds


然后扫描user表数据看看。

hbase(main):004:0> scan 'user'ROW                                            COLUMN+CELL                                                                                                                            lisi                                          column=account:name, timestamp=1495708477345, value=lisi                                                                               lisi                                          column=account:passport, timestamp=1495708477353, value=96857123123231                                                                 lisi                                          column=account:password, timestamp=1495708477349, value=654321                                                                         lisi                                          column=address:city, timestamp=1495708477381, value=\xE6\xB7\xB1\xE5\x9C\xB3                                                           lisi                                          column=address:province, timestamp=1495708477377, value=\xE5\xB9\xBF\xE4\xB8\x9C                                                       lisi                                          column=info:age, timestamp=1495708477358, value=38                                                                                     lisi                                          column=info:sex, timestamp=1495708477363, value=\xE5\xA5\xB3                                                                           lisi                                          column=userid:id, timestamp=1495708477330, value=002                                                                                   zhangsan                                      column=account:name, timestamp=1495708405658, value=zhangsan                                                                           zhangsan                                      column=account:passport, timestamp=1495708405699, value=968574321                                                                      zhangsan                                      column=account:password, timestamp=1495708405669, value=123456                                                                         zhangsan                                      column=address:city, timestamp=1495708405773, value=\xE6\xB7\xB1\xE5\x9C\xB3                                                           zhangsan                                      column=address:province, timestamp=1495708405764, value=\xE5\xB9\xBF\xE4\xB8\x9C                                                       zhangsan                                      column=info:age, timestamp=1495708405712, value=26                                                                                     zhangsan                                      column=info:sex, timestamp=1495708405755, value=\xE7\x94\xB7                                                                           zhangsan                                      column=userid:id, timestamp=1495708405444, value=001                                                                                  2 row(s) in 0.2020 seconds

在hive/bin中运行hive命令行模式

[root@master bin]# ./hive

执行建立关联hbase关联语句hbase_user表

会发现有报错:return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations

hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,    > province string,city string,age string,sex string ,id string)       > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'       > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,    > address:province,address:city,info:age,info:sex,userid:id")       > TBLPROPERTIES("hbase.table.name" = "user");FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locationsat org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:303)at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:313)at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:205)at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:742)
can not get location 以及下面的日志 应该是连不上hbase,用jps查看下hbase的运行情况

[root@master conf]# jps3945 HMaster18681 RunJar2699 NameNode3330 NodeManager2951 SecondaryNameNode3226 ResourceManager3874 HQuorumPeer18901 Jps
发现一切正常

接着查看下hive日志发现: 都是zookeeper连接失败 

 Opening socket connection to server localhost/127.0.0.1:2181

2017-05-25T03:06:12,259  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)2017-05-25T03:06:12,260  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnectjava.net.ConnectException: Connection refusedat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)2017-05-25T03:06:13,362  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)2017-05-25T03:06:13,363  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnectjava.net.ConnectException: Connection refusedat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)2017-05-25T03:06:13,465  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)2017-05-25T03:06:13,466  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnectjava.net.ConnectException: Connection refusedat sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)2017-05-25T03:06:14,568  INFO [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)2017-05-25T03:06:14,569  WARN [9b1835d1-6488-4521-99d5-88d3e786be46 main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
去看下hive-site.xml的配置文件,配置上

<property>    <name>hive.zookeeper.quorum</name>    <value>master,node1,node2</value>    <description>      List of ZooKeeper servers to talk to. This is needed for:      1. Read/write locks - when hive.lock.manager is set to      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,      2. When HiveServer2 supports service discovery via Zookeeper.      3. For delegation token storage if zookeeper store is used, if      hive.cluster.delegation.token.store.zookeeper.connectString is not set      4. LLAP daemon registry service    </description>  </property>
<property>    <name>hive.zookeeper.client.port</name>    <value>2181</value>    <description>      The port of ZooKeeper servers to talk to.      If the list of Zookeeper servers specified in hive.zookeeper.quorum      does not contain port numbers, this value is used.    </description>  </property>

而 hbase集群中的用zookeeper的默认的端口是2222,所以为了端口统一,     把hbase-site.xml中的改成2181,记得重启服务

或者直接  把hbase-site.xml 复制到hive的conf目录下,hive会读取hbase的zookeeper的 zookeeper.quorum和 zookeeper.port

两种方法都可以解决问题


接着在hive中再次执行 create table语句

hive> CREATE EXTERNAL TABLE hbase_user(key string, idcard string,passport string,country string,name string,password string,    > province string,city string,age string,sex string ,id string)       > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'       > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,account:idcard,account:passport,account:country,account:name,account:password,    > address:province,address:city,info:age,info:sex,userid:id")       > TBLPROPERTIES("hbase.table.name" = "user");OKTime taken: 20.323 secondshive> show tables;OKapacheloghbase_userTime taken: 2.75 seconds, Fetched: 2 row(s)

执行成功,

接着用hiveql查询下数据

hive> select * from hbase_user;OKlisiNULL96857123123231NULLlisi654321广东深圳38女002zhangsanNULL968574321NULLzhangsan123456广东深圳26男001Time taken: 5.798 seconds, Fetched: 2 row(s)hive> describe hbase_user;OKkey                 string                                  idcard              string                                  passport            string                                  country             string                                  name                string                                  password            string                                  province            string                                  city                string                                  age                 string                                  sex                 string                                  id                  string                                  Time taken: 3.785 seconds, Fetched: 11 row(s)hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;OKlisiNULL654321NULLlisi96857123123231广东深圳38女002zhangsanNULL123456chinazhangsan968574321广东深圳26男001Time taken: 2.341 seconds, Fetched: 2 row(s)

..null 是因为 hbase的column没有设置idcard字段值,和 country的值所以是为null

给hbase 表 user设置country看看,和idcard

./hbase shell

hbase(main):003:0> put 'user','zhangsan','account:idcard','420923156366998855';hbase(main):004:0* put 'user','lisi','account:idcard','520369856366998855';hbase(main):005:0* put 'user','lisi','account:country','china';

hive> select key ,idcard,password,country,name, passport,province,city,age,sex,id from hbase_user;OKlisi520369856366998855654321chinalisi96857123123231广东深圳38女002zhangsan420923156366998855123456chinazhangsan968574321广东深圳26男001Time taken: 2.388 seconds, Fetched: 2 row(s)hive> select * from hbase_user where name='zhangsan';OKzhangsan420923156366998855968574321chinazhangsan123456广东深圳26男001Time taken: 2.651 seconds, Fetched: 1 row(s)hive> select count(key) from hbase_user;WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.Query ID = root_20170525040249_f808c765-79f6-43c0-aa94-ebfed7751091Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):  set hive.exec.reducers.bytes.per.reducer=<number>In order to limit the maximum number of reducers:  set hive.exec.reducers.max=<number>In order to set a constant number of reducers:  set mapreduce.job.reduces=<number>Starting Job = job_1495621107567_0001, Tracking URL = http://master:8088/proxy/application_1495621107567_0001/Kill Command = /usr/tools/hadoop/bin/hadoop job  -kill job_1495621107567_0001Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. PermGen space



执行count的时候,运行mapreducer,PermGen space了。。。。  也是醉了

window7机器上了运行了虚拟机,3个linux组成的  hadoop,hbase,hive集群, 8g内存  内存使用率走横线,..... 

实际上有时是可以count出来数据的,估计windows开了不少进程





原创粉丝点击
热门问题 老师的惩罚 人脸识别 我在镇武司摸鱼那些年 重生之率土为王 我在大康的咸鱼生活 盘龙之生命进化 天生仙种 凡人之先天五行 春回大明朝 姑娘不必设防,我是瞎子 小米3s手机触屏部分失灵怎么办 魅族手机屏幕锁密码忘了怎么办 手机没设置魅族账号密码忘了怎么办 魅族手机格式化密码忘了怎么办 魅族手机忘记密码了怎么解锁怎么办 手机设置的应用加密忘记密码怎么办 手机上设置应用加密忘记密码怎么办 魅蓝flyme密码忘了怎么办图片 魅族手机经常自动账号锁屏怎么办 魅族锁定后又不知道密码怎么办 魅族手机锁屏锁定了怎么办 魅族手机已锁定怎么办密码忘了 京东抢到了小米8不发货怎么办 第一次网上预约没有就诊卡号怎么办 京东定金交了未发货怎么办 买了没有预售许可证的房子怎么办 买了没有预售证的房子怎么办 苹果手机发烫容易变3g网怎么办 魅族手机有指纹和密码怎么办刷机 魅族手机指纹解锁密码忘了怎么办 魅蓝5s运存占用太多怎么办 魅蓝e2手机照片被删了怎么办 魅蓝e2不小心删除了照片怎么办 魅蓝3s返回键失灵怎么办 糖猫电话手表屏碎了怎么办 魅蓝手机没下安装包强制更新怎么办 老婆赌博输了30多万现在怎么办啊 红米nt2手机通话声音小怎么办? 微信退出后重新登录忘记密码怎么办 微退出后再登录忘记密码了怎么办 忘记微信密码又退出微信怎么办 无线网自家密码忘了也连不上怎么办 无线网密码忘了连接不上怎么办 魅蓝手机插口半夜坏了怎么办 魅族手机换屏后出现跳屏怎么办 小米手机微信小程序转发不出怎么办 苹果手机ad码和密码忘了怎么办 魅族手机摔掉无法开机怎么办 魅族音量+电源键直接开机了怎么办 魅蓝2数字锁机了怎么办 苹果5s蓝屏开不了机怎么办