1008 - Accessing HBase Table Data from Hive

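1.  How Hive-HBase integration works
Hive and HBase are integrated through the public APIs that each system exposes. The communication relies on the utility classes in lib/hive-hbase-handler-0.13.0.jar, shipped in the Hive installation package, which handle the exchange between HBase and Hive.
The figure below illustrates how Hive and HBase communicate: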

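2.  Installing Hive
This assumes Hive is already installed; what remains is to sort out the related jar files.
(1) Jar files
# Remove the ZooKeeper jar(s) from the $HIVE_HOME/lib directory
rm -rf $HIVE_HOME/lib/zookeeper*

# Copy the ZooKeeper jar used in the production environment into $HIVE_HOME/lib
cp $ZOOKEEPER_HOME/zookeeper-3.4.6.jar $HIVE_HOME/lib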
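
The two commands above can also be wrapped in a small function that checks its inputs first. This is only a sketch: the function name replace_zookeeper_jar and the backup directory lib.zookeeper.bak are made up for illustration, and it moves the bundled jars aside instead of deleting them so the change can be undone.

```shell
#!/bin/sh
# Replace the ZooKeeper jar bundled with Hive by the cluster's own jar.
# Arguments: $1 = Hive install directory, $2 = path to the cluster's ZooKeeper jar.
# (Function name and backup directory are hypothetical, not from the article.)
replace_zookeeper_jar() {
    hive_home=$1
    zk_jar=$2

    # Refuse to touch anything if either input is missing.
    if [ ! -d "$hive_home/lib" ]; then
        echo "no lib directory under: $hive_home" >&2
        return 1
    fi
    if [ ! -f "$zk_jar" ]; then
        echo "ZooKeeper jar not found: $zk_jar" >&2
        return 1
    fi

    # Move the bundled jars aside (instead of rm -rf) so they can be restored.
    backup_dir="$hive_home/lib.zookeeper.bak"
    mkdir -p "$backup_dir"
    for old in "$hive_home"/lib/zookeeper*; do
        if [ -e "$old" ]; then
            mv "$old" "$backup_dir/"
        fi
    done

    # Install the jar that matches the running ZooKeeper ensemble.
    cp "$zk_jar" "$hive_home/lib/"
}
```

With the environment from this article it would be called as: replace_zookeeper_jar "$HIVE_HOME" "$ZOOKEEPER_HOME/zookeeper-3.4.6.jar"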

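3. Create the HBase table and load the data into it
4. Create a Hive table mapped to the HBase table
5. Access the HBase table from Hive

(1) Write a MapReduce job that reads each line of data and saves it to HBase
(2) Let Hive operate on the data in the HBase table
(3) Use Hive to run statistics on the HBase table data and analyze user visit behavior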

3. Viewing the data in HBase
3.1 Scan the whole table
scan 'UserVisitInfo'
3.2 Look up a row by rowkey
hbase(main):012:0> get 'UserVisitInfo','20150706_3037487029517069460000'
COLUMN                          CELL
 info:FirstAccessUrl            timestamp=1443000064923, value=/m/subject/100000000000009_0.html
 info:browser                   timestamp=1443000064923, value=Safari
 info:browserVersion            timestamp=1443000064923, value=533.1
 info:firstAccessTime           timestamp=1443000064923, value=20150706000104
 info:operateSystem             timestamp=1443000064923, value=linux
 info:recentAccessTime          timestamp=1443000065001, value=20150706030107
 info:recentAccessUrl           timestamp=1443000065001, value=/m/
 info:screenColor               timestamp=1443000064923, value=24
 info:screenSize                timestamp=1443000064923, value=480x854
 info:siteType                  timestamp=1443000064923, value=0
 info:userFlag                  timestamp=1443000064923, value=3037487029517069460000
 info:userProvince              timestamp=1443000064923, value=999
 info:userVisitId               timestamp=1443000064923, value=20150706_3037487029517069460000
 info:visitCount                timestamp=1443000065001, value=2
 info:visitDay                  timestamp=1443000064923, value=20150706
 info:visitFlag                 timestamp=1443000064923, value=3037487029517069460000
 info:visitHour                 timestamp=1443000064923, value=0
 info:visitIp                   timestamp=1443000064923, value=10.139.198.176
 info:visitKeepTime             timestamp=1443000065001, value=10803

4. Analyzing HBase table data with Hive
4.1 Create the HBase table and load the data into it
The table is named UserVisitInfo.
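
For a quick test without the MapReduce job, the table can be created and seeded by hand from the HBase shell. The sketch below writes the commands to a script file and runs it only if the hbase launcher is on the PATH. The table name, the info column family and the sample values come from the get output shown earlier; the file name create_user_visit_info.hbase is a placeholder, and only three of the columns are filled in.

```shell
#!/bin/sh
# Write HBase shell commands to a script file (the file name is a placeholder).
script=create_user_visit_info.hbase

cat > "$script" <<'EOF'
create 'UserVisitInfo', 'info'
put 'UserVisitInfo', '20150706_3037487029517069460000', 'info:visitDay', '20150706'
put 'UserVisitInfo', '20150706_3037487029517069460000', 'info:visitIp', '10.139.198.176'
put 'UserVisitInfo', '20150706_3037487029517069460000', 'info:visitCount', '2'
scan 'UserVisitInfo', {LIMIT => 1}
exit
EOF

# Run the script only when the hbase launcher is available on this machine.
if command -v hbase >/dev/null 2>&1; then
    hbase shell "$script"
else
    echo "hbase not found; run later with: hbase shell $script"
fi
```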
4.2 Create a Hive table mapped to the HBase table
(1) Create the table
CREATE EXTERNAL TABLE User_Visit_Info(
    userVisitId string,
    FirstAccessUrl string,
    browserVersion string,
    firstAccessTime string,
    operateSystem string,
    recentAccessTime string,
    recentAccessUrl string,
    screenColor string,
    screenSize string,
    siteType string,
    userFlag string,
    userProvince string,
    visitCount string,
    visitDay string,
    visitFlag string,
    visitHour string,
    visitIp string,
    visitKeepTime string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:FirstAccessUrl,info:browserVersion,info:firstAccessTime,info:operateSystem,info:recentAccessTime,info:recentAccessUrl,info:screenColor,info:screenSize,info:siteType,info:userFlag,info:userProvince,info:visitCount,info:visitDay,info:visitFlag,info:visitHour,info:visitIp,info:visitKeepTime")
TBLPROPERTIES ("hbase.table.name" = "UserVisitInfo");
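
The entries in hbase.columns.mapping are matched to the Hive columns by position: the first entry, :key, binds the HBase rowkey to userVisitId, and each following family:qualifier entry binds to the next Hive column. Because a long mapping string is easy to get out of step, it can help to generate it from the column list. The helper below is a hypothetical sketch (build_columns_mapping is not part of Hive); it joins the entries with commas and no spaces, since whitespace between entries may be read as part of a column name.

```shell
#!/bin/sh
# Build the value of "hbase.columns.mapping" from a column family and a list of
# qualifiers. The leading :key entry maps the HBase rowkey to the first Hive column.
# (Helper name is hypothetical; it only assembles a string.)
build_columns_mapping() {
    family=$1
    shift
    mapping=":key"
    for qualifier in "$@"; do
        mapping="$mapping,$family:$qualifier"
    done
    printf '%s\n' "$mapping"
}

# Prints: :key,info:FirstAccessUrl,info:browserVersion,info:firstAccessTime
build_columns_mapping info FirstAccessUrl browserVersion firstAccessTime
```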
4.3 Run statistical analysis with Hive
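
As an illustration of the kind of analysis this step refers to, the sketch below writes two example queries against User_Visit_Info to a file and runs them with hive -f when the hive launcher is available. The queries and the file name user_visit_stats.hql are assumptions made for illustration, not taken from the original post. Since every mapped column is declared as string, numeric fields are cast before aggregation.

```shell
#!/bin/sh
# Write example HiveQL queries to a file (file name and queries are illustrative).
hql=user_visit_stats.hql

cat > "$hql" <<'EOF'
-- Distinct visitors and total visits per day.
SELECT visitDay,
       COUNT(DISTINCT userFlag)     AS visitors,
       SUM(CAST(visitCount AS INT)) AS visits
FROM User_Visit_Info
GROUP BY visitDay;

-- Average visit duration per operating system (unit as stored in visitKeepTime).
SELECT operateSystem,
       AVG(CAST(visitKeepTime AS BIGINT)) AS avg_keep_time
FROM User_Visit_Info
GROUP BY operateSystem;
EOF

# Run the queries only when the hive launcher is available on this machine.
if command -v hive >/dev/null 2>&1; then
    hive -f "$hql"
else
    echo "hive not found; run later with: hive -f $hql"
fi
```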