Alluxio Compute Framework Integration Series (1): Alluxio & Apache Hive

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 15:52

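The previous article showed how to install Alluxio and use HDFS as its under file system.

This series of articles explains how to integrate Alluxio with several common compute frameworks so that queries can be submitted through them.

This article covers the integration of Alluxio with Apache Hive. How much query performance actually improves will be addressed later, in a dedicated query benchmark comparison.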



For more discussion about using Alluxio, join the QQ group:
Alluxio-China 452894882


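I. Prerequisites

1. A working Hadoop environment and a Hive client are already installed

2. A JDK of version 1.7 or later is installed on the system

3. Query jobs can be submitted through Hive successfully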
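The first two prerequisites can be checked with a short preflight script. The following is a minimal sketch, not part of the original setup: the function names `java_major` and `have_cmd` are hypothetical, and it assumes the tools are expected on `PATH`. It does not check the third prerequisite, which still needs a real Hive query.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above. Function names are hypothetical.

# Print the major Java version for a version string such as "1.7.0_80" or "11.0.2".
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;                 # legacy scheme: 1.7 -> 7
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;   # newer scheme: 11.0.2 -> 11
  esac
}

# Report whether a command is on PATH; returns non-zero when it is missing.
have_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report only, without aborting, so every missing tool is listed in one run.
for tool in java hadoop hive; do
  have_cmd "$tool" || true
done

# Compare the installed JDK, if any, against the 1.7 minimum.
if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; the version is the first quoted string.
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  if [ -n "$ver" ] && [ "$(java_major "$ver")" -ge 7 ]; then
    echo "JDK $ver is 1.7 or newer"
  else
    echo "JDK version not recognized or older than 1.7: $ver"
  fi
fi
```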


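II. Configuration and Environment Setup

1. Edit the Alluxio-related settings in the Hadoop and Hive configuration files

On the server that hosts the Hive client, edit the Hadoop configuration file core-site.xml and add the Alluxio-related properties shown in the example below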

[op1@HIVE_CLIENT_HOST_NAME01 app]$ vi /etc/hadoop/conf/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- OTHER CONFIGURATION -->
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.alluxio.impl</name>
    <value>alluxio.hadoop.AlluxioFileSystem</value>
  </property>
  <property>
    <name>alluxio.user.file.writetype.default</name>
    <value>CACHE_THROUGH</value>
  </property>
</configuration>

On the server that hosts the Hive client, edit hive-site.xml and add the properties shown in the example below

[op1@HIVE_CLIENT_HOST_NAME01 app]$ vi /etc/hive/conf/hive-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- OTHER CONFIGURATION -->
  <property>
    <name>fs.defaultFS</name>
    <value>alluxio://<ALLUXIO_MASTER_HOST_NAME>:19998</value>
  </property>
  <property>
    <name>alluxio.user.file.writetype.default</name>
    <value>CACHE_THROUGH</value>
  </property>
</configuration>
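After editing both files, it can help to confirm that the expected property names are actually present. This is a minimal sketch under stated assumptions: the helpers `has_property` and `check_site_file` are hypothetical, and the check is a plain text match on `<name>...</name>` with no whitespace inside the tag, not a real XML parse.

```shell
#!/bin/sh
# has_property FILE NAME: succeed when FILE contains the literal <name>NAME</name>.
has_property() {
  grep -qF "<name>$2</name>" "$1" 2>/dev/null
}

# check_site_file FILE NAME...: report each property; non-zero if any is missing.
check_site_file() {
  file=$1
  shift
  status=0
  for prop in "$@"; do
    if has_property "$file" "$prop"; then
      echo "ok:      $prop in $file"
    else
      echo "missing: $prop in $file"
      status=1
    fi
  done
  return $status
}

# Paths from the steps above; each check is skipped when the file is absent.
CORE_SITE=/etc/hadoop/conf/core-site.xml
HIVE_SITE=/etc/hive/conf/hive-site.xml
if [ -f "$CORE_SITE" ]; then
  check_site_file "$CORE_SITE" fs.alluxio.impl \
    fs.AbstractFileSystem.alluxio.impl alluxio.user.file.writetype.default || true
fi
if [ -f "$HIVE_SITE" ]; then
  check_site_file "$HIVE_SITE" fs.defaultFS \
    alluxio.user.file.writetype.default || true
fi
```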


In the Hadoop environment file hadoop-env.sh, add the HADOOP_CLASSPATH setting shown below

[op1@HIVE_CLIENT_HOST_NAME01 app]$ vi /etc/hadoop/conf/hadoop-env.sh

export HADOOP_CLASSPATH=/<PATH_TO_ALLUXIO>/core/client/target/alluxio-core-client-1.4.0-jar-with-dependencies.jar:${HADOOP_CLASSPATH}
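When `HADOOP_CLASSPATH` is empty, the export above leaves a trailing `:` in the value, that is, an empty classpath entry, which the JVM may interpret as the current directory. A slightly more defensive variant is sketched below. The helper `prepend_classpath` is hypothetical, and `/<PATH_TO_ALLUXIO>` is the same placeholder used above.

```shell
#!/bin/sh
# prepend_classpath JAR CURRENT: print CURRENT with JAR in front, without
# duplicating JAR and without a trailing ':' when CURRENT is empty.
prepend_classpath() {
  jar=$1
  current=$2
  case ":$current:" in
    *":$jar:"*) echo "$current" ;;        # already present: leave unchanged
    "::")       echo "$jar" ;;            # empty classpath: no separator needed
    *)          echo "$jar:$current" ;;
  esac
}

ALLUXIO_CLIENT_JAR="/<PATH_TO_ALLUXIO>/core/client/target/alluxio-core-client-1.4.0-jar-with-dependencies.jar"
HADOOP_CLASSPATH=$(prepend_classpath "$ALLUXIO_CLIENT_JAR" "${HADOOP_CLASSPATH:-}")
export HADOOP_CLASSPATH
```

Because the function leaves the value unchanged when the jar is already listed, sourcing hadoop-env.sh more than once does not keep growing the classpath.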


2. Distribute the Alluxio client jar to all Hadoop MapReduce nodes and restart the NodeManagers of the Hadoop cluster

Copy the Alluxio client jar configured in hadoop-env.sh in the previous step, /<PATH_TO_ALLUXIO>/core/client/target/alluxio-core-client-1.4.0-jar-with-dependencies.jar, into the $HADOOP_HOME/share/hadoop/common/lib directory on every Hadoop MapReduce node, then restart all Hadoop MapReduce nodes.


[op1@HIVE_CLIENT_HOST_NAME01 app]$ ls -l $HADOOP_HOME/share/hadoop/common/lib/ | grep "alluxio-core-client"
-rw-rw-r-- 1 op1 op1 46835687 Jan 15 21:50 alluxio-core-client-1.4.0-jar-with-dependencies.jar
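The copy step can be scripted when there are many nodes. The sketch below is only an illustration: the function `distribute_jar`, the `DISTRIBUTE_RUN` switch, and the host list file are hypothetical, and it assumes passwordless `scp` access to each node. By default it only prints the commands it would run.

```shell
#!/bin/sh
# distribute_jar JAR NODES_FILE TARGET_DIR
# NODES_FILE holds one hostname per line; blank lines and '#' comments are
# skipped. Commands are printed unless DISTRIBUTE_RUN=1 is set.
distribute_jar() {
  jar=$1
  nodes_file=$2
  target_dir=$3
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in
      ''|'#'*) continue ;;
    esac
    if [ "${DISTRIBUTE_RUN:-0}" = "1" ]; then
      # </dev/null keeps scp from consuming the rest of the host list on stdin.
      scp "$jar" "$host:$target_dir/" </dev/null
    else
      echo "scp $jar $host:$target_dir/"
    fi
  done < "$nodes_file"
}

# Usage (nodes.txt is a hypothetical host list):
#   distribute_jar /<PATH_TO_ALLUXIO>/core/client/target/alluxio-core-client-1.4.0-jar-with-dependencies.jar \
#     nodes.txt "$HADOOP_HOME/share/hadoop/common/lib"
```

Note that `$HADOOP_HOME` is expanded on the local machine here, so this form assumes every node uses the same installation path.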

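3. Copy the Alluxio client jar into the lib directory under the Hive root directory

Copy the Alluxio client jar configured in hadoop-env.sh in the earlier step, /<PATH_TO_ALLUXIO>/core/client/target/alluxio-core-client-1.4.0-jar-with-dependencies.jar, into the lib directory under the Hive root directory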

[op1@HIVE_CLIENT_HOST_NAME01 app]$ ls -l $HIVE_HOME/lib/ | grep "alluxio-core-client"
-rw-r--r--  1 op1 op1 46835687 Jan 15 21:53 alluxio-core-client-1.4.0-jar-with-dependencies.jar
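The two listings show the same file size, which suggests the copies match. A byte-for-byte comparison is a stricter check. The sketch below uses a hypothetical helper `same_jar` and assumes `HADOOP_HOME` and `HIVE_HOME` are set as in the commands above.

```shell
#!/bin/sh
# same_jar A B: succeed when both files exist and are byte-for-byte identical.
same_jar() {
  [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

JAR=alluxio-core-client-1.4.0-jar-with-dependencies.jar
if same_jar "${HADOOP_HOME:-}/share/hadoop/common/lib/$JAR" "${HIVE_HOME:-}/lib/$JAR"; then
  echo "Hadoop and Hive copies of $JAR are identical"
else
  echo "Hadoop and Hive copies of $JAR are missing or differ"
fi
```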

III. Verification

1. On a node where Alluxio is installed, run the following command to check whether the directories /user/hive/warehouse and /tmp exist

[op1@HIVE_CLIENT_HOST_NAME01 current]$ ./bin/alluxio fs ls alluxio://SVR3404HW1288:19998/user/hive/
drwxr-xr-x     op1            op1            1.00B     01-15-2017 23:17:19:661  Directory      /user/hive/warehouse


If these directories do not exist, run the following commands to create /user/hive/warehouse and /tmp

./bin/alluxio fs mkdir /tmp

./bin/alluxio fs mkdir /user/hive/warehouse

./bin/alluxio fs chmod 775 /tmp

./bin/alluxio fs chmod 775 /user/hive/warehouse
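The check and the creation can be combined so that the commands are safe to rerun. This is a sketch under one loudly stated assumption: that `alluxio fs ls` exits with a non-zero status when the path does not exist, which should be confirmed on the installed version. The function `ensure_dir` and the `ALLUXIO_BIN` variable are hypothetical.

```shell
#!/bin/sh
# Path to the alluxio launcher; override ALLUXIO_BIN if it lives elsewhere.
ALLUXIO_BIN=${ALLUXIO_BIN:-./bin/alluxio}

# ensure_dir PATH: create PATH with mode 775 only when `fs ls` cannot list it.
# Assumption: `fs ls` returns non-zero for a path that does not exist.
ensure_dir() {
  if "$ALLUXIO_BIN" fs ls "$1" >/dev/null 2>&1; then
    echo "exists:  $1"
  else
    "$ALLUXIO_BIN" fs mkdir "$1" &&
      "$ALLUXIO_BIN" fs chmod 775 "$1" &&
      echo "created: $1"
  fi
}

# Usage, from the Alluxio installation directory:
#   ensure_dir /tmp
#   ensure_dir /user/hive/warehouse
```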

2. On the local file system of the server that hosts the Hive client, create the data file hive-test.txt with the following contents

1 hadoop
2 hive
3 hbase
4 hello
5 alluxio-hive
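The sample file has five rows, each an id and a word separated by a single space, which matches the `FIELDS TERMINATED BY " "` clause used when the table is created. A minimal sketch for generating it follows; the function name `write_sample_data` is hypothetical.

```shell
#!/bin/sh
# write_sample_data FILE: write the five "id word" rows, one per line.
write_sample_data() {
  printf '%s\n' \
    '1 hadoop' \
    '2 hive' \
    '3 hbase' \
    '4 hello' \
    '5 alluxio-hive' > "$1" || return 1
  # Sanity check: every line must have exactly two whitespace-separated fields.
  awk 'NF != 2 { bad = 1 } END { exit bad }' "$1" &&
    echo "wrote $(wc -l < "$1" | tr -d ' ') rows to $1"
}

# Usage:
#   write_sample_data hive-test.txt
```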


3. Start the Hive client, create a table, and load the local data into it

hive> CREATE TABLE IF NOT EXISTS test_alluxio_hive_tbl (id INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY " " LINES TERMINATED BY "\n";
OK
Time taken: 0.762 seconds
hive> LOAD DATA LOCAL INPATH 'hive-test.txt' OVERWRITE INTO TABLE test_alluxio_hive_tbl;
Loading data to table default.words
Table default.words stats: [numFiles=1, numRows=0, totalSize=32, rawDataSize=0]
OK
Time taken: 1.302 seconds


4. Run queries

hive> show tables;
OK
test_alluxio_hive_tbl
words
Time taken: 0.578 seconds, Fetched: 2 row(s)
hive> select * from test_alluxio_hive_tbl;
OK
1    hadoop
2    hive
3    hbase
4    hello
5    alluxio-hive
Time taken: 0.498 seconds, Fetched: 5 row(s)
hive>
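The query result can also be compared against the source file from the shell instead of by eye. The sketch below assumes the Hive CLI is on `PATH` and prints columns separated by tabs (its default), and that hive-test.txt is in the current directory. The helper `rows_match` is hypothetical.

```shell
#!/bin/sh
# rows_match FILE: read Hive CLI query output on stdin (tab-separated columns)
# and succeed when it matches FILE (space-separated columns) line for line.
rows_match() {
  tr '\t' ' ' | diff - "$1" >/dev/null
}

# Usage (-S suppresses log output, -e runs a single statement):
#   hive -S -e 'select * from test_alluxio_hive_tbl;' | rows_match hive-test.txt \
#     && echo "table contents match hive-test.txt"
```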

The data file can also be viewed in the Alluxio Master Web UI.


5. Drop the table

hive> drop table if exists test_alluxio_hive_tbl;
OK
Time taken: 0.172 seconds
hive>



