hive2.1.1 + hadoop2.8.0 + windows7 (without cygwin): setting up Hive

Source: Internet  Editor: 程序博客网  Date: 2024/05/16 09:47

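Environment: win7 + jdk7 + hadoop2.8 + hive2.1.1

Recommended reading: "hadoop2.8.0 + jdk1.7 + windows7 (no Linux environment): setup and troubleshooting"

I. About Hive

Hive is a data warehouse tool built on Hadoop; it translates SQL statements into MapReduce programs.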

II. Download

Official site: http://hive.apache.org/

Download mirror: http://mirror.bit.edu.cn/apache/hive/

I downloaded apache-hive-2.1.1.

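III. Installing and running Hive on Windows

1. Set the environment variable HIVE_HOME=E:\apache-hive-2.1.1-bin and append %HIVE_HOME%\bin to Path.

2. Edit the configuration files

(1) Rename the configuration files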

In the HIVE_HOME/conf folder, make a copy of each of the following templates and rename the copy as shown:

hive-env.sh.template → hive-env.sh

hive-exec-log4j2.properties.template → hive-exec-log4j2.properties

hive-log4j2.properties.template → hive-log4j2.properties

hive-default.xml.template → hive-site.xml
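
The copy-and-rename step can also be scripted. The following is a minimal sketch, not part of the original setup: the function name copy_templates is made up for illustration, and the install path in the last line is the one assumed in this article. It leaves any target file that already exists untouched.

```python
import shutil
from pathlib import Path

# Template name -> working copy name, as listed in the step above.
TEMPLATES = {
    "hive-env.sh.template": "hive-env.sh",
    "hive-exec-log4j2.properties.template": "hive-exec-log4j2.properties",
    "hive-log4j2.properties.template": "hive-log4j2.properties",
    "hive-default.xml.template": "hive-site.xml",
}


def copy_templates(conf_dir):
    """Copy each template to its working name; skip targets that already exist."""
    conf = Path(conf_dir)
    created = []
    for src_name, dst_name in TEMPLATES.items():
        src, dst = conf / src_name, conf / dst_name
        if src.is_file() and not dst.exists():
            shutil.copyfile(src, dst)
            created.append(dst_name)
    return created


if __name__ == "__main__":
    # Assumed install path from this article; change it to your own HIVE_HOME.
    print(copy_templates(r"E:\apache-hive-2.1.1-bin\conf"))
```

The function returns the names of the files it created, so running it a second time should report an empty list.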

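(2) Configure hive-env.sh

Add the following lines in an empty part of the file (replace the paths with your own Hadoop and Hive directories):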

export HADOOP_HOME=F:\hadoop\hadoop-2.7.2
export HIVE_CONF_DIR=F:\hadoop\apache-hive-2.1.1-bin\conf
export HIVE_AUX_JARS_PATH=F:\hadoop\apache-hive-2.1.1-bin\lib

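(3) Configure hive-exec-log4j2.properties and hive-log4j2.properties

Only the following two entries need to be changed in each file. Write the path with forward slashes, because a backslash is an escape character in Java properties files:

property.hive.log.dir = E:/apache-hive-2.1.1-bin/hivelog
property.hive.log.file = hive.log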

(4) Configure hive-site.xml

<!-- Modify these existing properties -->
<property>
    <name>hive.metastore.warehouse.dir</name>
    <!-- Hive data directory; this is a path on HDFS and must be created there first -->
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <!-- Hive temporary data directory; this is a path on HDFS and must be created there first -->
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <!-- local directory -->
    <value>E:/apache-hive-2.1.1-bin/hive/iotmp</value>
    <description>Local scratch space for Hive jobs</description>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <!-- local directory -->
    <value>E:/apache-hive-2.1.1-bin/hive/iotmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
    <name>hive.querylog.location</name>
    <!-- local directory -->
    <value>E:/apache-hive-2.1.1-bin/hive/iotmp</value>
    <description>Location of Hive run time structured log file</description>
</property>
<property>
    <name>hive.server2.logging.operation.log.location</name>
    <!-- local directory -->
    <value>E:/apache-hive-2.1.1-bin/iotmp/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- IP and port of the MySQL server; the hive database must be created first -->
    <value>jdbc:mysql://ip:port/hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!-- MySQL driver class -->
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <!-- database user -->
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <!-- database password -->
    <value>root</value>
</property>
<!-- Fixes: Caused by: MetaException(message:Version information not found in metastore. ) -->
<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in metastore matches with one from Hive jars.  Also disable automatic
          schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
          proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
</property>

<!-- Add these new properties -->
<!-- Fixes: Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.autoCreateTables" -->
<property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
</property>
<property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
</property>
<property>
    <name>datanucleus.autoCreateColumns</name>
    <value>true</value>
</property>
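
Because hive-site.xml starts as a copy of a very large template, it is easy to edit the wrong property or to break the XML. The following is a minimal sketch, assuming only Python's standard library, that parses a Hadoop-style configuration document and lists the values of a few properties. The helper name read_properties and the inline SAMPLE document are illustrative; to check a real file, pass its contents instead of SAMPLE.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Small stand-in document; a real hive-site.xml has the same shape.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
</configuration>"""

props = read_properties(SAMPLE)
for key in ("hive.metastore.warehouse.dir", "hive.exec.scratchdir"):
    print(key, "=", props.get(key))
```

If the file is not well-formed XML, ET.fromstring raises a ParseError that names the offending line, which is usually quicker to read than the stack trace Hive prints at startup.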

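(5) Create the database and folders

Create the hive database in MySQL: create database hive default character set utf8;

If Hive reports an error at startup, create it with latin1 instead: create database hive default character set latin1;

For details, see item 1 of the troubleshooting section at the end.

Create the corresponding folders on HDFS (-p also creates any missing parent directories):

hadoop fs -mkdir -p /tmp

hadoop fs -mkdir -p /user/hive/warehouse

hadoop fs -chmod g+w /tmp

hadoop fs -chmod g+w /user/hive/warehouse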

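3. Start

Make sure enough memory is available before starting.

Start HDFS: start-dfs

Start YARN: start-yarn

Start the metastore service: hive --service metastore

Start Hive: hive

That completes the installation; you can now run create-table statements in the Hive client.

Note:

1. The JDBC URL already connects directly to the hive database in MySQL, so there is no need to run use hive; first. Just execute the create-table statement.

Setup steps adapted from: http://blog.csdn.net/f7anty/article/details/72629622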

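IV. Usage

1. Create a table

create table t_user(uid int, name string, age int)
row format delimited
fields terminated by "\t";

Once the table is created, a folder with the same name as the table appears under /user/hive/warehouse on HDFS.

2. Upload a data file to the table folder

Create a text file data.txt on drive E: with the following content, one record per line, with the fields separated by a tab character:

1	zhangsan	18
2	lisi	19
3	wangwu	19
4	maliu	20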
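
Tab characters are easy to lose when copying text from a web page, and a file that uses spaces instead will load as rows of NULL values. As a hedged alternative to typing the file by hand, the sketch below writes data.txt with explicit tab separators. The ROWS list and the to_tsv helper are illustrative names; run the script from the folder where the file should be created.

```python
# The four sample records used in this article: (uid, name, age).
ROWS = [
    (1, "zhangsan", 18),
    (2, "lisi", 19),
    (3, "wangwu", 19),
    (4, "maliu", 20),
]


def to_tsv(rows):
    """Join the fields of each row with a tab and end every row with a newline."""
    return "".join("\t".join(str(field) for field in row) + "\n" for row in rows)


if __name__ == "__main__":
    # Writes data.txt into the current working directory.
    with open("data.txt", "w", encoding="utf-8", newline="\n") as f:
        f.write(to_tsv(ROWS))
```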

From drive E:, run hadoop fs -put data.txt /user/hive/warehouse/t_user

Alternatively, run this in the Hive client:

load data local inpath 'E:/data.txt' into table t_user;

3. Query the data file with Hive

In the Hive client, run select * from t_user;

4. Run a MapReduce job through Hive

In the Hive client, run

create table t_user_simple as
select name, age from t_user;

or

select count(*) from t_user;

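V. Key points

1. Derby is Hive's default database

Derby is a lightweight Java database. Hive uses it by default to store metadata. Running the Hive client creates the Derby database files in the current working folder.

If you launch the client from a different folder, new database files are created there, so the metadata from the original folder is no longer visible.

2. Hive is not meant for row-by-row inserts

Hive works on files, so data is normally added in bulk: upload or load a file, or use insert ... select as in item 4. (insert ... values has been available since Hive 0.14, but it is not suited to frequent small writes.)

3. Hive reads every data file in the table folder by default

After a table is created in the Hive client, a matching folder is created on HDFS. A data file uploaded to that folder can be queried with select.

If several data files are uploaded to the folder, select returns the rows from all of them.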
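
The idea that a table is simply "every file in a folder, split on the delimiter" can be imitated locally. The sketch below is only an analogy and not how Hive is implemented: scan_table_dir is a made-up helper, and a temporary local folder stands in for /user/hive/warehouse/t_user.

```python
import tempfile
from pathlib import Path


def scan_table_dir(table_dir, delimiter="\t"):
    """Read every regular file in table_dir and split each line into fields."""
    rows = []
    for path in sorted(Path(table_dir).iterdir()):
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                if line:
                    rows.append(line.split(delimiter))
    return rows


# Local stand-in for the table folder, holding two data files.
table_dir = Path(tempfile.mkdtemp())
(table_dir / "data.txt").write_text("1\tzhangsan\t18\n2\tlisi\t19\n", encoding="utf-8")
(table_dir / "more.txt").write_text("3\twangwu\t19\n", encoding="utf-8")

for row in scan_table_dir(table_dir):
    print(row)
```

Adding a third file to the folder makes its rows show up in the next scan without any change to the "table", which mirrors what happens when another file is put into the table folder on HDFS.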

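4. Ways to add data to a Hive table

(1) Use the HDFS put command to upload a file to the table's folder.

(2) In the Hive client, run load data local inpath 'E:/data.txt' into table t_user;

(3) Insert rows selected from another table (runs as a MapReduce job):

insert into table t_user_simple
select name, age from t_user
where age>18;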

5. External tables

Hive can use external tables to hold shared data. Dropping an external table does not delete the data in its folder.

create external table t_user_ex(uid int, name string, age int)
row format delimited
fields terminated by "\t"
location '/hive-tmp/user/';

VI. Troubleshooting

1. Specified key was too long; max key length is 767

This exception may occur when the database character set is utf8; switching to latin1 resolves it.

See also: http://tcspecial.iteye.com/blog/2105079
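
Why the character set matters comes down to arithmetic. The sketch below assumes that the 767 in the message is a byte limit on an index key, that MySQL's utf8 stores up to 3 bytes per character, and that latin1 stores 1 byte per character. The names used are illustrative.

```python
MAX_KEY_BYTES = 767  # limit quoted in the error message

# Maximum bytes per character for the two character sets discussed here.
BYTES_PER_CHAR = {"utf8": 3, "latin1": 1}


def max_indexable_chars(charset):
    """Longest string column, in characters, whose index key fits in the limit."""
    return MAX_KEY_BYTES // BYTES_PER_CHAR[charset]


for charset in BYTES_PER_CHAR:
    print(charset, max_indexable_chars(charset))
```

Under these assumptions an indexed string column can hold at most 255 characters with utf8 but 767 with latin1, so a longer indexed column fails with utf8 and fits with latin1.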
