Installing Hive with Metadata Stored in MySQL

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 04:35

Downloading Hive
Hive official website: http://hive.apache.org/
Hive official downloads: http://hive.apache.org/downloads.html
Apache archive: Apache Software Foundation Distribution Directory
Version used in this guide: apache-hive-0.13.1-bin.tar.gz
Extracting Hive

$ tar zxvf apache-hive-0.13.1-bin.tar.gz -C /opt/modules/
$ cd /opt/modules/
$ mv apache-hive-0.13.1-bin/ hive-0.13.1

Configuring Hive

$ cd /opt/modules/hive-0.13.1/conf
$ cp hive-env.sh.template hive-env.sh

Edit hive-env.sh and change the following two settings:

$ vim hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/modules/hadoop-2.5.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/modules/hive-0.13.1/conf
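
The same two settings can also be applied without opening an editor. The following is only a sketch under the paths used in this guide; the function name write_hive_env is made up for illustration and is not part of Hive.

```shell
# Hypothetical helper (not part of Hive): create hive-env.sh from the
# template and append the two settings that were edited by hand above.
# $1 = Hive conf directory, $2 = Hadoop install directory
write_hive_env() (
    conf_dir=$1
    hadoop_home=$2
    cp "$conf_dir/hive-env.sh.template" "$conf_dir/hive-env.sh" || exit 1
    {
        echo "HADOOP_HOME=$hadoop_home"
        echo "export HIVE_CONF_DIR=$conf_dir"
    } >> "$conf_dir/hive-env.sh"
)

# Usage with the paths from this guide:
# write_hive_env /opt/modules/hive-0.13.1/conf /opt/modules/hadoop-2.5.0
```

Appending leaves the commented defaults from the template untouched, so the file stays easy to compare against the original.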

Verifying Hive
Before running Hive, start Hadoop, create the /tmp and /user/hive/warehouse directories on HDFS, and grant group write permission on both, as shown below:

$ cd /opt/modules/hadoop-2.5.0/
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -mkdir -p /user/hive/warehouse
$ bin/hdfs dfs -chmod g+w /tmp
$ bin/hdfs dfs -chmod g+w /user/hive/warehouse
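
If these commands need to be repeated (for example on a rebuilt cluster), they can be wrapped in a small function. This is a sketch, not something the guide itself uses: prepare_hive_dirs and the HDFS_BIN variable are illustrative names, and -p is added to both mkdir calls so that an already existing /tmp does not cause a failure.

```shell
# Hypothetical wrapper around the HDFS commands shown above.
# HDFS_BIN is an assumed variable naming the hdfs launcher; it defaults to
# bin/hdfs, i.e. the function expects to run from the Hadoop install directory.
prepare_hive_dirs() {
    hdfs_bin=${HDFS_BIN:-bin/hdfs}
    for dir in /tmp /user/hive/warehouse; do
        # -p creates missing parents and does not fail if the directory exists
        "$hdfs_bin" dfs -mkdir -p "$dir" || return 1
        "$hdfs_bin" dfs -chmod g+w "$dir" || return 1
    done
}

# Usage:
# cd /opt/modules/hadoop-2.5.0/ && prepare_hive_dirs
```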

Hive is now installed in embedded mode. Run the following commands to verify the installation:

$ cd /opt/modules/hive-0.13.1/
$ bin/hive

Output like the following indicates that Hive was installed successfully in embedded mode.

Logging initialized using configuration in jar:file:/opt/modules/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.576 seconds, Fetched: 1 row(s)

Storing the Metadata in MySQL
Download the MySQL repository package

$ wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm

Install the mysql-community-release-el7-5.noarch.rpm package

$ sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm

Install MySQL

$ sudo yum install -y mysql-server

Start MySQL

$ sudo service mysqld start

Enable MySQL at boot

$ sudo chkconfig mysqld on

Set the MySQL root password

$ mysqladmin -u root password 'hive'

Log in to MySQL

$ mysql -uroot -p

Enable remote login

mysql> grant all privileges on *.* to 'root'@'%' identified by 'hive' with grant option;

Delete the original user record

mysql> use mysql
mysql> delete from user where host='localhost' and user='root';

In the end only the following root record remains:

mysql> select host, user, password from user;
+------+------+-------------------------------------------+
| host | user | password                                  |
+------+------+-------------------------------------------+
| %    | root | *4DF1D66463C18D44E3B001A8FB1BBFBEA13E27FC |
+------+------+-------------------------------------------+

Restart the MySQL service

mysql> quit;
$ sudo service mysqld restart
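
The steps above open the root account to every host. A narrower alternative, which this guide does not use, is a dedicated account limited to the metastore database. The sketch below only prints the SQL so it can be reviewed first; metastore_user_sql, the user name hiveuser, and the password are placeholders, and the ConnectionUserName and ConnectionPassword values in hive-site.xml would have to match whatever is chosen.

```shell
# Hypothetical helper: print SQL that creates a dedicated metastore account.
# $1 = user name, $2 = password, $3 = database name (all illustrative)
metastore_user_sql() {
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $3;
CREATE USER '$1'@'%' IDENTIFIED BY '$2';
GRANT ALL PRIVILEGES ON $3.* TO '$1'@'%';
FLUSH PRIVILEGES;
EOF
}

# Usage: review the statements, then pipe them into the mysql client.
# metastore_user_sql hiveuser 'change-me' metastore | mysql -uroot -p
```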

Configuring Hive to Use MySQL for Storage

$ cd /opt/modules/hive-0.13.1/
$ cp conf/hive-default.xml.template conf/hive-site.xml

Edit the hive-site.xml file

$ vim conf/hive-site.xml
<configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://hadoop01.malone.com:3306/metastore?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>password to use against metastore database</description>
    </property>
</configuration>
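
Because only these four properties are needed, the file can also be generated instead of edited. The function below is a sketch: write_hive_site and its parameter order are invented for illustration, the port 3306 and database name metastore are copied from the configuration above, and the description elements are left out for brevity.

```shell
# Hypothetical helper: write a minimal hive-site.xml for a MySQL metastore.
# $1 = output file, $2 = MySQL host, $3 = user, $4 = password
write_hive_site() {
    cat > "$1" <<EOF
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://$2:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>$3</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>$4</value>
  </property>
</configuration>
EOF
}

# Usage with the values from this guide:
# write_hive_site conf/hive-site.xml hadoop01.malone.com root hive
```

Note that this overwrites the target file, so any other properties already placed in hive-site.xml would need to be added back.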

Add the MySQL driver JAR

$ mv mysql-connector-java-5.1.27-bin.jar /opt/modules/hive-0.13.1/lib/
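
A missing driver only shows up when Hive first tries to reach the metastore, so a quick check before starting the CLI can save time. This is a minimal sketch; has_mysql_driver is a made-up name, and the file name pattern is an assumption based on the jar used above.

```shell
# Hypothetical check: succeed if a MySQL Connector/J jar is in the lib dir.
# $1 = Hive lib directory
has_mysql_driver() {
    for jar in "$1"/mysql-connector-java-*.jar; do
        # If nothing matches, the pattern stays unexpanded and -f fails
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Usage:
# has_mysql_driver /opt/modules/hive-0.13.1/lib || echo "MySQL driver jar not found"
```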

Testing with HQL Statements

$ cd /opt/modules/hive-0.13.1/
$ bin/hive
hive> show databases;
OK
default
Time taken: 1.418 seconds, Fetched: 1 row(s)
hive> create database if not exists hive_testdb;
OK
Time taken: 1.084 seconds
hive> use hive_testdb;
OK
Time taken: 0.027 seconds
hive> show tables;
OK
Time taken: 0.029 seconds
hive> create table employee(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 1.542 seconds
hive> load data local inpath '/opt/datas/hive/employee.txt' into table employee;
Copying data from file:/opt/datas/hive/employee.txt
Copying file: file:/opt/datas/hive/employee.txt
Loading data to table hive_testdb.employee
Table hive_testdb.employee stats: [numFiles=1, numRows=0, totalSize=52, rawDataSize=0]
OK
Time taken: 1.939 seconds
hive> desc employee;
OK
id                      int
name                    string
Time taken: 0.185 seconds, Fetched: 2 row(s)
hive> desc extended employee;
OK
id                      int
name                    string

Detailed Table Information  Table(tableName:employee, dbName:hive_testdb, owner:hadoop, createTime:1472398263, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://hadoop01.malone.com:8020/user/hive/warehouse/hive_testdb.db/employee, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=  , field.delim=
Time taken: 0.161 seconds, Fetched: 4 row(s)
hive> desc formatted employee;
OK
# col_name              data_type               comment

id                      int
name                    string

# Detailed Table Information
Database:               hive_testdb
Owner:                  hadoop
CreateTime:             Sun Aug 28 23:31:03 CST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://hadoop01.malone.com:8020/user/hive/warehouse/hive_testdb.db/employee
Table Type:             MANAGED_TABLE
Table Parameters:
    COLUMN_STATS_ACCURATE   true
    numFiles                1
    numRows                 0
    rawDataSize             0
    totalSize               52
    transient_lastDdlTime   1472398294

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
    field.delim             \t
    serialization.format    \t
Time taken: 0.264 seconds, Fetched: 33 row(s)
hive> select * from employee;
OK
1   burce.lee
2   jacky.chen
3   elbert.malone
4   andy.lau
Time taken: 0.817 seconds, Fetched: 4 row(s)
hive> select id from employee;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472391663133_0001, Tracking URL = http://hadoop01.malone.com:8088/proxy/application_1472391663133_0001/
Kill Command = /opt/modules/hadoop-2.5.0/bin/hadoop job  -kill job_1472391663133_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-28 23:35:16,716 Stage-1 map = 0%,  reduce = 0%
2016-08-28 23:35:50,749 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1472391663133_0001
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.84 sec   HDFS Read: 294 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
1
2
3
4
Time taken: 86.453 seconds, Fetched: 4 row(s)
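
The session above loads /opt/datas/hive/employee.txt, but the guide does not show that file. A file consistent with the query results can be produced as sketched below. Its contents are reconstructed from the select output, so treat them as an assumption; the fields are separated by a tab to match FIELDS TERMINATED BY '\t', which gives a 52-byte file, the same as the totalSize reported after the load. The function name write_employee_data is illustrative.

```shell
# Hypothetical helper: write tab-separated sample rows matching the
# "select * from employee" output above.
# $1 = output file
write_employee_data() {
    # printf reuses the format for each pair of arguments
    printf '%s\t%s\n' \
        1 burce.lee \
        2 jacky.chen \
        3 elbert.malone \
        4 andy.lau > "$1"
}

# Usage:
# mkdir -p /opt/datas/hive && write_employee_data /opt/datas/hive/employee.txt
```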

Common Hive Property Settings
Showing the database name and column headers in the CLI

$ cd /opt/modules/hive-0.13.1/
$ vim conf/hive-site.xml

Add the following configuration:

<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>

Result after the change

$ bin/hive
Logging initialized using configuration in jar:file:/opt/modules/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive (default)> show databases;
OK
database_name
default
hive_testdb
Time taken: 0.768 seconds, Fetched: 2 row(s)
hive (default)> use hive_testdb;
OK
Time taken: 0.028 seconds
hive (hive_testdb)> show tables;
OK
tab_name
employee
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive (hive_testdb)> select * from employee;
OK
employee.id employee.name
1   burce.lee
2   jacky.chen
3   elbert.malone
4   andy.lau
Time taken: 0.917 seconds, Fetched: 4 row(s)
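
The same two settings can be applied per user rather than in hive-site.xml, because the Hive CLI runs the statements in $HOME/.hiverc at startup. The sketch below appends them to such a file; write_hiverc is a made-up name, and this approach is offered as an alternative the guide itself does not use.

```shell
# Hypothetical helper: append the two CLI display settings to a Hive init file.
# $1 = init file (the Hive CLI reads $HOME/.hiverc at startup)
write_hiverc() {
    {
        echo 'set hive.cli.print.header=true;'
        echo 'set hive.cli.print.current.db=true;'
    } >> "$1"
}

# Usage:
# write_hiverc "$HOME/.hiverc"
```

This keeps the shared hive-site.xml unchanged, which may be preferable when several users share one installation.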

Configuring Hive Logging

$ cd /opt/modules/hive-0.13.1/conf
$ cp hive-log4j.properties.template hive-log4j.properties
$ vim hive-log4j.properties

Change the following settings:

# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/modules/hive-0.13.1/logs
hive.log.file=hive.log
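
For scripted setups, a single property in this file can be changed without an editor. The function below is a generic sketch, not a Hive tool: set_property is an invented name, it matches keys literally at the start of a line, and it assumes the key and value contain no backslashes (awk would interpret them).

```shell
# Hypothetical helper: set key=value in a properties file. An existing line
# for the key is replaced in place; otherwise the pair is appended.
# $1 = file, $2 = key, $3 = value
set_property() (
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || exit 1
    awk -v k="$key" -v v="$value" '
        BEGIN { p = k "=" }
        index($0, p) == 1 { print p v; found = 1; next }  # literal prefix match
        { print }
        END { if (!found) print p v }
    ' "$file" > "$tmp" || exit 1
    # Write back through the original file so its permissions are kept
    cat "$tmp" > "$file" && rm -f "$tmp"
)

# Usage:
# set_property hive-log4j.properties hive.log.dir /opt/modules/hive-0.13.1/logs
```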