hive简介与安装

来源:互联网 发布:谷歌拼音输入法 mac 编辑:程序博客网 时间:2024/05/14 23:35
一、hive简介
1、Hive 是建立在 Hadoop  上的数据仓库基础构架。它提供了一系列的工具,可以用来进行数据提取转化加载(ETL ),这是一种可以存储、查询和分析存储在 Hadoop  中的大规模数据的机制。Hive 定义了简单的类 SQL  查询语言,称为 QL ,它允许熟悉 SQL  的用户查询数据。同时,这个语言也允许熟悉 MapReduce  开发者的开发自定义的 mapper  和 reducer  来处理内建的 mapper 和 reducer  无法完成的复杂的分析工作。
2、Hive是SQL解析引擎,它将SQL语句转译成M/R Job然后在Hadoop执行。
Hive的表其实就是HDFS的目录/文件,按表名把文件夹分开。如果是分区表,则分区值是子文件夹,可以直接在M/R Job里使用这些数据。


•用户接口,包括 CLI,JDBC/ODBC,WebUI

•元数据存储,通常是存储在关系数据库如 mysql, derby 中

•解释器、编译器、优化器、执行器

•Hadoop:用 HDFS 进行存储,利用 MapReduce 进行计算


用户接口主要有三个:CLI,JDBC/ODBC和WebUI
1.CLI,即Shell命令行
2.JDBC/ODBC 是Hive 的Java,与使用传统数据库JDBC的方式类似
3.WebGUI是通过浏览器访问Hive
Hive 将元数据存储在数据库中(metastore),目前只支持mysql、derby。Hive 中的元数据包括表的名字,表的列和分区及其属性,表的属性(是否为外部表等),表的数据所在目录等。其中,hive默认是用derby存储元数据信息,但是这种数据库在相同目录启动客户端,只允许一个客户端连接,所以大多数时候,我们都使用mysql作为其存储元数据的数据库。
解释器、编译器、优化器完成HQL 查询语句从词法分析、语法分析、编译、优化以及查询计划(plan)的生成。生成的查询计划存储在HDFS 中,并在随后有MapReduce调用执行
Hive 的数据存储在HDFS 中,大部分的查询由MapReduce完成(包含 * 的查询,比如 select * from table 不会生成MapRedcue任务)


二、hive的安装

Hive只在一个节点上安装即可
1.上传tar包
 
2.解压
tar -zxvf hive-0.9.0.tar.gz -C /cloud/
3.配置mysql metastore(切换到root用户)
配置HIVE_HOME环境变量
rpm -qa | grep mysql
rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm 
rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm 
修改mysql的密码
/usr/bin/mysql_secure_installation
(注意:删除匿名用户,允许用户远程连接)
登陆mysql
mysql -u root -p


4.配置hive
cp hive-default.xml.template hive-site.xml 
修改hive-site.xml(删除所有内容,只留一个<property></property>)
添加如下内容:
<property>
 <name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://hadoop00:3306/hive?createDatabaseIfNotExist=true</value>
 <description>JDBC connect string for a JDBC metastore</description>
</property>


<property>
 <name>javax.jdo.option.ConnectionDriverName</name>
 <value>com.mysql.jdbc.Driver</value>
 <description>Driver class name for a JDBC metastore</description>
</property>


<property>
 <name>javax.jdo.option.ConnectionUserName</name>
 <value>root</value>
 <description>username to use against metastore database</description>
</property>


<property>
 <name>javax.jdo.option.ConnectionPassword</name>
 <value>123</value>
 <description>password to use against metastore database</description>
</property>

5.安装hive和mysq完成后,将mysql的连接jar包拷贝到$HIVE_HOME/lib目录下
如果出现没有权限的问题,在mysql授权(在安装mysql的机器上执行)
mysql -uroot -p
#(执行下面的语句  *.*:所有库下的所有表   %:任何IP地址或主机都可以连接)
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
FLUSH PRIVILEGES;

6.建表(默认是内部表)
create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';
建分区表
create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';
建外部表
create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';


7.创建分区表
普通表和分区表区别:有大量数据增加的需要建分区表
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t'; 


分区表加载数据
load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');

load data local inpath '/root/data.am' into table beauty partition (nation="USA");



select nation, avg(size) from beauties group by nation order by avg(size);

三、出现报错
可能还会出现这种报错:
17/05/04 19:06:12 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces17/05/04 19:06:12 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize17/05/04 19:06:12 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative17/05/04 19:06:12 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node17/05/04 19:06:12 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive17/05/04 19:06:12 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack17/05/04 19:06:12 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize17/05/04 19:06:12 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.neededLogging initialized using configuration in jar:file:/heres/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.propertiesException in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)        at java.lang.reflect.Method.invoke(Method.java:606)        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)        ... 7 moreCaused by: java.lang.reflect.InvocationTargetException        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)        ... 12 moreCaused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://heres05:3306/hive?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failureThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)        at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)        at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2214)        at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:781)        at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46)        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)        at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)        at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:352)        at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:284)        at java.sql.DriverManager.getConnection(DriverManager.java:571)        at java.sql.DriverManager.getConnection(DriverManager.java:187)        at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)        at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)        at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)        at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:501)        at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:298)        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)        at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)        at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)        at java.lang.reflect.Method.invoke(Method.java:606)        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)        at java.security.AccessController.doPrivileged(Native Method)        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:309)        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:338)        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:247)        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:222)        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:498)        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:476)        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:524)        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:398)        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:357)        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)        at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4948)        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)        at java.lang.reflect.Method.invoke(Method.java:606)        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
解决办法:

1、 修改MySQL的参数. /etc/my.cnf 添加 
[mysqld]
wait_timeout=31536000
interactive_timeout=31536000
2、重启mysql 
service mysql restart



0 0
原创粉丝点击