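We have been running Hive Server 1 for a long time. Ad-hoc user queries, hive-web, wormhole, and our operations tools all submit statements through it. But Hive Server 1 is extremely unstable: it often hangs for no apparent reason, which blocks every connection on the client side. To cope, we had to set up a crontab check script that repeatedly runs "show tables" to detect a hung server. When the server does hang, the only remedy is to kill the daemon process and restart it.
Hive Server 1 also handles concurrency poorly. Settings that a user makes on a connection are bound to a Thrift worker thread. When that user disconnects and another user opens a connection, the new connection may be assigned to the same worker thread and reuse the previous user's settings. This happens because Thrift cannot detect that a client has disconnected, so it cannot clear the session state. Binding sessions to worker threads also makes HA hard to achieve.
Hive Server 2 supports sessions properly. The client sends a SessionID with every RPC call, and the server maps it to the Session State that holds the session's state. Any worker thread can therefore execute different statements of the same session, and a session is no longer tied to a single thread.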
Hive 0.11 ships both Hive Server 1 and Hive Server 2; Hive Server 1 is still included for backward compatibility. In the long run, Hive Server 2 will be the preferred choice.
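The difference between the two session models can be illustrated with a small toy model. This is a minimal sketch, not Hive code: the class SessionStateSketch, its method names, and the use of mapred.job.queue.name as the example setting are assumptions made for illustration. A single-thread executor stands in for a Thrift worker thread that is reused by two users.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy model (not Hive code): thread-bound state versus SessionID-keyed state. */
final class SessionStateSketch {

    private static final String KEY = "mapred.job.queue.name"; // example setting only

    // Hive Server 1 style: settings live on whichever worker thread serves the call.
    private static final ThreadLocal<Map<String, String>> THREAD_STATE =
            ThreadLocal.withInitial(HashMap::new);

    // Hive Server 2 style: settings live in a map keyed by the SessionID sent with each call.
    private static final Map<UUID, Map<String, String>> SESSIONS = new ConcurrentHashMap<>();

    /** What user B reads for a key that only user A set, when state is bound to the thread. */
    static String threadBound() {
        Runnable userASets = () -> THREAD_STATE.get().put(KEY, "userA-queue");
        // User A disconnects here; nothing clears the worker thread's state.
        Callable<String> userBReads = () -> THREAD_STATE.get().get(KEY);
        return runOnOneWorker(userASets, userBReads);
    }

    /** The same sequence when state is looked up by SessionID instead. */
    static String sessionKeyed() {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        SESSIONS.put(a, new ConcurrentHashMap<>());
        SESSIONS.put(b, new ConcurrentHashMap<>());
        Runnable userASets = () -> {
            SESSIONS.get(a).put(KEY, "userA-queue");
            SESSIONS.remove(a); // closing session A drops its state
        };
        Callable<String> userBReads = () -> SESSIONS.get(b).get(KEY);
        try {
            return runOnOneWorker(userASets, userBReads);
        } finally {
            SESSIONS.remove(b);
        }
    }

    /** Runs both tasks, in order, on the same single worker thread. */
    private static String runOnOneWorker(Runnable first, Callable<String> second) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(first).get();
            return worker.submit(second).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("thread-bound:  " + threadBound());  // prints userA-queue
        System.out.println("session-keyed: " + sessionKeyed()); // prints null
    }
}
```

In the thread-bound case user B inherits user A's setting because both calls land on the same thread. In the session-keyed case the worker thread carries no state of its own, so the same thread can serve both users safely.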
Configuration
1. Configure the listen port and host of the hive server
    <property>
      <name>hive.server2.thrift.port</name>
      <value>10000</value>
    </property>
    <property>
      <name>hive.server2.thrift.bind.host</name>
      <value>test84.hadoop</value>
    </property>
2. Configure Kerberos authentication, so that traffic between the Thrift client and Hive Server 2, and between Hive Server 2 and HDFS, is authenticated by Kerberos
    <property>
      <name>hive.server2.authentication</name>
      <value>KERBEROS</value>
      <description>
        Client authentication types.
          NONE: no authentication check
          LDAP: LDAP/AD based authentication
          KERBEROS: Kerberos/GSSAPI authentication
          CUSTOM: Custom authentication provider
                  (Use with property hive.server2.custom.authentication.class)
      </description>
    </property>
    <property>
      <name>hive.server2.authentication.kerberos.principal</name>
      <value>hadoop/_HOST@DIANPING.COM</value>
    </property>
    <property>
      <name>hive.server2.authentication.kerberos.keytab</name>
      <value>/etc/hadoop.keytab</value>
    </property>
3. Enable impersonation, so that the hive server executes statements as the submitting user. If it is set to false, statements run as the admin user that started the hive server daemon
    <property>
      <name>hive.server2.enable.doAs</name>
      <value>true</value>
    </property>
To start the server, run $HIVE_HOME/bin/hive --service hiveserver2 or
$HIVE_HOME/bin/hiveserver2. Either one calls the main method of org.apache.hive.service.server.HiveServer2.
The hive log then shows the following:
    2013-09-17 14:59:21,081 INFO server.HiveServer2 (HiveStringUtils.java:startupShutdownMessage(604)) - STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting HiveServer2
    STARTUP_MSG: host = test84.hadoop/10.1.77.84
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.11.0
    STARTUP_MSG: classpath = (omitted)
    2013-09-17 14:59:21,957 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(633)) - Login successful for user hadoop/test84.hadoop@DIANPING.COM using keytab file /etc/hadoop.keytab
    2013-09-17 14:59:21,958 INFO service.AbstractService (AbstractService.java:init(89)) - Service:OperationManager is inited.
    2013-09-17 14:59:21,958 INFO service.AbstractService (AbstractService.java:init(89)) - Service:SessionManager is inited.
    2013-09-17 14:59:21,958 INFO service.AbstractService (AbstractService.java:init(89)) - Service:CLIService is inited.
    2013-09-17 14:59:21,959 INFO service.AbstractService (AbstractService.java:init(89)) - Service:ThriftCLIService is inited.
    2013-09-17 14:59:21,959 INFO service.AbstractService (AbstractService.java:init(89)) - Service:HiveServer2 is inited.
    2013-09-17 14:59:21,959 INFO service.AbstractService (AbstractService.java:start(104)) - Service:OperationManager is started.
    2013-09-17 14:59:21,960 INFO service.AbstractService (AbstractService.java:start(104)) - Service:SessionManager is started.
    2013-09-17 14:59:21,960 INFO service.AbstractService (AbstractService.java:start(104)) - Service:CLIService is started.
    2013-09-17 14:59:22,007 INFO metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(409)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    2013-09-17 14:59:22,032 INFO metastore.ObjectStore (ObjectStore.java:initialize(222)) - ObjectStore, initialize called
    2013-09-17 14:59:22,955 INFO metastore.ObjectStore (ObjectStore.java:getPMF(267)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    2013-09-17 14:59:23,000 INFO metastore.ObjectStore (ObjectStore.java:setConf(205)) - Initialized ObjectStore
    2013-09-17 14:59:23,909 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(452)) - 0: get_databases: default
    2013-09-17 14:59:23,912 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(238)) - ugi=hadoop/test84.hadoop@DIANPING.COM ip=unknown-ip-addr cmd=get_databases: default
    2013-09-17 14:59:23,933 INFO service.AbstractService (AbstractService.java:start(104)) - Service:ThriftCLIService is started.
    2013-09-17 14:59:23,948 INFO service.AbstractService (AbstractService.java:start(104)) - Service:HiveServer2 is started.
    2013-09-17 14:59:24,025 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(633)) - Login successful for user hadoop/test84.hadoop@DIANPING.COM using keytab file /etc/hadoop.keytab
    2013-09-17 14:59:24,047 INFO thrift.ThriftCLIService (ThriftCLIService.java:run(435)) - ThriftCLIService listening on test84.hadoop/10.1.77.84:10000
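The log shows that HiveServer2 is now a composite service made up of a group of services: OperationManager, SessionManager, CLIService, and ThriftCLIService. During initialization it opens a HiveMetaStore connection and calls get_databases to test it. Finally it starts the Thrift server (actually a TThreadPoolServer), which listens on test84.hadoop/10.1.77.84:10000.
In addition, the Hadoop FileSystem cache stores file system objects under a key that combines the URI scheme, the authority, the UGI (the current user), and a unique value. This causes a memory leak in the hive server: "jmap -histo pid" shows a very large number of FileSystem objects that occupy a lot of space. The file system cache therefore has to be disabled with the following parameters when the hive server is started: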
    $HIVE_HOME/bin/hive --service hiveserver2 --hiveconf fs.hdfs.impl.disable.cache=true --hiveconf fs.file.impl.disable.cache=true
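Why a cache keyed this way can grow without bound is easier to see in a toy model. The sketch below is not Hadoop code. It assumes, for illustration, that every impersonated request arrives with a freshly created user object and that the cache key compares that object by identity rather than by user name. The class FsCacheSketch, its nested types, and the authority namenode:8020 are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model (not Hadoop code) of a cache keyed by scheme, authority and a per-request user object. */
final class FsCacheSketch {

    /** Stand-in for a user identity that is compared by identity, not by name. */
    static final class User {
        final String name;
        User(String name) { this.name = name; }
        // equals/hashCode are deliberately not overridden:
        // two User("alice") objects count as two different users.
    }

    static final class Key {
        final String scheme;
        final String authority;
        final User user;

        Key(String scheme, String authority, User user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority) && user == k.user;
        }

        @Override public int hashCode() {
            return Objects.hash(scheme, authority, System.identityHashCode(user));
        }
    }

    private final Map<Key, Object> cache = new HashMap<>();
    private final boolean cacheDisabled;

    FsCacheSketch(boolean cacheDisabled) { this.cacheDisabled = cacheDisabled; }

    /** Returns a "file system" object; with the cache disabled nothing is retained. */
    Object get(String scheme, String authority, User user) {
        if (cacheDisabled) return new Object();
        return cache.computeIfAbsent(new Key(scheme, authority, user), k -> new Object());
    }

    int size() { return cache.size(); }

    /** Simulates the given number of requests, each with a fresh identity for the same user name. */
    static int entriesAfter(int requests, boolean cacheDisabled) {
        FsCacheSketch fs = new FsCacheSketch(cacheDisabled);
        for (int i = 0; i < requests; i++) {
            fs.get("hdfs", "namenode:8020", new User("alice"));
        }
        return fs.size();
    }

    public static void main(String[] args) {
        System.out.println("cache enabled:  " + entriesAfter(1000, false)); // prints 1000
        System.out.println("cache disabled: " + entriesAfter(1000, true));  // prints 0
    }
}
```

Under these assumptions the cache never gets a hit for an impersonated user, so every request leaves one more entry behind. Disabling the cache trades that growth for the cost of creating a new object per request.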
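1. Accessing Hive Server 2 with Beeline
Beeline is a new interactive CLI introduced in Hive 0.11. It is based on SQLLine and acts as a Hive JDBC client for Hive Server 2. Each running Beeline instance maintains one session.
Because Kerberos authentication is enabled, you need a Kerberos ticket on the local machine, and the connection URL must name the service principal of Hive Server 2, here principal=hadoop/test84.hadoop@DIANPING.COM. The username and password can be left blank; subsequent statements run as the user whose principal is in the current ticket cache.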
    -dpsh-3.2$ bin/beeline
    Beeline version 0.11.0 by Apache Hive
    beeline> !connect jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM
    scan complete in 2ms
    Connecting to jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM
    Enter username for jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM:
    Enter password for jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM:
    Connected to: Hive (version 0.11.0)
    Driver: Hive (version 0.11.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://test84.hadoop:10000/default> select count(1) from abc;
    +------+
    | _c0  |
    +------+
    | 0    |
    +------+
    1 row selected (29.277 seconds)
    0: jdbc:hive2://test84.hadoop:10000/default> !q
    Closing: org.apache.hive.jdbc.HiveConnection
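The Thrift client and server establish a session handle with a unique HandleIdentifier (the SessionID). The SessionManager inside CLIService manages all sessions and maintains the mapping from SessionHandle to HiveSession. A HiveSession holds the SessionConf and HiveConf. Each statement a user runs creates a new driver, which receives the HiveConf and then executes the statement; this is how Hive Server 2 supports concurrency. Each operation (there are several opTypes, for example EXECUTE_STATEMENT) produces its own OperationHandle, which has its own HandleIdentifier as well. Typing "!q" in Beeline destroys the session and releases its resources.
P.S.: One annoyance in practice is that no progress information is shown while a MapReduce job runs. With a long-running statement you wait a long time without any feedback.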
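The relationship between session handles, per-statement drivers, and operation handles described above can be sketched roughly as follows. This is a simplified model, not the Hive source: the class HandleSketch and its nested Session and Operation types are hypothetical, a UUID stands in for a HandleIdentifier, and copying the session's configuration map stands in for passing the HiveConf to a newly created driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model (not Hive source) of session handles, per-statement drivers and operation handles. */
final class HandleSketch {

    /** One statement execution; owns a private snapshot of the session's configuration. */
    static final class Operation {
        final UUID handle = UUID.randomUUID(); // stands in for the operation's HandleIdentifier
        final String opType;
        final String statement;
        final Map<String, String> conf;

        Operation(String opType, String statement, Map<String, String> sessionConf) {
            this.opType = opType;
            this.statement = statement;
            this.conf = new HashMap<>(sessionConf); // like handing the conf to a fresh driver
        }
    }

    /** Per-session state, looked up by handle on every call. */
    static final class Session {
        final UUID handle = UUID.randomUUID(); // stands in for the SessionID
        final Map<String, String> conf = new ConcurrentHashMap<>();
        final Map<UUID, Operation> operations = new ConcurrentHashMap<>();

        /** Creates a new operation for the statement and returns its handle. */
        UUID executeStatement(String sql) {
            Operation op = new Operation("EXECUTE_STATEMENT", sql, conf);
            operations.put(op.handle, op);
            return op.handle;
        }
    }

    // Mapping from session handle to session, as a session manager would keep it.
    private final Map<UUID, Session> sessions = new ConcurrentHashMap<>();

    UUID openSession() {
        Session s = new Session();
        sessions.put(s.handle, s);
        return s.handle;
    }

    Session getSession(UUID handle) {
        Session s = sessions.get(handle);
        if (s == null) throw new IllegalArgumentException("unknown session " + handle);
        return s;
    }

    /** Destroys the session and everything it owns, as "!q" does in Beeline. */
    void closeSession(UUID handle) {
        Session s = sessions.remove(handle);
        if (s != null) s.operations.clear();
    }

    int openSessions() { return sessions.size(); }
}
```

Because each operation works on its own copy of the configuration, a later set in the same session does not disturb a statement that is already running, and no worker thread needs to remember anything between calls.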
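2. JDBC
The driver class name for Hive Server 1 is org.apache.hadoop.hive.jdbc.HiveDriver; for Hive Server 2 it is org.apache.hive.jdbc.HiveDriver. The two are easy to confuse.
You can also specify HiveConf parameters and variables in the connection URL. The HiveConf parameters follow '?', entries within a list are separated by ';', and '#' separates the parameters from the variables. All of these are session-level: after the session is established, Hive first applies them by executing set statements of the form set hiveconf:key=value.
For example:
1. With HiveConf parameters and variables: jdbc:hive2://test84.hadoop:10000/default?hive.cli.conf.printheader=true#stab=salesTable;icol=customerID
2. With session variables: jdbc:hive2://test84.hadoop:10000/default;user=foo;password=bar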
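To make the URL layout concrete, here is a minimal parsing sketch. It is not the Hive driver's own parser: the class Hive2UrlSketch and its fields are hypothetical, the layout assumed is jdbc:hive2://host:port/db;sessionVars?hiveConfs#hiveVars, and the set hiveconf: / set hivevar: statement forms in setStatements() are an assumption about what a client would send.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the connection URL layout described above; not the Hive driver's parser. */
final class Hive2UrlSketch {
    final String hostPortDb;
    final Map<String, String> sessionVars; // after ';' (e.g. user, password, principal)
    final Map<String, String> hiveConfs;   // after '?'
    final Map<String, String> hiveVars;    // after '#'

    private Hive2UrlSketch(String hostPortDb, Map<String, String> sessionVars,
                           Map<String, String> hiveConfs, Map<String, String> hiveVars) {
        this.hostPortDb = hostPortDb;
        this.sessionVars = sessionVars;
        this.hiveConfs = hiveConfs;
        this.hiveVars = hiveVars;
    }

    static Hive2UrlSketch parse(String url) {
        String prefix = "jdbc:hive2://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("not a hive2 URL: " + url);
        }
        String rest = url.substring(prefix.length());
        String vars = "", confs = "", sess = "";

        // Peel the sections off from the right: '#' first, then '?', then the first ';'.
        int hash = rest.indexOf('#');
        if (hash >= 0) { vars = rest.substring(hash + 1); rest = rest.substring(0, hash); }
        int question = rest.indexOf('?');
        if (question >= 0) { confs = rest.substring(question + 1); rest = rest.substring(0, question); }
        int semi = rest.indexOf(';');
        if (semi >= 0) { sess = rest.substring(semi + 1); rest = rest.substring(0, semi); }

        return new Hive2UrlSketch(rest, toMap(sess), toMap(confs), toMap(vars));
    }

    /** Splits "k1=v1;k2=v2" into an ordered map; entries without '=' are ignored. */
    private static Map<String, String> toMap(String list) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String pair : list.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) m.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return m;
    }

    /** Statements a client could run right after opening the session (assumed form). */
    List<String> setStatements() {
        List<String> out = new ArrayList<>();
        hiveConfs.forEach((k, v) -> out.add("set hiveconf:" + k + "=" + v));
        hiveVars.forEach((k, v) -> out.add("set hivevar:" + k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        Hive2UrlSketch u = parse("jdbc:hive2://test84.hadoop:10000/default"
                + "?hive.cli.conf.printheader=true#stab=salesTable;icol=customerID");
        System.out.println(u.hiveConfs); // prints {hive.cli.conf.printheader=true}
        System.out.println(u.hiveVars);  // prints {stab=salesTable, icol=customerID}
    }
}
```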
Sample code:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class HiveTest {

        // Hive Server 2 JDBC URL; principal is the server's Kerberos service principal.
        private static final String URL =
                "jdbc:hive2://test84.hadoop:10000/default;principal=hadoop/test84.hadoop@DIANPING.COM";

        public static void main(String[] args) throws SQLException {
            try {
                // Load the Hive Server 2 driver (not org.apache.hadoop.hive.jdbc.HiveDriver).
                Class.forName("org.apache.hive.jdbc.HiveDriver");
            } catch (ClassNotFoundException e) {
                // Nothing below can work without the driver, so fail fast.
                throw new IllegalStateException("Hive JDBC driver not found on the classpath", e);
            }

            String sql = "select * from abc";
            System.out.println("Running: " + sql);

            // Username and password stay empty: Kerberos uses the local ticket cache.
            // try-with-resources closes the result set, statement and connection.
            try (Connection conn = DriverManager.getConnection(URL, "", "");
                 Statement stmt = conn.createStatement();
                 ResultSet res = stmt.executeQuery(sql)) {

                // Print the type and name of every column.
                ResultSetMetaData rsmd = res.getMetaData();
                int columnCount = rsmd.getColumnCount();
                for (int i = 1; i <= columnCount; i++) {
                    System.out.println(rsmd.getColumnTypeName(i) + ":" + rsmd.getColumnName(i));
                }

                // Assumes the first column of abc is an int and the second a string.
                while (res.next()) {
                    System.out.println(res.getInt(1) + "\t" + res.getString(2));
                }
            }
        }
    }
HiveStatement now supports cancelling a statement: calling Statement.cancel() terminates and destroys the driver that is executing it.
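One way to use cancellation is as a client-side timeout. The helper below is a minimal sketch that relies only on the standard java.sql.Statement.cancel() method; the class CancelSketch, the method cancelAfter, and the idea of scheduling the cancel on a timer are assumptions for illustration, not part of Hive.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: cancel a running statement from another thread after a delay. */
final class CancelSketch {

    /**
     * Schedules stmt.cancel() to run after the given delay.
     * The caller should cancel the returned future once the query has finished,
     * so that the timer does not fire against a statement that is already done.
     */
    static ScheduledFuture<?> cancelAfter(ScheduledExecutorService timer,
                                          Statement stmt, long delay, TimeUnit unit) {
        return timer.schedule(() -> {
            try {
                stmt.cancel(); // asks the server to stop the statement
            } catch (SQLException e) {
                System.err.println("cancel failed: " + e.getMessage());
            }
        }, delay, unit);
    }
}
```

A caller would create the timer with Executors.newSingleThreadScheduledExecutor(), call cancelAfter just before executeQuery, and call cancel(false) on the returned future as soon as the query returns normally.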
Note: if Kerberos authentication fails, start the client JVM with the option "-Dsun.security.krb5.debug=true" to see detailed debug output.