thrift.transport.TTransport.TTransportException: TSocket read 0 bytes when using Python with pyhs2 to connect to HiveServer2 and operate a Hive database

Source: Internet    Editor: 程序博客网    Time: 2024/06/06 01:25

When Python uses the pyhs2 package to connect to HiveServer2 and operate a Hive database, the Python side fails with the following error:

thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

or

Required field 'sessionHandle' is unset! Struct:TExecuteStatementReq(sessionHandle:null, statement:USE default, confOverlay:{})

The HiveServer2 log reports the following errors:
2017-10-12T14:24:03,540  WARN [HiveServer2-Handler-Pool: Thread-39] service.CompositeService: Failed to open session
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
………………
ERROR [HiveServer2-Handler-Pool: Thread-39] server.TThreadPoolServer: Thrift error occurred during processing of message.

org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?

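Preface:

I recently started working on big-data projects. Being interested in the related software, such as Hadoop, HBase, Hive, Thrift and ZooKeeper, I used gaps in my work to set up pseudo-distributed installations of them on my local Mac. A few days ago I finished a small Python program that operates HBase through Thrift, and next I wanted to operate Hive from Python in the same way. That program works now, but getting there was bumpy, mainly because of the problem described below. Searching Baidu turned up few answers, and the ones I found were confusing. It cost me almost a day until I solved it with help from Luo (罗大神). This post only documents how the problem was solved. Thanks again to Luo and Feng (峰哥) for their help. The lesson: make full use of the logs!

Enough chatter, on to the main text. This post first states the problem, then tracks down the cause of the error, then gives the cause analysis and the fix, and finally adds a note on the difference between HiveServer and HiveServer2.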

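I. Stating the problem
1. After starting MySQL, Hadoop and Hive, start hiveserver2 last.
To connect to the Hive server from Python in HiveServer2 mode, start HiveServer2 and leave it running in the background:

    localhost:bin a6$ pwd
    /Users/a6/Applications/apache-hive-2.3.0-bin/bin
    localhost:bin a6$ hive --service hiveserver2 &

The default port is 10000.
A different port can be specified at startup with the following command:

    hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001 &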
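
Before running any client code it can help to confirm that something is actually listening on the HiveServer2 port. The following is a minimal sketch using only the Python standard library; the helper name is_port_open is hypothetical, and localhost:10000 is assumed from the default port mentioned above. It only checks that a TCP connection can be opened, not that the listener is HiveServer2 or that a session can be created.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # Assumed values: the default HiveServer2 Thrift port on the local machine.
    print(is_port_open("localhost", 10000))
```

A True result here combined with a failing pyhs2 connection points at the session setup (authentication, impersonation) rather than at the network.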

2. Install the pyhs2 Python package. The output below shows that it is already installed on my machine.

    localhost:bin a6$ sudo pip install pyhs2
    Password:
    The directory '/Users/a6/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    The directory '/Users/a6/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
    Requirement already satisfied: pyhs2 in /Library/Python/2.7/site-packages
    Requirement already satisfied: sasl in /Library/Python/2.7/site-packages (from pyhs2)
    Requirement already satisfied: thrift in /Library/Python/2.7/site-packages/thrift-0.10.0-py2.7-macosx-10.12-intel.egg (from pyhs2)
    Requirement already satisfied: six in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from sasl->pyhs2)

3. The Python code that uses pyhs2 to operate Hive is as follows:

    import pyhs2

    # Connect to HiveServer2 on the default port without SASL authentication.
    with pyhs2.connect(host='localhost',
                       port=10000,
                       authMechanism="NOSASL",
                       user='a6',
                       password=''
                       # password='anonymous'
                       ) as conn:
        with conn.cursor() as cur:
            # Show databases
            print("connect hive database success")
            print(cur.getDatabases())
            print("read data success")

II. Finding the cause

1. The Python run window reports the following error:

    /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/a6/Downloads/PycharmProjects/test_use_hbase_by_thrift/test11.py
    Traceback (most recent call last):
      File "/Users/a6/Downloads/PycharmProjects/test_use_hbase_by_thrift/test11.py", line 13, in <module>
        print "sucess"
      File "/Library/Python/2.7/site-packages/pyhs2/connections.py", line 58, in __exit__
        self.close()
      File "/Library/Python/2.7/site-packages/pyhs2/connections.py", line 78, in close
        self.client.CloseSession(req)
      File "/Library/Python/2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 184, in CloseSession
        return self.recv_CloseSession()
      File "/Library/Python/2.7/site-packages/pyhs2/TCLIService/TCLIService.py", line 195, in recv_CloseSession
        (fname, mtype, rseqid) = self._iprot.readMessageBegin()
      File "build/bdist.macosx-10.12-intel/egg/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
      File "build/bdist.macosx-10.12-intel/egg/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
      File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TTransport.py", line 60, in readAll
      File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TTransport.py", line 161, in read
      File "build/bdist.macosx-10.12-intel/egg/thrift/transport/TSocket.py", line 132, in read
    thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

2. Looking up the Hive execution log through the web UI

Starting with version 2.0, Hive provides a simple web UI for HiveServer2. It shows the currently connected sessions, historical logs, configuration parameters and metrics.

1). View and configure the HiveServer2 web UI settings

    localhost:conf a6$ pwd
    /Users/a6/Applications/apache-hive-2.3.0-bin/conf
    localhost:conf a6$ vi hive-site.xml

Configuring the web UI is simple and takes two parameters:

    <property>
      <name>hive.server2.webui.host</name>
      <value>0.0.0.0</value>
      <description>The host address the HiveServer2 WebUI will listen on</description>
    </property>
    <property>
      <name>hive.server2.webui.port</name>
      <value>10002</value>
      <description>The port the HiveServer2 WebUI will listen on. This can be set to 0 or a negative integer to disable the web UI</description>
    </property>

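After changing the configuration file, HiveServer2 must be restarted. Then enter one of the following in a browser:

    http://localhost:10002/    or    http://127.0.0.1:10002/

This opens the HiveServer2 web UI, where the execution logs can be browsed conveniently.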
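
To double-check which address the web UI will use, the two properties can be read back from the configuration file. The sketch below is one way to do that with the Python standard library, assuming the usual Hadoop-style layout of property elements inside a configuration root; the function names and SAMPLE_XML are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def webui_url(props):
    """Build a browsable URL from the two web UI properties."""
    host = props.get("hive.server2.webui.host", "0.0.0.0")
    port = props.get("hive.server2.webui.port", "10002")
    # 0.0.0.0 means "listen on all interfaces"; browse it via localhost.
    if host == "0.0.0.0":
        host = "localhost"
    return "http://%s:%s/" % (host, port)


# Illustrative input that mirrors the two properties shown above.
SAMPLE_XML = """
<configuration>
  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(webui_url(read_properties(SAMPLE_XML)))
```

For a real hive-site.xml, the file contents can be read in binary mode and passed to read_properties in place of SAMPLE_XML.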

2). Use the HiveServer2 web UI to view the execution record
Enter the following in a browser:

    http://localhost:10002

Select "Local logs" -> "hive.log", then scroll to the bottom to find the error.

    http://localhost:10002/logs/hive.log

The error is as follows:

    2017-10-12T14:20:45,755  INFO [HiveServer2-Handler-Pool: Thread-42] session.SessionState: Resetting thread name to  HiveServer2-Handler-Pool: Thread-42
    2017-10-12T14:20:45,760  WARN [HiveServer2-Handler-Pool: Thread-42] thrift.ThriftCLIService: Error opening session:
    org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
            at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:419) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:362) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.cli.CLIService.openSessionWithImpersonation(CLIService.java:193) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.cli.thrift.ThriftCLIService.getSessionHandle(ThriftCLIService.java:440) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:322) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1377) ~[hive-exec-2.3.0.jar:2.3.0]
            at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1362) ~[hive-exec-2.3.0.jar:2.3.0]
            at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
            at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
            at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.0.jar:2.3.0]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
            at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.0.jar:2.3.0]
            at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.0.jar:2.3.0]
            at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
            at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_131]
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692) ~[hadoop-common-2.6.5.jar:?]
            at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.0.jar:2.3.0]
            at com.sun.proxy.$Proxy37.open(Unknown Source) ~[?:?]
            at org.apache.hive.service.cli.session.SessionManager.createSession(SessionManager.java:410) ~[hive-service-2.3.0.jar:2.3.0]
            ... 13 more
    Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: a6 is not allowed to impersonate anonymous
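
The decisive line in such a log is easy to miss among the stack frames. As a rough sketch (not part of the original troubleshooting), the following Python snippet pulls the two user names out of the "is not allowed to impersonate" message with a regular expression. The function name and SAMPLE_LOG are hypothetical; the sample text is copied from the log excerpt above.

```python
import re

# Matches the Hadoop authorization message seen in hive.log.
IMPERSONATE_RE = re.compile(r"User: (\S+) is not allowed to impersonate (\S+)")


def find_impersonation_errors(log_text):
    """Return distinct (proxy_user, target_user) pairs in order of appearance."""
    seen = []
    for match in IMPERSONATE_RE.finditer(log_text):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen


# Two lines taken from the log excerpt in this post.
SAMPLE_LOG = (
    "2017-10-12T14:20:45,760  WARN [HiveServer2-Handler-Pool: Thread-42] "
    "thrift.ThriftCLIService: Error opening session:\n"
    "org.apache.hive.service.cli.HiveSQLException: Failed to open new session: "
    "java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException("
    "org.apache.hadoop.security.authorize.AuthorizationException): "
    "User: a6 is not allowed to impersonate anonymous\n"
)

if __name__ == "__main__":
    print(find_impersonation_errors(SAMPLE_LOG))  # [('a6', 'anonymous')]
```

The first name in each pair is the account HiveServer2 runs as, and it is the one that needs proxy-user permission in Hadoop.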
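III. Cause analysis and solution
1. Cause analysis:

When Python connects to Hive and reports the following message:

Required field 'sessionHandle' is unset! Struct:TExecuteStatementReq(sessionHandle:null, statement:USE default, confOverlay:{})

it means that something is wrong with the parameters used to connect to Hive.

In my case the Hive account was the problem: it did not have sufficient permission.

Check the Hive username and the other connection settings.

Another possible cause is that the hive-jdbc version used by the project does not match the server; replacing it with the version the server uses fixes that. PS: early Hive versions had many bugs, so using the latest version is recommended.

My error was caused by the user running the Hive query being different from the user that Hadoop and Hive were configured for. The HiveServer2 log says as much: HiveServer2 runs as a6 and tries to act on behalf of the client user anonymous, and Hadoop refuses because a6 is not allowed to impersonate other users.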

2. Solution
1). Edit the Hadoop configuration file etc/hadoop/core-site.xml and add the following entries. The a6 in the property names is the user that HiveServer2 runs as.

    <!--10-12 add-->
    <property>
        <name>hadoop.proxyuser.a6.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.a6.groups</name>
        <value>*</value>
    </property>

2). The final configuration is shown in the figure below:

    [Figure: final core-site.xml configuration]

3). After editing core-site.xml, the Hadoop services (mainly HDFS) must be restarted.

    localhost:hadoop a6$ pwd
    /Users/a6/Applications/hadoop-2.6.5/etc/hadoop
    localhost:hadoop a6$ sh ../../sbin/start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    17/10/12 15:07:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-namenode-localhost.out
    localhost: starting datanode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-datanode-localhost.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /Users/a6/Applications/hadoop-2.6.5/logs/hadoop-a6-secondarynamenode-localhost.out
    17/10/12 15:08:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    starting yarn daemons
    starting resourcemanager, logging to /Users/a6/Applications/hadoop-2.6.5/logs/yarn-a6-resourcemanager-localhost.out
    localhost: starting nodemanager, logging to /Users/a6/Applications/hadoop-2.6.5/logs/yarn-a6-nodemanager-localhost.out
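
The two proxy-user entries from step 1) follow a fixed naming pattern, hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups, so they can be generated for any account. Below is a small Python 3 sketch of that pattern; the helper names are hypothetical, and the wildcard value "*" is taken from this post's local test setup (on a shared cluster a narrower host and group list would probably be preferable).

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user, hosts="*", groups="*"):
    """Build the two property elements that make `user` a Hadoop proxy user."""
    elements = []
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = "hadoop.proxyuser.%s.%s" % (user, suffix)
        ET.SubElement(prop, "value").text = value
        elements.append(prop)
    return elements


def render(elements):
    """Serialize the elements as text ready to paste into core-site.xml."""
    return "\n".join(ET.tostring(e, encoding="unicode") for e in elements)


if __name__ == "__main__":
    # "a6" is the account HiveServer2 runs as in this post.
    print(render(proxyuser_properties("a6")))
```

The printed elements go inside the configuration root of core-site.xml, after which Hadoop has to be restarted as in step 3).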

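IV. The difference between HiveServer and HiveServer2

In my earlier study and use of Hive I always worked through the CLI or hive -e. That approach only allows queries, updates and similar operations to be run as HiveQL, and it is clumsy and limited. Fortunately Hive also provides a thin-client option: through HiveServer or HiveServer2, a client can operate on the data in Hive without starting the CLI. Both allow remote clients written in languages such as Java and Python to submit requests to Hive and fetch the results. Both are based on Thrift, but HiveServer is sometimes called the Thrift server, whereas HiveServer2 is not. Why is HiveServer2 needed when HiveServer already exists? Because HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by changing the HiveServer code. Hive 0.11.0 therefore rewrote the HiveServer code as HiveServer2, which solved the problem. HiveServer2 supports concurrency and authentication for multiple clients and gives better support to open API clients such as JDBC and ODBC.

Since HiveServer2 is the more capable of the two, I will focus on learning it, but I will also take a brief look at how HiveServer is used. Running hive --service help on the command line lists the available services. A specific service such as cli, hiveserver or hiveserver2 can be started with hive <parameters> --service serviceName <serviceparameters>.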

