Problems running a Python script on Hive 0.8
I recently ran into the following problem executing a Python script from Hive. At the Hive CLI, the query fails with this error output:
hive> from records
> select transform(year,temperature,quality)
> using 'python /user/hive/script/is_good_quality.py'
> as year,temperature;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201112291016_0023, Tracking URL = http://10.200.187.26:50030/jobdetails.jsp?jobid=job_201112291016_0023
Kill Command = /opt/hadoop-0.20.205.0/libexec/../bin/hadoop job -Dmapred.job.tracker=10.200.187.26:9001 -kill job_201112291016_0023
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2011-12-29 14:56:34,192 Stage-1 map = 0%, reduce = 0%
2011-12-29 14:57:16,405 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201112291016_0023 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201112291016_0023_m_000002 (and more) from job job_201112291016_0023
Exception in thread "Thread-248" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
In the Hadoop log file (/opt/hadoop-0.20.205.0/logs/hadoop-root-jobtracker-chenyi3.log), the error reads:
2011-12-29 14:57:06,865 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201112291016_0023_m_000000_3: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hit error while closing ..
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:452)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
... 7 more
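For context, is_good_quality.py is a TRANSFORM script: Hive writes each input row to the script's stdin as tab-separated fields and reads tab-separated rows back from stdout. The script itself is not reproduced in this post, so the following is only a minimal sketch of that shape; the column layout and the filter values (the +9999 missing-temperature sentinel and the [01459] quality codes) are assumptions, not the original code.

```python
#!/usr/bin/env python
# is_good_quality.py -- sketch of a Hive TRANSFORM script (the real script
# is not shown in the post; filter values here are assumptions).
import re
import sys

def is_good(temperature, quality):
    # Hypothetical filter: drop the missing-value sentinel and keep only
    # rows whose quality code starts with one of the "good" digits.
    return temperature != "+9999" and re.match(r"[01459]", quality) is not None

def transform(lines):
    # Hive sends tab-separated fields on stdin; emit tab-separated rows.
    for line in lines:
        year, temperature, quality = line.rstrip("\n").split("\t")
        if is_good(temperature, quality):
            yield "%s\t%s" % (year, temperature)

if __name__ == "__main__":
    for row in transform(sys.stdin):
        print(row)
```

If the script exits non-zero or raises on malformed input, Hive surfaces it as exactly the "Hive Runtime Error while closing operators" seen above, so a defensive split/filter is worth having.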
I first worked through a checklist I had found earlier:
A few things I'd check for if I were debugging this:
1) Is the Python file set to be executable (chmod +x file.py)?
2) Make sure the Python file is in the same place on all machines. Probably better: put the file in HDFS, then you can use "using 'hdfs://path/to/file.py'" instead of a local path.
3) Take a look at your job on the Hadoop dashboard (http://master-node:9100); clicking a failed task shows the actual Java error and stack trace, so you can see what actually went wrong with the execution.
4) Make sure Python is installed on all the slave nodes! (I always overlook this one.)
Hope that helps.
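On point 2, an alternative worth noting (not tried in the original post) is Hive's ADD FILE, which ships the script to every task node through the distributed cache, so USING only needs the bare file name rather than an identical local path on every machine. The path below matches the one in the failing query:

```sql
ADD FILE /user/hive/script/is_good_quality.py;

FROM records
SELECT TRANSFORM(year, temperature, quality)
USING 'python is_good_quality.py'
AS year, temperature;
```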
Even after all of these checks the query still failed, and for the moment I was stuck (if any reader knows the answer, please let me know)...
After several days of effort I finally solved the problem: the root cause was the configuration of /etc/hosts. For details, see my other post, "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.".
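For reference, the usual /etc/hosts pitfall on Hadoop clusters is the machine's own hostname resolving to 127.0.0.1 (a default on some distributions), which makes task log URLs and shuffle fetches advertise an unreachable loopback address; that is consistent with both the "Connection refused" on the task log URL above and the shuffle error in the linked post. A working layout is roughly the following sketch, using the JobTracker address from the log; the slave entry is illustrative:

```
127.0.0.1       localhost
10.200.187.26   chenyi3     # master (jobtracker); must NOT also map to 127.0.0.1
10.200.187.27   chenyi4     # slave node -- illustrative IP/hostname
```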