【MapReduce】Streaming Job Failed!
When the error occurred:
I wrote an MR program in Python, and it ran fine when I tested it locally on Linux.
As soon as I tested it on Hadoop, it failed.
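For context, a local test of a streaming job usually amounts to piping the input through the mapper, a sort, and the reducer. The post does not show mapper.py or reducer.py, so the word count below is a purely hypothetical stand-in, and `run_locally` is an illustrative name rather than anything from Hadoop:

```python
import itertools


def mapper(lines):
    """Hypothetical mapper: emit one 'word<TAB>1' record per token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Hypothetical reducer: sum counts of consecutive records with the same key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Mimic `cat input | mapper | sort | reducer` inside one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["a b a", "b c"]):
        print(record)
```

A passing local run like this only shows that the script logic is sound; it says nothing about whether the cluster can load the streaming classes.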
My environment:
$ hadoop version
Hadoop 2.5.2...
The command I ran:
hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar \
    -file ./mapper.py -mapper ./mapper.py \
    -file ./reducer.py -reducer ./reducer.py \
    -input /data/poem/data_test \
    -output /data/poem/result
The error output:
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob: map 100% reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!
I tracked down the log file and found that the actual error was:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
    at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
    ... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
    ... 9 more
The key part of the error is:
java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
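Picking that line out of a long trace by eye is easy to get wrong. As a rough sketch (the helper name `root_cause` is made up here, and the regex assumes the usual "Caused by: ... at ..." layout of a Java stack trace), the innermost cause can be extracted like this:

```python
import re

# Capture the text after each "Caused by:" up to the first stack frame
# (" at ...") or the end of the line, whichever comes first.
CAUSE = re.compile(r"Caused by:\s*(.*?)(?=\s+at\s|$)", re.MULTILINE)


def root_cause(trace):
    """Return the last (innermost) 'Caused by:' message, or None if there is none."""
    causes = CAUSE.findall(trace)
    return causes[-1].strip() if causes else None


if __name__ == "__main__":
    sample = (
        "Error: java.lang.RuntimeException: wrapper\n"
        "\tat a.B.c(B.java:1)\n"
        "Caused by: java.lang.ClassNotFoundException: Class "
        "org.apache.hadoop.streaming.PipeMapRunner not found\n"
        "\tat d.E.f(E.java:2)\n"
    )
    print(root_cause(sample))
```

The innermost cause is usually the most specific one; the outer RuntimeException wrappers only repeat it.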
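Tracking down the error
1. A bug in the MR scripts:
The scripts ran correctly in the local test, so this is ruled out.
2. A misconfigured environment:
A test run with Hadoop's bundled example jar worked fine, so this is ruled out too.
3. A problem with the jar:
Since the error is a ClassNotFoundException, the jar should have been my first suspect. The jar may not match the Hadoop version.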
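One generic way to probe the jar suspicion is to look inside the jar itself, since a jar is just a zip archive. The sketch below is not what the post did; `jar_has_class` is an illustrative helper that only checks whether a class file is present, which is one of several reasons a class might fail to load:

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar contains the .class entry for a fully qualified class name."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For the error above, the call would be `jar_has_class(path_to_jar, "org.apache.hadoop.streaming.PipeMapRunner")`. A `True` result would suggest looking at how the jar reaches the task classpath rather than at its contents.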
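The final fix:
I had downloaded my jar separately from the internet, because most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar
At first I couldn't find that path, so I assumed I had to download the jar myself.
In the end I found that in Hadoop 2.5.2 the corresponding jar lives in:
$HADOOP_INSTALL_HOME/share/hadoop/tools/lib
That's hidden a bit too deep (′д` )…彡…彡! It took me ages to find!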
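To save the hunting next time, the two locations mentioned in this post can be searched in one go. This is only a sketch: the candidate list covers just those two directories, other Hadoop versions may use different ones, and `find_streaming_jars` is a made-up helper name:

```python
import glob
import os

# The two locations mentioned in this post, relative to the Hadoop home.
CANDIDATE_DIRS = (
    ("share", "hadoop", "tools", "lib"),  # where Hadoop 2.5.2 keeps it
    ("contrib", "streaming"),             # the path most older tutorials give
)


def find_streaming_jars(hadoop_home):
    """Return every hadoop-*streaming*.jar found under the candidate directories."""
    jars = []
    for parts in CANDIDATE_DIRS:
        pattern = os.path.join(hadoop_home, *parts, "hadoop-*streaming*.jar")
        jars.extend(sorted(glob.glob(pattern)))
    return jars


if __name__ == "__main__":
    print(find_streaming_jars(os.environ.get("HADOOP_INSTALL_HOME", "")))
```

An empty result means neither directory holds a streaming jar, in which case a recursive search of the install directory is the next thing to try.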
The rewritten command:
hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
    -file ./mapper.py -mapper ./mapper.py \
    -file ./reducer.py -reducer ./reducer.py \
    -input /data/poem/data_test \
    -output /data/poem/result
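If the job is launched from a script rather than typed by hand, building the command as an argument list sidesteps shell line-continuation and spacing mistakes. A minimal sketch using the same options as the command above; `build_streaming_command` is an illustrative helper, and the jar path in the demo is a hypothetical install location:

```python
import shlex


def build_streaming_command(jar, mapper, reducer, input_path, output_path):
    """Build the argument list for a streaming job, mirroring the options used in this post."""
    return [
        "hadoop", "jar", jar,
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar",  # hypothetical path
        "./mapper.py",
        "./reducer.py",
        "/data/poem/data_test",
        "/data/poem/result",
    )
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

The list can be handed to `subprocess.run(cmd, check=True)`. Because no shell is involved in that case, the jar path has to be a concrete file name: a wildcard such as hadoop-*streaming*.jar would not be expanded.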
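Lessons learned:
- A ClassNotFound exception very likely means the jar doesn't match the Hadoop environment. My jar was too old. Basic, officially released jars like hadoop-streaming*.jar generally ship with the installation.
- Paths are likely to change between versions of the same software, so stay flexible. (Even CentOS 7 changed many commands compared with earlier releases.)
- The exception output printed on screen is often neither detailed nor precise. Besides reading the on-screen error, check the run logs for the full error report.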
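On the last point: the post does not say how the log file was located. One common route on YARN is `yarn logs -applicationId <id>`, which generally requires log aggregation to be enabled on the cluster. As a sketch, the application ID can be pulled out of the console output and turned into that command (`yarn_logs_command` is an illustrative helper name):

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID = re.compile(r"application_\d+_\d+")


def yarn_logs_command(console_output):
    """Build a `yarn logs` argument list from the first application ID in the output, or return None."""
    match = APP_ID.search(console_output)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(0)]


if __name__ == "__main__":
    line = "17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001"
    print(yarn_logs_command(line))
```

If aggregation is not enabled, the per-container logs usually have to be read on the worker nodes or through the tracking URL printed in the console output.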