【MapReduce】Streaming Job Failed!

Source: Internet  Editor: 程序博客网  Time: 2024/06/13 18:40

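When the error occurred:

I wrote a MapReduce program in Python, and it ran correctly in a local test on Linux.
Running the same program on Hadoop produced an error.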
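
A local test of a streaming script usually means piping sample input through the mapper, a sort, and the reducer. The following is a minimal sketch of that idea, assuming both scripts read stdin and write stdout. The function name `run_streaming_locally` is hypothetical, and the plain line sort only approximates Hadoop's shuffle.

```python
import subprocess

def run_streaming_locally(mapper_cmd, reducer_cmd, input_text):
    """Emulate `cat input | mapper | sort | reducer` on a single machine."""
    mapped = subprocess.run(
        mapper_cmd, input=input_text, capture_output=True, text=True, check=True
    ).stdout
    # Hadoop sorts mapper output by key before reducing; a plain
    # line sort is only an approximation of that shuffle step.
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    reduced = subprocess.run(
        reducer_cmd, input=shuffled, capture_output=True, text=True, check=True
    ).stdout
    return reduced
```

A call might look like `run_streaming_locally(["./mapper.py"], ["./reducer.py"], sample_text)`, where `sample_text` holds a few lines of the input data. A passing local run only shows that the scripts themselves work; it says nothing about the jar or the cluster.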

My environment:

$ hadoop version
Hadoop 2.5.2
...

Command executed:

hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar \
    -file ./mapper.py -mapper ./mapper.py \
    -file ./reducer.py -reducer ./reducer.py \
    -input /data/poem/data_test \
    -output /data/poem/result

Error output:

packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob:  map 0%  reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob:  map 50%  reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob:  map 100%  reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob:  map 100%  reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!

I located the log file and found the detailed error:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
        at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
        ... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
        ... 9 more

The key part of the error is:

java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
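
In a nested Java stack trace, the innermost "Caused by:" entry is usually the one worth reading first. The sketch below pulls that line out of a trace, assuming the trace has one entry per line as the JVM normally prints it. The helper name `root_cause` is hypothetical.

```python
def root_cause(stack_trace):
    """Return the innermost 'Caused by:' message, or the first non-empty line."""
    prefix = "Caused by: "
    causes = [
        line.strip()[len(prefix):]
        for line in stack_trace.splitlines()
        if line.strip().startswith(prefix)
    ]
    if causes:
        # The last 'Caused by:' in the text is the deepest cause.
        return causes[-1]
    for line in stack_trace.splitlines():
        if line.strip():
            return line.strip()
    return ""
```

Applied to the trace above, this would return the `ClassNotFoundException` line for `PipeMapRunner`.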

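Tracking down the error

1. A bug in the MR scripts:

The scripts ran correctly in the local test, so this is ruled out.

2. A misconfigured environment:

A test with Hadoop's example jar ran correctly, so this is ruled out.

3. A jar problem:

Since the error is a ClassNotFoundException, a jar problem should have been the first suspect. The jar may not match the Hadoop version.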
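
One rough way to test the version-mismatch suspicion is to compare the version embedded in the jar's file name with the output of `hadoop version`. This is only a sketch: it assumes the jar name carries a dotted version number (for example `hadoop-streaming-2.5.2.jar`), which not every download does, and the helper names are hypothetical.

```python
import re

def jar_version(jar_name):
    """Extract a dotted version such as '2.5.2' from a jar file name."""
    match = re.search(r"\d+(?:\.\d+)+", jar_name)
    return match.group(0) if match else None

def hadoop_version(version_output):
    """Extract the version from `hadoop version` output, e.g. 'Hadoop 2.5.2'."""
    match = re.search(r"Hadoop (\d+(?:\.\d+)+)", version_output)
    return match.group(1) if match else None

def versions_match(jar_name, version_output):
    """True only when both versions are found and are identical."""
    jar = jar_version(jar_name)
    return jar is not None and jar == hadoop_version(version_output)
```

A jar whose name carries no version at all, such as `hadoop-streaming.jar`, is reported as a non-match, which is a reasonable prompt to check where it came from.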

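Final fix:

I had downloaded my jar separately from the internet, because most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar.
I could not find that path at first and assumed I had to download the jar myself.

It turned out that in Hadoop 2.5.2 the corresponding jar is located at: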

$HADOOP_INSTALL_HOME/share/hadoop/tools/lib


It is buried a bit too deep (′д` )…彡…彡! It took me ages to find!
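
Rather than browsing directories by hand, the install tree can be searched for the bundled jar. A minimal sketch, assuming the jar name follows the `hadoop-*streaming*.jar` pattern used in the commands here; the function name `find_streaming_jars` is hypothetical, and skipping `sources` jars is a guess about what a distribution may also ship.

```python
from pathlib import Path

def find_streaming_jars(hadoop_home):
    """List streaming jars anywhere below the Hadoop install directory."""
    jars = []
    for path in Path(hadoop_home).rglob("hadoop-*streaming*.jar"):
        # Source archives match the pattern too but cannot run jobs.
        if path.name.endswith("sources.jar"):
            continue
        jars.append(str(path))
    return sorted(jars)
```

It could be called as `find_streaming_jars(os.environ["HADOOP_INSTALL_HOME"])` after `import os`.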

The rewritten command:

hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
    -file ./mapper.py -mapper ./mapper.py \
    -file ./reducer.py -reducer ./reducer.py \
    -input /data/poem/data_test \
    -output /data/poem/result

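Lessons learned:

  1. A ClassNotFound exception most likely means the jar does not match the Hadoop environment. My jar was too old. Official base jars such as hadoop-streaming*.jar normally ship with the software installation.
  2. Paths are likely to change between versions of the same software, so stay flexible. (Even CentOS 7 changed many commands compared with earlier versions.)
  3. The exception information printed on screen is often neither detailed nor precise. Besides reading the on-screen errors, check the run logs for the full error report.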
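
The post does not say how the log file was located. On YARN, one common route is `yarn logs -applicationId <id>`, which works when log aggregation is enabled on the cluster. The sketch below only builds that command from the console output; it assumes the output has one log record per line, and the helper name `yarn_logs_command` is hypothetical.

```python
import re

def yarn_logs_command(console_output):
    """Build a `yarn logs` command from the first application ID in the output."""
    match = re.search(r"application_\d+_\d+", console_output)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(0)]
```

The returned list can be passed to `subprocess.run` on a machine where the `yarn` client is configured.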