Problems encountered in Hadoop programs

1.
java.lang.Exception: java.lang.RuntimeException: java.lang.NoSuchMethodException: Hadoo$MRMapper.<init>()
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)

Cause: when the Mapper and Reducer are written as inner classes, they must be static.

Solution: add the static modifier.
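A minimal sketch of the fix (class names follow the stack trace above; the map() body is only illustrative):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Hadoo {
    // "static" is required: without it the nested class has no no-arg constructor
    // that the framework can call by reflection, hence the NoSuchMethodException on <init>()
    public static class MRMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // illustrative body: emit each input line with a count of 1
            context.write(value, new IntWritable(1));
        }
    }
}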


2.

job_local737230221_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.NullPointerException
    at mapredutest.Mapreduce$MRmapper.map(Mapreduce.java:40)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at mapredutest.Mapreduce$MRmapper.run(Mapreduce.java:28)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.

Problem: most likely a type mismatch. The key/value types declared on the Mapper must match the ones used inside map() (likewise for the Reducer), and the types configured on the Job must match them as well.
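A sketch of the consistency that is needed (driver class name, MRreducer, and paths are illustrative; MRmapper/MRreducer are assumed to be declared as Mapper<LongWritable, Text, Text, IntWritable> and Reducer<Text, IntWritable, Text, IntWritable> inside the Mapreduce class from the stack trace):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapreduceDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "type-consistent job");
        job.setJarByClass(MapreduceDriver.class);

        // the classes set here must be used with exactly the generic types they declare
        job.setMapperClass(Mapreduce.MRmapper.class);
        job.setReducerClass(Mapreduce.MRreducer.class);

        // map output types must match the Mapper's KEYOUT/VALUEOUT ...
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // ... and the final output types must match the Reducer's KEYOUT/VALUEOUT
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}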


3. Wanted the value in the output file to be a double, but got an error:

java.io.IOException: wrong value class: class org.apache.hadoop.io.DoubleWritable is not class org.apache.hadoop.io.IntWritable

Cause: the value emitted by map() is an IntWritable, while the Reducer emits a DoubleWritable. The job declares job.setCombinerClass(...Reduce.class), so the combiner (which runs on the map output) produces a value type that does not match the declared map output type, hence the error.

Solution: make the types consistent, or do not register a combiner at all.
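A sketch of both fixes on the driver side (class names illustrative). A combiner is just a Reducer run on the map output, so its input and output types must both equal the map output types:

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerTypeFix {
    static void configure(Job job) {
        // Fix A: do not register the Reducer as a combiner at all, i.e. drop
        //   job.setCombinerClass(MyReduce.class);
        // (no combiner runs when none is set).

        // Fix B: make the value types match, e.g. emit DoubleWritable from the
        // map side as well, and register the same type everywhere:
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
    }
}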


4.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/output, expected: file:///

Cause: the Configuration instance conf did not pick up the value of fs.default.name, so it knows nothing about hdfs://localhost:9000 and falls back to the local filesystem (file:///).
Solution:
// pick up config files off the classpath
Configuration conf = new Configuration();
// explicitly add other config files
// PASS A PATH, NOT A STRING!
conf.addResource(new Path("/opt/hadoop/hadoop-2.2.0/etc/core-site.xml"));
FileSystem fs = FileSystem.get(conf);
// load files and stuff below

Note that the call is conf.addResource(new Path(str)), not conf.addResource(str): the String overload is interpreted as a classpath resource name, not as a filesystem path.
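Alternatively, as a sketch (not from the original post), the default filesystem can be set on the Configuration directly instead of loading core-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultFsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "fs.defaultFS" is the Hadoop 2.x name for the older "fs.default.name";
        // with it set, FileSystem.get(conf) returns an HDFS client, not the local FS
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getUri()); // hdfs://localhost:9000
    }
}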


5. A problem running GibbsSampling: the job sets the number of iterations, the point at which to start recording samples (begintorecord), and the recording interval (recordstep), and these values are read back from static variables on the job class. Debugging in Eclipse this works, but when the jar is run from the command line the recorded output comes back empty; substituting the corresponding literal numbers produces the expected result, so the variable values are clearly getting lost somewhere. But why does reading the variables work correctly when debugging in Eclipse, yet fail when the jar is run from the command line?

Cause: unknown for now.

Solution: put the values into the job Configuration as properties, and read them back from the Configuration; the result is then correct.
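A sketch of that approach (property names and class names are illustrative). A plausible explanation, though unconfirmed here, is that static fields set in the driver JVM are not visible to map tasks running in separate JVMs, while Eclipse's local runner keeps everything in one process; values placed in the job Configuration, by contrast, are shipped to every task:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class GibbsConfigSketch {

    public static class SamplingMapper extends Mapper<LongWritable, Text, Text, Text> {
        private int iterations;
        private int beginToRecord;
        private int recordStep;

        @Override
        protected void setup(Context context) {
            // read the parameters from the job Configuration instead of static fields
            Configuration conf = context.getConfiguration();
            iterations    = conf.getInt("gibbs.iterations", 1000);
            beginToRecord = conf.getInt("gibbs.begin.to.record", 500);
            recordStep    = conf.getInt("gibbs.record.step", 10);
        }
    }

    // driver side: values set here travel with the job to every task JVM
    static Job buildJob(int iterations, int beginToRecord, int recordStep) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("gibbs.iterations", iterations);
        conf.setInt("gibbs.begin.to.record", beginToRecord);
        conf.setInt("gibbs.record.step", recordStep);
        Job job = Job.getInstance(conf, "gibbs sampling");
        job.setMapperClass(SamplingMapper.class);
        return job;
    }
}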


6. I haven't figured out how the Eclipse and command-line execution mechanisms differ.

When running from the command line, path arguments written as HDFS-relative paths are recognized; when debugging in Eclipse with Run on Hadoop, the full path must be given, including the prefix hdfs://localhost:9000, otherwise the path is not recognized.
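For reference, the two path forms being compared (directory names are illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PathFormsSketch {
    static void addInput(Job job) throws Exception {
        // form accepted when the jar is run from the command line:
        // a path resolved against fs.defaultFS and the user's HDFS home directory
        FileInputFormat.addInputPath(job, new Path("input"));

        // form needed for "Run on Hadoop" in Eclipse: a fully qualified URI
        // that includes the namenode address
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/hadoop/input"));
    }
}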

The cause has not been found yet.

7. Similar to problem 6. In startjob, two MapReduce jobs need to run; the second computes probabilities from the output of the first, so the code is basically as follows:

int res = ToolRunner.run(new Configuration(), new GibbsSamplingJob(), gibbsArgs);
// System.exit(res);
int resCP = ToolRunner.run(new Configuration(), new ComputeProbability(), computeProbabilityArges);
System.exit(resCP);

Running from the command line: no error messages. The first job (res) runs and the normal map progress is shown; when it finishes, the second job (resCP) runs, and in the end both results are correct. Two applications show up in the browser UI.

Debugging in Eclipse: the expected results are produced, but the console shows only the map progress of the second job, not the first, and in between the following error is printed:

Connecting to datanode 127.0.0.1:50010
Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/127.0.0.1,port=50010,localport=39052])
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:131)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1088)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:533)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:164)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Starting flush of map output

Not resolved yet.





References:
http://www.tuicool.com/articles/YNFzem
http://answers.mapr.com/questions/6873/exception-wrong-value-class-class-orgapachehadoopiotext-is-not-class-orgapachehadoopiolongwritable
http://www.opensourceconnections.com/blog/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception/

 


                                             