NullPointerException at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs


This error had me stuck for several days before I finally solved it, so I think it is well worth writing up separately to share with everyone:

The root cause is a bug in Nutch 1.3 when it is integrated with Hadoop 0.20.203.0: checkOutputSpecs() can be invoked with a null FileSystem argument, so the unguarded fs.exists(...) call throws the NullPointerException. The official site has already published the corresponding fix:

The fix requires changes to two files. In the diffs below, a leading plus sign marks a line to add and a leading minus sign marks a line to delete.

The first file to modify is src/java/org/apache/nutch/parse/ParseOutputFormat.java:

 public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
-    Path out = FileOutputFormat.getOutputPath(job);
-    if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
-      throw new IOException("Segment already parsed!");
+      Path out = FileOutputFormat.getOutputPath(job);
+      if ((out == null) && (job.getNumReduceTasks() != 0)) {
+          throw new InvalidJobConfException(
+                  "Output directory not set in JobConf.");
+      }
+      if (fs == null) {
+          fs = out.getFileSystem(job);
+      }
+      if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
+          throw new IOException("Segment already parsed!");
   }
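
For reference, here is a sketch of how the patched checkOutputSpecs method in ParseOutputFormat.java should read once the diff is applied. Note that the diff above does not show it, but ParseOutputFormat.java also needs the import of org.apache.hadoop.mapred.InvalidJobConfException that the second diff below adds to FetcherOutputFormat.java; without it the new code will not compile.

  // ParseOutputFormat.checkOutputSpecs after the patch (sketch; requires the
  // org.apache.hadoop.mapred.InvalidJobConfException import mentioned above).
  public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
      Path out = FileOutputFormat.getOutputPath(job);
      // Fail early if no output directory is configured but reduce tasks are expected.
      if ((out == null) && (job.getNumReduceTasks() != 0)) {
          throw new InvalidJobConfException("Output directory not set in JobConf.");
      }
      // Hadoop 0.20.203.0 may call this method with fs == null, which is what caused
      // the NullPointerException; resolve the FileSystem from the output path instead.
      if (fs == null) {
          fs = out.getFileSystem(job);
      }
      if (fs.exists(new Path(out, CrawlDatum.PARSE_DIR_NAME)))
          throw new IOException("Segment already parsed!");
  }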

The second file to modify is src/java/org/apache/nutch/fetcher/FetcherOutputFormat.java:

 import org.apache.hadoop.io.SequenceFile.CompressionType;
 import org.apache.hadoop.mapred.FileOutputFormat;
+import org.apache.hadoop.mapred.InvalidJobConfException;
 import org.apache.hadoop.mapred.OutputFormat;
 import org.apache.hadoop.mapred.RecordWriter;
 import org.apache.hadoop.mapred.JobConf;
@@ -46,8 +47,15 @@
   public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
     Path out = FileOutputFormat.getOutputPath(job);
+    if ((out == null) && (job.getNumReduceTasks() != 0)) {
+    throw new InvalidJobConfException(
+    "Output directory not set in JobConf.");
+    }
+    if (fs == null) {
+    fs = out.getFileSystem(job);
+    }
     if (fs.exists(new Path(out, CrawlDatum.FETCH_DIR_NAME)))
-      throw new IOException("Segment already fetched!");
+    throw new IOException("Segment already fetched!");
   }
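
Applied to FetcherOutputFormat.java, the patched method ends up with the same null checks, differing only in the segment directory it tests and in the error message; a sketch of the final state for comparison:

  // FetcherOutputFormat.checkOutputSpecs after the patch (sketch).
  public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
      Path out = FileOutputFormat.getOutputPath(job);
      if ((out == null) && (job.getNumReduceTasks() != 0)) {
          throw new InvalidJobConfException("Output directory not set in JobConf.");
      }
      if (fs == null) {
          fs = out.getFileSystem(job);
      }
      // Same guard pattern as ParseOutputFormat, but checking the fetch directory.
      if (fs.exists(new Path(out, CrawlDatum.FETCH_DIR_NAME)))
          throw new IOException("Segment already fetched!");
  }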

After modifying these two files, rebuild Nutch with ant and the problem is solved.
