Ubuntu 12.04 nutch 2.3.1: Summary of Problems Encountered

Source: Internet  Editor: 程序博客网  Time: 2024/06/08 17:33

I ran into quite a few problems while installing and using nutch. My platform is Ubuntu 12.04 32-bit, and the nutch environment is jdk1.8.0_121, hbase0.98.8, solr4.10.3.

Reference blogs:
1、http://blog.csdn.net/freedomboy319/article/details/44172277
2、http://blog.csdn.net/a973893384/article/details/49666063

The installation is now mostly working, but some problems still show up during crawling:

IndexingJob: done.
SOLR dedup -> http://localhost:8983/solr
~/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:~/lab1/NUTCH_HOME/runtime/local/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:~/lab1/NUTCH_HOME/runtime/local/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local365318350_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)
Error running:
  ~/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr
Failed with exit value 1.

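After some searching I found that the cause was conflicting SLF4J files: two versions of the slf4j-log4j12 jar (1.6.4 and 1.6.6) were both in runtime/local/lib. Deleting one of the two conflicting jars resolved this problem, and data could then be crawled normally.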
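As a rough illustration of how such a conflict could be spotted, the sketch below groups jar file names by artifact and reports any artifact present in more than one version. The function name `find_duplicate_jars` and the file-name pattern are assumptions made for this example; they are not part of nutch or SLF4J.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<artifact>-<version>.jar", where the version starts with a digit,
# e.g. "slf4j-log4j12-1.6.4.jar" -> artifact "slf4j-log4j12", version "1.6.4".
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")

def find_duplicate_jars(filenames):
    """Return {artifact: [versions]} for every artifact that appears in more than one version."""
    versions = defaultdict(list)
    for name in filenames:
        match = JAR_NAME.match(name)
        if match:  # names that do not follow the assumed scheme are skipped
            versions[match.group("artifact")].append(match.group("version"))
    return {artifact: sorted(found) for artifact, found in versions.items() if len(found) > 1}
```

To check a real installation, the function could be called as `find_duplicate_jars(os.listdir(lib_dir))` (after `import os`), where `lib_dir` points at the runtime/local/lib directory shown in the log. Which of the reported versions to delete remains a manual decision.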

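However, the index still cannot be built. The error keeps occurring at the same place (the solrdedup step), so this still needs to be fixed: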

SOLR dedup -> http://localhost:8983/solr/
/home/silvia/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local2020123009_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)
Error running:
  /home/silvia/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/
Failed with exit value 1.
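The console output only says "job failed" and does not show the underlying cause. One possible next step is to look for error entries in a log file. The sketch below pulls out each line mentioning ERROR or Exception together with a few following lines. The helper name `error_blocks` is hypothetical, and the idea that the relevant log sits at runtime/local/logs/hadoop.log is an assumption about this setup, not something confirmed above.

```python
def error_blocks(log_text, context=5):
    """Return a list of blocks; each block is a line containing 'ERROR' or 'Exception'
    followed by up to `context` subsequent lines (typically the stack trace)."""
    lines = log_text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        if "ERROR" in lines[i] or "Exception" in lines[i]:
            blocks.append(lines[i:i + 1 + context])
            i += 1 + context  # skip past the lines already captured
        else:
            i += 1
    return blocks
```

A possible use would be reading the log file into a string with `open(path).read()` and printing the last block returned, since the most recent failure is usually the one of interest.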

To be updated...
