Hadoop “Failed to set setXIncludeAware(true) for parser” error and how to resolve it


Original article: http://caffeinbean.wordpress.com/2011/03/01/hadoop-failed-to-set-setxincludeawaretrue-for-parser-error-and-how-to-resolve-it/


Hadoop is a great piece of technology. But it’s not the technology that helps you solve the great problems. It’s the attitude you gain after absorbing the knowledge, and the courage to attack the problems.

For Hadoop, the “hello world” application is WordCount. Basically you feed it a document, with the assumption that it can be huge, and the map reduce program outputs the unique words and their counts. In real life, however, the challenges you face are not as trivial. Some are not yet answered and are subject to active exploration and development. Dependency injection is a hot topic, for instance. But for this post I’ll focus on a specific problem and present you the solution.

If you ever have to deal with XML in a map reduce environment, it’s possible that you’ll get a stack trace similar to the one below.

ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@47315d34:java.lang.UnsupportedOperationException: This parser does not support specification "null" version "null"
java.lang.UnsupportedOperationException: This parser does not support specification "null" version "null"
    at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:590)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1054)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1030)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:405)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:585)
    at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:290)
    at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:375)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
    at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
    at my.job.MapReduce.main(MyJob.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

The reason is that the XML libraries being picked up are a bit out of date: the parser named in the error does not support setXIncludeAware. In order to get rid of this error, you’ll need to provide recent versions of both Xalan and Xerces with your job, which means you’ll need to make them available in your classpath.

If you’re using Maven (you are using Maven for your map reduce jobs, right?), it’s just a couple of lines to include in the pom file.

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.9.1</version>
</dependency>
<dependency>
    <groupId>xalan</groupId>
    <artifactId>xalan</artifactId>
    <version>2.7.1</version>
</dependency>

The versions for Xalan and Xerces matter. You need to supply the versions listed or newer.
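To check which parser actually ends up on your classpath, and whether it accepts the call that Hadoop makes, something like the following small program may help. It is only a sketch: the class name XIncludeCheck and its helper methods are made up for illustration, and it relies on nothing beyond the standard JAXP API.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic class; the name and helpers are not part of Hadoop.
final class XIncludeCheck {

    /** Returns true if the factory accepts setXIncludeAware(true), the call that fails in the stack trace. */
    static boolean supportsXInclude(DocumentBuilderFactory factory) {
        try {
            factory.setXIncludeAware(true);
            return true;
        } catch (UnsupportedOperationException e) {
            // Thrown by parser implementations that do not override the method.
            return false;
        }
    }

    /** Describes where the factory class was loaded from (a jar path, or a note if unknown). */
    static String origin(DocumentBuilderFactory factory) {
        CodeSource source = factory.getClass().getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "no code source (likely the JDK's built-in parser)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // newInstance() resolves the factory the same way Hadoop's Configuration would.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("Factory class: " + factory.getClass().getName());
        System.out.println("Loaded from:   " + origin(factory));
        System.out.println("XInclude OK:   " + supportsXInclude(factory));
    }
}
```

Run it with the same classpath your job uses. If the last line reports false, the jar shown under "Loaded from" is probably the outdated parser that needs to be replaced.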

