A Mysterious Network Access from JDOM

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 09:58

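A piece of code that uses JDOM to parse a Struts 2 action configuration file recently started throwing an exception for no apparent reason.

The exception looked completely unrelated: a ConnectException. Does parsing XML really need a network connection?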

java.net.ConnectException: Connection timed out: connect
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(Unknown Source)
    at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.<init>(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:847)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:826)
    at xml.UrlFetcher.main(UrlFetcher.java:34)

 

My guess was that the problem lay here: the XML file references an external DTD definition,

    <?xml version="1.0" encoding="GBK"?>
    <!DOCTYPE struts PUBLIC
        "-//Apache Software Foundation//DTD Struts Configuration 2.0//EN"
        "http://struts.apache.org/dtds/struts-2.0.dtd">

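While parsing, JDOM (through its underlying SAX parser) reads the DTD referenced by the XML file by default, so when the DTD is given as a URL the parse triggers a request to the outside network.

Lately, besides the national GFW blocking websites, our company has also put up its own firewall, so every request to the outside network fails. The program cannot connect to http://struts.apache.org, which is why it throws ConnectException.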
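To see this behaviour without touching the network, here is a minimal sketch that uses only the JDK's built-in SAX parser rather than JDOM. The class name `DtdRequestProbe` and the tiny in-memory document are made up for illustration. The recording EntityResolver hands back an empty DTD, so the parser never opens a connection, but it still reveals which external resources the parser asked for while validation was off.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DtdRequestProbe {
    // Hypothetical sample document with the same DOCTYPE as the Struts config.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE struts PUBLIC\n"
      + "  \"-//Apache Software Foundation//DTD Struts Configuration 2.0//EN\"\n"
      + "  \"http://struts.apache.org/dtds/struts-2.0.dtd\">\n"
      + "<struts><package name=\"p\" namespace=\"/demo\"/></struts>";

    /** Parses XML without validation and returns every system ID the parser tried to resolve. */
    public static List<String> requestedSystemIds() throws Exception {
        List<String> requested = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // validation off, as in the JDOM default
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Record the request and answer with an empty DTD instead of going online.
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            return new InputSource(new StringReader(""));
        });
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(XML)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(requestedSystemIds());
    }
}
```

Without the resolver in place, that same request would go out over HTTP, which is the call path visible in the stack trace.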

 

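After searching online I finally found the trick: the single line that is commented out in the code below. Add it back and everything works, with no access to the outside network.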

 

    Document doc;
    Element root;
    SAXBuilder builder = new SAXBuilder();
    // Uncomment the next line to stop the parser from loading the external DTD.
    //builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    try {
        doc = builder.build(new File(filename));
        root = doc.getRootElement();
        Element packageN = (Element) XPath.selectSingleNode(root, "//package");
        System.out.println("packageN:" + packageN);
        String namespace = packageN.getAttribute("namespace").getValue();
        System.out.println("   namespace:" + namespace);
        List actions = XPath.selectNodes(root, "//action");
        Iterator iter = actions.iterator();
        while (iter.hasNext()) {
            Element rule = (Element) iter.next();
            String action = rule.getAttribute("name").getValue();
            //System.out.println("     actionName:" + action);
            System.out.println(namespace + "/" + action); // the full list of URLs
            //System.out.println(namespace + "/" + action + ".xhtml");
        }
    } catch (JDOMException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

 

Why does this work so magically? Set http://apache.org/xml/features/nonvalidating/load-external-dtd to false and the problem goes away.
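The effect of the feature can be checked offline with a small sketch against the JDK's bundled Xerces-based parser, with no JDOM required. The class name `LoadExternalDtdDemo` and the sample document are assumptions for illustration. The resolver here throws if it is ever consulted, so a successful parse indicates that the parser did not try to load the DTD at all.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LoadExternalDtdDemo {
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sample document with the same DOCTYPE as the Struts config.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE struts PUBLIC\n"
      + "  \"-//Apache Software Foundation//DTD Struts Configuration 2.0//EN\"\n"
      + "  \"http://struts.apache.org/dtds/struts-2.0.dtd\">\n"
      + "<struts><package name=\"p\" namespace=\"/demo\"/></struts>";

    /** Parses with external DTD loading switched off and returns the element names seen. */
    public static List<String> elementNames() throws Exception {
        List<String> names = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(LOAD_EXTERNAL_DTD, false);
        // Fail loudly if the parser asks for any external resource.
        reader.setEntityResolver((publicId, systemId) -> {
            throw new IOException("unexpected external fetch: " + systemId);
        });
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(XML)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames());
    }
}
```

The feature name is specific to Xerces, so a different parser implementation may reject it with a SAXNotRecognizedException.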

 

1. Following the long stack trace above, I located the problem in com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.

 

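2. First I searched the class for "nonvalidating" and found the following (note that the field shown is the XInclude feature identifier, which is a separate feature from load-external-dtd):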

    /** Feature identifier: XInclude processing */
    private static final String XINCLUDE_FEATURE =
        Constants.XERCES_FEATURE_PREFIX + Constants.XINCLUDE_FEATURE;

3. Then I looked for references to this field and found this method:

    /**
     * Gets the XInclude processing mode for this parser
     * @return the state of XInclude processing mode
     */
    public boolean isXIncludeAware() {
        try {
            return xmlReader.getFeature(XINCLUDE_FEATURE);
        }
        catch (SAXException exc) {
            return false;
        }
    }
 

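4. Then I looked at where this is used, and it turns out to be in the parser implementation itself. The constructor below only deals with the namespace and XInclude features, but it shows that features are switched on the underlying xmlReader. That is the same route by which the given URI, set to false, disables the DTD download.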

    SAXParserImpl(SAXParserFactoryImpl spf, Hashtable features, boolean secureProcessing)
        throws SAXException
    {
        // Instantiate a SAXParser directly and not through SAX so that we use the right ClassLoader
        xmlReader = new JAXPSAXParser(this);

        // JAXP default is false while SAX 2 default is true!
        xmlReader.setFeature0(NAMESPACES_FEATURE, spf.isNamespaceAware());

        // SAX "namespaces" and "namespace-prefixes" features should not
        // both be false.  We make them opposite for backward compatibility
        // since JAXP 1.0 apps may want to receive xmlns* attributes.
        xmlReader.setFeature0(NAMESPACE_PREFIXES_FEATURE, !spf.isNamespaceAware());

        // Avoid setting the XInclude processing feature if the value is false.
        // This will keep the configuration from throwing an exception if it
        // does not support XInclude.
        if (spf.isXIncludeAware()) {
            xmlReader.setFeature0(XINCLUDE_FEATURE, true);
        }
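Rather than reading the parser source, the feature can also be queried directly on the XMLReader. The sketch below assumes the JDK's default Xerces-based SAX parser; the class name `FeatureDefaults` is made up for illustration. It reads the load-external-dtd feature before and after switching it off.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class FeatureDefaults {
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Returns the feature state as { default value, value after disabling }. */
    public static boolean[] beforeAndAfter() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        boolean before = reader.getFeature(LOAD_EXTERNAL_DTD); // parser default
        reader.setFeature(LOAD_EXTERNAL_DTD, false);           // same switch the builder forwards
        boolean after = reader.getFeature(LOAD_EXTERNAL_DTD);
        return new boolean[] { before, after };
    }

    public static void main(String[] args) throws Exception {
        boolean[] states = beforeAndAfter();
        System.out.println("default=" + states[0] + ", after disabling=" + states[1]);
    }
}
```

A parser that does not recognise the feature name would throw SAXNotRecognizedException from getFeature, which is one quick way to tell whether Xerces is actually in use.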
 

 

The original answer:

==============================

How do I keep the DTD from loading? Even when I turn off validation the parser tries to load the DTD file.

Even when validation is turned off, an XML parser will by default load the external DTD file in order to parse the DTD for external entity declarations. Xerces has a feature to turn off this behavior named "http://apache.org/xml/features/nonvalidating/load-external-dtd" and if you know you're using Xerces you can set this feature on the builder.

builder.setFeature(
"http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

If you're using another parser like Crimson, your best bet is to set up an EntityResolver that resolves the DTD without actually reading the separate file.

import org.xml.sax.*;
import java.io.*;

public class NoOpEntityResolver implements EntityResolver {
       public InputSource resolveEntity(String publicId, String systemId) {
              return new InputSource(new StringBufferInputStream(""));
       }
}

Then in the builder


builder.setEntityResolver(new NoOpEntityResolver());

There is a downside to this approach. Any entities in the document will be resolved to the empty string, and will effectively disappear. If your document has entities, you need to setExpandEntities(false) code and ensure the EntityResolver only suppresses the DocType.
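One way to read "only suppresses the DocType" is a resolver that returns an empty source for the known DTD system ID and defers to the parser's default behaviour for everything else. The sketch below is an assumption about how that might look, not part of the quoted answer; the class name `DtdOnlyEntityResolver` is hypothetical.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class DtdOnlyEntityResolver implements EntityResolver {
    private final String dtdSystemId;

    /** @param dtdSystemId the system ID of the one DTD that should not be fetched */
    public DtdOnlyEntityResolver(String dtdSystemId) {
        this.dtdSystemId = dtdSystemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (dtdSystemId.equals(systemId)) {
            // Hand back an empty DTD so the parser never opens the URL.
            InputSource empty = new InputSource(new StringReader(""));
            empty.setSystemId(systemId);
            return empty;
        }
        // Returning null asks the parser to resolve the entity the default way.
        return null;
    }
}
```

It would be installed with builder.setEntityResolver(new DtdOnlyEntityResolver("http://struts.apache.org/dtds/struts-2.0.dtd")). Entities declared inside the suppressed DTD are still lost, so this only helps when the document relies on other external entities.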
