org.dom4j.DocumentException: Error on line 1 of document: 前言中不允许有内容 (Content is not allowed in prolog)

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 05:09

Here is roughly what the exception stack looks like:

org.dom4j.DocumentException: Error on line 1 of document  : 前言中不允许有内容。 Nested exception: 前言中不允许有内容。
    at org.dom4j.io.SAXReader.read(SAXReader.java:482)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.apobates.parser.RssParser.build(RssParser.java:38)
    at com.apobates.machine.reader.Reader.mainParser(Reader.java:57)
    at com.apobates.machine.reader.Reader.load(Reader.java:37)
    at com.apobates.test.ParserEntityTest.main(ParserEntityTest.java:41)
Nested exception: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; 前言中不允许有内容。
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.dom4j.io.SAXReader.read(SAXReader.java:465)
    at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
    at com.apobates.parser.RssParser.build(RssParser.java:38)
    at com.apobates.machine.reader.Reader.mainParser(Reader.java:57)
    at com.apobates.machine.reader.Reader.load(Reader.java:37)
    at com.apobates.test.ParserEntityTest.main(ParserEntityTest.java:41)

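The exception above was thrown after I used HttpGet to read the response from http://answers.microsoft.com/en-us/feed/f/ie into a string and then ran the following code in the build method of RssParser:

Document doc=DocumentHelper.parseText(responseText);

I saved the string printed to the console as an .xml file and found nothing ill-formed in it. I then searched for the message "前言中不允许有内容" (literally "content is not allowed in the foreword") but could not find any concept of a "foreword" in XML. A Google search, however, turned up an error message that closely matches it: Content is not allowed in prolog.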

In other words, content is not allowed in the prolog. That gave me a lead. Here is what the XML prolog contains:

The prolog refers to the information that appears before the start tag of the document or root element. It includes information that applies to the document as a whole, such as character encoding, document structure, and style sheets.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<!--catalog last updated 2000-11-01-->


The definition above comes from MSDN: http://msdn.microsoft.com/en-us/library/vstudio/ms256037(v=vs.100).aspx
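To see what "content in the prolog" looks like to a parser, here is a minimal sketch that uses the JDK's built-in SAX parser instead of dom4j (the stack trace shows SAXReader delegating to that same parser). The class name PrologDemo, the method firstError, and the sample strings are made up for illustration; they do not come from the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports where the parser first gives up on a piece of XML text.
final class PrologDemo {

    /** Returns null if the text parses, otherwise a description of the first fatal error. */
    static String firstError(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The parser stops at the first character it cannot accept.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String clean = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><rss version=\"2.0\"/>";
        System.out.println(firstError(clean));            // parses
        System.out.println(firstError("x" + clean));      // visible stray character
        System.out.println(firstError("\uFEFF" + clean)); // invisible byte order mark character
    }
}
```

Any character other than whitespace that appears before the first markup counts as content in the prolog, including characters that are invisible when the text is printed to a console.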

The answers feed XML begins with this prolog and the start of the root element:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:a10="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>Internet Explorer Category - All Threads</title>
<description />
<language>en-us</language>
<a10:link href="http://answers.microsoft.com/en-us/ie/forum?tab=Threads&amp;threadType=all" />

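The xml version declaration is certainly fine, and what follows it is the rss element. The problem looked related to namespaces: when I used a regular expression to strip the a10: prefix from the items, the problem went away. Strangest of all, the following code also works without any problem: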

SAXReader xmlReader = new SAXReader();
List<RssNews> rs = new ArrayList<RssNews>();
try {
    // The reader fetches the feed itself instead of parsing a string
    Document doc = xmlReader.read(new URL("http://answers.microsoft.com/en-us/feed/f/ie"));
    List<Node> list = doc.selectNodes("//item");
    // ETC
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (DocumentException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
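One possible explanation for the difference, not verified against this feed: when a parser reads raw bytes, as it does with read(URL), it can recognize and skip a UTF-8 byte order mark. Once the response has been decoded into a String, the same mark reaches the parser as the character U+FEFF sitting in front of the XML declaration. The sketch below compares the two paths with the JDK's own parser; the class BomComparison and the sample feed are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical comparison of byte input and text input for the same document.
final class BomComparison {

    // Sample document that starts with a byte order mark (an assumption for illustration).
    static final String FEED =
            "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><rss version=\"2.0\"><channel/></rss>";

    /** Parses the document from bytes, the way a URL or InputStream source would. */
    static boolean parsesAsBytes() {
        byte[] raw = FEED.getBytes(StandardCharsets.UTF_8); // begins with EF BB BF
        return parses(new InputSource(new ByteArrayInputStream(raw)));
    }

    /** Parses the document from already decoded text, the way parseText would. */
    static boolean parsesAsText() {
        return parses(new InputSource(new StringReader(FEED)));
    }

    private static boolean parses(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(source);
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the input
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes: " + parsesAsBytes());
        System.out.println("text:  " + parsesAsText());
    }
}
```

If the real response behaves the same way, checking whether responseText.charAt(0) is '<' would confirm or rule out this explanation.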

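dom4j's DocumentHelper.parseText is too strict. I kept looking for a way to turn off namespace checking before this line of code, but after several days of studying the API I found nothing.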
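A workaround that needs no parser setting, assuming the failure comes from characters sitting ahead of the XML declaration (an assumption; the post does not confirm it), is to drop everything before the first '<' before handing the text to parseText. XmlText and trimToFirstTag are hypothetical names used only for this sketch.

```java
// Hypothetical helper: removes anything that precedes the first '<' in a response body.
final class XmlText {

    /** Returns the text starting at its first '<', or the text unchanged if there is none. */
    static String trimToFirstTag(String text) {
        int start = text.indexOf('<');
        return start <= 0 ? text : text.substring(start);
    }

    public static void main(String[] args) {
        String response = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><rss version=\"2.0\"/>";
        String cleaned = trimToFirstTag(response);
        System.out.println(cleaned.startsWith("<?xml"));
        // With dom4j on the classpath, the cleaned text would then be parsed as before:
        // Document doc = DocumentHelper.parseText(cleaned);
    }
}
```

This only discards characters in front of the document; it does not touch namespaces or element content, so the a10: prefixes stay intact.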
