XML Parsing Development Guide

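1       Revision History

Version | Revision date | Author | Description | Development time (h)
0.1 | 2007-8-14 | LevinSoft | Created the basic structure and basic flow of the document | 5
0.2 | 2007-8-22 | LevinSoft | Changed "XML middleware" to "XML framework" | 0.01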

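2       Introduction

This document covers XML-based development. It mainly includes: basic XML concepts, a comparison of XML frameworks, an introduction to XPath, development examples, and reference resources.

I believe that mastering a method of learning matters far more than learning without order. So in each part of this document, for the important points and the ones that are harder to understand, I offer some lessons drawn from my own development work, including what was learned and how the understanding progressed. Detailed examples are given, with comments.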

3       Basic Concepts

XML (Extensible Markup Language) is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere. XML can be used by any individual or group of individuals or companies that wants to share information in a consistent way. XML, a formal recommendation from the World Wide Web Consortium (W3C), is similar to the language of today's Web pages, the Hypertext Markup Language (HTML). Both XML and HTML contain markup symbols to describe the contents of a page or file. HTML, however, describes the content of a Web page (mainly text and graphic images) only in terms of how it is to be displayed and interacted with. For example, a <P> starts a new paragraph. XML describes the content in terms of what data is being described. For example, a <PHONENUM> could indicate that the data that followed it was a phone number. This means that an XML file can be processed purely as data by a program or it can be stored with similar data on another computer or, like an HTML file, that it can be displayed. For example, depending on how the application in the receiving computer wanted to handle the phone number, it could be stored, displayed, or dialed.

3.1    XML

·         eXtensible Markup Language

·         XML is a meta-language: you define your own tags.

·         XML was derived from the Standard Generalized Markup Language (SGML).

·         XML is only a markup language, but it can be used to build applications in areas such as web sites, electronic data interchange, vector graphics, genealogy, real-estate listings, object serialization, remote procedure calls, voice-mail systems, and more.

·         XML is a creation of the World Wide Web Consortium (W3C), http://www.w3.org

3.2    Why XML

·         Data and presentation can be separated.

·         Developers can therefore focus on the data rather than on how it is displayed.

·         Exchanging data is more flexible.

          ·         XML is stored as text, so it is independent of platforms.

·         Widely used in the B2B business model.

·         Many databases support XML; you can import data into them or export data from them.

·         XML can be used to create new languages, such as WML (Wireless Markup Language).

3.3    What is XML?

·         Simple text formatted to follow a well-defined set of rules.

·         XML documents consist primarily of tags and text, similar to an HTML document.

·         The tags define keys that have text values and may contain data on any topic, that is, name-value pairs. Example: City = Beijing, Country = China

·         This text may be stored/represented in:

          ·         A normal file stored on disk.

          ·         A message being sent over HTTP.

          ·         A character string in a programming language.

          ·         A text BLOB (binary large object) in a database.

          ·         Any other way textual data can be used.
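As a rough illustration of the name-value idea and of XML held in a character string, the sketch below parses a small document with the JDK's built-in DOM parser. The class name NameValueDemo, the address element, and the sample values are assumptions made for this example only.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NameValueDemo {
    // Hypothetical sample: each child tag is a key, its text content is the value.
    static final String ADDRESS_XML =
        "<address><City>Beijing</City><Country>China</Country></address>";

    /** Parses XML held in a String and returns the root's child elements as name-value pairs. */
    static Map<String, String> toPairs(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Map<String, String> pairs = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes; keep only element children.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    pairs.put(child.getNodeName(), child.getTextContent());
                }
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Calling NameValueDemo.toPairs(NameValueDemo.ADDRESS_XML) yields a map with the keys City and Country. The same parser accepts a file or a network stream in place of the String.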

3.4    XML Family Overview

3.5    XSL

XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts:

·         XSL Transformations (XSLT): a language for transforming XML

·         the XML Path Language (XPath): an expression language used by XSLT to access or refer to parts of an XML document (XPath is also used by the XML Linking specification)

·         XSL Formatting Objects (XSL-FO): an XML vocabulary for specifying formatting semantics

An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.

XSL is developed by the W3C XSL Working Group, whose charter is to develop the next version of XSL. XSL is part of W3C's XML Activity.

3.5.1            XSLT

The XSL Transformations (XSLT) describes a language for transforming XML documents into other XML documents or other text output. It was defined by the W3C XSL Working group.

The XSLT 1.0 Recommendation is located at:

http://www.w3.org/TR/1999/REC-xslt-19991116

4       XPATH

4.1    Introduction

4.1.1            Abstract

XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.

XPath 2.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery/XPath Data Model (XDM)]. The data model provides a tree representation of XML documents as well as atomic values such as integers, strings, and booleans, and sequences that may contain both references to nodes in an XML document and atomic values. The result of an XPath expression may be a selection of nodes from the input documents, or an atomic value, or more generally, any sequence allowed by the data model. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. XPath 2.0 is a superset of [XPath 1.0], with the added capability to support a richer set of data types, and to take advantage of the type information that becomes available when documents are validated using XML Schema. A backwards compatibility mode is provided to ensure that nearly all XPath 1.0 expressions continue to deliver the same result with XPath 2.0; exceptions to this policy are noted in [I Backwards Compatibility with XPath 1.0].

4.1.2            Abbreviated Syntax

Here are some examples of location paths using abbreviated syntax:

·         para selects the para element children of the context node

·         * selects all element children of the context node

·         text() selects all text node children of the context node

·         @name selects the name attribute of the context node

·         @* selects all the attributes of the context node

·         para[1] selects the first para child of the context node

·         para[last()] selects the last para child of the context node

·         */para selects all para grandchildren of the context node

·         /doc/chapter[5]/section[2] selects the second section of the fifth chapter of the doc

·         chapter//para selects the para element descendants of the chapter element children of the context node

·         //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

·         //olist/item selects all the item elements in the same document as the context node that have an olist parent

·         . selects the context node

·         .//para selects the para element descendants of the context node

·         .. selects the parent of the context node

·         ../@lang selects the lang attribute of the parent of the context node

·         para[@type="warning"] selects all para children of the context node that have a type attribute with value warning

·         para[@type="warning"][5] selects the fifth para child of the context node that has a type attribute with value warning

·         para[5][@type="warning"] selects the fifth para child of the context node if that child has a type attribute with value warning

·         chapter[title="Introduction"] selects the chapter children of the context node that have one or more title children with string-value equal to Introduction

·         chapter[title] selects the chapter children of the context node that have one or more title children

·         employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and an assistant attribute

The most important abbreviation is that child:: can be omitted from a location step. In effect, child is the default axis. For example, a location path div/para is short for child::div/child::para.

There is also an abbreviation for attributes: attribute:: can be abbreviated to @. For example, a location path para[@type="warning"] is short for child::para[attribute::type="warning"] and so selects para children with a type attribute with value equal to warning.

// is short for /descendant-or-self::node()/. For example, //para is short for /descendant-or-self::node()/child::para and so will select any para element in the document (even a para element that is a document element will be selected by //para since the document element node is a child of the root node); div//para is short for div/descendant-or-self::node()/child::para and so will select all para descendants of div children.

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

A location step of . is short for self::node(). This is particularly useful in conjunction with //. For example, the location path .//para is short for

self::node()/descendant-or-self::node()/child::para

and so will select all para descendant elements of the context node.

Similarly, a location step of .. is short for parent::node(). For example, ../title is short for parent::node()/child::title and so will select the title children of the parent of the context node.
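The abbreviations above can be tried out with a few lines of Java. The sketch below uses the JDK's built-in javax.xml.xpath package (XPath 1.0) rather than dom4j so that it runs without extra libraries. The class name AbbreviatedXPathDemo and the sample document are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AbbreviatedXPathDemo {
    // Hypothetical document: two chapters, each with two para children.
    static final String DOC_XML =
        "<doc>"
        + "<chapter><title>Introduction</title>"
        + "<para type='warning'>a</para><para>b</para></chapter>"
        + "<chapter><title>Usage</title>"
        + "<para>c</para><para type='warning'>d</para></chapter>"
        + "</doc>";

    /** Returns how many nodes the expression selects from DOC_XML. */
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(DOC_XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Bad expression: " + expression, e);
        }
    }

    /** Returns the string-value of the first node the expression selects from DOC_XML. */
    static String text(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(DOC_XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Bad expression: " + expression, e);
        }
    }
}
```

With this document, count("//para") is 4, count("//para[1]") is 2 (the first para child of each chapter), and count("/descendant::para[1]") is 1, which is the difference described in the NOTE above. text("/doc/chapter[2]/para[last()]") returns "d".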

4.1.3            Abbreviations

Note: when a path has to cross several levels of nesting (more than two), the expression must likewise use //.

[10]   AbbreviatedAbsoluteLocationPath   ::=   '//' RelativeLocationPath

[11]   AbbreviatedRelativeLocationPath   ::=   RelativeLocationPath '//' Step

[12]   AbbreviatedStep   ::=   '.' | '..'

[13]   AbbreviatedAxisSpecifier   ::=   '@'?

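4.2    Usage Notes

1. These abbreviations are often used in combination.

2. Using the common abbreviations flexibly can greatly improve development efficiency.

3. There is no need to memorize every abbreviation; look them up in the reference when needed.

4. Do memorize the most frequently used ones, though. Consulting the reference every time is slow, and knowing them improves both your development speed and your sense of accomplishment.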

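4.3    XPath Examples

The following examples use dom4j to illustrate XPath syntax.

4.3.1             para[@type="warning"]

para[@type="warning"] selects all para children of the context node that have a type attribute with value warning

Method 1:

    // Select the value attribute of the property whose name attribute matches.
    Node propertyNode = userDoc.selectSingleNode(
            "/persons/personList/property[@name='" + name + "']/@value");
    if (propertyNode == null) {
        return "";
    }
    return propertyNode.getText();

Method 2:

    // Select the element first, then read one of its attributes.
    // selectSingleNode returns null when nothing matches, so check node before using it.
    Node node = configFile.selectSingleNode(
            "/para-config/paraMapping[@para='" + paraValue + "']");
    Node mappedIdNode = node.selectSingleNode("@mappedId");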

4.3.2             Getting a List of Nodes

List personList = docInfo.selectNodes("/persons/personList/privilage");  

4.3.3             //

//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

 

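// Note: // starts at the document root even when evaluated on orderRelationNode;
// use .//person to search only below that node.
List<Element> nodeList = orderRelationNode.selectNodes("//person");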

4.3.4            /web-app/servlet[1]/servlet-name

Here [1] selects the first servlet node.

    private static String DEMO_XML =
        "<?xml version='1.0' encoding='ISO-8859-1'?>\n" +
        "<web-app>\n" +
        "<servlet>\n" +
        "<servlet-name>snoop</servlet-name>\n" +
        "<servlet-class>SnoopServlet</servlet-class>\n" +
        "</servlet>\n" +
        "</web-app>";

    Document demoDocument = DocumentHelper.parseText( DEMO_XML );
    // valueOf returns the string-value of the selected node, here "snoop"
    String servletName = demoDocument.valueOf( "/web-app/servlet[1]/servlet-name" );

Another way:

public String getSingleNodeValue(String nodeName) {

        String xPathExpression = "/Package/" + nodeName;

        return getDocument().valueOf(xPathExpression);

}

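4.3.5             Setting XML Attribute Values

When an attribute of a node is assigned null, dom4j does not set any value for it at the lower level; the attribute is simply left out. As a result, the client never receives that attribute.

For example, given this XML: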

<Response status="0" message="查询个人订购关系成功">

<param name="lines" value="2"/>

<person userId="1002" userName="P1001"> </person>

</Response>

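If the processing logic then sets an attribute to null, for example userName, the client receives XML such as the following, in which person has no userName attribute: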

<param name="lines" value="1"/>

<param name="personList">

<person userId="05797220830"/>

</param>

4.4    More Good Examples

http://www.zvon.org/xxl/XPathTutorial/

The site also provides many other XML resources:

·         XLab - interactive XPath experiments

·         XML tutorial

·         DTD tutorial

·         XML Namespace tutorial

·         XHTML 1.0 reference

·         XHTML Basic reference

A few examples are excerpted below.

4.4.1            Example1

Values of attributes can be used as selection criteria. Function normalize-space removes leading and trailing spaces and replaces sequences of whitespace characters by a single space

//BBB[@name='bbb']

Select BBB elements which have attribute name with value 'bbb'


     <AAA>
          <BBB id = "b1"/>
          <BBB name = " bbb "/>
          <BBB name = "bbb"/>
     </AAA>
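To make the role of normalize-space concrete, the sketch below evaluates the plain comparison and a normalize-space variant against the document above. It uses the JDK's javax.xml.xpath package; the class name NormalizeSpaceDemo is an assumption made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NormalizeSpaceDemo {
    // The sample document from Example1; note the padded value " bbb ".
    static final String AAA_XML =
        "<AAA><BBB id=\"b1\"/><BBB name=\" bbb \"/><BBB name=\"bbb\"/></AAA>";

    /** Returns how many nodes the expression selects from AAA_XML. */
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(AAA_XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Bad expression: " + expression, e);
        }
    }
}
```

count("//BBB[@name='bbb']") matches only the exact value and returns 1, while count("//BBB[normalize-space(@name)='bbb']") also accepts the padded " bbb " and returns 2.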

4.5    Other References

Character Model

World Wide Web Consortium. Character Model for the World Wide Web. W3C Working Draft. See http://www.w3.org/TR/WD-charmod

DOM

World Wide Web Consortium. Document Object Model (DOM) Level 1 Specification. W3C Recommendation. See http://www.w3.org/TR/REC-DOM-Level-1

JLS

J. Gosling, B. Joy, and G. Steele. The Java Language Specification. See http://java.sun.com/docs/books/jls/index.html.

ISO/IEC 10646

ISO (International Organization for Standardization). ISO/IEC 10646-1:1993, Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. International Standard. See http://www.iso.ch/cate/d18741.html.

TEI

C.M. Sperberg-McQueen, L. Burnard Guidelines for Electronic Text Encoding and Interchange. See http://etext.virginia.edu/TEI.html.

Unicode

Unicode Consortium. The Unicode Standard. See http://www.unicode.org/unicode/standard/standard.html.

XML Infoset

World Wide Web Consortium. XML Information Set. W3C Working Draft. See http://www.w3.org/TR/xml-infoset

XPointer

World Wide Web Consortium. XML Pointer Language (XPointer). W3C Working Draft. See http://www.w3.org/TR/WD-xptr

XQL

J. Robie, J. Lapp, D. Schach. XML Query Language (XQL). See http://www.w3.org/TandS/QL/QL98/pp/xql.html

XSLT

World Wide Web Consortium. XSL Transformations (XSLT). W3C Recommendation. See http://www.w3.org/TR/xslt

W3C   http://www.w3.org/ The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential.

5       Document models

The following content is quoted from this article, dated 01 Sep 2001:

http://www.ibm.com/developerworks/xml/library/x-injava/index.html

To clarify the terminology used in this article:

·         parser means the program that interprets the structure of an XML text document

·         document representation means the data structures used by a program to work with the document in memory

·         document model means a library and API that supports working with a document representation

DOM

DOM (Document Object Model) is the official W3C standard for representing XML documents in a platform- and language-neutral manner. It serves as a good comparison for any Java-specific models. To make departing from the DOM standard worthwhile, Java-specific models should offer significant performance and/or ease-of-use advantages over Java DOM implementations.

JDOM

JDOM is intended to be a Java-specific document model that makes interacting with XML simpler and faster than using DOM implementations. As the first such Java-specific model, JDOM has been heavily publicized and promoted. It is also being considered for eventual use as a Java Standard Extension through the Java Specification Request JSR-102. The actual form this will take is still under development, though, and the JDOM APIs have been undergoing significant changes between beta versions. JDOM has been under development since early 2000.

dom4j

dom4j originated as a kind of intellectual offshoot from JDOM, though it represents a completely separate development effort. It incorporates a number of features beyond the basic XML document representation, including integrated XPath support, XML Schema support (currently in alpha form), and event-based processing for very large or streamed documents. It also gives the option of building the document representation with parallel access through the dom4j APIs and a standard DOM interface. It has been under development since late 2000, with existing APIs preserved between recent releases.

Electric XML

Electric XML (EXML) is a spin-off from a commercial project supporting distributed computing. It differs from the other models discussed so far in that it properly supports only a subset of XML documents, it does not provide any support for validation, and it has a more restrictive license. However, EXML offers the advantages of very small size and direct support for an XPath subset, and it made an interesting candidate for this comparison since it has been promoted as an alternative to the other models in several recent articles.

XML Pull Parser

XML Pull Parser (XPP) is a recent development that demonstrates a different approach to XML parsing. As with EXML, XPP properly supports only a subset of XML documents and does not provide any support for validation. It does share the advantage of a very small size. That advantage, combined with its pull-parser approach, made it a good alternative to include in this comparison.
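XPP itself is not used in this guide. To give a feel for the pull-parsing style, the sketch below uses StAX (javax.xml.stream), a pull API that ships with the JDK. It is a different API from XPP, and the class name PullParsingDemo is an assumption made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class PullParsingDemo {
    /** Pulls events one at a time and collects the name of every start tag, in document order. */
    static List<String> elementNames(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> names = new ArrayList<>();
            // The application drives the loop and asks for the next event when it is ready.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

The point of the design is that the calling code decides when to advance, instead of receiving callbacks as in SAX, and no document tree is built in memory.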

Conclusions

The different Java XML document models all have some areas of strength, but from the performance standpoint there are some clear winners.

XPP is the performance leader in most respects. For middleware-type applications that do not require validation, entities, processing instructions, or comments, XPP looks to be an excellent choice despite its newness. This is especially true for applications running as browser applets or in limited memory environments.

dom4j doesn't have the sheer speed of XPP, but it does provide very good performance with a much more standardized and fully functional implementation, including built-in support for SAX2, DOM, and even XPath. Xerces DOM (with deferred node creation) also does well on most performance measurements, though it suffers on small files and Java serialization. For general XML handling, both dom4j and Xerces DOM are probably good choices, with the preference between the two determined by whether you consider Java-specific features or cross-language compatibility more important.

JDOM and Crimson DOM consistently rank poorly on the performance tests. Crimson DOM may still be worth using in the case of small documents, where Xerces does poorly. JDOM doesn't really have anything to recommend it from the performance standpoint, though the developers have said they intend to focus on performance before the official release. However, it'll probably be difficult for JDOM to match the performance of the other models without some restructuring of the API.

6       XML

6.1    dom4j

dom4j is an easy to use, open source library for working with XML, XPath and XSLT on the Java platform using the Java Collections Framework and with full support for DOM, SAX and JAXP.

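In terms of features and performance, it has certain advantages over other comparable XML APIs.

A number of mature development projects and company projects currently use this API.

6.2    JAXP

The Java™ API for XML Processing (JAXP) includes the basic facilities for working with XML documents through a standardized set of Java Platform APIs.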

7       Dom4j Quick Start

The following resource comes from http://www.dom4j.org/guide.html.

The sections below cover the main dom4j APIs and provide quick-start examples.

7.1    Parsing XML

public Document parse(URL url) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(url);
        return document;
    }

7.2    Using Iterators

A document can be navigated using a variety of methods that return standard Java Iterators. For example

    public void bar(Document document) throws DocumentException {
        Element root = document.getRootElement();

        // iterate through child elements of root
        for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            // do something
        }

        // iterate through child elements of root with element name "foo"
        for ( Iterator i = root.elementIterator( "foo" ); i.hasNext(); ) {
            Element foo = (Element) i.next();
            // do something
        }

        // iterate through attributes of root
        for ( Iterator i = root.attributeIterator(); i.hasNext(); ) {
            Attribute attribute = (Attribute) i.next();
            // do something
        }
    }

7.3    Powerful Navigation with XPath

In dom4j XPath expressions can be evaluated on the Document or on any Node in the tree (such as Attribute, Element or ProcessingInstruction). This allows complex navigation throughout the document with a single line of code. For example.

    public void bar(Document document) {
        List list = document.selectNodes( "//foo/bar" );
 
        Node node = document.selectSingleNode( "//foo/bar/author" );
 
        String name = node.valueOf( "@name" );
    }

For example if you wish to find all the hypertext links in an XHTML document the following code would do the trick.

    public void findLinks(Document document) throws DocumentException {
 
        List list = document.selectNodes( "//a/@href" );
 
        for (Iterator iter = list.iterator(); iter.hasNext(); ) {
            Attribute attribute = (Attribute) iter.next();
            String url = attribute.getValue();
        }
    }

If you need any help learning the XPath language we highly recommend the Zvon tutorial which allows you to learn by example.

7.4    Fast Looping

If you ever have to walk a large XML document tree then for performance we recommend you use the fast looping method which avoids the cost of creating an Iterator object for each loop. For example

    public void treeWalk(Document document) {
        treeWalk( document.getRootElement() );
    }
 
    public void treeWalk(Element element) {
        for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
            Node node = element.node(i);
            if ( node instanceof Element ) {
                treeWalk( (Element) node );
            }
            else {
                // do something....
            }
        }
    }

7.5    Creating a new XML document

Often in dom4j you will need to create a new document from scratch. Here's an example of doing that.

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
 
public class Foo {
 
    public Document createDocument() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement( "root" );
 
        Element author1 = root.addElement( "author" )
            .addAttribute( "name", "James" )
            .addAttribute( "location", "UK" )
            .addText( "James Strachan" );
        
        Element author2 = root.addElement( "author" )
            .addAttribute( "name", "Bob" )
            .addAttribute( "location", "US" )
            .addText( "Bob McWhirter" );
 
        return document;
    }
}

7.6    Writing a document to a file

A quick and easy way to write a Document (or any Node) to a Writer is via the write() method.

  FileWriter out = new FileWriter( "foo.xml" );
  document.write( out );
  out.close();  // close the writer so the output is flushed to disk

If you want to be able to change the format of the output, such as pretty printing or a compact format, or you want to be able to work with Writer objects or OutputStream objects as the destination, then you can use the XMLWriter class.

import java.io.FileWriter;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;
 
public class Foo {
 
    public void write(Document document) throws IOException {
 
        // lets write to a file
        XMLWriter writer = new XMLWriter(
            new FileWriter( "output.xml" )
        );
        writer.write( document );
        writer.close();
 
 
        // Pretty print the document to System.out
        OutputFormat format = OutputFormat.createPrettyPrint();
        writer = new XMLWriter( System.out, format );
        writer.write( document );
 
        // Compact format to System.out
        format = OutputFormat.createCompactFormat();
        writer = new XMLWriter( System.out, format );
        writer.write( document );
    }
}

7.7    Converting to and from Strings

If you have a reference to a Document or any other Node such as an Attribute or Element, you can turn it into the default XML text via the asXML() method.

        Document document = ...;
        String text = document.asXML();

If you have some XML as a String you can parse it back into a Document again using the helper method DocumentHelper.parseText()

        String text = "<person> <name>James</name> </person>";
        Document document = DocumentHelper.parseText(text);

7.8    Styling a Document with XSLT

Applying XSLT on a Document is quite straightforward using the JAXP API from Sun. This allows you to work against any XSLT engine such as Xalan or SAXON. Here is an example of using JAXP to create a transformer and then applying it to a Document.

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
 
import org.dom4j.Document;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;
 
public class Foo {
 
    public Document styleDocument(
        Document document, 
        String stylesheet
    ) throws Exception {
 
        // load the transformer using JAXP
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer( 
            new StreamSource( stylesheet ) 
        );
 
        // now lets style the given document
        DocumentSource source = new DocumentSource( document );
        DocumentResult result = new DocumentResult();
        transformer.transform( source, result );
 
        // return the transformed document
        Document transformedDoc = result.getDocument();
        return transformedDoc;
    }
}
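The same JAXP calls can be exercised without dom4j or any files. The sketch below keeps both the input and the stylesheet in Strings and returns the result as text. The class name XsltDemo and the stylesheet are assumptions made for this example; the author data mirrors the document created in section 7.5.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltDemo {
    static final String AUTHORS_XML =
        "<root><author name='James'>James Strachan</author>"
        + "<author name='Bob'>Bob McWhirter</author></root>";

    // Hypothetical XSLT 1.0 stylesheet: writes each author's name attribute followed by ';'.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='root/author'>"
        + "<xsl:value-of select='@name'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the transformation output as a String. */
    static String transform(String xml, String xslt) {
        try {
            // Load the transformer using JAXP, as in the dom4j example above.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

XsltDemo.transform(XsltDemo.AUTHORS_XML, XsltDemo.STYLESHEET) returns "James;Bob;". Swapping StreamSource and StreamResult for dom4j's DocumentSource and DocumentResult gives the version shown above.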

7.9    dom4j Classes in Detail

7.9.1            XMLWriter

XMLWriter takes a DOM4J tree and formats it to a stream as XML.

1.      Pretty-print a parsed document, then read a value from it:

    demoDocument = DocumentHelper.parseText( DEMO_XML );
    new XMLWriter(OutputFormat.createPrettyPrint()).write(demoDocument);
    this.buffer.append( demoDocument.valueOf( "/web-app/servlet[1]/servlet-name" ) );

2.      Write to System.out with an explicit format:

    new XMLWriter( System.out, format )

7.9.2            SAXReader

Provides methods for reading a document from files and streams.

SAXReader creates a DOM4J tree from SAX parsing events.

7.9.3            DocumentHelper

Provides many convenient helper methods.

7.9.3.1             DocumentHelper.parseText(xml)

parseText parses the given text as an XML document and returns the newly created Document.

7.9.3.2             Creating a Root Element

Approach 1:

    rootElement = docMessage.addElement("Response");
    docMessage.setRootElement(rootElement);

After refactoring:

    Element resultRoot = DocumentHelper.createElement( "Response" );
    this.resultDoc = DocumentHelper.createDocument( resultRoot );

7.9.3.3             W3C DOM converting to dom4j tree

A simple test program to demonstrate using W3C DOM and JAXP to load a DOM XML tree and then converting it to a DOM4J tree.

    protected Document parse( String url ) throws Exception {
        // parse a DOM tree
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        System.out.println( "Loading document with JAXP builder: " + builder );

        org.w3c.dom.Document domDocument = builder.parse( url );
        System.out.println( "Created W3C DOM document: " + domDocument );

        // now convert to DOM4J model
        DOMReader reader = new DOMReader();
        Document document = reader.read(domDocument);
        System.out.println( "Created DOM4J document: " + document );

        return document;
    }

7.9.3.4             CreateXMLDemo

Document document = DocumentHelper.createDocument();

Element root = document.addElement( "system" );

 

Then child nodes can be added in a loop:

Element element = root.addElement( "property" );

            element.addAttribute( "name", name );

            element.addText( value );

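8       Defining and Parsing XML

8.1    Principles and Method for Defining XML

1. First work out clearly what needs to be done: decide whether a single row of data or many rows will be presented.

2. Identify the information to present, for example which fields are needed. The meaning of each field must be clear.

3. Define the format. If there are multiple rows, define parent nodes that group them sensibly.

4. Give the items in the format sensible names, and add the necessary comments and examples.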

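8.2    Example: Storing Multiple Rows in XML

<?xml version="1.0" encoding="gb2312"?>
<root status="X" lines="Y">
    <rows>
        <row id="XXXX1" name="YYYY1" addr="ZZZZ1" tel="KKKK1" type="T" cityCode="MMMM"/>
        <row id="XXXX2" name="YYYY2" addr="ZZZZ2" tel="KKKK2" type="T" cityCode="MMMM"/>
    </rows>
</root>

Note: the first line must not be omitted. It declares the gb2312 encoding, and without it the document cannot be parsed correctly.

id is the unit ID.

name is the unit name.

addr is the unit address.

tel is the unit telephone number.

type is the unit type.

cityCode is the unit area code.

status=0 means the interface data is valid; status=1 means it is invalid.

lines is the number of rows returned by the query, at most 100.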
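A reader for this format might look like the sketch below, which checks status first and then walks the row elements. It uses the JDK's DOM parser instead of dom4j; the class name RowsDemo and the sample values are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RowsDemo {
    /** Builds a hypothetical two-row message with the given status value. */
    static String sample(String status) {
        return "<?xml version=\"1.0\" encoding=\"gb2312\"?>"
            + "<root status=\"" + status + "\" lines=\"2\"><rows>"
            + "<row id=\"1\" name=\"Unit A\" addr=\"Addr A\" tel=\"111\" type=\"T\" cityCode=\"010\"/>"
            + "<row id=\"2\" name=\"Unit B\" addr=\"Addr B\" tel=\"222\" type=\"T\" cityCode=\"010\"/>"
            + "</rows></root>";
    }

    /** Returns the name attribute of every row, or an empty list when status is not "0". */
    static List<String> rowNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            List<String> names = new ArrayList<>();
            // status=0 marks valid data; anything else is treated as invalid.
            if (!"0".equals(root.getAttribute("status"))) {
                return names;
            }
            NodeList rows = root.getElementsByTagName("row");
            for (int i = 0; i < rows.getLength(); i++) {
                names.add(((Element) rows.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

RowsDemo.rowNames(RowsDemo.sample("0")) returns the two unit names, and RowsDemo.rowNames(RowsDemo.sample("1")) returns an empty list. The other attributes (addr, tel, type, cityCode) can be read the same way.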

8.3    SAX (event-based)

8.3.1            Several Ways to Read Values

import java.io.InputStream;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class DOM {
    public static void main(String[] args) {
        // DocumentHelper.parseText() expects XML text, not a file path,
        // so the classpath resource is read with SAXReader instead.
        Document document = null;
        SAXReader reader = new SAXReader();
        InputStream inputStream = DOM.class.getResourceAsStream("/server_config.xml");
        try {
            document = reader.read(inputStream);
        } catch (DocumentException exp) {
            System.out.println("Load " + "/server_config.xml" + " error.");
            document = null;
        }
    }
}

8.4    References

1. http://www.zvon.org/xxl/XPathTutorial/General/examples.htm provides many examples.

9       JAXP

9.1    Introduction

9.1.1            Document Object Model (DOM) Level 3

The Document Object Model (DOM) is a set of interfaces defined by the W3C DOM Working Group. It describes facilities for a programmatic representation of a parsed XML (or HTML) document. The DOM Level 3 specification defines these interfaces using Interface Definition Language (IDL) in a language independent fashion and also includes a Java Language binding.

The Java™ API for XML processing specification includes by reference both the abstract semantics described for the DOM Level 3 Core Recommendation interfaces and the associated Java Language binding. It does not include the optional extensions defined by the DOM working group.

The API package included by the Java™ API for XML processing specification is:

  • org.w3c.dom, which is a part of the Common DOM API.

9.1.2            Simple API For XML Parsing (SAX) 2.0.2

The Simple API for XML (SAX) is a public domain API developed cooperatively by the members of the XML-DEV mailing list. It provides an event-driven interface to the process of parsing an XML document.

An event-driven interface provides a mechanism for "callback" notifications to the application's code as the underlying parser recognizes XML syntactic constructions in the document.

The Java™ API for XML processing specification includes by reference the SAX 2.0.2 API and the SAX2 extensions in its entirety.

The API packages included by reference are:

  • org.xml.sax
  • org.xml.sax.helpers
  • org.xml.sax.ext
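The callback style described above can be sketched as follows. This is only an illustration: the handler name RowCountingHandler and the idea of counting row elements are assumptions, chosen to match the multi-row format from section 8.2. The parser calls startElement once for every opening tag it recognizes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <row> elements as the parser reports them.
final class RowCountingHandler extends DefaultHandler {

    private int rowCount;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Called by the parser for each start tag; no document tree is built.
        if ("row".equals(qName)) {
            rowCount++;
        }
    }

    static int countRows(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RowCountingHandler handler = new RowCountingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.rowCount;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root status=\"0\"><rows><row id=\"1\"/><row id=\"2\"/></rows></root>";
        System.out.println("rows: " + countRows(xml));
    }
}
```

Because nothing is kept in memory except the counter, this approach suits large documents where only a few values are needed.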

9.1.3            XSLT 1.0

The XSL Transformations (XSLT) describes a language for transforming XML documents into other XML documents or other text output. It was defined by the W3C XSL Working group.

The XSLT 1.0 Recommendation is located at:

http://www.w3.org/TR/1999/REC-xslt-19991116

The Java™ API for XML processing specification includes by reference the XSLT 1.0 specification in its entirety.

The API packages included by reference are:

  • javax.xml.transform
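As a rough illustration of the javax.xml.transform package, the sketch below applies a small XSLT 1.0 stylesheet to an XML string. The stylesheet, the class name XsltDemo, and the semicolon-separated output format are all assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: lists the name attribute of every <row> as plain text.
final class XsltDemo {

    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"//row\">"
            + "<xsl:value-of select=\"@name\"/><xsl:text>;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // Compile the stylesheet into a Transformer, then run it on the input.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<rows><row name=\"A\"/><row name=\"B\"/></rows>"));
    }
}
```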

9.1.4            Pluggability Layer

The endorsed APIs provide broad and useful functionality. However, the use of a SAX or a DOM parser typically requires knowledge of the specific implementation of the parser. Providing the functionality of the endorsed APIs in the Java Platform, while allowing choice of the implementation of the parser, requires a Pluggability layer.

The Java™ API for XML processing specification defines a Pluggability mechanism to allow a compliant SAX or DOM parser to be used through the abstract javax.xml.parsers and javax.xml.transform API. Also, compliant Transformers are pluggable via the javax.xml.transform API.
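One way to see the pluggability layer at work is to ask each abstract factory which concrete class it resolved to. The sketch below does only that; the class name PluggabilityDemo is an assumption, and the class names printed depend on the JDK and on any parser libraries on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo: application code names only the abstract JAXP factories.
final class PluggabilityDemo {

    static String domFactoryImplementation() {
        // newInstance() locates an implementation at run time.
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String saxFactoryImplementation() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    static String transformerFactoryImplementation() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM:  " + domFactoryImplementation());
        System.out.println("SAX:  " + saxFactoryImplementation());
        System.out.println("XSLT: " + transformerFactoryImplementation());
    }
}
```

A different implementation can be selected without changing this code, for example by setting the system property javax.xml.parsers.DocumentBuilderFactory to the name of another factory class.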

 

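9.2    Basic Introduction to the Java API

1.      Document  public interface Document extends Node

The Document interface represents the entire HTML or XML document. Conceptually, it is the root of the document tree, and it provides the primary access to the document's data.

Since elements, text nodes, comments, processing instructions, and so on cannot exist outside the context of a Document, the Document interface also contains the factory methods needed to create these objects. The Node objects created have an ownerDocument attribute, which associates them with the Document within whose context they were created.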
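The factory methods can be sketched as below. The example rebuilds, with the standard DOM API, the same kind of property element that the earlier dom4j fragment creates; the class name DocumentFactoryDemo and the root element name properties are assumptions.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo of Document as a factory for the nodes it owns.
final class DocumentFactoryDemo {

    static Document buildProperty(String name, String value) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Every node is created through the Document, never with "new".
        Element root = doc.createElement("properties");
        doc.appendChild(root);

        Element property = doc.createElement("property");
        property.setAttribute("name", name);
        property.appendChild(doc.createTextNode(value));
        root.appendChild(property);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildProperty("timeout", "30");
        Element property = (Element) doc.getDocumentElement().getFirstChild();
        // The created node points back to the Document that created it.
        System.out.println(property.getOwnerDocument() == doc);
    }
}
```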

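2.      Node  public interface Node

The Node interface is the primary data type for the entire Document Object Model. It represents a single node in the document tree. While all objects implementing the Node interface expose methods for dealing with children, not all of them may have children. For example, Text nodes may not have children, and adding children to such a node raises a DOMException.

The attributes nodeName, nodeValue and attributes are included as a mechanism to get at node information without casting down to the specific derived interface. In cases where there is no obvious mapping of these attributes for a specific nodeType (for example, nodeValue for an Element or attributes for a Comment), they return null. Note that the specialized interfaces may contain additional and more convenient mechanisms to get and set the relevant information.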
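A small sketch can make both points visible: the generic properties that return null, and the exception raised when a Text node is given a child. The class name NodeBasicsDemo and its helper methods are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical demo of the generic Node properties.
final class NodeBasicsDemo {

    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Summarizes nodeName, nodeValue and attributes without casting to a derived interface. */
    static String describe(Node node) {
        String attrs = node.getAttributes() == null ? "null" : "NamedNodeMap";
        return node.getNodeName() + " / " + node.getNodeValue() + " / " + attrs;
    }

    /** Returns true when appending a child to a Text node raises a DOMException. */
    static boolean textRejectsChildren(Document doc) {
        try {
            doc.createTextNode("leaf").appendChild(doc.createElement("child"));
            return false;
        } catch (DOMException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        System.out.println(describe(doc.createElement("url")));   // nodeValue is null for an Element
        System.out.println(describe(doc.createComment("note")));  // attributes is null for a Comment
        System.out.println(textRejectsChildren(doc));
    }
}
```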

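3.      Element  public interface Element extends Node

The Element interface represents an element in an HTML or XML document. Elements may have attributes associated with them. Since the Element interface inherits from Node, the generic Node interface attribute attributes may be used to retrieve the set of all attributes of an element. The Element interface also has methods to retrieve either an Attr object by name or an attribute value by name. In XML, where an attribute value may contain entity references, an Attr object should be retrieved to examine the possibly fairly complex sub-tree representing the attribute value. In HTML, on the other hand, where all attributes have simple string values, the methods that access an attribute value directly can be used safely and conveniently.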
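The two access styles differ in how they report a missing attribute, which the following sketch tries to show. The class name ElementAttributeDemo and the helper attributeOrDefault are assumptions. In the DOM API, getAttribute returns an empty string for an absent attribute, while getAttributeNode returns null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo of value access versus Attr access.
final class ElementAttributeDemo {

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Uses the Attr object so that an absent attribute can be told apart from an empty one. */
    static String attributeOrDefault(Element element, String name, String fallback) {
        Attr attr = element.getAttributeNode(name); // null when the attribute is absent
        return attr == null ? fallback : attr.getValue();
    }

    public static void main(String[] args) throws Exception {
        Element url = parseRoot("<url areaCode=\"0570\"/>");
        System.out.println(url.getAttribute("areaCode"));
        System.out.println(url.getAttribute("area").isEmpty());
        System.out.println(attributeOrDefault(url, "area", "unknown"));
    }
}
```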

9.3    Development Example

An example of parsing XML with the JAXP API.

9.4    XML Definition

<?xml version="1.0" encoding="gb2312"?>

<!--  hostname: host address, sid: database identifier, username: user name, password: password -->

<urls>

  <url areaCode="0570" area="嘉兴">

      <hostname>134.100.64.164</hostname>

      <sid>jxtest</sid>

      <username>jx</username>

      <password>jxtest</password>

  </url>

</urls>

9.5    Parsing Method

private String[] doUrlXmlParser(String areaCode) {

       logger.debug("start of the doUrlXmlParser()");

       String[] dbinfo = new String[4];

       try {

           /*

            * Defines a factory API that enables applications to obtain a

            * parser that produces DOM object trees from XML documents.

            */

           DocumentBuilderFactory domfac = DocumentBuilderFactory.newInstance();

          

           DocumentBuilder dombuilder = domfac.newDocumentBuilder();

           logger.debug(" start to parse file: " + CONFIG_FILE);

          

           /*

            * Parse the content of the given URI as an XML document

            * and return a new DOM {@link Document} object.

            */

           Document domConfig = dombuilder.parse(UrlXmlParser.class

                  .getResource(CONFIG_FILE).getPath());

           NodeList urls = domConfig.getDocumentElement().getChildNodes();

 

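           if(urls == null) {

              // NullChildNodeException is an application-defined exception that is not shown here.
              throw new NullChildNodeException("No child nodes are configured in the XML!");

           }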

          

           for (int i = 0; i < urls.getLength(); i++) {

              Node url = urls.item(i);

              if (url.getNodeType() == Node.ELEMENT_NODE) {

                  String _areaCode = url.getAttributes().getNamedItem("areaCode").getNodeValue();

                  if (areaCode.equals(_areaCode))

                     for (Node node = url.getFirstChild(); node != null; node = node

                            .getNextSibling()) {

                         if (node.getNodeType() == Node.ELEMENT_NODE) {

                            populateXmlParams(dbinfo, node);

                         }

                     }

              }

           }

      

           if (dbinfo[0] == null)

              logger.debug("\t fail to parse " + CONFIG_FILE

                     + ": the requested area code is not in the configuration file");

           logger.debug("end of the doUrlXmlParser()");

           return dbinfo;

       } catch (FactoryConfigurationError e) {

           logger.debug("\t FactoryConfigurationError" + e);

           logger.debug("end of the doUrlXmlParser()");

           e.printStackTrace();

           return dbinfo;

       } catch (ParserConfigurationException e) {

           logger.debug("\t ParserConfigurationException" + e);

           logger.debug("end of the doUrlXmlParser()");

           e.printStackTrace();

           return dbinfo;

       } catch (SAXException e) {

           logger.debug("\t SAXException" + e);

           logger.debug("end of the doUrlXmlParser()");

           e.printStackTrace();

           return dbinfo;

       } catch (IOException e) {

           logger.debug("\t IOException" + e);

           logger.debug("end of the doUrlXmlParser()");

           e.printStackTrace();

           return dbinfo;

       }

    }
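For comparison, the same lookup can be sketched with the javax.xml.xpath package instead of walking child nodes by hand. This is an alternative, not the method used above. The class name UrlXPathLookup is hypothetical, and the order of the returned values (hostname, sid, username, password) is an assumption taken from the comment in the XML definition, since populateXmlParams is not shown.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical alternative to the loop-based lookup, using XPath.
final class UrlXPathLookup {

    // Assumed field order; the original helper populateXmlParams is not shown.
    private static final String[] FIELDS = {"hostname", "sid", "username", "password"};

    static String[] lookup(String xml, String areaCode) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the area code as an XPath variable rather than concatenating it into the expression.
        xpath.setXPathVariableResolver(
                name -> "areaCode".equals(name.getLocalPart()) ? areaCode : null);

        Node url = (Node) xpath.evaluate("/urls/url[@areaCode = $areaCode]",
                new InputSource(new StringReader(xml)), XPathConstants.NODE);

        String[] dbinfo = new String[FIELDS.length];
        if (url == null) {
            return dbinfo; // no matching area code: every entry stays null
        }
        for (int i = 0; i < FIELDS.length; i++) {
            dbinfo[i] = xpath.evaluate(FIELDS[i], url).trim();
        }
        return dbinfo;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<urls><url areaCode=\"0570\"><hostname>134.100.64.164</hostname>"
                + "<sid>jxtest</sid><username>jx</username><password>jxtest</password></url></urls>";
        System.out.println(String.join(", ", lookup(xml, "0570")));
    }
}
```

Like the original method, the sketch returns an array whose first entry is null when the area code is not configured, so a caller can keep the same check.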

 

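10            Development Lessons Learned

10.1       Prerequisite Knowledge

1. XML-related concepts, including XPath, XSLT, XSL, document models, and so on.

2. Choose an XML framework, for example dom4j.

10.2       How to Improve, Step by Step

1. First, learn the basics of XML.

2. Choose a parsing tool, go to its website, and read the quick start document.

3. In Eclipse, follow the examples and write a few test cases yourself to try things out.

4. Use the tool in a real project of your own.

5. When you meet an unfamiliar term or abbreviation, such as "XPath expression", write a test case to explore it.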
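Step 5 might look like the scratchpad below, which evaluates a few XPath expressions against a tiny document and checks the results. It uses only the JDK's javax.xml.xpath package; the class name XPathScratchpad, the sample XML, and the chosen expressions are assumptions for illustration, and a real project would more likely use its unit-test framework.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical scratchpad for trying out XPath expressions.
final class XPathScratchpad {

    /** Evaluates an XPath expression against an XML string and returns the result as text. */
    static String eval(String xml, String expression) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)));
    }

    private static void check(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root status=\"0\"><rows><row name=\"A\"/><row name=\"B\"/></rows></root>";
        check("0", eval(xml, "/root/@status"));   // attribute of the root element
        check("2", eval(xml, "count(//row)"));    // number of row elements anywhere
        check("B", eval(xml, "//row[2]/@name"));  // positions start at 1
        System.out.println("all XPath checks passed");
    }
}
```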

11            Summary and Outlook

12            References

Ø        http://www.w3c.org

          Recommendations, drafts, specifications.

Ø        http://www.w3schools.com/default.asp

          A very good website that provides many tutorials on XML technologies.

Ø        http://www.xml.com/resourceguide/

          Provides many resource links.

Ø        http://java.sun.com

Ø        http://www.ibm.com/developerworks/xml/library/x-injava/index.html

          Compares the performance and functionality of several Java document models.

Ø        http://www.zvon.org/xxl/XPathTutorial/General/examples.html

          Tutorial; provides some XPath examples.

Ø        http://www.dom4j.org/

          Provides some basic operation examples.

 

13            Glossary and Appendix
