Java XML and dom4j: The ABCs

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 09:55

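XML shows up everywhere in Java applications, from web.xml to the configuration files of the various frameworks. Most of the Java WSDP (Java Web Services Developer Pack) is XML technology as well: JAXP (Java API for XML Processing), JAXB (Java Architecture for XML Binding), SJSXP (Sun Java Streaming XML Parser), and so on. The WSDP page is at http://www.oracle.com/technetwork/java/webservicespack-jsp-140788.html. This article is only a starting point; the subject is broad. After reading it, you may want to continue with two books: Java and XML, and Java Web Services.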

part 1: java xml

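1. XML parsers

As most Java programmers know, the platform owners only define the API, and the implementations are developed by different vendors. XML support followed the same pattern, although the pieces arrived at different times: JAXP first covered DOM and SAX, and StAX came later and was folded into JAXP 1.4. The known technologies are:

1.1 DOM from w3c.org: a language-neutral, platform-neutral Document Object Model, used on both the front end and the server. A Document is a readable and writable in-memory tree. It does well at non-sequential reads and writes, and is usually combined with XPath.

1.2 SAX from saxproject.org: also cross-language, an event-callback processing model. Its advantage over DOM is lower memory use; it reads sequentially.

1.3 StAX from BEA: an attempt to cover the ground of both DOM and SAX, specified under JSR 173: https://www.jcp.org/en/jsr/detail?id=173. I have not used it, so I will not say more.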
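Since StAX is only mentioned in passing above, here is a minimal sketch of its pull model using the javax.xml.stream API that ships with the JDK. The class name StaxTitleSketch and the sample XML are made up for illustration; the point is only that the application asks for the next event itself, instead of building a tree (DOM) or receiving callbacks (SAX).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original article.
final class StaxTitleSketch {

    /** Collects the text of every <title> element in the given XML string. */
    static List<String> titles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Pull model: we request the next event when we are ready for it.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // getElementText() reads the text and stops at the end tag.
                        titles.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked exception handling.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><item><title>first</title></item>"
                + "<item><title>second</title></item></channel></rss>";
        System.out.println(titles(xml)); // prints [first, second]
    }
}
```

Like SAX, this reads sequentially and keeps memory use low; unlike SAX, the loop is in your own code, so it is easy to stop early.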


The items above are models and specifications (the selling points), not parsers. The parser I know of is Apache Xerces: http://xerces.apache.org/.

It supports:

    SAX 2.0.2
    DOM Level 3 Core, Load and Save
    DOM Level 2 Core, Events, Traversal and Range
    JAXP 1.4
    StAX 1.0 Event API (javax.xml.stream.events)
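To see the split between API and implementation in practice, the following sketch asks each JAXP factory which concrete class it resolved to. The class name JaxpProviderSketch is made up for illustration. The names printed depend on the JDK and on the classpath: a stock JDK normally reports its own bundled implementation, and placing another parser such as Xerces on the classpath can change the result, so no expected output is shown.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, not part of the original article.
final class JaxpProviderSketch {

    /** Name of the class that implements the DOM factory at runtime. */
    static String domProvider() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Name of the class that implements the SAX factory at runtime. */
    static String saxProvider() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Name of the class that implements the StAX input factory at runtime. */
    static String staxProvider() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM : " + domProvider());
        System.out.println("SAX : " + saxProvider());
        System.out.println("StAX: " + staxProvider());
    }
}
```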


2. What is JAXP

JAXP is Java's wrapper around SAX and DOM: through the single JAXP API you can use either one. Use SAXParserFactory to parse XML with SAX, and DocumentBuilderFactory to parse with DOM.
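The two entry points can be sketched side by side. The example below reads the name attribute of every group element, once through DocumentBuilderFactory and once through SAXParserFactory. The class name JaxpSketch and the sample XML are assumptions made for illustration; only JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original article.
final class JaxpSketch {

    private static ByteArrayInputStream bytes(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** DOM: build the whole tree in memory, then navigate it in any order. */
    static List<String> groupNamesWithDom(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(bytes(xml));
            NodeList groups = doc.getElementsByTagName("group");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < groups.getLength(); i++) {
                names.add(((Element) groups.item(i)).getAttribute("name"));
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** SAX: react to callbacks while the parser streams through the input. */
    static List<String> groupNamesWithSax(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(bytes(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if ("group".equals(qName)) {
                        names.add(attributes.getValue("name"));
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<root ico=\"sec\"><group name=\"users\"/><group name=\"orders\"/></root>";
        System.out.println(groupNamesWithDom(xml)); // prints [users, orders]
        System.out.println(groupNamesWithSax(xml)); // prints [users, orders]
    }
}
```

Both methods return the same list; the difference is that the DOM version holds the whole document in memory while the SAX version only keeps what the handler stores.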


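3. What is JAXB

JAXB builds a bridge between Java objects and XML, so that you do not have to deal with DOM, SAX, or StAX: you work either with XML or with a Java bean. A Marshaller turns Java objects into XML (one instance per XML document, or several instances in one document), and an Unmarshaller restores the data in the XML to Java object instances. Package: javax.xml.bind.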


part 2: dom4j

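1. dom4j is not an XML parser. Unlike JDOM, it provides a set of abstract XML interfaces. The top-level interface is Node; Attribute, Branch, CDATA, CharacterData, Comment, Document (not the W3C Document), DocumentType, Element, Entity, ProcessingInstruction, and Text are all sub-interfaces of Node, directly or indirectly.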


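2. The default factory: DocumentFactory

There are also several special-purpose sub-factories: BeanDocumentFactory, DatatypeDocumentFactory, DatatypeElementFactory, DOMDocumentFactory, IndexedDocumentFactory, NonLazyDocumentFactory, UserDataDocumentFactory


A note on DOMDocumentFactory: it extends DocumentFactory and implements org.w3c.dom.DOMImplementation. If a method accepts an org.w3c.dom.Element, you can pass it (after a cast) an org.dom4j.Element created by a DOMDocumentFactory instance, because the elements it creates implement the W3C interface as well.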


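3. dom4j can read XML through SAX, through an existing W3C DOM tree, or through a pull parser. The readers all live in the org.dom4j.io package: org.dom4j.io.DOMReader, org.dom4j.io.SAXReader, and org.dom4j.io.XPP3Reader (which is built on the XPP3 pull parser rather than on StAX). The underlying SAX parser is obtained through JAXP. In each case the method to call is read.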


4. Serialization means writing to a String, a file, or the console; use org.dom4j.io.XMLWriter for that. In addition:

org.dom4j.io.DOMWriter takes an org.dom4j.Document and returns an org.w3c.dom.Document;

org.dom4j.io.SAXWriter writes to an org.xml.sax.ContentHandler.
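The writers can be sketched as follows. This assumes dom4j is on the classpath; the class name Dom4jWriterSketch and the sample document are made up for illustration. XMLWriter produces text, while DOMWriter converts the dom4j tree into a W3C one.

```java
import java.io.IOException;
import java.io.StringWriter;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.io.DOMWriter;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// Hypothetical helper, not part of the original article.
final class Dom4jWriterSketch {

    /** Serializes a dom4j Document to an indented String with XMLWriter. */
    static String toPrettyString(Document doc) {
        StringWriter out = new StringWriter();
        XMLWriter writer = new XMLWriter(out, OutputFormat.createPrettyPrint());
        try {
            writer.write(doc);
            writer.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    /** Converts a dom4j Document into an org.w3c.dom.Document with DOMWriter. */
    static org.w3c.dom.Document toW3c(Document doc) {
        try {
            return new DOMWriter().write(doc);
        } catch (DocumentException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws DocumentException {
        Document doc = DocumentHelper.parseText("<rss><title>hello</title></rss>");
        System.out.println(toPrettyString(doc));
        System.out.println(toW3c(doc).getDocumentElement().getNodeName());
    }
}
```

To write to a file or to the console instead, pass a FileWriter or an OutputStream such as System.out to the XMLWriter constructor.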


5. Parsing an RSS URL with dom4j

5.1 Using ElementHandler

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class SAXRssParser {
    private final SAXReader reader;
    private final List<RssItem> items;

    public SAXRssParser() {
        super();
        this.reader = new SAXReader();
        this.items = new ArrayList<>();
    }

    public boolean parser(final URL url) {
        // Called for every /rss/channel/item element while the document is being read
        reader.addHandler("/rss/channel/item", new ElementHandler() {
            final ItemChildElementHandler titleHandler = new ItemChildElementHandler();
            final ItemChildElementHandler linkHandler = new ItemChildElementHandler();
            final ItemChildElementHandler dateHandler = new ItemChildElementHandler();
            final ItemChildElementHandler descripHandler = new ItemChildElementHandler();

            @Override
            public void onStart(ElementPath elementPath) {
                // Register handlers for the children of the current item
                elementPath.addHandler("title", titleHandler);
                elementPath.addHandler("link", linkHandler);
                elementPath.addHandler("pubDate", dateHandler);
                elementPath.addHandler("description", descripHandler);
            }

            @Override
            public void onEnd(ElementPath elementPath) {
                elementPath.removeHandler("title");
                elementPath.removeHandler("link");
                elementPath.removeHandler("pubDate");
                elementPath.removeHandler("description");
                try {
                    URL curURL = processRemoteLink(linkHandler.getNodeContent(), url); // helper method
                    Date curDate = processDate(dateHandler.getNodeContent()); // helper method
                    items.add(new RssItem(curURL, titleHandler.getNodeContent(), descripHandler.getNodeContent(), curDate));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
        });
        try {
            reader.read(url);
        } catch (DocumentException e) {
            e.printStackTrace();
        }
        return items.size() > 0;
    }

    public List<RssItem> getEntryList() {
        return items;
    }

    // Remembers the name and text of the last child element it saw
    private class ItemChildElementHandler implements ElementHandler {
        private String tagName;
        private String tagText;

        @Override
        public void onStart(ElementPath elementPath) {
            Element elt = elementPath.getCurrent();
            tagName = elt.getName();
        }

        @Override
        public void onEnd(ElementPath elementPath) {
            Element elt = elementPath.getCurrent();
            tagText = elt.getText();
        }

        @SuppressWarnings("unused")
        public String getNodeNames() {
            return tagName;
        }

        public String getNodeContent() {
            return tagText;
        }
    }
}

The RssItem object

@ThreadSafe
public class RssItem implements Serializable {
    private static final long serialVersionUID = 673250215751499564L;
    /**
     * Link address of the entry
     */
    private final URL url;
    /**
     * Title of the entry
     */
    private final String title;
    /**
     * Short description of the entry
     */
    private final String description;
    /**
     * Publication date of the entry
     */
    private final Date date;

    public RssItem(URL url, String title, String description, Date date) {
        super();
        this.url = url;
        this.title = title;
        this.description = description;
        this.date = date;
    }

    public URL getUrl() {
        return url;
    }

    public String getTitle() {
        return title;
    }

    public String getDescription() {
        return description;
    }

    public Date getDate() {
        return date;
    }

    @Override
    public int hashCode() {
        //ETC
    }

    @Override
    public boolean equals(Object obj) {
        //ETC
    }

    @Override
    public String toString() {
        //ETC
    }
}

Test

import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;

public class SAXRssParserTest {
    public static void main(String[] args) {
        String b = "http://news.baidu.com/n?cmd=7&loc=4075&name=%D1%CC%CC%A8&tn=rss";
        final long beginTime = System.nanoTime();
        SAXRssParser sap = new SAXRssParser();
        try {
            if (sap.parser(new URL(b))) {
                List<RssItem> news = sap.getEntryList();
                System.out.println("size:" + news.size());
                for (RssItem ri : news) {
                    System.out.println("title:" + ri.getTitle() + "@" + ri.getDate());
                    System.out.println("link:" + ri.getUrl());
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
        final long endTime = System.nanoTime();
        // Elapsed time in seconds
        System.out.println("used Second: " + (endTime - beginTime) / 1.0e9);
    }
}

5.2 Using VisitorSupport

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.VisitorSupport;
import org.dom4j.io.SAXReader;

public class SAXRssParser {
    private final SAXReader reader;
    private final List<RssItem> items;

    public SAXRssParser() {
        super();
        this.reader = new SAXReader();
        this.items = new ArrayList<>();
    }

    public boolean parser(final URL url) {
        final Document document;
        try {
            // read() throws the checked DocumentException, so it must be handled here
            document = reader.read(url);
        } catch (DocumentException e) {
            e.printStackTrace();
            return false;
        }
        final RssVisitorSupport rvs = new RssVisitorSupport(url);
        document.accept(rvs);
        items.addAll(rvs.getNews());
        return rvs.getTotalStep() > 0;
    }

    public List<RssItem> getEntryList() {
        return items;
    }

    class RssVisitorSupport extends VisitorSupport {
        private int step = 0;
        private RssItemBuilder build = null;
        private final List<RssItem> news;
        private final URL referURL;

        public RssVisitorSupport(final URL referURL) {
            this.referURL = referURL;
            this.news = new ArrayList<>();
        }

        @Override
        public void visit(Element node) {
            String eleName = node.getName();
            if (eleName.equals("item")) {
                // A new item starts: begin collecting its fields
                build = new RssItemBuilder();
                step++;
            }
            if (eleName.equals("title") && build != null) {
                build.setTitle(node.getText());
            }
            if (eleName.equals("link") && build != null) {
                try {
                    build.setURL(processRemoteLink(node.getText(), referURL));
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                }
            }
            if (eleName.equals("pubDate") && build != null) {
                build.setDate(processDate(node.getText()));
            }
            if (eleName.equals("description") && build != null) {
                build.setDescription(node.getText());
            }
            if (build != null && !build.isEmpty()) {
                news.add(build.build());
                build = null; // without this reset, duplicate items appear
            }
        }

        public int getTotalStep() {
            return step;
        }

        public List<RssItem> getNews() {
            return news;
        }
    }
}

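Because RssItem is designed as an immutable object, RssVisitorSupport works with an RssItemBuilder, which follows the Builder pattern. For the Builder design pattern, see:

Builder Design Pattern in Java

I tested several RSS URLs and found: VisitorSupport > ElementHandler > Iterator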


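6. JAXB example

Scenario: back-end applications I wrote in the past always had a function management menu. Before I knew about JAXB, I would create an XML file and use some parser to build a singleton from it when the program started.

6.1 The function management menu XML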

<?xml version="1.0" encoding="UTF-8"?>
<root ico="sec">
  <group name="Member management" link="/user" symbol="sec_1">
    <item><anchor>Member list</anchor><id>child_1_1</id><link>/user</link></item>
    <item><anchor>Personal info</anchor><id>child_1_2</id><link>/user/person</link></item>
    <item><anchor>Company info</anchor><id>child_1_3</id><link>/user/company</link></item>
    <item><anchor>Security questions</anchor><id>child_1_4</id><link>/user/secret</link></item>
    <item><anchor>Credit records</anchor><id>child_1_5</id><link>/user/trust</link></item>
  </group>
  <group name="Product management" link="/product" symbol="sec_2">
    <item><anchor>Product list</anchor><id>child_2_1</id><link>/product</link></item>
    <item><anchor>Trading accounts</anchor><id>child_2_2</id><link>/product/account</link></item>
    <item><anchor>Extension fields</anchor><id>child_2_3</id><link>/product/field</link></item>
    <item><anchor>Type templates</anchor><id>child_2_4</id><link>/product/field/template</link></item>
  </group>
  <group name="Order management" link="/order" symbol="sec_3">
    <item><anchor>Order list</anchor><id>child_3_1</id><link>/order</link></item>
    <item><anchor>Inventory management</anchor><id>child_3_2</id><link>/order/inventory</link></item>
    <item><anchor>Review management</anchor><id>child_3_3</id><link>/order/pointer</link></item>
  </group>
  <group name="Finance management" symbol="sec_4">
    <item><anchor>Online banking channels</anchor><id>child_4_1</id><link>/channel</link></item>
    <item><anchor>Top-up records</anchor><id>child_4_2</id><link>/channel/cache</link></item>
    <item><anchor>Bank card management</anchor><id>child_4_3</id><link>/bank/card</link></item>
    <item><anchor>Bill management</anchor><id>child_4_4</id><link>/bill</link></item>
    <item><anchor>Cash records</anchor><id>child_4_5</id><link>/bank/saction</link></item>
    <item><anchor>Alipay transfer records</anchor><id>child_4_6</id><link>/bill/ali</link></item>
  </group>
  <group name="News management" link="/news" symbol="sec_5">
    <item><anchor>News list</anchor><id>child_5_1</id><link>/news</link></item>
    <item><anchor>News categories</anchor><id>child_5_2</id><link>/news/category</link></item>
    <item><anchor>News title flags</anchor><id>child_5_3</id><link>/news/level</link></item>
  </group>
  <group name="System management" symbol="sec_6">
    <item><anchor>Complaints / feedback</anchor><id>child_6_1</id><link>/feedback</link></item>
    <item><anchor>Activity log</anchor><id>child_6_2</id><link>/user/active</link></item>
    <item><anchor>Member levels</anchor><id>child_6_3</id><link>/user/level</link></item>
    <item><anchor>SMS</anchor><id>child_6_4</id><link>/recaptcha</link></item>
    <item><anchor>Site messages</anchor><id>child_6_5</id><link>/message</link></item>
    <item><anchor>Keywords</anchor><id>child_6_6</id><link>/word</link></item>
  </group>
</root>


6.2 Parsing the XML file above with dom4j's SAXReader

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import net.project.entity.Group;
import net.project.entity.GroupItem;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

/**
 * Traditional SAX parsing
 * @author xiaofanku
 * 20130701
 */
public class ParserManagerPanel {
    private static ParserManagerPanel instance = null;
    private final List<Group> group;

    private ParserManagerPanel(InputStream stream) {
        this.group = new ArrayList<Group>();
        try {
            parser(new SAXReader().read(stream));
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    private void parser(final Document doc) {
        List<Node> list = doc.selectNodes("//group");
        for (Iterator<Node> iter = list.iterator(); iter.hasNext(); ) {
            Element currentGroup = (Element) iter.next();
            Group mg = new Group();
            // The link attribute is optional; fall back to "-"
            String defaultLink = currentGroup.attributeValue("link");
            if (defaultLink == null || defaultLink.isEmpty()) {
                defaultLink = "-";
            }
            mg.setLink(defaultLink);
            mg.setName(currentGroup.attributeValue("name"));
            mg.setSymbol(currentGroup.attributeValue("symbol"));
            List<Node> groupChild = currentGroup.selectNodes("./item");
            for (Node currentItem : groupChild) {
                Element anchor = (Element) currentItem.selectSingleNode("./anchor");
                Element idEle = (Element) currentItem.selectSingleNode("./id");
                Element link = (Element) currentItem.selectSingleNode("./link");
                try {
                    GroupItem item = new GroupItem();
                    item.setAnchor(anchor.getText());
                    item.setId(idEle.getText());
                    item.setLink(link.getText());
                    mg.getItems().add(item);
                } catch (NullPointerException e) {
                    // Thrown when an item lacks one of its three child elements
                    e.printStackTrace();
                }
            }
            group.add(mg);
        }
    }

    public static ParserManagerPanel getInstance(InputStream input) {
        if (instance == null) {
            instance = new ParserManagerPanel(input);
        }
        return instance;
    }

    public List<Group> getStruct() {
        return group;
    }
}

The objects involved

public class Group implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    private String symbol;
    private String link;
    private List<GroupItem> items = null;

    public Group() {
        super();
        items = new ArrayList<>();
    }
    //SET/GET
}

public class GroupItem implements Serializable {
    private static final long serialVersionUID = 1L;
    private String anchor;
    private String id;
    private String link;

    public GroupItem() {
        super();
    }
    //SET/GET
}


6.3 With JAXB you only need to add a few annotations, and dom4j is no longer needed to turn the XML into objects

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="group")
public class Group implements Serializable {
    private static final long serialVersionUID = 1L;
    @XmlAttribute
    private String name;
    @XmlAttribute
    private String symbol;
    @XmlAttribute(required = false)
    private String link;
    @XmlElement(name="item")
    private List<GroupItem> items = null;

    public Group() {
        super();
        items = new ArrayList<>();
    }
    //GET/SET
}

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="item")
public class GroupItem implements Serializable {
    private static final long serialVersionUID = 1L;
    @XmlElement
    private String anchor;
    @XmlElement
    private String id;
    @XmlElement
    private String link;

    public GroupItem() {
        super();
    }
    //GET/SET
}

Add one new class

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name="root")
public class GroupPanel {
    @XmlElement(name="group")
    private List<Group> groups = null;
    @XmlAttribute
    private String ico;

    public GroupPanel() {
        super();
        groups = new ArrayList<>();
    }
    //GET/SET
}


Finally, the test code that calls it

// newInstance, createUnmarshaller and unmarshal all throw JAXBException
JAXBContext jc = JAXBContext.newInstance(GroupPanel.class, Group.class, GroupItem.class);
Unmarshaller u = jc.createUnmarshaller();
GroupPanel gs = (GroupPanel) u.unmarshal(new File("/managerGroup.xml"));

