Reading and Writing XML Files with SAX in Java

Source: Internet · Editor: 程序博客网 · Date: 2024/06/05 13:22

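SAX is an event-driven, streaming approach to processing XML. Unlike DOM, it does not build a tree of the whole document in memory. Instead it processes the document sequentially, reacting to events such as element start and end tags as the parser encounters them. This makes it fast and light on memory, so SAX is often the first choice for large XML files. Reading XML with SAX is fairly simple, but writing XML with it is somewhat more involved than with DOM, and the examples available online are neither complete nor detailed. I needed both reading and writing at work, so I am recording the approach here for reference.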
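To make the event-driven model concrete, here is a minimal sketch that records the order in which start-tag events arrive. The class name `SaxEventDemo` and the method `startTagOrder` are made up for this illustration and do not appear in the rest of the article. Only the standard JDK SAX API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only
final class SaxEventDemo {

    // Returns the element names in the order their start tags are reported
    static List<String> startTagOrder(String xml) {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                // Called once per start tag, in document order; no tree is built
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(startTagOrder("<a><b>1</b><c/></a>"));
    }
}
```

The handler only sees one event at a time, which is why memory use stays flat regardless of document size. Any state that must survive between events (such as the current text buffer) has to be kept by the handler itself.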

Reading the XML file:

First, the target XML file to be read is as follows:

    <?xml version="1.0" encoding="UTF-8" ?>
    <oes:Notifications xmlns:oes="http://xml.sax.test.com/oesAccessNotification">
        <oes:Notification>
            <oes:NotificationID>11111</oes:NotificationID>
            <oes:NotificationType>AlarmNew</oes:NotificationType>
            <oes:timeStamp>2009-02-25T08:57:17</oes:timeStamp>
            <oes:Appendix>
                <oes:MapItem key="key" value="value"/>
            </oes:Appendix>
            <oes:Content>
                <alarmNew systemDN="PLMN-1/S3SN-1/SRME-BSS-2/SBSS-0">
                    <alarmId>400951</alarmId>
                    <alarmText>PIPE 0 IS SLOW OR NOT WORKING</alarmText>
                    <eventTime>2009-02-25T08:57:17+02:00</eventTime>
                    <eventType>processingError</eventType>
                    <perceivedSeverity>critical</perceivedSeverity>
                    <probableCause>0</probableCause>
                    <specificProblem>86600</specificProblem>
                    <additionalText1>A Raised by pipe supervision script, process ID 20848</additionalText1>
                    <additionalText2>A test additional text2</additionalText2>
                    <additionalText3>A test additional text3</additionalText3>
                    <additionalText4>A test additional text4</additionalText4>
                    <additionalText5>A Original Additional text: test alarm1 | Original Probable Cause: Toxic Leak1 Detected |
                        Original alarm time: 20090901183006+0530 | Automatic clearing:Y
                    </additionalText5>
                    <additionalText6>Original
                    </additionalText6>
                </alarmNew>
            </oes:Content>
        </oes:Notification>
    </oes:Notifications>

Reading this file with SAX takes the following steps:

(1). Define the tag names used in the XML:

    class Constant {
        public static final String NAME_SPACE = "xmlns:oes";
        public static final String SCHEMA = "http://xml.sax.test.com/oesAccessNotification";
        public static final String NOTIFICATIONS = "oes:Notifications";
        public static final String NOTIFICATION = "oes:Notification";
        public static final String NOTIFICATION_ID = "oes:NotificationID";
        public static final String NOTIFICATION_TYPE = "oes:NotificationType";
        public static final String TIME_STAMP = "oes:timeStamp";
        public static final String APPENDIX = "oes:Appendix";
        public static final String MAP_ITEM = "oes:MapItem";
        public static final String KEY = "key";
        public static final String VALUE = "value";
        public static final String CONTENT = "oes:Content";
        public static final String ALARM_NEW = "alarmNew";
        public static final String SYSTEM_DN = "systemDN";
        public static final String ALARM_ID = "alarmId";
        public static final String ALRAM_TEXT = "alarmText";
        public static final String EVENT_TIME = "eventTime";
        public static final String EVENT_TYPE = "eventType";
        public static final String PERCEIVED_SEVERITY = "perceivedSeverity";
        public static final String PROBABLE_CAUSE = "probableCause";
        public static final String SPECIFIC_PROBLEM = "specificProblem";
        public static final String ADDITION_TEXT1 = "additionalText1";
        public static final String ADDITION_TEXT2 = "additionalText2";
        public static final String ADDITION_TEXT3 = "additionalText3";
        public static final String ADDITION_TEXT4 = "additionalText4";
        public static final String ADDITION_TEXT5 = "additionalText5";
        public static final String ADDITION_TEXT6 = "additionalText6";
        public static final String ADDITION_TEXT7 = "additionalText7";
    }

These constants are used throughout the XML processing code that follows.

(2). Define the Java object that corresponds to the XML nodes:

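    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;

    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.ContentHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;

    class EventFactory {

        // In-memory representation of one <oes:Notification> element
        public static class InternalEvent {
            private String notificationType = "";
            private Map<String, String> props = new HashMap<String, String>();

            public String getNotificationType() {
                return notificationType;
            }

            // Returns the stored value, or an empty string if the name is unknown
            public String getProp(String name) {
                String str = props.get(name);
                return str == null ? "" : str;
            }

            public Map<String, String> getProps() {
                return props;
            }

            public void setNotificationType(String notificationType) {
                this.notificationType = notificationType;
            }

            public void putAttribute(String name, String value) {
                this.props.put(name, value);
            }
        }

        // Parses the XML file with SAX; the handler stores one InternalEvent
        // per notification in the returned list
        public List<InternalEvent> read(String xmlPath)
                throws ParserConfigurationException, SAXException, IOException {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser saxParser = spf.newSAXParser();

            XMLReader xmlReader = saxParser.getXMLReader();
            List<InternalEvent> container = new LinkedList<InternalEvent>();
            ContentHandler handler = new ReadXMLHandler(container);
            xmlReader.setContentHandler(handler);
            // InputSource(String) expects a system ID (a URI), so convert the file path first
            xmlReader.parse(new InputSource(new File(xmlPath).toURI().toString()));
            return container;
        }
    }
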
While SAX reads the file, this class turns the XML data into in-memory Java objects.

(3). Read the XML file with SAX:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    class ReadXMLHandler extends DefaultHandler {

        private List<EventFactory.InternalEvent> eventContainer;
        // Collects the text of the element currently being read
        private StringBuilder buf = new StringBuilder();
        private EventFactory.InternalEvent event;
        // Elements whose text content is stored as a property of the event
        private static final Set<String> ATTR_TAGS = new HashSet<String>();

        static {
            // NotificationID and timeStamp are looked up by the writer,
            // so they have to be captured here as well
            ATTR_TAGS.add(Constant.NOTIFICATION_ID);
            ATTR_TAGS.add(Constant.TIME_STAMP);
            ATTR_TAGS.add(Constant.ALARM_ID);
            ATTR_TAGS.add(Constant.PROBABLE_CAUSE);
            ATTR_TAGS.add(Constant.EVENT_TIME);
            ATTR_TAGS.add(Constant.SPECIFIC_PROBLEM);
            ATTR_TAGS.add(Constant.ALRAM_TEXT);
            ATTR_TAGS.add(Constant.PERCEIVED_SEVERITY);
            ATTR_TAGS.add(Constant.ADDITION_TEXT1);
            ATTR_TAGS.add(Constant.ADDITION_TEXT2);
            ATTR_TAGS.add(Constant.ADDITION_TEXT3);
            ATTR_TAGS.add(Constant.ADDITION_TEXT4);
            ATTR_TAGS.add(Constant.ADDITION_TEXT5);
            ATTR_TAGS.add(Constant.ADDITION_TEXT6);
            ATTR_TAGS.add(Constant.ADDITION_TEXT7);
            ATTR_TAGS.add(Constant.EVENT_TYPE);
        }

        public ReadXMLHandler(List<EventFactory.InternalEvent> eventContainer) {
            this.eventContainer = eventContainer;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            // A new element starts, so discard text collected so far
            buf.setLength(0);
            if (qName.equals(Constant.NOTIFICATION)) {
                event = new EventFactory.InternalEvent();
                eventContainer.add(event);
            } else if (qName.equals(Constant.MAP_ITEM)) {
                // Read attribute values, e.g. key and value in <a key="key" value="value"/>
                String key = attributes.getValue(Constant.KEY);
                event.putAttribute(Constant.KEY, key);
                String value = attributes.getValue(Constant.VALUE);
                event.putAttribute(Constant.VALUE, value);
            } else if (qName.equals(Constant.ALARM_NEW)) {
                String systemDn = attributes.getValue(Constant.SYSTEM_DN);
                event.putAttribute(Constant.SYSTEM_DN, systemDn);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (qName.equals(Constant.NOTIFICATION_TYPE)) {
                event.setNotificationType(buf.toString());
            } else if (ATTR_TAGS.contains(qName)) {
                event.putAttribute(qName, buf.toString());
            }
        }

        // Collects element text, e.g. the abc in <a>abc</a>.
        // The parser may deliver the text of one element in several calls.
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            buf.append(ch, start, length);
        }
    }

At this point the XML has been read successfully.
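One limitation worth noting: the handler above compares `qName`, which includes the prefix `oes:`, so a document that binds the same namespace URI to a different prefix would not match. A minimal sketch of a namespace-aware alternative follows. The class `NamespaceAwareDemo` and the method `notificationIds` are hypothetical names for this illustration. It relies on `SAXParserFactory.setNamespaceAware(true)`, after which the parser reports `uri` and `localName` for each element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only
final class NamespaceAwareDemo {

    static final String OES_NS = "http://xml.sax.test.com/oesAccessNotification";

    // Collects the text of every NotificationID element in the oes namespace,
    // whatever prefix the document uses for that namespace
    static List<String> notificationIds(String xml) {
        final List<String> ids = new ArrayList<String>();
        final StringBuilder buf = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                buf.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buf.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Match on namespace URI + local name instead of the prefixed qName
                if (OES_NS.equals(uri) && "NotificationID".equals(localName)) {
                    ids.add(buf.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return ids;
    }
}
```

Whether this is worth the extra code depends on the input: if every producer is known to use the `oes` prefix, matching on `qName` as the article does is simpler.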

Transforming and writing the XML file:

Compared with reading, writing XML with SAX is somewhat more complex. The steps are as follows:

(1). Apply a simple transformation to the objects that were read:

    class Convert {
        // Appends "_TEST" to the given name or value
        public static String convertString(String value) {
            return value + "_TEST";
        }
    }

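The transformation is trivial: it appends "_TEST" to the names of the leaf elements and attributes, and appends "_TEST" to their values as well.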

(2). Write the XML file:

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.util.Map;
    import java.util.Set;

    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Result;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;

    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.AttributesImpl;

    class WriteXML {

        SAXTransformerFactory fac = (SAXTransformerFactory) SAXTransformerFactory.newInstance();

        private TransformerHandler handler = null;
        private OutputStream outStream = null;
        private String fileName;
        private AttributesImpl atts;
        private String rootElement;
        // Element nesting level, used to control indentation
        private static int level = 0;
        // Each level is indented four spaces deeper than its parent
        private static String tab = "    ";
        // Line break written before each element; "\n" is the XML line break on every platform
        private static final String separator = "\n";

        public WriteXML(String fileName, String rootElement) {
            this.fileName = fileName;
            this.rootElement = rootElement;
            init();
        }

        public void init() {
            try {
                handler = fac.newTransformerHandler();
                Transformer transformer = handler.getTransformer();
                // Output encoding
                transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
                // Whether the serializer may add extra whitespace
                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                // Whether to omit the XML declaration
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
                outStream = new FileOutputStream(fileName);
                Result resultxml = new StreamResult(outStream);
                handler.setResult(resultxml);
                atts = new AttributesImpl();
                start();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        private void start() {
            try {
                handler.startDocument();
                // Declare the namespace on the root element; "CDATA" is the SAX attribute type
                atts.addAttribute("", "", Constant.NAME_SPACE, "CDATA", Constant.SCHEMA);
                handler.startElement("", "", rootElement, atts);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Elements can contain child nodes, so start and end are written separately,
        // e.g. <a><b>bcd</b></a>
        private void startElement(String objectElement, AttributesImpl attrs)
                throws SAXException {
            if (attrs == null) {
                attrs = new AttributesImpl();
            }
            level++;
            appendTab();
            if (objectElement != null) {
                // With attributes set via addAttribute the output looks like
                // <a key="key" value="value">abc</a>; without them it looks like <a>abc</a>
                handler.startElement("", "", objectElement, attrs);
            }
        }

        // Regular end tag, e.g. </a>
        private void endElement(String objectElement) throws SAXException {
            level--;
            appendTab();
            if (objectElement != null) {
                handler.endElement("", "", objectElement);
            }
        }

        // Empty element, e.g. <a key="key" value="value"/>. No line break is written
        // between start and end, so the serializer emits the self-closing form.
        private void endEmptyElement(String objectElement) throws SAXException {
            handler.endElement("", "", objectElement);
        }

        // Elements without children are treated as properties, e.g. <a>abc</a>
        private void writeAttribute(String key, String value) throws SAXException {
            atts.clear();
            level++;
            appendTab();
            handler.startElement("", "", key, atts);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement("", "", key);
            level--;
        }

        public void end() {
            try {
                handler.endElement("", "", rootElement);
                // End of document: flush to disk
                handler.endDocument();
                outStream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Writes a line break plus indentation for the current nesting level,
        // because the elements here are indented manually
        private void appendTab() throws SAXException {
            String indent = separator + "    ";
            for (int i = 1; i < level; i++) {
                indent += tab;
            }
            handler.characters(indent.toCharArray(), 0, indent.length());
        }

        public void writeNotification(EventFactory.InternalEvent event) throws SAXException {
            Map<String, String> props = event.getProps();
            // keySet() is a live view: removing a key here also removes it from props
            Set<String> keys = props.keySet();

            level = 0;
            // Start <oes:Notification>
            startElement(Constant.NOTIFICATION, null);

            // Write oes:NotificationID
            writeAttribute(Convert.convertString(Constant.NOTIFICATION_ID), Convert.convertString(props.get(Constant.NOTIFICATION_ID)));
            keys.remove(Constant.NOTIFICATION_ID);
            // Write oes:NotificationType
            writeAttribute(Convert.convertString(Constant.NOTIFICATION_TYPE), Convert.convertString(event.getNotificationType()));
            // Write oes:timeStamp
            writeAttribute(Convert.convertString(Constant.TIME_STAMP), Convert.convertString(props.get(Constant.TIME_STAMP)));
            keys.remove(Constant.TIME_STAMP);

            // Start <oes:Appendix>
            startElement(Constant.APPENDIX, null);
            // Write oes:MapItem
            atts = new AttributesImpl();
            atts.addAttribute("", "", Convert.convertString(Constant.KEY), "CDATA", Convert.convertString(props.get(Constant.KEY)));
            keys.remove(Constant.KEY);
            atts.addAttribute("", "", Convert.convertString(Constant.VALUE), "CDATA", Convert.convertString(props.get(Constant.VALUE)));
            keys.remove(Constant.VALUE);
            startElement(Constant.MAP_ITEM, atts);
            // End oes:MapItem; it is an empty element and needs special handling
            endEmptyElement(Constant.MAP_ITEM);
            keys.remove(Constant.MAP_ITEM);
            // End oes:Appendix
            endElement(Constant.APPENDIX);
            keys.remove(Constant.APPENDIX);

            // Start oes:Content
            startElement(Constant.CONTENT, null);
            keys.remove(Constant.CONTENT);

            // Start alarmNew
            atts = new AttributesImpl();
            atts.addAttribute("", "", Convert.convertString(Constant.SYSTEM_DN), "CDATA", Convert.convertString(props.get(Constant.SYSTEM_DN)));
            startElement(Constant.ALARM_NEW, atts);
            keys.remove(Constant.ALARM_NEW);

            // Write the remaining properties as children of alarmNew.
            // systemDN is still among the keys, so it is also written as a child element.
            for (String key : keys) {
                writeAttribute(Convert.convertString(key), Convert.convertString(props.get(key)));
            }

            // End alarmNew
            endElement(Constant.ALARM_NEW);

            // End oes:Content
            endElement(Constant.CONTENT);

            // End <oes:Notification>
            endElement(Constant.NOTIFICATION);
        }
    }

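Stripped of indentation and file handling, the core of the writing technique is a short sequence of SAX events sent to a `TransformerHandler`. The sketch below is a minimal illustration and not part of the article's program: the class `SaxWriteDemo`, the method `writeItem`, and the element names `root` and `item` are invented here. It writes to a `StringWriter` so the result can be inspected directly, and it passes `"CDATA"` as the attribute type.

```java
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;

import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class, for illustration only
final class SaxWriteDemo {

    // Writes <root><item key="...">text</item></root> and returns it as a string
    static String writeItem(String key, String text) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            // Set output properties before attaching the result
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            handler.startDocument();
            handler.startElement("", "", "root", new AttributesImpl());

            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "", "key", "CDATA", key);
            handler.startElement("", "", "item", atts);
            // The serializer escapes markup characters such as '<' in text
            handler.characters(text.toCharArray(), 0, text.length());
            handler.endElement("", "", "item");

            handler.endElement("", "", "root");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("writing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeItem("k", "a<b"));
    }
}
```

Every `startElement` must be paired with an `endElement` in the right order. The handler does not track nesting for the caller, which is the main reason SAX writing feels more laborious than building a DOM.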
(3). The demo program below first reads the XML file with SAX and then writes it back out with SAX:

    import java.util.List;

    public class FlexMapping {
        private static String inputFile = "input/input.xml";
        private static String outputFile = "output/output.xml";
        private static List<EventFactory.InternalEvent> events;

        public static void main(String[] args) {
            long start = System.currentTimeMillis();
            try {
                // Read all notifications, then write each one through the SAX writer
                events = new EventFactory().read(inputFile);
                WriteXML xml = new WriteXML(outputFile, Constant.NOTIFICATIONS);
                for (EventFactory.InternalEvent event : events) {
                    xml.writeNotification(event);
                }
                xml.end();
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Elapsed: " + (System.currentTimeMillis() - start) + "ms.");
        }
    }

The XML file written is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <oes:Notifications xmlns:oes="http://xml.sax.test.com/oesAccessNotification">
        <oes:Notification>
            <oes:NotificationID_TEST>11111_TEST</oes:NotificationID_TEST>
            <oes:NotificationType_TEST>AlarmNew_TEST</oes:NotificationType_TEST>
            <oes:timeStamp_TEST>2009-02-25T08:57:17_TEST</oes:timeStamp_TEST>
            <oes:Appendix>
                <oes:MapItem key_TEST="key_TEST" value_TEST="value_TEST"/>
            </oes:Appendix>
                <oes:Content>
                    <alarmNew systemDN_TEST="PLMN-1/S3SN-1/SRME-BSS-2/SBSS-0_TEST">
                        <additionalText1_TEST>A Raised by pipe supervision script, process ID 20848_TEST</additionalText1_TEST>
                        <systemDN_TEST>PLMN-1/S3SN-1/SRME-BSS-2/SBSS-0_TEST</systemDN_TEST>
                        <additionalText2_TEST>A test additional text2_TEST</additionalText2_TEST>
                        <eventTime_TEST>2009-02-25T08:57:17+02:00_TEST</eventTime_TEST>
                        <probableCause_TEST>0_TEST</probableCause_TEST>
                        <additionalText3_TEST>A test additional text3_TEST</additionalText3_TEST>
                        <alarmText_TEST>PIPE 0 IS SLOW OR NOT WORKING_TEST</alarmText_TEST>
                        <specificProblem_TEST>86600_TEST</specificProblem_TEST>
                        <additionalText6_TEST>Original_TEST</additionalText6_TEST>
                        <additionalText5_TEST>A Original Additional text: test alarm1 | Original Probable Cause: Toxic Leak1 Detected |
                        Original alarm time: 20090901183006+0530 | Automatic clearing:Y
                    _TEST</additionalText5_TEST>
                        <perceivedSeverity_TEST>critical_TEST</perceivedSeverity_TEST>
                        <additionalText4_TEST>A test additional text4_TEST</additionalText4_TEST>
                        <alarmId_TEST>400951_TEST</alarmId_TEST>
                        <eventType_TEST>processingError_TEST</eventType_TEST>
                </alarmNew>
            </oes:Content>
        </oes:Notification>
    </oes:Notifications>
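In the output above, the children of `alarmNew` appear in a different order than in the input file. That follows from `InternalEvent` keeping its properties in a `HashMap`, whose iteration order is unspecified. If the original order matters, one option is a `LinkedHashMap`, which iterates in insertion order. The sketch below only demonstrates that ordering guarantee. The class `PropertyOrderDemo` and the method `keysInInsertionOrder` are hypothetical names, not part of the article's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, for illustration only
final class PropertyOrderDemo {

    // Stores the given names in a LinkedHashMap and returns them in iteration order
    static List<String> keysInInsertionOrder(String... names) {
        Map<String, String> props = new LinkedHashMap<String, String>();
        for (String name : names) {
            props.put(name, name + "_value");
        }
        // LinkedHashMap iterates in the order the keys were first inserted
        return new ArrayList<String>(props.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInInsertionOrder("alarmId", "alarmText", "eventTime"));
    }
}
```

Under that assumption, changing the `props` field in `InternalEvent` from `new HashMap<String, String>()` to `new LinkedHashMap<String, String>()` should be enough for the writer's loop to emit the properties in the order the reader stored them.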

