Python XML

Source: Internet | Editor: 程序博客网 | Time: 2024/05/19 15:40

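I have recently been looking into parsing XML with Python. Python has never lacked XML parsing libraries. After comparing them, two stand out as good fits: xml.dom, which is well known yet low-key, and lxml, which is powerful and efficient. This post starts with minidom (xml.dom.minidom).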

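The class below implements readNodes, which reads each node's value and its attributes.

readElementByName takes an element name and reads the values and attributes of that element's child nodes.

Neither is hard to follow.

Here is the XML first: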

<?xml version="1.0" encoding="UTF-8"?>
<waf>
    <policy> acl </policy>
    <prot>
        <dstip>2.2.2.2</dstip>
        <dstip>3.3.3.3</dstip>
        <dstport>80</dstport>
        <srcip>3.3.3.3</srcip>
        <srcport>8888</srcport>
        <protocol>17</protocol>
    </prot>

    <other test_case_id = "1" >
        <action>
            0
        </action>
        <res>
            0
        </res>
    </other>
    <rule ID="18612269" value="/x22" />
</waf>
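As a rough sketch of how minidom sees a file like this (assuming a trimmed copy of the XML inlined as a string under the made-up name SAMPLE), the whitespace between tags shows up as text nodes. That is why the class filters on nodeType == 1 and treats an element with exactly one child as a leaf holding a text value.

```python
from xml.dom import minidom

# Trimmed copy of the XML above, inlined so the sketch is self-contained.
# SAMPLE is a name chosen for this illustration only.
SAMPLE = """<waf>
    <policy> acl </policy>
    <prot>
        <dstip>2.2.2.2</dstip>
        <dstport>80</dstport>
    </prot>
    <rule ID="18612269" value="/x22" />
</waf>"""

root = minidom.parseString(SAMPLE).documentElement

# Every direct child, including the whitespace-only text nodes between tags.
all_children = [node.nodeName for node in root.childNodes]

# Only element nodes (nodeType 1 is Node.ELEMENT_NODE).
element_names = [node.nodeName for node in root.childNodes
                 if node.nodeType == node.ELEMENT_NODE]

# A leaf element has a single text child; nodeValue is its raw, unstripped text.
policy = root.getElementsByTagName('policy')[0]
policy_text = policy.childNodes[0].nodeValue

print(all_children)
print(element_names)
print(repr(policy_text))
```

The text nodes appear under the name '#text', and the value of policy keeps its surrounding spaces, so calling strip() on it is usually worthwhile.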

#!/usr/bin/env python
#coding=utf-8
from xml.dom import minidom

class Xml_dom():
    def readNodes(self,domElement):
        # Walk the direct children; nodeType 1 means an element node.
        for nodes in domElement.childNodes:
            if nodes.nodeType == 1:
                print(nodes.nodeName+'=====================')
                for keys in nodes.attributes.keys():
                    print(nodes.attributes[keys].name+'='+nodes.attributes[keys].value)
                    # Note: this check sits inside the attribute loop, so it
                    # only runs for elements that carry attributes.
                    if len(nodes.childNodes)==1:
                        print(nodes.nodeName+':'+nodes.childNodes[0].nodeValue)
                    else:
                        self.readNodes(nodes)
    def readElementByName(self,elementList):
        # Recurse through the matched elements and all of their descendants.
        for elements in elementList:
            if elements.nodeType == 1:
                print(elements.nodeName+'>>>>>>>>>>>>>>>>>>>>>>>')
                for keys in elements.attributes.keys():
                    print(elements.attributes[keys].name+'='+elements.attributes[keys].value)
            # Exactly one child means a leaf element holding a single text node.
            if len(elements.childNodes) == 1:
                print(elements.nodeName+':'+elements.childNodes[0].nodeValue)
            else:
                self.readElementByName(elements.childNodes)
    def __init__(self,filename,elename):
        self.dom = minidom.parse(filename)
        self.root = self.dom.documentElement
        print('=========xml_dom==============\n')
        self.readNodes(self.root)
        print('=========end===============\n')
        print('>>>>>>>>>xml_dom>>>>>>>>>>\n')
        el = self.dom.getElementsByTagName(elename)
        self.readElementByName(el)
        print(">>>>>>>>>end>>>>>>>>>>>>")

if __name__=='__main__':
#    a = Xml_dom('rule_sqlInj.xml','configs')
    a = Xml_dom('waf_sqlrule.xml','prot')
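In the listing above, the child-count check in readNodes sits inside the attribute loop, so it only runs for elements that carry attributes. As a sketch of one possible variant (not the code above; the function name collect_nodes and the inlined SAMPLE string are assumptions made for illustration), the check can be moved out of that loop and the lines collected into a list instead of printed:

```python
from xml.dom import minidom

# Trimmed, inlined copy of the sample XML; SAMPLE is a hypothetical name.
SAMPLE = """<waf>
    <policy> acl </policy>
    <prot>
        <dstip>2.2.2.2</dstip>
        <dstport>80</dstport>
    </prot>
    <rule ID="18612269" value="/x22" />
</waf>"""


def collect_nodes(element, lines=None):
    """Walk element's children; record attributes and leaf text values."""
    if lines is None:
        lines = []
    for node in element.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # skip whitespace text nodes, comments, etc.
        lines.append(node.nodeName + '=====')
        for key in node.attributes.keys():
            lines.append(key + '=' + node.attributes[key].value)
        children = node.childNodes
        if len(children) == 1 and children[0].nodeType == node.TEXT_NODE:
            # Leaf element: strip the padding around the text value.
            lines.append(node.nodeName + ':' + children[0].nodeValue.strip())
        else:
            # Zero or several children: recurse (a no-op for empty elements).
            collect_nodes(node, lines)
    return lines


root = minidom.parseString(SAMPLE).documentElement
for line in collect_nodes(root):
    print(line)
```

With this arrangement, elements without attributes such as policy and the children of prot also report their values. Returning a list rather than printing makes the result easier to check or reuse.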

The output:

> "D:/Python25/pythonw.exe" -u "D:/学习/python/xml/xml_dom/xml_dom.py"
=========xml_dom==============

policy=====================
prot=====================
other=====================
test_case_id=1
action=====================
res=====================
rule=====================
ID=18612269
value=/x22
=========end===============

>>>>>>>>>xml_dom>>>>>>>>>>

prot>>>>>>>>>>>>>>>>>>>>>>>
dstip>>>>>>>>>>>>>>>>>>>>>>>
dstip:2.2.2.2
dstip>>>>>>>>>>>>>>>>>>>>>>>
dstip:3.3.3.3
dstport>>>>>>>>>>>>>>>>>>>>>>>
dstport:80
srcip>>>>>>>>>>>>>>>>>>>>>>>
srcip:3.3.3.3
srcport>>>>>>>>>>>>>>>>>>>>>>>
srcport:8888
protocol>>>>>>>>>>>>>>>>>>>>>>>
protocol:17
>>>>>>>>>end>>>>>>>>>>>>

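There is no shortage of articles online about parsing with minidom, and I have only just learned it myself. It is fairly easy to grasp: once you know the basic methods, you can put it to use. To learn more, read the minidom source directly.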
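As a small recap of those basic methods, here is a sketch (using a short inlined XML string, SAMPLE, made up from parts of the file shown earlier) that exercises getElementsByTagName, getAttribute, hasAttribute, and firstChild.data, which I would guess cover most everyday reading tasks:

```python
from xml.dom import minidom

# Hypothetical inline sample assembled from parts of the XML shown earlier.
SAMPLE = """<waf>
    <prot>
        <dstip>2.2.2.2</dstip>
        <dstip>3.3.3.3</dstip>
        <dstport>80</dstport>
    </prot>
    <other test_case_id="1"><action>0</action></other>
</waf>"""

dom = minidom.parseString(SAMPLE)

# getElementsByTagName searches all descendants and keeps document order.
dst_ips = [el.firstChild.data for el in dom.getElementsByTagName('dstip')]

# getAttribute returns the value as a string, or '' if the attribute is absent.
other = dom.getElementsByTagName('other')[0]
case_id = other.getAttribute('test_case_id')
missing = other.getAttribute('no_such_attr')

# hasAttribute tells "absent" apart from "present but empty".
has_case_id = other.hasAttribute('test_case_id')

print(dst_ips)
print(case_id, repr(missing), has_case_id)
```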