Parsing an XML file in Python with libxml2

Source: Internet | Published by: 萧亚轩知乎男友 | Editor: 程序博客网 | Date: 2024/05/22 10:05

http://www.fjqnl.net/?p=126

The XML file to parse (saved as people.xml):

```xml
<?xml version='1.0'?>
<people class="dgdg">
    <person class="dfdg">
        <![CDATA[here is a bunch of fun text that i want to get a substring out of.]]>
        <name class="DEV" col="dark"><![CDATA[function matchwo(a,b){if(a < b && a < 0)then{return 1;}else{return 0;}}]]></name>
        <age class="ddg">34</age>
        <friends class="hddd">
            <friend class="helll">Steve</friend>
            <friend class="dgdgdfe">Mark</friend>
            <friend class="dgdga" dgl="dgdgdash">Dave</friend>
            <![CDATA[here is a bunch of fun text that i want to get a substringout of.]]>
        </friends>
    </person>
</people>
```
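The CDATA sections are what allow the `<name>` element to carry raw JavaScript with an unescaped `<` and `&&`. As a quick check, the snippet below uses Python's standard `xml.etree.ElementTree`. This is a different library from the libxml2 bindings used in this article, chosen only because it ships with Python. On a trimmed copy of the file, it shows that the parser returns the CDATA body as plain element text. It also shows that serializing the element again escapes those characters rather than reproducing the CDATA wrapper. `SNIPPET` is a hypothetical name for the trimmed input.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of people.xml: only <person> and its <name>,
# whose CDATA section holds raw JavaScript containing "<" and "&&".
SNIPPET = (
    '<people class="dgdg"><person class="dfdg">'
    '<name class="DEV" col="dark"><![CDATA[function matchwo(a,b)'
    '{if(a < b && a < 0)then{return 1;}else{return 0;}}]]></name>'
    '</person></people>'
)

root = ET.fromstring(SNIPPET)
name = root.find('person/name')

# The CDATA body comes back as ordinary element text, "<" and "&&" intact.
print(name.text)
print(name.attrib)

# Serializing again does not restore the CDATA wrapper; the special
# characters are escaped as &lt; and &amp; instead.
print(ET.tostring(name, encoding='unicode'))
```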

The parser, written in Python using the libxml2 bindings:

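```python
import libxml2


def walkProperty(xmlnode):
    '''Print the attributes of a node on the current output line.'''
    if xmlnode.properties is not None:
        # Iterating from the first attribute also yields the attributes'
        # text children, so keep only the attribute nodes themselves.
        for prop in xmlnode.properties:
            if prop.type == 'attribute':
                print(prop.name + '=' + prop.content, end=' ')


def walkTree(xmlnode):
    '''Recursively walk the element children of a node and print them.'''
    child = xmlnode.children
    while child is not None:  # loop while there is another child node
        if not child.isBlankNode():  # skip whitespace-only text nodes
            if child.type == "element":
                # number of child elements under this node
                childCount = int(child.xpathEval('count(*)'))
                # depth below the root element, used to indent the output
                depth = int(child.xpathEval('count(ancestor::*)')) - 1
                if childCount == 0:
                    # leaf element: print its name and text content
                    print(depth * '\t' + child.name + ':' + child.content, end=' ')
                    walkProperty(child)
                    print('\n')
                else:
                    # element with children: print its name, then recurse
                    print(depth * '\t' + child.name, end=' ')
                    walkProperty(child)
                    print('\n')
                    walkTree(child)
        child = child.next


if __name__ == '__main__':
    doc = libxml2.parseFile("people.xml")  # parse the XML file
    root = doc.getRootElement()  # get the root element
    walkTree(root)
    doc.freeDoc()  # free the libxml2 document
```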
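If the libxml2 Python bindings are not installed, the same traversal can be sketched with the standard library's `xml.etree.ElementTree`. This is a swapped-in library, not the author's approach. The function names `walk` and `format_attributes` are hypothetical, and the sketch collects lines instead of printing them, so it omits the blank lines the original prints between entries. Iterating an ElementTree element yields only its child elements, and the recursion carries the depth explicitly, so the XPath `count(*)` and `count(ancestor::*)` calls become `len(child)` and a `depth` parameter.

```python
import xml.etree.ElementTree as ET


def format_attributes(elem):
    '''Render attributes as "name=value" pairs, as walkProperty prints them.'''
    return ' '.join(f'{key}={value}' for key, value in elem.attrib.items())


def walk(elem, depth=0, out=None):
    '''Depth-first walk over the element children of elem.

    Collects one line per element, indented with one tab per level;
    leaf elements are shown as name:text, like walkTree does.
    '''
    if out is None:
        out = []
    for child in elem:  # yields child elements; text and CDATA are not nodes here
        is_leaf = len(child) == 0  # plays the role of XPath count(*) == 0
        label = child.tag + ':' + (child.text or '') if is_leaf else child.tag
        attrs = format_attributes(child)
        out.append(depth * '\t' + label + (' ' + attrs if attrs else ''))
        if not is_leaf:
            walk(child, depth + 1, out)  # depth replaces count(ancestor::*)
    return out


if __name__ == '__main__':
    # For the real file use: root = ET.parse('people.xml').getroot()
    sample = ('<people class="dgdg"><person class="dfdg">'
              '<age class="ddg">34</age><friends class="hddd">'
              '<friend class="helll">Steve</friend></friends>'
              '</person></people>')
    for line in walk(ET.fromstring(sample)):
        print(line)
```

Returning a list rather than printing directly makes the output easy to inspect or join, at the cost of holding every line in memory, which is negligible for a file of this size.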

Reference: http://ukchill.com/technology/getting-started-with-libxml2-and-python-part-1/