Processing XML Data with Python

Source: Internet | Editor: 程序博客网 | Date: 2024/05/21 13:17

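I recently needed to process XML data with Python, so I looked up some material online.
 The first thing I studied was Python's xml.dom.minidom module. Following the material, I tried it out in the Python interactive shell.
 When I ran it, xml.dom.minidom did not seem able to return the text between an element's tags. The code was: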
 >>> import xml.dom.minidom
 >>> test = "<a><b>1</b><c id='1'>4</c></a>"
 >>> tdoc = xml.dom.minidom.parseString(test)
 >>> w = tdoc.getElementsByTagName('b')
 >>> node = w[0]
 >>> node.nodeValue
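 Both the online material and the official Python documentation say that node.nodeValue returns a node's value, yet the code above prints nothing. node.nodeValue is only meaningful for nodes of type TEXT_NODE, so let's check the type of node b: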
 >>> node.nodeType
 1
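 1 is the ELEMENT_NODE type, which explains the empty result: nodeValue is None for element nodes. I searched the web and the official documentation for quite a while without finding anything useful, so I had no choice but to look at other modules. Another article compared the performance of Python's XML modules, and based on that I chose the cElementTree module and tried it in the interactive shell.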
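As an aside on the minidom result: the text of an element is not stored on the element itself but on a child node of type TEXT_NODE. The following is a minimal sketch under that reading, assuming the test document contains a `<b>` child holding the text `1`; `get_text` is a hypothetical helper, not part of minidom.

```python
import xml.dom.minidom

# Assumed test document: a <b> child holding the text "1".
test = "<a><b>1</b><c id='1'>4</c></a>"
tdoc = xml.dom.minidom.parseString(test)
node = tdoc.getElementsByTagName("b")[0]

# The element node itself has no value.
element_value = node.nodeValue

# Its text lives in a child node of type TEXT_NODE.
text_node = node.firstChild
is_text = text_node.nodeType == text_node.TEXT_NODE
b_text = text_node.data  # equivalent to text_node.nodeValue


def get_text(element):
    """Join the direct text children of a minidom element (hypothetical helper)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )
```

A helper like `get_text` is handy when an element mixes several text and CDATA children, because `firstChild` only covers the first one.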

>>> import cElementTree
>>> dir(cElementTree )
['Comment', 'Element', 'ElementPath', 'ElementTree', 'PI', 'ProcessingInstruction', 'QName', 'SubElement', 'TreeBuilder', 'VERSION', 'XML', 'XMLID', 'XMLParser', 'XMLParserError', 'XMLTreeBuilder', '__doc__', '__file__', '__name__', '__version__', 'dump', 'fromstring', 'iselement', 'iterparse', 'parse', 'tostring']
>>> test = "<a><b>1</b><c id='1'>wewew</c></a>"
>>>
>>> dom = cElementTree.parse(test)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<string>", line 45, in parse
File "<string>", line 22, in parse
IOError: [Errno 2] No such file or directory: "<a>1<c id='1'>wewew</c></a>"
>>> dom = cElementTree.fromstring(test)
>>> root = dom.getroot()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: getroot
>>> type(dom)
<type 'Element'>
>>> dir(dom)
['!__reduce__', '__copy__', '__deepcopy__', 'append', 'clear', 'find', 'findall', 'findtext', 'get', 'getchildren', 'getiterator', 'insert', 'items', 'keys', 'makeelement', 'remove', 'set']
>>> dom.find('b')
<Element 'b' at 0xb7bef2d8>
>>> w=dom.find('b')
>>> w.text
'1'
>>> test = "<a><b>1</b><b>wewew</b></a>"
>>> dom = cElementTree.fromstring(test)
>>> dom.find('b')
<Element 'b' at 0xb7bef410>
>>> w=dom.find('b')
>>> w.text
'1'
>>> w=dom.find('b')
>>> w.text
'1'
>>> w=dom.findall('b')
>>> type(w)
<type 'list'>
>>> w
[<Element 'b' at 0xb7bef410>, <Element 'b' at 0xb7bef4a0>]
>>> w[1].text
'wewew'
>>> c=dom.find('c')
>>> dir(c)
['!__reduce__', '__copy__', '__deepcopy__', 'append', 'clear', 'find', 'findall', 'findtext', 'get', 'getchildren', 'getiterator', 'insert', 'items', 'keys', 'makeelement', 'remove', 'set']
>>> c.items()
[('id', '1')]
>>> c.get('id')
'1'
>>>
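The session above uses the old standalone cElementTree package. As a rough modern equivalent, here is a sketch using the standard-library `xml.etree.ElementTree` module, which offers the same calls. The test string is an assumption chosen so that every query in the session has something to find.

```python
import io
import xml.etree.ElementTree as ET

# Assumed test document with two <b> children and one <c> child.
test = "<a><b>1</b><b>wewew</b><c id='1'>4</c></a>"

# fromstring() parses XML text and returns the root Element directly,
# which is why calling getroot() on its result fails.
root = ET.fromstring(test)

# parse() expects a filename or file object and returns an ElementTree;
# that wrapper object is the one that provides getroot().
tree = ET.parse(io.StringIO(test))
same_root_tag = tree.getroot().tag == root.tag

first_b = root.find("b").text                # first matching direct child
all_b = [e.text for e in root.findall("b")]  # every matching direct child
c_id = root.find("c").get("id")              # a single attribute
c_items = root.find("c").items()             # all attributes as (name, value) pairs
```

Note that `find` and `findall` with a bare tag name only look at direct children of the element they are called on.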
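 However, cElementTree.fromstring(test) has a small pitfall: it returns the root element itself, so when the node you are looking for is the root node, dom.find(node) cannot retrieve it, because find only searches below the element. One workaround is to extract the tag name from the element's string representation and compare it with the name you want:
 str(dom).split(' ')[1][1:-1] == node
 If this expression is True, the root element is the node you need; if it is False, search its children with find. Since this depends on the exact format of the string representation, comparing the element's tag attribute (dom.tag == node) is a more direct check.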
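The root-node check can be sketched as follows, assuming the standard-library `xml.etree.ElementTree` module (the successor of cElementTree) and comparing the element's `tag` attribute instead of parsing its string representation. `find_by_tag` is a hypothetical helper name introduced for illustration.

```python
import xml.etree.ElementTree as ET


def find_by_tag(root, name):
    """Return root if its tag is `name`, else its first direct child with that tag.

    Hypothetical helper; returns None when nothing matches.
    """
    if root.tag == name:
        return root
    return root.find(name)


root = ET.fromstring("<a><b>1</b><c id='1'>wewew</c></a>")

root_match = find_by_tag(root, "a")   # the root element itself
child_match = find_by_tag(root, "b")  # a direct child
no_match = find_by_tag(root, "x")     # None
```

Checking `tag` first keeps the lookup uniform: the caller does not need to know in advance whether the wanted node is the root or one of its children.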