Generating PASCAL VOC-format XML annotation files with Python

Source: Internet  Editor: 程序博客网  Date: 2024/05/27 09:45

Annotation files in the PASCAL VOC dataset are in XML format. For py-faster-rcnn or SSD, the fields in the following example are usually sufficient:

<annotation>
  <folder>GTSDB</folder>
  <filename>000001.jpg</filename>
  <size>
    <width>500</width>
    <height>375</height>
    <depth>3</depth>
  </size>
  <object>
    <name>mouse</name>
    <difficult>0</difficult>
    <bndbox>
      <xmin>99</xmin>
      <ymin>358</ymin>
      <xmax>135</xmax>
      <ymax>375</ymax>
    </bndbox>
  </object>
</annotation>

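How do you read bbox information from a csv or txt file and generate an XML annotation file? Writing the file line by line as raw strings certainly works, but it is awkward to modify later, and keeping tabs and spaces consistent in the indentation is tedious. Building the element tree with a library such as lxml and serializing it avoids both problems.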


sudo pip install lxml
# The standard library also works for building the tree:
#   from xml.etree.ElementTree import Element, SubElement, tostring
# but its tostring() has no pretty_print argument; in that case
# xml.dom.minidom.parseString(...).toprettyxml() can do the formatting.
from lxml.etree import Element, SubElement, tostring

# Root element and image-level fields
node_root = Element('annotation')

node_folder = SubElement(node_root, 'folder')
node_folder.text = 'GTSDB'

node_filename = SubElement(node_root, 'filename')
node_filename.text = '000001.jpg'

# Image size
node_size = SubElement(node_root, 'size')
node_width = SubElement(node_size, 'width')
node_width.text = '500'
node_height = SubElement(node_size, 'height')
node_height.text = '375'
node_depth = SubElement(node_size, 'depth')
node_depth.text = '3'

# One <object> per bounding box
node_object = SubElement(node_root, 'object')
node_name = SubElement(node_object, 'name')
node_name.text = 'mouse'
node_difficult = SubElement(node_object, 'difficult')
node_difficult.text = '0'

node_bndbox = SubElement(node_object, 'bndbox')
node_xmin = SubElement(node_bndbox, 'xmin')
node_xmin.text = '99'
node_ymin = SubElement(node_bndbox, 'ymin')
node_ymin.text = '358'
node_xmax = SubElement(node_bndbox, 'xmax')
node_xmax.text = '135'
node_ymax = SubElement(node_bndbox, 'ymax')
node_ymax.text = '375'

# pretty_print=True inserts line breaks and indentation
xml_bytes = tostring(node_root, pretty_print=True)

# tostring() returns bytes; decode before printing (Python 3)
print(xml_bytes.decode('utf-8'))
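
The example above hard-codes a single box. To answer the original question (csv in, XML out), here is a minimal sketch that groups CSV rows by image and builds one annotation per file. Several things here are assumptions, not part of the original post: the CSV column names (filename, width, height, name, xmin, ymin, xmax, ymax), the fixed depth of 3, the fixed difficult value of 0, and the helper names build_annotation, annotations_from_csv and to_xml_string. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages; ET.indent requires Python 3.9+, so the call is guarded.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_annotation(folder, filename, width, height, depth, objects):
    """Build a VOC-style <annotation> element.

    objects is a list of (name, xmin, ymin, xmax, ymax) tuples.
    """
    root = ET.Element('annotation')
    ET.SubElement(root, 'folder').text = folder
    ET.SubElement(root, 'filename').text = filename

    size = ET.SubElement(root, 'size')
    for tag, value in (('width', width), ('height', height), ('depth', depth)):
        ET.SubElement(size, tag).text = str(value)

    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = name
        ET.SubElement(obj, 'difficult').text = '0'  # assumption: nothing is marked difficult
        bndbox = ET.SubElement(obj, 'bndbox')
        for tag, value in (('xmin', xmin), ('ymin', ymin),
                           ('xmax', xmax), ('ymax', ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return root


def annotations_from_csv(csv_file, folder='GTSDB'):
    """Group CSV rows by image and return {filename: annotation element}."""
    boxes = defaultdict(list)
    sizes = {}
    for row in csv.DictReader(csv_file):
        fn = row['filename']
        sizes[fn] = (int(row['width']), int(row['height']))
        boxes[fn].append((row['name'],
                          int(row['xmin']), int(row['ymin']),
                          int(row['xmax']), int(row['ymax'])))
    # assumption: all images are 3-channel, so depth is fixed at 3
    return {fn: build_annotation(folder, fn, sizes[fn][0], sizes[fn][1], 3, objs)
            for fn, objs in boxes.items()}


def to_xml_string(root):
    """Serialize to text, indenting when ET.indent is available (3.9+)."""
    if hasattr(ET, 'indent'):
        ET.indent(root, space='  ')
    return ET.tostring(root, encoding='unicode')


# Hypothetical in-memory CSV standing in for a real file
sample = io.StringIO(
    "filename,width,height,name,xmin,ymin,xmax,ymax\n"
    "000001.jpg,500,375,mouse,99,358,135,375\n"
)
annotations = annotations_from_csv(sample)
xml_text = to_xml_string(annotations['000001.jpg'])
print(xml_text)
```

For a real dataset, pass an opened file (for example open(path, newline='')) instead of the StringIO object and write each xml_text to a file named after the image. A txt file with a different delimiter can be handled by passing delimiter='\t' or delimiter=' ' to csv.DictReader.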