Converting XML to JSON in Python with XML2Dict

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 00:33

1. Reading and writing JSON

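import json

# Both functions take `self`, so they are meant to live inside a class.

def parseFromFile(self, fname):
    """
    Overwritten to read JSON files.
    """
    # The with-block closes the file even if parsing fails.
    with open(fname, "r") as f:
        return json.load(f)

def serializeToFile(self, fname, annotations):
    """
    Overwritten to write JSON files.
    """
    with open(fname, "w") as f:
        json.dump(annotations, f, indent=4, separators=(',', ': '), sort_keys=True)
        f.write("\n")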
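As a quick check of the two helpers above, the following sketch round-trips a small annotation list through `json.dump` and `json.load`. It is written for Python 3 and uses a temporary file. The sample data and the names `write_json`, `read_json` and `round_trip` are illustrative assumptions, not part of the original code.

```python
import json
import os
import tempfile


def write_json(fname, annotations):
    """Write annotations as pretty-printed JSON with sorted keys."""
    with open(fname, "w") as f:
        json.dump(annotations, f, indent=4, separators=(",", ": "), sort_keys=True)
        f.write("\n")


def read_json(fname):
    """Read a JSON file back into Python objects."""
    with open(fname, "r") as f:
        return json.load(f)


def round_trip(annotations):
    """Write annotations to a temporary file, read them back, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        write_json(path, annotations)
        return read_json(path)
    finally:
        os.remove(path)


# Illustrative data in the same layout the article builds later.
sample = [{"filename": "AL_00001.JPG", "class": "image", "annotations": []}]
restored = round_trip(sample)
print(restored == sample)
```

Because `sort_keys=True` only affects the order of keys in the file, the data read back compares equal to the data written.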

2. XML2Dict, a toolkit for XML files

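XML2Dict converts XML into a native Python dictionary object. Child elements are accessed much like ordinary dictionary entries, with one difference: you use "." (attribute-style access).
Note: to use the xml2dict library, add xml2dict.py and object_dict.py to your local project (download link).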
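To make the "." access concrete, here is a minimal Python 3 sketch of a dictionary that also exposes its keys as attributes. The names `DotDict` and `to_dotdict` are hypothetical and this is not the actual code of object_dict.py. It only illustrates the general idea behind attribute-style access.

```python
class DotDict(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


def to_dotdict(obj):
    """Recursively wrap nested dicts (and dicts inside lists) in DotDict."""
    if isinstance(obj, dict):
        return DotDict((key, to_dotdict(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [to_dotdict(value) for value in obj]
    return obj


doc = to_dotdict({"annotation": {"size": {"width": "800", "height": "1160"}}})
print(doc.annotation.size.width)            # attribute-style access
print(doc["annotation"]["size"]["height"])  # plain dict access still works
```

Note that the values stay strings here, which is why the conversion code later in the article wraps coordinates in `int(...)`.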

Loading an XML file

from xml2dict import XML2Dict
xml = XML2Dict()
r = xml.parse("file_to_process.xml")  # placeholder file name

XML example [VOC2007 format]:

<annotation>
    <folder>VOC2007</folder>
    <filename>AL_00001.JPG</filename>
    <size>
        <width>800</width>
        <height>1160</height>
        <depth>3</depth>
    </size>
    <object>
        <name>l_faster</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>270</xmin>
            <ymin>376</ymin>
            <xmax>352</xmax>
            <ymax>503</ymax>
        </bndbox>
    </object>
    <object>
        <name>l_faster</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>262</xmin>
            <ymin>746</ymin>
            <xmax>355</xmax>
            <ymax>871</ymax>
        </bndbox>
    </object>
    <object>
        <name>r_faster</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>412</xmin>
            <ymin>376</ymin>
            <xmax>494</xmax>
            <ymax>486</ymax>
        </bndbox>
    </object>
    <object>
        <name>r_faster</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>411</xmin>
            <ymin>748</ymin>
            <xmax>493</xmax>
            <ymax>862</ymax>
        </bndbox>
    </object>
</annotation>

Let's look at the format of this file:

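The outermost layer is enclosed by <annotation></annotation>. One level in are <folder></folder>, <filename></filename>, <size></size> and <object></object>. Because <object> occurs several times, object is a list, and each entry contains name and bndbox (among other fields). Example: accessing the child elements of annotation.

# -*- coding: utf-8 -*-
# Python 2 syntax (print statement).
from xml2dict import XML2Dict

xml = XML2Dict()
r = xml.parse('Annotations/AL_00001.xml')

# Iterating over the dict yields the names of the child elements.
for item in r.annotation:
    print item
print '------------'

# Each <object> entry exposes its children with "." access.
for item in r.annotation.object:
    print item.name, item.bndbox.xmin, item.bndbox.xmax, item.bndbox.ymin, item.bndbox.ymax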

Output:

object
folder
size
value
filename
------------
l_faster 270 352 376 503
l_faster 262 355 746 871
r_faster 412 494 376 486
r_faster 411 493 748 862
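One caveat when looping over `r.annotation.object`: many XML-to-dict converters return a single object rather than a list when a tag occurs only once, so an annotation with exactly one <object> may not iterate the way the example does. Whether your copy of xml2dict.py behaves this way is an assumption worth checking. The hypothetical helper `as_list` below normalizes both cases. Plain dicts stand in for the parsed result so the sketch runs (in Python 3) without the library.

```python
def as_list(value):
    """Return value unchanged if it is a list, otherwise wrap it in a list.

    None (a missing tag) becomes an empty list.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Plain dicts standing in for parsed annotations (illustrative data).
one_object = {"object": {"name": "l_faster"}}
two_objects = {"object": [{"name": "l_faster"}, {"name": "r_faster"}]}
no_objects = {}

for anno in (one_object, two_objects, no_objects):
    names = [obj["name"] for obj in as_list(anno.get("object"))]
    print(names)
```

With the real library, the loop would become `for item in as_list(r.annotation.object):`, provided the attribute exists.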

Full code [xml2json]

# -*- coding: utf-8 -*-
# Python 2 syntax (print statement).
from xml2dict import XML2Dict
import json
import glob


def serializeToFile(fname, annotations):
    """
    Overwritten to write JSON files.
    """
    # The with-block makes sure the file is flushed and closed.
    with open(fname, "w") as f:
        json.dump(annotations, f, indent=4, separators=(',', ': '), sort_keys=True)
        f.write("\n")


def getAnnos(file_name="", prefix=''):
    xml = XML2Dict()
    root = xml.parse(file_name)
    # get a dict object
    anno = root.annotation
    image_name = anno.filename
    item = {'filename': prefix + image_name, 'class': 'image', 'annotations': []}
    for obj in anno.object:
        # Map the VOC label to the target class name.
        cls = {'l_faster': 'C1', 'r_faster': 'C2'}[obj.name]
        box = obj.bndbox
        # Convert corner coordinates to (x, y, width, height).
        x, y, width, height = int(box.xmin), int(box.ymin), int(box.xmax) - int(box.xmin), int(box.ymax) - int(box.ymin)
        item['annotations'] += [{
            "class": cls,
            "height": height,
            "width": width,
            "x": x,
            "y": y
        }]
    return item


if __name__ == '__main__':
    annotations = []
    anno_name = 'AR_001-550.json'
    files = glob.glob('Annotations/AR_*.xml')
    files = sorted(files)
    for filename in files:
        item = getAnnos(filename, prefix='TFS/JPEGImages/')
        print item
        print '-----------------'
        annotations += [item]  # e.g. "xmls/AL_00001.xml"
    serializeToFile(anno_name, annotations)
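For readers without xml2dict.py at hand, the same conversion can be sketched with the standard library's `xml.etree.ElementTree` in Python 3. This is a swapped-in technique, not the article's method. The function name `annotation_to_item`, the `CLASS_MAP` constant and the shortened sample XML are assumptions made for illustration, while the label mapping and output layout follow the code above.

```python
import json
import xml.etree.ElementTree as ET

# Label mapping taken from the article's code.
CLASS_MAP = {"l_faster": "C1", "r_faster": "C2"}

# Shortened version of the VOC2007 example (illustrative).
SAMPLE_XML = """
<annotation>
    <filename>AL_00001.JPG</filename>
    <object>
        <name>l_faster</name>
        <bndbox><xmin>270</xmin><ymin>376</ymin><xmax>352</xmax><ymax>503</ymax></bndbox>
    </object>
    <object>
        <name>r_faster</name>
        <bndbox><xmin>412</xmin><ymin>376</ymin><xmax>494</xmax><ymax>486</ymax></bndbox>
    </object>
</annotation>
"""


def annotation_to_item(xml_text, prefix=""):
    """Convert one VOC-style annotation string into the JSON item layout."""
    root = ET.fromstring(xml_text.strip())
    item = {
        "filename": prefix + root.findtext("filename"),
        "class": "image",
        "annotations": [],
    }
    # findall always returns a list, even for zero or one <object>.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin = int(box.findtext("xmin")), int(box.findtext("ymin"))
        xmax, ymax = int(box.findtext("xmax")), int(box.findtext("ymax"))
        item["annotations"].append({
            "class": CLASS_MAP[obj.findtext("name")],
            "x": xmin,
            "y": ymin,
            "width": xmax - xmin,
            "height": ymax - ymin,
        })
    return item


item = annotation_to_item(SAMPLE_XML, prefix="TFS/JPEGImages/")
print(json.dumps(item, indent=4, sort_keys=True))
```

For the first box, the corners (270, 376) and (352, 503) become x=270, y=376, width=82, height=127.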
