YOLO v2 Face Detection by zhangzexuan


I. Data Preparation

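YOLO itself is set up to train on the VOC dataset. To train YOLO v2 on a different dataset, you can therefore start from the VOC dataset and adapt it, or in other words build your own dataset following the directory structure and file formats of VOC.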

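The dataset I use here is CelebA (Large-scale CelebFaces Attributes Dataset), a large-scale annotated dataset of celebrity faces. I use the images in .jpg format. The images in CelebA are named uniformly, from 000001.jpg to 202599.jpg, and the bounding-box information is stored in list_bbox_celeba.txt in the following format: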

--------------------------------------------------list_bbox_celeba.txt-----------------------------------------------------

Line 1: total number of images

Line 2: column header

All remaining lines: < xxxxxx.jpg > < [x1] [y1] [width] [height] >

--------------------------------------------------list_bbox_celeba.txt-----------------------------------------------------

Here x1 and y1 are the coordinates of the top-left corner of the bounding box, and width and height are the width and height of the box, all in pixels.
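
The format above can be read with a few lines of Python. This is a minimal sketch, not part of the conversion script used later; parse_bbox_list is a hypothetical helper name and the two sample rows are illustrative values only.

```python
def parse_bbox_list(text):
    """Parse the contents of a list_bbox_celeba.txt-style file.

    Returns a dict mapping image file name -> (x1, y1, width, height).
    Hypothetical helper for illustration.
    """
    lines = text.strip().splitlines()
    count = int(lines[0])              # line 1: total number of images
    boxes = {}
    for line in lines[2:]:             # line 2 is the column header, skip it
        parts = line.split()
        if len(parts) != 5:
            continue                   # ignore blank or malformed lines
        x1, y1, w, h = (int(v) for v in parts[1:])
        boxes[parts[0]] = (x1, y1, w, h)
    if len(boxes) != count:
        raise ValueError("expected %d entries, found %d" % (count, len(boxes)))
    return boxes


# Illustrative sample in the same layout as the real file.
sample = """2
image_id x_1 y_1 width height
000001.jpg    95  71 226 313
000002.jpg    72  94 221 306
"""
boxes = parse_bbox_list(sample)
print(boxes["000001.jpg"])
```

Checking the entry count against the first line is a cheap way to notice a truncated download.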

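In YOLO, every image needs a corresponding label file. The label file is a .txt file whose name matches the image's name apart from the extension, and each line in it has the form < class id > < [x] [y] [width] [height] >. The class id is an integer starting from 0; it is the index into the .names file that identifies the class of the object in the bounding box. Here x = x-coordinate of the box center / image width, y = y-coordinate of the box center / image height, width = box width / image width, and height = box height / image height.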
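
As a worked example of these formulas, the sketch below turns one CelebA box (top-left corner plus size, in pixels) into a YOLO label line. celeba_to_yolo is a hypothetical helper name, and the box and image size are made-up numbers chosen to keep the arithmetic easy to follow.

```python
def celeba_to_yolo(bbox, img_width, img_height, class_id=0):
    """Convert (x1, y1, width, height) in pixels to a YOLO label line.

    Hypothetical helper for illustration. All four output values are
    fractions of the image size, and x/y refer to the box center.
    """
    x1, y1, w, h = bbox
    x_center = (x1 + w / 2.0) / img_width
    y_center = (y1 + h / 2.0) / img_height
    w_rel = w / float(img_width)
    h_rel = h / float(img_height)
    return "%d %.6f %.6f %.6f %.6f" % (class_id, x_center, y_center, w_rel, h_rel)


# A 200x300 box whose top-left corner is at (100, 50) in a 400x600 image.
line = celeba_to_yolo((100, 50, 200, 300), img_width=400, img_height=600)
print(line)  # 0 0.500000 0.333333 0.500000 0.500000
```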

II. Data Processing

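Training YOLOv2 requires:

1. A train.txt file listing the absolute paths of the training images, and a val.txt file listing the absolute paths of the validation images.

2. A label text file for every image; in the VOC dataset these live in the VOC2007/labels folder.

3. The training data configuration file, voc.data.

4. The network configuration file; here I use tiny-yolo.cfg.

5. The class-name list file, voc.names.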

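Items 1 and 2 above have to be generated by ourselves. The YOLO author provides a Python program (darknet/scripts/voc_label.py) that generates all the label files and the image-path list files, but only if the data is laid out in the VOC format. I therefore use the Python program from http://blog.csdn.net/minstyrain/article/details/77888176?locationNum=4&fps=1 (which I named celebA2YOLO.py) to convert the CelebA dataset into the VOC format. The output paths in the code need to be changed to suit your own setup. My modified code is as follows: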

import cv2, h5py, os
import numpy as np
from xml.dom.minidom import Document
import progressbar

# All relative paths start from CelebA/Img/, where this script is placed.
rootdir = "../"
imgdir = rootdir + "Img/img_celeba"

landmarkpath = rootdir + "Anno/list_landmarks_celeba.txt"
bboxpath = rootdir + "Anno/list_bbox_celeba.txt"
vocannotationdir = rootdir + "VOCdevkit/VOC2007/" + "Annotations"
labelsdir = rootdir + "VOCdevkit/VOC2007/" + "labels"

convet2yoloformat = True
convert2vocformat = True

resized_dim = (48, 48)

# Absolute prefix written in front of every image name in train.txt, val.txt, etc.
datasetprefix = "/home/scw4750/zhangzexuan/CelebA/VOCdevkit/VOC2007/JPEGImages/"

progress = progressbar.ProgressBar(widgets=[
    progressbar.Percentage(),
    ' (', progressbar.SimpleProgress(), ') ',
    ' (', progressbar.Timer(), ') ',
    ' (', progressbar.ETA(), ') ',
])


def drawbboxandlandmarks(img, bbox, landmark):
    # Draw the face box in green and each landmark point in red (BGR order).
    cv2.rectangle(img, (bbox[0], bbox[1]), (bbox[0]+bbox[2], bbox[1]+bbox[3]), (0, 255, 0))
    for i in range(int(len(landmark)/2)):
        cv2.circle(img, (int(landmark[2*i]), int(landmark[2*i+1])), 2, (0, 0, 255))


def loadgt():
    # Load image names, bounding boxes and landmarks. The first two lines of
    # each annotation file (image count and column header) are skipped.
    imgpaths = []
    landmarks = []
    bboxes = []
    with open(landmarkpath) as landmarkfile:
        lines = landmarkfile.readlines()
        lines = lines[2:]
        for line in lines:
            landmarkline = line.split()
            imgpath = landmarkline[0]
            imgpaths.append(imgpath)
            landmarkline = landmarkline[1:]
            landmark = [int(pt) for pt in landmarkline]
            landmarks.append(landmark)
    with open(bboxpath) as bboxfile:
        lines = bboxfile.readlines()
        lines = lines[2:]
        for line in lines:
            bboxline = line.split()
            imgpath = bboxline[0]
            bboxline = bboxline[1:]
            bbox = [int(bb) for bb in bboxline]
            bboxes.append(bbox)
    return imgpaths, bboxes, landmarks


def generate_hdf5():
    # Optional: crop every face, resize it and store faces and labels in train.h5.
    imgpaths, bboxes, landmarks = loadgt()
    numofimg = len(imgpaths)
    faces = []
    labels = []
    for i in range(numofimg):
        imgpath = imgdir + "/" + imgpaths[i]
        print(i)
        bbox = bboxes[i]
        landmark = landmarks[i]
        img = cv2.imread(imgpath)
        if bbox[2] <= 0 or bbox[3] <= 0:
            continue
        face = img[bbox[1]:bbox[1]+bbox[3], bbox[0]:bbox[0]+bbox[2]]
        face = cv2.resize(face, resized_dim)
        faces.append(face)
        # Label layout: [1, x1, y1, w, h, landmarks relative to the box...]
        label = []
        label.append(1)
        for j in range(len(bbox)):
            label.append(bbox[j])
        for j in range(len(landmark)):
            lm = landmark[j]
            if j % 2 == 0:
                lm = (lm-bbox[0])*1.0/(bbox[2])
            else:
                lm = (lm-bbox[1])*1.0/(bbox[3])
            label.append(lm)
        labels.append(label)
    faces = np.asarray(faces)
    labels = np.asarray(labels)
    f = h5py.File('train.h5', 'w')
    f['data'] = faces.astype(np.float32)
    f['labels'] = labels.astype(np.float32)
    f.close()


def viewginhdf5():
    # Optional: display the faces stored in train.h5 with their landmarks.
    f = h5py.File('train.h5', 'r')
    f.keys()
    faces = f['data'][:]
    labels = f['labels'][:]
    for i in range(len(faces)):
        print(i)
        face = faces[i].astype(np.uint8)
        label = labels[i]
        bbox = label[1:5]  # four box values; the original slice [1:4] dropped the height
        landmark = label[5:]
        for j in range(int(len(landmark)/2)):
            cv2.circle(face, (int(landmark[2*j]*resized_dim[0]), int(landmark[2*j+1]*resized_dim[1])), 1, (0, 0, 255))
        cv2.imshow("img", face)
        cv2.waitKey()
    f.close()


def showgt():
    # Walk through both annotation files in parallel and write, per image,
    # a YOLO label file and a VOC-style XML annotation.
    landmarkfile = open(landmarkpath)
    bboxfile = open(bboxpath)
    numofimgs = int(landmarkfile.readline())
    _ = landmarkfile.readline()
    _ = bboxfile.readline()
    _ = bboxfile.readline()
    index = 0
    pbar = progress.start()
    if convet2yoloformat:
        if not os.path.exists(labelsdir):
            os.mkdir(labelsdir)
    if convert2vocformat:
        if not os.path.exists(vocannotationdir):
            os.mkdir(vocannotationdir)
    for i in pbar(range(numofimgs)):
        landmarkline = landmarkfile.readline().split()
        filename = landmarkline[0]
        imgpath = imgdir + "/" + filename
        img = cv2.imread(imgpath)
        landmarkline = landmarkline[1:]
        landmark = [int(pt) for pt in landmarkline]
        bboxline = bboxfile.readline().split()
        imgpath2 = imgdir + "/" + bboxline[0]
        bboxline = bboxline[1:]
        bbox = [int(bb) for bb in bboxline]
        drawbboxandlandmarks(img, bbox, landmark)
        if convet2yoloformat:
            # YOLO label: class id, box center and box size as fractions of the image size.
            height = img.shape[0]
            width = img.shape[1]
            txtpath = labelsdir + "/" + filename
            txtpath = txtpath[:-3] + "txt"
            ftxt = open(txtpath, 'w')
            xcenter = (bbox[0]+bbox[2]*0.5)/width
            ycenter = (bbox[1]+bbox[3]*0.5)/height
            wr = bbox[2]*1.0/width
            hr = bbox[3]*1.0/height
            line = "0 "+str(xcenter)+" "+str(ycenter)+" "+str(wr)+" "+str(hr)+"\n"
            ftxt.write(line)
            ftxt.close()
        if convert2vocformat:
            # VOC annotation: one XML file per image with a single 'face' object.
            xmlpath = vocannotationdir + "/" + filename
            xmlpath = xmlpath[:-3] + "xml"
            doc = Document()
            annotation = doc.createElement('annotation')
            doc.appendChild(annotation)
            folder = doc.createElement('folder')
            folder_name = doc.createTextNode('CelebA')
            folder.appendChild(folder_name)
            annotation.appendChild(folder)
            filenamenode = doc.createElement('filename')
            filename_name = doc.createTextNode(filename)
            filenamenode.appendChild(filename_name)
            annotation.appendChild(filenamenode)
            source = doc.createElement('source')
            annotation.appendChild(source)
            database = doc.createElement('database')
            database.appendChild(doc.createTextNode('CelebA Database'))
            source.appendChild(database)
            annotation_s = doc.createElement('annotation')
            annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
            source.appendChild(annotation_s)
            image = doc.createElement('image')
            image.appendChild(doc.createTextNode('flickr'))
            source.appendChild(image)
            flickrid = doc.createElement('flickrid')
            flickrid.appendChild(doc.createTextNode('-1'))
            source.appendChild(flickrid)
            owner = doc.createElement('owner')
            annotation.appendChild(owner)
            flickrid_o = doc.createElement('flickrid')
            flickrid_o.appendChild(doc.createTextNode('tdr'))
            owner.appendChild(flickrid_o)
            name_o = doc.createElement('name')
            name_o.appendChild(doc.createTextNode('yanyu'))
            owner.appendChild(name_o)
            size = doc.createElement('size')
            annotation.appendChild(size)
            width = doc.createElement('width')
            width.appendChild(doc.createTextNode(str(img.shape[1])))
            height = doc.createElement('height')
            height.appendChild(doc.createTextNode(str(img.shape[0])))
            depth = doc.createElement('depth')
            depth.appendChild(doc.createTextNode(str(img.shape[2])))
            size.appendChild(width)
            size.appendChild(height)
            size.appendChild(depth)
            segmented = doc.createElement('segmented')
            segmented.appendChild(doc.createTextNode('0'))
            annotation.appendChild(segmented)
            for k in range(1):
                objects = doc.createElement('object')
                annotation.appendChild(objects)
                object_name = doc.createElement('name')
                object_name.appendChild(doc.createTextNode('face'))
                objects.appendChild(object_name)
                pose = doc.createElement('pose')
                pose.appendChild(doc.createTextNode('Unspecified'))
                objects.appendChild(pose)
                truncated = doc.createElement('truncated')
                truncated.appendChild(doc.createTextNode('1'))
                objects.appendChild(truncated)
                difficult = doc.createElement('difficult')
                difficult.appendChild(doc.createTextNode('0'))
                objects.appendChild(difficult)
                bndbox = doc.createElement('bndbox')
                objects.appendChild(bndbox)
                xmin = doc.createElement('xmin')
                xmin.appendChild(doc.createTextNode(str(bbox[0])))
                bndbox.appendChild(xmin)
                ymin = doc.createElement('ymin')
                ymin.appendChild(doc.createTextNode(str(bbox[1])))
                bndbox.appendChild(ymin)
                xmax = doc.createElement('xmax')
                xmax.appendChild(doc.createTextNode(str(bbox[0]+bbox[2])))
                bndbox.appendChild(xmax)
                ymax = doc.createElement('ymax')
                ymax.appendChild(doc.createTextNode(str(bbox[1]+bbox[3])))
                bndbox.appendChild(ymax)
            f = open(xmlpath, "w")
            f.write(doc.toprettyxml(indent=''))
            f.close()
        cv2.imshow("img", img)
        cv2.waitKey(1)
        index = index + 1
    pbar.finish()
    landmarkfile.close()
    bboxfile.close()


def generatetxt(trainratio=0.7, valratio=0.2, testratio=0.1):
    # Write train.txt, val.txt, trainval.txt and test.txt with absolute image
    # paths directly under VOC2007/; these are the lists darknet reads.
    files = os.listdir(labelsdir)
    ftrain = open(rootdir+"VOCdevkit/VOC2007/"+"train.txt", "w")
    fval = open(rootdir+"VOCdevkit/VOC2007/"+"val.txt", "w")
    ftrainval = open(rootdir+"VOCdevkit/VOC2007/"+"trainval.txt", "w")
    ftest = open(rootdir+"VOCdevkit/VOC2007/"+"test.txt", "w")
    for i in range(len(files)):
        filename = files[i]
        filename = datasetprefix + filename[:-3] + "jpg" + "\n"
        if i < trainratio*len(files):
            ftrain.write(filename)
            ftrainval.write(filename)
        elif i < (trainratio+valratio)*len(files):
            fval.write(filename)
            ftrainval.write(filename)
        elif i < (trainratio+valratio+testratio)*len(files):
            ftest.write(filename)
    ftrain.close()
    fval.close()
    ftrainval.close()
    ftest.close()


def generatevocsets(trainratio=0.7, valratio=0.2, testratio=0.1):
    # Write the VOC-style ImageSets/Main lists (image names without extension).
    if not os.path.exists(rootdir+"VOCdevkit/VOC2007/ImageSets"):
        os.mkdir(rootdir+"VOCdevkit/VOC2007/ImageSets")
    # The original checked rootdir+"/ImageSets/Main", which is not the directory
    # created on the next line, so a second run failed in os.mkdir.
    if not os.path.exists(rootdir+"VOCdevkit/VOC2007/ImageSets/Main"):
        os.mkdir(rootdir+"VOCdevkit/VOC2007/ImageSets/Main")
    ftrain = open(rootdir+"VOCdevkit/VOC2007/ImageSets/Main/train.txt", 'w')
    fval = open(rootdir+"VOCdevkit/VOC2007/ImageSets/Main/val.txt", 'w')
    ftrainval = open(rootdir+"VOCdevkit/VOC2007/ImageSets/Main/trainval.txt", 'w')
    ftest = open(rootdir+"VOCdevkit/VOC2007/ImageSets/Main/test.txt", 'w')
    files = os.listdir(labelsdir)
    for i in range(len(files)):
        imgfilename = files[i][:-4]
        ftrainval.write(imgfilename+"\n")
        if i < int(len(files)*trainratio):
            ftrain.write(imgfilename+"\n")
        elif i < int(len(files)*(trainratio+valratio)):
            fval.write(imgfilename+"\n")
        else:
            ftest.write(imgfilename+"\n")
    ftrain.close()
    fval.close()
    ftrainval.close()
    ftest.close()


if __name__ == "__main__":
    showgt()
    generatevocsets()
    generatetxt()
    #generate_hdf5()
    #viewginhdf5()
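
One thing to be aware of: generatetxt and generatevocsets split the files in whatever order os.listdir returns them, and that order is not guaranteed to be the same from run to run or machine to machine. If a reproducible split matters, something along the following lines could be used instead. This is only a sketch; split_names is a hypothetical helper that is not part of celebA2YOLO.py, and the seeded shuffle is my own assumption about what a sensible split looks like.

```python
import random


def split_names(names, train_ratio=0.7, val_ratio=0.2, seed=0):
    """Split file names into (train, val, test) lists reproducibly.

    Hypothetical helper, not part of celebA2YOLO.py. Whatever is left
    after the train and val shares goes to test.
    """
    ordered = sorted(names)                 # fix the starting order
    random.Random(seed).shuffle(ordered)    # seeded shuffle: same result every run
    n_train = int(round(len(ordered) * train_ratio))
    n_val = int(round(len(ordered) * val_ratio))
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


names = ["%06d.txt" % i for i in range(1, 11)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))
```

Rounding each share separately avoids a floating-point surprise: in Python, (0.7 + 0.2) * 10 evaluates to slightly less than 9, so a cumulative threshold can shift one file between sets.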

celebA2YOLO.py expects the CelebA data and celebA2YOLO.py itself to be arranged in the following directory structure:

----------------------------------------------CelebA directory structure----------------------------------------------

--CelebA
    --Anno
        list_bbox_celeba.txt
        list_landmarks_celeba.txt
    --Img
        celebA2YOLO.py
        --img_celeba
            {all of celebA pics}
    --VOCdevkit
        --VOC2007

----------------------------------------------CelebA directory structure----------------------------------------------

Note that all of the directories and files above must exist.
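
Since the script fails partway through if something is missing, it can help to check the layout first. A minimal sketch, assuming the layout listed above; REQUIRED and missing_paths are hypothetical names, and the demo runs against a temporary directory rather than the real dataset.

```python
import os
import tempfile

# Paths relative to the CelebA/ root that must exist before running celebA2YOLO.py.
REQUIRED = [
    "Anno/list_bbox_celeba.txt",
    "Anno/list_landmarks_celeba.txt",
    "Img/celebA2YOLO.py",
    "Img/img_celeba",
    "VOCdevkit/VOC2007",
]


def missing_paths(root, required=REQUIRED):
    """Return the entries of `required` that do not exist under `root`."""
    return [p for p in required
            if not os.path.exists(os.path.join(root, *p.split("/")))]


# Demo: a throwaway root that only contains VOCdevkit/VOC2007.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "VOCdevkit", "VOC2007"))
missing = missing_paths(root)
print(missing)
```

On the real dataset, call missing_paths with the path of your CelebA directory; an empty list means the layout is complete.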

After running celebA2YOLO.py, the directory structure looks like this:

----------------------------------------------CelebA directory structure----------------------------------------------

--CelebA
    --Anno
        list_bbox_celeba.txt
        list_landmarks_celeba.txt
    --Img
        celebA2YOLO.py
        --img_celeba
            {all of celebA pics}
    --VOCdevkit
        --VOC2007
            --Annotations
                {all of XMLs}
            --ImageSets
                --Main
                    train.txt
                    val.txt
                    trainval.txt
                    test.txt
            train.txt    // these four files have the same names as the four files in ImageSets/Main/; from here on, train.txt, val.txt, trainval.txt and test.txt always refer to these four files, not to the ones in ImageSets/Main/
            val.txt
            trainval.txt
            test.txt

----------------------------------------------CelebA directory structure----------------------------------------------

Next, I create a JPEGImages directory under CelebA/VOCdevkit/VOC2007/, move all images from CelebA/Img/img_celeba/ into CelebA/VOCdevkit/VOC2007/JPEGImages/, and copy the voc_label.py provided by the YOLO author into CelebA/.
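
With about 200,000 images, moving them by script is more convenient than doing it by hand. A minimal sketch using only the standard library; move_images is a hypothetical helper, and the demo works on a temporary directory with two empty placeholder files instead of the real images.

```python
import os
import shutil
import tempfile


def move_images(src_dir, dst_dir, ext=".jpg"):
    """Move every file ending in `ext` from src_dir to dst_dir; return the count."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    moved = 0
    for name in sorted(os.listdir(src_dir)):
        if name.lower().endswith(ext):
            shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
            moved += 1
    return moved


# Demo on a throwaway directory that mimics the CelebA layout.
root = tempfile.mkdtemp()
src = os.path.join(root, "Img", "img_celeba")
dst = os.path.join(root, "VOCdevkit", "VOC2007", "JPEGImages")
os.makedirs(src)
for name in ("000001.jpg", "000002.jpg"):
    open(os.path.join(src, name), "w").close()
moved = move_images(src, dst)
print(moved, sorted(os.listdir(dst)))
```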

Then run voc_label.py. The resulting directory structure is:

----------------------------------------------CelebA directory structure----------------------------------------------

--CelebA
    voc_label.py
    2007_train.txt    // we do not use these two files; we use the train.txt, test.txt and val.txt that celebA2YOLO.py generated under VOC2007
    train.all.txt
    --Anno
        list_bbox_celeba.txt
        list_landmarks_celeba.txt
    --Img
        celebA2YOLO.py
        --img_celeba
    --VOCdevkit
        --VOC2007
            --JPEGImages
                {all of celeba pics}
            --Annotations
                {all of XMLs}
            --ImageSets
                --Main
                    train.txt
                    val.txt
                    trainval.txt
                    test.txt
            --labels
                {all of label.txt files for yolov2}
            train.txt
            val.txt
            trainval.txt
            test.txt

----------------------------------------------CelebA directory structure----------------------------------------------

This gives us the files required by items 1 and 2, and the data preparation and layout are done.
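
The reason labels/ has to sit next to JPEGImages/ is that darknet is not told where the labels are. To my understanding, its data loader derives each label path from the image path by replacing JPEGImages with labels and the image extension with .txt; treat that rule as an assumption and check it against your darknet version. Under that assumption, the sketch below lists label files that are missing. expected_label_path and missing_labels are hypothetical helpers, and the exists parameter is there only so the check can be tried without real files.

```python
import os


def expected_label_path(image_path):
    """Map .../JPEGImages/000001.jpg to .../labels/000001.txt (assumed darknet rule)."""
    base, _ext = os.path.splitext(image_path)
    return base.replace("JPEGImages", "labels") + ".txt"


def missing_labels(image_paths, exists=os.path.exists):
    """Return the expected label paths that do not exist."""
    expected = [expected_label_path(p) for p in image_paths]
    return [p for p in expected if not exists(p)]


images = [
    "/data/VOC2007/JPEGImages/000001.jpg",
    "/data/VOC2007/JPEGImages/000002.jpg",
]
present = {"/data/VOC2007/labels/000001.txt"}   # pretend only this label exists
print(missing_labels(images, exists=present.__contains__))
```

For a real check, read the lines of train.txt and val.txt into image_paths and leave exists at its default.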

III. Modifying the YOLOv2 Files

Next, consider item 3.

Start with voc.data, which is located in darknet/cfg/. The original voc.data reads:

-----------------------------------------------voc.data contents---------------------------------------------------

classes = 20
train = /home/pjreddie/data/voc/train.txt
valid = /home/pjreddie/data/voc/2007_test.txt
names = data/voc.names
backup = backup

-----------------------------------------------voc.data contents---------------------------------------------------

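Here train and valid stand for the training images and the validation images; the paths after them point to train.txt and val.txt respectively. names points to the class-name list, i.e. voc.names. backup is the directory where weight files are saved during training.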
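
The file is a plain list of key = value lines, so it is easy to read back and check before starting a long training run. A minimal sketch; parse_data_cfg is a hypothetical helper that only approximates how darknet reads the file, and treating lines that start with # or ; as comments is an assumption.

```python
def parse_data_cfg(text):
    """Parse 'key = value' lines into a dict of strings (hypothetical helper)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                    # skip blank lines and assumed comment lines
        if "=" not in line:
            continue                    # skip anything that is not key = value
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


original = """classes = 20
train = /home/pjreddie/data/voc/train.txt
valid = /home/pjreddie/data/voc/2007_test.txt
names = data/voc.names
backup = backup
"""
opts = parse_data_cfg(original)
print(opts["classes"], opts["train"])
```

Keys are kept exactly as written, so a misspelled or differently capitalized key shows up as a missing entry when you look it up.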

I change voc.data as follows, and at the same time create a backup directory under zhangzexuan/:

-----------------------------------------------voc.data contents---------------------------------------------------

classes = 1
train = /home/scw4750/zhangzexuan/CelebA/train.txt
valid = /home/scw4750/zhangzexuan/CelebA/val.txt
names = data/voc.names
backup = /home/scw4750/zhangzexuan/backup

-----------------------------------------------voc.data contents---------------------------------------------------

The train and valid entries point to the train.txt and val.txt generated by celebA2YOLO.py, which I use for training and validation.

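Next, modify the network configuration file tiny-yolo.cfg, which is located in darknet/cfg/, as follows:

1. Lines 3 and 4: delete the "#" and the space in front of "batch" and "subdivisions".

2. Lines 6 and 7: add "#" at the start of the line.

3. Line 4: change to "subdivisions=8".

4. Line 125: change to "classes=1".

5. Line 119: change to "filters=30". This value is not arbitrary. The network predicts 5 bounding boxes per location, and each box has 4 coordinates (tx, ty, tw, th), 1 confidence score and one probability per class. VOC has 20 classes; with a single class this gives 5*(4+1+1)=30.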
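
The filter count in item 5 can be recomputed for any number of classes with the same formula. A minimal sketch; region_filters is a hypothetical helper name.

```python
def region_filters(num_boxes, num_classes, coords=4):
    """Filters for the last convolutional layer: boxes * (coords + confidence + classes)."""
    return num_boxes * (coords + 1 + num_classes)


print(region_filters(5, 1))    # 30, the single-class face detector
print(region_filters(5, 20))   # 125, the 20-class VOC configuration
```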

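Finally, modify the class-name list file darknet/data/voc.names. This file simply stores the class names in order. Clear it and write "face" on the first line (because we use 0 to stand for face in the label files xxxxxx.txt).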

IV. Training

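Before reading this article, you should already have gone through the tutorial on the YOLO author's website and cloned darknet onto your system.

With the previous steps done, I can now start training. Note that the darknet training command has shorthand forms that omit some arguments. Since I have not modified the darknet source code, I have to use the full darknet command when training.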

That is: ./darknet detector train cfg/voc.data cfg/tiny-yolo.cfg -gpus 0,1,2,3

Testing: ./darknet detector test cfg/voc.data cfg/tiny-yolo.cfg [weights file] [image]
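
If training is launched from a script, the command can be assembled as an argument list. A sketch, assuming the darknet binary sits in the current directory; train_command is a hypothetical helper that only builds the list and does not run anything, and the optional weights argument (a pretrained weights file placed after the cfg file) is an assumption about your setup.

```python
def train_command(data_cfg, net_cfg, gpus=(), weights=None, binary="./darknet"):
    """Build the argument list for 'darknet detector train' (hypothetical helper)."""
    cmd = [binary, "detector", "train", data_cfg, net_cfg]
    if weights:
        cmd.append(weights)                       # optional starting weights
    if gpus:
        cmd += ["-gpus", ",".join(str(g) for g in gpus)]
    return cmd


cmd = train_command("cfg/voc.data", "cfg/tiny-yolo.cfg", gpus=[0, 1, 2, 3])
print(" ".join(cmd))
```

The list can be passed to subprocess.call(cmd) from the darknet directory. Note that -gpus and the device list are two separate arguments.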

That's it. Enjoy the training~
