Drawing bounding-boxes, pitfalls and all


First, the task: given the nyu_depth_v2_labeled.mat data, read the .mat file in Python and draw a bounding-box around every object in each of its images.
The labeled .mat file is stored in MATLAB's v7.3 (HDF5) format, so it is opened with h5py via h5_file = h5py.File("nyu_depth_v2_labeled.mat"), which exposes the mat data as arrays.

import h5py
import scipy.io

h5_file = h5py.File('nyu_depth_v2_labeled.mat')   # v7.3 .mat files are HDF5 containers
splits = scipy.io.loadmat('splits.mat')           # splits.mat is an older-format .mat, loaded via scipy

# Traverse all items in the mat file and print each item's value
for name, value in h5_file.items():
    print "Name ", name
    print "Value", value

variables = h5_file.items()
for var in variables:
    name = var[0]
    data = var[1]
    print "Name ", name
    if type(data) is h5py.Dataset:
        # If it is a Dataset, pull the associated data;
        # if not, you may need to access the element's sub-items
        value = data.value
        print "Value", value   # NumPy array / value

The code above lists every field stored in the .mat file:

N = 1449 is the number of images.

accelData: 4×N = 4×1449 accelerometer values indicating when each frame was taken; they contain the roll, yaw, pitch and tilt angle of the device.
depths: in-painted depth maps, H×W×N = 480×640×1449, where H and W are the height and width.
images: RGB images, H×W×3×N = 480×640×3×1449.
instances: instance maps, H×W×N = 480×640×1449.
labels: label maps, H×W×N = 480×640×1449; values range from 1..C, where C is the total number of classes. If a pixel's label value is 0, that pixel is 'unlabeled'.
names: C×1 cell array of the English names of each class, with C = 894.
namesToIds: map from English label names to class IDs (with C key-value pairs).
rawDepthFilenames: N×1 cell array of the filenames (in the Raw dataset) used for each of the depth images in the labeled dataset.
rawDepths: raw depth maps, H×W×N = 480×640×1449. These capture the depth images after they have been projected onto the RGB image plane but before the missing depth values have been filled in; additionally, the depth non-linearity from the Kinect device has been removed, and the values of each depth image are in meters.
rawRgbFilenames: N×1 cell array of the filenames (in the Raw dataset) used for each of the RGB images in the labeled dataset.
sceneTypes: N×1 cell array of the scene type from which each image was taken.
scenes: N×1 cell array of the name of the scene from which each image was taken.
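One caveat before indexing these fields: MATLAB stores arrays column-major, and h5py exposes them with the dimension order reversed, so the shapes seen in Python are the mirror image of the H×W×N layout above. A quick sanity check (a sketch, reusing the h5_file handle opened earlier):

    print h5_file['images'].shape   # (1449, 3, 640, 480) = N x 3 x W x H
    print h5_file['labels'].shape   # (1449, 640, 480)    = N x W x H
    print h5_file['depths'].shape   # (1449, 640, 480)    = N x W x H

This is why the per-image slices are transposed with .T before being handed to the drawing code below.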

labels stores the class label of every object pixel in each image; with it, each object class can be boxed out with a bounding-box:

    labels = h5_file['labels']   # per-image slices are 640*480 (W x H)
    images = h5_file['images']   # per-image slices are 3*640*480
    scenes = [u''.join(unichr(c) for c in h5_file[obj_ref]) for obj_ref in
              h5_file['sceneTypes'][0]]
    print("processing images")
    for i, image in enumerate(images):
        print("image", i + 1, "/", len(images))
        draw_box(i, scenes[i], image.T, labels[i, :, :].T)

We fetch the images, labels and scenes data from the .mat file: the boxes are drawn on the pictures in images; labels gives the class of every object pixel in each image; and scenes lets the boxed images be saved grouped by scene type.
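The scenes comprehension above deserves a word: in a v7.3 .mat file, a cell array of strings is stored as HDF5 object references, each pointing to a dataset of uint16 character codes. A standalone sketch of the same decoding (the helper name read_mat_string is mine, not part of the dataset's API):

    def read_mat_string(h5_file, obj_ref):
        # Dereference the HDF5 object reference, then join the
        # uint16 character codes into a unicode string.
        chars = h5_file[obj_ref]   # dataset of shape (len, 1)
        return u''.join(unichr(int(c)) for c in chars[:, 0])

    # usage: scenes = [read_mat_string(h5_file, r) for r in h5_file['sceneTypes'][0]]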
Next comes the key code for drawing the bounding-boxes.
First, collect every class label that appears in the image into a list:

    def draw_box(i, scene, image, label):
        L = []
        shape = list(label.shape) + [3]
        for j in xrange(shape[0]):
            for k in xrange(shape[1]):
                if label[j, k] != 0:
                    L.append(label[j, k])
        L1 = list(set(L))

This reads off the class of every pixel; class 0 is the background and is not counted. Since many pixels share the same class, the whole list is deduplicated at the end with set(), so what remains are the distinct classes present in the image (each image contains a different set of classes).
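As an aside, the two nested loops amount to collecting the distinct values of label; assuming numpy is imported as np, the same list can be obtained in one vectorized call:

    import numpy as np

    classes = np.unique(label)        # sorted distinct label values
    classes = classes[classes != 0]   # drop the unlabeled/background class 0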
Next, traverse the whole image once per class in the list to find the pixel blocks of that class:

    image = image.copy()
    for X in L1:
        # k (the column index) runs over the width  shape[1];
        # j (the row index)    runs over the height shape[0].
        minX = shape[1]
        minY = shape[0]
        maxX = maxY = 0
        for j in xrange(shape[0]):
            for k in xrange(shape[1]):
                if label[j, k] == X:
                    if k < minX: minX = k
                    if k > maxX: maxX = k
                    if j < minY: minY = j
                    if j > maxY: maxY = j
        cv2.rectangle(image, (minX, minY), (maxX, maxY), (0, 255, 0), 2)
    # folder: output directory for this scene, defined elsewhere
    imsave("%s/%05d_bounding_box.png" % (folder, i), image)

Because of a quirk in the OpenCV Python bindings, you cannot draw on image directly: the transposed array handed to draw_box is a non-contiguous view, which cv2.rectangle refuses to write into, so a writable copy is made first with image.copy() (np.ascontiguousarray(image) would work equally well).
Note that for the labels array obtained in Python, shape[0] is the height H and shape[1] is the width W.
So when computing the top-left and bottom-right corners of the bounding-box, compare k against the X coordinates and j against the Y coordinates; correspondingly, minX must be initialised from shape[1] and minY from shape[0].
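The inner double loop can also be vectorized: np.where returns the row and column indices of every pixel of class X, and their minima and maxima are exactly the box corners. A sketch, again assuming numpy is imported as np:

    ys, xs = np.where(label == X)   # row (j) and column (k) indices of class X pixels
    cv2.rectangle(image, (int(xs.min()), int(ys.min())),
                  (int(xs.max()), int(ys.max())), (0, 255, 0), 2)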
