An explanation of roidb.py in the py-faster-rcnn code
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 10:38
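roidb is a fairly complex data structure that stores the RoI (region of interest) information of a dataset. The initial roidb comes from the dataset; get_training_roidb(imdb) in train.py then enlarges it with horizontally flipped copies, and prepare_roidb(imdb) (defined in roidb.py) adds some derived, descriptive fields to each entry.
The structure of roidb is recorded here for now; it may be revised as I read further:
roidb is a list of dictionaries; roidb[img_index] holds the RoI information of the image with that index. Taking roidb[img_index] as the example: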
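The horizontal flip mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's code: the helper name `flip_boxes` is hypothetical, boxes are assumed to be stored as (x1, y1, x2, y2) in 0-based pixel coordinates, and the toy values are made up.

```python
import numpy as np

def flip_boxes(boxes, width):
    """Mirror (x1, y1, x2, y2) boxes horizontally inside an image of the given width."""
    flipped = boxes.copy()
    # The old right edge becomes the new left edge, and vice versa.
    flipped[:, 0] = width - boxes[:, 2] - 1
    flipped[:, 2] = width - boxes[:, 0] - 1
    return flipped

# Toy boxes in a 100-pixel-wide image (hypothetical values).
boxes = np.array([[10, 20, 50, 80],
                  [0, 0, 99, 99]])
flipped = flip_boxes(boxes, width=100)
print(flipped)
```

Flipping twice returns the original boxes, and a box spanning the full width is unchanged, which is a quick sanity check on the index arithmetic.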
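Keys and values in roidb[img_index]:
- boxes: box positions, a box_num × 4 np array
- gt_overlaps: overlap of every box with each class, a box_num × class_num matrix
- gt_classes: true class of every box, length box_num
- flipped: whether the image is flipped
- image: path of the image, a string
- width: width of the image
- height: height of the image
- max_overlaps: for each box, the maximum of its overlaps over all classes, length box_num
- max_classes: for each box, the class with the highest overlap, length box_num
- bbox_targets: for each box, its class and the 4 offsets to the closest gt box (5 columns in total)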
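To make the layout concrete, here is a toy roidb with a single entry. All values are invented for illustration, gt_overlaps is written as a dense array for simplicity, and the two derived fields are computed with plain `max`/`argmax` along the class axis, which is what their descriptions above imply.

```python
import numpy as np

# Rows are boxes, columns are classes (toy setup: background + 2 object classes).
gt_overlaps = np.array([[0.0, 1.0, 0.0],   # ground-truth box of class 1
                        [0.0, 0.0, 1.0],   # ground-truth box of class 2
                        [0.0, 0.7, 0.0],   # proposal overlapping the class-1 box
                        [0.0, 0.0, 0.0]],  # proposal overlapping nothing
                       dtype=np.float32)

entry = {
    'boxes': np.array([[10, 10, 50, 50],
                       [60, 20, 90, 80],
                       [12, 14, 52, 48],
                       [0, 0, 5, 5]], dtype=np.uint16),
    'gt_overlaps': gt_overlaps,
    'gt_classes': np.array([1, 2, 0, 0], dtype=np.int32),
    'flipped': False,
    'image': '/path/to/image.jpg',  # placeholder path
    'width': 100,
    'height': 100,
}

# Derived fields: best overlap per box and the class that achieves it.
entry['max_overlaps'] = gt_overlaps.max(axis=1)
entry['max_classes'] = gt_overlaps.argmax(axis=1)

roidb = [entry]
print(roidb[0]['max_overlaps'], roidb[0]['max_classes'])
```

Ground-truth boxes are the rows whose max_overlaps equals 1, which is the property _compute_targets relies on.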
    # Python 2 code (xrange, print statement), as in py-faster-rcnn.
    import numpy as np
    from fast_rcnn.config import cfg
    from fast_rcnn.bbox_transform import bbox_transform
    from utils.cython_bbox import bbox_overlaps

    def add_bbox_regression_targets(roidb):
        """Add information needed to train bounding-box regressors."""
        assert len(roidb) > 0
        assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?'

        num_images = len(roidb)
        # Infer number of classes from the number of columns in gt_overlaps.
        # roidb[0] holds the RoIs of image 0; each column of gt_overlaps is one class.
        num_classes = roidb[0]['gt_overlaps'].shape[1]
        for im_i in xrange(num_images):
            rois = roidb[im_i]['boxes']
            max_overlaps = roidb[im_i]['max_overlaps']
            max_classes = roidb[im_i]['max_classes']
            # bbox_targets: class of each box plus the 4 offsets to its closest gt box
            roidb[im_i]['bbox_targets'] = \
                    _compute_targets(rois, max_overlaps, max_classes)

        # This option is False in the config used here
        if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
            # Use fixed / precomputed "means" and "stds" instead of empirical values
            means = np.tile(
                    np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes, 1))
            stds = np.tile(
                    np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes, 1))
        else:
            # Compute values needed for means and stds
            # var(x) = E(x^2) - E(x)^2
            # Number of boxes seen per class; cfg.EPS avoids division by zero
            class_counts = np.zeros((num_classes, 1)) + cfg.EPS
            # num_classes (21 here) x 4: per-class sum of the 4 targets
            sums = np.zeros((num_classes, 4))
            # num_classes (21 here) x 4: per-class sum of the squared targets
            squared_sums = np.zeros((num_classes, 4))
            for im_i in xrange(num_images):
                targets = roidb[im_i]['bbox_targets']
                for cls in xrange(1, num_classes):
                    cls_inds = np.where(targets[:, 0] == cls)[0]
                    # Accumulate the boxes whose class matches cls
                    if cls_inds.size > 0:
                        class_counts[cls] += cls_inds.size
                        sums[cls, :] += targets[cls_inds, 1:].sum(axis=0)
                        squared_sums[cls, :] += \
                                (targets[cls_inds, 1:] ** 2).sum(axis=0)

            means = sums / class_counts  # mean
            stds = np.sqrt(squared_sums / class_counts - means ** 2)  # standard deviation

        print 'bbox target means:'
        print means
        print means[1:, :].mean(axis=0)  # ignore bg class
        print 'bbox target stdevs:'
        print stds
        print stds[1:, :].mean(axis=0)  # ignore bg class

        # Normalize the targets of every box
        if cfg.TRAIN.BBOX_NORMALIZE_TARGETS:
            print "Normalizing targets"
            for im_i in xrange(num_images):
                targets = roidb[im_i]['bbox_targets']
                for cls in xrange(1, num_classes):
                    cls_inds = np.where(targets[:, 0] == cls)[0]
                    roidb[im_i]['bbox_targets'][cls_inds, 1:] -= means[cls, :]
                    roidb[im_i]['bbox_targets'][cls_inds, 1:] /= stds[cls, :]
        else:
            print "NOT normalizing targets"

        # These values will be needed for making predictions
        # (the predicts will need to be unnormalized and uncentered)
        return means.ravel(), stds.ravel()  # ravel() flattens to 1-D

    def _compute_targets(rois, overlaps, labels):
        """Compute bounding-box regression targets for an image."""
        # rois holds only the boxes of the current image.
        # Indices of ground-truth ROIs
        gt_inds = np.where(overlaps == 1)[0]
        if len(gt_inds) == 0:
            # Bail if the image has no ground-truth ROIs: return all-zero targets
            return np.zeros((rois.shape[0], 5), dtype=np.float32)
        # Indices of examples for which we try to make predictions.
        # Only RoIs whose overlap with a gt box reaches BBOX_THRESH are used
        # as training samples for bounding-box regression.
        ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0]

        # Get IoU overlap between each ex ROI and gt ROI (inputs converted to float)
        ex_gt_overlaps = bbox_overlaps(
            np.ascontiguousarray(rois[ex_inds, :], dtype=np.float),
            np.ascontiguousarray(rois[gt_inds, :], dtype=np.float))

        # Find which gt ROI each ex ROI has max overlap with:
        # this will be the ex ROI's gt target.
        # Each row is one ex_roi, each column one gt_roi, each value their IoU.
        gt_assignment = ex_gt_overlaps.argmax(axis=1)  # index of the row-wise maximum
        gt_rois = rois[gt_inds[gt_assignment], :]  # gt box of each ex_roi, same count as ex_rois
        ex_rois = rois[ex_inds, :]

        targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
        targets[ex_inds, 0] = labels[ex_inds]  # first column is the label
        targets[ex_inds, 1:] = bbox_transform(ex_rois, gt_rois)  # last 4 columns are the offsets from ex box to gt box
        return targets
The two functions are explained below.
1. _compute_targets(rois, overlaps, labels)
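This function computes the regression offsets of the RoIs. It first checks whether the image contains any ground-truth RoIs, which are identified by overlaps == 1.
It then selects the boxes whose overlap is at least a threshold (cfg.TRAIN.BBOX_THRESH) and computes targets only for those.
For these boxes it calls bbox_overlaps to recompute the overlap with every ground-truth box, and assigns each box the ground-truth box it overlaps most.
bbox_transform from fast_rcnn.bbox_transform then computes the 4 offsets (the x and y offsets of the center, plus the log-scale changes in width w and height h).
The output is a 2-D array with one row per box and 5 columns: the first column is the class, and columns 2 to 5 are the offsets.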
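The four offsets can be illustrated with a small NumPy sketch. It follows the usual R-CNN box parameterization (center shifts scaled by the box size, log ratios for width and height) and assumes the +1 pixel convention for widths and heights; the function name `box_deltas` and the toy boxes are hypothetical, so treat it as an approximation of what bbox_transform computes rather than a copy of it.

```python
import numpy as np

def box_deltas(ex_rois, gt_rois):
    """Offsets (dx, dy, dw, dh) that map each ex box onto its assigned gt box."""
    ex_w = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
    ex_h = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
    ex_cx = ex_rois[:, 0] + 0.5 * ex_w
    ex_cy = ex_rois[:, 1] + 0.5 * ex_h

    gt_w = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    gt_h = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
    gt_cx = gt_rois[:, 0] + 0.5 * gt_w
    gt_cy = gt_rois[:, 1] + 0.5 * gt_h

    dx = (gt_cx - ex_cx) / ex_w   # center shift in units of box width
    dy = (gt_cy - ex_cy) / ex_h   # center shift in units of box height
    dw = np.log(gt_w / ex_w)      # log of the width ratio
    dh = np.log(gt_h / ex_h)      # log of the height ratio
    return np.stack([dx, dy, dw, dh], axis=1)

ex_rois = np.array([[0.0, 0.0, 9.0, 9.0]])    # 10 x 10 box
gt_rois = np.array([[5.0, 0.0, 24.0, 9.0]])   # 20 x 10 box, shifted right
deltas = box_deltas(ex_rois, gt_rois)

# One row of the 5-column target array: class label followed by the 4 offsets.
label = 3  # hypothetical class index
target_row = np.hstack([[label], deltas[0]])
print(target_row)
```

In this toy case the gt center lies one box width to the right and the gt box is twice as wide, so dx is 1 and dw is log 2, while dy and dh are 0.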
The code of bbox_overlaps (Cython) is as follows:
    # Declarations the function below relies on
    cimport cython
    import numpy as np
    cimport numpy as np

    DTYPE = np.float
    ctypedef np.float_t DTYPE_t

    def bbox_overlaps(
            np.ndarray[DTYPE_t, ndim=2] boxes,
            np.ndarray[DTYPE_t, ndim=2] query_boxes):
        """
        Parameters
        ----------
        boxes: (N, 4) ndarray of float
        query_boxes: (K, 4) ndarray of float
        Returns
        -------
        overlaps: (N, K) ndarray of overlap between boxes and query_boxes
        """
        cdef unsigned int N = boxes.shape[0]
        cdef unsigned int K = query_boxes.shape[0]
        cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
        cdef DTYPE_t iw, ih, box_area
        cdef DTYPE_t ua
        cdef unsigned int k, n
        for k in range(K):
            # Area of the k-th query box
            box_area = (
                (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
                (query_boxes[k, 3] - query_boxes[k, 1] + 1)
            )
            for n in range(N):
                # Width of the intersection
                iw = (
                    min(boxes[n, 2], query_boxes[k, 2]) -
                    max(boxes[n, 0], query_boxes[k, 0]) + 1
                )
                if iw > 0:
                    # Height of the intersection
                    ih = (
                        min(boxes[n, 3], query_boxes[k, 3]) -
                        max(boxes[n, 1], query_boxes[k, 1]) + 1
                    )
                    if ih > 0:
                        # Union area = area(box) + area(query box) - intersection
                        ua = float(
                            (boxes[n, 2] - boxes[n, 0] + 1) *
                            (boxes[n, 3] - boxes[n, 1] + 1) +
                            box_area - iw * ih
                        )
                        overlaps[n, k] = iw * ih / ua
        return overlaps
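For experimenting without compiling the Cython module, the same IoU computation can be written with NumPy broadcasting. This is a sketch with a hypothetical name (`iou_matrix`), not part of py-faster-rcnn; it keeps the +1 pixel convention of the code above and assumes every box has x2 >= x1 and y2 >= y1.

```python
import numpy as np

def iou_matrix(boxes, query_boxes):
    """Return the (N, K) IoU between (N, 4) boxes and (K, 4) query boxes."""
    boxes = np.asarray(boxes, dtype=np.float64)
    query_boxes = np.asarray(query_boxes, dtype=np.float64)

    # Broadcast (N, 1) against (1, K) to get every pairwise intersection.
    ix1 = np.maximum(boxes[:, None, 0], query_boxes[None, :, 0])
    iy1 = np.maximum(boxes[:, None, 1], query_boxes[None, :, 1])
    ix2 = np.minimum(boxes[:, None, 2], query_boxes[None, :, 2])
    iy2 = np.minimum(boxes[:, None, 3], query_boxes[None, :, 3])

    # Negative extents mean the boxes do not intersect; clamp them to zero.
    iw = np.clip(ix2 - ix1 + 1, 0, None)
    ih = np.clip(iy2 - iy1 + 1, 0, None)
    inter = iw * ih

    area_b = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    area_q = (query_boxes[:, 2] - query_boxes[:, 0] + 1) * \
             (query_boxes[:, 3] - query_boxes[:, 1] + 1)
    union = area_b[:, None] + area_q[None, :] - inter
    return inter / union

boxes = [[0, 0, 9, 9]]
query_boxes = [[0, 0, 9, 9],      # identical box
               [5, 0, 14, 9],     # half overlapping
               [20, 20, 29, 29]]  # disjoint
overlaps = iou_matrix(boxes, query_boxes)
print(overlaps)
```

The half-overlapping pair shares 50 pixels out of a union of 150, giving an IoU of 1/3; taking `overlaps.argmax(axis=1)` then yields the gt assignment used in _compute_targets.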
2. add_bbox_regression_targets
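It does two main things:
1. It computes the regression offsets bbox_targets for the boxes of every image in roidb.
2. For every class it computes the mean and standard deviation of the offsets. Each result is a 2-D matrix whose rows are the classes (21 here) and whose columns are the offsets (4 here).
When target normalization is enabled (cfg.TRAIN.BBOX_NORMALIZE_TARGETS = True), each offset has its class mean subtracted and is divided by its class standard deviation. The function returns the means and standard deviations flattened into 1-D vectors, which are needed at prediction time to invert the operation (un-normalize and un-center the predictions).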
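The statistics and the round trip can be checked on a toy array. The sketch below uses the identity var(x) = E(x^2) - E(x)^2 from the code comments for a single class; the target values are invented, and the variable names only mirror those in roidb.py for readability.

```python
import numpy as np

# Hypothetical regression targets of one class: 3 boxes x 4 offsets.
targets = np.array([[0.1, 0.2, -0.1, 0.0],
                    [0.3, -0.2, 0.1, 0.4],
                    [-0.1, 0.0, 0.3, 0.2]])

count = targets.shape[0]
sums = targets.sum(axis=0)
squared_sums = (targets ** 2).sum(axis=0)

means = sums / count
stds = np.sqrt(squared_sums / count - means ** 2)  # sqrt(E[x^2] - E[x]^2)

# Training side: center and scale the targets.
normalized = (targets - means) / stds

# Prediction side: undo the normalization on the network output.
restored = normalized * stds + means
print(means, stds)
```

The accumulated sums give the same result as `targets.mean(axis=0)` and `targets.std(axis=0)`, but they can be updated image by image, which is why the original code keeps running sums instead of stacking all targets.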