TensorFlow Object Detection API source reading notes: basic classes (1)

Published on 程序博客网, 2024/05/21 09:31

Previously I was mostly reading the architecture alongside the paper. Before moving on to the Fast R-CNN part, let's study the concrete implementation of some basic classes first. The approach is much like when I read the OpenFOAM source code, except that OpenFOAM's monstrous C++ was truly despair-inducing.

"""object_detection/core/matcher.py"""class Match(object):  """Class to store results from the matcher.  This class is used to store the results from the matcher. It provides  convenient methods to query the matching results."""    matched_column_indices() #输出match_results中大于-1的列id    matched_row_indices() #输出match_results中大于-1的列的值    matched_column_indicator() #输出match_results中每一列是否matchclass Matcher(object):  """Abstract base class for matcher.  """  match函数:      with tf.name_scope(scope, 'Match', [similarity_matrix, params]) as scope:          return Match(self._match(similarity_matrix, **params))          #这里Match是前面定义的类,用于处理match_results;_match是一个抽象函数,在子类中实现,用于生成match_results。match_results: Integer tensor of shape [M]: match_results[i]>=0 means        that column i is matched to row match_results[i], match_results[i]=-1        means that the column is not matched. match_results[i]=-2 means that        the column is ignored (usually this happens when there is a very weak        match which one neither wants as positive nor negative example).目测similarity_matrix中行是表示anchor,列是表示ground-truth box。
"""object_detection/matchers/argmax_matcher.py上述match_results的含义能work就是因为这个类在_match函数中事先将相似矩阵的每一列取了最大值。"""def _match(self, similarity_matrix):    return tf.cond(        tf.greater(tf.shape(similarity_matrix)[0], 0),        _match_when_rows_are_non_empty, _match_when_rows_are_empty)        """When the rows are empty, all detections are false positives. So we return        a tensor of -1's to indicate that the columns do not match to any rows.        主要看_match_when_rows_are_non_empty函数。        函数很长,大意是按照阈值将相似性矩阵的行压缩成一行,取值按照上述match_results的含义编码,符合预期。"""结合上次看的core/target_assigner.py中的create_target_assigner函数。  return TargetAssigner(similarity_calc, matcher, box_coder,                        positive_class_weight=positive_class_weight,                        negative_class_weight=negative_class_weight,                        unmatched_cls_target=unmatched_cls_target)这个返回的TargetAssigner在faster_rcnn_meta_arch.py中就是    self._proposal_target_assigner = target_assigner.create_target_assigner(        'FasterRCNN', 'proposal')然后在target_assigner.py的batch_assign_targets函数中使用,注意这个不是TargetAssigner类的成员函数。用的下面要阅读的几个基本类,留待本文最后看以更好的理解。"""
"""object_detection/core/box_list.py和box_list_ops.py"""其实很简单:BoxList represents a list of bounding boxes as tensorflowtensors, where each bounding box is represented as a row of 4 numbers,[y_min, x_min, y_max, x_max].  It is assumed that all bounding boxeswithin a given list correspond to a single image.注意数据是以字典的方式存的: self.data = {'boxes': boxes}。而box_list_ops则包含了一些对box坐标列表的一些操作。
"""object_detection/core/region_similarity_calculator.py"""也很简单:Region Similarity Calculators for BoxLists.Region Similarity Calculators compare a pairwise measure of similaritybetween the boxes in two BoxLists.具体实现则是相当于对box_list_ops中的函数进行了一次封装。如class IouSimilarity(RegionSimilarityCalculator)调用的是box_list_ops.iou(boxlist1, boxlist2)
"""object_detection/core/box_predictor.py"""这个类很重要。Box predictors are classes that take a high levelimage feature map as input and produce two predictions,(1) a tensor encoding box locations, and(2) a tensor encoding classes for each box我们前面遇到过class ConvolutionalBoxPredictor(BoxPredictor),已经分析过,就是通过卷积拟合下面两个量:        box_encodings: A float tensor of shape [batch_size, num_anchors, 1,          code_size] representing the location of the objects, where          num_anchors = feat_height * feat_width * num_predictions_per_location        class_predictions_with_background: A float tensor of shape          [batch_size, num_anchors, num_classes + 1] representing the class          predictions for the proposals  Currently this box predictor assumes that predictions are "shared" across  classes --- that is each anchor makes box predictions which do not depend  on class.这句话的意思是说box_encodings中的anchor坐标是与类别无关的,即所有类别都用同一套anchor的坐标预测值。注意class MaskRCNNBoxPredictor(BoxPredictor)不是这样,留待下次细读。
"""object_detection/core/box_coder.py"""简单: Box coders convert between coordinate frames, namely image-centric(with (0,0) on the top left of image) and anchor-centric (with (0,0) beingdefined by a specific anchor).
"""object_detection/box_coders/faster_rcnn_box_coder.py"""这里是box_coder的具体实现,与论文中描述的一致:Faster RCNN box coder follows the coding schema described below:  ty = (y - ya) / ha  tx = (x - xa) / wa  th = log(h / ha)  tw = log(w / wa)  where x, y, w, h denote the box's center coordinates, width and height  respectively. Similarly, xa, ya, wa, ha denote the anchor's center  coordinates, width and height. tx, ty, tw and th denote the anchor-encoded  center, width and height respectively.
'''object_detection/core/target_assigner.py

This class ties together the basic classes above, so let's look at it carefully.

Base target assigner module.

The job of a TargetAssigner is, for a given set of anchors (bounding boxes) and
groundtruth detections (bounding boxes), to assign classification and regression
targets to each anchor as well as weights to each anchor (specifying, e.g.,
which anchors should not contribute to training loss).

It assigns classification/regression targets by performing the following steps:
1) Computing pairwise similarity between anchors and groundtruth boxes using a
   provided RegionSimilarity Calculator
2) Computing a matching based on the similarity matrix using a provided Matcher
3) Assigning regression targets based on the matching and a provided BoxCoder
4) Assigning classification targets based on the matching and groundtruth labels
5) Note that TargetAssigners only operate on detections from a single
   image at a time, so any logic for applying a TargetAssigner to multiple
   images must be handled externally.
'''

With the earlier analysis and the basic classes covered in this post, the
comment above should now be easy to follow.

Step 1) uses IoU; 2) decides by thresholds; 3) is exactly the box coder; in 4),
each anchor's location and class targets are produced in the assign() function,
which of course calls the Match class introduced at the top of this post and
class IouSimilarity(RegionSimilarityCalculator):

      match_quality_matrix = self._similarity_calc.compare(groundtruth_boxes,
                                                           anchors)
      match = self._matcher.match(match_quality_matrix, **params)
      reg_targets = self._create_regression_targets(anchors,
                                                    groundtruth_boxes,
                                                    match)
      cls_targets = self._create_classification_targets(groundtruth_labels,
                                                        match)
      reg_weights = self._create_regression_weights(match)
      cls_weights = self._create_classification_weights(
          match, self._positive_class_weight, self._negative_class_weight)

Note: the weights returned here are not for balancing the classification loss
against the localization loss; they select which anchors contribute to the
loss at all, i.e. matched vs. unmatched. For matched anchors, the targets hold
the corresponding classification and regression ground truth; unmatched anchors
can be given arbitrary target values, since their weight is 0. Also note that
the RPN loss does not distinguish classes, only whether an object is present;
see the end of this post.

5) means that a function batch_assign_targets is defined outside the class to
handle multiple images (batch-wise), using a for loop. Could that be a
performance concern? This function is called in _loss_rpn:

def batch_assign_targets(target_assigner,
                         anchors_batch,
                         gt_box_batch,
                         gt_class_targets_batch)
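How the weights select contributing anchors can be sketched from a match_results vector. The function names mirror the private helpers quoted above, but this is a plain-Python approximation of their behavior, not the TF implementation:

```python
# Sketch of weight creation from match_results:
#   regression weights: 1.0 for matched anchors (>= 0), 0.0 otherwise
#   classification weights: positive weight if matched, negative weight
#   if unmatched (-1), and 0.0 if ignored (-2)
def create_regression_weights(match_results):
    return [1.0 if m >= 0 else 0.0 for m in match_results]

def create_classification_weights(match_results,
                                  positive_class_weight=1.0,
                                  negative_class_weight=1.0):
    weights = []
    for m in match_results:
        if m >= 0:
            weights.append(positive_class_weight)   # matched: positive example
        elif m == -1:
            weights.append(negative_class_weight)   # unmatched: negative example
        else:
            weights.append(0.0)                     # -2: ignored entirely
    return weights

mr = [1, -1, 0, -2]
print(create_regression_weights(mr))      # [1.0, 0.0, 1.0, 0.0]
print(create_classification_weights(mr))  # [1.0, 1.0, 1.0, 0.0]
```

An unmatched anchor thus still contributes to the objectness/classification loss (as a negative), but never to the box regression loss, which is why its regression targets can be arbitrary.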
"""faster_rcnn_meta_arch._loss_rpn输入:                rpn_box_encodings  # predicted box与anchor box之间坐标的差值                rpn_objectness_predictions_with_background,                anchors,                groundtruth_boxlists,                groundtruth_classes_with_background_list输出:      a dictionary mapping loss keys (`first_stage_localization_loss`,        `first_stage_objectness_loss`) to scalar tensors representing        corresponding loss values."""