[Image Object Detection] Faster-RCNN

Source: Internet; Editor: 程序博客网; Time: 2024/05/15 23:44

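With Fast-RCNN in place, Faster-RCNN asks: why are we still using selective search? It leaves us in an awkward position, where region proposal is slower than the detection and regression stages that follow it.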

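So Faster-RCNN argues that in vision the CNN should do everything. It folds the job of Selective Search into the network by proposing the RPN (Region Proposal Network), which pushes the speed up to real-time.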

Faster-RCNN architecture: Faster-RCNN = RPN + Fast-RCNN
[Figure: Faster-RCNN architecture]

Region Proposal Networks: structure

- The RPN takes an image of any size as input and outputs a set of rectangular object proposals, each with an objectness score.

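- After fine-tuning VGG or ZF, rpnconv+rpnrelu are appended at the end: a 3x3 spatial window slides over the feature output of the VGG/ZF classification network, and at each window centre K region proposals (anchors) are predicted. If the conv feature map is W x H, there are W*H*K anchors in total.
In other words, the anchor windows slide over the feature map; for each centre point the paper uses 9 templates, each mapped back onto the original image to propose a region.
The 9 anchor templates: 3 areas (128x128, 256x256, 512x512) combined with 3 aspect ratios (1:1, 1:2, 2:1).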
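To make the W*H*K count concrete, here is a minimal sketch of anchor generation in plain Python. The function names, the feature stride of 16, the cell-centre convention, and reading each ratio as height/width are all assumptions made for illustration; they are not taken from the released code.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 (w, h) pairs: each has area scale**2 and h / w == ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)  # chosen so that w * h == s**2
            h = s * math.sqrt(r)  # and h / w == r
            anchors.append((w, h))
    return anchors

def all_anchors(feat_h, feat_w, stride=16):
    """Centre every base anchor on every feature-map cell.

    Returns (x1, y1, x2, y2) boxes in image coordinates;
    the list has feat_h * feat_w * 9 entries.
    """
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = (j + 0.5) * stride  # assumed cell-centre convention
            cy = (i + 0.5) * stride
            for w, h in base_anchors():
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

For a 50 x 38 feature map this gives 50 * 38 * 9 = 17100 anchors, many of which extend past the image border.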

Multi-Scale Anchors as Regression References:

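Two mainstream approaches to multi-scale prediction:
1) Image/feature pyramids: the image or feature map is resized to multiple scales.
2) Sliding windows of multiple scales over the feature map.
Put plainly: 1) rescale the input image to multiple scales; 2) rescale the sliding window to multiple scales.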

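The paper's method is based on a pyramid of anchors: "It only relies on images and feature maps of a single scale, and uses filters (sliding windows on the feature map) of a single size."
This is closest in spirit to the second approach, except that the multiple scales and aspect ratios live in the anchors while the sliding window itself stays a single size.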

RPN training:

Two rules for assigning a positive label:
(i) the anchor/anchors with the highest Intersection-over-Union (IoU) overlap with a ground-truth box,
Rule 1: for each ground-truth box, the anchor (or anchors) with the highest IoU is labeled positive.
or (ii) an anchor that has an IoU overlap higher than 0.7 with any ground-truth box
Rule 2: any anchor whose IoU with some ground-truth box exceeds 0.7 is labeled positive.

Why the paper keeps rule (i) as well:

Usually the second condition is sufficient to determine the positive samples; but we still adopt the first condition for the reason that in some rare cases the second condition may find no positive sample.
The second rule is usually enough on its own, but in rare cases it finds no positive sample at all. The paper therefore keeps the first rule alongside it, so that positives are still found in those cases.

negative label
We assign a negative label to a non-positive anchor if its IoU ratio is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective.
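A non-positive anchor whose IoU is below 0.3 for every ground-truth box is labeled negative. Anchors that are neither positive nor negative are ignored during training.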
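The labeling rules above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the names `iou` and `label_anchors` and the label encoding (1 positive, 0 negative, -1 ignored) are made up here, and real implementations also drop anchors that cross the image border and subsample a mini-batch.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored."""
    ious = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = [-1] * len(anchors)
    for i, row in enumerate(ious):
        best = max(row, default=0.0)
        if best > pos_thresh:      # rule (ii): IoU higher than 0.7 with any gt box
            labels[i] = 1
        elif best < neg_thresh:    # negative: IoU below 0.3 for all gt boxes
            labels[i] = 0
    # rule (i): for each gt box, the anchor(s) with the highest IoU are positive
    for j in range(len(gt_boxes)):
        column = [row[j] for row in ious]
        best = max(column, default=0.0)
        if best > 0:
            for i, value in enumerate(column):
                if value == best:
                    labels[i] = 1
    return labels
```

With a single ground-truth box (0, 0, 10, 10) and anchors whose best IoU is only 0.5, rule (ii) alone would produce no positives; rule (i) still marks the best-overlapping anchor as positive.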

Loss Function
[Figure: RPN loss function]

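As in Fast-RCNN, the loss function here combines the classification loss and the box regression loss in a single objective.
The ground-truth label p* is 1 when the anchor is positive and 0 otherwise, so the box regression term only counts for positive anchors; box regression for non-object regions is ignored.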
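For reference, the multi-task loss in the Faster R-CNN paper has the following form, written out here in the paper's notation: p_i is the predicted probability that anchor i is an object, p_i^* its ground-truth label, and t_i, t_i^* the predicted and ground-truth box parameters.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

L_{cls} is the log loss over the two classes (object vs. not object) and L_{reg} is the smooth L1 loss. The factor p_i^* in the second term is what switches the regression loss off for non-positive anchors.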

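The box regression parameters here are not the raw (x, y, w, h). The authors re-parameterize them relative to the anchor, which they consider more convenient, as follows: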
[Figure: box regression parameterization]
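The parameterization in the paper is t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a), where (x, y) is the box centre and the subscript a denotes the anchor. A minimal sketch of the transform and its inverse follows; the function names `encode` and `decode` are illustrative, not from the released code.

```python
import math

def encode(box, anchor):
    """Map a box (cx, cy, w, h) to offsets (tx, ty, tw, th) relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse of encode: recover (cx, cy, w, h) from offsets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))
```

Because the centre offsets are divided by the anchor size and the sizes go through a log, the same targets describe the same relative shift and scaling regardless of which of the 9 anchor shapes is used.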

Training schemes

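Trained independently, the RPN and Fast-RCNN would each modify their convolutional layers in different ways, so a technique is needed that allows the two networks to share convolutional layers rather than learning two separate networks.
The authors discuss three training schemes: 1) Alternating training, 2) Approximate joint training, 3) Non-approximate joint training.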

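I have not read the code yet, so for these three schemes I am borrowing someone else's explanation for now, to be filled in later:
Source: http://blog.csdn.net/shenxiaolu1984/article/details/51152614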

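1) Alternating training
a. Starting from W0, train the RPN. Use the RPN to extract region proposals on the training set.
b. Starting from W0, train Fast RCNN on those proposals; call the resulting parameters W1.
c. Starting from W1, train the RPN…
In practice only two iterations are run, and some layers are frozen during training. The experiments in the paper use this method. As Ross Girshick said in his ICCV 2015 talk "Training R-CNNs of various velocities", there is no fundamental reason for choosing this method; it was mainly down to "implementation issues, and the deadline".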
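The control flow of steps a to c can be sketched as a small loop. Everything here is hypothetical: `train_rpn`, `extract_proposals` and `train_fast_rcnn` are stubs that only pass names around, and since the steps after c are elided above, the loop simply assumes the a/b pattern repeats. Layer freezing is not modeled.

```python
def train_rpn(init_weights):
    """Hypothetical stub standing in for RPN training from init_weights."""
    return f"RPN<-{init_weights}"

def extract_proposals(rpn):
    """Hypothetical stub: run the trained RPN over the training set."""
    return f"proposals({rpn})"

def train_fast_rcnn(init_weights, proposals, name):
    """Hypothetical stub: train Fast RCNN on the proposals, return new weights."""
    return name

def alternating_training(rounds=2):
    """Alternate RPN and Fast RCNN training; return final weights and a log."""
    weights = "W0"
    log = []
    for k in range(1, rounds + 1):
        rpn = train_rpn(weights)              # steps a / c: RPN from current weights
        proposals = extract_proposals(rpn)
        log.append(f"train RPN from {weights}")
        new_weights = train_fast_rcnn(weights, proposals, name=f"W{k}")  # step b
        log.append(f"train Fast RCNN from {weights} -> {new_weights}")
        weights = new_weights
    return weights, log
```

Calling `alternating_training()` with the default two rounds trains the RPN from W0 and then from W1, matching the order of steps a and c.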

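2) Approximate joint training
Train directly on the architecture in the figure above. When computing gradients in the backward pass, the extracted ROI regions are treated as fixed values; when updating parameters, the updates coming from the RPN and from Fast RCNN are combined and fed into the shared feature extraction layers. This method gives results close to the previous one while cutting training time (the paper reports about 25-50%). The released Python code includes this method.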

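3) Non-approximate joint training
Train directly on the architecture in the figure above, but when computing gradients in the backward pass, account for the effect of changes in the ROI regions. The derivation is beyond the scope of this article; see the 2015 NIPS paper [6].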
