Deep Learning: The Faster R-CNN Network

Source: Internet | Posted by: ajax json java前端 | Editor: 程序博客网 | Time: 2024/05/22 01:38

Structure

[Figure: Faster R-CNN network structure]

Loss Computation

Multi-task:
Faster R-CNN paper notes: FR

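The Fast R-CNN network has two sibling output layers (cls_score and bbox_pred). Both are fully connected layers, and training them jointly is what is referred to as multi-task.
① cls_score layer: used for classification. It outputs a (K+1)-dimensional array p giving the probabilities of belonging to each of the K classes and to the background. For each RoI (Region of Interest) it outputs a discrete probability distribution:
$p = (p_0, \ldots, p_K)$
Usually, p is computed by a softmax over the K+1 outputs of a fully connected layer.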
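
As a minimal sketch of this step (NumPy; the score values are made up for illustration and K = 3 is an assumption), the K+1 fully connected outputs for one RoI can be turned into the distribution p like this:

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical cls_score output for one RoI: index 0 is background,
# indices 1..3 are the K = 3 object classes.
cls_score = np.array([2.0, 1.0, 0.1, -1.0])
p = softmax(cls_score)

print(p)        # one probability per class, background first
print(p.sum())  # the entries form a probability distribution
```

Subtracting the maximum score before exponentiating does not change the result; it only avoids overflow for large scores.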

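② bbox_pred layer: used to adjust the position of the candidate region. It outputs the bounding-box regression offsets as a 4*K-dimensional array t, giving the translation and scaling parameters to apply for each of the K classes:
$t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$
Here k is the class index, $(t^k_x, t^k_y)$ is a scale-invariant translation relative to the object proposal, and $(t^k_w, t^k_h)$ is the width and height shift relative to the object proposal in log space.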
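
The parameterization can be illustrated with a small encode/decode pair. This is a sketch under simplifying assumptions: boxes are (x1, y1, x2, y2), width is taken as x2 - x1 (no +1 pixel convention), and the function names are illustrative rather than taken from the author's code.

```python
import numpy as np

def bbox_transform(proposal, gt):
    """Encode a ground-truth box as (t_x, t_y, t_w, t_h) relative to a proposal."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return np.array([
        (gx - px) / pw,    # translation, normalized by proposal width
        (gy - py) / ph,    # translation, normalized by proposal height
        np.log(gw / pw),   # width change in log space
        np.log(gh / ph),   # height change in log space
    ])

def bbox_transform_inv(proposal, t):
    """Decode (t_x, t_y, t_w, t_h) back into an (x1, y1, x2, y2) box."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    cx, cy = t[0] * pw + px, t[1] * ph + py
    w, h = np.exp(t[2]) * pw, np.exp(t[3]) * ph
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])

proposal = np.array([10.0, 10.0, 50.0, 90.0])
gt = np.array([20.0, 15.0, 60.0, 115.0])
t = bbox_transform(proposal, gt)
decoded = bbox_transform_inv(proposal, t)
print(t)
print(decoded)
```

Because the translation is divided by the proposal size and the scale is expressed as a log ratio, the same targets describe the same relative adjustment regardless of how large the proposal is.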
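The loss_cls layer evaluates the classification loss, which is determined by the probability assigned to the true class u:
$L_{cls}(p, u) = -\log p_u$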

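The loss_bbox layer evaluates the localization loss of the detection box. It compares the predicted translation and scaling parameters for the true class, $t^u = (t^u_x, t^u_y, t^u_w, t^u_h)$, with the true translation and scaling parameters $v = (v_x, v_y, v_w, v_h)$:
$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i)$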

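where the smooth L1 loss function is:
$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$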
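
A direct NumPy transcription of the smooth L1 function and the localization loss built from it might look as follows. The helper name loc_loss and the example numbers are illustrative assumptions.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: quadratic for |x| < 1, linear otherwise."""
    x = np.asarray(x, dtype=np.float64)
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def loc_loss(t_u, v):
    """Sum of smooth L1 over the four box coordinates (x, y, w, h)."""
    return float(smooth_l1(np.asarray(t_u) - np.asarray(v)).sum())

t_u = [0.5, 0.0, 0.0, 2.0]   # predicted offsets for the true class
v = [0.0, 0.0, 0.0, 0.0]     # regression targets
print(loc_loss(t_u, v))
```

The two branches meet at |x| = 1, where both evaluate to 0.5, so the function is continuous.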

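The curve of the smooth L1 loss is shown in Figure 9. The author's aim is to make the loss more robust to outliers: compared with the L2 loss, it is less sensitive to outliers and anomalous values, because the magnitude of its gradient is bounded, which keeps training from diverging.
[Figure 9: smooth L1 loss curve]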
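
To make the robustness argument concrete, the sketch below compares the gradient of smooth L1 with the gradient of an L2 loss written as 0.5·x² (that scaling is an assumption chosen so the two agree near zero). The error values are arbitrary.

```python
import numpy as np

def smooth_l1_grad(x):
    """d/dx of smooth L1: x inside (-1, 1), sign(x) outside."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) < 1.0, x, np.sign(x))

def l2_grad(x):
    """d/dx of 0.5 * x**2, which grows linearly with the error."""
    return np.asarray(x, dtype=np.float64)

errors = np.array([0.1, 0.5, 2.0, 10.0, 100.0])
g_smooth = smooth_l1_grad(errors)
g_l2 = l2_grad(errors)

for e, gs, g2 in zip(errors, g_smooth, g_l2):
    print(f"error={e:7.1f}  smooth_l1_grad={gs:5.2f}  l2_grad={g2:7.2f}")
```

For small errors the two gradients coincide, while for large errors the smooth L1 gradient stays at magnitude 1, so a single badly matched box cannot dominate the update.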

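The total loss is the weighted sum of the two, with the localization loss ignored when the class is background:
$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v)$
By convention u = 0 is the background class (the negative label), so the Iverson bracket indicator function [u ≥ 1] means that background proposals, i.e. negative samples, do not contribute to the regression loss and need no box regression. λ controls the balance between the classification loss and the regression loss; in the Fast R-CNN paper, λ = 1 in all experiments.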
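The Iverson bracket indicator function is:
$[u \ge 1] = \begin{cases} 1 & \text{if } u \ge 1 \\ 0 & \text{otherwise} \end{cases}$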
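
Putting the pieces together, a per-RoI version of the multi-task loss can be sketched as below. The function name multitask_loss and the sample values are illustrative assumptions; the real layers operate on mini-batches rather than single RoIs.

```python
import numpy as np

def smooth_l1(x):
    x = np.asarray(x, dtype=np.float64)
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * x ** 2, abs_x - 0.5)

def multitask_loss(p, u, t_u, v, lam=1.0):
    """L = L_cls + lam * [u >= 1] * L_loc for a single RoI."""
    l_cls = -np.log(p[u])                      # log loss for the true class
    indicator = 1.0 if u >= 1 else 0.0         # Iverson bracket [u >= 1]
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)).sum()
    return float(l_cls + lam * indicator * l_loc)

p = np.array([0.1, 0.7, 0.2])          # background, class 1, class 2
t_u = [0.5, 0.0, 0.0, 0.0]
v = [0.0, 0.0, 0.0, 0.0]

fg_loss = multitask_loss(p, 1, t_u, v)  # foreground: both terms count
bg_loss = multitask_loss(p, 0, t_u, v)  # background: classification only
print(fg_loss, bg_loss)
```

With u = 0 the bracket evaluates to 0, so the regression term vanishes no matter what the box offsets are.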
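In the source code, bbox_loss_weights marks which class, if any, each bbox belongs to, so that only the four regression outputs of that class contribute to the loss.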
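
One way such a weight array could be built is sketched below. It assumes an N × 4C layout with one block of four values per class (background included as class 0); the function name expand_bbox_targets and the sample data are hypothetical and not copied from the author's code.

```python
import numpy as np

def expand_bbox_targets(labels, targets, num_classes):
    """Spread (N, 4) targets into (N, 4 * num_classes) plus matching 0/1 weights.

    Only the four slots of each RoI's true class are filled, and background
    RoIs (label 0) keep all-zero weights, which realizes the [u >= 1] bracket.
    """
    n = labels.shape[0]
    bbox_targets = np.zeros((n, 4 * num_classes), dtype=np.float32)
    bbox_loss_weights = np.zeros_like(bbox_targets)
    for i in np.where(labels > 0)[0]:
        start = 4 * int(labels[i])
        bbox_targets[i, start:start + 4] = targets[i]
        bbox_loss_weights[i, start:start + 4] = 1.0
    return bbox_targets, bbox_loss_weights

labels = np.array([0, 2])                       # one background RoI, one of class 2
targets = np.array([[0.1, 0.2, 0.3, 0.4],
                    [0.5, 0.6, 0.7, 0.8]], dtype=np.float32)
bbox_targets, bbox_loss_weights = expand_bbox_targets(labels, targets, num_classes=3)
print(bbox_loss_weights)
```

Multiplying the elementwise regression loss by these weights zeroes out every prediction except the four belonging to the true class.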

Code

The author's source, rbgirshick/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_end2end/train.prototxt, is attached below:

name: "ZF"
layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 21"
  }
}

#========= conv1-conv5 ============

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 96
    kernel_size: 7
    pad: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 0.00005
    beta: 0.75
    norm_region: WITHIN_CHANNEL
    engine: CAFFE
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    kernel_size: 3
    stride: 2
    pad: 1
    pool: MAX
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 5
    pad: 2
    stride: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 0.00005
    beta: 0.75
    norm_region: WITHIN_CHANNEL
    engine: CAFFE
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    kernel_size: 3
    stride: 2
    pad: 1
    pool: MAX
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 384
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 384
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5"
  top: "rpn/output"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 256
    kernel_size: 3 pad: 1 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}

#layer {
#  name: "rpn_conv/3x3"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/3x3"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 192
#    kernel_size: 3 pad: 1 stride: 1
#    weight_filler { type: "gaussian" std: 0.01 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn_conv/5x5"
#  type: "Convolution"
#  bottom: "conv5"
#  top: "rpn_conv/5x5"
#  param { lr_mult: 1.0 }
#  param { lr_mult: 2.0 }
#  convolution_param {
#    num_output: 64
#    kernel_size: 5 pad: 2 stride: 1
#    weight_filler { type: "gaussian" std: 0.0036 }
#    bias_filler { type: "constant" value: 0 }
#  }
#}
#layer {
#  name: "rpn/output"
#  type: "Concat"
#  bottom: "rpn_conv/3x3"
#  bottom: "rpn_conv/5x5"
#  top: "rpn/output"
#}
#layer {
#  name: "rpn_relu/output"
#  type: "ReLU"
#  bottom: "rpn/output"
#  top: "rpn/output"
#}

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  bottom: "rpn_cls_score"
  top: "rpn_cls_score_reshape"
  name: "rpn_cls_score_reshape"
  type: "Reshape"
  reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}
layer {
  name: 'rpn-data'
  type: 'Python'
  bottom: 'rpn_cls_score'
  bottom: 'gt_boxes'
  bottom: 'im_info'
  bottom: 'data'
  top: 'rpn_labels'
  top: 'rpn_bbox_targets'
  top: 'rpn_bbox_inside_weights'
  top: 'rpn_bbox_outside_weights'
  python_param {
    module: 'rpn.anchor_target_layer'
    layer: 'AnchorTargetLayer'
    param_str: "'feat_stride': 16"
  }
}
layer {
  name: "rpn_loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "rpn_cls_score_reshape"
  bottom: "rpn_labels"
  propagate_down: 1
  propagate_down: 0
  top: "rpn_cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "rpn_loss_bbox"
  type: "SmoothL1Loss"
  bottom: "rpn_bbox_pred"
  bottom: "rpn_bbox_targets"
  bottom: 'rpn_bbox_inside_weights'
  bottom: 'rpn_bbox_outside_weights'
  top: "rpn_loss_bbox"
  loss_weight: 1
  smooth_l1_loss_param { sigma: 3.0 }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rpn_rois'
#  top: 'rpn_scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

#layer {
#  name: 'debug-data'
#  type: 'Python'
#  bottom: 'data'
#  bottom: 'rpn_rois'
#  bottom: 'rpn_scores'
#  python_param {
#    module: 'rpn.debug_layer'
#    layer: 'RPNDebugLayer'
#  }
#}

layer {
  name: 'roi-data'
  type: 'Python'
  bottom: 'rpn_rois'
  bottom: 'gt_boxes'
  top: 'rois'
  top: 'labels'
  top: 'bbox_targets'
  top: 'bbox_inside_weights'
  top: 'bbox_outside_weights'
  python_param {
    module: 'rpn.proposal_target_layer'
    layer: 'ProposalTargetLayer'
    param_str: "'num_classes': 21"
  }
}

#========= RCNN ============

layer {
  name: "roi_pool_conv5"
  type: "ROIPooling"
  bottom: "conv5"
  bottom: "rois"
  top: "roi_pool_conv5"
  roi_pooling_param {
    pooled_w: 6
    pooled_h: 6
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "roi_pool_conv5"
  top: "fc6"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
    scale_train: false
  }
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 21
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 84
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "cls_score"
  bottom: "labels"
  propagate_down: 1
  propagate_down: 0
  top: "cls_loss"
  loss_weight: 1
  loss_param {
    ignore_label: -1
    normalize: true
  }
}
layer {
  name: "loss_bbox"
  type: "SmoothL1Loss"
  bottom: "bbox_pred"
  bottom: "bbox_targets"
  bottom: 'bbox_inside_weights'
  bottom: 'bbox_outside_weights'
  top: "bbox_loss"
  loss_weight: 1
}
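
Several constants in this prototxt can be cross-checked with simple arithmetic. The sketch below takes the strides, the 9 anchors, and the 21 classes from the prototxt values and comments. It treats the feature stride as the plain product of layer strides (ignoring padding and rounding), and reading 84 as 4 × 21 is an interpretation rather than something the file states.

```python
# Strides of the layers between the input image and conv5, as listed in the prototxt.
strides = {
    "conv1": 2, "pool1": 2,
    "conv2": 2, "pool2": 2,
    "conv3": 1, "conv4": 1, "conv5": 1,
}

feat_stride = 1
for s in strides.values():
    feat_stride *= s                      # cumulative downsampling factor

spatial_scale = 1.0 / feat_stride         # used by the ROIPooling layer

num_anchors = 9                           # from the "9(anchors)" comments
num_classes = 21                          # from param_str 'num_classes': 21

rpn_cls_outputs = 2 * num_anchors         # bg/fg score per anchor
rpn_bbox_outputs = 4 * num_anchors        # 4 box offsets per anchor
bbox_pred_outputs = 4 * num_classes       # 4 box offsets per class, background included

print(feat_stride, spatial_scale)
print(rpn_cls_outputs, rpn_bbox_outputs, bbox_pred_outputs)
```

These values line up with 'feat_stride': 16, spatial_scale: 0.0625, and the num_output settings 18, 36, and 84 in the file.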
