Stagewise refinement segmentation - ICCV 2017 - A Stagewise Refinement Model for Detecting Salient Objects in Images


Reading the paper "A Stagewise Refinement Model for Detecting Salient Objects in Images".

The source code for the paper has been released: https://github.com/TiantianWang/ICCV17_SRM



The paper makes use of a PPM (pyramid pooling module).


Looking up the PPM implementation led to the paper "Pyramid Scene Parsing Network" (CVPR 2017); its module is basically the same as the one used here, only the final concat differs.


The PSPNet paper comes with a Caffe implementation: https://github.com/hszhao/PSPNet

The PPM and bilinear-interpolation prototxt snippets from that repository are pasted below for future reference.

layer {
  name: "conv5_3_pool1"
  type: "Pooling"
  bottom: "conv5_3"
  top: "conv5_3_pool1"
  pooling_param {
    pool: AVE
    kernel_size: 60
    stride: 60
  }
}

Similar layers change kernel_size and stride to 30, 20, and 10, giving four average-pooling branches in total, which produce 1×1, 2×2, 3×3, and 6×6 pooled bins from the 60×60 conv5_3 feature map. One of the other branches would look as sketched below.
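
For example, the second branch could be written as follows (a sketch only; the layer name conv5_3_pool2 is assumed from the naming pattern above, and only kernel_size and stride change):

layer {
  name: "conv5_3_pool2"
  type: "Pooling"
  bottom: "conv5_3"
  top: "conv5_3_pool2"
  pooling_param {
    pool: AVE
    kernel_size: 30   # 60 / 30 = 2, so this branch yields 2x2 bins
    stride: 30
  }
}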

The subsequent bilinear upsampling uses a new layer type, Interp.

layer {
  name: "conv5_3_pool1_interp"
  type: "Interp"
  bottom: "conv5_3_pool1_conv"
  top: "conv5_3_pool1_interp"
  interp_param {
    height: 60
    width: 60
  }
}

The parameter definition in caffe.proto:

message InterpParameter {
  optional int32 height = 1 [default = 0]; // Height of output
  optional int32 width = 2 [default = 0]; // Width of output
  optional int32 zoom_factor = 3 [default = 1]; // zoom factor
  optional int32 shrink_factor = 4 [default = 1]; // shrink factor
  optional int32 pad_beg = 5 [default = 0]; // padding at begin of input
  optional int32 pad_end = 6 [default = 0]; // padding at end of input
}
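
Besides specifying a fixed output height and width, the layer can also be driven by zoom_factor. A hypothetical usage sketch (the layer and blob names here are illustrative, not copied from the released prototxt):

layer {
  name: "conv6_interp"
  type: "Interp"
  bottom: "conv6"
  top: "conv6_interp"
  interp_param {
    # output size = input + (input - 1) * (zoom_factor - 1),
    # i.e. roughly an 8x bilinear enlargement back toward label resolution
    zoom_factor: 8
  }
}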

The auxiliary (intermediate) loss described in PSPNet is used only during training and is removed at test time, which is why it does not appear in the released Caffe prototxt.

There is a discussion of this paper on Zhihu: https://www.zhihu.com/question/53356671

Regarding this extra loss, one answer explains that it is simply an additional, ordinary softmax loss. The relevant description from the PSPNet paper:

Apart from the main branch using softmax loss to train the final classifier, another classifier is applied after the fourth stage, i.e., the res4b22 residue block. Different from relay backpropagation that blocks the backward auxiliary loss to several shallow layers, we let the two loss functions pass through all previous layers. The auxiliary loss helps optimize the learning process, while the master branch loss takes the most responsibility. We add weight to balance the auxiliary loss.

Setting an appropriate loss weight α in the auxiliary branch is important. 'AL' denotes the auxiliary loss. Baseline is ResNet50-based FCN with dilated network. Empirically, α = 0.4 yields the best performance.
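
To make this concrete, here is a hypothetical training-time sketch of such an auxiliary branch in Caffe prototxt (not taken from the released repository; the layer/blob names, the number of classes, and the zoom factor are assumptions):

# Auxiliary classifier attached after the res4b22 block, training only.
layer {
  name: "conv_aux"
  type: "Convolution"
  bottom: "res4b22"
  top: "conv_aux"
  convolution_param {
    num_output: 21      # number of classes, e.g. 21 for PASCAL VOC
    kernel_size: 1
  }
}
layer {
  name: "conv_aux_interp"
  type: "Interp"
  bottom: "conv_aux"
  top: "conv_aux_interp"
  interp_param {
    zoom_factor: 8      # resize the prediction back to label resolution
  }
}
layer {
  name: "loss_aux"
  type: "SoftmaxWithLoss"
  bottom: "conv_aux_interp"
  bottom: "label"
  top: "loss_aux"
  loss_weight: 0.4      # the alpha = 0.4 that the paper reports as best
}
# The master branch keeps an ordinary softmax loss with full weight.
layer {
  name: "loss_main"
  type: "SoftmaxWithLoss"
  bottom: "conv6_interp"
  bottom: "label"
  top: "loss_main"
  loss_weight: 1
}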

