READING NOTE: R-FCN: Object Detection via Region-based Fully Convolutional Networks

TITLE: R-FCN: Object Detection via Region-based Fully Convolutional Networks

AUTHOR: Jifeng Dai, Yi Li, Kaiming He, Jian Sun

ASSOCIATION: MSRA, Tsinghua University

FROM: arXiv:1605.06409

A framework called Region-based Fully Convolutional Network (R-FCN) is developed for object detection. It consists of shared, fully convolutional architectures.
A set of position-sensitive score maps is introduced to enable an FCN to represent translation variance.
A position-sensitive ROI pooling method is proposed to shepherd information from these score maps.

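The image is processed by a fully convolutional network (FCN).
At the end of the FCN, an RPN (Region Proposal Network) generates ROIs from the shared features.
In parallel, a bank of specialized convolutional layers generates position-sensitive score maps with k²(C+1) channels: for each of the C categories plus background there are k² maps, one per cell of a k×k spatial grid.
For each ROI, position-sensitive ROI pooling splits the ROI into k×k bins and average-pools bin (i, j) only from the score maps assigned to that cell, producing a k×k score map with C+1 channels.
The k×k scores of each channel are averaged to vote for that category, and a softmax over the C+1 votes gives the class scores.
A sibling convolutional layer with 4k² channels is learned for bounding-box regression. Position-sensitive ROI pooling on it yields a 4k²-d vector per ROI, which is averaged into a 4-d box offset.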
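
The pooling and voting steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the channel layout (k² groups of C+1 channels), the integer rounding of bin edges, the function names `ps_roi_pool` and `vote`, and the toy sizes are all assumptions. The class-agnostic averaging of the 4k² regression outputs into a 4-d vector follows the description in the R-FCN paper.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_outputs):
    """Position-sensitive ROI average pooling (illustrative sketch).

    score_maps: array of shape (k*k*num_outputs, H, W).
    roi: (x1, y1, x2, y2) in feature-map coordinates (x2, y2 exclusive).
    Returns an array of shape (num_outputs, k, k).
    """
    _, H, W = score_maps.shape
    x1, y1, x2, y2 = roi
    bin_w, bin_h = (x2 - x1) / k, (y2 - y1) / k
    out = np.zeros((num_outputs, k, k))
    for i in range(k):  # bin row
        ys = min(max(int(np.floor(y1 + i * bin_h)), 0), H - 1)
        ye = min(max(int(np.ceil(y1 + (i + 1) * bin_h)), ys + 1), H)
        for j in range(k):  # bin column
            xs = min(max(int(np.floor(x1 + j * bin_w)), 0), W - 1)
            xe = min(max(int(np.ceil(x1 + (j + 1) * bin_w)), xs + 1), W)
            for c in range(num_outputs):
                # Assumed layout: k*k groups of num_outputs channels each.
                ch = (i * k + j) * num_outputs + c
                # Bin (i, j) reads ONLY its own score map for output c.
                out[c, i, j] = score_maps[ch, ys:ye, xs:xe].mean()
    return out

def vote(pooled):
    """Average the k*k bin scores into one score per output channel."""
    return pooled.mean(axis=(1, 2))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy sizes (hypothetical): k=3 bins per side, C=2 object categories.
k, C, H, W = 3, 2, 12, 12
rng = np.random.default_rng(0)
cls_maps = rng.standard_normal((k * k * (C + 1), H, W))  # k^2(C+1) channels
reg_maps = rng.standard_normal((k * k * 4, H, W))        # 4k^2 channels
roi = (2.0, 3.0, 11.0, 9.0)

probs = softmax(vote(ps_roi_pool(cls_maps, roi, k, C + 1)))  # shape (C+1,)
box_deltas = vote(ps_roi_pool(reg_maps, roi, k, 4))          # shape (4,)
print(probs, box_deltas)
```

Because each bin reads a different map, the same feature response contributes differently depending on where it falls inside the ROI, which is how an otherwise translation-invariant FCN can encode position.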

Training Details

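R-FCN is trained end-to-end with pre-computed region proposals. Both category and position are learned with the per-ROI loss function $L(s, t_{x,y,w,h}) = L_{cls}(s_{c^*}) + \lambda [c^* > 0] L_{reg}(t, t^*)$, where $c^*$ is the ground-truth label ($c^* = 0$ for background), $L_{cls}(s_{c^*}) = -\log(s_{c^*})$ is the cross-entropy loss, $L_{reg}$ is the bounding-box regression loss, $[c^* > 0]$ equals 1 for foreground ROIs and 0 otherwise, and $\lambda = 1$ balances the two terms.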
For each image, the loss of all N proposals is computed in the forward pass, and only the B proposals with the highest loss are used for backpropagation (online hard example mining). B is set to 128 in this work.
4-step alternating training, as in Faster R-CNN, is used to share features between R-FCN and the RPN.
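
The loss and the hard-example selection can be sketched together, since the hard examples are ranked by exactly this per-ROI loss. This is an illustrative NumPy sketch, not the authors' code: it assumes the softmax scores are already computed, that $L_{reg}$ is the smooth-L1 box loss of Fast R-CNN summed over $(t_x, t_y, t_w, t_h)$, and that $\lambda = 1$. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def roi_losses(probs, labels, t, t_star, lam=1.0):
    """Per-ROI loss L = -log s_{c*} + lam * [c* > 0] * smooth_L1(t - t*).

    probs:  (N, C+1) softmax scores, column 0 = background.
    labels: (N,) ground-truth labels c* (0 = background).
    t, t_star: (N, 4) predicted / target (tx, ty, tw, th).
    """
    n = np.arange(len(labels))
    l_cls = -np.log(probs[n, labels])           # cross-entropy on the true class
    l_reg = smooth_l1(t - t_star).sum(axis=1)   # summed over the 4 coordinates
    return l_cls + lam * (labels > 0) * l_reg   # [c* > 0] gates the box term

def select_hard_examples(losses, B=128):
    """Indices of the B highest-loss ROIs (all N were evaluated forward)."""
    B = min(B, len(losses))
    return np.argsort(-losses, kind="stable")[:B]

# Toy batch of N=5 ROIs over C+1=3 categories (made-up numbers).
probs = np.array([[0.5, 0.25, 0.25],
                  [0.1, 0.8, 0.1],
                  [0.9, 0.05, 0.05],
                  [0.2, 0.2, 0.6],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0, 1, 2])
t = np.zeros((5, 4))
t_star = np.array([[0.0, 0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0, 2.0],
                   [0.0, 0.0, 0.0, 0.0],
                   [0.1, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])
losses = roi_losses(probs, labels, t, t_star)
hard = select_hard_examples(losses, B=2)  # only these ROIs would be backpropagated
print(losses, hard)  # hard → [1 3]
```

Background ROIs (label 0) contribute only the classification term, so a confidently classified background ROI ranks low and is skipped. In a real training loop the unselected ROIs would simply receive zero gradient.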

It is fast: 170 ms per image at test time, 2.5-20x faster than its Faster R-CNN counterpart.
End-to-end training is simpler.
All learnable layers are convolutional and shared on the entire image, yet encode spatial information required for object detection.