1.EdgeDetection_1.1.DeepEdge


2015 CVPR

1. Single-Scale Architecture

(1) The KNet is an appropriate model for our setting because it has been trained over a large number of object classes (the 1000 categories of the ImageNet dataset) and thus captures features that are generic and useful for many object categories.

(2) Its architecture consists of 5 convolutional layers and 3 fully connected layers. We use only the first 5 convolutional layers.

(3) The second convolutional layer seems to encode coherent edge structures. The third convolutional layer fires at locations corresponding to prototypical object shapes. The fourth layer appears to generate high responses for full shapes of the object, whereas the fifth layer fires on the specific object parts.
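
To get a feel for how large the feature maps of the five convolutional layers are, the spatial sizes can be worked out with the usual formula out = (in + 2*pad - kernel) // stride + 1. The sketch below is only an illustration: the kernel, stride, and padding values are assumptions taken from the commonly used AlexNet-style configuration and are not stated in these notes, and `out_size`, `LAYERS`, and `conv_map_sizes` are hypothetical names.

```python
def out_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad): assumed AlexNet-style settings, not from the notes.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
]

def conv_map_sizes(input_size=227):
    """Return the spatial size of each convolutional layer's output."""
    sizes = {}
    size = input_size
    for name, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if name.startswith("conv"):
            sizes[name] = size
    return sizes

sizes = conv_map_sizes()
for name, size in sizes.items():
    print(name, f"{size}x{size}")
```

Under these assumptions the maps shrink from layer to layer, so a fixed-size window covers a larger part of the input image in deeper layers.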

2. Extraction of High-Level Features

(1) We consider a small sub-volume of the feature map stack produced at each layer. The sub-volume is centered at the center of the patch in order to assess the presence of a contour in a small area around the candidate point.

(2) We perform max, average, and center pooling on this sub-volume. We define center pooling as selecting the center-value from each of the feature maps.

(3) Because the candidate point is located at the center of the input patch, center pooling extracts the activation value from the location that corresponds to our candidate point location.
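
The three pooling operations described above can be sketched in a few lines. This is a minimal illustration on plain nested lists, assuming each feature map is a 2D list and the sub-volume is a square window around the spatial center; the helper names `center_subvolume` and `pool_subvolume` are hypothetical.

```python
def center_subvolume(feature_maps, size):
    """Crop a size x size window around the spatial center of every feature map."""
    cropped = []
    for fmap in feature_maps:
        h, w = len(fmap), len(fmap[0])
        top = h // 2 - size // 2
        left = w // 2 - size // 2
        cropped.append([row[left:left + size] for row in fmap[top:top + size]])
    return cropped

def pool_subvolume(subvolume):
    """Return [max, average, center] for each feature map, concatenated."""
    features = []
    for window in subvolume:
        values = [v for row in window for v in row]
        c = len(window) // 2
        features.extend([max(values), sum(values) / len(values), window[c][c]])
    return features

# Toy example: a single 5x5 feature map with entries 0..24.
fmap = [[r * 5 + c for c in range(5)] for r in range(5)]
sub = center_subvolume([fmap], size=3)
feats = pool_subvolume(sub)
print(feats)  # [18, 12.0, 12]
```

With many feature maps per layer, the same loop yields three values per map, which are then concatenated across layers before being passed on.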

3. Bifurcated Sub-Network

(1) We connect the feature maps computed via pooling from the five convolutional layers to two separately-trained network branches. Each branch consists of two fully-connected layers.

(2) The first branch is trained using binary labels to perform contour classification. This branch makes less selective predictions: it only classifies whether a given point is a contour or not.

(3) The second branch is optimized as a regressor to predict the fraction of human labelers agreeing about the contour presence at a particular point. It is trained to learn the structural differences between contours that are marked by different fractions of human labelers.

(4) At testing time, the scalar outputs computed from these two sub-networks are averaged to produce a final score indicative of the probability that the candidate point is a contour.
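
The test-time fusion in (4) is a plain average of the two branch outputs. A minimal sketch, assuming each branch has already produced one scalar per candidate point (the function name `fuse_scores` and the example values are hypothetical):

```python
def fuse_scores(classification_scores, regression_scores):
    """Average the two branch outputs point by point."""
    return [
        0.5 * (c + r)
        for c, r in zip(classification_scores, regression_scores)
    ]

# Made-up outputs of the two branches for three candidate points.
cls_out = [0.75, 1.0, 0.0]
reg_out = [0.25, 0.5, 0.0]
final = fuse_scores(cls_out, reg_out)
print(final)  # [0.5, 0.75, 0.0]
```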

4. Other parts

(1) Loss function: Both branches optimize the sum of squared difference loss over the (binary or continuous) labels.

(2) Training data

Binary labels: we first sample 40000 positive examples that were marked as contours by at least one of the labelers.

Negative examples: we consider the points that were selected as candidate contour points by the Canny edge detector but that have not been marked as contours by any of the human labelers.

Regression labels: the fraction of human labelers that marked the point as a contour.
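
The two label types and the loss can be illustrated together. The sketch below assumes that, for each sampled point, we know how many labelers marked it as a contour; negative examples (Canny candidates that no labeler marked) simply have zero votes. The names `make_labels` and `sum_squared_loss` and all numbers are hypothetical.

```python
def make_labels(votes_per_point, num_labelers):
    """Build binary and regression targets from per-point labeler votes."""
    # Binary label: 1 if at least one labeler marked the point, else 0.
    binary = [1.0 if v > 0 else 0.0 for v in votes_per_point]
    # Regression label: fraction of labelers that marked the point.
    fraction = [v / num_labelers for v in votes_per_point]
    return binary, fraction

def sum_squared_loss(predictions, targets):
    """Sum of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))

# Four points annotated by 5 labelers; the first one is a negative example.
votes = [0, 1, 3, 5]
binary, fraction = make_labels(votes, num_labelers=5)
print(binary)    # [0.0, 1.0, 1.0, 1.0]
print(fraction)  # [0.0, 0.2, 0.6, 1.0]

# Made-up regression branch outputs for the same four points.
preds = [0.0, 0.5, 0.5, 1.0]
loss = sum_squared_loss(preds, fraction)
```

The same loss function applies to both branches; only the targets differ (`binary` for the classification branch, `fraction` for the regression branch).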

5. Multi-Scale Architecture

(1) We extract patches around the candidate point for different patch sizes so that they cover different spatial extents of the image. We then resize the patches to fit the KNet input and pump them in parallel through the five convolutional layers.

(2) The sizes of patches are 64*64, 128*128, 196*196 and a full-sized image. All of the patches are then resized to the KNet input dimensions of 227*227.

(3) We use sub-volumes of convolutional feature maps having spatial sizes 7*7, 5*5, 3*3, 3*3, and 3*3 for the convolutional layers 1, 2, 3, 4, and 5, respectively. Our choice of sub-volume sizes is made to ensure we are roughly considering the same spatial extent of the original image at each layer.
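
The multi-scale input preparation can be sketched as follows. This is an illustration only: the notes do not say how patches near the image border are handled or which interpolation is used for resizing, so the sketch assumes border replication and nearest-neighbor resizing. The helper names `crop_patch`, `resize_nearest`, and `multiscale_patches` are hypothetical, and the image is a grayscale nested list.

```python
def crop_patch(image, cy, cx, size):
    """Crop a size x size patch centered at (cy, cx), replicating border pixels."""
    h, w = len(image), len(image[0])
    half = size // 2
    rows = [min(max(cy - half + i, 0), h - 1) for i in range(size)]
    cols = [min(max(cx - half + j, 0), w - 1) for j in range(size)]
    return [[image[r][c] for c in cols] for r in rows]

def resize_nearest(patch, out_size):
    """Resize a 2D patch to out_size x out_size with nearest-neighbor sampling."""
    h, w = len(patch), len(patch[0])
    return [
        [patch[i * h // out_size][j * w // out_size] for j in range(out_size)]
        for i in range(out_size)
    ]

def multiscale_patches(image, cy, cx, patch_sizes=(64, 128, 196), out_size=227):
    """Return one network input per scale: three patches plus the full image."""
    patches = [crop_patch(image, cy, cx, s) for s in patch_sizes]
    patches.append(image)  # the full-sized image is the coarsest scale
    return [resize_nearest(p, out_size) for p in patches]

# Toy 200x300 image whose pixel value encodes its own (row, column) position.
image = [[r * 1000 + c for c in range(300)] for r in range(200)]
inputs = multiscale_patches(image, cy=100, cx=150)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 4 227 227
```

Each of the four resized inputs would then be passed through the five convolutional layers, and the pooled sub-volume features from all scales are combined.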

0 0