Introductory CNN papers

Source: Internet | Published by: 维生素c 知乎 | Editor: 程序博客网 | Time: 2024/05/18 18:20

Convolutional Networks for Images, Speech, and Time-Series

INTRODUCTION

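Multilayer back-propagation networks are well suited to image recognition (see PATTERN RECOGNITION AND NEURAL NETWORKS).
The traditional model of pattern recognition requires:
- hand-designed feature extractor: gathers relevant information from the input image and eliminates irrelevant variabilities
- trainable classifier: categorizes the resulting feature vectors (or strings of symbols)
- standard, fully-connected multilayer networks can be used as the classifier
Drawbacks of applying a fully-connected network directly to the raw input:
- inputs have thousands of variables, so a fully-connected layer needs a very large number of weights;
- the fully-connected architecture ignores the topology of the input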

CONVOLUTIONAL NETWORKS

Three architectural ideas in CNNs (to ensure some degree of shift and distortion invariance):
- local receptive fields
- shared weights (or weight replication)
- sometimes, spatial or temporal subsampling
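To make the effect of local receptive fields and shared weights concrete, here is a minimal sketch that compares parameter counts. The layer sizes (a 32x32 input, 100 hidden units, six 5x5 kernels) are assumptions picked only for illustration, not values taken from the paper.

```python
def fully_connected_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus biases of a fully-connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(n_in_maps: int, n_out_maps: int, kernel_h: int, kernel_w: int) -> int:
    """Weights plus biases of a convolutional layer.

    The count does not depend on the image size, because the same
    kernel weights are reused at every position (shared weights).
    """
    return n_out_maps * (n_in_maps * kernel_h * kernel_w + 1)


# Hypothetical sizes, chosen only for illustration.
dense = fully_connected_params(32 * 32, 100)
conv = conv_params(n_in_maps=1, n_out_maps=6, kernel_h=5, kernel_w=5)
print(dense, conv)
```

The fully-connected count grows with the number of pixels, while the convolutional count only depends on the kernel size and the number of feature maps.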

VARIABLE-SIZE CONVOLUTIONAL NETWORKS, SDNN

A guide to convolution arithmetic for deep learning

Introduction

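The output shape of a convolutional layer depends on the input shape, the choice of kernel shape, zero padding and strides, and the relationship between these properties is not trivial to infer. By contrast, the output size of a fully-connected layer is independent of its input size.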

Discrete convolutions

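The bread and butter of neural networks is the affine transformation: a vector is received as input and multiplied by a matrix to produce an output vector, to which a bias vector is usually added. In other words, a linear transformation plus a translation.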
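A minimal sketch of an affine transformation in plain Python, assuming the matrix is stored as a list of rows. The helper name `affine` and the numbers are illustrative only.

```python
def affine(W, x, b):
    """Return W @ x + b for a matrix W (list of rows), input vector x and bias vector b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


# A 2x3 matrix maps a 3-dimensional input to a 2-dimensional output.
W = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
x = [1.0, 1.0, 2.0]
b = [0.5, -0.5]
y = affine(W, x, b)
print(y)
```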

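Homogeneous coordinates: represent an N-dimensional coordinate using N+1 dimensions, so that an affine transformation can be written as a single matrix multiplication.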
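As a small sketch of the homogeneous-coordinate idea: appending a constant 1 to the input and the bias as an extra column of the matrix folds the translation into one matrix product. The helper names `augment` and `matvec` are assumptions made for this example.

```python
def matvec(M, v):
    """Plain matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def augment(W, b):
    """Build the homogeneous matrix [[W, b], [0, 1]] from W and b."""
    rows = [list(row) + [bi] for row, bi in zip(W, b)]
    rows.append([0.0] * len(W[0]) + [1.0])
    return rows


W = [[2.0, 0.0],
     [0.0, 3.0]]
b = [1.0, -1.0]
x = [4.0, 5.0]

# The input gets a trailing 1; the output keeps it as its last component.
y_h = matvec(augment(W, b), x + [1.0])
print(y_h)
```

Dropping the last component of `y_h` gives the same result as computing `W x + b` directly.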
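Whatever their dimensionality, images can be flattened into a vector before an affine transformation is applied. Data of this kind has the following properties:
- it can be stored as a multi-dimensional array
- it has one or more axes for which ordering matters, e.g. the width axis and the height axis
- one axis, the channel axis, gives access to different views of the data (e.g. the R, G and B channels of a color image)
An affine transformation does not exploit these properties: all axes are treated the same way and the topological information is ignored, which is where discrete convolutions make their grand entrance ٩(๑❛ᴗ❛๑)۶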
discrete convolution:
- a linear transformation that preserves this notion of ordering
- sparse (only a few input units contribute to a given output unit)
- reuses parameters (weight sharing: the same weights are applied at different locations)
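One way to see the sparsity and the parameter reuse is to write a 1-D convolution as an explicit matrix. The sketch below assumes unit stride, no padding, and no kernel flipping (the cross-correlation convention commonly used in deep learning); `conv_matrix_1d` is a hypothetical helper.

```python
def conv_matrix_1d(kernel, input_size):
    """Matrix M such that M @ x equals the 1-D convolution of x with kernel.

    Assumes unit stride, no padding and no kernel flipping.
    """
    k = len(kernel)
    output_size = input_size - k + 1
    return [
        [kernel[col - row] if 0 <= col - row < k else 0 for col in range(input_size)]
        for row in range(output_size)
    ]


M = conv_matrix_1d([1, 2, 3], input_size=5)
for row in M:
    print(row)
```

Most entries of each row are zero (sparse), and every row contains the same three weights shifted by one position (shared).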
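An example of a discrete convolution:
- the original image data is called the input feature map
- the kernel slides across the input feature map; at each position, the product of each kernel element and the input element it overlaps is computed, and the results are summed to give the output at that position
- the final result is the output feature map
- having multiple input feature maps is common, e.g. the different channels of an image. In this case the kernel is three-dimensional (equivalently, one 2-D kernel per input feature map), and the per-map results are summed elementwise
- several different kernels can be used to produce as many output feature maps as desired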
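The steps above can be sketched in plain Python. This is an illustrative implementation under simple assumptions (unit stride, no padding, no kernel flipping); the function name `conv2d` and the toy data are not from the paper.

```python
def conv2d(inputs, kernels):
    """Discrete convolution with unit stride and no padding.

    inputs:  m input feature maps, indexed [m][H][W]
    kernels: n kernels, each with one 2-D slice per input map, indexed [n][m][kh][kw]
    returns: n output feature maps, indexed [n][H - kh + 1][W - kw + 1]
    """
    m, height, width = len(inputs), len(inputs[0]), len(inputs[0][0])
    outputs = []
    for kernel in kernels:
        kh, kw = len(kernel[0]), len(kernel[0][0])
        out = [[0] * (width - kw + 1) for _ in range(height - kh + 1)]
        for r in range(height - kh + 1):
            for c in range(width - kw + 1):
                # Multiply overlapping elements and sum over all input maps.
                out[r][c] = sum(
                    kernel[ch][dr][dc] * inputs[ch][r + dr][c + dc]
                    for ch in range(m)
                    for dr in range(kh)
                    for dc in range(kw)
                )
        outputs.append(out)
    return outputs


# Two 3x3 input feature maps (m = 2) and two 2x2 kernels (n = 2).
inputs = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [
    [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
    [[[0, 1], [1, 0]], [[0, 0], [0, 0]]],
]
outputs = conv2d(inputs, kernels)
print(outputs)
```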
Parameters of a collection of kernels: (n, m, k_1, …, k_N)
- n: number of output feature maps
- m: number of input feature maps
- k_j: kernel size along axis j
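Factors that affect the output size o_j along axis j:
- i_j: input size along axis j
- k_j: kernel size along axis j
- s_j: stride along axis j (distance between two consecutive positions of the kernel)
- p_j: zero padding along axis j (number of zeros concatenated at the beginning and at the end of the axis)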
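These four quantities combine into the general relationship from the convolution arithmetic guide, o = floor((i + 2p - k) / s) + 1, applied independently along each axis. A minimal sketch (the function name is an assumption):

```python
def conv_output_size(i: int, k: int, s: int = 1, p: int = 0) -> int:
    """Output size along one axis: floor((i + 2p - k) / s) + 1."""
    if i + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (i + 2 * p - k) // s + 1


print(conv_output_size(5, 3))            # no padding, unit stride
print(conv_output_size(5, 3, s=2, p=1))  # padding 1, stride 2
print(conv_output_size(6, 3, s=2, p=1))  # the floor discards the leftover position
```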
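Strides constitute a form of subsampling: a stride can be read either as how far the kernel is translated between positions or as how much of the output is retained.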
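The subsampling view can be checked on a small 1-D case: a stride-2 convolution gives the same values as a unit-stride convolution from which every second output is kept. This is an illustrative sketch with made-up data, not code from the guide.

```python
def conv1d(x, kernel, stride=1):
    """1-D convolution without padding or kernel flipping."""
    k = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

dense = conv1d(x, kernel, stride=1)    # every kernel position
strided = conv1d(x, kernel, stride=2)  # every second kernel position
print(dense, strided)
```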

Pooling

Pooling operations reduce the size of feature maps by using some function to summarize subregions, such as taking the average or the maximum value.
Pooling works by sliding a window across the input and feeding the content of the window to a pooling function.
The output size o_j of a pooling layer depends on:
- i_j: input size along axis j
- k_j: pooling window size along axis j
- s_j: stride along axis j
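Since pooling involves no zero padding, the relationship from the guide reduces to o = floor((i - k) / s) + 1 along each axis. A minimal sketch (the function name is an assumption):

```python
def pool_output_size(i: int, k: int, s: int) -> int:
    """Output size of a pooling layer along one axis: floor((i - k) / s) + 1."""
    if k > i:
        raise ValueError("pooling window is larger than the input")
    return (i - k) // s + 1


print(pool_output_size(5, 3, 1))
print(pool_output_size(6, 2, 2))
```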
Examples of average pooling and max pooling (figures not preserved in this copy).
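As a stand-in illustration, here is a minimal sketch of both pooling operations on a 4x4 input with a 2x2 window and stride 2. The helper names `pool2d` and `mean` and the input values are assumptions made for this example.

```python
def pool2d(x, k, s, fn):
    """Slide a k x k window over x with stride s and summarize each window with fn."""
    height, width = len(x), len(x[0])
    return [
        [
            fn([x[r + dr][c + dc] for dr in range(k) for dc in range(k)])
            for c in range(0, width - k + 1, s)
        ]
        for r in range(0, height - k + 1, s)
    ]


def mean(values):
    return sum(values) / len(values)


x = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
max_pooled = pool2d(x, k=2, s=2, fn=max)
avg_pooled = pool2d(x, k=2, s=2, fn=mean)
print(max_pooled)
print(avg_pooled)
```

Only the summarizing function differs between the two; the sliding-window logic is the same.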