Network Model: Densely Connected Convolutional Networks


Densely Connected Convolutional Networks, CVPR 2017 best paper

Code: https://github.com/liuzhuang13/DenseNet

This work is inspired by ResNet and Highway Networks, which bypass signal from one layer to the next via identity connections. DenseNet essentially adds many more of these identity connections, and it turns out this simple change works very well.

First, let's see how a 5-layer Dense Block is densely connected:
[Figure: connectivity of a 5-layer dense block]

How many connections does the 5-layer block above have? 5×(5+1)/2 = 15.
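With L layers, each layer receives the feature-maps of all preceding layers (including the block input), giving L(L+1)/2 direct connections in total. A trivial Python check of that arithmetic (not from the paper's code):

```python
def num_connections(num_layers: int) -> int:
    """Direct connections in a dense block with `num_layers` layers.

    Layer l receives inputs from all l earlier feature-maps,
    so the total is 1 + 2 + ... + L = L(L+1)/2.
    """
    return num_layers * (num_layers + 1) // 2

print(num_connections(5))  # 15, matching the 5-layer block above
```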

The overall network structure is shown below:
[Figure: overall DenseNet architecture with multiple dense blocks and transition layers]

  1. DenseNets
    ResNets [11] add a skip-connection that bypasses the non-linear transformations with an identity function:

    x_l = H_l(x_{l-1}) + x_{l-1}

Dense connectivity

x_l = H_l([x_0, x_1, ..., x_{l-1}])

where [x_0, x_1, ..., x_{l-1}] refers to the concatenation of the feature-maps produced in layers 0, ..., l−1.
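A minimal NumPy sketch of this connectivity pattern. The hypothetical `H` stands in for the composite function H_l (it just emits k new feature-maps of the same spatial size); the real version would be BN-ReLU-Conv:

```python
import numpy as np

def H(x, k):
    """Placeholder for the composite function H_l (BN -> ReLU -> 3x3 Conv).
    Here it simply produces k new feature-maps of the same spatial size."""
    n, c, h, w = x.shape
    return np.zeros((n, k, h, w), dtype=x.dtype)

def dense_block(x0, num_layers, k):
    """x_l = H_l([x_0, x_1, ..., x_{l-1}]): each layer sees the
    channel-wise concatenation of every earlier feature-map."""
    features = [x0]
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=1)  # concatenate along channels
        features.append(H(concat, k))
    return np.concatenate(features, axis=1)

out = dense_block(np.zeros((1, 16, 8, 8)), num_layers=4, k=12)
print(out.shape)  # (1, 16 + 4*12, 8, 8) = (1, 64, 8, 8)
```

Note how the channel count grows by k at every layer — this is exactly what the growth-rate discussion below quantifies.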

Composite function
H_l is defined as a composite function of three consecutive operations: batch normalization (BN), followed by a rectified linear unit (ReLU) and a 3×3 convolution (Conv).

Pooling layers change the feature-map size, but concatenation requires feature-maps of the same size — so dense connectivity is applied within blocks, with pooling placed in the transition layers between them.

Growth rate: if each function H_l produces k feature-maps as output, we refer to the hyper-parameter k as the growth rate of the network.


Bottleneck layers: although each layer only produces k output feature-maps, it typically has many more inputs. It has been noted in [36, 11] that a 1×1 convolution can be introduced as a bottleneck layer before each 3×3 convolution to reduce the number of input feature-maps — here down to 4k. In other words, a 1×1 convolution is added to the definition of H_l: BN-ReLU-Conv(1×1)-BN-ReLU-Conv(3×3).

Why are there so many inputs? If each function H_l produces k feature-maps as output, it follows that the l-th layer has k_0 + k×(l−1) input feature-maps, where k_0 is the number of channels in the input image.
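The input width of each layer can be tabulated directly. A plain-Python sketch of the k_0 + k×(l−1) bookkeeping (the function name is mine, not the paper's):

```python
def input_channels(layer_index: int, k0: int, k: int) -> int:
    """Input feature-maps seen by layer l (1-indexed): the k0 original
    channels plus k new maps from each of the l-1 preceding layers."""
    return k0 + k * (layer_index - 1)

# e.g. with k0 = 16 channels entering the block and growth rate k = 12:
for l in range(1, 6):
    print(l, input_channels(l, k0=16, k=12))
# 1 -> 16, 2 -> 28, 3 -> 40, 4 -> 52, 5 -> 64
```

Even with a small growth rate the input width grows linearly with depth, which is why the bottleneck and compression tricks below matter.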

Compression: to further improve model compactness, we can reduce the number of feature-maps at transition layers. If a dense block outputs m feature-maps, we let the following transition layer generate ⌊θm⌋ output feature-maps, where 0 < θ ≤ 1 is referred to as the compression factor.

  2. Experiments
    Error rates (%) on CIFAR and SVHN datasets:
    [Table: error rates on CIFAR and SVHN]

    DenseNet vs. ResNet, Top-1 error (single model, single-crop):
    [Figure: DenseNet vs. ResNet Top-1 comparison]

The parameter count stays relatively small:
[Figure: parameter-efficiency comparison]

1) Middle: DenseNet-BC requires about 1/3 of the parameters of ResNet to achieve comparable accuracy.
2) Right: training and testing curves of the 1001-layer pre-activation ResNet [12] with more than 10M parameters and a 100-layer DenseNet with only 0.8M parameters.

In short: simply adding more shortcut connections improves accuracy while reducing computation!

Reposted from: http://blog.csdn.net/zhangjunhit/article/details/76060494