[paper note] Densely Connected Convolutional Networks

  • Code
  • paper

Intuition

  • Current trend of CNN architecture: create short paths from early layers to later layers.
    • ResNet
    • Highway networks: among the first architectures to effectively train networks with more than 100 layers, using bypassing paths
    • Stochastic depth: improves the training of deep residual networks by randomly dropping layers during training, and manages to train a 1202-layer ResNet
    • FractalNets
  • Increasing network width (wider layers) also helps.
  • Connect all layers with each other.
  • Combine features by concatenating them, whereas ResNet combines by summation (see the sketch after this list).
  • DenseNet layers are very narrow (e.g., 12 feature-maps per layer), resulting in fewer parameters.
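
A minimal sketch (assuming PyTorch tensors; shapes are illustrative) contrasting the two ways of combining features:

```python
import torch

# Two feature-maps from different layers: (batch, channels, height, width).
x0 = torch.randn(1, 12, 32, 32)
x1 = torch.randn(1, 12, 32, 32)

# ResNet-style combination: element-wise summation, channel count stays at 12.
summed = x0 + x1
print(summed.shape)        # torch.Size([1, 12, 32, 32])

# DenseNet-style combination: concatenation along the channel dimension,
# so later layers see all preceding feature-maps (12 + 12 = 24 channels).
concatenated = torch.cat([x0, x1], dim=1)
print(concatenated.shape)  # torch.Size([1, 24, 32, 32])
```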

Model

dense block

  • Dense connectivity: concatenate all the preceding layers:
    x_l = H_l([x_0, x_1, ..., x_{l-1}])
  • Composite function:
    H_l is defined as BN + ReLU + 3x3 Conv
  • Pooling and dense block
    • See the figure in the paper showing multiple dense blocks separated by transition layers
    • Transition layers between dense blocks consist of BN + 1x1 Conv + 2x2 average pooling
  • Growth rate k:
    • The number of feature-maps each layer H_l produces.
    • The l-th layer therefore has k x (l-1) + k_0 input feature-maps (k_0: number of channels of the input image)
  • Bottleneck layers
    • Introducing a 1x1 Conv before each 3x3 Conv to reduce the number of input feature-maps improves computational efficiency.
    • H_l is changed to BN + ReLU + 1x1 Conv + BN + ReLU + 3x3 Conv
    • In the experiments, the 1x1 Conv reduces the input to 4k feature-maps.
  • Compression
    • Reduce the number of feature-maps in transition layers by a factor θ (0 < θ ≤ 1; the paper's DenseNet-C uses θ = 0.5). A code sketch of a full dense block and transition layer follows this list.
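
A hedged PyTorch sketch of these pieces (class names and the `growth_rate`/`theta` arguments are my own; this paraphrases the paper's description and is not the official code):

```python
import torch
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """H_l for DenseNet-B: BN + ReLU + 1x1 Conv (to 4k maps) + BN + ReLU + 3x3 Conv (to k maps)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(4 * growth_rate)
        self.conv2 = nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        # Dense connectivity: concatenate the new k feature-maps with all preceding ones.
        return torch.cat([x, out], dim=1)

class DenseBlock(nn.Module):
    """num_layers bottleneck layers; the l-th layer sees k_0 + k*(l-1) input feature-maps."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        layers = [BottleneckLayer(in_channels + l * growth_rate, growth_rate)
                  for l in range(num_layers)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class Transition(nn.Module):
    """Transition layer: BN + 1x1 Conv (compression by factor theta) + 2x2 average pooling."""
    def __init__(self, in_channels, theta=0.5):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, int(in_channels * theta), kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x)))

# Example: a block of 6 layers with k=12 on 24 input channels produces
# 24 + 6*12 = 96 feature-maps; the transition with theta=0.5 halves them to 48.
x = torch.randn(1, 24, 32, 32)
block = DenseBlock(num_layers=6, in_channels=24, growth_rate=12)
trans = Transition(96, theta=0.5)
print(trans(block(x)).shape)  # torch.Size([1, 48, 16, 16])
```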

Experiment

  • Datasets
    • CIFAR-10/100, 32x32
      • Zero-padded with 4 pixels on each side
      • Randomly cropped to again produce 32×32 images
      • Half of the images are then horizontally mirrored
    • SVHN (Street View House Numbers), 32x32
    • ImageNet, 224x224, 1.2M images for training, 50,000 for validation, 1,000 classes
  • Settings: weight decay 1e-4, Nesterov momentum 0.9 without dampening, initial learning rate 0.1 with a step decay scheme, dropout when no data augmentation is used (see the optimizer sketch after this list)
  • Error rates (lower is better):
    • 3.46% on CIFAR-10, L=190, k=40
    • 17.18% on CIFAR-100, L=190, k=40
    • 1.59% on SVHN, L=100, k=24
  • Capacity: performance keeps improving as L and k increase, suggesting that larger DenseNets use the added capacity without overfitting.
  • Parameter efficiency: DenseNet-BC reaches accuracy comparable to ResNets with far fewer parameters.
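
A minimal optimizer/schedule sketch for these settings (assuming PyTorch; `model` is a stand-in, and the 300-epoch CIFAR schedule with the learning rate divided by 10 at 50% and 75% of training follows the paper):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in; substitute an actual DenseNet model here.
model = nn.Conv2d(3, 10, kernel_size=3)

# SGD with lr 0.1, Nesterov momentum 0.9 (no dampening), weight decay 1e-4.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    nesterov=True,
    dampening=0,        # Nesterov momentum requires zero dampening
    weight_decay=1e-4,
)

# Decay scheme for CIFAR: divide the learning rate by 10 at 50% and 75% of 300 epochs.
epochs = 300
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, epochs * 3 // 4], gamma=0.1
)
```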