DenseNet (Algorithm + Code)


Paper link: Densely Connected Convolutional Networks
Code (GitHub): code

1. Network Structure

(Figure: dense connectivity between layers in a DenseNet)

The network draws on ResNet (which tackles vanishing gradients in very deep networks) and on the Inception modules of GoogLeNet (which tackle network width). As the figure shows, the connectivity is dense. In a traditional network each layer takes only the previous layer's output as input, so an L-layer network has L connections; in a DenseNet every layer receives the outputs of all preceding layers, giving L(L+1)/2 connections (for example, 15 connections for L = 5).
Deepening a network first requires dealing with vanishing gradients, and the remedy here is to make the connections between early and late layers as short as possible. In the figure above, for instance, H4 can use the original input X0 directly, as well as the features that every earlier layer computed from X0, which maximizes the flow of information. During back-propagation, the gradient arriving at X0 contains the derivative of the loss taken directly with respect to X0, which helps gradients propagate.
Several advantages of DenseNet:
1. Alleviates the vanishing-gradient problem
2. Strengthens feature propagation
3. Uses features more effectively (encourages feature reuse)
4. Reduces the number of parameters to some extent

2. Dense Block Structure

As shown below, each layer implements a nonlinear transformation H_l(·), which can be Batch Normalization (BN), rectified linear units (ReLU), Pooling, or Convolution (Conv). The output of layer l is denoted x_l.
(Figure: dense block)
For ResNet: x_l = H_l(x_{l-1}) + x_{l-1}. The benefit of this design is that the gradient flows directly through the identity function from later layers back to earlier layers.
At the same time, because the identity mapping and the output of H_l are combined by summation, information flow through the network can be impeded. Inspired by GoogLeNet, DenseNet combines them by concatenation instead: x_l = H_l([x_0, x_1, ..., x_{l-1}]).
Here H_l(·) is a composite function of three consecutive operations: BN → ReLU → Conv(3×3).
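As a minimal sketch (not the repository's code), the composite function plus concatenation can be written with the TensorFlow 1.x layers API roughly as follows; the function name dense_layer and the use of tf.layers here are assumptions for illustration:

```python
import tensorflow as tf

def dense_layer(x, growth_rate, is_training, name):
    """One H_l: BN -> ReLU -> Conv(3x3), then concatenate with its own input."""
    with tf.variable_scope(name):
        h = tf.layers.batch_normalization(x, training=is_training)
        h = tf.nn.relu(h)
        # each layer contributes only k = growth_rate new feature maps
        h = tf.layers.conv2d(h, filters=growth_rate, kernel_size=3, padding='same')
        # dense connectivity: the output is [x_0, ..., x_{l-1}, H_l(.)] along channels
        return tf.concat([x, h], axis=3)
```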
Because concatenation requires the feature maps x_0, x_1, ..., x_{l-1} to have the same spatial size, while pooling, which is indispensable, changes that size, the network is divided into blocks as in the figure above, an idea similar to the "convolution stacks" of the VGG model. The paper calls each such block a DenseBlock.
The layers between DenseBlocks are called transition layers and consist of BN → Conv(1×1) → average pooling(2×2).
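Under the same illustrative assumptions (tf.layers API; transition_layer is a made-up name, not the repository's helper), a transition layer looks roughly like this:

```python
def transition_layer(x, output_filters, is_training, name):
    """Between two DenseBlocks: BN -> Conv(1x1) -> average pooling(2x2)."""
    with tf.variable_scope(name):
        h = tf.layers.batch_normalization(x, training=is_training)
        h = tf.layers.conv2d(h, filters=output_filters, kernel_size=1, padding='same')
        # halve the spatial resolution before entering the next DenseBlock
        return tf.layers.average_pooling2d(h, pool_size=2, strides=2)
```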
Growth rate: because each layer's input is the concatenation of all preceding layers' outputs, each layer does not need to produce as many feature maps as in a traditional network. Every H_l(·) outputs k feature maps; this k is the growth rate and controls the "width" of the network (the number of feature-map channels). Layer l therefore receives k(l−1) + k_0 input feature maps, where k_0 is the number of channels of the input image.

Although each layer produces only k outputs, the inputs to later layers still grow large, so bottleneck layers are introduced. Essentially, a 1×1 convolution is inserted to reduce the number of input feature maps; H_l then becomes:
BN → ReLU → Conv(1×1) → BN → ReLU → Conv(3×3)
The paper calls a network with such bottleneck layers DenseNet-B.
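A hedged sketch of the bottleneck version of H_l, again using assumed tf.layers calls and the illustrative name bottleneck_layer; the choice of 4 × growth_rate channels for the 1×1 convolution follows the paper:

```python
def bottleneck_layer(x, growth_rate, is_training, name):
    """DenseNet-B H_l: BN -> ReLU -> Conv(1x1) -> BN -> ReLU -> Conv(3x3)."""
    with tf.variable_scope(name):
        h = tf.layers.batch_normalization(x, training=is_training)
        h = tf.nn.relu(h)
        # the 1x1 conv shrinks the concatenated input before the costly 3x3 conv
        h = tf.layers.conv2d(h, filters=4 * growth_rate, kernel_size=1, padding='same')
        h = tf.layers.batch_normalization(h, training=is_training)
        h = tf.nn.relu(h)
        h = tf.layers.conv2d(h, filters=growth_rate, kernel_size=3, padding='same')
        return tf.concat([x, h], axis=3)
```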
Besides reducing the number of feature maps inside a DenseBlock, further compression can be applied in the transition layers: if a DenseBlock outputs m feature maps, the transition layer that follows produces ⌊θm⌋ outputs, where 0 < θ ≤ 1 (e.g., m = 256 with θ = 0.5 gives 128 outputs). A network with this operation is called DenseNet-C.

A network that uses both bottleneck layers and compression is called DenseNet-BC.
The detailed network configurations:
(Figure: detailed network configurations)

3. Code Analysis

3.1 DenseNet
Network architecture:
conv1 → conv2 → block1 → transition1 → block2 → transition2 → block3 → transition3 → block4 → block3_up → bn_relu_conv → block2_up → bn_relu_conv → block1_up → bn_relu_conv → upsample1 → bn_relu_conv3 → bn_sigmoid_conv

# conv2d, batch_norm_layer, add_layer, add_transition_average, bn_relu_conv and
# upsample are helper functions, and growth_rate, dense_block*_num, ckpt and sess
# are globals, all defined elsewhere in the repository.
def dense_net(image, img_name_index, is_training=True):
    # ----- downsampling path: stem convolutions -----
    with tf.variable_scope('conv1') as scope:
        l = conv2d(image, 3, 16, 3, 1)
        l = batch_norm_layer(l, is_training)
        l = tf.nn.relu(l)
    with tf.variable_scope('conv2') as scope:
        l = conv2d(l, 16, 32, 3, 2)
        l = batch_norm_layer(l, is_training)
        l_fisrt_down = tf.nn.relu(l)
    l = tf.nn.max_pool(l_fisrt_down, [1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
    # ----- four dense blocks, with transition layers after blocks 1-3 -----
    with tf.variable_scope('block1') as scope:
        l = conv2d(l, 32, growth_rate, 3, 1)
        for i in range(dense_block1_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
        block1, l = add_transition_average('transition1', l, is_training,
                                           input_filters=growth_rate * (dense_block1_num + 1),
                                           output_filters=32)
    with tf.variable_scope('block2') as scope:
        l = conv2d(l, 32, growth_rate, 3, 1)
        for i in range(dense_block2_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
        block2, l = add_transition_average('transition2', l, is_training,
                                           input_filters=growth_rate * (1 + dense_block2_num),
                                           output_filters=32)
    with tf.variable_scope('block3') as scope:
        l = conv2d(l, 32, growth_rate, 3, 1)
        for i in range(dense_block3_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
        block3, l = add_transition_average('transition3', l, is_training,
                                           input_filters=growth_rate * (1 + dense_block3_num),
                                           output_filters=32)
    with tf.variable_scope('block4') as scope:
        l = conv2d(l, 32, growth_rate, 3, 1)
        for i in range(dense_block4_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
            # l = add_transition_average('transition4', block4, is_training, input_filters=growth_rate * dense_block4_num,
            #                            output_filters=32)
    # ----- upsampling path: enlarge, concatenate the saved skip features, run a dense block -----
    with tf.variable_scope('block3_up') as scope:
        l = bn_relu_conv(l, is_training, growth_rate * (1 + dense_block4_num), 32, 3, 1, name='bn_relu_conv1')
        l = upsample(l, 32, 32, 3, 2)
        l = tf.concat([l, block3], 3)
        l = bn_relu_conv(l, is_training, 64, growth_rate, 3, 1, name='bn_relu_conv2')
        for i in range(dense_block3_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
    with tf.variable_scope('block2_up') as scope:
        l = bn_relu_conv(l, is_training, growth_rate * (1 + dense_block3_num), 32, 3, 1, name='bn_relu_conv1')
        l = upsample(l, 32, 32, 3, 2)
        l = tf.concat([l, block2], 3)
        l = bn_relu_conv(l, is_training, 64, growth_rate, 3, 1, name='bn_relu_conv2')
        for i in range(dense_block2_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
    with tf.variable_scope('block1_up') as scope:
        l = bn_relu_conv(l, is_training, growth_rate * (1 + dense_block2_num), 32, 3, 1, name='bn_relu_conv1')
        l = upsample(l, 32, 32, 3, 2)
        l = tf.concat([l, block1], 3)
        l = bn_relu_conv(l, is_training, 64, growth_rate, 3, 1, name='bn_relu_conv2')
        for i in range(dense_block1_num):
            l = add_layer('dense_layer.{}'.format(i), l, is_training, input_filters1=growth_rate * (i + 1))
    l = bn_relu_conv(l, is_training, growth_rate * (1 + dense_block1_num), 64, 3, 1, name='bn_relu_conv1')
    with tf.variable_scope('upsample1') as scope:
        l = upsample(l, 64, 64, 3, 2)
        l = bn_relu_conv(l, is_training, 64, 32, 3, 1)
    l = tf.concat([l, l_fisrt_down], 3)
    l = bn_relu_conv(l, is_training, 64, 64, 3, 1, name='bn_relu_conv2')
    with tf.variable_scope('bn_relu_conv3') as scope:
        l = upsample(l, 64, 64, 3, 2)
        l = bn_relu_conv(l, is_training, 64, 16, 3, 1)
    with tf.variable_scope('bn_sigmoid_conv') as scope:
        l = bn_relu_conv(l, is_training, 16, 1, 1, 1)
        image_conv = tf.nn.sigmoid(l)
    # restore checkpoint weights only when processing the first test image
    saver = tf.train.Saver()
    if ckpt and ckpt.model_checkpoint_path:
        if img_name_index == 0:
            saver.restore(sess, ckpt.model_checkpoint_path)
            print('model restored')
    return image_conv

3.2 Using reuse
reuse controls whether an identically named network structure (variable scope) may be created again or must be shared.
The first time, reuse is set to False: nothing has been run yet, the network holds no variables, and during initialization duplicate network structures are not allowed.
From the second time on it is set to True: once the network has been built, the test code runs in a loop, so the same network structure must be allowed to appear again (the existing variables are reused).

if img_name_index == 0:
    output = inference(img_input, img_name_index, is_training=False, scope_reuse=False)
else:
    output = inference(img_input, img_name_index, is_training=False, scope_reuse=True)
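How scope_reuse is typically threaded into graph construction: a minimal sketch, assuming inference is a thin wrapper around dense_net (this wrapper is illustrative, not the repository's actual definition):

```python
def inference(img_input, img_name_index, is_training=False, scope_reuse=False):
    # reuse=False on the first call creates the variables; reuse=True on later
    # calls fetches the already-created variables instead of raising
    # "Variable ... already exists".
    with tf.variable_scope('dense_net', reuse=scope_reuse):
        return dense_net(img_input, img_name_index, is_training=is_training)
```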

3.3 Building the flipped and rotated test set at test time

# merge the six predictions: rotate/flip each one back to the original
# orientation, then sum them
preds_x = np.squeeze(preds_all[0])
preds_x90 = cv2.warpAffine(np.squeeze(preds_all[1], axis=0), M270, (224, 224))
preds_x180 = cv2.warpAffine(np.squeeze(preds_all[2], axis=0), M180, (224, 224))
preds_x270 = cv2.warpAffine(np.squeeze(preds_all[3], axis=0), M90, (224, 224))
preds_xup_down = np.squeeze(preds_all[4][::-1, :, :])
preds_xleft_right = np.squeeze(preds_all[5][:, ::-1, :])
preds = (preds_x + preds_x90 + preds_x180 + preds_x270 + preds_xup_down + preds_xleft_right)
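The snippet above assumes M90, M180 and M270 are affine rotation matrices and preds_all holds the predictions for six augmented copies of one image. A minimal sketch of how such matrices and the six test inputs could be built (the dummy img array and all names here are illustrative, not taken from the repository):

```python
import cv2
import numpy as np

h, w = 224, 224
center = (w / 2, h / 2)
# affine matrices that rotate about the image centre
M90 = cv2.getRotationMatrix2D(center, 90, 1.0)
M180 = cv2.getRotationMatrix2D(center, 180, 1.0)
M270 = cv2.getRotationMatrix2D(center, 270, 1.0)

img = np.zeros((h, w, 3), dtype=np.uint8)  # stand-in for a real 224x224 test image

# the six test-time variants fed to the network for one image
inputs = [
    img,                                # original
    cv2.warpAffine(img, M90, (w, h)),   # rotated 90 degrees
    cv2.warpAffine(img, M180, (w, h)),  # rotated 180 degrees
    cv2.warpAffine(img, M270, (w, h)),  # rotated 270 degrees
    img[::-1, :, :],                    # flipped top-to-bottom
    img[:, ::-1, :],                    # flipped left-to-right
]
```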

3.4 Tips
* Tensors fed into the network are four-dimensional: batch, height, width, feature maps (channels).
* How to "deconv", i.e., how to enlarge a feature map that carries deep features: one option is to zero-pad the small map and convolve it into a larger one, but this easily causes information redundancy and loss. Bilinear interpolation is used instead, as shown in the figure below; a code sketch follows.
(Figure: enlarging a feature map by bilinear interpolation)
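The upsample helper used in the network code above is not shown in this post; a common way to realize such an enlargement in TensorFlow 1.x is bilinear resizing followed by a convolution. The sketch below is an assumption for illustration (the name upsample_bilinear and the BN/ReLU placement are not taken from the repository):

```python
def upsample_bilinear(x, output_filters, is_training, factor=2):
    """Enlarge feature maps by bilinear interpolation, then smooth with a 3x3 conv."""
    shape = tf.shape(x)
    new_size = tf.stack([shape[1] * factor, shape[2] * factor])
    h = tf.image.resize_bilinear(x, new_size)
    h = tf.layers.conv2d(h, filters=output_filters, kernel_size=3, padding='same')
    h = tf.layers.batch_normalization(h, training=is_training)
    return tf.nn.relu(h)
```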
