An In-Depth Look at AlexNet and Its Implementation

Source: Internet. Editor: 程序博客网. Time: 2024/06/03 22:52

# AlexNet

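# 1. Overview of AlexNet (Reference 1)

# AlexNet introduced several then-new techniques and was the first CNN to apply tricks such as ReLU, Dropout, and LRN successfully. It also used GPUs to accelerate computation.

# AlexNet carried the ideas of LeNet forward, applying the basic principles of CNNs to a much deeper and wider network. Its main new techniques are:

# (1) ReLU as the CNN activation function. AlexNet showed that ReLU outperforms Sigmoid in deeper networks and largely avoids the vanishing-gradient problem that Sigmoid suffers from as depth grows. ReLU had been proposed long before, but it was AlexNet that popularized it.

# (2) Dropout during training, which randomly ignores a subset of neurons to avoid overfitting. Dropout is described in a paper of its own, but AlexNet put it into practice and confirmed its effect empirically. In AlexNet, Dropout is applied mainly to the last few fully connected layers.

# (3) Overlapping max pooling. Earlier CNNs commonly used average pooling; AlexNet uses max pooling throughout, which avoids the blurring effect of averaging. AlexNet also makes the stride smaller than the pooling window, so the outputs of the pooling layer overlap, which enriches the features.

# (4) The LRN layer, which creates competition among neighboring neurons: relatively large responses become relatively larger and neurons with weaker responses are suppressed, which improves generalization.

# (5) CUDA-accelerated training, which uses the parallel computing power of GPUs to handle the large number of matrix operations involved in training. AlexNet was trained on two GTX 580 GPUs. A single GTX 580 has only 3GB of memory, which limits the size of the network that can be trained, so the authors split AlexNet across the two GPUs, with each GPU storing the parameters of half the neurons. Because the GPUs can read each other's memory directly without going through host memory, using several GPUs at once is efficient. The design also restricts communication between the GPUs to certain layers, which keeps the communication overhead under control.

# (6) Data augmentation. Regions of 224*224 (and their horizontal mirror images) are cropped at random from the 256*256 original images, which is equivalent to increasing the amount of data by a factor of 2*(256-224)^2=2048. Without augmentation, a CNN with this many parameters would overfit the original data; with it, overfitting is greatly reduced and generalization improves. At prediction time, five crops are taken (the four corners and the center), each is also flipped horizontally, and the predictions for the resulting 10 images are averaged. The AlexNet paper also applies PCA to the RGB values of the images and perturbs the principal components with Gaussian noise of standard deviation 0.1; this trick lowers the error rate by about another 1%.

# Basic Structure of AlexNet (Reference 2)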

# conv1 (Convolution): kernel size 11, stride 4, pad 0, out_layer 96
# lrn
# relu
# pool1 (MAX Pooling): kernel size 3, stride 2, pad 0
# conv2 (Convolution): kernel size 5, stride 1, pad 2, out_layer 256
# lrn
# relu
# pool2 (MAX Pooling): kernel size 3, stride 2, pad 0
# conv3 (Convolution): kernel size 3, stride 1, pad 1, out_layer 384
# conv4 (Convolution): kernel size 3, stride 1, pad 1, out_layer 384
# conv5 (Convolution): kernel size 3, stride 1, pad 1, out_layer 256
# pool5 (MAX Pooling): kernel size 3, stride 2, pad 0
# fc6, relu6, drop6: out 4096
# fc7, relu7, drop7: out 4096
# fc8: out 1000
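The kernel, stride, and pad values in the layer list determine the spatial size of every feature map. The following is a minimal sketch of that arithmetic using the standard formula floor((size + 2*pad - kernel) / stride) + 1. The helper names `out_size` and `trace_sizes` are illustrative only, and the 227*227 input is an assumption: it is the size for which the commonly quoted 55, 27, 13, 6 sequence works out exactly.

```python
def out_size(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad), copied from the layer list
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def trace_sizes(input_size):
    """Return [(layer name, output side length)] for a square input."""
    sizes = []
    size = input_size
    for name, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        sizes.append((name, size))
    return sizes


# Assumed input side of 227 pixels
for name, size in trace_sizes(227):
    print(name, size)
```

With a 224*224 input and no padding, conv1 would produce 54*54 maps instead of 55*55, which is why many implementations feed 227*227 images or pad the first layer.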


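# AlexNet has 8 layers in total: the first 5 are convolutional and the last 3 are fully connected. The paper reports that removing any one of the convolutional layers makes the results noticeably worse. Each layer is described below.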


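# First convolutional layer: the input is nominally a 224*224*3 image (with no padding the 55*55 output below requires an effective input of 227*227*3, since (227-11)/4+1 = 55). The layer uses 96 kernels of shape (96, 11, 11, 3) that slide right or down in steps of 4 pixels, producing a 55*55 response map per kernel. This is followed by response normalization (Local Response Normalization) and pooling. AlexNet was split across two GPUs, which is why the first convolutional layer is drawn in two halves in the architecture figure. Pooling uses pool_size=(3,3) with a stride of 2 pixels, giving 96 feature maps of 27*27.

# Second convolutional layer: 256 kernels (again split across the two GPUs, 128 kernels of 5*5*48 on each), with pad_size (2,2) and a stride of 1 pixel, producing 27*27 maps. LRN is applied, followed by pooling with a 3*3 window and a stride of 2 pixels, giving 256 feature maps of 13*13.

# The third and fourth layers have neither LRN nor pooling, and the fifth layer has only pooling. The third layer uses 384 kernels (3*3*256): pad_size=(1,1) turns the 256*13*13 input into 256*15*15, and kernel_size (3,3) with a stride of 1 pixel gives 384*13*13. The fourth layer uses 384 kernels: pad_size (1,1) gives 384*15*15, and a (3,3) kernel with a stride of 1 pixel gives 384*13*13. The fifth layer uses 256 kernels: pad_size (1,1) gives 384*15*15, kernel_size (3,3) gives 256*13*13, and pool_size (3,3) with a stride of 2 pixels gives 256*6*6.

# Fully connected layers: the first two have 4096 neurons each, and the last feeds a 1000-way softmax (ImageNet).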
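The kernel shapes above also fix the number of trainable parameters. The sketch below counts weights and biases per layer. It assumes the two-GPU layout of the original paper, in which conv2, conv4, and conv5 only see the half of the previous layer's maps that lives on the same GPU (hence the 5*5*48 kernels), while conv3 sees all 256 maps. The helpers `conv_params` and `fc_params` are illustrative names, not part of any library.

```python
def conv_params(kernel, in_channels, out_channels, groups=1):
    """Weights plus biases of a conv layer; groups=2 models the two-GPU split."""
    return kernel * kernel * (in_channels // groups) * out_channels + out_channels


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


PARAMS = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 96, 256, groups=2),   # kernels of 5*5*48
    "conv3": conv_params(3, 256, 384),            # connected to both GPUs
    "conv4": conv_params(3, 384, 384, groups=2),  # kernels of 3*3*192
    "conv5": conv_params(3, 384, 256, groups=2),  # kernels of 3*3*192
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}

total = sum(PARAMS.values())
for name, count in PARAMS.items():
    print(name, count)
print("total", total)
```

Under these assumptions the total comes to roughly 61 million parameters, and the three fully connected layers account for the large majority of them.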

# Local Response Normalisation

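# After applying ReLU, f(x)=max(0,x), the activations are no longer confined to a fixed range the way the outputs of tanh and sigmoid are, so a normalization step is usually applied after ReLU.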
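As a rough illustration of what such a normalization does, here is a pure-Python sketch of local response normalization at a single spatial position. Each activation is divided by (k + alpha * sum of squares over n neighboring channels)^beta. The defaults n=5, k=2, alpha=1e-4, beta=0.75 are the values reported in the AlexNet paper; the function name `lrn` and the list-based interface are assumptions made for this example.

```python
def lrn(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize the activations of one pixel across neighboring channels.

    channels: activations of all channels at one spatial position.
    n: number of adjacent channels in the normalization window.
    """
    num = len(channels)
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        lo = max(0, i - half)
        hi = min(num - 1, i + half)
        sq_sum = sum(v * v for v in channels[lo:hi + 1])
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


activations = [1.0, 100.0, 1.0, 0.5, 0.0]
print(lrn(activations))
```

Note that `tf.nn.lrn`, used in the listings in this post, parameterizes the window by `depth_radius` (window width 2 * depth_radius + 1) rather than by n, so its arguments do not map one-to-one onto the paper's values.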

# Dropout



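# Dropout is another frequently mentioned technique that is fairly effective at preventing overfitting in neural networks. Whereas models such as linear models usually rely on regularization terms to prevent overfitting, Dropout works by modifying the structure of the network itself. For a given layer, neurons are removed at random with a specified probability while the number of neurons in the input and output layers stays unchanged, and the parameters are then updated by the usual learning procedure. In the next iteration a new random set of neurons is removed, and so on until training ends.

# Data Augmentation

# The simplest way to improve model performance and prevent overfitting is to add more data, but there are strategies for doing so: randomly extracting 224*224 patches from the 256*256 images, and extending the dataset with PCA. Both expand the dataset effectively. Many other methods are available depending on the application, such as basic image transformations (increasing or decreasing brightness, applying filters, and so on). This is a particularly effective approach, especially when the amount of data is limited.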
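The crop-and-flip scheme can be sketched in a few lines of plain Python. The example below works only with crop coordinates, not pixel data, and assumes 256*256 source images and 224*224 crops; the function names are illustrative. `count_augmented` reproduces the 2*(256-224)^2 figure, `random_crop_box` picks a training crop, and `ten_crop_boxes` lists the four corners plus the center, each with and without a horizontal flip, as used at prediction time.

```python
import random


def count_augmented(full=256, crop=224):
    """Number of crop/flip variants as usually quoted: 2 * (full - crop)^2."""
    return 2 * (full - crop) ** 2


def random_crop_box(full=256, crop=224, rng=random):
    """Pick a random training crop as ((top, left, bottom, right), flip)."""
    # Valid offsets run from 0 to full - crop inclusive; the quoted 2048
    # figure counts full - crop offsets per axis.
    top = rng.randrange(full - crop + 1)
    left = rng.randrange(full - crop + 1)
    flip = rng.random() < 0.5
    return (top, left, top + crop, left + crop), flip


def ten_crop_boxes(full=256, crop=224):
    """Four corners plus the center, each unflipped and flipped."""
    margin = full - crop
    center = margin // 2
    offsets = [(0, 0), (0, margin), (margin, 0), (margin, margin),
               (center, center)]
    return [((t, l, t + crop, l + crop), flip)
            for flip in (False, True)
            for t, l in offsets]


print(count_augmented())
print(random_crop_box(rng=random.Random(0)))
for box in ten_crop_boxes():
    print(box)
```

At prediction time the network would be run on each of the ten boxes and the ten softmax outputs averaged.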

(This version uses three convolutional layers and two fully connected layers, trained on MNIST. It targets the TensorFlow 1.x API.)

# Input data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./mnist/", one_hot=True)

import tensorflow as tf

# Training hyperparameters
learning_rate = 0.001
training_iters = 200000
batch_size = 64
display_step = 20

# Network parameters
n_input = 784    # input dimension (28 * 28 pixels)
n_classes = 10   # number of label classes
dropout = 0.8    # keep probability passed to tf.nn.dropout

# Input placeholders
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Convolution followed by bias add and ReLU
def conv2d(name, l_input, w, b):
    conv = tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(tf.nn.bias_add(conv, b), name=name)

# Max pooling (downsampling)
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME', name=name)

# Local response normalization
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)

# Define the whole network
def alex_net(_X, _weights, _biases, _dropout):
    # Reshape the flat input vectors into 28x28 single-channel images
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1])

    # Convolution layer
    conv1 = conv2d('conv1', _X, _weights['wc1'], _biases['bc1'])
    # Pooling layer
    pool1 = max_pool('pool1', conv1, k=2)
    # Normalization layer
    norm1 = norm('norm1', pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, _dropout)

    # Convolution
    conv2 = conv2d('conv2', norm1, _weights['wc2'], _biases['bc2'])
    # Pooling
    pool2 = max_pool('pool2', conv2, k=2)
    # Normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, _dropout)

    # Convolution
    conv3 = conv2d('conv3', norm2, _weights['wc3'], _biases['bc3'])
    # Pooling
    pool3 = max_pool('pool3', conv3, k=2)
    # Normalization
    norm3 = norm('norm3', pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, _dropout)

    # Fully connected layer: first flatten the feature maps into vectors
    dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]])
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1')
    # Fully connected layer with ReLU activation
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2')
    # Output layer (logits)
    out = tf.matmul(dense2, _weights['out']) + _biases['out']
    return out

# All network weights (three conv layers, two fully connected layers, output)
weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 128, 256])),
    # 28 -> 14 -> 7 -> 4 after three 2x2 'SAME' poolings, hence 4*4*256 inputs
    'wd1': tf.Variable(tf.random_normal([4*4*256, 1024])),
    'wd2': tf.Variable(tf.random_normal([1024, 1024])),
    'out': tf.Variable(tf.random_normal([1024, 10]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([64])),
    'bc2': tf.Variable(tf.random_normal([128])),
    'bc3': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'bd2': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Build the model
pred = alex_net(x, weights, biases, keep_prob)

# Define the loss function and the training step
# (TensorFlow 1.x requires logits and labels to be passed by keyword)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate the network
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize all variables
init = tf.global_variables_initializer()

# Launch a training session
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        # Fetch a batch of data
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:
            # Compute the accuracy (dropout disabled)
            acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            # Compute the loss (dropout disabled)
            loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print("Iter " + str(step * batch_size) +
                  ", Minibatch Loss= " + "{:.6f}".format(loss) +
                  ", Training Accuracy= " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Compute the test accuracy on the first 256 test images
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                        y: mnist.test.labels[:256],
                                        keep_prob: 1.}))
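The `keep_prob` value fed to `tf.nn.dropout` in the listing above is the probability of keeping a unit, not of dropping it: in TensorFlow 1.x each element is kept with probability `keep_prob` and scaled by 1 / keep_prob, otherwise set to 0. The following pure-Python sketch mimics that behavior on a flat list of values. The function `dropout` here is an illustrative stand-in, not the TensorFlow implementation.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Inverted dropout on a list of activations.

    Each value is kept with probability keep_prob and scaled by
    1 / keep_prob, so the expected sum is unchanged; otherwise it is zeroed.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


rng = random.Random(0)
activations = [1.0] * 10
print(dropout(activations, 0.8, rng))   # training: some units zeroed
print(dropout(activations, 1.0, rng))   # evaluation: values unchanged
```

Because of the 1 / keep_prob scaling, no extra rescaling is needed at evaluation time, which is why the listing simply feeds `keep_prob: 1.` when computing accuracy and loss.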

###########################################################################################################
###########################################################################################################
###########################################################################################################
An implementation with five convolutional layers

# coding=utf-8
import tensorflow as tf
from datetime import datetime
import math
import time

batch_size = 32
num_batches = 100


def print_activations(t):
    """Show the structure of each layer: print the name (t.op.name) and the
    shape (t.get_shape().as_list()) of the tensor produced by a
    convolutional or pooling layer."""
    print(t.op.name, ' ', t.get_shape().as_list())


# Design the AlexNet network structure
def inference(images, keep_prob):
    """Take `images` (assumed 224x224x3) and the dropout keep probability
    `keep_prob`; return the output logits and `parameters`, the list of
    trainable variables of the model."""
    parameters = []

    # First convolutional layer
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64],
                                                 dtype=tf.float32, stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activations(conv1)
        parameters += [kernel, biases]

    # Add an LRN layer and a max-pooling layer after the first layer
    # depth_radius = 4
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn1')
    # padding='VALID': the pooling window may not extend past the border
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool1')
    print_activations(pool1)

    # Second convolutional layer
    # All convolution strides are 1, i.e. every pixel is scanned
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192],
                                         dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
    print_activations(conv2)
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool2')
    print_activations(pool2)

    # Third convolutional layer, without LRN or max pooling
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
                             trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        print_activations(conv3)
        parameters += [kernel, biases]

    # Fourth convolutional layer
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256],
                                         dtype=tf.float32), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        print_activations(conv4)
        parameters += [kernel, biases]

    # Fifth convolutional layer, followed by a pooling layer
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32,
                                                 stddev=1e-1), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, dtype=tf.float32, shape=[256]), name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        print_activations(conv5)
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='VALID', name='pool5')
    print_activations(pool5)

    # Fully connected layers: first flatten the feature maps into vectors
    # (pool5 is 6x6x256 for a 224x224 input)
    dense1 = tf.reshape(pool5, [-1, 6*6*256])

    # 6th layer (assumed initializer, same as the conv kernels)
    with tf.name_scope('fc1') as scope:
        weights = tf.Variable(tf.truncated_normal([6*6*256, 4096], dtype=tf.float32,
                                                  stddev=1e-1), name='weights')
        biases = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32), name='biases')
        dense1 = tf.nn.relu(tf.matmul(dense1, weights) + biases, name=scope)
        dense1 = tf.nn.dropout(dense1, keep_prob)
        parameters += [weights, biases]

    # Fully connected layer, 7th, with ReLU activation
    with tf.name_scope('fc2') as scope:
        weights = tf.Variable(tf.truncated_normal([4096, 4096], dtype=tf.float32,
                                                  stddev=1e-1), name='weights')
        biases = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32), name='biases')
        dense2 = tf.nn.relu(tf.matmul(dense1, weights) + biases, name=scope)
        dense2 = tf.nn.dropout(dense2, keep_prob)
        parameters += [weights, biases]

    # Output layer, 8th: 1000 ImageNet classes, matching fc8
    with tf.name_scope('out'):
        weights = tf.Variable(tf.truncated_normal([4096, 1000], dtype=tf.float32,
                                                  stddev=1e-1), name='weights')
        biases = tf.Variable(tf.constant(0.0, shape=[1000], dtype=tf.float32), name='biases')
        out = tf.matmul(dense2, weights) + biases
        parameters += [weights, biases]

    return out, parameters

# http://lib.csdn.net/article/deeplearning/60535

# http://hacker.duanshishi.com/?p=1661

# http://blog.csdn.net/searobbers_duck/article/details/51645941

# https://kratzert.github.io/2017/02/24/finetuning-alexnet-with-tensorflow.html

0 0