[TensorFlow-windows] (5) Recognizing CIFAR-10 with a CNN (convolutional neural network)

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 07:26

Contents:
1. CIFAR-10 recognition with a CNN (code with detailed comments)
2. Summary of the functions used in this implementation

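Platform:
1. Windows 10, 64-bit
2. Anaconda3-4.2.0-Windows-x86_64.exe (at the time TF did not yet support Python 3.6, and I did not want to configure several Python environments under a newer Anaconda, so I installed 3-4.2.0, which ships Python 3.5 by default. I recommend installing the latest Anaconda3, since TF 1.2.0 already supports Python 3.6.)
3. TensorFlow 1.1.0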

For an introduction to CNNs, see:
https://en.wikipedia.org/wiki/Convolutional_neural_network
http://cs231n.github.io/convolutional-networks/

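New concept: weight decay. It acts as an L2-norm penalty on large weights, and it can reduce overfitting.
Weight decay is implemented by adding a weight loss (a loss on the weights) to the original loss, so that the optimizer also takes the size of the weights into account.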
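To make the idea concrete, here is a minimal pure-Python sketch of "original loss plus weight loss". The function names and the numbers are hypothetical and only for illustration; the half-sum-of-squares form mirrors what tf.nn.l2_loss computes.

```python
def l2_loss(weights):
    """Half the sum of squares, the same quantity tf.nn.l2_loss computes."""
    return sum(w * w for w in weights) / 2.0


def total_loss(data_loss, weights, wl):
    """Original loss plus the weight loss scaled by the coefficient wl."""
    return data_loss + wl * l2_loss(weights)


# Hypothetical numbers, chosen only for illustration.
small = total_loss(data_loss=1.0, weights=[0.1, -0.2, 0.1], wl=0.04)
large = total_loss(data_loss=1.0, weights=[3.0, -4.0, 2.0], wl=0.04)
print(small, large)  # the larger weights yield the larger total loss
```

With the same data loss, the configuration with larger weights is penalized more, which is what pushes the optimizer toward smaller weights.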

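Data augmentation: it appeared early on in AlexNet; see the paper "ImageNet Classification with Deep Convolutional Neural Networks". The augmentation used here randomly crops each 32*32 image to 24*24, then applies a random horizontal flip and random brightness and contrast changes. See the code for details.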
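The crop and flip steps can be sketched in plain Python as follows. This is only an illustration on a fake single-channel image stored as a list of rows; the helper names are hypothetical, and the brightness and contrast changes that cifar10_input also applies are left out.

```python
import random


def random_crop(image, size, rng):
    """Cut a size x size window at a random position from a 2-D image (list of rows)."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip_left_right(image, rng):
    """Mirror the image horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# A fake 32x32 single-channel image whose pixel value encodes its position.
image = [[r * 32 + c for c in range(32)] for r in range(32)]
augmented = random_flip_left_right(random_crop(image, 24, rng), rng)
print(len(augmented), len(augmented[0]))  # 24 24
```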

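LRN (Local Response Normalization): this also comes from the paper "ImageNet Classification with Deep Convolutional Neural Networks"; the 2012 AlexNet paper really was a historic moment for CNNs. The paper describes LRN as a form of lateral inhibition: each activation is divided by a term that grows with the squared activations of neighboring channels at the same spatial position, so that strong responses stand out relative to their neighbors. (My own reading was that it pulls the outputs back toward the linear region and so eases vanishing gradients; I am not sure that is correct.)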

The CNN structure used here is: input layer - Conv1 - Pool1 - LRN1 - Conv2 - LRN2 - Pool2 - FC1 (with weight decay) - FC2 (with weight decay) - softmax (output layer)

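A few notes before the code. Downloading the CIFAR-10 data (cifar10.py) and reading and preprocessing it (cifar10_input.py) rely on two modules that have to be downloaded separately. To get them, open a command window and run: git clone https://github.com/tensorflow/models.git
This downloads the repository into the current folder. Find the cifar10 folder inside it; it contains cifar10.py and cifar10_input.py, which are the modules the code imports. If you have already downloaded the CIFAR-10 data yourself, cifar10 is not needed, and I have commented out its call in the code.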

The CNN model used here, in short:
conv1 - pool1 - LRN1 - conv2 - LRN2 - pool2 - FC1 - FC2 - output
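As a rough check of this structure, the sketch below traces the spatial size through the layers, assuming the settings used in the code of this post (24x24 cropped input, 'SAME' padding everywhere, stride 1 for the convolutions, stride 2 for the pooling layers, 64 output channels). The helper name is hypothetical. LRN does not change the shape, so it is not listed.

```python
import math


def same_output_size(in_size, stride):
    """Spatial output size of a 'SAME'-padded conv or pool: ceil(in / stride)."""
    return math.ceil(in_size / stride)


size = 24                            # cropped input is 24x24x3
size = same_output_size(size, 1)     # conv1, stride 1 -> 24
size = same_output_size(size, 2)     # pool1, stride 2 -> 12
size = same_output_size(size, 1)     # conv2, stride 1 -> 12
size = same_output_size(size, 2)     # pool2, stride 2 -> 6
flat_dim = size * size * 64          # 64 channels after conv2
print(size, flat_dim)                # 6 2304
```

Under these assumptions the flattened vector fed into FC1 has 2304 elements, so the first fully connected weight matrix would be 2304 x 384.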

Code:

import math
import time

import numpy as np
import tensorflow as tf
# cifar10.py and cifar10_input.py have to be downloaded from GitHub
from tensorflow.examples.tutorials.cifar10 import cifar10, cifar10_input

# Parameters
max_steps = 1000
batch_size = 128
data_dir = 'F:\\tensorflow_xuexi\\cifar-10-batches-bin'  # change to your own data location
print(data_dir)


# Creates a weight variable and optionally attaches an L2 penalty to it.
# L2 regularization keeps the weights from growing too large and spreads the
# weight more evenly across features, which helps prevent overfitting.
# wl is short for "weight loss" and is the coefficient of the penalty.
def variable_with_weight_loss(shape, stddev, wl):
    var = tf.Variable(tf.truncated_normal(shape, stddev=stddev))
    if wl is not None:
        weight_loss = tf.multiply(tf.nn.l2_loss(var), wl, name='weight_loss')
        tf.add_to_collection('losses', weight_loss)  # summed into the total loss later
    return var


# Download (if not yet downloaded) and extract the data set.
# Not needed if you have already downloaded the files yourself.
# cifar10.maybe_download_and_extract()

# distorted_inputs returns the training data with data augmentation applied
# (horizontal flip, random 24*24 crop, random brightness and contrast) and
# standardized (subtract the mean, divide by the standard deviation, giving
# zero mean and unit variance). Each call yields one batch of images and labels.
images_train, labels_train = cifar10_input.distorted_inputs(
    data_dir=data_dir, batch_size=batch_size)
# Test data
images_test, labels_test = cifar10_input.inputs(
    eval_data=True, data_dir=data_dir, batch_size=batch_size)

# Placeholders: the inputs of the computation graph
image_holder = tf.placeholder(tf.float32, [batch_size, 24, 24, 3])
label_holder = tf.placeholder(tf.int32, [batch_size])

#--------------- Step 1/4: define the model -------------------
# Block 1: conv layer 1, pooling layer 1, then LRN after pooling.
# wl = 0 means no L2 regularization on the weights of the first conv layer.
weight1 = variable_with_weight_loss(shape=[5, 5, 3, 64], stddev=5e-2, wl=0.0)
kernel1 = tf.nn.conv2d(image_holder, weight1, [1, 1, 1, 1], padding='SAME')
bias1 = tf.Variable(tf.constant(0.0, shape=[64]))
conv1 = tf.nn.relu(tf.nn.bias_add(kernel1, bias1))
pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)

# Block 2: conv layer 2, LRN first, then pooling layer 2
weight2 = variable_with_weight_loss(shape=[5, 5, 64, 64], stddev=5e-2, wl=0.0)
kernel2 = tf.nn.conv2d(norm1, weight2, [1, 1, 1, 1], padding='SAME')
bias2 = tf.Variable(tf.constant(0.1, shape=[64]))
conv2 = tf.nn.relu(tf.nn.bias_add(kernel2, bias2))
norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75)
pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')

# Fully connected layer 1: first flatten the previous output into a vector.
# The weights are L2-regularized here, with a weight loss coefficient of 0.04.
reshape = tf.reshape(pool2, [batch_size, -1])
dim = reshape.get_shape()[1].value  # length of the flattened vector
weight3 = variable_with_weight_loss(shape=[dim, 384], stddev=0.04, wl=0.04)
bias3 = tf.Variable(tf.constant(0.1, shape=[384]))
local3 = tf.nn.relu(tf.matmul(reshape, weight3) + bias3)

# Fully connected layer 2
weight4 = variable_with_weight_loss(shape=[384, 192], stddev=0.04, wl=0.04)
bias4 = tf.Variable(tf.constant(0.1, shape=[192]))
local4 = tf.nn.relu(tf.matmul(local3, weight4) + bias4)

# Output layer. No softmax here; it is folded into the loss computation.
weight5 = variable_with_weight_loss(shape=[192, 10], stddev=1 / 192.0, wl=0.0)
bias5 = tf.Variable(tf.constant(0.0, shape=[10]))
logits = tf.add(tf.matmul(local4, weight5), bias5)


#--------------- Step 2/4: define the loss and the optimizer -------------------
def loss(logits, labels):
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=labels, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)
    return tf.add_n(tf.get_collection('losses'), name='total_loss')


total_loss = loss(logits, label_holder)  # separate name so loss() is not shadowed
train_op = tf.train.AdamOptimizer(1e-3).minimize(total_loss)
top_k_op = tf.nn.in_top_k(logits, label_holder, 1)  # top-1 accuracy

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Start the queue threads that perform the data augmentation (16 threads for speed)
tf.train.start_queue_runners()

#--------------- Step 3/4: training -------------------
for step in range(max_steps):
    start_time = time.time()
    image_batch, label_batch = sess.run([images_train, labels_train])
    # train_op must be run together with the loss; otherwise the weights are never updated
    _, loss_value = sess.run([train_op, total_loss],
                             feed_dict={image_holder: image_batch,
                                        label_holder: label_batch})
    duration = time.time() - start_time
    if step % 10 == 0:
        examples_per_sec = batch_size / duration
        sec_per_batch = float(duration)
        format_str = 'step %d, loss = %.2f (%.1f examples/sec; %.3f sec/batch)'
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

#--------------- Step 4/4: evaluate the model on the test set -------------------
num_examples = 10000
num_iter = int(math.ceil(num_examples / batch_size))
true_count = 0
total_sample_count = num_iter * batch_size
step = 0
while step < num_iter:
    image_batch, label_batch = sess.run([images_test, labels_test])
    predictions = sess.run([top_k_op], feed_dict={image_holder: image_batch,
                                                  label_holder: label_batch})
    true_count += np.sum(predictions)
    step += 1
precision = true_count / total_sample_count
print('precision @ 1 = %.3f' % precision)
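The evaluation step counts how many examples have their true label as the top-scoring class. A minimal pure-Python sketch of that check is below; the function name and the logits are hypothetical, and it only mimics the k = 1 case of tf.nn.in_top_k (a tie with the maximum counts as a hit).

```python
def in_top_1(logits_batch, labels):
    """For each example, True when the label's logit equals the largest logit."""
    return [logits[label] >= max(logits)
            for logits, label in zip(logits_batch, labels)]


# Hypothetical logits for 4 examples and 3 classes.
logits_batch = [
    [2.0, 0.5, 0.1],
    [0.2, 0.3, 1.5],
    [1.0, 3.0, 0.0],
    [0.9, 0.1, 0.4],
]
labels = [0, 2, 0, 0]

hits = in_top_1(logits_batch, labels)
precision = sum(hits) / len(hits)
print(hits, precision)  # [True, True, False, True] 0.75
```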

Function summary (continued from the previous post)

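sess = tf.InteractiveSession() registers sess as the default session.
tf.placeholder(): a placeholder is where input data enters the graph. Put simply, it provides an entry point for the input data (the images x in that example) and the ground-truth labels (y_). (This is my own understanding and may not be fully accurate; I will come back and revise it once I know TF better.)
tf.Variable(): a Variable stores model parameters. Unlike an ordinary tensor that holds data, which is gone once it has been used, a Variable persists.
tf.matmul(): matrix multiplication.
tf.reduce_mean and tf.reduce_sum compute the mean and the sum while reducing dimensions.
tf.argmax() returns the index of the largest element of a tensor; in that example it is used to determine the class.
tf.cast() converts between data types.
-------------------- Divider (1) --------------------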

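tf.random_uniform generates uniformly distributed random numbers.
tf.train.AdamOptimizer() creates an optimizer that uses Adam (adaptive moment estimation). Adam adjusts the learning rate of each parameter dynamically, based on estimates of the first and second moments of the gradient of the loss with respect to that parameter.
tf.placeholder is a "placeholder"; every input to the network has to be declared with this function.
tf.random_normal generates normally distributed random numbers.
tf.add and tf.matmul add and multiply data.
tf.reduce_sum sums while reducing dimensions.
tf.pow is the power function.
tf.subtract subtracts data.
tf.global_variables_initializer creates the op that initializes all global variables.
tf.Session creates a session.
tf.Variable creates a variable that stores model parameters, as opposed to the model's input data.
tf.train.AdamOptimizer(learning_rate = 0.001) optimizes with Adam at a learning rate of 0.001.
-------------------- Divider (2) --------------------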
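To illustrate the "first and second moment estimates" mentioned for Adam, here is a minimal single-parameter sketch that follows the update rule as it is usually written in the Adam paper. The function name is hypothetical, the default hyperparameters are the commonly quoted ones, and TensorFlow's internal implementation arranges the same computation slightly differently.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize f(x) = x^2 starting from x = 1.0; the gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
x, m, v = adam_step(x, 2 * x, m, v, t=1)
first_step_x = x
for t in range(2, 101):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(first_step_x, x)
```

On the first step the parameter moves by almost exactly the learning rate, regardless of the gradient's magnitude, because the gradient is divided by the square root of its own squared value.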
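1. hidden1_drop = tf.nn.dropout(hidden1, keep_prob) adds dropout to the hidden1 layer and returns the new layer hidden1_drop. keep_prob is the probability of keeping each unit, not the fraction that is dropped.
2. mnist.train.next_batch(): in one sentence, it shuffles the samples and then reads batch_size samples in order and returns them.
For details, read the function definition and its comments in mnist.py under tensorflow\contrib\learn\python\learn\datasets.
-------------------- Divider (3) --------------------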
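The shuffle-then-read-in-order behavior can be sketched as below. This is a simplified, hypothetical stand-in (the class name and the end-of-epoch handling are assumptions), not the actual mnist.py code: when the current epoch cannot supply a full batch, it simply reshuffles and starts over.

```python
import random


class MiniBatcher:
    """Hypothetical stand-in for the shuffle-then-slice behavior of next_batch."""

    def __init__(self, samples, seed=0):
        self._samples = list(samples)
        self._rng = random.Random(seed)
        self._rng.shuffle(self._samples)   # shuffle once before the first epoch
        self._index = 0

    def next_batch(self, batch_size):
        # If the current epoch cannot supply a full batch, reshuffle and restart.
        if self._index + batch_size > len(self._samples):
            self._rng.shuffle(self._samples)
            self._index = 0
        start = self._index
        self._index += batch_size
        return self._samples[start:self._index]


batcher = MiniBatcher(range(10))
first = batcher.next_batch(4)
second = batcher.next_batch(4)
third = batcher.next_batch(4)   # only 2 samples left, so a new epoch starts
print(first, second, third)
```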
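1. tf.nn.conv2d(x, W, strides = [1, 1, 1, 1], padding = 'SAME'): the key things to understand about this function are strides and padding. First, x is the input and W is the convolution kernel, and both are 4-D tensors (strides has four elements, one for each dimension of the input).
The kernel W is a 4-D tensor whose dimensions are [filter_height, filter_width, in_channels, out_channels].

The input x is a 4-D tensor whose dimensions are [batch, in_height, in_width, in_channels].

Each element of strides is the step along the corresponding dimension of x. Dimensions 2 and 3 are the image height and width, so the usual stride is set there; dimensions 1 and 4 (batch and channels) are normally not strided, so they are 1.

padding takes one of two values: padding='VALID' or padding='SAME'.
VALID: no padding is added, and positions where the kernel would run past the edge are dropped. For example, with an input of length 13, a kernel of length 6 and a stride of 5, the kernel first covers pixels 1-6 and, after one step, pixels 6-11. Another step would need pixels 11-16, which do not all exist, so the kernel does not move and pixels 12 and 13 are discarded.
SAME: the input is zero-padded at the borders so that the output size is ceil(input size / stride). With a stride of 1 the output has the same size as the input, hence the name.
-------------------- Divider (4) --------------------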
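The output sizes for the two padding modes can be worked out with a few lines of arithmetic. The sketch below uses hypothetical helper names and assumes the input is at least as long as the kernel; it reproduces the length-13, kernel-6, stride-5 example.

```python
import math


def conv_output_size(in_size, kernel, stride, padding):
    """Number of kernel positions along one dimension."""
    if padding == 'VALID':
        return (in_size - kernel) // stride + 1
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    raise ValueError('padding must be "VALID" or "SAME"')


def same_total_padding(in_size, kernel, stride):
    """Total number of zero pixels 'SAME' adds along one dimension."""
    out_size = math.ceil(in_size / stride)
    return max((out_size - 1) * stride + kernel - in_size, 0)


print(conv_output_size(13, 6, 5, 'VALID'))  # 2 positions: pixels 1-6 and 6-11
print(conv_output_size(13, 6, 5, 'SAME'))   # 3 positions after zero padding
print(same_total_padding(13, 6, 5))         # 3 zero pixels in total
```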
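1. tf.nn.l2_loss() computes half the sum of squares of a Tensor, sum(t ** 2) / 2, that is, the squared L2 norm divided by 2 with no square root taken. Here it is applied to the weights.
2. tf.add_to_collection('losses', weight_loss) adds weight_loss to the collection named losses.
3. tf.nn.lrn performs local response normalization. Here is the function signature: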

lrn(input, depth_radius=None, bias=None, alpha=None, beta=None,name=None)

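The function takes 6 arguments. The first is the input tensor, and arguments 2-5 are the LRN parameters: depth_radius is the half-width of the window along the depth (channel) dimension, and bias, alpha and beta are the remaining parameters. The pseudocode makes this clear: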

      sqr_sum[a, b, c, d] = sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
      output = input / (bias + alpha * sqr_sum) ** beta

Arguments 2-5 have default values: 5, 1, 1 and 0.5 respectively.
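The pseudocode can be turned into a small runnable sketch for the channel vector of a single pixel. The function below is a hypothetical pure-Python illustration, not TensorFlow's implementation; it clips the window at the ends of the channel axis, and the activations are made up.

```python
def lrn(channels, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5):
    """Normalize one pixel's channel vector following the LRN pseudocode."""
    output = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)                   # window clipped at the edges
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(x * x for x in channels[lo:hi])
        output.append(value / (bias + alpha * sqr_sum) ** beta)
    return output


# Hypothetical activations of 4 channels at one spatial position.
activations = [1.0, 2.0, 3.0, 4.0]
normalized = lrn(activations, depth_radius=1, bias=1.0, alpha=1.0, beta=0.5)
print(normalized)
```

For channel 0 the window covers channels 0 and 1, so the value becomes 1 / sqrt(1 + 1 + 4); each activation is scaled down more when its neighbors are large.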

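4. tf.nn.sparse_softmax_cross_entropy_with_logits() computes the cross-entropy. For a batch it returns a vector with one value per example, so it has to be combined with tf.reduce_mean() to obtain the final scalar loss.

5. tf.get_collection('losses') retrieves everything stored in a collection and returns it as a list. Here it fetches all entries of losses; the weight_loss terms and the output-layer loss were added to losses earlier, and this call takes them back out.

6. tf.add_n adds up all the tensors in a list (Adds all input tensors element-wise.). Its input is a list (inputs: A list of Tensor objects, each with same shape and type.).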
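How these pieces combine into the total loss can be sketched in plain Python. The function name, the logits and the two weight-loss values below are made up for illustration; the list stands in for the losses collection.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


# Hypothetical batch of 2 examples with 3 classes each.
logits_batch = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [0, 1]

per_example = [sparse_softmax_cross_entropy(row, y)
               for row, y in zip(logits_batch, labels)]     # one value per example
cross_entropy_mean = sum(per_example) / len(per_example)   # role of tf.reduce_mean

losses = [0.0012, 0.0030]          # stand-ins for the collected weight losses
losses.append(cross_entropy_mean)  # role of tf.add_to_collection
total = sum(losses)                # role of tf.add_n on scalar losses
print(per_example, total)
```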
