A First Look at MNIST

Source: Internet; Editor: 程序博客网; Time: 2024/06/14 12:37

These are more or less a beginner's notes on the field of data mining.

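The code comes mainly from TensorFlow's open-source implementation. Following pp. 80-83 of 《TensorFlow 实战》, it builds a convolutional neural network out of two convolutional layers plus one fully connected layer.
When I first started I was probably missing too many fundamentals, so all I could do was keep digging through the docs and looking things up. Along the way I did pick up some of the usual patterns in neural network design.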

First we load the MNIST data. Two things to note here are the one_hot representation and the way sess is created.

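Each MNIST label is a digit from 0 to 9 describing the digit shown in the corresponding image. For this tutorial we want the labels as "one-hot vectors". A one-hot vector is 0 in every dimension except one, where it is 1. So here the digit n is represented as a 10-dimensional vector whose only 1 is in dimension n (counting from 0). For example, the label 0 is represented as [1,0,0,0,0,0,0,0,0,0]. With the default split in read_data_sets, which sets aside 5,000 of the 60,000 training images for validation, mnist.train.labels is therefore a [55000, 10] matrix.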
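As a minimal sketch of the one-hot idea, independent of TensorFlow: the helper names `one_hot` and `one_hot_to_label` are made up for this illustration and are not part of the MNIST loader.

```python
def one_hot(label, depth=10):
    """Return a list of length `depth` with 1.0 at index `label`, 0.0 elsewhere."""
    if not 0 <= label < depth:
        raise ValueError("label must be in [0, depth)")
    vec = [0.0] * depth
    vec[label] = 1.0
    return vec


def one_hot_to_label(vec):
    """Recover the digit by finding the position of the largest entry."""
    return vec.index(max(vec))


print(one_hot(0))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(one_hot_to_label(one_hot(7)))  # 7
```

Going back from a vector to a digit is just an argmax, which is the same trick the accuracy computation later relies on.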
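TensorFlow expresses computation as a graph, so to actually run anything we launch the graph through a Session object.
To make interactive Python environments such as IPython more convenient, you can use InteractiveSession in place of Session, and Tensor.eval() and Operation.run() in place of Session.run(). That way you don't need a variable to hold on to the session.
Below, sess is created with InteractiveSession.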

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Download (if needed) and load MNIST with one-hot labels
mnist = input_data.read_data_sets("data/", one_hot=True)
sess = tf.InteractiveSession()

Create `Variable` objects for `weight` and `bias`.

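A Variable stores model parameters. Unlike an ordinary tensor holding data, which is gone once it has been used, a Variable persists across training iterations (for example in GPU memory, though sadly I can't afford a graphics card, TAT) and is updated on every step.
Initialization also matters in this network. It is recommended to add some random noise to the weights to break perfect symmetry, and to give the biases a small positive value to avoid dead nodes (ReLU units that never activate).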

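def weight_variable(shape):
    # Small random noise breaks the symmetry between units
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Small positive bias helps avoid dead ReLU nodes
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)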
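To make the initialization a bit more concrete, here is a rough pure-Python sketch of what a truncated normal does: samples further than two standard deviations from the mean are thrown away and redrawn. The function name and the `seed` argument are assumptions made for this illustration; this is not how TensorFlow implements it internally.

```python
import random


def truncated_normal(n, stddev=0.1, mean=0.0, seed=None):
    """Draw n normal samples, redrawing any that land more than
    two standard deviations away from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples


# Same number of values as a [5, 5, 1, 32] weight tensor and a [32] bias
weights = truncated_normal(5 * 5 * 1 * 32, stddev=0.1, seed=0)
biases = [0.1] * 32

print(len(weights), min(weights) >= -0.2, max(weights) <= 0.2)
```

With stddev=0.1 every weight ends up inside [-0.2, 0.2], so no unit starts out with an extreme weight.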

Define the functions for the convolution and pooling operations

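Convolution and pooling are very important tools. The purpose of convolution is to extract different features of the input. The first convolutional layer may only pick up low-level features such as edges, lines and corners; a deeper network can iteratively build more complex features out of those low-level ones.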

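A pooling layer usually acts on each input feature map separately and shrinks it. The most common form today moves with a stride of 2, splitting the image into 2*2 blocks, and takes the maximum of the 4 numbers in each block. This cuts the amount of data by 75%.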
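A small worked example of that 2*2 max pooling on a single 4x4 feature map, written in plain Python. The name `max_pool_2x2_list` is invented here and assumes even height and width; it only mirrors the idea, not the TensorFlow op.

```python
def max_pool_2x2_list(image):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width)."""
    height, width = len(image), len(image[0])
    pooled = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            # Take the largest of the four values in this 2x2 block
            row.append(max(image[r][c], image[r][c + 1],
                           image[r + 1][c], image[r + 1][c + 1]))
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool_2x2_list(image)
print(pooled)  # [[4, 5], [6, 7]]
```

Sixteen values go in and four come out, which is the 75% reduction.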

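Besides max pooling, a pooling layer can use other functions, such as "average pooling" or even "L2-norm pooling". Average pooling used to be fairly widespread, but it has fallen out of favor recently because max pooling performs better in practice.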

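In the conv2d below, x is the input and W holds the convolution parameters; the function computes the convolution of x with W. In the example below W has shape [5,5,1,32]: the first two numbers are the kernel size, the 1 is the number of color channels (this dataset is grayscale only), and 32 is the number of kernels, i.e. how many features get extracted. SAME means the borders are padded so that, with a stride of 1, the image size stays unchanged. strides sets how the window moves: the first entry is for the batch dimension, so no sample in the batch is skipped; the middle two 1s mean moving one step at a time along height and width; and the last 1 means no channel is skipped. In general strides[0] = strides[3] = 1 is required.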

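max_pool_2x2 turns each 2*2 block of pixels into a single one. The four parameters of the tf.nn.max_pool call it wraps are:

First parameter, value: the input to pool. A pooling layer usually follows a convolutional layer, so the input is normally a feature map, still with shape [batch, height, width, channels].

Second parameter, ksize: the size of the pooling window, a 4-element vector, usually [1, height, width, 1]. We don't want to pool across batch or channels, so those two dimensions are set to 1.

Third parameter, strides: as with convolution, the step of the window along each dimension, usually [1, stride, stride, 1].

Fourth parameter, padding: as with convolution, either 'VALID' or 'SAME'.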

# Preview of the first layer; x_image and b_conv1 are defined further down
W_conv1 = weight_variable([5, 5, 1, 32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

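def conv2d(x, W):
    # Stride 1 in every dimension, zero padding keeps height and width
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 window moved 2 pixels at a time halves height and width
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')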
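A quick way to check the sizes these two helpers produce. In TensorFlow, the output size along a spatial dimension is ceil(input / stride) with 'SAME' padding and ceil((input - kernel + 1) / stride) with 'VALID' padding. The function `output_size` below is a hypothetical helper written just for this arithmetic.

```python
import math


def output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial dimension for a conv or pool op."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(output_size(28, 5, 1, "SAME"))   # 28: 5x5 conv, stride 1, size unchanged
print(output_size(28, 5, 1, "VALID"))  # 24: without padding the border is lost
print(output_size(28, 2, 2, "SAME"))   # 14: first 2x2 max pool
print(output_size(14, 2, 2, "SAME"))   # 7: second 2x2 max pool
```

So each convolution here leaves the 28x28 size alone, and only the pooling steps shrink the image.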

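Before designing the network, we need to define the input placeholders

x is the input image and y_ is the correct answer, i.e. the true label. A convolutional network makes use of the original spatial structure, so we use reshape to turn the flat 1x784 1-D data back into 28x28 2-D data. The shape argument is [-1,28,28,1]: -1 means the number of samples is left unspecified, and the final 1 is the number of color channels.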

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])
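To see what the reshape does to a single image, here is a plain-Python sketch that ignores the batch and channel dimensions. It assumes the usual row-major layout, where the pixel at row r, column c comes from flat index r * 28 + c; `reshape_to_image` is a name made up for this example.

```python
def reshape_to_image(flat, height=28, width=28):
    """Cut a flat list of height*width pixels into rows (row-major order)."""
    if len(flat) != height * width:
        raise ValueError("expected %d values" % (height * width))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


flat = list(range(784))        # stand-in for one flattened MNIST image
image = reshape_to_image(flat)

print(len(image), len(image[0]))  # 28 28
print(image[2][5])                # 61, i.e. 2 * 28 + 5
```

No pixel values change; the same 784 numbers are just addressed by (row, column) instead of a single index.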

Now we can start designing the network

First convolutional layer:

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

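Second convolutional layer:

The previous layer had 32 kernels and extracted 32 features, so we can think of this layer as having 32 input channels.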

w_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

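Fully connected layer:

After two pooling steps the image is now 7x7, and the second convolutional layer has 64 kernels, so the output tensor is 7x7x64. We use reshape again to flatten it into a 1-D vector, then pass it through a fully connected layer with ReLU as the activation function.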

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
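As a sanity check on the shapes used so far, this sketch counts the trainable parameters of the two convolutional layers and the fully connected layer. The weight shapes are copied from the code above; the `layers` list and `count_params` helper are only for this illustration.

```python
# (name, weight shape, bias length) for the layers defined so far
layers = [
    ("conv1", (5, 5, 1, 32), 32),
    ("conv2", (5, 5, 32, 64), 64),
    ("fc1", (7 * 7 * 64, 1024), 1024),
]


def count_params(weight_shape, bias_len):
    """Number of weights (product of the shape) plus number of biases."""
    total = 1
    for dim in weight_shape:
        total *= dim
    return total + bias_len


flat_size = 7 * 7 * 64
param_counts = {name: count_params(shape, bias) for name, shape, bias in layers}

print(flat_size)     # 3136
print(param_counts)  # {'conv1': 832, 'conv2': 51264, 'fc1': 3212288}
```

Almost all of the parameters sit in the fully connected layer, which is one reason Dropout is applied there.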

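To reduce overfitting we commonly use Dropout: a keep_prob ratio, passed in through a placeholder, controls how much of the nodes' output is randomly dropped during training, which helps the model generalize and so improves prediction accuracy.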

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
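A rough sketch of what dropout does to a vector of activations, in plain Python. Like tf.nn.dropout, it keeps each value with probability keep_prob and scales the kept values by 1 / keep_prob so the expected sum stays the same. The function below and its `seed` argument are assumptions for this illustration, not the TensorFlow implementation.

```python
import random


def dropout(values, keep_prob, seed=None):
    """Zero each value with probability 1 - keep_prob; scale the rest up."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.5, seed=0)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(kept_fraction, mean_after)  # roughly 0.5 and roughly 1.0

# keep_prob = 1.0 leaves the input untouched, which is what evaluation uses
print(dropout([1.0, 2.0], keep_prob=1.0))  # [1.0, 2.0]
```

The rescaling is why the same network can be evaluated with keep_prob set to 1.0 without any other adjustment.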

Finally we connect the Dropout layer to a softmax layer to get the final probabilities.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
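For reference, softmax itself is small enough to write out by hand. This sketch subtracts the largest logit before exponentiating, a common trick for numerical stability that does not change the result; the example logits are made up.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

The largest score always gets the largest probability, so taking the argmax of the softmax output gives the same class as taking the argmax of the raw scores.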

We define the loss function as cross_entropy; for background on cross-entropy, see here.

The optimizer is Adam, with a learning rate of 1e-4.

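The name Adam comes from adaptive moment estimation. In probability theory, moments are defined as follows: if a random variable X follows some distribution, the first moment of X is E(X), i.e. the sample mean, and the second moment is E(X^2), i.e. the mean of the squared samples. Adam uses estimates of the first and second moments of the gradient of the loss with respect to each parameter to adjust a per-parameter learning rate dynamically. Adam is still a gradient-descent method, but the step size for each parameter in each iteration stays within a definite range, so a very large gradient does not lead to a very large step, and the parameter values stay fairly stable.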
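The bounded step can be seen in a minimal sketch of one Adam update for a single scalar parameter. This follows the commonly stated form of the update rule with the default hyperparameters of tf.train.AdamOptimizer (beta1=0.9, beta2=0.999, epsilon=1e-8); TensorFlow's own implementation may differ in details such as how epsilon is applied, and `adam_step` is a name made up here.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar; returns (new_param, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First update from param = 0 for three very different gradient sizes
steps = [adam_step(0.0, g, 0.0, 0.0, t=1)[0] for g in (0.01, 1.0, 1000.0)]
print(steps)  # each entry is close to -1e-4
```

Whether the gradient is 0.01 or 1000, the first step has a size of about the learning rate, since the gradient is divided by an estimate of its own magnitude.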

Likewise, when computing accuracy we use tf.argmax to pick out the most likely class.

# Mean over the batch of -sum(y_ * log(y_conv)) for each example
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv),
                                              reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# A prediction is correct when the most likely class matches the label
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
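The same two quantities worked out by hand on a tiny made-up batch of two examples with three classes. The only deliberate difference from the TensorFlow expression is that terms with a zero label are skipped, which avoids evaluating log(0); the labels and probabilities are invented for the example.

```python
import math


def cross_entropy(labels, probs):
    """Mean over the batch of -sum(y * log(p)), skipping zero labels."""
    per_example = [
        -sum(y * math.log(p) for y, p in zip(ys, ps) if y > 0)
        for ys, ps in zip(labels, probs)
    ]
    return sum(per_example) / len(per_example)


def accuracy(labels, probs):
    """Fraction of examples whose most likely class matches the label."""
    correct = [ys.index(max(ys)) == ps.index(max(ps))
               for ys, ps in zip(labels, probs)]
    return sum(correct) / len(correct)


labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
probs = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]

loss = cross_entropy(labels, probs)  # (-log 0.8 - log 0.3) / 2
acc = accuracy(labels, probs)
print(loss, acc)
```

With one-hot labels the loss for each example reduces to minus the log of the probability given to the true class, so the second example, which is misclassified, contributes most of the loss.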

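Now for the training itself. We first initialize all the parameters. During training keep_prob is set to 0.5, and for evaluation it is set to 1.

We train for 10000 steps in total, with 50 samples per batch, and every 100 steps we evaluate accuracy on the current batch.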

tf.global_variables_initializer().run()
for i in range(10000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # Evaluate on the current batch with dropout switched off
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1],
                                                  keep_prob: 1.0})
        print("step %d, train accuracy %f" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images,
                                                    y_: mnist.test.labels,
                                                    keep_prob: 1.0}))

Results:

step 9100, train accuracy 1.000000
step 9200, train accuracy 1.000000
step 9300, train accuracy 1.000000
step 9400, train accuracy 1.000000
step 9500, train accuracy 1.000000
step 9600, train accuracy 1.000000
step 9700, train accuracy 1.000000
step 9800, train accuracy 1.000000
step 9900, train accuracy 1.000000
test accuracy 0.9901

Training for 20000 steps can bring the accuracy above 99.2%.

Complete code:

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

mnist = input_data.read_data_sets("data/", one_hot=True)
sess = tf.InteractiveSession()


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


# Input placeholders
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])

# First convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolutional layer
w_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Softmax output layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Loss, optimizer and accuracy
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv),
                                              reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Training loop
tf.global_variables_initializer().run()
for i in range(10000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1],
                                                  keep_prob: 1.0})
        print("step %d, train accuracy %f" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images,
                                                    y_: mnist.test.labels,
                                                    keep_prob: 1.0}))
