TensorFlow in Action: DNN for MNIST Digit Recognition


Original article: http://blog.csdn.net/u011239443/article/details/71173351

The complete program code is given first; afterwards we will walk through it step by step:

https://github.com/xiaoyesoso/TensorFlowinAction/blob/master/InActionB1/chapter5/5_2_1.py

Refactored code:
mnist_inference:https://github.com/xiaoyesoso/TensorFlowinAction/blob/master/InActionB1/chapter5/mnist_inference_5_5.py
mnist_train:https://github.com/xiaoyesoso/TensorFlowinAction/blob/master/InActionB1/chapter5/mnist_train_5_5.py
mnist_eval:https://github.com/xiaoyesoso/TensorFlowinAction/blob/master/InActionB1/chapter5/mnist_eval_5_5.py

Entry Point

Let's first look at the end of the program:

def main(argv=None):
    mnist = input_data.read_data_sets("/home/soso/MNIST_data", one_hot=True)
    train(mnist)

if __name__ == '__main__':
    tf.app.run()

if __name__ == '__main__': is the entry point of the main program; tf.app.run(), provided by TensorFlow, will call the main function.

input_data.read_data_sets automatically downloads the MNIST digit dataset for you and returns the dataset mnist. The dataset stores each digit image as a 28×28 (= 784) pixel matrix, so we set the parameter INPUT_NODE = 784. The matrix elements take values in [0, 1], where 0 represents white and 1 represents black. Each label is a 0/1 vector of length 10 representing the digits 0 to 9, so OUTPUT_NODE = 10. For example, if the 4th position of the vector is 1 and every other position is 0, the digit is 3.
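As a quick illustration (a minimal sketch; the data path is the same example path used above, and the printed shapes are what read_data_sets typically returns), the dataset can be inspected like this:

    from tensorflow.examples.tutorials.mnist import input_data

    # the path is only an example; the data is downloaded automatically if it is missing
    mnist = input_data.read_data_sets("/home/soso/MNIST_data", one_hot=True)

    print(mnist.train.images.shape)   # (55000, 784)  -> INPUT_NODE = 784
    print(mnist.train.labels.shape)   # (55000, 10)   -> OUTPUT_NODE = 10
    print(mnist.train.labels[0])      # a one-hot vector: the index of the 1 is the digit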

Training

train:

    # input
    x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='x-input')
    # label
    y_ = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='y-input')
    # weights from the input layer to hidden layer 1
    weights1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NONE], stddev=0.1))
    # bias term of hidden layer 1
    biases1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NONE]))
    # weights from hidden layer 1 to the output layer
    weight2 = tf.Variable(tf.truncated_normal([LAYER1_NONE, OUTPUT_NODE], stddev=0.1))
    # bias term of the output layer
    biases2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))
    # build the prediction
    y = inference(x, None, weights1, biases1, weight2, biases2)

Let's look at the inference function. Since avg_class is None, the following branch is executed:

    # ReLU, i.e. max(x, 0), is used as the activation function
    # build hidden layer 1
    layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
    # build the prediction
    return tf.matmul(layer1, weights2) + biases2

Moving Average Model

Before going through the rest of the train function, let's talk about the moving average model. The moving average model is a technique that makes the model more robust on test data. It maintains a shadow variable for every variable: the shadow variable is initialized to the same value as the corresponding variable, but each time the variable is updated, the shadow variable is updated as:

shadow_variable = decay × shadow_variable + (1 − decay) × variable

decay is usually set close to 1 (e.g. 0.999). In practice, the decay that is actually used is also set dynamically based on the step parameter:

min{decay, (1 + step) / (10 + step)}
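To make the update rule concrete, here is a minimal sketch (the variable v and the assigned value 5 are made up) showing the shadow variable trailing the real variable:

    import tensorflow as tf

    v = tf.Variable(0, dtype=tf.float32)
    step = tf.Variable(0, trainable=False)
    # decay = 0.99, but the decay actually used is min(0.99, (1 + step) / (10 + step))
    ema = tf.train.ExponentialMovingAverage(0.99, step)
    maintain_op = ema.apply([v])

    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        sess.run(tf.assign(v, 5))
        sess.run(maintain_op)
        # decay = min(0.99, 1/10) = 0.1, so shadow = 0.1 * 0 + 0.9 * 5 = 4.5
        print(sess.run([v, ema.average(v)]))   # [5.0, 4.5]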

train:

    global_step = tf.Variable(0, trainable=False)
    '''
    tf.train.ExponentialMovingAverage:
    pass in decay: MOVING_AVERAGE_DECAY = 0.99
    pass in step: global_step = 0
    build variables_averages
    '''
    variables_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variables_averages.apply(tf.trainable_variables())
    average_y = inference(x, variables_averages, weights1, biases1, weight2, biases2)

Let's look at the other branch of the inference function:

    '''
    The difference from the other branch is that
    avg_class.average is applied first to obtain the shadow_variable,
    and the layers are then built from those
    '''
    layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1)) + avg_class.average(biases1))
    return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)
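Putting the two branches together, the full inference function should look roughly like the sketch below (reconstructed from the two snippets above, so treat it as an approximation of the code in the repository):

    def inference(input_tensor, avg_class, weights1, biases1, weights2, biases2):
        if avg_class is None:
            # use the raw variables directly
            layer1 = tf.nn.relu(tf.matmul(input_tensor, weights1) + biases1)
            return tf.matmul(layer1, weights2) + biases2
        else:
            # use the shadow variables maintained by the moving average class
            layer1 = tf.nn.relu(tf.matmul(input_tensor, avg_class.average(weights1))
                                + avg_class.average(biases1))
            return tf.matmul(layer1, avg_class.average(weights2)) + avg_class.average(biases2)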

Loss Function

The prediction y we obtain is not necessarily a probability distribution (i.e. the probabilities over the labels may not sum to 1). Here we call tf.nn.sparse_softmax_cross_entropy_with_logits to build the loss function; it also applies softmax regression for us, so the result conforms to a probability distribution:

softmax(y_i) = exp(y_i) / Σ_{j=1}^{n} exp(y_j)
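For example, a quick check of the softmax formula with made-up logits:

    import numpy as np

    y = np.array([2.0, 1.0, 0.1])              # raw outputs (logits), not a distribution
    softmax = np.exp(y) / np.sum(np.exp(y))
    print(softmax)                              # approx [0.659, 0.242, 0.099]
    print(softmax.sum())                        # 1.0 -- now a valid probability distribution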

train:

    # note: this API has changed across TensorFlow versions
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    # average of the loss over one batch of data
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
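Note that the "sparse" version expects integer class indices, which is why tf.argmax(y_, 1) is applied to the one-hot labels. For comparison, a roughly equivalent formulation with the non-sparse API would be:

    # illustrative alternative: feed the one-hot labels directly
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)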

Regularization

To prevent overfitting, during optimization we do not optimize J(θ) directly (here, cross_entropy_mean) but instead optimize J(θ) + λR(w). R(w) measures the complexity of the model, and λ is the weight of the model-complexity loss within the total loss. λ is usually close to 0, because a λ that is too large leads to underfitting.

Let's look at the formulas for several kinds of regularization:

  • L1: R(w) = Σ_i |w_i|
  • L2: R(w) = Σ_i w_i²
  • L1 and L2 can also be combined: R(w) = Σ_i [α|w_i| + (1 − α)w_i²]
    '''
    L2 regularization is used here
    lambda is REGULARIZATION_RATE = 0.0001
    '''
    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    regularization = regularizer(weights1) + regularizer(weight2)
    # build the new loss function
    loss = cross_entropy_mean + regularization
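As a small numeric check of what the regularizers compute (the weight values are made up; note that TensorFlow's L2 regularizer is built on tf.nn.l2_loss, which includes a factor of 1/2):

    import tensorflow as tf

    weights = tf.constant([[1.0, -2.0], [-3.0, 4.0]])
    with tf.Session() as sess:
        # L1: 0.5 * (|1| + |-2| + |-3| + |4|) = 0.5 * 10 = 5.0
        print(sess.run(tf.contrib.layers.l1_regularizer(0.5)(weights)))
        # L2: 0.5 * (1 + 4 + 9 + 16) / 2 = 0.5 * 15 = 7.5
        print(sess.run(tf.contrib.layers.l2_regularizer(0.5)(weights)))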

Learning Rate

The learning rate can be understood as the step size taken during optimization (e.g. how far gradient descent moves in the direction of steepest descent). If the learning rate is too small, a very large number of training rounds is needed; if it is too large, the result oscillates back and forth around the optimum and never reaches it. Here we call tf.train.exponential_decay to obtain a decaying learning rate, given by the following formula:

decayed_learning_rate = learning_rate × decay_rate^(global_step / decay_steps)

From the formula we can see that the learning rate keeps shrinking. This lets the model learn in large steps at the beginning of training, and move in small steps as it approaches the optimum.

    '''
    learning rate = LEARNING_RATE_BASE = 0.8
    decay steps = mnist.train.num_examples / BATCH_SIZE
    decay rate = LEARNING_RATE_DACAY = 0.99
    '''
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step,
                                               mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DACAY)
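Plugging illustrative numbers into the formula (assuming 55000 training examples and BATCH_SIZE = 100, so decay_steps = 550):

    LEARNING_RATE_BASE = 0.8
    LEARNING_RATE_DACAY = 0.99
    decay_steps = 55000 / 100.0                 # = 550

    for global_step in [0, 550, 5500, 30000]:
        rate = LEARNING_RATE_BASE * LEARNING_RATE_DACAY ** (global_step / decay_steps)
        print(global_step, rate)                # 0.8, 0.792, ~0.723, ~0.462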

Evaluation

Let's continue with the rest of the train function:

    # optimize loss with (stochastic) gradient descent
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # train_op groups two operations: train_step and variables_averages_op
    train_op = tf.group(train_step, variables_averages_op)
    # build the correctness predicate
    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    # build the average accuracy over a batch
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # start a tf session
    with tf.Session() as sess:
        # initialize variables
        tf.initialize_all_variables().run()
        # validation set
        validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}
        # test set
        test_feed = {x: mnist.test.images, y_: mnist.test.labels}
        # TRAINING_STEPS = 30000
        for i in range(TRAINING_STEPS):
            # every 1000 rounds, print the accuracy on the validation set
            if i % 1000 == 0:
                validate_acc = sess.run(accuracy, feed_dict=validate_feed)
                print("After %d training step(s),v acc is %g" % (i, validate_acc))
            # BATCH_SIZE = 100
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            sess.run(train_op, feed_dict={x: xs, y_: ys})
        # print the final accuracy on the test set
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d training step(s),test accuray is %g" % (TRAINING_STEPS, test_acc))
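To make the accuracy computation concrete, here is a tiny standalone sketch with invented predictions and labels:

    import tensorflow as tf

    # two fake prediction vectors and their one-hot labels
    average_y = tf.constant([[0.1, 0.8, 0.1],   # predicted class 1
                             [0.6, 0.3, 0.1]])  # predicted class 0
    y_ = tf.constant([[0.0, 1.0, 0.0],          # true class 1 -> correct
                      [0.0, 0.0, 1.0]])         # true class 2 -> wrong

    correct_prediction = tf.equal(tf.argmax(average_y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    with tf.Session() as sess:
        print(sess.run(accuracy))               # 0.5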

Output

After 0 training step(s),v acc is 0.0434
After 1000 training step(s),v acc is 0.9768
After 2000 training step(s),v acc is 0.9806
After 3000 training step(s),v acc is 0.9838
After 4000 training step(s),v acc is 0.9834
After 5000 training step(s),v acc is 0.9838
After 6000 training step(s),v acc is 0.9842
After 7000 training step(s),v acc is 0.9836
After 8000 training step(s),v acc is 0.9838
After 9000 training step(s),v acc is 0.9844
After 10000 training step(s),v acc is 0.9854
After 11000 training step(s),v acc is 0.9852
After 12000 training step(s),v acc is 0.9846
After 13000 training step(s),v acc is 0.986
After 14000 training step(s),v acc is 0.9858
After 15000 training step(s),v acc is 0.985
After 16000 training step(s),v acc is 0.9856
After 17000 training step(s),v acc is 0.9862
After 18000 training step(s),v acc is 0.986
After 19000 training step(s),v acc is 0.9858
After 20000 training step(s),v acc is 0.9862
After 21000 training step(s),v acc is 0.9858
After 22000 training step(s),v acc is 0.9864
After 23000 training step(s),v acc is 0.9866
After 24000 training step(s),v acc is 0.986
After 25000 training step(s),v acc is 0.9862
After 26000 training step(s),v acc is 0.986
After 27000 training step(s),v acc is 0.9868
After 28000 training step(s),v acc is 0.9864
After 29000 training step(s),v acc is 0.9866
After 30000 training step(s),test accuray is 0.9842

