[TensorFlow] MNIST (LeNet-5 + moving averages + regularization + exponential learning-rate decay + activation functions + model persistence)


The project has been uploaded to GitHub: lenet-5

Downloading the MNIST dataset


Official download address (may require a proxy to access)

http://yann.lecun.com/exdb/mnist/

A Baidu Netdisk mirror is also provided below for anyone who needs it

Link: https://pan.baidu.com/s/1geOcXxT  Password: mws8

After downloading, place the files in the mnist/data/ folder. The directory structure is as follows:

mnist/
    data/
        train-images-idx3-ubyte.gz
        train-labels-idx1-ubyte.gz
        t10k-images-idx3-ubyte.gz
        t10k-labels-idx1-ubyte.gz

Code structure


To make the code more readable and extensible, it is split into modules by function, and reusable code is abstracted into library functions.

The code is therefore divided into three modules:

  • inference
  • train
  • evaluate

The saved models are kept separately in the model folder.

The full directory layout is as follows:

mnist/
    data/
        ......
    LeNet-5/
        model/
            ......
        inference.py
        train.py
        eval.py

Complete code


The code follows the implementation in the book 《TensorFlow:实战Google深度学习框架》.

  1. The network architecture is similar to LeNet-5: input -> conv -> maxpool -> conv -> maxpool -> fc -> fc -> output

  2. Optimization techniques used (the standard forms are sketched right after this list)

    • moving averages of the trainable variables
    • L2 regularization
    • exponential learning-rate decay
  3. ReLU activation functions

  4. The model is saved every 1000 training steps
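
For reference, these are the standard forms of the first two techniques as I understand TensorFlow implements them (the exponential decay schedule itself is written out in the Results section below). tf.train.ExponentialMovingAverage keeps a shadow copy of every variable it is applied to, and tf.contrib.layers.l2_regularizer adds a scaled L2 penalty to the 'losses' collection; note that tf.nn.l2_loss includes a factor of 1/2:

$$
\text{shadow} \leftarrow d \cdot \text{shadow} + (1 - d) \cdot \text{variable},
\qquad
d = \min\!\left(\text{MOVING\_AVERAGE\_DECAY},\ \frac{1 + \text{num\_updates}}{10 + \text{num\_updates}}\right)
$$

$$
\text{loss}_{\text{reg}} = \frac{\lambda}{2} \sum_i w_i^2,
\qquad \lambda = \text{REGULARIZATION\_RATE}
$$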

First is inference.py, the library module that implements the forward-propagation pass used by both training and evaluation.

import tensorflow as tf

# Network parameters
OUTPUT_NODE = 10
IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10

# Size and depth of the first convolutional layer
CONV1_DEEP = 32
CONV1_SIZE = 5

# Size and depth of the second convolutional layer
CONV2_DEEP = 64
CONV2_SIZE = 5

# Number of nodes in the fully connected layer
FC_SIZE = 512


# Forward propagation
def inference(input_tensor, train, regularizer):
    # Declare the variables of the first (convolutional) layer and build its forward pass
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            'weight', [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable(
            'bias', [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        # 5x5 filter with depth 32, stride 1, zero ('SAME') padding
        conv1 = tf.nn.conv2d(
            input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    # Second layer: max pooling with a 2x2 filter, stride 2, zero padding
    with tf.variable_scope('layer2-pool1'):
        pool1 = tf.nn.max_pool(
            relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Declare the variables of the third (convolutional) layer and build its forward pass
    with tf.variable_scope('layer3-conv2'):
        conv2_weights = tf.get_variable(
            'weight', [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable(
            'bias', [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        # 5x5 filter with depth 64, stride 1, zero padding
        conv2 = tf.nn.conv2d(
            pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Fourth layer: max pooling
    with tf.variable_scope('layer4-pool2'):
        pool2 = tf.nn.max_pool(
            relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Flatten the output of the fourth layer into the input format of the fifth (fully connected) layer
    pool_shape = pool2.get_shape().as_list()
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # Declare the variables of the fifth (fully connected) layer and build its forward pass
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable(
            'weight', [nodes, FC_SIZE],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        fc1_biases = tf.get_variable(
            'bias', [FC_SIZE], initializer=tf.constant_initializer(0.1))
        # Only the fully connected weights are regularized
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc1_weights))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        # Dropout is applied only during training
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    # Declare the variables of the sixth layer and build its forward pass
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable(
            'weight', [FC_SIZE, NUM_LABELS],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        fc2_biases = tf.get_variable(
            'bias', [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc2_weights))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    # Return the output of the sixth layer
    return logit
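
As a quick sanity check of the forward pass (a minimal sketch, not part of the original project; it assumes it is run from the LeNet-5 directory, and the batch size of 1 is arbitrary), the graph can be built once on a dummy placeholder and the logits shape printed:

import tensorflow as tf

import inference

# Hypothetical smoke test: build the inference graph on a dummy input
# placeholder and confirm the logits have shape (batch, NUM_LABELS).
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [1, inference.IMAGE_SIZE,
                                    inference.IMAGE_SIZE,
                                    inference.NUM_CHANNELS])
    logits = inference.inference(x, False, None)  # no dropout, no regularizer
    print(logits.get_shape())  # expected: (1, 10)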

Next is train.py, the module that trains the model.

import os

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

import inference

# Optimization hyperparameters
LEARNING_RATE_BASE = 0.01  # base learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
REGULARIZATION_RATE = 0.0001  # coefficient of the regularization term in the loss
MOVING_AVERAGE_DECAY = 0.99  # moving-average decay rate

# Training parameters
BATCH_SIZE = 100  # number of images in one training batch
TRAINING_STEPS = 30000  # number of training steps

# Path and file name for saving the model
MODEL_SAVE_PATH = 'model/'
MODEL_NAME = 'lenet5.ckpt'


def train(mnist):
    # Build the model
    x = tf.placeholder(
        tf.float32, [
            BATCH_SIZE, inference.IMAGE_SIZE, inference.IMAGE_SIZE,
            inference.NUM_CHANNELS
        ],
        name='x-input')  # input layer
    y_ = tf.placeholder(
        tf.float32, [None, inference.OUTPUT_NODE], name='y-input')  # labels
    regularizer = tf.contrib.layers.l2_regularizer(
        REGULARIZATION_RATE)  # L2 regularization loss function
    y = inference.inference(x, True, regularizer)  # output layer (logits)

    # Variable that stores the number of training steps; not trainable
    global_step = tf.Variable(0, trainable=False)

    # Set up moving averages
    variable_averages = tf.train.ExponentialMovingAverage(
        MOVING_AVERAGE_DECAY, global_step)  # define the moving-average object
    variable_averages_op = variable_averages.apply(
        tf.trainable_variables())  # apply moving averages to all trainable variables

    # Set up exponential learning-rate decay
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE,
        LEARNING_RATE_DECAY)

    # Minimize the loss function
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=y, labels=tf.argmax(y_, 1))  # cross entropy of each image
    cross_entropy_mean = tf.reduce_mean(
        cross_entropy)  # mean cross entropy over the current batch
    loss = cross_entropy_mean + tf.add_n(
        tf.get_collection('losses'))  # total loss = cross entropy + regularization
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)  # optimize the loss

    # Run backpropagation and the moving-average update in one step
    with tf.control_dependencies([train_step, variable_averages_op]):
        train_op = tf.no_op(name='train')

    # Create the saver used for model persistence
    saver = tf.train.Saver()

    # Start training
    with tf.Session() as sess:
        # Initialize all variables
        tf.global_variables_initializer().run()

        # Training loop
        for i in range(TRAINING_STEPS):
            # Produce the batch for this step
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            xs = np.reshape(
                xs, (BATCH_SIZE, inference.IMAGE_SIZE, inference.IMAGE_SIZE,
                     inference.NUM_CHANNELS))  # reshape MNIST data into a 4-D tensor
            _, loss_value, step = sess.run(
                [train_op, loss, global_step], feed_dict={
                    x: xs,
                    y_: ys
                })

            # Save the model every 1000 steps
            if i % 1000 == 0:
                # Print training progress
                print('After %d training steps, loss is %g.' % (step,
                                                                loss_value))
                # Save the current model
                saver.save(
                    sess,
                    os.path.join(MODEL_SAVE_PATH, MODEL_NAME),
                    global_step=global_step)


# Main entry point
def main(argv=None):
    mnist = input_data.read_data_sets('../data/', one_hot=True)
    train(mnist)


if __name__ == '__main__':
    tf.app.run()
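
One detail worth noting in train.py: because the labels are read with one_hot=True, tf.argmax(y_, 1) converts them back to class indices before they reach tf.nn.sparse_softmax_cross_entropy_with_logits, which expects integer labels rather than one-hot vectors. A minimal standalone sketch with toy values of my own (not from the project) illustrates this:

import tensorflow as tf

# Toy logits for two examples over three classes, plus one-hot labels.
logits = tf.constant([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])
one_hot_labels = tf.constant([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

# The sparse variant expects class indices, so the one-hot labels are
# converted with tf.argmax, exactly as train.py does.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=tf.argmax(one_hot_labels, 1))

with tf.Session() as sess:
    print(sess.run(loss))  # per-example cross entropy, shape (2,)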

Finally, eval.py can run alongside training, periodically loading the most recently saved model and testing it.

import time

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

import inference
import train

# Load the latest model every 10 seconds and test its accuracy on the validation data
EVAL_INTERVAL_SECS = 10
NUM_VALIDATION = 5000


def evaluate(mnist):
    with tf.Graph().as_default() as g:
        # Define the input and output formats
        x = tf.placeholder(
            tf.float32, [
                NUM_VALIDATION, inference.IMAGE_SIZE, inference.IMAGE_SIZE,
                inference.NUM_CHANNELS
            ],
            name='x-input')
        y_ = tf.placeholder(
            tf.float32, [NUM_VALIDATION, inference.OUTPUT_NODE],
            name='y-input')
        y = inference.inference(x, False, None)

        # Validation set
        xs = np.reshape(mnist.validation.images, [
            NUM_VALIDATION, inference.IMAGE_SIZE, inference.IMAGE_SIZE,
            inference.NUM_CHANNELS
        ])
        validate_feed = {x: xs, y_: mnist.validation.labels}

        # Evaluate the model
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # Load the model via variable renaming so the moving-average values are used
        variable_averages = tf.train.ExponentialMovingAverage(
            train.MOVING_AVERAGE_DECAY)
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)

        # Check the accuracy every 10 seconds
        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(train.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    # Load the model
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    # Recover the number of training steps from the checkpoint file name
                    global_step = ckpt.model_checkpoint_path.split('/')[
                        -1].split('-')[-1]
                    # Run the validation and print the result
                    accuracy_score = sess.run(
                        accuracy, feed_dict=validate_feed)
                    print(
                        'After %s training steps, validation accuracy = %g' %
                        (global_step, accuracy_score))
                else:
                    print('No checkpoint file found')
                    return
            time.sleep(EVAL_INTERVAL_SECS)


def main(argv=None):
    mnist = input_data.read_data_sets('../data/', one_hot=True)
    evaluate(mnist)


if __name__ == '__main__':
    tf.app.run()
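
The step number printed by eval.py is recovered purely from the checkpoint file name that tf.train.Saver produces. A minimal sketch with a hypothetical path (the real path depends on MODEL_SAVE_PATH and MODEL_NAME in train.py):

# Hypothetical checkpoint path of the form produced by
# saver.save(..., global_step=global_step) in train.py.
path = 'model/lenet5.ckpt-29001'

# Strip the directory, then take everything after the last '-'.
global_step = path.split('/')[-1].split('-')[-1]
print(global_step)  # prints '29001'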

Results


The original book does not state how the learning rate was set. With a learning rate of 0.8, the run produced the following output:

After 1 training steps, loss is 6.73467.
After 1001 training steps, loss is 22.4669.
After 2001 training steps, loss is 19.5605.
After 3001 training steps, loss is 17.1056.
After 4001 training steps, loss is 15.0407.
After 5001 training steps, loss is 13.2951.
After 6001 training steps, loss is 11.8285.
After 7001 training steps, loss is 10.5666.
After 8001 training steps, loss is 9.48308.
After 9001 training steps, loss is 8.57099.
After 10001 training steps, loss is 7.77436.
After 11001 training steps, loss is 7.10688.
After 12001 training steps, loss is 6.53394.
......

As the output shows, the model parameters did not converge well, which means the learning-rate parameters of the exponential decay schedule, namely the initial learning rate, the decay rate, and the decay steps, were set poorly. These parameters are chosen empirically, so they should be adjusted until the model parameters converge to a minimum.
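
For reference, to my understanding tf.train.exponential_decay (with staircase left at its default of False) computes the learning rate as

$$
\text{learning\_rate} = \text{LEARNING\_RATE\_BASE} \cdot
\text{LEARNING\_RATE\_DECAY}^{\,\text{global\_step} / \text{decay\_steps}}
$$

where decay_steps = mnist.train.num_examples / BATCH_SIZE (55000 / 100 = 550 with the standard TensorFlow MNIST split), so all three quantities in the schedule come directly from the constants at the top of train.py.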

After changing the initial learning rate to 0.01, the results are as follows:

......
After 17001 training steps, loss is 0.646947.
After 18001 training steps, loss is 0.673772.
After 19001 training steps, loss is 0.69196.
After 20001 training steps, loss is 0.683031.
After 21001 training steps, loss is 0.648217.
After 22001 training steps, loss is 0.634914.
After 23001 training steps, loss is 0.658528.
After 24001 training steps, loss is 0.64015.
After 25001 training steps, loss is 0.615027.
After 26001 training steps, loss is 0.631007.
After 27001 training steps, loss is 0.640536.
After 28001 training steps, loss is 0.611081.
After 29001 training steps, loss is 0.611459.

Running eval.py at this point gives the following result:

After 29001 training steps, validation accuracy = 0.9906

This shows that the learning rate has a large impact on how well the model parameters ultimately converge.

However, the speed at which the loss decreases has no necessary connection to the size of the total loss after training, so it is enough to adjust only the initial learning rate and the decay rate:

LEARNING_RATE_BASE = 0.01  # base learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate