Handwritten Digit Recognition with TensorFlow: Comparing a Simple Neural Network Classifier with a CNN

Source: Internet | Published by: 计价软件 | Editor: 程序博客网 | Time: 2024/06/08 09:15

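TensorFlow makes deep learning development relatively simple: models are quick to build and can reach high accuracy. Handwritten digit recognition is a good first example for learning image classification.

The MNIST package contains 60,000 grayscale images of handwritten digits as a training set. Each image is stored at 28*28 pixels, and a label set identifies each of the 60,000 training images. There is also a test set of 10,000 new handwritten grayscale images, with a matching set of 10,000 labels. Below, the 60,000 training images and their labels are used to build a simple MNIST model and a CNN (convolutional neural network) model, and the 10,000 test images and their labels are then used to compare the two models.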

A. Steps to build the simple neural network model:

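1. Each image is 28*28 pixels, that is, 28 rows by 28 columns of values. For the simple MNIST model this 2-D structure is more than is needed; flattening the pixels into one dimension makes the model much simpler to build and train. Each image is therefore serialized into a single row of 784 columns (a 1*784 structure). For the output, the model uses one row of ten columns holding the probabilities it assigns to the digits 0~9. The labels use the same layout in one-hot form: the entry for the correct digit is 1 and the other nine are 0. With n input images, the input set is a 2-D tensor [n, 784] and the output is a 2-D tensor [n, 10]. In the program both are placeholders; the image count n is written as None and is fixed by the number of images actually fed in.

# define placeholders for the inputs to the network
xs = tf.placeholder(tf.float32, [None, 784])  # 28*28 pixels, flattened
ys = tf.placeholder(tf.float32, [None, 10])   # one-hot labels for the digits 0~9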
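To make the two shapes concrete, here is a minimal NumPy sketch of the flattening and one-hot encoding described in step 1. It is not part of the original program (the MNIST loader already delivers data in this form when called with one_hot=True); the helper names flatten_images and one_hot and the random stand-in images are assumptions for illustration.

```python
import numpy as np

def flatten_images(images):
    """Reshape [n, 28, 28] images into [n, 784] row vectors."""
    return images.reshape(images.shape[0], -1)

def one_hot(labels, num_classes=10):
    """Turn integer labels of shape [n] into an [n, num_classes] one-hot matrix."""
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Synthetic stand-in data (assumption): three random "images" and their digits.
images = np.random.rand(3, 28, 28).astype(np.float32)
labels = np.array([7, 0, 9])

xs_batch = flatten_images(images)   # shape (3, 784), matches the xs placeholder
ys_batch = one_hot(labels)          # shape (3, 10), matches the ys placeholder
print(xs_batch.shape, ys_batch.shape)
```

Arrays shaped like xs_batch and ys_batch are what feed_dict supplies for xs and ys, with n = 3 in this case.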

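2. Add the network layer (in this model it is also the output layer). The layer is defined as Y = XW + b, where X is the input set (a 2-D tensor [n, 784]); W is the weight tensor of shape [784, 10], so the matrix product XW is an [n, 10] tensor; b is the bias of shape [1, 10], broadcast across the n rows; and Y is the prediction tensor. Y still has to pass through an activation function, which widens the gaps between the per-digit scores and turns them into probabilities. This program uses tf.nn.softmax, which is designed for choosing one class out of n.

def add_layer(inputs, in_size, out_size, activation_function=None):
    # add one more layer and return the output of this layer
    W = tf.Variable(tf.random_normal([in_size, out_size]))
    b = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wb = tf.matmul(inputs, W) + b
    if activation_function is None:
        outputs = Wb
    else:
        outputs = activation_function(Wb)
    return outputs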
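The shape bookkeeping of Y = XW + b followed by softmax can be checked outside TensorFlow. The following is a NumPy sketch under stated assumptions, not the author's code: the softmax helper is written by hand (with the usual max-subtraction for numerical stability) and the inputs are random stand-ins.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.random((4, 784))          # 4 flattened images (random stand-ins)
W = rng.normal(size=(784, 10))    # weights, analogous to tf.random_normal
b = np.zeros((1, 10)) + 0.1       # bias of shape [1, 10], broadcast over the rows

Y = softmax(X @ W + b)
print(Y.shape)         # (4, 10)
print(Y.sum(axis=1))   # every row sums to 1 up to rounding
```

Each row of Y is a probability distribution over the ten digits, which is what lets the largest entry be read as the predicted digit.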

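3. Create and define the network. First define the prediction tensor as the value returned by add_layer. Then compute the cross entropy cross_entropy, and have the gradient descent optimizer GradientDescentOptimizer minimize it; the resulting training operation is train_step.

# add output layer
prediction = add_layer(xs, 784, 10, activation_function=tf.nn.softmax)

# the error between prediction and real data
# note: tf.log(prediction) is not guarded against a prediction of exactly 0
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1]))  # loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)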
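The cross-entropy expression above is the batch mean of -sum(ys * log(prediction)) over the class axis. A small NumPy sketch of the same formula follows. The eps clipping floor is an assumption added for illustration and is not in the original code; it only serves to show one common way of keeping log(0) from producing inf or NaN.

```python
import numpy as np

def cross_entropy(labels, probs, eps=1e-10):
    """Mean over the batch of -sum(labels * log(probs)) along the class axis.
    eps is a hypothetical clipping floor, not part of the original program."""
    clipped = np.clip(probs, eps, 1.0)
    return float(np.mean(-np.sum(labels * np.log(clipped), axis=1)))

labels = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
probs = np.array([[0.1, 0.8, 0.1],
                  [0.5, 0.25, 0.25]])
loss = cross_entropy(labels, probs)   # mean of -log(0.8) and -log(0.5)
print(loss)

# With the clip, a probability of exactly 0 for the true class stays finite.
worst_case = cross_entropy(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
print(worst_case)
```

Because the labels are one-hot, only the log-probability of the correct class contributes to each row's sum.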

4. Train the network. First initialize all variables. Then, at each step, take a batch of 100 samples from the training set and train the network on it, for 1001 steps in total.

with tf.Session() as sess:
    # releases before 0.12 use the older initializer name
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
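What one call to train_step does can be sketched in NumPy as mini-batch gradient descent on a softmax layer. This is an illustration under assumptions, not the original training code: the data is a small synthetic set rather than MNIST, rng.choice is a simplified stand-in for mnist.train.next_batch, and the gradient is written out by hand where TensorFlow would derive it automatically.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(X, Y, W, b):
    probs = np.clip(softmax(X @ W + b), 1e-10, 1.0)
    return float(np.mean(-np.sum(Y * np.log(probs), axis=1)))

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST (assumption): 600 samples, 20 features, 3 classes.
n, d, k = 600, 20, 3
true_W = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))
Y = np.eye(k)[np.argmax(X @ true_W, axis=1)]   # one-hot labels

W = np.zeros((d, k))
b = np.zeros((1, k))
learning_rate, batch_size = 0.5, 100

loss_before = mean_cross_entropy(X, Y, W, b)
for step in range(200):
    idx = rng.choice(n, size=batch_size, replace=False)     # draw a random mini-batch
    xb, yb = X[idx], Y[idx]
    grad_logits = (softmax(xb @ W + b) - yb) / batch_size   # d(loss)/d(logits)
    W -= learning_rate * (xb.T @ grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0, keepdims=True)
loss_after = mean_cross_entropy(X, Y, W, b)
print(loss_before, loss_after)
```

The loss starts at log(3), the value for a uniform guess over three classes, and falls as the weights are updated one batch at a time.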

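5. Compute the model's accuracy as follows. v_xs is the set of test images and v_ys is the matching label set. The predictions y_pre computed from v_xs are compared with v_ys: for each image, the prediction counts as correct if the most probable digit matches the labelled digit, and as wrong otherwise. The per-image results are stored in correct_prediction, which is then cast to float32 and averaged to give the accuracy.

def compute_accuracy(v_xs, v_ys):
    global prediction
    # run the network on the test images to get the predicted probabilities
    y_pre = sess.run(prediction, feed_dict={xs: v_xs})
    # compare the most probable digit with the labelled digit for every image
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys})
    return result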
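The same accuracy calculation, written as a NumPy sketch on a hand-made example so the arithmetic can be followed by eye. The function name accuracy and the four sample rows are assumptions for illustration.

```python
import numpy as np

def accuracy(probs, one_hot_labels):
    """Fraction of rows whose most probable class matches the labelled class."""
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])
result = accuracy(probs, labels)
print(result)   # 3 of the 4 rows match, so 0.75
```

The third row is the miss: the model prefers class 2 while the label says class 1.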

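B. Steps to build the CNN model:

1. A CNN does not need the image flattened to one dimension. Each image keeps its 28*28*1 layout for convolution (the 1 is the number of channels: 1 for a grayscale image, 3 for a color image). After the first convolution, each image becomes a 28*28*32 tensor.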

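2. Define the convolution kernel. The kernel is a 4-D tensor [5,5,1,32]: a 5*5 window, with input size (channels) 1 and output size 32.

def kernel_variable(shape):
    # small random initial weights drawn from a truncated normal distribution
    initial = tf.truncated_normal(shape=shape, stddev=0.1)
    return tf.Variable(initial)

w_conv1 = kernel_variable([5, 5, 1, 32])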

3. Define the bias, whose size matches the 32 output channels.

def bias_variable(shape):
    # every bias starts at the constant 0.1
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

b_conv1 = bias_variable([32])
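To see why a [5,5,1,32] kernel with a [32] bias turns a 28*28*1 image into a 28*28*32 tensor, here is a deliberately naive NumPy sketch of a stride-1 convolution with zero padding that preserves the spatial size. It is an illustration under assumptions, not the TensorFlow implementation: the function conv2d_same is hypothetical, it handles a single image rather than a batch, and the weights are random stand-ins.

```python
import numpy as np

def conv2d_same(image, kernel, bias):
    """Naive stride-1 convolution with zero padding that keeps height and width.
    image: [H, W, C_in]; kernel: [kh, kw, C_in, C_out]; bias: [C_out]."""
    h, w, _ = image.shape
    kh, kw, _, c_out = kernel.shape
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    pad_bottom, pad_right = kh - 1 - pad_top, kw - 1 - pad_left
    padded = np.pad(image, ((pad_top, pad_bottom), (pad_left, pad_right), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + kh, j:j + kw, :]   # 5*5 window around pixel (i, j)
            out[i, j, :] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out + bias

rng = np.random.default_rng(0)
x_single = rng.random((28, 28, 1))                  # one grayscale image (stand-in)
w_demo = rng.normal(scale=0.1, size=(5, 5, 1, 32))  # same shape as w_conv1
b_demo = np.full(32, 0.1)                           # same shape as b_conv1

h_demo = conv2d_same(x_single, w_demo, b_demo)
print(h_demo.shape)   # (28, 28, 32)
```

Each of the 32 output channels is the image filtered by one 5*5 window, which is why the channel count grows while height and width stay at 28.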

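4. Build two convolutional layers. The output of each convolution goes through the relu activation function and is then pooled before becoming the input of the next layer. The first convolutional layer turns the n*28*28*1 image set into n*28*28*32, and pooling reduces that to n*14*14*32. The second convolutional layer turns the n*14*14*32 output of the first layer into n*14*14*64, and pooling reduces that to n*7*7*64.

# conv2d, max_pool_2x2 and x_image are defined in the complete code in section D

# conv1 layer
w_conv1 = kernel_variable([5, 5, 1, 32])  # kernel 5*5, in size 1, out size 32
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)  # output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1)  # output size 14*14*32

# conv2 layer
w_conv2 = kernel_variable([5, 5, 32, 64])  # kernel 5*5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)  # output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2)  # output size 7*7*64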
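The sizes quoted in step 4 follow from one rule: with 'SAME' padding (as used by conv2d and max_pool_2x2 in section D), the spatial output size is the input size divided by the stride, rounded up. The short sketch below walks through that arithmetic; the helper name same_output_size is an assumption for illustration.

```python
import math

def same_output_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))

size, shapes = 28, []
for out_channels in (32, 64):
    size = same_output_size(size, 1)   # 5*5 convolution, stride 1
    shapes.append((size, size, out_channels))
    size = same_output_size(size, 2)   # 2*2 max pooling, stride 2
    shapes.append((size, size, out_channels))

flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)   # [(28, 28, 32), (14, 14, 32), (14, 14, 64), (7, 7, 64)]
print(flat)     # 3136
```

The final value, 7*7*64 = 3136, is the width of the flattened vector that the first fully connected layer receives.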

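5. Build two fully connected layers to produce the prediction. The first layer begins by reshaping the n*7*7*64 4-D tensor left after the second pooling into an n*3136 2-D tensor (3136 = 7*7*64, the three per-image dimensions flattened into one). This n*3136 tensor is multiplied by a weight matrix of shape [3136, 1024], giving an n*1024 2-D tensor that is passed to the second layer. To counter overfitting, dropout deliberately discards part of the nodes during training, here with probability 0.5, which helps the network generalize. The second layer has a 1024*10 weight matrix; multiplying it with the first layer's output gives the n*10 result set. For a single yes/no output, sigmoid is the usual choice; for picking one of several classes, as in this example, softmax is used.

# fc1 layer
w_fc1 = kernel_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
# keep_prob is a float32 placeholder, defined in the complete code in section D
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# fc2 layer
w_fc2 = kernel_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction_CNN = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)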
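The effect of keep_prob can be illustrated with a NumPy sketch of "inverted" dropout, in which the surviving activations are scaled by 1/keep_prob so that their expected value is unchanged. This is a sketch of the technique under assumptions (hand-written dropout function, toy input of ones), not the library's implementation.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale survivors by
    1/keep_prob, so the expected value of the output matches the input."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
h = np.ones((2, 8))                 # toy activations (stand-in for h_fc1)
dropped = dropout(h, 0.5, rng)      # every entry is either 0.0 or 2.0
unchanged = dropout(h, 1.0, rng)    # keep_prob = 1 leaves the input as it is
print(dropped)
print(unchanged)
```

This is why training feeds keep_prob as 0.5 while evaluation feeds 1: at test time every node should contribute.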

6. Train the CNN. First initialize all variables. Then, at each step, take 100 images and labels from the training set and train the network, for 1001 steps in total (range(1001)).

cross_entropy_CNN = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction_CNN), reduction_indices=[1]))  # loss
train_step_CNN = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy_CNN)

with tf.Session() as sess:
    # releases before 0.12 use the older initializer name
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        # dropout is active during training: keep each fc1 node with probability 0.5
        sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
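The CNN is trained with Adam rather than plain gradient descent. As a rough guide to what a single Adam update does, here is a NumPy sketch of the textbook update rule with a learning rate of 1e-4. It follows the commonly published formulation and is an assumption-laden illustration: the function adam_step and the toy parameter values are hypothetical, and the library's internal arrangement of the same arithmetic may differ in detail.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([1.0, -2.0])        # toy parameters
grad = np.array([0.3, -40.0])    # gradients of very different magnitude
m, v = np.zeros_like(w), np.zeros_like(w)

w_new, m, v = adam_step(w, grad, m, v, t=1)
print(w - w_new)   # both entries move by about 1e-4, in the direction of the gradient sign
```

On the first step the update size is close to the learning rate for every parameter regardless of gradient magnitude, which is one reason a small rate such as 1e-4 is workable here.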

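7. Compute the model's accuracy. The algorithm is the same as for the simple network: v_xs is the set of test images and v_ys is the matching label set. The predictions y_pre computed from v_xs are compared with v_ys; a prediction counts as correct if the most probable digit matches the labelled digit, and as wrong otherwise. The per-image results are stored in correct_prediction, which is then cast to float32 and averaged to give the accuracy. Dropout is switched off during evaluation by feeding keep_prob as 1.

def compute_accuracy_CNN(v_xs, v_ys):
    global prediction_CNN
    # keep_prob: 1 disables dropout while evaluating
    y_pre = sess.run(prediction_CNN, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result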

8. Every 100 training steps, evaluate the current state of both networks on the test set and print the accuracy.

for i in range(1001):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
    sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
    if i % 100 == 0:
        print('correctness: ', i, ' is ', compute_accuracy(mnist.test.images, mnist.test.labels))
        print('correctness_CNN: ', i, ' is ', compute_accuracy_CNN(mnist.test.images, mnist.test.labels))

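C. Comparison of results: As the figure shows, the CNN's accuracy rises as training proceeds and finally reaches 0.9683 (1 would be fully correct), while the simple MNIST model appeared to overfit at around step 800, its accuracy falling from a peak of 0.8692 to 0.098. A sudden drop to roughly chance level (about 0.1 for ten classes) can also come from a numerical failure, such as NaN produced by log(0) in the cross entropy, so overfitting may not be the whole explanation. These figures differ from the log listed below (0.9714 and 0.8735 at step 1000, with no such drop), which appears to come from a different run. My computer is fairly old, with an i5 (2410M) CPU and 8 GB of RAM; training takes about 15 minutes, is demanding on the CPU, and uses a good deal of memory while the CNN trains.

In the figure, the red line is the plain neural network and the blue line is the CNN. The left panel shows the loss of both methods falling as training proceeds, with the CNN getting closer to 0 and performing better. Prediction accuracy shows the same pattern: the plain network reaches about 87%, while the CNN reaches 97%, a marked improvement. The results at each checkpoint are as follows: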

correctness: 0 is  0.147100001574
correctness_CNN: 0 is  0.12120000273
loss: 0 is  9.97904
loss_CNN: 0 is  5.7561
correctness: 100 is  0.73710000515
correctness_CNN: 100 is  0.888899981976
loss: 100 is  1.38197
loss_CNN: 100 is  0.353873
correctness: 200 is  0.805999994278
correctness_CNN: 200 is  0.930100023746
loss: 200 is  0.997057
loss_CNN: 200 is  0.235152
correctness: 300 is  0.825699985027
correctness_CNN: 300 is  0.940500020981
loss: 300 is  0.866042
loss_CNN: 300 is  0.196917
correctness: 400 is  0.847999989986
correctness_CNN: 400 is  0.951200008392
loss: 400 is  0.753898
loss_CNN: 400 is  0.165623
correctness: 500 is  0.853100001812
correctness_CNN: 500 is  0.954999983311
loss: 500 is  0.697782
loss_CNN: 500 is  0.147157
correctness: 600 is  0.860800027847
correctness_CNN: 600 is  0.960699975491
loss: 600 is  0.666501
loss_CNN: 600 is  0.137592
correctness: 700 is  0.866400003433
correctness_CNN: 700 is  0.963800013065
loss: 700 is  0.618222
loss_CNN: 700 is  0.119138
correctness: 800 is  0.868799984455
correctness_CNN: 800 is  0.967599987984
loss: 800 is  0.59465
loss_CNN: 800 is  0.108558
correctness: 900 is  0.875800013542
correctness_CNN: 900 is  0.969799995422
loss: 900 is  0.567654
loss_CNN: 900 is  0.101511
correctness: 1000 is  0.87349998951
correctness_CNN: 1000 is  0.971400022507
loss: 1000 is  0.564226
loss_CNN: 1000 is  0.0913478
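One way to read the final numbers is in terms of error rate rather than accuracy. The short sketch below only does arithmetic on the two step-1000 accuracy values copied from the log above; the variable names are assumptions.

```python
# Accuracy values copied from the step-1000 lines of the log above.
simple_acc = 0.87349998951    # 'correctness' at step 1000
cnn_acc = 0.971400022507      # 'correctness_CNN' at step 1000

simple_err = 1.0 - simple_acc   # share of test images the simple network gets wrong
cnn_err = 1.0 - cnn_acc         # share of test images the CNN gets wrong
ratio = simple_err / cnn_err

print(round(simple_err * 100, 2), round(cnn_err * 100, 2))   # error rates in percent
print(round(ratio, 2))   # how many times more errors the simple network makes
```

Seen this way, the gap between about 87% and 97% accuracy means the simple network misclassifies several times as many test images as the CNN after the same number of steps.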

D. The complete code:

from __future__ import print_function
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
import numpy as np
import matplotlib.pyplot as plt

MODEL_SAVE_PATH = "my_net/"
MODEL_NAME = "save_net.ckpt"

# digits 0 to 9, labels one-hot encoded
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

def add_layer(inputs, in_size, out_size, activation_function=None):
    # add one more layer and return the output of this layer
    W = tf.Variable(tf.random_normal([in_size, out_size]))
    b = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wb = tf.matmul(inputs, W) + b
    if activation_function is None:
        outputs = Wb
    else:
        outputs = activation_function(Wb)
    return outputs

def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys})
    return result

def compute_accuracy_CNN(v_xs, v_ys):
    global prediction_CNN
    # keep_prob: 1 disables dropout while evaluating
    y_pre = sess.run(prediction_CNN, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result

def kernel_variable(shape):
    initial = tf.truncated_normal(shape=shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # stride [1, x_movement, y_movement, 1]
    # stride[0] and stride[3] must be 1
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # stride [1, x_movement, y_movement, 1]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# define placeholders for the inputs to the network
xs = tf.placeholder(tf.float32, [None, 784])  # 28*28
ys = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)
x_image = tf.reshape(xs, [-1, 28, 28, 1])

# conv1 layer
w_conv1 = kernel_variable([5, 5, 1, 32])  # kernel 5*5, in size 1, out size 32
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)  # output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1)  # output size 14*14*32

# conv2 layer
w_conv2 = kernel_variable([5, 5, 32, 64])  # kernel 5*5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)  # output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2)  # output size 7*7*64

# fc1 layer
w_fc1 = kernel_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# fc2 layer
w_fc2 = kernel_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction_CNN = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)

# add output layer of the simple network
prediction = add_layer(xs, 784, 10, activation_function=tf.nn.softmax)

# the error between prediction and real data
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1]))  # loss
cross_entropy_CNN = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction_CNN), reduction_indices=[1]))  # loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step_CNN = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy_CNN)

saver = tf.train.Saver()  # define a saver for saving and restoring

# one slot per evaluation (every 100 steps, steps 0 to 1000)
Total_test_loss = np.zeros((int(1001/100)+1), float)
Total_test_loss_CNN = np.zeros((int(1001/100)+1), float)
Total_test_acc = np.zeros((int(1001/100)+1), float)
Total_test_acc_CNN = np.zeros((int(1001/100)+1), float)
count = 0

with tf.Session() as sess:
    # releases before 0.12 use the older initializer name
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
        sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        if i % 100 == 0:
            Total_test_acc[count] = compute_accuracy(mnist.test.images, mnist.test.labels)
            Total_test_acc_CNN[count] = compute_accuracy_CNN(mnist.test.images, mnist.test.labels)
            print('correctness:         ', i, ' \tis \t', Total_test_acc[count])
            print('correctness_CNN:     ', i, ' \tis \t', Total_test_acc_CNN[count])
            loss = sess.run(cross_entropy, feed_dict={xs: mnist.test.images, ys: mnist.test.labels, keep_prob: 1.0})
            loss_CNN = sess.run(cross_entropy_CNN,
                                feed_dict={xs: mnist.test.images, ys: mnist.test.labels, keep_prob: 1.0})
            print('loss:                ', i, ' \tis \t', loss)
            print('loss_CNN:            ', i, ' \tis \t', loss_CNN)
            Total_test_loss[count] = loss
            Total_test_loss_CNN[count] = loss_CNN
            count += 1

    # make sure the checkpoint directory exists before saving
    if not os.path.exists(MODEL_SAVE_PATH):
        os.makedirs(MODEL_SAVE_PATH)
    saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), write_meta_graph=False)

    # plotting: red is the simple network, blue is the CNN
    plt.figure(1, figsize=(15, 5))
    plt.subplot(121)
    # plt.scatter(x, y)
    plt.ylabel('Compare Losses')
    plt.plot(Total_test_loss, 'r-', lw=5)
    plt.plot(Total_test_loss_CNN, 'b-', lw=5)
    plt.text(-1, -1, 'Loss Chart')
    plt.subplot(122)
    # plt.scatter(x, y)
    plt.ylabel('Compare Accuracy:')
    plt.plot(Total_test_acc, 'r-', lw=5)
    plt.plot(Total_test_acc_CNN, 'b-', lw=5)
    plt.text(-1, -1, 'Accuracy Chart')
    plt.show()
