[action] TensorFlow Deep Learning in Practice (4): Implementing a Simple Convolutional Neural Network

Source: Internet | Editor: 程序博客网 | Time: 2024/05/23 23:33

We start by loading the data that has already been cleaned.

The cleaning process is described in the earlier post, Practice (1).

# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

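Next, the data has to be converted to the right shape.

A convolutional network expects each image as a three-dimensional array (height, width, channels), so the dataset needs to be reshaped (using np.reshape).

In addition, all labels are converted to one-hot encoding.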

image_size = 28
num_labels = 10
num_channels = 1  # grayscale

# The images are grayscale already, so we only need to add a channel dimension.
def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    # One-hot encode the labels: compare each label with every class index.
    labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

def accuracy(predictions, labels):
    # Percentage of rows whose predicted class matches the one-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

This gives us the following data:

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
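The one-hot conversion inside reformat relies on NumPy broadcasting, which can be hard to read at first. A minimal sketch on three made-up labels (the values are purely illustrative and not taken from notMNIST):

```python
import numpy as np

num_labels = 10
labels = np.array([3, 0, 9])  # toy integer class labels, invented for illustration

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Broadcasting compares every label with every class index, giving shape (3, 10).
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)
print(one_hot.argmax(axis=1))  # taking argmax recovers the integer labels
```

Each row contains a single 1.0 at the column of its class, which is why accuracy can compare argmax of the predictions with argmax of the labels.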

1. A simple convolutional neural network

TensorFlow provides the convolution function nn.conv2d.

Let's look at its documentation to learn how to use it.

Signature: tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
Docstring:
Computes a 2-D convolution given 4-D input and filter tensors.

Given an input tensor of shape [batch, in_height, in_width, in_channels]
and a filter / kernel tensor of shape
[filter_height, filter_width, in_channels, out_channels], this op
performs the following:

1. Flattens the filter to a 2-D matrix with shape
   [filter_height * filter_width * in_channels, output_channels].
2. Extracts image patches from the input tensor to form a virtual
   tensor of shape
   [batch, out_height, out_width, filter_height * filter_width * in_channels].
3. For each patch, right-multiplies the filter matrix and the image patch
   vector.

In detail, with the default NHWC format,

    output[b, i, j, k] =
        sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                        filter[di, dj, q, k]

Must have strides[0] = strides[3] = 1. For the most common case of the same
horizontal and vertices strides, strides = [1, stride, stride, 1].

Args:
  input: A Tensor. Must be one of the following types: half, float32, float64.
  filter: A Tensor. Must have the same type as input.
  strides: A list of ints. 1-D of length 4. The stride of the sliding window
    for each dimension of input. Must be in the same order as the dimension
    specified with format.
  padding: A string from: "SAME", "VALID". The type of padding algorithm to use.
  use_cudnn_on_gpu: An optional bool. Defaults to True.
  data_format: An optional string from: "NHWC", "NCHW". Defaults to "NHWC".
    Specify the data format of the input and output data. With the default
    format "NHWC", the data is stored in the order of:
    [batch, in_height, in_width, in_channels]. Alternatively, the format
    could be "NCHW", the data storage order of:
    [batch, in_channels, in_height, in_width].
  name: A name for the operation (optional).

Returns:
  A Tensor. Has the same type as input.
Type: function

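input: the batch of image data, a 4-D tensor here.

filter: the filter, also called the weight matrix or kernel. It must have the same type as input.

strides: the step sizes, a list of four integers. The first and fourth must be 1; the middle two give the vertical and horizontal scanning intervals in pixels.

padding: how the borders of the input are handled, either 'VALID' or 'SAME'. With 'SAME' the input is zero-padded so that the output size is ceil(in_size / stride); with 'VALID' no padding is added and the output size is ceil((in_size - filter_size + 1) / stride).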
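How strides and padding together decide the spatial size of the output can be checked with a few lines of plain Python. This is a sketch only: conv_output_size is a hypothetical helper written for this post, not a TensorFlow function, and it assumes the usual 'SAME' / 'VALID' size rules.

```python
def conv_output_size(in_size, filter_size, stride, padding):
    """Output size along one spatial dimension (hypothetical helper)."""
    if padding == 'SAME':
        # Zero padding is added, so only the stride shrinks the output.
        span = in_size
    elif padding == 'VALID':
        # No padding: the filter must fit entirely inside the input.
        span = in_size - filter_size + 1
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return (span + stride - 1) // stride  # integer ceil(span / stride)

# A 28-pixel side, a 5x5 filter and stride 2, as used later in this post:
print(conv_output_size(28, 5, 2, 'SAME'))   # 14
print(conv_output_size(14, 5, 2, 'SAME'))   # 7
print(conv_output_size(28, 5, 2, 'VALID'))  # 12
```

With 'SAME' the filter size drops out of the calculation, which is why the models below go from 28 to 14 to 7 regardless of patch_size.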

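The most important thing is to understand how conv2d transforms its arguments, so that we know how to set them up to meet its requirements.

Inside conv2d, suppose the four dimensions of input are [batch, in_height, in_width, in_channels]

and the four dimensions of filter are [filter_height, filter_width, in_channels, out_channels].

Once inside conv2d, filter is flattened into a 2-D array.

The new 2-D array A has shape [filter_height * filter_width * in_channels, out_channels].

Next, input stays a 4-D array, but its dimensions change: image patches are extracted from it.

The new 4-D array B has shape [batch, out_height, out_width, filter_height * filter_width * in_channels].

Finally the product B * A is computed. The last dimension of B is contracted with the first dimension of A, giving an output of shape [batch, out_height, out_width, out_channels].

So when defining filter, pay attention to how its dimensions relate to those of input (in particular, in_channels must match); otherwise conv2d cannot run.

Once this is understood, setting up your own convolutional network is straightforward.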
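The flatten, extract-patches, multiply sequence can be reproduced in NumPy to make the shapes concrete. The following is a deliberately slow sketch for 'VALID' padding only; conv2d_valid is a name made up here, and this is not how TensorFlow implements the op internally.

```python
import numpy as np

def conv2d_valid(inputs, filters, stride=1):
    """Naive NHWC convolution with 'VALID' padding, done as a matrix product."""
    batch, in_h, in_w, in_c = inputs.shape
    f_h, f_w, f_c, out_c = filters.shape
    assert in_c == f_c, "in_channels of input and filter must match"

    out_h = (in_h - f_h) // stride + 1
    out_w = (in_w - f_w) // stride + 1

    # A: the filter flattened to [f_h * f_w * in_c, out_c].
    A = filters.reshape(f_h * f_w * in_c, out_c)

    # B: one flattened patch per output position,
    # shape [batch, out_h, out_w, f_h * f_w * in_c].
    B = np.empty((batch, out_h, out_w, f_h * f_w * in_c), dtype=inputs.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = inputs[:, i * stride:i * stride + f_h,
                           j * stride:j * stride + f_w, :]
            B[:, i, j, :] = patch.reshape(batch, -1)

    # B * A: the last axis of B is contracted with the first axis of A.
    return np.dot(B, A)  # [batch, out_h, out_w, out_c]

rng = np.random.RandomState(0)
x = rng.rand(2, 28, 28, 1)   # two random stand-in "images"
w = rng.rand(5, 5, 1, 16)    # 5x5 filter, 1 input channel, 16 output channels
y = conv2d_valid(x, w, stride=2)
print(y.shape)

# Check one output element against the summation formula in the docstring.
direct = np.sum(x[0, 6:11, 8:13, :] * w[:, :, :, 7])
print(np.allclose(y[0, 3, 4, 7], direct))
```

If the in_channels of the filter did not match the last dimension of the input, the rows of A and the last axis of B would have different lengths and the product would be undefined, which is the shape constraint described above.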

batch_size = 128
patch_size = 5   # side length of the square convolution kernel (5x5)
depth = 16       # number of output channels of each convolutional layer
num_hidden = 64  # number of nodes in the fully connected hidden layer

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))  # num_channels=1, grayscale
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))

    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

    # Two stride-2 convolutions shrink 28 -> 14 -> 7, so the flattened size is 7 * 7 * depth.
    layer3_weights = tf.Variable(tf.truncated_normal(
        [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

    # Model.
    def model(data):
        # data (batch, 28, 28, 1)
        # weights reshaped to (patch_size*patch_size*num_channels, depth)
        # data reshaped to (batch, 14, 14, patch_size*patch_size*num_channels)
        # conv shape (batch, 14, 14, depth)
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')  # convolution
        hidden = tf.nn.relu(conv + layer1_biases)
        # weights shape (patch_size, patch_size, depth, depth)
        # weights reshaped into (patch_size*patch_size*depth, depth)
        # hidden reshaped into (batch, 7, 7, patch_size*patch_size*depth)
        # conv shape (batch, 7, 7, depth)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')  # convolution
        hidden = tf.nn.relu(conv + layer2_biases)
        # hidden shape (batch, 7, 7, depth)
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        # reshape (batch, 7*7*depth)
        # weights shape (7*7*depth, num_hidden)
        # hidden shape (batch, num_hidden)
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        # return tensor (batch, num_labels)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    # Named arguments: TensorFlow 1.x rejects positional arguments here.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        # Walk through the training set one minibatch at a time.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

2. Pooling

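Compared with shrinking the feature map through a strided convolution alone, pooling looks at every value inside each window before downsampling, so less of the input information is lost; the price is extra computation.

TensorFlow provides both max pooling and average pooling functions. They work in much the same way, except that one takes the maximum of each window and the other takes the mean.

The documentation of nn.max_pool is as follows: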

Signature: tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)
Docstring:
Performs the max pooling on the input.

Args:
  value: A 4-D Tensor with shape [batch, height, width, channels] and
    type tf.float32.
  ksize: A list of ints that has length >= 4. The size of the window for
    each dimension of the input tensor.
  strides: A list of ints that has length >= 4. The stride of the sliding
    window for each dimension of the input tensor.
  padding: A string, either 'VALID' or 'SAME'. The padding algorithm.
    See the comment here
  data_format: A string. 'NHWC' and 'NCHW' are supported.
  name: Optional name for the operation.

Returns:
  A Tensor with type tf.float32. The max pooled output tensor.
Type: function

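ksize stands for kernel size. Suppose ksize = [1, 2, 2, 1]. It defines the window over which the maximum is taken: for each output position, all pixels inside the corresponding 2 * 2 window are visited and the largest one is returned.

strides means the same as in conv2d: the step with which the window scans the input. Together with padding, it determines the dimensions of the tensor that max_pool returns.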
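To see the window and the stride at work, here is a small NumPy sketch of 2 * 2 max pooling with stride 2 on a single 7 * 7 feature map. max_pool_2d is a made-up helper; it mimics 'SAME' padding by padding with -inf so that padded cells can never be selected as the maximum, which is an assumption about how the border should behave rather than a copy of TensorFlow's code.

```python
import numpy as np

def max_pool_2d(x, ksize=2, stride=2):
    """Max pooling over one 2-D float feature map with 'SAME'-style padding."""
    h, w = x.shape
    out_h = (h + stride - 1) // stride  # ceil(h / stride)
    out_w = (w + stride - 1) // stride
    # Total padding needed so that every window is full size.
    pad_h = max((out_h - 1) * stride + ksize - h, 0)
    pad_w = max((out_w - 1) * stride + ksize - w, 0)
    padded = np.pad(x,
                    ((pad_h // 2, pad_h - pad_h // 2),
                     (pad_w // 2, pad_w - pad_w // 2)),
                    mode='constant', constant_values=-np.inf)
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride:i * stride + ksize,
                            j * stride:j * stride + ksize]
            out[i, j] = window.max()  # largest value inside the window
    return out

x = np.arange(49, dtype=np.float32).reshape(7, 7)  # toy 7x7 feature map
pooled = max_pool_2d(x)
print(pooled.shape)  # (4, 4)
print(pooled[0, 0])  # 8.0, the maximum of the top-left 2x2 block [[0, 1], [7, 8]]
```

A 7 * 7 map pooled with stride 2 comes out as 4 * 4, matching the shape comments in the pooled model.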

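When writing a convolutional network, it is best to define the model function first (like the model function above) and then go back and fill in the dimensions of the weight matrices. This is less tidy than an ordinary neural network, where defining the weight matrices already fixes the model.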
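One way to fill in those dimensions is to trace the spatial size through the layers before declaring the weights. A minimal sketch, assuming every layer uses padding='SAME' and stride 2 as in this post (same_out is a hypothetical helper):

```python
def same_out(size, stride):
    """Output size along one dimension for padding='SAME': ceil(size / stride)."""
    return (size + stride - 1) // stride

image_size, depth = 28, 16

size = same_out(image_size, 2)   # first convolution, stride 2: 28 -> 14
size = same_out(size, 2)         # second convolution, stride 2: 14 -> 7
flat_no_pool = size * size * depth

size = same_out(size, 2)         # max pooling, stride 2: 7 -> 4
flat_with_pool = size * size * depth

print(flat_no_pool, flat_with_pool)  # 784 256
```

These two numbers are the first dimension of layer3_weights in the model without pooling and in the model with pooling, respectively.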

# Test accuracy: about 87.5%
batch_size = 128
patch_size = 5   # side length of the square convolution kernel (5x5)
depth = 16       # number of output channels of each convolutional layer
num_hidden = 64  # number of nodes in the fully connected hidden layer

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))  # num_channels=1, grayscale
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))

    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

    # After two stride-2 convolutions and one stride-2 pooling: 28 -> 14 -> 7 -> 4,
    # so the flattened size is 4 * 4 * depth.
    layer3_weights = tf.Variable(tf.truncated_normal(
        [(28 // 7) * (28 // 7) * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

    # Model.
    def model(data):
        # data (batch, 28, 28, 1)
        # weights reshaped to (patch_size*patch_size*num_channels, depth)
        # data reshaped to (batch, 14, 14, patch_size*patch_size*num_channels)
        # conv shape (batch, 14, 14, depth)
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')  # convolution
        hidden = tf.nn.relu(conv + layer1_biases)
        # weights shape (patch_size, patch_size, depth, depth)
        # weights reshaped into (patch_size*patch_size*depth, depth)
        # hidden reshaped into (batch, 7, 7, patch_size*patch_size*depth)
        # conv shape (batch, 7, 7, depth)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')  # convolution
        # conv shape (batch, 7, 7, depth)
        # print('conv1 shape', conv.get_shape().as_list())
        conv = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')  # strides change dimensions
        # print('conv2 shape', conv.get_shape().as_list())
        hidden = tf.nn.relu(conv + layer2_biases)
        # hidden shape (batch, 4, 4, depth)
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        # reshape (batch, 4*4*depth)
        # weights shape (4*4*depth, num_hidden)
        # hidden shape (batch, num_hidden)
        # print('reshape shape', reshape.get_shape().as_list())
        # print('layer3_weights', layer3_weights.get_shape().as_list())
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        # return tensor (batch, num_labels)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    # Named arguments: TensorFlow 1.x rejects positional arguments here.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        # Walk through the training set one minibatch at a time.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

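The accuracy on the test set is around 87%.

That is still not as good as an ordinary neural network.

So the architecture of a neural network matters a great deal.

Without a good architecture, even a deeper network may perform no better than simple logistic regression while wasting a lot of resources: much effort for little reward.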

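Here we can use learning rate decay to improve the test accuracy, which then reaches about 92.6%. Note that the run below also starts from a learning rate of 0.1 instead of 0.01 and trains for 20001 steps instead of 1001.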
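With its default settings, tf.train.exponential_decay computes learning_rate * decay_rate ^ (global_step / decay_steps). The schedule used here (start 0.1, decay rate 0.7 every 300 steps) can be previewed in plain Python; the function below is a stand-in written for illustration, not the TensorFlow op itself.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    """Plain-Python preview of an exponentially decayed learning rate."""
    if staircase:
        exponent = step // decay_steps        # decay in discrete jumps
    else:
        exponent = step / float(decay_steps)  # smooth decay
    return initial_rate * decay_rate ** exponent

for step in (0, 300, 600, 3000, 20000):
    print(step, exponential_decay(0.1, step, 300, 0.7))
```

After 300 steps the rate has dropped to 0.07 and after 600 steps to 0.049, so the optimizer takes large steps early on and progressively smaller ones as training proceeds.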

# Same model as before, with learning rate decay added
# Test accuracy: about 92.6%
batch_size = 128
patch_size = 5   # side length of the square convolution kernel (5x5)
depth = 16       # number of output channels of each convolutional layer
num_hidden = 64  # number of nodes in the fully connected hidden layer

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))  # num_channels=1, grayscale
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    global_step = tf.Variable(0)  # counts the optimizer steps taken so far
    # Start at 0.1 and multiply by 0.7 for every 300 steps.
    learning_rate = tf.train.exponential_decay(0.1, global_step, 300, 0.7)

    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))

    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))

    # After two stride-2 convolutions and one stride-2 pooling: 28 -> 14 -> 7 -> 4,
    # so the flattened size is 4 * 4 * depth.
    layer3_weights = tf.Variable(tf.truncated_normal(
        [(28 // 7) * (28 // 7) * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))

    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

    # Model.
    def model(data):
        # data (batch, 28, 28, 1)
        # weights reshaped to (patch_size*patch_size*num_channels, depth)
        # data reshaped to (batch, 14, 14, patch_size*patch_size*num_channels)
        # conv shape (batch, 14, 14, depth)
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')  # convolution
        hidden = tf.nn.relu(conv + layer1_biases)
        # weights shape (patch_size, patch_size, depth, depth)
        # weights reshaped into (patch_size*patch_size*depth, depth)
        # hidden reshaped into (batch, 7, 7, patch_size*patch_size*depth)
        # conv shape (batch, 7, 7, depth)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')  # convolution
        # conv shape (batch, 7, 7, depth)
        # print('conv1 shape', conv.get_shape().as_list())
        conv = tf.nn.max_pool(conv, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')  # strides change dimensions
        # print('conv2 shape', conv.get_shape().as_list())
        hidden = tf.nn.relu(conv + layer2_biases)
        # hidden shape (batch, 4, 4, depth)
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        # reshape (batch, 4*4*depth)
        # weights shape (4*4*depth, num_hidden)
        # hidden shape (batch, num_hidden)
        # print('reshape shape', reshape.get_shape().as_list())
        # print('layer3_weights', layer3_weights.get_shape().as_list())
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        # return tensor (batch, num_labels)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    # Named arguments: TensorFlow 1.x rejects positional arguments here.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))

    # Optimizer. Passing global_step makes minimize() increment it on every step.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
        loss, global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

num_steps = 20001

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        # Walk through the training set one minibatch at a time.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 300 == 0):
            print('current Learning rate', learning_rate.eval())
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

1 0