Notes on the TensorFlow Official Tutorial (5): A Feed-Forward Neural Network


Tutorial source: the Chinese edition of the official TensorFlow documentation (TensorFlow官方文档中文版).

This post builds a feed-forward neural network on TensorFlow as a brief introduction to how TensorFlow works.

The code lives in \examples\tutorials\mnist\ and uses two files: mnist.py and fully_connected_feed.py.

First look at mnist.py, which builds a fully connected feed-forward network. It consists of four parts: inference(), loss(), training(), and evaluation().

inference() builds as much of the model as is needed to run the network forward and produce predictions:

def inference(images, hidden1_units, hidden2_units):
  """Build the MNIST model up to where it may be used for inference.

  Args:
    images: Images placeholder, from inputs().
    hidden1_units: Size of the first hidden layer.
    hidden2_units: Size of the second hidden layer.

  Returns:
    softmax_linear: Output tensor with the computed logits.
  """
  # Hidden 1
  with tf.name_scope('hidden1'):
    weights = tf.Variable(
        tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                            stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
        name='weights')  # weights drawn from a truncated normal with stddev = 1/sqrt(input size)
    biases = tf.Variable(tf.zeros([hidden1_units]),
                         name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
  # Hidden 2
  with tf.name_scope('hidden2'):
    weights = tf.Variable(
        tf.truncated_normal([hidden1_units, hidden2_units],
                            stddev=1.0 / math.sqrt(float(hidden1_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden2_units]),
                         name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  # Linear
  with tf.name_scope('softmax_linear'):
    weights = tf.Variable(
        tf.truncated_normal([hidden2_units, NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(hidden2_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([NUM_CLASSES]),
                         name='biases')
    logits = tf.matmul(hidden2, weights) + biases
  return logits

tf.name_scope() defines three scopes named "hidden1", "hidden2", and "softmax_linear"; the ops and variables created inside a scope get the scope name as a prefix. How does this differ from the tf.variable_scope() used in earlier posts? Many comparisons exist online, but the short version is this: tf.name_scope() is typically paired with tf.Variable() to create variables, while tf.variable_scope() is paired with tf.get_variable(). The difference between tf.Variable() and tf.get_variable() is that variables created with tf.get_variable() can be shared (reused), whereas those created with tf.Variable() cannot.
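A minimal sketch (not from mnist.py; names are made up, assuming TensorFlow 1.x) of the naming and sharing behaviour:

import tensorflow as tf

with tf.name_scope('ns'):
    v1 = tf.Variable(tf.zeros([1]), name='w')   # name: 'ns/w:0' (name_scope prefixes tf.Variable)
    v2 = tf.get_variable('w', shape=[1])        # name: 'w:0'    (tf.get_variable ignores name_scope)

with tf.variable_scope('vs'):
    v3 = tf.get_variable('w', shape=[1])        # name: 'vs/w:0'
with tf.variable_scope('vs', reuse=True):
    v4 = tf.get_variable('w', shape=[1])        # reuses the existing 'vs/w' variable

print(v3 is v4)  # True: the variable is shared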

For initialization, the weights are drawn from a truncated normal distribution whose standard deviation is one over the square root of the input dimension, and the biases are initialized to the constant 0. "hidden1" and "hidden2" use the ReLU activation; "softmax_linear" has an output dimension equal to the number of classes (here 10), so its output maps directly onto the classification.
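As a quick check of the numbers (assuming the usual 28x28 MNIST images, so IMAGE_PIXELS = 784), the stddev used for the first hidden layer works out to about 0.036:

import math

IMAGE_PIXELS = 28 * 28  # 784, as defined in mnist.py
print(1.0 / math.sqrt(float(IMAGE_PIXELS)))  # approximately 0.0357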


loss() defines the model's loss function:

def loss(logits, labels):
  """Calculates the loss from the logits and the labels.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size].

  Returns:
    loss: Loss tensor of type float.
  """
  labels = tf.to_int64(labels)
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits, labels, name='xentropy')
  loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
  return loss
logits has shape [batch_size, NUM_CLASSES], giving each sample's output for every class in the batch, while labels has shape [batch_size], holding each sample's class index. The tutorial text mentions one-hot encoding the labels, but that does not appear in the code: unlike the tutorial, the loss used here is sparse_softmax_cross_entropy_with_logits, and tf.nn.sparse_softmax_cross_entropy_with_logits() does not require one-hot labels. After computing the per-example cross entropy, the mean is taken as the loss for the whole batch.
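A small illustration (made-up values, assuming TensorFlow 1.x) of how the sparse op with integer labels matches the dense op applied to one-hot labels:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])   # [batch_size=2, NUM_CLASSES=3]
labels = tf.constant([0, 1])              # class indices, shape [batch_size]

sparse_xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
dense_xent = tf.nn.softmax_cross_entropy_with_logits(
    labels=tf.one_hot(labels, depth=3), logits=logits)

with tf.Session() as sess:
    a, b = sess.run([sparse_xent, dense_xent])
    print(a, b)  # the two per-example losses are numerically identical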

training() creates a training op:

def training(loss, learning_rate):
  """Sets up the training Ops.

  Creates a summarizer to track the loss over time in TensorBoard.
  Creates an optimizer and applies the gradients to all trainable variables.

  The Op returned by this function is what must be passed to the
  `sess.run()` call to cause the model to train.

  Args:
    loss: Loss tensor, from loss().
    learning_rate: The learning rate to use for gradient descent.

  Returns:
    train_op: The Op for training.
  """
  # Add a scalar summary for the snapshot loss.
  tf.summary.scalar('loss', loss)
  # Create the gradient descent optimizer with the given learning rate.
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  # Create a variable to track the global step.
  global_step = tf.Variable(0, name='global_step', trainable=False)
  # Use the optimizer to apply the gradients that minimize the loss
  # (and also increment the global step counter) as a single training step.
  train_op = optimizer.minimize(loss, global_step=global_step)
  return train_op

tf.summary.scalar() adds a summary for the computed loss; later, a SummaryWriter can write these summaries to an events file, which TensorBoard can visualize. Gradient descent is used to minimize the loss, with a global_step variable recording the global training step. global_step is not trainable; its value is incremented by 1 each time the training op runs.
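A minimal standalone sketch (not part of the tutorial code, assuming TensorFlow 1.x) showing that minimize() bumps global_step once per training step:

import tensorflow as tf

x = tf.Variable(3.0)
loss = tf.square(x)
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
    loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        sess.run(train_op)       # each run applies one gradient step
    print(sess.run(global_step))  # 3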

evaluation() evaluates the model's output:

def evaluation(logits, labels):
  """Evaluate the quality of the logits at predicting the label.

  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size], with values in the
      range [0, NUM_CLASSES).

  Returns:
    A scalar int32 tensor with the number of examples (out of batch_size)
    that were predicted correctly.
  """
  # For a classifier model, we can use the in_top_k Op.
  # It returns a bool tensor with shape [batch_size] that is true for
  # the examples where the label is in the top k (here k=1)
  # of all logits for that example.
  correct = tf.nn.in_top_k(logits, labels, 1)
  # Return the number of true entries.
  return tf.reduce_sum(tf.cast(correct, tf.int32))

Evaluation uses in_top_k with k = 1, and the boolean results are summed over the batch to get the number of correct predictions.
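A small illustration (made-up values, assuming TensorFlow 1.x) of what in_top_k with k = 1 computes:

import tensorflow as tf

logits = tf.constant([[0.1, 0.9, 0.0],    # predicted class 1
                      [0.8, 0.1, 0.1],    # predicted class 0
                      [0.2, 0.3, 0.5]])   # predicted class 2
labels = tf.constant([1, 2, 2])           # true classes

correct = tf.nn.in_top_k(logits, labels, 1)            # [True, False, True]
num_correct = tf.reduce_sum(tf.cast(correct, tf.int32))

with tf.Session() as sess:
    print(sess.run(num_correct))  # 2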


Now look at fully_connected_feed.py:

def placeholder_inputs(batch_size):
  """Generate placeholder variables to represent the input tensors.

  These placeholders are used as inputs by the rest of the model building
  code and will be fed from the downloaded data in the .run() loop, below.

  Args:
    batch_size: The batch size will be baked into both placeholders.

  Returns:
    images_placeholder: Images placeholder.
    labels_placeholder: Labels placeholder.
  """
  # Note that the shapes of the placeholders match the shapes of the full
  # image and label tensors, except the first dimension is now batch_size
  # rather than the full size of the train or test data sets.
  images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                         mnist.IMAGE_PIXELS))
  labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
  return images_placeholder, labels_placeholder
This creates the placeholders for the images and labels.

def fill_feed_dict(data_set, images_pl, labels_pl):
  """Fills the feed_dict for training the given step.

  A feed_dict takes the form of:
  feed_dict = {
      <placeholder>: <tensor of values to be passed for placeholder>,
      ....
  }

  Args:
    data_set: The set of images and labels, from input_data.read_data_sets()
    images_pl: The images placeholder, from placeholder_inputs().
    labels_pl: The labels placeholder, from placeholder_inputs().

  Returns:
    feed_dict: The feed dictionary mapping from placeholders to values.
  """
  # Create the feed_dict for the placeholders filled with the next
  # `batch size` examples.
  images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                                 FLAGS.fake_data)
  feed_dict = {
      images_pl: images_feed,
      labels_pl: labels_feed,
  }
  return feed_dict
This reads the next_batch of images and labels from data_set and builds a feed_dict that feeds them to the image and label placeholders.
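The returned dictionary is passed straight to sess.run(); this is exactly the pattern run_training() below uses at every training step:

feed_dict = fill_feed_dict(data_sets.train,
                           images_placeholder,
                           labels_placeholder)
_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)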

def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.

  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
  steps_per_epoch = data_set.num_examples // FLAGS.batch_size
  num_examples = steps_per_epoch * FLAGS.batch_size
  for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set,
                               images_placeholder,
                               labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
  precision = true_count / num_examples
  print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
        (num_examples, true_count, precision))

This runs evaluation over the whole dataset, counts the correct predictions, and prints the result.

def run_training():
  """Train MNIST for a number of steps."""
  # Get the sets of images and labels for training, validation, and
  # test on MNIST.
  data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

  # Tell TensorFlow that the model will be built into the default Graph.
  with tf.Graph().as_default():
    # Generate placeholders for the images and labels.
    images_placeholder, labels_placeholder = placeholder_inputs(
        FLAGS.batch_size)

    # Build a Graph that computes predictions from the inference model.
    logits = mnist.inference(images_placeholder,
                             FLAGS.hidden1,
                             FLAGS.hidden2)

    # Add to the Graph the Ops for loss calculation.
    loss = mnist.loss(logits, labels_placeholder)

    # Add to the Graph the Ops that calculate and apply gradients.
    train_op = mnist.training(loss, FLAGS.learning_rate)

    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = mnist.evaluation(logits, labels_placeholder)

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()

    # Add the variable initializer Op.
    init = tf.global_variables_initializer()

    # Create a saver for writing training checkpoints.
    saver = tf.train.Saver(write_version=tf.train.SaverDef.V2)

    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.train.SummaryWriter(FLAGS.log_dir, sess.graph)

    # And then after everything is built:

    # Run the Op to initialize the variables.
    sess.run(init)

    # Start the training loop.
    for step in xrange(FLAGS.max_steps):
      start_time = time.time()

      # Fill a feed dictionary with the actual set of images and labels
      # for this particular training step.
      feed_dict = fill_feed_dict(data_sets.train,
                                 images_placeholder,
                                 labels_placeholder)

      # Run one step of the model.  The return values are the activations
      # from the `train_op` (which is discarded) and the `loss` Op.  To
      # inspect the values of your Ops or variables, you may include them
      # in the list passed to sess.run() and the value tensors will be
      # returned in the tuple from the call.
      _, loss_value = sess.run([train_op, loss],
                               feed_dict=feed_dict)

      duration = time.time() - start_time

      # Write the summaries and print an overview fairly often.
      if step % 100 == 0:
        # Print status to stdout.
        print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
        # Update the events file.
        summary_str = sess.run(summary, feed_dict=feed_dict)
        summary_writer.add_summary(summary_str, step)
        summary_writer.flush()

      # Save a checkpoint and evaluate the model periodically.
      if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_file = os.path.join(FLAGS.log_dir, 'model.ckpt')
        saver.save(sess, checkpoint_file, global_step=step)
        # Evaluate against the training set.
        print('Training Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.train)
        # Evaluate against the validation set.
        print('Validation Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.validation)
        # Evaluate against the test set.
        print('Test Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.test)
This ties all of the functions together into the full training procedure. The flow is straightforward; the lines worth a closer look are the following:

# Build the summary Tensor based on the TF collection of Summaries.
summary = tf.summary.merge_all()

In mnist.py, training() adds a summary for the computed loss via tf.summary.scalar(); here tf.summary.merge_all() merges all collected summaries into a single op.

summary_writer = tf.train.SummaryWriter(FLAGS.log_dir, sess.graph)
This initializes a summary_writer that writes the graph and the merged summaries to the events file.

# Write the summaries and print an overview fairly often.
if step % 100 == 0:
  # Print status to stdout.
  print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
  # Update the events file.
  summary_str = sess.run(summary, feed_dict=feed_dict)
  summary_writer.add_summary(summary_str, step)
  summary_writer.flush()

Every 100 steps, the loss is printed, the merged summary is evaluated and added to the writer's buffer, and flush() pushes the buffered events out to the file on disk.

# Create a saver for writing training checkpoints.
saver = tf.train.Saver(write_version=tf.train.SaverDef.V2)

This creates a saver for writing checkpoint files; its use appears in the code below. A saver can also restore variables from a checkpoint. A checkpoint file is a binary file that maps variable names to the corresponding tensor values. Restoring variables from the checkpoints under checkpoint_dir looks like this:

saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
else:
    pass  # no checkpoint found


# Save a checkpoint and evaluate the model periodically.
if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
  checkpoint_file = os.path.join(FLAGS.log_dir, 'model.ckpt')
  saver.save(sess, checkpoint_file, global_step=step)
  # Evaluate against the training set.
  print('Training Data Eval:')
  do_eval(sess,
          eval_correct,
          images_placeholder,
          labels_placeholder,
          data_sets.train)
  # Evaluate against the validation set.
  print('Validation Data Eval:')
  do_eval(sess,
          eval_correct,
          images_placeholder,
          labels_placeholder,
          data_sets.validation)
  # Evaluate against the test set.
  print('Test Data Eval:')
  do_eval(sess,
          eval_correct,
          images_placeholder,
          labels_placeholder,
          data_sets.test)

Every 1000 steps (and at the final step), a checkpoint is saved and the model is evaluated three times: on the training data, the validation data, and the test data.

That is the complete workflow of a feed-forward network: building the model, reading the data, training, evaluating, and saving.

From that directory, run:

python fully_connected_feed.py

The training output looks like this:



Since my Anaconda is installed on drive E, a tmp folder appears there: the E:/tmp/tensorflow/mnist/ directory contains two folders, input and logs. input holds the MNIST dataset, and logs holds the checkpoint and events files written by fully_connected_feed.

Then switch to the /tensorflow/tensorboard/ directory and run:

python tensorboard.py --logdir=E:/tmp/tensorflow/mnist/fully_connected_feed
TensorBoard then prints an address:


Open that address in a browser to see TensorBoard:


There you can see how the loss changes over the course of training.
