Deep Learning Notes: The TensorFlow Deep Learning Framework (II)


I. Learning resources:

  1. https://www.tensorflow.org/versions/r0.12/tutorials/index.html
  2. http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/overview.html
  3. http://www.tensorfly.cn/tfdoc/tutorials/mnist_beginners.html

II. Tutorial:

Contents:
1. MNIST tutorial for machine-learning beginners
2. MNIST advanced tutorial for machine-learning experts
3. TensorFlow usage guide (using MNIST as an example)
4. Simple machine learning with tf.contrib.learn

MNIST For ML Beginners:
This tutorial is intended for readers who are new to both machine learning and TensorFlow. If you already know what MNIST is and what softmax (multinomial logistic) regression is, you might prefer the faster-paced tutorial at https://www.tensorflow.org/versions/r0.12/tutorials/mnist/pros/index.html. Be sure to install TensorFlow before starting either tutorial.

When one learns how to program, there is a tradition that the first thing you do is print "Hello World." Just as programming has Hello World, machine learning has MNIST.

MNIST is a simple computer-vision dataset. It consists of images of handwritten digits like these:
[figure: sample MNIST images of handwritten digits]

It also includes a label for each image, telling us which digit it is. For example, the labels for the images above are 5, 0, 4, and 1.

In this tutorial, we are going to train a model to look at images and predict what digits they are. Our goal is not to train a really elaborate model that achieves state-of-the-art performance (although we will give you code to do that later!), but rather to dip a toe into using TensorFlow. As such, we are going to start with a very simple model called a softmax regression.

The actual code for this tutorial is very short, and all the interesting work happens in just three lines. However, it is very important to understand the ideas behind it: both how TensorFlow works and the core machine-learning concepts. Because of this, we are going to work through the code very carefully.

About this tutorial:
This tutorial is a line-by-line explanation of what happens in the mnist_softmax.py code. The source is here: https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/examples/tutorials/mnist/mnist_softmax.py

Tutorial goals:
1. Learn about the MNIST data and softmax regression
2. Create a model that recognizes digits based on every pixel in an image
3. Use TensorFlow to train the model on thousands of examples and have it recognize digits
4. Check the model's accuracy

The MNIST Data:
1. Getting the data: Yann LeCun's website (http://yann.lecun.com/exdb/mnist/)
2. If you use the code from GitHub (https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/examples/tutorials/mnist/mnist_softmax.py), the following two lines download and load the data:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

I modified this slightly; copying just two files into the project is enough to download the required data. [figure: the two files to copy]
3. Data layout: the MNIST data is split into three parts: mnist.train with 55,000 training examples, mnist.test with 10,000 test examples, and mnist.validation with 5,000 validation examples. This split is important: in machine learning you need a separate test set that is never used for training, so you can evaluate the model's performance and be more confident that it generalizes to other data. A quick way to check these splits is sketched below.
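As a minimal sketch (assuming the mnist object loaded by the two lines above), you can verify the split sizes and tensor shapes like this:

# Number of examples in each split: 55000 / 10000 / 5000
print(mnist.train.num_examples, mnist.test.num_examples, mnist.validation.num_examples)

# Each image is stored as a flattened 784-vector, each label as a 10-vector
print(mnist.train.images.shape)   # (55000, 784)
print(mnist.train.labels.shape)   # (55000, 10)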

As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We will call the images "x" and the labels "y". Both the training set and the test set contain images and their corresponding labels; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

Each image is 28 pixels by 28 pixels. We can interpret this as a big array of numbers:
[figure: a 28x28 grid of pixel intensities for one digit]

We can flatten this array into a vector of 28x28 = 784 numbers. It does not matter how we flatten the array, as long as we are consistent between images. From this perspective, the MNIST images are just a bunch of points in a 784-dimensional vector space, with a very rich structure (warning: visualizations of such data are computationally intensive). A small flattening sketch follows.
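For illustration only (plain numpy; the name img is an assumption, not part of the tutorial code), flattening a 28x28 array into a 784-vector looks like this:

import numpy as np

img = np.random.rand(28, 28)   # stand-in for one MNIST image
vec = img.reshape(784)         # flatten row by row; any consistent order works
print(vec.shape)               # (784,)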

Flattening the data throws away information about the 2D structure of the image. Isn't that bad? Well, the best computer-vision methods do exploit this structure, and we will in later tutorials. But the simple method we use here, a softmax regression (defined below), will not.

The result is that mnist.train.images is a tensor (an n-dimensional array) with a shape of [55000, 784]. The first dimension indexes the images and the second dimension indexes the pixels within each image. Each entry in the tensor is the intensity of a particular pixel in a particular image, between 0 and 1.
[figure: the [55000, 784] mnist.train.images tensor]
Each image in MNIST has a corresponding label, a number between 0 and 9 representing the digit drawn in the image.

For the purposes of this tutorial, we want our labels as "one-hot vectors". A one-hot vector is a vector that is 0 in most dimensions and 1 in a single dimension. In this case, the nth digit is represented as a vector that is 1 in the nth dimension (counting from 0). For example, 3 is [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [55000, 10] array of floats.
[figure: the [55000, 10] mnist.train.labels tensor]
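As a quick illustration (plain numpy; the helper name one_hot is an assumption), building the one-hot vector for a digit looks like this:

import numpy as np

def one_hot(digit, num_classes=10):
    # A vector of zeros with a single 1 at position `digit`
    v = np.zeros(num_classes, dtype=np.float32)
    v[digit] = 1.0
    return v

print(one_hot(3))   # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]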
We are now ready to actually build our model!

Softmax Regressions:
We know that every image in MNIST is a handwritten digit between zero and nine, so there are only ten possible things a given image can be. We want to be able to look at an image and give the probability of it being each digit. For example, our model might look at a picture of a nine and be 80% sure it is a nine, but give a 5% chance of it being an eight (because of the top loop) and a little probability to all the others, because it is not 100% sure.

This is a classic case where a softmax regression is a natural, simple model. If you want to assign probabilities to an object being one of several different things, softmax is the tool to use, because softmax gives us a list of values between 0 and 1 that add up to 1. Even later, when we train more sophisticated models, the final step will be a layer of softmax.

A softmax regression has two steps: first we add up the evidence of our input being in each class, and then we convert that evidence into probabilities.

To tally up the evidence that a given image is in a particular class, we do a weighted sum of the pixel intensities. The weight is negative if a high intensity for that pixel is evidence against the image being in that class, and positive if it is evidence in favor.

The following diagram shows the weights one model learned for each of these classes. Red represents negative weights, while blue represents positive weights.
[figure: learned per-pixel weights for each of the ten digit classes]

We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independently of the input. The result is that the evidence for a class i given an input x is:

evidence_i = sum_j (W_{i,j} * x_j) + b_i
where W_i is the weights and b_i is the bias for class i, and j is an index for summing over the pixels of the input image x. We then convert the evidence tallies into our predicted probabilities y using the "softmax" function:

y = softmax(evidence)

Here softmax serves as an "activation" or "link" function, shaping the output of our linear function into the form we want: in this case, a probability distribution over 10 classes. You can think of it as converting tallies of evidence into the probability of our input being in each class. It is defined as:

softmax(x) = normalize(exp(x))

Expanding the right-hand side gives:

softmax(x)_i = exp(x_i) / sum_j exp(x_j)

But it is often more helpful to think of softmax the first way: exponentiate its inputs and then normalize them. The exponentiation means that one more unit of evidence increases the weight given to a hypothesis multiplicatively; conversely, one less unit of evidence means that a hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero or negative weight. Softmax then normalizes these weights so that they add up to one, forming a valid probability distribution. (For more intuition about the softmax function, see the section on it in Michael Nielsen's book, complete with an interactive visualization: http://neuralnetworksanddeeplearning.com/chap3.html#softmax)
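A minimal sketch of this "exponentiate, then normalize" view (plain numpy, not part of the tutorial code):

import numpy as np

def softmax(x):
    # Subtracting the max is a common numerical-stability trick; it does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # three values that sum to 1.0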

You can picture our softmax regression as looking something like the following, although with many more xs. For each output, we compute a weighted sum of the xs, add a bias, and then apply softmax.
[figure: softmax regression drawn as a small network diagram]

If we write that out as equations, we get:

y_1 = softmax(W_{1,1} x_1 + W_{1,2} x_2 + W_{1,3} x_3 + b_1)
y_2 = softmax(W_{2,1} x_1 + W_{2,2} x_2 + W_{2,3} x_3 + b_2)
y_3 = softmax(W_{3,1} x_1 + W_{3,2} x_2 + W_{3,3} x_3 + b_3)
We can "vectorize" this procedure, turning it into a matrix multiplication and vector addition. This is helpful for computational efficiency. (It is also a useful way to think.)
[figure: the same equations written as one matrix multiplication plus a vector addition]
More compactly, we can just write:

y = softmax(Wx + b)

Implementing the Regression:
To do efficient numerical computing in Python, we typically use libraries like NumPy that perform expensive operations such as matrix multiplication outside Python, using highly efficient code implemented in another language. Unfortunately, there can still be a lot of overhead from switching back to Python after every operation. This overhead is especially bad if you want to run computations on GPUs or in a distributed manner, where transferring data is costly.

TensorFlow also does its heavy lifting outside Python, but it goes a step further to avoid this overhead. Instead of running a single expensive operation independently from Python, TensorFlow lets us describe a graph of interacting operations that run entirely outside Python. (A similar approach can be seen in a few other machine-learning libraries.)

To use TensorFlow, first we need to import it:

import tensorflow as tf

We describe these interacting operations by manipulating symbolic variables. Let's create one:

x = tf.placeholder("float",[None,784])

x is not a specific value. It is a placeholder, a value that we will input when we ask TensorFlow to run a computation. We want to be able to input any number of MNIST images, each flattened into a 784-dimensional vector. We represent this as a 2-D tensor of floating-point numbers with shape [None, 784]. (Here None means that the first dimension can be of any length.) A small sketch of feeding such a placeholder follows.
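For illustration (a standalone sketch, not part of the tutorial code), the None dimension lets the same placeholder accept batches of any size:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
doubled = x * 2.0                                # any op built on the placeholder

sess = tf.Session()
small = np.zeros((3, 784), dtype=np.float32)     # batch of 3 images
large = np.zeros((128, 784), dtype=np.float32)   # batch of 128 images
print(sess.run(doubled, feed_dict={x: small}).shape)   # (3, 784)
print(sess.run(doubled, feed_dict={x: large}).shape)   # (128, 784)
sess.close()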

We also need weights and biases for our model. We could imagine treating these like additional inputs (placeholders), but TensorFlow has an even better way to handle them: Variable. A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine-learning applications, the model parameters are generally Variables.

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

We create these Variables by giving tf.Variable the initial value of the Variable: in this case, we initialize both W and b as tensors full of zeros. Since we are going to learn W and b, it does not matter very much what their initial values are.

Notice that W has a shape of [784, 10] because we want to multiply the 784-dimensional image vectors by it to produce 10-dimensional vectors of evidence, one entry for each class. b has a shape of [10] so we can add it to the output.

We can now implement our model. It takes only one line to define it!

y = tf.nn.softmax(tf.matmul(x,W)+b)

First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from our equation, where we had Wx, as a small trick to deal with x being a 2-D tensor holding multiple inputs. We then add b, and finally apply tf.nn.softmax.

That's it. After a couple of short setup lines, it only took us one line to define our model. That is not because TensorFlow is designed to make a softmax regression particularly easy: it is just a very flexible way to describe many kinds of numerical computation, from machine-learning models to physics simulations. And once defined, our model can run on different devices: your computer's CPU, GPUs, and even phones!

Training:
In order to train our model, we need to define what it means for the model to be good. Actually, in machine learning we typically define what it means for a model to be bad. We call this the cost, or the loss, and it represents how far our model is from the desired outcome. We try to minimize that error; the smaller the error, the better the model.

One very common, very nice loss function is "cross-entropy". Cross-entropy arises from thinking about information-compressing codes in information theory, but it winds up being an important idea in many areas, from gambling to machine learning. It is defined as:

H_{y'}(y) = -sum_i (y'_i * log(y_i))

Here y is our predicted probability distribution and y' is the true distribution (the one-hot vector with the digit labels). In some rough sense, the cross-entropy measures how inefficient our predictions are at describing the truth. Going into more detail about cross-entropy is beyond the scope of this tutorial, but it is well worth understanding (http://colah.github.io/posts/2015-09-Visual-Information/).
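A tiny numeric sketch of the formula above (plain numpy; the values are chosen only for illustration):

import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=np.float32)   # one-hot label for "3"
y_pred = np.full(10, 0.02, dtype=np.float32)
y_pred[3] = 0.82                                                       # model is fairly sure it is a 3

cross_entropy = -np.sum(y_true * np.log(y_pred))
print(cross_entropy)   # about 0.198; a worse prediction gives a larger value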

To implement cross-entropy we first need to add a new placeholder for the correct answers:

y_ = tf.placeholder(tf.float32,[None,10])

Then we can implement the cross-entropy function, -sum(y' * log(y)):

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y),reduction_indices=[1]))

First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ by the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements along the second dimension of y, because of the reduction_indices=[1] parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch.

# This is my understanding of how reduce_sum works
import tensorflow as tf

x = [[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
sess = tf.Session()
print("0:")
print(sess.run(tf.reduce_sum(x, reduction_indices=0)))
print("1:")
print(sess.run(tf.reduce_sum(x, reduction_indices=1)))
print("2:")
print(sess.run(tf.reduce_sum(x, reduction_indices=2)))
sess.close()

# Output:
# 0:
# [[ 8 10 12]
#  [14 16 18]]
# 1:
# [[ 5  7  9]
#  [17 19 21]]
# 2:
# [[ 6 15]
#  [24 33]]

My understanding of the dimensions is as follows. For

x = [[[1,2,3],[4,5,6]], [[7,8,9],[10,11,12]]]

the first dimension (reduction_indices=0) runs over the two outer sub-arrays, the second dimension (reduction_indices=1) runs over the two rows inside each sub-array, and the third dimension (reduction_indices=2) runs over the three numbers inside each row.

When reduction_indices=0, the two outer sub-arrays are added element-wise:
1+7=8, 2+8=10, 3+9=12
4+10=14, 5+11=16, 6+12=18

When reduction_indices=1, the two rows inside each sub-array are added element-wise:
1+4=5, 2+5=7, 3+6=9
7+10=17, 8+11=19, 9+12=21

When reduction_indices=2, the numbers inside each row are summed:
1+2+3=6, 4+5+6=15
7+8+9=24, 10+11+12=33

(Note that in the source code we do not use this formulation, because it is numerically unstable. Instead, we apply tf.nn.softmax_cross_entropy_with_logits to the unnormalized logits (e.g., we call softmax_cross_entropy_with_logits on tf.matmul(x, W) + b), because this more numerically stable function computes the softmax activation internally. In your own code, consider using tf.nn.(sparse_)softmax_cross_entropy_with_logits instead.)
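A minimal sketch of that more stable formulation (assuming the x, W, b, and y_ defined earlier; the keyword arguments logits= and labels= are used for clarity):

logits = tf.matmul(x, W) + b      # unnormalized scores; no softmax applied yet
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
# For predictions you can still compute y = tf.nn.softmax(logits)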

Now that we know what we want our model to do, it is very easy to have TensorFlow train it. Because TensorFlow knows the entire graph of your computations, it can automatically use the backpropagation algorithm (http://colah.github.io/posts/2015-08-Backprop/) to efficiently determine how your variables affect the loss you want to minimize. Then it can apply your chosen optimization algorithm to modify the variables and reduce the loss.

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

In this case, we ask TensorFlow to minimize cross_entropy using the gradient descent algorithm (https://en.wikipedia.org/wiki/Gradient_descent) with a learning rate of 0.5. Gradient descent is a simple procedure in which TensorFlow shifts each variable a little bit in the direction that reduces the cost. TensorFlow also provides many other optimization algorithms (https://www.tensorflow.org/versions/r0.12/api_docs/python/train.html#optimizers): using one is as simple as tweaking one line.
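For example (a hedged sketch; the learning rate of 0.001 is just a common default, not taken from this tutorial), switching to the Adam optimizer changes only that single line:

# Gradient descent, as above:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Or, with one line changed, Adam:
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)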

What TensorFlow actually does here, behind the scenes, is add new operations to your graph that implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, performs one step of gradient descent training, slightly tweaking your variables to reduce the loss.

Now we have our model set up for training. One last thing before we launch it: we have to create an operation to initialize the variables we created. Note that this defines the operation but does not run it yet:

init = tf.initialize_all_variables()

We can now launch the model in a Session, and run the operation that initializes the variables:

sess = tf.Session()
sess.run(init)

Let's train: we will run the training step 1000 times!

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

At each step of the loop, we get a "batch" of one hundred random data points from our training set. We run train_step, feeding in the batch data to replace the placeholders.

Using small batches of random data is called stochastic training, in this case stochastic gradient descent. Ideally we would like to use all our data for every step of training, because that would give us a better sense of what we should be doing, but that is expensive. So instead we use a different subset each time. Doing this is cheap and has much of the same benefit.

Evaluating Our Model:

Well, first let's figure out where we predicted the correct label. tf.argmax is an extremely useful function that gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check whether our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))

That gives us a list of booleans. To determine what fraction are correct, we cast them to floating-point numbers and then take the mean. For example, [True, False, True, True] becomes [1,0,1,1], which averages to 0.75.
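A quick standalone check of that example (the values come from the sentence above, not from the model; this reuses the sess opened earlier):

bools = tf.constant([True, False, True, True])
print(sess.run(tf.reduce_mean(tf.cast(bools, tf.float32))))   # 0.75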

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Finally, we ask for the accuracy of our model on the test data.

print(sess.run(accuracy,feed_dict={x:mnist.test.images,y_:mnist.test.labels}))

This should be about 92%.

Is that good? Well, not really. In fact, it is pretty bad. This is because we are using a very simple model. With some small changes we can get to 97%, and the best models can reach over 99.7% accuracy! (For more information, have a look at this list of results: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html)

What matters is what we learned from this model. Still, if you are feeling a bit down about these results, check out the next tutorial, where we do much better and learn how to build more sophisticated models with TensorFlow!
The complete code is shown below:

#coding=UTF-8
# Full example, restructured to feed raw logits into the numerically stable loss described above.
import data.input_data as input_data    # the author's local copy of the MNIST input_data helper
import tensorflow as tf

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.InteractiveSession()

# Train
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_x, batch_y = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})

# Evaluate on the test set
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))