Building a simple neural network step by step with TensorFlow: linear regression


Below, the tutorial is worked through step by step on a simple example, with annotations.


A look at a very simple neural network in TensorFlow

Let's look at building a very simple neural network with TensorFlow that performs linear regression.

 

This is an introduction to working with TensorFlow. It works through an example of a very simple neural network, walking through the steps of setting up the input, adding operators, setting up gradient descent, and running the computation graph.

This tutorial presumes some familiarity with the TensorFlow computational model, which is introduced in the Hello, TensorFlow notebook, also available in this bundle; if you are not familiar with it yet, read that introduction first.

 

A simple neural network

 

Let's start with the code. We're going to construct a very simple neural network computing a linear regression between two variables, y and x. It tries to find the best w1 and w2 it can for the function y = w2*x + w1, given the data. The data we're going to give it is toy data: linear, perturbed with random noise.

 

This is what the network looks like in code:

#@test {"output":"ignore"}
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# (Jupyter/IPython magic so plots render inline.)
%matplotlib inline

# Set up the data with a noisy linear relationship between X and Y.
num_examples = 50  # 50 samples
# The first row of X runs from -2 to 4 in num_examples steps;
# the second row runs from -6 to 6.
X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])
# Add Gaussian noise to X.
X += np.random.randn(2, num_examples)
# Split into x and y.
x, y = X
# x_with_bias is a two-column array: a constant 1.0 (the bias) paired with each value of x.
x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)

# Record the loss at each step so we can plot it later.
losses = []
# Number of training steps.
training_steps = 50
# Learning rate of 0.002; choosing this value well matters and usually takes some experience.
learning_rate = 0.002

with tf.Session() as sess:
    # Set up all the tensors, variables, and operations.
    # The input: each row is [1, x[i]].
    input = tf.constant(x_with_bias)
    # The target y values, as a column vector.
    target = tf.constant(np.transpose([y]).astype(np.float32))
    # weights holds w1 and w2: a random 2x1 tensor, mean 0, standard deviation 0.1.
    weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))

    # Initialize all TensorFlow variables.
    tf.global_variables_initializer().run()

    # Predict yhat from the input and the current weights.
    yhat = tf.matmul(input, weights)
    # The difference between yhat and the target.
    yerror = tf.subtract(yhat, target)
    # The L2 loss on that error.
    loss = tf.nn.l2_loss(yerror)
    # One gradient-descent step that updates the weights.
    update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    # Note: nothing has actually run yet; so far we have only defined the graph.

    for _ in range(training_steps):
        # Repeatedly run the operations, updating the TensorFlow variable.
        update_weights.run()
        # Save the loss so we can plot it later.
        losses.append(loss.eval())

    # Training is done, get the final values for the graphs.
    betas = weights.eval()
    yhat = yhat.eval()

# Show the fit and the loss over time.
# One row with two plots side by side, with 0.3 of spacing between them.
fig, (ax1, ax2) = plt.subplots(1, 2)
plt.subplots_adjust(wspace=.3)
fig.set_size_inches(10, 4)
# Plot 1: scatter of x against y (alpha=0.7 is the transparency),
# and of x against yhat, in green.
ax1.scatter(x, y, alpha=.7)
ax1.scatter(x, np.transpose(yhat)[0], c="g", alpha=.6)
# Draw the final fitted regression line, in blue, over x from -4 to 6.
line_x_range = (-4, 6)
ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], "b", alpha=0.6)
# Plot 2: the loss at each training step.
ax2.plot(range(0, training_steps), losses)
ax2.set_ylabel("Loss")
ax2.set_xlabel("Training steps")
plt.show()

 

 

From the beginning

Now let's walk through it from the start.

 

The data

This is a toy data set. We have 50 (x, y) data points. At first, the data is perfectly linear; run the code below to see what it looks like.

#@test {"output":"ignore"}
num_examples = 50
X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])
plt.figure(figsize=(4, 4))
plt.scatter(X[0], X[1], c="g")
plt.show()

 

 

Then we perturb it with noise; run the code below to see the result:

#@test {"output":"ignore"}
X += np.random.randn(2, num_examples)
plt.figure(figsize=(4, 4))
plt.scatter(X[0], X[1])
plt.show()

 

What we want to do

What we're trying to do is calculate the green line below, that is, find the two parameters weights[0] and weights[1] that define it:

#@test {"output":"ignore"}
weights = np.polyfit(X[0], X[1], 1)
plt.figure(figsize=(4, 4))
plt.scatter(X[0], X[1])
line_x_range = (-3, 5)
plt.plot(line_x_range, [weights[1] + a * weights[0] for a in line_x_range], "g", alpha=0.8)
plt.show()

 

Remember that our simple network looks like this:

[Figure: diagram of the simple network, with the input x and a bias node feeding the output ŷ through the weights w2 and w1; the original notebook embeds this as an inline base64 image.]

 

That's equivalent to the function ŷ=w2x+w1. What we're trying to do is find the "best" weights w1 and w2.That will give us that green regression line above.

它等效于这样的一个功能方程ŷ=w2x+w1.而我们的目的是尽量找到最好的权重参数w1,w2,有了这两个参数,我们就可以画出上面那个绿色回直归线。
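In the code, this function becomes a single matrix multiply: the constant 1 in x_with_bias acts as the bias input, so each row of the product is 1*w1 + x*w2. Here is a small NumPy sketch of that idea (an added illustration with made-up numbers, not part of the original tutorial):

import numpy as np

# ŷ = w2*x + w1 written as one matrix multiply, using a bias column of ones.
x = np.array([0.0, 1.0, 2.0])
x_with_bias = np.column_stack([np.ones_like(x), x])   # each row is (1, x)
weights = np.array([[-2.0],    # w1 (intercept / bias weight)
                    [ 1.5]])   # w2 (slope)
yhat = x_with_bias @ weights   # each row: 1*w1 + x*w2
print(yhat.ravel())            # [-2.  -0.5  1. ]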

What are the best weights? They're the weights that minimize the difference between our estimate ŷ and the actual y. Specifically, we want to minimize the sum of the squared errors, ∑(ŷ − y)^2, which is known as the L2 loss. So, the best weights are the weights that minimize the L2 loss.
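For reference, tf.nn.l2_loss computes half of that sum of squares, sum(t**2) / 2. Here is a tiny NumPy sketch of the same quantity (an added illustration with made-up numbers, not part of the original tutorial):

import numpy as np

# Half the sum of squared errors, matching what tf.nn.l2_loss(yhat - y) returns.
y = np.array([1.0, 2.0, 3.0])        # actual values (made-up example data)
yhat = np.array([1.5, 1.5, 3.5])     # our estimates
l2_loss = np.sum((yhat - y) ** 2) / 2.0
print(l2_loss)                        # 0.375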

Gradient descent

For a more detailed introduction to gradient descent, see this article:

http://blog.csdn.net/fu_shuwu/article/details/76165824

What gradient descent does is start with random weights for ŷ = w2x + w1 and gradually move those weights toward better values.

 

It does that by following the downward slope of the error curves. Imagine the possible errors we could get with different weights as a landscape. From whatever weights we have, moving in some directions will increase the error, like going uphill, and some directions will decrease the error, like going downhill. We want to roll downhill, always moving the weights toward lower error.

 

How does gradient descent know which way is downhill? It follows the partial derivatives of the L2 loss. The partial derivative is like a velocity, saying which way the error will change if we change the weight. We want to move in the direction of lower error. The partial derivative points the way.

 

So, what gradient descent does is start with random weights and gradually walk those weights toward lower error, using the partial derivatives to know which direction to go.
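To make that concrete, here is a minimal hand-rolled NumPy version of the same update (an added sketch mirroring what GradientDescentOptimizer does for us; the toy data and constants are assumptions chosen to resemble the tutorial's setup). For the L2 loss ½∑(ŷ − y)^2 with ŷ computed from the bias column and x, the partial derivatives with respect to the weights are Xᵀ(ŷ − y):

import numpy as np

# Hand-rolled gradient descent for y ≈ w2*x + w1.
num_examples = 50
x = np.linspace(-2, 4, num_examples)
y = 1.5 * x - 2 + np.random.randn(num_examples)    # toy data with noise
X = np.column_stack([np.ones(num_examples), x])    # bias column + x
w = np.random.normal(0, 0.1, size=2)               # random starting weights
learning_rate = 0.002

for _ in range(50):
    yhat = X @ w                 # current predictions
    grad = X.T @ (yhat - y)      # partial derivatives of the L2 loss
    w -= learning_rate * grad    # step downhill, toward lower error

print("w1 (bias) = %.3f, w2 (slope) = %.3f" % (w[0], w[1]))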

 

The code again

Let's go back to the code now, walking through it with many more comments this time:

#@test {"output":"ignore"}
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Set up the data with a noisy linear relationship between X and Y.
num_examples = 50
X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])
# Add random noise (gaussian, mean 0, stdev 1)
X += np.random.randn(2, num_examples)
# Split into x and y
x, y = X
# Add the bias node, which always has a value of 1
x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)

# Keep track of the loss at each iteration so we can chart it later
losses = []
# How many iterations to run our training
training_steps = 50
# The learning rate. Also known as the step size. This changes how far
# we move down the gradient toward lower error at each step. Too large
# jumps risk inaccuracy, too small slows the learning.
learning_rate = 0.002

# In TensorFlow, we need to run everything in the context of a session.
with tf.Session() as sess:
    # Set up all the tensors.
    # Our input layer is the x value and the bias node.
    input = tf.constant(x_with_bias)
    # Our target is the y values. They need to be massaged to the right shape.
    target = tf.constant(np.transpose([y]).astype(np.float32))
    # Weights are a variable. They change every time through the loop.
    # Weights are initialized to random values (gaussian, mean 0, stdev 0.1)
    weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))

    # Initialize all the variables defined above.
    tf.global_variables_initializer().run()

    # Set up all operations that will run in the loop.
    # For all x values, generate our estimate on all y given our current
    # weights. So, this is computing y = w2 * x + w1 * bias
    yhat = tf.matmul(input, weights)
    # Compute the error, which is just the difference between our
    # estimate of y and what y actually is.
    yerror = tf.subtract(yhat, target)
    # We are going to minimize the L2 loss. The L2 loss is the sum of the
    # squared error for all our estimates of y. This penalizes large errors
    # a lot, but small errors only a little.
    loss = tf.nn.l2_loss(yerror)

    # Perform gradient descent.
    # This essentially just updates weights, like weights -= grads * learning_rate,
    # using the partial derivative of the loss with respect to the
    # weights. It's the direction we want to go to move toward lower error.
    update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # At this point, we've defined all our tensors and run our initialization
    # operations. We've also set up the operations that will repeatedly be run
    # inside the training loop. All the training loop is going to do is
    # repeatedly call run, inducing the gradient descent operation, which has the
    # effect of repeatedly changing weights by a small amount in the direction (the
    # partial derivative or gradient) that will reduce the error (the L2 loss).
    for _ in range(training_steps):
        # Repeatedly run the operations, updating the TensorFlow variable.
        sess.run(update_weights)

        # Here, we're keeping a history of the losses to plot later
        # so we can see the change in loss as training progresses.
        losses.append(loss.eval())

    # Training is done, get the final values for the charts
    betas = weights.eval()
    yhat = yhat.eval()

# Show the results.
fig, (ax1, ax2) = plt.subplots(1, 2)
plt.subplots_adjust(wspace=.3)
fig.set_size_inches(10, 4)
ax1.scatter(x, y, alpha=.7)
ax1.scatter(x, np.transpose(yhat)[0], c="g", alpha=.6)
line_x_range = (-4, 6)
ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], "g", alpha=0.6)
ax2.plot(range(0, training_steps), losses)
ax2.set_ylabel("Loss")
ax2.set_xlabel("Training steps")
plt.show()

 

 

This version of the code has a lot more comments at each step. Read through the code and the comments.

The core piece is the loop, which contains a single run call. run executes the operations necessary for the GradientDescentOptimizer operation. That includes several other operations, all of which are also executed each time through the loop. The GradientDescentOptimizer execution has a side effect of assigning to weights, so the variable weights changes each time through the loop.
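If you want to see that assignment step more explicitly, minimize() is shorthand for computing the gradients and then applying them; the TF 1.x optimizer API lets you split it into those two calls (shown here as an optional variation, not part of the original code):

# Equivalent to tf.train.GradientDescentOptimizer(learning_rate).minimize(loss):
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)          # [(d loss / d weights, weights)]
update_weights = optimizer.apply_gradients(grads_and_vars)  # assigns new weights when run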

 

The result is that, in each iteration of the loop, the code processes the entire input data set, generates all the estimates ŷ for each x given the current weights wi, finds all the errors and L2 losses (ŷ − y)^2, and then changes the weights wi by a small amount in the direction that will reduce the L2 loss.

 

After many iterations of the loop, the amount by which we change the weights gets smaller and smaller, and the loss gets smaller and smaller, as we narrow in on near-optimal values for the weights. By the end of the loop, we should be near the lowest possible value of the L2 loss, and near the best possible weights we could have.
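One quick way to check how close the loop got (an added sketch, assuming x, y, and betas from the training script above are still in scope) is to compare the learned weights with NumPy's closed-form least-squares fit, which the earlier plotting snippet already used:

# np.polyfit returns [slope, intercept]; betas is [[intercept], [slope]].
slope, intercept = np.polyfit(x, y, 1)
print("gradient descent: intercept=%.3f  slope=%.3f" % (betas[0][0], betas[1][0]))
print("np.polyfit:       intercept=%.3f  slope=%.3f" % (intercept, slope))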

 

The details

 

This code works, but there are still a few black boxes that are worth diving into here. l2_loss? GradientDescentOptimizer? What exactly are those doing?