Building a simple neural network step by step with TensorFlow: linear regression

Source: Internet  Editor: 程序博客网  Time: 2024/06/02 05:34


A look at a very simple neural network in TensorFlow



This is an introduction to working with TensorFlow. It works through an example of a very simple neural network, walking through the steps of setting up the input, adding operators, setting up gradient descent, and running the computation graph.


This tutorial presumes some familiarity with the TensorFlow computational model, which is introduced in the Hello, TensorFlow notebook, also available in this bundle.



A simple neural network



Let's start with code. We're going to construct a very simple neural network computing a linear regression between two variables, y and x. It tries to find the best w1 and w2 for the function y = w2*x + w1 given the data. The data we're going to give it is toy data: a linear relationship perturbed with random noise.



This is what the network looks like:


#@test {"output":"ignore"}
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

# Set up the data with a noisy linear relationship between X and Y.
num_examples = 50
# Row 0 of X holds num_examples evenly spaced values from -2 to 4;
# row 1 holds num_examples evenly spaced values from -6 to 6.
X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])
X += np.random.randn(2, num_examples)
# Unpack the two rows into x and y.
x, y = X
# x_with_bias is a 2-D array: each row is (1.0, a), where 1.0 is the fixed
# bias input and a is a value taken from x.
x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)

losses = []
training_steps = 50
learning_rate = 0.002



with tf.Session() as sess:
    # Set up all the tensors, variables, and operations.
    # The input: each row is the vector [1, x[i]].
    input = tf.constant(x_with_bias)
    target = tf.constant(np.transpose([y]).astype(np.float32))
    weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))

    # Run the TensorFlow global variable initializer.
    tf.global_variables_initializer().run()

    yhat = tf.matmul(input, weights)
    yerror = tf.subtract(yhat, target)
    # Compute the L2 loss with TensorFlow.
    loss = tf.nn.l2_loss(yerror)

    update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    for _ in range(training_steps):
        # Repeatedly run the operations, updating the TensorFlow variable.
        update_weights.run()
        losses.append(loss.eval())

    # Training is done, get the final values for the graphs
    betas = weights.eval()
    yhat = yhat.eval()


# Show the fit and the loss over time.
fig, (ax1, ax2) = plt.subplots(1, 2)
# Leave a gap of 0.3 between the two panels.
plt.subplots_adjust(wspace=.3)
fig.set_size_inches(10, 4)
# Panel 1: scatter plot of x against y; alpha=.7 sets the transparency.
ax1.scatter(x, y, alpha=.7)
ax1.scatter(x, np.transpose(yhat)[0], c="g", alpha=.6)
# The fitted line is drawn over the x range -4 to 6.
line_x_range = (-4, 6)
ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], "b", alpha=0.6)
# Panel 2: the loss at each training step.
ax2.plot(range(0, training_steps), losses)
# Label the x axis of panel 2 "Training steps".
ax2.set_xlabel("Training steps")




From the beginning



The data


This is a toy data set. We have 50 (x, y) data points. At first, the data is perfectly linear. Run the code below in Python to see the plot.

#@test {"output":"ignore"}

num_examples = 50

X = np.array([np.linspace(-2, 4,num_examples), np.linspace(-6, 6, num_examples)])


plt.scatter(X[0], X[1],c="g")




Then we perturb it with noise:


#@test {"output":"ignore"}

X += np.random.randn(2, num_examples)


plt.scatter(X[0], X[1])



What we want to do


What we're trying to do is calculate the green line below:


#@test {"output":"ignore"}

weights = np.polyfit(X[0], X[1], 1)


plt.scatter(X[0], X[1])

line_x_range = (-3, 5)

plt.plot(line_x_range, [weights[1] + a * weights[0] for a in line_x_range], "g", alpha=0.8)



Remember that our simple network looks like this:


[Image: diagram of the simple network]



That's equivalent to the function ŷ = w2*x + w1. What we're trying to do is find the "best" weights w1 and w2. That will give us the green regression line above.
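To make the link between the formula and the bias node concrete, here is a minimal sketch in plain Python. The weight values and the three x values are made up for illustration; only the (1.0, x) row layout mirrors x_with_bias from the tutorial code.

```python
# Hypothetical weights and inputs, chosen only for illustration.
w1, w2 = 0.5, 2.0
xs = [-1.0, 0.0, 3.0]

# Each input row is (bias, x), mirroring x_with_bias in the tutorial code.
rows = [(1.0, x) for x in xs]
weights = [w1, w2]

def predict(row, weights):
    """Dot product of one (bias, x) row with the weight vector."""
    return sum(r * w for r, w in zip(row, weights))

# Because the bias input is always 1, each prediction equals w1 + w2 * x.
yhat = [predict(row, weights) for row in rows]
print(yhat)  # [-1.5, 0.5, 6.5]
```

Multiplying every row by the weight vector at once is what a single matrix product does, which is why the network needs only one matmul.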


What are the best weights? They're the weights that minimize the difference between our estimate ŷ and the actual y. One could minimize the sum of the absolute errors, but here we specifically minimize the sum of the squared errors, ∑(ŷ − y)^2, which is known as the L2 loss. So, the best weights are the weights that minimize the L2 loss.
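As a rough illustration of the loss, the sketch below computes the sum of squared errors in plain Python on three made-up points. The helper names are hypothetical. Note that tf.nn.l2_loss returns half of this sum (sum(t ** 2) / 2); the factor of one half does not change which weights are best.

```python
def sum_squared_error(yhat, y):
    """Sum of (estimate - actual)^2 over all data points."""
    return sum((p - t) ** 2 for p, t in zip(yhat, y))

def half_sum_squared_error(yhat, y):
    """Same quantity halved, matching what tf.nn.l2_loss computes on the errors."""
    return sum_squared_error(yhat, y) / 2.0

# Made-up targets and estimates: errors are 0.5, 0.0 and -2.0.
y = [1.0, 2.0, 3.0]
yhat = [1.5, 2.0, 1.0]

print(sum_squared_error(yhat, y))       # 4.25
print(half_sum_squared_error(yhat, y))  # 2.125
```

The error of -2.0 contributes 4.0 of the 4.25 total, which shows how squaring penalizes large errors far more than small ones.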

Gradient descent




What gradient descent does is start with random weights for ŷ = w2*x + w1 and gradually move those weights toward better values.

It does that by following the downward slope of the error curves. Imagine the possible errors we could get with different weights as a landscape. From whatever weights we have, moving in some directions will increase the error, like going uphill, and moving in other directions will decrease the error, like going downhill. We want to roll downhill, always moving the weights toward lower error.

How does gradient descent know which way is downhill? It follows the partial derivatives of the L2 loss. The partial derivative is like a velocity, saying which way the error will change if we change the weight. We want to move in the direction of lower error. The partial derivative points the way.

So, what gradient descent does is start with random weights and gradually walk those weights toward lower error, using the partial derivatives to know which direction to go.
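The sketch below illustrates how the partial derivatives point the way, using four made-up data points and the halved L2 loss. It is plain Python written for this explanation, not how TensorFlow computes gradients internally. It checks the analytic partial derivatives against a finite-difference estimate, then takes one step against the gradient and confirms the loss goes down. The starting weights and the learning rate of 0.01 are assumptions.

```python
# Made-up toy data, for illustration only.
xs = [-2.0, 0.0, 1.0, 3.0]
ys = [-3.0, 1.0, 2.0, 7.0]

def loss(w1, w2):
    """Halved L2 loss of the model yhat = w1 + w2 * x."""
    return 0.5 * sum((w1 + w2 * x - y) ** 2 for x, y in zip(xs, ys))

def gradient(w1, w2):
    """Partial derivatives of the loss with respect to w1 and w2."""
    errors = [w1 + w2 * x - y for x, y in zip(xs, ys)]
    return sum(errors), sum(e * x for e, x in zip(errors, xs))

w1, w2 = 0.0, 0.0  # assumed starting point
g1, g2 = gradient(w1, w2)
print(g1, g2)  # -7.0 -29.0

# Finite differences: nudge one weight and watch how the loss changes.
eps = 1e-6
fd1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
fd2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)

# One gradient descent step: move against the gradient.
learning_rate = 0.01
new_w1 = w1 - learning_rate * g1
new_w2 = w2 - learning_rate * g2
print(loss(w1, w2), loss(new_w1, new_w2))
```

Both partial derivatives are negative here, meaning the loss falls as either weight increases, so the step raises both weights.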


The code again


Let's go back to the code now, walking through it with many more comments in the code this time:


#@test {"output":"ignore"}
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Set up the data with a noisy linear relationship between X and Y.
num_examples = 50
X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])
# Add random noise (gaussian, mean 0, stdev 1)
X += np.random.randn(2, num_examples)
# Split into x and y
x, y = X
# Add the bias node which always has a value of 1
x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)

# Keep track of the loss at each iteration so we can chart it later
losses = []
# How many iterations to run our training
training_steps = 50
# The learning rate. Also known as the step size. This changes how far
# we move down the gradient toward lower error at each step. Too large
# jumps risk inaccuracy, too small slow the learning.
learning_rate = 0.002


# In TensorFlow, we need to run everything in the context of a session.
with tf.Session() as sess:
    # Set up all the tensors.
    # Our input layer is the x value and the bias node.
    input = tf.constant(x_with_bias)
    # Our target is the y values. They need to be massaged to the right shape.
    target = tf.constant(np.transpose([y]).astype(np.float32))
    # Weights are a variable. They change every time through the loop.
    # Weights are initialized to random values (gaussian, mean 0, stdev 0.1)
    weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))

    # Initialize all the variables defined above.
    tf.global_variables_initializer().run()

    # Set up all operations that will run in the loop.
    # For all x values, generate our estimate on all y given our current
    # weights. So, this is computing y = w2 * x + w1 * bias
    yhat = tf.matmul(input, weights)
    # Compute the error, which is just the difference between our
    # estimate of y and what y actually is.
    yerror = tf.subtract(yhat, target)
    # We are going to minimize the L2 loss. The L2 loss is the sum of the
    # squared error for all our estimates of y. This penalizes large errors
    # a lot, but small errors only a little.
    loss = tf.nn.l2_loss(yerror)

    # Perform gradient descent.
    # This essentially just updates weights, like weights -= grads * learning_rate
    # using the partial derivative of the loss with respect to the
    # weights. It's the direction we want to go to move toward lower error.
    update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # At this point, we've defined all our tensors and run our initialization
    # operations. We've also set up the operations that will repeatedly be run
    # inside the training loop. All the training loop is going to do is
    # repeatedly call run, inducing the gradient descent operation, which has
    # the effect of repeatedly changing weights by a small amount in the
    # direction (the partial derivative or gradient) that will reduce the
    # error (the L2 loss).
    for _ in range(training_steps):
        # Repeatedly run the operations, updating the TensorFlow variable.
        update_weights.run()

        # Here, we're keeping a history of the losses to plot later
        # so we can see the change in loss as training progresses.
        losses.append(loss.eval())

    # Training is done, get the final values for the charts
    betas = weights.eval()
    yhat = yhat.eval()


# Show the results.
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(10, 4)
ax1.scatter(x, y, alpha=.7)
ax1.scatter(x, np.transpose(yhat)[0], c="g", alpha=.6)
line_x_range = (-4, 6)
ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], "g", alpha=0.6)
ax2.plot(range(0, training_steps), losses)
ax2.set_xlabel("Training steps")




This version of the code has a lot more comments at each step. Read through the code and the comments.


The core piece is the loop, which contains a single run call. run executes the operations necessary for the GradientDescentOptimizer operation. That includes several other operations, all of which are also executed each time through the loop. The GradientDescentOptimizer execution has a side effect of assigning to weights, so the variable weights changes each time in the loop.



The result is that, in each iteration of the loop, the code processes the entire input data set, generates the estimate ŷ for each x given the current weights wi, finds all the errors and the L2 loss ∑(ŷ − y)^2, and then changes the weights wi by a small amount in the direction that will reduce the L2 loss.


After many iterations of the loop, the amount by which we change the weights gets smaller and smaller, and the loss gets smaller and smaller, as we narrow in on near-optimal values for the weights. By the end of the loop, we should be near the lowest possible value of the L2 loss, and near the best possible weights we could have.
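The shrinking updates can be observed without TensorFlow. The following is a plain-Python sketch of the same loop, written for illustration rather than taken from the tutorial. It reuses the tutorial's learning rate, step count and data ranges, but the fixed random seed, the zero starting weights, the helper linspace and the step_sizes list are all assumptions. Unlike the tutorial code, it records the loss before each update rather than after.

```python
import random

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
num_examples = 50

def linspace(start, stop, n):
    """n evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]

# Toy data in the spirit of the tutorial: linear, then perturbed with noise.
xs = [v + rng.gauss(0, 1) for v in linspace(-2, 4, num_examples)]
ys = [v + rng.gauss(0, 1) for v in linspace(-6, 6, num_examples)]

w1, w2 = 0.0, 0.0  # assumption: start from zero rather than random values
learning_rate = 0.002
training_steps = 50
losses, step_sizes = [], []

for _ in range(training_steps):
    # Process the whole data set: estimates, errors, then the halved L2 loss.
    errors = [w1 + w2 * x - y for x, y in zip(xs, ys)]
    losses.append(0.5 * sum(e * e for e in errors))
    # Partial derivatives of the loss with respect to w1 and w2.
    g1 = sum(errors)
    g2 = sum(e * x for e, x in zip(errors, xs))
    # Move the weights a small amount against the gradient.
    d1, d2 = learning_rate * g1, learning_rate * g2
    step_sizes.append((d1 * d1 + d2 * d2) ** 0.5)
    w1, w2 = w1 - d1, w2 - d2

print("loss:      first %.2f, last %.2f" % (losses[0], losses[-1]))
print("step size: first %.4f, last %.4f" % (step_sizes[0], step_sizes[-1]))
```

Both printed pairs should show the last value well below the first: as the gradient flattens near the optimum, each update moves the weights less.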


The details


This code works, but there are still a few black boxes that are worth diving into here. l2_loss? GradientDescentOptimizer? What exactly are those doing?