Neural Networks (II)


In this chapter we use a short piece of Python code to further illustrate the ideas behind neural networks. (The source code can be downloaded from https://github.com/mnielsen/neural-networks-and-deep-learning; this article only studies that code and does not modify it.) For the underlying theory, please refer to the previous chapter.

The core of the code is the Network object:

The sizes variable specifies how many layers the network has and how many neurons each layer contains. The network parameters, the biases and weights, are initialized with random values, which serve as the starting point for the iterative optimization.

# imports used throughout network.py
import random
import numpy as np


class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
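To make the initialization concrete, here is a minimal sketch (not part of the repository code) that builds the three-layer [2, 3, 1] network from the docstring and prints the shapes of its parameters; like the repository, it assumes Python 2 and NumPy.

# Sketch only: inspect the parameter shapes produced by the constructor.
net = Network([2, 3, 1])
print [b.shape for b in net.biases]    # [(3, 1), (1, 1)] -- no biases for the input layer
print [w.shape for w in net.weights]   # [(3, 2), (1, 3)] -- weights[i] connects layer i to layer i+1

Each weight matrix has shape (neurons in the next layer, neurons in the previous layer), which is exactly what the matrix-vector product in the feedforward step below expects.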

Feedforward code:
According to the formula $a^{l} = \sigma(w^{l} a^{l-1} + b^{l})$, the activations of each layer are computed in turn, starting from the first layer and moving forward until the output layer is reached.

def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a
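As a small usage sketch (not from the repository), an input is a column vector whose length matches the first layer; for the [2, 3, 1] network constructed above, feedforward repeatedly applies w·a + b followed by the sigmoid until a single output activation remains. The snippet relies on the sigmoid helper shown in the next code segment.

# Sketch only: push one input column vector through the network built earlier.
x = np.random.randn(2, 1)      # (2, 1) column vector for a network whose first layer has 2 neurons
output = net.feedforward(x)    # matrix multiply + sigmoid, layer by layer
print output.shape             # (1, 1): one output neuron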

The sigmoid function and its derivative:

def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
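As a quick sanity check (a sketch, not part of network.py), the identity sigma'(z) = sigma(z)(1 - sigma(z)) used by sigmoid_prime can be verified numerically with a central difference:

# Sketch only: numerically confirm the derivative identity behind sigmoid_prime.
print sigmoid(0.0)          # 0.5
print sigmoid_prime(0.0)    # 0.25 = 0.5 * (1 - 0.5)
z, eps = 1.0, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference approximation
print abs(numeric - sigmoid_prime(z)) < 1e-8                  # True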

Backpropagation code:
SGD is the main entry point for training. The core of the code below is the call to update_mini_batch; the rest loops over multiple epochs and, in each epoch, shuffles the training data and splits it into mini-batches. If test data is provided, the current network's performance is evaluated on it after each epoch.

def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    """Train the neural network using mini-batch stochastic
    gradient descent.  The ``training_data`` is a list of tuples
    ``(x, y)`` representing the training inputs and the desired
    outputs.  The other non-optional parameters are
    self-explanatory.  If ``test_data`` is provided then the
    network will be evaluated against the test data after each
    epoch, and partial progress printed out.  This is useful for
    tracking progress, but slows things down substantially."""
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in xrange(epochs):           # each epoch round
        random.shuffle(training_data)  # re-arrange the data randomly
        # split the shuffled data into mini-batches
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in xrange(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print "Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test)
        else:
            print "Epoch {0} complete".format(j)
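The list comprehension that builds mini_batches simply slices the shuffled training data into consecutive chunks of length mini_batch_size. A tiny sketch with made-up data (not from the repository) shows the same slicing pattern:

# Sketch only: the slicing pattern SGD uses to form mini-batches.
data = range(10)               # stand-in for the shuffled training_data
mini_batch_size = 3
mini_batches = [data[k:k+mini_batch_size]
                for k in xrange(0, len(data), mini_batch_size)]
print mini_batches             # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Note that the final mini-batch can be smaller when the number of training examples is not a multiple of mini_batch_size.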

The bias and weight values are updated iteratively according to the gradient descent formulas below. Note that the true gradient is defined over the whole training set; this is why we use stochastic gradient descent, approximating it with the average gradient over a mini-batch of m training examples, to speed up training:

$$w_k \rightarrow w_k' = w_k - \frac{\eta}{m}\sum_j \frac{\partial C_{X_j}}{\partial w_k}$$

$$b_l \rightarrow b_l' = b_l - \frac{\eta}{m}\sum_j \frac{\partial C_{X_j}}{\partial b_l}$$

def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
    is the learning rate."""
    # nabla_b and nabla_w accumulate the gradients of the cost with
    # respect to the biases and weights, summed over the mini-batch
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # update weights and biases according to the gradient descent formulas
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
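The two list comprehensions at the end implement exactly the update formulas above: the accumulated gradients are scaled by eta / m, where m = len(mini_batch). A tiny numerical sketch with made-up numbers, for illustration only:

# Sketch only: w -> w - (eta/m) * sum_j dC_j/dw with toy values.
eta, m = 3.0, 10
w, grad_sum = 0.5, 0.8              # grad_sum stands for the summed per-example gradients
print w - (eta / m) * grad_sum      # 0.5 - 0.3 * 0.8 = 0.26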

The backpropagation itself is implemented in the backprop function. For each training example (x, y), x is the input and y is the true digit label.
The activations of every layer are stored in the activations list, and the weighted inputs (z values) of every layer in the zs list; a forward pass computes the activation and z value of each layer in turn. The backward pass then computes the error term of each layer and uses it to obtain the gradients with respect to that layer's biases and weights. (The helper cost_derivative, defined elsewhere in network.py, returns output_activations - y, the derivative of the quadratic cost.)

def backprop(self, x, y):
    """Return a tuple ``(nabla_b, nabla_w)`` representing the
    gradient for the cost function C_x.  ``nabla_b`` and
    ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
    to ``self.biases`` and ``self.weights``."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # feedforward
    activation = x
    activations = [x]  # list to store all the activations, layer by layer
    zs = []            # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    # compute the output layer's error term
    delta = self.cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    # the error term is the gradient with respect to the bias, as derived in the previous chapter
    nabla_b[-1] = delta
    # the error term times the previous layer's activations gives the weight gradient
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Note that the variable l in the loop below is used a little
    # differently to the notation in Chapter 2 of the book.  Here,
    # l = 1 means the last layer of neurons, l = 2 is the
    # second-last layer, and so on.  It's a renumbering of the
    # scheme in the book, used here to take advantage of the fact
    # that Python can use negative indices in lists.
    for l in xrange(2, self.num_layers):  # for each remaining layer, from back to front
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp  # propagate the error backwards
        nabla_b[-l] = delta                                         # bias gradient
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())  # weight gradient
    return (nabla_b, nabla_w)
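A useful way to convince yourself that backprop is correct is a finite-difference gradient check: perturb a single weight, measure the change in the cost, and compare it with the corresponding entry of nabla_w. The sketch below is not part of the repository; it assumes the full network.py is available (in particular its cost_derivative helper) and that the per-example cost is the quadratic cost C_x = 0.5 * ||a - y||^2, which is the cost cost_derivative corresponds to.

# Sketch only: finite-difference check of one weight gradient returned by backprop.
def quadratic_cost(net, x, y):
    """0.5 * ||a - y||^2 for one example, matching cost_derivative = (a - y)."""
    a = net.feedforward(x)
    return 0.5 * np.sum((a - y) ** 2)

net = Network([2, 3, 1])
x = np.random.randn(2, 1)
y = np.array([[1.0]])
nabla_b, nabla_w = net.backprop(x, y)          # analytic gradients

eps = 1e-5
i, j = 1, 0                                    # check entry (1, 0) of the first weight matrix
net.weights[0][i, j] += eps
c_plus = quadratic_cost(net, x, y)
net.weights[0][i, j] -= 2 * eps
c_minus = quadratic_cost(net, x, y)
net.weights[0][i, j] += eps                    # restore the original weight
numeric = (c_plus - c_minus) / (2 * eps)       # central difference estimate
print abs(numeric - nabla_w[0][i, j]) < 1e-7   # should print True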

Evaluation code:
For each test input, the index of the output-layer neuron (out of the 10 output neurons) with the largest activation is taken as the predicted label for that input.

def evaluate(self, test_data):
    """Return the number of test inputs for which the neural
    network outputs the correct result. Note that the neural
    network's output is assumed to be the index of whichever
    neuron in the final layer has the highest activation."""
    test_results = [(np.argmax(self.feedforward(x)), y)
                    for (x, y) in test_data]
    return sum(int(x == y) for (x, y) in test_results)
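Putting the pieces together, the network can be trained on MNIST roughly as in the first chapter of the book. The sketch below assumes the repository's mnist_loader module is available on the path and that the Python 2.x code above is used unchanged; the hyperparameters (30 hidden neurons, 30 epochs, mini-batch size 10, eta = 3.0) are the book's example values.

# Sketch only: end-to-end training on MNIST, assuming the repository's mnist_loader.
import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

net = Network([784, 30, 10])    # 784 input pixels, 30 hidden neurons, 10 output digits
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
# prints one "Epoch j: correct / 10000" line after every epoch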