Neural Networks and Deep Learning (Part 1)

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 14:20

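Over the past two days I read the first chapter of the online book Neural Networks and Deep Learning and watched Stanford's Machine Learning open course. I learned about two main types of artificial neuron and about an important machine learning algorithm, stochastic gradient descent. Here is a summary.

For a computational model to count as a neural network, it usually needs a large number of interconnected nodes (neurons) with two characteristics:

1. Each neuron applies a specific output function (also called an activation function) to the weighted input values it receives from neighboring neurons.

2. The strength of the signal passed between neurons is defined by weights, which the algorithm keeps adjusting as it learns.

Built on this, a neural network model relies on large amounts of data for training.

A few concepts:

cost function: quantitatively measures how far the output computed for a given input deviates from the correct value.

learning algorithm: uses the value of the cost function to correct itself and find the optimal weights between neurons as quickly as possible.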

The perceptron neuron:

Figure 1: Perceptron neuron

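Here x1, x2, x3 are the inputs, and each must be binary (0 or 1); the output is binary as well. w denotes the weights, and choosing w is both the key point and the hard part. The output is computed as follows:

$$\text{output} = \begin{cases} 0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\ 1 & \text{if } \sum_j w_j x_j > \text{threshold} \end{cases}$$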

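Writing the weighted sum as the dot product w · x, where w and x are the weight and input vectors, and introducing the bias b = -threshold, the formula simplifies to:

$$\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$$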
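The perceptron rule (output 1 when w · x + b > 0, otherwise 0) can be sketched in a few lines of Python. The weights of -2 and the bias of 3 are illustrative values chosen here, not taken from the text; with them the perceptron happens to behave like a NAND gate.

```python
def perceptron(x, w, b):
    """Return 1 if w . x + b > 0, otherwise 0."""
    weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if weighted_sum + b > 0 else 0

# Illustrative values (an assumption): weights -2, -2 and bias 3.
w, b = [-2, -2], 3

# Evaluate the perceptron on every binary input pair.
nand_table = {(x1, x2): perceptron([x1, x2], w, b)
              for x1 in (0, 1) for x2 in (0, 1)}
print(nand_table)
```

Only the input (1, 1) pushes the weighted sum below zero, so it is the only input that produces 0.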

The sigmoid neuron:

Figure 2: Sigmoid neurons

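Comparing the perceptron with the sigmoid neuron, the structure is the same but the allowed values differ. The inputs of a sigmoid neuron can take any value between 0 and 1, and its output is not 0 or 1 but σ(w · x + b), where σ is called the sigmoid function and is defined as:

$$\sigma(z) \equiv \frac{1}{1 + e^{-z}}$$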

So for inputs x1, x2, ..., weights w1, w2, ..., and bias b, the output of the sigmoid neuron is:

$$\frac{1}{1 + \exp\left(-\sum_j w_j x_j - b\right)}$$

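From this formula we get the response curve of the sigmoid function: a smooth S-shaped curve that rises from 0 for very negative z, passes through 0.5 at z = 0, and approaches 1 for very positive z.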
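As a rough illustration of the sigmoid neuron, the sketch below evaluates σ(w · x + b) for a few inputs between 0 and 1. The weights, bias, and inputs are hypothetical values picked for the example.

```python
import math

def sigmoid(z):
    """The sigmoid function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    """Output of a sigmoid neuron: sigmoid of the weighted input plus bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights and bias, chosen only for illustration.
w, b = [-2.0, -2.0], 3.0
for x in ([0.0, 0.0], [0.5, 0.5], [1.0, 1.0]):
    print(x, round(sigmoid_neuron(x, w, b), 3))
```

Unlike the perceptron, the output changes smoothly as the inputs change, which is what makes small weight adjustments produce small output changes.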

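Neural network architecture

As shown in the figure above, a neural network consists of an input layer, an output layer, and hidden layers in between. Such multilayer networks are called multilayer perceptrons, or MLPs.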
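A minimal sketch of how activations flow from the input layer through a hidden layer to the output layer, assuming sigmoid neurons in every layer. The layer sizes [2, 3, 1], the random Gaussian weights, and the input values are arbitrary choices for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feedforward(a, weights, biases):
    """Propagate the activation list `a` through each layer in turn."""
    for w, b in zip(weights, biases):
        # One output per neuron: sigmoid(row of weights . a + bias).
        a = [sigmoid(sum(wjk * ak for wjk, ak in zip(row, a)) + bj)
             for row, bj in zip(w, b)]
    return a

random.seed(0)
sizes = [2, 3, 1]  # input layer, one hidden layer, output layer
# weights[l][j][k] connects neuron k in layer l to neuron j in layer l + 1.
weights = [[[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
# The input layer has no biases, so biases start at the second layer.
biases = [[random.gauss(0, 1) for _ in range(n_out)] for n_out in sizes[1:]]

output = feedforward([0.2, 0.7], weights, biases)
print(output)
```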

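Gradient descent:

To measure how well the chosen weights and biases make the network output approximate y(x) for every training input x, we use a cost function (also called a loss or objective function):

$$C(w, b) \equiv \frac{1}{2n} \sum_x \| y(x) - a \|^2 \tag{6}$$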

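Here w denotes the collection of all weights in the network, b all the biases, n the total number of training inputs, and a the vector of network outputs (which depends on x, w, and b).

If C(w,b) ≈ 0, then y(x) is approximately equal to the output a for all training inputs x, which is exactly what we want.

If C(w,b) is large, then for many inputs the output a is far from y(x).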
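To make the two cases concrete, here is a small sketch that evaluates the cost, assuming the quadratic form C = 1/(2n) · Σ ‖y(x) − a‖² used in the book's first chapter. The target and output vectors are made-up numbers for illustration.

```python
def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum over training examples of ||y - a||^2."""
    n = len(outputs)
    total = 0.0
    for a, y in zip(outputs, targets):
        total += sum((yi - ai) ** 2 for ai, yi in zip(a, y))
    return total / (2 * n)

# Made-up desired outputs y(x) for two training inputs.
targets = [[1.0, 0.0], [0.0, 1.0]]
# Outputs close to the targets give a small cost ...
good_outputs = [[0.9, 0.1], [0.2, 0.8]]
# ... and outputs far from the targets give a large one.
bad_outputs = [[0.1, 0.9], [0.8, 0.2]]

print(quadratic_cost(good_outputs, targets))
print(quadratic_cost(bad_outputs, targets))
```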

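The goal of training is to minimize C(w,b). In other words, we want a set of weights w and biases b that make C(w,b) as small as possible.

The algorithm we use is gradient descent.

We want to find the lowest point of the surface in the figure above. The tool is the gradient from calculus: it points in the direction in which the function changes fastest, so stepping against it is the direction in which C(w,b) decreases fastest. This is the core idea of gradient descent. (Here v1 and v2 stand in for w and b.)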

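If Δv1 and Δv2 are small changes in the v1 and v2 directions, the resulting change ΔC in C(v1, v2) is approximately:

$$\Delta C \approx \frac{\partial C}{\partial v_1} \Delta v_1 + \frac{\partial C}{\partial v_2} \Delta v_2 \tag{7}$$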

The idea is to choose Δv1 and Δv2 so that ΔC is negative, which moves C in the decreasing direction.

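Define the gradient vector:

$$\nabla C \equiv \left( \frac{\partial C}{\partial v_1}, \frac{\partial C}{\partial v_2} \right)^T$$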

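Writing the changes as a vector Δv ≡ (Δv1, Δv2)^T, equation (7) can be rewritten as:

$$\Delta C \approx \nabla C \cdot \Delta v$$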

We set

$$\Delta v = -\eta \nabla C$$

where η is a small positive parameter (called the learning rate).

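This gives a new formula:

$$\Delta C \approx -\eta \nabla C \cdot \nabla C = -\eta \| \nabla C \|^2$$

Since ‖∇C‖² ≥ 0, ΔC ≤ 0, so C decreases as long as η is small enough for the approximation to hold.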

We can then apply the update repeatedly:

$$v \to v' = v - \eta \nabla C$$
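The repeated update v → v − η∇C can be tried on a toy cost function. The sketch below assumes the made-up function C(v1, v2) = v1² + 3·v2², whose gradient is known in closed form; the starting point, learning rate, and step count are arbitrary.

```python
def cost(v):
    """A toy cost C(v1, v2) = v1^2 + 3 * v2^2 (an assumed example)."""
    v1, v2 = v
    return v1 ** 2 + 3 * v2 ** 2

def grad(v):
    """Gradient of the toy cost: (dC/dv1, dC/dv2)."""
    v1, v2 = v
    return (2 * v1, 6 * v2)

def gradient_descent(v, eta, steps):
    """Apply v -> v - eta * grad(v) repeatedly, recording the cost."""
    history = [cost(v)]
    for _ in range(steps):
        g = grad(v)
        v = tuple(vi - eta * gi for vi, gi in zip(v, g))
        history.append(cost(v))
    return v, history

v_final, history = gradient_descent((2.0, -1.0), eta=0.1, steps=50)
print(v_final, history[0], history[-1])
```

With this learning rate the recorded cost falls at every step and ends close to the minimum at (0, 0). A much larger η would overshoot, which is why η has to be small.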

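How do we apply gradient descent in a neural network? We use it to repeatedly adjust the weights w and biases b so that the cost in equation (6) is minimized. The update rules are:

$$w_k \to w_k' = w_k - \eta \frac{\partial C}{\partial w_k}$$

$$b_l \to b_l' = b_l - \eta \frac{\partial C}{\partial b_l}$$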

Stochastic gradient descent

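Gradient descent learns too slowly when the training set is large, because every step needs the gradient over all training inputs. Stochastic gradient descent was introduced to speed up learning. It computes ∇C_x, the gradient of the cost for a single training input x, for a small randomly chosen sample of inputs, and uses the average to estimate the full gradient ∇C:

$$\frac{1}{m} \sum_{j=1}^{m} \nabla C_{X_j} \approx \frac{1}{n} \sum_x \nabla C_x = \nabla C$$

Here m is the number of randomly chosen training inputs. The chosen inputs X1, X2, ..., Xm are called a mini-batch.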

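This gives the update rules:

$$w_k \to w_k' = w_k - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial w_k}$$

$$b_l \to b_l' = b_l - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial b_l}$$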
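The mini-batch update can be sketched on a much simpler model than a neural network. The example below assumes a one-variable linear model y ≈ w·x + b with per-example cost ½(w·x + b − y)², fitted to noise-free data generated from y = 2x + 1. The model, data, and hyperparameters are illustrative assumptions, not part of the original text.

```python
import random

def sgd_linear(data, epochs, mini_batch_size, eta, seed=0):
    """Fit y = w*x + b by mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Shuffle, then cut the training data into mini-batches.
        rng.shuffle(data)
        for k in range(0, len(data), mini_batch_size):
            batch = data[k:k + mini_batch_size]
            # Sum the per-example gradients over the mini-batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch)
            grad_b = sum((w * x + b - y) for x, y in batch)
            # Step by eta / m times the summed gradient.
            w -= eta / len(batch) * grad_w
            b -= eta / len(batch) * grad_b
    return w, b

# Hypothetical training data drawn from y = 2x + 1 on [-1, 1].
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_linear(data, epochs=200, mini_batch_size=5, eta=0.1)
print(w, b)
```

Each step looks at only 5 of the 21 examples, yet the parameters still settle near w = 2 and b = 1, because the mini-batch gradient is an estimate of the full gradient.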

Code implementing the network described above for simple handwritten digit recognition

""" 
network.py 
~~~~~~~~~~ 
 
A module to implement the stochastic gradient descent learning 
algorithm for a feedforward neural network.  Gradients are calculated 
using backpropagation.  Note that I have focused on making the code 
simple, easily readable, and easily modifiable.  It is not optimized, 
and omits many desirable features. 
"""  
  
#### Libraries  
# Standard library  
import random  
  
# Third-party libraries  
import numpy as np  
  
class Network(object):  
  
    def __init__(self, sizes):  
        """The list ``sizes`` contains the number of neurons in the 
        respective layers of the network.  For example, if the list 
        was [2, 3, 1] then it would be a three-layer network, with the 
        first layer containing 2 neurons, the second layer 3 neurons, 
        and the third layer 1 neuron.  The biases and weights for the 
        network are initialized randomly, using a Gaussian 
        distribution with mean 0, and variance 1.  Note that the first 
        layer is assumed to be an input layer, and by convention we 
        won't set any biases for those neurons, since biases are only 
        ever used in computing the outputs from later layers."""  
        self.num_layers = len(sizes)  
        self.sizes = sizes  
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]  
        self.weights = [np.random.randn(y, x)  
                        for x, y in zip(sizes[:-1], sizes[1:])]  
  
    def feedforward(self, a):  
        """Return the output of the network if ``a`` is input."""  
        for b, w in zip(self.biases, self.weights):  
            a = sigmoid(np.dot(w, a)+b)  
        return a  
  
    def SGD(self, training_data, epochs, mini_batch_size, eta,  
            test_data=None):  
        """Train the neural network using mini-batch stochastic 
        gradient descent.  The ``training_data`` is a list of tuples 
        ``(x, y)`` representing the training inputs and the desired 
        outputs.  The other non-optional parameters are 
        self-explanatory.  If ``test_data`` is provided then the 
        network will be evaluated against the test data after each 
        epoch, and partial progress printed out.  This is useful for 
        tracking progress, but slows things down substantially."""  
        if test_data: n_test = len(test_data)  
        n = len(training_data)  
        for j in range(epochs):  
            random.shuffle(training_data)  
            mini_batches = [  
                training_data[k:k+mini_batch_size]  
                for k in range(0, n, mini_batch_size)]  
            for mini_batch in mini_batches:  
                self.update_mini_batch(mini_batch, eta)  
            if test_data:  
                print("Epoch {0}: {1} / {2}".format(  
                    j, self.evaluate(test_data), n_test))  
            else:  
                print("Epoch {0} complete".format(j))  
  
    def update_mini_batch(self, mini_batch, eta):  
        """Update the network's weights and biases by applying 
        gradient descent using backpropagation to a single mini batch. 
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta`` 
        is the learning rate."""  
        nabla_b = [np.zeros(b.shape) for b in self.biases]  
        nabla_w = [np.zeros(w.shape) for w in self.weights]  
        for x, y in mini_batch:  
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)  
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]  
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]  
        self.weights = [w-(eta/len(mini_batch))*nw  
                        for w, nw in zip(self.weights, nabla_w)]  
        self.biases = [b-(eta/len(mini_batch))*nb  
                       for b, nb in zip(self.biases, nabla_b)]  
  
    def backprop(self, x, y):  
        """Return a tuple ``(nabla_b, nabla_w)`` representing the 
        gradient for the cost function C_x.  ``nabla_b`` and 
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar 
        to ``self.biases`` and ``self.weights``."""  
        nabla_b = [np.zeros(b.shape) for b in self.biases]  
        nabla_w = [np.zeros(w.shape) for w in self.weights]  
        # feedforward  
        activation = x  
        activations = [x] # list to store all the activations, layer by layer  
        zs = [] # list to store all the z vectors, layer by layer  
        for b, w in zip(self.biases, self.weights):  
            z = np.dot(w, activation)+b  
            zs.append(z)  
            activation = sigmoid(z)  
            activations.append(activation)  
        # backward pass  
        delta = self.cost_derivative(activations[-1], y) * \  
            sigmoid_prime(zs[-1])  
        nabla_b[-1] = delta  
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())  
        # Note that the variable l in the loop below is used a little  
        # differently to the notation in Chapter 2 of the book.  Here,  
        # l = 1 means the last layer of neurons, l = 2 is the  
        # second-last layer, and so on.  It's a renumbering of the  
        # scheme in the book, used here to take advantage of the fact  
        # that Python can use negative indices in lists.  
        for l in range(2, self.num_layers):  
            z = zs[-l]  
            sp = sigmoid_prime(z)  
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp  
            nabla_b[-l] = delta  
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())  
        return (nabla_b, nabla_w)  
  
    def evaluate(self, test_data):  
        """Return the number of test inputs for which the neural 
        network outputs the correct result. Note that the neural 
        network's output is assumed to be the index of whichever 
        neuron in the final layer has the highest activation."""  
        test_results = [(np.argmax(self.feedforward(x)), y)  
                        for (x, y) in test_data]  
        return sum(int(x == y) for (x, y) in test_results)  
  
    def cost_derivative(self, output_activations, y):  
        r"""Return the vector of partial derivatives \partial C_x / 
        \partial a for the output activations."""  
        return (output_activations-y)  
  
#### Miscellaneous functions  
def sigmoid(z):  
    """The sigmoid function."""  
    return 1.0/(1.0+np.exp(-z))  
  
def sigmoid_prime(z):  
    """Derivative of the sigmoid function."""  
    return sigmoid(z)*(1-sigmoid(z))  
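
The backward pass depends on sigmoid_prime being the true derivative of sigmoid. One quick way to check that is to compare it with a central finite difference. This is a standalone sketch that redefines both functions with the standard library; the helper numerical_derivative and the sample points are additions for illustration.

```python
import math

def sigmoid(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z) * (1 - sigmoid(z))

def numerical_derivative(f, z, h=1e-5):
    """Central finite-difference estimate of f'(z) (illustrative helper)."""
    return (f(z + h) - f(z - h)) / (2 * h)

# Largest disagreement between the analytic and numerical derivative.
max_error = max(abs(sigmoid_prime(z) - numerical_derivative(sigmoid, z))
                for z in (-4.0, -1.0, 0.0, 0.5, 3.0))
print(max_error)
```

The same comparison can be applied to a full gradient from backprop, one weight at a time, when debugging changes to the network code.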

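Time to catch up on the fundamentals: object-oriented programming in C++ and Python, and the overall structure of neural network knowledge!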
