Neural Networks (II)
In this chapter we walk through a short piece of Python code to further illustrate the ideas behind neural networks. (The source code can be downloaded from https://github.com/mnielsen/neural-networks-and-deep-learning; this article only studies that code and makes no modifications to it.) For the underlying theory, please refer to the previous chapter.
The core of the code is the Network object.
The sizes variable specifies how many layers the network has and how many neurons each layer contains. The bias and weight parameters are initialized with random values, which serve as the starting point for the iterative optimization:

import random      # used by SGD to shuffle the training data
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
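As a quick sanity check (my own snippet, not part of the repository), we can build a tiny network and inspect the shapes of its parameters:

net = Network([2, 3, 1])  # 2 inputs, a hidden layer of 3 neurons, 1 output
print [b.shape for b in net.biases]   # [(3, 1), (1, 1)]: no biases for the input layer
print [w.shape for w in net.weights]  # [(3, 2), (1, 3)]: weights[l][j][k] connects neuron k to neuron j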
Feedforward code:
Following the formula a' = σ(wa + b), each layer's activations a' are computed from the previous layer's activations a:
    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a
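Continuing the toy [2, 3, 1] network above (again my own snippet), the input must be a 2x1 column vector, and the output is a 1x1 array of sigmoid activations:

a = np.random.randn(2, 1)  # column vector matching the 2 input neurons
print net.feedforward(a)   # a 1x1 array with a value in (0, 1)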
The sigmoid function and its derivative:
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))
def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
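As a numerical sanity check (my own sketch, not part of the repository), sigmoid_prime agrees with a central-difference approximation of sigmoid's derivative:

z = 0.5
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print abs(numeric - sigmoid_prime(z)) < 1e-9  # True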
Backpropagation code:
SGD is the entry point for training. Its core is the update_mini_batch function; around it, the code loops over epochs and, within each epoch, shuffles the training data and slices it into mini-batches. If test data is supplied, the network's performance is evaluated after every epoch.
    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):  # each epoch round
            random.shuffle(training_data)  # re-arrange the data randomly
            # slice the shuffled data into mini-batches
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)
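Putting it together on MNIST: assuming the repository's mnist_loader module is available (it ships alongside network.py), a typical run with the book's hyperparameters looks like this:

import mnist_loader
training_data, validation_data, test_data = \
    mnist_loader.load_data_wrapper()
net = Network([784, 30, 10])  # 784 input pixels, 30 hidden neurons, 10 digit classes
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)  # 30 epochs, mini-batch size 10, eta = 3.0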
update_mini_batch applies the gradient-descent update rule to the biases and weights. Note that, strictly speaking, the gradient should be computed over all of the training data; approximating it with a small mini-batch is exactly what makes stochastic gradient descent fast.
    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        # nabla_b and nabla_w accumulate the gradients of the cost with
        # respect to the biases and weights over the mini-batch
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        # update weights and biases according to the gradient descent rule
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
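In symbols, with a mini-batch of m training examples and learning rate \eta, the update applied above is the gradient-descent rule from the previous chapter:

    w \rightarrow w - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w},
    \qquad
    b \rightarrow b - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial b}

where the sums run over the examples x in the mini-batch; in the code, m is len(mini_batch) and the sums are accumulated in nabla_w and nabla_b.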
Backpropagation itself is implemented in the backprop function. For each training example (x, y), x is the input vector and y is the true digit label.
The activations of all layers are stored in the activations list, and the weighted inputs z = wa + b of all layers are stored in the zs list.
    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x]  # list to store all the activations, layer by layer
        zs = []  # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        # compute the output layer's error term
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        # compute the output layer's bias and weight gradients
        nabla_b[-1] = delta  # the error term is the bias gradient, as proved before
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())  # a*delta is the weight gradient, as proved before
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):  # for each remaining layer
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp  # error backpropagation
            nabla_b[-l] = delta  # bias gradient
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())  # weight gradient
        return (nabla_b, nabla_w)
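One helper called here, cost_derivative, is not listed in this section; in Nielsen's repository it is simply the gradient of the quadratic cost with respect to the output activations:

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives (dC_x / da) for
        the output activations."""
        return (output_activations-y)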
Evaluation code:
For each test input, we take the network's prediction to be the index of whichever of the 10 output neurons has the largest activation.
    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)
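For instance (an illustrative output vector, not produced by an actual trained network):

output = np.array([[0.05], [0.02], [0.91], [0.01], [0.00],
                   [0.00], [0.00], [0.00], [0.00], [0.01]])
print np.argmax(output)  # prints 2: the network predicts the digit 2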