Writing a Neural Network by Hand (with a Backpropagation Algorithm Code Walkthrough)
Let's start with Michael Nielsen's code (from neuralnetworksanddeeplearning.com). Besides `__init__`, the `Network` class has six methods. `SGD`, `update_mini_batch`, and `backprop` compute each epoch's residuals, the partial derivatives with respect to W and b, and the updates to W and b. `feedforward` and `evaluate` compute the forward pass, which can be used to measure the error on the training and validation sets each epoch. `cost_derivative` computes the residual of the network's last layer.
```python
#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np


class Network(object):

    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        if test_data:
            n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print("Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test))
            else:
                print("Epoch {0} complete".format(j))

    def update_mini_batch(self, mini_batch, eta):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward, saving each layer's intermediate values as we go
        activation = x
        activations = [x]  # list to store all the activations, layer by layer
        zs = []  # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b
            zs.append(z)  # zs stores each layer's weighted input
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # here -l indexes the l-th layer counting from the back, not layer 1
        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Derivative of the (quadratic) cost w.r.t. the output activations."""
        return (output_activations-y)


#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
```
Step-by-step breakdown:
First, the constructor needs the network's layer structure: the number of layers and the number of neurons in each layer.
From these parameters, initialize the weights W and biases b. Note that the initial values must be random, e.g. drawn from a standard normal distribution N(0, 1), which is what `np.random.randn` provides.
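As a minimal usage sketch (the layer sizes below are hypothetical, chosen for an MNIST-style task, and assume the `Network` class defined above), construction and the resulting parameter shapes look like this:

```python
# Hypothetical sizes: 784 inputs, one hidden layer of 30 neurons, 10 outputs.
net = Network([784, 30, 10])

# One weight matrix and one bias vector per non-input layer.
print([w.shape for w in net.weights])  # [(30, 784), (10, 30)]
print([b.shape for b in net.biases])   # [(30, 1), (10, 1)]
```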
Before each epoch, the input data X is reshuffled, then split into chunks of `mini_batch_size`; each batch in turn is used to train and update W and b. An epoch finishes once every batch has been trained on; at that point you can test what the current W and b predict and compare against the true labels, then move on to the next epoch and repeat.
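A hedged sketch of the training call (the hyperparameters and the random placeholder data below are illustrative, not values from the original post):

```python
import numpy as np

# Placeholder data: training labels are one-hot column vectors, while
# test labels are plain integers, as evaluate() expects.
training_data = [(np.random.randn(784, 1),
                  np.eye(10)[:, [np.random.randint(10)]])
                 for _ in range(1000)]
test_data = [(np.random.randn(784, 1), np.random.randint(10))
             for _ in range(100)]

net = Network([784, 30, 10])
net.SGD(training_data, epochs=5, mini_batch_size=10, eta=3.0,
        test_data=test_data)
```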
Breakdown of the backward-propagation steps; each formula below corresponds to the code that follows it:
1. Perform the feedforward pass: using the forward-propagation formulas z^l = W^l a^{l−1} + b^l and a^l = σ(z^l), compute the activations of every layer up to the output layer.
```python
def backprop(self, x, y):
    # ... (rest of the method omitted)
    activation = x
    activations = [x]  # list to store all the activations, layer by layer
    zs = []  # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)  # save each layer's weighted input for the backward pass
        activation = sigmoid(z)
        activations.append(activation)
```
`zs` stores each layer's weighted inputs z; `activations` stores each layer's outputs after the activation function has been applied.
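To make the bookkeeping concrete, here is a small sketch (a hypothetical [3, 4, 2] network, reusing the forward loop above) showing the shapes `zs` and `activations` take:

```python
net = Network([3, 4, 2])
x = np.random.randn(3, 1)

activation = x
activations = [x]
zs = []
for b, w in zip(net.biases, net.weights):
    z = np.dot(w, activation) + b
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)

print([z.shape for z in zs])           # [(4, 1), (2, 1)]
print([a.shape for a in activations])  # [(3, 1), (4, 1), (2, 1)]
```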
2. For the output layer (the last layer L), compute the residual: δ^L = (a^L − y) ⊙ σ′(z^L).
```python
def backprop(self, x, y):
    # ... (rest of the method omitted)
    delta = self.cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])  # residual of the last layer
    # nabla_b[-1] = delta
    # nabla_w[-1] = np.dot(delta, activations[-2].transpose())

def cost_derivative(self, output_activations, y):
    return (output_activations-y)

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
```
3. For each earlier layer l = L−1, L−2, …, 2, backpropagate the residual: δ^l = ((W^{l+1})^T δ^{l+1}) ⊙ σ′(z^l).
```python
def backprop(self, x, y):
    # ... (rest of the method omitted)
    # in the code, -l denotes the l-th layer counting from the back
    for l in range(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        # nabla_b[-l] = delta
        # nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
```
4. Compute each layer's partial derivatives of the cost with respect to w and b: ∂C/∂b^l = δ^l and ∂C/∂W^l = δ^l (a^{l−1})^T.
```python
def backprop(self, x, y):
    # ... (rest of the method omitted)
    for l in range(2, self.num_layers):
        # z = zs[-l]
        # sp = sigmoid_prime(z)
        # delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
```
5. For mini-batch gradient descent, accumulate the gradients over the samples i = 1, …, m in the batch: ∇W^l = Σ_i δ^{l,i} (a^{l−1,i})^T and ∇b^l = Σ_i δ^{l,i}.
```python
def update_mini_batch(self, mini_batch, eta):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # self.weights = [w-(eta/len(mini_batch))*nw
    #                 for w, nw in zip(self.weights, nabla_w)]
    # self.biases = [b-(eta/len(mini_batch))*nb
    #                for b, nb in zip(self.biases, nabla_b)]
```
6. Update the parameters: W^l ← W^l − (η/m) ∇W^l and b^l ← b^l − (η/m) ∇b^l, where m is the mini-batch size:
```python
def update_mini_batch(self, mini_batch, eta):
    # nabla_b = [np.zeros(b.shape) for b in self.biases]
    # nabla_w = [np.zeros(w.shape) for w in self.weights]
    # for x, y in mini_batch:
    #     delta_nabla_b, delta_nabla_w = self.backprop(x, y)
    #     nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
    #     nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
```
Repeat the gradient-descent iterations above to drive down the value of the cost function J(W, b).
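A common way to verify an implementation like this (not part of the original post) is a numerical gradient check: compare backprop's gradients against finite differences of the cost. A minimal sketch, assuming the `Network` class above and the quadratic cost C = ½‖a − y‖² that `cost_derivative` implies:

```python
def numerical_grad_check(net, x, y, eps=1e-5):
    """Compare backprop's dC/db for the first bias entry of each layer
    against a central finite difference of the quadratic cost."""
    def cost():
        a = net.feedforward(x)
        return 0.5 * np.sum((a - y) ** 2)

    nabla_b, nabla_w = net.backprop(x, y)
    for l, b in enumerate(net.biases):
        old = b[0, 0]
        b[0, 0] = old + eps
        c_plus = cost()
        b[0, 0] = old - eps
        c_minus = cost()
        b[0, 0] = old  # restore the original parameter
        numeric = (c_plus - c_minus) / (2 * eps)
        print(l, numeric, nabla_b[l][0, 0])  # the two should agree closely

net = Network([3, 4, 2])
numerical_grad_check(net, np.random.randn(3, 1), np.random.randn(2, 1))
```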
Improvements
Improved weight initialization:
Initialize each weight as a Gaussian with mean 0 and standard deviation 1/√n_in, where n_in is the number of inputs feeding the neuron, so that the weighted inputs z do not saturate the sigmoid early in training. For details see http://blog.csdn.net/xbinworld/article/details/50603552 and http://neuralnetworksanddeeplearning.com/chap3.html#weight_initialization
```python
self.weights = [np.random.randn(y, x)/np.sqrt(x)
                for x, y in zip(self.sizes[:-1], self.sizes[1:])]
```
Adding an L2 regularization term: the regularized cost is C = C₀ + (λ/2n) Σ_w w², which turns the weight update into w ← (1 − ηλ/n) w − (η/m) ∇w, while the bias update is unchanged:
```python
def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """``lmbda`` is the regularization parameter, and ``n`` is the
    total size of the training data set."""
    # ... (rest of the method omitted)
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    # self.biases = [b-(eta/len(mini_batch))*nb
    #                for b, nb in zip(self.biases, nabla_b)]
```
Use a validation set to search for the best hyperparameters (learning rate η, regularization strength λ, mini-batch size, and so on), as sketched below.
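A minimal sketch of what that search might look like; the candidate learning rates and the `accuracy` helper are hypothetical, and `training_data`/`validation_data` are assumed to be prepared as in the SGD sketch earlier (validation labels as plain integers):

```python
def accuracy(net, data):
    """Fraction of examples whose argmax prediction matches the label."""
    return net.evaluate(data) / float(len(data))

best_eta, best_acc = None, -1.0
for eta in [0.5, 1.0, 3.0]:  # hypothetical candidate learning rates
    net = Network([784, 30, 10])
    net.SGD(training_data, epochs=10, mini_batch_size=10, eta=eta)
    acc = accuracy(net, validation_data)  # held-out validation set
    if acc > best_acc:
        best_eta, best_acc = eta, acc
print("best eta:", best_eta, "validation accuracy:", best_acc)
```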
Quadratic cost: the loss function used throughout the code above, C = ½‖a − y‖² per example, whose derivative with respect to a is exactly what `cost_derivative` returns.
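For reference, the quadratic cost and its output-layer residual can be packaged together as a class, in the style of Nielsen's network2.py (this is a sketch following that style, not code from the original post):

```python
class QuadraticCost(object):

    @staticmethod
    def fn(a, y):
        """C = 0.5 * ||a - y||^2 for a single example."""
        return 0.5 * np.linalg.norm(a - y) ** 2

    @staticmethod
    def delta(z, a, y):
        """Output-layer residual: (a - y) ⊙ σ'(z)."""
        return (a - y) * sigmoid_prime(z)
```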