《neural network and deep learning》 solutions, ch03: revisiting the handwriting recognition problem, with source code analysis
Original post: http://blog.csdn.net/u011239443/article/details/77649026
Full code: https://github.com/xiaoyesoso/neural-networks-and-deep-learning/blob/master/src/network2.py
In the ch02 backpropagation post (《neural network and deep learning》题解——ch02 反向传播) we analyzed the ch02 Network source code. This post covers network2.py, which improves on that code. Along the way we relate it back to 《机器学习技法》学习笔记12——神经网络 (study notes 12 on the Machine Learning Techniques course: neural networks).
Cross-entropy cost function
class QuadraticCost(object):

    @staticmethod
    def fn(a, y):
        return 0.5 * np.linalg.norm(a - y) ** 2

    @staticmethod
    def delta(z, a, y):
        return (a - y) * sigmoid_prime(z)


class CrossEntropyCost(object):

    @staticmethod
    def fn(a, y):
        return np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))

    @staticmethod
    def delta(z, a, y):
        return (a - y)
Here the cost functions are wrapped into two classes. The static method fn returns the cost itself, while delta returns the output-layer error δ introduced in the ch02 backpropagation post. In general this delta is

$$\delta^L = \nabla_a C \odot \sigma'(z^L),$$

which is also the quantity used in 《机器学习技法》学习笔记12——神经网络 (there the activation is written x and the pre-activation s).
The Network class from ch02 used the quadratic cost, so here we only discuss the new cross-entropy cost. For a single training example it is

$$C = -\sum_j \left[ y_j \ln a_j + (1 - y_j) \ln(1 - a_j) \right],$$

which corresponds to the code:
np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
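Why wrap the expression in np.nan_to_num? If an output activation saturates to exactly 0.0 or 1.0 while agreeing with the label, NumPy evaluates the corresponding 0 * log(0) term as 0 * (-inf) = nan, even though its mathematical limit is 0. A minimal standalone sketch (my own illustration, not part of network2.py):

import numpy as np

a = np.array([[1.0], [0.0]])  # fully saturated outputs
y = np.array([[1.0], [0.0]])  # labels that agree with them

# raw contains 0 * log(0) terms, which NumPy evaluates to nan
# (it also emits RuntimeWarnings, which are harmless here)
raw = -y * np.log(a) - (1 - y) * np.log(1 - a)
print(np.sum(np.nan_to_num(raw)))  # 0.0, the correct limiting cost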
Next, let's look at the exercise concerning delta:

Look at the Network.cost_derivative method in network.py. That method was written for the quadratic cost. How would you modify it for the cross-entropy cost? Can you think of a problem that might arise with the cross-entropy version? In network2.py we have removed the Network.cost_derivative method entirely and folded its functionality into CrossEntropyCost.delta. How does this resolve the problem you found?
For the quadratic cost, cost_derivative computes the derivative of the cost with respect to the output activation,

$$\frac{\partial C}{\partial a} = a - y,$$

and in Network it is used precisely to form the output error $\delta = (a - y) \odot \sigma'(z)$. CrossEntropyCost.delta, by contrast, is simply:

return (a - y)

(Mapping to the notation of 《机器学习技法》学习笔记12——神经网络: the a in the code is the activation x, and z is the pre-activation s.)
Differentiating the cross-entropy cost with respect to a gives:

$$\frac{\partial C}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)},$$

so a cost_derivative for CrossEntropyCost would have to return $(a - y)/\big(a(1 - a)\big)$, which is numerically dangerous: the denominator vanishes whenever an output saturates near 0 or 1. From http://blog.csdn.net/u011239443/article/details/75091283#t0 we know the sigmoid derivative is

$$\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big) = a(1 - a),$$

so:

$$\delta = \frac{\partial C}{\partial a}\,\sigma'(z) = \frac{a - y}{a(1 - a)} \cdot a(1 - a) = a - y.$$

By folding the two factors into a single delta method, the $a(1 - a)$ terms cancel analytically. That is how network2.py avoids both the numerical instability and the learning slowdown that a tiny $\sigma'(z)$ would otherwise cause.
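A small numeric check makes the point concrete (again my own illustration, not from the book's code): take a strongly saturated neuron that is confidently wrong. The quadratic delta is crushed by σ′(z), the naive cross-entropy cost_derivative blows up, and the folded (a − y) form stays well behaved.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

z = np.array([10.0])  # strongly saturated pre-activation
a = sigmoid(z)        # ~0.99995
y = np.array([0.0])   # the neuron is confidently wrong

print((a - y) * sigmoid_prime(z))  # quadratic delta: ~4.5e-05, learning stalls
print((a - y) / (a * (1 - a)))     # naive dC/da: ~2.2e+04, numerically fragile
print(a - y)                       # folded cross-entropy delta: ~1.0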
Initialization

Essentially the same as in Network, except that the setup is wrapped in a default_weight_initializer method, and each weight is scaled by $1/\sqrt{n_{\text{in}}}$ (the number of inbound connections) so that the weighted inputs z start out small:
def __init__(self, sizes, cost=CrossEntropyCost):
    self.num_layers = len(sizes)
    self.sizes = sizes
    self.default_weight_initializer()
    self.cost = cost

def default_weight_initializer(self):
    # draw from a standard normal (np.random.randn),
    # then scale the weights by 1/sqrt(fan-in)
    self.biases = [np.random.randn(y, 1) for y in self.sizes[1:]]
    self.weights = [np.random.randn(y, x) / np.sqrt(x)
                    for x, y in zip(self.sizes[:-1], self.sizes[1:])]
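A quick sanity check on the 1/√n scaling (a standalone sketch with made-up sizes, not from the book's code): with 1000 inputs all set to 1, unscaled N(0, 1) weights give z = w·x + b a standard deviation of about √1001 ≈ 31.6, deep in the sigmoid's flat regions, while the scaled weights keep z near unit variance.

import numpy as np

n_in = 1000
x = np.ones(n_in)  # suppose every input neuron fires

W_old = np.random.randn(10000, n_in)                  # 10000 trials, N(0, 1) weights
W_new = np.random.randn(10000, n_in) / np.sqrt(n_in)  # scaled N(0, 1/n_in) weights
b = np.random.randn(10000)

print(np.std(W_old.dot(x) + b))  # ~31.6: z saturates the sigmoid
print(np.std(W_new.dot(x) + b))  # ~1.41: z stays in the sensitive region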
Stochastic gradient descent

Essentially the same as in Network; each monitor_* flag controls whether the corresponding metric is computed and reported at the end of every epoch.
def SGD(self, training_data, epochs, mini_batch_size, eta,
        lmbda=0.0,
        evaluation_data=None,
        monitor_evaluation_cost=False,
        monitor_evaluation_accuracy=False,
        monitor_training_cost=False,
        monitor_training_accuracy=False):
    if evaluation_data:
        n_data = len(evaluation_data)
    n = len(training_data)
    evaluation_cost, evaluation_accuracy = [], []
    training_cost, training_accuracy = [], []
    for j in xrange(epochs):
        random.shuffle(training_data)
        mini_batches = [training_data[k:k + mini_batch_size]
                        for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta, lmbda, len(training_data))
        print "Epoch %s training complete" % (j + 1)
        if monitor_training_cost:
            cost = self.total_cost(training_data, lmbda)
            training_cost.append(cost)
            print "Cost on train: {}".format(cost)
        if monitor_training_accuracy:
            acc = self.accuracy(training_data, convert=True)
            training_accuracy.append(acc)
            print "Acc on train: {} / {}".format(acc, n)
        if monitor_evaluation_cost:
            cost = self.total_cost(evaluation_data, lmbda, convert=True)
            evaluation_cost.append(cost)
            print "Cost on evaluation: {}".format(cost)
        if monitor_evaluation_accuracy:
            acc = self.accuracy(evaluation_data)
            evaluation_accuracy.append(acc)
            print "Acc on evaluation: {} / {}".format(acc, n_data)
        print
    return evaluation_cost, evaluation_accuracy, training_cost, training_accuracy
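A typical call looks like this, assuming the class is named Network as in network2.py and that the book's mnist_loader module is on the path (both come from the repository linked at the top):

import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

net = Network([784, 30, 10], cost=CrossEntropyCost)
results = net.SGD(training_data, 30, 10, 0.5,
                  lmbda=5.0,  # L2 regularization strength
                  evaluation_data=validation_data,
                  monitor_evaluation_accuracy=True,
                  monitor_training_cost=True)
evaluation_cost, evaluation_accuracy, training_cost, training_accuracy = results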
Backpropagation
def backprop(self, x, y):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # forward pass: record every z and activation, layer by layer
    activation = x
    activations = [x]
    zs = []
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass: the output error now comes from the cost class
    delta = (self.cost).delta(zs[-1], activations[-1], y)
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    for l in xrange(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
    return (nabla_b, nabla_w)

def update_mini_batch(self, mini_batch, eta, lmbda, n):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # weight update with L2 weight decay; biases are not regularized
    self.weights = [(1 - eta * (lmbda / n)) * w - (eta / len(mini_batch)) * nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b - (eta / len(mini_batch)) * nb
                   for b, nb in zip(self.biases, nabla_b)]
As you can see, this is essentially the same as in Network, and δ was covered above. The code lines up with the formulas from 《机器学习技法》学习笔记12——神经网络:

$$\delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l), \qquad \frac{\partial C}{\partial b^l} = \delta^l, \qquad \frac{\partial C}{\partial w^l} = \delta^l \,(a^{l-1})^T.$$
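To verify numerically that backprop matches these formulas, one option is a finite-difference gradient check. This helper is my own sketch, not part of the book's code; it assumes a net instance exposing the weights, backprop, feedforward, and cost attributes shown above:

def grad_check(net, x, y, l=0, i=0, j=0, eps=1e-5):
    """Compare backprop's gradient for weight (l, i, j) against a
    central finite difference of the unregularized cost."""
    nabla_b, nabla_w = net.backprop(x, y)
    old = net.weights[l][i, j]

    net.weights[l][i, j] = old + eps
    c_plus = net.cost.fn(net.feedforward(x), y)
    net.weights[l][i, j] = old - eps
    c_minus = net.cost.fn(net.feedforward(x), y)
    net.weights[l][i, j] = old  # restore the original weight

    numeric = (c_plus - c_minus) / (2 * eps)
    return nabla_w[l][i, j], numeric  # the two values should agree closely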
L2 regularization

The main difference is that the last two lines of update_mini_batch add L2 regularization. The regularized cost is

$$C = C_0 + \frac{\lambda}{2n} \sum_w w^2.$$

Taking the partial derivative gives:

$$\frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n} w,$$

so the update rule becomes:

$$w \to \left(1 - \frac{\eta\lambda}{n}\right) w - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w},$$

which is exactly the (1 - eta * (lmbda / n)) * w factor in the code above (m is the mini-batch size; biases are left unregularized).
L1 regularization

This brings us to the other exercise of this section:

Modify the code above to implement L1 regularization.

The L1-regularized cost is $C = C_0 + \frac{\lambda}{n} \sum_w |w|$. Differentiating gives:

$$\frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n}\,\mathrm{sgn}(w),$$

so the update rule is:

$$w \to w - \frac{\eta\lambda}{n}\,\mathrm{sgn}(w) - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w}.$$

The corresponding code should be written as follows. Note that the sgn(w) term is subtracted on its own; multiplying it into the decay factor as (1 - eta * (lmbda / n) * np.sign(w)) * w would shrink w by a multiple of |w|, which is L2-like behavior rather than L1:
self.weights = [w - eta * (lmbda / n) * np.sign(w)
                - (eta / len(mini_batch)) * nw
                for w, nw in zip(self.weights, nabla_w)]
self.biases = [b - (eta / len(mini_batch)) * nb
               for b, nb in zip(self.biases, nabla_b)]
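The practical difference between the two penalties: L2 shrinks each weight multiplicatively (large weights lose more in absolute terms), while L1 subtracts a fixed amount per step, so small weights are driven toward exactly zero much faster. A tiny standalone sketch with made-up hyperparameters:

import numpy as np

eta, lmbda, n = 0.5, 5.0, 50000  # eta * lmbda / n = 5e-5
w = np.array([2.0, 0.01])        # one large and one small weight

w_l2 = (1 - eta * (lmbda / n)) * w          # multiplicative shrinkage
w_l1 = w - eta * (lmbda / n) * np.sign(w)   # constant-step shrinkage

print(w_l2)  # [1.9999    0.0099995]: the small weight barely moves
print(w_l1)  # [1.99995   0.00995  ]: both weights lose the same 5e-5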
Evaluation

Some labels first need to be converted to a one-hot (vectorized) representation, using:
def vectorized_result(j):
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
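For example, with label j = 3 (an arbitrary sample value) the one-hot column looks like this:

print(vectorized_result(3).ravel())
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]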
Computing the cost

This includes the L2 regularization term:
def total_cost(self, data, lmbda, convert=False):
    cost = 0.0
    for x, y in data:
        a = self.feedforward(x)
        if convert:
            y = vectorized_result(y)
        cost += self.cost.fn(a, y) / len(data)
    cost += 0.5 * (lmbda / len(data)) * sum(
        np.linalg.norm(w) ** 2 for w in self.weights)
    return cost
Returning to the earlier L1 regularization exercise: the regularization term here becomes the sum of absolute weight values, np.sum(np.abs(w)), rather than np.linalg.norm (which would give the Frobenius norm $\sqrt{\sum w^2}$):

cost += (lmbda / len(data)) * sum(np.sum(np.abs(w)) for w in self.weights)
Computing the accuracy

Essentially the same as in Network. The convert flag indicates whether the labels in data are one-hot vectors (as in the training data) or plain integers (as in the validation and test data):
def accuracy(self, data, convert=False):
    if convert:
        # one-hot labels: take the argmax of y as well
        results = [(np.argmax(self.feedforward(x)), np.argmax(y))
                   for (x, y) in data]
    else:
        results = [(np.argmax(self.feedforward(x)), y)
                   for (x, y) in data]
    return sum(int(x == y) for (x, y) in results)