Artificial Neural Networks: Mathematics of Backpropagation (Part 4)
http://briandolhansky.com/blog/2013/9/27/artificial-neural-networks-backpropagation-part-4
Up until now, we haven't utilized any of the expressive non-linear power of neural networks - all of our simple one-layer models corresponded to a linear model such as multinomial logistic regression. These one-layer models had a simple derivative. We only had one set of weights that fed directly into our output, and it was easy to compute the derivative with respect to these weights. However, what happens when we want to use a deeper model? What happens when we start stacking layers?
No longer is there a linear relation between a change in the weights and a change in the target. Any perturbation at a particular layer will be further transformed in successive layers. So, then, how do we compute the gradient for all weights in our network? This is where we use the backpropagation algorithm.
Backpropagation, at its core, simply consists of repeatedly applying the chain rule through all of the possible paths in our network. However, there are an exponential number of directed paths from the input to the output. Backpropagation's real power arises in the form of a dynamic programming algorithm, where we reuse intermediate results to calculate the gradient. We transmit intermediate errors backwards through a network, thus leading to the name backpropagation. In fact, backpropagation is closely related to forward propagation, but instead of propagating the inputs forward through the network, we propagate the error backwards.
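To make the dynamic-programming claim concrete (using the error-signal notation introduced later in this post): summing the chain rule over every directed path from a unit $j$ to the output requires exponentially many terms, but all paths passing through a common successor $k$ share their entire suffix, so the sum factors into a recursion:

$$\frac{\partial E}{\partial s_j} = f_j'(s_j) \sum_{k \in \text{outs}(j)} w_{j\to k}\, \frac{\partial E}{\partial s_k}.$$

Each $\partial E / \partial s_k$ is computed once and then reused by every unit that feeds into $k$; these are exactly the intermediate errors that get transmitted backwards through the network.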
Most explanations of backpropagation start directly with a general theoretical derivation, but I’ve found that computing the gradients by hand naturally leads to the backpropagation algorithm itself, and that’s what I’ll be doing in this blog post. This is a lengthy section, but I feel that this is the best way to learn how backpropagation works.
I’ll start with a simple one-path network, and then move on to a network with multiple units per layer. Finally, I’ll derive the general backpropagation algorithm. Code for the backpropagation algorithm will be included in my next installment, where I derive the matrix form of the algorithm.
Examples: Deriving the base rules of backpropagation
For a single unit in a general network, we can have several cases: the unit may have only one input and one output (case 1), the unit may have multiple inputs (case 2), or the unit may have multiple outputs (case 3). Technically there is a fourth case: a unit may have multiple inputs and outputs. But as we will see, the multiple input case and the multiple output case are independent, and we can simply combine the rules we learn for case 2 and case 3 for this case.
I will go over each of these cases in turn with relatively simple multilayer networks, and along the way will derive some general rules for backpropagation. At the end, we can combine all of these rules into a single grand unified backpropagation algorithm for arbitrary networks.
Case 1: Single input and single output
Suppose we have the following network:
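A minimal sketch of the single-path network this case covers (the unit labels $j$, $k$, $o$ and weights $w_1$, $w_2$, $w_3$ are assumptions, chosen to match the notation used in the rest of this post):

$$x \xrightarrow{\;w_1\;} j \xrightarrow{\;w_2\;} k \xrightarrow{\;w_3\;} o \longrightarrow \hat{y}, \qquad E = \tfrac{1}{2}(\hat{y} - y)^2.$$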
Working backwards from the error $E$ and applying the chain rule one weight at a time, you should see a pattern emerging, a pattern that we can hopefully encode with backpropagation. We are reusing multiple values as we compute the updates for weights that appear earlier and earlier in the network. Specifically, we see the derivative of the network error, the weighted derivative of unit $k$'s output with respect to its input $s_k$, and the weighted derivative of unit $j$'s output with respect to its input $s_j$.
So, in summary, for this simple network, we have:
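Reconstructing the summary under the assumed labels above ($j$, $k$, $o$ and weights $w_1$, $w_2$, $w_3$), the three gradients are:

$$\frac{\partial E}{\partial w_3} = (\hat{y} - y)\, f_o'(s_o)\, z_k$$

$$\frac{\partial E}{\partial w_2} = (\hat{y} - y)\, f_o'(s_o)\, w_3\, f_k'(s_k)\, z_j$$

$$\frac{\partial E}{\partial w_1} = (\hat{y} - y)\, f_o'(s_o)\, w_3\, f_k'(s_k)\, w_2\, f_j'(s_j)\, x$$

Each gradient contains the one above it as a prefix; those shared prefixes are precisely the intermediate results that backpropagation caches and reuses.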
Case 2: Handling multiple inputs
Consider the more complicated network, where a unit may have more than one input:
Here we see that the update for a weight $w_{j\to k}$ leading into unit $k$ depends only on that weight's own input $z_j$ (together with the error terms downstream of $k$); the other inputs to $k$ never appear in it. This is our first derived rule: the weight update for a weight leading into a unit with multiple inputs depends only on that weight's own input.
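To see why, note that with $s_k = \sum_j w_{j\to k}\, z_j$, differentiating with respect to one incoming weight picks out only that weight's input:

$$\frac{\partial s_k}{\partial w_{j\to k}} = z_j, \qquad \text{so} \qquad \frac{\partial E}{\partial w_{j\to k}} = \frac{\partial E}{\partial s_k}\, z_j.$$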
Case 3: Handling multiple outputs
Now let's examine the case where a hidden unit has more than one output.
There are two things to note here. The first, and most relevant, is our second derived rule: the weight update for a weight leading to a unit with multiple outputs is dependent on derivatives that reside on both paths.
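As a sketch of this first rule, suppose unit $j$ feeds two units $k_1$ and $k_2$ (labels assumed). Then $z_j$ influences the error along both paths, and by the multivariate chain rule the contributions add:

$$\frac{\partial E}{\partial z_j} = \frac{\partial E}{\partial s_{k_1}}\, w_{j\to k_1} + \frac{\partial E}{\partial s_{k_2}}\, w_{j\to k_2}.$$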
But more generally, and more importantly, we begin to see the relation between backpropagation and forward propagation. During backpropagation, we compute the error of the output. We then pass the error backward and weight it along each edge. When we come to a unit, we multiply the weighted backpropagated error by the unit's derivative. We then continue backpropagating this error in the same fashion, all the way to the input. Backpropagation, much like forward propagation, is a recursive algorithm. In the next section, I introduce the notion of an error signal, which allows us to rewrite our weight updates in a compact form.
Error Signals
We define the recursive error signal at unit $j$ as

$$\delta_j = \frac{\partial E}{\partial s_j},$$

a measure of how much the network error varies with the input $s_j$ to unit $j$. If $j$ is an output node, the error signal is simply the derivative of the error scaled by the unit's own derivative:

$$\delta_j = f_j'(s_j)\,(\hat{y} - y).$$

Otherwise, unit $j$ is a hidden unit, and its error signal is its derivative times the weighted sum of the error signals of all the units it feeds into:

$$\delta_j = f_j'(s_j) \sum_{k \in \text{outs}(j)} \delta_k\, w_{j\to k}.$$

With error signals in hand, every weight update takes the compact form $\Delta w_{i\to j} = -\eta\, \delta_j\, z_i$.
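For a quick worked instance, take a sigmoid output unit $f_j = \sigma$ with $\sigma'(s) = \sigma(s)(1 - \sigma(s))$, and assume $s_j = 0$ and target $y = 1$ (values chosen purely for easy arithmetic). Then $\hat{y} = \sigma(0) = 0.5$, $f_j'(0) = 0.5 \cdot 0.5 = 0.25$, and

$$\delta_j = f_j'(s_j)\,(\hat{y} - y) = 0.25 \cdot (0.5 - 1) = -0.125.$$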
The general form of backpropagation
Recall the simple network from the first section:
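In error-signal form, the three updates we derived by hand for that chain (assumed labels $j$, $k$, $o$ as before) collapse into one repeated pattern:

$$\delta_o = f_o'(s_o)(\hat{y} - y), \qquad \delta_k = f_k'(s_k)\,\delta_o w_3, \qquad \delta_j = f_j'(s_j)\,\delta_k w_2,$$

$$\Delta w_3 = -\eta\,\delta_o z_k, \qquad \Delta w_2 = -\eta\,\delta_k z_j, \qquad \Delta w_1 = -\eta\,\delta_j x.$$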
The last thing to consider is the case where we use a minibatch of instances to compute the gradient. Because we treat each training example $y_i$ independently, we compute a per-example gradient and average over the $N$ examples in the batch; this is where the $\frac{1}{N}$ factor in the final update rule comes from. The full algorithm is then:
- Feed the training instances forward through the network, and record each $s_j^{(y_i)}$ and $z_j^{(y_i)}$.
- Calculate the error signal $\delta_j^{(y_i)}$ for all units $j$ and each training example $y_i$. If $j$ is an output node, then $\delta_j^{(y_i)} = f_j'(s_j^{(y_i)})\,(\hat{y}_i - y_i)$. If $j$ is not an output node, then $\delta_j^{(y_i)} = f_j'(s_j^{(y_i)}) \sum_{k \in \text{outs}(j)} \delta_k^{(y_i)} w_{j\to k}$.
- Update the weights with the rule $\Delta w_{i\to j} = -\frac{\eta}{N} \sum_{y_i} \delta_j^{(y_i)}\, z_i^{(y_i)}$.
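As a minimal sketch of these three steps in Python, applied to the single-path chain from Case 1 with a single training pair ($N = 1$): the names `w1`, `w2`, `w3` and the sigmoid activations are assumptions for illustration, not the post's own code (the matrix-form implementation arrives in the next installment).

```python
import numpy as np

def f(s):
    """Sigmoid activation, used as f_j for every unit in this sketch."""
    return 1.0 / (1.0 + np.exp(-s))

def f_prime(s):
    """Its derivative: f'(s) = f(s) * (1 - f(s))."""
    return f(s) * (1.0 - f(s))

def backprop_step(x, y, w, eta=0.5):
    """One gradient step on the chain x --w1--> j --w2--> k --w3--> o."""
    w1, w2, w3 = w
    # Step 1: forward pass, recording every s_j and z_j.
    s_j = w1 * x;   z_j = f(s_j)
    s_k = w2 * z_j; z_k = f(s_k)
    s_o = w3 * z_k; y_hat = f(s_o)
    # Step 2: error signals, from the output backwards.
    d_o = f_prime(s_o) * (y_hat - y)   # output unit: f'(s)(y_hat - y)
    d_k = f_prime(s_k) * d_o * w3      # hidden unit: f'(s) * sum of d_k * w
    d_j = f_prime(s_j) * d_k * w2
    # Step 3: weight updates, delta_w = -eta * delta_j * z_i (N = 1 here).
    return (w1 - eta * d_j * x,
            w2 - eta * d_k * z_j,
            w3 - eta * d_o * z_k)

w = (0.1, 0.2, 0.3)                    # arbitrary initial weights
w = backprop_step(x=1.0, y=1.0, w=w)
print(w)
```

With a single example the averaging factor disappears; a minibatch version would accumulate the per-example products $\delta_j z_i$ across the batch and divide by $N$ before updating.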
Conclusions
Hopefully you've gained a full understanding of the backpropagation algorithm with this derivation. Although we've fully derived the general backpropagation algorithm in this chapter, it's still not in a form amenable to programming or scaling up. In the next post, I will go over the matrix form of backpropagation, along with a working example that trains a basic neural network on MNIST.