Machine Learning - Neural Networks Learning: Cost Function and Backpropagation
This series of articles is the study notes of "Machine Learning," by Prof. Andrew Ng, Stanford University. This article covers the notes of week 5, Neural Networks Learning, on the topics of the cost function and the backpropagation algorithm.
Cost Function and Backpropagation
Neural networks are one of the most powerful learning algorithms that we have today. In this and the next few sections, we're going to start talking about a learning algorithm for fitting the parameters of a neural network given a training set. As with the discussion of most of our learning algorithms, we're going to begin by talking about the cost function for fitting the parameters of the network.
1. Cost function
I'm going to focus on the application of neural networks to classification problems. So suppose we have a network like that shown in the picture, and suppose we have a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ of $m$ training examples.
- $L$ = total no. of layers in the network; for the network in the picture, $L = 4$.
- $s_l$ = no. of units (not counting the bias unit) in layer $l$; here $s_1 = 3$, $s_2 = 5$, $s_4 = s_L = 4$.
Binary classification: $y = 0$ or $1$, so there is 1 output unit.

Multi-class classification ($K$ classes): $y \in \mathbb{R}^K$ (e.g. a one-hot vector), so there are $K$ output units.
Cost function

For logistic regression, the regularized cost function was:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

For a neural network, whose hypothesis $h_\Theta(x) \in \mathbb{R}^K$ outputs $K$ values and where $(h_\Theta(x))_k$ denotes the $k$-th output, the cost function is a generalization that also sums over the $K$ output units:

$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + (1-y_k^{(i)})\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$

The regularization term sums over all the weights $\Theta_{ji}^{(l)}$ except those multiplying the bias units ($i = 0$).
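As a concrete illustration of this formula, here is a minimal NumPy sketch (not from the course; the names `nn_cost` and `Thetas`, and the one-hot label matrix `Y`, are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Neural-network cost J(Theta).

    Thetas : list of weight matrices; Thetas[l] has shape (s_{l+1}, s_l + 1)
    X      : (m, n) inputs;  Y : (m, K) one-hot labels;  lam : lambda
    """
    m = X.shape[0]
    A = X
    for Theta in Thetas:
        A = np.hstack([np.ones((m, 1)), A])   # add the bias unit a_0 = 1
        A = sigmoid(A @ Theta.T)              # forward propagate one layer
    H = A                                     # h_Theta(x), shape (m, K)
    # Cross-entropy term, summed over examples and output units
    J = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization: all weights except the bias column (j = 0)
    J += lam / (2 * m) * sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return J
```

For binary classification ($K = 1$), `Y` is simply the $(m, 1)$ column of 0/1 labels.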
2. Backpropagation algorithm
Gradient computation

To minimize $J(\Theta)$ we need code to compute:
- $J(\Theta)$
- $\dfrac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
What we need to do, therefore, is write code that takes as input the parameters $\Theta$ and computes $J(\Theta)$ and these partial derivative terms. Remember that the parameters of the neural network are these $\Theta_{ij}^{(l)}$, each of which is a real number, and these are the partial derivative terms we need to compute. To compute the cost function $J(\Theta)$ we just use the formula above, so most of this section focuses on how we can compute the partial derivative terms.
Given one training example $(x, y)$, forward propagation computes the activations of the network layer by layer:

$$\begin{aligned}
a^{(1)} &= x \\
z^{(2)} &= \Theta^{(1)} a^{(1)}, \quad a^{(2)} = g(z^{(2)}) \quad (\text{add } a_0^{(2)}) \\
z^{(3)} &= \Theta^{(2)} a^{(2)}, \quad a^{(3)} = g(z^{(3)}) \quad (\text{add } a_0^{(3)}) \\
z^{(4)} &= \Theta^{(3)} a^{(3)}, \quad a^{(4)} = h_\Theta(x) = g(z^{(4)})
\end{aligned}$$
So this is our vectorized implementation of forward propagation and it allows us to compute the activation values for all of the neurons in our neural network.
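A minimal sketch of this vectorized forward pass for a single example, assuming `x` is a 1-D NumPy array and `Thetas` is the list of weight matrices as above (illustrative names, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(Thetas, x):
    """Return the lists of z and a values for every layer of the network."""
    a = x
    zs, activations = [], [a]
    for Theta in Thetas:
        a = np.concatenate([[1.0], a])   # prepend the bias unit a_0 = 1
        z = Theta @ a                    # z^(l+1) = Theta^(l) a^(l)
        a = sigmoid(z)                   # a^(l+1) = g(z^(l+1))
        zs.append(z)
        activations.append(a)
    return zs, activations               # activations[-1] is h_Theta(x)
```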
Gradient computation: Backpropagation algorithm

Next, in order to compute the derivatives, we're going to use an algorithm called backpropagation. The intuition behind the backpropagation algorithm is that for each node we compute a term $\delta_j^{(l)}$ that somehow represents the "error" of node $j$ in layer $l$.
Intuition: $\delta_j^{(l)}$ = "error" of node $j$ in layer $l$.

For each output unit (layer $L = 4$):

$$\delta_j^{(4)} = a_j^{(4)} - y_j$$

If you think of $\delta$, $a$, and $y$ as vectors, you can also come up with a vectorized implementation of this, which is just

$$\delta^{(4)} = a^{(4)} - y$$

where each of $\delta^{(4)}$, $a^{(4)}$, and $y$ is a vector whose dimension is equal to the number of output units in our network.
What we do next is compute the $\delta$ terms for the earlier layers in our network. The formula for computing $\delta^{(3)}$ is

$$\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} \odot g'(z^{(3)})$$

where $\odot$ denotes the element-wise multiplication operation (the `.*` we know from MATLAB), and the derivative term works out to $g'(z^{(3)}) = a^{(3)} \odot (1 - a^{(3)})$. Similarly, $\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \odot g'(z^{(2)})$. There is no $\delta^{(1)}$ term, because the first layer is the input layer: those values are the observed features, so there is no error associated with them.
Backpropagation algorithm

Given a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$:

- Set $\Delta_{ij}^{(l)} = 0$ for all $l, i, j$ (these will accumulate the gradient).
- For $i = 1$ to $m$:
  - Set $a^{(1)} = x^{(i)}$
  - Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
  - Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
  - Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$
  - $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$
- $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}$ if $j \neq 0$
- $D_{ij}^{(l)} := \frac{1}{m}\Delta_{ij}^{(l)}$ if $j = 0$

It can then be shown that $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$.
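A hedged NumPy sketch of this accumulation loop, reusing the `sigmoid` and `forward_propagate` helpers from the sketch above (the name `backprop_gradients` is illustrative, not from the course):

```python
import numpy as np
# Reuses forward_propagate from the forward-propagation sketch above.

def backprop_gradients(Thetas, X, Y, lam):
    """Accumulate the Delta terms over all m examples and return D = dJ/dTheta."""
    m = X.shape[0]
    Deltas = [np.zeros_like(Theta) for Theta in Thetas]
    for x, y in zip(X, Y):
        zs, activations = forward_propagate(Thetas, x)
        delta = activations[-1] - y                    # delta^(L) = a^(L) - y
        for l in range(len(Thetas) - 1, -1, -1):
            a_with_bias = np.concatenate([[1.0], activations[l]])
            Deltas[l] += np.outer(delta, a_with_bias)  # Delta^(l) += delta^(l+1) (a^(l))^T
            if l > 0:
                a = activations[l]
                # delta^(l) = (Theta^(l))^T delta^(l+1) .* g'(z^(l)),
                # dropping the bias component (index 0)
                delta = (Thetas[l].T @ delta)[1:] * a * (1 - a)
    # D^(l): average, then add regularization (not for the bias column j = 0)
    Ds = []
    for Theta, Delta in zip(Thetas, Deltas):
        D = Delta / m
        D[:, 1:] += lam / m * Theta[:, 1:]
        Ds.append(D)
    return Ds
```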
3. Backpropagation intuition
Forward Propagation
In order to illustrate forward propagation, I'm going to draw this network a little differently, with the nodes drawn as very fat ellipses so that I can write text inside them. When performing forward propagation, we might have some particular example, say $(x^{(i)}, y^{(i)})$, and it is this $x^{(i)}$ that we feed into the input layer.
When we forward propagate to the first hidden layer, what we do is compute $z_1^{(2)}$ and $z_2^{(2)}$, the weighted sums of the inputs from the input units, and then apply the sigmoid (logistic) activation function to those $z$ values. These are the activation values, which gives us $a_1^{(2)}$ and $a_2^{(2)}$. We then forward propagate again to get $z_1^{(3)}$, apply the activation function to get $a_1^{(3)}$, and continue similarly until we get $z_1^{(4)}$. Applying the activation function gives $a_1^{(4)}$, which is the final output value of the neural network. Looking more closely at how one of these values, $z_1^{(3)}$, is computed:

$$z_1^{(3)} = \Theta_{10}^{(2)} \times 1 + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)}$$
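For instance, with hypothetical values $\Theta_{10}^{(2)} = 0.1$, $\Theta_{11}^{(2)} = 0.5$, $\Theta_{12}^{(2)} = -0.3$ and activations $a_1^{(2)} = 0.8$, $a_2^{(2)} = 0.4$ (numbers chosen purely for illustration, not from the lecture):

$$z_1^{(3)} = 0.1 \times 1 + 0.5 \times 0.8 + (-0.3) \times 0.4 = 0.38, \qquad a_1^{(3)} = g(0.38) \approx 0.594$$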
What is backpropagation doing?
Focusing on a single example $(x^{(i)}, y^{(i)})$, the case of one output unit ($K = 1$), and ignoring regularization ($\lambda = 0$), the cost of example $i$ can be written as

$$\text{cost}(i) = -\left[ y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\Theta(x^{(i)})\right) \right]$$

so that $J(\Theta) = \frac{1}{m}\sum_{i=1}^{m} \text{cost}(i)$. You can think of $\text{cost}(i) \approx (h_\Theta(x^{(i)}) - y^{(i)})^2$, i.e. a measure of how well the network is doing on example $i$.
More formally, what the $\delta$ terms actually are is this: they are the partial derivatives of the cost with respect to $z_j^{(l)}$, the weighted sums of inputs that we are computing:

$$\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(i)$$

Concretely, the cost function is a function of the label $y$ and of $h_\Theta(x)$, the output value of the neural network. If we could go inside the neural network and change those $z_j^{(l)}$ values a little bit, that would affect the values the network outputs, and that would end up changing the cost function.
Note that we don't compute $\delta$ terms for the bias units.
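To make this interpretation concrete, one can verify numerically that $\delta_j^{(2)}$ matches the partial derivative of $\text{cost}(i)$ with respect to $z_j^{(2)}$. A minimal sketch, assuming a hypothetical tiny 3-layer network and reusing the `sigmoid` and `forward_propagate` helpers above (all names here are illustrative):

```python
import numpy as np
# Reuses sigmoid and forward_propagate from the sketches above.

def example_cost_perturbed(Thetas, x, y, j, eps):
    """Single-example cost with z_j^(2) nudged by eps (3-layer network)."""
    a1 = np.concatenate([[1.0], x])
    z2 = Thetas[0] @ a1
    z2[j] += eps                                   # perturb z_j^(2)
    a2 = np.concatenate([[1.0], sigmoid(z2)])
    h = sigmoid(Thetas[1] @ a2)
    return float(-(y * np.log(h) + (1 - y) * np.log(1 - h)).sum())

# Hypothetical tiny network: 2 inputs, 2 hidden units, 1 output
rng = np.random.default_rng(0)
Thetas = [rng.normal(size=(2, 3)), rng.normal(size=(1, 3))]
x, y = np.array([0.5, -1.0]), np.array([1.0])

# Analytic deltas: delta^(3) = a^(3) - y, then propagate back to layer 2
_, activations = forward_propagate(Thetas, x)
delta3 = activations[-1] - y
a2 = activations[1]
delta2 = (Thetas[1].T @ delta3)[1:] * a2 * (1 - a2)

eps = 1e-6
for j in range(2):
    numeric = (example_cost_perturbed(Thetas, x, y, j, eps)
               - example_cost_perturbed(Thetas, x, y, j, -eps)) / (2 * eps)
    print(j, numeric, delta2[j])    # the two values should agree closely
```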