Improving the way neural networks learn
Why does sigmoid + quadratic cost learn slowly?
The quadratic cost function is given by

$$C = \frac{(y-a)^2}{2},$$

where $a$ is the neuron's output when the training input $x = 1$ is used, and $y = 0$ is the corresponding desired output. Writing $a = \sigma(z)$ with $z = wx + b$, the chain rule gives

$$\frac{\partial C}{\partial w} = (a-y)\,\sigma'(z)\,x = a\,\sigma'(z), \qquad \frac{\partial C}{\partial b} = (a-y)\,\sigma'(z) = a\,\sigma'(z),$$

where I have substituted $x = 1$ and $y = 0$. Recall the shape of the $\sigma$ function: when the neuron's output is close to 1, the curve gets very flat, and so $\sigma'(z)$ gets very small. The two equations above then tell us that $\partial C/\partial w$ and $\partial C/\partial b$ get very small. This is the origin of the learning slowdown.
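To make the slowdown concrete, here is a minimal NumPy sketch (my own illustration, not from the original chapter) that evaluates $\partial C/\partial w = a\,\sigma'(z)$ at an unsaturated and a saturated starting point; the starting values $(w, b) = (0.6, 0.9)$ and $(2.0, 2.0)$ are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Single sigmoid neuron, training input x = 1, desired output y = 0.
x, y = 1.0, 0.0

for w, b in [(0.6, 0.9), (2.0, 2.0)]:  # second start is badly saturated
    z = w * x + b
    a = sigmoid(z)
    # Quadratic cost C = (y - a)^2 / 2, so dC/dw = (a - y) * sigma'(z) * x
    grad_w = (a - y) * sigmoid_prime(z) * x
    print(f"w={w}, b={b}: a = {a:.3f}, dC/dw = {grad_w:.4f}")
```

At the saturated start the output is badly wrong ($a \approx 0.98$ versus $y = 0$), yet the gradient is roughly seven times smaller than at the unsaturated start, so learning crawls exactly when it should be fastest.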
Using the quadratic cost when we have linear neurons in the output layer: suppose that we have a many-layer multi-neuron network, and that all the neurons in the final layer are linear neurons, meaning the sigmoid activation function is not applied, so the outputs are simply $a^L_j = z^L_j$. Show that if we use the quadratic cost function then the output error $\delta^L$ for a single training example $x$ is given by

$$\delta^L = a^L - y.$$

Similarly to the previous problem, use this expression to show that the partial derivatives with respect to the weights and biases in the output layer are given by

$$\frac{\partial C}{\partial w^L_{jk}} = \frac{1}{n}\sum_x a^{L-1}_k (a^L_j - y_j), \qquad \frac{\partial C}{\partial b^L_j} = \frac{1}{n}\sum_x (a^L_j - y_j).$$
This shows that if the output neurons are linear neurons then the quadratic cost will not give rise to any problems with a learning slowdown. In this case the quadratic cost is, in fact, an appropriate cost function to use.
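The claim is easy to check numerically. Below is a small sketch (mine, not the post's) comparing the analytic error $\delta^L = a^L - y$ against a finite-difference estimate of $\partial C/\partial b$ for a single linear output neuron; the random inputs and parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0      # one training example with 3 inputs
w, b = rng.normal(size=3), 0.5      # one linear output neuron

def cost(w, b):
    a = w @ x + b                   # linear neuron: a^L = z^L, no sigmoid
    return 0.5 * (a - y) ** 2       # quadratic cost for a single example

delta = (w @ x + b) - y             # claimed output error: delta^L = a^L - y

eps = 1e-6                          # central finite difference for dC/db
num_db = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)
print(np.isclose(delta, num_db))    # True: dC/db equals delta^L
```

No $\sigma'(z)$ factor appears anywhere, which is why there is no slowdown.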
Sigmoid + cross-entropy cost function
The cross-entropy cost function is defined by

$$C = -\frac{1}{n}\sum_x \left[\, y \ln a + (1-y)\ln(1-a) \,\right],$$

where $n$ is the total number of items of training data, the sum is over all training inputs $x$, and $y$ is the corresponding desired output.
The partial derivative of the cross-entropy cost with respect to the weights: we substitute $a = \sigma(z)$ into the cost and apply the chain rule twice, obtaining

$$\frac{\partial C}{\partial w_j} = -\frac{1}{n}\sum_x \left( \frac{y}{\sigma(z)} - \frac{1-y}{1-\sigma(z)} \right) \sigma'(z)\, x_j.$$

Putting everything over a common denominator and simplifying, this becomes

$$\frac{\partial C}{\partial w_j} = \frac{1}{n}\sum_x \frac{\sigma'(z)\, x_j}{\sigma(z)(1-\sigma(z))} \left( \sigma(z) - y \right).$$

Using the definition of the sigmoid function, $\sigma(z) = 1/(1+e^{-z})$, a little algebra shows that $\sigma'(z) = \sigma(z)(1-\sigma(z))$. We see that the $\sigma'(z)$ and $\sigma(z)(1-\sigma(z))$ terms cancel, and the expression simplifies to

$$\frac{\partial C}{\partial w_j} = \frac{1}{n}\sum_x x_j \left( \sigma(z) - y \right).$$

This is a beautiful expression. It tells us that the rate at which the weight learns is controlled by $\sigma(z) - y$, i.e., by the error in the output: the larger the error, the faster the neuron learns. In a similar way, we can compute the partial derivative for the bias,

$$\frac{\partial C}{\partial b} = \frac{1}{n}\sum_x \left( \sigma(z) - y \right).$$
It's easy to generalize the cross-entropy to many-neuron multi-layer networks. In particular, suppose $y = y_1, y_2, \ldots$ are the desired values at the output neurons, while $a^L_1, a^L_2, \ldots$ are the actual output values; then the cross-entropy is defined by

$$C = -\frac{1}{n}\sum_x \sum_j \left[\, y_j \ln a^L_j + (1-y_j) \ln(1-a^L_j) \,\right].$$
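To see the difference between the two costs side by side, here is a short sketch (my own, reusing the saturated starting point from earlier) comparing the quadratic-cost gradient $(a-y)\,\sigma'(z)\,x$ with the cross-entropy gradient $x(\sigma(z)-y)$ for a single neuron:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 1.0, 0.0
w, b = 2.0, 2.0                          # saturated start: a = sigmoid(4) ~ 0.98
a = sigmoid(w * x + b)

quad_grad = (a - y) * a * (1 - a) * x    # quadratic cost keeps a sigma'(z) factor
xent_grad = (a - y) * x                  # cross-entropy: sigma'(z) has cancelled
print(f"quadratic: {quad_grad:.4f}, cross-entropy: {xent_grad:.4f}")
# quadratic: 0.0173, cross-entropy: 0.9820
```

The cross-entropy gradient is driven directly by the size of the error, so the badly wrong neuron learns quickly instead of stalling.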
Softmax + log-likelihood cost
In a softmax layer we apply the so-called softmax function to the weighted inputs $z^L_j$:

$$a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}},$$

where in the denominator we sum over all the output neurons. The activations are all positive and sum to 1, so the output of a softmax layer can be thought of as a probability distribution.
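A minimal implementation (an illustrative sketch; the max-subtraction trick for numerical stability is standard practice, though not discussed in the text):

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow in exp and leaves the result unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([3.0, 1.0, 0.2])
a = softmax(z)
print(a, a.sum())   # all activations are positive and they sum to 1
```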
The log-likelihood cost for a training input $x$ with desired class $y$ is

$$C = -\ln a^L_y.$$

The partial derivatives with respect to the biases and weights in the output layer are

$$\frac{\partial C}{\partial b^L_j} = a^L_j - y_j, \qquad \frac{\partial C}{\partial w^L_{jk}} = a^{L-1}_k \left( a^L_j - y_j \right).$$
These expressions ensure that we will not encounter a learning slowdown. In fact, it’s useful to think of a softmax output layer with log-likelihood cost as being quite similar to a sigmoid output layer with cross-entropy cost.
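As a quick check of the log-likelihood derivative (my own sketch, with arbitrary example values): since $\partial C/\partial b^L_j = \partial C/\partial z^L_j$, the code below verifies $\partial C/\partial z^L_j = a^L_j - y_j$ by finite differences for a one-hot target:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def log_likelihood_cost(z, y_idx):
    return -np.log(softmax(z)[y_idx])   # C = -ln a^L_y for correct class y

z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])           # one-hot target, correct class is 1

delta = softmax(z) - y                  # claimed output error a^L - y

eps = 1e-6                              # central differences for dC/dz_j
numeric = np.array([
    (log_likelihood_cost(z + eps * np.eye(3)[j], 1)
     - log_likelihood_cost(z - eps * np.eye(3)[j], 1)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(delta, numeric))      # True
```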
Given this similarity, should you use a sigmoid output layer and cross-entropy, or a softmax output layer and log-likelihood? In fact, in many situations both approaches work well. As a more general point of principle, softmax plus log-likelihood is worth using whenever you want to interpret the output activations as probabilities. That’s not always a concern, but can be useful with classification problems (like MNIST) involving disjoint classes.
Overfitting
In general, one of the best ways of reducing overfitting is to increase the size of the training data. With enough training data it is difficult for even a very large network to overfit.