Improving Deep Neural Networks Study Notes (Part 2)
Author: Tyan
Blog: noahsnail.com | CSDN | 简书
4. Optimization algorithms
4.1 Mini-batch gradient descent
Batch gradient descent processes the entire training set at the same time. Mini-batch gradient descent processes a single mini-batch at a time.
Running forward propagation and back propagation once on a mini-batch is called one iteration.
Mini-batch gradient descent runs much faster than batch gradient descent.
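As a concrete illustration, here is a minimal sketch of the idea in NumPy, using a toy logistic-regression model (not from the course notes) so the loop is self-contained: shuffle the training set, slice it into mini-batches, and take one gradient step per mini-batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mini_batch_gd(X, Y, learning_rate=0.1, batch_size=64, epochs=10):
    """Mini-batch gradient descent on a toy logistic-regression model.
    X: (n_x, m) inputs, Y: (1, m) binary labels."""
    n_x, m = X.shape
    w, b = np.zeros((n_x, 1)), 0.0
    for epoch in range(epochs):
        perm = np.random.permutation(m)                  # reshuffle every epoch
        X_s, Y_s = X[:, perm], Y[:, perm]
        for t in range(0, m, batch_size):                # one iteration per mini-batch
            Xb, Yb = X_s[:, t:t + batch_size], Y_s[:, t:t + batch_size]
            mb = Xb.shape[1]
            A = sigmoid(w.T @ Xb + b)                    # forward propagation
            dZ = A - Yb                                  # back propagation (cross-entropy loss)
            dw, db = (Xb @ dZ.T) / mb, dZ.sum() / mb
            w -= learning_rate * dw                      # gradient step on this mini-batch
            b -= learning_rate * db
    return w, b
```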
4.2 Understanding mini-batch gradient descent
If mini-batch size = m, it's batch gradient descent.
If mini-batch size = 1, it's stochastic gradient descent.
In practice, the mini-batch size is somewhere between 1 and m.
Batch gradient descent: each iteration takes too long.
Stochastic gradient descent: loses the speed-up from vectorization.
Mini-batch gradient descent: fastest learning, because 1. it keeps the speed-up from vectorization, and 2. it makes progress without waiting to process the entire training set.
Choosing mini-batch size:
If the training set is small (m ≤ 2000), use batch gradient descent.
Typical mini-batch sizes: 64, 128, 256, 512, 1024 (rare).
4.3 Exponentially weighted averages
The exponentially weighted average of a sequence θ_t (e.g. the daily temperature) is v_t = β·v_(t−1) + (1−β)·θ_t, which can be thought of as approximately averaging over the last 1/(1−β) values (β = 0.9 averages over roughly the last 10 days).
It's called a moving average in the statistics literature.
4.4 Understanding exponentially weighted averages
Expanding the recursion for β = 0.9 gives, for example,
v_100 = 0.1·θ_100 + 0.1×0.9·θ_99 + 0.1×0.9²·θ_98 + 0.1×0.9³·θ_97 + …
The coefficients are 0.1, 0.1×0.9, 0.1×0.9², …, i.e. they decay exponentially.
All of these coefficients add up to one, or very close to one, up to a detail called bias correction (see 4.5).
Implement exponentially weighted average:
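A minimal sketch of this loop in Python, assuming theta is a list (or array) of observations such as daily temperatures:

```python
def exp_weighted_average(theta, beta=0.9):
    """Exponentially weighted average of a sequence theta.
    Only the single running value v is kept in memory."""
    v = 0.0
    averages = []
    for theta_t in theta:
        v = beta * v + (1 - beta) * theta_t   # v_t = beta*v_(t-1) + (1-beta)*theta_t
        averages.append(v)
    return averages
```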
The exponentially weighted average takes very little memory, since only the single running value v is stored.
4.5 Bias correction in exponentially weighted averages
It's not a very good estimate of the temperature during the first several days, because v is initialized to 0. Bias correction is used to modify this estimate and make it much better. The formula is: v_t_corrected = v_t / (1 − β^t).
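A small sketch of the corrected estimate, reusing the loop above (t counts iterations starting from 1):

```python
def exp_weighted_average_corrected(theta, beta=0.9):
    """Exponentially weighted average with bias correction v_t / (1 - beta**t)."""
    v = 0.0
    corrected = []
    for t, theta_t in enumerate(theta, start=1):
        v = beta * v + (1 - beta) * theta_t
        corrected.append(v / (1 - beta ** t))   # removes the bias toward zero in early iterations
    return corrected
```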
4.6 Gradient descent with momentum
Gradient descent with momentum almost always works faster than the standard gradient descent algorithm. The basic idea is to compute an exponentially weighted average of gradients, and then use that gradient to update weights instead.
On iteration t:
- Compute dw, db on the current mini-batch.
- Compute v_dw = β·v_dw + (1−β)·dw and v_db = β·v_db + (1−β)·db.
- Update the parameters: w = w − α·v_dw, b = b − α·v_db.
There are two hyperparameters, the learning rate α and β; the most common value for β is 0.9 (roughly averaging over the last 10 gradients).
Another formulation omits the (1−β) term, i.e. v_dw = β·v_dw + dw; it also works, but then α has to be re-tuned whenever β changes, so the first version is preferred.
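A minimal per-layer sketch of one momentum update (the caller keeps v_dw and v_db across iterations, initialized to zeros of the same shape as the gradients):

```python
def momentum_update(w, b, dw, db, v_dw, v_db, learning_rate=0.01, beta=0.9):
    """One gradient-descent-with-momentum step for a single layer."""
    v_dw = beta * v_dw + (1 - beta) * dw   # exponentially weighted average of dw
    v_db = beta * v_db + (1 - beta) * db   # exponentially weighted average of db
    w = w - learning_rate * v_dw           # update with the smoothed gradient
    b = b - learning_rate * v_db
    return w, b, v_dw, v_db
```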
4.7 RMSprop
RMSprop stands for root mean square prop, and it can also speed up gradient descent.
On iteration t:
- Compute dw, db on the current mini-batch.
- Compute s_dw = β·s_dw + (1−β)·dw² and s_db = β·s_db + (1−β)·db² (the squares are element-wise).
- Update the parameters: w = w − α·dw/√s_dw, b = b − α·db/√s_db.
In practice, in order to avoid dividing by a number very close to zero, a small ε is added to the denominator: w = w − α·dw/(√s_dw + ε).
Usually ε = 10⁻⁸.
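A minimal per-layer sketch of one RMSprop update, with the ε term included in the denominator as described above:

```python
import numpy as np

def rmsprop_update(w, b, dw, db, s_dw, s_db,
                   learning_rate=0.001, beta=0.9, epsilon=1e-8):
    """One RMSprop step for a single layer; s_dw, s_db start at zero."""
    s_dw = beta * s_dw + (1 - beta) * dw ** 2          # running average of squared gradients
    s_db = beta * s_db + (1 - beta) * db ** 2
    w = w - learning_rate * dw / (np.sqrt(s_dw) + epsilon)
    b = b - learning_rate * db / (np.sqrt(s_db) + epsilon)
    return w, b, s_dw, s_db
```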
4.8 Adam optimization algorithm
On iteration t:
- Compute dw, db on the current mini-batch.
- Momentum part: v_dw = β₁·v_dw + (1−β₁)·dw, v_db = β₁·v_db + (1−β₁)·db.
- RMSprop part: s_dw = β₂·s_dw + (1−β₂)·dw², s_db = β₂·s_db + (1−β₂)·db².
Bias correction:
- v_dw_corrected = v_dw/(1−β₁^t), v_db_corrected = v_db/(1−β₁^t).
- s_dw_corrected = s_dw/(1−β₂^t), s_db_corrected = s_db/(1−β₂^t).
Update weights:
- w = w − α·v_dw_corrected/(√s_dw_corrected + ε), b = b − α·v_db_corrected/(√s_db_corrected + ε).
Adam combines the effect of gradient descent with momentum together with RMSprop. It's a commonly used learning algorithm that has proven very effective for many different neural networks with a wide variety of architectures.
Adam stands for Adaptive Moment Estimation.
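A minimal sketch of one Adam step for a single parameter array, using the common default hyperparameters β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸ (t is the iteration count, starting at 1):

```python
import numpy as np

def adam_update(w, dw, v_dw, s_dw, t,
                learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step for parameter array w; v_dw, s_dw start at zero."""
    v_dw = beta1 * v_dw + (1 - beta1) * dw                # momentum term
    s_dw = beta2 * s_dw + (1 - beta2) * dw ** 2           # RMSprop term
    v_corr = v_dw / (1 - beta1 ** t)                      # bias correction
    s_corr = s_dw / (1 - beta2 ** t)
    w = w - learning_rate * v_corr / (np.sqrt(s_corr) + epsilon)
    return w, v_dw, s_dw
```

In practice β₁ and β₂ are rarely tuned; the learning rate α is the hyperparameter that usually needs searching.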
4.9 Learning rate decay
Learning rate decay means slowly reducing the learning rate over time. One common formula is α = α₀ / (1 + decay_rate · epoch_num).
Other learning rate decay methods: exponential decay (α = 0.95^epoch_num · α₀), α = (k/√epoch_num)·α₀, a discrete staircase, and manual decay.
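A tiny sketch of the basic 1/(1 + decay_rate·epoch_num) formula, with a worked example (α₀ = 0.2, decay_rate = 1):

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    """alpha = alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)

# Example with alpha0 = 0.2, decay_rate = 1:
# epoch 1 -> 0.1, epoch 2 -> 0.0667, epoch 3 -> 0.05, ...
```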
4.10 The problem of local optima
In very high-dimensional spaces you're actually much more likely to run into a saddle point than a local optimum.
- Unlikely to get stuck in a bad local optimum.
- Plateaus can make learning slow.