Improving Deep Neural Networks: Study Notes (1)
Author: Tyan
Blog: noahsnail.com | CSDN | 简书
1. Setting up your Machine Learning Application
1.1 Train/Dev/Test sets
Make sure that the dev and test sets come from the same distribution.
Not having a test set might be okay (only a dev set).
Setting up train, dev, and test sets will allow you to iterate more quickly. It will also allow you to measure the bias and variance of your algorithm more efficiently, so you can select ways to improve it more efficiently.
1.2 Bias/Variance
High bias: underfitting.
High variance: overfitting.
Assumption: human error is about 0% (the optimal/Bayes error), and the train and dev sets are drawn from the same distribution. Under this assumption, for example, 1% train error with 11% dev error signals high variance, while 15% train error with 16% dev error signals high bias.
1.3 Basic Recipe for Machine Learning
High bias -> bigger network, train longer, advanced optimization algorithms, try a different network architecture.
High variance -> more data, try regularization, find a more appropriate neural network architecture.
2. Regularizing your neural network
2.1 Regularization
In logistic regression, the regularized cost is

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \|w\|_2^2$$

This is called L2 regularization. Using

$$\frac{\lambda}{2m} \|w\|_1 = \frac{\lambda}{2m} \sum_{j=1}^{n_x} |w_j|$$

instead is called L1 regularization, and $w$ will end up being sparse. In a neural network, the formula is

$$J(W^{[1]}, b^{[1]}, \ldots, W^{[L]}, b^{[L]}) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l=1}^{L} \|W^{[l]}\|_F^2$$

where $\|W^{[l]}\|_F^2 = \sum_i \sum_j \big(w_{ij}^{[l]}\big)^2$. This matrix norm is called the Frobenius norm, denoted with an $F$ in the subscript. L2 norm regularization is also called weight decay, because the gradient descent update becomes

$$W^{[l]} := W^{[l]} - \alpha \left( (\text{from backprop}) + \frac{\lambda}{m} W^{[l]} \right) = \left(1 - \frac{\alpha \lambda}{m}\right) W^{[l]} - \alpha \, (\text{from backprop})$$

so every update first shrinks the weights by a factor slightly less than 1.
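As a concrete illustration, here is a minimal NumPy sketch of the Frobenius-norm penalty and the weight-decay update for a single layer (the function and variable names are illustrative, not from the original notes):

```python
import numpy as np

def l2_cost_term(Ws, lambd, m):
    """Frobenius-norm penalty: (lambda / 2m) * sum over layers of ||W[l]||_F^2."""
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in Ws)

def update_with_weight_decay(W, dW_backprop, lambd, m, alpha):
    """One gradient descent step with L2 regularization.

    dW = dW_backprop + (lambda / m) * W, so the step is equivalent to
    W := (1 - alpha * lambda / m) * W - alpha * dW_backprop.
    """
    dW = dW_backprop + (lambd / m) * W
    return W - alpha * dW
```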
2.2 Why regularization reduces overfitting?
If $W^{[l]}$ is set to be reasonably close to zero, it will zero out the impact of many hidden units, and the much simplified neural network becomes a much smaller neural network. Pushed too far this takes you from overfitting to underfitting, but there is a just-right case in the middle.
2.3 Dropout regularization
Dropout goes through each layer of the network and sets some probability of eliminating each node. By far the most common implementation of dropout today is inverted dropout.
Inverted dropout, where keep-prob is the probability of keeping a unit, works as in the sketch below.
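A minimal NumPy sketch of inverted dropout applied to a layer-3 activation, following the course's `a3`/`d3`/`keep_prob` naming (the stand-in activation values are illustrative):

```python
import numpy as np

a3 = np.random.randn(4, 5)  # stand-in activations for layer 3
keep_prob = 0.8             # probability of keeping a unit

d3 = np.random.rand(*a3.shape) < keep_prob  # random mask, True with prob. keep_prob
a3 = np.multiply(a3, d3)                    # zero out the dropped units
a3 /= keep_prob                             # scale up so E[a3] is unchanged (the "inverted" step)
```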
At test time, we don't use dropout or divide by keep-prob; the scaling during training already keeps the expected value of the activations unchanged.
2.4 Understanding dropout
Why does dropout work? Intuition: a unit can't rely on any one feature, because any input may be dropped, so it has to spread out its weights. Spreading out the weights tends to shrink their squared norm, similar in effect to L2 regularization.
2.5 Other regularization methods
- Data augmentation (e.g., random flips, crops, and distortions of training images).
- Early stopping (stop training when the dev set error starts to increase).
3. Setting up your optimization problem
3.1 Normalizing inputs
Normalizing inputs can speed up training. It corresponds to two steps. The first is to subtract out the mean, zero-centering the data: $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$, then $x := x - \mu$. The second is to normalize the variances: $\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)})^2$ element-wise (after centering), then $x := x / \sigma$. Use the same $\mu$ and $\sigma$ to normalize the test set.
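A minimal sketch of both steps in NumPy, assuming rows are training examples (the helper name `normalize_inputs` is illustrative):

```python
import numpy as np

def normalize_inputs(X_train, X_test, eps=1e-8):
    """Zero-center and variance-normalize; reuse the training statistics on test data."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    X_train_norm = (X_train - mu) / (sigma + eps)
    X_test_norm = (X_test - mu) / (sigma + eps)  # same mu/sigma as training
    return X_train_norm, X_test_norm
```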
3.2 Vanishing/Exploding gradients
If the network is very deep, it can suffer from vanishing or exploding gradients: with weight matrices even slightly smaller or larger than the identity, activations and gradients shrink or grow exponentially with the number of layers.
3.3 Weight initialization for deep networks
If the activation function is ReLU, a common choice (He initialization) sets the variance of the weights to $2 / n^{[l-1]}$:

$$W^{[l]} = \text{np.random.randn(shape)} \times \sqrt{\frac{2}{n^{[l-1]}}}$$

For tanh, $\sqrt{1 / n^{[l-1]}}$ works better (Xavier initialization). Another formula is $\sqrt{2 / (n^{[l-1]} + n^{[l]})}$.
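A minimal NumPy sketch of these rules (the helper name and layer-size arguments are illustrative):

```python
import numpy as np

def initialize_layer(n_prev, n, activation="relu"):
    """He initialization for ReLU layers, Xavier for tanh layers."""
    if activation == "relu":
        scale = np.sqrt(2.0 / n_prev)  # He initialization
    else:
        scale = np.sqrt(1.0 / n_prev)  # Xavier initialization
    W = np.random.randn(n, n_prev) * scale
    b = np.zeros((n, 1))
    return W, b
```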
3.4 Numerical approximation of gradients
In order to build up to gradient checking, you need to numerically approximate computations of gradients. The two-sided difference

$$f'(\theta) \approx \frac{f(\theta + \varepsilon) - f(\theta - \varepsilon)}{2\varepsilon}$$

has approximation error $O(\varepsilon^2)$, whereas the one-sided difference $\frac{f(\theta + \varepsilon) - f(\theta)}{\varepsilon}$ has error $O(\varepsilon)$, so the two-sided version is the one to use.
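For instance, with $f(\theta) = \theta^3$ at $\theta = 1$ and $\varepsilon = 0.01$, the two-sided estimate is 3.0001 against the true derivative 3, while the one-sided estimate is 3.0301:

```python
def f(theta):
    return theta ** 3

theta, eps = 1.0, 0.01
two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # 3.0001, error O(eps^2)
one_sided = (f(theta + eps) - f(theta)) / eps              # 3.0301, error O(eps)
print(two_sided, one_sided)  # true derivative f'(1) = 3
```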
3.5 Gradient checking
Take the matrices $W^{[1]}, \ldots, W^{[L]}$ and vectors $b^{[1]}, \ldots, b^{[L]}$, reshape them into vectors, and concatenate them into a giant vector $\theta$; do the same with $dW^{[l]}$ and $db^{[l]}$ to get $d\theta$. For each $i$, compute

$$d\theta_{\text{approx}}[i] = \frac{J(\theta_1, \ldots, \theta_i + \varepsilon, \ldots) - J(\theta_1, \ldots, \theta_i - \varepsilon, \ldots)}{2\varepsilon}$$

Then check the ratio

$$\frac{\|d\theta_{\text{approx}} - d\theta\|_2}{\|d\theta_{\text{approx}}\|_2 + \|d\theta\|_2}$$

If it is around $10^{-7}$, great; around $10^{-5}$, take a careful look; around $10^{-3}$, worry that there may be a bug.
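A minimal sketch of this check, assuming a cost function `J` that takes the flattened parameter vector and a backprop gradient `dtheta` (both names are illustrative):

```python
import numpy as np

def gradient_check(J, theta, dtheta, eps=1e-7):
    """Compare the backprop gradient dtheta against a two-sided numerical estimate."""
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        dtheta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    num = np.linalg.norm(dtheta_approx - dtheta)
    denom = np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)
    return num / denom  # ~1e-7 great, ~1e-5 careful look, ~1e-3 worry
```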
3.6 Gradient checking implementation notes
- Don't use gradient checking in training; use it only to debug.
- If the algorithm fails the gradient check, look at individual components to try to identify the bug.
- Remember regularization: include the regularization term when computing $d\theta$.
- Gradient checking doesn't work with dropout; turn dropout off (keep-prob = 1.0) while checking.
- Run at random initialization; perhaps run again after some training.