DL-1: Tips for Training Deep Neural Network
Different problems call for different approaches; e.g., dropout is a method for improving results on the testing data, not the training data.
Choosing a proper loss
- Square error
- Cross entropy: with a softmax output layer, cross entropy usually works better than square error, because square error yields very small gradients when the output is still far from the target.
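As a minimal sketch of the two choices in Keras (assuming TensorFlow 2.x; the architecture and optimizer here are illustrative, not from the tutorial):

```python
from tensorflow import keras

# Illustrative model: 20 input features, 10-class softmax output
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(10, activation="softmax"),
])

# Square error:
# model.compile(optimizer="sgd", loss="mean_squared_error")

# Cross entropy, usually the better choice with a softmax output:
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```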
Mini-batch
We do not really minimize total loss!
batch_size: the number of training samples processed per update;
nb_epoch: the number of passes over the entire training set.
The total number of training samples seen stays the same either way.
Mini-batch is faster: smaller batches mean more parameter updates per epoch. This is not always true with parallel computing, though: a GPU processes a mini-batch of, say, 10 examples in roughly the same time as a single example, so batch size 1 gains nothing over batch size 10.
Mini-batch also tends to give better performance than full-batch training!
Shuffle the training examples before each epoch. This is the default in Keras.
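Continuing the hypothetical Keras sketch from above, mini-batch training is configured in `model.fit`; `x_train`/`y_train` below are random placeholder data:

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 1000 samples, 20 features, 10 classes
x_train = np.random.rand(1000, 20)
y_train = keras.utils.to_categorical(np.random.randint(10, size=1000), 10)

# batch_size: training samples per parameter update;
# epochs: passes over the whole training set (older Keras versions
# called this argument `nb_epoch`);
# shuffle=True reshuffles the examples every epoch (the default).
model.fit(x_train, y_train, batch_size=100, epochs=20, shuffle=True)
```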
New activation function
Vanishing Gradient Problem
- Layers near the input get smaller gradients, learn very slowly, and are still almost random when training stops.
- Layers near the output get larger gradients, learn very fast, and have already converged by then.
In 2006 the remedy was RBM pre-training; since 2015 it has been ReLU.
ReLU: Rectified Linear Unit
1. Fast to compute
2. Biological reason
3. Equivalent to an infinite number of sigmoids with different biases
4. Handles the vanishing gradient problem
With the inactive units removed, what remains is a thinner linear network, so gradients do not become smaller as they propagate through it.
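A minimal NumPy sketch of ReLU and its gradient, illustrating why active units pass gradients through at full strength:

```python
import numpy as np

def relu(z):
    # Positive inputs pass through unchanged; the rest become 0
    return np.maximum(0.0, z)

def relu_grad(z):
    # The gradient is 1 for active units and 0 for inactive ones,
    # so gradients flowing through active units never shrink
    return (z > 0).astype(float)
```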
Adaptive Learning Rate
Set the learning rate η carefully.
- If the learning rate is too large, the total loss may not decrease after each update.
- If the learning rate is too small, training will be too slow.
Solution:
- Popular & Simple Idea: Reduce the learning rate by some factor every few epochs.
- At the beginning, use larger learning rate
- After several epochs, reduce the learning rate, e.g. 1/t decay (sketched in code after this list):
  η_t = η / √(t + 1)
- Learning rate cannot be one-size-fits-all.
- Give different parameters different learning rates.
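A minimal sketch of the 1/t decay above, in plain Python (the function name is illustrative):

```python
import math

def lr_1_over_t(eta, t):
    # eta: initial learning rate; t: update (or epoch) counter.
    # Large steps early in training, smaller and smaller steps later.
    return eta / math.sqrt(t + 1)
```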
Adagrad:
Divide each parameter's learning rate by the root of the sum of the squares of its previous derivatives:
  w^{t+1} ← w^t − (η / √(Σ_{i=0}^{t} (g^i)²)) · g^t
Observations:
1. The effective learning rate gets smaller and smaller for every parameter.
2. Parameters with smaller past derivatives get a larger effective learning rate, and vice versa.
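A minimal NumPy sketch of the Adagrad update; the `eps` term is a standard numerical-stability addition, not part of the formula above:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.01, eps=1e-8):
    # Accumulate the squared derivatives seen so far, per parameter
    accum = accum + grad ** 2
    # Divide each parameter's step by the root of its accumulated
    # squared derivatives: a history of large derivatives means
    # smaller steps, and vice versa
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum
```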
- Adagrad [John Duchi, JMLR’11]
- RMSprop
https://www.youtube.com/watch?v=O3sxAc4hxZU
- Adadelta [Matthew D. Zeiler, arXiv’12]
- “No more pesky learning rates” [Tom Schaul, arXiv’12]
- AdaSecant [Caglar Gulcehre, arXiv’14]
- Adam [Diederik P. Kingma, ICLR'15]
- Nadam
http://cs229.stanford.edu/proj2015/054_report.pdf
Momentum
Each update adds a fraction of the previous update (the "movement"), which can carry the parameters across plateaus and out of shallow local minima.
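A minimal sketch of a momentum update (the 0.9 factor is a common but illustrative choice):

```python
def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # movement = beta * previous movement - lr * current gradient;
    # the accumulated velocity can carry updates across plateaus
    # and shallow local minima (works on scalars or NumPy arrays)
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity
```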
Overfitting
The learning target is defined by the training data.
Training data and testing data can be different.
Parameters that achieve the learning target therefore do not necessarily give good results on the testing data.
Panacea for Overfitting
- Have more training data
- Create more training data, e.g. by shifting or distorting existing examples (sketched below)
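A minimal NumPy sketch of creating more training data; `images` is a hypothetical (N, H, W) array of grayscale images:

```python
import numpy as np

def shift_right(images, pixels=2):
    # Make extra examples by shifting each image a few pixels to the
    # right, zero-padding on the left
    shifted = np.zeros_like(images)
    shifted[:, :, pixels:] = images[:, :, :-pixels]
    return shifted

def augment(images):
    # Original data plus shifted copies: more training data for free
    return np.concatenate([images, shift_right(images)], axis=0)
```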
Early Stopping
Stop training when performance on a validation set stops improving, even if the training loss is still decreasing. Keras provides this as the EarlyStopping callback.
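A hedged sketch using the TensorFlow 2.x Keras callback, reusing `model`, `x_train`, and `y_train` from the mini-batch sketch above:

```python
from tensorflow import keras

# Stop when validation loss has not improved for 3 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)

model.fit(x_train, y_train, validation_split=0.2,
          epochs=100, callbacks=[early_stop])
```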
Regularization
Weight decay is one kind of regularization: each weight is nudged toward zero at every update unless the data pushes back. Keras exposes this through its regularizers module.
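A minimal sketch of L2 weight decay on one layer (TensorFlow 2.x Keras; the 0.01 strength is illustrative):

```python
from tensorflow import keras
from tensorflow.keras import regularizers

# L2 penalty on this layer's weights, added to the training loss
layer = keras.layers.Dense(
    64,
    activation="relu",
    kernel_regularizer=regularizers.l2(0.01),
)
```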
Dropout
Training
- Each time before updating the parameters, each neuron has probability p% of being dropped out, which changes the structure of the network.
- Train using the resulting (thinner) network.
- For each mini-batch, resample which neurons are dropped out.
Testing
- **No dropout** at test time. If the dropout rate during training was p%, multiply all the weights by (1 − p)%.
- Example: assume the dropout rate is 50%. If a weight ends up at w = 1 after training, set w = 0.5 for testing.
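A minimal NumPy sketch of both phases; the dropout rate and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(activations, p=0.5):
    # Each neuron is dropped with probability p during training;
    # the mask is resampled for every mini-batch
    mask = rng.random(activations.shape) >= p
    return activations * mask

def dropout_test_weights(weights, p=0.5):
    # No dropout at test time; scale the trained weights by (1 - p)
    # so expected pre-activations match the training phase
    return weights * (1.0 - p)
```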
Dropout: Intuitive Reason
- On a team, if everyone expects their partners to do the work, nothing gets done in the end.
- However, if you know your partner might drop out, you will do the job better yourself.
- At testing time, no one actually drops out, so the results end up good.
Dropout is a kind of ensemble: each mini-batch trains one of many thinner networks that all share parameters, and scaling the weights by (1 − p) at test time approximates averaging the predictions of that ensemble.
Network Structure
CNN is a very good example!
References
- Deep Learning Tutorial, 李宏毅 (Hung-yi Lee)