Deep Learning: Optimization for Training Deep Models (Part 2)
Source: Internet · Editor: 程序博客网 · 2024/05/22 06:22
Challenges in Neural Network Optimization
When training neural networks, we must confront the general non-convex case. Even convex optimization is not without its complications. In this section, we summarize several of the most prominent challenges involved in optimization for training deep models.
Ill-Conditioning
Some challenges arise even when optimizing convex functions. Of these, the most prominent is ill-conditioning of the Hessian matrix H.
The ill-conditioning problem is generally believed to be present in neural network training problems. Ill-conditioning can manifest by causing SGD to get “stuck” in the sense that even very small steps increase the cost function.
A second-order Taylor series expansion of the cost function predicts that a gradient descent step of −εg will add

    ½ε²gᵀHg − εgᵀg

to the cost. Ill-conditioning of the gradient becomes a problem when ½ε²gᵀHg exceeds εgᵀg. To determine whether ill-conditioning is detrimental to a neural network training task, one can monitor the squared gradient norm gᵀg and the gᵀHg term. In many cases, the gradient norm does not shrink significantly throughout learning, but the gᵀHg term grows by more than an order of magnitude; learning then becomes very slow despite a strong gradient, because the learning rate must be shrunk to compensate for the even stronger curvature.
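This monitoring can be illustrated on a toy quadratic cost, where the second-order Taylor expansion is exact. In this minimal NumPy sketch (the Hessian H, starting point, and step size ε are made-up illustrative values), the curvature term ½ε²gᵀHg dominates the −εgᵀg term, so each "descent" step actually increases the cost:

```python
import numpy as np

# Toy quadratic cost f(x) = 0.5 * x^T H x with an ill-conditioned Hessian.
# H, x, and eps are illustrative values chosen so curvature dominates.
H = np.diag([1.0, 1000.0])   # condition number 1000
x = np.array([1.0, 1.0])
eps = 0.01                   # step size (learning rate)

def cost(x):
    return 0.5 * x @ H @ x

for step in range(3):
    g = H @ x  # gradient of the quadratic cost
    # Taylor-predicted change in cost for the step x <- x - eps * g
    predicted = 0.5 * eps**2 * (g @ H @ g) - eps * (g @ g)
    actual = cost(x - eps * g) - cost(x)
    print(f"g^T g = {g @ g:.3e}, g^T H g = {g @ H @ g:.3e}, "
          f"predicted dcost = {predicted:.3e}, actual dcost = {actual:.3e}")
    x = x - eps * g
```

Running this shows gᵀHg growing far faster than gᵀg, and the predicted change in cost staying positive: exactly the "stuck" behavior described above, where even small steps increase the cost.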
Though ill-conditioning is present in other settings besides neural network training, some of the techniques used to combat it in other contexts are less applicable to neural networks. For example, Newton’s method is an excellent tool for minimizing convex functions with poorly conditioned Hessian matrices, but in the subsequent sections we will argue that Newton’s method requires significant modification before it can be applied to neural networks.
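To see why Newton's method is attractive in the convex case, note that for a quadratic cost it jumps to the exact minimum in a single step, regardless of how poorly conditioned H is. A small sketch on the same kind of made-up ill-conditioned quadratic:

```python
import numpy as np

# Same toy quadratic f(x) = 0.5 * x^T H x; its minimizer is x = 0.
H = np.diag([1.0, 1000.0])   # ill-conditioned Hessian (illustrative)
x = np.array([1.0, 1.0])
g = H @ x                    # gradient at x

# Newton step: solve H d = g and move to x - d.
# For a quadratic this lands exactly on the minimizer in one step.
x_newton = x - np.linalg.solve(H, g)
```

Gradient descent on the same problem, as shown earlier, struggles at any fixed step size; the Newton step is invariant to the conditioning of H. The sections that follow discuss why this does not transfer directly to neural networks without modification.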
Local Minima
With non-convex functions, such as neural nets, it is possible to have many local minima. Indeed, nearly any deep model is essentially guaranteed to have an extremely large number of local minima.
Neural networks and any models with multiple equivalently parametrized latent variables all have multiple local minima because of the model identifiability problem.
A model is said to be identifiable if a sufficiently large training set can rule out all but one setting of the model’s parameters. Models with latent variables are often not identifiable because we can obtain equivalent models by exchanging latent variables with each other.
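The simplest form of this is weight-space symmetry: permuting the hidden units of a layer (and permuting the corresponding incoming and outgoing weights) yields a different parameter setting that computes exactly the same function. A minimal sketch with a one-hidden-layer network (all shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # input -> hidden weights
b1 = rng.standard_normal(4)        # hidden biases
w2 = rng.standard_normal(4)        # hidden -> output weights

def forward(x, W1, b1, w2):
    h = np.tanh(W1 @ x + b1)
    return w2 @ h

# Permute the hidden units: rows of W1, entries of b1 and w2 move together.
perm = np.array([2, 0, 3, 1])
x = rng.standard_normal(3)
y_original = forward(x, W1, b1, w2)
y_permuted = forward(x, W1[perm], b1[perm], w2[perm])
# The two distinct parameter settings compute identical outputs,
# so no training set can distinguish them: the model is not identifiable.
```

With n hidden units per layer there are n! such equivalent permutations per layer, so even this symmetry alone produces an exponentially large number of equivalent local minima.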