Deep Learning: Regularization (5)
Noise Robustness
The earlier discussion of dataset augmentation motivated applying noise to the inputs as an augmentation strategy. For some models, the addition of noise with infinitesimal variance at the input of the model is equivalent to imposing a penalty on the norm of the weights.
- In the general case, it is important to remember that noise injection can be much more powerful than simply shrinking the parameters, especially when the noise is added to the hidden units.
- Another way that noise has been used in the service of regularizing models is by adding it to the weights. This technique has been used primarily in the context of recurrent neural networks (Jim et al., 1996; Graves, 2011). It can be interpreted as a stochastic implementation of Bayesian inference over the weights: the Bayesian treatment of learning considers the model weights to be uncertain, representable via a probability distribution that reflects this uncertainty.
- It can also be interpreted as equivalent (under some assumptions) to a more traditional form of regularization. Adding noise to the weights has been shown to be an effective regularization strategy in the context of recurrent neural networks.
We study the regression setting, where we wish to train a function $\hat{y}(x)$ that maps a set of features $x$ to a scalar, using the least-squares cost between the model prediction and the true value $y$:
$$J = \mathbb{E}_{p(x,y)}\left[(\hat{y}(x) - y)^2\right].$$
The training set consists of m labeled examples $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$.
We now assume that with each input presentation we also include a random perturbation $\epsilon_W \sim \mathcal{N}(\epsilon; 0, \eta I)$ of the network weights, and denote the perturbed model as $\hat{y}_{\epsilon_W}(x)$. Despite the injection of noise, we are still interested in minimizing the squared error of the output of the network. The objective function thus becomes:
$$\tilde{J}_W = \mathbb{E}_{p(x,y,\epsilon_W)}\left[(\hat{y}_{\epsilon_W}(x) - y)^2\right].$$
For small $\eta$, the minimization of $\tilde{J}_W$ with added weight noise (with covariance $\eta I$) is equivalent to minimization of $J$ with an additional regularization term:
$$\eta\, \mathbb{E}_{p(x,y)}\left[\lVert \nabla_W \hat{y}(x) \rVert^2\right].$$
This form of regularization encourages the parameters to go to regions of parameter space where small perturbations of the weights have a relatively small influence on the output.
In other words, it pushes the model into regions where the model is relatively insensitive to small variations in the weights, finding points that are not merely minima, but minima surrounded by flat regions.
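The equivalence above can be checked numerically. The sketch below is our own example, not from the text: it assumes a linear model $\hat{y}(x) = w^\top x$, for which $\nabla_w \hat{y}(x) = x$, so the expected squared error under weight noise should equal $J + \eta\,\mathbb{E}[\lVert x \rVert^2]$.

```python
import numpy as np

# Monte Carlo check of the weight-noise regularizer for a linear model:
# with eps ~ N(0, eta*I), E[(yhat_{w+eps}(x) - y)^2] = J + eta * E[||x||^2],
# because grad_w yhat(x) = x for a linear model.
rng = np.random.default_rng(0)

m, d = 500, 5
X = rng.normal(size=(m, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=m)

w = rng.normal(size=d)            # current (untrained) parameters
eta = 1e-2                        # variance of the weight noise

J = np.mean((X @ w - y) ** 2)     # plain squared-error objective

# Estimate the noisy-weight objective by averaging over noise draws.
n_draws = 20_000
noisy = 0.0
for _ in range(n_draws):
    eps = rng.normal(scale=np.sqrt(eta), size=d)
    noisy += np.mean((X @ (w + eps) - y) ** 2)
noisy /= n_draws

# Analytic prediction: J plus the gradient-norm penalty.
penalty = eta * np.mean(np.sum(X ** 2, axis=1))
print(noisy, J + penalty)         # the two values should nearly agree
```

For this linear model the identity holds exactly in expectation; for a deep network the equivalence stated above holds only to first order in $\eta$.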
Injecting Noise at the Output Targets
Most datasets have some amount of mistakes in the y labels. It can be harmful to maximize log p(y | x) when y is a mistake. One way to prevent this is to explicitly model the noise on the labels.
- For example, we can assume that for some small constant ϵ, the training set label y is correct with probability 1 − ϵ, and otherwise any of the other possible labels might be correct. This assumption is easy to incorporate into the cost function analytically, rather than by explicitly drawing noise samples.
- Label smoothing, for instance, regularizes a model based on a softmax with k output values by replacing the hard 0 and 1 classification targets with targets of ϵ/(k − 1) and 1 − ϵ, respectively.
- The standard cross-entropy loss may then be used with these soft targets. Maximum likelihood learning with a softmax classifier and hard targets may actually never converge—the softmax can never predict a probability of exactly 0 or exactly 1, so it will continue to learn larger and larger weights, making more extreme predictions forever.
- It is possible to prevent this scenario using other regularization strategies like weight decay.
- Label smoothing has the advantage of preventing the pursuit of hard probabilities without discouraging correct classification.
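The soft targets described above can be sketched as follows (the values of ϵ and k here are illustrative, not taken from the text):

```python
import numpy as np

# A minimal sketch of label smoothing: hard one-hot targets are replaced
# by 1 - eps for the true class and eps / (k - 1) for each other class.
def smooth_labels(labels, k, eps=0.1):
    """labels: integer class labels; returns (len(labels), k) soft targets."""
    targets = np.full((len(labels), k), eps / (k - 1))
    targets[np.arange(len(labels)), labels] = 1.0 - eps
    return targets

def cross_entropy(probs, targets):
    """Average cross-entropy between predicted probabilities and soft targets."""
    return -np.mean(np.sum(targets * np.log(probs), axis=1))

labels = np.array([0, 2, 1])
T = smooth_labels(labels, k=3, eps=0.1)
# First row of T: [0.9, 0.05, 0.05]. Every row still sums to 1, but the
# softmax is no longer pushed toward probabilities of exactly 0 or 1.
```

Recent versions of PyTorch expose the same idea directly via the `label_smoothing` argument of `nn.CrossEntropyLoss`.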