[Deep Learning Paper Notes][Weight Initialization] Understanding the difficulty of training deep feedforward neural networks


Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” AISTATS, Vol. 9, 2010. [Citations: 722].


1 Cost Function Surface
Plateaus in the cost function surface are less present with the softmax (cross-entropy) cost, while the quadratic cost exhibits more severe plateaus.


2 Xavier Initialization

[Motivation] Ensure that all neurons in the network initially have approximately the same output distribution; empirically, this improves the rate of convergence.


[Forward Pass] Consider a linear activation function (or assume we are in the linear regime at initialization). For a unit with $n_{in}$ inputs,

$$y = \sum_{i=1}^{n_{in}} w_i x_i,$$

and with the $w_i$ and $x_i$ i.i.d. and zero-mean,

$$\mathrm{Var}(y) = n_{in}\,\mathrm{Var}(w)\,\mathrm{Var}(x).$$

We want the variance to be preserved across the layer,

$$\mathrm{Var}(y) = \mathrm{Var}(x),$$

then

$$\mathrm{Var}(w) = \frac{1}{n_{in}}.$$
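
As a quick sanity check of this formula, here is a minimal numpy sketch (the layer size, sample count, and variances are arbitrary choices for illustration, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 400                                       # fan-in (arbitrary choice)

x = rng.normal(0.0, 1.0, size=(10000, n_in))     # inputs with Var(x) = 1
w = rng.uniform(-0.1, 0.1, size=n_in)            # weights with Var(w) = 0.01 / 3

y = x @ w                                        # y = sum_i w_i x_i, one per row of x

print(y.var())                                   # empirical Var(y)
print(n_in * w.var() * x.var())                  # predicted n_in * Var(w) * Var(x)
```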



[Backward Pass] Backpropagating through the same layer, the gradient with respect to an input is

$$\frac{\partial L}{\partial x_i} = \sum_{j=1}^{n_{out}} w_{ij}\,\frac{\partial L}{\partial y_j},$$

so

$$\mathrm{Var}\!\left(\frac{\partial L}{\partial x}\right) = n_{out}\,\mathrm{Var}(w)\,\mathrm{Var}\!\left(\frac{\partial L}{\partial y}\right).$$

We want the gradient variance to be preserved as well,

$$\mathrm{Var}\!\left(\frac{\partial L}{\partial x}\right) = \mathrm{Var}\!\left(\frac{\partial L}{\partial y}\right),$$

then

$$\mathrm{Var}(w) = \frac{1}{n_{out}}.$$
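
The same kind of check for the backward-pass formula (again a minimal numpy sketch with arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 300, 500                            # arbitrary layer sizes

W = rng.uniform(-0.1, 0.1, size=(n_out, n_in))    # weights with Var(w) = 0.01 / 3
g_y = rng.normal(0.0, 1.0, size=(10000, n_out))   # upstream gradients dL/dy, Var = 1

g_x = g_y @ W                                     # dL/dx = W^T dL/dy, per sample

print(g_x.var())                                  # empirical Var(dL/dx)
print(n_out * W.var() * g_y.var())                # predicted n_out * Var(w) * Var(dL/dy)
```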



[Xavier Initialization] As a compromise between these two constraints, we might want to have

$$\mathrm{Var}(w) = \frac{2}{n_{in} + n_{out}}.$$

Recall the variance of the uniform distribution $U(-c, c)$ is

$$\mathrm{Var} = \frac{(2c)^2}{12} = \frac{c^2}{3}.$$

Let

$$c = \sqrt{\frac{6}{n_{in} + n_{out}}},$$

then

$$\mathrm{Var}(w) = \frac{c^2}{3} = \frac{2}{n_{in} + n_{out}}.$$

I.e., the Xavier initialization is

$$W \sim U\!\left(-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\right).$$
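
A minimal numpy sketch of this initialization (the function name xavier_uniform is my own; the bound is the one derived above):

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=None):
    """Sample an (n_out, n_in) weight matrix from U(-c, c) with
    c = sqrt(6 / (n_in + n_out)), i.e. Var(w) = 2 / (n_in + n_out)."""
    rng = rng or np.random.default_rng()
    c = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-c, c, size=(n_out, n_in))

W = xavier_uniform(300, 500)
print(W.var(), 2.0 / (300 + 500))   # empirical vs. target variance
```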



3 Xavier Initialization in Caffe

XavierFiller fills the weight blob by sampling from $U(-\text{scale}, \text{scale})$ with $\text{scale} = \sqrt{3/n}$, where $n$ is chosen by the variance_norm setting (see the sketch below):

- By default (FillerParameter_VarianceNorm_FAN_IN), $n = n_{in}$, giving $W \sim U\!\left(-\sqrt{3/n_{in}},\ \sqrt{3/n_{in}}\right)$.

- If FillerParameter_VarianceNorm_FAN_OUT, $n = n_{out}$, giving $W \sim U\!\left(-\sqrt{3/n_{out}},\ \sqrt{3/n_{out}}\right)$.

- If FillerParameter_VarianceNorm_AVERAGE, $n = (n_{in} + n_{out})/2$, which recovers the original Xavier formula $W \sim U\!\left(-\sqrt{\frac{6}{n_{in}+n_{out}}},\ \sqrt{\frac{6}{n_{in}+n_{out}}}\right)$.
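
The mode selection can be sketched in Python as follows (a paraphrase of the logic in filler.hpp [3], not the actual C++; caffe_xavier_scale is a hypothetical name, and the string arguments stand in for the protobuf enum values):

```python
import numpy as np

def caffe_xavier_scale(fan_in, fan_out, variance_norm="FAN_IN"):
    """Scale used by Caffe's XavierFiller: weights are drawn from
    U(-scale, scale) with scale = sqrt(3 / n)."""
    if variance_norm == "FAN_OUT":
        n = fan_out
    elif variance_norm == "AVERAGE":
        n = (fan_in + fan_out) / 2.0
    else:                                          # default: FAN_IN
        n = fan_in
    return np.sqrt(3.0 / n)

# AVERAGE recovers the original Xavier bound sqrt(6 / (n_in + n_out)):
print(caffe_xavier_scale(300, 500, "AVERAGE"))    # == sqrt(6 / 800)
print(np.sqrt(6.0 / 800))
```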


4 References

[1]. shuzfan. http://blog.csdn.net/shuzfan/article/details/51338178.
[2]. F.-F. Li, A. Karpathy and J. Johnson. http://cs231n.github.io/neural-networks-2/.
[3]. Caffe. https://github.com/BVLC/caffe/blob/master/include/caffe/filler.hpp.
