[Deep Learning Paper Notes][Weight Initialization] Understanding the difficulty of training deep feedforward neural networks


Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” AISTATS, Vol. 9, 2010. [Citations: 722].


1 Cost Function Surface
Plateaus in the cost function surface are less present with the softmax (cross-entropy) cost, while the quadratic cost exhibits more severe plateaus.


2 Xavier Initialization

[Motivation] Ensure that all neurons in the network initially have approximately the same output distribution; empirically, this improves the rate of convergence.


[Forward Pass] Consider a linear activation function (or assume we are in the linear regime at initialization). For a unit with $n_{in}$ inputs,

$$y = \sum_{i=1}^{n_{in}} w_i x_i,$$

and with the $w_i$ and $x_i$ i.i.d. and zero-mean,

$$\mathrm{Var}(y) = n_{in}\,\mathrm{Var}(w)\,\mathrm{Var}(x).$$

We want the variance to be preserved across the layer,

$$\mathrm{Var}(y) = \mathrm{Var}(x),$$

then

$$\mathrm{Var}(w) = \frac{1}{n_{in}}.$$
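
As a quick sanity check of this formula, here is a minimal numpy sketch (the layer size, sample count, and variances are arbitrary choices for illustration, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 400                                       # fan-in (arbitrary choice)

x = rng.normal(0.0, 1.0, size=(10000, n_in))     # inputs with Var(x) = 1
w = rng.uniform(-0.1, 0.1, size=n_in)            # weights with Var(w) = 0.01 / 3

y = x @ w                                        # y = sum_i w_i x_i, one per row of x

print(y.var())                                   # empirical Var(y)
print(n_in * w.var() * x.var())                  # predicted n_in * Var(w) * Var(x)
```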



[Backward Pass] Backpropagating through the same layer, the gradient with respect to an input is

$$\frac{\partial L}{\partial x_i} = \sum_{j=1}^{n_{out}} w_{ij}\,\frac{\partial L}{\partial y_j},$$

so

$$\mathrm{Var}\!\left(\frac{\partial L}{\partial x}\right) = n_{out}\,\mathrm{Var}(w)\,\mathrm{Var}\!\left(\frac{\partial L}{\partial y}\right).$$

We want the gradient variance to be preserved as well,

$$\mathrm{Var}\!\left(\frac{\partial L}{\partial x}\right) = \mathrm{Var}\!\left(\frac{\partial L}{\partial y}\right),$$

then

$$\mathrm{Var}(w) = \frac{1}{n_{out}}.$$
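
The same kind of check for the backward-pass formula (again a minimal numpy sketch with arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 300, 500                            # arbitrary layer sizes

W = rng.uniform(-0.1, 0.1, size=(n_out, n_in))    # weights with Var(w) = 0.01 / 3
g_y = rng.normal(0.0, 1.0, size=(10000, n_out))   # upstream gradients dL/dy, Var = 1

g_x = g_y @ W                                     # dL/dx = W^T dL/dy, per sample

print(g_x.var())                                  # empirical Var(dL/dx)
print(n_out * W.var() * g_y.var())                # predicted n_out * Var(w) * Var(dL/dy)
```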



[Xavier Initialization] As a compromise between these two constraints, we might want to have

$$\mathrm{Var}(w) = \frac{2}{n_{in} + n_{out}}.$$

Recall the variance of the uniform distribution $U(-c, c)$ is

$$\mathrm{Var} = \frac{(2c)^2}{12} = \frac{c^2}{3}.$$

Let

$$c = \sqrt{\frac{6}{n_{in} + n_{out}}},$$

then

$$\mathrm{Var}(w) = \frac{c^2}{3} = \frac{2}{n_{in} + n_{out}}.$$

I.e., the Xavier initialization is

$$W \sim U\!\left(-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\right).$$
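
A minimal numpy sketch of this initialization (the function name xavier_uniform is my own; the bound is the one derived above):

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=None):
    """Sample an (n_out, n_in) weight matrix from U(-c, c) with
    c = sqrt(6 / (n_in + n_out)), i.e. Var(w) = 2 / (n_in + n_out)."""
    rng = rng or np.random.default_rng()
    c = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-c, c, size=(n_out, n_in))

W = xavier_uniform(300, 500)
print(W.var(), 2.0 / (300 + 500))   # empirical vs. target variance
```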



3 Xavier Initialization in Caffe

XavierFiller fills the weight blob by sampling from $U(-\text{scale}, \text{scale})$ with $\text{scale} = \sqrt{3/n}$, where $n$ is chosen by the variance_norm setting (see the sketch below):

- By default (FillerParameter_VarianceNorm_FAN_IN), $n = n_{in}$, giving $W \sim U\!\left(-\sqrt{3/n_{in}},\ \sqrt{3/n_{in}}\right)$.

- If FillerParameter_VarianceNorm_FAN_OUT, $n = n_{out}$, giving $W \sim U\!\left(-\sqrt{3/n_{out}},\ \sqrt{3/n_{out}}\right)$.

- If FillerParameter_VarianceNorm_AVERAGE, $n = (n_{in} + n_{out})/2$, which recovers the original Xavier formula $W \sim U\!\left(-\sqrt{\frac{6}{n_{in}+n_{out}}},\ \sqrt{\frac{6}{n_{in}+n_{out}}}\right)$.
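
The mode selection can be sketched in Python as follows (a paraphrase of the logic in filler.hpp [3], not the actual C++; caffe_xavier_scale is a hypothetical name, and the string arguments stand in for the protobuf enum values):

```python
import numpy as np

def caffe_xavier_scale(fan_in, fan_out, variance_norm="FAN_IN"):
    """Scale used by Caffe's XavierFiller: weights are drawn from
    U(-scale, scale) with scale = sqrt(3 / n)."""
    if variance_norm == "FAN_OUT":
        n = fan_out
    elif variance_norm == "AVERAGE":
        n = (fan_in + fan_out) / 2.0
    else:                                          # default: FAN_IN
        n = fan_in
    return np.sqrt(3.0 / n)

# AVERAGE recovers the original Xavier bound sqrt(6 / (n_in + n_out)):
print(caffe_xavier_scale(300, 500, "AVERAGE"))    # == sqrt(6 / 800)
print(np.sqrt(6.0 / 800))
```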


4 References

[1]. shuzfan. http://blog.csdn.net/shuzfan/article/details/51338178.
[2]. F.-F. Li, A. Karpathy and J. Johnson. http://cs231n.github.io/neural-networks-2/.
[3]. Caffe. https://github.com/BVLC/caffe/blob/master/include/caffe/filler.hpp.
