[Paper Notes] Residual Neural Network - Kaiming He


http://arxiv.org/abs/1512.03385


Simply stacking more layers causes degradation: with the same training setup, a deeper plain network reaches lower accuracy, and this is not caused by overfitting or by vanishing gradients.


Let H(x) be the desired mapping and F(x) = H(x) - x the residual, so H(x) = F(x) + x. The hypothesis is that F(x) is easier for the stacked convolutional layers to approximate than H(x) itself, and the experiments support this.
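
A minimal sketch of this idea as a residual block in PyTorch (not the paper's released code; the `BasicBlock` name and layer sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 conv layers learn the residual F(x); the input x is added back."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # first conv + ReLU
        out = self.bn2(self.conv2(out))        # second conv, no ReLU yet
        out = out + x                          # H(x) = F(x) + x, identity shortcut
        return F.relu(out)                     # ReLU after the addition
```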

The bottleneck architecture of ResNets is more economical: a 1x1 convolution reduces the channel width, a 3x3 convolution operates on the reduced width, and another 1x1 convolution restores it, allowing much deeper networks at comparable computational cost.
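
A sketch of the bottleneck variant under the same assumptions (reusing the imports from the sketch above; the `reduction=4` factor mirrors the paper's 256-64-256 design):

```python
class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 expand, with an identity shortcut around all three."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction            # e.g. 256 -> 64
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        self.conv2 = nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid)
        self.conv3 = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + x)                 # shortcut addition, then ReLU
```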




ReLU

Activation function, y = max(0, x). Its derivative is a step function: 0 for x < 0 and 1 for x > 0. Because y = 0 whenever x < 0, only part of the neurons are active at any time, so roughly half of the parameters receive non-zero gradients during backpropagation, which speeds up training.
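
A small illustration of this gradient behavior (the input values are made up):

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
y = torch.relu(x).sum()
y.backward()
print(x.grad)  # tensor([0., 0., 1., 1.]) -- gradient is 0 where x < 0, 1 where x > 0
```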


Gradient vanishing

Saturating activation functions with output range (-1, 1) or (0, 1), such as tanh and sigmoid, have derivatives smaller than 1, so gradients shrink multiplicatively during backpropagation. Parameters in the shallow (early) layers therefore receive almost no update signal; this is called the vanishing gradient problem.
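
A toy illustration of the effect (the depth and input value are made up): each sigmoid's derivative is at most 0.25, so chaining them shrinks the gradient multiplicatively.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)
h = x
for _ in range(10):   # 10 stacked sigmoid "layers" (no weights, just to show the shrinkage)
    h = torch.sigmoid(h)
h.backward()
print(x.grad)         # a very small number: the gradient has nearly vanished
```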
