Notes: Some basic questions about Caffe and deep learning
Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 18:56
Rectified Linear Units
Excerpted from http://www.douban.com/note/348196265/
The sigmoid and tanh are already familiar as neural-network activation functions; today I took a look at the ReLU family of linear activations. Clearly, a linear activation greatly reduces computational cost, and many works report that ReLU also improves results [1].
sigmoid:
g(x) = 1 / (1 + exp(-x)). g'(x) = g(x)(1 - g(x)).
tanh :
g(x) = sinh(x)/cosh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Rectifier (ReL):
- hard ReLU: g(x)=max(0,x)
- Noisy ReLU: g(x) = max(0, x + N(0, σ(x)))
softplus:
g(x) = log(1 + exp(x)); its derivative is the logistic function
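The definitions above can be sketched in NumPy (a minimal sketch; the noisy-ReLU variant is omitted because the choice of σ(x) varies between papers):

```python
import numpy as np

def sigmoid(x):
    # g(x) = 1 / (1 + exp(-x)); derivative g'(x) = g(x) * (1 - g(x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # g(x) = sinh(x)/cosh(x) = (e^x - e^-x) / (e^x + e^-x)
    return np.tanh(x)

def relu(x):
    # hard ReLU: g(x) = max(0, x)
    return np.maximum(0.0, x)

def softplus(x):
    # g(x) = log(1 + exp(x)); log1p is the numerically stable form
    return np.log1p(np.exp(x))

# sanity check: the derivative of softplus is the logistic (sigmoid) function
x = np.linspace(-3.0, 3.0, 13)
h = 1e-6
numeric_grad = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
print(np.allclose(numeric_grad, sigmoid(x), atol=1e-5))  # True
```

The finite-difference check at the end confirms numerically that softplus' derivative is the logistic function, as stated above.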
[The following is excerpted from Quora]
The major differences between the sigmoid and ReL functions are:
The sigmoid function has range [0,1], whereas the ReL function has range [0, ∞). Hence the sigmoid can be used to model a probability, whereas the ReL can be used to model a positive real number.
The gradient of the sigmoid vanishes as we increase or decrease x (the vanishing-gradient problem, which makes deep networks hard to train). The gradient of the ReL function, by contrast, does not vanish as x increases. In fact, for the max function the gradient is piecewise constant: g'(x) = 1 for x > 0 and g'(x) = 0 for x < 0.
[End of Quora excerpt]
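The vanishing-gradient point above can be illustrated numerically (a minimal sketch):

```python
import numpy as np

def sigmoid_grad(x):
    # g'(x) = g(x) * (1 - g(x)); peaks at 0.25 and decays to 0 in both tails
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # piecewise constant: 1 for x > 0, 0 for x < 0
    return np.where(x > 0, 1.0, 0.0)

# sigmoid's gradient shrinks rapidly as x grows; ReLU's stays at 1
for v in (1.0, 5.0, 10.0):
    print(f"x={v:5.1f}  sigmoid'={sigmoid_grad(v):.2e}  relu'={float(relu_grad(v)):.0f}")
```

At x = 10 the sigmoid's gradient is already below 1e-4, while the ReLU's gradient is still exactly 1, which is why stacking many sigmoid layers starves early layers of gradient.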
The advantages of ReLU, as noted in [1]:
- Hard ReLU naturally enforces sparsity.
- The derivative of hard ReLU is piecewise constant over the two ranges x < 0 and x >= 0: g'(x) = 1 for x > 0 and g'(x) = 0 for x < 0.

A possible problem with ReLU is the dead zone (units whose output is stuck at 0): since the gradient is 0 for x < 0, such a unit receives no gradient and gets "stuck". Small tricks can mitigate this, for example initializing the bias to a small positive value, although some papers [2] report that the issue has little practical impact. One can also switch to other activation functions, such as maxout or softplus.
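The positive-bias trick mentioned above can be sketched with a toy layer (the weight scale and the bias value 0.1 are illustrative assumptions, not values from the papers):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(4, 8))   # small random weights, toy 8 -> 4 layer
x = rng.normal(size=(8, 256))             # a batch of 256 random inputs

def active_fraction(bias):
    # fraction of pre-activations z = Wx + b that land in the active
    # region (z > 0), where hard ReLU lets gradient flow through
    z = W @ x + bias
    return float((z > 0).mean())

print(active_fraction(0.0))   # roughly 0.5: about half the units start inactive
print(active_fraction(0.1))   # close to 1.0: a positive bias starts units active
```

With zero bias and small weights, roughly half the pre-activations start in the zero-gradient region; a small positive bias shifts nearly all of them into the active region, reducing the chance that a unit begins training dead.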
[1] Rectifier Nonlinearities Improve Neural Network Acoustic Models
[2] Deep Sparse Rectifier Neural Networks
[3] http://www.quora.com/Deep-Learning/What-is-special-about-rectifier-neural-units-used-in-NN-learning
On the origin of the ReLU:
Geoff Hinton gave a lecture in the summer of 2013 that I found very helpful in understanding ReLUs and the rest. Essentially he claimed that the original activation function was chosen arbitrarily and that ReLUs work "better", but aren't the be-all and end-all. Also interesting was that the ReLU is an approximation to the summation of an infinite number of sigmoids with varying offsets - I wrote a blog post showing this is the case. They arrived at this function by experimenting with a deep network in which they varied the offsets of the activation functions at random until it "just worked" without pretraining. Based on this observation, Hinton decided to try a network that essentially tried all the offsets at once - hence the ReLU.
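The sum-of-sigmoids claim can be checked numerically: a sum of sigmoids with unit-spaced offsets closely tracks softplus, log(1 + e^x), which is itself a smooth approximation to the ReLU (a sketch; the unit spacing of the offsets and the truncation at 50 terms are assumptions made here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # smooth approximation to relu: log(1 + e^x)
    return np.log1p(np.exp(x))

x = np.linspace(-5.0, 5.0, 21)
# sum of sigmoids with offsets 0.5, 1.5, 2.5, ... (truncated at 50 terms)
stacked = sum(sigmoid(x + 0.5 - i) for i in range(1, 51))
# the truncated sum stays within a few hundredths of softplus on this range
print(float(np.max(np.abs(stacked - softplus(x)))))
```

So stacking unit-shifted sigmoids reproduces softplus up to a small error, which in turn hugs max(0, x) away from the origin - the relationship the lecture described.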