CAFFE Source Code Study Notes: The Activation Layer

Source: Internet | Published by: embedpano.js | Editor: 程序博客网 | Time: 2024/05/19 14:40

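I. Introduction
In a network, after the data has passed through a convolution layer and a pooling layer, it usually goes through an activation layer, which applies a non-linear (or, occasionally, linear) transformation. The earliest activation functions were sigmoid and tanh. However, the gradients of both functions shrink the further the input is from x=0 and eventually approach 0. This slows convergence and causes the so-called "vanishing gradient" problem.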
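To make the shrinking gradient concrete, here is a minimal sketch that evaluates the derivatives of sigmoid and tanh directly. The function names `sigmoid_grad` and `tanh_grad` are illustrative and are not part of Caffe.

```cpp
#include <cmath>

// Derivative of the logistic sigmoid: s(x) * (1 - s(x)).
// It peaks at 0.25 for x = 0 and decays towards 0 as |x| grows.
double sigmoid_grad(double x) {
  const double s = 1.0 / (1.0 + std::exp(-x));
  return s * (1.0 - s);
}

// Derivative of tanh: 1 - tanh(x)^2.
// It peaks at 1.0 for x = 0 and decays towards 0 as |x| grows.
double tanh_grad(double x) {
  const double t = std::tanh(x);
  return 1.0 - t * t;
}
```

At x = 10 the sigmoid derivative is already on the order of 1e-5. Backpropagation multiplies one such factor per layer, so a few saturated layers are enough to make the gradient reaching the early layers negligible.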
To address this problem, the ReLU activation function was proposed. Its formula is:

out_put=max(0,input)

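For positive inputs its gradient is constantly 1, so it does not shrink as the activation grows, and networks using it usually converge much faster.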

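Of course, ReLU is not perfect either. If the input is less than 0, the output is clamped to 0 and the gradient is 0 as well. With too large a learning rate, a large share of the neurons can end up always outputting 0 and stop updating, which amounts to them being dead. Several improved variants have been proposed for this reason.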
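The "dying" behaviour can be illustrated with a toy example. The sketch below is not Caffe code: the `Neuron` struct, the helper names, and the plain SGD update on the bias alone are all assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical single neuron computing y = relu(w * x + b).
struct Neuron {
  double w;
  double b;
};

// dy/db for one input: 1 if the pre-activation is positive, else 0.
double grad_b(const Neuron& n, double x) {
  return (n.w * x + n.b > 0.0) ? 1.0 : 0.0;
}

// Number of inputs for which the neuron still receives a non-zero gradient.
int active_inputs(const Neuron& n, const std::vector<double>& xs) {
  int active = 0;
  for (double x : xs) {
    if (grad_b(n, x) > 0.0) ++active;
  }
  return active;
}

// One plain SGD step on the bias, with the same upstream gradient dL/dy
// assumed for every input and summed over the batch.
Neuron sgd_step_on_bias(Neuron n, const std::vector<double>& xs,
                        double upstream, double lr) {
  double g = 0.0;
  for (double x : xs) g += upstream * grad_b(n, x);
  n.b -= lr * g;
  return n;
}
```

With inputs {0.1, 0.5, 1.0}, w = 1 and b = 0, a step with lr = 0.01 leaves the neuron active on all three inputs. A step with lr = 10 pushes the bias to -30, after which every pre-activation is negative, every gradient is 0, and further steps no longer change the neuron.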

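II. Source Code Analysis
1. ReLULayer inherits from NeuronLayer, which in turn inherits from Layer. NeuronLayer itself does only one thing: it copies the shape of the input to initialize the shape of the output.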

template <typename Dtype>
void NeuronLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  top[0]->ReshapeLike(*bottom[0]);
}

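2. Forward pass

The relu_param parameter sets the slope used for negative inputs (negative_slope).

Key point: out_put=max(0,input) is expressed as:

top_data[i] = std::max(bottom_data[i], Dtype(0))
    + negative_slope * std::min(bottom_data[i], Dtype(0));
// negative_slope = 0 gives the standard ReLU; a small non-zero value gives Leaky ReLU.
// PReLU, where the slope is learned, is implemented in Caffe as a separate layer (PReLULayer).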

template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
  for (int i = 0; i < count; ++i) {  // for every element of the blob
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}
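To try the forward expression outside of Caffe, the following sketch applies the same formula to a plain `std::vector<float>`. The function name `relu_forward` and the use of a vector instead of a Blob are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Applies the expression from ReLULayer::Forward_cpu to every element:
// positive values pass through, negative values are scaled by negative_slope.
std::vector<float> relu_forward(const std::vector<float>& bottom,
                                float negative_slope) {
  std::vector<float> top(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    top[i] = std::max(bottom[i], 0.0f)
        + negative_slope * std::min(bottom[i], 0.0f);
  }
  return top;
}
```

For the input {-2, 0, 3}, a slope of 0 yields {0, 0, 3}, and a slope of 0.5 yields {-1, 0, 3}.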

The GPU forward kernel, which covers the leaky variant through negative_slope:

template <typename Dtype>
__global__ void ReLUForward(const int n, const Dtype* in, Dtype* out,
    Dtype negative_slope) {
  CUDA_KERNEL_LOOP(index, n) {
    out[index] = in[index] > 0 ? in[index] : in[index] * negative_slope;
  }
}
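As far as I understand it, CUDA_KERNEL_LOOP is a Caffe macro that expands to a grid-stride `for` loop: each thread starts at its global index and advances by the total number of launched threads. The sketch below emulates that iteration pattern on the CPU. The function name and the `grid_dim` / `block_dim` parameters are hypothetical stand-ins for `gridDim.x` and `blockDim.x`, not Caffe API.

```cpp
#include <vector>

// CPU emulation of a grid-stride loop over n elements. The two outer loops
// play the role of the (block, thread) pairs that CUDA would run in parallel.
std::vector<float> relu_forward_grid_stride(const std::vector<float>& in,
                                            float negative_slope,
                                            int grid_dim, int block_dim) {
  const int n = static_cast<int>(in.size());
  std::vector<float> out(in.size(), 0.0f);
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      // Start at this thread's global index, stride by the total thread count.
      for (int index = block * block_dim + thread; index < n;
           index += block_dim * grid_dim) {
        out[index] = in[index] > 0 ? in[index] : in[index] * negative_slope;
      }
    }
  }
  return out;
}
```

Every index in [0, n) is visited exactly once whatever launch configuration is chosen, so the kernel stays correct even when there are fewer threads than elements.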

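3. Backward pass

Let the input vector be bottom_data and the output vector be top_data. The ReLU layer computes
$$top\_data_i = \begin{cases} bottom\_data_i, & \text{if } bottom\_data_i > 0 \\ bottom\_data_i \cdot negative\_slope, & \text{otherwise} \end{cases}$$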

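Taking the partial derivative of the loss:
$$\frac{\partial loss}{\partial bottom\_data_i} = \frac{\partial loss}{\partial top\_data_i} \cdot \frac{\partial top\_data_i}{\partial bottom\_data_i} = \begin{cases} top\_diff_i, & \text{if } bottom\_data_i > 0 \\ top\_diff_i \cdot negative\_slope, & \text{otherwise} \end{cases}$$
In code this is a single statement: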

bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
    + negative_slope * (bottom_data[i] <= 0));

CPU version:

template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* top_diff = top[0]->cpu_diff();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
    for (int i = 0; i < count; ++i) {
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}
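One way to convince yourself that the backward expression matches the forward one is a finite-difference check. The sketch below is standalone and scalar. The function names are illustrative and the step size `eps` is an arbitrary choice.

```cpp
#include <algorithm>
#include <cmath>

// Forward expression from ReLULayer::Forward_cpu for a single value.
double relu(double x, double negative_slope) {
  return std::max(x, 0.0) + negative_slope * std::min(x, 0.0);
}

// Local derivative as written in Backward_cpu:
// 1 for x > 0, negative_slope for x <= 0.
double relu_grad(double x, double negative_slope) {
  return (x > 0) + negative_slope * (x <= 0);
}

// Central finite difference of the forward expression.
double numeric_grad(double x, double negative_slope, double eps = 1e-6) {
  return (relu(x + eps, negative_slope) - relu(x - eps, negative_slope))
      / (2 * eps);
}
```

Away from x = 0 the two gradients agree up to rounding error. At x = 0 the function is not differentiable when negative_slope differs from 1, and the backward expression simply uses negative_slope there.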

GPU implementation:

template <typename Dtype>
__global__ void ReLUBackward(const int n, const Dtype* in_diff,
    const Dtype* in_data, Dtype* out_diff, Dtype negative_slope) {
  CUDA_KERNEL_LOOP(index, n) {
    out_diff[index] = in_diff[index] * ((in_data[index] > 0)
        + (in_data[index] <= 0) * negative_slope);
  }
}

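III. Summary
When using ReLU, set the learning rate carefully: too large a value can cause neurons to die on a large scale. In deep networks, ReLU generally trains faster and performs better than sigmoid and tanh.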

0 0