A Detailed Look at relu_layer.cu in Caffe

Source: Internet  Editor: 程序博客网  Date: 2024/05/17 12:04

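relu_layer is the rectified linear unit (ReLU) activation layer in the Caffe framework. I won't go through what it does or its C++ code in detail; anyone with a bit of C++ and deep learning background should be able to follow it. Today I'll walk through relu_layer.cu in detail, because it contains a small pitfall that took me a long time to figure out, so I'm writing it up to share. First comes ReLULayer<Dtype>::Forward_gpu. This is ordinary host-side C++ code that launches the kernel, and it should not be hard to follow.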

template <typename Dtype>
void ReLULayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->gpu_data();
  Dtype* top_data = top[0]->mutable_gpu_data();
  const int count = bottom[0]->count();
  Dtype negative_slope = this->layer_param_.relu_param().negative_slope();
  // NOLINT_NEXT_LINE(whitespace/operators)
  ReLUForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
      count, bottom_data, top_data, negative_slope);
  CUDA_POST_KERNEL_CHECK;
  // << " count: " << count << " bottom_data: "
  //     << (unsigned long)bottom_data
  //     << " top_data: " << (unsigned long)top_data
  //     << " blocks: " << CAFFE_GET_BLOCKS(count)
  //     << " threads: " << CAFFE_CUDA_NUM_THREADS;
}

The line that matters is the kernel launch:

ReLUForward<Dtype><<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
    count, bottom_data, top_data, negative_slope);

Following ReLUForward to its definition shows this code:

template <typename Dtype>
__global__ void ReLUForward(const int n, const Dtype* in, Dtype* out,
    Dtype negative_slope) {
  CUDA_KERNEL_LOOP(index, n) {
    out[index] = in[index] > 0 ? in[index] : in[index] * negative_slope;
  }
}
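To make the per-element arithmetic easier to check, here is a minimal CPU sketch that applies the same expression as the kernel body, one element at a time. The name relu_forward_reference is hypothetical and not part of Caffe; it only mirrors the kernel's parameter list for comparison.

```cpp
#include <cassert>

// Hypothetical helper (not part of Caffe): applies the same per-element
// expression as the ReLUForward kernel, but sequentially on the CPU.
template <typename Dtype>
void relu_forward_reference(const int n, const Dtype* in, Dtype* out,
                            Dtype negative_slope) {
  assert(n >= 0);
  for (int index = 0; index < n; ++index) {
    // Positive inputs pass through unchanged; all other inputs are
    // scaled by negative_slope (0 gives plain ReLU, >0 gives leaky ReLU).
    out[index] = in[index] > 0 ? in[index] : in[index] * negative_slope;
  }
}
```

With negative_slope set to 0 every non-positive input maps to 0, which is the standard ReLU; a small positive value such as 0.1 turns the same code into a leaky ReLU.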

The kernel uses a macro; jumping to the macro definition shows its source:

// CUDA: grid stride looping
#define CUDA_KERNEL_LOOP(i, n) \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
       i < (n); \
       i += blockDim.x * gridDim.x)

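This raises a question: why is the loop needed? Isn't each thread supposed to process exactly one element? The loop is a grid-stride loop. It is there in case the launch does not provide enough threads: each thread then processes several elements serially, stepping by the total number of threads in the grid (blockDim.x * gridDim.x). When the launch provides at least as many threads as elements, each thread runs the loop body at most once. The i < (n) condition still matters in that case, because it keeps the surplus threads in the last block from touching out-of-range indices. The Caffe code above is such a case: as far as I can tell, CAFFE_GET_BLOCKS(count) rounds the block count up so that the total number of threads is at least count, so no thread iterates more than once. Writing the kernel this way is presumably for safety and robustness. If this analysis is wrong or incomplete, corrections are welcome.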
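The behavior described above can be checked without a GPU. The following is a minimal CPU sketch that replays the CUDA_KERNEL_LOOP header for every simulated thread and counts how many iterations each one runs. The names kThreadsPerBlock, blocks_for, and iterations_per_thread are hypothetical. The value 512 and the round-up formula are assumptions based on the Caffe sources I am aware of, not something shown in the code quoted here.

```cpp
#include <cassert>
#include <vector>

// Assumed threads-per-block value; illustrative only.
const int kThreadsPerBlock = 512;

// Round-up division: smallest block count whose threads cover n elements.
// Assumed to mirror what CAFFE_GET_BLOCKS does.
inline int blocks_for(int n) {
  return (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
}

// CPU simulation of the grid-stride loop: returns, for each simulated
// thread, how many times the loop body would run for n elements.
std::vector<int> iterations_per_thread(int n, int grid_dim, int block_dim) {
  assert(grid_dim > 0 && block_dim > 0);
  std::vector<int> iters(grid_dim * block_dim, 0);
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      const int tid = block * block_dim + thread;
      // Same init, condition, and stride as CUDA_KERNEL_LOOP(i, n).
      for (int i = tid; i < n; i += block_dim * grid_dim) {
        ++iters[tid];
      }
    }
  }
  return iters;
}
```

Calling iterations_per_thread(1000, blocks_for(1000), kThreadsPerBlock) simulates 2 blocks of 512 threads: threads 0 to 999 each run the body once and threads 1000 to 1023 run it zero times, which is the "enough threads" case. Calling iterations_per_thread(10, 1, 4) simulates an undersized launch, where the four threads run 3, 3, 2, and 2 iterations and together still cover all 10 elements.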
