Caffe: Creating a New Loss Layer

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 05:11

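Preface

I recently started using Caffe, so I decided to first reproduce the network from a paper in Caffe and then design my own network. The paper I chose is "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition".

The network's loss has two parts: the conventional softmax loss, which the Caffe source already implements, and the authors' custom pairwise rank loss, which has to be implemented by hand. This post therefore focuses on implementing the rank loss layer.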

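Basic Principle

The network structure is shown in the figure below:

[Figure: network structure]

The rank loss is computed as follows:
[Figures: rank loss formulas]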

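Here p_t^(s) denotes the predicted probability of the correct class at scale s: if the softmax output is prob[10] and the sample's label is i, then p_t^(s) = prob[i]. The superscript s indexes the scale. Each rank loss term compares scale s with scale s+1, and the network uses only 3 scales, so s takes the values 1 and 2.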
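Because the formula images did not survive, the block below is a reconstruction rather than a verbatim copy of the paper's equations. The rank term follows directly from the Forward_cpu() code later in this post, which computes max(0, p1 - p2 + margin); the total loss follows the two-part description in the preface, with L_cls^(s) standing for the softmax loss at scale s (a symbol assumed here).

```latex
L_{rank}\left(p_t^{(s)},\, p_t^{(s+1)}\right)
  = \max\left\{0,\; p_t^{(s)} - p_t^{(s+1)} + \mathrm{margin}\right\}

L = \sum_{s=1}^{3} L_{cls}^{(s)}
  + \sum_{s=1}^{2} L_{rank}\left(p_t^{(s)},\, p_t^{(s+1)}\right)
```

Read this way, a rank term is zero as soon as the finer scale s+1 beats scale s on the correct class by at least the margin, and grows linearly otherwise.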

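Basic Approach

The rank loss needs the softmax probabilities from all three scales, so the first step is to add a concat layer to the original VGG19 that joins the three inputs (the softmax outputs of the three scales) into a single output.
Next comes the custom rank loss layer. Its inputs are the concat layer's output and the label; its output is a scalar.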
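To make the concatenated layout concrete, here is a minimal sketch of where each scale's true-class probability ends up. It assumes each softmax output has shape (batch, classes) and that the three are concatenated along the channel axis, which is what the index arithmetic in Forward_cpu() later in this post implies. ScaleProbIndex is a hypothetical helper for illustration, not a Caffe function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Caffe): offset of the probability that
// scale `scale` (0, 1 or 2) assigns to class `label` for sample `sample`,
// inside a row-major (batch, 3 * num_classes) buffer.
std::size_t ScaleProbIndex(std::size_t sample, std::size_t scale,
                           std::size_t label, std::size_t num_classes) {
    assert(scale < 3);
    assert(label < num_classes);
    // Per-sample length after concatenating the three softmax outputs.
    const std::size_t dim = 3 * num_classes;
    return sample * dim + scale * num_classes + label;
}
```

With 10 classes, for instance, sample 2 at scale index 1 with label 4 maps to offset 2*30 + 10 + 4 = 74.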

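Implementation

Header File
A Caffe loss layer header contains a few elements: a constructor, an inline type() function, a declaration of Forward_cpu(), and a declaration of Backward_cpu(). (The GPU functions are left out for now.) It is enough to follow the usual template.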

#ifndef CAFFE_RANK_LOSS_LAYER_HPP_
#define CAFFE_RANK_LOSS_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/layers/loss_layer.hpp"

namespace caffe {

template <typename Dtype>
class RankLossLayer : public LossLayer<Dtype>
{
public:
    explicit RankLossLayer(const LayerParameter& param)
        : LossLayer<Dtype>(param) {}

    //virtual void Reshape(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);

    virtual inline const char* type() const
    {
        return "RankLoss";
    }

protected:
    // virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    //                          const vector<Blob<Dtype>*>& top);
    //
    // virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
    //                           const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

    virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                             const vector<Blob<Dtype>*>& top);

    virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                              const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
};

}   //namespace caffe

#endif // CAFFE_RANK_LOSS_LAYER_HPP_

Source File
The source file implements the two functions Forward_cpu() and Backward_cpu(). The forward pass comes first.

#include <algorithm>
#include <vector>

// Header path assumed; point this at wherever the header above was saved.
#include "caffe/layers/rank_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"

template <typename Dtype>
void RankLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                                       const vector<Blob<Dtype>*>& top)
{
    // This layer has two inputs: the concat layer's output and the label.
    // bottom[0] holds the concatenated softmax outputs, bottom[1] the labels.
    // Keep the same order when writing train_val.prototxt.
    const Dtype* pred = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    // Batch size.
    int num = bottom[0]->num(); //number of samples
    // Total length of the three concatenated softmax outputs.
    // E.g. with 10 classes and batch_size=32, count = 32*3*10 = 960.
    int count = bottom[0]->count(); //length of data
    // Per-sample length, i.e. 3 * number of classes.
    int dim = count / num;
    // The rank loss depends on the margin parameter, so fetch it first.
    Dtype margin = this->layer_param_.rank_loss_param().margin();
    // The loss to compute, a scalar.
    Dtype loss = Dtype(0.0);
    for (int i = 0; i < num; i++)
    {
        // Indices into bottom[0] of the true-class probability at each of the three scales.
        int scale1_index = i*dim + static_cast<int>(label[i]);
        int scale2_index = i*dim + dim/3 + static_cast<int>(label[i]);
        int scale3_index = i*dim + dim/3*2 + static_cast<int>(label[i]);
        // Rank loss between scales 1 and 2, and between scales 2 and 3, per the formula.
        Dtype rankLoss12 = std::max(Dtype(0), pred[scale1_index] - pred[scale2_index] + margin);
        Dtype rankLoss23 = std::max(Dtype(0), pred[scale2_index] - pred[scale3_index] + margin);
        // Accumulate the rank loss over all samples in the batch.
        loss += (rankLoss12 + rankLoss23);
    }
    // Output.
    top[0]->mutable_cpu_data()[0] = loss;
}
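The forward pass is easy to check by hand once it is separated from Caffe's types. The following is a minimal stand-alone sketch of the same computation on plain vectors; RankLossForward and its parameters are hypothetical names chosen for illustration, and the layout assumption (three scales concatenated per sample) is the one used by Forward_cpu() above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the forward pass (no Caffe types).
// pred   : row-major (num, 3 * classes) probabilities, scales concatenated.
// labels : one class index per sample.
float RankLossForward(const std::vector<float>& pred,
                      const std::vector<int>& labels,
                      std::size_t classes, float margin) {
    const std::size_t dim = 3 * classes;
    assert(pred.size() == labels.size() * dim);
    float loss = 0.0f;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        // Offset of the true-class probability at scale 1 for sample i.
        const std::size_t base = i * dim + static_cast<std::size_t>(labels[i]);
        const float p1 = pred[base];
        const float p2 = pred[base + classes];
        const float p3 = pred[base + 2 * classes];
        // One hinge term per pair of adjacent scales, summed over the batch.
        loss += std::max(0.0f, p1 - p2 + margin);
        loss += std::max(0.0f, p2 - p3 + margin);
    }
    return loss;
}
```

As a worked case, take one sample with 2 classes, label 1, margin 0.25 and true-class probabilities 0.5, 0.5 and 0.75 at the three scales. The first hinge gives 0.5 - 0.5 + 0.25 = 0.25, the second gives max(0, 0.5 - 0.75 + 0.25) = 0, so the loss is 0.25.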

Next comes the Backward_cpu() function.

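template <typename Dtype>
void RankLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
                                        const vector<bool>& propagate_down,
                                        const vector<Blob<Dtype>*>& bottom)
{
    // Error propagated back from the layer above (the loss weight).
    const Dtype loss_weight = top[0]->cpu_diff()[0];
    // See the comments in Forward_cpu().
    const Dtype* pred = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int num = bottom[0]->num();
    int count = bottom[0]->count();
    int dim = count / num; // per-sample length, 3 * number of classes
    Dtype margin = this->layer_param_.rank_loss_param().margin();
    // Zero the gradient buffer. caffe_set is declared in caffe/util/math_functions.hpp.
    caffe_set(count, Dtype(0), bottom_diff);
    for (int i = 0; i < num; i++)
    {
        int scale1_index = i*dim + static_cast<int>(label[i]);
        int scale2_index = i*dim + dim/3 + static_cast<int>(label[i]);
        int scale3_index = i*dim + dim/3*2 + static_cast<int>(label[i]);
        // The gradient follows from the rank loss formula. As with ReLU,
        // it is derived case by case depending on which hinge terms are active.
        if (pred[scale1_index] - pred[scale2_index] + margin > 0)
        {
            // Hinge (1,2) is active: +1 on scale 1, -1 on scale 2.
            bottom_diff[scale1_index] += loss_weight;
            if (pred[scale2_index] - pred[scale3_index] + margin < 0)
            {
                // Hinge (2,3) is inactive, so scale 2 keeps its -1.
                bottom_diff[scale2_index] -= loss_weight;
            }
            else
            {
                // Both hinges are active: the -1 and +1 on scale 2 cancel,
                // leaving only the -1 on scale 3.
                bottom_diff[scale3_index] -= loss_weight;
            }
        }
        else
        {
            if (pred[scale2_index] - pred[scale3_index] + margin > 0)
            {
                // Only hinge (2,3) is active: +1 on scale 2, -1 on scale 3.
                bottom_diff[scale2_index] += loss_weight;
                bottom_diff[scale3_index] -= loss_weight;
            }
        }
    }
}

// Both functions belong inside namespace caffe { ... }. Before the closing
// brace, the .cpp file also needs Caffe's usual instantiation and registration macros:
//     INSTANTIATE_CLASS(RankLossLayer);
//     REGISTER_LAYER_CLASS(RankLoss);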
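The case analysis is easier to verify when each hinge term contributes its gradient independently. The sketch below does that on plain vectors; away from the kinks it should give the same result as the nested branches above. RankLossBackward and its parameters are hypothetical names for illustration, not part of Caffe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone gradient of the summed rank loss w.r.t. pred.
// pred is row-major (num, 3 * classes); labels holds one class index per sample.
std::vector<float> RankLossBackward(const std::vector<float>& pred,
                                    const std::vector<int>& labels,
                                    std::size_t classes, float margin,
                                    float loss_weight) {
    const std::size_t dim = 3 * classes;
    assert(pred.size() == labels.size() * dim);
    std::vector<float> diff(pred.size(), 0.0f);
    for (std::size_t i = 0; i < labels.size(); ++i) {
        const std::size_t i1 = i * dim + static_cast<std::size_t>(labels[i]);
        const std::size_t i2 = i1 + classes;
        const std::size_t i3 = i2 + classes;
        if (pred[i1] - pred[i2] + margin > 0) {  // hinge (1,2) active
            diff[i1] += loss_weight;
            diff[i2] -= loss_weight;
        }
        if (pred[i2] - pred[i3] + margin > 0) {  // hinge (2,3) active
            diff[i2] += loss_weight;
            diff[i3] -= loss_weight;
        }
    }
    return diff;
}
```

When both hinges are active, the two contributions to the scale 2 entry cancel, which is why the nested version only touches the scale 1 and scale 3 entries in that case.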

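Test Source File
The core code of the rank loss layer is done. Next we need a test file to verify that the gradient is computed correctly. The focus here is on testing backpropagation.

(1) First, generate the mock data the test needs.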

    RankLossLayerTest()
        // (10, 30, 1, 1): 10 is the batch size, 30 is the dimension after
        // concatenating the 3 softmax outputs.
        : blob_bottom_data_(new Blob<Dtype>(10, 30, 1, 1)),
          blob_bottom_label_(new Blob<Dtype>(10, 1, 1, 1)),
          blob_top_loss_(new Blob<Dtype>())
    {
        Caffe::set_random_seed(1701);
        FillerParameter filler_param;
        filler_param.set_std(10);
        GaussianFiller<Dtype> filler(filler_param);
        // Generate the concatenated softmax data.
        filler.Fill(this->blob_bottom_data_);
        blob_bottom_vec_.push_back(blob_bottom_data_);
        // Number of classes per scale (30 / 3 = 10).
        const int num_classes = blob_bottom_data_->channels() / 3;
        for (int i = 0; i < blob_bottom_label_->count(); i++)
        {
            // Generate the labels; each must lie in [0, num_classes).
            blob_bottom_label_->mutable_cpu_data()[i] = caffe_rng_rand() % num_classes;
        }
        blob_bottom_vec_.push_back(blob_bottom_label_);
        blob_top_vec_.push_back(blob_top_loss_);
    }

    virtual ~RankLossLayerTest()
    {
        delete blob_bottom_data_;
        delete blob_bottom_label_;
        delete blob_top_loss_;
    }

(2) Check backpropagation:

    void TestBackward()
    {
        LayerParameter layer_param;
        RankLossParameter* rank_loss_param = layer_param.mutable_rank_loss_param();
        rank_loss_param->set_margin(1);
        RankLossLayer<Dtype> layer(layer_param);
        // 1e-4 is the step size; 1e-2 means the relative error between the
        // estimated gradient and the computed gradient must not exceed 1e-2.
        // Reading the source shows that Caffe estimates the gradient with
        // central finite differences.
        GradientChecker<Dtype> checker(1e-4, 1e-2);//, 1701, 0, 0.01);
        // The final argument 0 means only the gradient w.r.t. bottom[0] is
        // checked, not the gradient w.r.t. the label.
        checker.CheckGradientExhaustive(&layer, this->blob_bottom_vec_, this->blob_top_vec_, 0);
    }
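The idea behind the check can be illustrated without Caffe. The sketch below applies central finite differences to the rank loss of a single sample and compares the estimate with the analytic gradient. It is only an illustration of the technique, not Caffe's GradientChecker; TripleRankLoss, TripleRankGrad and MaxGradError are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Rank loss of one sample as a function of its three true-class probabilities.
double TripleRankLoss(const double p[3], double margin) {
    return std::max(0.0, p[0] - p[1] + margin) +
           std::max(0.0, p[1] - p[2] + margin);
}

// Analytic gradient: each active hinge adds +1 and -1 to its two inputs.
void TripleRankGrad(const double p[3], double margin, double grad[3]) {
    grad[0] = grad[1] = grad[2] = 0.0;
    if (p[0] - p[1] + margin > 0) { grad[0] += 1.0; grad[1] -= 1.0; }
    if (p[1] - p[2] + margin > 0) { grad[1] += 1.0; grad[2] -= 1.0; }
}

// Central-difference check: returns the largest |numeric - analytic|
// over the three inputs.
double MaxGradError(const double p[3], double margin, double step) {
    assert(step > 0.0);
    double grad[3];
    TripleRankGrad(p, margin, grad);
    double worst = 0.0;
    for (int k = 0; k < 3; ++k) {
        double plus[3] = {p[0], p[1], p[2]};
        double minus[3] = {p[0], p[1], p[2]};
        plus[k] += step;
        minus[k] -= step;
        const double numeric =
            (TripleRankLoss(plus, margin) - TripleRankLoss(minus, margin)) /
            (2.0 * step);
        worst = std::max(worst, std::fabs(numeric - grad[k]));
    }
    return worst;
}
```

Because the loss is piecewise linear, the two gradients should agree almost exactly as long as no input sits within one step of a hinge kink; near a kink the finite-difference estimate straddles two linear pieces and the comparison is not meaningful.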

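At this point all the code is written. Because computing the loss requires a margin parameter, caffe.proto has to be modified as well.

Adding the Layer Parameter

Open caffe/src/caffe/proto/caffe.proto and add the following message: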

message RankLossParameter {
    optional float margin = 1 [default = 0.05];
}

Then add the following inside message LayerParameter {}:

// The 150 here is an arbitrary choice; any ID that does not clash with the other fields above works.
optional RankLossParameter rank_loss_param = 150;

With that, all the work is done. What remains is testing and experiments.

Testing

make -j8 all
make -j8 test
make runtest

To test only the newly written rank loss, run:

make -j8 all
make -j8 test
make runtest GTEST_FILTER='RankLossLayerTest/*'

References

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition
https://github.com/BVLC/caffe/wiki/Simple-Example:-Sin-Layer
Caffe source code: https://github.com/BVLC/caffe
