Face Recognition: PairLoss

Source: Internet | Publisher: 中国网络移动经纪人 | Editor: 程序博客网 | Date: 2024/05/22 14:55

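The core contribution of the face recognition method introduced here is speeding up similarity (metric) learning. "Similarity" here means much the same as in the usual Triplet Loss: faces of the same identity should be close together, and faces of different identities should be far apart. The method comes from:
"Learning a Metric Embedding for Face Recognition using the Multibatch Method" (arXiv)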

Introduction

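Many face recognition models are metric or embedding models: faces of the same identity are mapped close together, and faces of different identities are mapped far apart. One advantage is that, besides recognition, such a model can readily be reused for face verification or face clustering.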

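There are many embedding methods based on deep neural networks. Their differences fall into three areas:
(1) Loss Function

Some directly judge whether two samples have the same identity, i.e. \(L[x_i, x_j]\);
some use the currently popular Triplet Loss, i.e. \(L[x_{a},x_{pos},x_{neg}]\), where the anchor \(x_a\) and the positive \(x_{pos}\) share an identity and the negative \(x_{neg}\) does not.

(2) Network Architecture

Network design has always been flexible; some networks also require preprocessing of the images, such as face alignment.

(3) Classification Layer
Some train an end-to-end embedding network directly;
others first train an ordinary classification network, then extract intermediate-layer features and train an embedding network on top of them.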
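To make item (1) concrete, here is a minimal C++ sketch contrasting the two loss shapes on raw embedding vectors. The pair term uses the hinge form with threshold \(\theta\) defined in the "Learn a Metric" section; the triplet term uses the FaceNet-style form \(\max(\|a-p\|^2-\|a-n\|^2+\alpha,0)\), where the margin `alpha` is an assumption for illustration and is not specified in this post. The names `sq_dist`, `pair_loss` and `triplet_loss` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Squared Euclidean distance ||a - b||^2.
double sq_dist(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k) {
    const double d = a[k] - b[k];
    s += d * d;
  }
  return s;
}

// Pair-style loss L[x_i, x_j]: one hinge term per pair.
// y = +1 for the same identity, y = -1 for different identities.
double pair_loss(const Vec& xi, const Vec& xj, int y, double theta) {
  return std::max(0.0, 1.0 - y * (theta - sq_dist(xi, xj)));
}

// Triplet-style loss L[x_a, x_pos, x_neg] in the FaceNet form;
// alpha is an assumed margin hyperparameter.
double triplet_loss(const Vec& xa, const Vec& xp, const Vec& xn, double alpha) {
  return std::max(0.0, sq_dist(xa, xp) - sq_dist(xa, xn) + alpha);
}
```

The pair loss sees only two samples and an absolute threshold, while the triplet loss compares two distances relative to each other, which is why triplet methods spend so much effort choosing which triplets to train on.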

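Take Google's well-known FaceNet as an example: even when mining only hard triplets, it was trained on multiple machines for over a month.
The goal of this paper is therefore to reduce training time.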

Learn a Metric

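The network learns a mapping from the input \(x\) to an output \(f_w(x)\in\mathbb{R}^d\), following this rule:

\(y=y' \Longrightarrow \|f_w(x)-f_w(x')\|^2 < \theta-1\)

\(y\neq y' \Longrightarrow \|f_w(x)-f_w(x')\|^2 > \theta+1\)

That is, same-identity pairs must lie at least a margin of 1 below the threshold \(\theta\), and different-identity pairs at least 1 above it.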

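Based on this rule, we first define the loss for a single pair. The training set is \(\{(x_i,y_i)\}_{i=1}^m\), where \(y_i\) is the label:

\(l(w,\theta;x_i,x_j,y_{ij})=\big(1-y_{ij}(\theta-\|f_w(x_i)-f_w(x_j)\|^2)\big)_+\)

where \(y_{ij}\in\{\pm1\}\), with \(+1\) meaning that \(x_i\) and \(x_j\) belong to the same identity, and \((u)_+:=\max(u,0)\).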
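A direct C++ transcription of the single-pair loss may help check the signs. It is a sketch that takes the embeddings \(f_w(x_i)\), \(f_w(x_j)\) as plain vectors (the function name `pair_hinge_loss` is hypothetical). With \(\theta=1.1\), the initial value used by the Caffe layer later in this post, a same-identity pair costs nothing once its squared distance drops below \(0.1\), and a different-identity pair costs nothing once it exceeds \(2.1\), matching the rule above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// l(w, theta; x_i, x_j, y_ij) = (1 - y_ij * (theta - ||f(x_i) - f(x_j)||^2))_+
double pair_hinge_loss(const std::vector<double>& fi,   // f_w(x_i)
                       const std::vector<double>& fj,   // f_w(x_j)
                       int y_ij,        // +1: same identity, -1: different identity
                       double theta) {  // learnable threshold
  double d = 0.0;  // squared Euclidean distance
  for (std::size_t k = 0; k < fi.size(); ++k) {
    const double diff = fi[k] - fj[k];
    d += diff * diff;
  }
  return std::max(0.0, 1.0 - y_ij * (theta - d));  // (u)_+ = max(u, 0)
}
```

For example, a same-identity pair at squared distance 1.0 gives \(1-(1.1-1.0)=0.9\), while a different-identity pair at the same distance gives \(1+(1.1-1.0)=1.1\).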
The overall loss is then:

\(L(w,\theta)=\frac{1}{m^2-m} \sum_{i\neq j\in [m]}l(w,\theta;x_i,x_j,y_{ij})\)
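Because the squared distance is symmetric and \(y_{ij}=y_{ji}\), each unordered pair appears twice in the sum over \(i\neq j\). The normalized sum over ordered pairs is therefore exactly the mean over unordered pairs, which is why visiting only the \(i<j\) pairs in practice loses nothing:

```latex
\begin{aligned}
L(w,\theta)
  &= \frac{1}{m^2-m}\sum_{i\neq j\in[m]} l(w,\theta;x_i,x_j,y_{ij})
   = \frac{2}{m(m-1)}\sum_{i<j} l(w,\theta;x_i,x_j,y_{ij}) \\
  &= \binom{m}{2}^{-1}\sum_{i<j} l(w,\theta;x_i,x_j,y_{ij}).
\end{aligned}
```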

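On the other hand, the paper shows both theoretically and experimentally that the loss above is harder to converge than multiclass losses such as hinge loss or softmax loss.
This is why much of the work in Google's FaceNet went into selecting and designing triplets, and this paper borrows that idea to design a new training method.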

The Multi-Batch Estimator

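This part takes up a good share of the paper, though it felt like mostly filler to me. In short, the implementation is as follows:
with batch_size=K, pairing every two samples gives \(K\times(K-1)\) ordered pairs. (In an actual implementation only \(\frac{K\times(K-1)}{2}\) are needed, because \((x_i,x_j)\) and \((x_j,x_i)\) produce the same loss and the same gradients in backpropagation.)

So in every batch we iterate over all of these pairs; summing the losses of all pairs and dividing by the number of pairs gives the loss for that batch.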
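The per-batch procedure just described can be sketched in plain C++, outside Caffe. This minimal sketch assumes embeddings are given as `std::vector<double>` rows and labels as ints; `multibatch_loss` and `BatchLoss` are hypothetical names. It visits only the \(K(K-1)/2\) unordered pairs and averages their hinge losses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Embedding = std::vector<double>;

struct BatchLoss {
  double loss;        // mean pair loss over the batch
  std::size_t pairs;  // number of unordered pairs visited: K*(K-1)/2
};

// Visit every unordered pair (i < j) in a batch of K embeddings,
// accumulate the hinge pair loss, and divide by the number of pairs.
BatchLoss multibatch_loss(const std::vector<Embedding>& f,
                          const std::vector<int>& labels, double theta) {
  const std::size_t K = f.size();
  double sum = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < K; ++i) {
    for (std::size_t j = i + 1; j < K; ++j) {
      double d = 0.0;  // ||f_i - f_j||^2
      for (std::size_t k = 0; k < f[i].size(); ++k) {
        const double diff = f[i][k] - f[j][k];
        d += diff * diff;
      }
      const int y = (labels[i] == labels[j]) ? 1 : -1;
      sum += std::max(0.0, 1.0 - y * (theta - d));
      ++pairs;
    }
  }
  return {pairs > 0 ? sum / pairs : 0.0, pairs};
}
```

For a batch of 256 samples this visits 32,640 pairs, so each forward pass gets many pairwise training signals from a single batch of images.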

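The authors' experimental configuration: batch_size=256, made up of 16 identities with 16 images each; 2.6M training images; model size 1.3M; input images are 112x112 RGB; embedding length 128 (i.e. \(d=128\)); learning rate fixed at 0.01, lowered to 0.001 for the last epoch.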
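Positive pairs can only come from images of the same identity, so the batch layout determines how many of each kind of pair a batch yields. A small C++ sketch (the helper `count_pairs` is hypothetical) counts them for \(P\) identities with \(Q\) images each. These counts follow arithmetically from the stated configuration; the paper's own motivation for the layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Pair counts for a batch of P identities with Q images each (K = P * Q).
struct PairCounts {
  std::int64_t total;      // K*(K-1)/2 unordered pairs
  std::int64_t same;       // P * Q*(Q-1)/2 same-identity pairs
  std::int64_t different;  // total - same
};

PairCounts count_pairs(std::int64_t P, std::int64_t Q) {
  const std::int64_t K = P * Q;
  const std::int64_t total = K * (K - 1) / 2;
  const std::int64_t same = P * (Q * (Q - 1) / 2);
  return {total, same, total - same};
}
```

With \(P=Q=16\), a batch gives 32,640 pairs: 1,920 same-identity pairs and 30,720 different-identity pairs, i.e. exactly 16 negative pairs per positive pair.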

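Notes:

(1) The training images were collected from the internet and contain some labeling errors; the collection procedure follows "Deep Face Recognition", which is also a very instructive paper.

(2) The paper gives the network architecture in detail; it is based on the NIN (Network in Network) structure.

(3) Input images must be aligned first; the authors also used a deep network for alignment.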

Caffe implementation

I implemented this as a Caffe layer named PairLoss.

```cpp
#include <vector>

#include "caffe/layers/pair_loss_layer.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

template <typename Dtype>
void PairLossLayer<Dtype>::Reshape(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);
  // bottom[0]: N x D embeddings, bottom[1]: N labels
  CHECK_EQ(bottom[0]->num(), bottom[1]->count())
      << "Inputs must have the same num and one input vs. one label.";
  const int num = bottom[0]->num();
  CHECK_GT(num, 1) << "PairLoss needs at least two samples per batch.";
  // one row per unordered pair (i < j): num*(num-1)/2 rows
  sub_temp_.Reshape(num * (num - 1) / 2, bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  dot_temp_.Reshape(num * (num - 1) / 2, 1, 1, 1);
  use_global_stats_ = this->phase_ == TEST;
  // init theta (the learnable threshold)
  if (this->blobs_.size() == 0) {
    this->blobs_.resize(1);
    vector<int> sz(1);
    sz[0] = 1;
    this->blobs_[0].reset(new Blob<Dtype>(sz));
    this->blobs_[0]->mutable_cpu_data()[0] = Dtype(1.1);  // init with theta=1.1
    LOG(INFO) << "PairLossLayer parameter initialization successful";
  }
  if (use_global_stats_)
    this->param_propagate_down_.resize(this->blobs_.size(), false);
  else
    this->param_propagate_down_.resize(this->blobs_.size(), true);
}

template <typename Dtype>
void PairLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Dtype loss(0.0);
  const int num = bottom[0]->num();
  const int dim = bottom[0]->count(1);
  const Dtype* label = bottom[1]->cpu_data();
  const Dtype theta = this->blobs_[0]->cpu_data()[0];
  Dtype loss_pair(0.0);
  all_num = 0;           // number of pairs
  same_num = 0;          // number of same-identity pairs
  theta_num = 0;         // (#active different pairs) - (#active same pairs)
  int violated_num = 0;  // pairs with positive loss
  for (int i = 0; i < num; ++i) {
    const int label_value_i = static_cast<int>(label[i]);
    for (int j = i + 1; j < num; ++j) {
      const int label_value_j = static_cast<int>(label[j]);
      caffe_sub(
          dim,
          bottom[0]->cpu_data() + i * dim,                // x_i
          bottom[0]->cpu_data() + j * dim,                // x_j
          sub_temp_.mutable_cpu_data() + all_num * dim);  // x_i - x_j
      dot_temp_.mutable_cpu_data()[all_num] = caffe_cpu_dot(
          dim,
          sub_temp_.cpu_data() + all_num * dim,
          sub_temp_.cpu_data() + all_num * dim);          // ||x_i - x_j||^2
      if (label_value_j == label_value_i) {  // same identity
        loss_pair = Dtype(1.0) - theta + dot_temp_.cpu_data()[all_num];
        if (loss_pair > Dtype(0.0)) {
          loss += loss_pair;
          theta_num -= 1;
          violated_num++;
        }
        same_num++;
      } else {  // different identity
        loss_pair = Dtype(1.0) + theta - dot_temp_.cpu_data()[all_num];
        if (loss_pair > Dtype(0.0)) {
          loss += loss_pair;
          theta_num += 1;
          violated_num++;
        }
      }
      all_num++;
    }
  }
  // Backward is never run in a TEST net, so the test accuracy
  // (fraction of pairs that satisfy the margin) is reported here.
  if (use_global_stats_)
    top[0]->mutable_cpu_data()[0] = Dtype(all_num - violated_num) / all_num;
  else
    top[0]->mutable_cpu_data()[0] = loss / all_num;
}

template <typename Dtype>
void PairLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  const Dtype alpha = top[0]->cpu_diff()[0] / all_num;
  const int num = bottom[0]->num();
  const int dim = bottom[0]->count(1);
  const Dtype* label = bottom[1]->cpu_data();
  // diff to theta: accumulate, since the solver clears param diffs each iteration
  if (this->param_propagate_down_[0]) {
    this->blobs_[0]->mutable_cpu_diff()[0] += theta_num * alpha;
  }
  // diff to x
  if (propagate_down[0]) {
    const Dtype theta = this->blobs_[0]->cpu_data()[0];
    Dtype* bout = bottom[0]->mutable_cpu_diff();
    caffe_set(bottom[0]->count(), Dtype(0.0), bout);  // init diff
    int count = 0;
    for (int i = 0; i < num; ++i) {
      const int label_value_i = static_cast<int>(label[i]);
      for (int j = i + 1; j < num; ++j) {
        const int label_value_j = static_cast<int>(label[j]);
        const Dtype d = dot_temp_.cpu_data()[count];
        const Dtype* sub = sub_temp_.cpu_data() + count * dim;
        if (label_value_j == label_value_i &&
            Dtype(1.0) - theta + d > Dtype(0.0)) {  // active same-identity pair
          // x_i: +2*alpha*(x_i - x_j)
          caffe_cpu_axpby(dim, Dtype(2.0) * alpha, sub, Dtype(1.0), bout + i * dim);
          // x_j: -2*alpha*(x_i - x_j)
          caffe_cpu_axpby(dim, Dtype(-2.0) * alpha, sub, Dtype(1.0), bout + j * dim);
        }
        if (label_value_j != label_value_i &&
            Dtype(1.0) + theta - d > Dtype(0.0)) {  // active different-identity pair
          // x_i: -2*alpha*(x_i - x_j)
          caffe_cpu_axpby(dim, Dtype(-2.0) * alpha, sub, Dtype(1.0), bout + i * dim);
          // x_j: +2*alpha*(x_i - x_j)
          caffe_cpu_axpby(dim, Dtype(2.0) * alpha, sub, Dtype(1.0), bout + j * dim);
        }
        count++;
      }
    }
  }
}

// No GPU methods are declared, so Layer's default Forward_gpu/Backward_gpu
// fall back to the CPU versions; STUB_GPU would define undeclared members.
INSTANTIATE_CLASS(PairLossLayer);
REGISTER_LAYER_CLASS(PairLoss);

}  // namespace caffe
```
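`Backward_cpu` hard-codes the derivatives of an active pair: \(\pm 2(x_i-x_j)\) for the embeddings and \(\mp 1\) for \(\theta\), with the upper sign for same-identity pairs. A standalone finite-difference check is a cheap way to confirm those signs. The sketch below is plain C++ outside Caffe, with hypothetical names, and omits the \(\alpha\) scaling that the layer applies afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Single-pair hinge loss; y = +1 same identity, y = -1 different identity.
double pair_loss(const Vec& fi, const Vec& fj, int y, double theta) {
  double d = 0.0;
  for (std::size_t k = 0; k < fi.size(); ++k) d += (fi[k] - fj[k]) * (fi[k] - fj[k]);
  return std::max(0.0, 1.0 - y * (theta - d));
}

// Analytic gradient w.r.t. f_i, as in the caffe_cpu_axpby calls (before alpha).
Vec grad_fi(const Vec& fi, const Vec& fj, int y, double theta) {
  Vec g(fi.size(), 0.0);
  if (pair_loss(fi, fj, y, theta) <= 0.0) return g;  // inactive pair: zero gradient
  for (std::size_t k = 0; k < fi.size(); ++k) g[k] = 2.0 * y * (fi[k] - fj[k]);
  return g;
}

// Analytic gradient w.r.t. theta: -1 (same) or +1 (different) for an active pair.
double grad_theta(const Vec& fi, const Vec& fj, int y, double theta) {
  return pair_loss(fi, fj, y, theta) > 0.0 ? -static_cast<double>(y) : 0.0;
}

// Central finite difference of the loss w.r.t. component k of f_i.
double numeric_grad_fi(Vec fi, const Vec& fj, int y, double theta,
                       std::size_t k, double h = 1e-6) {
  fi[k] += h;
  const double up = pair_loss(fi, fj, y, theta);
  fi[k] -= 2.0 * h;
  const double down = pair_loss(fi, fj, y, theta);
  return (up - down) / (2.0 * h);
}
```

The gradient with respect to \(f_j\) is the negation of the one for \(f_i\), which is exactly the sign flip between each pair of `caffe_cpu_axpby` calls in the layer.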

Below is the header file:

```cpp
#ifndef CAFFE_PAIR_LOSS_LAYER_HPP_
#define CAFFE_PAIR_LOSS_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

#include "caffe/layers/loss_layer.hpp"

namespace caffe {

template <typename Dtype>
class PairLossLayer : public LossLayer<Dtype> {
 public:
  explicit PairLossLayer(const LayerParameter& param)
      : LossLayer<Dtype>(param) {}
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "PairLoss"; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  //virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      //const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  //virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      //const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

  bool use_global_stats_;
  int same_num, all_num, theta_num;
  Blob<Dtype> sub_temp_;  // x_i-x_j
  Blob<Dtype> dot_temp_;  // (x_i-x_j)^2
};

}  // namespace caffe

#endif  // CAFFE_PAIR_LOSS_LAYER_HPP_
```

1 0