Deep Hashing - DSH

Source: Internet. Published by: 网络作家工作室. Edited by: 程序博客网. Time: 2024/04/29 00:18

Paper: Deep Supervised Hashing for Fast Image Retrieval, CVPR 2016
Source code: https://github.com/lhmRyan/deep-supervised-hashing-DSH
The network structure in the paper looks like a combination of the CIFAR-10 network and a Siamese network:
[Figure: network structure from the paper]
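In my view the paper has two novel points:
1. The loss function is designed so that the output of the last layer is binary-like.
The paper modifies the contrastive loss by adding a regularization term: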
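
The formula image from the original post is missing here. The following is a reconstruction written to agree with the Forward_cpu code quoted later in this post and with my reading of the paper, so treat it as a sketch rather than a verbatim copy. Here y = 0 for a similar pair and y = 1 for a dissimilar pair, m is the margin (bi_margin in the code), and α is the regularization weight (tradeoff in the code).

```latex
L_r(b_1, b_2, y) =
    \tfrac{1}{2}\,(1 - y)\,\lVert b_1 - b_2 \rVert_2^2
  + \tfrac{1}{2}\,y\,\max\!\bigl(m - \lVert b_1 - b_2 \rVert_2^2,\ 0\bigr)
  + \alpha \bigl( \lVert\, |b_1| - \mathbf{1} \,\rVert_1
                + \lVert\, |b_2| - \mathbf{1} \,\rVert_1 \bigr)
```

The first two terms are the usual contrastive loss on the squared Euclidean distance; the last term is the added regularizer.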

This regularization term pushes every component of the final output features b1 and b2 toward -1 or +1.
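
To make the effect of the regularizer concrete, here is a minimal standalone sketch. It assumes the regularizer is the sum of | |b_k| - 1 | over the components (which is what the code in this post accumulates into reg), and that the binary code is obtained by taking the sign of each component. The function names BinaryRegularizer and Binarize are hypothetical and do not come from the DSH repository.

```cpp
#include <cmath>
#include <vector>

// Regularizer value for one output vector: sum_k | |b_k| - 1 |.
// It is zero exactly when every component is -1 or +1.
double BinaryRegularizer(const std::vector<double>& b) {
  double reg = 0.0;
  for (double v : b) {
    reg += std::abs(std::abs(v) - 1.0);
  }
  return reg;
}

// Quantize a real-valued output to a {-1, +1} code by taking the sign.
// Mapping 0 to +1 is an arbitrary choice made for this sketch.
std::vector<int> Binarize(const std::vector<double>& b) {
  std::vector<int> code;
  code.reserve(b.size());
  for (double v : b) {
    code.push_back(v >= 0.0 ? 1 : -1);
  }
  return code;
}
```

The closer the outputs already are to -1 or +1, the smaller the regularizer and the less information is lost when the sign is taken.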

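2. Generate image pairs online
At first I assumed the paper used a two-branch network similar to a Siamese network. It does not: it is just the single-branch network shown in the figure, and the key lies in the loss function the authors designed. Training works batch by batch. Suppose each batch feeds in n images with their n labels. Inside the loss function, two nested loops generate every possible image pair (i, j) without repetition, n(n−1)/2 pairs in total, and a pair counts as similar when i and j have the same label (in the multi-label branch of the code, when they share at least one label). The obvious benefit is a large saving in storage and computation. The paper says: To cover those image pairs across batches, in each iteration the training images are randomly selected from the whole training set.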
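
The pair generation described above can be sketched independently of Caffe. This is only an illustration for the single-label case; the Pair struct and the EnumeratePairs function are hypothetical names, and the real layer never materializes the pairs but processes them inside the loop.

```cpp
#include <cstddef>
#include <vector>

// One image pair inside a batch, with i < j.
struct Pair {
  std::size_t i;
  std::size_t j;
  bool similar;  // true when both images carry the same label
};

// Enumerate every unordered pair (i, j), i < j, of a batch of labels.
// A batch of n images yields n * (n - 1) / 2 pairs.
std::vector<Pair> EnumeratePairs(const std::vector<int>& labels) {
  std::vector<Pair> pairs;
  const std::size_t n = labels.size();
  if (n >= 2) {
    pairs.reserve(n * (n - 1) / 2);
  }
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      pairs.push_back({i, j, labels[i] == labels[j]});
    }
  }
  return pairs;
}
```

For a batch with labels {3, 5, 3, 7}, this produces 6 pairs, of which only (0, 2) is similar.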

template <typename Dtype>
void HashingLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // initialize parameters
  // The gradient is computed during the forward pass, together with the loss.
  Dtype* bout = bottom[0]->mutable_cpu_diff();
  const int num = bottom[0]->num();
  // top[0]->cpu_diff()[0] holds the weight of this loss layer (1.0 by default).
  const Dtype alpha = top[0]->cpu_diff()[0] / static_cast<Dtype>(num * (num - 1));
  const Dtype beta = top[0]->cpu_diff()[0] / static_cast<Dtype>(num);
  const int channels = bottom[0]->channels();
  Dtype margin = this->layer_param_.hashing_loss_param().bi_margin();
  // Trade-off coefficient between the contrastive loss and the regularization loss.
  Dtype tradeoff = this->layer_param_.hashing_loss_param().tradeoff();
  const int label_num = bottom[1]->count() / num;
  bool sim;
  Dtype loss(0.0);     // total loss
  Dtype reg(0.0);      // regularization loss
  Dtype data(0.0);     // value of one dimension of an input vector
  Dtype dist_sq(0.0);  // squared distance between two vectors
  caffe_set(channels*num, Dtype(0), bout);
  // calculate loss and gradient
  for (int i = 0; i < num; ++i) {
    for (int j = i+1; j < num; ++j) {
      caffe_sub(
          channels,
          bottom[0]->cpu_data()+(i*channels),  // a
          bottom[0]->cpu_data()+(j*channels),  // b
          diff_.mutable_cpu_data());  // a_i-b_j
      dist_sq = caffe_cpu_dot(channels, diff_.cpu_data(), diff_.cpu_data());  // D_w^2
      if (label_num > 1) {  // multi-label: similar if at least one label is shared
        sim = caffe_cpu_dot(label_num,
                            bottom[1]->cpu_data() + (i * label_num),
                            bottom[1]->cpu_data() + (j * label_num)) > 0;
      } else {
        sim = ((static_cast<int>(bottom[1]->cpu_data()[i])) ==
               (static_cast<int>(bottom[1]->cpu_data()[j])));
      }
      if (sim) {  // similar pairs
        loss += dist_sq;
        // gradient with respect to the first sample (input vector i)
        caffe_cpu_axpby(
            channels,
            alpha,
            diff_.cpu_data(),
            Dtype(1.0),
            bout + (i*channels));
        // gradient with respect to the second sample (input vector j)
        caffe_cpu_axpby(
            channels,
            -alpha,
            diff_.cpu_data(),
            Dtype(1.0),
            bout + (j*channels));
      } else {  // dissimilar pairs
        loss += std::max(margin - dist_sq, Dtype(0.0));
        if ((margin-dist_sq) > Dtype(0.0)) {
          // gradient with respect to the first sample (input vector i)
          caffe_cpu_axpby(
              channels,
              -alpha,
              diff_.cpu_data(),
              Dtype(1.0),
              bout + (i*channels));
          // gradient with respect to the second sample (input vector j)
          caffe_cpu_axpby(
              channels,
              alpha,
              diff_.cpu_data(),
              Dtype(1.0),
              bout + (j*channels));
        }
      }
    }  // end of inner loop
    // The regularizer applies to a single input vector (sample i) only.
    for (int k = 0; k < channels; k++) {
      data = *(bottom[0]->cpu_data()+(i*channels)+k);
      // gradient corresponding to the regularizer
      *(bout + (i*channels) + k) += beta * tradeoff *
          (((data>=Dtype(1.0))||(data<=Dtype(0.0)&&data>=Dtype(-1.0)))
               ? Dtype(1.0) : Dtype(-1.0));
      data = std::abs(data)-1;
      reg += std::abs(data);  // loss of the regularization part
    }
  }  // end of outer loop
  // Average each of the two loss parts separately, then add them.
  loss = loss / static_cast<Dtype>(bottom[0]->num()*(bottom[0]->num()-1));
  loss += tradeoff * reg / static_cast<Dtype>(bottom[0]->num());
  top[0]->mutable_cpu_data()[0] = loss;
}
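
For checking the layer's output by hand, it can help to have a Caffe-free version of the loss value alone. The sketch below mirrors the normalization used in Forward_cpu (pairwise sum divided by n(n-1), regularizer divided by n) but computes no gradients and handles only the single-label case. DshBatchLoss is a hypothetical name, and returning 0 for batches with fewer than two samples is a choice made for this sketch, not something the original layer does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Loss value for one batch. outputs[i] is the real-valued network output
// of sample i, labels[i] its single class label.
double DshBatchLoss(const std::vector<std::vector<double>>& outputs,
                    const std::vector<int>& labels,
                    double margin, double tradeoff) {
  const std::size_t n = outputs.size();
  if (n < 2) {
    return 0.0;  // sketch-only choice: no pairs can be formed
  }
  double pair_loss = 0.0;
  double reg = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      // Squared Euclidean distance between outputs i and j.
      double dist_sq = 0.0;
      for (std::size_t k = 0; k < outputs[i].size(); ++k) {
        const double d = outputs[i][k] - outputs[j][k];
        dist_sq += d * d;
      }
      if (labels[i] == labels[j]) {
        pair_loss += dist_sq;  // similar pair: pull together
      } else {
        pair_loss += std::max(margin - dist_sq, 0.0);  // dissimilar: push apart
      }
    }
    // Regularizer for sample i: sum_k | |b_k| - 1 |.
    for (double v : outputs[i]) {
      reg += std::abs(std::abs(v) - 1.0);
    }
  }
  return pair_loss / static_cast<double>(n * (n - 1)) +
         tradeoff * reg / static_cast<double>(n);
}
```

As a hand check, take outputs (1, 1), (1, -1), (0.5, 1) with labels 0, 1, 0, margin 5 and tradeoff 0.01. The pair terms are 1 (dissimilar, distance 4), 0.25 (similar) and 0.75 (dissimilar, distance 4.25), so the pairwise part is 2/6; the regularizer is 0.5, contributing 0.01 × 0.5 / 3.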

[Figure: convergence comparison of online and offline pair generation]
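As the figure shows, the online scheme converges faster than the traditional one. The reason is that, for the same total number of images fed to the network, say 2n images, the online scheme provides similarity information for 2n(2n−1)/2 = n(2n−1) image pairs, whereas the offline scheme (which takes input in the form (i, j, s_ij)) provides information for only n pairs.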
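
A quick worked count makes the gap concrete. The batch size below is an arbitrary illustration, not a setting taken from the paper.

```latex
\text{online pairs} = \binom{2n}{2} = \frac{2n(2n-1)}{2} = n(2n-1),
\qquad
\text{offline pairs} = n .

\text{For } n = 100 \ (\text{200 images}):\quad
n(2n-1) = 100 \times 199 = 19\,900
\quad \text{versus} \quad 100 .
```

So the number of pairs seen per batch grows quadratically with the batch size online, but only linearly offline.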

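The paper also notes that setting the number of output bits too large easily leads to overfitting, so a model with fewer bits should be trained first and then fine-tuned to the larger code length.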

0 0