SegNet batch normalization: analyzing the "invalid label value" error in the accuracy layer


Problem description

Following the SegNet-Tutorial instructions, run the batch normalization statistics script on the trained weights:

python ./Segnet/Scripts/compute_bn_statistics.py ./SegNet/Models/segnet_basic_train.prototxt ./SegNet/Models/Training/segnet_basic_iter_5000.caffemodel ./Segnet/Models/Inference/

The run fails with:

03 13:48:10.377765 11423 accuracy_layer.cpp:72] Check failed: label_value < num_labels (11 vs. 11)

Cause analysis

Why did this error never occur during training?
One reason: because of the test_iter setting, no test pass was ever actually run during training, so the Accuracy layer was never executed. This time the error is raised inside the Accuracy layer.

First, look at the last few layers defined in segnet_basic_train.prototxt:

layer {
  name: "conv_classifier"
  type: "Convolution"
  bottom: "conv_decode1"
  top: "conv_classifier"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 11
    kernel_size: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "conv_classifier"
  bottom: "label"
  top: "loss"
  softmax_param { engine: CAFFE }
  loss_param {
    weight_by_label_freqs: true
    ignore_label: 11
    class_weighting: 0.2595
    class_weighting: 0.1826
    class_weighting: 4.5640
    class_weighting: 0.1417
    class_weighting: 0.9051
    class_weighting: 0.3826
    class_weighting: 9.6446
    class_weighting: 1.8418
    class_weighting: 0.6823
    class_weighting: 6.2478
    class_weighting: 7.3614
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "conv_classifier"
  bottom: "label"
  top: "accuracy"
  top: "per_class_accuracy"
}

From num_output: 11 in the conv_classifier layer we know there are indeed 11 classes, so valid label values should be 0-10. Why, then, does the label file contain the value 11? Note also that the SoftmaxWithLoss layer carries the parameter:

ignore_label: 11

So the value 11 is in fact expected to appear in the label data.
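This can also be checked offline, without touching Caffe at all. A minimal sketch, assuming the annotations load as integer arrays via numpy; the toy array below merely stands in for a real label image:

```python
import numpy as np

def unique_label_values(label_array):
    """Return the sorted distinct label values found in an annotation array."""
    return sorted(int(v) for v in np.unique(np.asarray(label_array)))

# Toy 2x3 annotation standing in for a real 360x480 label image:
# classes 0-10 are valid, 11 marks pixels outside all 11 classes.
toy_label = np.array([[4, 4, 11],
                      [2, 3, 11]])
print(unique_label_values(toy_label))  # [2, 3, 4, 11]
```

If 11 shows up in the output for your real annotations, the label files are behaving exactly as the loss layer expects.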

To verify the label values, add debug logging to softmax_loss_layer.cu:

template <typename Dtype>
void SoftmaxWithLossLayer<Dtype>::Forward_gpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  softmax_layer_->Forward(softmax_bottom_vec_, softmax_top_vec_);
  const Dtype* prob_data = prob_.gpu_data();
  const Dtype* label = bottom[1]->gpu_data();
  //--- begin added debug logging
  printf("SoftmaxLossForwardGPU has_ignore_label_=%d,bottom[1].0=%d,bottom[1].1=%d,.2=%d,.3=%d\n",
         has_ignore_label_, bottom[1]->shape(0), bottom[1]->shape(1),
         bottom[1]->shape(2), bottom[1]->shape(3));
  const Dtype* bottom_label = bottom[1]->cpu_data();
  for (int i = 0; i < 1; ++i) {
    for (int j = 0; j < 360 * 480; ++j) {
      const int label_value =
          static_cast<int>(bottom_label[i * inner_num_ + j]);
      if (has_ignore_label_ && label_value == ignore_label_) {
        printf("ignore SoftmaxLossForwardGPU.label_value=%d\n", label_value);
        continue;
      } else {
        printf("%d ", label_value);
      }
    }
  }
  //--- end added debug logging
  // ... rest of Forward_gpu unchanged

The code above follows the pattern used in the accuracy layer. Since my training batch size is 1 and the images are 480x360, the two loops can be written as shown.

Output:

4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ignore SoftmaxLossForwardGPU.label_value=11
4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
4 4 4 4 4 4 ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 ignore SoftmaxLossForwardGPU.label_value=11
ignore SoftmaxLossForwardGPU.label_value=11

So the value 11 really does occur in the labels, which rules out a corrupted label file. The remaining question is why SoftmaxWithLoss raises no error while Accuracy does.
Looking at the layer definitions, the SoftmaxWithLoss parameters include ignore_label: 11.
Compare this with the SoftmaxWithLoss code:

template <typename Dtype>
__global__ void SoftmaxLossForwardGPU(const int nthreads,
          const Dtype* prob_data, const Dtype* label,
          const bool weight_by_label_freqs, const float* label_counts,
          Dtype* loss, const int num, const int dim, const int spatial_dim,
          const bool has_ignore_label_, const int ignore_label_,
          Dtype* counts) {
  CUDA_KERNEL_LOOP(index, nthreads) {
    const int n = index / spatial_dim;
    const int s = index % spatial_dim;
    const int label_value = static_cast<int>(label[n * spatial_dim + s]);
    if (has_ignore_label_ && label_value == ignore_label_) {
      loss[index] = 0;
      counts[index] = 0;
    } else {
      loss[index] = -log(max(prob_data[n * dim + label_value * spatial_dim + s],
                      Dtype(FLT_MIN)));
      if (weight_by_label_freqs) {
        loss[index] *= static_cast<Dtype>(label_counts[label_value]);
      }
      counts[index] = 1;
    }
  }
}

As the kernel shows, pixels carrying the ignore_label are handled specially: their loss and count are simply set to zero.
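The effect of that branch can be reproduced in a few lines of numpy. This is a hand-rolled sketch of per-pixel softmax cross-entropy, not the Caffe API; the shapes and names are my own:

```python
import numpy as np

def softmax_loss_ignore(logits, labels, ignore_label=11):
    """Mean per-pixel cross-entropy that skips ignored pixels,
    mirroring the loss[index]=0 / counts[index]=0 branch of the kernel.
    logits: (num_classes, n_pixels), labels: (n_pixels,)."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    prob = e / e.sum(axis=0, keepdims=True)
    loss = np.zeros(len(labels))
    counts = np.zeros(len(labels))
    for i, lab in enumerate(labels):
        if lab == ignore_label:
            continue  # loss and count stay 0, like the CUDA kernel
        # without the guard above, prob[11, i] would index out of bounds,
        # since there are only rows 0..10
        loss[i] = -np.log(max(prob[lab, i], np.finfo(float).tiny))
        counts[i] = 1
    return loss.sum() / max(counts.sum(), 1)

# 11 classes, uniform logits: each valid pixel contributes -log(1/11)
labels = np.array([4, 2, 11, 3, 11])
print(softmax_loss_ignore(np.zeros((11, 5)), labels))  # log(11), about 2.3979
```

The out-of-bounds comment is the key point: label 11 never reaches the indexing step, which is exactly why this layer stays silent.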

The accuracy layer code also supports filtering an ignore_label, but only when one is configured:

template <typename Dtype>
void AccuracyLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Dtype accuracy = 0;
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const int dim = bottom[0]->count() / outer_num_;
  const int num_labels = bottom[0]->shape(label_axis_);
  vector<Dtype> maxval(top_k_ + 1);
  vector<int> max_id(top_k_ + 1);
  vector<Dtype> accuracies(num_labels, 0);
  vector<Dtype> nums(num_labels, 0);
  int count = 0;
  for (int i = 0; i < outer_num_; ++i) {
    for (int j = 0; j < inner_num_; ++j) {
      const int label_value =
          static_cast<int>(bottom_label[i * inner_num_ + j]);
      if (has_ignore_label_ && label_value == ignore_label_) {
        continue;
      }
      DCHECK_GE(label_value, 0);
      DCHECK_LT(label_value, num_labels);
      // ...
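The failure mode is easy to reproduce outside Caffe. A minimal Python imitation of this loop (my own sketch, not the Caffe API) shows the check firing when label 11 arrives with no ignore_label configured, and passing once it is set:

```python
def pixel_accuracy(preds, labels, num_labels=11, ignore_label=None):
    """Sketch of the inner loop of AccuracyLayer::Forward_cpu."""
    correct, count = 0, 0
    for p, lab in zip(preds, labels):
        if ignore_label is not None and lab == ignore_label:
            continue  # the same skip that accuracy_param { ignore_label: 11 } enables
        if not (0 <= lab < num_labels):
            # corresponds to: Check failed: label_value < num_labels (11 vs. 11)
            raise ValueError("label_value < num_labels (%d vs. %d)" % (lab, num_labels))
        correct += int(p == lab)
        count += 1
    return correct / max(count, 1)

preds  = [4, 2, 0, 3]
labels = [4, 2, 11, 3]
# pixel_accuracy(preds, labels) raises ValueError, like the unconfigured layer
print(pixel_accuracy(preds, labels, ignore_label=11))  # 1.0
```

With ignore_label set, the void pixel is excluded from both the numerator and the denominator, so accuracy is computed over the three valid pixels only.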

So the direct fix is to add the ignore_label parameter to the Accuracy layer:

layer {  name: "accuracy"  type: "Accuracy"  bottom: "conv_classifier"  bottom: "label"  top: "accuracy"  top: "per_class_accuracy"  accuracy_param:{    ignore_label: 11  }}

For the exact syntax (e.g. the field name accuracy_param), see the caffe.proto file.

Verifying the result

gumh@gumh-B85M-DS3H-A:~/OpenSource/SegNet$ python ./Scripts/compute_bn_statistics.py ./Models/segnet_basic_train.prototxt ./Models/Training_better/segnet_basic_iter_5000.caffemodel ./Models/Inference/
Building BN calc net...
Calculate BN stats...
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1103 15:02:45.675106 21206 net.cpp:42] Initializing net from parameters: name: "segnet"
...
I1103 15:02:46.122388 21210 dense_image_data_layer.cpp:201] label values:progress: 1/367
I1103 15:02:46.472606 21211 dense_image_data_layer.cpp:201] label values:progress: 2/367
...
I1103 15:04:54.483098 21602 dense_image_data_layer.cpp:201] label values:progress: 364/367
I1103 15:04:54.831765 21603 dense_image_data_layer.cpp:201] label values:progress: 365/367
I1103 15:04:55.179713 21604 dense_image_data_layer.cpp:201] label values:progress: 366/367
New data:
[u'conv3_bn', u'conv1_bn', u'conv2_bn', u'conv_decode4_bn', u'conv4_bn', u'conv_decode3_bn', u'conv_decode1_bn', u'conv_decode2_bn']
[u'conv3_bn', u'conv1_bn', u'conv2_bn', u'conv_decode4_bn', u'conv4_bn', u'conv_decode3_bn', u'conv_decode1_bn', u'conv_decode2_bn']
Saving test net weights...
done
gumh@gumh-B85M-DS3H-A:~/OpenSource/SegNet$

Why does a label value of 11 appear at all? It is a "void" class: it marks pixels belonging to none of the 11 named classes above.
