Caffe Source Code Study Notes, Part 10: data_layer


I. Preface

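When building a CNN in Caffe, the first layer is the data layer, so this part walks through the DataLayer family, which is also quite large.
First, the class hierarchy:
[Figure: inheritance diagram of the data layer classes; image not available]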

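Layer: the base class of all layers.

BaseDataLayer: the base class of the data layers.

BasePrefetchingDataLayer: the prefetching layer. It reads several batches of data ahead of time to balance CPU-to-GPU bandwidth against GPU compute speed. As the inheritance hierarchy shows, this layer is where the multithreading system does most of its work.
In fact, multithreading in Caffe exists mainly to serve the GPU by preparing data for it.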

BasePrefetchingDataLayer, together with the BaseDataLayer it inherits from, is responsible for processing the data. This mainly involves two things:

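1. Decide the final outputs of the data layer (the label output is optional).
2. Preprocess the data (usually simple normalization, such as subtracting the mean and multiplying by a scale factor).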
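The two preprocessing operations mentioned above (subtract a mean, multiply by a scale) can be sketched as follows. This is a simplified stand-alone illustration, not Caffe's DataTransformer: the helper name `normalize_pixels` and the use of a single scalar mean are assumptions made for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Caffe): converts raw bytes to floats,
// subtracts a scalar mean, then multiplies by a scale factor.
std::vector<float> normalize_pixels(const std::vector<std::uint8_t>& raw,
                                    float mean, float scale) {
  std::vector<float> out;
  out.reserve(raw.size());
  for (std::uint8_t v : raw) {
    out.push_back((static_cast<float>(v) - mean) * scale);
  }
  return out;
}
```

For example, with a mean of 128 and a scale of 0.5, the byte 0 maps to -64 and the byte 128 maps to 0.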

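DataLayer: the data layer, the first layer of the network. Its main job is to read training data from the two supported kinds of DB (LevelDB and LMDB). By default, both store each record as a structure called Datum, and this layer reads each Datum into a blob.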

message Datum {
  optional int32 channels = 1;
  optional int32 height = 2;
  optional int32 width = 3;
  // the actual image data, in bytes
  optional bytes data = 4;
  optional int32 label = 5;
  // Optionally, the datum could also hold float data.
  repeated float float_data = 6;
  // If true data contains an encoded image that need to be decoded
  optional bool encoded = 7 [default = false];
}

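The remaining layers handle other storage formats and data sources:
1. HDF5: covers both reading data from disk and writing data to disk.
2. ImageDataLayer: reads image files directly.
3. MemoryDataLayer: reads from memory, which intuitively should be fast.
4. WindowDataLayer: reads windows cropped from image data, probably with the help of OpenCV.
5. DummyDataLayer: outputs data generated by a Filler.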

II. The base_data_layer files

base_data_layer.hpp and base_data_layer.cpp together define three classes: BaseDataLayer, Batch, and BasePrefetchingDataLayer.
1. The Batch class
A Batch is simply data plus labels, each stored as a Blob.

template <typename Dtype>
class Batch {
 public:
  Blob<Dtype> data_, label_;
};

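2. The BaseDataLayer class
This is the base class of the data layers. It implements only two member functions itself:
a. Constructor
Because it inherits from Layer, it first constructs the Layer base class.
It then initializes its member transform_param_ from param.transform_param(), in preparation for reshaping or preprocessing the data.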

template <typename Dtype>
BaseDataLayer<Dtype>::BaseDataLayer(const LayerParameter& param)
    : Layer<Dtype>(param),
      transform_param_(param.transform_param()) {
}

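b. The LayerSetUp function
This initializes the data layer. The number of top blobs decides what is output: if there is only one top blob, the layer outputs data only, without labels.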

template <typename Dtype>
void BaseDataLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  if (top.size() == 1) {
    output_labels_ = false;
  } else {
    output_labels_ = true;
  }
  // Create the DataTransformer instance that performs the preprocessing.
  data_transformer_.reset(
      new DataTransformer<Dtype>(transform_param_, this->phase_));
  data_transformer_->InitRand();
  // The subclasses should setup the size of bottom and top
  // The layer-specific initialization happens in DataLayerSetUp, a virtual
  // function that each derived class overrides.
  DataLayerSetUp(bottom, top);
}

3. The BasePrefetchingDataLayer class
This class inherits from both InternalThread and BaseDataLayer, so prefetching runs on its own thread.

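a. Purpose
GPU compute speed and bandwidth differ considerably from the CPU's, so several batches need to be fetched ahead of time while the GPU is computing. This class implements exactly that.
b. Member variables
As the members show, batch-level prefetching uses a pair of blocking queues.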

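vector<shared_ptr<Batch<Dtype> > > prefetch_: the container holding the prefetched batches.
BlockingQueue<Batch<Dtype>*> prefetch_free_: the producer queue (empty batches waiting to be filled).
BlockingQueue<Batch<Dtype>*> prefetch_full_: the consumer queue (filled batches waiting to be consumed).
Batch<Dtype>* prefetch_current_: a pointer to the batch currently in use.
Blob<Dtype> transformed_data_: note that the members above all work at batch level, whereas this one is a single Blob.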

template <typename Dtype>
class BasePrefetchingDataLayer :
    public BaseDataLayer<Dtype>, public InternalThread {
 public:
  explicit BasePrefetchingDataLayer(const LayerParameter& param);
  // LayerSetUp: implements common data layer setup functionality, and calls
  // DataLayerSetUp to do special data layer setup for individual layer types.
  // This method may not be overridden.
  void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

 protected:
  virtual void InternalThreadEntry();  // body of the prefetch thread
  // Pure virtual: each concrete data layer must implement it.
  virtual void load_batch(Batch<Dtype>* batch) = 0;

  vector<shared_ptr<Batch<Dtype> > > prefetch_;  // the prefetched batches
  BlockingQueue<Batch<Dtype>*> prefetch_free_;
  BlockingQueue<Batch<Dtype>*> prefetch_full_;
  Batch<Dtype>* prefetch_current_;  // pointer to the current batch

  Blob<Dtype> transformed_data_;  // the transformed (preprocessed) data
};

c. Constructor

template <typename Dtype>
BasePrefetchingDataLayer<Dtype>::BasePrefetchingDataLayer(
    const LayerParameter& param)
    : BaseDataLayer<Dtype>(param),
      prefetch_(param.data_param().prefetch()),
      // The blocking queues are default-initialized.
      prefetch_free_(), prefetch_full_(), prefetch_current_() {
  for (int i = 0; i < prefetch_.size(); ++i) {
    // Allocate one Batch per prefetch slot.
    prefetch_[i].reset(new Batch<Dtype>());
    // Seed the producer queue with every (still empty) batch.
    prefetch_free_.push(prefetch_[i].get());
  }
}

d. The LayerSetUp function
After the relevant data structures are initialized, the prefetch thread is started.

template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::LayerSetUp(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  BaseDataLayer<Dtype>::LayerSetUp(bottom, top);  // call the parent's setup

  // Explanation from the Caffe authors: before starting the prefetch thread,
  // mutable_cpu_data / mutable_gpu_data must be called explicitly so that the
  // threads do not end up making these calls at the same time. On some GPUs,
  // skipping this step causes failures.
  for (int i = 0; i < prefetch_.size(); ++i) {
    prefetch_[i]->data_.mutable_cpu_data();
    if (this->output_labels_) {
      prefetch_[i]->label_.mutable_cpu_data();
    }
  }
#ifndef CPU_ONLY
  if (Caffe::mode() == Caffe::GPU) {
    for (int i = 0; i < prefetch_.size(); ++i) {
      prefetch_[i]->data_.mutable_gpu_data();
      if (this->output_labels_) {
        prefetch_[i]->label_.mutable_gpu_data();
      }
    }
  }
#endif
  DLOG(INFO) << "Initializing prefetch";
  this->data_transformer_->InitRand();  // initialize the random seed
  // Start the prefetch thread. Startup carries over the global settings
  // and creates the boost::thread.
  StartInternalThread();
  DLOG(INFO) << "Prefetch initialized.";
}

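e. InternalThreadEntry()
As mentioned in the earlier note on the InternalThread module, InternalThread leaves InternalThreadEntry() unimplemented; the derived class provides the implementation.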

template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::InternalThreadEntry() {
#ifndef CPU_ONLY
  cudaStream_t stream;  // create a CUDA stream
  if (Caffe::mode() == Caffe::GPU) {
    CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
  }
#endif

  try {
    while (!must_stop()) {
      // Take an empty batch from the producer (free) queue.
      Batch<Dtype>* batch = prefetch_free_.pop();
      // load_batch is pure virtual; every derived class must implement it.
      load_batch(batch);
#ifndef CPU_ONLY
      if (Caffe::mode() == Caffe::GPU) {
        // In GPU mode, push the data to the GPU asynchronously on the
        // stream and then synchronize. async_gpu_push was covered in the
        // SyncedMemory note.
        batch->data_.data().get()->async_gpu_push(stream);
        if (this->output_labels_) {
          batch->label_.data().get()->async_gpu_push(stream);
        }
        CUDA_CHECK(cudaStreamSynchronize(stream));
      }
#endif
      // Hand the filled batch to the consumer (full) queue.
      prefetch_full_.push(batch);
    }
  } catch (boost::thread_interrupted&) {
    // Interrupted exception is expected on shutdown
  }
#ifndef CPU_ONLY
  if (Caffe::mode() == Caffe::GPU) {
    CUDA_CHECK(cudaStreamDestroy(stream));  // destroy the stream
  }
#endif
}

f. Forward_cpu() of the data layer

template <typename Dtype>
void BasePrefetchingDataLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  if (prefetch_current_) {
    prefetch_free_.push(prefetch_current_);
  }
  // Pop a batch from the consumer queue. A condition variable inside the
  // queue synchronizes access between the threads.
  prefetch_current_ = prefetch_full_.pop("Waiting for data");
  // Reshape top to match the shape of the batch.
  top[0]->ReshapeLike(prefetch_current_->data_);
  // Point top at the batch's data.
  top[0]->set_cpu_data(prefetch_current_->data_.mutable_cpu_data());
  if (this->output_labels_) {
    // Reshape to loaded labels.
    top[1]->ReshapeLike(prefetch_current_->label_);
    top[1]->set_cpu_data(prefetch_current_->label_.mutable_cpu_data());
  }
}
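To make the free/full recycling between InternalThreadEntry() and Forward_cpu() concrete, here is a minimal stand-alone sketch using only the C++ standard library. It is not Caffe code: `SimpleBlockingQueue`, `ToyBatch`, and `run_prefetch_demo` are hypothetical names, and "loading" a batch is reduced to writing one integer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue: pop() waits until an element is available.
template <typename T>
class SimpleBlockingQueue {
 public:
  void push(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(item);
    }
    cond_.notify_one();
  }
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return !queue_.empty(); });
    T item = queue_.front();
    queue_.pop();
    return item;
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable cond_;
};

struct ToyBatch { int value = -1; };  // stand-in for Batch<Dtype>

// Produces `num_batches` batches with only `num_buffers` buffers and
// returns the values in the order the consumer saw them.
std::vector<int> run_prefetch_demo(int num_buffers, int num_batches) {
  assert(num_buffers >= 1);
  std::vector<ToyBatch> buffers(num_buffers);
  SimpleBlockingQueue<ToyBatch*> free_queue, full_queue;
  for (ToyBatch& b : buffers) free_queue.push(&b);

  // Producer: plays the role of InternalThreadEntry() plus load_batch().
  std::thread producer([&] {
    for (int i = 0; i < num_batches; ++i) {
      ToyBatch* batch = free_queue.pop();  // wait for an empty buffer
      batch->value = i;                    // "load" the batch
      full_queue.push(batch);              // hand it to the consumer
    }
  });

  // Consumer: plays the role of Forward_cpu().
  std::vector<int> seen;
  ToyBatch* current = nullptr;
  for (int i = 0; i < num_batches; ++i) {
    if (current) free_queue.push(current);  // recycle the previous buffer
    current = full_queue.pop();             // wait for a filled buffer
    seen.push_back(current->value);
  }
  producer.join();
  return seen;
}
```

Calling `run_prefetch_demo(3, 10)` pushes ten batches through three buffers. The consumer returns a buffer to the free queue only when it asks for the next one, which mirrors how prefetch_current_ is handled above.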

III. data_layer
This layer inherits from the prefetching layer.
The header is as follows:

template <typename Dtype>
class DataLayer : public BasePrefetchingDataLayer<Dtype> {
 public:
  explicit DataLayer(const LayerParameter& param);
  virtual ~DataLayer();
  virtual void DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  // DataLayer uses DataReader instead for sharing for parallelism?
  // (The version I am reading has no DataReader.)
  // Whether the layer is shared between solvers in parallel training.
  virtual inline bool ShareInParallel() const { return false; }
  virtual inline const char* type() const { return "Data"; }
  virtual inline int ExactNumBottomBlobs() const { return 0; }
  virtual inline int MinTopBlobs() const { return 1; }
  virtual inline int MaxTopBlobs() const { return 2; }

 protected:
  void Next();  // advance the cursor
  bool Skip();  // skip records that belong to other solvers
  // Read image data from the database into the batch.
  virtual void load_batch(Batch<Dtype>* batch);

  // In earlier versions the following three members were wrapped by the
  // DataReader class, which now seems to be gone.
  shared_ptr<db::DB> db_;          // the database
  shared_ptr<db::Cursor> cursor_;  // cursor used to iterate over the database
  // Number of records the cursor has advanced; Skip() uses it to split
  // records among solvers.
  uint64_t offset_;
};

Implementation:

template <typename Dtype>
DataLayer<Dtype>::DataLayer(const LayerParameter& param)
  : BasePrefetchingDataLayer<Dtype>(param),
    offset_() {
  // The base-class constructor prepares the prefetch buffers; the prefetch
  // thread itself is started later, in LayerSetUp.
  // Choose the database backend from the protobuf parameters.
  db_.reset(db::GetDB(param.data_param().backend()));
  db_->Open(param.data_param().source(), db::READ);  // open the database
  cursor_.reset(db_->NewCursor());  // create the cursor
}

template <typename Dtype>
DataLayer<Dtype>::~DataLayer() {
  this->StopInternalThread();  // the destructor stops the thread
}

template <typename Dtype>
void DataLayer<Dtype>::DataLayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const int batch_size = this->layer_param_.data_param().batch_size();
  Datum datum;  // one image record
  // Parse the record the cursor currently points at.
  datum.ParseFromString(cursor_->value());

  // Infer the top shape from the shape of the datum.
  vector<int> top_shape = this->data_transformer_->InferBlobShape(datum);
  this->transformed_data_.Reshape(top_shape);  // reshape to the inferred shape
  // Reshape top[0] and prefetch_data according to the batch_size.
  // The shape (channel, height, width) is known now, so set batch_size:
  //   top_shape[0] = batch_size
  //   top_shape[1] = channel
  //   top_shape[2] = height
  //   top_shape[3] = width
  top_shape[0] = batch_size;
  top[0]->Reshape(top_shape);
  // Set the shape of the prefetched data.
  for (int i = 0; i < this->prefetch_.size(); ++i) {
    this->prefetch_[i]->data_.Reshape(top_shape);
  }
  LOG_IF(INFO, Caffe::root_solver())
      << "output data size: " << top[0]->num() << ","
      << top[0]->channels() << "," << top[0]->height() << ","
      << top[0]->width();
  // label
  if (this->output_labels_) {
    vector<int> label_shape(1, batch_size);
    top[1]->Reshape(label_shape);
    for (int i = 0; i < this->prefetch_.size(); ++i) {
      this->prefetch_[i]->label_.Reshape(label_shape);
    }
  }
}

template <typename Dtype>
bool DataLayer<Dtype>::Skip() {
  int size = Caffe::solver_count();  // number of parallel solvers
  int rank = Caffe::solver_rank();   // index of this solver
  bool keep = (offset_ % size) == rank ||
              // In test mode, only rank 0 runs, so avoid skipping
              this->layer_param_.phase() == TEST;
  // Records whose offset does not map to this solver's rank are skipped.
  return !keep;
}

template<typename Dtype>
void DataLayer<Dtype>::Next() {
  cursor_->Next();
  if (!cursor_->valid()) {
    // The cursor has reached the end; start over from the first record.
    LOG_IF(INFO, Caffe::root_solver())
        << "Restarting data prefetching from start.";
    cursor_->SeekToFirst();
  }
  offset_++;  // count the record just passed
}

// This function is called on prefetch thread
// It loads records from the database into the batch.
template<typename Dtype>
void DataLayer<Dtype>::load_batch(Batch<Dtype>* batch) {
  CPUTimer batch_timer;
  batch_timer.Start();
  double read_time = 0;
  double trans_time = 0;
  CPUTimer timer;
  CHECK(batch->data_.count());
  CHECK(this->transformed_data_.count());
  const int batch_size = this->layer_param_.data_param().batch_size();

  Datum datum;  // a single image record
  for (int item_id = 0; item_id < batch_size; ++item_id) {
    timer.Start();
    while (Skip()) {
      Next();
    }
    datum.ParseFromString(cursor_->value());  // the record read from the DB
    read_time += timer.MicroSeconds();

    if (item_id == 0) {
      // Infer the shape from the first record of each batch. A Blob has
      // shape [batch_size, channels, height, width]; the last three
      // dimensions can be inferred from the Datum.
      vector<int> top_shape = this->data_transformer_->InferBlobShape(datum);
      this->transformed_data_.Reshape(top_shape);
      top_shape[0] = batch_size;
      batch->data_.Reshape(top_shape);
    }

    // The transformer provides the way to stack Datums into a Blob.
    timer.Start();
    // Position of this Datum inside the Blob: offset = Blob.offset(i),
    // where i is the index of the sample within the batch.
    int offset = batch->data_.offset(item_id);
    // The Blob's shape must be known beforehand, and mutable_cpu_data()
    // triggers SyncedMemory to allocate the actual memory.
    Dtype* top_data = batch->data_.mutable_cpu_data();
    this->transformed_data_.set_cpu_data(top_data + offset);
    this->data_transformer_->Transform(datum, &(this->transformed_data_));
    if (this->output_labels_) {
      Dtype* top_label = batch->label_.mutable_cpu_data();
      top_label[item_id] = datum.label();
    }
    trans_time += timer.MicroSeconds();
    Next();
  }
  timer.Stop();
  batch_timer.Stop();
  DLOG(INFO) << "Prefetch batch: " << batch_timer.MilliSeconds() << " ms.";
  DLOG(INFO) << "     Read time: " << read_time / 1000 << " ms.";
  DLOG(INFO) << "Transform time: " << trans_time / 1000 << " ms.";
}
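The interplay of Skip() and Next() is easier to see with concrete numbers. The sketch below is a stand-alone model, not Caffe code: `records_read_by` is a hypothetical function, the database is reduced to `num_records` consecutive indices, and the TEST-phase exception (no skipping) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the record indices that the solver with the given rank reads for
// one batch, following the same Skip()/Next() pattern as load_batch().
std::vector<int> records_read_by(int rank, int solver_count,
                                 int num_records, int batch_size) {
  int cursor = 0;            // position in the (toy) database
  std::uint64_t offset = 0;  // number of records passed so far
  auto next = [&] {
    cursor = (cursor + 1) % num_records;  // wrap around, like SeekToFirst()
    ++offset;
  };
  auto skip = [&] {
    return offset % static_cast<std::uint64_t>(solver_count) !=
           static_cast<std::uint64_t>(rank);
  };
  std::vector<int> read;
  for (int item = 0; item < batch_size; ++item) {
    while (skip()) next();   // pass over records owned by other solvers
    read.push_back(cursor);  // "parse" the record at the cursor
    next();
  }
  return read;
}
```

With 5 records, 2 solvers, and a batch size of 4, rank 0 reads records 0, 2, 4, 1 and rank 1 reads records 1, 3, 0, 2. The solvers interleave over the database, and the wrap-around shifts which records each one sees on the next pass.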

Don't forget to instantiate the class and register the layer:

INSTANTIATE_CLASS(DataLayer);
REGISTER_LAYER_CLASS(Data);
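The idea behind registering a layer is that a type string such as "Data" gets mapped to a function that creates the layer, so a network can be built from its text description. The following is a toy model of that pattern only. Every name in it (`ToyLayer`, `ToyLayerRegistry`, `ToyRegisterer`, `ToyDataLayer`) is hypothetical, and Caffe's real macros and registry differ in detail.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct ToyLayer {
  virtual ~ToyLayer() = default;
  virtual const char* type() const = 0;
};

// Hypothetical registry: maps a type string to a factory function.
class ToyLayerRegistry {
 public:
  using Creator = std::function<std::unique_ptr<ToyLayer>()>;
  static std::map<std::string, Creator>& table() {
    static std::map<std::string, Creator> t;  // constructed on first use
    return t;
  }
  static std::unique_ptr<ToyLayer> create(const std::string& type) {
    auto it = table().find(type);
    if (it == table().end()) return nullptr;  // unknown layer type
    return it->second();
  }
};

// A static object of this type registers a creator before main() runs.
struct ToyRegisterer {
  ToyRegisterer(const std::string& type, ToyLayerRegistry::Creator creator) {
    ToyLayerRegistry::table()[type] = std::move(creator);
  }
};

struct ToyDataLayer : ToyLayer {
  const char* type() const override { return "Data"; }
};

// Roughly what a registration macro would expand to in this toy model.
static ToyRegisterer g_register_data(
    "Data", [] { return std::unique_ptr<ToyLayer>(new ToyDataLayer()); });
```

In this model, `ToyLayerRegistry::create("Data")` returns a `ToyDataLayer`, and an unregistered name returns a null pointer.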

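IV. Summary
Most articles that explain this module include the DataReader module, with which the whole data layer can be described as a two-level buffering system.
The first level reads individual image records from the database and stores batch_size of them in one batch.
The second level works in units of batches, using the pair of blocking queues to cycle several batches through the prefetch_ container.
The following figure illustrates this:
[Figure: two-level buffering in the data layer; image not available]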
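However, I found that the current version no longer has the DataReader class and reads from the database directly. The overall flow is largely unchanged.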

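At the first level, each Datum read from the database is placed into the batch in Blob layout. The offset computed by Blob gives the position at which each item is written.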
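The offset in question is the usual row-major index into an N x C x H x W array. The sketch below reproduces that formula in a stand-alone function; `blob_offset` is a hypothetical name, and the function takes the shape as a plain vector instead of reading it from a Blob.

```cpp
#include <cassert>
#include <vector>

// Linear index of element (n, c, h, w) in a row-major N x C x H x W array.
int blob_offset(const std::vector<int>& shape, int n, int c = 0,
                int h = 0, int w = 0) {
  assert(shape.size() == 4);
  const int C = shape[1], H = shape[2], W = shape[3];
  return ((n * C + c) * H + h) * W + w;
}
```

For a batch of shape {4, 3, 2, 2}, item 1 starts at index 1 * 3 * 2 * 2 = 12, which is where its transformed pixels are written.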

0 0