Caffe code reading 3: implementation details of data_reader, internalthread and blocking_queue (2016.3.15)

Source: Internet. Editor: 程序博客网. Time: 2024/05/21 21:35

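(1) data_reader.cpp

First, a short introduction to boost::weak_ptr.

Weak references exist to solve the problem that objects held by shared_ptr are never freed when they reference each other in a cycle.

A weak reference does not keep the object it refers to alive; it is only a reference that is valid while the object exists. It does not modify the object's reference count, which means it takes no part in managing the object's memory. In that respect it behaves like a raw pointer, with one important difference: a weak reference can detect whether the object it refers to has already been released, which avoids accessing invalid memory.

Because a weak reference does not change the reference count, making one side of a reference cycle a weak reference is enough to break the cycle.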
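The two properties described above (no effect on the reference count, and the ability to detect a released object) can be checked with a small sketch. It uses std::weak_ptr from C++11 rather than boost::weak_ptr, and the Parent, Child and parent_is_released names are made up for illustration.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent {
  std::shared_ptr<Child> child;   // owning reference: Parent keeps Child alive
};

struct Child {
  std::weak_ptr<Parent> parent;   // non-owning back reference: breaks the cycle
};

// Returns true if the Parent is destroyed as soon as the last shared_ptr to it
// goes away, even though the Child still refers back to it.
bool parent_is_released() {
  std::weak_ptr<Parent> observer;
  {
    std::shared_ptr<Parent> p = std::make_shared<Parent>();
    p->child = std::make_shared<Child>();
    p->child->parent = p;            // weak: does not increase the use count
    assert(p.use_count() == 1);
    observer = p;
    assert(!observer.expired());     // the object is still alive here
    assert(observer.lock() != nullptr);
  }  // p goes out of scope: Parent is destroyed, then Child
  // A raw pointer would now dangle; the weak_ptr reports the release instead.
  return observer.expired() && observer.lock() == nullptr;
}
```

If Child::parent were a shared_ptr instead, the two objects would keep each other alive and neither destructor would ever run.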

1) Member variables of the DataReader class:

    shared_ptr<Body> body_;
    static map<const string, boost::weak_ptr<DataReader::Body> > bodies_;
    const shared_ptr<QueuePair> queue_pair_;

2) The constructor:

    explicit DataReader(const LayerParameter& param);

and two inline functions:

    inline BlockingQueue<Datum*>& free() const {
      return queue_pair_->free_;
    }
    inline BlockingQueue<Datum*>& full() const {
      return queue_pair_->full_;
    }

3) In addition:

A nested class Body, which inherits from InternalThread.

A nested class QueuePair, which holds the two queues free_ and full_ (exposed through the free() and full() functions above) and is used to share data between the body and the readers.

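(2) DataReader also depends on another class, BlockingQueue, which lives in util/blocking_queue.hpp.

1) BlockingQueue has the following member functions:

    void push(const T& t);
    bool try_pop(T* t);
    T pop(const string& log_on_wait = "");
    bool try_peek(T* t);
    T peek();
    size_t size() const;

2) It also has a nested class sync, which holds the synchronization and mutual-exclusion primitives.

The class is defined as follows:

    template<typename T>
    class BlockingQueue<T>::sync {
     public:
      mutable boost::mutex mutex_;
      boost::condition_variable condition_;
    };

It contains a mutex, mutex_,

and a condition variable, condition_.

3) The member variables are:

    std::queue<T> queue_;
    shared_ptr<sync> sync_;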

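BlockingQueue's push function is implemented as follows:

    template<typename T>
    void BlockingQueue<T>::push(const T& t) {
      boost::mutex::scoped_lock lock(sync_->mutex_);  // locks are discussed in detail below
      queue_.push(t);
      lock.unlock();
      sync_->condition_.notify_one();
    }

It first acquires the lock (blocking until the mutex is available), then pushes the data onto the queue (queue_ is a std::queue<T>), then unlocks, and finally notifies one waiting thread through the condition variable.

BlockingQueue's try_pop function is implemented as follows. It never blocks: if the queue is empty it returns false immediately.

    template<typename T>
    bool BlockingQueue<T>::try_pop(T* t) {
      boost::mutex::scoped_lock lock(sync_->mutex_);
      if (queue_.empty()) {
        return false;
      }
      *t = queue_.front();
      queue_.pop();
      return true;
    }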

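A short digression on mutex locks.

The code above relies on this typedef:

    typedef unique_lock<mutex> scoped_lock;

So scoped_lock is a unique_lock<mutex>. The reference document below says of its standard-library counterpart:

std::unique_lock<std::mutex> is the tool of choice when your locking needs are more complex than a simple lock at the beginning followed unconditionally by an unlock at the end.

In other words, the simple case is to lock at the beginning and unconditionally unlock at the end. When your locking needs are more complex than that, you need unique_lock, and therefore scoped_lock.

Reference document:

http://web.archive.org/web/20140531071228/http://home.roadrunner.com/~hinnant/mutexes/locking.html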

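To see why this kind of lock is needed, consider the following example:

    class A {
      mutable std::mutex mut_;
      std::vector<double> data_;
     public:
      // ...
      A& operator=(const A& rhs) {
        if (this != &rhs) {
          std::unique_lock<std::mutex> lhs_lock(mut_);
          std::unique_lock<std::mutex> rhs_lock(rhs.mut_);  // may deadlock here
          // assign data ...
          data_ = rhs.data_;
        }
        return *this;
      }
      // ...
    };

Suppose two objects exist:

    A a1, a2;

Thread 1 assigns in one direction:

    a1 = a2;

while thread 2 assigns in the opposite direction at the same time:

    a2 = a1;

Thread 1 locks a1.mut_ and then waits for a2.mut_; thread 2 locks a2.mut_ and then waits for a1.mut_. Each holds the mutex the other one needs, so the program deadlocks.

Fortunately there is a fix. The code can be rewritten as:

    class A {
      mutable std::mutex mut_;
      std::vector<double> data_;
     public:
      // ...
      A& operator=(const A& rhs) {
        if (this != &rhs) {
          // std::defer_lock is an instance of the empty tag type
          // struct defer_lock_t {}; passed to the unique_lock constructor
          // it means "do not lock the mutex yet"
          std::unique_lock<std::mutex> lhs_lock(mut_, std::defer_lock);
          std::unique_lock<std::mutex> rhs_lock(rhs.mut_, std::defer_lock);
          std::lock(lhs_lock, rhs_lock);
          // assign data ...
          data_ = rhs.data_;
        }
        return *this;
      }
      // ...
    };

Locking both at once with std::lock prevents the deadlock.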

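So why does the new code avoid the problem?

a) lhs_lock and rhs_lock are constructed without locking anything: because of the std::defer_lock argument, the unique_locks refer to their mutexes but do not own them yet.

b) std::lock(lhs_lock, rhs_lock); locks both mutexes at once without deadlocking. That is exactly what it is for.

c) lock_guard cannot be used here, because lock_guard has no mode in which it refers to a mutex without owning it. Passing lock_guard objects to std::lock does not compile.

Summary: when two mutexes have to be held at the same time, first construct two unique_locks that do not lock, then lock both together with std::lock. The broken code locks one of them first and the other one afterwards.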
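As a sanity check of the std::lock approach, here is a small runnable sketch in which two threads repeatedly assign in opposite directions. The constructor, the first() accessor, the function name run_opposite_assignments and the iteration count are additions for this demo only.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class A {
  mutable std::mutex mut_;
  std::vector<double> data_;
 public:
  explicit A(double v) : data_(4, v) {}

  A& operator=(const A& rhs) {
    if (this != &rhs) {
      // Refer to both mutexes without locking them yet ...
      std::unique_lock<std::mutex> lhs_lock(mut_, std::defer_lock);
      std::unique_lock<std::mutex> rhs_lock(rhs.mut_, std::defer_lock);
      // ... then lock both with the deadlock-avoiding std::lock.
      std::lock(lhs_lock, rhs_lock);
      data_ = rhs.data_;
    }
    return *this;
  }

  double first() const {
    std::lock_guard<std::mutex> lock(mut_);
    return data_.front();
  }
};

// Runs a1 = a2 and a2 = a1 concurrently many times. Returning at all means
// that no deadlock occurred; the result says whether both objects ended up
// with the same contents.
bool run_opposite_assignments() {
  A a1(1.0), a2(2.0);
  assert(a1.first() != a2.first());
  std::thread t1([&] { for (int i = 0; i < 10000; ++i) a1 = a2; });
  std::thread t2([&] { for (int i = 0; i < 10000; ++i) a2 = a1; });
  t1.join();
  t2.join();
  return a1.first() == a2.first();
}
```

With the first version of operator= (two plain unique_locks taken one after the other) the same test can hang, because each thread may end up holding the mutex the other one is waiting for.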

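Another digression, this time on condition variables.

A condition variable is a mechanism that lets one thread wait for a notification from another thread that some condition has become true. The usual pattern is: a thread locks a mutex and then waits; every time it wakes up it checks whether the condition holds; if it does, the thread proceeds, otherwise it goes back to sleep.

To illustrate, here is an example:

    boost::condition_variable cond;
    boost::mutex mut;
    bool data_ready;

    void process_data();

    void wait_for_data_to_process() {
      boost::unique_lock<boost::mutex> lock(mut);
      while (!data_ready) {  // lock protects data_ready
        cond.wait(lock);
      }
      process_data();
    }

The code first creates a lock. Note that it is a unique_lock and that it is associated with the mutex, so from this point on access is mutually exclusive (there may be several data-processing threads). It then calls wait on the condition variable, which atomically releases the lock and puts the thread to sleep; when the thread is woken up, wait reacquires the lock before returning.

Meanwhile another thread is preparing the data:

    void retrieve_data();
    void prepare_data();

    void prepare_data_for_processing() {
      retrieve_data();
      prepare_data();
      {
        boost::lock_guard<boost::mutex> lock(mut);
        data_ready = true;  // lock protects data_ready
      }
      cond.notify_one();
    }

Once the preparing thread has worked its way through the data, it sends the notification, and the waiting thread wakes up and starts working.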
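The same wait/notify handshake can be written as a complete program. The sketch below uses the C++11 std:: equivalents instead of Boost, and the predicate overload of wait, which is shorthand for the explicit while loop. The Channel struct, the function names and the value 42 are invented for the demo.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Channel {
  std::mutex mut;
  std::condition_variable cond;
  bool data_ready = false;  // protected by mut
  int data = 0;             // protected by mut
};

// Consumer side: sleeps until data_ready becomes true.
int wait_for_data(Channel& ch) {
  std::unique_lock<std::mutex> lock(ch.mut);
  // Equivalent to: while (!ch.data_ready) ch.cond.wait(lock);
  ch.cond.wait(lock, [&ch] { return ch.data_ready; });
  return ch.data;
}

// Producer side: publishes the value under the lock, then notifies.
void prepare_data(Channel& ch, int value) {
  {
    std::lock_guard<std::mutex> lock(ch.mut);
    ch.data = value;
    ch.data_ready = true;
  }
  ch.cond.notify_one();
}

int run_demo() {
  Channel ch;
  int received = 0;
  std::thread consumer([&] { received = wait_for_data(ch); });
  std::thread producer([&] { prepare_data(ch, 42); });
  producer.join();
  consumer.join();
  assert(ch.data_ready);
  return received;
}
```

Because the flag is checked under the lock before sleeping, the consumer does not miss the notification even if the producer happens to run first.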

Now back to the BlockingQueue implementation.

BlockingQueue's pop function is implemented as follows:

    template<typename T>
    T BlockingQueue<T>::pop(const string& log_on_wait) {
      boost::mutex::scoped_lock lock(sync_->mutex_);  // lock
      while (queue_.empty()) {
        if (!log_on_wait.empty()) {
          LOG_EVERY_N(INFO, 1000) << log_on_wait;
        }
        sync_->condition_.wait(lock);  // keep waiting for as long as the queue is empty
      }
      T t = queue_.front();  // otherwise take the front element
      queue_.pop();
      return t;
    }

BlockingQueue's try_peek function is implemented as follows.

It copies the element at the front of the queue into *t without removing it, and returns false immediately if the queue is empty.

    template<typename T>
    bool BlockingQueue<T>::try_peek(T* t) {
      boost::mutex::scoped_lock lock(sync_->mutex_);
      if (queue_.empty()) {
        return false;
      }
      *t = queue_.front();
      return true;
    }

BlockingQueue's peek function is implemented as follows.

It returns a copy of the element at the front of the queue without removing it, and like pop it uses the condition variable to block until the queue is non-empty.

    template<typename T>
    T BlockingQueue<T>::peek() {
      boost::mutex::scoped_lock lock(sync_->mutex_);
      while (queue_.empty()) {
        sync_->condition_.wait(lock);
      }
      return queue_.front();
    }

BlockingQueue's size function is implemented as follows:

    template<typename T>
    size_t BlockingQueue<T>::size() const {
      boost::mutex::scoped_lock lock(sync_->mutex_);
      return queue_.size();
    }

Finally, the file explicitly instantiates BlockingQueue for several types:

    template class BlockingQueue<Batch<float>*>;
    template class BlockingQueue<Batch<double>*>;
    template class BlockingQueue<Datum*>;
    template class BlockingQueue<shared_ptr<DataReader::QueuePair> >;
    template class BlockingQueue<P2PSync<float>*>;
    template class BlockingQueue<P2PSync<double>*>;
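To try the queue outside of Caffe, a stripped-down version can be written with the C++11 standard library only. This is a sketch, not Caffe's class: SimpleBlockingQueue and sum_through_queue are hypothetical names, the sync indirection and the logging are left out, and only push, try_pop and pop are reproduced.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class SimpleBlockingQueue {
 public:
  void push(const T& t) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(t);
    }                          // unlock before notifying, as in the original
    condition_.notify_one();
  }

  // Non-blocking: returns false at once if the queue is empty.
  bool try_pop(T* t) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) return false;
    *t = queue_.front();
    queue_.pop();
    return true;
  }

  // Blocking: sleeps on the condition variable until an element arrives.
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (queue_.empty()) condition_.wait(lock);
    T t = queue_.front();
    queue_.pop();
    return t;
  }

 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable condition_;
};

// One producer thread pushes 1..n; the calling thread pops n items and
// returns their sum.
int sum_through_queue(int n) {
  SimpleBlockingQueue<int> q;
  int ignored = 0;
  assert(!q.try_pop(&ignored));  // nothing has been pushed yet
  std::thread producer([&q, n] { for (int i = 1; i <= n; ++i) q.push(i); });
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += q.pop();
  producer.join();
  return sum;
}
```

The consumer never spins: whenever it gets ahead of the producer, pop simply sleeps until the next notify_one.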

With BlockingQueue covered, the next topic is the QueuePair class nested in DataReader.

First, the definition:

    class QueuePair {
     public:
      explicit QueuePair(int size);
      ~QueuePair();

      BlockingQueue<Datum*> free_;
      BlockingQueue<Datum*> full_;

      DISABLE_COPY_AND_ASSIGN(QueuePair);
    };

The definition shows two blocking queues, free_ and full_. Having just analyzed the blocking queue, this is no longer confusing.

Now the implementation.

What does the constructor do?

It creates size instances of Datum (the definition of that data structure is given at the end of this article) and puts them into free_.

    DataReader::QueuePair::QueuePair(int size) {
      // Initialize the free queue with requested number of datums
      for (int i = 0; i < size; ++i) {
        free_.push(new Datum());
      }
    }

What does the destructor do?

It deletes every Datum object left in the free_ and full_ queues.

    DataReader::QueuePair::~QueuePair() {
      Datum* datum;
      while (free_.try_pop(&datum)) {
        delete datum;
      }
      while (full_.try_pop(&datum)) {
        delete datum;
      }
    }
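The point of having two queues is buffer recycling: a fixed pool of objects circulates between a reader and a consumer, so nothing is allocated in the steady state. The sketch below illustrates the idea with standard-library types only. MiniQueue, Buffer and recycle_demo are hypothetical names, and the consumer side (which this article does not show) is assumed to hand each buffer back to the free queue once it is done with it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue, just enough for the demo.
template <typename T>
class MiniQueue {
 public:
  void push(const T& t) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(t);
    }
    condition_.notify_one();
  }
  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    condition_.wait(lock, [this] { return !queue_.empty(); });
    T t = queue_.front();
    queue_.pop();
    return t;
  }
 private:
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable condition_;
};

struct Buffer { int value = 0; };  // stands in for Datum

// Passes `items` values through a pool of `pool_size` buffers and returns
// the sum seen by the consumer.
int recycle_demo(int pool_size, int items) {
  assert(pool_size > 0);
  std::vector<Buffer> pool(pool_size);
  MiniQueue<Buffer*> free_q, full_q;
  for (Buffer& b : pool) free_q.push(&b);  // initially every buffer is free

  std::thread reader([&] {
    for (int i = 1; i <= items; ++i) {
      Buffer* b = free_q.pop();  // blocks while all buffers are in use
      b->value = i;              // fill the buffer
      full_q.push(b);            // hand it to the consumer
    }
  });

  int sum = 0;
  for (int i = 0; i < items; ++i) {
    Buffer* b = full_q.pop();    // blocks until the reader has produced one
    sum += b->value;
    free_q.push(b);              // recycle the buffer
  }
  reader.join();
  return sum;
}
```

The pool size bounds how far the reader can run ahead of the consumer, which is the role the prefetch setting plays when the QueuePair is sized.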

Next comes the Body class, which inherits from InternalThread:

    class Body : public InternalThread {
     public:
      explicit Body(const LayerParameter& param);
      virtual ~Body();

     protected:
      void InternalThreadEntry();
      void read_one(db::Cursor* cursor, QueuePair* qp);

      const LayerParameter param_;
      BlockingQueue<shared_ptr<QueuePair> > new_queue_pairs_;

      friend class DataReader;

      DISABLE_COPY_AND_ASSIGN(Body);
    };

Body overrides InternalThread's InternalThreadEntry function and adds a read_one function.

Body also declares DataReader as a friend, and holds a BlockingQueue<shared_ptr<QueuePair> > named new_queue_pairs_.

To understand what Body actually does, we first need to know what the InternalThread class does.

InternalThread is essentially a wrapper around the boost library's thread.

First, the class definition:

    namespace caffe {

    class InternalThread {
     public:
      // constructor and destructor
      InternalThread() : thread_() {}
      virtual ~InternalThread();

      /**
       * Caffe's thread local state will be initialized using the current
       * thread values, e.g. device id, solver index etc. The random seed
       * is initialized using caffe_rng_rand.
       */
      void StartInternalThread();

      /** Will not return until the internal thread has exited. */
      void StopInternalThread();

      // whether the thread has been started
      bool is_started() const;

     protected:
      /* Implement this method in your subclass
          with the code you want your thread to run. */
      // virtual function with an empty default body; subclasses override it
      virtual void InternalThreadEntry() {}

      /* Should be tested when running loops to exit when requested. */
      bool must_stop();

     private:
      void entry(int device, Caffe::Brew mode, int rand_seed, int solver_count,
          bool root_solver);

      // the only data member
      shared_ptr<boost::thread> thread_;
    };

    }  // namespace caffe

Having read the annotated definition, let's look at the implementation:

    namespace caffe {

    // destructor: stops the internal thread
    InternalThread::~InternalThread() {
      StopInternalThread();
    }

    // whether the thread has been started
    bool InternalThread::is_started() const {
      // thread_ must be non-null and the thread must be joinable
      return thread_ && thread_->joinable();
    }

    bool InternalThread::must_stop() {
      // interruption_requested() returns true if interruption has been
      // requested for the thread, false otherwise (see the boost docs)
      return thread_ && thread_->interruption_requested();
    }

    // record the calling thread's state, then start the new thread
    void InternalThread::StartInternalThread() {
      CHECK(!is_started()) << "Threads should persist and not be restarted.";

      int device = 0;
    #ifndef CPU_ONLY
      CUDA_CHECK(cudaGetDevice(&device));
    #endif
      Caffe::Brew mode = Caffe::mode();
      int rand_seed = caffe_rng_rand();
      int solver_count = Caffe::solver_count();
      bool root_solver = Caffe::root_solver();

      try {
        // create a new thread object for thread_; the thread runs entry()
        thread_.reset(new boost::thread(&InternalThread::entry, this, device, mode,
              rand_seed, solver_count, root_solver));
      } catch (std::exception& e) {
        LOG(FATAL) << "Thread exception: " << e.what();
      }
    }

    // the function executed by the thread
    void InternalThread::entry(int device, Caffe::Brew mode, int rand_seed,
        int solver_count, bool root_solver) {
    #ifndef CPU_ONLY
      CUDA_CHECK(cudaSetDevice(device));
    #endif
      Caffe::set_mode(mode);
      Caffe::set_random_seed(rand_seed);
      Caffe::set_solver_count(solver_count);
      Caffe::set_root_solver(root_solver);

      InternalThreadEntry();
    }

    // stop the thread
    void InternalThread::StopInternalThread() {
      if (is_started()) {  // only if the thread has been started
        thread_->interrupt();  // request interruption
        try {
          thread_->join();  // wait for the thread to finish
        } catch (boost::thread_interrupted&) {
          // nothing to do: we requested the interruption ourselves
        } catch (std::exception& e) {
          // any other error is logged
          LOG(FATAL) << "Thread exception: " << e.what();
        }
      }
    }

    }  // namespace caffe

To sum up: the class queries the thread's state, starts and stops the thread, and defines the thread entry function InternalThread::entry. The entry function is the interesting part: before it calls the virtual function InternalThreadEntry, it does the initialization on the user's behalf (CUDA device, CPU or GPU mode, random seed, solver count and the root-solver flag).
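The overall pattern (a base class that owns the thread, a virtual function supplied by the subclass, and a stop request that the loop polls) can be sketched without Boost. Note the substitution: std::thread has no interruption mechanism, so this sketch replaces interrupt() / interruption_requested() with a std::atomic<bool> flag. WorkerThread, Counter and run_counter are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

class WorkerThread {
 public:
  WorkerThread() : stop_requested_(false) {}
  virtual ~WorkerThread() { Stop(); }

  void Start() {
    assert(!is_started());  // threads are not restarted while running
    stop_requested_ = false;
    thread_.reset(new std::thread(&WorkerThread::entry, this));
  }

  // Does not return until the worker thread has exited.
  void Stop() {
    if (is_started()) {
      stop_requested_ = true;  // stands in for boost's interrupt()
      thread_->join();
    }
  }

  bool is_started() const { return thread_ && thread_->joinable(); }

 protected:
  virtual void Run() {}  // subclasses put their loop here
  bool must_stop() const { return stop_requested_; }

 private:
  void entry() {
    // Per-thread setup (device, mode, seed in Caffe) would happen here.
    Run();
  }

  std::atomic<bool> stop_requested_;
  std::unique_ptr<std::thread> thread_;
};

class Counter : public WorkerThread {
 public:
  Counter() : count_(0) {}
  // Stop in the derived destructor, so the thread is gone before count_ is.
  ~Counter() { Stop(); }
  long count() const { return count_; }

 protected:
  void Run() override {
    while (!must_stop()) {
      ++count_;
      std::this_thread::yield();
    }
  }

 private:
  std::atomic<long> count_;
};

bool run_counter() {
  Counter c;
  c.Start();
  while (c.count() == 0) std::this_thread::yield();  // wait for some progress
  c.Stop();
  return !c.is_started() && c.count() > 0;
}
```

The derived destructor calls Stop() for the same reason Body's destructor calls StopInternalThread(): the thread must have finished before the subclass's members are destroyed.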

After that long digression, back to the Body class.

As its declaration above shows, Body does indeed override InternalThread's virtual function InternalThreadEntry.

Here is its implementation:

    // Body constructor: stores the layer parameters and starts the internal thread
    DataReader::Body::Body(const LayerParameter& param)
        : param_(param),
          new_queue_pairs_() {
      // InternalThread sets up the runtime environment and creates a thread
      // that runs the virtual function InternalThreadEntry
      StartInternalThread();
    }

    // destructor: stops the thread
    DataReader::Body::~Body() {
      StopInternalThread();
    }

    // The thread body: open the database, create a cursor,
    // then fill the registered QueuePairs
    void DataReader::Body::InternalThreadEntry() {
      // get a DB object for the configured backend
      shared_ptr<db::DB> db(db::GetDB(param_.data_param().backend()));
      // open the DB at the location given in the layer parameters
      db->Open(param_.data_param().source(), db::READ);
      // create a cursor
      shared_ptr<db::Cursor> cursor(db->NewCursor());
      // the QueuePairs served by this body; each one holds the two
      // blocking queues free_ and full_
      vector<shared_ptr<QueuePair> > qps;
      try {
        // one QueuePair per solver when training, otherwise just one
        int solver_count = param_.phase() == TRAIN ? Caffe::solver_count() : 1;

        // To ensure deterministic runs, only start running once all solvers
        // are ready. But solvers need to peek on one item during initialization,
        // so read one item, then wait for the next solver.
        for (int i = 0; i < solver_count; ++i) {
          shared_ptr<QueuePair> qp(new_queue_pairs_.pop());
          read_one(cursor.get(), qp.get());  // read one item
          qps.push_back(qp);  // remember the QueuePair
        }
        // Main loop
        while (!must_stop()) {
          for (int i = 0; i < solver_count; ++i) {
            read_one(cursor.get(), qps[i].get());
          }
          // Check no additional readers have been created. This can happen if
          // more than one net is trained at a time per process, whether single
          // or multi solver. It might also happen if two data layers have same
          // name and same source.
          CHECK_EQ(new_queue_pairs_.size(), 0);
        }
      } catch (boost::thread_interrupted&) {
        // Interrupted exception is expected on shutdown
      }
    }

    // read one item from the database
    void DataReader::Body::read_one(db::Cursor* cursor, QueuePair* qp) {
      // pop a Datum from the QueuePair's free_ queue
      Datum* datum = qp->free_.pop();
      // TODO deserialize in-place instead of copy?
      // parse the value the cursor points at
      datum->ParseFromString(cursor->value());
      // push the Datum onto the QueuePair's full_ queue
      qp->full_.push(datum);

      // go to the next iter
      cursor->Next();
      if (!cursor->valid()) {
        DLOG(INFO) << "Restarting data prefetching from start.";
        cursor->SeekToFirst();  // past the end: wrap around to the first record
      }
    }

OK, that leaves the rest of the DataReader class. The complete data_reader.cpp also contains the QueuePair and Body member functions, which are identical to the listings above, so only the remaining annotated code is shown here.

    #include <boost/thread.hpp>
    #include <map>
    #include <string>
    #include <vector>

    #include "caffe/common.hpp"
    #include "caffe/data_reader.hpp"
    #include "caffe/layers/data_layer.hpp"
    #include "caffe/proto/caffe.pb.h"

    namespace caffe {

    // weak_ptr avoids the memory-release problem of shared_ptr reference cycles
    using boost::weak_ptr;

    map<const string, weak_ptr<DataReader::Body> > DataReader::bodies_;
    static boost::mutex bodies_mutex_;

    // Constructor: takes the layer parameters and initializes queue_pair_
    // (which holds the two blocking queues free_ and full_)
    DataReader::DataReader(const LayerParameter& param)
        : queue_pair_(new QueuePair(
            param.data_param().prefetch() * param.data_param().batch_size())) {
      // Get or create a body
      boost::mutex::scoped_lock lock(bodies_mutex_);
      string key = source_key(param);  // derive the key from the layer parameters
      weak_ptr<Body>& weak = bodies_[key];  // bodies_ maps a string key to a weak_ptr<Body>
      body_ = weak.lock();
      if (!body_) {  // no live Body exists for this key
        body_.reset(new Body(param));  // create a new Body
        bodies_[key] = weak_ptr<Body>(body_);  // and register it in bodies_
      }
      // hand queue_pair_ to the body through new_queue_pairs_
      body_->new_queue_pairs_.push(queue_pair_);
    }

    // Destructor
    DataReader::~DataReader() {
      string key = source_key(body_->param_);
      body_.reset();
      boost::mutex::scoped_lock lock(bodies_mutex_);  // lock
      if (bodies_[key].expired()) {
        bodies_.erase(key);  // remove the expired entry from the map
      }
    }

    // DataReader::QueuePair and DataReader::Body member functions:
    // same as the listings above

    }  // namespace caffe
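The get-or-create logic around bodies_ is a general pattern: a map of weak_ptr lets several users share one object per key without the map itself keeping anything alive. Below is a minimal sketch with std:: types; Registry, Worker, erase_if_expired and the key strings are all invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Worker {  // stands in for DataReader::Body
  explicit Worker(const std::string& source) : source(source) {}
  std::string source;
};

class Registry {
 public:
  // Returns the live Worker for `key`, creating one if none exists.
  std::shared_ptr<Worker> get_or_create(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::weak_ptr<Worker>& weak = workers_[key];
    std::shared_ptr<Worker> worker = weak.lock();  // null if expired or new
    if (!worker) {
      worker = std::make_shared<Worker>(key);
      weak = worker;  // the registry only observes, it does not own
    }
    return worker;
  }

  // Removes the entry for `key` once nobody uses its Worker any more.
  void erase_if_expired(const std::string& key) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::map<std::string, std::weak_ptr<Worker> >::iterator it =
        workers_.find(key);
    if (it != workers_.end() && it->second.expired()) workers_.erase(it);
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return workers_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::weak_ptr<Worker> > workers_;
};

bool run_registry_demo() {
  Registry registry;
  std::shared_ptr<Worker> a = registry.get_or_create("train_db");
  std::shared_ptr<Worker> b = registry.get_or_create("train_db");
  std::shared_ptr<Worker> c = registry.get_or_create("test_db");
  bool shared = (a == b) && (a != c);  // same key, same object
  assert(a.use_count() == 2);          // a and b only; the map adds nothing
  a.reset();
  b.reset();                           // last user gone: the Worker is destroyed
  registry.erase_if_expired("train_db");
  return shared && registry.size() == 1;  // only "test_db" is left
}
```

This mirrors what the DataReader constructor and destructor do: the constructor looks up or creates the shared object under a mutex, and the destructor drops its reference first and then erases the map entry if it has expired.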

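Summary: the data reader reads its data through the DB wrapper layer, adds a thin wrapper around boost's thread library, and brings its own blocking queue.

Finally, what exactly is a Datum?

See its definition in the caffe.proto file: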

    message Datum {
      optional int32 channels = 1;
      optional int32 height = 2;
      optional int32 width = 3;
      // the actual image data, in bytes
      optional bytes data = 4;
      optional int32 label = 5;
      // Optionally, the datum could also hold float data.
      repeated float float_data = 6;
      // If true data contains an encoded image that need to be decoded
      optional bool encoded = 7 [default = false];
    }

References:

[1] Boost background you will probably need

On unique_lock:

http://zh.cppreference.com/w/cpp/thread/unique_lock

file:///C:/Program%20Files/boost_1_60_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types.mutex

On synchronization (Handling mutexes in C++):

http://web.archive.org/web/20140531071228/http://home.roadrunner.com/~hinnant/mutexes/locking.html

[2] If you have the boost documentation installed, the material on threads can be found at

file:///C:/Program%20Files/boost_1_60_0/doc/html/thread/thread_management.html#thread.thread_management.this_thread.interruption_requested

http://blog.chinaunix.net/uid-23093301-id-86385.html

[3] On weak pointers

http://blog.csdn.net/mmzsyx/article/details/8090849

http://baike.baidu.com/link?url=-mb6Lc2iMwP0kzcAyszaJ1gugtcnlSLHeq2UT5SGdVXVgsg_ppDcin4PLTVrfAlsrm4t5focfsS9d9-Z-ZOWBq

http://www.cnblogs.com/TianFang/archive/2008/09/20/1294590.html

0 0