opencv中HOG源码学习笔记

来源：互联网发布：淄博seo交流编辑：程序博客网时间：2024/05/17 13:14

1、opencv中HOG源码中出现的一些关键字，CV_WRAP and CV_EXPORTS_W

struct CV_EXPORTS_W HOGDescriptor { public: enum { L2Hys=0 }; enum { DEFAULT_NLEVELS=64 };CV_WRAP HOGDescriptor() : winSize(64,128), blockSize(16,16), blockStride(8,8),    cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1),    histogramNormType(HOGDescriptor::L2Hys), L2HysThreshold(0.2), gammaCorrection(true),    nlevels(HOGDescriptor::DEFAULT_NLEVELS){}

这是一个网友的回答

CV_EXPORTS_W is defined inmodules/core/include/opencv2/core/types_c.h as alias for CV_EXPORTS, CV_EXPORTSis defined as:

#if (defined WIN32 || defined _WIN32 || defined WINCE) && defined CVAPI_EXPORTS# define CV_EXPORTS __declspec(dllexport)#else# define CV_EXPORTS#endif

So it's alias for __declspec(dllexport) on Windows platform where CVAPI_EXPORTS is defined, otherwise it's empty.

CV_WARP is used as flag for scripts to create wrappers of the function or method. It is used for creation Python or Java wrappers.

大体意思是说，CV_WARP是一种函数或者方法的封装类，而CVAPI_EXPORTS _W是CVAPI_EXPORTS 的别称，CVAPI_EXPORTS 又是__declspec(dllexport) 的一个别称，

使用 __declspec(dllexport) 关键字从 DLL 导出数据、函数、类或类成员函数。（http://msdn.microsoft.com/zh-cn/library/a90k134d(v=vs.80).aspx）

2、openCv中的一个函数：

int cvRound (double value)

对一个double型的数进行四舍五入，并返回一个整型数！

函数 cvRound, cvFloor, cvCeil 用一种舍入方法将输入浮点数转换成整数。 cvRound 返回和参数最接近的整数值。 cvFloor 返回不大于参数的最大整数值。

cvCeil 返回不小于参数的最小整数值。

3、从现在开始我们要看一些TBB里更实在的一些东西了，之所以说它实在，是因为这些内容是切实地能帮助我们去解决一些并行编程里的问题。

首先看的也是最简单的parallel_for。

我们还是先从一个例子开始看起：

问题：对一个数组里的每个元素施加一个操作Foo(...)

串行化的版本：

1 void SerialApplyFoo (int a[], size_t n) {2     for (size_t i = 0; i < n; ++ i)3 Foo(a[i]);4 }

使用TBB并行化的版本：

1 #include "tbb/task_scheduler_init.h" 2 #include "tbb/blocked_range.h" 3 #include "tbb/parallel_for.h" 4 5 using namespace tbb;6 7 // 对每个元素执行该操作 8 void Foo(int value)9 {10     // Applied function 11 }12  13 class ApplyFoo14 {15     int * const my_a;16 public:17     void operator () (const blocked_range<size_t> & r) const 18     {19         int * a = my_a;20         for (size_t i = r.begin(); i != r.end(); ++ i)21             Foo(a[i]);22     }23     24     ApplyFoo(int a[]) : my_a(a) {}25 };26  27 int main(int argc, char* argv[])28 {29     // 创建task scheduler 30     task_scheduler_init init;31         const int n = 100;32     int a[n];33     for (int i = 0; i < n; i ++)34         a[i] = i;35     // TBB会把数组分成若干的block36     // 对block调用ApplyFoo这个functor 37     parallel_for(blocked_range<size_t>(0, n), ApplyFoo(a), simple_partitioner());38     return 0;39 }

这个其实就是我们最早开始看TBB时的一个例子。

我们看到这里面多了好几个陌生的东西：

blocked_range

parallel_for

block

OK，我们一个个来看，先说blocked_range，这个template class表述了一个一维迭代（iterator）。同样的，我们先来看看它的declaration（部分无关代码已裁减），在tbb/blocked_range.h里：

#ifndef __TBB_blocked_range_H#define __TBB_blocked_range_H#include "tbb_stddef.h"namespace tbb {/** \page range_req Requirements on range concept    Class \c R implementing the concept of range must define:    - \code R::R( const R& ); \endcode               Copy constructor    - \code R::~R(); \endcode                        Destructor    - \code bool R::is_divisible() const; \endcode   True if range can be partitioned into two subranges    - \code bool R::empty() const; \endcode          True if range is empty    - \code R::R( R& r, split ); \endcode            Split range \c r into two subranges.**///! A range over which to iterate./** @ingroup algorithms */template<typename Value>class blocked_range {public:    //! Type of a value    /** Called a const_iterator for sake of algorithms that need to treat a blocked_range        as an STL container. */    typedef Value const_iterator;    //! Type for size of a range    typedef std::size_t size_type;    //! Construct range with default-constructed values for begin and end.    /** Requires that Value have a default constructor. */    blocked_range() : my_begin(), my_end() {}    //! Construct range over half-open interval [begin,end), with the given grainsize.    blocked_range( Value begin_, Value end_, size_type grainsize_=1 ) :         my_end(end_), my_begin(begin_), my_grainsize(grainsize_)     {        __TBB_ASSERT( my_grainsize>0, "grainsize must be positive" );    }    //! Beginning of range.    const_iterator begin() const {return my_begin;}    //! One past last value in range.    const_iterator end() const {return my_end;}    //! Size of the range    /** Unspecified if end()<begin(). */    size_type size() const {        __TBB_ASSERT( !(end()<begin()), "size() unspecified if end()<begin()" );        return size_type(my_end-my_begin);    }    //! The grain size for this range.    size_type grainsize() const {return my_grainsize;}    //------------------------------------------------------------------------    // Methods that implement Range concept    //------------------------------------------------------------------------    //! True if range is empty.    bool empty() const {return !(my_begin<my_end);}    //! True if range is divisible.    /** Unspecified if end()<begin(). */    bool is_divisible() const {return my_grainsize<size();}    //! Split range.      /** The new Range *this has the second half, the old range r has the first half.         Unspecified if end()<begin() or !is_divisible(). */    blocked_range( blocked_range& r, split ) :         my_end(r.my_end),        my_begin(do_split(r)),        my_grainsize(r.my_grainsize)    {}private:    /** NOTE: my_end MUST be declared before my_begin, otherwise the forking constructor will break. */    Value my_end;    Value my_begin;    size_type my_grainsize;    //! Auxilary function used by forking constructor.    /** Using this function lets us not require that Value support assignment or default construction. */    static Value do_split( blocked_range& r ) {        __TBB_ASSERT( r.is_divisible(), "cannot split blocked_range that is not divisible" );        Value middle = r.my_begin + (r.my_end-r.my_begin)/2u;        r.my_end = middle;        return middle;    }    template<typename RowValue, typename ColValue>    friend class blocked_range2d;    template<typename RowValue, typename ColValue, typename PageValue>    friend class blocked_range3d;};} // namespace tbb#endif /* __TBB_blocked_range_H */

从代码里可以看到blocked_range有3个constructor，一个不接收参数，一个处理split（split的概念后面再讲），而我们示例里用到的是：

blocked_range( Value begin_, Value end_, size_type grainsize_=1 ) :         my_end(end_), my_begin(begin_), my_grainsize(grainsize_)     {        __TBB_ASSERT( my_grainsize>0, "grainsize must be positive" );    }

第一个参数表示起始，第二个参数表示结束，它们的类型为const_iterator，表示的区间为[begin，end)这样一个半开区间。

第三个参数，grainsize，表示的是一个“合适的大小”块，这个块会在一个循环中进行处理，如果数组比这个grainsize还大，parallel_for会把它分割为独立的block，然后分别进行调度（有可能由多个线程进行处理）。

这样我们知道，grainsize其实决定了TBB什么时候对数据进行划分，如果我们把grainsize指定得太小，那就可能会导致产生过多得block，从而使得不同block间的overhead增加（比如多个线程间切换的代价），有可能会使性能下降。相反，如果grainsize设得太大，以致于这个数组几乎没有被划分，那又会导致不能发挥parallel_for期望达到的并行效果，也没有达到理想得性能。

所以我们在决定grainsize时需要小心，最好是能够经过调整测试后得到的值，当然你也可以如本例中一样不指定，让TBB帮你来决定合适的值（一般不是最优的）。

一个调整grainsize的经验性步骤：

1首先把grainsize设得比预想的要大一些，通常设为10000

2在单处理机机器上运行，得到性能数据

3把grainsize减半，看性能降低多少，如果降低在5%-10%之间，那这个grainsize就已经是一个不错的设定了

partitioner

看完blocked_range，再来看跟它很关联的另一个概念partitioner，顾名思义，partitioner就是指示怎么进行划分block的东东。在示例的parallel_for调用中，第3个参数就指定了一个partitioner，这里我们使用的是simple_partitioner。

TBB里提供了两个partitioner，一个是我们用到的simple_partitioner，另一个是auto_partitioner。

simple_partitioner是parallel_for（以及后面会讲到的parallel_reduce，parallel_scan）的缺省partitioner。simple_partitioner有如下特性：

1 概念简单

2 确保分割不会超过grainsize大小，这样你可以假定operator()的最大范围不会超过grainsize

3它可以针对特定机器调节

simple_partitioner的缺点在于它需要你确定出一个合适的grainsize，而合适的grainsize并不是那么容易得到的。

另一个partitioner：auto_partitioner，它依赖于一定的规则自动决定划分，在线程间负载均衡和线程切换代价间寻找一个平衡，当然普适的一般就不是对于所有都是最好的～～～

如果我们想用auto_partitioner，那只要把示例里的simple_partitioner替换一下即可，要注意的是，既然auto_partitioner是自动决定分割的，那指定的grainsize就没有太大意义了。

一般情况下，建议使用auto_partitioner，除非你有足够的时间和精力去优化出一个比较好的grainsize～～～

parallel_for

最后，终于看到我们最关键的主题了：parallel_for，这是一个算法，类似于STL里的sort、for_each等。

直接步入主题，我们来看parallel_for的源码吧，在tbb/parallel_for.h中：

#ifndef __TBB_parallel_for_H#define __TBB_parallel_for_H#include "task.h"#include "partitioner.h"#include <new>namespace tbb {//! @cond INTERNALnamespace internal {    //! Task type used in parallel_for    /** @ingroup algorithms */    template<typename Range, typename Body, typename Partitioner>    class start_for: public task {        Range my_range;        const Body my_body;        typename Partitioner::partition_type my_partition;        /*override*/ task* execute();        //! Constructor for root task.        start_for( const Range& range, const Body& body, Partitioner& partitioner ) :            my_range(range),                my_body(body),            my_partition(partitioner)        {        }        //! Splitting constructor used to generate children.        /** this becomes left child.  Newly constructed object is right child. */        start_for( start_for& parent, split ) :            my_range(parent.my_range,split()),                my_body(parent.my_body),            my_partition(parent.my_partition,split())        {            my_partition.set_affinity(*this);        }        //! Update affinity info, if any.        /*override*/ void note_affinity( affinity_id id ) {            my_partition.note_affinity( id );        }    public:        static void run(  const Range& range, const Body& body, const Partitioner& partitioner ) {            if( !range.empty() ) {#if !__TBB_EXCEPTIONS || TBB_JOIN_OUTER_TASK_GROUP                start_for& a = *new(task::allocate_root()) start_for(range,body,const_cast<Partitioner&>(partitioner));#else                // Bound context prevents exceptions from body to affect nesting or sibling algorithms,                // and allows users to handle exceptions safely by wrapping parallel_for in the try-block.                task_group_context context;                start_for& a = *new(task::allocate_root(context)) start_for(range,body,const_cast<Partitioner&>(partitioner));#endif /* __TBB_EXCEPTIONS && !TBB_JOIN_OUTER_TASK_GROUP */                task::spawn_root_and_wait(a);            }        }#if __TBB_EXCEPTIONS        static void run(  const Range& range, const Body& body, const Partitioner& partitioner, task_group_context& context ) {            if( !range.empty() ) {                start_for& a = *new(task::allocate_root(context)) start_for(range,body,const_cast<Partitioner&>(partitioner));                task::spawn_root_and_wait(a);            }        }#endif /* __TBB_EXCEPTIONS */    };    template<typename Range, typename Body, typename Partitioner>    task* start_for<Range,Body,Partitioner>::execute() {        if( !my_range.is_divisible() || my_partition.should_execute_range(*this) ) {            my_body( my_range );            return my_partition.continue_after_execute_range(*this);         } else {            empty_task& c = *new( this->allocate_continuation() ) empty_task;            recycle_as_child_of(c);            c.set_ref_count(2);            bool delay = my_partition.decide_whether_to_delay();            start_for& b = *new( c.allocate_child() ) start_for(*this,split());            my_partition.spawn_or_delay(delay,*this,b);            return this;        }    } } // namespace internal//! @endcond// Requirements on Range concept are documented in blocked_range.h/** \page parallel_for_body_req Requirements on parallel_for body    Class \c Body implementing the concept of parallel_for body must define:    - \code Body::Body( const Body& ); \endcode                 Copy constructor    - \code Body::~Body(); \endcode                             Destructor    - \code void Body::operator()( Range& r ) const; \endcode   Function call operator applying the body to range \c r.**//** \name parallel_for    See also requirements on \ref range_req "Range" and \ref parallel_for_body_req "parallel_for Body". **///@{//! Parallel iteration over range with simple partitioner, or default partitioner if no partitioner is specified./** @ingroup algorithms **/template<typename Range, typename Body>void parallel_for( const Range& range, const Body& body, const simple_partitioner& partitioner=simple_partitioner() ) {    internal::start_for<Range,Body,simple_partitioner>::run(range,body,partitioner);}//! Parallel iteration over range with auto_partitioner./** @ingroup algorithms **/template<typename Range, typename Body>void parallel_for( const Range& range, const Body& body, const auto_partitioner& partitioner ) {    internal::start_for<Range,Body,auto_partitioner>::run(range,body,partitioner);}//! Parallel iteration over range with affinity_partitioner./** @ingroup algorithms **/template<typename Range, typename Body>void parallel_for( const Range& range, const Body& body, affinity_partitioner& partitioner ) {    internal::start_for<Range,Body,affinity_partitioner>::run(range,body,partitioner);}#if __TBB_EXCEPTIONS//! Parallel iteration over range with simple partitioner and user-supplied context./** @ingroup algorithms **/template<typename Range, typename Body>void parallel_for( const Range& range, const Body& body, const simple_partitioner& partitioner, task_group_context& context ) {    internal::start_for<Range,Body,simple_partitioner>::run(range, body, partitioner, context);}//! Parallel iteration over range with auto_partitioner and user-supplied context./** @ingroup algorithms **/template<typename Range, typename Body>void parallel_for( const Range& range, const Body& body, const auto_partitioner& partitioner, task_group_context& context ) {    internal::start_for<Range,Body,auto_partitioner>::run(range, body, partitioner, context);}//! Parallel iteration over range with affinity_partitioner and user-supplied context./** @ingroup algorithms **/template<typename Range, typename Body>void parallel_for( const Range& range, const Body& body, affinity_partitioner& partitioner, task_group_context& context ) {    internal::start_for<Range,Body,affinity_partitioner>::run(range,body,partitioner, context);}#endif /* __TBB_EXCEPTIONS *///@}} // namespace tbb#endif /* __TBB_parallel_for_H */

我们看到，parallel_for有两个版本，一个接收两个参数，一个接收三个参数：

range：指定划分block的范围

body：指定对block应用的操作，Body可以看成是一个操作子functor，它的operator(...)会以blocked_range为参数进行调用，当然如果我们传过来的是一个函数指针也是可以的，只要它能以blocked_range为参数进行调用

partitioner：指定划分器，可选的两种simple_partitioner和auto_partitioner

其实从parallel_for的prototypedeclaration和definition中我们可以明显地看到generic programming的意思，这里Range、Body、Partitioner其实都是GP里的concept，它们要求满足一定的policy，因此是典型的基于policy的design，当然这里的policy比起STL，有过之而无不及了。

参考：http://blog.chinaunix.net/uid-18872995-id-302624.html

parallel_for 循环访问一个索引范围，并在每次迭代时以并行方式执行用户提供的函数。

template <   typename _Index_type,   typename _Function,   typename _Partitioner>void parallel_for(   _Index_type_First,   _Index_type_Last,   _Index_type_Step,   const _Function& _Func,   _Partitioner&& _Part);template <   typename _Index_type,   typename _Function>void parallel_for(   _Index_type_First,   _Index_type_Last,   _Index_type_Step,   const _Function& _Func);template <   typename _Index_type,   typename _Function>void parallel_for(   _Index_type_First,   _Index_type_Last,   const _Function& _Func,   const auto_partitioner& _Part = auto_partitioner());template <   typename _Index_type,   typename _Function>void parallel_for(   _Index_type_First,   _Index_type_Last,   const _Function& _Func,   const static_partitioner& _Part);template <   typename _Index_type,   typename _Function>void parallel_for(   _Index_type_First,   _Index_type_Last,   const _Function& _Func,   const simple_partitioner& _Part);template <   typename _Index_type,   typename _Function>void parallel_for(   _Index_type_First,   _Index_type_Last,   const _Function& _Func,   affinity_partitioner& _Part);

_Index_type

用于迭代的索引类型。

_Function

每次迭代时将执行的函数的类型。

_Partitioner

使用进行分区所提供的范围分区程序的类型。

_First

要包含在迭代中的第一个索引。

_Last

索引一超过了要包含在迭代中的最后一个索引。

_Step

从 _First 至 _Last 迭代时的步长值。此步长必须是正数。如果该步骤小于 1，则会引发 invalid_argument。

_Func

要在每次迭代时执行的函数。这可以是 lambda 表达式、函数指针或支持函数调用运算符某个版本的任意对象，具有签名 void operator()(_Index_type)。

_Part

对分区程序对象的引用。参数可以是一个 constauto_partitioner&， conststatic_partitioner&， constsimple_partitioner& 或 affinity_partitioner& ，如果使用 affinity_partitioner 对象，引用必须是一个非常量左值引用，因此，算法可以存储将来的循环的状态可以重用。

参考：http://msdn.microsoft.com/zh-SG/library/dd505035

重要参考：http://book.51cto.com/art/201004/197560.htm

0 0