Inside Boost.Function: An Analysis of Its Core


From:http://www.csdn.net/article/2011-03-22/294383/1


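Editor's note: This article by Hou Jie (侯捷) was originally published in Programmer (《程序员》) magazine as part of the series "Boost: Applications and Techniques". It introduces the core techniques behind Boost.Function and walks the reader through the relevant Boost source code.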

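About the author:

Hou Jie is a well-known C++ expert from Taiwan, a prominent IT educator on both sides of the Taiwan Strait, and an author, translator, and reviewer of computing books. He has a deep understanding of front-line industry practice, digs into the difficult and essential points of C++, and has many years of experience training corporate clients. His in-depth technical analyses and rich case-based teaching are widely praised by companies and C++ developers.

He has translated many advanced technical books, including Meyers's "Effective C++" series. He is known for using easy-to-follow diagrams to dissect complicated structures and processes, and for explaining advanced techniques and complex source code in accessible terms.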

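Abstract

The previous article introduced how to use Boost.Function. This installment looks at its core techniques, and I will walk the reader through the relevant Boost source code. Across the Boost sublibraries we often see attempts to "inflate" a small amount of code into many nearly identical pieces of code. Boost.Function, the topic of this article, meets that need through repeated macro self-iteration combined with the preprocessor's distinctive token-pasting operator (##), an approach that differs from the one used by Boost.Tuple, which I covered earlier. Boost has no single canonical solution for this kind of code generation. Different sublibraries use different approaches because they come from different authors, so a hundred flowers bloom. On the one hand this makes the libraries feel untidy to learn; on the other hand it lets us appreciate the diversity of technical thinking.

None of these solutions is inherently better or worse than the others, and in the end the upside of such diversity outweighs the downside.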

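A Review of Boost.Function Usage

In the previous article I mentioned that Boost.Function can be used in two forms. The first chooses among boost::function0, boost::function1, boost::function2, and so on up to (at most) boost::function10, according to the number of parameters of the function being referred to. The second always uses boost::function, whatever the signature of the target function. Let us analyze the first form first. Figure 1 is a usage example; the function1 and function2 it uses are class templates defined by Boost.

Figure 1. A program demonstrating Boost.Function. Note that it includes <boost/function.hpp>.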

#include <iostream>
#include <boost/function.hpp>
// This one header stands in for all ten of the forms below:
//#include <boost/function/function1.hpp>
// Including only the headers you need also works:
//#include <boost/function/function2.hpp> // and so on...

using namespace std;
using namespace boost;

int func1(int i) { return (i*5); }
bool func2(int i, double d) { return (i > d); }

int main()
{
    function1<int, int> f1; //(1)
    f1 = &func1;
    cout << f1(10) << endl; //50

    function2<bool, int, double> f2; //(2)
    f2 = &func2;
    cout << f2(10, 1.1) << endl; //1

    return 0;
}
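Figure 1 shows only the first (numbered) form. As a rough illustration of the second, signature-based form that does not require Boost, the sketch below uses std::function, the C++11 standard-library facility that was modeled on boost::function. Using it as a stand-in is my assumption, and the names in the `sketch` namespace are hypothetical.

```cpp
#include <cassert>
#include <functional>

namespace sketch {

inline int func1(int i) { return i * 5; }
inline bool func2(int i, double d) { return i > d; }

// The arity is no longer part of the class name; the whole signature
// is passed as a single template argument instead.
inline void run_demo() {
    std::function<int(int)> f1 = &func1;
    std::function<bool(int, double)> f2 = &func2;
    assert(f1(10) == 50);
    assert(f2(10, 1.1));

    // A default-constructed wrapper holds no target and tests as false.
    std::function<bool(int, double)> empty;
    assert(!empty);
}

} // namespace sketch
```

Calling sketch::run_demo() from main exercises both wrappers; with Boost available, boost::function<bool(int, double)> is written the same way.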

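Dissecting the Boost.Function Source

Now let us look at how Boost defines these classes. Obviously they must be defined in the header the application includes, or in a header nested further inside it. Indeed, <boost/function.hpp> has the contents listed in Figure 2. As we can see, Boost includes the individual headers one by one only for IBM VisualAge C++, whose compiler does not handle file iteration well. Nevertheless I find this one-by-one form the easier one to understand, so I will use it to explain the source.

Figure 2. Excerpt from function.hpp. It defines BOOST_FUNCTION_MAX_ARGS as 10 and includes each functionN.hpp in turn.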

#include <boost/preprocessor/iterate.hpp>
#include <boost/detail/workaround.hpp>

#ifndef BOOST_FUNCTION_MAX_ARGS
# define BOOST_FUNCTION_MAX_ARGS 10
#endif // BOOST_FUNCTION_MAX_ARGS
…
// Visual Age C++ doesn't handle the file iteration well
#if BOOST_WORKAROUND(__IBMCPP__, >= 500)
# if BOOST_FUNCTION_MAX_ARGS >= 0
# include <boost/function/function0.hpp>
# endif
# if BOOST_FUNCTION_MAX_ARGS >= 1
# include <boost/function/function1.hpp>
# endif
# if BOOST_FUNCTION_MAX_ARGS >= 2
# include <boost/function/function2.hpp>
# endif
# if BOOST_FUNCTION_MAX_ARGS >= 3
# include <boost/function/function3.hpp>
# endif
…
# if BOOST_FUNCTION_MAX_ARGS >= 10
# include <boost/function/function10.hpp>
# endif
#else
// What is the '3' for?
# define BOOST_PP_ITERATION_PARAMS_1 (3,(0,BOOST_FUNCTION_MAX_ARGS,<boost/function/detail/function_iterate.hpp>))
# include BOOST_PP_ITERATE()
# undef BOOST_PP_ITERATION_PARAMS_1
#endif

Thus, with the default "MAX_ARGS" value of 10 from Figure 2:

# define BOOST_FUNCTION_MAX_ARGS 10

the application in effect includes all of function0.hpp through function10.hpp. Now let us look at these headers. Figures 3 through 5 list several of them. Each defines its own BOOST_FUNCTION_NUM_ARGS value, and the reader can extrapolate to all the cases in which "NUM_ARGS" ranges from 0 to 10.

Figure 3. Contents of <boost/function/function2.hpp>

#define BOOST_FUNCTION_NUM_ARGS 2
#include <boost/function/detail/maybe_include.hpp>
#undef BOOST_FUNCTION_NUM_ARGS

Figure 4. Contents of <boost/function/function3.hpp>

#define BOOST_FUNCTION_NUM_ARGS 3
#include <boost/function/detail/maybe_include.hpp>
#undef BOOST_FUNCTION_NUM_ARGS

Figure 5. Contents of <boost/function/function4.hpp>

#define BOOST_FUNCTION_NUM_ARGS 4
#include <boost/function/detail/maybe_include.hpp>
#undef BOOST_FUNCTION_NUM_ARGS

From Figures 3 through 5 we can see that each functionN.hpp defines its own "NUM_ARGS" value and then includes maybe_include.hpp, so that value must affect maybe_include.hpp. Indeed, Figure 6 lists the contents of maybe_include.hpp, which, depending on the "NUM_ARGS" value, defines the corresponding BOOST_FUNCTION_N and includes function_template.hpp.

Figure 6. Excerpt from <boost/function/detail/maybe_include.hpp>

#if BOOST_FUNCTION_NUM_ARGS == 0
# ifndef BOOST_FUNCTION_0
# define BOOST_FUNCTION_0
# include <boost/function/function_template.hpp>
# endif
#elif BOOST_FUNCTION_NUM_ARGS == 1
# ifndef BOOST_FUNCTION_1
# define BOOST_FUNCTION_1
# include <boost/function/function_template.hpp>
# endif
#elif BOOST_FUNCTION_NUM_ARGS == 2
# ifndef BOOST_FUNCTION_2
# define BOOST_FUNCTION_2
# include <boost/function/function_template.hpp>
# endif
#elif BOOST_FUNCTION_NUM_ARGS == 3
… (note: and so on, up to BOOST_FUNCTION_NUM_ARGS == 50)
#else
# error Cannot handle Boost.Function objects that accept more than 50 arguments!
#endif

The function_template.hpp header that every case in Figure 6 includes is quite large. For now I want to discuss the part shown in Figure 7, which is the definition of a class template. At first sight the code is hard to read, because many of its symbols cannot be interpreted from a knowledge of C++ alone. Those symbols are all macros, that is, names introduced with #define.

Figure 7. Excerpt from function_template.hpp, showing mainly the class template definition.

template<
    typename R BOOST_FUNCTION_COMMA
    BOOST_FUNCTION_TEMPLATE_PARMS,
    typename Allocator = BOOST_FUNCTION_DEFAULT_ALLOCATOR
>
class BOOST_FUNCTION_FUNCTION : public function_base
{
public:
#ifndef BOOST_NO_VOID_RETURNS
    typedef R result_type;
#else
    typedef typename detail::function::function_return_type<R>::type
        result_type;
#endif // BOOST_NO_VOID_RETURNS

#if BOOST_FUNCTION_NUM_ARGS == 1
    typedef T0 argument_type;
#elif BOOST_FUNCTION_NUM_ARGS == 2
    typedef T0 first_argument_type;
    typedef T1 second_argument_type;
#endif

    BOOST_FUNCTION_FUNCTION() : function_base(), invoker(0) {}
    ~BOOST_FUNCTION_FUNCTION() { clear(); }
    ...
    typedef result_type (*invoker_type)(detail::function::any_pointer
                                        BOOST_FUNCTION_COMMA
                                        BOOST_FUNCTION_TEMPLATE_ARGS);
    invoker_type invoker;
};

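First look at the class name in Figure 7:

class BOOST_FUNCTION_FUNCTION : public function_base

BOOST_FUNCTION_FUNCTION is actually a macro, defined as shown in Figure 8. The BOOST_JOIN it uses is itself a macro, defined as shown in Figure 9.

Figure 8. BOOST_FUNCTION_FUNCTION is a macro defined in function_template.hpp.

#define BOOST_FUNCTION_FUNCTION BOOST_JOIN(function,BOOST_FUNCTION_NUM_ARGS)

Figure 9. BOOST_JOIN is in turn a macro, defined in boost/config/suffix.hpp.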

#define BOOST_JOIN( X, Y ) BOOST_DO_JOIN( X, Y )

#define BOOST_DO_JOIN( X, Y ) BOOST_DO_JOIN2(X,Y)

#define BOOST_DO_JOIN2( X, Y ) X##Y

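You may wonder why Figure 9 does not simply define:

#define BOOST_JOIN( X, Y ) X##Y

and instead goes through helper macros. The reason is that this macro exists to join two arguments into a single token, and we want it to work even when one of the arguments is itself a macro (see section 16.3.1 of the C++ Standard). The key to the idiom is that a macro argument that is an operand of ## is not macro-expanded, so no expansion takes place in BOOST_DO_JOIN2, whereas it does take place in BOOST_DO_JOIN.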
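The difference between the direct and the layered definition can be observed by stringizing the results. The following is a minimal sketch; every DEMO_ macro and the join_demo namespace are hypothetical stand-ins that mirror Figure 9, not Boost's own names.

```cpp
#include <cassert>
#include <cstring>

// Direct pasting: the arguments are operands of ##, so they are NOT expanded.
#define DEMO_NAIVE_JOIN(X, Y) X##Y

// Layered pasting, mirroring the three macros of Figure 9.
#define DEMO_JOIN(X, Y) DEMO_DO_JOIN(X, Y)
#define DEMO_DO_JOIN(X, Y) DEMO_DO_JOIN2(X, Y)
#define DEMO_DO_JOIN2(X, Y) X##Y

// Two-level stringizing, so the joined token is formed before # is applied.
#define DEMO_STR(X) DEMO_STR2(X)
#define DEMO_STR2(X) #X

#define DEMO_NUM_ARGS 2

namespace join_demo {

inline const char* naive() {
    return DEMO_STR(DEMO_NAIVE_JOIN(function, DEMO_NUM_ARGS));
}

inline const char* layered() {
    return DEMO_STR(DEMO_JOIN(function, DEMO_NUM_ARGS));
}

inline void run_demo() {
    // The macro name itself gets pasted when ## is applied directly.
    assert(std::strcmp(naive(), "functionDEMO_NUM_ARGS") == 0);
    // With the extra layers, DEMO_NUM_ARGS becomes 2 before pasting.
    assert(std::strcmp(layered(), "function2") == 0);
}

} // namespace join_demo
```

The extra layer exists only to give the preprocessor a step at which the argument is not an operand of ## and can therefore be expanded.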

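First-Round Summary

We have looked at many headers by now. To recap: if the application includes

#include <boost/function.hpp>

then function.hpp includes headers like the following (with suffixes 0 through 10, eleven in all):

#include <boost/function/function2.hpp>

function2.hpp then gives us:

#define BOOST_FUNCTION_NUM_ARGS 2

and maybe_include.hpp gives us:

# define BOOST_FUNCTION_2

after which function_template.hpp yields the token function2 and the class function2, as in Figure 10 (see Figure 7 for the original form).

Figure 10. Excerpt from the definition of class template function2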

template<
    typename R BOOST_FUNCTION_COMMA
    BOOST_FUNCTION_TEMPLATE_PARMS,
    typename Allocator = BOOST_FUNCTION_DEFAULT_ALLOCATOR
>
class function2 : public function_base
{
public:
    ...
    typedef T0 first_argument_type;
    typedef T1 second_argument_type;

    function2() : function_base(), invoker(0) {}
    ~function2() { clear(); }
    ...
    typedef result_type (*invoker_type)(detail::function::any_pointer
                                        BOOST_FUNCTION_COMMA
                                        BOOST_FUNCTION_TEMPLATE_ARGS);
    invoker_type invoker;
};

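At this point Figure 10 still contains several symbols that seem to come out of nowhere, listed below. We must analyze them to solve the puzzle: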

BOOST_FUNCTION_COMMA

BOOST_FUNCTION_TEMPLATE_PARMS

BOOST_FUNCTION_DEFAULT_ALLOCATOR

BOOST_FUNCTION_TEMPLATE_ARGS

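Further Analysis: The Formidable "Macro Self-Iteration"

I just said that four symbols remain to be deciphered. Let me fix the premise first: the analysis below is for the case in which "NUM_ARGS" is 2. We start with BOOST_FUNCTION_COMMA, defined as in Figure 11, and then analyze BOOST_FUNCTION_DEFAULT_ALLOCATOR, defined as in Figure 12.

Figure 11. BOOST_FUNCTION_COMMA is defined in boost/function/function_template.hpp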

// Comma if nonzero number of arguments
#if BOOST_FUNCTION_NUM_ARGS == 0 // note: the value is currently 2, so the #else branch is taken
# define BOOST_FUNCTION_COMMA
#else
# define BOOST_FUNCTION_COMMA ,
#endif // BOOST_FUNCTION_NUM_ARGS > 0
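To see why the comma has to be conditional, consider the zero-argument case: nothing follows R in the template header, so a hard-coded comma would leave a dangling `,` behind. Below is a minimal sketch of that case; all DEMO_ macros and demo_signature are hypothetical names, and the parameter lists are written by hand rather than generated.

```cpp
#include <type_traits>

#define DEMO_NUM_ARGS 0   // change to 1 to take the other branch

#if DEMO_NUM_ARGS == 0
#  define DEMO_COMMA
#  define DEMO_TEMPLATE_PARMS
#  define DEMO_TEMPLATE_ARGS
#else
#  define DEMO_COMMA ,
#  define DEMO_TEMPLATE_PARMS typename T0
#  define DEMO_TEMPLATE_ARGS T0
#endif

// With DEMO_NUM_ARGS == 0 this reads: template<typename R>
// With DEMO_NUM_ARGS == 1 this reads: template<typename R, typename T0>
template<typename R DEMO_COMMA DEMO_TEMPLATE_PARMS>
struct demo_signature {
    typedef R (*pointer_type)(DEMO_TEMPLATE_ARGS);
};

// In the zero-argument case the pointer type takes no parameters.
static_assert(
    std::is_same<demo_signature<int>::pointer_type, int (*)()>::value,
    "zero-argument case expands without a dangling comma");
```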

Figure 12. BOOST_FUNCTION_DEFAULT_ALLOCATOR is defined in function_template.hpp. The definition tests BOOST_NO_STD_ALLOCATOR, a name that many Boost .hpp files define for particular compiler versions and/or standard library versions. I will not go into those versions here; suffice it to say that VC 6.0 takes the #else branch, so the macro is defined as int.

Note: in boost/function/function_template.hpp

// Type of the default allocator
#ifndef BOOST_NO_STD_ALLOCATOR
# define BOOST_FUNCTION_DEFAULT_ALLOCATOR std::allocator<function_base>
#else
# define BOOST_FUNCTION_DEFAULT_ALLOCATOR int
#endif // BOOST_NO_STD_ALLOCATOR

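Those two long-winded symbols were easy. What follows is less straightforward. Next come BOOST_FUNCTION_TEMPLATE_PARMS and BOOST_FUNCTION_TEMPLATE_ARGS, defined in Figures 13 and 14 respectively. The preprocessing that both macros trigger is very intricate and does not fit in a short article like this one. Broadly, it uses repeated macro self-iteration together with the preprocessor's token-pasting operator (##) to generate, from the "NUM_ARGS" value n, a series of symbols used as template parameters or template arguments. Figure 13 generates typename T0, typename T1, typename T2, ..., typename Tn-1 for use as template parameters, and Figure 14 generates T0, T1, T2, ..., Tn-1 for use as template arguments. With these results in hand, the T0 and T1 that appear in Figure 10 make sense.

Readers who are curious about "macro self-iteration", or who want to see the ## operator in action with their own eyes, can study Boost's enum_params.hpp, config.hpp, and repeat.hpp.

Figure 13. BOOST_FUNCTION_TEMPLATE_PARMS is defined in function_template.hpp

#define BOOST_FUNCTION_TEMPLATE_PARMS BOOST_PP_ENUM_PARAMS(BOOST_FUNCTION_NUM_ARGS, typename T)

Note 1: With BOOST_FUNCTION_NUM_ARGS equal to n, this generates typename T0, typename T1, typename T2, ..., typename Tn-1

Note 2: There is also BOOST_FUNCTION_PARMS, which generates T0 a0, T1 a1, T2 a2, ..., Tn-1 an-1 from n

Figure 14. BOOST_FUNCTION_TEMPLATE_ARGS is defined in function_template.hpp

#define BOOST_FUNCTION_TEMPLATE_ARGS BOOST_PP_ENUM_PARAMS(BOOST_FUNCTION_NUM_ARGS, T)

Note 1: With BOOST_FUNCTION_NUM_ARGS equal to n, this generates T0, T1, T2, ..., Tn-1

Note 2: There is also BOOST_FUNCTION_ARGS, which generates a0, a1, a2, ..., an-1 from n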
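The real BOOST_PP_ENUM_PARAMS is built on a general repetition mechanism that is too long to reproduce. The sketch below imitates only its visible effect for n up to 3 by writing one helper macro per count, which is a deliberate simplification and not how Boost.Preprocessor is implemented. All DEMO_ names are hypothetical.

```cpp
#include <type_traits>

// Layered token pasting, in the same spirit as BOOST_JOIN.
#define DEMO_CAT(a, b) DEMO_CAT_I(a, b)
#define DEMO_CAT_I(a, b) a##b

// One hand-written helper per count; p##0 pastes the digit onto the LAST
// token of p, so "typename T" becomes "typename T0".
#define DEMO_ENUM_PARAMS_1(p) p##0
#define DEMO_ENUM_PARAMS_2(p) p##0, p##1
#define DEMO_ENUM_PARAMS_3(p) p##0, p##1, p##2

// Select the helper by pasting the (already expanded) count onto its name.
#define DEMO_ENUM_PARAMS(n, p) DEMO_CAT(DEMO_ENUM_PARAMS_, n)(p)

#define DEMO_NUM_ARGS 2
#define DEMO_TEMPLATE_PARMS DEMO_ENUM_PARAMS(DEMO_NUM_ARGS, typename T)
#define DEMO_TEMPLATE_ARGS  DEMO_ENUM_PARAMS(DEMO_NUM_ARGS, T)

// Expands to: template<typename R, typename T0, typename T1>
//             struct demo_function2 { typedef R (*pointer_type)(T0, T1); };
template<typename R, DEMO_TEMPLATE_PARMS>
struct DEMO_CAT(demo_function, DEMO_NUM_ARGS) {
    typedef R (*pointer_type)(DEMO_TEMPLATE_ARGS);
};

static_assert(
    std::is_same<demo_function2<bool, int, double>::pointer_type,
                 bool (*)(int, double)>::value,
    "the generated parameter lists are T0, T1");
```

Changing DEMO_NUM_ARGS to 3 would produce demo_function3 with three parameters from the same template text, which is the effect the article attributes to repeated inclusion of function_template.hpp.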

Second-Round Summary

At this point, assuming BOOST_FUNCTION_NUM_ARGS is 2, the prototype in Figure 7 finally becomes the class function2 of Figure 15.

Figure 15. The class template in function_template.hpp after preprocessing (when the target function takes two arguments).

template<
    typename R,
    typename T0, typename T1,
    typename Allocator = int
>
class function2 : public function_base
{
public:
    typedef R result_type; // note: this line is analyzed later
    typedef T0 first_argument_type;
    typedef T1 second_argument_type;

    function2() : function_base(), invoker(0) {}
    ~function2() { clear(); }
    …
    typedef result_type (*invoker_type)(detail::function::any_pointer,
                                        T0, T1);
    invoker_type invoker;
};

So when the user writes:

function2<bool, int, double> f2;

R, T0, and T1 in Figure 15 are replaced by the supplied template arguments, which yields these definitions:

typedef bool result_type;

typedef int first_argument_type;

typedef double second_argument_type;
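These substitutions can be checked at compile time. The sketch below hand-writes a reduced version of the class in Figure 15; function2_sketch and function_base_stub are hypothetical stand-ins, and everything except the three typedefs is omitted.

```cpp
#include <type_traits>

namespace typedef_demo {

struct function_base_stub {};  // hypothetical stand-in for function_base

// Reduced copy of the Figure 15 shape: only the nested typedefs are kept.
template<typename R, typename T0, typename T1, typename Allocator = int>
class function2_sketch : public function_base_stub {
public:
    typedef R  result_type;
    typedef T0 first_argument_type;
    typedef T1 second_argument_type;
};

typedef function2_sketch<bool, int, double> f2_type;

// Each nested typedef is simply the corresponding template argument.
static_assert(std::is_same<f2_type::result_type, bool>::value,
              "R is bool");
static_assert(std::is_same<f2_type::first_argument_type, int>::value,
              "T0 is int");
static_assert(std::is_same<f2_type::second_argument_type, double>::value,
              "T1 is double");

} // namespace typedef_demo
```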

Having come this far, I realize I skipped one passage, namely the first conditional-compilation block in Figure 7:

#ifndef BOOST_NO_VOID_RETURNS

typedef R result_type;

#else

typedef typename

detail::function::

function_return_type<R>::type

result_type;

#endif // BOOST_NO_VOID_RETURNS

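Here the preprocessor checks whether BOOST_NO_VOID_RETURNS has been defined, and decides accordingly how to define result_type. R is the template argument supplied by the user and stands for the return type of the target function, so using R as result_type is intuitive. But some compilers do not support void as a return type (do such compilers still exist? I doubt it), or do not support partial specialization (VC6, for example), which the template machinery here would otherwise rely on. Boost therefore provides an alternative, shown in Figure 16. Combined with the typedef above, it means: if R is not void, result_type is defined as R; if R is void, result_type is defined as struct unusable.

Figure 16. The structs in function_base.hpp that determine the return type of the target function. The accompanying comments are well worth reading, so they are included.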

/**
 * The unusable class is a placeholder for unused function arguments
 * It is also completely unusable except that it constructable from
 * anything. This helps compilers without partial specialization to
 * handle Boost.Function objects returning void.
 */
struct unusable
{
    unusable() {}
    template<typename T> unusable(const T&) {}
};

/* Determine the return type. This supports compilers that do not support
 * void returns or partial specialization by silently changing the return
 * type to "unusable".
 */
template<typename T> struct function_return_type { typedef T type; };

template<>
struct function_return_type<void>
{
    typedef unusable type;
};
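The mapping in Figure 16 relies only on full specialization, which is why it works on compilers that lack partial specialization. Here is a minimal, Boost-free sketch of the same idea; the names ending in _sketch and the helper functions are hypothetical.

```cpp
#include <type_traits>

namespace return_type_demo {

// Constructible from anything, like the "unusable" struct in Figure 16.
struct unusable_sketch {
    unusable_sketch() {}
    template<typename T> unusable_sketch(const T&) {}
};

// Primary template: the return type passes through unchanged.
template<typename T> struct return_type_sketch { typedef T type; };

// Full specialization: void is silently replaced by the placeholder.
template<> struct return_type_sketch<void> { typedef unusable_sketch type; };

inline void do_nothing() {}

// A wrapper around a void target can now be written with an ordinary
// "return <value>;" statement, because its result type is a real object type.
inline return_type_sketch<void>::type call_void_target() {
    do_nothing();
    return unusable_sketch();
}

static_assert(std::is_same<return_type_sketch<int>::type, int>::value,
              "non-void types are left alone");
static_assert(std::is_same<return_type_sketch<void>::type,
                           unusable_sketch>::value,
              "void maps to the placeholder");

} // namespace return_type_demo
```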

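Observing the Preprocessed Result First-Hand

Examining the preprocessed output shows that simply including <boost/function.hpp> yields the definitions of eleven class templates, function0 through function10. Figures 17 through 20 show excerpts of four of them (as produced with VC 6.0). How does one observe the preprocessed output? With VC, for example, just add the compiler option /P, and you get a file with the extension .i (plain text, and very large). That is the output.

Figure 17. After the preprocessor has done its work, we get class function0.

template<
    typename R ,
    typename Allocator = int
>
class function0 : public function_base
{
public:
    typedef typename detail::function::function_return_type<R>::type
        result_type;
    ...
    typedef result_type (*invoker_type)(detail::function::any_pointer);
    invoker_type invoker;
};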

Figure 18. After the preprocessor has done its work, we get class function1.

template<
    typename R ,
    typename T0 ,
    typename Allocator = int
>
class function1 : public function_base
{
public:
    typedef typename detail::function::function_return_type<R>::type
        result_type;
    typedef T0 argument_type;
    ...
    typedef result_type (*invoker_type)(detail::function::any_pointer ,
                                        T0);
    invoker_type invoker;
};

Figure 19. After the preprocessor has done its work, we get class function2.

template<
    typename R ,
    typename T0 , typename T1,
    typename Allocator = int
>
class function2 : public function_base
{
public:
    typedef typename detail::function::function_return_type<R>::type
        result_type;
    typedef T0 first_argument_type;
    typedef T1 second_argument_type;
    ...
    typedef result_type (*invoker_type)(detail::function::any_pointer ,
                                        T0,T1);
    invoker_type invoker;
};

Figure 20. After the preprocessor has done its work, we get class function10.

template<
    typename R ,
    typename T0 , typename T1 , typename T2 , typename T3 , typename T4 ,
    typename T5 , typename T6 , typename T7 , typename T8 , typename T9,
    typename Allocator = int
>
class function10 : public function_base
{
public:
    typedef typename detail::function::function_return_type<R>::type
        result_type;
    typedef T0 arg1_type; typedef T1 arg2_type; typedef T2 arg3_type;
    typedef T3 arg4_type; typedef T4 arg5_type; typedef T5 arg6_type;
    typedef T6 arg7_type; typedef T7 arg8_type; typedef T8 arg9_type;
    typedef T9 arg10_type;
    ...
    typedef result_type (*invoker_type)(detail::function::any_pointer ,
                                        T0,T1,T2,T3,T4,T5,T6,T7,T8,T9);
    invoker_type invoker;
};

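How Boost.Function Runs

Having dealt with the "template of a template" in Figure 7, we can finally look at what code is triggered when an application uses Boost.Function. Note that the code shown below is expressed as code generated from the prototype in Figure 7. It does not necessarily appear in Figure 10 or Figure 15, since those two figures are only excerpts.

Take these three lines from Figure 1: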

function2<bool, int, double> f2;

f2 = &func2;

cout << f2(10, 1.1) << endl;

The first line invokes the function2 constructor:

function2() : function_base(), invoker(0) {}

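It first invokes the constructor of the base class function_base and then sets the member variable invoker to 0. invoker is a function pointer of the following type: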

typedef result_type (*invoker_type)(detail::

function::any_pointer, T0,T1);

invoker_type invoker;

After substituting the symbols, this can be written simply as:

typedef bool (*invoker_type)(any_pointer, int,

double);

invoker_type invoker;

where any_pointer is defined as:

union any_pointer

{

void* obj_ptr;

const void* const_obj_ptr;

void (*func_ptr)();

char data[1];

};

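The reason for defining a union made up of void pointers and a function pointer is this: C++ Standard 5.2.10/6 says that reinterpret_cast can safely convert between function pointer types, and C++ Standard 5.2.9/10 says that static_cast can safely convert void* to an object pointer type. Conversion between a function pointer and void* (in either direction), however, is not legal, so a union is needed to hold either kind. The three functions in Figure 21 each build an any_pointer from one of the three kinds of member.

Figure 21. Three functions, each building an any_pointer from a different kind of member

inline any_pointer make_any_pointer(void* o)
{
    any_pointer p;
    p.obj_ptr = o;
    return p;
}

inline any_pointer make_any_pointer(const void* o)
{
    any_pointer p;
    p.const_obj_ptr = o;
    return p;
}

inline any_pointer make_any_pointer(void (*f)())
{
    any_pointer p;
    p.func_ptr = f;
    return p;
}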
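To make the round trip concrete, the sketch below stores a function pointer in such a union and recovers it with reinterpret_cast before calling it. It is an illustration under my own assumptions, not Boost's code: any_pointer_sketch, make_any_pointer_sketch, times_five, and call_through_union are hypothetical names.

```cpp
#include <cassert>

namespace any_pointer_demo {

// Same shape as the union shown earlier, minus the char array.
union any_pointer_sketch {
    void* obj_ptr;
    const void* const_obj_ptr;
    void (*func_ptr)();
};

inline any_pointer_sketch make_any_pointer_sketch(void (*f)()) {
    any_pointer_sketch p;
    p.func_ptr = f;
    return p;
}

inline int times_five(int i) { return i * 5; }

inline int call_through_union(int arg) {
    // Function pointer to generic function pointer: permitted by
    // reinterpret_cast, and no void* is involved at any point.
    any_pointer_sketch p =
        make_any_pointer_sketch(reinterpret_cast<void (*)()>(&times_five));

    // Convert back to the ORIGINAL type before calling; calling through
    // any other function type would be undefined behavior.
    int (*f)(int) = reinterpret_cast<int (*)(int)>(p.func_ptr);
    return f(arg);
}

inline void run_demo() {
    assert(call_through_union(10) == 50);
}

} // namespace any_pointer_demo
```

The important point is that whoever stores the pointer must also remember its real type, which is the job of the invoker discussed later.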

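Next look at function_base (excerpted in Figure 22). It is the common base class of function0 through function10 and has two member variables, manager and functor. The former is a function pointer and the latter is an any_pointer. Its constructor simply sets both members to 0.

Figure 22. Excerpt from function_base (taken from function_base.hpp)

/*
 * This class contains the basic elements needed by the function1,
 * function2, function3, etc. classes.
 * It is the common base class of all the functions.
 */
class function_base
{
public:
    function_base() : manager(0) { functor.obj_ptr = 0; }
    bool empty() const { return !manager; }

public:
    detail::function::any_pointer (*manager)(
        detail::function::any_pointer,
        detail::function::functor_manager_operation_type);
    detail::function::any_pointer functor;
    ...
};

Figure 23. The functor_manager_operation_type that appears in Figure 22, as defined in function_base.hpp. Since no explicit values are given, its three enumerators have the values 0, 1, and 2.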

enum functor_manager_operation_type {

clone_functor_tag,

destroy_functor_tag,

check_functor_type_tag

};

Back in the application, when the assignment is executed:

function2<bool, int, double> f2;

f2 = &func2;

cout << f2(10, 1.1) << endl;

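function2's assignment operator (operator=) is invoked, which stores the callable entity on the right-hand side of the assignment into the appropriate member of the functor member variable of the function2 object: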

template<typename Functor>

function2&

operator=(Functor const & f)

{

function2(f).swap(*this);

// the details here are very involved, so I will not elaborate

return *this;

}
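The one-liner function2(f).swap(*this) is the copy-and-swap idiom: build a temporary from the new target, exchange its contents with *this, and let the temporary's destructor clean up the old state. The sketch below shows the idiom in isolation; callable_slot is a hypothetical class that holds a single int(int) function pointer and leaves out everything Boost does to manage functors.

```cpp
#include <cassert>
#include <utility>

namespace swap_demo {

class callable_slot {
public:
    typedef int (*target_type)(int);

    callable_slot() : target_(0) {}
    explicit callable_slot(target_type f) : target_(f) {}

    void swap(callable_slot& other) { std::swap(target_, other.target_); }

    // Same shape as the operator= above: construct a temporary from f,
    // then swap it into place. If construction threw, *this would be
    // left untouched.
    callable_slot& operator=(target_type f) {
        callable_slot(f).swap(*this);
        return *this;
    }

    bool empty() const { return target_ == 0; }
    int operator()(int x) const { return target_(x); }

private:
    target_type target_;
};

inline int times_five(int i) { return i * 5; }

inline void run_demo() {
    callable_slot slot;
    assert(slot.empty());
    slot = &times_five;
    assert(!slot.empty());
    assert(slot(10) == 50);
}

} // namespace swap_demo
```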

And when the application makes the call:

function2<bool, int, double> f2;

f2 = &func2;

cout << f2(10, 1.1) << endl;

function2's function call operator (operator()) is invoked:

bool

operator()(int a0, double a1) const

{

if (this->empty())

boost::throw_exception(bad_function_call());

return invoker(this->functor, a0, a1);

}

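Always remember that invoker is a member variable of function2 (see Figure 19, for example). It is a function pointer whose first parameter is an any_pointer and whose remaining parameters are the arguments needed to call what that any_pointer refers to.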
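Putting the pieces together, the following is a heavily reduced sketch of how a functor slot and an invoker cooperate. It is my own simplification, not Boost's implementation: it accepts only plain function pointers, it has no manager, it tests emptiness through invoker, and it throws std::runtime_error where Boost throws bad_function_call. All names in the demo_fn namespace are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

namespace demo_fn {

union any_pointer {
    void* obj_ptr;
    void (*func_ptr)();
};

// The invoker knows the real function pointer type, so it can cast the
// erased pointer back and perform the call.
template<typename FunctionPtr, typename R, typename T0, typename T1>
struct function_invoker2 {
    static R invoke(any_pointer p, T0 a0, T1 a1) {
        assert(p.func_ptr != 0);
        FunctionPtr f = reinterpret_cast<FunctionPtr>(p.func_ptr);
        return f(a0, a1);
    }
};

template<typename R, typename T0, typename T1>
class mini_function2 {
public:
    typedef R result_type;
    typedef R (*invoker_type)(any_pointer, T0, T1);

    mini_function2() : invoker(0) { functor.obj_ptr = 0; }

    // Store the erased target and remember the matching invoker.
    mini_function2& operator=(R (*f)(T0, T1)) {
        functor.func_ptr = reinterpret_cast<void (*)()>(f);
        invoker = &function_invoker2<R (*)(T0, T1), R, T0, T1>::invoke;
        return *this;
    }

    bool empty() const { return invoker == 0; }

    R operator()(T0 a0, T1 a1) const {
        if (empty())
            throw std::runtime_error("call through empty mini_function2");
        return invoker(functor, a0, a1);
    }

private:
    any_pointer functor;
    invoker_type invoker;
};

inline bool greater_than(int i, double d) { return i > d; }

} // namespace demo_fn
```

A mini_function2<bool, int, double> assigned &demo_fn::greater_than can then be called as f2(10, 1.1), following the same path as the operator() shown above: check for emptiness, then forward the stored functor and the arguments to invoker.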

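Conclusion

This article has not covered every detail of the Boost.Function source, because there are simply too many of them. I have focused mainly on untangling the "macro self-iteration". With that foundation, you should be able to study the remaining details on your own.

This Boost series has run for a year, during which I covered seven topics. I have not yet met all the goals announced in the first article, but the remaining topics (Serialization, for example) are large and take a long time to organize. I dare not estimate a schedule, so each will be published on its own once it is finished. In other words, this article brings the Boost series to a close.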

0 0