Minimize compilation dependencies between files

Source: Internet | Editor: 程序博客网 | Date: 2024/05/20 03:07

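Suppose you go into a C++ program and make a small change to the implementation of a class. Not the class's interface, just its implementation: only the private parts. You then rebuild the program, expecting the job to take a few seconds. After all, only one class changed. You click Build or type make (or something equivalent), and you are first astonished, then dismayed, as you realize that the whole world is being recompiled and relinked!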

The problem is that C++ does a poor job of separating interfaces from implementations. A class definition specifies not only a class interface but also a fair number of implementation details. For example:

class Person {
public:
      Person(const std::string& name, const Date& birthday,
             const Address& addr);
      std::string name() const;
      std::string birthDate() const;
      std::string address() const;

private:
      std::string theName;        // implementation detail
      Date theBirthDate;          // implementation detail
      Address theAddress;         // implementation detail
};

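Here, class Person cannot be compiled without access to the definitions of the classes its implementation uses, namely string, Date, and Address. Such definitions are typically provided through #include directives, so the file that defines the Person class is likely to contain something like this: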

#include <string>
#include "date.h"
#include "address.h"

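Unfortunately, this sets up a compilation dependency between the file defining Person and these header files. If any of these headers changes, or if any file they depend on changes, the file containing the Person class must be recompiled, and so must every file that uses Person. Such cascading compilation dependencies cause no end of trouble.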

Why does C++ insist on putting a class's implementation details in the class definition? For example, why can't Person be defined like this, with the implementation details of the class specified separately?

namespace std {
      class string;             // forward declaration (an incorrect
}                               // one; see below)

class Date;                     // forward declaration
class Address;                  // forward declaration

class Person {
public:
      Person(const std::string& name, const Date& birthday,
                 const Address& addr);
      std::string name() const;
      std::string birthDate() const;
      std::string address() const;
    ...
};

If that were possible, clients of Person would have to recompile only when the interface of the class changed.

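There are two problems with this idea. First, string is not a class; it is a typedef (for basic_string<char>). As a result, the forward declaration of string is incorrect. The correct forward declaration is considerably more complex, because it involves additional templates. That hardly matters, though, because you should not try to declare parts of the standard library by hand. Instead, simply use the proper #includes and be done with it. Standard headers are unlikely to be a compilation bottleneck, especially if your build environment lets you take advantage of precompiled headers. If parsing standard headers really does become a problem, you may need to change your interface design to avoid the standard library components that bring in the unwanted #includes.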
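The claim that string is a typedef rather than a class can be checked directly. The sketch below assumes a C++11 or later compiler (for static_assert and <type_traits>); the names FullySpelledString, stringIsJustAnAlias, and checkStringAlias are invented for illustration and do not come from the text.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// std::string is an alias for a class template specialization, so
// "class string;" inside namespace std does not declare the right thing.
static_assert(std::is_same<std::string, std::basic_string<char> >::value,
              "std::string is basic_string<char>");

// basic_string takes two further template parameters with default arguments,
// which a hand-written forward declaration would also have to get right.
typedef std::basic_string<char, std::char_traits<char>, std::allocator<char> >
    FullySpelledString;   // hypothetical name, for illustration only

bool stringIsJustAnAlias()
{
    return std::is_same<std::string, FullySpelledString>::value;
}

void checkStringAlias()
{
    std::string s = "pimpl";
    FullySpelledString same = s;   // no conversion: both names denote one type
    assert(same == "pimpl");
    assert(stringIsJustAnAlias());
}
```

Getting all of those template parameters right by hand is exactly the kind of work that including <string> makes unnecessary.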

The second (and more significant) difficulty with forward-declaring everything is that compilers need to know the size of objects during compilation. Consider:

int main()
{
      int x;                // define an int

      Person p( params );   // define a Person
}

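When compilers see the definition of x, they know they must allocate enough space (typically on the stack) to hold an int. No problem: every compiler knows how big an int is. When compilers see the definition of p, they know they must allocate enough space for a Person, but how are they supposed to work out how big a Person object is? The only way to get that information is to consult the class definition, but if it were legal for a class definition to omit the implementation details, how would compilers know how much space to allocate?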
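A minimal sketch of the size problem, using only a declaration of Person. The helper names sizeOfPersonPointer and checkPointerSize are hypothetical, and the lines that would fail to compile are left commented out so that the block builds as written.

```cpp
#include <cassert>
#include <cstddef>

class Person;   // declaration only: Person is an incomplete type from here on

// Fine: a pointer has a known size even though Person's layout is unknown.
std::size_t sizeOfPersonPointer() { return sizeof(Person*); }

// Would not compile if uncommented: both need the size of a Person,
// which only the class definition can supply.
// std::size_t sizeOfPerson() { return sizeof(Person); }
// void makeOne() { Person p; }

void checkPointerSize()
{
    Person* p = nullptr;                 // defining a pointer needs no definition
    assert(p == nullptr);
    assert(sizeOfPersonPointer() > 0);
}
```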

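This question does not arise in languages such as Smalltalk and Java because, when an object is defined in those languages, compilers allocate only enough space for a pointer to an object. That is, they handle the code above as if it had been written like this: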

int main()
{
      int x;                // define an int

      Person *p;            // define a pointer to a Person
      ...
}

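This, of course, is legal C++, so you can play the "hide the object implementation behind a pointer" game yourself. One way to do that for Person is to split it into two classes, one offering only an interface, the other implementing that interface. If the implementation class is named PersonImpl, Person would be defined like this: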

#include <string>                      // standard library components
                                       // shouldn't be forward-declared

#include <memory>                      // for tr1::shared_ptr; see below

class PersonImpl;                      // forward decl of Person impl. class
class Date;                            // forward decls of classes used in
class Address;                         // Person interface

class Person {
public:
      Person(const std::string& name, const Date& birthday,
             const Address& addr);
      std::string name() const;
      std::string birthDate() const;
      std::string address() const;
      ...

private:                                       // ptr to implementation;
      std::tr1::shared_ptr<PersonImpl> pImpl;  // see Item 13 for info on
};                                             // std::tr1::shared_ptr

Here, the main class (Person) contains no data member other than a pointer (here, a tr1::shared_ptr) to its implementation class (PersonImpl). Such a design is often said to use the pimpl idiom ("pointer to implementation"). In such classes, the pointer is often named pImpl, as it is above.

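With this design, clients of Person are insulated from the details of dates, addresses, and persons. The implementations of those classes can be changed at will, yet Person's clients need not recompile. In addition, because they cannot see the details of Person's implementation, clients are unlikely to write code that somehow depends on those details. This is a true separation of interface and implementation.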

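The key to this separation is replacing dependencies on definitions with dependencies on declarations. That is the essence of minimizing compilation dependencies: make your header files self-sufficient whenever that is practical, and when it is not, depend on declarations in other files, not definitions. Everything else follows from this simple design strategy. Hence: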

Avoid using objects when object references and pointers will do. You can define references and pointers to a type with only a declaration of that type. Defining objects of a type requires the type's definition.

Depend on class declarations instead of class definitions whenever you can. Note that you never need a class definition to declare a function that uses that class, not even if the function passes or returns the class type by value:

class Date; // class declaration

Date today(); // fine — no definition
void clearAppointments(Date d); // of Date is needed

Of course, pass-by-value is generally a bad idea, but if you find yourself using it for some reason, that is still no justification for introducing unnecessary compilation dependencies.

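The ability to declare today and clearAppointments without defining Date may surprise you, but it is not as odd as it seems. The definition of Date must be visible before a call only if somebody actually calls those functions. Why bother to declare functions that nobody calls, you may wonder? Simple. It is not that nobody calls them; it is that not everybody calls them. If you have a library containing many function declarations, it is unlikely that every client calls every function. By moving the responsibility for providing class definitions from your header file of function declarations to the clients' files that contain the function calls, you eliminate artificial client dependencies on type definitions they do not really need.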
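The sequence described above can be sketched in a single file: the functions are declared while Date is still incomplete, and the definition only has to appear before the first call. The members of Date, the value 42, and the names clearedDay and callBoth are assumptions made for illustration; the text never shows what Date contains.

```cpp
#include <cassert>

class Date;                          // class declaration only

Date today();                        // fine: no definition of Date is needed
void clearAppointments(Date d);      // to declare these functions

// A caller needs the definition, so it appears before the first call.
// This Date is a hypothetical stand-in for the real class.
class Date {
public:
    explicit Date(int dayNumber) : day(dayNumber) {}
    int dayNumber() const { return day; }
private:
    int day;
};

Date today() { return Date(42); }    // 42 is an arbitrary illustrative value

int clearedDay = 0;                  // records the last date passed in
void clearAppointments(Date d) { clearedDay = d.dayNumber(); }

void callBoth()
{
    clearAppointments(today());      // Date is complete here, so the call compiles
    assert(clearedDay == 42);
}
```

In a real project the two declarations would sit in a header, and only the source files that call them would include the definition of Date.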

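Provide separate header files for declarations and definitions. To make it easier to follow the guidelines above, header files need to come in pairs: one for declarations, the other for definitions. These files must be kept consistent, of course. If a declaration is changed in one place, it must be changed in both. As a result, library clients should always #include a declaration file instead of forward-declaring something themselves, and library authors should provide both header files. For example, a client of Date that wants to declare today and clearAppointments should not manually forward-declare Date as shown above. Rather, it should #include the appropriate header of declarations: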

#include "datefwd.h" // header file declaring (but not
// defining) class Date

Date today(); // as before
void clearAppointments(Date d);

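The name of the declaration-only header file "datefwd.h" is based on the header <iosfwd> from the standard C++ library. <iosfwd> contains declarations of iostream components whose corresponding definitions are in several different headers, including <sstream>, <streambuf>, <fstream>, and <iostream>.

<iosfwd> is instructive for another reason: it shows that the advice in this Item applies to templates as well as to non-templates. Although many build environments require template definitions to be in header files, some allow them to be in non-header files, so it still makes sense to provide declaration-only headers for templates. <iosfwd> is one such header.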

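C++ also offers the export keyword to allow template declarations to be separated from template definitions. Unfortunately, compilers that support export are rare, and real-world experience with export is rarer still. As a result, it is too early to say what role export will play in effective C++ programming. (Exported templates were later removed from the language in C++11.)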
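To make the declaration-only header idea concrete, here is a small sketch built around <iosfwd>. The function printGreeting, the helper greetingFor, and the file names in the comments are hypothetical; the point is only that the declaration needs nothing more than <iosfwd>, while the code that actually writes to the stream needs the full definition.

```cpp
#include <cassert>
#include <iosfwd>     // declarations only: enough to name std::ostream
#include <ostream>    // the definition, needed only where the stream is used
#include <sstream>
#include <string>

// ---- greeting.h (hypothetical): <iosfwd> and <string> would suffice here ----
void printGreeting(std::ostream& out, const std::string& who);

// ---- greeting.cpp (hypothetical): the full stream definition is needed ----
void printGreeting(std::ostream& out, const std::string& who)
{
    out << "Hello, " << who;
}

// ---- a client that actually calls the function ----
std::string greetingFor(const std::string& who)
{
    std::ostringstream buffer;
    printGreeting(buffer, who);
    return buffer.str();
}

void checkGreeting()
{
    assert(greetingFor("Ada") == "Hello, Ada");
}
```

Clients that merely pass a std::ostream& along, without writing to it, would compile against the header alone.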

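Classes like Person that employ the pimpl idiom are often called Handle classes. Lest you wonder how such classes actually do anything, one approach is to forward all their function calls to the corresponding implementation classes and have those classes do the real work. For example, here is how two of Person's member functions could be implemented: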

#include "Person.h" // we're implementing the Person class,
// so we must #include its class definition

#include "PersonImpl.h"      // we must also #include PersonImpl's class
                             // definition, otherwise we couldn't call
                             // its member functions; note that
                             // PersonImpl has exactly the same
                             // member functions as Person — their
                             // interfaces are identical

Person::Person(const std::string& name, const Date& birthday,
               const Address& addr)
: pImpl(new PersonImpl(name, birthday, addr))
{}

std::string Person::name() const
{
      return pImpl->name();
}

Note how the Person constructor calls the PersonImpl constructor (by using new), and how Person::name calls PersonImpl::name. This is important. Making Person a Handle class does not change what Person does; it only changes the way it does it.
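Putting the pieces together, the following single-file sketch shows one way the Handle class could look end to end. It swaps in std::shared_ptr and std::make_shared (the C++11 standard successors to tr1::shared_ptr and the explicit new), and the Date and Address classes, including their toString members, are hypothetical stand-ins because the text never defines them. The comments mark which file each part would plausibly live in.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- Person.h: everything a client needs to see ----
class PersonImpl;   // forward declarations only
class Date;
class Address;

class Person {
public:
    Person(const std::string& name, const Date& birthday, const Address& addr);
    std::string name() const;
    std::string birthDate() const;
    std::string address() const;
private:
    std::shared_ptr<PersonImpl> pImpl;   // the only data member
};

// ---- date.h / address.h: hypothetical stand-ins for the real classes ----
class Date {
public:
    explicit Date(const std::string& text) : text_(text) {}
    std::string toString() const { return text_; }
private:
    std::string text_;
};

class Address {
public:
    explicit Address(const std::string& text) : text_(text) {}
    std::string toString() const { return text_; }
private:
    std::string text_;
};

// ---- PersonImpl.h: same interface as Person, plus the real data ----
class PersonImpl {
public:
    PersonImpl(const std::string& name, const Date& birthday, const Address& addr)
        : theName(name), theBirthDate(birthday), theAddress(addr) {}
    std::string name() const { return theName; }
    std::string birthDate() const { return theBirthDate.toString(); }
    std::string address() const { return theAddress.toString(); }
private:
    std::string theName;
    Date theBirthDate;
    Address theAddress;
};

// ---- Person.cpp: every call is forwarded to the implementation ----
Person::Person(const std::string& name, const Date& birthday, const Address& addr)
    : pImpl(std::make_shared<PersonImpl>(name, birthday, addr)) {}

std::string Person::name() const { return pImpl->name(); }
std::string Person::birthDate() const { return pImpl->birthDate(); }
std::string Person::address() const { return pImpl->address(); }

// ---- client code ----
void usePerson()
{
    Person p("Ada", Date("1815-12-10"), Address("London"));
    assert(p.name() == "Ada");
    assert(p.birthDate() == "1815-12-10");
    assert(p.address() == "London");
}
```

If the data members of PersonImpl change, only the parts marked PersonImpl.h and Person.cpp need to be recompiled; the part marked Person.h, which is all that clients include, stays the same.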

An alternative to the Handle class approach is to make Person a special kind of abstract base class called an Interface class. The purpose of such a class is to specify an interface for derived classes. As a result, it typically has no data members, no constructors, a virtual destructor, and a set of pure virtual functions that specify the interface.

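Interface classes are akin to Java's and .NET's Interfaces, but C++ does not impose the restrictions on Interface classes that Java and .NET impose on Interfaces. For example, neither Java nor .NET traditionally allowed data members or function implementations in Interfaces, but C++ forbids neither. C++'s greater flexibility can be useful. The implementation of non-virtual functions should be the same for all classes in a hierarchy, so it makes sense to implement such functions as part of the Interface class that declares them.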

An Interface class for Person could look like this:

class Person {
public:
      virtual ~Person();

      virtual std::string name() const = 0;
      virtual std::string birthDate() const = 0;
      virtual std::string address() const = 0;
      ...
};

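Clients of this class must program in terms of Person pointers and references, because it is not possible to instantiate classes containing pure virtual functions. (It is, however, possible to instantiate classes derived from Person.) Like clients of Handle classes, clients of Interface classes need not recompile unless the Interface class's interface is modified.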
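A short sketch of what "programming in terms of pointers and references" means in practice. It assumes C++11 or later; the interface is trimmed to one function for brevity, and NamedPerson, describe, and useInterface are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class Person {                           // Interface class, trimmed to one function
public:
    virtual ~Person() {}
    virtual std::string name() const = 0;
};

class NamedPerson : public Person {      // hypothetical concrete derived class
public:
    explicit NamedPerson(const std::string& n) : theName(n) {}
    std::string name() const override { return theName; }
private:
    std::string theName;
};

static_assert(std::is_abstract<Person>::value,
              "Person cannot be instantiated");
static_assert(!std::is_abstract<NamedPerson>::value,
              "NamedPerson can be instantiated");

// Clients see only the interface: they work through references or pointers.
std::string describe(const Person& p) { return "name: " + p.name(); }

void useInterface()
{
    // Person p;                         // would not compile: Person is abstract
    NamedPerson concrete("Ada");
    assert(describe(concrete) == "name: Ada");
}
```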

Clients of an Interface class must have a way to create new objects. They typically do so by calling a function that plays the role of the constructor for the derived classes that are actually instantiated. Such functions are usually called factory functions or virtual constructors. They return pointers (preferably smart pointers) to dynamically allocated objects that support the Interface class's interface. Such functions are often declared static inside the Interface class:

class Person {
public:
      static std::tr1::shared_ptr<Person>    // return a tr1::shared_ptr to a new
         create(const std::string& name,      // Person initialized with the
                const Date& birthday,         // given params; see Item 18 for
                const Address& addr);         // why a tr1::shared_ptr is returned
};

Clients use them like this:

std::string name;
Date dateOfBirth;
Address address;

// create an object supporting the Person interface
std::tr1::shared_ptr<Person> pp(Person::create(name, dateOfBirth, address));

std::cout << pp->name()                 // use the object via the
          << " was born on "            // Person interface
          << pp->birthDate()
          << " and now lives at "
          << pp->address();
// the object is automatically deleted when pp goes out of scope

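At some point, of course, concrete classes supporting the Interface class's interface must be defined and real constructors must be called. That all happens behind the scenes, inside the files containing the implementations of the virtual constructors. For example, the Interface class Person might have a concrete derived class RealPerson that provides implementations for the virtual functions it inherits: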

class RealPerson: public Person {
public:
      RealPerson(const std::string& name, const Date& birthday,
                 const Address& addr)
      : theName(name), theBirthDate(birthday), theAddress(addr)
      {}

      virtual ~RealPerson() {}

      std::string name() const;        // implementations of these
      std::string birthDate() const;   // functions are not shown, but
      std::string address() const;     // they are easy to imagine

private:
      std::string theName;
      Date theBirthDate;
      Address theAddress;
};

Given RealPerson, writing Person::create is trivial:

std::tr1::shared_ptr<Person> Person::create(const std::string& name,
                                            const Date& birthday,
                                            const Address& addr)
{
      return std::tr1::shared_ptr<Person>(new RealPerson(name, birthday,
                                                         addr));
}

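A more realistic implementation of Person::create would create different types of derived class objects, depending on, for example, the values of additional function parameters, data read from a file or database, environment variables, and so on.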
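As a rough illustration of a factory that chooses among derived classes, the sketch below adds a hypothetical bool parameter named anonymous to create. That parameter, the AnonymousPerson class, and the trimmed one-function interface are all assumptions for the example, and std::shared_ptr with std::make_shared stands in for tr1::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Person {                                   // Interface class, trimmed
public:
    virtual ~Person() {}
    virtual std::string name() const = 0;

    // Hypothetical extra parameter: `anonymous` selects the derived class.
    static std::shared_ptr<Person> create(const std::string& name, bool anonymous);
};

// The concrete classes would live in the implementation file, out of
// sight of clients.
class RealPerson : public Person {
public:
    explicit RealPerson(const std::string& name) : theName(name) {}
    std::string name() const override { return theName; }
private:
    std::string theName;
};

class AnonymousPerson : public Person {          // hypothetical second derived class
public:
    std::string name() const override { return "(anonymous)"; }
};

std::shared_ptr<Person> Person::create(const std::string& name, bool anonymous)
{
    if (anonymous) {
        return std::make_shared<AnonymousPerson>();
    }
    return std::make_shared<RealPerson>(name);
}

void useFactory()
{
    assert(Person::create("Ada", false)->name() == "Ada");
    assert(Person::create("Ada", true)->name() == "(anonymous)");
}
```

Clients depend only on the Person interface and the signature of create, so adding a third derived class later would not force them to recompile.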

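RealPerson demonstrates one of the two most common mechanisms for implementing an Interface class: it inherits its interface specification from the Interface class (Person), then implements the functions in the interface. A second way to implement an Interface class involves multiple inheritance.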

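Handle classes and Interface classes decouple interfaces from implementations, thereby reducing compilation dependencies between files. That decoupling is not free: it costs some speed at runtime and some additional memory per object.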

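In the case of Handle classes, member functions have to go through the implementation pointer to get to the object's data. That adds one level of indirection per access. You must also add the size of this implementation pointer to the amount of memory required to store each object. Finally, the implementation pointer has to be initialized (in the Handle class's constructors) to point to a dynamically allocated implementation object, so you incur the overhead inherent in dynamic memory allocation (and subsequent deallocation) as well as the possibility of encountering bad_alloc (out-of-memory) exceptions.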

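For Interface classes, every function call is virtual, so you pay the cost of an indirect jump each time you make a function call. Also, objects derived from the Interface class must contain a virtual table pointer. This pointer may increase the amount of memory needed to store an object, depending on whether the Interface class is the exclusive source of virtual functions for the object.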
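The per-object memory cost can be observed with sizeof. The two structs below are hypothetical and exist only to compare a class with no virtual functions against one that has them; the exact difference depends on the compiler and platform, so the sketch checks only that the virtual version is not smaller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes with identical data members.
struct PlainPoint {
    int x;
    int y;
};

struct VirtualPoint {
    virtual ~VirtualPoint() {}   // any virtual function forces per-object bookkeeping
    int x;
    int y;
};

// Extra bytes per object attributable to the virtual function machinery.
// The exact number depends on the implementation.
std::size_t perObjectOverhead()
{
    return sizeof(VirtualPoint) - sizeof(PlainPoint);
}

void checkOverhead()
{
    assert(sizeof(VirtualPoint) >= sizeof(PlainPoint));
}
```

On common implementations the difference is the size of one pointer, possibly rounded up by alignment padding.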

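Finally, neither Handle classes nor Interface classes can get much use out of inline functions. Function bodies must typically be in header files in order to be inlined, but Handle classes and Interface classes are specifically designed to hide implementation details such as function bodies.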

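It would be a serious mistake, however, to dismiss Handle classes and Interface classes simply because they have a cost associated with them. Virtual functions have a cost too, and you would not want to give those up. Instead, consider using these techniques in an evolutionary manner. Use Handle classes and Interface classes during development to minimize the impact on clients when implementations change. Replace them with concrete classes for production use when it can be shown that the difference in speed and/or size is significant enough to justify the increased coupling between classes.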

Things to Remember

The general idea behind minimizing compilation dependencies is to depend on declarations instead of definitions. Two approaches based on this idea are Handle classes and Interface classes.

Library header files should exist in both full and declaration-only forms. This applies regardless of whether templates are involved.