local static

来源：互联网发布：淘宝客优惠券采集软件编辑：程序博客网时间：2024/06/06 11:01

http://bbs.pfan.cn/post-294156.html

发表时间：2009-2-20 9:19:00 本帖已结帖！

楼主

这是一位同事写的文章, 在此转贴, 大家讨论.

通常C++中很常见的Singleton实现，是利用一个模版类中的静态函数（或者一个直接的模版函数），在其中定义一个静态变量来实现的，例如：

template <class T>
class Singleton
{
public:
static T& Instance()
{
  static T obj;
  return obj;
}
};

而这里存在一个著名的隐患，就是static变量的初始化时间为函数被首次调用时。其汇编码如下（在VS2008下生成）：

004115BD  mov         eax,dword ptr [`Singleton<SomeClass>::Instance'::`2'::`local static guard' (41A154h)]
004115C2  and         eax,1
004115C5  jne         Singleton<SomeClass>::Instance+79h (4115F9h)
004115C7  mov         eax,dword ptr [`Singleton<SomeClass>::Instance'::`2'::`local static guard' (41A154h)]
004115CC  or          eax,1
004115CF  mov         dword ptr [`Singleton<SomeClass>::Instance'::`2'::`local static guard' (41A154h)],eax
004115D4  mov         dword ptr [ebp-4],0
004115DB  mov         ecx,offset obj (41A148h)
004115E0  call        SomeClass::SomeClass (41100Ah)
004115E5  push        offset `Singleton<SomeClass>::Instance'::`2'::`dynamic atexit destructor for 'obj'' (411226h)
004115EA  call        @ILT+125(_atexit) (411082h)
004115EF  add         esp,4
004115F2  mov         dword ptr [ebp-4],0FFFFFFFFh
，有一个dword类型的local static guard，初始值为0，函数每次被调用时，都检查这个数的最后一个二进制位是否为1，如果不为1，则调用构造函数004115E0  call        SomeClass::SomeClass (41100Ah) 。

注：在C++0x标准中，规定了编译器必须完成static变量的线程安全问题，所以在支持0x标准的编译器下，直接使用原始的Singleton定义即可。

因此，如果存在多线程同时首次访问该函数，多个线程获取到的local static guard均为0，然后分别调用了一次构造函数。如果对象中包含内存的分配，或者包含重要资源的分配，则这些内存/资源会发生泄漏。而且更严重的问题是，如果对象的构造函数较为复杂，例如往容器中填写了某些数据，那么甚至可能会产生严重错误或崩溃，甚至即使加锁也不能解决问题！

举例：

构造函数流程为：

1、调用成员的构造函数（这通常是无法改变的）
2、初始化锁
3、加锁
4、往成员的某个容器里填写数据
5、解锁。

那么，以下情况下会导致十分严重的连锁错误，最后极度可能发生崩溃：

1、A线程调用成员的构造函数
2、A线程初始化锁
3、A线程加锁
4、A容器更改了某个成员，并还未来得及完全保存下来，发生了线程切换，或者是在另一个CPU核心上B线程开始了工作
5、B线程调用成员的构造函数
6、B线程初始化锁，并覆盖了A线程所初始化的锁，使得A线程所初始化的锁泄漏。
7、B线程加锁，不会被A线程阻塞（因为是不同的锁）
8、B线程试图更改某个成员，但是成员数据已经发生了不一致，从而导致生成错误的数据，甚至崩溃
9、B线程解锁
10、A线程试图解锁，但所初始化的锁已经被覆盖，结果使得线程锁发生异常。

从上面的流程可以看出，即使加了临界区锁也不能解决该对象被构造两次的问题（因为锁本身也需要构造）。除非采用命名锁之类的避免受静态内存影响的锁，那样既繁琐又不方便使用。

但是，利用数值等类型的静态变量构造方式的不同（直接在静态区写入数值），我们可以很方便的构造出一个安全的Singleton类：

template <typename T>
class ThreadSafeSingleton
{
public:
static T& Instance()
{
  static long state = 0;
  static char obj[sizeof(T)];

  if (state == 0)
  {
   long val = InterlockedAdd(&state, 2);
   if (val == 2)
   {
    new (obj) SomeClass();
    InterlockedDecrement(&state);
   }
   else
   {
    InterlockedAdd(&state, -2);
   }
  }
  while ( state > 1 )//也可以是while ((state & 1) == 0)
  {
   Sleep(0);
  }

  return reinterpret_cast<T&>(obj[0]);
}
};

其原理是用state变量代替local static guard，state为0表示对象尚未构造，state为1表示对象已经构造，state大于1通常表示对象正在构造中。另外定义了一个obj空间，用于盛放该对象。
当开始有多个线程同时进入时，state首先为0，所有线程均进入if之后的语句块。此时，所有线程开始InterlockedAdd，因为操作系统保证此操作的原子性，因此有且仅有一个线程会得到val == 2（第一个调用该函数的线程）。此时，这个线程调用该对象的构造函数，并将state减1。其它线程将state再减2，并退出这个语句块，假装自己没有来过。并且，之后进入此函数的线程都会获得state != 0，从而不进入这个语句块。
随后，其他线程均处在后一个循环中，等待构造的线程构造完毕。Sleep(0);主要是为了主动让出CPU控制权，避免构造的线程得不到CPU控制权。构造线程构造完毕后，将state减1，并退出这个语句块。此时，所有线程检测到state为1，因此将直接返回。

以上代码还存在一个BUG，就是程序退出时析构函数没有被调用。还有一个比较头疼的地方就是InterlockedAdd这个函数在32位下通常没有。因此小小的修补一下，改用了另一种思路：

template <typename T>
class ThreadSafeSingleton
{
class holder
{
  char m_space[sizeof(T)];
  long &m_state;
public:
  holder(long &state)
   : m_state(state)
  {

  }
  ~holder()
  {
   if ( m_state < 0) //通常静态对象的析构应在所有线程结束后，因此不考虑还有线程停留在构造函数中。
    reinterpret_cast<T*>(this)->~T();
  }
};
public:

static T& Instance()
{
  static long state = 0;
  static holder obj(state);

  if (state == 0)
  {
   long val = InterlockedIncrement(&state);
   if (val == 1)
   {
    new (&obj) T();
    state = (1<<31);
   }
  }
  while ( state > 0 )
  {
   Sleep(0);
  }

  return reinterpret_cast<T&>(obj);
}
};

主要就是信任不会有0x80000000个并发线程，因此在构造完成后，state应当一直是负值，不可能被并发线程突然使得state达到0。

以上版本为最终版本，欢迎大家讨论。

欢迎访问我的博客

我要推荐

作者：sarrow

专家分：35630

级别:179

会员信息

发短消息

所属博客

发表时间：2009-2-20 10:58:00

1楼

楼主说的确实如此。

我其实一般都回避这个问题——资源（或者说那个可以获取静态对象的函数），我是在主线程就创建（调用该函数），之后才创建子线程。

呵呵，我这个方法不太好。

此帖被评30分

作者：强强

专家分：4740

级别:24

会员信息

发短消息

所属博客

发表时间：2009-2-20 11:11:00

2楼

对线程没有什么深入的认识目前只是知道注意两点:
1,线程执行时尽量不操作全局变量,创建线程时把要用到的变量取到线程内部的局部变量中,执行完再把局部变量最终结果给予全局变量
2,就是临界对象的使用
都是自已摸索出来的,望高手指正

作者：小小C

专家分：4570

级别:23

会员信息

发短消息

所属博客

发表时间：2009-2-20 11:39:00

3楼

引用：
对线程没有什么深入的认识目前只是知道注意两点:
1,线程执行时尽量不操作全局变量,创建线程时把要用到的变量取到线程内部的局部变量中,执行完再把局部变量最终结果给予全局变量
2,就是临界对象的使用
都是自已摸索出来的,望高手指正

老兄哇，上述问题就算你加了临界区也是没用的，因为临界区本身也是一个全局对象。

作者：iAkiak

专家分：8460

级别:43

会员信息

发短消息

所属博客

发表时间：2009-2-20 12:14:00

4楼

看Loki的Singleton，那个可以[配置]为多种线程共享方式。基本够用了。

此帖被评30分

作者：Chipset

专家分：16190

级别:81

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:17:00

5楼

//我也贴一个
Developing Lightweight, Statically Initializable C++ Mutexes
Threadsafe initialization of Singletons
Vladimir Kliatchko
The novel synchronization mechanism Vladimir presents here comes in handy for a wide range of applications.
While working on a solution for a fairly narrow and specific problem, I discovered a new synchronization mechanism. This mechanism not only provided a novel and effective solution to my problem, but turns out to have a number of properties making it useful for a wide range of important applications .
My original problem had to do with the implementation of Singletons. The issues with Singletons involve the order and time of their initialization and destruction, as well as the thread safety of these operations. Despite the multitude of techniques developed to deal with these issues, there does not seem to be one universal solution—let alone a simple one.
The particular Singleton I was working on had to be located in a low-level library, which was used by many applications and other libraries. This constraint ruled out the use of a C++ file-scope variable as a mechanism for initializing the Singleton because there is no way to ensure that initialization of such an object is done properly before its first use. To avoid the active collaboration that would result from explicit initialization, I concluded that the Singleton had to be initialized on first access. The Meyers Singleton pattern (Listing One) fits this requirement. When this pattern is employed, the Singleton is initialized the first time it is used, and automatically destroyed when the process terminates. A word of caution: It is sometimes necessary not only to ensure that a Singleton is destroyed at process termination, but also to establish a particular relative order; for my Singleton, such was not the case.
class Singleton {
    Singleton();
    ~Singleton();
  public:
    static Singleton& instance();
};
Singleton& Singleton::instance() {
    static Singleton singleton;
    return singleton;
}
Listing One
This implementation of a Singleton leaves the issue of thread safety unresolved. If multiple threads enter the instance routine simultaneously and if this routine has never been executed before, various problems may occur, depending on the compiler and platform. For example, the constructor of my Singleton may be executed multiple times or not at all. Alternatively, the constructor may be executed exactly once, which is correct, but one of the instance routines may return before the construction is completed—again, leading to unpredictable and hard-to-debug failures. As you'd expect, you can eliminate this race condition by using one of a number of various synchronization mechanisms. Listing Two, for example, employs a mutex. In Listing Two, I did not show how the mutex variable is initialized. The mutex itself is a Singleton! If this mutex has a nontrivial constructor or destructor, we have a chicken-and-egg problem.
class Singleton
{
    Singleton();
    ~Singleton();
    static Singleton *s_singleton;
    static void init();
public:
    static Singleton& instance();
};
Singleton *Singleton::s_singleton = 0;
void Singleton::init()
{
    static Singleton singleton;
    s_singleton = &singleton;
}
Singleton& Singleton::instance()
{
    lock(&mutex);
    if (!s_singleton) init();
    unlock(&mutex);
    return *s_singleton;
}
Listing Two
UNIX: pthread Solutions
Solving the problem of initializing the mutex on UNIX platforms isn't difficult. The pthread library available on these platforms provides a mutex that is a Plain Old Data (POD) type requiring only static (nonruntime) initialization, as in Listing Three. Additionally, the pthread library provides a mechanism called "once routines" that can also be used to solve this initialization problem (Listing Four).
Singleton& Singleton::instance()
{
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&mutex);
    if (!s_singleton) init();
    pthread_mutex_unlock(&mutex);
    return *s_singleton;
}
Listing Three
Singleton& Singleton::instance()
{
    static pthread_once_t once_control = PTHREAD_ONCE_INIT;
    pthread_once(&once_control, &Singleton::init);
    return *s_singleton;
}

此帖被评30分

作者：Chipset

专家分：16190

级别:81

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:19:00

6楼

Listing Four
Windows: Spin-Lock Solution
Unfortunately, the Singleton I was implementing had to support Windows. That's where the main complication arose. Even critical sections, which are the simplest synchronization mechanism available on Windows applicable for my Singleton, do not support static initialization; a call to InitializeCriticalSection is required. As for once routines, there is no support for them in the Win32 API.
Of course, I could have implemented a simple spin-lock using the Interlocked family of routines and the Sleep function, as in Listing Five.
Singleton& Singleton::instance()
{
    static volatile LONG lock = 0;
    while (InterlockedExchange(&lock, 1) != 0) Sleep(1);
    if (!s_singleton) init();
    InterlockedExchange(&lock, 0);
    return *s_singleton;
}
Listing Five
Listing Five can be further optimized in two important ways. First, you can replace the lock variable with a three-state control variable (Listing Six).
Singleton& Singleton::instance()
{
    static volatile LONG control = 0;
while (1)
{
        LONG prev = InterlockedCompareExchange(&control, 1, 0);
        if (2 == prev)
            // The singleton has been initialized.
            return *s_singleton;
        elsif (0 == prev)
            // got the lock
            break;
        else
{
            // Another thread is initializing the singleton:
            // must wait.
            assert(1 == prev);
            Sleep(1); // sleep 1 millisecond
        }
    }
    if (!s_singleton) init();
    InterlockedExchange(&control, 2);
    return *s_singleton;
}

此帖被评20分

作者：Chipset

专家分：16190

级别:81

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:21:00

7楼

Listing Six
•    No threads have yet attempted to initialize the Singleton. (This is the initial state.)
•    The Singleton is in the process of being initialized. (Any thread, except the first thread that sets the control variable to this value, must wait for the initialization to be completed.)
•    The initialization has been completed.
Introducing these three states ensures that, once the initialization is completed, no spinning is necessary, even if multiple threads enter the Singleton::instance routine simultaneously.
The second optimization lets the sleep interval adjust dynamically using exponential backoff (Listing Seven; available electronically; see "Resource Center," page 5). If the initialization of the Singleton (the constructor of our Singleton class) takes a relatively long time or if there are many spinning threads, choosing small sleep intervals may create considerable overhead. Moreover, this problem is exacerbated whenever spinning threads delay the progress of the thread performing the initialization. On systems with less sophisticated thread scheduling (Windows 98, for instance), this deficiency may even bring the entire system to a halt.
On the other hand, if the initialization of the Singleton is relatively quick, choosing a large sleep interval causes an unnecessary delay. Although adjusting the sleep interval dynamically does not solve these problems completely, it does reduce their likelihood.
With these substantial optimizations, the spin-lock approach, though inelegant, is acceptable for many applications . Nonetheless, because I was looking for a generic solution that I planned to implement in a low-level library used by a wide range of applications and for a variety of Singleton objects, the spin-lock approach did not appeal to me.
Windows: Named-Mutex Solution
In search of a better solution, I decided to look at how two open-source libraries—Boost.Threads (boost.org/doc/html/threads.html) and Pthreads-w32 (sourceware.org/pthreads-win32/)—deal with this problem. Although neither library is directly concerned with Singletons, each has code that implements once-routine support on Windows. Boost.Threads provides a portable set of handy MT-related primitives to C++  programmers. Naturally, the platforms supported by this library include Windows. Pthreads-w32, on the other hand, implements a pthread-compatible layer, including pthread_once, on top of the Windows API, thus simplifying the porting of MT programs from UNIX to Windows. I reasoned that whatever technique these libraries use to implement once routines on Windows, it should be possible to use that same technique to implement my Singleton.
The technique I discovered in Boost.Threads (libs/thread/src/once.cpp under the Boost source tree) relied on a special Windows feature that lets a name be associated with a mutex. If a thread (not necessarily within the same process) creates a mutex with a name that already exists, instead of creating a new mutex, the system returns a handle to the existing one. The system also makes sure that the mutex is not destroyed as long as there are any open handles to it. Finally, the system guarantees that creating a mutex and closing a mutex handle are atomic operations. Listing Eight (available electronically) shows how a named mutex can be used to implement our Singleton.
The name of the mutex is designed to be unique. If you don't think the string "Singleton::instance" provides sufficient guarantee from collisions, you can use a random string generated by your favorite method (the string actually used in the Boost code is "2AC1A572DB6944B0A65C38C4140AF2F4"). Also, note that this code uses the Double-Checked-Lock optimization (www.cs.wustl.edu/~schmidt/PDF/DC-Locking.pdf). To avoid the various problems with naïve implementations of this optimization (www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf), we make sure that all access to the flag is done through Windows interlocked routines, which incorporate all the necessary memory barriers. The InterlockedExchangeAdd(&flag, 0) call—a no-op incrementing the flag by 0—is just a way of using interlocked routines for reading the flag. (The same effect can be achieved more efficiently with some inlined assembler.)

作者：Chipset

专家分：16190

级别:81

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:22:00

8楼

Although a clever technique from the Boost.Threads library lets you improve upon the previous implementation by removing the inherent unpredictability of spinning, the solution suffers from several inefficiencies. To begin with, named mutexes are intended to be used for synchronization across process boundaries rather than among threads within the same process. Manipulating, and creating such "interprocess" mutexes is expensive—substantially more so than those used to synchronize threads. Moreover, with this heavy approach, a named mutex has to be created at least once, even when there is absolutely no contention—even if the application using this Singleton is singlethreaded! Finally, generating a unique mutex name from a pointer is artificial and rather inelegant. Given these drawbacks, this solution, although acceptable in many applications , did not satisfy the needs of my high-performance reusable infrastructure library.
Developing Class QLock
Next, I looked at the Pthread-w32 library. The code I discovered in that library's implementation of pthread_once (pthread_once.c, CVS version 1.16) used an approach radically different from either of the two techniques just discussed: If multiple threads enter pthread_once simultaneously, the first thread that has to wait (the second thread overall) creates a synchronization object (a Windows event object, in this case). The last thread to leave the routine destroys the object. An atomic reference count keeps track of the number of threads involved. All of the problems of the previous two approaches seemed to be avoided. Unfortunately, a closer examination of the code uncovered a race condition. After a few attempts at fixing the problem (see pthread-w32 mailing list archive if you are interested in the details; sources.redhat.com/ml/pthreads-win32/), it became clear that this general approach is fundamentally flawed and cannot be fixed.
Although reference counting used by the pthread_once implementation did not work, trying to fix it gave me some ideas. What if each thread that had to be suspended (waiting for another thread to complete the initialization of a Singleton), created and then destroyed its own synchronization object? For instance, I knew how the MCS implementation of spin-locks, to reduce contention, makes each thread wait spinning on its own flag. Could I use a similar technique, but instead of spinning, make each thread wait on its own semaphore or event object? It turned out that it was not only possible to make such a technique work, but the resulting code was fairly simple, too. Listing Nine (available electronically) uses this specialized waiting technique to implement my Singleton.
The central idea of this new approach is encapsulated in class QLock. Each instance of this class represents a lock held on the mutex that was passed as a constructor argument. A mutex itself, which is represented by the QLock::Mutex type, is just a pointer, which is set to zero when the mutex is not locked. When the mutex is locked, it points to the tail of the queue of threads waiting on this mutex. Each thread in the queue is represented by the instance of class QLock that the thread created to obtain the lock. In other words, the mutex is a pointer to the QLock instance created by the last thread waiting in the queue. This QLock instance is linked to another QLock instance, which was created by the previous thread in the queue, and so on. The QLock instance at the head of the queue corresponds to the thread that currently holds the lock.

作者：Chipset

专家分：16190

级别:81

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:22:00

9楼

A Closer Look at QLock
Table 1 lists the QLock class data members, while Table 2 lists the values that d_readyFlag and d_nextFlag may have. The constructor QLock::QLock atomically exchanges the mutex pointer for this. If the mutex pointer was 0 prior to the exchange, the lock is obtained and the constructor completes. If the mutex pointer was not zero, this instance of QLock has to join the queue and wait for the ready flag.
Data Member    Description
d_mutex    Pointer to the mutex locked by this QLock instance.
d_next    Pointer to the next instance of QLock in queue waiting to obtain the lock.
d_readyFlag    Flag used to indicate that the lock has been released by the predecessor.
d_nextFlag    Flag used to indicate that the next pointer (d_next) has been set by a successor.
Table 1: QLock class data members.
Value    Description
0    Initial value indicating that the flag is not set.
-1    Indicates that the flag was set before an event object had been installed into the flag.
Neither 0 nor -1    Indicates that an event object has been installed into the flag, which now holds the handle of the event object.
Table 2: Values that d_readyFlag and d_nextFlag may have.
The destructor QLock::~QLock, using one atomic compare-and-exchange operation, checks if the mutex pointer contains the value of this pointer. If it does, it resets the mutex pointer to zero, thus releasing the mutex. If the mutex no longer contains the value of this pointer, the queue has been formed, and the destructor must pass the lock to the next instance in the queue. The destructor first waits on the d_nextFlag to make sure that the next pointer has been set, then sets the d_readyFlag.
The algorithms used in QLock's constructor/destructor are basically the same as those used to lock/unlock MCS spin-locks. The setFlag and waitOnFlag methods are where we make our important deviation from MCS locks. Instead of spinning (as is appropriate for a spin-lock), the waitOnFlag routine:
1.    Creates an event object.
2.    Atomically checks that the flag has not yet been set and installs the event object's handle into the flag.
3.    Proceeds to wait on the event object.
Now compare our QLock-based solution for the Singleton problem with the two previous solutions (spin-lock-based and named-mutex-based). Similar to the named-mutex-based solution, you avoid the unpredictable behavior of the spin-lock: Instead of spinning, waiting threads are suspended by the kernel, and resumed as soon as the mutex is available. Additionally, you avoid the problems with the named mutex: Event objects used for synchronization are process local, do not require artificial names, and are created only when threads need to be suspended due to contention. When there is no contention (when only one thread enters the Singleton::instance method at the time of initialization), no synchronization objects are created: The overhead is simply that of a few atomic operations.
Observations
After using the synchronization mechanisms implemented by the QLock class to solve the Singleton problem on Windows (and more generally, the problem with the static initialization of mutexes on Windows), I discovered that QLock has some other important properties that made it attractive for other applications  (including on UNIX). QLock::Mutex has a small memory footprint (one pointer), and a small initialization cost (that of initializing a pointer with 0). Additionally, when there is no contention, QLock is fast. If no wait is necessary, locking a QLock requires just one atomic instruction and no system calls. If there are no waiting threads, unlocking is also just one atomic instruction.
Furthermore, when there is contention, the cost of constructing event objects (or semaphores on UNIX) can be virtually eliminated by using a lock-free pool of allocated event objects. This pool works particularly well because the number of event objects required never exceeds the number of waiting threads—an inherently small number.
One common scenario in which QLock appears to be of great use is when you have a large collection of some kind of items that are accessed from multiple threads. To increase concurrency, it may be logically possible and desirable to maintain a mutex per item of the collection (instead of maintaining a single mutex for the entire collection). It is often the case, though, that the memory footprint and the initialization cost of a standard mutex or critical section would be prohibitive due to the (sometimes very large) size of the collection. QLock, because of its small size and negligible initialization cost, may be able to provide a viable solution to fine-grain locking.

作者：小小C

专家分：4570

级别:23

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:27:00

10楼

能不能用中文简单的介绍一下啊！这么多的英文见了头大！

作者：Chipset

专家分：16190

级别:81

会员信息

发短消息

所属博客

发表时间：2009-2-20 14:32:00

11楼

引用：
能不能用中文简单的介绍一下啊！这么多的英文见了头大！

只看代码就可以啦。

作者：SonicLing

专家分：6260

级别:32

会员信息

发短消息

所属博客

发表时间：2009-2-20 22:16:00

12楼

既然只是创建的时候出问题，那么在并发之前，所有Singleton先调用get触发一下构造好了，我的思路跟sorrow一致。get里面就不需要lock了。

要lock，就用指针。指针比较麻烦，还得找一个集中析构的地方。。。。

我看公司用的singleton是用的指针，集中创建，集中销毁，也蛮好用的。

最后修改于2009-2-20 22:18:00

作者：eastcowboy

专家分：25080

级别:126

会员信息

发短消息

所属博客

发表时间：2009-2-21 2:50:00

13楼

多线程接触不多，从细节上提一点个人看法。
1. 对于“while ( state > 0 )”循环，似乎应该对state使用volatile关键字？否则若编译器优化把state的值放到寄存器中，则循环卡死。
2. 考虑对异常的处理。最终版中“new (&obj) T();”这句，若构造函数出现异常，则“state = (1<<31);”不会执行，于是其它线程会卡死在“while ( state > 0 )”这个循环中。虽然对于Singleton来说构造函数出现异常的概率较低，但是“异常时导致其它线程卡死”这个代价也太不应该了。
3. 有建议说不要用char space[sizeof(TheClass)]; 这个关系到字节对齐等问题。以VC来说，char数组的起始地址一般是4的倍数，但如果TheClass的各个成员都是double的话，起始地址为8的倍数会更佳。缓解的办法是用union: 如下：
union {
    char space[sizeof(TheClass)];
    int dummy1;
    char dummy2;
    double dummy3;
    // 其它各种基本类型、void指针、函数指针、类成员指针，等等
};
（虽然标准没有规定，但几乎所有编译器都会设置space, dummy1, dummy2, ...的首地址为同一位置。如此一来，编译器就会尽可能的满足各种类型的字节对齐要求。）

我觉得临界区不是没用，而是要看加在什么位置。以下的代码没有测试，但我认为是可行的~（至少是基本可行吧？没有多线程编程经验，专家们不要拍砖）

CCriticalSection g_cs;           // 全局变量
                                 // 也可以用类似物代替
                                 // 注意在多线程开始前必须初始化好
TheClass& GetSingleton()
{
    CSingleLock lock(&g_cs);     // 加锁
    static TheClass s_singleton; // 若需要，进行构造
    return s_singleton;          // 返回
}                                // 利用析构函数解锁。即使构造出现异常也可顺利解锁

总的思路就是“在单线程时就初始化锁，在多线程时可利用锁实现线程安全”。

接着看看反汇编。
从楼主贴出的反汇编代码来看，代码中出现了两条call指令。一条是调用了构造函数，另一条则调用了atexit。
C++规定全局变量、static变量的析构顺序必须与构造顺序完全相反，看来VC2008是使用atexit来实现这一点了。跟踪汇编代码进去看了看，发现atexit内部使用了EnterCriticalSection之类的函数来确保线程安全。不过我在跟踪过程中并未找到InitializeCriticalSection之类的函数。可以想象这个CRITICAL_SECTION是在之前就已经初始化好了的，不必在每个线程中判断是否需要初始化。

看来VC的CRT实现也采用了这种做法：在单线程时就初始化锁，在多线程时可利用锁实现线程安全。只要不是每次都去检查是否需要初始化锁，小心的操作应该可以确保正确。

不过CRT作为一个函数库，它的实现是被动的（它无法得知目前是单线程还是多线程），因此需要小心应付多线程带来的各种困扰。对于一般应用而言，自己的资源自己管理，没有必要弄那么复杂，1楼sarrow兄已经道出了真谛：就在单线程中把全局资源搞定！

Chipset兄贴了很多，暂时没有精力看完啦。觉得我自己的代码跟Chipset贴的第二段很像，只是不明白为什么要单独编写一个init函数？

此帖被评30分

作者：小小C

专家分：4570

级别:23

会员信息

发短消息

所属博客

发表时间：2009-2-21 9:42:00

14楼

CCriticalSection g_cs; // 全局变量

公司规定不充许用全局变量............如果没这限制,就完全没这问题了..............

作者：sarrow

专家分：35630

级别:179

会员信息

发短消息

所属博客

发表时间：2009-2-21 10:27:00

15楼

CCriticalSection g_cs; // 全局变量

这将导致模块的依赖性问题。

作者：SonicLing

专家分：6260

级别:32

会员信息

发短消息

所属博客

发表时间：2009-2-22 2:17:00

16楼

那就把全局变量移动到类里面去作为类的静态成员，一回事情，只是外观上看起来不是全局的。

作者：tdzl2003

专家分：210

级别:2

会员信息

发短消息

所属博客

发表时间：2009-2-23 10:18:00

17楼

我看到了有sarrow的出现……抓你一把，还记得我不？我天地之灵。
居然有惊讶的发现了iAkiAk。。。赞热心大牛……我都很久没泡论坛了。
楼主那篇文章是我写的……

关于Spinlock的解决方案：主要是模仿《Imperfect C++》中的相关片断，但我记得书中的代码似乎没有考虑在while循环中发生了线程切换（或者线程的并行执行），所以就写了那么一段代码。不过确实用InterlockedExchange更加美观、直白。

回复eastcowboy：
1、确实如此。应当增加这个关键字。
2、是的。这个代码我只简单的表达了一个意思，并非安全到可以直接在项目中使用。除了这个问题外，还有其他可能的问题，比如在构造函数直接或间接的又调用了Instance，会导致所有线程一起卡死。
3、字节对齐问题对于静态变量来说没有什么考虑的必要。尾部对不对齐我们不关心，首部对不对齐由编译器去处理。如果你是指在这里建议编译器进行对齐的话，确实，我没考虑到这一点。但如果只从功能上来说的话，不会导致问题。
4、用全局变量作为锁当然没问题。只是我在自己研究C++的时候，和《imperfect C++》的作者有个共同的偏好：库只用头文件，不用cpp文件。如果要用全局变量，就要打破这个“美好的愿望”了

to: sarrow 想你了，偶尔和我打个招呼嘛，你不会没我QQ号了吧。
to: iAkiAk 最近在哪混呢。。

咳嗽，翻了下主题列表，惊讶的发现了燕子姐= =#
记得最后一次和燕子姐接触是关于一个算法问题，之前吹大了，说几毫秒能算出来，后来搞不出来，然后被狠狠地鄙视- -# 那也是我最后一次搞算法的代码，凄凉啊。。。