Writing Reentrant and Thread-Safe Code

Source: Internet | Published under: simple CNC manual programming examples | Editor: 程序博客网 | Time: 2024/05/02 06:54


In single-threaded processes there is only one flow of control. The code executed by these processes therefore need not be reentrant or thread-safe. In multi-threaded programs, several flows of control may access the same functions and the same resources concurrently. To protect resource integrity, code written for multi-threaded programs must be reentrant and thread-safe.

This section provides information for writing reentrant and thread-safe programs. It does not cover the topic of writing thread-efficient programs. Thread-efficient programs are efficiently parallelized programs. This can only be done during the design of the program. Existing single-threaded programs can be made thread-efficient, but this requires that they be completely redesigned and rewritten.

Understanding Reentrance and Thread-Safety
Making a Function Reentrant
Making a Function Thread-Safe
Reentrant and Thread-Safe Libraries

Understanding Reentrance and Thread-Safety

Reentrance and thread-safety are both related to the way functions handle resources. Reentrance and thread-safety are separate concepts: a function can be either reentrant, thread-safe, both, or neither.

Reentrance

A reentrant function does not hold static data over successive calls, nor does it return a pointer to static data. All data is provided by the caller of the function. A reentrant function must not call non-reentrant functions.

A non-reentrant function can often, but not always, be identified by its external interface and its usage. For example, the strtok subroutine is not reentrant, because it keeps the string being broken into tokens in static storage between calls. The ctime subroutine is also not reentrant: it returns a pointer to static data that is overwritten by each call.

Thread-Safety

A thread-safe function protects shared resources from concurrent access by locks. Thread-safety concerns only the implementation of a function and does not affect its external interface.

In C, local variables are dynamically allocated on the stack. Therefore, any function that does not use static data or other shared resources is trivially thread-safe. For example, the following function is thread-safe:

```c
/* thread-safe function */
int diff(int x, int y)
{
    int delta;

    delta = y - x;
    if (delta < 0)
        delta = -delta;
    return delta;
}
```

The use of global data is thread-unsafe. Global data should be maintained per thread or encapsulated, so that access to it can be serialized. For example, if an error code were kept in a single global variable, a thread could read an error code caused by an error in another thread. For this reason, in AIX each thread has its own errno value.
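One way to keep such state per thread, without a lock, is thread-local storage. The following is a minimal sketch that assumes a C11 compiler (for `_Thread_local`) and POSIX threads. `lib_last_error`, `lib_set_error`, `lib_get_error`, and `error_worker` are hypothetical names chosen for illustration; they are not an AIX or libc API.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical library error code. _Thread_local (C11) gives every thread its
 * own copy, so an error recorded by one thread never overwrites another's. */
static _Thread_local int lib_last_error = 0;

void lib_set_error(int code) { lib_last_error = code; }
int lib_get_error(void) { return lib_last_error; }

/* Demo thread body: record this thread's code, let other threads run,
 * then return whatever this thread reads back. */
void *error_worker(void *arg)
{
    int code = (int)(intptr_t)arg;

    lib_set_error(code);
    sched_yield();                     /* give another thread a chance to write */
    return (void *)(intptr_t)lib_get_error();
}
```

Two threads that store different codes each read back their own value, and the main thread's copy stays 0. This mirrors how a per-thread errno behaves. It illustrates the idea and does not describe how any particular C library implements errno.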

Making a Function Reentrant

In most cases, non-reentrant functions must be replaced by functions with a modified interface to be reentrant. Non-reentrant functions cannot be used by multiple threads. Furthermore, it may be impossible to make a non-reentrant function thread-safe.

Returning Data

Many non-reentrant functions return a pointer to static data. This can be avoided in two ways:

Returning dynamically allocated data. In this case, it will be the caller's responsibility to free the storage. The benefit is that the interface does not need to be modified. However, backward compatibility is not ensured; existing single-threaded programs using the modified functions without changes would not free the storage, leading to memory leaks.
Using caller-provided storage. This method is recommended, although the interface needs to be modified.

For example, a strtoupper function, converting a string to uppercase, could be implemented as in the following code fragment:

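```c
/* non-reentrant function */
#include <ctype.h>

#define MAX_STRING_SIZE 256   /* illustrative size */

char *strtoupper(char *string)
{
    static char buffer[MAX_STRING_SIZE];   /* shared by every call */
    int index;

    for (index = 0; string[index]; index++)
        buffer[index] = toupper((unsigned char)string[index]);
    buffer[index] = 0;
    return buffer;
}
```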

This function is not reentrant (nor thread-safe). Using the first method to make the function reentrant, the function would be similar to the following code fragment:

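```c
/* reentrant function (a poor solution) */
#include <ctype.h>
#include <stdlib.h>

#define MAX_STRING_SIZE 256   /* illustrative size */

char *strtoupper(char *string)
{
    char *buffer;
    int index;

    buffer = malloc(MAX_STRING_SIZE);   /* the caller must free() the result */
    if (buffer == NULL)                 /* error checking: malloc can fail */
        return NULL;
    for (index = 0; string[index]; index++)
        buffer[index] = toupper((unsigned char)string[index]);
    buffer[index] = 0;
    return buffer;
}
```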

A better solution consists of modifying the interface. The caller must provide the storage for both input and output strings, as in the following code fragment:

```c
/* reentrant function (a better solution) */
#include <ctype.h>

char *strtoupper_r(const char *in_str, char *out_str)
{
    int index;

    for (index = 0; in_str[index]; index++)
        out_str[index] = toupper((unsigned char)in_str[index]);
    out_str[index] = 0;
    return out_str;
}
```
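The caller-provided-storage version still trusts the caller to pass an `out_str` that is large enough. A common refinement also passes the size of the output buffer. The sketch below assumes that an extra size parameter is acceptable in the new interface. `strtoupper_n` is a hypothetical name, not a standard or AIX subroutine.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bounded variant of strtoupper_r: converts at most
 * out_size - 1 characters and always NUL-terminates when out_size > 0,
 * so it never writes past the end of out_str. All state lives in the
 * arguments, so the function stays reentrant. */
char *strtoupper_n(const char *in_str, char *out_str, size_t out_size)
{
    size_t index;

    if (out_size == 0)
        return out_str;
    for (index = 0; in_str[index] != '\0' && index < out_size - 1; index++)
        out_str[index] = (char)toupper((unsigned char)in_str[index]);
    out_str[index] = '\0';
    return out_str;
}
```

With a 4-byte output buffer, converting "abcdef" yields "ABC". The result is truncated instead of overflowing the buffer.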

The non-reentrant standard C library subroutines were made reentrant using the second method. This is discussed below.

Keeping Data over Successive Calls

No data should be kept over successive calls, because different threads may successively call the function. If a function needs to maintain some data over successive calls, such as a working buffer or a pointer, this data should be provided by the caller.

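[A function that is called repeatedly should not store intermediate results itself, because other threads may be calling the same function at the same time.]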

Consider the following example. A function returns the successive lowercase characters of a string. The string is provided only on the first call, as with the strtok subroutine. The function returns 0 when it reaches the end of the string. The function could be implemented as in the following code fragment:

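```c
/* non-reentrant function */
#include <ctype.h>
#include <stddef.h>

char lowercase_c(char *string)
{
    static char *buffer;   /* string being scanned, kept between calls */
    static int index;      /* current position, kept between calls */
    char c = 0;

    /* stores the string on first call */
    if (string != NULL) {
        buffer = string;
        index = 0;
    }

    /* searches a lowercase character */
    for (; (c = buffer[index]) != 0; index++) {
        if (islower((unsigned char)c)) {
            index++;
            break;
        }
    }
    return c;
}
```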

This function is not reentrant. To make it reentrant, the static data (the buffer pointer and the index variable) must be maintained by the caller. The reentrant version of the function could be implemented as in the following code fragment:

```c
/* reentrant function */
#include <ctype.h>

char reentrant_lowercase_c(const char *string, int *p_index)
{
    char c = 0;

    /* no initialization - the caller should have done it */

    /* searches a lowercase character */
    for (; (c = string[*p_index]) != 0; (*p_index)++) {
        if (islower((unsigned char)c)) {
            (*p_index)++;
            break;
        }
    }
    return c;
}
```

The interface of the function changed and so did its usage. The caller must provide the string on each call and must initialize the index to 0 before the first call, as in the following code fragment:

```c
char *my_string;
char my_char;
int my_index;
...
my_index = 0;
while ((my_char = reentrant_lowercase_c(my_string, &my_index)) != 0) {
    ...
}
```
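Putting the reentrant version together with a caller shows what the interface change buys. In this sketch the string parameter is `const char *`, and `count_lowercase` is a hypothetical helper. All scan state lives in the caller's `int`, so independent scans can be interleaved freely. With `lowercase_c`, starting a second scan would reset the first.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the next lowercase character of string at or after *p_index and
 * advances *p_index past it; returns 0 at the end of the string.
 * The caller sets *p_index to 0 before the first call. */
char reentrant_lowercase_c(const char *string, int *p_index)
{
    char c = 0;

    for (; (c = string[*p_index]) != 0; (*p_index)++) {
        if (islower((unsigned char)c)) {
            (*p_index)++;
            break;
        }
    }
    return c;
}

/* Hypothetical caller: counts the lowercase characters of s. The scan
 * state is the local variable index, not hidden static data. */
size_t count_lowercase(const char *s)
{
    int index = 0;
    size_t count = 0;

    while (reentrant_lowercase_c(s, &index) != 0)
        count++;
    return count;
}
```

For example, alternating calls on "aBc" (with its own index) and "xYz" (with another index) return 'a', 'x', 'c', 'z', because each index advances independently.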

Making a Function Thread-Safe

In multi-threaded programs, all functions called by multiple threads must be thread-safe. However, there is a workaround for using thread-unsafe subroutines in multi-threaded programs. Note also that non-reentrant functions are usually thread-unsafe, but making them reentrant often makes them thread-safe, too.

Locking Shared Resources

Functions that use static data or any other shared resources, such as files or terminals, must serialize the access to these resources by locks in order to be thread-safe. For example, the following function is thread-unsafe:

```c
/* thread-unsafe function */
int increment_counter(void)
{
    static int counter = 0;

    counter++;
    return counter;
}
```

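To be thread-safe, the static variable counter needs to be protected by a static lock, as in the following example, which uses a POSIX threads mutex. The new value is copied into a local variable while the lock is still held. Returning counter after unlocking could return a value that another thread has already incremented.

```c
/* thread-safe function */
#include <pthread.h>

int increment_counter(void)
{
    static int counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
    int value;

    pthread_mutex_lock(&counter_lock);
    counter++;
    value = counter;              /* read while still holding the lock */
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```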
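For a counter as simple as this one, C11 atomics are an alternative to a lock. This is a different technique from the lock-based version. The sketch assumes a C11 compiler with `<stdatomic.h>`. `increment_counter_atomic` and `counter_worker` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared counter updated with an indivisible read-modify-write. */
static atomic_int counter = 0;

int increment_counter_atomic(void)
{
    /* atomic_fetch_add returns the value before the addition */
    return atomic_fetch_add(&counter, 1) + 1;
}

/* Demo thread body: increments the counter a fixed number of times. */
#define INCREMENTS_PER_THREAD 10000
void *counter_worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        increment_counter_atomic();
    return NULL;
}
```

After four workers finish, the next call returns 4 × 10000 + 1, with no increments lost. Atomics help only when the shared state is a single variable. Anything larger still needs a lock.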

In a multi-threaded application program using the threads library, mutexes should be used to serialize access to shared resources. Independent libraries may need to work outside the context of threads and may therefore use other kinds of locks.

A Workaround for Thread-Unsafe Functions

A workaround makes it possible for multiple threads to call thread-unsafe functions. This may be useful when using a thread-unsafe library in a multi-threaded program, either for testing or while waiting for a thread-safe version of the library to become available. The workaround adds some overhead, because it serializes an entire function or even a group of functions.

Use a global lock for the library, and lock it each time you use the library (calling a library routine or using a library global variable), as in the following pseudo-code fragments:
```
/* this is pseudo-code! */
lock(library_lock);
library_call();
unlock(library_lock);

lock(library_lock);
x = library_var;
unlock(library_lock);
```
This solution can create performance bottlenecks because only one thread can access any part of the library at any given time. The solution is acceptable only if the library is seldom accessed, or as an initial, quickly implemented workaround.
Use a lock for each library component (routine or global variable) or group of components, as in the following pseudo-code fragments:
```
/* this is pseudo-code! */
lock(library_moduleA_lock);
library_moduleA_call();
unlock(library_moduleA_lock);

lock(library_moduleB_lock);
x = library_moduleB_var;
unlock(library_moduleB_lock);
```
This solution is somewhat more complicated to implement than the first one, but it can improve performance.

Because this workaround should only be used in application programs and not in libraries, mutexes can be used for locking the library.
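The first approach, a single lock for the whole library, can be sketched as a runnable program. Everything in the "library" part is a hypothetical stand-in for a thread-unsafe library: its `library_call` performs an unprotected read-modify-write on `library_var`. The application side wraps every access in one POSIX mutex. `locked_library_call`, `locked_read_library_var`, and `library_worker` are hypothetical names.

```c
#include <pthread.h>

/* ---- Stand-in for a thread-unsafe library (hypothetical) ---- */
static int library_var = 0;            /* library global variable */

static void library_call(void)         /* unprotected read-modify-write */
{
    int tmp = library_var;
    library_var = tmp + 1;
}

/* ---- Application-side workaround: one lock for the whole library ---- */
static pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

void locked_library_call(void)
{
    pthread_mutex_lock(&library_lock);
    library_call();
    pthread_mutex_unlock(&library_lock);
}

int locked_read_library_var(void)
{
    int x;

    pthread_mutex_lock(&library_lock);
    x = library_var;
    pthread_mutex_unlock(&library_lock);
    return x;
}

/* Demo thread body: calls the library many times through the wrapper. */
#define CALLS_PER_THREAD 10000
void *library_worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        locked_library_call();
    return NULL;
}
```

Because every call and every read of `library_var` goes through the same mutex, four workers leave exactly 40000 in `library_var`. The cost is that only one thread at a time can use any part of the library.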

Reentrant and Thread-Safe Libraries

Reentrant and thread-safe libraries are useful in a wide range of parallel (and asynchronous) programming environments, not just within threads. Thus it is a good programming practice to always use and write reentrant and thread-safe functions.

Using Libraries

Several libraries shipped with the AIX Base Operating System are thread-safe. In the current version of AIX, the following libraries are thread-safe:

Standard C library (libc.a)
Berkeley compatibility library (libbsd.a)

Some of the standard C subroutines, such as ctime and strtok, are non-reentrant. Their reentrant versions have the name of the original subroutine with the suffix _r (underscore r), for example ctime_r and strtok_r.
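The ctime and ctime_r pair shows the difference in interface. ctime returns a pointer to a static buffer that the next call overwrites. ctime_r (POSIX) writes into a buffer supplied by the caller, which must hold at least 26 bytes. The wrapper name `format_both` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes ctime_r in strict ISO C modes */
#include <stddef.h>
#include <time.h>

/* Formats two timestamps and keeps both results. With ctime(), the second
 * call would overwrite the static buffer returned by the first. ctime_r()
 * writes into caller-owned buffers of at least 26 bytes.
 * Returns 0 on success, -1 on failure. */
int format_both(time_t first, time_t second, char out1[26], char out2[26])
{
    if (ctime_r(&first, out1) == NULL || ctime_r(&second, out2) == NULL)
        return -1;
    return 0;
}
```

Each result has the form "Www Mmm dd hh:mm:ss yyyy\n", and both strings stay valid because neither call touches the other's buffer.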

When writing multi-threaded programs, the reentrant versions of subroutines should be used instead of the original version. For example, the following code fragment:

```c
token[0] = strtok(string, separators);
i = 0;
do {
    i++;
    token[i] = strtok(NULL, separators);
} while (token[i] != NULL);
```

should be replaced in a multi-threaded program by the following code fragment:

```c
char *pointer;
...
token[0] = strtok_r(string, separators, &pointer);
i = 0;
do {
    i++;
    token[i] = strtok_r(NULL, separators, &pointer);
} while (token[i] != NULL);
```
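Nested tokenization is where the difference matters even in a single thread. An inner strtok loop would overwrite the hidden state of an outer one, while with strtok_r each loop owns its save pointer. The sketch below assumes records separated by `;` and fields separated by `,`. `count_fields` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes strtok_r in strict ISO C modes */
#include <string.h>

/* Counts the fields in a line of ';'-separated records whose fields are
 * ','-separated. Each loop keeps its own save pointer, so the inner
 * tokenization does not disturb the outer one. Modifies line in place. */
int count_fields(char *line)
{
    char *outer_save, *inner_save;
    int count = 0;

    for (char *record = strtok_r(line, ";", &outer_save); record != NULL;
         record = strtok_r(NULL, ";", &outer_save)) {
        for (char *field = strtok_r(record, ",", &inner_save); field != NULL;
             field = strtok_r(NULL, ",", &inner_save))
            count++;
    }
    return count;
}
```

For the input "a,b;c;d,e,f", `count_fields` returns 6. Like strtok, strtok_r modifies its input, so it must be given a writable array and not a string literal.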

Thread-unsafe libraries may be used by only one thread in a program. The uniqueness of the thread using the library must be ensured by the programmer; otherwise, the program will have unexpected behavior, or may even crash.

Converting Libraries

This information highlights the main steps in converting an existing library to a reentrant and thread-safe library. It applies only to C language libraries.

Identifying exported global variables. Those variables are usually declared in a header file with the extern keyword.
Exported global variables should be encapsulated. The variable should be made private (defined with the static keyword in the library source code). Access (read and write) subroutines should be created.
Identifying static variables and other shared resources. Static variables are usually defined with the static keyword.
Locks should be associated with any shared resource. The granularity of the locking, that is, the number of locks, affects the performance of the library. To initialize the locks, the one-time initialization facility (for example, pthread_once) may be used.
Identifying non-reentrant functions and making them reentrant. See Making a Function Reentrant.
Identifying thread-unsafe functions and making them thread-safe. See Making a Function Thread-Safe.
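Several of these steps fit in one short sketch. An exported global is made `static` and reached only through accessor functions, and the lock that protects it is initialized once with `pthread_once`, the POSIX one-time initialization facility. `lib_debug_level` and the accessor names are hypothetical. A statically initialized mutex (`PTHREAD_MUTEX_INITIALIZER`) would make `pthread_once` unnecessary here; it is used only to illustrate one-time initialization.

```c
#include <pthread.h>

/* Before conversion (hypothetical), the header exported the variable:
 *     extern int lib_debug_level;
 * After conversion it is private to the library source file. */
static int lib_debug_level = 0;
static pthread_mutex_t lib_lock;                 /* protects lib_debug_level */
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, however many threads race to call it. */
static void lib_init_locks(void)
{
    pthread_mutex_init(&lib_lock, NULL);
}

/* Read accessor: replaces direct reads of the exported variable. */
int lib_get_debug_level(void)
{
    int level;

    pthread_once(&lib_once, lib_init_locks);
    pthread_mutex_lock(&lib_lock);
    level = lib_debug_level;
    pthread_mutex_unlock(&lib_lock);
    return level;
}

/* Write accessor: replaces direct writes to the exported variable. */
void lib_set_debug_level(int level)
{
    pthread_once(&lib_once, lib_init_locks);
    pthread_mutex_lock(&lib_lock);
    lib_debug_level = level;
    pthread_mutex_unlock(&lib_lock);
}
```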

Related Information

Parallel Programming Overview

Thread Programming Concepts

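What Is Reentrancy?
A reentrant function can be used concurrently by more than one task without risk of data corruption. A non-reentrant function, by contrast, cannot be shared by more than one task unless mutual exclusion to the function is ensured, either by using a semaphore or by disabling interrupts in critical sections of the code. A reentrant function can be interrupted at any time and resumed later without losing data. Reentrant functions either use local variables or protect their data when they use global variables.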

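A reentrant function:

Does not hold static data over successive calls.
Does not return a pointer to static data; all data is provided by the caller of the function.
Uses local data, or protects global data by making a local copy of it.
Never calls a non-reentrant function.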

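Do not confuse reentrancy with thread-safety. From the programmer's perspective, these are two separate concepts: a function can be reentrant, thread-safe, both, or neither. Non-reentrant functions cannot be used by multiple threads. Moreover, it may be impossible to make a non-reentrant function thread-safe.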

Five Guidelines for Ensuring Reentrancy
Guideline 1
Returning a pointer to static data can make a function non-reentrant.
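Guideline 2
Remembering the state of data makes a function non-reentrant. Different threads may call the function one after another and modify that data without notifying other threads that are using it. If a function needs to maintain the state of some data over a series of calls, such as a working buffer or a pointer, the caller should provide that data.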
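Guideline 3
On most systems, malloc and free are not reentrant, because they use static data structures to keep track of which memory blocks are free. POSIX requires them to be thread-safe, but they are not async-signal-safe, so they must not be called from a signal handler. In fact, any library function that allocates or frees memory is non-reentrant in this sense. This includes functions that allocate space to store their results.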
Guideline 4
To write bug-free code, take particular care with process-wide global variables such as errno and h_errno.
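A common case is a signal handler that calls a function which may set errno. Saving errno on entry and restoring it before returning prevents the interrupted code from seeing a changed value. This is a minimal POSIX sketch. `on_signal` and `install_handler` are hypothetical names, and `close(-1)` only stands in for any async-signal-safe call that can fail.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes sigaction in strict ISO C modes */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

/* Handler that makes a failing system call. close(-1) sets errno to EBADF.
 * Without the save and restore, the interrupted code would see EBADF
 * instead of its own errno value. */
static void on_signal(int signo)
{
    int saved_errno = errno;   /* save the interrupted code's errno */

    (void)signo;
    (void)close(-1);           /* async-signal-safe, fails with EBADF */
    handler_ran = 1;
    errno = saved_errno;       /* restore it before returning */
}

/* Installs on_signal for signo. Returns 0 on success, -1 on failure. */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```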
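Guideline 5
If a lower-level function is in a critical section when a signal is generated and handled, the function can become non-reentrant. By using signal sets and the signal mask, a critical region of code can be protected from a specific set of signals, as follows:
Save the current signal set.
Block the unwanted signals in the signal mask.
Let the critical section of code complete its work.
Finally, restore the saved signal set.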