"Windows Via C/C++": Learning While Translating (4), Working with Characters and Strings, Part 3

Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 00:11

Unicode and ANSI Functions in the C Run-Time Library


Like the Windows functions, the C run-time library offers one set of functions to manipulate ANSI characters and strings and another set of functions to manipulate Unicode characters and strings. However, unlike Windows, the ANSI functions do the work; they do not translate the strings to Unicode and then call the Unicode version of the functions internally. And, of course, the Unicode versions do the work themselves too; they do not internally call the ANSI versions.


An example of a C run-time function that returns the length of an ANSI string is strlen, and an example of an equivalent C run-time function that returns the length of a Unicode string is wcslen.


Both of these functions are prototyped in String.h. To write source code that can be compiled for either ANSI or Unicode, you must also include TChar.h, which defines the following macro:


#ifdef _UNICODE
#define _tcslen wcslen
#else
#define _tcslen strlen
#endif

Now, in your code, you should call _tcslen. If _UNICODE is defined, it expands to wcslen; otherwise, it expands to strlen. By default, when you create a new C++ project in Visual Studio, _UNICODE is defined (just like UNICODE is defined). The C run-time library always prefixes identifiers that are not part of the C++ standard with underscores, while the Windows team does not do this. So, in your applications you'll want to make sure that both UNICODE and _UNICODE are defined or that neither is defined. Appendix A, "The Build Environment," will describe the details of the CmnHdr.h header file used by all the code samples of this book to avoid this kind of problem.

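
The macro switch described above can be tried without TChar.h. The following is only a minimal sketch: MY_UNICODE, MyTChar, MY_TEXT, my_tcslen, and GreetingLength are hypothetical stand-ins for _UNICODE, TCHAR, TEXT, and _tcslen, introduced here for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, TEXT and _tcslen so that the sketch
// compiles without TChar.h; MY_UNICODE plays the role of _UNICODE.
#ifdef MY_UNICODE
typedef wchar_t MyTChar;
#define MY_TEXT(s) L##s
#define my_tcslen std::wcslen
#else
typedef char MyTChar;
#define MY_TEXT(s) s
#define my_tcslen std::strlen
#endif

// Returns the number of characters (not bytes) before the terminator.
// The same source line works for both character widths.
inline std::size_t GreetingLength() {
    const MyTChar* greeting = MY_TEXT("Hello");
    return my_tcslen(greeting);
}
```

Compiling the same file with and without MY_UNICODE defined selects the wide or the narrow function, which is why the character type, the literal macro, and the function macro must all be driven by one consistent switch.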

Secure String Functions in the C Run-Time Library


Any function that modifies a string exposes a potential danger: if the destination string buffer is not large enough to contain the resulting string, memory corruption occurs. Here is an example:


// The following puts 4 characters in a
// 3-character buffer, resulting in memory corruption
WCHAR szBuffer[3] = L"";
wcscpy(szBuffer, L"abc"); // The terminating 0 is a character too!

The problem with the strcpy and wcscpy functions (and most other string manipulation functions) is that they do not accept an argument specifying the maximum size of the buffer, and therefore, the function doesn't know that it is corrupting memory. Because the function doesn't know that it is corrupting memory, it can't report an error back to your code, and therefore, you have no way of knowing that memory was corrupted. And, of course, it would be best if the function just failed without corrupting any memory at all.

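
To make the point concrete, here is a minimal sketch of a copy routine that does receive the buffer size and can therefore fail cleanly. TryCopy is a hypothetical helper written for this illustration; it is not a CRT or StrSafe function.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper (not a CRT function): copies src into dst only if
// the text and its terminating zero both fit in dstCount characters.
// On failure nothing is written and false is returned.
inline bool TryCopy(wchar_t* dst, std::size_t dstCount, const wchar_t* src) {
    if (dst == nullptr || src == nullptr || dstCount == 0) {
        return false;
    }
    const std::size_t needed = std::wcslen(src) + 1;  // +1 for the terminator
    if (needed > dstCount) {
        return false;  // too small: report the problem, touch no memory
    }
    std::wmemcpy(dst, src, needed);
    return true;
}
```

With a three-character buffer, TryCopy(buffer, 3, L"abc") returns false and leaves the buffer unchanged, whereas wcscpy would have written a fourth character past the end.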

This kind of misbehavior has been heavily exploited by malware in the past. Microsoft now provides a set of new functions that replace the unsafe string manipulation functions (such as wcscpy, which was shown earlier) provided by the C run-time library that many of us have grown to know and love over the years. To write safe code, you should no longer use any of the familiar C run-time functions that modify a string. (Functions such as strlen, wcslen, and _tcslen are OK, however, because they do not attempt to modify the string passed to them, even though they assume that the string is 0 terminated, which might not be the case.) Instead, you should take advantage of the new secure string functions defined by Microsoft's StrSafe.h file.

Note: Internally, Microsoft has retrofitted its ATL and MFC class libraries to use the new safe string functions. Therefore, if you use these libraries, rebuilding your application against the new versions is all you have to do to make your application more secure.

Because this book is not dedicated to C/C++ programming, for a detailed usage of this library, you should take a look at the following sources of information:


The MSDN Magazine article "Repel Attacks on Your Code with the Visual Studio 2005 Safe C and C++ Libraries" by Martyn Lovell, located at http://msdn.microsoft.com/msdnmag/issues/05/05/SafeCandC/default.aspx

The Martyn Lovell video presentation on Channel9, located at http://channel9.msdn.com/Showpost.aspx?postid=186406

The secure strings topic on MSDN Online, located at http://msdn2.microsoft.com/en-us/library/ms647466.aspx

The list of all C run-time secured replacement functions on MSDN Online, which you can find at http://msdn2.microsoft.com/en-us/library/wd3wzwts(VS.80).aspx

However, it is worth discussing a couple of details in this chapter. I'll start by describing the patterns employed by the new functions. Next, I'll mention the pitfalls you might encounter if you are following the migration path from legacy functions to their corresponding secure versions, like using _tcscpy_s instead of _tcscpy. Then I'll show you in which case it might be more interesting to call the new StringC* functions instead.


Introducing the New Secure String Functions

When you include StrSafe.h, String.h is also included, and the existing string manipulation functions of the C run-time library, such as those behind the _tcscpy macro, are flagged with deprecation warnings during compilation. Note that the inclusion of StrSafe.h must appear after all other include files in your source code. I recommend that you use the compilation warnings to explicitly replace every occurrence of the deprecated functions with their safer substitutes, thinking each time about possible buffer overflow and, if recovery is not possible, how to terminate the application gracefully.

Each existing function, like _tcscpy or _tcscat, has a corresponding new function with the same name plus the _s (for secure) suffix. All these new functions share common characteristics that require explanation. Let's start by examining their prototypes in the following code snippet, which shows the side-by-side definitions of two common string functions:

PTSTR   _tcscpy  (PTSTR strDestination, PCTSTR strSource);
errno_t _tcscpy_s(PTSTR strDestination, size_t numberOfCharacters,
   PCTSTR strSource);

PTSTR   _tcscat  (PTSTR strDestination, PCTSTR strSource);
errno_t _tcscat_s(PTSTR strDestination, size_t numberOfCharacters,
   PCTSTR strSource);

When a writable buffer is passed as a parameter, its size must also be provided. This value is expected in the character count, which is easily computed by using the _countof macro (defined in stdlib.h) on your buffer.
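
The difference between a character count and a byte count is easy to get wrong, so a small sketch may help. CountOf below is a hypothetical, portable stand-in for the _countof macro, and g_szName is an illustrative buffer; neither comes from the original text.

```cpp
#include <cstddef>

// Hypothetical portable equivalent of _countof: yields the number of
// elements in a real array and refuses to compile when given a pointer.
template <typename T, std::size_t N>
constexpr std::size_t CountOf(const T (&)[N]) {
    return N;
}

wchar_t g_szName[16];  // illustrative buffer of 16 wide characters

// The secure functions expect the element count (16 here) ...
static_assert(CountOf(g_szName) == 16, "element count");
// ... not the byte count, which is larger for wide characters.
static_assert(sizeof(g_szName) == 16 * sizeof(wchar_t), "byte count");
```

Passing sizeof(buffer) where a character count is expected tells the function that a wide-character buffer is larger than it really is, which defeats the size check.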

All of the secure (_s) functions validate their arguments as the first thing they do. Checks are performed to make sure that pointers are not NULL, that integers are within a valid range, that enumeration values are valid, and that buffers are large enough to hold the resulting data. If any of these checks fail, the functions set the thread-local C run-time variable errno and the function returns an errno_t value to indicate success or failure. However, these functions don't actually return; instead, in a debug build, they display a user-unfriendly assertion dialog box similar to that shown in Figure 2-1. Then your application is terminated. The release builds directly auto-terminate.

The C run time actually allows you to provide a function of your own, which it will call when it detects an invalid parameter. Then, in this function, you can log the failure, attach a debugger, or do whatever you like. To enable this, you must first define a function that matches the following prototype:


void InvalidParameterHandler(PCTSTR expression, PCTSTR function,
   PCTSTR file, unsigned int line, uintptr_t /*pReserved*/);

The expression parameter describes the failed expectation in the C run-time implementation code, such as (L"Buffer is too small" && 0). As you can see, this is not very user friendly and should not be shown to the end user. This comment also applies to the next three parameters because function, file, and line describe the function name, the source code file, and the source code line number where the error occurred, respectively.


Note: All these arguments will have a value of NULL if _DEBUG is not defined, so this handler is valuable for logging errors only when testing debug builds. In a release build, you could replace the assertion dialog box with a more user-friendly message explaining that an unexpected error occurred that requires the application to shut down, perhaps with specific logging behavior or an application restart. If its memory state is corrupted, your application should stop executing. However, it is recommended that you wait for the errno_t check to decide whether the error is recoverable or not.

The next step is to register this handler by calling _set_invalid_parameter_handler. However, this step is not enough because the assertion dialog box will still appear. You need to call _CrtSetReportMode(_CRT_ASSERT, 0); at the beginning of your application, disabling all assertion dialog boxes that could be triggered by the C run time.

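
The two registration steps can be sketched as follows. This is an assumption-laden illustration rather than the book's code: LogInvalidParameter and InstallInvalidParameterLogging are hypothetical names, the handler uses const wchar_t* parameters because that is how the CRT declares the _invalid_parameter_handler type, and the CRT calls are guarded by _MSC_VER so that the sketch still compiles (as a no-op) with other compilers.

```cpp
#include <cstdint>
#include <cstdio>
#include <cwchar>

#ifdef _MSC_VER
#include <crtdbg.h>
#include <stdlib.h>
#endif

// Hypothetical handler. Every string may be NULL when the release CRT is
// used, so each one is checked before it is printed.
void LogInvalidParameter(const wchar_t* expression, const wchar_t* function,
                         const wchar_t* file, unsigned int line,
                         std::uintptr_t /*pReserved*/) {
    std::fwprintf(stderr, L"CRT invalid parameter: %ls in %ls (%ls:%u)\n",
                  expression ? expression : L"?",
                  function ? function : L"?",
                  file ? file : L"?",
                  line);
}

// Call once at startup. Returns true if the handler was installed.
bool InstallInvalidParameterLogging() {
#ifdef _MSC_VER
    _set_invalid_parameter_handler(LogInvalidParameter);
    _CrtSetReportMode(_CRT_ASSERT, 0);  // suppress the assertion dialog box
    return true;
#else
    return false;  // no Microsoft CRT: nothing to register
#endif
}
```

Once the handler is installed, a failing _s function returns to the caller after the handler runs, so the errno_t result can be inspected.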

Now, when you call one of the legacy replacement functions defined in String.h, you are able to check the returned errno_t value to understand what happened. Only the value S_OK (zero) means that the call was successful. The other possible return values are found in errno.h; EINVAL, for example, indicates invalid arguments such as NULL pointers.

Let's take an example of a string that is copied into a buffer that is too small by one character:

TCHAR szBefore[5] = {
   TEXT('B'), TEXT('B'), TEXT('B'), TEXT('B'), TEXT('\0')
};

TCHAR szBuffer[10] = {
   TEXT('-'), TEXT('-'), TEXT('-'), TEXT('-'), TEXT('-'),
   TEXT('-'), TEXT('-'), TEXT('-'), TEXT('-'), TEXT('\0')
};

TCHAR szAfter[5] = {
   TEXT('A'), TEXT('A'), TEXT('A'), TEXT('A'), TEXT('\0')
};

errno_t result = _tcscpy_s(szBuffer, _countof(szBuffer),
   TEXT("0123456789"));

Just before the call to _tcscpy_s, each variable has the content shown in Figure 2-2.


Figure 2-2: Variable state before the _tcscpy_s call

Because the "0123456789" string to be copied into szBuffer has exactly the same 10-character size as the buffer, there is not enough room to copy the terminating '\0' character. You might expect that the value of result is now STRUNCATE and the last character '9' has not been copied, but this is not the case. ERANGE is returned, and the state of each variable is shown in Figure 2-3.

Figure 2-3: Variable state after the _tcscpy_s call

There is one side effect that you don't see unless you take a look at the memory behind szBuffer, as shown in Figure 2-4.


Figure 2-4: Content of szBuffer memory after a failed call

The first character of szBuffer has been set to '\0', and all other bytes now contain the value 0xfd. So the resulting string has been truncated to an empty string and the remaining bytes of the buffer have been set to a filler value (0xfd).
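
The outcome described above (ERANGE plus an empty destination string) can be reproduced with a small portable imitation. CopyLikeWcscpyS is a hypothetical function written for this illustration; it mirrors only the documented return codes of wcscpy_s and does not invoke the invalid parameter handler or write the 0xfd filler.

```cpp
#include <cerrno>
#include <cstddef>
#include <cwchar>

// Hypothetical imitation of the documented wcscpy_s result codes.
// The real function also calls the invalid parameter handler on failure.
inline int CopyLikeWcscpyS(wchar_t* dst, std::size_t dstCount,
                           const wchar_t* src) {
    if (dst == nullptr || dstCount == 0) {
        return EINVAL;                 // nothing safe to write to
    }
    if (src == nullptr) {
        dst[0] = L'\0';                // leave an empty string behind
        return EINVAL;
    }
    const std::size_t needed = std::wcslen(src) + 1;  // include terminator
    if (needed > dstCount) {
        dst[0] = L'\0';                // no truncated copy, just empty
        return ERANGE;
    }
    std::wmemcpy(dst, src, needed);
    return 0;                          // success
}
```

Copying L"0123456789" into a 10-character buffer needs 11 characters, so this sketch returns ERANGE and leaves an empty string, while L"012345678" fits and returns 0.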

Note: If you wonder why the memory after all the defined variables is filled with the 0xcc value in Figure 2-4, the answer lies in the compiler's implementation of the run-time checks (/RTCs, /RTCu, or /RTC1) that automatically detect buffer overruns at run time. If you compile your code without these /RTCx flags, the memory view will show all sz* variables side by side. But remember that your builds should always be compiled with these run-time checks to detect any remaining buffer overrun early in the development cycle.