Unicode and ANSI issues

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 21:17

char * strchr(const char *, int);
wchar_t * wcschr(const wchar_t *, wchar_t);
int strcmp(const char *, const char *);
int wcscmp(const wchar_t *, const wchar_t *);
char * strcpy(char *, const char *);
wchar_t * wcscpy(wchar_t *, const wchar_t *);
size_t strlen(const char *);
size_t wcslen(const wchar_t *);
char * strcat(char *, const char *);
wchar_t * wcscat(wchar_t *, const wchar_t *);

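The declarations above are taken from 《WINDOWS核心编程》.

As they show, each Unicode function here starts with wcs, short for "wide character string". To call the Unicode version of an ANSI string function, simply replace the str prefix with wcs.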
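
As a quick illustration of that naming parallel, the sketch below runs the same copy and append steps once with the str functions and once with their wcs counterparts. The helper names build_narrow and build_wide are made up for this example; only the standard C library calls come from the list above.

```c
#include <string.h>
#include <wchar.h>

/* Narrow (ANSI) version: builds "abcdef" in dst and returns its length.
   dst must have room for at least 7 elements. */
size_t build_narrow(char *dst)
{
    strcpy(dst, "abc");
    strcat(dst, "def");
    return strlen(dst);
}

/* Wide (Unicode) version: the same steps with str replaced by wcs
   and every literal given the L prefix. */
size_t build_wide(wchar_t *dst)
{
    wcscpy(dst, L"abc");
    wcscat(dst, L"def");
    return wcslen(dst);
}
```

Both helpers return 6, because strlen and wcslen count characters, not bytes. The wide buffer simply occupies more memory per character.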

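1. The C run-time library header TChar.h

The C run-time library includes a header, TChar.h, whose sole purpose is to help you write source files that build as either ANSI or Unicode. If the _UNICODE macro is defined when the source file is compiled, the generic functions in TChar.h expand to the wcs family; otherwise they expand to the str family. For example, TChar.h defines the macro _tcscpy, which expands to wcscpy when _UNICODE is defined and to strcpy when it is not.

Note that TChar.h also defines the generic character type TCHAR.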

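When _UNICODE is not defined, a string can be defined like this:

TCHAR str[100]="abc";

With _UNICODE defined, this line fails to compile: TCHAR is now a wide character type, but the compiler still treats an unprefixed literal as an ANSI string. You have to tell the compiler explicitly to treat "abc" as a Unicode string by adding the L prefix:

TCHAR str[100]=L"abc";

This version compiles without problems.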

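In practice, code like this is not very portable: if the _UNICODE definition is later removed from the source file, every L has to be deleted by hand. To avoid this, TChar.h also defines the macro _TEXT: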

#ifdef _UNICODE

#define _TEXT(x) L##x

#else

#define _TEXT(x) x

#endif

A string can then be defined like this:

TCHAR str[100]=_TEXT("abc");

This way the literal compiles to the correct form whether or not _UNICODE is defined.
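
To make the mechanism concrete, here is a minimal, portable sketch of the same idea. TChar.h itself ships with the Microsoft compiler, so the names MY_UNICODE, MY_TCHAR, MY_TEXT, my_tcscpy and my_tcslen below are hypothetical stand-ins invented for this example, not the real definitions.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for what TChar.h provides. The real header
   keys off _UNICODE and defines TCHAR, _TEXT, _tcscpy and many more. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT(x)  L##x      /* paste an L prefix onto the literal */
#define my_tcscpy   wcscpy
#define my_tcslen   wcslen
#else
typedef char MY_TCHAR;
#define MY_TEXT(x)  x         /* leave the literal unchanged */
#define my_tcscpy   strcpy
#define my_tcslen   strlen
#endif

/* Generic code: this function compiles unchanged in both modes.
   dst must have room for at least 4 elements. */
size_t copy_greeting(MY_TCHAR *dst)
{
    my_tcscpy(dst, MY_TEXT("abc"));
    return my_tcslen(dst);
}
```

Compiling with -DMY_UNICODE (or the equivalent compiler switch) flips every type, literal and function call at once, which is the point of routing everything through the generic names.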

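2. Data types defined by the Windows header files

(1) Unicode data types

WCHAR: a Unicode character

PWSTR: pointer to a Unicode string

PCWSTR: pointer to a constant Unicode string

(2) Generic data types

PTSTR: pointer to a string

PCTSTR: pointer to a constant string

Whether the generic types compile to ANSI or Unicode depends on whether the source defines the UNICODE macro. (Note: not _UNICODE. _UNICODE is used by the C run-time headers, while UNICODE is used by the Windows headers. Both macros are usually defined together when compiling.)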

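3. Unicode and ANSI versions of the API

Note that most API functions come in two versions, Unicode and ANSI; which one is called depends on whether the UNICODE macro is defined at compile time. In fact, from Windows 2000 on, only the Unicode version does the real work internally. The ANSI version is just a translation layer: it converts its ANSI parameters to Unicode and then calls the Unicode version, adding nothing else.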
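
The "convert, then forward" structure can be sketched in portable C. Everything here is an assumption made for illustration: CountCharW and CountCharA are hypothetical functions that only borrow the W/A naming convention, and the standard mbsrtowcs stands in for whatever conversion Windows performs internally.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical "Unicode" implementation: the only place real work happens.
   Counts how many times ch occurs in text. */
int CountCharW(const wchar_t *text, wchar_t ch)
{
    int count = 0;
    for (; *text != L'\0'; ++text)
        if (*text == ch)
            ++count;
    return count;
}

/* Hypothetical "ANSI" entry point: converts its arguments to wide
   characters, forwards to CountCharW, and adds nothing else.
   Returns -1 if the conversion or the allocation fails. */
int CountCharA(const char *text, char ch)
{
    mbstate_t state;
    const char *src = text;
    wchar_t *wide;
    size_t len;
    wint_t wch;
    int result;

    wch = btowc((unsigned char)ch);
    if (wch == WEOF)
        return -1;

    /* First pass: measure the converted length (terminator excluded). */
    memset(&state, 0, sizeof state);
    len = mbsrtowcs(NULL, &src, 0, &state);
    if (len == (size_t)-1)
        return -1;

    wide = (wchar_t *)malloc((len + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return -1;

    /* Second pass: convert into the temporary wide buffer. */
    src = text;
    memset(&state, 0, sizeof state);
    mbsrtowcs(wide, &src, len + 1, &state);

    result = CountCharW(wide, (wchar_t)wch);
    free(wide);
    return result;
}
```

The extra allocation and conversion in CountCharA is the overhead a caller avoids by calling the wide version directly.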

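4. Windows string functions

windows.h provides a set of generic string functions:

Function    Description
lstrcat     Appends one string to the end of another
lstrcmp     Performs a case-sensitive comparison of two strings
lstrcmpi    Performs a case-insensitive comparison of two strings
lstrcpy     Copies a string to another location in memory
lstrlen     Returns the length of a string, measured in characters

These functions compile to either their Unicode or their ANSI versions, depending on whether the source file defines the UNICODE macro.

As a rule, prefer these generic string functions over the older standard C library functions (such as strcmp and strcat).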

5. An important extension to printf: wsprintf

Microsoft added some special format types to the printf family of functions. A concrete example:

// unicode.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <tchar.h>
int main(int argc, char* argv[])
{
    char szA1[100];   // An ANSI string buffer
    char szA2[100];
    WCHAR szW1[100];  // A Unicode string buffer
    WCHAR szW2[100];
    // Normal sprintf: all strings are ANSI
    sprintf(szA1, "%s", "ANSI Str");
    printf("str A1 is:%s\n", szA1);
    // %S in sprintf: converts a Unicode string to ANSI
    sprintf(szA2, "%S", L"Unicode Str");
    printf("str A2 is:%s\n", szA2);
    // lstrcmpA is named explicitly so this compiles with or without UNICODE
    printf("the value of comparing A1 and A2 is:%d\n", ::lstrcmpA(szA1, szA2));
    // Normal swprintf: all strings are Unicode (second argument is the buffer size in characters)
    swprintf(szW1, 100, L"%s", L"Unicode Str");
    wprintf(L"str W1 is:%s\n", szW1);
    // %S in swprintf: converts an ANSI string to Unicode
    swprintf(szW2, 100, L"%S", "ANSI Str");
    wprintf(L"str W2 is:%s\n", szW2);
    wprintf(L"the value of comparing W1 and W2 is:%d\n", ::wcscmp(szW1, szW2));
    return 0;
}

The output is as follows:

str A1 is:ANSI Str
str A2 is:Unicode Str
the value of comparing A1 and A2 is:-1
str W1 is:Unicode Str
str W2 is:ANSI Str
the value of comparing W1 and W2 is:1
Press any key to continue

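Note the use of %s and %S: %S is needed only when converting between Unicode and ANSI. In Microsoft's run-time library, %s means a string of the same width as the function itself, while %S means a string of the opposite width.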

6. Functions for converting between Unicode and ANSI strings

int MultiByteToWideChar(
    UINT CodePage,          // code page
    DWORD dwFlags,          // character-type options
    LPCSTR lpMultiByteStr,  // address of string to map
    int cchMultiByte,       // number of bytes in string
    LPWSTR lpWideCharStr,   // address of wide-character buffer
    int cchWideChar         // size of buffer
);

int WideCharToMultiByte(
    UINT CodePage,            // code page
    DWORD dwFlags,            // performance and mapping flags
    LPCWSTR lpWideCharStr,    // address of wide-character string
    int cchWideChar,          // number of characters in string
    LPSTR lpMultiByteStr,     // address of buffer for new string
    int cchMultiByte,         // size of buffer
    LPCSTR lpDefaultChar,     // address of default for unmappable characters
    LPBOOL lpUsedDefaultChar  // address of flag set when default char. used
);
See MSDN for details on how to use them.
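
A common way to call MultiByteToWideChar is in two steps: ask for the required size first, then convert into a buffer of that size. The sketch below shows one possible shape for that. The helper name ansi_to_wide is made up for this example, the choice of CP_ACP is an assumption, and the non-Windows branch (using the standard C function mbsrtowcs) is added here only so the example can be built outside Windows.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Converts a null-terminated ANSI/multibyte string into a newly
   allocated wide string. Returns NULL on failure; the caller must
   free() the result. */
wchar_t *ansi_to_wide(const char *src)
{
    wchar_t *dst;
#ifdef _WIN32
    /* First call: cchWideChar == 0 asks for the required size in
       characters. Passing -1 as cchMultiByte makes the count include
       the terminating null. */
    int needed = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (needed <= 0)
        return NULL;
    dst = (wchar_t *)malloc((size_t)needed * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;
    /* Second call: perform the actual conversion. */
    if (MultiByteToWideChar(CP_ACP, 0, src, -1, dst, needed) == 0) {
        free(dst);
        return NULL;
    }
#else
    /* Assumed fallback for non-Windows builds: the same two-step
       pattern with mbsrtowcs. */
    mbstate_t state;
    const char *p = src;
    size_t len;

    memset(&state, 0, sizeof state);
    len = mbsrtowcs(NULL, &p, 0, &state);   /* length without terminator */
    if (len == (size_t)-1)
        return NULL;
    dst = (wchar_t *)malloc((len + 1) * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;
    p = src;
    memset(&state, 0, sizeof state);
    mbsrtowcs(dst, &p, len + 1, &state);
#endif
    return dst;
}
```

The reverse direction with WideCharToMultiByte follows the same measure-then-convert pattern, with the two extra default-character parameters usually passed as NULL.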
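In summary, when writing Windows applications you should use Unicode characters wherever possible. Because the ANSI versions of the API only convert their arguments and forward to the Unicode versions, doing so quietly improves the program's efficiency. The guidelines to follow are:
1. Use the generic data types TCHAR, PTSTR and PCTSTR for text characters and strings.
2. Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers and data buffers.
3. Use the TEXT macro to define literal strings.
4. Fix string arithmetic. Some functions expect a buffer size in characters rather than in bytes, which means you should not pass sizeof(szBuffer) but
(sizeof(szBuffer)/sizeof(TCHAR)).
5. For string operations, prefer the Windows functions such as lstrcmp and lstrcat, and avoid wcscmp, wcscat, strcmp, strcpy and the like.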
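
Guideline 4 is the one most easily gotten wrong, so here is a small portable sketch of it. It uses wchar_t and the standard swprintf in place of TCHAR and a Windows API; the COUNT_OF macro and the format_greeting helper are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <wchar.h>

/* Number of elements in an array, independent of the element size.
   Only valid for true arrays, not for pointers. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Writes "Hello, <name>" into out, which holds out_chars characters.
   Returns the number of characters written, or a negative value if
   the buffer is too small. */
int format_greeting(wchar_t *out, size_t out_chars, const wchar_t *name)
{
    /* swprintf takes the capacity in characters, not bytes. */
    return swprintf(out, out_chars, L"Hello, %ls", name);
}
```

For a buffer declared as wchar_t buf[32], COUNT_OF(buf) is 32, while sizeof(buf) is 32 * sizeof(wchar_t). Passing the latter would tell the function the buffer is several times larger than it really is.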

0 0