unicode to ansi char
Source: Internet. Editor: 程序博客网. Time: 2024/05/22 03:46
I need to make a detour for a few moments, and discuss how to handle strings in COM code. If you are familiar with how Unicode and ANSI strings work, and know how to convert between the two, then you can skip this section. Otherwise, read on.
Whenever a COM method returns a string, that string will be in Unicode. (Well, all methods that are written to the COM spec, that is!) Unicode is a character encoding scheme, like ASCII, except that on Windows each character is stored in a 2-byte wchar_t (a few rare characters take two of them). If you want to get the string into a more manageable state, you should convert it to a TCHAR string.
TCHAR and the _t functions (for example, _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, you'll be writing code that uses ANSI strings and the ANSI Windows APIs, so for the rest of this article, I will refer to chars instead of TCHARs, just for simplicity. You should definitely read up on the TCHAR types, though, to be aware of them in case you ever come across them in code written by others.
When you get a Unicode string back from a COM method, you can convert it to a char string in one of several ways:
- Call the WideCharToMultiByte() API.
- Call the CRT function wcstombs().
- Use the CString constructor or assignment operator (MFC only).
- Use an ATL string conversion macro.
WideCharToMultiByte()
You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:
int WideCharToMultiByte (
    UINT    CodePage,
    DWORD   dwFlags,
    LPCWSTR lpWideCharStr,
    int     cchWideChar,
    LPSTR   lpMultiByteStr,
    int     cbMultiByte,
    LPCSTR  lpDefaultChar,
    LPBOOL  lpUsedDefaultChar );
The parameters are:
CodePage
- The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. Single-byte code pages are sets of 256 characters. Characters 0-127 are always identical to the ASCII encoding. Characters 128-255 differ, and can contain graphics or letters with diacritics. Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.
dwFlags
- dwFlags determines how Windows deals with "composite" Unicode characters, which are a letter followed by a diacritic. An example of a composite character is è. If this character is in the code page specified in CodePage, then nothing special happens. However, if it is not in the code page, Windows has to convert it to something else. Passing WC_COMPOSITECHECK makes the API check for non-mapping composite characters. Passing WC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example e`. Passing WC_DISCARDNS makes Windows discard the diacritics. Passing WC_DEFAULTCHAR makes Windows replace the composite characters with a "default" character, specified in the lpDefaultChar parameter. The default behavior is WC_SEPCHARS.
lpWideCharStr
- The Unicode string to convert.
cchWideChar
- The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.
lpMultiByteStr
- A char buffer that will hold the converted string.
cbMultiByte
- The size of lpMultiByteStr, in bytes.
lpDefaultChar
- Optional - a one-character ANSI string that contains the "default" character to be inserted when dwFlags contains WC_COMPOSITECHECK | WC_DEFAULTCHAR and a Unicode character cannot be mapped to an equivalent ANSI character. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).
lpUsedDefaultChar
- Optional - a pointer to a BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.
Whew, a lot of boring details! Like always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:
// Assuming we already have a Unicode string wszSomeString...
char szANSIString [MAX_PATH];

WideCharToMultiByte ( CP_ACP,                // ANSI code page
                      WC_COMPOSITECHECK,     // Check for accented characters
                      wszSomeString,         // Source Unicode string
                      -1,                    // -1 means string is zero-terminated
                      szANSIString,          // Destination char string
                      sizeof(szANSIString),  // Size of buffer
                      NULL,                  // No default character
                      NULL );                // Don't care about this flag
After this call, szANSIString will contain the ANSI version of the Unicode string.
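The example uses a fixed MAX_PATH buffer, so the call fails if the converted string does not fit. As a rough sketch (not part of the original example), you can call the API twice: once with a zero-size buffer to ask how many bytes are needed, then again to do the conversion. WideToAnsi() is a hypothetical helper name, and the non-Windows branch is there only so the sketch compiles elsewhere; it is not an equivalent of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>   // wcstombs(), used only so the sketch builds off Windows
#endif

// Hypothetical helper (not from the article): converts a zero-terminated
// Unicode string to an ANSI std::string, sizing the buffer at run time.
// Returns an empty string if the conversion fails.
std::string WideToAnsi(const wchar_t* wsz)
{
    assert(wsz != nullptr);
#ifdef _WIN32
    // First call: cbMultiByte == 0 asks for the required size in bytes.
    // Because cchWideChar is -1, that size includes the terminating zero.
    int cb = WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, wsz, -1,
                                 NULL, 0, NULL, NULL);
    if (cb <= 0)
        return std::string();

    std::string result(static_cast<std::size_t>(cb), '\0');

    // Second call: convert into the correctly sized buffer.
    cb = WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, wsz, -1,
                             &result[0], cb, NULL, NULL);
    if (cb <= 0)
        return std::string();

    result.resize(static_cast<std::size_t>(cb) - 1);  // drop the terminator
    return result;
#else
    // Assumption: the platform's wcstombs() reports the required length
    // (without terminator) when the destination is a null pointer.
    std::size_t cb = std::wcstombs(nullptr, wsz, 0);
    if (cb == static_cast<std::size_t>(-1))
        return std::string();

    std::string result(cb + 1, '\0');
    std::wcstombs(&result[0], wsz, cb + 1);
    result.resize(cb);                                // drop the terminator
    return result;
#endif
}
```

With this in place, std::string str = WideToAnsi(wszSomeString); replaces both the buffer declaration and the call shown earlier.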
wcstombs()
The CRT function wcstombs() is a bit simpler, but it just ends up calling WideCharToMultiByte() (with the code page taken from the current CRT locale rather than one you pass in), so in the end the results are the same. The prototype for wcstombs() is:
size_t wcstombs ( char* mbstr, const wchar_t* wcstr, size_t count );
The parameters are:
mbstr
- A char buffer to hold the resulting ANSI string.
wcstr
- The Unicode string to convert.
count
- The size of the mbstr buffer, in bytes.
wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:
wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) );
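One thing to watch: wcstombs() returns (size_t)-1 if it hits a character it cannot convert, and it does not write a terminating zero when the result fills the whole buffer. A minimal sketch of a checked call, assuming a hypothetical helper named WideToNarrow() (not a CRT function):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not a CRT function): converts wsz into buf and
// reports whether the whole string fit, zero terminator included.
bool WideToNarrow(const wchar_t* wsz, char* buf, std::size_t cbBuf)
{
    assert(wsz != nullptr && buf != nullptr && cbBuf > 0);

    std::size_t cb = std::wcstombs(buf, wsz, cbBuf);

    // (size_t)-1: a character could not be converted.
    // cb == cbBuf: the buffer filled up before a terminator was written.
    if (cb == static_cast<std::size_t>(-1) || cb == cbBuf)
    {
        buf[0] = '\0';      // leave a valid, empty string behind
        return false;
    }
    return true;            // buf holds cb bytes plus a terminating zero
}
```

With the earlier example this would be called as WideToNarrow ( wszSomeString, szANSIString, sizeof(szANSIString) ).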
CString
The MFC CString class contains constructors and assignment operators that accept Unicode strings, so you can let CString do the conversion work for you. For example:
// Assuming we already have wszSomeString...
CString str1 ( wszSomeString );    // Convert with a constructor.
CString str2;

str2 = wszSomeString;              // Convert with an assignment operator.
ATL macros
ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use the W2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should use OLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.
#include <atlconv.h>

// Again assuming we have wszSomeString...
{
char szANSIString [MAX_PATH];
USES_CONVERSION;  // Declare local variable used by the macros.

    lstrcpy ( szANSIString, OLE2A(wszSomeString) );
}
The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so we need to make our own copy of it with lstrcpy(). Other macros you should look into are W2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).
There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is a const char*, but I didn't want to throw too much at you at once.
Sticking with Unicode
On the other hand, you can just keep the string in Unicode if you won't be doing anything complicated with the string. If you're writing a console app, you can print Unicode strings with the std::wcout global variable, for example:
wcout << wszSomeString;
But keep in mind that wcout works in wide characters. If you pass it a "normal" string, each char is simply widened, which is only dependable for plain ASCII text, so output other "normal" strings with std::cout. If you have string literals, prefix them with L to make them Unicode, for example:
wcout << L"The Oracle says..." << endl << wszOracleResponse;
If you keep a string in Unicode, there are a couple of restrictions:
- You must use the wcsXXX() string functions, such as wcslen(), on Unicode strings.
- With very few exceptions, you cannot pass a Unicode string to a Windows API on Windows 9x. To write code that will run on 9x and NT unchanged, you'll need to use the TCHAR types, as described in MSDN.
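To illustrate the first restriction, here is a small sketch that sticks to the standard wcsXXX() functions. CopyWide() is a hypothetical helper, not something from the CRT or Windows. The point to notice is that the wide functions count characters, not bytes, so sizeof() on a wchar_t buffer is the wrong number to pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copies wszSrc into wszDest, truncating if needed,
// and always zero-terminates. cchDest is the buffer size in characters.
// Returns the number of characters copied, terminator excluded.
std::size_t CopyWide(wchar_t* wszDest, std::size_t cchDest,
                     const wchar_t* wszSrc)
{
    assert(wszDest != nullptr && wszSrc != nullptr && cchDest > 0);

    std::size_t cch = std::wcslen(wszSrc);   // length in characters
    if (cch >= cchDest)
        cch = cchDest - 1;                   // leave room for the terminator

    std::wcsncpy(wszDest, wszSrc, cch);
    wszDest[cch] = L'\0';
    return cch;
}
```

For a buffer declared as wchar_t wszBuf[64], the size to pass is 64 (or sizeof(wszBuf) / sizeof(wszBuf[0])), not sizeof(wszBuf).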