Multibyte and Wide Characters

Source: Internet. Editor: 程序博客网. Date: 2024/03/29 13:16

The introduction on MSDN:

 

A multibyte character is a character composed of sequences of one or more bytes. Each byte sequence represents a single character in the extended character set. Multibyte characters are used in character sets such as Kanji.

Wide characters are multilingual character codes that are always 16 bits wide. The type for character constants is char; for wide characters, the type is wchar_t. Since wide characters are always a fixed size, using wide characters simplifies programming with international character sets.

The wide-character-string literal L"hello" becomes an array of six integers of type wchar_t.

{L'h', L'e', L'l', L'l', L'o', 0}
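
The equivalence between the literal and the six-element array can be checked directly. The following is a minimal sketch; the function names hello_element_count and hello_matches_expanded_form are made up for illustration and are not part of the MSDN text.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of wchar_t elements in L"hello", terminator included. */
size_t hello_element_count(void)
{
    static const wchar_t hello[] = L"hello";
    return sizeof hello / sizeof hello[0];
}

/* Returns 1 if the literal equals the element-by-element array
   shown in the text, 0 otherwise. */
int hello_matches_expanded_form(void)
{
    static const wchar_t literal[]  = L"hello";
    static const wchar_t expanded[] = { L'h', L'e', L'l', L'l', L'o', 0 };

    /* Both arrays must have the same size before they are compared. */
    assert(sizeof literal == sizeof expanded);
    return wmemcmp(literal, expanded,
                   sizeof literal / sizeof literal[0]) == 0;
}
```

Five letters plus the terminating zero give six elements, regardless of how many bytes the compiler uses for each wchar_t.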

The Unicode specification is the specification for wide characters. The run-time library routines for translating between multibyte and wide characters include mbstowcs, mbtowc, wcstombs, and wctomb.
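
As a rough illustration of two of these routines, the sketch below converts a multibyte string to wide characters with mbstowcs and back with wcstombs. The helper name round_trip_ok and the fixed buffer size of 64 are assumptions made for this example. No locale is set, so the program runs in the default "C" locale, where only ASCII input is reliably convertible; real code handling other text would normally call setlocale first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define BUF_LEN 64  /* illustrative buffer size, not from the original text */

/* Converts a multibyte string to wide characters and back again.
   Returns 1 if the round trip reproduces the input, 0 otherwise. */
int round_trip_ok(const char *mb)
{
    wchar_t wide[BUF_LEN];
    char back[BUF_LEN];
    size_t n, m;

    assert(mb != NULL);

    /* mbstowcs returns the number of wide characters written, not
       counting the terminator, or (size_t)-1 on an invalid sequence. */
    n = mbstowcs(wide, mb, BUF_LEN);
    if (n == (size_t)-1 || n >= BUF_LEN)
        return 0;   /* invalid input, or no room for the terminator */

    /* wcstombs returns the number of bytes written, not counting
       the terminator, or (size_t)-1 on an unconvertible character. */
    m = wcstombs(back, wide, sizeof back);
    if (m == (size_t)-1 || m >= sizeof back)
        return 0;

    return strcmp(mb, back) == 0;
}
```

Calling round_trip_ok("hello") should return 1. The checks against BUF_LEN matter because neither function writes a terminator when the destination fills up completely.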

 

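Wide characters use two bytes to represent each character.
Multibyte characters use one or more bytes to represent each character.
With wide characters (as on Windows, where wchar_t is 16 bits), every character occupies two bytes, even a standard ASCII character. For example, 'A' fits in the single byte 0x41, but as a wide character it must be stored as 0x0041. The string terminator likewise grows from 0x00 to 0x0000.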
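
The storage cost described above can be measured with a small sketch. The function names narrow_storage_bytes and wide_storage_bytes are hypothetical. Note that sizeof(wchar_t) is 2 with Microsoft compilers, which matches the figures in the text, but it is often 4 on other platforms, so the code deliberately avoids hard-coding either value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Bytes needed to store a narrow string, including its
   one-byte terminator. */
size_t narrow_storage_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Bytes needed to store a wide string, including its
   wchar_t-sized terminator. */
size_t wide_storage_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

For the one-letter string "A", the narrow form needs 2 bytes, while the wide form L"A" needs 2 * sizeof(wchar_t) bytes, that is 4 bytes where wchar_t is 16 bits.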

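Multibyte characters, in the sense used here, are Microsoft's multibyte character set (MBCS) encodings, which sit between single-byte ANSI characters and Unicode. Every character in the ASCII range (0x00 to 0x7F) is represented by a single byte, so 'A' is still just 0x41. Byte values above 0x7F can act as lead bytes (the exact ranges depend on the code page): a lead byte combines with the byte that follows it to represent one character, such as a Chinese character. The terminator therefore still needs only a single byte, 0x00.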
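
The lead-byte rule can be sketched as a character counter. This is only an illustration under stated assumptions: is_lead_byte is a hypothetical helper that assumes a GBK-style code page whose lead bytes span 0x81 to 0xFE, and dbcs_char_count is likewise a made-up name. On Windows, real code would more likely ask the system, for example through IsDBCSLeadByte or _ismbblead, instead of hard-coding a range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lead-byte test. Assumes a GBK-style code page in
   which bytes 0x81..0xFE start a two-byte character. */
static int is_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Counts characters (not bytes) in a NUL-terminated
   double-byte string. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p;
    size_t count = 0;

    assert(s != NULL);
    p = (const unsigned char *)s;

    while (*p != 0) {
        /* A lead byte consumes the following byte as well, unless
           the string ends right after it (a truncated character). */
        if (is_lead_byte(*p) && p[1] != 0)
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For the four-byte string "A\xD6\xD0" "B" (the letter A, the GBK encoding of 中, then the letter B), the function counts three characters. The literal is split in two so that the B is not read as part of the hexadecimal escape.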