What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?


Many C++ Windows programmers get confused over what bizarre identifiers like TCHAR and LPCTSTR are. In this article, I will do my best to clear the fog.

In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent all languages in the world.

The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters respectively. There is a more precise definition of Unicode (Windows actually stores wide strings as UTF-16, where some characters take two 2-byte units), but for understanding, assume it is a two-byte character that the Windows OS uses for multiple-language support.

What if you want your C/C++ code to be independent of the character encoding/mode used?
Suggestion: Use generic data types and names to represent characters and strings.

For example, instead of replacing:

char cResponse; // 'Y' or 'N'
char sUsername[64];
// str* functions

with

wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions

In order to support multiple languages (i.e., Unicode) in your application, you can simply code it in a more generic manner:

#include <TCHAR.H> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions

The following project setting on the General page determines which character set is used for compilation:
(General -> Character Set)


This way, when your project is compiled as Unicode, TCHAR translates to wchar_t. If it is compiled as ANSI/MBCS, it translates to char. You are free to use char and wchar_t directly, and the project settings will not affect any direct use of these keywords.

TCHAR is defined as:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

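The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR would mean wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR would mean char. (The same setting also defines UNICODE, without the underscore, which is the macro the Windows SDK headers test.)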
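The conditional typedef can be tried without the Windows headers. The following is only a sketch under stated assumptions: MY_UNICODE stands in for _UNICODE and my_tchar stands in for TCHAR. Both names are hypothetical and do not exist in TCHAR.H.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for _UNICODE; comment it out to get the "ANSI" build.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;   // stand-in for TCHAR in a Unicode build
#else
typedef char my_tchar;      // stand-in for TCHAR in an ANSI/MBCS build
#endif

// Declarations written against my_tchar compile unchanged in both builds.
my_tchar cResponse = 0;        // 'Y' or 'N'
my_tchar sUsername[64] = {};

// Number of characters in the buffer, independent of the character size.
constexpr std::size_t kUsernameChars = sizeof(sUsername) / sizeof(sUsername[0]);

// True when the generic type resolved to the wide type.
constexpr bool kIsWide = std::is_same<my_tchar, wchar_t>::value;
```

With MY_UNICODE defined, kIsWide is true and the buffer still holds 64 characters; only its size in bytes changes.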

Likewise, to support multiple character sets with a single code base, and possibly multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.

As you know, strlen is prototyped as:

size_t strlen(const char*);

And wcslen is prototyped as:

size_t wcslen(const wchar_t* );

You would do better to use _tcslen, which is logically prototyped as:

size_t _tcslen(const TCHAR* );

WC stands for Wide Character. Therefore, wcs means wide-character string. In the same way, _tcs means _T character string. And you know _T may be char or wchar_t, logically.

But in reality, _tcslen (and the other _tcs functions) are not actually functions, but macros. They are defined simply as:

#ifdef _UNICODE
#define _tcslen wcslen
#else
#define _tcslen strlen
#endif

You should refer to TCHAR.H to look up more macro definitions like this.
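To see the renaming at work outside of TCHAR.H, here is a minimal sketch. It assumes hypothetical names (MY_UNICODE, my_tchar, my_tcslen, UsernameLength) and maps the generic name onto the standard library functions, in the same spirit as the real _tcslen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

#define MY_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define my_tcslen std::wcslen   // the macro only renames the wide function
#else
typedef char my_tchar;
#define my_tcslen std::strlen   // the macro only renames the narrow function
#endif

// The caller never names strlen or wcslen directly.
inline std::size_t UsernameLength(const my_tchar* name) {
    assert(name != nullptr);
    return my_tcslen(name);
}
```

Because the substitution happens in the preprocessor, no third function exists at run time; the call compiles straight to wcslen or strlen.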

You might ask why they are defined as macros and not implemented as functions instead. The reason is simple: a library or DLL may export a single function with a given name and prototype (ignore the overloading concept of C++). For instance, when you export a function as:

void _TPrintChar(char);

How is the client supposed to call it as the following?

void _TPrintChar(wchar_t);

_TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:

void PrintCharA(char); // A = ANSI
void PrintCharW(wchar_t); // W = Wide character

And a simple macro, as defined below, would hide the difference:

#ifdef _UNICODE
#define _TPrintChar PrintCharW // wide-character build
#else
#define _TPrintChar PrintCharA // ANSI build
#endif

The client would simply call it as:

TCHAR cChar;
_TPrintChar(cChar);

Note that both TCHAR and _TPrintChar would map to either Unicode or ANSI, and therefore cChar and the argument to the function would be either char or wchar_t.

Macros avoid these complications and allow us to use either the ANSI or the Unicode function for characters and strings. Most of the Windows functions that take a string or a character are implemented this way, and for the programmer's convenience, only one function name (a macro!) needs to be used. SetWindowText is one example:

// WinUser.H
#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif // !UNICODE

There are very few functions that do not have macros and are available only with the W or A suffix. One example is ReadDirectoryChangesW, which doesn't have an ANSI equivalent.
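The A/W pattern described above can be sketched in portable C++. The names here (MY_UNICODE, my_tchar, FormatCharA, FormatCharW, FormatChar, FormatY) are hypothetical and chosen for illustration; they are not Windows APIs.

```cpp
#include <cassert>
#include <string>

#define MY_UNICODE  // hypothetical stand-in for UNICODE/_UNICODE

// Two real functions, one per character type (bodies are illustrative only).
inline std::string  FormatCharA(char c)    { return std::string("A:") + c; }
inline std::wstring FormatCharW(wchar_t c) { return std::wstring(L"W:") + c; }

// One generic name, resolved by the preprocessor to one of the two functions.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define FormatChar FormatCharW
#else
typedef char my_tchar;
#define FormatChar FormatCharA
#endif

// Client code uses only the generic names.
inline auto FormatY() -> decltype(FormatChar(my_tchar())) {
    my_tchar cChar = 'Y';
    return FormatChar(cChar);
}
```

The library still exports exactly two functions with fixed prototypes; only the client's spelling of the call is generic.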


You all know that we use double quotation marks to represent strings. A string represented in this manner is an ANSI string, with each character taking 1 byte. Example:

"This is ANSI String. Each letter takes 1 byte."

The string text given above is not Unicode, and would not qualify for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:

L"This is Unicode string. Each letter would take 2 bytes, including spaces."

Note the L at the beginning of the string, which makes it a Unicode string. All characters (I repeat, all characters) would take two bytes, including all English letters, spaces, digits, and the null character. Therefore, the size of a Unicode string would always be a multiple of 2 bytes. A Unicode string of length 7 characters would need 14 bytes, and so on. A Unicode string taking 15 bytes, for example, would not be valid in any context.

In general, the size of a string in bytes would be a multiple of sizeof(TCHAR)!
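The arithmetic can be checked with a small sketch. One assumption to flag: wchar_t is 2 bytes with Visual C++ but often 4 bytes elsewhere, so this example uses the standard char16_t type as a stand-in for a 2-byte character. ByteCount is a hypothetical helper, not a library function.

```cpp
#include <cstddef>

// Bytes occupied by charCount characters of type Ch, excluding the terminator.
template <typename Ch>
constexpr std::size_t ByteCount(std::size_t charCount) {
    return charCount * sizeof(Ch);
}

// The sketch assumes a 2-byte unit, matching wchar_t under Visual C++.
static_assert(sizeof(char16_t) == 2, "sketch assumes a 2-byte character type");

const char     narrow[] = "Windows";   // 7 characters plus the terminator
const char16_t wide[]   = u"Windows";  // 7 two-byte characters plus the terminator
```

Here sizeof(narrow) is 8 and sizeof(wide) is 16: the 7 characters take 14 bytes, and the null character takes 2 more.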

When you need to express a hard-coded string, you can use:

"ANSI String"; // ANSI
L"Unicode String"; // Unicode
_T("Either string, depending on compilation"); // ANSI or Unicode
// or use TEXT macro, if you need more readability

The non-prefixed string is an ANSI string, the L-prefixed string is Unicode, and a string specified in _T or TEXT would be either, depending on compilation. Again, _T and TEXT are nothing but macros, and are defined as:

// SIMPLIFIED
#ifdef _UNICODE
 #define _T(c) L##c
 #define TEXT(c) L##c
#else
 #define _T(c) c
 #define TEXT(c) c
#endif

The ## symbol is the token-pasting operator, which would turn _T("Unicode") into L"Unicode", where the string passed is the argument to the macro, if _UNICODE is defined. If _UNICODE is not defined, _T("Unicode") would simply mean "Unicode". The token-pasting operator exists in the C language as well, and is not specific to VC++ or character encoding.

Note that these macros can be used for strings as well as characters. _T('R') would turn into L'R' or simply 'R'; the former is a Unicode character, the latter an ANSI character.
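Token pasting on literals can be tried with a stand-in macro. In this sketch MY_T, MY_UNICODE and my_tchar are hypothetical names that mirror the simplified definition; the real headers are somewhat more involved.

```cpp
#define MY_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x     // pastes the L prefix onto the literal token
#else
typedef char my_tchar;
#define MY_T(x) x        // leaves the literal untouched
#endif

const my_tchar kGreeting[] = MY_T("Unicode");  // L"Unicode" in this build
const my_tchar kInitial    = MY_T('R');        // L'R' in this build
```

The same two declarations compile in both builds, because the literal and the variable type switch together.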

No, you cannot use these macros to convert variables (string or character) into Unicode/non-Unicode text. The following is not valid:

char c = 'C';
char str[16] = "CodeProject";
_T(c);
_T(str);

The two _T statements would compile successfully in an ANSI (Multi-Byte) build, since _T(x) would simply be x, and therefore _T(c) and _T(str) would come out to be c and str, respectively. But when you build with the Unicode character set, it would fail to compile:

error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier

I would not like to insult your intelligence by describing why and what those errors are.

There exists a set of conversion routines to convert MBCS to Unicode and vice versa, which I will explain soon.
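Until then, here is a rough sketch of what converting a variable at run time looks like. It is an assumption on my part to use the standard C library routine std::mbstowcs here; it is not necessarily the routine this article has in mind (Windows also provides MultiByteToWideChar). ToWide is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Converts a narrow string held in a variable to a wide string at run time.
// dstChars is the capacity of dst in characters, including the terminator.
// Returns the number of wide characters written (excluding the terminator),
// or static_cast<std::size_t>(-1) if the input cannot be converted.
// The result is truncated if it does not fit.
inline std::size_t ToWide(const char* src, wchar_t* dst, std::size_t dstChars) {
    assert(src != nullptr && dst != nullptr);
    if (dstChars == 0) return static_cast<std::size_t>(-1);
    const std::size_t n = std::mbstowcs(dst, src, dstChars - 1);
    if (n == static_cast<std::size_t>(-1)) return n;
    dst[n] = L'\0';   // mbstowcs does not terminate when the buffer fills up
    return n;
}
```

Unlike _T, this works on variables because the conversion is done by code at run time, not by the preprocessor on a literal token.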


String classes, like MFC/ATL's CString, implement two versions using a macro. There are two classes, named CStringA for ANSI and CStringW for Unicode. When you use CString (which is a typedef on top of templates and the character-set setting), it translates to one of the two classes.
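The same idea can be sketched with the standard library. This is an analogy, not how CString is implemented: tstring, my_tchar, MY_UNICODE and MakeGreeting are hypothetical names, and std::basic_string plays the role of the underlying template.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

#define MY_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#else
typedef char my_tchar;
#endif

// One generic name over two concrete classes: std::string or std::wstring.
typedef std::basic_string<my_tchar> tstring;

inline tstring MakeGreeting() {
    // Built from character constants, so no literal prefix is needed here.
    const my_tchar hello[] = { 'H', 'i', 0 };
    return tstring(hello);
}
```

Code written against tstring picks up the narrow or wide class automatically, much as code written against CString does.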

The TCHAR type was for a single character. You can definitely declare an array of TCHAR. What if you would like to express a character pointer, or a const character pointer? Which one of the following?

// ANSI characters
foo_ansi(char*);
foo_ansi(const char*);
char* pString;

// Unicode/wide-string
foo_uni(WCHAR*);
wchar_t* foo_uni(const WCHAR*);
WCHAR* pString;

// Independent
foo_char(TCHAR*);
foo_char(const TCHAR*);
TCHAR* pString;

After reading about the TCHAR stuff, you'd definitely select the last one as your choice. But there are better alternatives available. Before that, note that the TCHAR.H header file declares only the TCHAR datatype. For the following types, you need to include Windows.h (they are defined in WinNT.h).

NOTE: If your project implicitly or explicitly includes Windows.h, you need not include TCHAR.H

  • char* replacement: LPSTR
  • const char* replacement: LPCSTR
  • WCHAR* replacement: LPWSTR
  • const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
  • TCHAR* replacement: LPTSTR
  • const TCHAR* replacement: LPCTSTR
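The list above can be mirrored with plain typedefs. All the MY_ names below are hypothetical reproductions for illustration, not the actual definitions from WinNT.h.

```cpp
#include <type_traits>

#define MY_UNICODE  // hypothetical stand-in for UNICODE

typedef wchar_t MY_WCHAR;
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#else
typedef char MY_TCHAR;
#endif

// Reading the names: LP = (long) pointer, C = const, W = wide, T = generic, STR = string.
typedef char*            MY_LPSTR;
typedef const char*      MY_LPCSTR;
typedef MY_WCHAR*        MY_LPWSTR;
typedef const MY_WCHAR*  MY_LPCWSTR;
typedef MY_TCHAR*        MY_LPTSTR;
typedef const MY_TCHAR*  MY_LPCTSTR;

// In this (wide) build the generic pointers are the wide pointers.
static_assert(std::is_same<MY_LPTSTR, MY_LPWSTR>::value, "LPTSTR maps to LPWSTR");
static_assert(std::is_same<MY_LPCTSTR, MY_LPCWSTR>::value, "LPCTSTR maps to LPCWSTR");
```

Commenting out MY_UNICODE makes both static_assert lines fail, which shows the generic pointers have switched to the narrow types.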

Now, I hope you understand the following signatures:

BOOL SetCurrentDirectory(LPCTSTR lpPathName);
DWORD GetCurrentDirectory(DWORD nBufferLength, LPTSTR lpBuffer);

Continuing. You must have seen some functions/methods asking you to pass a number of characters, or returning a number of characters. With functions like GetCurrentDirectory, you need to pass the number of characters, and not the number of bytes. For example:

TCHAR sCurrentDir[255];
// Pass 255 and not 255*2
GetCurrentDirectory(255, sCurrentDir);
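The difference between a character count and a byte count can be sketched without the Windows API. FillDirectory below is a hypothetical function written in the style of GetCurrentDirectory, and CountOf is a hypothetical helper similar in spirit to the _countof macro of Visual C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Element count of a fixed-size array, known at compile time.
template <typename T, std::size_t N>
constexpr std::size_t CountOf(const T (&)[N]) { return N; }

// Hypothetical API: cchBuffer is the buffer capacity in CHARACTERS, not bytes.
// Returns the number of characters written, or 0 if the buffer is too small.
inline std::size_t FillDirectory(std::size_t cchBuffer, wchar_t* buffer) {
    assert(buffer != nullptr);
    const wchar_t path[] = L"C:\\Temp";
    const std::size_t len = std::wcslen(path);
    if (cchBuffer <= len) return 0;        // no room for the terminator
    std::wmemcpy(buffer, path, len + 1);   // wmemcpy also counts in characters
    return len;
}
```

Passing sizeof(buffer) instead of CountOf(buffer) would overstate the capacity of a wide buffer, which is exactly the mistake the comment in the example above warns about.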

On the other side, if you need to allocate a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:

LPTSTR pBuffer; // TCHAR*
pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.

But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!

pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );

Typecasting the return value is required, as you know. The expression in malloc's argument ensures that it allocates the desired number of bytes, and so makes room for the desired number of characters.
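One way to avoid repeating the sizeof expression is a small helper. This is only a sketch: AllocChars and BytesFor are hypothetical names, and the caller is assumed to release the memory with std::free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Bytes needed for count characters of type Ch.
template <typename Ch>
constexpr std::size_t BytesFor(std::size_t count) {
    return count * sizeof(Ch);
}

// Allocates room for count characters of type Ch; may return nullptr on failure.
template <typename Ch>
Ch* AllocChars(std::size_t count) {
    assert(count > 0);
    return static_cast<Ch*>(std::malloc(BytesFor<Ch>(count)));
}
```

The caller thinks in characters throughout, and the conversion to bytes happens in exactly one place.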

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ajay Vijayvargiya